Ling 472/CSE 472: Introduction to Computational Linguistics
Spring 2023
Final project information
Overview
The term project is meant to help you deepen your familiarity with computational linguistics. You will have an opportunity to complete one of three project types: (1) an Error Analysis, (2) a Literature Review or Research Proposal, or (3) a Project Extension.
Project groups
We encourage you to complete the term project in groups of 2-3 people, but solo projects are also allowed. Please form groups ASAP and discuss your learning goals with one another; doing so early on will help you accommodate each other's learning goals, leading to a more rewarding project experience for everyone.
Project milestones
M1: Project proposal - due April 21
Submit a proposal for your group's selected term project. This proposal should include:
- Your team members' names
- The type of project (e.g., Error Analysis, Literature Review)
- A short description of the project, according to the M1 specification for your selected project type
Submission instructions: Identify one member in your group to submit the project proposal as a PDF on Canvas. The remaining group members should submit a Canvas comment with everyone's names.
M2: Feasibility update - due May 12
Submit an update on how your project is going and on what needs to change from your original plan for the project to succeed. This update should include:
- A detailed description of the project, according to the M2 specification for your selected project type.
- A description of what needs to change from your original proposal, and a rationale.
Presentations!
We will use class time in Week 10 for final project presentations.
M3: Project deliverables - due June 6
Submit your final project deliverable! This will likely include a write-up and maybe some code, depending on the project type. Write-ups should be no more than 8 pages (double-spaced), excluding references.
All write-ups should be submitted as PDFs.
Project type 1: Error analysis
Perform a linguistically informed error analysis of an existing NLP package.
This project is great for: Everyone! Whether your strengths lie in computer science, linguistics, or both, this project is a great opportunity to draw on linguistic insights while analyzing a system's performance.
Recommended partners: Try to form groups where each member brings different strengths.
Overview
- Find an NLP package described in a peer-reviewed paper that reports results using a quantitative evaluation metric (e.g., precision, recall, F1)
- Be sure to find a package that you can run easily and quickly. The focus of this project is to perform an error analysis, not to reproduce an experiment. Avoid packages that will waste your time.
- Run the package on the dataset specified in the paper (or, in exceptional cases, on a different dataset)
- Perform a careful, linguistically informed error analysis of the results. This analysis should reference ~100 errors, as well as a few correct outputs for comparison
- A strong error analysis will categorize errors according to shared linguistic properties, count how many errors fall within each category, and provide a meaningful discussion, including either hypothetical or directly observed reasons for why the system is making such errors
- Consider the following questions:
  - What systematic mistakes does the system tend to make?
  - Do these mistakes correspond to specific linguistic properties?
  - What linguistic structures does the system handle well? What linguistic structures does it handle poorly?
  - How might these strengths and weaknesses be reflected in the system's global performance?
  - Optional: How might these errors be addressed in future systems?
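As an illustration of the sampling and counting steps above, here is a minimal Python sketch. It assumes the package's outputs and the gold labels can be lined up as parallel lists of labels (true for tagging tasks, but not for every task), and the function names, toy tags, and category labels are all hypothetical. The linguistic categorization itself is still done by hand; the code only draws a reproducible random sample of errors and tallies the categories you assign.

```python
import random
from collections import Counter


def sample_errors(gold, predicted, k=100, seed=0):
    """Return sorted indices of up to k randomly sampled mismatches between gold and predicted labels."""
    errors = [i for i, (g, p) in enumerate(zip(gold, predicted)) if g != p]
    rng = random.Random(seed)  # fixed seed so the same sample can be reported and reproduced
    return sorted(rng.sample(errors, min(k, len(errors))))


def tally_categories(annotations):
    """Count how many sampled errors fall into each hand-assigned category."""
    return Counter(annotations.values())


# Hypothetical toy data: gold vs. predicted part-of-speech tags for six tokens.
gold = ["NOUN", "VERB", "ADJ", "NOUN", "ADP", "VERB"]
predicted = ["NOUN", "NOUN", "ADJ", "VERB", "ADP", "NOUN"]

sampled = sample_errors(gold, predicted, k=100)
print(sampled)  # [1, 3, 5]

# Hypothetical categories, assigned by a human after inspecting each error in context.
annotations = {1: "verb/noun ambiguity", 3: "verb/noun ambiguity", 5: "gerund"}
for category, count in tally_categories(annotations).most_common():
    print(f"{category}: {count}")
```

Fixing the random seed lets you state exactly which errors were sampled, which makes it easier to answer "How did you sample the errors?" in the write-up.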
M1: Project proposal
- Identify 3 potential systems and rank them in order of preference. For each system, specify:
  - The paper's bibliographical info (i.e., the authors, title, and publication venue)
  - URLs to the paper, package/code, and dataset
- Submit a sample input and output from using the package, including the command you ran to obtain the output. (In other words, please try running the package on the selected dataset before submitting M1. This will help you to determine whether there are enough substantive errors to analyze!)
M2: Feasibility update
- Specify which paper, package, and dataset you will use for the term project, including:
  - The paper's bibliographical info (i.e., the authors, title, and publication venue)
  - URLs to the paper, package/code, and dataset
  - Which systems you tried in addition to your selected system, if any, and the reasons you decided they were not feasible.
- Submit a clear and detailed description of the package:
  - What is the tool intended for?
  - How was it implemented (at a high level)?
  - How was it evaluated?
- Submit a clear and detailed description of the dataset, akin to the short data statements proposed by Bender and Friedman (2018) or the executive summary element from Bender et al. (2021).
- Include the main results table from the paper (copied and pasted), and contextualize the results by describing the evaluation metrics reported in the table.
- Describe the error analysis, if any, that the authors performed.
- Include a clear plan of how you will perform your error analysis, including a few key error examples and your preliminary error categories.
M3: Project deliverables
- Please submit a write-up with the following sections:
  - Introduction and Background. Describe and cite your selected package: What is the tool intended for? How was it implemented and evaluated?
  - Data. Describe and cite the dataset you used for the analysis, akin to the short data statements proposed by Bender and Friedman (2018).
  - Replication results. Include the results table from the original paper, as well as a separate table that reports the numbers you obtained from evaluating the package on the dataset, using the same evaluation metrics as the original paper.
  - Methodology. Describe your approach to classifying errors: What error categories did you define and why? How did you sample the errors?
  - Error Analysis Results. Report the number of errors for each error type in an easy-to-read table as well as in succinct prose.
  - Discussion. Discuss your findings in detail, particularly the meaning of the error categories and any technical hypotheses you might have about why the errors occurred. Given these errors, what are the implications of this tool being deployed? How might your analysis inform the development of future systems and/or broader inquiry in computational linguistics?
  - References. Include a proper citation and bibliography entry for any resources addressed or utilized in your project, including papers, packages, and corpora.
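For the replication table, it can help to recompute the paper's metrics yourself before comparing numbers. Below is a minimal Python sketch of precision, recall, and F1, assuming the outputs can be represented as sets of items such as labeled entity spans; the span format and toy data are hypothetical. Where the original paper ships its own scorer, prefer that script, since matching criteria (exact vs. partial spans, micro vs. macro averaging) can change the numbers.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 over sets of items (e.g., labeled spans)."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)  # items that appear in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


# Hypothetical named-entity spans: (start_token, end_token, label)
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")}
predicted = {(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG"), (13, 14, "LOC")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Note that the mislabeled span (5, 6) counts against both precision and recall here, because exact matching requires the label to agree as well as the boundaries.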
Project Type 2: Literature review (+ research proposal)
Take a deep dive into a topic that interests you.
This project is great for: Anyone who is super intrigued by a particular topic or wants to develop a proposal for later research!
Recommended partners: Forming groups where each member brings different strengths will help you address each other's reading questions and promote interesting discussions.
Overview
- Conduct a literature review on a topic that interests you
- Discuss how the papers relate to one another and to your broader topic of interest. E.g.,
  - How do the papers define the general area of research?
  - How do the papers build on earlier work?
  - How do they differentiate themselves from earlier work?
  - Based on these papers, what is the current state of knowledge?
  - What research questions do these papers leave open or unanswered?
- You may also find these critical reading questions useful
- Optional: Use this literature review to motivate a proposal for future research
- Group-to-paper ratio:
  - Groups of 1 should analyze at least 3 papers
  - Groups of 2 should analyze at least 4 papers
  - Groups of 3 should analyze at least 6 papers
M1: Project proposal
- Briefly describe the topic of your literature review
- Suggest papers to read, including a URL and bibliography entry for each (i.e., a proper citation and the paper's provided abstract)
M2: Feasibility update
- Present a clear and detailed description of your selected literature review topic
- Include brief summaries of a subset of the papers. Each summary should describe the paper's research question, methods, data, and key findings, as well as anything else noteworthy about the paper
  - Groups of 1 should summarize at least 1 paper
  - Groups of 2 should summarize at least 2 papers
  - Groups of 3 should summarize at least 3 papers
- If you feel that you don't have enough to synthesize, add more papers to your bibliography at this point.
- Optional: Provide a brief sketch of a proposal for future research
M3: Project deliverable
- Submit a write-up with the following sections (though you are welcome to modify the structure and/or headers of the paper):
  - Introduction. Describe your topic of interest and what initially drew you to the topic.
  - Background. Summarize the papers you reviewed. Each paper's summary should describe its research question, methods, data, and key findings.
  - Discussion. Provide a broader discussion that relates the papers to one another and to your broader topic of interest. E.g., were there any details that were particularly interesting to you? Were some of the papers more compelling than the others and why? What research questions do these papers leave open? What questions did the papers raise for you? Who is funding this kind of research? What are the social implications of this body of work?
  - Proposal (optional). Propose a future research project, using the literature review to motivate the proposal's research question and/or methods. Be sure to articulate the proposal's research question and methods. (It's okay if some aspects of the proposal are hypothetical!)
  - References. Include a proper citation and bibliography entry for each paper you reference in your write-up. You should also cite any work referenced in the write-up, even if it's not one of your core reviewed pieces.
Project Type 3: Project extension
Extend a relevant term project from another class.
This project is great for: Anyone who has a relevant term project from another class!
Recommended partners: Anyone who was partnered with you on the original term project.
Overview
- Extend a relevant term project from another class. This can be a class you're currently enrolled in or one you've already taken. Possible extensions include (but are not limited to):
  - Performing a linguistically informed error analysis of an NLP project for another class
  - Analyzing the linguistic dimensions of an NLP/AI-related problem
  - Supplementing the literature review for a linguistics term project with relevant papers that utilize computational methods
M1: Project proposal
- Describe the original term project and how you plan to extend it to incorporate LING/CSE 472 course concepts
- Specify which course the original term project was for.
- Propose goals for M2 that will help you manage your progress throughout the quarter
- Optional: Submit the prompt for the original term project or the original term project itself (if it is already completed)
M2: Project sketch
- M2 will depend on the proposed extension, the goals you suggest in M1 for M2, and instructor feedback on M1
- M2 should include a proposed paper outline for M3. Use the paper outlines for M3 for the other two project types as a starting point.
M3: Project deliverable
- Submit the full extended term project
- (M3 will largely depend on the nature of the original term project)