Ling 472/CSE 472: Introduction to Computational Linguistics
Spring 2020

Final project information

General remarks about the project

Project groups

Finding candidate packages

Visit the ACL Anthology, which provides access to most recent publications in computational linguistics/NLP.

  1. Pick a conference or workshop (e.g. ACL, COLING (more linguistics), SIGBIOMED...)
  2. Pick a year (more recent papers are more likely to have software that still works, but you don't have to do 2019 or even 2018)
  3. Browse the titles. If a title interests you, open the paper and check that it links to the authors' GitHub repository or to another place where they store their code and data
  4. Alternatively, some titles have "software" and/or "dataset" icons right next to them, in which case you may not need to hunt for the project repository (e.g. on GitHub). However, software and data are rarely uploaded directly with the paper, so do look inside the paper to check that the code and data are easy to find and use.
  5. Another option is to browse the listings at Papers with Code
  6. VERIFY that you can download both code and data and run the tool on at least sample data.

Resources

Here are a few resources about error analysis in NLP:

Project milestones

Term paper expectations

A good term paper is detailed, focused, and clear. It brings together what you learned during the entire quarter as it relates to the particular tool you explored, including levels of linguistic structure, algorithms, computational approaches, and evaluation. A good paper demonstrates a strong ability to reason about all of those things in a way that is clear and easy for the reader to follow. The descriptions of the items below are just minimal guidelines; please be warned that minimal effort may result in a low score.

Your paper should be at most 8 pages, double-spaced, and include the following sections:

  1. Introduction and Background: Present the tool that you chose. For example, what does the package do? Why is it important and interesting? What is novel about the authors' approach?
  2. Data: What dataset are you using? Describe it. If it did not come with the package, explain why and how you came to use this data and where you got it.
  3. Replication results: Include both the table from the paper you are working from and a separate table giving the numbers you got when you ran the software and evaluation scripts.
  4. Methodology: Describe the structure and the meaning of your error analysis. What error categories did you define? How did you come up with them? How did you sample the errors to analyze?
  5. Error Analysis Results: Present the number of errors of each type in clear, easy to understand tables accompanied by clear, informative but ideally minimal prose.
  6. Discussion: Discuss your findings in detail, particularly the meaning of the error categories and any technical hypotheses that you might have about the reasons behind the errors and the implications of the tool being deployed with these errors. How does your error analysis inform possible future development of the tool? What are the implications of this project for broader inquiry in computational linguistics?
  7. Work allocation: A clear description of who did what in running software, designing error analysis, performing error analysis, and writing the results. All project members are expected to contribute to the writing.
  8. Bibliography: For any resources you use (corpora, toolkits, etc.), include a proper citation, both in the text (as (Author, year)) and in the bibliography. Do the same for any works cited.
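
For the Methodology and Error Analysis Results sections (items 4 and 5), it may help to draw your error sample reproducibly and to count categories automatically. The following is only a minimal sketch, assuming your tool's output and the gold labels are available as two parallel Python lists; the function names and the category labels ("ambiguity", "rare word") are hypothetical placeholders, not part of any particular package.

```python
import random
from collections import Counter


def sample_errors(predictions, gold, k, seed=0):
    """Return up to k indices where the prediction differs from the gold label."""
    error_indices = [i for i, (p, g) in enumerate(zip(predictions, gold)) if p != g]
    if k >= len(error_indices):
        return error_indices  # fewer errors than requested: analyze all of them
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    return sorted(rng.sample(error_indices, k))


def tally_categories(annotations):
    """Count sampled errors per category.

    `annotations` maps an error index to the category you assigned by hand.
    """
    return Counter(annotations.values())


def format_table(counts):
    """Render the category counts as a plain-text table with percentage shares."""
    total = sum(counts.values())
    lines = [f"{'Category':<20}{'Count':>6}{'Share':>8}"]
    for category, n in counts.most_common():
        lines.append(f"{category:<20}{n:>6}{n / total:>8.1%}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data standing in for real system output (hypothetical labels).
    predictions = ["N", "V", "N", "A", "V", "N"]
    gold = ["N", "N", "N", "V", "V", "A"]

    sampled = sample_errors(predictions, gold, k=2, seed=0)
    print("Sampled error indices:", sampled)

    # Categories are assigned manually after inspecting each sampled error.
    annotations = {1: "ambiguity", 3: "ambiguity", 5: "rare word"}
    print(format_table(tally_categories(annotations)))
```

Reporting the seed and the sample size in your Methodology section lets a reader draw the same sample again; the category labels themselves still have to come from your own inspection of the errors.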
