Ling 573 - Natural Language Processing Systems and Applications
Spring 2011
Deliverable #3: Passage Retrieval: Due May 10, 2011: 09:00


Goals

In this deliverable, you will continue development of your question-answering system. You will

Passage Retrieval

For this deliverable, you will need to implement a passage retrieval system. You may build on techniques presented in class, described in the reading list, and proposed in other research articles.

Your system must include indexing and retrieval based on a standard IR engine, such as those described in the resource list. In addition, your system may exploit:

Data

Document Collection
The Aquaint Corpus was employed as the document collection for the question-answering task for a number of years, and will form the basis of retrieval for this deliverable. The collection can be found on patas in /corpora/LDC/LDC02T31/.
Training Data and Development Test Data
You may use any of the pre-2005 TREC question collections for training retrieval. For 2003 and 2004, there are prepared gold standard documents and answer patterns that allow you to train and tune your passage retrieval system.

All pattern files appear in /dropbox/10-11/573/Data/patterns.
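
As a starting point for tuning, the following is a minimal sketch of how a retrieved passage might be checked against the answer patterns. It assumes each line of a pattern file has the form "<question id> <regular expression>" and that the expressions are compatible with Python's re module; neither assumption is confirmed here, so inspect the files before relying on it. The function names load_patterns and passage_matches are hypothetical.

```python
import re
from collections import defaultdict


def load_patterns(lines):
    """Group compiled answer patterns by question id.

    Assumes (hypothetically) one '<question id> <regex>' pair per line,
    separated by whitespace. Lines that do not fit are skipped.
    """
    patterns = defaultdict(list)
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        qid, regex = parts
        # Case-insensitive matching is an assumption, not a stated requirement.
        patterns[qid].append(re.compile(regex, re.IGNORECASE))
    return patterns


def passage_matches(patterns, qid, passage):
    """Return True if any pattern for this question id matches the passage."""
    return any(p.search(passage) for p in patterns.get(qid, []))
```

For example, load_patterns can be called on an open pattern file (a file object iterates over its lines), and passage_matches can then be applied to each candidate passage your system returns for a question.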

All question files appear in /dropbox/10-11/573/Data/Questions.

Test Data
You should evaluate on the TREC-2005 questions and their corresponding documents and answer string patterns. You are only required to test on the factoid questions.
NOTE: Please do NOT tune on these questions.

Evaluation

You will compute two sets of evaluation measures.

Outputs

Create two output files in the outputs directory, based on running your passage retrieval system on the test data file. You should do this in two phases:

Extending the project report

This extended version should include all the sections from the original report (with many still as stubs) and additionally include the following new material:

Please name your report D3.pdf.

Presentation

Your presentation may be prepared in any computer-projectable format, including HTML, PDF, PPT, and Word. Your presentation should take about 10 minutes to cover your main content, including:

Your presentation should be deposited in your doc directory, but it is not due until the actual presentation time. You may continue working on it after the main deliverable is due.

Summary