Linguistics 573
Phase #3
This is the third and final phase of our project, at least for the portion we can complete in the context of the class. For phase #2, you processed questions and returned a ranked list of a thousand or fewer documents. Evaluation was recall-biased, with some modifications favoring higher-ranking answer-bearing documents. For phase #3, you are to produce a much smaller set of answer-bearing documents. Evaluation will be more precision-biased: any included document that is not answer-bearing will incur a penalty. This phase is much more challenging because it is essentially passage retrieval, where the passages returned are the documents themselves.
Output format:
Follow the TREC 2005 guidelines for answer submissions. Each record gives the question ID, your group name, and a document ID. The basic format is as follows:
66.1 your-group-name XIE20000822.0203
66.2 your-group-name NYT20000816.0039
66.3 your-group-name XIE20000822.0297
66.4 your-group-name APW20000819.0131
66.5 your-group-name APW20000821.0185
66.5 your-group-name XIE20000821.0036
66.5 your-group-name XIE20000822.0018
66.5 your-group-name XIE20000822.0046
66.6 your-group-name XIE20000824.0003
66.7 your-group-name NYT20000831.0401
66.7 your-group-name NYT20000831.0401
67.1 your-group-name APW20000513.0001
67.2 your-group-name XIE20000611.0169
67.3 your-group-name APW20000513.0016
The only deviation from the TREC guidelines is the absence of the answers/passages themselves (which would normally be appended to the end of the output record). Also, you will not be responsible for generating output for the “Other” questions.
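As an illustration of the record layout above, here is a minimal Python sketch that turns (question ID, document ID) pairs into output lines. The function name `format_run` and the input structure are assumptions made for this example only; nothing in the TREC guidelines or this assignment prescribes them.

```python
from typing import Iterable, List, Tuple


def format_run(results: Iterable[Tuple[str, str]], group_name: str) -> List[str]:
    """Return one 'question-id group-name doc-id' line per input pair.

    `results` is a hypothetical structure: (question_id, doc_id) pairs in
    the order they should appear in the output file.
    """
    lines = []
    for question_id, doc_id in results:
        # Fields are separated by single spaces, as in the sample above.
        lines.append(f"{question_id} {group_name} {doc_id}")
    return lines


example = [
    ("66.1", "XIE20000822.0203"),
    ("66.5", "APW20000821.0185"),
    ("66.5", "XIE20000821.0036"),
]
run_lines = format_run(example, "your-group-name")
print("\n".join(run_lines))
```

A question with several documents, such as 66.5 in the sample, simply contributes one line per document.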
Evaluation will be precision-biased. Inclusion of irrelevant documents, that is, documents that do not contain answers, will lower your score. For factoid questions, the presence of one answer-bearing document is sufficient; including more than one will have no effect on your score. List questions will be evaluated separately, and will be scored based on the number of relevant documents included in the output.
More information on evaluation, and the specific tools that will be used, is forthcoming.
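Until the official tools are available, a rough self-check might simply count, per question, how many returned documents are answer-bearing and how many are not. The sketch below is not the official metric and computes no score; `tally_run` and the `answer_docs` mapping (question ID to the set of known answer-bearing document IDs) are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def tally_run(
    run_lines: Iterable[str], answer_docs: Dict[str, Set[str]]
) -> Dict[str, Tuple[int, int]]:
    """Map each question ID to (answer-bearing count, other count).

    Assumes each line has exactly three whitespace-separated fields:
    question ID, group name, document ID. Other lines are skipped.
    """
    returned = defaultdict(set)
    for line in run_lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        question_id, _group, doc_id = parts
        # A set collapses repeated document IDs for the same question.
        returned[question_id].add(doc_id)

    tallies = {}
    for question_id, docs in returned.items():
        relevant = answer_docs.get(question_id, set())
        hits = len(docs & relevant)
        tallies[question_id] = (hits, len(docs) - hits)
    return tallies


sample_run = [
    "66.1 your-group-name XIE20000822.0203",
    "66.5 your-group-name APW20000821.0185",
    "66.5 your-group-name XIE20000821.0036",
]
# Made-up judgments, for illustration only.
sample_answers = {"66.1": {"XIE20000822.0203"}, "66.5": {"APW20000821.0185"}}
sample_tallies = tally_run(sample_run, sample_answers)
print(sample_tallies)
```

A high second count for a question signals documents that would likely be penalized under a precision-biased evaluation.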
Some notes and rules:
1. You may train against any of the 2003 and 2004 data, including the phase2Eval and phase3Eval output. 2005 data and results must be held out and may not be used for training.
2. For calculating the factoid results, the inclusion of more than one answer-bearing document will have no effect on your score.
3. Ignore the “Other” questions.
4. Get started early. The due date given is the latest possible day and time, and cannot be revised or extended for any reason.
due date: 6 am, Thursday, June
Submit the following: