Ling 573 - Natural Language Processing Systems and Applications
Spring 2011
Deliverable #3: Document Ranking Format
A document ranking submission consists of a single file with six columns per line, separated by whitespace. Column width is not important, but each line must have exactly six columns, with at least one space between adjacent columns. For example:
    1.1 Q0 ZF08-175-870 1 4238 prise1
    1.1 Q0 ZF08-306-044 2 4223 prise1
    1.1 Q0 ZF09-477-757 3 4207 prise1
    1.1 Q0 ZF08-312-422 4 4194 prise1
    1.1 Q0 ZF08-013-262 5 4189 prise1
where:
- the first column is the question number (of the form X.Y).
- the second column is currently unused and should always be Q0.
- the third column is the official document number of the retrieved document, as found in the "docno" field of the document.
- the fourth column is the rank at which the document is retrieved.
- the fifth column is the score (integer or floating point) that generated the ranking. Scores MUST be in descending (non-increasing) order. They are required so that tied scores can be handled uniformly for a given run: the evaluation routines rank documents from these scores, not from your ranks. If you want the precise ranking you submit to be evaluated, the SCORES must reflect that ranking.
- the sixth column is the "run tag", which should be a unique identifier for your group AND for the method used. That is, each run should have a different tag that identifies the group and the method that produced the run. Please change the tag from year to year, since we often compare across years (for graphs and such) and having the same name show up for both years is confusing. Run tags must also contain 12 or fewer letters and numbers, with *NO* punctuation, to facilitate labeling graphs with the tags. (If you are also participating in the main task, then your run tags must contain 11 or fewer letters and numbers.)
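As an illustration of the column layout described above, the following Python sketch builds ranking lines for one question from (docno, score) pairs. The function name `format_ranking` and the decision to derive ranks by sorting on score are assumptions made for this example; they are not part of the assignment.

```python
import re


def format_ranking(question_id, scored_docs, run_tag):
    """Return the ranking lines for one question.

    scored_docs is an iterable of (docno, score) pairs in any order.
    Hypothetical helper, for illustration only.
    """
    # Run tags: 12 or fewer letters and digits, no punctuation.
    if not re.fullmatch(r"[A-Za-z0-9]{1,12}", run_tag):
        raise ValueError(f"invalid run tag: {run_tag!r}")
    # Sort by descending score so that the ranks agree with the scores.
    # sorted() is stable, so tied scores keep their input order here.
    ordered = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    lines = []
    for rank, (docno, score) in enumerate(ordered, start=1):
        # Six columns: question, Q0, docno, rank, score, run tag.
        lines.append(f"{question_id} Q0 {docno} {rank} {score} {run_tag}")
    return lines


if __name__ == "__main__":
    docs = [("ZF08-306-044", 4223), ("ZF08-175-870", 4238)]
    for line in format_ranking("1.1", docs, "prise1"):
        print(line)
```

Deriving the rank column from the sorted scores keeps the two columns consistent, which matters because the evaluation works from the scores rather than from the submitted ranks.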
Each question must have at least one document retrieved for it. Provided you have at least one document, you may return fewer than 1000 documents for a question. Note, however, that the standard evaluation measures used in TREC count empty ranks as not relevant, so returning 1000 documents per question cannot hurt your score on these measures and could conceivably improve it.
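Before submitting, it may help to check a file against the rules above. The sketch below is one possible checker, not an official validation tool. The name `check_ranking`, the treatment of 1000 as a hard per-question cap, and the assumption that each question's lines appear in rank order in the file are all choices made for this example.

```python
import re
from collections import defaultdict

MAX_DOCS = 1000  # assumed per-question cap, inferred from the text above


def check_ranking(lines, question_ids):
    """Return a list of problem descriptions (empty if none were found).

    Hypothetical helper: lines are the lines of a submission file,
    question_ids are the questions that must each have a document.
    """
    problems = []
    scores = defaultdict(list)  # question id -> scores in file order
    for n, line in enumerate(lines, start=1):
        cols = line.split()
        if len(cols) != 6:
            problems.append(f"line {n}: expected 6 columns, found {len(cols)}")
            continue
        qid, unused, docno, rank, score, tag = cols
        if unused != "Q0":
            problems.append(f"line {n}: second column must be Q0")
        if not re.fullmatch(r"[A-Za-z0-9]{1,12}", tag):
            problems.append(f"line {n}: bad run tag {tag!r}")
        try:
            scores[qid].append(float(score))
        except ValueError:
            problems.append(f"line {n}: score {score!r} is not a number")
    for qid in question_ids:
        s = scores.get(qid, [])
        if not s:
            problems.append(f"question {qid}: no documents retrieved")
        if len(s) > MAX_DOCS:
            problems.append(f"question {qid}: more than {MAX_DOCS} documents")
        # Each score must be no larger than the one before it.
        if any(a < b for a, b in zip(s, s[1:])):
            problems.append(f"question {qid}: scores are not in non-increasing order")
    return problems
```

Calling `check_ranking(open("run.txt").read().splitlines(), ["1.1", "1.2"])` would return an empty list for a file that passes these checks; the file name and question list here are placeholders.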