Syllabus for Linguistics 573

Systems/Applications

Spring 2006

Professor:  William Lewis

Time & Location:  TTh 3:30-4:50, MGH 287

Office:  Padelford A-215

Office Hours:  Tu 5:00-6:00, Th 2:15-3:15

Office Phone:  616-5728

E-mail:  wlewis2@u.washington.edu (please include "Ling 573" in the subject line)

Course Description:

This course looks at building coherent systems designed to tackle practical applications. Particular topics vary from year to year; this year the class will focus on Question Answering, Search, and Information Retrieval.

Course Objectives:

- Give students hands-on experience integrating existing components into working end-to-end systems.

- Give students a chance to apply the skills and knowledge acquired in the program (specifically in 570, 571, and 572) directly toward building such systems.

- Give students hands-on experience developing evaluation criteria for multi-component systems.

Course Texts:

Voorhees, Ellen M. and Donna K. Harman (2005). TREC: Experiment and Evaluation in Information Retrieval. Cambridge, Mass.: The MIT Press.

Highly Recommended:

Manning & Schütze, Foundations of Statistical NLP

Jurafsky & Martin, Speech and Language Processing

Other Materials:

Miscellaneous readings as required.

Prerequisites:  Ling 570, 571, and 572, or equivalents.

Grading:

Project documents (proposals, design specs, class presentations, etc.):  50%

Final project:  40%

Class participation: 10%

Tentative Course Schedule:

Week 1:  3/28, 3/30

Slides:  3/28;  3/30 (ppt)

Topic:  Introduction and Overview
- Themes for the quarter:
    - IR/QA
    - TREC
    - Why is it of interest?
    - How is it relevant or useful to CompLing?
    - Tie back to the skills and knowledge you’ve acquired
- Describe project, project goals and rough project design
    - Structure of project proposals
    - Project foci
    - Project components/phases
- Software
    - Software, corpora and tools that will be needed for student projects (some general guidelines)
    - Where to acquire software, tools, and corpora
    - What’s loaded on Pongo
- Form project teams (2+ students per team)
- Lucene intro

Reading/Homework Assignments:
TREC text:  Ch 1,2 (§ ), 10
TREC 2005 QA Track Guidelines (password req’d) and materials
Hirschman & Gaizauskas 2001
Hovy et al 2001
    The other Hovy et al 2001 (TREC 2001)
Welcome to TREC e-mail

Week 2:  4/4, 4/6

Slides:  4/4: Hovy, Soubbotin, Harabagiu;  4/6: Google, Lucene, LingPipe

Topic:  Project Overview
- Detailed analysis of the problem
- Methodology to be employed
- Methods for evaluation

Reading/Homework Assignments:
Harabagiu et al 2005

Additional readings related to questions:
    Dumais et al 2002
    Ravichandran & Hovy 2002
    Soubbotin and Soubbotin 2001

Additional case studies in TREC text (for idea mining):
    IBM (Xu & Croft Query Expansion paper)
    EU Twenty-One
    UMASS
    Waterloo
    CUNY (PIRCS)
    CU, London (Okapi)
    Cornell (SMART)

Additional papers on the TREC site may also be useful (though some are of limited value).

Week 3:  4/11, 4/13

Topic:  Project Proposals Reviewed; student meetings
- Presentations of problems and solutions
- How each of the challenges will be addressed and surmounted
- Critical review by instructor and fellow students

Reading/Homework Assignments:
Preliminary project proposals due by Monday, April 10th at midnight. Students should review all other proposals prior to class on 4/11.

Note:  Project proposals must explicitly detail the methods to be used to evaluate results.

Week 4:  4/18, 4/20

Questions:  ISI, UIUC Mapping, Tagged Questions

Topic:  Project Component(s)/Phase #1
- Question
    - Interpretation
    - Categorization
- Building the Query
- Component overview
- Relevance to the project at hand
- Issues related to design, implementation/integration
- Overview of relevant literature

Reading/Homework Assignments:
Literature relevant to the current task at hand

> Weeks 4-9 concentrate on the various components that make up the project. Readings are tied to each component and cover topics relevant to its design and implementation/integration.

> Progress reports will correspond to each component.

Week 5:  4/25, 4/27

Materials:  Software

Topic:  Student presentations #1
- Design relevant to team focus
- Implementation/integration relevant to team focus

Reading/Homework Assignments:
Student presentations 4/27
Progress Report #1, results, and code due 4/30

Week 6:  5/2, 5/4

Topic:  Evaluation of Project Phase #1 (in class)

Project Component(s)/Phase #2
- Finding the relevant documents
- What preprocessing is necessary
- Ranking documents according to TREC guidelines

Reading/Homework Assignments:
Literature relevant to the current task at hand

Week 7:  5/9, 5/11

Topic:  Student presentations #2

Reading/Homework Assignments:
Progress Report #2 due

Week 8:  5/16, 5/18

Materials:  Phase #3 description

Topic:  Project Component(s)/Phase #3
- Answer generation (limited)

Reading/Homework Assignments:
Literature relevant to the current task at hand

Note:  Because of the limited amount of time, full answer generation is not expected. This phase may be limited to fairly coarse passage retrieval for all systems.

Week 9:  5/23, 5/25

Topic:  Student presentations #3
- How will your system locate passages that contain the answer(s)?
- How will your system present the answers to the user (list of coherent passages, generation, etc.)?

Reading/Homework Assignments:
Progress Report #3 due

Week 10:  5/30, 6/1

Topic:  Project Presentations

Reading/Homework Assignments:
Projects due 6/1

Note:  Project presentations must include evaluation statistics so that the various projects can be compared and evaluated.