Syllabus for LING 572

Advanced Statistical Methods in Natural Language Processing

Winter 2006

 

Instructor: Fei Xia

Time & Location: TR 1:30-2:50pm, MEB 242

Office Hours: Wed 3-5pm

Office Phone: (206) 543-9764

Email: fxia at u (include "Ling572" in the subject line)

Website: http://faculty.washington.edu/fxia/courses/LING572-Winter2006.shtml


Course Description:

In this course, we will study statistical algorithms that produce state-of-the-art results on NLP tasks. We will compare supervised learning algorithms, which require large amounts of labeled training data, with unsupervised ones, and we will also study a few important discriminative models. Students will gain hands-on experience by applying these algorithms to real NLP tasks.

 

 

Course Texts:

  None. Reading materials are online.

  Background reading (for those who have not taken LING570 or LING571) can be found in "Foundations of Statistical Natural Language Processing" (M&S) by Manning and Schütze, 1999.

 

 

Prerequisites:

  LING 570 and LING 571

  Stat 391 (Prob. and Stats for CS) or equivalent

  Programming (a short illustrative sketch follows this list):

  - C/C++ or Java

  - basic unix/linux commands (e.g., ls, cd, ln, sort, head): tutorials on unix

  - Perl (optional): tutorials on Perl
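
For a rough self-check of the programming prerequisite, here is a minimal sketch in Java, one of the languages listed above. It reads whitespace-separated tokens from standard input and prints each distinct token with its frequency. The class name TokenCount and the example itself are purely illustrative; this is not part of any course assignment or provided code.

  // TokenCount.java -- illustrative sketch only, not course material.
  // Reads whitespace-separated tokens from standard input and prints
  // each distinct token with its count, one token per line.
  import java.util.Map;
  import java.util.Scanner;
  import java.util.TreeMap;

  public class TokenCount {
      public static void main(String[] args) {
          Map<String, Integer> counts = new TreeMap<String, Integer>();
          Scanner in = new Scanner(System.in);
          while (in.hasNext()) {
              String token = in.next();
              Integer c = counts.get(token);
              counts.put(token, c == null ? 1 : c + 1);
          }
          for (Map.Entry<String, Integer> entry : counts.entrySet()) {
              System.out.println(entry.getKey() + "\t" + entry.getValue());
          }
      }
  }

If writing a small program like this in C/C++, Java, or Perl is comfortable, the programming prerequisite should not be an obstacle.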

 

 

Grading: All homework and projects are due on Thursdays, except for Parts 3, 4, and 6 of the project.

   Project (60%): six parts, 10% each.

   Homework (30%): three assignments.

   Class participation (10%).

  

 

Schedule:

 

Week 1 (1/3, 1/5)
   Topics: Introduction; FSA and HMM
   Reading: M&S ch. 2; M&S 9.1-9.2
   Assigned: Hw1 (key), due 1/12

Week 2 (1/10, 1/12)
   Topics: Supervised Learning I: Decision tree; Decision list
   Reading: Decision tree tutorial; Rivest (1987); Yarowsky (1994)
   Assigned: P1: Trigram baseline, due 1/19
   Due: Hw1

Week 3 (1/17, 1/19)
   Topics: Supervised Learning II: TBL
   Reading: Brill (1995)
   Assigned: Hw2 (key), due 1/26 and 2/2
   Due: P1

Week 4 (1/24, 1/26)
   Topics: Supervised Learning III: Review #1; Bagging; Project Part 2
   Reading: Breiman (1996)
   Assigned: P2: TBL, due 2/16
   Due: Hw2 (Part I)

Week 5 (1/31, 2/2)
   Topics: Supervised Learning IV: System combination; Boosting
   Reading: Henderson and Brill (1999); Henderson and Brill (2000); Freund and Schapire (1999); Abney et al. (1999)
   Assigned: Hw3 (key), due 2/9
   Due: Hw2 (Part II)

Week 6 (2/7, 2/9)
   Topics: Supervised Learning IV: MaxEnt
   Reading: Ratnaparkhi (1997); Ratnaparkhi (1996)
   Due: Hw3

Week 7 (2/14, 2/16)
   Topics: Supervised Learning V: Review #2 and Project Part 4; Semi-supervised Learning I: Self-training (bootstrapping)
   Reading: Yarowsky (1995)
   Assigned: P3: MaxEnt, due 3/7; P4, due 3/7; P5 and P6 (P5 due 3/9, P6 due 3/14)
   Due: P2

Week 8 (2/21, 2/23)
   Topics: Semi-supervised Learning II: Co-training; Unsupervised Learning I: The EM algorithm (Part 1); Forward-backward algorithm
   Reading: Blum (1998); Nigam and Ghani (2000); M&S 9.3

Week 9 (2/28, 3/2)
   Topics: Unsupervised Learning II: Inside-outside algorithm; The EM algorithm (Part 2)
   Reading: M&S 11.3

Week 10 (3/7, 3/9)
   Topics: Final review; Student presentations:
   - Bagging (David and Lap)
   - Bagging and system combination (Joshua Minor, Ping, and Dan)
   - System combination (Brian, Shauna, and Joshua Johanson)
   - Self-training (Albert, Achim, and Zhengbo)
   - Bagging (Anna, Gabriel, and Yowren)
   Due: P3 and P4 (due 3/7); P5 (due 3/9); P6 (due 3/14)


Reading materials (required):

 

  • Decision Tree:
    • Decision Tree tutorial at http://www.decisiontrees.net/tutorial/1_intro.html

 

  • Decision List:
    • Ronald Rivest (1987): Learning Decision Lists. Machine Learning. 2(3), pp. 229-246.
    • David Yarowsky (1994): Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. Proceedings of ACL-94, pp. 88-95.

 

  • Transformation-based learning
    • Eric Brill (1995): Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics, volume 21, number 4, pp. 543-565.

 

  • Bagging:
    • Leo Breiman (1996): Bagging Predictors. Machine Learning, 24(2), pp. 123-140.

 

 

  • Boosting
    • Y. Freund and R. Schapire (1999): A Short Introduction to Boosting. Journal of the Japanese Society for Artificial Intelligence, 14(5), pp. 771-780.
    • Steven Abney et al. (1999): Boosting Applied to Tagging and PP Attachment. Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 38-45.

 

  • Maximum Entropy:
    • Adwait Ratnaparkhi (1997): A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
    • Adwait Ratnaparkhi (1996): A Maximum Entropy Model for Part-of-Speech Tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 133-142.

 

 

  • Self-training (Bootstrapping):
    • David Yarowsky (1995): Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. Proceedings of ACL-95, pp. 189-196.

 

  • Co-training:
    • Avrim Blum and Tom Mitchell (1998): Combining Labeled and Unlabeled Data with Co-training. In Proceedings of the Workshop on Computational Learning Theory (COLT-98).
    • Kamal Nigam and Rayid Ghani (2000): Analyzing the Effectiveness and Applicability of Co-training. In Ninth International Conference on Information and Knowledge Management (CIKM-2000), pp. 86-93.

 

 

 

Additional papers (not required):

  • FSA and HMM:
    • Dupont et al. (2004): Links between Probabilistic Automata and HMMs: probability distributions, learning models, and induction algorithms. Pattern Recognition, 2004.

 

  • Decision tree:
    • Book: Ross Quinlan (1993): C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning

 

  • Transformation-based learning

 

 

  • Maximum Entropy Models:

 

  • Bootstrapping:
    • B. Efron and R. Tibshirani (1993): An Introduction to the Bootstrap. Chapman and Hall.

 

  • Semi-supervised learning:
    • Olivier Chapelle et al. (2005): Semi-supervised Learning. The MIT Press.

 

 

  • Inside-outside algorithm:
    • K. Lari and S. J. Young (1990): The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35-56.