Syllabus for LING 572

Advanced Statistical Methods in Natural Language Processing

Winter 2006

 

Instructor: Fei Xia

Time & Location: TR 1:30-2:50pm, MEB 242

Office Hours: Wed 3-5pm

Office Phone: (206) 543-9764

Email: fxia at u (include "Ling572" in the subject line)

Website: http://faculty.washington.edu/fxia/courses/LING572-Winter2006.shtml


Course Description:

In this course, we will study statistical algorithms that produce state-of-the-art results on NLP tasks. We will compare supervised learning algorithms, which require large amounts of labeled training data, with unsupervised ones, and we will also study a few important discriminative models. Students will gain hands-on experience by applying these algorithms to real NLP tasks.

 

 

Course Texts:

  None. Reading materials are online.

  Background reading (for those who have not taken LING570 or LING571) can be found in "Foundations of Statistical Natural Language Processing" (M&S) by Manning and Schütze, 1999.

 

 

Prerequisites:

  LING 570 and LING 571

  Stat 391 (Prob. and Stats for CS) or equivalent

  Programming (a short illustrative sketch follows this list):

  - C/C++ or Java

  - basic unix/linux commands (e.g., ls, cd, ln, sort, head): tutorials on unix

  - Perl (optional): tutorials on Perl
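
For a rough self-check of the programming prerequisite, here is a minimal sketch in Java, one of the languages listed above. It reads whitespace-separated tokens from standard input and prints each distinct token with its frequency. The class name TokenCount and the example itself are purely illustrative; this is not part of any course assignment or provided code.

  // TokenCount.java -- illustrative sketch only, not course material.
  // Reads whitespace-separated tokens from standard input and prints
  // each distinct token with its count, one token per line.
  import java.util.Map;
  import java.util.Scanner;
  import java.util.TreeMap;

  public class TokenCount {
      public static void main(String[] args) {
          Map<String, Integer> counts = new TreeMap<String, Integer>();
          Scanner in = new Scanner(System.in);
          while (in.hasNext()) {
              String token = in.next();
              Integer c = counts.get(token);
              counts.put(token, c == null ? 1 : c + 1);
          }
          for (Map.Entry<String, Integer> entry : counts.entrySet()) {
              System.out.println(entry.getKey() + "\t" + entry.getValue());
          }
      }
  }

If writing a small program like this in C/C++, Java, or Perl is comfortable, the programming prerequisite should not be an obstacle.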

 

 

Grading: All homework and projects are due on Thursdays, except for Parts 3, 4, and 6 of the project.

   Project (60%): six parts, 10% each.

   Homework (30%): three assignments.

   Class participation (10%).

  

 

Schedule:

 

Week 1 (1/3, 1/5)
   Topics: Introduction; FSA and HMM
   Reading: M&S ch. 2; M&S 9.1-9.2
   Assigned: Hw1 (key), due 1/12

Week 2 (1/10, 1/12)
   Topics: Supervised Learning I: Decision tree; Decision list
   Reading: Decision tree tutorial; Rivest (1987); Yarowsky (1994)
   Assigned: P1: Trigram baseline, due 1/19
   Due: Hw1

Week 3 (1/17, 1/19)
   Topics: Supervised Learning II: TBL
   Reading: Brill (1995)
   Assigned: Hw2 (key), due 1/26 and 2/2
   Due: P1

Week 4 (1/24, 1/26)
   Topics: Supervised Learning III: Review #1; Bagging; Project Part 2
   Reading: Breiman (1996)
   Assigned: P2: TBL, due 2/16
   Due: Hw2 (Part I)

Week 5 (1/31, 2/2)
   Topics: Supervised Learning IV: System combination; Boosting
   Reading: Henderson and Brill (1999); Henderson and Brill (2000); Freund and Schapire (1999); Abney et al. (1999)
   Assigned: Hw3 (key), due 2/9
   Due: Hw2 (Part II)

Week 6 (2/7, 2/9)
   Topics: Supervised Learning IV: MaxEnt
   Reading: Ratnaparkhi (1997); Ratnaparkhi (1996)
   Due: Hw3

Week 7 (2/14, 2/16)
   Topics: Supervised Learning V: Review #2 and Project Part 4; Semi-supervised Learning I: Self-training (bootstrapping)
   Reading: Yarowsky (1995)
   Assigned: P3: MaxEnt, due 3/7; P4, due 3/7; P5 and P6 (P5 due 3/9, P6 due 3/14)
   Due: P2

Week 8 (2/21, 2/23)
   Topics: Semi-supervised Learning II: Co-training; Unsupervised Learning I: The EM algorithm (Part 1); Forward-backward algorithm
   Reading: Blum (1998); Nigam and Ghani (2000); M&S 9.3

Week 9 (2/28, 3/2)
   Topics: Unsupervised Learning II: Inside-outside algorithm; The EM algorithm (Part 2)
   Reading: M&S 11.3

Week 10 (3/7, 3/9)
   Topics: Final review; Student presentations:
   - Bagging (David and Lap)
   - Bagging and system combination (Joshua Minor, Ping, and Dan)
   - System combination (Brian, Shauna, and Joshua Johanson)
   - Self-training (Albert, Achim, and Zhengbo)
   - Bagging (Anna, Gabriel, and Yowren)
   Due: P3 and P4 (due 3/7); P5 (due 3/9); P6 (due 3/14)


Reading materials (required):

 

  • Decision Tree:
    • Decision Tree tutorial at http://www.decisiontrees.net/tutorial/1_intro.html

 

  • Decision List:
    • Ronald Rivest (1987): Learning Decision Lists. Machine Learning. 2(3), pp. 229-246.
    • David Yarowsky (1994): Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. Proceedings of ACL-94, pp. 88-95.

 

  • Transformation-based learning
    • Eric Brill (1995): Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics, volume 21, number 4, pp. 543-565.

 

  • Bagging:
    • Leo Breiman (1996): Bagging Predictors. Machine Learning, 24(2), pp. 123-140.

 

 

  • Boosting
    • Y. Freund and R. Schapire (1999): A Short Introduction to Boosting. Journal of the Japanese Society for Artificial Intelligence, 14(5), pp. 771-780.
    • Steven Abney et al. (1999): Boosting Applied to Tagging and PP Attachment. Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 38-45.

 

  • Maximum Entropy:
    • Adwait Ratnaparkhi (1997): A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
    • Adwait Ratnaparkhi (1996): A Maximum Entropy Model for Part-of-Speech Tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 133-142.

 

 

  • Self-training (Bootstrapping):
    • David Yarowsky (1995): Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. Proceedings of ACL-95, pp. 189-196.

 

  • Co-training:
    • Avrim Blum and Tom Mitchell (1998): Combining Labeled and Unlabeled Data with Co-training. In Proceedings of the Workshop on Computational Learning Theory (COLT-98).
    • Kamal Nigam and Rayid Ghani (2000): Analyzing the Effectiveness and Applicability of Co-training. In Ninth International Conference on Information and Knowledge Management (CIKM-2000), pp. 86-93.

 

 

 

Additional papers (not required):

  • FSA and HMM:
    • Dupont et al. (2004): Links between Probabilistic Automata and HMMs: probability distributions, learning models, and induction algorithms. Pattern Recognition, 2004.

 

  • Decision tree:
    • Book: Ross Quinlan (1993): C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning

 

  • Transformation-based learning

 

 

  • Maximum Entropy Models:

 

  • Bootstrapping:
    • B. Efron and R. Tibshirani (1993): An Introduction to the Bootstrap. Chapman and Hall.

 

  • Semi-supervised learning:
    • Olivier Chapelle et al. (2005): Semi-supervised Learning. The MIT Press.

 

 

  • Inside-outside algorithm:
    • K. Lari and S. J. Young (1990): The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35-56.