Learning from Large Corpora
Hand-coded Approaches Limited
- Corpus + Automatic Training
- Face Recognition, Speech Recognition, Part-of-Speech Tagging (Sung & Poggio 1995, Rabiner & Juang 1993, Brill et al. 1991)
Cautionary Note: “Big” problem
Task            Features        Influences   Training Data       Results
ASR             625             3 Phones     5000 sentences      95%
Part-of-Speech  60 Tags         2 POS        1.5 million words   97%
WSD             74,000 senses   100+ words   ???                 ???