Linguistics 290A/1: Answers


The 68,000 number was based on incorrect calculations. Using the files in the directory brown1, with the line numbers removed, I get the following (the wc columns are lines, words, and bytes):

[dwin-1210-2:/home/ebender] wc brown1*
   5903   62542  397250 brown1_h.txt
   5045   58490  329761 brown1_n.txt
   5073   58758  329928 brown1_p.txt

The file sizes remain fairly comparable, at roughly 58,000 to 63,000 words each. The following commands put each token on its own line and count the ones tagged as foreign words:

tr ' ' '\012' < brown_h.tag | grep '_FW' | wc
tr ' ' '\012' < brown_n.tag | grep '_FW' | wc
tr ' ' '\012' < brown_p.tag | grep '_FW' | wc

Run on the three tagged files, these commands give 33 foreign words in category h, 36 in n and 71 in p. Because the files are so close in size, the raw counts alone show that p contains about twice as many foreign words as h or n. Normalizing by the word counts from wc, we find the following:

Category      FW/1000 words
---------------------------
       H      0.528
       N      0.615 
       P      1.208
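
The arithmetic behind the table can be sketched in the shell as follows. This is only an illustration, not necessarily how the figures were originally computed: the function names count_fw and fw_rate are made up here, and the sample line with its tags is invented to show what the pipeline counts. The counts and word totals passed to fw_rate are the ones reported above.

```shell
# count_fw (hypothetical helper): read tagged text on stdin, put each
# token on its own line, and count the lines that contain '_FW'.
count_fw() {
    tr ' ' '\012' | grep -c '_FW'
}

# fw_rate (hypothetical helper): foreign words per 1000 words,
# given the foreign-word count and the total word count.
fw_rate() {
    awk -v fw="$1" -v total="$2" 'BEGIN { printf "%.3f\n", fw / total * 1000 }'
}

# Invented sample line; two of its five tokens carry an _FW tag.
printf 'the_AT bon_FW-JJ mot_FW-NN was_BEDZ clear_JJ\n' | count_fw   # prints 2

fw_rate 33 62542   # category h, prints 0.528
fw_rate 36 58490   # category n, prints 0.615
fw_rate 71 58758   # category p, prints 1.208
```

Dividing by the word count rather than comparing raw counts is what makes the three categories comparable even though h is a few thousand words longer than n and p.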


Emily M. Bender
Last modified: Fri Dec 8 12:02:40 2000