Assignment 3, Part I: Ngrams

1. In your own words, what is the purpose of each of these flags?

   In ngram-count:
   a. -text
   b. -order
   c. -wbdiscount
   d. -lm

   In ngram:
   a. -lm
   b. -order
   c. -ppl

2. What is the perplexity of your bigram model against 'hislastbow.txt'?

3. Against 'lostworld.txt'?

4. Against 'otherauthors.txt'?

5. Why do you think the files with the higher perplexity got the higher perplexity?

6. Show the perplexity figures for:
   trigram against hislastbow.txt:
   trigram against lostworld.txt:
   trigram against otherauthors.txt:
   4gram against hislastbow.txt:
   4gram against lostworld.txt:
   4gram against otherauthors.txt:

7. Which combination gives the best perplexity result?
   N-gram order:
   discounting:
   test file:

8. How are the data files you used formatted? That is, what preprocessing was done on the texts?

9. What additional preprocessing step should have been taken?

10. Discuss briefly how/whether this affects the quality of the language models built.
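
For reference, a typical SRILM workflow that exercises the flags asked about in question 1 might look like the following sketch. The training-file name (train.txt) and model-file names are placeholders, not part of the assignment; -wbdiscount selects Witten-Bell discounting.

```shell
# Train a bigram model with Witten-Bell discounting from a plain-text corpus.
# train.txt is a placeholder for whatever training file the assignment provides.
ngram-count -text train.txt -order 2 -wbdiscount -lm bigram.lm

# Evaluate the model's perplexity on a held-out test file.
ngram -lm bigram.lm -order 2 -ppl hislastbow.txt
```

Re-running the first command with -order 3 or -order 4 (and the second command with a matching -order) produces the trigram and 4-gram figures requested in question 6.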