Ling/CSE 472: Assignment 2: Morphology, Parsing

Due October 21st, by the start of class

This assignment involves some reflection about English morphology, a little bit of coding with xfst, and some playing with the LKB. For the xfst part, you'll need to turn in some code and results files. For the other parts, write up your answers in the editor format of your choice (plain text, html, pdf or doc). All files should be submitted via ESubmit.

1. Morphology and FSTs

(Adapted form of:) Problem 3.2 (p.89)

Give one example of each of the noun, verb and adjective classes in Figure 3.6, and find two stems (noun, verb, or adjective) which are exceptions to the rules, i.e., which fit the apparent definition of the class, but don't in fact have all of the forms that the machine predicts.

Write up your answer to the above and include it in the documents you submit via ESubmit.

(Adapted form of:) Problem 3.4 (p.89)

Using xfst, write a finite-state transducer that can generate and analyze a small set of verbs in all of their inflected forms. This FST will handle two spelling change rules: the rule that deletes a final e before -ing or -ed, and the rule that inserts a k when c appears between a vowel and -ing or -ed. The first rule is provided already. Your job is to write the second.

NB: xfst defines a language for regular expressions which makes it relatively easy to write morphophonological rewrite rules. For this assignment, however, you must stay with the basic operators. No credit will be given for answers that use -> or its kin. On the other hand, if you get stuck, you might find it helpful to write the rule in that notation, and then examine the network that xfst produces.

To do this assignment, you'll need the following two files:

Download them. If you're using Windows, make sure that it doesn't add any new file extensions.

verb_lexicon is the lexicon of verbs (in citation form) that we'll be working with. k.xfst is the xfst script that does the work. k.xfst is the file you'll need to modify for this part of the assignment.

To start xfst under Windows, select "Run..." from the Start menu, and type "cmd". That will give you a command line interface to the OS. Now navigate to the directory containing xfst (cd c:\ etc). When you're there, type "xfst" at the command prompt. You'll get an xfst prompt.

To start xfst under Linux, navigate to the directory containing it, and type ./xfst.

To run the script, enter:

source k.xfst

After you've run the script, there should be an FST on the stack. To apply that FST, try:

apply up spruced

apply down picnic+ing

Observe that it doesn't yet have the right behavior in the second example.

To examine a network, type:

print net

The network defined in k.xfst is too large to be usefully examined like this, but you might try some others:

read regex [a b c];

print net

read regex [a+ b c];

print net

read regex [e %+ -> 0 || _ [e|i] ];

print net

Modify k.xfst until it has the right behavior. The files produced by the script (underlying, onerule, tworules, and threerules) should be helpful in testing it as you go. You can also use apply up and apply down to observe the behavior of the network. Here is a short summary of xfst syntax.

The documents you submit via ESubmit should include your your modified k.xfst and the output file threerules.

2. Parsing

Start the LKB, and load the grammar

The LKB System (Copestake 2002) (a parser, generator and grammar development environment) is installed on the machines in Denny 109 as well as the machines in the Treehouse.

Start the LKB by going to "My Computer" in the start menu, and then opening C:\Program Files\lkb_windows\windows. In this directory, you'll find lkb.exe. Double click lkb.exe, to start the LKB. You should see a window entitled "Lkb Top". This window has menus that you will use, as well as some space to print messages that you should pay attention to.

Next, you'll need to download the grammar, which consists of these files:

Create a directory in My Documents called Grammar, and save the files there. (No need to cut and paste -- just right click the links and select "Save Target As..."). If Windows added the .txt extension to any of the files, use Rename in (available in the right-click pop-up menu in Windows Explorer) to get rid of it.

Or you can download them all at once in .tgz or .zip format.

Load the grammar by selecting Load > Complete Grammar in the Lkb Top window. In the dialogue that pops up, click "My Documents" (on the left hand side), then "Grammar", then "script" then "open" (the button).

A. Charts and trees, edges and nodes

B. Start symbols

C. Parsers and Grammars

For this part of the assignment, you'll need to edit the file rules.tdl. You need to make sure that when you save it, it is still a text file (although without the .txt extension) and not an rtf file or a Word document or anything else. WordPad seems to be okay. Emacs is a fine choice, if you know how to use it.

In general, parsers (and grammar development environments) should be written to make as few assumptions about the grammars they parse as possible. For example, the LKB allows the user to define the (set of possible) start symbol(s) for their grammar, rather than assuming it's S (see above). However, parsers (and grammar development environments) do need to define what they will accept as well-formed grammars, check for well-formedness, and anticipate other kinds of user errors.

Write up your answers to the above questions and include them in the documents you submit via ESubmit.


Back to main course page