Ling/CSE 472: Assignment 4: Parsing

Due May 12th (Part 1) and May 18th (Parts 2 & 3)

For this assignment, you will be using the LKB grammar development environment. The LKB is installed on the machines in the Treehouse. It is open-source software and runs on Linux. If you want to run it on a Mac or Windows machine, you can set up Linux as a virtual machine using VirtualBox (see the instructions for installing KNOPPIX+LKB as a VirtualBox appliance). In fact, even for Linux users, we recommend installing the virtual machine, as it is the most reliable way to get the LKB running.

For this assignment, you will be asked to turn in typed answers (Parts 1, 2 and 3) and working grammars (Parts 2 and 3). Here are template answer files for Part 1 and Parts 2 and 3. Recall that we are looking for complete sentences/paragraphs in answer to these questions.

Be sure to get an early start so that you have time to ask questions on Canvas.

Part 1. CFGs and the LKB Parse Chart

Start the LKB, and load the grammar

A. Charts and trees, edges and nodes

B. Start symbols

Part 2. Feature structures

In order to give you some experience with feature structures and unification, this part of the assignment asks you to do some grammar engineering, using the LKB. In particular, you will implement the feature-based analysis of agreement (subject-verb and determiner-noun) described in the chapter, and a simple feature-based version of subcategorization.
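To make unification concrete before you start editing TDL, here is a minimal Python sketch of unification over untyped feature structures. They are modeled as nested dicts, with strings as atomic values. This is only an illustration under simplifying assumptions (no types, no reentrancy) and is not how the LKB implements unification. The names `unify`, `unifies`, and `UnificationFailure`, and the features AGR, NUM, and PER, are hypothetical.

```python
from typing import Union

# Toy model (assumption): a feature structure is either an atomic value
# (a string such as "sg") or a dict mapping feature names to feature
# structures. No types and no reentrancy, unlike real TFS grammars.
FS = Union[str, dict]


class UnificationFailure(Exception):
    """Raised when two feature structures carry incompatible information."""


def unify(a: FS, b: FS) -> FS:
    """Return a new structure combining the information in a and b."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy; the inputs are never modified
        for feature, value in b.items():
            if feature in result:
                # Feature present in both: its values must unify too.
                result[feature] = unify(result[feature], value)
            else:
                # Feature only in b: simply add it.
                result[feature] = value
        return result
    if a == b:
        return a  # identical atomic values
    raise UnificationFailure(f"cannot unify {a!r} with {b!r}")


def unifies(a: FS, b: FS) -> bool:
    """True if a and b are compatible."""
    try:
        unify(a, b)
        return True
    except UnificationFailure:
        return False


# Hypothetical determiner-noun agreement: "this dog" vs. "these dog".
this = {"AGR": {"NUM": "sg"}}
these = {"AGR": {"NUM": "pl"}}
dog = {"AGR": {"NUM": "sg", "PER": "3rd"}}

print(unify(this, dog))     # {'AGR': {'NUM': 'sg', 'PER': '3rd'}}
print(unifies(these, dog))  # False
```

The failed case mirrors the agreement analysis: a plural determiner and a singular noun disagree on NUM, so their structures cannot unify.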

Getting started

Background: the LKB, types, and tdl syntax

grammar1 (used in Part 1 above) looked like a CFG, but in fact it, like grammar2, is a typed feature-structure (TFS) grammar, since TFS grammars are what the LKB parses with. (The CFGs are just CFGs encoded as TFS grammars.) Furthermore, both grammars are written in Type Description Language (TDL), which uses the following syntax:

typename := supertype1 & supertype2 & ... & supertypeN &
  [ FEATURE_1 value_1,
    FEATURE_2 value_2,
    ...
    FEATURE_N value_n ].

Features with complex values look like this:

typename := supertype &
  [ FEATURE_1 [ FEATURE_2 value_2,
                FEATURE_3 value_3 ],
    FEATURE_4 value_4 ].

If you only want to constrain one feature inside the value of another, you can write it like this (note the period between the feature names):

typename := supertype &
   [ FEATURE_1.FEATURE_2 value_2 ].

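NB: The following is not equivalent to the above, because it puts FEATURE_2 at the top level of the structure rather than inside the value of FEATURE_1:

typename := supertype &
   [ FEATURE_2 value_2 ].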
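One way to see why these two definitions differ is to expand the path notation into explicit nested structures. The Python sketch below assumes feature structures are modeled as nested dicts. The helper `expand_path` is hypothetical and only mimics what the dotted notation abbreviates.

```python
def expand_path(path: str, value):
    """Turn TDL-style 'F1.F2.F3' path notation into nested dicts."""
    structure = value
    # Wrap the value from the innermost feature outward.
    for feature in reversed(path.split(".")):
        structure = {feature: structure}
    return structure


embedded = expand_path("FEATURE_1.FEATURE_2", "value_2")
top_level = expand_path("FEATURE_2", "value_2")

print(embedded)               # {'FEATURE_1': {'FEATURE_2': 'value_2'}}
print(top_level)              # {'FEATURE_2': 'value_2'}
print(embedded == top_level)  # False
```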

Reentrancy is indicated with variables beginning with #. For example:

typename := supertype &
 [ FEATURE_2 #same,
   FEATURE_3.FEATURE_4 #same ].
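Reentrancy means that two paths lead to one and the same value, not to two values that happen to look alike. A rough Python analogy is object identity. It assumes feature structures are modeled as nested dicts, with `{}` standing in for an unconstrained value and NUM as a hypothetical feature.

```python
# Reentrant: both paths point at ONE shared object (like the #same tag).
same = {}
reentrant = {"FEATURE_2": same, "FEATURE_3": {"FEATURE_4": same}}

# Not reentrant: two separate objects that merely look alike.
copies = {"FEATURE_2": {}, "FEATURE_3": {"FEATURE_4": {}}}

# Add information along one path only.
reentrant["FEATURE_2"]["NUM"] = "sg"
copies["FEATURE_2"]["NUM"] = "sg"

# With reentrancy, the information also shows up on the other path.
print(reentrant["FEATURE_3"]["FEATURE_4"])  # {'NUM': 'sg'}
print(copies["FEATURE_3"]["FEATURE_4"])     # {}
print(reentrant["FEATURE_2"] is reentrant["FEATURE_3"]["FEATURE_4"])  # True
```

Any information later unified into a reentrant value is visible along every path that shares it. This is what makes reentrancy useful for stating agreement.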

Lists are represented as feature structures with two features, FIRST and REST. The value of FIRST is the first element of the list. The value of REST is the remainder of the list: either another list or, if there are no further elements, *null* (the empty list). (The relevant types are defined at the bottom of types.tdl.) You can use these features to constrain particular members of lists, or you can use the following notation:

typename := supertype &
  [ LIST_FEATURE < [ FEATURE_1 value_1 ] , ... > ].

typename := supertype &
  [ LIST_FEATURE < othertype , [ FEATURE_1 value_1 ] > ].

The first example above describes a list with at least one element, whose first element has value_1 for FEATURE_1. The second describes a list with exactly two elements.
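The FIRST/REST encoding can be sketched in Python as nested dicts. This is a simplified model, not LKB code. The string "*null*" stands in for the empty-list type, `{}` stands in for a REST value left unconstrained (the `...` case), and the helper names are hypothetical.

```python
NULL = "*null*"  # stands in for the empty-list type


def closed_list(items):
    """Encode items as FIRST/REST structures ending in *null* (exact length)."""
    encoded = NULL
    for item in reversed(items):
        encoded = {"FIRST": item, "REST": encoded}
    return encoded


def open_list(items):
    """Like closed_list, but leave the final REST unconstrained ('...')."""
    encoded = {}  # empty structure: nothing said about what follows
    for item in reversed(items):
        encoded = {"FIRST": item, "REST": encoded}
    return encoded


def to_python_list(encoded):
    """Decode a closed FIRST/REST list back into a Python list."""
    items = []
    while encoded != NULL:
        items.append(encoded["FIRST"])
        encoded = encoded["REST"]
    return items


# < othertype , [ FEATURE_1 value_1 ] >  (exactly two elements)
exactly_two = closed_list(["othertype", {"FEATURE_1": "value_1"}])
# < [ FEATURE_1 value_1 ] , ... >  (at least one element)
at_least_one = open_list([{"FEATURE_1": "value_1"}])

print(to_python_list(exactly_two))  # ['othertype', {'FEATURE_1': 'value_1'}]
print(at_least_one)  # {'FIRST': {'FEATURE_1': 'value_1'}, 'REST': {}}
```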

The LKB requires that all features be declared for exactly one type (they can be inherited and further constrained by subtypes). If you try to introduce features with the same name on different types (and one is not a subtype of the other) it will complain.
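The rule above can be sketched as a check over a type hierarchy. A feature may be declared on several types only if each pair of declaring types is related by subtyping. The hierarchy, type names, and helper functions below are hypothetical. This Python sketch does not reproduce the LKB's actual check or its error messages.

```python
# Hypothetical toy hierarchy: each type lists its immediate supertypes.
PARENTS = {
    "*top*": [],
    "sign": ["*top*"],
    "phrase": ["sign"],
    "word": ["sign"],
    "agr-cat": ["*top*"],
}

# Features each type declares itself (not the ones it inherits).
DECLARES = {
    "sign": ["HEAD"],
    "phrase": ["HEAD"],   # fine: phrase is a subtype of sign
    "agr-cat": ["NUM"],
    "word": ["NUM"],      # clash: word and agr-cat are unrelated
}


def ancestors(t):
    """Return t together with all of its (transitive) supertypes."""
    seen = {t}
    stack = [t]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def feature_clashes():
    """List (feature, type1, type2) where unrelated types declare a feature."""
    clashes = []
    types = sorted(DECLARES)
    for i, t1 in enumerate(types):
        for t2 in types[i + 1:]:
            related = t1 in ancestors(t2) or t2 in ancestors(t1)
            for feature in DECLARES[t1]:
                if feature in DECLARES[t2] and not related:
                    clashes.append((feature, t1, t2))
    return clashes


print(feature_clashes())  # [('NUM', 'agr-cat', 'word')]
```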

The LKB distinguishes between types and instances; in this grammar, the instances are found in rules.tdl and lexicon.tdl. Only because of this distinction can we 'overload' symbols like s and use them as both rule names and type names.

Here is a partial LKB/Grammar Engineering FAQ, which may be helpful.

Feature-based analyses of agreement and valence

To turn in your grammar for this part, create a tar archive and upload it to Canvas. You can create the tar archive by running the following command in the directory that contains grammar2:

tar czf grammar2.tgz grammar2/

Part 3: Using types to capture generalizations

Make a copy of grammar2 to modify for this part of the assignment:

cp -r grammar2 grammar3

For this part of the assignment, you should modify grammar3.

You may have noticed that you were typing the same thing over and over again in Part 2. This part of the assignment asks you to use types to state each constraint exactly once, and to make every instance or type that needs a constraint inherit it from the supertype that states it.
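The idea can be sketched in Python: each constraint is stated on exactly one type, and a type or lexical entry picks it up by inheritance. This is a simplified, hypothetical model rather than anything the LKB or grammar3 actually contains. It uses flat feature-value pairs, a conflict check instead of real unification, and invented type names.

```python
# Hypothetical hierarchy: supertypes plus constraints stated locally,
# as flat feature -> value pairs (real TDL constraints are nested).
TYPES = {
    "sg-noun": {"supers": [], "local": {"NUM": "sg"}},
    "3rd-noun": {"supers": [], "local": {"PER": "3rd"}},
    "3sg-noun": {"supers": ["sg-noun", "3rd-noun"], "local": {}},
}


def expand(type_name):
    """Collect every constraint a type inherits, failing on conflicts."""
    entry = TYPES[type_name]
    constraints = {}
    sources = [expand(s) for s in entry["supers"]] + [entry["local"]]
    for source in sources:
        for feature, value in source.items():
            # A feature already set to a different value is a conflict.
            if constraints.get(feature, value) != value:
                raise ValueError(f"{type_name}: conflicting values for {feature}")
            constraints[feature] = value
    return constraints


# Two hypothetical lexical entries share one type instead of
# repeating NUM and PER in each entry.
LEXICON = {"dog": "3sg-noun", "cat": "3sg-noun"}
for word, type_name in LEXICON.items():
    print(word, expand(type_name))  # e.g. dog {'NUM': 'sg', 'PER': '3rd'}
```

If the value of NUM ever needs to change, it changes in one place (sg-noun), and every entry that inherits from it is updated automatically.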

Hints:

When you have a grammar that doesn't have any repeated constraints and that accounts for all of the data (both positive and negative) in test.items, create a tar archive of it, and turn it in!

Don't forget to answer the questions in the file parts-2-3.txt for your write-up.

