Ling/CSE 472: Assignment 2: Morphology

Due April 13th, by 6:00pm

This assignment involves a little bit of coding with xfst. You'll need to turn in some code and results files, which should be submitted via Canvas.

Problem 1. Morphology and FSTs

(Adapted from Problem 3.3, p. 81.)

Using xfst, write a finite-state transducer that can generate and analyze a small set of verbs in all of their inflected forms. This FST will handle two spelling-change rules: the rule that deletes a final e before -ing or -ed, and the rule that inserts a k after a c that appears between a vowel and -ing or -ed (so picnic+ing becomes picnicking). The first rule is already provided. Your job is to write the second.
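To make the target behavior concrete, here is a small Python sketch of the two spelling rules as plain string rewrites. It is only a reference for checking what apply down should produce, not a substitute for the xfst rule you have to build. It assumes underlying forms are written like picnic+ing, with + as the morpheme boundary, and that the vowels are a, e, i, o, u; the function names are hypothetical.

```python
import re

# Assumption: orthographic vowels are a, e, i, o, u (y is not treated as a vowel).
VOWELS = "aeiou"
# Assumption: underlying forms look like "picnic+ing", with + as the boundary.
SUFFIX = r"\+(?:ing|ed)$"


def k_insertion(form: str) -> str:
    """Insert k after a c that follows a vowel and precedes +ing or +ed."""
    return re.sub(rf"(?<=[{VOWELS}])c(?={SUFFIX})", "ck", form)


def e_deletion(form: str) -> str:
    """Delete a stem-final e before +ing or +ed."""
    return re.sub(rf"e(?={SUFFIX})", "", form)


def remove_boundary(form: str) -> str:
    """Erase the morpheme boundary symbol."""
    return form.replace("+", "")


def surface(form: str) -> str:
    # k-insertion looks at the underlying form, before e-deletion can
    # expose a new c+ sequence (spruce+ed -> spruc+ed).
    return remove_boundary(e_deletion(k_insertion(form)))


for underlying in ["picnic+ing", "picnic+ed", "spruce+ed", "bake+ing", "walk+ed"]:
    print(underlying, "->", surface(underlying))
```

The order inside surface matters when rules apply in sequence: remove_boundary(k_insertion(e_deletion("spruce+ed"))) gives *sprucked, because e-deletion exposes a new c+ed sequence. If your k rule applies after the e-deletion rule in the cascade in k.xfst, its context has to rule this out, and apply up spruced is a good check for it.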

Note: xfst defines a language for regular expressions that makes it relatively easy to write morphophonological rewrite rules. For this problem, however, you must stick to the basic operators. No credit will be given for answers that use the xfst replace operator -> or any of its variants. On the other hand, if you get stuck, you may find it helpful to write the rule in that notation and then examine the network that xfst produces.

To do this assignment, you'll need the following two files:

verb_lexicon: the lexicon of verbs (in citation form) that we'll be working with.
k.xfst: the xfst script that does the work. This is the file you'll need to modify for this part of the assignment.

Copy both files somewhere in your home directory on Patas. If you download them on Windows, make sure it doesn't add any new file extensions.

To start xfst, log onto Patas and type "xfst" (your $path variable should already be set appropriately). You'll get an xfst prompt.

To run the script, enter:

```
source k.xfst
```

After you've run the script, there should be an FST on the stack. To apply that FST, try:

```
apply up spruced
apply down picnic+ing
```

Observe that it does not yet produce the right output for the second example (the intended surface form is picnicking).

Modify k.xfst until it has the right behavior. The files produced by the script (underlying, onerule, tworules and threerules) should be helpful in testing it as you go. You can also use apply up and apply down to observe the behavior of the network. Here is a short summary of xfst syntax.

To examine a network, type:

```
print net
```

The network defined in k.xfst is too large to be usefully examined like this, but you might try some others:

```
read regex [a b c];
print net
read regex [a+ b c];
print net
read regex [e %+ -> 0 || _ [e|i] ];
print net
```
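print net lists a network's states and arcs. As a point of comparison, here is a hand-built Python transition table for a minimal automaton accepting [a+ b c]. The state numbers are an assumption, and xfst may number states and format its listing differently; the thing to look for is the loop on state 1, which is what a+ contributes.

```python
# Hypothetical hand-built transition table for the language of [a+ b c].
# State 0 is the start state; state 3 is the only final state.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "a"): 1,  # the + in a+ becomes a loop on state 1
    (1, "b"): 2,
    (2, "c"): 3,
}
FINAL = {3}


def accepts(symbols: list[str]) -> bool:
    """Follow one arc per symbol; reject as soon as an arc is missing."""
    state = 0
    for sym in symbols:
        state = TRANSITIONS.get((state, sym))
        if state is None:
            return False
    return state in FINAL


for word in (["a", "b", "c"], ["a", "a", "a", "b", "c"], ["b", "c"], ["a", "b"]):
    print(" ".join(word), accepts(word))
```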

Turn in

k.xfst: this file, as modified by you.
threerules: the output file produced when the script k.xfst is run.

Problem 2. Text-to-Speech

Text-to-speech systems rely on large pronunciation dictionaries combined with rules for handling unknown words. One large class of typically unknown words is names, including people's names. This problem focuses on one particular aspect of predicting the pronunciation of names, namely stress assignment. In particular, some 'name suffixes' (forms that recur at the end of many different names) leave the stress of the stem unchanged (stress-neutral name suffixes), while others cause the stress to move (stress-changing name suffixes). Stress assignment matters for pronunciation because a) phonological rules affecting the pronunciation of segments are sensitive to stress, and b) lexical stress interacts with other factors to determine the prosody of an utterance.

Your tasks

Find 6 name stems (surname stems; a phone book is one possible source) with the following properties:
1. A one-syllable name stem
2. A two-syllable name stem with stress on the first syllable
3. A two-syllable name stem with stress on the second syllable
4. A three-syllable name stem with stress on the first syllable
5. A three-syllable name stem with stress on the second syllable
6. A three-syllable name stem with stress on the third syllable
Stress notation: Indicate stress by placing a ' (single quote) before the vowel of the stressed syllable.
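Every later step depends on this notation, so a quick mechanical check can catch typos. This Python sketch only verifies the format (exactly one ' immediately before a vowel letter); it cannot tell you whether the stress is in the right place. Treating y as a vowel letter is an assumption, and the function name is hypothetical.

```python
# Assumption: y counts as a vowel letter; adjust VOWEL_LETTERS if you disagree.
VOWEL_LETTERS = set("aeiouyAEIOUY")


def check_stress_mark(name: str) -> bool:
    """Return True if name has exactly one ' and it directly precedes a vowel letter."""
    if name.count("'") != 1:
        return False
    i = name.index("'")
    return i + 1 < len(name) and name[i + 1] in VOWEL_LETTERS


for name in ["B'arton", "Ba'rton", "Barton"]:
    print(name, check_stress_mark(name))
```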
Find two name suffixes: one that is stress-neutral and one that changes the stress on the stem. The stress-changing suffix should leave the stress somewhere on the stem, NOT on the suffix itself. If you're not a native English speaker and don't have intuitions about these stress patterns, find a native speaker and ask them.
Determine how the stress changes: write out all 12 names you can make by attaching each of the two suffixes to each of your 6 stems, and mark the (primary) stress in each one using the stress notation given above. You will use this list to evaluate whether your system is working.
Write a regular expression in xfst that represents the possible underlying forms: each stem alone, and each stem + suffix, with the morpheme boundary and the underlying stress indicated. You may find the 'explode' braces { } useful. In xfst regular expressions, characters not separated by spaces are treated as a single multicharacter symbol (cat is one symbol, not three). This is not what you want. Therefore, (part of) your regular expression will have to look either like this:
```
[ c a t | d o g ]
```
Or, using the 'explode' operators, like this:
```
[ {cat} | {dog} ]
```
Recall that % is the escape character (for example, %+ is a literal plus sign), and that most punctuation marks have a special meaning in xfst and therefore need to be escaped. A short summary of xfst syntax can be found here.
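Typing the spaced-out symbols by hand is error-prone. As an optional aid, this Python sketch prints xfst definitions in the spaced-symbol style shown above, escaping every non-letter with % (so ' is written %'). The define names Stems, Suffixes, and Underlying, the use of + as the boundary, and the placeholder data are all assumptions; paste the output into your own script and adapt it.

```python
# Made-up placeholders, not real names: replace with your own stems and
# suffixes, marking stress with ' before the stressed vowel.
STEMS = ["st'ema", "stemb'o"]
SUFFIXES = ["suffa", "suffo"]


def to_xfst_symbols(word: str) -> str:
    """Spell a word as space-separated one-character symbols, escaping
    every non-letter with % (so ' is written %')."""
    return " ".join(ch if ch.isalpha() else "%" + ch for ch in word)


def union(words: list[str]) -> str:
    """Build an xfst union such as [ c a t | d o g ]."""
    return "[ " + " | ".join(to_xfst_symbols(w) for w in words) + " ]"


def underlying_forms_script(stems: list[str], suffixes: list[str]) -> str:
    # ( ... ) marks an optional part in xfst, so Underlying covers both
    # bare stems and stem + boundary + suffix.
    return "\n".join([
        f"define Stems {union(stems)} ;",
        f"define Suffixes {union(suffixes)} ;",
        "define Underlying Stems ( %+ Suffixes ) ;",
    ])


print(underlying_forms_script(STEMS, SUFFIXES))
```

With 6 stems and 2 suffixes, Stems ( %+ Suffixes ) describes 6 + 12 = 18 underlying strings, because the parentheses make the boundary-plus-suffix part optional.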
Write a rule (or rules, if necessary) that relates the underlying forms to the surface forms. Unlike in Problem 1, you are welcome to use the xfst replace-rule notation here. In particular, note the following:
```
[ A -> B || C _ D ]
```
This is the xfst notation for "A is rewritten as B when it occurs between C and D", where the contexts C and D are matched on the upper tape. A, B, C, and D are all regular expressions.
Rules that run in parallel are separated by ,, (two commas):
```
[ A -> B || C _ D ,, E -> F || G _ H ]
```
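For intuition about how these rules behave, here is a rough Python analogy, not xfst itself. re.sub with lookarounds checks its contexts against the input string, much as the contexts of a || rule are matched on the upper tape. The patterns use Python regex syntax (\+ rather than %+), and Python requires the left context to be fixed-width. The second half contrasts simultaneous application (the idea behind ,,) with applying the same rewrites one after another. All function names are hypothetical.

```python
import re


def rewrite(a: str, b: str, text: str, left: str = "", right: str = "") -> str:
    """Rough analogue of [ a -> b || left _ right ] using Python regexes.
    Lookarounds test the contexts against the original input string."""
    pattern = a
    if left:
        pattern = f"(?<={left})" + pattern  # Python needs a fixed-width left context
    if right:
        pattern = pattern + f"(?={right})"
    return re.sub(pattern, b, text)


def parallel(rules: dict[str, str], text: str) -> str:
    """Rewrite every matched symbol once, with all rules applying at the same time."""
    pattern = "|".join(map(re.escape, rules))
    return re.sub(pattern, lambda m: rules[m.group(0)], text)


def sequential(rules: dict[str, str], text: str) -> str:
    """Apply the rules one after another; each sees the previous one's output."""
    for old, new in rules.items():
        text = text.replace(old, new)
    return text


# Analogue of [ e %+ -> 0 || _ [e|i] ]: delete "e+" before e or i.
print(rewrite(r"e\+", "", "bake+ing", right="[ei]"))
print(rewrite(r"e\+", "", "bake+ed", right="[ei]"))

swap = {"a": "b", "b": "a"}
print(parallel(swap, "abba"))
print(sequential(swap, "abba"))
```

The first two calls mirror the e-deletion rule from Problem 1 and print baking and baked. In the swap example, applying the rules in parallel turns abba into baab, while applying them in sequence feeds the output of a -> b into b -> a and yields aaaa.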
Rules of epenthesis (which insert something where nothing was before) are written like this:
```
[ [..] -> A || B _ C ]
```
This can be read as "nothing goes to A in the context B _ C". If you used 0 (epsilon) instead of [..] in this rule, xfst would try to insert an infinite number of As between B and C, because there are infinitely many empty strings between B and C.
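The one-insertion-per-position behavior of [..] has a loose analogue in Python: a pattern made only of lookarounds matches the empty string once at each qualifying position, and re.sub fills each such position exactly once. This is an illustration, not how xfst implements epenthesis; the function name is hypothetical, and the left context must be fixed-width.

```python
import re


def epenthesis(insert: str, left: str, right: str, text: str) -> str:
    """Insert `insert` once at every position that has `left` before it
    and `right` after it (a loose analogue of [ [..] -> insert || left _ right ])."""
    return re.sub(f"(?<={left})(?={right})", insert, text)


print(epenthesis("a", "b", "c", "bcbc"))
print(epenthesis("a", "b", "c", "bbcc"))
```

The calls print bacbac and bbacc: exactly one a lands in each b _ c position, never a run of them.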
Write a rule which removes the morpheme boundary, so that it does not appear in the surface forms.
Combine the above into an xfst script similar to k.xfst from Problem 1. The script should build a single FST by composing the underlying-forms regular expression with the rules.
Test your FST by loading it and then running "print lower-words", which prints the strings of the lower-tape language. (Use "print upper-words" to see the upper-tape language.) You may find it convenient to put this command at the end of your script.
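Since the 18 names in your writeup are your test set, it can help to compare them mechanically with what the FST prints. This Python sketch assumes you paste the xfst listing as text with one word per line (remove any prompt lines first); the function name and the placeholder names are hypothetical.

```python
def compare_outputs(expected: list[str], printed: str) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected): names you expected but the FST did not
    print, and names it printed that are not on your list.
    Assumes printed has one word per line, with prompts already removed."""
    actual = {line.strip() for line in printed.splitlines() if line.strip()}
    wanted = set(expected)
    return wanted - actual, actual - wanted


# Made-up placeholder names, not real data.
expected = ["st'ema", "st'emasuffa"]
printed = "st'ema\nstem'asuffa\n"
missing, unexpected = compare_outputs(expected, printed)
print("missing:", sorted(missing))
print("unexpected:", sorted(unexpected))
```

Both sets should be empty once the FST is right; here the placeholder output has put the stress on the wrong syllable, so the form shows up once in each set.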
Describe (in prose) three phonological changes that you observe in your data which correlate with stress. (I.e., in what ways do the pronunciations of the name stems change when the stress is moved?) Note that these are things we haven't explicitly modeled in the FST.

Don't worry about

Modeling aspects of pronunciation other than stress in your xfst script. Represent the names with their orthographies (i.e., not phonetic representations).
Which name suffixes usually appear with which names. Pretend for the purposes of this assignment that any suffix can go with any name.
Secondary stress. Just keep track of the primary stress on each name.

Turn in

tts-writeup.txt: a file containing
six name stems, with stress indicated;
two name suffixes;
twelve stem+suffix names, with stress indicated;
your answer to the question about the effects of stress.
tts.xfst: your xfst script, which builds an FST that accepts/generates all 18 names (the 6 stems plus the 12 suffixed forms) with the correct primary stress.
