# Linguistics/CSE 472
# Autumn 2004
# Assignment 2

# Build a small FST for the verbal suffixes we're going to use,
# and store it in the variable suffixes.  Note that the suffixes
# all begin with a morpheme boundary symbol, and one of the suffixes
# is otherwise the empty string.

define suffixes [ %+ e d | %+ i n g | %+ s | %+ [] ];

# Build a small FST for a list of words which will be our
# roots, and store it in the variable verbs.  read text pushes a
# network built from the word list in the file verb_lexicon, and
# define with no regular expression pops that network and names it.

read text verb_lexicon
define verbs

# Build an FST for underlying forms, i.e., all concatenations of
# verbs and suffixes.

define underlying [verbs suffixes];

# Write the underlying forms to a file.

read regex underlying;
write text > underlying
pop stack

# Define one rule which maps underlying "e+e" to "e" and
# underlying "e+i" to "i".  It also maps "e+" followed by something
# other than e or i to just "e".
# In Rule1, the first disjunct says (effectively) "remove e and + if
# the next thing is an e or an i."  The second disjunct says "remove
# a + that follows an e and is followed either by nothing or by a
# single character other than e or i" (with our suffixes, that covers
# +s and the empty suffix).  The final two disjuncts let everything
# else through unchanged: strings in which the + is preceded by
# something other than e, and, for completeness, strings with no + at all.
# I've chosen to get rid of the + boundary symbols here because
# of a rule ordering problem: I want the k rule to apply later, but
# this rule could produce c+[e|i] sequences if I didn't get rid
# of the + sign (e.g., trace+ing would become trac+ing, which the
# k rule would then wrongly turn into track+ing).
# Note that % is an escape character, so that %+ is the literal
# plus sign.  ? is a wildcard, matching any character.  0 is epsilon.
# \[e|i] means a single character which is neither e nor i.  () indicate
# optionality.
define Rule1 [ [ ?* e:0 %+:0 [e|i] ?* ] | [ ?* e %+:0 (\[e|i]) ] | [ ?* \e %+ ?* ] | [ \[%+]* ] ];

# XFST provides additional regular expression syntax that allows
# us to define a very similar rule as follows.  (It's not quite
# the same because the || operator matches the upper tape context only,
# whereas Rule1 above requires the relevant context on both upper and
# lower tapes.  The difference won't matter for our current purposes.)
#
# define Rule1 [ e %+ -> 0 || _ [e|i] ,, %+ -> 0 || e _ \[e|i] ];
#
# This notation is intentionally very close to ordinary linguistic
# rewrite rules.  The point of this exercise is to understand how
# such rules can be expressed as regular relations, which is why we
# deliberately avoid the easier notation here.

# Compose the underlying forms network with the Rule1 network.

define onerule [ underlying .o. Rule1 ];

# Write the forms in the lower language of onerule to a file
# (.l extracts the lower-side language of a network).

read regex onerule.l;
write text > onerule
pop stack

# Define another rule which maps "_c+ed" and "_c+ing" in the
# underlying form to "_ck+ed" and "_ck+ing" just in case _ is
# a vowel.  Pass all other forms through unchanged.
# This is the rule you need to modify for the assignment.  So that
# the script starts in a working state, I've put in a dummy
# definition of the rule here: in this initial version, the rule
# is just the identity relation; all strings are mapped to themselves.

define Rule2 ?*;

# Compose the new rule with the network so far.

define tworules [ onerule .o. Rule2 ];

# Write the forms in the lower language of tworules to a file.

read regex tworules.l;
write text > tworules
pop stack

# Remove any remaining morpheme boundaries.

define Rule3 [ [ ?* %+:0 ?* ] | [ \[%+]* ] ];
define threerules [ tworules .o. Rule3 ];

# Write the forms in the "surface" (lower) language of
# the final composed transducer to a file.
read regex threerules.l;
write text > threerules
pop stack

# Leave the new transducer on the stack so it can
# be played with interactively.

read regex threerules;
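
# Some sanity checks on the individual rules.  These are a sketch:
# the test words (bake, walk, panic, arc) are hypothetical examples,
# not necessarily entries in verb_lexicon, which is why each rule is
# tested on its own rather than through underlying.  Each check
# pushes a rule, applies it, and pops it, so threerules is still on
# top of the stack afterwards.

# Rule1 alone: bake+ed, bake+ing, bake+s and bake+ should come out
# as baked, baking, bakes and bake; walk+ed has no e before the
# boundary, so it should come out unchanged as walk+ed.

read regex Rule1;
apply down bake+ed
apply down bake+ing
apply down bake+s
apply down bake+
apply down walk+ed
pop stack

# Rule2: with the dummy identity definition above, all four forms
# come back unchanged.  A finished Rule2 should instead give
# panick+ed and panick+ing for the first two, while panic+s (wrong
# suffix) and arc+ed (no vowel before the c) stay as they are.

read regex Rule2;
apply down panic+ed
apply down panic+ing
apply down panic+s
apply down arc+ed
pop stack

# Rule3 alone: walk+ed should become walked, and walked, which has
# no boundary, should pass through unchanged.

read regex Rule3;
apply down walk+ed
apply down walked
pop stack

# With threerules on the stack, a surface form can also be analyzed
# at the prompt, e.g.
#   apply up baked
# which should list bake+ed among its results, assuming bake is an
# entry in verb_lexicon.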