xfst syntax (subset)
Commands
- source filename;
Reads in a script file.
- read regex regular-expression;
Compiles the regular expression into a network and places
it on the top of the stack. The regular expression may extend
over multiple lines. It must be terminated with a semi-colon.
- read text filename
Reads in a list of words, one per line, and compiles it
into a network that encodes the language made up of those words.
- write text > filename
Writes the list of words recognized by the network on the
top of the stack to a text file.
- define variable-name regular-expression;
Stores a regular expression in a variable.
The regular expression may extend over multiple lines.
It must be terminated with a semi-colon.
- apply up string
Looks up the string using the top FST on the stack.
If the string is in the lower language, you'll get the
corresponding upper language strings back. If it's not
in the lower language, you'll just get another xfst prompt.
- apply down string
Works analogously, but matches the string against the upper-side
language and returns the corresponding lower language strings.
- pop stack
Pops the top network off the stack and discards it.
- print net
Prints a description of the top network on the stack.
In these descriptions, s indicates a non-final state, fs a final state.
The start state is always numbered 0.
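The commands above can be combined into a short session. The following is a minimal sketch, not a required workflow: the variable name Animal and the file name animals.txt are made up for illustration, and it assumes that your xfst version treats lines starting with # as comments.

```xfst
# Store a small language in a variable (hypothetical name: Animal).
define Animal [ c a t | d o g ];

# Compile a network from the variable plus an optional s
# and push it onto the stack.
read regex Animal (s);

# Inspect the states and arcs of the top network.
print net

# This network has identical upper and lower sides, so a word that
# is in the language is returned unchanged in both directions.
apply up cats
apply down dog

# Save the recognized words to a file (hypothetical file name).
write text > animals.txt

# Discard the top network when done.
pop stack
```

Note that define and read regex end with a semicolon, while the other commands in this sketch end at the line break.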
Regular expressions
- Square brackets delimit regular expressions.
- | indicates disjunction.
- A space indicates concatenation. The language [a b] consists
of the single string "ab", which itself consists of two symbols,
"a" and "b". The language [ab] also consists of the string "ab",
but this time that string is only one symbol long. (It just
happens to be a multi-character symbol.) Don't define multi-character
symbols. Use the space.
- * is Kleene star.
- + is Kleene plus.
- : separates the upper and lower tapes: in a:b, a is on the upper side and b is on the lower side.
- % is the escape character.
- ? is the wild card.
- () indicate optionality.
- \ indicates symbol negation: \e is any single symbol other than e.
- ~ indicates language complement: ~[a] is the language of all
strings other than the string "a".
- .#. is a word boundary.
- .o. is composition. It is used to compose one rule with another.
- ,, is used to separate rules that run in parallel (i.e., will run simultaneously on input if they apply).
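A few of these operators in use, as a rough sketch. Each read regex pushes a new network onto the stack. The expressions are invented for illustration, and the # comment lines assume that your xfst version accepts them.

```xfst
# Concatenation: the two-symbol string "ab".
read regex [ a b ];

# Zero or more a, followed by one or more b.
read regex [ a* b+ ];

# Optionality and disjunction: "cat", "cats", "dog", "dogs".
read regex [ [ c a t | d o g ] (s) ];

# Escape: %+ is a literal plus sign, giving the string "cat+s".
read regex [ c a t %+ s ];

# Any string that starts with a symbol other than e.
read regex [ \e ?* ];

# Complement: every string except the string "a".
read regex ~[ a ];

# Composition: a:b maps upper a to lower b, and b:c maps upper b
# to lower c, so the composed network maps upper a to lower c.
read regex [ a:b ] .o. [ b:c ];
apply down a
```

The final apply down a should print c, because the lower side of the first network (b) is matched against the upper side of the second.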
Examples
Two examples of linguistics-style rewrite rules (note: these are not permitted in Part 1).
- define unspecifiedN [ N -> m || _ p ];
- define partialEnglishS [ %+ -> e || s _ s ,, %+ s -> z || d _ ,, %+ -> 0 || t _ s ];
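To try such rules, one option is to push each one onto the stack with read regex and test it with apply down. The sketch below assumes the two definitions above have already been entered. The test strings are invented for illustration, and the # comment lines assume that your xfst version accepts them.

```xfst
# Put the first rule on the stack and run it downward:
# the N before p should be rewritten as m.
read regex unspecifiedN;
apply down kaNpat

# Replace it with the second rule and try one input per context.
pop stack
read regex partialEnglishS;
apply down glass+s
apply down cat+s
apply down dad+s
```

If the contexts behave as written, the first lookup should give kampat, and the last three should give glasses, cats, and dadz: the + becomes e between s and s, is deleted between t and s, and the sequence +s becomes z after d.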