Ling/CSE 472: Assignment 1: Regular expressions

Due October 12th, by 1:30 pm

1. Elizalike

This part of the assignment asks you to create a program that behaves like Weizenbaum's ELIZA (see pp. 32-33 of the text). We have provided a skeleton script that handles input and output and gives an example of the Perl syntax for using regular expressions to modify strings.

Each student should develop their own program, although you are welcome to ask each other questions (in person, over email, or on the EPost bulletin board). You will need to find a partner for this project, as one of the tasks is to test each other's programs (see below).

Specifications: The basic strategy is to read in a string of input from the user, modify it successively (sometimes subtly, sometimes drastically, depending on the input string), and print out the result. To maintain the illusion of AI, it is crucial that elizalike print out grammatical strings. (You may assume that it is given grammatical input.) Furthermore, elizalike should be able to handle person deixis, referring to itself in the first person and to the user in the second person.
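As an illustration of the person-deixis requirement, here is a minimal sketch. It is not the course skeleton: the names %swap, swap_person, and respond, and the handful of rules, are hypothetical and chosen only for this example. The main point is that the pronoun swap happens in a single substitution pass. Doing it as two successive substitutions ("I" to "you", then "you" to "I") would undo the first change.

```perl
use strict;
use warnings;

# Hypothetical mapping from the user's point of view to elizalike's.
# Deliberately incomplete: "you" always becomes "I", which is wrong in
# object position ("I like you" would come out as "you like I").
my %swap = (
    'i am'    => 'you are',
    'you are' => 'I am',
    'i'       => 'you',
    'me'      => 'you',
    'my'      => 'your',
    'you'     => 'I',
    'your'    => 'my',
);

# One alternation, longest keys first, so every word is swapped in one pass.
my $words = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %swap;

sub swap_person {
    my ($text) = @_;
    $text =~ s/\b($words)\b/$swap{lc $1}/gie;   # /e: look up the replacement
    return $text;
}

sub respond {
    my ($input) = @_;
    $input =~ s/[.!?]+\s*$//;                   # strip sentence-final punctuation
    if ($input =~ /^i am (.+)$/i) {
        my $rest = $1;
        return 'Why are you ' . swap_person($rest) . '?';
    }
    if ($input =~ /^i (feel|think|want) (.+)$/i) {
        my ($verb, $rest) = (lc $1, $2);
        return "Why do you $verb " . swap_person($rest) . '?';
    }
    return 'Please tell me more.';              # fallback when no rule applies
}

print respond('I am worried about my exam.'), "\n";
# prints: Why are you worried about your exam?
print respond('I think you are wrong'), "\n";
# prints: Why do you think I am wrong?
```

The captured groups are copied into lexical variables before swap_person is called, because the substitution inside that function overwrites $1 and $2.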

Before you start, look at the list of items to turn in below, so you know what you'll need to save.

Your tasks:

To turn in:

All of the above should be turned in via ESubmit by 1:30 pm on Tuesday, October 12.

2. Tokenizer

This part of the assignment asks you to write a Perl script that will take an ordinary text file and return a file with the same content, reformatted to be one sentence per line.

Each student should develop their own program, although you are welcome to ask each other questions (in person, over email, or on the EPost bulletin board).

Once again, we will supply a skeleton Perl script which handles input and output (this time reading in a file and writing out to a file). We will also supply a test file that you will use to develop the script.

The basic algorithm is the following:

Specifications: Treat the characters . ? ! : as sentence-ending punctuation. Quotation marks that follow a sentence-final element should stay on the same line as that element. Don't worry if a single quotation ends up split across several lines.
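The specification above can be sketched as a single substitution. This is only an illustration under stated assumptions, not the supplied skeleton: the function name split_sentences is hypothetical, the whole text is assumed to be held in one string, and only plain ASCII quotation marks are handled. It will also wrongly split after abbreviations such as "Dr.", which the specification does not address.

```perl
use strict;
use warnings;

sub split_sentences {
    my ($text) = @_;
    $text =~ s/\s+/ /g;      # collapse original line breaks and runs of spaces
    $text =~ s/^ | $//g;     # trim a leading or trailing space
    # Sentence-ending punctuation, plus any closing quotation marks,
    # followed by a space: keep the marks and turn the space into a newline.
    $text =~ s/([.?!:]+["']*) /$1\n/g;
    return $text;
}

print split_sentences(qq{He said "Stop!" Then he\nleft. Why? Note: done.\n}), "\n";
```

Capturing the quotation marks together with the punctuation is what keeps a closing quote on the same line as the sentence it ends.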

Your tasks:

To turn in:

All of the above should be turned in via ESubmit by 1:30 pm on October 12, 2004.

