Ling/CSE 472 Assignment 1: Regular expressions

Due in two parts:

elizalike1.py: Due April 3rd, by 10:00 am
Complete assignment: Due April 6th, by 5:00 pm

1. Elizalike

This part of the assignment asks you to create a program that behaves like Weizenbaum's ELIZA (see pp. 25-26 of the text). We have provided a skeleton script that handles input and output and gives an example of the Python syntax for using regular expressions to modify strings.

Each student should develop their own program, although you are welcome to ask each other questions (in person, over email, or on the Canvas discussion area). You will need to find a partner for this project, as one of the tasks is to test each other's programs (see below).

Specifications: The basic approach is to read in a string of input from the user, modifying it successively (sometimes subtly, sometimes drastically, depending on the input string), and print out the result. To maintain the illusion of AI, it is crucial that elizalike print out grammatical strings. (You may assume that it is given grammatical input.) Furthermore, elizalike should be able to handle person deixis, referring to itself in the first person and to the user in the second person.
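As an illustration of the approach described above, here is a minimal sketch of how person deixis might be handled with Python's re module. It is not the provided skeleton: the names PERSON_SWAPS, swap_person, and respond, and the two reply templates, are hypothetical and chosen only for this example. The sketch swaps all person forms in a single pass, because applying one substitution after another (first "I" to "you", then "you" to "I") would undo the earlier change.

```python
import re

# Hypothetical, deliberately tiny mapping; not the course skeleton's own table.
PERSON_SWAPS = {
    "i": "you",
    "me": "you",
    "my": "your",
    "am": "are",
    "you": "I",
    "your": "my",
}

# One alternation over all keys, matched as whole words, ignoring case.
SWAP_PATTERN = re.compile(r"\b(" + "|".join(PERSON_SWAPS) + r")\b", re.IGNORECASE)


def swap_person(text):
    """Swap first- and second-person forms in one pass over the string."""
    return SWAP_PATTERN.sub(lambda m: PERSON_SWAPS[m.group(1).lower()], text)


def respond(line):
    """Return a reply to one line of user input."""
    line = line.strip().rstrip(".!?")
    # A drastic rewrite: turn "I am X" into a question about X.
    match = re.match(r"i am (.+)", line, re.IGNORECASE)
    if match:
        return "Why are you " + swap_person(match.group(1)) + "?"
    # A subtle rewrite: echo the input back with the persons swapped.
    return "Why do you say that " + swap_person(line) + "?"


if __name__ == "__main__":
    print(respond("I am worried about my exam."))
```

This sketch leaves several cases unhandled: it does not fix verb agreement (for example, "you are" in the input), and it always maps "you" to "I" even where "me" would be grammatical. Cases like these are good candidates for your test sentences.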

Before you start, look at the list of items to turn in below, so you know what you'll need to save.

Your tasks:

Turn in the following via Canvas. Submit these files, with these names:

sentences.txt Your list of test sentences
elizalike1.py The first version of your program (Please resubmit this file as part of your complete assignment.)
partner.txt The name of your partner and the problems you found with their program, or an explanation of how they avoided 5 pitfalls you thought up.
elizalike2.py The second version of your program
eliza_discussion.txt Your discussion of the morphology and syntax of English and other languages --- see the last task above.

Note: We will be executing your code, so make sure it runs on Patas.

2. Tokenizer

This part of the assignment asks you to write a Python script that will take an ordinary text file, and return a file with the same content, reformatted to be one sentence per line.

Each student should develop their own program, although you are welcome to ask each other questions (in person, over email, or on the Canvas discussion board).

Once again, we will supply a skeleton Python script which handles input and output (this time reading in a file and writing out to a file). We will also supply a test file that you will use to develop the script.

The basic algorithm is the following:

Specifications: Treat the four characters . ? ! : as sentence-ending punctuation. Quotation marks that follow a sentence-final element should stay on the same line as that element. Don't worry if your script breaks a single quote that contains several sentences into different lines.
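The specifications above can be sketched with a single regular-expression substitution. This is only a rough starting point, not the supplied skeleton: the names BOUNDARY and split_sentences are hypothetical, it assumes plain ASCII quotation marks, and it works on a string rather than reading and writing files.

```python
import re

# Sentence-ending punctuation (. ? ! :), then any closing quotation marks,
# then the whitespace that separates it from the next sentence.
BOUNDARY = re.compile(r"""([.?!:]+["']*)\s+""")


def split_sentences(text):
    """Return text reformatted with one sentence per line."""
    # Collapse existing line breaks and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Keep the punctuation and quotes, and replace the following gap with a newline.
    return BOUNDARY.sub(r"\1\n", text)


if __name__ == "__main__":
    print(split_sentences('Is it late? "Yes!" I said.\nThen we left.'))
```

A pattern this simple will also break after abbreviations such as "Dr." and after decimal points followed by a space, which is the kind of case misses.txt is meant to describe.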

Your tasks:

Submit your answers via Canvas. Submit these files, with these names:

tokenizer1.py The first version of your script
misses.txt A brief description of the cases you didn't handle properly
tokenizer2.py The second version of your script

Again, make sure we can run your scripts on Patas.

