Ling/CSE 472 Assignment 1:
Regular expressions
Preface: Two options for programming assignments
We are experimenting with Jupyter Notebooks as
an optional way to complete programming
assignments this quarter. If you choose this option,
your write ups for the programming portion will be
included in your Jupyter notebook.
If you don't want to use Jupyter, you can use Python directly.
In this case, you should turn in a Python program (in this assignment,
in two different versions) and a
separate PDF file with your write up.
In either
case, some of the writing portion of the assignment
will be separate. (See the turn-in
instructions below.)
Elizalike
This assignment asks you to create a program that behaves like
Weizenbaum's ELIZA (see Ch 2, p.9 of J&M).
We have provided a skeleton of a
script that handles input and output, and provides an example of the Python
syntax for using regular expressions to modify strings.
Each student should develop their own program, although you are welcome to ask
each other questions (in person, over email, or on the Canvas discussion area).
You will need to find a partner for this project, as one of the tasks is to
test each other's programs (see below).
Specifications:
The basic approach is to read in a string of input from the user, modifying it
successively (sometimes subtly, sometimes drastically, depending on the input
string), and print out the result. To maintain the illusion of AI, it is
crucial that elizalike print out grammatical strings. (You may assume that it
is given grammatical input.) Furthermore, elizalike should be able to handle
person deixis, referring to itself in the first person and to the user in the
second person.
Before you start, look at the list of items to turn in
below, so you know what you'll need to save.
Your tasks:
- Develop a list of sentences that you will use to test your program to make sure
it handles the person deixis correctly.
- This list must illustrate all ways in which
1st and 2nd person are marked in English. To be clear, this means that you must include
all 1st and 2nd person pronouns, all 1st and 2nd person subject-verb pairs with the
verb be and all possible forms of the verb.
- Optionally, properly capitalize and punctuate
your test sentences. You can also just output all lowercase or all uppercase. (Do not output randomly capitalized text, though.)
- The thoroughness of the coverage of these sentences will be a significant part
of the grade for this assignment.
- You need to add these sentences, the expected
responses, and a comment indicating what is being tested, to the
file sentences.txt. Carefully follow the format
described in the file.
- Modify the elizalike script to implement the handling of
person deixis.
- Script source:
- The basic strategy is to first replace any second person
reference in the input with some string that's unlikely to show up otherwise.
(The sample expression in the script we've given you
replaces it with third-person reference to Eliza). Then replace
any first person reference in the input with second person reference. Finally,
replace your otherwise unlikely string (from the first step) with first person
reference. Each of these steps will take several lines as you handle pronouns and
verbs and upper and lower case letters (i.e., if the user types "My friend..."
Elizalike's output should be "Your friend..." and not "your friend...").
- Be sure to read all of the comments in the file (lines starting with #, which
are for human consumption and ignored by Python). You should probably test each
line as you add it, by running the program again and using an appropriate
sentence from your test file. Note that before you make any changes, the
program runs, just in a boring way: It repeats whatever the user types in,
except that it changes all occurrences of "you are" to "---Eliza-is---".
- Add at least two statements that find one keyword in the input and change the
whole string to something different. (See the third and fourth examples at the bottom of page
9 of J&M Ch 2 for a model, but don't copy them exactly!)
- Add at least two statements that find some keyword in the input, and return a
significantly changed output that nonetheless contains some part of the input that
may vary from time to time. (See the first and second examples on page 9, but
feel free to get fancier than that!)
- This is the first version of your program. It should be included in
your final Jupyter notebook (Jupyter users) or turned in as elizalike1.py (others).
- Find a partner and exchange programs. Looking at the code for your partner's
program, try to find at least 2 interestingly different inputs that cause their
program to produce ungrammatical output. (Keep your inputs grammatical!) We're
pretty sure you'll be able to find these, but if your partner's program is too
perfect, you can get full credit for this part of the assignment by turning in
an explanation of 5 pitfalls you looked for and how they were avoided.
- Modify your program to avoid the ungrammatical outputs your partner found (if
any). Document and explain the changes you made at the end of your write-up file.
- Write up Part I: In ~5 paragraphs, discuss:
- Why English morphology and syntax make this program
relatively straightforward, and how it would be more complicated in some other
specific language. What did you learn about English morphology and syntax, and what
knowledge of English morphology and syntax did you apply in this project?
- What did you learn about regular expressions? What are the pros
and cons of using regular expressions to model language? What are the
pros and cons of using regular expressions to process language?
- Write up Part II: In ~2 paragraphs, answer:
- Describe a use case for chatbots in the real world (this
could be a real instance you've already encountered or something
you make up).
- What are the beneficial aspects of deploying chatbots in this way, and who
receives the benefits of such use?
- Imagine that a person is fooled by the chatbot into thinking they're interacting
with a real person rather than a computer. What might happen in that case?
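The three-step replacement strategy and the two kinds of keyword statements described in the tasks above can be sketched roughly as follows. This is only an illustration, not the provided skeleton script: the function names, the placeholder strings `@@CAP@@` and `@@LOW@@`, and the sample keywords are all made up for this example, and it covers just the `my`/`your` pair, so it is far from a complete treatment of person deixis.

```python
import re

# Hypothetical placeholders: any strings unlikely to appear in user input.
# They must not contain a first-person form that step 2 would match.
CAP_MARK = "@@CAP@@"
LOW_MARK = "@@LOW@@"


def swap_my_your(text):
    """Swap 'my' and 'your', keeping the original capitalization."""
    # Step 1: hide second-person forms behind placeholders.
    text = re.sub(r"\bYour\b", CAP_MARK, text)
    text = re.sub(r"\byour\b", LOW_MARK, text)
    # Step 2: turn first-person forms into second person.
    text = re.sub(r"\bMy\b", "Your", text)
    text = re.sub(r"\bmy\b", "your", text)
    # Step 3: turn the placeholders into first person.
    text = text.replace(CAP_MARK, "My")
    text = text.replace(LOW_MARK, "my")
    return text


def keyword_reply(text):
    """Apply two illustrative keyword rules (assumed, not from the skeleton)."""
    # Rule A: a keyword anywhere in the input replaces the whole string.
    text = re.sub(r".*\bweather\b.*", "Does the weather matter to you?", text)
    # Rule B: the reply reuses part of the input through a captured group.
    text = re.sub(r".*\bI am (sad|tired|bored)\b.*",
                  r"Why do you think you are \1?", text)
    return text
```

With these definitions, `swap_my_your("My friend likes your dog")` returns `"Your friend likes my dog"`. Without the placeholder step, the second pass would turn the newly produced "your" forms straight back, which is why the intermediate string is needed.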
Turn in instructions
Turn in the following via Canvas. Submit these files, with these names:
File | Contents | Jupyter notes |
sentences.txt | Your list of test sentences | Included as a markdown cell in Jupyter notebook for Jupyter users |
elizalike1.py | The first version of your program | Included as a separate single code cell in Jupyter notebook for Jupyter users |
elizalike2.py | The second version of your program | Included as a separate single code cell in Jupyter notebook for Jupyter users |
eliza_discussion.pdf | Your discussion of English and other language morphology and syntax --- see the last task above. | Included as markdown cells, one or more, formatted as proper write up, in Jupyter notebook for Jupyter users |
partner.pdf | 1. The name of your partner and the problems you found with their program, or an explanation of how they avoided 5 pitfalls you thought up. 2. What you changed in your own program to address the issues your partner brought up. | Included as markdown cells, one or more, formatted as proper write up, in Jupyter notebook for Jupyter users |
chatbots.pdf | Your answers to Write up Part II | Separate file for all students |
Note: We will be executing your code, so make sure it runs on Patas.