Practice with regular expressions
This is not a graded assignment but rather
optional preparation for assignment 1. Please feel free
to ask each other, David, or Emily for assistance with
any aspect of this assignment, including how to get
your computer set up to do it.
In this exercise, you will use grep (short for "global regular
expression print") to practice writing regular
expressions in such a way that the computer gives you
feedback on how you're doing. You can find grep on Dante
(which is a Unix system) or, on a Mac with OS X, by using
the Terminal program.
Note: In the following, I've used "\n" to indicate
that you should type return.
Getting started, on Dante:
- Log on to Dante using ssh from one of the PCs in the
lab (LLC 112), or elsewhere. On the PCs in the lab, ssh
assumes you want to use it for ftp, which you don't in this
case. Instead, click the button called "quick connection"
(EB/DG: double check this).
- At the first screen once you're on Dante, type S. This
puts you into a Unix shell.
- Type "which grep\n". It should respond with something like
"/bin/grep". This means that there is a program called grep
on the system, and the computer knows where to find it.
- Grep takes two arguments: a regular expression to search
for and a file to search in. To do this exercise, you'll
need to make a file to search in, with this
content. To make the file using emacs:
- Type "emacs grepfile\n". This causes Dante to start the program
emacs (a very powerful text editor), and create a file called "grepfile".
- Cut and paste the text found by following this link:
grepfile into the emacs window.
- Save the file by typing ctrl-x ctrl-s.
- Quit emacs by typing ctrl-x ctrl-c. Your Dante window should
now be back to the state where it gives you a prompt like "dante05%".
- Type "ls\n" to verify that you have created the file, and "cat grepfile\n"
to verify its contents.
- Continue with the exercises below.
Getting started, on a Mac:
- Run the terminal program.
- Type "which grep\n". It should respond with something like
"/usr/bin/grep". This means that there is a program called grep
on the system, and the computer knows where to find it.
- Change directories within the Terminal program to your
Documents directory by typing "cd ~/Documents\n".
- Grep takes two arguments: a regular expression to search
for and a file to search in. To do this exercise, you'll
need to make a file to search in, with this
content. You can make this file with TextEdit or emacs (see
the instructions under "Getting started, on Dante" above). You can also use
MS Word, but in that case be sure to save your file as text only.
Be sure to save the file in the Documents directory in your home
directory.
- Continue with the exercises below.
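On either system, the setup checks above can also be run entirely from the command line. The following is only a sketch: the file name practice.txt and its three lines are made up for illustration and are not the contents of the course's grepfile, which you should still create as described. It uses "command -v", a portable alternative to "which".

```shell
# Confirm that grep is installed; prints its location, e.g. /usr/bin/grep.
command -v grep

# Create a small practice file (hypothetical contents, NOT the course grepfile).
printf 'cat\ncats\nconcatenate\n' > practice.txt

# Verify that the file exists and check its contents.
ls practice.txt
cat practice.txt

# Search the file: prints every line that contains "cats".
grep 'cats' practice.txt
```

The last command should print only the line "cats", since neither of the other two lines contains that string.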
Exercises
- At your Unix prompt, type "grep 'dog'\n". Observe that nothing
happens. This is because grep was given no file to search, so it is
waiting for you to type its input. Type ctrl-c. Observe that you get your prompt back.
- At your Unix prompt, type "grep 'dog' grepfile\n". Observe
what happens.
- Before proceeding, some notes on grep and regular expressions:
- It's a good idea to enclose your regular expression in single
quotes ('), because otherwise the shell will try to interpret some
of the special characters, notably *.
- Some of the regular expression syntax is different for grep
than for Perl. Refer to the table on page 831 of the textbook.
Note that in grep's default (basic) syntax, + and ? are not operators;
they simply match themselves.
- Now work out a regular expression that will make grep
return only the line containing just the word dog. Test it by
typing "grep 'xxx' grepfile\n", with your regular expression in place
of xxx.
- Now work out a regular expression that will make grep return
only the lines "dog" and "doog".
- Work out a different regular expression that will make grep
return only the lines "dog" and "doog".
- Work out a regular expression that will make grep return
only the line "dog dog".
- Work out another one.
- And another one. (If you haven't used \( and \) yet, try.)
- Do exercise 2.1, parts a-f, on paper. Use either Perl
or grep regular expression syntax.
- Consider these sample answers to
the exercise. How are they different from your answers?
- Look at this version of the sample answers,
where I've given prose descriptions of the regular expressions. Make
sure you understand how the prose corresponds to the regular expressions.
- Using emacs (or, on a Mac, your favorite text editor) as described
above, make a file to test the answers to 2.1. Be sure to include
strings that should match and similar strings that shouldn't.
- Using grep and that file, test the regular expressions. Be
sure to use the ones written in grep syntax.
- Consider testing any
regular expressions that you wrote differently from the sample answers
to see how they behave differently.
- You may need to add beginning and end of line anchors (^ and $) to
make sure you're not just matching substrings. (This is again the
difference between regular expressions as definitions of regular languages,
and regular expressions used in search.)
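The notes above on standard input, on + in grep's syntax, and on anchors can be tried out on a neutral example before you apply them to the exercises. This is a minimal sketch: the file anchors.txt, its contents, and the variable names are invented for illustration, and the patterns deliberately avoid the exercise words. The -c option makes grep print a count of matching lines instead of the lines themselves.

```shell
# Hypothetical test file: one target line plus similar lines that should not match.
printf 'cat\ncats\nconcat\ncat cat\nca+t\n' > anchors.txt

# Unanchored: counts every line that contains "cat" as a substring.
unanchored=$(grep -c 'cat' anchors.txt)

# Anchored: ^ and $ force the pattern to cover the whole line.
anchored=$(grep '^cat$' anchors.txt)

# In grep's basic syntax, + is an ordinary character, so this matches
# only the line containing the literal text "a+".
plus=$(grep 'a+' anchors.txt)

# With no file argument, grep reads standard input; here a pipe supplies it.
piped=$(printf 'cat\ncow\n' | grep 'c.t')

echo "unanchored matches: $unanchored"
echo "anchored match: $anchored"
echo "literal plus: $plus"
echo "from stdin: $piped"
```

The unanchored search counts four lines (cat, cats, concat, and cat cat), while the anchored search returns only the line consisting of exactly "cat". That gap is the substring problem described above.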