CS 261 Homework 5 - Random Sentence Generator (RSG)

Due Fri Apr 05 at 11:59pm extended to Tues Apr 09 at 11:59pm

Overview

For this assignment, you will create a program that reads a user-specified grammar and generates a user-specified number of strings ("sentences", but they may be longer than a single sentence!) from that grammar. You will also be creating at least one grammar of your own. This grammar can be for anything: extension requests, Star Trek episode summaries, haikus, political soundbites--be creative!

Beyond simply being entertaining, this program can be very helpful for generating data sets to test programs with. We only need to create a grammar to descibe the test data, and then can produce as much of it as we want: as a simple example, imagine using this program to create random movies to help test your Movie Library!

Objectives

Necessary Files

You will want a copy of the Hwk5.zip file which contains the README for this assignment. It also has an example grammar file (the cat-&-dog example from class) in the format we will be using. This can be helpful for testing your code.

A Few Notes on Grammars

A grammar is a collection of substitution rules that describe a set of strings or sentences. Each sentence is a sequence of symbols called terminals. Different kinds of sentence fragments are represented by nonterminal symbols called variables. A set of rules for each variable specify how it can be replaced by a sequence of variables and terminals. One of the variables is designated as the start variable, which means that it represents an entire sentence.

An example grammar (in BNF format) is below, to supplement the ones described in class. The variables are A and B, while the terminals are 0, 1, and #. The start variable is A.

<A> ::= 0<A>1<A> | <B>
<B> ::= #

This grammar says that the variable A can be replaced either with the sequence 0A1A or the variable B, while the variable B can only be replaced with #.

From a conceptual point of view, a grammar can be used to generate strings of terminals in the following manner. (Note that this isn't precisely how your program will work, but will have an equivalent effect)

For example, the string 00#1#1# can be generated by the grammar above. The following derivation---which begins with the start variable, with one substitution made to the leftmost variable at each step (appropriately called a leftmost derivation)---proves that it can be done; the new part of each step is shown underlined.

A => 0A1A => 00A1A1A => 00B1A1A => 00#1A1A => 00#1B1A => 00#1#1A => 00#1#1B => 00#1#1#

Since 00#1#1# can be generated by the grammar, we would say that the string 00#1#1# is in the language of the grammar. In other words, the language of a grammar is the set of all strings that can be generated from it. (Many grammars, including this one, have an infinite number of strings in their languages. This grammar generates an infinite number of strings since the first rule can be used an arbitrary number of times in a derivation.)

In your random sentence generator, a grammar will describe a set of "sentences" (which may indeed be infinite). Each sentence you generate will be a sequence of characters, with the characters being the terminals in the grammar. The variables in the grammar will describe sentence fragments, with the start variable describing an entire sentence.

Details

There are a number of pieces ot this program. Rather than walking you through the classes and methods to make, the following sections will give details about the individual components of your program. It is up to you to figure out the design of the code!

Input and Output

Reading Grammar Files

Data Structures for the Grammar

Generating Random Sentences

Creating the Output

Submitting

BEFORE YOU SUBMIT: make sure your code is fully documented and functional! If your code doesn't run, I can't give you credit!

Also make sure you that have filled out the README.txt! There are actually some questions of note here that you will need to answer.

Submit your program to the Hwk5 submission folder on hedwig, following the instructions detailed in Lab A. You should upload all you classes, as well as the README.txt.

The homework is due at midnight on Fri Apr 05.

Extensions

Grading

This assignment will be graded based on approximately the following criteria: