CS 261 Lab G - Anagram Solver

Due Wed Mar 13 at 9:00am

Overview

In this lab, you will practice using recursive backtracking to write a program that finds anagrams of provided phrases. Your program will look through a dictionary of words and find all combinations of words that contain the same letters as the given phrase. You will also report on the time required by your algorithm (and experiment with some possible optimizations!)

An example output of your progam should look like the following:

Welcome to the anagram finder!
First, enter the name of a dictionary file to use: words4000.txt

Enter phrase to anagram-ify (return to quit): Hello World!
Max words to include (0 for no max!): 0
[hell, old, row]
[hell, row, old]
[hello, world]
[hold, or, well]
[hold, roll, we]
[hold, we, roll]
[hold, well, or]
[how, led, roll]
[how, roll, led]
[led, how, roll]
[led, roll, how]
[led, roll, who]
[led, who, roll]
[old, hell, row]
[old, row, hell]
[or, hold, well]
[or, well, hold]
[roll, hold, we]
[roll, how, led]
[roll, led, how]
[roll, led, who]
[roll, we, hold]
[roll, who, led]
[row, hell, old]
[row, old, hell]
[we, hold, roll]
[we, roll, hold]
[well, hold, or]
[well, or, hold]
[who, led, roll]
[who, roll, led]
[world, hello]
  search completed in 0.464 seconds

You will be required to use recursive backtracking to complete this lab. Note that your solution will be very similar to the solution to the N-Queens problem; I recommend you review that code for this lab!

This lab will be completed in pairs. Review the pair programming guidelines!

Objectives

To practice writing a recursive backtracker
To review using existing classes based on their documentation.
To experiment with optimzation techniques.

Necessary Files

You will need to download the copy of the AnagramLab.zip file. This file contains two classes you will need to import into Eclipse:

LetterSet is a class that represents a set (a count) of letters in a phrase. The documentation for this class can be found at the bottom of the page. A LetterSet object has a number of helpful methods that you can use to figure out whether you can make one word out of a collection of letters (hint: look carefully at the sub(LetterSet) method).
AnagramFinderTester contains a main method with some logic to run your anagram finding program. You will likely want to write your own tester to easy in development (e.g., one that doesn't prompt for the dictionary to use!)

You should NOT modify these classes in any way---we will be testing your code using them as written, so any modifications will not be respected.

In addition, the zip file contains a number of different dictionaries files: some that we have used before, and a couple new ones.

Details

For this lab, you will be making a single class: AnagramFinder.java. This class should have the following two methods:
- public AnagramFinder(List<String> dictionary)
  This constructor should take the given dictionary (a list of words) and use that dictionary in finding anagrams. You are not allowed to modify the list.
- public void print(String phrase, int max)
  This method prints to System.out all of the combinations of words from the finder's dictionary that are anagraphs of the given phrase and include at most max words (or unlimited words if max is 0). You should throw an IllegalArgumentException if max is negative.
Your task is to complete the AnagramFinder class so that these two methods are supposed. You are welcome to use as many or as few helper methods as you like (I suggest at least once: the recursive backtracking method!)
Your recursive backtracking method will work very similar to the method used in the N-Queens problem (and that you are using for your Sudoku game). Your program should build up a solution one word at a time. On each recursive call, you should search the dictionary from beginning to end and check if each word is a match for the current set of letters.
- You are welcome to use iteration to check through the possible words in the dictionary, though your backtracking should be done recursively.
- Don't make this harder than it needs to be: you are performing an exhaustive search. It is okay. nay required for your output to include multiple permutations of the same words (e.g., [old, row, hell] and [hell, old, row] in the example above). Similarly, you don’t need any special cases for words that have already been used. If someone asks you for the anagrams of “foo foo foo”, you should include [foo, foo, foo].
- Look at the LetterSet class; it will help you figuring out if a word is a match for the current set of letters!
- Hint: printing out an ArrayList<String> will produce out output of the form [string1, string2, ...]
In addition, note that you are supposed to print out all of the anagrams for a phrase; unlike with the N-Queens problem, we don't want to stop when we've found a single solution. You might rethink your method signature: what do we need to return?
Remember to stop when you've hit the max number of words, unless the max is 0 in which case you stop when you've exhausted the list.
Your print should also display the amount of time your program took to find the anagrams. Remember that you can use System.currentTimeMillis() to get the current time in milliseconds.
Recursive backtracking is, in general, highly inefficient because it is a brute force technique that checks every possibility, but there are still things you can do to make sure that your solution is as efficient as it can be. You can be careful not to compute something twice if you don’t need to. You can not continue to explore branches that you know will never be printed. To this end, once you have finished your recursive backtracker, you should consider and explore possible ways to optimize the efficiency of your program. For example:
- One common way of optimizing algorithms is to precompute values or data that we may need later. You might compute LetterSets for all of your dictionary words in advance.
- Similarly, for any given phrase you can reduce the dictionary to to a smaller dictionary of “relevant” words. A word is relevant if it can be constructed from the letters of a given phrase. Only a fraction of the dictionary will be relevant to any given phrase, so reducing the dictionary before you begin the recursion will allow you to speed up the searches that happen on each recursive invocation. You can do this once before the recursion begins, or once during each recursive call--the later will involve some overhead for each call, but the reduction in search space may outweight the increased cost of the computation. Try both options and see which is faster!
- You might also be able to "trim" down your solution space by ordering your dictionary words by size; this way if you get to a word that is longer than the number of letters you have left, you do not have to keep searching down that branch.
- Printing is often the slowest part of an algorithm; can you put off printing until the search is completed? Does that help the perceived speed (e.g., how fast things seem to be running from the user's perspective)?
- There are lots of other options as well--be creative! How might you reduce the number of operations per search, or the number of searchs to make (or both?)
You must include at least one such optimization. Remember: no matter what optimizations you make, your code must continue to provide all functionality.
In the class comment of your AnagramFinder class, be sure to describe any optimizations you made and their relative effects on the speed. What seemed to provide the most "bang for the buck?"
Also be sure to include documentation of your methods. In particular, we're looking for inline documentation comments to explain what your code is doing (especially around optimizations), as well as Javadoc comments for your recursive methods.

Submitting

Submit your AnagramFinder class to the LabG submission folder on hedwig. Make sure you upload your work to the correct folder! The lab is due at the start of class on the day after the lab.

Remember to fill out your lab partner evaluation!

Grading

This assignment will be graded based on approximately the following criteria:

You have implemented an AnagramFinder that works with the provided tester [15%]
Your AnagramFinder prints all all anagrams for a given phrase [30%]
Your AnagramFinder respects the provided maximum number of words [15%]
Your AnagramFinder reports the amount of time spent searching for anagrams [5%]
Your AnagramFinder includes at least one documentated optimization [15%]
Your code is well documented and follows the edicts of good style [15%]
You completed your lab partner evaluation [5%]

Documentation

For your convenience, below is the documentation of the LetterSet class.

Class LetterSet

java.lang.Object
  LetterSet

public class LetterSet extends java.lang.Object

An inventory of letters, without order. Inventories contain the count of each letter. Based on program by Stuart Reges

Version:: Sp2013
Author:: Joel Ross

Constructor Summary
`LetterSet()` Constructs a new empty LetterSet object (all letters have count of 0)
`LetterSet(String data)` Constructs a new LetterSet object that represents the alphabetic letters in the given string, ignoring the case of letters and ignoring any non-alphabetic characters.

Method Summary
`LetterSet`	`add(LetterSet other)` Returns a new LetterSet object that represents the "sum" (the union) of this and the given LetterSet.
`int`	`getCount(char c)` Returns a count of how many of this letter are in the set, ignoring case.
`boolean`	`isEmpty()` Returns whether or not the set is empty (has size 0)
`void`	`setCount(char c, int num)` Sets the count of how many times this letter appears in the set, ignoring case.
`int`	`size()` Gives the "size" (the number of total characters) in the set.
`LetterSet`	`sub(LetterSet other)` Returns a new LetterSet object that represents the "difference" of this and the given LetterSet.
`String`	`toString()` Returns a String represention of the set.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Constructor Detail

LetterSet

public LetterSet()

Constructs a new empty LetterSet object (all letters have count of 0)

LetterSet

public LetterSet(String data)

Constructs a new LetterSet object that represents the alphabetic letters in the given string, ignoring the case of letters and ignoring any non-alphabetic characters.

Parameters:: data - The String used to populate the LetterSet

Method Detail

add

public LetterSet add(LetterSet other)

Returns a new LetterSet object that represents the "sum" (the union) of this and the given LetterSet. The new LetterSet has the sum of the counts of each individual letter

Parameters:: other - the LetterSet to add
Returns:: a new LetterSet with the sum of the letter counts

getCount

public int getCount(char c)

Returns a count of how many of this letter are in the set, ignoring case.

Parameters:: c - The character to get the count of
Returns:: The number of times the character appears in the set

isEmpty

public boolean isEmpty()

Returns whether or not the set is empty (has size 0)

Returns:: Whether the set is empty.

setCount

public void setCount(char c,
                     int num)

Sets the count of how many times this letter appears in the set, ignoring case.

Parameters:: c - The character to set the count of; num - The number of times the character should appear in the set

size

public int size()

Gives the "size" (the number of total characters) in the set.

Returns:: The size of the set

sub

public LetterSet sub(LetterSet other)

Returns a new LetterSet object that represents the "difference" of this and the given LetterSet. The new LetterSet has letter counts that are the current counts minus the couts of the other object. If any of the letter counts would be negative (so the other cannot be subtracted), this method returns null

Parameters:: other - the LetterSet to subtract
Returns:: A new LetterSet with the subtracted letter counts, or null if could not be subtracted

toString

public String toString()

Returns a String represention of the set. For example, an inventory of 4 a's, 1 b, 1 l and 1 m would be represented as "[aaaablm]".

Overrides:: toString in class java.lang.Object