CS 261 Lab G - Anagram Solver
Due Wed Mar 13 at 9:00am
Overview
In this lab, you will practice using recursive backtracking to write a program that finds anagrams of provided phrases. Your program will look through a dictionary of words and find all combinations of words that contain the same letters as the given phrase. You will also report on the time required by your algorithm (and experiment with some possible optimizations!)
An example output of your progam should look like the following:
Welcome to the anagram finder! First, enter the name of a dictionary file to use: words4000.txt Enter phrase to anagram-ify (return to quit): Hello World! Max words to include (0 for no max!): 0 [hell, old, row] [hell, row, old] [hello, world] [hold, or, well] [hold, roll, we] [hold, we, roll] [hold, well, or] [how, led, roll] [how, roll, led] [led, how, roll] [led, roll, how] [led, roll, who] [led, who, roll] [old, hell, row] [old, row, hell] [or, hold, well] [or, well, hold] [roll, hold, we] [roll, how, led] [roll, led, how] [roll, led, who] [roll, we, hold] [roll, who, led] [row, hell, old] [row, old, hell] [we, hold, roll] [we, roll, hold] [well, hold, or] [well, or, hold] [who, led, roll] [who, roll, led] [world, hello] search completed in 0.464 seconds
You will be required to use recursive backtracking to complete this lab. Note that your solution will be very similar to the solution to the N-Queens problem; I recommend you review that code for this lab!
This lab will be completed in pairs. Review the pair programming guidelines!
Objectives
- To practice writing a recursive backtracker
- To review using existing classes based on their documentation.
- To experiment with optimzation techniques.
Necessary Files
You will need to download the copy of the AnagramLab.zip file. This file contains two classes you will need to import into Eclipse:
-
LetterSet
is a class that represents a set (a count) of letters in a phrase. The documentation for this class can be found at the bottom of the page. A LetterSet object has a number of helpful methods that you can use to figure out whether you can make one word out of a collection of letters (hint: look carefully at thesub(LetterSet)
method). -
AnagramFinderTester
contains a main method with some logic to run your anagram finding program. You will likely want to write your own tester to easy in development (e.g., one that doesn't prompt for the dictionary to use!)
You should NOT modify these classes in any way---we will be testing your code using them as written, so any modifications will not be respected.
In addition, the zip file contains a number of different dictionaries files: some that we have used before, and a couple new ones.
Details
-
For this lab, you will be making a single class:
AnagramFinder.java
. This class should have the following two methods:-
public AnagramFinder(List<String> dictionary)
This constructor should take the given dictionary (a list of words) and use that dictionary in finding anagrams. You are not allowed to modify the list.
-
public void print(String phrase, int max)
This method prints to System.out all of the combinations of words from the finder's dictionary that are anagraphs of the given
phrase
and include at mostmax
words (or unlimited words if max is 0). You should throw anIllegalArgumentException
if max is negative.
-
- Your task is to complete the AnagramFinder class so that these two methods are supposed. You are welcome to use as many or as few helper methods as you like (I suggest at least once: the recursive backtracking method!)
-
Your recursive backtracking method will work very similar to the method used in the N-Queens problem (and that you are using for your Sudoku game). Your program should build up a solution one word at a time. On each recursive call, you should search the dictionary from beginning to end and check if each word is a match for the current set of letters.
- You are welcome to use iteration to check through the possible words in the dictionary, though your backtracking should be done recursively.
- Don't make this harder than it needs to be: you are performing an exhaustive search. It is okay. nay required for your output to include multiple permutations of the same words (e.g., [old, row, hell] and [hell, old, row] in the example above). Similarly, you don’t need any special cases for words that have already been used. If someone asks you for the anagrams of “foo foo foo”, you should include [foo, foo, foo].
- Look at the LetterSet class; it will help you figuring out if a word is a match for the current set of letters!
- Hint: printing out an ArrayList<String> will produce out output of the form [string1, string2, ...]
- In addition, note that you are supposed to print out all of the anagrams for a phrase; unlike with the N-Queens problem, we don't want to stop when we've found a single solution. You might rethink your method signature: what do we need to return?
- Remember to stop when you've hit the max number of words, unless the max is 0 in which case you stop when you've exhausted the list.
- Your print should also display the amount of time your program took to find the anagrams. Remember that you can use
System.currentTimeMillis()
to get the current time in milliseconds. -
Recursive backtracking is, in general, highly inefficient because it is a brute force technique that checks every possibility, but there are still things you can do to make sure that your solution is as efficient as it can be. You can be careful not to compute something twice if you don’t need to. You can not continue to explore branches that you know will never be printed. To this end, once you have finished your recursive backtracker, you should consider and explore possible ways to optimize the efficiency of your program. For example:
- One common way of optimizing algorithms is to precompute values or data that we may need later. You might compute LetterSets for all of your dictionary words in advance.
- Similarly, for any given phrase you can reduce the dictionary to to a smaller dictionary of “relevant” words. A word is relevant if it can be constructed from the letters of a given phrase. Only a fraction of the dictionary will be relevant to any given phrase, so reducing the dictionary before you begin the recursion will allow you to speed up the searches that happen on each recursive invocation. You can do this once before the recursion begins, or once during each recursive call--the later will involve some overhead for each call, but the reduction in search space may outweight the increased cost of the computation. Try both options and see which is faster!
- You might also be able to "trim" down your solution space by ordering your dictionary words by size; this way if you get to a word that is longer than the number of letters you have left, you do not have to keep searching down that branch.
- Printing is often the slowest part of an algorithm; can you put off printing until the search is completed? Does that help the perceived speed (e.g., how fast things seem to be running from the user's perspective)?
- There are lots of other options as well--be creative! How might you reduce the number of operations per search, or the number of searchs to make (or both?)
- You must include at least one such optimization. Remember: no matter what optimizations you make, your code must continue to provide all functionality.
- In the class comment of your
AnagramFinder
class, be sure to describe any optimizations you made and their relative effects on the speed. What seemed to provide the most "bang for the buck?" - Also be sure to include documentation of your methods. In particular, we're looking for inline documentation comments to explain what your code is doing (especially around optimizations), as well as Javadoc comments for your recursive methods.
Submitting
Submit your AnagramFinder class to the LabG submission folder on hedwig. Make sure you upload your work to the correct folder! The lab is due at the start of class on the day after the lab.
Remember to fill out your lab partner evaluation!
Grading
This assignment will be graded based on approximately the following criteria:
- You have implemented an AnagramFinder that works with the provided tester [15%]
- Your AnagramFinder prints all all anagrams for a given phrase [30%]
- Your AnagramFinder respects the provided maximum number of words [15%]
- Your AnagramFinder reports the amount of time spent searching for anagrams [5%]
- Your AnagramFinder includes at least one documentated optimization [15%]
- Your code is well documented and follows the edicts of good style [15%]
- You completed your lab partner evaluation [5%]
Documentation
For your convenience, below is the documentation of the LetterSet class.
Class LetterSet
java.lang.Object LetterSet
public class LetterSet extends java.lang.Object
An inventory of letters, without order. Inventories contain the count of each letter. Based on program by Stuart Reges
- Version:
- Sp2013
- Author:
- Joel Ross
Constructor Summary | |
---|---|
LetterSet()
Constructs a new empty LetterSet object (all letters have count of 0) |
|
LetterSet(String data)
Constructs a new LetterSet object that represents the alphabetic letters in the given string, ignoring the case of letters and ignoring any non-alphabetic characters. |
Method Summary | |
---|---|
LetterSet |
add(LetterSet other)
Returns a new LetterSet object that represents the "sum" (the union) of this and the given LetterSet. |
int |
getCount(char c)
Returns a count of how many of this letter are in the set, ignoring case. |
boolean |
isEmpty()
Returns whether or not the set is empty (has size 0) |
void |
setCount(char c,
int num)
Sets the count of how many times this letter appears in the set, ignoring case. |
int |
size()
Gives the "size" (the number of total characters) in the set. |
LetterSet |
sub(LetterSet other)
Returns a new LetterSet object that represents the "difference" of this and the given LetterSet. |
String |
toString()
Returns a String represention of the set. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
LetterSet
public LetterSet()
- Constructs a new empty LetterSet object (all letters have count of 0)
LetterSet
public LetterSet(String data)
- Constructs a new LetterSet object that represents the alphabetic letters in the given string,
ignoring the case of letters and ignoring any non-alphabetic characters.
- Parameters:
-
data
- The String used to populate the LetterSet
Method Detail |
---|
add
public LetterSet add(LetterSet other)
- Returns a new LetterSet object that represents the "sum" (the union) of this and the given LetterSet.
The new LetterSet has the sum of the counts of each individual letter
- Parameters:
-
other
- the LetterSet to add - Returns:
- a new LetterSet with the sum of the letter counts
getCount
public int getCount(char c)
- Returns a count of how many of this letter are in the set, ignoring case.
- Parameters:
-
c
- The character to get the count of - Returns:
- The number of times the character appears in the set
isEmpty
public boolean isEmpty()
- Returns whether or not the set is empty (has size 0)
- Returns:
- Whether the set is empty.
setCount
public void setCount(char c, int num)
- Sets the count of how many times this letter appears in the set, ignoring case.
- Parameters:
-
c
- The character to set the count ofnum
- The number of times the character should appear in the set
size
public int size()
- Gives the "size" (the number of total characters) in the set.
- Returns:
- The size of the set
sub
public LetterSet sub(LetterSet other)
- Returns a new LetterSet object that represents the "difference" of this and the given LetterSet.
The new LetterSet has letter counts that are the current counts minus the couts of the other object.
If any of the letter counts would be negative (so the other cannot be subtracted), this method returns null
- Parameters:
-
other
- the LetterSet to subtract - Returns:
- A new LetterSet with the subtracted letter counts, or null if could not be subtracted
toString
public String toString()
- Returns a String represention of the set. For example, an inventory of 4 a's, 1 b, 1 l and 1 m would be represented as "[aaaablm]".
- Overrides:
-
toString
in classjava.lang.Object