CS 261 Lab G - Anagram Finder
Due Wed Mar 12 at 9:00am
Overview
In this lab, you will practice using recursive backtracking to write a program that finds anagrams of a provided phrase. These anagrams can be combinations of words--in effect, the words of a sentence that would be an anagram of the given sentence. Your program will look through a dictionary of words and find all combinations of words that contain the same letters as the given phrase.
An example output of your progam should look like the following:
Welcome to the anagram finder! First, enter the name of a dictionary file to use: words4000.txt Enter phrase to anagram-ify (return to quit): Hello World! Max words to include (0 for no max!): 0 [hell, old, row] [hell, row, old] [hello, world] [hold, or, well] [hold, roll, we] [hold, we, roll] [hold, well, or] [how, led, roll] [how, roll, led] [led, how, roll] [led, roll, how] [led, roll, who] [led, who, roll] [old, hell, row] [old, row, hell] [or, hold, well] [or, well, hold] [roll, hold, we] [roll, how, led] [roll, led, how] [roll, led, who] [roll, we, hold] [roll, who, led] [row, hell, old] [row, old, hell] [we, hold, roll] [we, roll, hold] [well, hold, or] [well, or, hold] [who, led, roll] [who, roll, led] [world, hello] Anagrams generated in 0.464 seconds
This phrase ("Hello World") happened to produce a lot of anagrams that are three words long; longer phrases or a bigger dictionary will produce more variety. However, you can use this output as a sample for testing.
You will be required to use recursive backtracking to complete this lab. Note that your solution will be very similar to the solution to the n-Queens problem that we discussed in class; I recommend you review that code (available on Moodle) for this lab!
This lab will be completed in pairs. Remember to review the pair programming guidelines before you get started!
Objectives
- To practice writing a recursive backtracker
- To practice revising and optimizing working code
Necessary Files
You will need to download the copy of the AnagramLab.zip file. This file contains a number of classes and files you will need to import into Eclipse:
-
A number of different dictionaries files, of different sizes. (Note that the
words127000.txt
will take a very long time to produce anagrams--it is included for fun rather than for practical use). I recommend you test usingsimple_test.txt
andwords4000.txt
; both of which should produce output similar to the above. -
AnagramFinderTester
a class that contains a main method with some logic to run your anagram finding program. You may also want to write your own tester to ease the development cycle (e.g., a tester that doesn't prompt for the dictionary to use, so you can quickly test your work). You should import this into Eclipse, as normal. -
LetterSet
is a class you wrote for homework! This lab will give you a chance to make use of a class you wrote before; the concept of a Letter Set will make it easy to find anagrams of words.- You are welcome to use your implementation of the homework in this lab (just grab your code off hedwig). However, you'll need to be sure that your code works flawlessly; bugs in your class can cause further bugs in your anagram finder.
-
Alternatively, I have provided the class file for a working copy of the assignment. This is not the source code, but the compiled machine code. You can load it into Eclipse:
- Import the
LetterSet.class
file, as if it were a normal file (such as how you have imported txt files in the past). Put this file in alib
folder (stands for "library") in the top level directory of your project. - You need to tell Eclipse where to find these class files. Right-click your Lab G project project and select Properties. Select Java Build Path and then select the Libraries tab. On the right-hand side, select Add Class Folder, and then select your
lib
folder that contains theLetterSet.class
file. Click OK, and your Properties window should look something like this: - Hit OK on the Properties window to save your changes. You should now have an additional Referenced Libraries option in your Package Explorer that has the lib folder inside it
Fun Fact: this is also how you load external .jar files, such as if you want to use a third-party library.
- Import the
Assignment Details
For this lab, you will be making a single class: AnagramFinder.java
. In order to work with the provided tester, your class will need to have the following public interface:
-
public AnagramFinder(List<String> dictionary)
This constructor should take the given dictionary (a list of words--not an
ArrayList
, but aList
, utilizing polymorphism!) and use that dictionary in finding anagrams. You are not allowed to modify the provided list. -
public void findAnagrams(String phrase, int max)
This method prints to System.out all of the combinations of words from the finder's dictionary that are anagrams of the given
phrase
and include at mostmax
words (or unlimited words if max is 0). This method should throw anIllegalArgumentException
if max is negative. The method also reports the wall clock time spent generating anagrams.
And that's it! Your task is to complete the AnagramFinder class so that these two methods are supported. You are welcome to use as many or as few other helper methods as you like (I suggest at least one: the recursive backtracking method!)
Backtracking
Your recursive backtracking method will work very similar to the method used in the n-Queens problem (and that you are using for your Sudoku game).
- To find anagrams, the "piece" of the solution that will try to solve will be to find a single word that is a partial anagram of the given phrase. For example, in "hello world", your solver might start by finding "hell" as a piece of the solution. Taking away these letters will leave you with a smaller phrase: "o world". You can then recurse to try and find an anagram that uses the letters from that smaller phrase. If you run out of letters, then you have successfully found an anagram--a solution to the problem. If you can't find any anagrams for the remaining letters, then you must have hit a deadend, and so will need to backtrack.
-
You can use the
LetterSet
class to easily figure out if a given word can be made as an anagram of the phrase's letters (since what matters are which letters are in the word, not what order they are in)! Think about thesub()
method in particular--since "row" can be made of the letters of "hello world", subtracting these sets will let you "bite off" the letters of "row" and continue recursing.- Details about the class's public interface can be found in original homework writeup. Be sure and read the descriptions of the methods carefully!
- You are welcome to use iteration to check through the possible words in the dictionary, though your backtracking should be done recursively.
-
In addition, note that you are supposed to print out all of the anagrams for a phrase; unlike with the n-Queens problem, we don't want to stop when we've found a single solution. You might rethink your method signature: what do we need to return?
- Don't make this harder than it needs to be: you are performing an exhaustive search. It is
okayrequired for your output to include multiple permutations of the same words (e.g., [old, row, hell] and [hell, old, row] in the example above). Similarly, you don’t need any special cases for words that have already been used. If someone asks you for the anagrams of “foo foo foo”, you should include [foo, foo, foo].
- Don't make this harder than it needs to be: you are performing an exhaustive search. It is
- Hint: printing out an
ArrayList<String>
will produce out output of the form[string1, string2, ...]
, which is the output you're looking for. - Remember to stop when you've hit the max number of words, unless the max is 0 in which case you stop when you've exhausted the list.
- Your
findAnagrams()
method should also display the amount of wall clock time your program took to find the anagrams. Remember that you can useSystem.currentTimeMillis()
to get the current time in milliseconds.
Once you have everything working, try making anagrams of a few choice phrases! For example, what are anagrams of your name?
Optimization [Optional]
Recursive backtracking is, in general, highly inefficient because it is a brute force technique that checks every possibility. But there are still things you can do to make sure that your solution is as efficient as it can be. For example, you can be careful not to compute something twice if you don’t need to (e.g., similar to how we fixed the Fibonacci calculation). Or you can stop exploring solution paths that you know will never work out. To this end, once you have finished your recursive backtracker, you should consider and explore at least one possible way to optimize the efficiency of your program:
- One common way of optimizing algorithms is to precompute values or data that we may need later. You might compute
LetterSets
for all of your dictionary words in advance, so that you don't need to do it again every time you want to consider a word. - Similarly, for any given phrase, you can reduce the dictionary to to a smaller dictionary of “relevant” words. A word is relevant if it can be constructed from the letters of a given phrase. Only a fraction of the dictionary will be relevant to any given phrase, so reducing the dictionary before you begin the recursive search will allow you to speed up the searches that happen on each recursive invocation. You can do this once before the recursion begins, or once during each recursive call--the later will involve some overhead for each call, but the reduction in search space may outweight the increased cost of the computation. You might try both options and see which is faster!
- You might also be able to "trim" down your solution space by ordering your dictionary words by size; this way if you get to a word that is longer than the number of letters you have left, you do not have to keep searching down that branch.
For this lab, you must include at least two (1) of the above optimizations, but can earn extra credit for including more. You should add these optimizations after you have implemented a working backtracking algorithm.
AnagramFinder
, be sure and list which optimizations you performed. Also include plenty of inline comments explaining what you did!
Extensions
- There are lots of other ways you can try and optimize your code--be creative! How might you reduce the number of operations per search, or the number of searches to make, or both?
- Printing is often the slowest part of an algorithm; can you put off printing until the search is completed? Does that help the perceived speed (e.g., how fast things seem to be running from the user's perspective)? Does it change the actual speed?
Submitting
Make sure your name is in a class comment at the top of your AnagramFinder
class, and upload it to the LabG submission folder on hedwig. Make sure you upload your work to the correct folder! The lab is due at the start of class on the day after the lab.
After you have submitted your solution, log onto Moodle and submit the Lab G Partner Evaluation. Both partners need to submit evaluations.
Grading
This assignment will be graded based on approximately the following criteria:
- You have implemented an AnagramFinder that works with the provided tester [20%]
- Your AnagramFinder uses recursive backtracking to print all anagrams for a given phrase [50%]
- Your AnagramFinder respects the provided maximum number of words [20%]
- Your AnagramFinder reports the amount of time spent searching for anagrams [5%]
- You completed your lab partner evaluation [5%]
- You have implemented an AnagramFinder that works with the provided tester [15%]
- Your AnagramFinder uses recursive backtracking to print all anagrams for a given phrase [35%]
- Your AnagramFinder respects the provided maximum number of words [15%]
- Your AnagramFinder reports the amount of time spent searching for anagrams [5%]
- Your AnagramFinder includes and documents the required optimizations [25%]
- You completed your lab partner evaluation [5%]