CS 261 Homework 7 - Heroes and Hashes

Due Wed Dec 10 at 11:59pm

Overview

For this assignment you will practice implementing a HashMap, programmatically testing the efficiencies of different load factors and hashing algorithms.

In particular, you will be implementing an ADT that represent a HashSet--a Set (an unordered list of unique items) built on top of a HashMap. Note that a HashSet is basically a HashMap but only deals with single values instead of key-value pairs, making the implementation a bit more straightforward. Specifically, you will be creating a HashSet to store a "roster" of Super Heroes, testing the efficiency of your program using a large database of superheroes! You will run a series of "benchmark" tests on your implementation to measure the impact of different load factors, comparing these values with the theoretical ones discussed in class and in the textbook.

This assignment should be completed individually. You are welcome to ask for help from me, or from your classmates (but remember to follow the Gilligan's Island rule)! Note that much of the code for this assignment can be adapted from chapter 7 of the textbook or from class lecture code, so it should be straightforward to write (and doable in the relatively short time provided).

Objectives

Necessary Files

You will need a copy of the Hwk7.zip file. This file contains two interfaces that your classes will need to implement:

The zip file also contains a large sample data file you can use: marvelheroes.txt is a list of Marvel super heroes and their appearances. Each line in the file gives the name of the comic character, followed by a semicolon (;), followed by a code indicating a comic the hero appeared in. This file contains about 6500 characters with almost 100,000 appearances between them (it's a big file).

Lastly, the zip contains the README for the assignment, which you will need to fill out. You will need to include detailed results of your benchmark tests, so don't forget that!

Assignment Details

You will be creating a couple of new classes for this assignment: the HeroRoster ADT, a CorrectnessTester testing class, and a RosterBenchmarker testing class.

HeroRoster

Your primary class will be your HeroRoster: an ADT representing a set of heroes. This class will be implemented directly as a HashSet using linear probing (the same way that the DecomposableShape was a LinkedList, and the HuffmanTree was a BinaryTree). You'll need to develop this class from scratch--except it won't really be from scratch; the textbook has lots of example code (see Chapter 7) that you can adapt, and we will be going over much of the implementation in class (again, which you can adapt). You are welcome to use this code in your program---just be sure and to credit the source of your code, and to add copious comments to demonstrate you understand how it works!

The other methods your HeroRoster needs to implement are explained in the interface. I've included a few scattered notes below, but at this point you should be capable of writing an ADT to match an interface.

As always, if there are any questions please ask (and sooner rather than later)!

CorrectnessTester

As you are implementing a new ADT from scratch, you will also need to test that it works! This class should include a set of unit tests (like we've used before) that demonstrates that your HeroRoster works as expected.

RosterBenchmarker

Once you have your implemented working (and you're sure that it works, because you tested it!) you will need to run a number of efficiency tests (or "benchmarks") on your data structure, experimenting with different load thresholds and your different HeroHashers.

Timeline

There are a lot of tasks to complete in this assignment--particularly in the benchmarking (though most of that is just adjusting variables and calling methods multiple times). But most of the work (implementing the HeroRoster as a HashSet) can be adapted from the textbook and the lecture code. But since we'll be continuing to talk about HashMaps, you may consider completing this assignment in the following order:

Extensions

As an extension, you can implement a ChainedHeroRoster that uses chaining to resolve collisions, a QuadraticHeroRoster that uses quadratic probing to resolve collisions, or a DoubleHeroRoster that uses double hashing to resolve collisions. You should run your benchmarks on this new configuration and include the results in your report.

You can earn up to TBD points of extra credit implementing and benchmarking these other Rosters.

Submitting

BEFORE YOU SUBMIT: make sure your code is fully documented and functional! If your code doesn't run, I can't give you credit! Your name should be in the class comment at the top of each class.

Submit all your classes to the Hwk7 submission folder on vhedwig. Make sure you upload your work to the correct folder!. You should upload your entire src folder with all your classes.

Additionally, upload your written report in .pdf format. Your report should include tables of data as well as a brief analysis of that data (including answers to the above questions).

Finally, be sure to complete the provided README.txt file with details about your program.

The homework is due at midnight on Wednesday, Dec 10.

Grading

This assignment will be graded out of 37 points.