CS 261 Homework 3 - Baby Names

Due Mon Oct 06 at 11:59pm

Overview

The Social Security Administration (SSA) provides annual information about the popularity of particular names for babies. They have a website that allows you to look up the popularity of a particular name. You will be writing a program that provides similar functionality, allowing the user to look up a baby name and see a graph of its popularity over time.

This assignment should be completed individually. You are welcome to ask for help from me, or from your classmates (following the Gilligan's Island rule)!

Objectives

Necessary Files

You will need to download the cs261-hwk3.zip file, which contains a few GUI classes to help get you started (and make sure you can get finished in the time allotted!). Read through the code in these classes to make sure you understand what is going on.

It also contains the README.txt for this assignment--be sure to answer the questions thoroughly for full credit!

You will also need to download the data set of national baby name rankings from the SSA's website at http://www.ssa.gov/oact/babynames/limits.html. This is a 7 MB zip file, with full name data for the years 1880 through 2013. Be sure to extract the files from the zip folder for processing.

The Data

One of the neat things about this project is that you will be using real data (and lots of it!). This has not been filtered or simplified by the professor; you are working with actual, real numbers. This data set includes pretty much every name ever given in the United States over the past 130 years!

The data can be found in multiple text files, each named in a format like "yob2013.txt"--the number in the filename represents the year. Each file contains a list of baby names ordered by popularity--that is, the first name in the file is the most popular female name, the second name is the second most popular female name, etc. Female names are listed before male names, so the first male name listed in the file would be the most popular male name, etc.
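Notice that a name's rank is never written in the file; it comes from the position of the entry within its sex, and the year comes from the filename. Here is a rough sketch of both ideas. The class and method names (RankSketch, yearFromFilename, assignRanks) are made up for illustration and are not part of the provided code, and the sample lines in main are invented, not real SSA counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper class; the names here are assumptions, not part of the provided code. */
class RankSketch {

    /** Extracts the year from a filename of the form "yobYYYY.txt". */
    static int yearFromFilename(String filename) {
        // "yob" occupies indices 0..2, so the four-digit year is at indices 3..6
        return Integer.parseInt(filename.substring(3, 7));
    }

    /**
     * Turns "name,sex,count" lines (in file order) into "name,sex,rank" strings.
     * The rank counter is kept separately for F and M, so each sex starts at rank 1.
     */
    static List<String> assignRanks(List<String> lines) {
        List<String> ranked = new ArrayList<>();
        int femaleRank = 0;
        int maleRank = 0;
        for (String line : lines) {
            String[] parts = line.split(",");
            int rank = parts[1].equals("F") ? ++femaleRank : ++maleRank;
            ranked.add(parts[0] + "," + parts[1] + "," + rank);
        }
        return ranked;
    }

    public static void main(String[] args) {
        // Invented sample lines, for illustration only
        List<String> lines = Arrays.asList("Aaa,F,300", "Bbb,F,200", "Ccc,M,250");
        System.out.println(yearFromFilename("yob2013.txt")); // prints 2013
        System.out.println(assignRanks(lines)); // prints [Aaa,F,1, Bbb,F,2, Ccc,M,1]
    }
}
```

This treats position in the file as the rank, exactly as described above, so two names with the same count still get different ranks.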

Each name entry is a comma-separated list that has the following format:

Joel,M,2697

The first item is the name, the second is the sex (M for male and F for female), and the third is the total number of people born with that name in that year.
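As a minimal sketch of how one entry might be split into its three fields, consider the following. The NameEntryParser class and its nested Entry class are hypothetical names chosen for this example; you are free to structure your own parsing differently.

```java
/** Hypothetical helper (an assumption for illustration) that parses one entry line. */
class NameEntryParser {

    /** Simple holder for the three fields of an entry. */
    static class Entry {
        final String name;
        final char sex;
        final int count;

        Entry(String name, char sex, int count) {
            this.name = name;
            this.sex = sex;
            this.count = count;
        }
    }

    /**
     * Parses a line such as "Joel,M,2697".
     * Throws IllegalArgumentException if the line does not have three fields,
     * if the sex field is not a single character, or if the count is not a number
     * (NumberFormatException is a subclass of IllegalArgumentException).
     */
    static Entry parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3 || parts[1].length() != 1) {
            throw new IllegalArgumentException("Unexpected entry: " + line);
        }
        return new Entry(parts[0], parts[1].charAt(0), Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        Entry e = parse("Joel,M,2697");
        System.out.println(e.name + " " + e.sex + " " + e.count); // prints Joel M 2697
    }
}
```

Failing loudly on a malformed line is a design choice for this sketch; with real data you may prefer to skip bad lines and keep going.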

Note that the data just includes literally what people put on the forms, so there are things like "A" and "Baby" recorded as names (the data is more cleaned up in the later years). We will not worry about that, and we will not combine names that are similar in some sense – "Cathy" and "Catherine" and "Kathryn" and "Katie" and "Kati" will all count as different names.

Assignment Details

You will be developing this program mostly from scratch, though I will give you some guidance on how to structure the classes and the algorithms. There are really two parts, each of which is moderately complex, but doable. There will be bugs (oh yes, there will be bugs), so make sure to get started early so you don't run out of time!

1. Loading the Names

Your first step will be to load the data from the data files into a format that you are able to work with. You should use two classes for this: NameRecord and NameDatabase. A NameRecord object holds all of the ranking data for a particular name (e.g., the ranking of that name for all years), and the NameDatabase object holds a collection of NameRecords and provides methods that allow the user to find the data on a particular name.
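To make the division of labor between the two classes concrete, here is one possible skeleton. Only the class names NameRecord and NameDatabase come from this assignment; every field and method below (setRank, getRank, addRank, find), the use of 0 to mean "not ranked that year", and the choice to keep male and female records separate are assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Holds the rank of one name (for one sex) in every year it appears. */
class NameRecord {
    private final String name;
    // year -> rank; a TreeMap keeps the years in ascending order
    private final Map<Integer, Integer> rankByYear = new TreeMap<>();

    NameRecord(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setRank(int year, int rank) {
        rankByYear.put(year, rank);
    }

    /** Returns the rank for the given year, or 0 if the name has no entry that year (an assumed convention). */
    int getRank(int year) {
        return rankByYear.getOrDefault(year, 0);
    }
}

/** Holds a collection of NameRecords and finds them by name and sex. */
class NameDatabase {
    private final Map<String, NameRecord> records = new HashMap<>();

    // Lower-casing the name makes lookups case-insensitive
    private static String key(String name, char sex) {
        return name.toLowerCase() + "," + sex;
    }

    /** Stores a rank, creating the NameRecord the first time a name is seen. */
    void addRank(String name, char sex, int year, int rank) {
        records.computeIfAbsent(key(name, sex), k -> new NameRecord(name)).setRank(year, rank);
    }

    /** Returns the matching record, or null if the name was never recorded for that sex. */
    NameRecord find(String name, char sex) {
        return records.get(key(name, sex));
    }
}
```

A HashMap keyed by name makes each lookup fast even with a very large number of names, which is why this sketch uses it instead of searching through a list.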

The biggest (and hardest) part of this assignment will be parsing the data files. You should write a readDataFiles() method in your NameDatabase that does this work, and then call that method from elsewhere (such as inside the constructor).
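As a starting point for the file handling only, here is a sketch that finds every yobYYYY.txt file in a folder and reads its lines, grouped by year. It assumes the extracted files all sit in a single directory, and it returns raw lines instead of building NameRecords; the class name DataFileReaderSketch, the method signature, and the return type are all assumptions for illustration, not a required design.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; your readDataFiles() will live in NameDatabase and may look different. */
class DataFileReaderSketch {

    /**
     * Reads every file matching "yob*.txt" in the given directory.
     * Returns a map from year to the lines of that year's file, sorted by year.
     */
    static Map<Integer, List<String>> readDataFiles(Path directory) throws IOException {
        Map<Integer, List<String>> linesByYear = new TreeMap<>();
        // The glob skips anything else in the folder (for example, a readme file)
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directory, "yob*.txt")) {
            for (Path file : files) {
                String filename = file.getFileName().toString();
                // The year is the four digits after "yob"
                int year = Integer.parseInt(filename.substring(3, 7));
                linesByYear.put(year, Files.readAllLines(file));
            }
        }
        return linesByYear;
    }
}
```

The try-with-resources block closes the directory stream even if a file fails to read. From here, each line still needs to be split into its fields and given a rank before it can be stored.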

2. Drawing the Graph

The next step is to be able to draw the graph of a given name. To do this, you'll need to modify the provided NameFrame and NameGraph classes.

The NameGraph

The NameFrame

As always, be sure to include plenty of comments on your classes, including full JavaDoc comments!

3. Exploring the Data

Once you have your program working, take some time to explore the data (to answer the questions in the README). There are all kinds of fun searches to do (examples from Nick Parlante):

The sociology of baby naming is a really fascinating topic. Randall Munroe has a nice writeup about the Baby Name Wizard Blog which explores some of these issues, and might be a nice inspiration for searches.

Timeline

This assignment isn't as large as the previous one, but there are more large blocks of code you need to write. There are a number of moving parts and the file processing can be complicated. My advice, as always, is to GET STARTED EARLY. Work on it for an hour each day and you should be okay.

Extensions

Due to time limitations, I've kept this assignment pretty basic. But there are lots of features and extensions you could add to this visualization. You can earn up to 10% extra credit by completing the extensions on this assignment--which is a good way to make up any lost points on earlier assignments if you have time! Note that these are all significantly outside the scope of the assignment.

Make sure to finish the assignment before writing extensions. If your base program doesn't work, you can't get credit for other things!

Submitting

BEFORE YOU SUBMIT: make sure your code is fully documented and functional! If your code doesn't run, I can't give you credit! Your name should be in the class comment at the top of each class.

Submit your program to the Hwk3 submission folder on vhedwig, following the instructions detailed in Lab A. Make sure you upload your work to the correct folder! You should upload your entire src folder with all your classes. Be sure to complete the provided README.txt file with details about your program.

The homework is due at 11:59pm on Mon Oct 06.

Grading

This assignment will be graded out of 30 points: