Homework Assignment #2 DUE MONDAY, OCTOBER 16
All homework assignments must be typed or legibly handwritten, and must clearly show your name, the assignment number, the course number, and the date. Homework is due at the beginning of class on the specified date.
Note: Links to Survey Documentation and Analysis (SDA) and WebStat are on the course web page (http://faculty.washington.edu/ddbrewer/bls315/).
Part A: Exercises in text:
ch. 7 - #12 (explain why for each)
Part B: WebStat
Go to the WebStat 2.0 web site. Click on the orange button at the upper right of the screen to start WebStat (it can take a minute or so before this button appears). Go to the "Data" part of the menu and then down to "Sample data sets" and then click on "Billionaires92" (the shorthand for these steps is: Data --> Sample data sets --> Billionaires92). Wait a second or two and then the data should appear in the main part of the screen in rows and columns. Then go to Data --> Show info to get a brief description of the data set and the variables. You can maximize the window to see all the information.
1) What are the units of analysis for this data set?
2) Describe the shape of the interval variables' distributions in approximate terms. Base your description on appropriate graphic displays of these variables. The display tools can be found in the Graphics section of the menu. When you use the histogram option, try using different class widths to see how the picture changes. The steps to follow for all of the graphics tools should be fairly straightforward, but if you run into trouble, go to the Help section of the WebStat menu. Once in the Help window, scroll down to find the procedure/topic you need help with, and click on it for more information and instructions.
3) Load the "Labor Force" sample data set (located in the same submenu as the Billionaires92 data set). What are the units of analysis for this data set?
Now load the "1000 random digits" sample data set. Plot these data with a histogram or dotplot.
4) What is the shape of this distribution? Interpret the meaning of this shape. Why would you expect random numbers to be distributed in this way?
Part C: Describing distributions
For the next section, comprehensively describe the distribution of a nominal or ordinal scale variable and an interval scale variable separately. Your description should cover the three main aspects of a distribution: central tendency, dispersion, and shape. Go to the SDA web site and pick one or two of the data sets to explore (you can use the steps you went through in homework #1 as a guide). You may use a nominal/ordinal scale variable from one data set and an interval scale variable from another data set if you like. (There are only a few interval variables [usually pertaining to respondent demographics] in most data sets in the main SDA archive, but the U.S. Census Microdata and NHIS have more interval variables, as do many of the data sets accessed through the "Other Archives" link on the main SDA Archive page). To get information about the data set in terms of units of analysis and sampling procedures, click on the "Introduction" part of the data set's codebook.
Hit the "Extra Codebook Window" button to keep the survey's codebook open in another window. This way you can look up the variables during the exercise if you forget their names or other information about them. Minimize the codebook window so you can work in the prior window. Select "Frequencies or crosstabulations" and then hit the "Start" button 2/3 of the way down the screen.
On the next screen, type the name(s) of the variable(s) you have selected to investigate in the Row box. Type the names exactly as they are shown in the codebook, leaving a space between variable names. (If you are using one variable from one data set and a second variable from another data set, repeat this process for each data set.) Move down the page a little and put a check in the Percentaging section for "column." This means that SDA will produce not only a frequency distribution but also a percentage distribution for each variable. Then move down the page a little more to the "Other options" section and put checks for "Statistics" and "Question text." SDA will then compute various measures of central tendency and dispersion and show the full text of the question for each variable. (Note that the measures of central tendency and dispersion will only be meaningful for ordinal and interval scale variables. Some of these measures we will not cover in this course, and some are not appropriate for ordinal variables.) Finally, hit the "Run the table" button.
Print out the results with your browser's print function. Unfortunately, SDA does not produce cumulative frequency or percentage distributions. For each variable that is on an ordinal or interval scale, compute the cumulative frequency and percentage distributions by hand or with a calculator, and write them next to the noncumulative distributions.
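If you want to check your hand calculations, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the category labels and counts below are made up and are not taken from any SDA data set.

```python
from itertools import accumulate

# Hypothetical frequency distribution for an ordinal variable (made-up counts).
categories = ["Poor", "Fair", "Good", "Excellent"]
frequencies = [10, 30, 40, 20]

total = sum(frequencies)

# Noncumulative percentages: each count as a share of the total.
percents = [100 * f / total for f in frequencies]

# Cumulative frequencies: running sum of the counts, in category order.
cum_freqs = list(accumulate(frequencies))

# Cumulative percentages: each running sum as a share of the total.
cum_percents = [100 * c / total for c in cum_freqs]

print("Category     Freq    Pct  CumFreq  CumPct")
for cat, f, p, cf, cp in zip(categories, frequencies, percents, cum_freqs, cum_percents):
    print(f"{cat:<10} {f:6d} {p:6.1f} {cf:8d} {cp:7.1f}")
```

The last cumulative frequency should equal the total number of cases, and the last cumulative percentage should equal 100. Cumulative distributions only make sense when the categories have a natural order, which is why they are not computed for nominal variables.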
If the interval scale variable you have selected has many different observed values, then you may need to recode them into a smaller set of values for the purposes of creating a more interpretable percentage distribution (otherwise, the percentage distribution may go on for pages). Follow the procedures for recoding values that we reviewed in class. If you need a reminder, once you are on the SDA Tables Program page (where you enter the variable names), click on "Recoding variables" for more information on how to do this. Remember to use the results from the original interval scale data (not recoded) for measures of central tendency and dispersion.
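The idea behind recoding can be sketched in Python as follows. This is not SDA's recoding syntax (use the procedure reviewed in class for that); the ages, the class width of 20, and the helper name `recode` are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical ages (made-up values), standing in for an interval scale variable.
ages = [18, 22, 25, 31, 34, 38, 41, 47, 52, 58, 63, 67, 71, 79]

def recode(value, width=20, start=0):
    """Return the (low, high) class interval of the given width that contains value."""
    low = start + ((value - start) // width) * width
    return (low, low + width - 1)

# Count how many observations fall in each class interval.
counts = Counter(recode(a) for a in ages)
n = len(ages)

for (low, high), freq in sorted(counts.items()):
    print(f"{low}-{high}: {freq} ({100 * freq / n:.1f}%)")
```

The recoded table is easier to read, but it discards detail within each class, which is why measures of central tendency and dispersion should come from the original values.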
With the percentage distributions and statistics output, you have all you need to give a comprehensive description of these variables. Even though SDA does not include graphical procedures, you should still draw (by hand) a box plot and histogram for the interval scale variable, and an appropriate graph for the nominal/ordinal scale variable. For each of your two variables, write a summary paragraph for your comprehensive description of the distribution. Begin by identifying the data set, units of analysis, and sampling procedure. Then report on all appropriate measures of central tendency and dispersion for each variable. Include with your summary the 10th and 90th percentiles. When describing the shape of the distribution, be sure to note any special features (e.g., outliers, multimodality, etc.). Be sure to include all relevant SDA output and graphical displays you draw.
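As a rough cross-check on the kinds of measures involved, here is a minimal Python sketch using the standard `statistics` module. The values are made up, and the percentile rule used by `statistics.quantiles` (its default "exclusive" method) is an assumption that may differ slightly from the rule in the text or from what SDA reports.

```python
import statistics

# Hypothetical interval scale observations (made-up values).
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 12, 15, 21, 40]

# Central tendency.
mean = statistics.mean(values)
median = statistics.median(values)
modes = statistics.multimode(values)  # list, since a distribution can be multimodal

# Dispersion.
value_range = max(values) - min(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles, for the box plot
iqr = q3 - q1

# 10th and 90th percentiles: the first and last of the nine decile cut points.
deciles = statistics.quantiles(values, n=10)
p10, p90 = deciles[0], deciles[-1]

print(f"mean={mean:.2f} median={median} modes={modes}")
print(f"range={value_range} sd={sd:.2f} IQR={iqr:.2f}")
print(f"10th percentile={p10:.2f} 90th percentile={p90:.2f}")
```

In this made-up example the mean is well above the median, which is one common sign of a distribution with a long right tail.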
Part D: Standard deviation
Click on the "Mean, median, and standard deviation demonstration" link on the course web site. Once there, click on the "Begin" button as you did last week in the computer classroom. Change the distribution around as you did before. Keep the range fixed and make sure each value has at least one observation (each value has a bar at least "1" tall). Experiment to make the standard deviation as large and as small as you can. (This may take some time, but it is fun.) What are the features of distributions that have the greatest standard deviations, given the constraint of at least one observation per value? What are the features of distributions that have the smallest standard deviations?
Draw (by hand) histograms for the distributions you've created with the largest and smallest standard deviations.
What is the standard deviation when all values have the same number of observations? How does this change when the height of all bars increases or decreases by the same amount?
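If you would like to verify the numbers the demonstration shows, the standard deviation can be computed directly from the bar heights. The sketch below is an assumption-laden illustration: the function name `sd_from_heights` is hypothetical, the example bars are made up, and it is not known here whether the demonstration divides by n or by n - 1, so both are offered.

```python
import math

def sd_from_heights(values, heights, sample=False):
    """Standard deviation of a distribution given as bar heights.

    values  -- the value on the horizontal axis under each bar
    heights -- the number of observations at each value
    sample  -- if True divide by n - 1, otherwise by n (population formula)
    """
    n = sum(heights)
    mean = sum(v * h for v, h in zip(values, heights)) / n
    # Sum of squared deviations from the mean, weighted by bar height.
    ss = sum(h * (v - mean) ** 2 for v, h in zip(values, heights))
    return math.sqrt(ss / (n - 1 if sample else n))

# Made-up example: values 1 to 3 with bars of height 1, 2, and 1.
example_values = [1, 2, 3]
example_heights = [1, 2, 1]
print(sd_from_heights(example_values, example_heights))
print(sd_from_heights(example_values, example_heights, sample=True))
```

Try entering the bar heights from your own largest and smallest cases to confirm the values you report.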