Lectures
PDFs of slides are best viewed in Adobe Acrobat rather than in your browser.
You may want to read through Kevin Quinn’s matrix algebra and probability distribution reviews, or consult my undergrad lectures on discrete and continuous distributions. For a more general review, you can find the lecture notes for the CSSS Math Camp here. There are also R code and data for exploratory data analysis using histograms and boxplots, code and data for a simple bivariate linear regression, and code and data for a multiple regression example. Finally, you’ll find detailed instructions for downloading, installing, and learning my recommended software for quantitative social science here. Focus on steps 1.1 and 1.3 for now, and then, optionally, step 1.2.
The R code to simulate heteroskedastic data and fit a heteroskedastic normal model to that data by maximum likelihood is here. There are additional mini-lectures on two topics. The first is a review of Bayes' Rule. The second presents the Monty Hall problem in color and printable pdf formats. Three versions of the Monty Hall simulation code are available: the first uses a loop and many small steps, the second uses a loop and more compact code, and the third uses lapply() to avoid looping.
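For readers who want a feel for the lapply() approach before opening the scripts, here is a minimal sketch of a Monty Hall simulation without an explicit loop. It is not the course code: the function name play_monty, the seed, and the number of simulations are all illustrative assumptions.

```r
# Monty Hall simulation using lapply() instead of a for loop.
# All names (play_monty, n_sims) are hypothetical, not from the course scripts.
set.seed(510)

play_monty <- function(switch_door) {
  doors <- 1:3
  prize <- sample(doors, 1)   # door hiding the car
  pick  <- sample(doors, 1)   # contestant's first choice
  # The host opens a door that is neither the contestant's pick nor the prize.
  candidates <- setdiff(doors, c(pick, prize))
  # Guard against sample()'s behavior on a length-one numeric vector.
  opened <- if (length(candidates) == 1) candidates else sample(candidates, 1)
  if (switch_door) {
    pick <- setdiff(doors, c(pick, opened))   # the one remaining closed door
  }
  pick == prize
}

n_sims <- 10000
win_stay   <- mean(unlist(lapply(seq_len(n_sims), function(i) play_monty(FALSE))))
win_switch <- mean(unlist(lapply(seq_len(n_sims), function(i) play_monty(TRUE))))
print(c(stay = win_stay, switch = win_switch))
```

The two proportions should land near 1/3 for staying and 2/3 for switching, with some simulation noise.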
There are separate R scripts for interpreting and selecting binary logit models, as well as an example dataset. The goodness of fit code also relies on R functions for computing the percent correctly predicted and making predicted-versus-actual plots and ROC plots, which you should place in your working directory. An example trio of plots showing actual versus predicted probabilities, error versus predicted probabilities, and the ROC curve can be seen here.
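As a rough illustration of what these goodness of fit quantities measure, the sketch below computes the percent correctly predicted and an ROC curve by hand in base R. It does not use the course's helper functions; the simulated data, the 0.5 cutoff, and all variable names are assumptions made for the example.

```r
# Percent correctly predicted (PCP) and ROC curve for a binary logit, base R only.
# Data are simulated; nothing here comes from the course's example dataset.
set.seed(510)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))

fit  <- glm(y ~ x, family = binomial)
phat <- fitted(fit)                      # predicted probabilities

# PCP at a 0.5 cutoff, compared with always guessing the modal category
pcp      <- mean((phat > 0.5) == (y == 1))
pcp_null <- max(mean(y), 1 - mean(y))

# ROC curve: true and false positive rates as the cutoff falls from 1 to 0
cutoffs <- sort(unique(c(0, phat, 1)), decreasing = TRUE)
tpr <- sapply(cutoffs, function(cut) mean(phat[y == 1] >= cut))
fpr <- sapply(cutoffs, function(cut) mean(phat[y == 0] >= cut))

# Area under the curve by the trapezoid rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)                    # reference line for a model with no skill
print(c(pcp = pcp, pcp_null = pcp_null, auc = auc))
```

Comparing pcp to pcp_null matters because a model can look accurate simply by predicting the more common outcome every time.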
R code for a multinomial logit is available. It produces a variety of graphical summaries of the model: expected values plotted together, expected values plotted separately in a tiled format, first differences plotted for a single scenario and all categories, relative risks plotted for a single scenario and all categories, and relative risks plotted for many scenarios at once.
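To make the three quantities of interest concrete, here is a small base R sketch that computes expected probabilities, first differences, and relative risks from a multinomial logit. The coefficients are made up for illustration (they are not estimates from the course example), and the function name mnl_probs is hypothetical. The sketch gives point estimates only and omits confidence intervals.

```r
# Quantities of interest from a multinomial logit with three categories.
# Coefficients are hypothetical; category A is the baseline with coefficients fixed at 0.
beta <- rbind(B = c(intercept = 0.2,  x = 0.8),
              C = c(intercept = -0.5, x = 1.5))

# Expected probability of each category at a given value of the covariate x
mnl_probs <- function(x, beta) {
  eta <- c(A = 0, drop(beta %*% c(1, x)))   # linear predictors, baseline set to 0
  exp(eta) / sum(exp(eta))
}

p_low  <- mnl_probs(0, beta)   # scenario 1: x = 0
p_high <- mnl_probs(1, beta)   # scenario 2: x = 1

first_diff <- p_high - p_low   # change in probability of each category
rel_risk   <- p_high / p_low   # ratio of probabilities for each category

print(round(rbind(p_low, p_high, first_diff, rel_risk), 3))
```

Because the probabilities sum to one in every scenario, the first differences across categories always sum to zero: a gain for one category is a loss for the others.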
Two code examples are discussed in this lecture. The first example analyzes bounded counts using Binomial, Beta-Binomial, and Quasibinomial models of turnout in the 2004 general election in Washington state. Example output includes this plot of expected counts under different models for various counterfactual scenarios. Note this example uses multiple imputation to fill in missing data. You will need:
The second example analyzes unbounded counts using Poisson, Negative Binomial, Quasipoisson, Zero-inflated Poisson, and Zero-inflated Negative Binomial models of foreclosure filings by Houston, Texas area Home Owner Associations (HOAs). Example output includes this plot of expected values from a zero-inflated negative binomial model. You will need:
Advanced Topic 1
See the Topic 6 example on turnout for R code using multiple imputation of missing data.
Advanced Topic 2
For the curious, the R script used to construct the example plots in the first half of this lecture is here.
Self-Study Lecture 1
This lecture and the two below it introduce log-linear models of tabular data, and will not be presented as part of POLS/CSSS 510. They are posted here for interested students, especially for the use of mosaic plots to investigate cross-tabulated data (in this lecture, and in the third lecture on multidimensional tables). Students interested in a CSSS course on log-linear models should investigate CSSS 536.
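For a quick taste of these tools, the sketch below draws a mosaic plot and fits a simple log-linear model in base R. It uses the UCBAdmissions table that ships with R, which is an assumption made for convenience and not necessarily the data used in these lectures.

```r
# Mosaic plot and a log-linear independence model for a three-way table.
# UCBAdmissions (Admit x Gender x Dept) is a built-in R dataset, used here for illustration.
mosaicplot(UCBAdmissions, shade = TRUE, main = "UC Berkeley admissions")

# A log-linear model can be fit as a Poisson regression on the cell counts.
cells <- as.data.frame(UCBAdmissions)
fit_indep <- glm(Freq ~ Admit + Gender + Dept, family = poisson, data = cells)

# A residual deviance far above its degrees of freedom suggests that
# mutual independence of the three variables fits the table poorly.
print(c(deviance = deviance(fit_indep), df = df.residual(fit_indep)))
```

The shaded tiles in the mosaic plot flag cells whose counts depart most from what independence would predict, which is the same misfit the deviance summarizes in a single number.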
Self-Study Lecture 2
Self-Study Lecture 3
Due in class Thursday 13 October 2016
Due in class Tuesday 25 October 2016
Due in class Thursday 3 November 2016
Data for problem 1 in comma-separated values (CSV) format.
Due in class Thursday 15 November 2016
Data for problem 1 in comma-separated values (CSV) format.
Due in class Tuesday 29 November 2016
Due in class Thursday 8 December 2016
Data will be provided.
29 November 2016 to 8 December 2016
Requirements and suggestions for poster presentations will be presented in class.
Due Tuesday 13 December 2016, 3:00 pm, both in my Gowen mailbox and by email
See the syllabus for paper requirements, and see my guidelines and recommendations for quantitative research papers.
Review of Probability and Introduction to R Programming
Please take a look at this handout and data, especially if you are new to R or need a refresher. We will be going over a list of commands that are important to know for the course, using this as a starting point. You can also check out this reference by former TA Aaron Erlich. If you're already comfortable in R, that's great. You may want to take a look at the tile and simcf packages that we will be using for most of the course. Here is the code from Lab 1.
Introduction to Maximum Likelihood Estimation
Please take a look at this handout. Here is the code from Lab 2. Next week, I'll post the code earlier so you can have a chance to look at it before Friday. We'll then go over it in detail and cover as much ground as possible. For those of you having trouble installing simcf, there should be an updated version on Chris's website soon.
Maximum Likelihood Estimation in R
Here is the lab code and data for this week. Please take a look at it before Friday to get an idea of what we'll be covering. Since we did not cover logistic regression during Thursday's lecture, we will be spending more of our time finishing up our discussion from last week.
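If you want to preview the general pattern before lab, maximum likelihood estimation in R usually comes down to writing a log-likelihood function and handing it to optim(). The sketch below does this for a heteroskedastic normal model like the one mentioned in the Lectures section. It is not the lab code: the simulated data, true parameter values, and all names are assumptions for illustration.

```r
# Heteroskedastic normal model fit by maximum likelihood with optim().
# Simulated data; the mean depends on x and the log-variance depends on z.
set.seed(510)
n <- 1500
x <- runif(n)
z <- runif(n)
y <- rnorm(n, mean = 1 + 2 * x, sd = sqrt(exp(0 + 1.5 * z)))

X <- cbind(1, x)   # covariates for the mean
Z <- cbind(1, z)   # covariates for the log-variance

# Negative log-likelihood (optim() minimizes by default)
neg_loglik <- function(par, y, X, Z) {
  beta   <- par[seq_len(ncol(X))]
  gamma  <- par[ncol(X) + seq_len(ncol(Z))]
  mu     <- drop(X %*% beta)
  sigma2 <- exp(drop(Z %*% gamma))   # exp() keeps the variance positive
  -sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

# Starting values taken from least squares
ols   <- lm(y ~ x)
start <- c(coef(ols), log(mean(resid(ols)^2)), 0)
names(start) <- c("b0", "b1", "g0", "g1")

fit <- optim(start, neg_loglik, y = y, X = X, Z = Z,
             method = "BFGS", hessian = TRUE)

est <- fit$par
se  <- sqrt(diag(solve(fit$hessian)))   # inverse Hessian approximates the covariance
print(round(cbind(estimate = est, se = se), 3))
```

The estimates should fall near the values used to simulate the data (1 and 2 for the mean, 0 and 1.5 for the log-variance), up to sampling error.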
Goodness of Fit and Variable Selection for Binary Outcome Models
You can find the lab code for this week under Topic 3 of the Lectures section above. Please take the time to install the required packages, and be sure to keep (but not run) the two helper R files in your working directory.
Models of Ordered Data
You can find the lab code and data for this week under Topic 4 of the Lectures section above. Again, please take the time to install the required packages.
Models of Nominal Data
We'll be covering the Gators example from lecture during the lab session. You can find the updated code under Topic 5 of the Lectures section above.
Models of Count Data
We will be discussing count models in the lab session using the voter turnout data, with an application of multiple imputation using Amelia. You can find the code under Topic 6 above.
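As a warm-up for the bounded count models, the sketch below compares binomial and quasibinomial fits on simulated overdispersed turnout-style data. It is not the lab code and does not use the Washington turnout data or Amelia: the data-generating process and all variable names are assumptions, and the multiple imputation step is omitted entirely.

```r
# Binomial vs. quasibinomial models for overdispersed bounded counts.
# Simulated data standing in for votes cast out of registered voters.
set.seed(510)
n      <- 200
trials <- rpois(n, 500) + 50             # hypothetical registered voters per unit
x      <- rnorm(n)

# Unit-level turnout probabilities vary around the logit mean (beta-binomial style)
p_mean <- plogis(0.3 + 0.5 * x)
phi    <- 20
p_i    <- rbeta(n, p_mean * phi, (1 - p_mean) * phi)
votes  <- rbinom(n, trials, p_i)

fit_bin   <- glm(cbind(votes, trials - votes) ~ x, family = binomial)
fit_quasi <- glm(cbind(votes, trials - votes) ~ x, family = quasibinomial)

dispersion <- summary(fit_quasi)$dispersion
se_bin     <- coef(summary(fit_bin))[, "Std. Error"]
se_quasi   <- coef(summary(fit_quasi))[, "Std. Error"]

print(round(cbind(coef = coef(fit_bin), se_bin, se_quasi), 4))
print(c(dispersion = dispersion))
```

The two models return the same coefficients. The quasibinomial standard errors are the binomial ones scaled up by the square root of the estimated dispersion, which is why ignoring overdispersion makes results look more precise than they are.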