Chris Adolph

Maximum Likelihood Methods
for the Social Sciences

POLS/CSSS 510

Social science data seldom meet the assumptions of the linear regression model taught in introductory statistics courses. Our data often consist of discrete categorizations or counts of events, and may be correlated across periods or clustered by groups. Students will use maximum likelihood methods to derive models appropriate for their own data, learn to communicate their findings to a broad audience, and gain familiarity with statistical programming in R.

Offered every Fall at the
University of Washington

Syllabus

Readings

Fall 2023

Class meets:
TTh 4:30–5:50 pm
Smith Hall 205

TA:	Ramses Llobet (UW Political Science)

Section meets:
F 3:30–5:20 pm
Taught by Zoom

Lectures

Click on lecture titles to view slides, or use the buttons to download them as PDFs.

Topic 1

Introduction to the Course, Probability, and R

You may want to read through Kevin Quinn’s matrix algebra and probability distribution reviews, or consult my undergrad lectures on discrete and continuous distributions.

There are also R code and data for exploratory data analysis using histograms and boxplots, code and data for a simple bivariate linear regression, and code and data for a multiple regression example.

Finally, you’ll find detailed instructions for downloading, installing, and learning my recommended software for quantitative social science here. Focus on steps 1.1 and 1.3 for now, and then, optionally, step 1.2. (Note: These recommendations may seem dated, as many students prefer to use RStudio as an integrated development environment in combination with RMarkdown. You are free to follow that model, which minimizes start-up costs. I still prefer a combination of Emacs, the plain R console, and LaTeX/XeLaTeX for my own productivity, with occasional use of Adobe Illustrator for graphics touch-up.)

Topic 2

Introduction to Maximum Likelihood

The R code to simulate heteroskedastic data and model that data using a heteroskedastic normal maximum likelihood is here.
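As a rough illustration of what such a script involves (this is not the course's own code), the sketch below simulates data whose error variance depends on a covariate and then recovers the parameters by minimizing the negative log-likelihood with optim(). The variable names, the true parameter values, and the log-linear variance specification (variance = exp(z'gamma)) are all assumptions made for this example.

```r
# Sketch only: heteroskedastic normal model fit by maximum likelihood.
# Assumed model: y_i ~ Normal(x_i'beta, sigma_i^2), with sigma_i^2 = exp(z_i'gamma).
set.seed(510)
n <- 2000
w <- runif(n)                      # a single made-up covariate
x <- cbind(1, w)                   # covariates for the mean
z <- cbind(1, w)                   # covariates for the log-variance
beta_true  <- c(1, 2)              # hypothetical true values
gamma_true <- c(-1, 1.5)
y <- rnorm(n,
           mean = drop(x %*% beta_true),
           sd   = sqrt(exp(drop(z %*% gamma_true))))

# Negative log-likelihood, dropping the constant 0.5 * n * log(2 * pi)
negloglik <- function(theta, y, x, z) {
  beta   <- theta[1:ncol(x)]
  gamma  <- theta[(ncol(x) + 1):length(theta)]
  mu     <- drop(x %*% beta)
  logvar <- drop(z %*% gamma)
  0.5 * sum(logvar + (y - mu)^2 / exp(logvar))
}

# Minimize the negative log-likelihood; keep the Hessian for standard errors
fit <- optim(par = rep(0, 4), fn = negloglik, y = y, x = x, z = z,
             method = "BFGS", hessian = TRUE)
est <- fit$par
se  <- sqrt(diag(solve(fit$hessian)))  # inverse Hessian approximates the covariance

print(round(cbind(truth = c(beta_true, gamma_true), estimate = est, se = se), 3))
```

Because the function being minimized is the negative log-likelihood, the inverse of the returned Hessian can be used directly as an estimate of the parameter covariance matrix.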

There are additional mini-lectures on two topics. The first is a review of Bayes' Rule (download PDF). The second presents the Monty Hall problem (download PDF). Three versions of the Monty Hall simulation code are available: the first uses a loop and many small steps, the second uses a loop and more compact code, and the third uses lapply() to avoid looping.
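For readers who want a feel for the lapply() approach before opening the posted scripts, here is a minimal sketch written for this page rather than taken from them. The function name play_once and the number of simulations are hypothetical choices.

```r
# Sketch only: Monty Hall simulation using lapply() instead of an explicit loop.
set.seed(510)

# Play one game; returns TRUE if the contestant wins the prize
play_once <- function(switch_door) {
  doors <- 1:3
  prize <- sample(doors, 1)
  pick  <- sample(doors, 1)
  # The host opens a door that is neither the contestant's pick nor the prize
  openable <- setdiff(doors, c(pick, prize))
  # Guard: sample() on a single number k would draw from 1:k, not return k
  opened <- if (length(openable) == 1) openable else sample(openable, 1)
  # Switching means taking the one remaining closed door
  if (switch_door) pick <- setdiff(doors, c(pick, opened))
  pick == prize
}

n_sims <- 10000
win_stay   <- mean(unlist(lapply(seq_len(n_sims), function(i) play_once(FALSE))))
win_switch <- mean(unlist(lapply(seq_len(n_sims), function(i) play_once(TRUE))))

cat("Share of wins when staying:  ", win_stay, "\n")
cat("Share of wins when switching:", win_switch, "\n")
```

The simulated shares should land near the theoretical values of 1/3 for staying and 2/3 for switching, up to simulation error.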

Topic 3

Models of Binary Data

There are separate R scripts for interpreting and selecting binary logit models, as well as an example dataset. The goodness of fit code also relies on R functions for computing the percent correctly predicted and making predicted-versus-actual plots and ROC plots, which you should place in your working directory. An example trio of plots showing actual versus predicted probabilities, error versus predicted probabilities, and the ROC curve can be seen here.
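The goodness-of-fit quantities mentioned above can be approximated with a few lines of base R. The sketch below does not use the course's helper functions; it simulates a dataset with made-up coefficients, fits a logit with glm(), and computes the percent correctly predicted and an ROC curve by hand. The 0.5 classification threshold and the grid of cutoffs are assumptions.

```r
# Sketch only: percent correctly predicted (PCP) and an ROC curve for a logit model.
set.seed(510)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))   # hypothetical data-generating process

fit  <- glm(y ~ x, family = binomial(link = "logit"))
phat <- fitted(fit)                          # predicted probabilities

# PCP: share of cases where the 0.5-threshold prediction matches the outcome
pcp <- mean((phat > 0.5) == (y == 1))
# Baseline: always predicting the modal category
baseline <- max(mean(y), 1 - mean(y))

# ROC curve: true and false positive rates over a grid of cutoffs
cutoffs <- seq(1, 0, by = -0.01)
tpr <- sapply(cutoffs, function(k) mean(phat[y == 1] > k))
fpr <- sapply(cutoffs, function(k) mean(phat[y == 0] > k))

# Area under the ROC curve by the trapezoid rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

cat("PCP:", pcp, " baseline:", baseline, " AUC:", auc, "\n")
plot(fpr, tpr, type = "l", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve (simulated data)")
abline(0, 1, lty = 2)                        # reference line for a no-skill model
```

Comparing the PCP to the modal-category baseline is a useful habit, since a model can look accurate simply because one outcome is common.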

Topic 4

Models of Ordered Data

R code and data for an ordered probit, which produces graphics for expected value plots and first difference plots.
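To give a sense of the quantities behind those plots, the following sketch fits an ordered probit to simulated data using polr() from the MASS package and computes a first difference in predicted probabilities. It is an illustration under assumed values, not the posted course code: the cutpoints, the coefficient, and the two comparison scenarios (x = -1 versus x = 1) are invented for the example.

```r
# Sketch only: ordered probit with MASS::polr and a first difference.
library(MASS)
set.seed(510)
n <- 1000
x <- rnorm(n)
ystar <- 1.0 * x + rnorm(n)                 # latent variable, hypothetical coefficient 1.0
y <- cut(ystar, breaks = c(-Inf, -0.5, 0.8, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)
dat <- data.frame(y = y, x = x)

fit <- polr(y ~ x, data = dat, method = "probit", Hess = TRUE)

# Predicted probabilities of each category under two scenarios
scenarios <- data.frame(x = c(-1, 1))
probs <- predict(fit, newdata = scenarios, type = "probs")

# First difference: change in each category's probability moving from x = -1 to x = 1
first_diff <- probs[2, ] - probs[1, ]

print(round(probs, 3))
print(round(first_diff, 3))
```

The first differences sum to zero across categories, because the probabilities in each scenario sum to one. This sketch reports point estimates only; uncertainty around them would need to be added, for example by simulating from the estimated sampling distribution of the parameters.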

Topic 5

Models of Nominal Data

R code for a multinomial logit, which produces a variety of graphical summaries of a multinomial logit model: for expected values plotted together, expected values plotted separately in a tiled format, first difference plotted for a single scenario and all categories, relative risks plotted for a single scenario and all categories, and relative risks plotted for many scenarios at once.

Topic 6

Models of Count Data

Two code examples are discussed in this lecture.

The first example analyzes bounded counts using Binomial, Beta-Binomial, and Quasibinomial models of turnout in the 2004 general election in Washington state. Example output includes this plot of expected counts under different models for various counterfactual scenarios. Note this example uses multiple imputation to fill in missing data. You will need:

The main R script to run the models, cross-validation, and graphics
Data (csv format) from the Washington Secretary of State & US Census
An R helper file with cross-validation functions

The second example analyzes unbounded counts using Poisson, Negative Binomial, Quasipoisson, Zero-inflated Poisson, and Zero-inflated Negative Binomial models of foreclosure filings by Houston, Texas area Home Owner Associations (HOAs). Example output includes this plot of expected values from a zero-inflated negative binomial model. You will need:

The main R script to run the models and graphics
Data (csv format) from HOAdata.org
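As a small companion to the unbounded count models listed above, this sketch compares a Poisson and a negative binomial fit on simulated overdispersed counts. It does not use the HOA data; the coefficients and the dispersion parameter are assumed values, and glm.nb() comes from the MASS package.

```r
# Sketch only: Poisson versus negative binomial on overdispersed counts.
library(MASS)
set.seed(510)
n  <- 1000
x  <- rnorm(n)
mu <- exp(0.5 + 0.8 * x)                    # hypothetical conditional mean
y  <- rnbinom(n, size = 1.5, mu = mu)       # negative binomial draws (overdispersed)

pois_fit <- glm(y ~ x, family = poisson)
nb_fit   <- glm.nb(y ~ x)

# Pearson dispersion statistic for the Poisson fit; values well above 1
# suggest overdispersion relative to the Poisson assumption
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)

cat("Pearson dispersion (Poisson):", round(dispersion, 2), "\n")
cat("AIC Poisson:", round(AIC(pois_fit), 1),
    " AIC negative binomial:", round(AIC(nb_fit), 1), "\n")
print(round(rbind(poisson = coef(pois_fit), negbin = coef(nb_fit)), 3))
```

Both models estimate similar coefficients for the mean here, but the Poisson standard errors will be too small when the counts are overdispersed, which is one reason to consider the negative binomial or quasipoisson alternatives.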

Advanced Topic 1

Missing Data and Multiple Imputation

See the Topic 6 example on turnout for R code using multiple imputation of missing data. Also available is an example (R script, data, plot) showing the use of overimputation to compute coverage of multiple imputation prediction intervals for real data.

Advanced Topic 2

Introduction to Multilevel Models

For the curious, the R script used to construct the example plots in the first half of this lecture is here.

Self-Study Lecture 1

Introduction to Contingency Tables

This lecture and the two below it introduce log-linear models of tabular data, and will not be presented as part of POLS/CSSS 510. They are posted here for interested students, especially for the use of mosaic plots to investigate cross-tabulated data (in this lecture, and in the third lecture on multidimensional tables). Students interested in a CSSS course on log-linear models should investigate CSSS 536.

Self-Study Lecture 2

Log-linear Models of Contingency Tables: 2D tables

Self-Study Lecture 3

Log-linear Models of Contingency Tables: 3D+ tables

Student Assignments

Problem Set 1

Due in Canvas by start of class Wednesday 11 October 2023

Problem Set 2

Due in Canvas by start of class Monday 23 October 2023

Problem Set 3

Due in Canvas by start of class Wednesday 8 November 2023

Data for problem 1 in comma-separated values format.

Problem Set 4

Due in Canvas by start of class Wednesday 22 November 2023

Data for problem 1 in comma-separated values format.

Problem Set 5

Due in Canvas by start of class Wednesday 29 November 2023

Data for problem 1 in comma-separated values format; data for problem 2 in R data format.

Poster Presentations

29 November 2023 to 6 December 2023

Requirements and suggestions for poster presentations will be presented in class.

Final Paper

Due Tuesday 12 December 2023, 3:00 pm by email

See the syllabus for paper requirements, and see my guidelines and recommendations for quantitative research papers.

Labs

Lab 1

Course logistics and review of R

Supplementary material: The lab section syllabus is available here. For today's lab, you will need to download the following R script and datasets: review_script.R, pop.csv, and gapminder.csv. You can download all the materials in the following ZIP file. The lab recording is available at this link.

Lab 2

Intro to RMarkdown and Overleaf

Supplementary material: Here's an RMarkdown template and today's code practice exercise. Additionally, you can access the code practice key in RMarkdown and script files. You can download all the materials in the following ZIP file. The lab recording is available at this link.

Lab 3

Heteroskedastic Normal

Supplementary material: First, download the following R script file. Here's today's code practice exercise. You can now download the lab code practice solutions and the script file with all the code I used in this lab. All the materials are in the following ZIP file. The lab recording is available at this link.

Lab 4

Review, Simulations, and Quantities of Interest (QoI)

Supplementary material: You can access the lab's script file here. We did not have time to start on binary models, but please feel free to review the code in this script for binary model estimation and visualization. The lab recording is available at this link.

Lab 5

Binary model and tile

Supplementary material: You can access the lab's script file here. To run the lab script, you must download the following data file and the support code "theme_caviz" for ggplot, originally created by former TA Brian Leung. You can download all the materials in the following ZIP file. Note: if you cannot download the ZIP file, please copy the link's address and paste it into your internet browser to access the ZIP file. The lab recording is available at this link.

Lab 6

Goodness of Fit and Model Selection

Supplementary material: You can access the lab's script file here. See the RMarkdown files for the HW02 solutions and the HW03 preview. For some functions, you will need Chris's source code binaryGOF and binPredict to help compute some quantities and visualizations. You can download all the materials in the following ZIP file. Note: if you cannot download the ZIP file, please copy the link's address and paste it into your internet browser to access the ZIP file. The lab recording is available at this link.

Lab 7

Ordinal Probit and Multinomial Logit

Supplementary material: You can access the lab's script file here. This code is adapted from Chris's ordered and multinomial code scripts. You can download all the materials in the following ZIP file. Note: if you cannot download the ZIP file, please copy the link's address and paste it into your internet browser to access the ZIP file. The lab recording is available at this link.

Lab 8

Count models

You can access the lab's script file here. This code is adapted from Chris's HOA code scripts. Brian's ggplot theme is also available. You can download all the materials in the following ZIP file. Note: if you cannot download the ZIP file, please copy the link's address and paste it into your internet browser to access the ZIP file. The lab recording is available at this link.

Lab 9

Course review

You can access the lab's script file here. We will be reviewing homeworks 3 and 4, so you will need to download the nes92 and cyyoung datasets, as well as the support code here and here. You can download all the materials in the following ZIP file.