SISG 2020

Module 14: Association Mapping: GWAS and Sequencing Data

Instructors: Timothy Thornton and Michael Wu

This page will feature slides, exercises, some solutions, and video recordings (all to follow).


Install R and RStudio, and download PLINK

Prior to the module, please install up-to-date versions of R (Version 4.0.2), RStudio, and PLINK on the laptop you will use during the summer institute. All three are free.
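Once everything is installed, a quick check along the following lines can be run from the R console. It is only a sketch: the executable name "plink" is an assumption (it may be named differently, or not be on your PATH, depending on how you installed it).

```r
# Version of R that is currently running
print(R.version.string)

# TRUE if the running R is at least the version recommended above
r_ok <- getRversion() >= "4.0.2"

# Sys.which() returns "" when an executable is not found on the PATH.
# The name "plink" is an assumption; adjust it to match your installation.
plink_path  <- Sys.which("plink")
plink_found <- nzchar(unname(plink_path))

message("R version OK: ", r_ok)
message("PLINK found on PATH: ", plink_found)
```

If PLINK is not found, that does not necessarily mean it is missing; you can also run it by giving the full path to the executable.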


R Packages for Module 14
The following R packages from CRAN will be used and should be installed prior to the module:
  • qqman
  • SKAT
The R commands below can be used to install the two CRAN R packages:

install.packages("qqman")
install.packages("SKAT")

The following R packages from Bioconductor will be used and should be installed prior to the module:
  • GWASTools
  • gdsfmt
  • SNPRelate
  • GENESIS
The R commands below can be used to install the R packages from Bioconductor with an up-to-date version of R (Version 4.0.2, as noted above):

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("GWASTools", "gdsfmt", "SNPRelate", "GENESIS"))
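To confirm that the installation worked, something like the following sketch can be used. It only checks whether each of the six packages listed above can be loaded; it does not install anything.

```r
# Packages named in the CRAN and Bioconductor lists above
cran_pkgs <- c("qqman", "SKAT")
bioc_pkgs <- c("GWASTools", "gdsfmt", "SNPRelate", "GENESIS")
pkgs <- c(cran_pkgs, bioc_pkgs)

# requireNamespace() returns TRUE if a package can be loaded,
# without attaching it to the search path
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) == 0) {
  message("All Module 14 packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```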

Session Format

The module has 10 sessions, each of 80 minutes. The standard format for a session is approximately:

  • 40 minutes of lecture material, recorded live via Zoom and posted at the end of the day
  • 25 minutes of exercises for you to try, with small-group "breakout" Zoom sessions attended by other class participants and Teaching Assistants
  • 15 minutes of discussion of the exercises, in which the instructors will present possible solutions and answer questions

Please join the module's Slack channel, where you can ask questions and see real-time updates from the instructors and TAs.


Schedule, Slides, Exercises, and Video Recordings:

For each exercise performed in R and PLINK, script files will be posted. To make them work on your computer, remember to modify file names and locations appropriately. Also note that many different 'correct' solutions are possible.
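As an illustration of adjusting file locations, the sketch below builds a path and checks that the file exists before trying to read it. The folder and file names are hypothetical placeholders, not names used by the module; replace them with wherever you saved the data.

```r
# Hypothetical folder and file name; replace with your own location
data_dir  <- file.path("~", "SISG", "Module14")
data_file <- file.path(data_dir, "example_data.txt")

# Check that the file is where the script expects it before reading it
if (file.exists(data_file)) {
  message("Found: ", normalizePath(data_file))
} else {
  message("Not found: ", data_file,
          " (current working directory: ", getwd(), ")")
}
```

Using file.path() rather than pasting strings together keeps the path separators correct across operating systems.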

Video recordings of the lectures and exercises will be posted at the end of each day.

All times listed below for the schedule are Pacific Daylight Time (PDT).

Monday, July 27th
Time | Topic | Lecture | Exercises/Discussion
8:00am-9:20am | 1. Introduction, Case Control Association Testing | Slides: (Intro) [.pdf], (Lecture) [.pdf], video | Exercises: [.pdf], video, R Script: [.R], Key: [.html], [.Rmd]
9:40am-11:00am | 2. Association Testing with Quantitative Traits | Slides: [.pdf], video | Exercises: [.pdf], video, R Script: [.R], Key: [.html], [.Rmd]
11:30am-12:50pm | 3. Introduction to the PLINK Software for GWAS | Slides: [.pdf], video | Exercises: [.pdf], video, PLINK Script: [.txt], R Script: [.R], Key (R Script): [.R]
1:10pm-2:30pm | 4. Gene and Pathway Level Analysis of Genetic Association Studies | Slides: [.pdf], video | Exercises: [.pdf], video, PLINK and R Script: [.txt]

Tuesday, July 28th
Time | Topic | Lecture | Exercises/Discussion
8:00am-9:20am | 5. Population Structure Inference | Slides: [.pdf], video | Exercises: [.pdf], video, R Script: [.R], Key: [.html], [.Rmd]
9:40am-11:00am | 6. GWAS in Samples with Structure | Slides: [.pdf], video | Exercises: [.pdf], video, R Script: [.R]
11:30am-12:50pm | 7. Interaction Analysis | Slides: [.pdf], video | Exercises: [.pdf], video, R Script: [.txt]
1:10pm-2:30pm | 8. Introduction to Rare Variant Analysis and Collapsing Tests | Slides: [.pdf], video | Exercises: [.pdf], video, Key: R Script: [.txt]

Wednesday, July 29th
Time | Topic | Lecture | Exercises/Discussion
8:00am-9:20am | 9. Rare Variant Analysis: Kernel (Variance Component) Tests and Omnibus Tests | Slides: [.pdf], video | Exercises: [.pdf], video, R Script: [.txt]
9:40am-11:00am | 10. Power and Sample Size, Design Considerations, and Emerging Issues | Slides: [.pdf], video | Exercises: [.pdf], video, R Script: [.txt]

Datasets

All of the individual data files below can be downloaded from Dropbox as a single zipped folder: SISG2021Data.zip

Alternatively, you can download each of the data files below. Before trying to read data into an R or PLINK session, we recommend looking at it first, in a text editor. Is the data comma- or tab-delimited? Does it have a 'header' row containing variable names?
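The same inspection can also be done from within R. The sketch below writes a tiny made-up, tab-delimited file to a temporary location so that it runs on its own; the column names and values are invented for illustration and are not taken from the module's datasets.

```r
# Create a tiny tab-delimited file so the example is self-contained
# (made-up column names and values)
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tgenotype\ttrait",
             "1\t0\t0.52",
             "2\t1\t1.37"), tmp)

# Look at the first few raw lines before parsing anything
first_lines <- readLines(tmp, n = 3)
print(first_lines)

# Count fields per line under each candidate delimiter; the right
# delimiter gives the same count (greater than 1) on every line
n_tab   <- count.fields(tmp, sep = "\t")
n_comma <- count.fields(tmp, sep = ",")

# Read the file once the delimiter and header row are known
dat <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)
```

For a real data file, replace tmp with the path to the file you downloaded and adjust sep and header to match what you see in the first few lines.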



Other Resources