Summer Institute in Statistical Genetics
Module 2: Computing for Statistical Genetics
Instructors: Thomas Lumley and Ken Rice

This page will feature slides from our sessions, exercises for you to complete, and their solutions (all to follow). Prior to the module, please install an up-to-date version of R on the laptop you will use during the summer institute. R is free, and is available from this site.

To download and install Bioconductor on your laptop, first connect to the internet. Then open an R session and enter the following:

source("http://bioconductor.org/biocLite.R")
biocLite()
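
On more recent versions of R (3.5.0 and later), the biocLite.R script is no longer supported, and Bioconductor is installed through the BiocManager package instead. If the commands above fail, the following is a minimal sketch of the alternative; it assumes you are connected to the internet and have a CRAN mirror set.

```r
# Install the BiocManager helper package from CRAN, if it is not already there
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# With no arguments, this sets up the Bioconductor release matching your
# version of R, and offers to update any out-of-date packages
BiocManager::install()
```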


Slides and exercises

Script files are posted following each session; these will contain our R code for the exercises. To make them work on your computer, remember to modify file names and locations appropriately. Also note that many different 'correct' solutions are possible.
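
As a hedged illustration of what modifying file names and locations can look like: the folder below is hypothetical (it is not one we use), so replace it with wherever you saved the course files. Building the path with file.path() means only one line needs changing.

```r
# Hypothetical folder; change this to the location of your downloaded files
data_dir <- file.path("~", "SISG", "data")

# Full path to one of the course datasets inside that folder
bp_file <- file.path(data_dir, "bpdata.csv")

# Only try to read the file if it is really there
if (file.exists(bp_file)) {
  bp <- read.csv(bp_file)
} else {
  message("File not found: ", bp_file, " (check data_dir)")
}
```

Alternatively, setwd() can be used to make that folder the working directory, after which the file name alone is enough.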

Session 1; Introductions, reading in data (exercises) (R script file)

Session 2; Learning to Draw (exercises) (R script file)

Session 3; Data Manipulation (exercises) (R script file)

Session 4; Model Fitting (exercises) (R script file)

Session 5; Permutation Tests and Debugging (exercises) (R script file)

Session 6; High-throughput Work, and Writing Loops (code for timing and speedup) (exercises) (R script file)

Session 7; Handling Large Datasets (exercises) (R script file)

Session 8; Bioconductor #1 (exercises) (R script file)

Special Exercise: This is a more in-depth programming problem, for you to try on Tuesday night; we'll discuss it in the final session (R script file)

Session 9; Bioconductor #2 (exercises) (R script file)

Session 10; Special Exercise review (see above), interfacing R to other software (SVG+ example) (Code for the Google Maps example - unfortunately we can't post the data)

Bonus Tracks! Sometimes we get asked about haplotype analysis - so here's a session on haplotypes. There are some associated exercises, and a script file with one way to code them. You may also find this documentation useful.

For easier searching, here are all the slides in one document (PDF).


Datasets - in alphabetical order

Before trying to read data into your R session, we recommend looking at it first, in a text editor. Is the data comma- or tab-delimited? Does it have a 'header' row containing variable names?
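
The same check can also be done from within R. The sketch below is only an illustration: it writes a small made-up tab-delimited file (the column names and values are invented, not taken from the course datasets), looks at the raw lines, and then reads the file with settings that match what was seen.

```r
# A small made-up file, standing in for one of the course datasets
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tsnp1\tsbp",
             "1\tAA\t120",
             "2\tAG\t135"), tmp)

# Look at the first few raw lines, exactly as stored in the file
first_lines <- readLines(tmp, n = 3)
print(first_lines)

# Count tabs and commas in the first line, to see which delimiter is used
count_char <- function(x, ch) {
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}
n_tabs   <- count_char(first_lines[1], "\t")
n_commas <- count_char(first_lines[1], ",")

# Tabs and a header row were found, so read.delim() with header = TRUE fits;
# for a comma-delimited file, read.csv() would be the analogous choice
dat <- read.delim(tmp, header = TRUE)
str(dat)
```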

AMDchrom1snpStats.Rdata -- see session 8
annt.txt
bpdata.csv
data.vsn.csv
example-pheno.txt  
example-pheno.csv 
example-snp.txt
foursnps.csv
foursnps.txt
genepi.txt
justsnps.txt
niehs.csv
psa.txt
ribogreen.rda
salary.dat
sampleinfo.csv
sisg.nc -- see session 7
swirl.zip -- see session 8


Other resources

Some recommended books: