R for Large Data & Bioinformatics
28 November 2013 to 29 November 2013
Instructors: Thomas Lumley and Ken Rice

This page will feature slides from our sessions, exercises for you to complete, and their solutions (all to follow). Prior to the module, please install an up-to-date version of R on the laptop you will use during the summer institute. R is free, and is available from this site.

To download and install Bioconductor on your laptop, first connect to the internet. Then open an R session and enter the following:

source("http://bioconductor.org/biocLite.R")
biocLite()

After this initial installation, to install new Bioconductor packages (for example, the hexbin package), use the following commands:

source("http://bioconductor.org/biocLite.R")
biocLite("hexbin")
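
To confirm that a package installed correctly, a quick check along the following lines may help. This is only a sketch: it uses base R's requireNamespace() and packageVersion(), and the variable names pkg and installed are illustrative choices, not part of the course material.

```r
# Name of the package to check (hexbin is the example used above)
pkg <- "hexbin"

# TRUE if the package can be loaded, FALSE otherwise; nothing is attached
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  message(pkg, " version ", as.character(packageVersion(pkg)), " is installed")
} else {
  message(pkg, " is not installed; check your internet connection ",
          "and rerun the install commands")
}
```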


Slides and exercises

Script files are posted following each session; these will contain our R code for the exercises. To make them work on your computer, remember to modify file names and locations appropriately. Also note that many different 'correct' solutions are possible.
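
As a rough illustration of adapting file locations, the sketch below builds a path from a folder name and checks that the file exists before reading it. The folder "~/sisg/data" is a hypothetical location chosen for this example; replace it with wherever you saved the datasets. The file name bpdata.csv is taken from the dataset list on this page, and the assumption that it is comma-delimited with a header row is ours.

```r
# Hypothetical folder holding the downloaded datasets; edit to match your laptop
data_dir <- "~/sisg/data"

# file.path() joins the pieces with the correct separator on any platform
data_file <- file.path(data_dir, "bpdata.csv")

if (file.exists(data_file)) {
  # Assumes a comma-delimited file with a header row
  bpdata <- read.csv(data_file)
  str(bpdata)
} else {
  # Report where R looked, to help track down a wrong path
  message("File not found: ", data_file)
  message("Current working directory: ", getwd())
}
```

Storing the folder in a single variable means that only one line needs to change when a script moves to a different computer.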

Session 1, Introductions, overview of R (exercises) (R script file)

Session 2, Graphics in R (exercises) (R script file)

Session 3, More advanced graphics (R code for color wheels and boxes in the slides) (exercises) (R script file)

Session 4, Data manipulation (exercises) (R script file)

Session 5, Permutation tests (i.e. replication) (exercises) (R script file)

Session 6, Writing big loops (code for timing and speedup) (exercises) (R script file)

Session 7, Working with big data (exercises) (R script file)

Session 8, Bioconductor (exercises) (R script file)

For easier searching, here are all the slides in one document (PDF).


Datasets - in alphabetical order

Before trying to read data into your R session, we recommend looking at it first, in a text editor. Is the data comma- or tab-delimited? Does it have a 'header' row containing variable names?
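
The same check can also be done from within R. The sketch below prints the first few raw lines of a file, guesses the delimiter from the header line, and then reads the file accordingly. So that it runs anywhere, it first writes a small stand-in file to a temporary folder; that file's name and contents are invented for illustration and are not one of the course datasets. In practice, set path to your own downloaded file.

```r
# Create a small stand-in file so the example is self-contained
# (invented contents; replace 'path' with your own downloaded file)
path <- file.path(tempdir(), "demo.csv")
writeLines(c("id,sbp,dbp", "1,120,80", "2,135,85"), path)

# Look at the first few raw lines before parsing anything
first_lines <- readLines(path, n = 3)
print(first_lines)

# Guess the delimiter from the first line: tab if one is present, else comma
has_tab <- grepl("\t", first_lines[1], fixed = TRUE)
sep <- if (has_tab) "\t" else ","

# header = TRUE assumes the first row holds the variable names
dat <- read.table(path, header = TRUE, sep = sep, stringsAsFactors = FALSE)
str(dat)
```

If the first line holds data rather than variable names, use header = FALSE instead.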

AMDchrom1snpStats.Rdata -- see session 8
annt.txt
bpdata.csv
data.vsn.csv
example-pheno.txt  
example-pheno.csv 
example-snp.txt
foursnps.csv
foursnps.txt
genepi.txt
justsnps.txt
niehs.csv
ribogreen.rda
salary.dat
SEAflights.db
SEAflightslocs.csv
sleep.csv
sampleinfo.csv
sisg.nc -- see session 7 (large file, 3M)
sisg.db (very large file, 40M)


Other resources

Some recommended books: