SISG 2008, Seattle
R and Bioconductor: Computing for Statistical Genetics

Instructors: Thomas Lumley, Ken Rice
Email: see Thomas or Ken in class


Downloading Bioconductor

source("http://bioconductor.org/biocLite.R")
biocLite()


This takes some time, but you only need to do it once. When you need a new package later, install it by name, e.g.

source("http://bioconductor.org/biocLite.R")
biocLite("siggenes")
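
Because the download only needs doing once, it can be handy to check whether a package is already installed before fetching it again. The following is a minimal sketch, not part of the course materials; the helper name is_installed is hypothetical, and the install call is left commented out so that nothing is downloaded by accident.

```r
# Hypothetical helper (not from the course): TRUE if the package can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this check should succeed on any installation.
is_installed("stats")

# Only fetch siggenes when it is missing; uncomment the two lines to run them.
if (!is_installed("siggenes")) {
  message("siggenes is not installed yet")
  # source("http://bioconductor.org/biocLite.R")
  # biocLite("siggenes")
}
```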


Slides, exercises from sessions

Session 1; Introductions, reading in data (exercises) (solutions)

Session 2; Learning to Draw (exercises) (solutions)

Session 3; Data Manipulation (exercises) (solutions)

Session 4; Model Fitting (exercises) (solutions)

Session 5; Writing Loops (exercises) (solutions) (alternate solutions - taking out the missing values before you start)

Session 6; Permutation tests (exercises) (solutions)

Session 7; Large Data Files (exercises) (solutions)

Session 8; Bioconductor #1 (exercises) (solutions)

Session 9; Bioconductor #2 (exercises) (solutions)

Session 10; Interfacing R - no exercises, but here's the Google Earth code

Special exercise! This is a more in-depth programming problem, for you to try tonight; we'll talk more about this in the final session. (Some solutions for you to try - note that the timings are very approximate and "your mileage may vary")


Datasets

AMDchrom1.Rdata -- download this to your machine and use load() - not the more usual read.table() or read.csv()

sisg.nc -- download this to your machine - and see Session 7

foursnps.csv

foursnps.txt

genepi.txt

justsnps.txt

sampleinfo.csv

bpdata.csv

example-pheno.csv

psa.txt

salary.dat

niehs.csv

swirl.zip

data.vsn.csv

annt.txt
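
The note on AMDchrom1.Rdata above says to use load() rather than read.table() or read.csv(). As a rough illustration of the difference, the sketch below saves a small made-up data frame to an .Rdata file in a temporary directory and restores it; the object name snps and the file name example.Rdata are invented for the example and say nothing about what AMDchrom1.Rdata actually contains.

```r
# Made-up stand-in for course data: a tiny data frame of genotype counts.
snps <- data.frame(id = 1:3, snp1 = c(0, 1, 2))

# save() writes R objects, together with their names, to a binary file.
rdata_file <- file.path(tempdir(), "example.Rdata")
save(snps, file = rdata_file)

# Remove the object, then restore it under its original name with load().
rm(snps)
loaded_names <- load(rdata_file)  # load() returns the names of restored objects
loaded_names
head(snps)

# For the course file, assuming it has been downloaded to the working directory:
# load("AMDchrom1.Rdata")
```

Unlike read.csv(), load() is not assigned to a variable of your choosing: the objects reappear under whatever names they had when they were saved, so printing the value returned by load() is a quick way to see what arrived.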



Other links

R site

Bioconductor

A reference card of common commands (and a slightly longer reference card)

To change the language R uses for its messages, put a .Renviron file in the startup working directory - use en for English, or de, it, and so on. (To find the working directory, use getwd().)
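
As a sketch of what the language setting does, the code below applies it to the running session only. It assumes the .Renviron file contains a line of the form LANGUAGE=en; LANGUAGE is the environment variable R consults when choosing message translations, but the exact contents of the course's file are an assumption here.

```r
# The directory in which R looks for a .Renviron file at startup.
getwd()

# Assumed session-only equivalent of a .Renviron line such as LANGUAGE=en.
Sys.setenv(LANGUAGE = "en")
current_language <- Sys.getenv("LANGUAGE")
current_language
```

Setting the variable with Sys.setenv() lasts only until R is closed, whereas the .Renviron file is read every time R starts in that directory.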


Books

  • J. M. Chambers, Programming with Data: A Guide to the S Language. New York: Springer, 1998. [for programmers who find 'thinking' in R difficult] Describes the S language, on which R is based. Also known as "The Green Book"
  • J. M. Chambers, Software for Data Analysis: Programming with R. New York: Springer, 2008. [also for programmers] Chambers's newest book has not yet been published but will be out shortly.
  • P. Dalgaard, Introductory Statistics with R. New York: Springer, 2002. [good if you haven't seen much statistics before, though it uses very simple methods and doesn't emphasize programming much. There are several other texts at this level]
  • R. Gentleman et al., Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer, 2005. [guides you through practical bioinformatics data analysis using the Bioconductor toolkit, emphasising microarrays]