Summer Institute in Statistical Genetics
Module 18: Advanced R Programming for Bioinformatics
Instructors: Thomas Lumley and Ken Rice

  • Wednesday, July 10: 1:30pm - 5:00pm in SC348, followed by a reception in South Campus Center
  • Thursday, July 11: 8:30am - 5:00pm in SC348. There is no evening session for this module.
  • Friday, July 12: 8:30am - 5:00pm in SC348
  • Map

This page will feature slides from our sessions, exercises for you to complete, and their solutions. Prior to the module, please install an up-to-date version of R on the laptop you will use during the summer institute. R is available from this site.

It may also be helpful to install the base Bioconductor packages; to download and install these, enter the following:

source("http://bioconductor.org/biocLite.R")
biocLite()

After this initial installation, to download and install additional Bioconductor packages (for example, the hexbin package), use the following commands:

source("http://bioconductor.org/biocLite.R")
biocLite("hexbin")
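
Before installing, it can be useful to check whether a package is already available on your laptop. The following is a minimal sketch and not part of the course materials; `is_installed` is a hypothetical helper name, built on base R's `requireNamespace`.

```r
# Hypothetical helper (not from the course materials): returns TRUE if the
# named package can be loaded in this R installation, FALSE otherwise.
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with every R installation, so this check should succeed.
is_installed("stats")

# Install only when the package is missing (uncomment to run; this needs
# a network connection and uses the commands given above).
# if (!is_installed("hexbin")) {
#   source("http://bioconductor.org/biocLite.R")
#   biocLite("hexbin")
# }
```

The installation step is left commented out so that the check itself can be run offline.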

Tools for calling C from R - Windows
(For sessions 7/8) Resources for building R packages under Windows are available for download here. Make sure you get the version that matches your (current) version of R. We recommend allowing the installer to add these tools to your default Path; once you have installed the tools, the first command you enter at the command line should be, for example:

path C:\Program Files\R\R-3.0.1\bin;%PATH%

...which adds the directory containing R to your path. If you don't know the directory containing R on your machine, right-click on the R shortcut icon
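
The directory can also be found from within R itself. This is a minimal sketch using base R's `R.home` and `normalizePath`; the variable name `bin_dir` is only illustrative, and the exact layout of the `bin` directory may differ between R versions and platforms.

```r
# R.home() returns the top-level directory of the running R installation;
# the executables live in its "bin" subdirectory.
bin_dir <- file.path(R.home(), "bin")

# Print the path; on Windows, winslash = "\\" shows it with backslashes,
# in a form that can be pasted into the path command.
normalizePath(bin_dir, winslash = "\\", mustWork = FALSE)
```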

Slides and exercises

Script files are posted following each session; these contain our R code for the exercises. Many exercises are open-ended and have many different solutions; the code posted here illustrates possible approaches.

Session 1, Introductions, simulation, debugging, timing (Exercises: .docx, .pdf) (R script file)

Session 2, Graphics (R code for color wheels and boxes in the slides) (Exercises: .docx, .pdf) (R script file)

Session 3, Object Systems (No exercises for this session)

Session 4, Lab exercise (Exercises: .docx, .pdf) (R script file)

Session 5, Packages (Exercises: .docx, .pdf) (R script file)
We also mentioned Sweave, particularly for writing vignettes. RStudio makes it easy to make Sweave output; here are some introductory notes on RStudio and Sweave.

Session 6, XML (SVG Funnel plot example) (Exercises: .docx, .pdf) (R script file)

Session 7, Embedding C code (R script for the convolve example, including some command-line commands; it uses convolve.c. See also the R script for the convolve example using .Call, which uses convolve2.c. No exercises for this session.)

For C operations on arrays, see this short example that uses the inline package. Also see Rcpp for other ways to use R objects in C++ code.
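
When experimenting with the compiled convolve example from Session 7, it can help to have a pure-R version to check results against. The sketch below assumes that convolve.c follows the convolution example in the "Writing R Extensions" manual (this is an assumption, not confirmed by the files posted here); `convolve_r` is a hypothetical name, and the function expects non-empty numeric vectors.

```r
# Pure-R reference for an "open" convolution of two numeric vectors.
# Assumption: this matches what the compiled example computes, i.e.
# ab[i + j - 1] accumulates a[i] * b[j] (R indices start at 1).
convolve_r <- function(a, b) {
  ab <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
    }
  }
  ab
}

# Small example: convolving with c(0, 1) shifts the input by one place.
convolve_r(c(1, 2, 3), c(0, 1))
```

The double loop is deliberately slow, which makes it a reasonable baseline for timing against a compiled version. The result can also be compared with base R's `convolve(a, rev(b), type = "open")`, which computes the same quantity via FFT, so agreement is only up to floating-point error.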

Session 8, Lab exercise (Exercises: .docx, .pdf) (R script file)

Session 9, Handling large datasets (no exercises for this session) (Example featuring RSQLite which uses the newexample.db database file)

Session 10, Lab exercise (Exercises: .docx, .pdf) (R script file)

For easier searching, here are all the slides in one file.


Datasets and other files for exercises

Other resources

Some recommended books: