Summer Institute in Statistical Genetics
Module 7: Computing for Statistical Genetics
Instructors: Thomas Lumley and Ken Rice

This page will feature slides from our sessions, exercises for you to complete, and their solutions (all to follow).

Prior to the module, please install an up-to-date version of R on the laptop you will use during the summer institute. R is free, and is available from this site.

To download and install Bioconductor on your laptop, first log on to the UW network. Then open an R session and enter the following:

    source("http://bioconductor.org/biocLite.R")
    biocLite()

Proxy use in R: use this command
`Sys.setenv("http_proxy"="http://course3:7890uiop@162.105.250.142:7777")`

Slides and exercises

Script files are posted following each session; these will contain our R code for the exercises. To make them work on your computer, remember to modify file names and locations appropriately. Also note that many different 'correct' solutions are possible.

Session 1; Introductions, reading in data (exercises) (R script file)
Session 2; Learning to Draw (exercises) (R script file)
Session 3; Data Manipulation (exercises) (R script file)
Session 4; Model Fitting (exercises) (R script file)
Session 5; Permutation Tests (exercises) (R script file)
Session 6; Writing Loops (exercises) (speeding-up examples) (R script file)
Session 7; Handling Large Datasets (exercises) (R script file)
Session 8; Bioconductor #1 (exercises) (R script file)
Special Exercise: This is a more in-depth programming problem, for you to try on Thursday night; we'll discuss it in the final session (PowerPoint) (R script file)
Session 9; Bioconductor #2 (exercises) (R script file)
Session 10; Special Exercise review (see above), interfacing R to other software (SVG+ example)

Bonus Tracks!

Sometimes we get asked about haplotype analysis, so here's a session on haplotypes. There are some associated exercises, and a script file with one way to code them.
You may also find this documentation useful.

Datasets - in alphabetical order.

Before trying to read data into your R session, we recommend looking at it first, in a text editor. Is the data comma- or tab-delimited? Does it have a 'header' row containing variable names?

Other resources

The R site, which includes the Comprehensive R Archive Network (CRAN) of downloads and packages [this link is to a local 'mirror']

Bioconductor is a collection of R packages for bioinformatics/genomics. It will be helpful to download and install the 'base' Bioconductor packages before sessions 8/9/10.

A reference card of common R commands (and a slightly longer reference card)

To change the language R uses for its messages, put a .Renviron file in the startup working directory; use en for English, or another code such as de (German) or it (Italian). (To get the working directory, use getwd().)

Some recommended books:

P. Dalgaard, Introductory Statistics with R. Springer, 2nd Edition, 2008. [good if you haven't seen much statistics before, though it uses very simple methods, and doesn't emphasize programming much. There are several other texts at this level]

J. M. Chambers, Programming with Data: A Guide to the S Language. New York: Springer, 1998. [reference for programmers who find 'thinking' in R difficult] Describes the S language, on which R is based. Also known as "The Green Book".

J. M. Chambers, Software for Data Analysis: Programming with R. New York: Springer, 2008. [also for programmers; it begins with simple interactive use and gradually progresses to advanced use of R]

Gentleman et al, Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005. [guides you through practical bioinformatics data analysis using the Bioconductor toolkit, emphasizing microarrays]

Hahne et al, Bioconductor Case Studies. Springer, 2008. [several fairly advanced examples using R and Bioconductor, emphasizing genomic data]

Braun and Murdoch, A First Course in Statistical Programming with R. CUP, 2007. [a different emphasis from the other books here, good for those new to programming]

Send mail to: see Thomas or Ken in class
Last modified: 6/15/2010 5:02 PM
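
A note on the advice under Datasets above: the same check (look at the file first, then decide on the delimiter and whether there is a header row) can also be done from within R. The sketch below is only illustrative. It writes a small temporary file with made-up column names (id, genotype, phenotype), which are assumptions and not one of the course datasets, and then inspects and reads it.

```r
# Create a small tab-delimited example file (hypothetical data, for illustration only)
example_file <- tempfile(fileext = ".txt")
writeLines(c("id\tgenotype\tphenotype",
             "1\tAA\t5.2",
             "2\tAG\t6.1",
             "3\tGG\t7.4"),
           example_file)

# Look at the first few raw lines before parsing, as you would in a text editor
first_lines <- readLines(example_file, n = 3)
print(first_lines)

# Guess the delimiter from the first line: tab if it contains one, comma otherwise
delimiter <- if (grepl("\t", first_lines[1], fixed = TRUE)) "\t" else ","

# The first line of this file holds variable names, so header = TRUE
geno <- read.table(example_file, header = TRUE, sep = delimiter,
                   stringsAsFactors = FALSE)

# Check that the columns came through with sensible names and types
str(geno)
```

For a comma-delimited file the same call works with sep = ","; read.csv() is a shortcut that uses header = TRUE and sep = "," by default. With your own data, replace example_file with the file's actual name and location.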