Sequence Data

Files from the core facility are downloaded locally to my computer, onto a drive called “NGS Drive”.
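Large sequence files can be silently corrupted during transfer or copying. One simple way to confirm that a downloaded file matches another copy of it is to compare checksums. The sketch below is a minimal Python example that assumes MD5 is an adequate check: it catches accidental corruption, not deliberate tampering. The function names are illustrative, not part of any existing tool.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks so large FASTQ files never load fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, *copies):
    """True if every copy has the same MD5 digest as the original file."""
    reference = md5sum(original)
    return all(md5sum(copy) == reference for copy in copies)
```

For example, `copies_match(local_path, backup_path)` compares a file on the local drive with a backup copy. Both paths are hypothetical placeholders. If the facility supplies a checksum list, `md5sum(local_path) == expected_digest` performs the same check against that list.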


This drive is backed up in multiple places, including


Primary Analysis


Documentation via IPython Notebook

I have been using IPython Notebook for a few months and just started hosting my notebooks on GitHub. IPython is great for so many reasons.


GitHub is a good fit because I keep coming back to a notebook and revising it, so having version control is useful.

Here is a recent entry, accessible via

A screencast from early on

Secondary Analysis

I try to do most of the “secondary” analysis, which I consider to be anything that involves working with large text files, within my IPython Notebook or in SQLShare. I put BLAST in this category. It runs locally, on a 16-core machine (hummingbird) for big jobs or a 4-core machine (greenbird) for light jobs. See the example above for how I move from IPython to SQLShare.
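Local BLAST runs like these can be launched from a notebook cell with Python's `subprocess` module. The sketch below builds a BLAST+ command line whose `-num_threads` value matches the machine in use: 16 for hummingbird, 4 for greenbird. Several details are assumptions rather than the actual setup: the choice of `blastn` (a protein search would use `blastx` or `blastp` instead), the e-value cutoff, and the helper names.

```python
import subprocess

# Core counts taken from the text; using all cores for BLAST is an assumption.
MACHINE_CORES = {"hummingbird": 16, "greenbird": 4}


def blast_command(query, db, out, machine="greenbird", evalue=1e-5):
    """Build a BLAST+ blastn command as an argument list (no shell quoting needed)."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",  # tabular output: one hit per line, 12 tab-separated columns
        "-evalue", str(evalue),
        "-num_threads", str(MACHINE_CORES[machine]),
    ]


def run_blast(query, db, out, machine="greenbird"):
    """Run the search and raise CalledProcessError if BLAST exits with an error."""
    subprocess.run(blast_command(query, db, out, machine), check=True)
```

Returning the argument list separately from running it makes it easy to print or log the exact command in the notebook before a long job starts.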

These files are housed on a Synology NAS. My folder is public here. This is where I read and write my working files.
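A typical step with these working files is to read raw BLAST tabular output and write back a smaller, cleaner table. The sketch below keeps the highest-bitscore hit per query that passes an e-value cutoff. It then writes a tab-delimited file with a header row, which is the kind of flat table that a tool such as SQLShare can take as an upload. The cutoff, the best-hit rule, and the function names are illustrative assumptions. The column names are the standard BLAST+ `-outfmt 6` fields.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query among hits with evalue <= max_evalue."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return list(best.values())


def write_table(hits, handle):
    """Write hits as tab-delimited text with a header row."""
    writer = csv.DictWriter(handle, fieldnames=OUTFMT6_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(hits)
```

To use it, open the raw BLAST output in the NAS working folder for reading and a new file for writing with `newline=""`, then call `write_table(best_hits(src), dst)`. The exact file paths are up to you and are not given here.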


Pretty Pictures

After the secondary analysis, pretty pictures need to be made. Some examples of how this might be done…


A few things I can’t do without (besides what’s listed above)

General Data Workflows and Tools