General Data Workflows and Tools

This has been reposted from our lab tumblelog

Sequence Data

Files from the core facility are downloaded to a local drive on my computer, the “NGS Drive”.

NGS_Raw_Data_-_837.8_GB_17A0A4B5.png

This drive is backed up in multiple places, including

a)


Primary Analysis

SimpleMind_Free_17A0A687.png


Documentation via IPython Notebook

I have been using IPython Notebook for a few months and just started hosting my notebooks on GitHub at https://github.com/sr320/ipython_nb. IPython is great for so many reasons.

demo

GitHub is nice given the iterative nature of going back to a notebook, since it gives me version control.

Here is a recent entry accessible via http://nbviewer.ipython.org/

http://nbviewer.ipython.org/urls/raw.github.com/sr320/ipython_nb/master/BlackAb_Annot.ipynb

A screencast from early on


Secondary Analysis

Most of the “secondary” analysis (which I consider playing with large text files) I try to do within my IPython Notebook or SQLShare. I put BLAST in this category; it runs locally on a 16-core machine (hummingbird) for big jobs or a 4-core machine (greenbird) for light jobs. See the example above for how I try to go from IPython to SQLShare.
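To make the big-job versus light-job split concrete, here is a minimal sketch of how a BLAST call might be assembled from a notebook. It assumes the NCBI BLAST+ command-line tools (`blastn` and its `-query`, `-db`, `-out`, `-outfmt`, and `-num_threads` options); the helper name, file names, and database name are hypothetical and are not taken from the notebooks linked above.

```python
import subprocess

# Hypothetical helper: build a BLAST+ command, choosing the thread count
# to match the machine the job is meant for (16 cores for big jobs,
# 4 cores for light jobs, as described in the text).
def build_blast_command(query, db, out, big_job=False, program="blastn"):
    threads = 16 if big_job else 4
    return [
        program,
        "-query", query,          # FASTA file of query sequences
        "-db", db,                # name of a pre-built BLAST database
        "-out", out,              # where to write the results
        "-outfmt", "6",           # tab-delimited output, easy to load elsewhere
        "-num_threads", str(threads),
    ]

# Hypothetical wrapper: actually run the command (requires BLAST+ installed).
def run_blast(query, db, out, big_job=False):
    cmd = build_blast_command(query, db, out, big_job=big_job)
    subprocess.run(cmd, check=True)
```

Calling `build_blast_command("query.fa", "mydb", "hits.tab", big_job=True)` only returns the argument list, so it can be inspected in a notebook cell before anything is run.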

These files are housed on a Synology NAS. My folder is public here. This is where I read and write working files.
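Since these working files are mostly large text tables, one way to handle them is to stream them line by line rather than loading them whole. The sketch below is only an illustration: it assumes a tab-delimited BLAST table in the default 12-column layout (e-value in the 11th column), and the function name, paths, and cutoff are hypothetical.

```python
# Hypothetical helper: copy only the hits at or below an e-value cutoff
# from one tab-delimited file to another, one line at a time.
def filter_blast_hits(in_path, out_path, max_evalue=1e-5):
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                continue  # skip comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # skip blank or malformed lines
            if float(fields[10]) <= max_evalue:  # column 11 holds the e-value
                dst.write(line)
                kept += 1
    return kept
```

The function returns the number of rows kept, which is a quick sanity check before loading the smaller file somewhere else.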

the_eagle_17A0B185.png


Pretty Pictures

After the secondary analysis, pretty pictures need to be made. Some examples of how this might be done…


Essentials

A few things I can’t do without (besides what’s listed above)


via the Lab Tumblr: http://genefish.tumblr.com/post/56440034816