Software

Here is some software that I have either authored or contributed significantly to. Most of these projects are hosted under the user name "nhoffman" on GitHub; the icons below link to individual repositories.

Molecular microbiology and the microbiome

Taxtastic and DeeNuRP are collaborations with Erick Matsen and his group in Computational Biology at FHCRC, and were developed in parallel with Erick's fantastic pplacer, which places aligned query sequences on a fixed maximum-likelihood (ML) reference phylogenetic tree. My interest in pplacer is mainly in using a phylogenetic approach for sequence-based taxonomic assignment of microorganisms, but it offers much more than that, so check it out!

Taxtastic

Taxtastic is used for assembling phylogenetic "reference packages" for use with pplacer, but more generally provides tools for representing, querying, and manipulating the NCBI taxonomy as a relational database. It is used in both research and clinical pipelines. I was the initial author and am the current maintainer; see the project repo for the full list of contributors.

GitHub pypi Build Status
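To illustrate what representing the NCBI taxonomy as a relational database makes possible, here is a minimal Python sketch using only the standard library's `sqlite3`. The single `nodes` table and the `build_db` and `lineage` helpers are hypothetical and are not Taxtastic's schema or API; the point is simply that once each node stores its parent, a recursive query recovers a full lineage.

```python
import sqlite3

# Hypothetical minimal schema for illustration only; NOT Taxtastic's schema.
SCHEMA = """
CREATE TABLE nodes (
    tax_id    TEXT PRIMARY KEY,
    parent_id TEXT,           -- NULL for the root
    rank      TEXT NOT NULL,
    name      TEXT NOT NULL
);
"""


def build_db(rows):
    """Load (tax_id, parent_id, rank, name) rows into an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    return con


def lineage(con, tax_id):
    """Return [(rank, name), ...] ordered from the root down to tax_id."""
    sql = """
    WITH RECURSIVE lineage(tax_id, parent_id, rank, name, depth) AS (
        SELECT tax_id, parent_id, rank, name, 0 FROM nodes WHERE tax_id = ?
        UNION ALL
        -- walk upward: join each node found so far to its parent
        SELECT n.tax_id, n.parent_id, n.rank, n.name, l.depth + 1
        FROM nodes n JOIN lineage l ON n.tax_id = l.parent_id
    )
    SELECT rank, name FROM lineage ORDER BY depth DESC
    """
    return con.execute(sql, (tax_id,)).fetchall()
```

In NCBI's own `nodes.dmp`, the root (tax_id 1) is listed as its own parent, so a loader built along these lines would need to store it with a NULL parent (or add a stop condition) to keep the recursive query from looping forever.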

DeeNuRP

A package for 16S rRNA gene sequence curation and phylogenetic reference set creation. I've contributed most significantly to deenurp filter-outliers, which predicts likely mis-annotation of sequence records by identifying outliers based on sequence identity.

GitHub Build Status
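The general idea behind identity-based outlier detection can be sketched in a few lines of Python. This is a simplified stand-in, not deenurp's implementation: it assumes the sequences for one taxon are already aligned, measures distance as 1 minus the identity over columns where neither sequence has a gap, picks the medoid (the sequence with the smallest total distance to the rest), and flags anything farther from it than a hypothetical `max_distance` cutoff.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions among columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def find_outliers(seqs, max_distance=0.05):
    """Return names of sequences whose distance to the medoid exceeds max_distance.

    seqs: dict mapping name -> aligned sequence (all the same length).
    max_distance: hypothetical cutoff on 1 - identity.
    """
    names = list(seqs)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = 1.0 - identity(seqs[a], seqs[b])
        dist[a, b] = dist[b, a] = d
    # The medoid is the observed sequence with the smallest total distance to all others.
    medoid = min(names, key=lambda n: sum(dist[n, m] for m in names))
    return [n for n in names if dist[n, medoid] > max_distance]
```

Using a medoid rather than a computed consensus keeps the reference point a real, observed sequence; the default cutoff here is an arbitrary placeholder.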

Yet another 16S rRNA database (ya16sdb)

One of my active areas of research is creating a curated, up-to-date set of bacterial 16S rRNA sequences. Christopher Rosenthal (crosenth), a longtime programmer in my research group, wrote ya16sdb, which implements a pipeline for downloading and curating 16S rRNA records from NCBI.

GitHub

barcodecop

A simple utility for reducing NGS read mis-assignment based on index read match identity and quality score.

GitHub pypi Build Status
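A minimal sketch of this kind of filtering step, assuming the reads have already been demultiplexed so that the most common index sequence in a file is the intended barcode. The `keep_read_flags` name and the Phred+33 quality encoding are assumptions for illustration; barcodecop's actual options and internals may differ.

```python
from collections import Counter


def phred(qual, offset=33):
    """Convert a FASTQ quality string to integer scores (assumes Phred+33)."""
    return [ord(c) - offset for c in qual]


def keep_read_flags(index_reads, min_qual=None):
    """Decide which reads to keep based on their index reads.

    index_reads: list of (sequence, quality string) pairs, one per read.
    min_qual: optional minimum quality required at every index position.
    Returns one boolean per read.
    """
    counts = Counter(seq for seq, _ in index_reads)
    expected, _ = counts.most_common(1)[0]  # assume the dominant index is the true barcode
    flags = []
    for seq, qual in index_reads:
        ok = seq == expected  # require an exact match to the expected barcode
        if ok and min_qual is not None:
            ok = min(phred(qual)) >= min_qual
        flags.append(ok)
    return flags
```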

Bioinformatics and reproducible research

bioscons

There has been quite a proliferation of build tools that are either designed for or may be adapted to bioinformatics pipelines (e.g., luigi, airflow, and many, many more), but I haven't found one better suited to my needs than SCons. Most of the pipelines that I have designed for both research and clinical applications are built using SCons, with some additional functionality (at this point, mostly job dispatch with slurm) provided by bioscons.

GitHub pypi Build Status

fastalite

This "simplest possible fasta parser" finds its way into most of my projects.

GitHub pypi Build Status
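In the same spirit, here is what a "simplest possible" FASTA parser can look like in Python. This is an illustrative sketch, not fastalite's code; the `FastaRecord` type and `parse_fasta` function are hypothetical names. It streams records from any text handle, joins wrapped sequence lines, and ignores blank lines and anything before the first header.

```python
from typing import Iterator, List, NamedTuple, TextIO


class FastaRecord(NamedTuple):
    id: str           # first whitespace-delimited word of the header
    description: str  # the full header line, without the leading '>'
    seq: str          # sequence with line breaks removed


def _make_record(header: str, chunks: List[str]) -> FastaRecord:
    words = header.split(None, 1)
    return FastaRecord(words[0] if words else "", header, "".join(chunks))


def parse_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per '>' header in a text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)
```

For example, `list(parse_fasta(open("seqs.fasta")))` (with a hypothetical file name) returns one record per `>` header; because it is a generator, large files are read one record at a time.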

org-export

Although rmarkdown and Jupyter notebooks are much better known, emacs org-mode is another nice option for notebook-style reproducible research. org-export is a command-line utility for compiling org-mode documents non-interactively, and has the option of styling html documents with Bootstrap.

GitHub Build Status

alnvu

Formats multiple alignments as plain text, PDF, or HTML.

GitHub pypi Build Status
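A rough sketch of plain-text alignment formatting in Python: rows are wrapped into blocks of `width` columns, each block is headed by its 1-based starting position, and characters identical to the first (reference) sequence are masked as `.` so that differences stand out. The masking convention and the `format_alignment` name are assumptions for illustration, not a description of alnvu's actual output.

```python
def format_alignment(seqs, width=60):
    """Render an alignment as wrapped plain-text blocks.

    seqs: list of (name, aligned sequence) pairs; the first is the reference.
    Characters in other rows that match the reference are shown as '.'.
    """
    _, ref_seq = seqs[0]
    pad = max(len(name) for name, _ in seqs)
    lines = []
    for start in range(0, len(ref_seq), width):
        # position ruler: 1-based index of the block's first column
        lines.append(" " * pad + f" {start + 1}")
        ref_chunk = ref_seq[start:start + width]
        for i, (name, seq) in enumerate(seqs):
            chunk = seq[start:start + width]
            if i > 0:
                chunk = "".join("." if c == r else c for c, r in zip(chunk, ref_chunk))
            lines.append(f"{name:<{pad}} {chunk}")
        lines.append("")  # blank line between blocks
    return "\n".join(lines)
```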

Tools and applications for the clinical laboratory

Most of the applications that I write for the clinical laboratory are for internal use. There are a few that I have been able to publish that may be of general interest.

Laboratory Test Guide

I am the primary author of our public Laboratory Test Guide. This application provides a searchable interface for clinical laboratory tests offered by the University of Washington Department of Laboratory Medicine.

Pending Log Monitor

I designed and wrote most of the code for a web application (known as the "Pending Log Monitor" or PLM) that displays the status of pending orders for lab tests. I described this application in a presentation at the 2017 Pathology Informatics Summit (annual meeting of the Association for Pathology Informatics). Here's the abstract for that presentation:

Many laboratory information systems (LIS) do not provide real-time notification of new orders, relying instead on batched, asynchronous display of information such as printed pending lists. To improve situational awareness of pending laboratory orders, we developed a web application (the "Pending Log Monitor") that displays data continually updated from our LIS on large wall-mounted monitors or PC workstations. Users may enter comments associated with individual items. A survey was administered to evaluate usage patterns. The application is implemented in Python 2.7 using the Flask web microframework, and is hosted on a virtual machine running Ubuntu 14.04. Data is extracted from the LIS database (Sunquest Information Systems, Tucson, AZ) using custom code written in Caché (InterSystems Corporation, Cambridge, MA), and is transferred to the application server by a batched process using secure shell. User-provided comments associated with pending tests are stored in an SQLite database. The application was designed for maintainability, ease of customization, stability, and rapid recovery in the event of a component failure. Logic for display and formatting of pending tests is implemented as Python functions. A simple JSON-format specification can accommodate any tabular data. Lists of pending tests defined for a given area typically correspond to one or more worksheets defined in the LIS. Customized displays of pending tests have been implemented for over 35 combinations of worksheets in multiple lab areas. Pending orders for each lab area are filtered, ordered, and color coded based on elapsed time since order or receipt, priority, specimen stability, or other criteria. Data is transferred from the LIS by a batched process every four minutes. This application has replaced the use of printed pending lists in many areas.
The majority of survey respondents described the application as "very important" to lab operations, with many lab areas referring to the monitor "constantly." Use of comments varies widely between lab areas, but most respondents strongly agreed with the statement that comments improve communication. A simple web application implemented at low cost using open source technology has provided significant workflow and communication improvements throughout the laboratory.
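The abstract notes that display logic is implemented as Python functions and that pending orders are color coded and ordered by elapsed time and priority. A minimal Python 3 sketch of that kind of rule follows; the priorities, minute thresholds, colors, and function names are all hypothetical and are not the PLM's actual configuration.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds (minutes since receipt), checked from most to least severe.
THRESHOLDS = {
    "STAT": [(30, "red"), (15, "yellow")],
    "ROUTINE": [(240, "red"), (120, "yellow")],
}
SEVERITY = {"red": 0, "yellow": 1, "green": 2}


def status_color(received, priority, now):
    """Color a pending order by elapsed time; unknown priorities use ROUTINE rules."""
    elapsed = (now - received).total_seconds() / 60
    for limit, color in THRESHOLDS.get(priority, THRESHOLDS["ROUTINE"]):
        if elapsed >= limit:
            return color
    return "green"


def sort_pending(items, now):
    """Order pending items most urgent first, then oldest first within a color."""
    return sorted(
        items,
        key=lambda it: (SEVERITY[status_color(it["received"], it["priority"], now)],
                        it["received"]),
    )
```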

Automated Chemistry Quality Control

Right around the time I started my faculty position, I implemented a system for QC review of our automated chemistry analyzers, consisting of some R scripts that emitted Levey-Jennings charts highlighting out-of-control standards.

QC checks were documented in a Roundup issue tracker. This was the primary mechanism for monitoring and documenting quality control for 7 or 8 years, until it was replaced by a commercial product in 2016.
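Levey-Jennings charts plot each control result against the control's target mean, with lines at ±1, 2, and 3 SD; out-of-control points are commonly identified with Westgard rules. The original scripts were written in R and the specific rules they applied aren't described here, so the Python sketch below is an assumption: it checks two common Westgard rules, 1-3s (one result beyond 3 SD) and 2-2s (two consecutive results beyond 2 SD on the same side of the mean). The `westgard_flags` name is hypothetical.

```python
def westgard_flags(values, mean, sd):
    """Return, for each control result, the set of Westgard rules it violates.

    values: control results in run order; mean, sd: the control's target statistics.
    Only the 1_3s and 2_2s rules are checked in this sketch.
    """
    z = [(v - mean) / sd for v in values]  # distance from target in SD units
    flags = []
    for i, zi in enumerate(z):
        rules = set()
        if abs(zi) > 3:
            rules.add("1_3s")
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            rules.add("2_2s")
        flags.append(rules)
    return flags
```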

Opiates

Automated QA for a clinical LC/MS/MS assay for urine opiates.

Published as Dickerson JA, Schmeling M, Hoofnagle AN, and Hoffman NG. Design and implementation of software for automated quality control and data analysis for a complex LC/MS/MS assay for urine opiates and metabolites. Clin Chim Acta. 2012 Nov 15. PubMed: 23159299

GitHub

moin-labmanual

A plugin for using the MoinMoin wiki as a CMS for document control in the clinical laboratory. Our department has many hundreds of policies and procedures managed as MoinMoin wiki pages. Go open source!

GitHub

Infrastructure for role-based user management in the clinical laboratory

Significant (and mainly hidden) administrative costs in any organization relate to processes and tools for user management and access to electronic resources. Compounding factors include:

We have all of these! Because of the heterogeneity of our environment, no existing system or domain could serve as a single source of truth for users and their roles. To provide one for our department and affiliates, I wrote an internal web application (Flask, PostgreSQL) for user management. Users are associated with attributes (role, location, departmental/divisional affiliations, etc.) or assigned directly to groups (e.g., for access to a specific application). Groups are then synchronized to multiple domains so that they can be used as the basis for authorization for a wide variety of applications.
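The attribute-to-group logic described above can be sketched in Python as follows. The `User` class, the rule format (a group includes every user whose attributes match all of a rule's key/value pairs), and the function names are hypothetical and do not reflect the internal application's actual schema; the last step inverts the result into group-to-members lists, which is the shape a synchronization job would push to each domain.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    username: str
    attributes: dict                                  # e.g. role, location, division
    direct_groups: set = field(default_factory=set)   # explicit assignments


def groups_for(user, rules):
    """Groups from matching attribute rules plus direct assignments.

    rules: dict mapping group name -> required attribute values.
    An empty rule matches every user.
    """
    matched = {
        group for group, required in rules.items()
        if all(user.attributes.get(k) == v for k, v in required.items())
    }
    return matched | user.direct_groups


def memberships(users, rules):
    """Invert per-user groups into {group: sorted list of usernames}."""
    members = {}
    for user in users:
        for group in groups_for(user, rules):
            members.setdefault(group, set()).add(user.username)
    return {group: sorted(names) for group, names in members.items()}
```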

UW Groups API

Python bindings for the UW Groups web services API.

GitHub

Yak shaving

borborygmi: a blog built with emacs org-mode and pelican

I built this blog when I was particularly into using org-mode; it's a useful platform for publishing notes and lectures.

GitHub

My .emacs.d

I'm pretty happy with my emacs config, and have gotten a number of people started with emacs using this. It's written as an org-mode file that can be exported to html and published.

GitHub

argparse-bash

Why doesn't bash have decent command line argument parsing? Who knows? Let's use Python's argparse instead!

GitHub Build Status
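The general trick is to let Python's `argparse` do the parsing and print shell variable assignments that bash can `eval`. The sketch below shows only the Python half; the `to_shell_assignments` name and its conventions (upper-cased variable names, booleans as "yes" or empty, lists as bash arrays) are assumptions and not necessarily how argparse-bash itself works.

```python
import argparse
import shlex


def to_shell_assignments(parser, argv):
    """Parse argv with argparse and render the result as bash assignments for eval."""
    args = parser.parse_args(argv)
    lines = []
    for name, value in sorted(vars(args).items()):
        var = name.upper()
        if isinstance(value, list):
            # nargs-style options become bash arrays, each element quoted
            items = " ".join(shlex.quote(str(v)) for v in value)
            lines.append(f"{var}=({items})")
            continue
        if isinstance(value, bool):
            value = "yes" if value else ""  # empty string tests false with [[ -n ]]
        elif value is None:
            value = ""
        lines.append(f"{var}={shlex.quote(str(value))}")
    return "\n".join(lines)
```

A bash script could then run a hypothetical wrapper around this function as `eval "$(python3 parse_args.py "$@")"`; `shlex.quote` keeps values containing spaces or shell metacharacters intact through the `eval`.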
