Here is some software that I have either authored or contributed significantly to. Most of these projects are hosted under the user name "nhoffman" on GitHub - icons below link to individual repositories.
- Molecular microbiology and the microbiome
- Bioinformatics and reproducible research
- Tools and applications for the clinical laboratory
- Yak shaving
Molecular microbiology and the microbiome
Taxtastic and DeeNuRP are a collaboration with Erick Matsen and his group in Computational Biology at FHCRC, and were developed in parallel with Erick's fantastic pplacer, which adds aligned sequences to a maximum-likelihood phylogenetic tree. My interest in pplacer is mainly for performing sequence-based taxonomic assignment of microorganisms using a phylogenetic approach, but it offers much more than that, so check it out!
Taxtastic is used for assembling phylogenetic "reference packages" for use with pplacer, but more generally provides tools for representing, querying, and manipulating the NCBI taxonomy as a relational database. It is used in both research and clinical pipelines. I was the initial author and am the current maintainer; see the project repo for the full list of contributors.
DeeNuRP is a package for 16S rRNA gene sequence curation and phylogenetic reference set creation. I've contributed most significantly to deenurp filter-outliers, which predicts likely mis-annotation of sequence records by identifying outliers based on sequence identity.
Yet another 16S rRNA database (ya16sdb)
One of my active areas of research is creating a curated and up-to-date set of bacterial 16S rRNA sequences. Christopher Rosenthal (crosenth), a longtime programmer in my research group, has written ya16sdb, which implements a pipeline for downloading and curating 16S rRNA records from NCBI.
A simple utility for reducing NGS read mis-assignment based on index read match identity and quality score.
Bioinformatics and reproducible research
There has been quite a proliferation of build tools that are either designed for or may be adapted to bioinformatics pipelines (eg, airflow, and many, many more), but I haven't found one better suited to my needs than SCons. Most of the pipelines that I have designed for both research and clinical applications are built using SCons, with some additional functionality (at this point, mostly job dispatch with slurm) provided by
This "simplest possible fasta parser" finds its way into most of my projects.
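To give a sense of how little code such a parser needs, here is a minimal sketch in Python. It is not the code from the project itself: the function name `read_fasta` and the choice to return the full header line are assumptions made for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA-format file.

    Hypothetical helper for illustration; the header is everything
    after '>' and the sequence is the concatenation of following lines.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # emit the final record, if any
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 first\nACGT\nAC\n>seq2\nGGTT\n")
records = list(read_fasta(example))
```

Because the function is a generator, large files are processed one record at a time rather than being read into memory.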
Although rmarkdown and Jupyter notebooks are much better known, emacs org-mode is another nice option for notebook-style reproducible research. org-export is a command-line utility for compiling org-mode documents non-interactively, and has the option of styling html documents with bootstrap.
Formats multiple alignments in plain text, pdf, and html formats.
Tools and applications for the clinical laboratory
Most of the applications that I write for the clinical laboratory are for internal use. There are a few that I have been able to publish that may be of general interest.
Laboratory Test Guide
I am the primary author of our public Laboratory Test Guide. This application provides a searchable interface for clinical laboratory tests offered by the University of Washington Department of Laboratory Medicine.
Pending Log Monitor
I designed and wrote most of the code for a web application (known as the "Pending Log Monitor" or PLM) that displays the status of pending orders for lab tests. I described this application in a presentation at the 2017 Pathology Informatics Summit (annual meeting of the Association for Pathology Informatics). Here's the abstract for that presentation:
Many laboratory information systems (LIS) do not provide real-time notification of new orders, relying instead on batched, asynchronous display of information such as printed pending lists. To improve situational awareness of pending laboratory orders, we developed a web application (the "Pending Log Monitor") that displays data continually updated from our LIS on large wall-mounted monitors or PC workstations. Users may enter comments associated with individual items. A survey was administered to evaluate usage patterns. The application is implemented in Python 2.7 using the Flask web microframework, and is hosted on a virtual machine running Ubuntu 14.04. Data is extracted from the LIS database (Sunquest Information Systems, Tucson, AZ) using custom code written in Cache (InterSystems Corporation, Cambridge, MA), and is transferred to the application server by a batched process using secure shell. User-provided comments associated with pending tests are stored in an SQLite database. The application was designed for maintainability, ease of customization, stability, and rapid recovery in the event of a component failure. Logic for display and formatting of pending tests is implemented as Python functions. A simple JSON-format specification can accommodate any tabular data. Lists of pending tests defined for a given area typically correspond to one or more worksheets defined in the LIS. Customized displays of pending tests have been implemented for over 35 combinations of worksheets in multiple lab areas. Pending orders for each lab area are filtered, ordered, and color coded based on elapsed time since order or receipt, priority, specimen stability, or other criteria. Data is transferred from the LIS by a batched process every four minutes. This application has replaced the use of printed pending lists in many areas.
The majority of survey respondents described the application as "very important" to lab operations, with many lab areas referring to the monitor "constantly." Use of comments varies widely between lab areas, but most respondents strongly agreed with the statement that comments improve communication. A simple web application implemented at low cost using open source technology has provided significant workflow and communication improvements throughout the laboratory.
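The abstract mentions a JSON-format specification and Python functions that order and color code pending tests. The sketch below shows one way that combination could look. It is not the PLM code: the spec keys (`columns`, `thresholds`, `default_color`), the field names, and the 30 and 60 minute cutoffs are all hypothetical.

```python
import json
from datetime import datetime, timedelta

# Hypothetical display specification for one lab area; thresholds are
# minutes elapsed since receipt (values invented for illustration).
SPEC = json.loads("""
{
  "area": "chemistry",
  "columns": ["accession", "test", "received"],
  "thresholds": [
    {"minutes": 30, "color": "yellow"},
    {"minutes": 60, "color": "red"}
  ],
  "default_color": "white"
}
""")


def color_for(received, now, spec):
    """Return the color of the highest threshold the order has passed."""
    elapsed = (now - received).total_seconds() / 60.0
    color = spec["default_color"]
    for threshold in sorted(spec["thresholds"], key=lambda t: t["minutes"]):
        if elapsed >= threshold["minutes"]:
            color = threshold["color"]
    return color


def display_rows(orders, now, spec):
    """Order pending tests oldest first and attach a display color."""
    rows = []
    for order in sorted(orders, key=lambda o: o["received"]):
        row = {col: order[col] for col in spec["columns"]}
        row["color"] = color_for(order["received"], now, spec)
        rows.append(row)
    return rows


now = datetime(2017, 5, 1, 12, 0)
orders = [
    {"accession": "A2", "test": "K", "received": now - timedelta(minutes=10)},
    {"accession": "A1", "test": "NA", "received": now - timedelta(minutes=75)},
    {"accession": "A3", "test": "GLU", "received": now - timedelta(minutes=45)},
]
rows = display_rows(orders, now, SPEC)
```

Keeping the cutoffs in data rather than code is what would let a new lab area be added by writing a new spec instead of a new function.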
Automated Chemistry Quality Control
Right around the time I started my faculty position, I implemented a system for QC review of our automated chemistry analyzers, consisting of some R scripts that emitted Levey-Jennings charts highlighting out-of-control standards.
QC checks were documented in a Roundup bug tracker. This was the primary mechanism for monitoring and documenting quality control for 7 or 8 years, until it was replaced by a commercial product in 2016.
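The idea behind highlighting out-of-control points on a Levey-Jennings chart can be sketched in a few lines. The original system was written in R; this Python version is only an illustration. The function name and the 2 SD limit are assumptions, not the rules actually used in the lab.

```python
from typing import List, Sequence, Tuple


def flag_out_of_control(
    values: Sequence[float], mean: float, sd: float, limit: float = 2.0
) -> List[Tuple[int, float, float]]:
    """Return (index, value, z-score) for each QC result that falls
    more than `limit` standard deviations from the established mean.

    The default limit of 2 SD is an assumption for illustration.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    flagged = []
    for i, value in enumerate(values):
        z = (value - mean) / sd
        if abs(z) > limit:
            flagged.append((i, value, z))
    return flagged


# Invented QC results for a control with target mean 100 and SD 4
qc = [100.0, 102.0, 97.0, 109.0, 101.0, 90.0]
flagged = flag_out_of_control(qc, mean=100.0, sd=4.0)
```

On a chart, the flagged indices are the points that would be drawn in a contrasting color outside the control-limit lines.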
Automated QA for a clinical LC/MS urine opiates assay.
Published as Dickerson JA, Schmeling M, Hoofnagle AN, and Hoffman NG. Design and implementation of software for automated quality control and data analysis for a complex LC/MS/MS assay for urine opiates and metabolites. Clin Chim Acta. 2012 Nov 15. PubMed: 23159299
A plugin for using the MoinMoin wiki as a CMS for document control in the clinical laboratory. Our department has many hundreds of policies and procedures managed as MoinMoin wiki pages. Go open source!
Infrastructure for role-based user management in the clinical laboratory
Significant (and mainly hidden) administrative costs in any organization relate to processes and tools for user management and access to electronic resources. Compounding factors include:
- multiple domains
- users with roles spanning institutions
- applications with varying technical requirements for implementing single sign-on (SSO)
- regulated environments with specific policy requirements for user management
- high user turnover.
We have all of these! Because of the heterogeneity of our environment, no existing system or domain could serve as a single source of truth for users and their roles. To provide one for our department and affiliates, I wrote an internal web application (Flask, PostgreSQL) for user management. Users are associated with attributes (role, location, departmental/divisional affiliations, etc) or assigned directly to groups (eg for access to a specific application). Groups are then synchronized to multiple domains so that they can be used as the basis for authorization for a wide variety of applications.
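As a rough illustration of the model described above, the sketch below computes group membership from user attributes plus direct assignments. It is not the internal application's code: the `User` class, the rule format, and all names and values are hypothetical, and the step of synchronizing groups to each domain is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class User:
    username: str
    attributes: Dict[str, str]
    direct_groups: Set[str] = field(default_factory=set)


# Hypothetical rules: a user belongs to a group when every listed
# attribute matches.
GROUP_RULES = {
    "chemistry-techs": {"role": "technologist", "division": "chemistry"},
    "all-faculty": {"role": "faculty"},
}


def groups_for(user: User, rules: Dict[str, Dict[str, str]]) -> Set[str]:
    """Combine attribute-derived groups with direct assignments."""
    derived = {
        group
        for group, required in rules.items()
        if all(user.attributes.get(k) == v for k, v in required.items())
    }
    return derived | user.direct_groups


def membership(users: List[User], rules) -> Dict[str, List[str]]:
    """Invert per-user groups into a group -> sorted member list map,
    the shape a sync job would push to each domain."""
    result: Dict[str, List[str]] = {}
    for user in users:
        for group in groups_for(user, rules):
            result.setdefault(group, []).append(user.username)
    return {group: sorted(members) for group, members in sorted(result.items())}


users = [
    User("alice", {"role": "technologist", "division": "chemistry"}),
    User("bob", {"role": "faculty", "division": "microbiology"}, {"test-guide-admins"}),
]
groups = membership(users, GROUP_RULES)
```

Deriving most groups from attributes means that changing a user's role in one place updates access everywhere the groups are used.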
UW Groups API
Python bindings for the UW Groups web services API.
borborygmi: a blog built with emacs org-mode and pelican
I built this blog when I was particularly into using org-mode; it's a useful platform for publishing notes and lectures.
I'm pretty happy with my emacs config, and have gotten a number of people started with emacs using this. It's written as an org-mode file that can be exported to html and published.
Why doesn't bash have decent command line argument parsing? Who knows? Let's use Python's argparse instead!
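One way to borrow argparse from a shell script is to have a small Python helper parse the arguments and print shell variable assignments for the calling script to `eval`. The sketch below shows that general technique only; it is not necessarily how the project works, and the option names and the `shell_assignments` function are made up for the example.

```python
import argparse
import shlex
from typing import List


def shell_assignments(argv: List[str]) -> str:
    """Parse argv with argparse and return NAME=value lines that are
    safe to eval in a POSIX shell (values quoted with shlex.quote)."""
    parser = argparse.ArgumentParser(prog="myscript")  # hypothetical script name
    parser.add_argument("infile")
    parser.add_argument("-n", "--count", type=int, default=1)
    parser.add_argument("-v", "--verbose", action="store_true")
    args = parser.parse_args(argv)

    lines = []
    for name, value in sorted(vars(args).items()):
        if isinstance(value, bool):
            # shell convention used here: non-empty string means true
            value = "yes" if value else ""
        lines.append(f"{name.upper()}={shlex.quote(str(value))}")
    return "\n".join(lines)


output = shell_assignments(["my file.txt", "-n", "3"])
```

If the helper were saved as a script that prints this output for its own command line, a bash script could call it with something like `eval "$(python parse_args.py "$@")"` and then use `$INFILE` and `$COUNT`. The file name `parse_args.py` is hypothetical, and a real version would also need to keep `--help` output from being passed to `eval`.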