Data Processing Pipelines
Data used to manage malaria in Uganda come from two main sources: malaria surveillance and malaria research. To facilitate iterative analysis, we have established data processing pipelines that create stable, up-to-date data assets that are Timely, Reproducible, Accessible, Cleaned, and Evolvable (TRACE).
Data assets are uploaded to a data warehouse maintained by the Department of Health Information (DHI), which controls access to them. Each data asset has a well-developed processing pipeline and a plan for maintaining the asset and updating its algorithms. Each pipeline has three phases: extract, transform, and load (ETL). Each ETL pipeline has an operating version and a development version, both stored on GitHub and maintained under version control.
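To illustrate the three phases, here is a minimal Python sketch of one ETL pass, assuming a hypothetical weekly case-count export. The column names (`district`, `week`, `confirmed_cases`), the sample rows, and the table name `weekly_cases` are illustrative only, and an in-memory SQLite database stands in for the DHI data warehouse, whose actual interface is not described here.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; column names and values are illustrative only.
RAW_CSV = """district,week,confirmed_cases
 Kampala ,2023-W01,120
Gulu,2023-W01,
gulu,2023-W02,85
"""


def extract(source: str) -> list[dict[str, str]]:
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, str, int]]:
    """Transform: normalize district names, drop rows with missing counts, cast types."""
    cleaned = []
    for row in rows:
        count = row["confirmed_cases"].strip()
        if not count:
            continue  # drop incomplete records instead of guessing a value
        district = row["district"].strip().title()  # " Kampala " / "gulu" -> "Kampala" / "Gulu"
        cleaned.append((district, row["week"].strip(), int(count)))
    return cleaned


def load(records: list[tuple[str, str, int]], conn: sqlite3.Connection) -> int:
    """Load: write records to the warehouse table; reruns overwrite the same keys."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weekly_cases ("
        "district TEXT, week TEXT, confirmed_cases INTEGER, "
        "PRIMARY KEY (district, week))"
    )
    conn.executemany("INSERT OR REPLACE INTO weekly_cases VALUES (?, ?, ?)", records)
    conn.commit()
    return len(records)


def run_pipeline(source: str, conn: sqlite3.Connection) -> int:
    """Run extract -> transform -> load and return the number of records loaded."""
    return load(transform(extract(source)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, conn))
    print(conn.execute("SELECT * FROM weekly_cases ORDER BY district, week").fetchall())
```

Because `load` uses `INSERT OR REPLACE` keyed on district and week, rerunning the pipeline on the same input leaves the table unchanged, which is one simple way to keep a data asset reproducible while still allowing it to be refreshed.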
Version Control
Each ETL pipeline has its own GitHub repository with a defined operating version. Version control on GitHub supports ongoing development of the ETL pipelines: new features are built in the development version and must be thoroughly tested before they are incorporated into the operating version.
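One way to make "thoroughly tested" concrete is a regression gate: the development version is run against fixed input fixtures whose expected output is known, and it is promoted to the operating version only if every fixture still passes. The Python sketch below is a minimal illustration under that assumption; the fixtures and the functions `dev_transform` and `buggy_dev_transform` are hypothetical, not the project's actual test suite.

```python
from typing import Callable

Row = dict[str, str]
Record = tuple[str, str, int]
Transform = Callable[[list[Row]], list[Record]]

# Hypothetical regression fixtures: (raw input rows, expected cleaned output).
FIXTURES: list[tuple[list[Row], list[Record]]] = [
    (
        [{"district": " Kampala ", "week": "2023-W01", "confirmed_cases": "120"}],
        [("Kampala", "2023-W01", 120)],
    ),
    (
        # A record with a missing count must be dropped, not loaded as zero.
        [{"district": "Gulu", "week": "2023-W01", "confirmed_cases": ""}],
        [],
    ),
]


def regression_failures(transform: Transform, fixtures) -> list[str]:
    """Run a candidate transform on every fixture and describe each mismatch or crash."""
    failures = []
    for i, (rows, expected) in enumerate(fixtures):
        try:
            actual = transform(rows)
        except Exception as exc:  # a crash counts as a failed fixture
            failures.append(f"fixture {i}: raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"fixture {i}: expected {expected}, got {actual}")
    return failures


def dev_transform(rows: list[Row]) -> list[Record]:
    """Hypothetical dev version: adds upper-casing of week codes as a new feature."""
    out = []
    for row in rows:
        count = row["confirmed_cases"].strip()
        if not count:
            continue
        out.append((row["district"].strip().title(), row["week"].strip().upper(), int(count)))
    return out


def buggy_dev_transform(rows: list[Row]) -> list[Record]:
    """Hypothetical regression: silently treats a missing count as zero."""
    return [
        (r["district"].strip().title(), r["week"].strip(), int(r["confirmed_cases"] or 0))
        for r in rows
    ]


if __name__ == "__main__":
    print(regression_failures(dev_transform, FIXTURES))
    print(regression_failures(buggy_dev_transform, FIXTURES))
```

In practice, such a check could run automatically on each pull request (for example with GitHub Actions), so that a development version that breaks an existing fixture cannot replace the operating version.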