TRACE Data Assets
Data Assets for Iterative Analytics
The data assets for adaptive malaria control are designed for repeated analysis. They all share common features that spell out TRACE:
Timely - an ETL pipeline regenerates the data assets on a schedule, shortly after DHIS2 is updated
Reproducible - when data in the DHIS2 source are modified, the old values are retained, so the database is effectively version-controlled; a data asset can be recreated as it looked at any point in the past
Accessible - the data are extracted from DHIS2 and stored in an access-controlled location, making the data asset available for approved use without having to replicate the cumbersome process of starting from DHIS2
Cleaned - algorithms identify outliers, run consistency checks, and impute values for missing, inconsistent, or outlying data points; both the raw and the cleaned data assets are available
Evolvable - the ETL algorithms are version-controlled and available from GitHub. The current operating version of the ETL is critically reviewed, while new features are developed and tested in a separate development version; the operating version is updated occasionally
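
The Reproducible property can be illustrated with a minimal sketch. It assumes an append-only history in which every extracted value is kept together with the time it was pulled. The table layout, the field names, and the function snapshot_as_of are hypothetical and are not taken from the actual ETL; the values are made up for illustration.

```python
from datetime import datetime

# Hypothetical append-only history: one row per extracted value.
# Columns: facility, period, indicator, value, extracted_at
HISTORY = [
    ("facility_A", "2023-01", "confirmed_cases", 40, datetime(2023, 2, 5)),
    # A later extraction picks up a correction made in the source.
    ("facility_A", "2023-01", "confirmed_cases", 44, datetime(2023, 3, 5)),
    ("facility_B", "2023-01", "confirmed_cases", 12, datetime(2023, 2, 5)),
]


def snapshot_as_of(history, as_of):
    """Return {(facility, period, indicator): value} as the asset looked at `as_of`."""
    latest = {}
    for facility, period, indicator, value, extracted_at in history:
        if extracted_at > as_of:
            continue  # this row did not exist yet at the requested time
        key = (facility, period, indicator)
        # Keep only the most recent extraction that is not later than `as_of`.
        if key not in latest or extracted_at > latest[key][1]:
            latest[key] = (value, extracted_at)
    return {key: value for key, (value, _) in latest.items()}


before_fix = snapshot_as_of(HISTORY, datetime(2023, 2, 28))
after_fix = snapshot_as_of(HISTORY, datetime(2023, 4, 1))
```

Because old rows are never overwritten, the same call with an earlier date returns the earlier value, which is what makes a past analysis repeatable.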
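
The Cleaned property can be sketched in the same spirit. The example below is not the project's cleaning algorithm: it swaps in a simple, commonly used rule (the modified z-score based on the median absolute deviation) and imputes flagged points with the series median. The function clean_series, the threshold of 3.5, and the sample values are assumptions made for illustration. The raw series is left untouched, so both versions remain available.

```python
from statistics import median


def clean_series(raw, threshold=3.5):
    """Return (cleaned, flags) for one series; `raw` itself is not modified.

    None marks a missing report. Outliers are flagged with the modified
    z-score, 0.6745 * |x - median| / MAD, an illustrative rule only.
    """
    observed = [v for v in raw if v is not None]
    med = median(observed)
    mad = median(abs(v - med) for v in observed)  # median absolute deviation

    cleaned, flags = [], []
    for v in raw:
        if v is None:
            cleaned.append(med)  # impute missing values with the median
            flags.append("missing")
        elif mad > 0 and 0.6745 * abs(v - med) / mad > threshold:
            cleaned.append(med)  # replace outliers with the median
            flags.append("outlier")
        else:
            cleaned.append(v)
            flags.append("ok")
    return cleaned, flags


raw_cases = [10, 12, None, 11, 300, 9]
cleaned_cases, case_flags = clean_series(raw_cases)
```

Returning the flags alongside the cleaned values keeps a record of which points were changed and why, so an analyst can always compare against the raw asset.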