Data Systems
Malaria Research & Surveillance Data for Uganda
Data systems have been developed to manage the data and digital objects needed to support Uganda’s National Malaria Control Division (NMCD). These include malaria research and surveillance data, as well as data and digital objects describing Uganda’s health systems and political geography, human populations and demography, land and environment, mosquito populations, and malaria control.
Differences in how data are processed (e.g., how outliers are handled and missing values imputed) by different parties, or over time, can bog down discussions and frustrate the development of policy advice, especially if analyses are inconsistent because the data were cleaned differently.
We have developed data systems to ensure that the data used by different analysts are consistent, accurate, and up to date. The systems keep the clean data used for analyses current and readily available (access is controlled by the Department of Health Information, DHI). They also version-control and archive all the data and digital objects required for analysis and visualization, so that analyses can be repeated or replicated later.
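The versioning and archiving requirement can be illustrated with a minimal sketch, assuming the data assets are files on disk. Each file is fingerprinted with a SHA-256 hash and the hashes are recorded in a manifest, so an analysis can cite the exact version of every input and a later run can check whether anything has changed. The function names and the manifest layout are hypothetical; the text does not describe the actual archiving mechanism used for the data assets.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: list[Path], version: str) -> dict:
    """Record a version label, a UTC timestamp and a content hash per asset."""
    return {
        "version": version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "assets": {str(p): sha256_of(p) for p in sorted(paths)},
    }


def save_manifest(manifest: dict, out_path: Path) -> None:
    """Archive the manifest as JSON (hypothetical storage layout)."""
    out_path.write_text(json.dumps(manifest, indent=2))


def changed_assets(old: dict, new: dict) -> list[str]:
    """List assets that are new or whose hash differs between two manifests."""
    return sorted(
        name for name, h in new["assets"].items()
        if old["assets"].get(name) != h
    )
```

Storing hashes rather than copies keeps the manifest small; the archived files themselves would still need to be retained for an analysis to be replicated.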
The data systems have two parts:
Data Assets include SQL databases, the master facility list, GIS shapefiles, rasters, and other data tables or digital objects needed for malaria analytics;
Data Processing Pipelines are designed to ensure that the data assets are clean and up to date, and that the algorithms that generate and update the data assets are transparent and well-documented.
The following is an overview of the data processing pipelines and the data assets, including the protocols and procedures that have been developed to ensure consistency across analyses.
Health Facilities
Routine data from health facilities are the primary source of information for managing malaria in Uganda. We have developed stable data assets describing weekly malaria data (updated every week) and monthly data (updated every month) from every facility in Uganda. These data assets are SQL databases stored in the National Data Warehouse (NDWH), which is maintained by the Department of Health Information (DHI).
Routinely reported facility data are notoriously messy and of variable quality. Data from facilities must be processed: checked for outliers and consistency, cleaned, and missing values imputed. The data assets are updated through a change-data-capture (CDC) system, and the data are then processed using extract-transform-load (ETL) algorithms (see the ETL vignette).
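The text does not specify which outlier rule or imputation method the ETL algorithms use. As an illustration only, the sketch below swaps in two common, simple choices for a single facility's time series: a median absolute deviation (MAD) rule, flagging values whose modified z-score exceeds 3.5 (a conventional cutoff), and linear interpolation of the resulting gaps. The function names are hypothetical. The point is that when every analyst runs the same deterministic function, the cleaned data come out identical.

```python
from statistics import median
from typing import Optional

Series = list[Optional[float]]  # None marks a missing report


def flag_outliers(values: Series, threshold: float = 3.5) -> list[bool]:
    """Flag values whose modified z-score, 0.6745 * |x - median| / MAD,
    exceeds `threshold`. Missing values are never flagged."""
    observed = [v for v in values if v is not None]
    if len(observed) < 3:
        return [False] * len(values)  # too few points to judge
    med = median(observed)
    mad = median(abs(v - med) for v in observed)
    if mad == 0:
        return [False] * len(values)  # no spread, rule undefined
    return [
        v is not None and 0.6745 * abs(v - med) / mad > threshold
        for v in values
    ]


def impute_linear(values: Series) -> list[float]:
    """Fill missing values by linear interpolation between the nearest
    observed neighbours; gaps at either end take the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            frac = (i - left) / (right - left)
            filled.append(values[left] + frac * (values[right] - values[left]))
    return filled


def clean_series(values: Series, threshold: float = 3.5) -> list[float]:
    """Treat flagged outliers as missing, then impute every gap."""
    flags = flag_outliers(values, threshold)
    masked = [None if f else v for v, f in zip(values, flags)]
    return impute_linear(masked)
```

For example, `clean_series([10, 12, 11, None, 13, 400, 12])` flags 400 as an outlier and returns `[10.0, 12.0, 11.0, 12.0, 13.0, 12.5, 12.0]`. A production pipeline would more likely borrow information across facilities or seasons rather than interpolate within one series.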
Access to the NDWH is controlled by DHI; the intent is to make the data available to anyone in the National Malaria Elimination Division (NMED) or any other entity in the Uganda Ministry of Health who is authorized to have access.
To analyze the data and visualize the outputs, facility data must be aggregated by sub-county, district, and region; this aggregation relies on the master facility list and various digital artefacts describing the geography of Uganda. We have also developed systems to maintain these digital objects so that they are up to date, accurate, and consistent.
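The aggregation step can be sketched as a join between facility reports and the master facility list (MFL), followed by a sum at the chosen administrative level. The `Facility` record, its field names and the `aggregate` function are hypothetical; the real MFL schema is not described here. One design choice worth noting: reports from facilities missing from the MFL are returned for follow-up rather than silently dropped, since silent drops are a common source of inconsistent totals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    """One row of a hypothetical master facility list."""
    facility_id: str
    subcounty: str
    district: str
    region: str


LEVELS = ("subcounty", "district", "region")


def aggregate(reports: dict[str, float], mfl: list[Facility], level: str
              ) -> tuple[dict[str, float], list[str]]:
    """Sum facility-level counts to `level` (one of LEVELS).

    Returns the totals per area and the sorted IDs of reporting
    facilities that could not be found in the master facility list."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    lookup = {f.facility_id: f for f in mfl}
    totals: dict[str, float] = defaultdict(float)
    unmatched = []
    for fid, count in reports.items():
        facility = lookup.get(fid)
        if facility is None:
            unmatched.append(fid)  # flag for follow-up instead of dropping
            continue
        totals[getattr(facility, level)] += count
    return dict(totals), sorted(unmatched)
```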
Human Populations
As part of our activities, we need data describing human population distributions and demography.
Entomological Surveillance
We are in the process of developing data systems for entomological surveillance data.
Malaria Research Data
While we use malaria facility data, we recognize their limitations. In particular, facility data are a convenience sample: they capture only people who seek care at health facilities. To validate the data, they must be linked to malaria research metrics, such as the malaria parasite rate.
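The malaria parasite rate is the proportion of people examined in a survey who are found to carry malaria parasites. Because survey samples can be small, any comparison with facility data should carry an uncertainty interval. The sketch below computes the rate with a Wilson score interval, a standard choice for binomial proportions that behaves well near 0 and 1. It is an illustration under that assumption, not the division's validation method, and the function name is hypothetical.

```python
import math


def parasite_rate(positive: int, examined: int, z: float = 1.96
                  ) -> tuple[float, float, float]:
    """Return (rate, lower, upper): the parasite rate positive / examined
    and a Wilson score interval (z = 1.96 gives roughly 95% coverage)."""
    if examined <= 0 or not 0 <= positive <= examined:
        raise ValueError("need examined > 0 and 0 <= positive <= examined")
    p = positive / examined
    denom = 1 + z**2 / examined
    centre = (p + z**2 / (2 * examined)) / denom
    half = z * math.sqrt(p * (1 - p) / examined
                         + z**2 / (4 * examined**2)) / denom
    return p, centre - half, centre + half
```

For 50 positives out of 200 examined, this gives a rate of 0.25 with an interval of roughly 0.195 to 0.314.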
Environmental Data
We treat the propensity for malaria as a changing baseline that has been modified by malaria control. Within this paradigm, we seek to understand the relationship between malaria and the environment.