Data Systems

Malaria Research & Surveillance Data for Uganda

Data systems have been developed to manage the data and digital objects needed to support Uganda’s National Malaria Control Division (NMCD). These include malaria research and surveillance data, as well as data or digital objects describing Uganda’s health systems and political geography, human populations and demography, land and environment, mosquito populations, and malaria control.

Differences in the way data are processed (e.g. handling of outliers and imputation of missing values) by different parties or over time can bog down discussions and frustrate the development of policy advice, especially if analyses are inconsistent because the data have been cleaned in different ways.

We have developed data systems to ensure that the data used by various analysts are consistent, accurate, and up-to-date. The systems ensure that clean data for analyses are current and readily available (access is controlled by the Department of Health Information, DHI), and that all the data and digital objects required for analysis and visualization are version controlled and archived so that analyses can be repeated or replicated at a later time.

The data systems have two parts:

The following is an overview of the data processing pipelines and the data assets, including the protocols and procedures that have been developed to ensure consistency across analyses. For a list of vignettes, click to expand any topic in the sidebar.

Health Facilities

Routine data from health facilities in Uganda are the primary source of information for managing malaria. We have developed stable data assets describing weekly malaria data (updated every week) and monthly data (updated every month) from every facility in Uganda. These data assets are SQL databases stored in the National Data Warehouse (NDWH) maintained by the Department of Health Information (DHI).

Routinely reported facility data are notoriously messy and of variable quality. Data from facilities must be processed: checked for outliers and consistency, cleaned, and missing values imputed. The data assets are updated through a change-data-capture (CDC) system, and the data are then processed using extract-transform-load (ETL) algorithms (see the ETL vignette).
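As a rough illustration of the kind of processing described above, the following Python sketch flags outliers in one facility's weekly case counts and imputes missing or flagged values. It is not the ETL algorithm used in the NDWH; the function names, the median/MAD rule, the threshold of 3.5, and the median imputation are all assumptions chosen for simplicity.

```python
from statistics import median


def flag_outliers(counts, threshold=3.5):
    """Mark values whose robust z-score (median/MAD based) exceeds threshold.

    `counts` is a list of weekly case counts; None marks a missing report.
    The rule and the threshold are illustrative assumptions.
    """
    observed = [c for c in counts if c is not None]
    if not observed:
        return [False] * len(counts)
    med = median(observed)
    mad = median([abs(c - med) for c in observed])
    if mad == 0:
        # No spread in the observed values, so nothing can be flagged.
        return [False] * len(counts)
    return [
        c is not None and 0.6745 * abs(c - med) / mad > threshold
        for c in counts
    ]


def clean_series(counts, threshold=3.5):
    """Replace missing and flagged values with the median of the kept values."""
    flags = flag_outliers(counts, threshold)
    kept = [c for c, f in zip(counts, flags) if c is not None and not f]
    if not kept:
        # Nothing reliable to impute from; return the series unchanged.
        return list(counts)
    fill = median(kept)
    return [fill if (c is None or f) else c for c, f in zip(counts, flags)]


# Hypothetical weekly counts with one missing report and one implausible spike.
weekly_cases = [12, 15, None, 14, 400, 13, 16]
cleaned = clean_series(weekly_cases)
```

Applying a single shared routine like this, rather than letting each analyst clean the data separately, is what keeps downstream analyses consistent.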

Access to the NDWH is controlled by DHI; the intent is to make the data available to anyone in the National Malaria Elimination Division (NMED) or any other entity in the Uganda Ministry of Health who is authorized to have access.

To analyze the data and visualize the outputs, facility data must be aggregated by sub-county, district, and region, which requires the master facility list and various digital artefacts describing the geography of Uganda. We have also developed systems to maintain these digital objects to ensure they are up-to-date, accurate, and consistent.
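The aggregation step can be sketched in Python as a lookup against the master facility list followed by a grouped sum. The facility IDs, place names, field names, and record layout below are hypothetical and do not reflect the actual NDWH schema.

```python
from collections import defaultdict

# Hypothetical master facility list: facility ID -> administrative units.
master_facility_list = {
    "F001": {"subcounty": "Subcounty A", "district": "District X", "region": "Region 1"},
    "F002": {"subcounty": "Subcounty B", "district": "District X", "region": "Region 1"},
    "F003": {"subcounty": "Subcounty C", "district": "District Y", "region": "Region 1"},
}


def aggregate(records, facilities, level):
    """Sum case counts by (administrative unit, week) at the given level.

    `records` is an iterable of (facility_id, week, cases) tuples and
    `level` is one of "subcounty", "district", or "region".
    Facilities missing from the master list are returned separately
    instead of being silently dropped.
    """
    totals = defaultdict(int)
    unmatched = []
    for facility_id, week, cases in records:
        entry = facilities.get(facility_id)
        if entry is None:
            unmatched.append(facility_id)
            continue
        totals[(entry[level], week)] += cases
    return dict(totals), unmatched


# Hypothetical weekly records; "F999" is not in the master facility list.
records = [
    ("F001", "2024-W01", 10),
    ("F002", "2024-W01", 5),
    ("F003", "2024-W01", 7),
    ("F999", "2024-W01", 3),
]
district_totals, unmatched = aggregate(records, master_facility_list, "district")
```

Reporting unmatched facility IDs makes gaps in the master facility list visible, which is one reason the list itself needs to be kept current.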

Human Populations

As part of our activities, we need data describing human population distributions and demography.

Entomological Surveillance

We are in the process of developing data systems for entomological surveillance data.

Malaria Research Data

While we use malaria facility data, we recognize the limitations of those data. In particular, a key limitation of facility data is that they are a convenience sample. To validate the data, they must be associated with malaria research metrics, such as the malaria parasite rate.

Environmental Data

We understand the propensity for malaria as a changing baseline that has been modified by control. In this paradigm, we seek to understand the relationship between malaria and the environment.