The Counterfactual Baseline

A Changing Baseline Modified by Vector Control


Given a time series describing the parasite rate in a population, we fit a model to a PR time series to understand malaria as a changing baseline that has been modified by vector control.


Suppose we start with a time series that we want to analyze. We have two bits of data: the imputed PfPR over some time period, and the timing of vector control. We start out with a model that has been fitted to data (see The History of Malaria). At this point, the model was fit without using any information about the history of vector control. What we want to do now is to construct a model that reconstructs this time series as a counterfactual baseline with that includes interventions with estimated effect sizes.

library(ramp.xds)
library(ramp.uganda)
fit_mod <- readRDS("fit_mod.rds")

Consider the following example. The green vertical lines are mass distributions of ITNs, the type listed above (PBO or Royal Guard); the blue vertical lines are IRS, the insecticide tagged above (Fludora Fusion or Actellic). We note that the first ITN distribution, with PBO nets called PBO 0 was about a year before our time series starts. The next two ITN distributions, called PBO 1 and PBO 2 occurred on days 776 and 2175, respectively. Another ITN distribution, with Royal Guard nets, RG occurred on day 3240. This district also got IRS on days 2861, 3256, and 3622: the first one, Flu Fu was done with Fludora Fusion; the next two, called Act 1 and Act 2 used Actellic. This is a picture of what happened: the data in black with a fitted model in red:

The analysis we present is largely descriptive. These are all the data we have to describe malaria in a population of around 200,000 people. Our goal is to do the best we can with the data we’ve got so that we can develop sensible policy advice. Constructive criticism is welcome, but we’ve got a round file for other kinds of comments.

A Counterfactual Baseline

To reconstruct the counterfactual baseline, we will go through the list of interventions one at a time, and we need to make a determination of the effect size of the intervention, contingent on what would have happened otherwise.

The core challenge is that we need to estimate two things from one time series, so there is some intrinsic uncertainty.

The time series that was observed could have happened in many ways. We thus need an algorithm capable of propagating that uncertainty. The algorithm works like this:

  • Build a model and fit it to the data.

  • For the first intervention, reconstruct history as a baseline, modified by control:

    • Impute a baseline

    • Contingent on the imputed baseline, estimate the effect size

  • For the next intervation, estimate the effect size, contingent on the previous

To show how this works, we will walk through the time series for the district above.

Our goal here is to walk through the analysis to draw attention to the issues.

PBO 1

The first ITN distribution, PBO 0 occurred 351 days before our first data point, so it is possible that what we’re seeing at the beginning is a result of that distribution. The second ITN distribution, the one we want to assess, occurred on day 776.

In this model, the trend is configured as a spline that modifies a signal with a mean and a seasonal pattern. The spline knots and nodes are:

rbind(fit_mod$Lpar[[1]]$trend_par$tt[2:5], 
round(1000*fit_mod$Lpar[[1]]$trend_par$yy[2:5])/1000)
        [,1]    [,2]    [,3]     [,4]
[1,] 149.000 514.000 879.000 1244.000
[2,]   0.829   0.916   0.889    0.942

That third node is the one we want to consider replacing. The mean value of the two that precede it are:

mean(fit_mod$Lpar[[1]]$trend_par$yy[c(2,3)])
[1] 0.8726043

The fitted value we would want to replace is:

fit_mod$Lpar[[1]]$trend_par$yy[4]
[1] 0.888744

Based on this, we conclude it is pointless to try and fit an effect size to an intervention model. The model fitting would surely return no effect. For this reconstruction, we thus conclude that PBO 1 did not have any effect, and we move on.

PBO 2

The next ITN distribution, called PBO 1 is on day 2,175.

Now, our task of reconstructing a baseline is more complicated. The fitted values from the first 6 years of data, and it looks like we’ve got a trend:

knots <- fit_mod$Lpar[[1]]$trend_par$yy[2:7]
tm <- c(1:6)
plot(tm, knots)
llm <- lm(knots~tm)
abline(llm)

Malaria prevalence has increased by about 25% over six years of observation. Using a linear model, the trend is already statistically significant.

summary(llm)

Call:
lm(formula = knots ~ tm)

Residuals:
        1         2         3         4         5         6 
-0.011234  0.038933 -0.025836 -0.009866 -0.002322  0.010325 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.802930   0.023361  34.370 4.28e-06 ***
tm          0.037217   0.005999   6.204  0.00343 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02509 on 4 degrees of freedom
Multiple R-squared:  0.9059,    Adjusted R-squared:  0.8823 
F-statistic: 38.49 on 1 and 4 DF,  p-value: 0.003433

With only six years, it’s hard to have a lot of confidence about what we would expect to happen next.

In fact, the upward trend does not continue, but the inflection point coincides with PBO 2. What happened? Did the trend stop on its own, or did PBO 2 reverse the upwards trend?

At this point, after the nets go out, the upward trend stops. Was the trend real?

We don’t learn a ton more with hindsight, because what happens next is also modified by vector control: a little less than two years later, the district implements IRS with Fludora Fusion. a year after that, after mass distributing Royal Guard nets and IRS with Actellic, malaria prevalence falls to around 20% of its long-term average. Here, we contrast two baseline scenarios: the projected linear trend (red); and regression to the mean (blue). These are compared to the fitted model (black). The solid vertical line segment marks the day of the intervention, and the dashed vertical line is one year later.

projected <- .802930 + .0372*c(7:9)
fit_mod1 <- fit_mod
fit_mod1$Lpar[[1]]$trend_par$yy[8:10] <- projected
fit_mod1$Lpar[[1]]$F_trend <- 
      make_function(fit_mod1$Lpar[[1]]$trend_par)
fit_mod1 <- xds_solve(fit_mod1, 2550, 10)

fit_mod2 <- fit_mod
fit_mod2$Lpar[[1]]$trend_par$yy[8:10] <- 
  mean(fit_mod$Lpar[[1]]$trend_par$yy[2:7])
fit_mod2$Lpar[[1]]$F_trend <- 
  make_function(fit_mod2$Lpar[[1]]$trend_par)
fit_mod2 <- xds_solve(fit_mod2, 2550, 10)

xds_plot_PR(fit_mod1, clrs = "darkred")
xds_plot_PR(fit_mod2, clrs = "darkblue", add=TRUE)
xds_plot_PR(fit_mod, add=TRUE)
segments(2175, 0, 2175, 1)
segments(2540, 0, 2540, 1, lty = 2)

If we used the linear model value for the baseline, the PBO net distribution would need to explain a 20% reduction in PfPR (darkred). If we assume regression to the mean from the previous six years (darkblue), the reduction was about 6%; the observed value in the seventh year is close to the average value observed in the previous six.

fit_mod1$Lpar[[1]]$trend_par$yy[8]/
  fit_mod$Lpar[[1]]$trend_par$yy[8]
[1] 1.208367
mean(fit_mod$Lpar[[1]]$trend_par$yy[2:7])/
  fit_mod$Lpar[[1]]$trend_par$yy[8]
[1] 1.060474

We need to make a decision, now, about how to handle the effect size of PBO 2 on malaria prevalence. This history already assumes that PBO 1 had no effect, and a 6% change in PfPR is not outside the normal variability. A reasonable conclusion is thus that it had no effect.

Flu Fu

686 days after PBO 2, on day 2,861, the district got IRS with Fludora Fusion. Once again, we create a baseline by setting the values from the previous 7 years, we simulate forward one year and we get this picture (blue is the putative baseline).

fit_mod3 <- fit_mod
fit_mod3$Lpar[[1]]$trend_par$yy[9:10] <- 
  mean(fit_mod$Lpar[[1]]$trend_par$yy[2:8])
fit_mod3$Lpar[[1]]$F_trend <- 
  make_function(fit_mod3$Lpar[[1]]$trend_par)
fit_mod3 <- xds_solve(fit_mod3, 3230, 10)
xds_plot_PR(fit_mod3, clrs = "darkblue") 
xds_plot_PR(fit_mod, clrs = grey(0.9), add=TRUE) 
segments(2861, 0, 2861, 1)
segments(3226, 0, 3226, 1)

ACT 1 + RG

In the last step, In modeling the effect sizes of the interfNow, we have a different problem. We have a massive change in malaria following the implementation of two kinds of vector control, but we are not certain about which one did the work.

Act 2

While there are no data, we can make a forecast of the effects of the actellic IRS round, Act 2, that could help us to resolve some of our open questions raised by the concurrence of RG and Act 1 .

Reconstruction Algorithm

We want to write an algorithm to construct a fitted model that describes malaria as a changing baseline modified by control.

We start with a fitted model, but we construct two different models:

  • CB – a counterfatual baseline model. This is the counterfactual baseline, with no interventions;

  • CB+VC - the CB model plus a configured set of interventions with effect sizes. The CB+VC model should fit the data about as well as the original model.

The algorithm works like the process we described, but we should acknowledge two concerns:

  • There are usually several reasonable ways to reconstruct the baseline;

  • When two or more modes of vector control are implemented at the same time, we can not estimate the effect sizes for the individual interventions, so we estimate a joint effect size.

The algorithm works like this:

  1. Fit a model to the data, \({\cal M}_0\)

  2. For every intervention, construct an ordered set of models \({\cal M}_i\) start with:

    • Construct a Counterfactual Baseline

    • Estimate an Effect Size