Counterfactual Baseline

A Changing Baseline Modified by Vector Control

Given a time series describing the parasite rate in a population, we fit a model to a PR time series to understand malaria as a changing baseline that has been modified by vector control.

Suppose we start with a time series that we want to analyze. We have two bits of data: the imputed PfPR over some time period, and the timing of vector control. We start out with a model that has been fitted to data (see The History of Malaria). At this point, the model was fit without using any information about the history of vector control. What we want to do now is to construct a model that reconstructs this time series as a counterfactual baseline with that includes interventions with estimated effect sizes.

library(ramp.xds)
library(ramp.uganda)
library(ramp.control)
library(ramp.work)
fit_mod <- readRDS("fit_mod.rds")

#devtools::load_all("~/git/ramp.control")
#devtools::load_all("~/git/ramp.work")

Consider the following example. The green vertical lines are mass distributions of ITNs, the type listed above (PBO or Royal Guard); the blue vertical lines are IRS, the insecticide tagged above (Fludora Fusion or Actellic). We note that the first ITN distribution, with PBO nets called PBO 0 was about a year before our time series starts. The next two ITN distributions, called PBO 1 and PBO 2 occurred on days 776 and 2175, respectively. Another ITN distribution, with Royal Guard nets, RG occurred on day 3240. This district also got IRS on days 2861, 3256, and 3622: the first one, Flu Fu was done with Fludora Fusion; the next two, called Act 1 and Act 2 used Actellic. This is a picture of what happened: the data in black with a fitted model in red:

The analysis we present is largely descriptive. These are all the data we have to describe malaria in a population of around 200,000 people. Our goal is to do the best we can with the data we’ve got so that we can develop sensible policy advice. Constructive criticism is welcome, but we’ve got a round file for other kinds of comments.

A Counterfactual Baseline

To reconstruct the counterfactual baseline, we will go through the list of interventions one at a time, and we need to make a determination of the effect size of the intervention, contingent on what would have happened otherwise.

The core challenge is that we need to estimate two things from one time series, so there is some intrinsic uncertainty.

The time series that was observed could have happened in many ways. We thus need an algorithm capable of propagating that uncertainty. The algorithm works like this:

Build a model and fit it to the data.
For the first intervention, reconstruct history as a baseline, modified by control:
- Impute a baseline
- Contingent on the imputed baseline, estimate the effect size
For the next intervation, estimate the effect size, contingent on the previous

To show how this works, we will walk through the time series for the district above.

Our goal here is to walk through the analysis to draw attention to the issues.

PBO 1

The first ITN distribution, PBO 0 occurred 351 days before our first data point, so it is possible that what we’re seeing at the beginning is a result of that distribution. The second ITN distribution, the one we want to assess, occurred on day 776.

In this model, the trend is configured as a spline that modifies a signal with a mean and a seasonal pattern. The spline knots and nodes are:

rbind(fit_mod$Lpar[[1]]$trend_par$tt[2:5], 
round(1000*fit_mod$Lpar[[1]]$trend_par$yy[2:5])/1000)

        [,1]    [,2]    [,3]     [,4]
[1,] 149.000 514.000 879.000 1244.000
[2,]   0.829   0.916   0.889    0.942

That third node is the one we want to consider replacing. The mean value of the two that precede it are:

mean(fit_mod$Lpar[[1]]$trend_par$yy[c(2,3)])

[1] 0.8726043

The fitted value we would want to replace is:

fit_mod$Lpar[[1]]$trend_par$yy[4]

[1] 0.888744

Based on this, we conclude it is pointless to try and fit an effect size to an intervention model. The model fitting would surely return no effect. For this reconstruction, we thus conclude that PBO 1 did not have any effect, and we move on.

PBO 2

The next ITN distribution, called PBO 1 is on day 2,175.

Now, our task of reconstructing a baseline is more complicated. The fitted values from the first 6 years of data, and it looks like we’ve got a trend:

knots <- fit_mod$Lpar[[1]]$trend_par$yy[2:7]
tm <- c(1:6)
plot(tm, knots)
llm <- lm(knots~tm)
abline(llm)

Malaria prevalence has increased by about 25% over six years of observation. Using a linear model, the trend is already statistically significant.

summary(llm)


Call:
lm(formula = knots ~ tm)

Residuals:
        1         2         3         4         5         6 
-0.011234  0.038933 -0.025836 -0.009866 -0.002322  0.010325 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.802930   0.023361  34.370 4.28e-06 ***
tm          0.037217   0.005999   6.204  0.00343 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02509 on 4 degrees of freedom
Multiple R-squared:  0.9059,    Adjusted R-squared:  0.8823 
F-statistic: 38.49 on 1 and 4 DF,  p-value: 0.003433

With only six years, it’s hard to have a lot of confidence about what we would expect to happen next.

In fact, the upward trend does not continue, but the inflection point coincides with PBO 2. What happened? Did the trend stop on its own, or did PBO 2 reverse the upwards trend?

At this point, after the nets go out, the upward trend stops. Was the trend real?

We don’t learn a ton more with hindsight, because what happens next is also modified by vector control: a little less than two years later, the district implements IRS with Fludora Fusion. a year after that, after mass distributing Royal Guard nets and IRS with Actellic, malaria prevalence falls to around 20% of its long-term average. Here, we contrast two baseline scenarios: the projected linear trend (red); and regression to the mean (blue). These are compared to the fitted model (black). The solid vertical line segment marks the day of the intervention, and the dashed vertical line is one year later.

impute_baseline_linear = function(ix, model){
  yy <- head(model$Lpar[[1]]$trend_par$yy, ix-1)
  tt <- head(model$Lpar[[1]]$trend_par$tt, ix-1)
  yyv <- yy[which(tt>0)]
  llm <- lm(yyv ~ c(1:length(yyv)))
  c(yyv, tail(yyv, 1) + llm$coef[2]) 
}
plot(impute_baseline_linear(8, fit_mod))

projected <- .802930 + .0372*c(7:9)
projected

[1] 1.06333 1.10053 1.13773

fit_mod1 <- fit_mod
fit_mod1$Lpar[[1]]$trend_par$yy[8:10] <- projected
fit_mod1$Lpar[[1]]$F_trend <- 
      make_function(fit_mod1$Lpar[[1]]$trend_par)
fit_mod1 <- xds_solve(fit_mod1, 2550, 10)

fit_mod2 <- fit_mod
fit_mod2$Lpar[[1]]$trend_par$yy[8:10] <- 
  mean(fit_mod$Lpar[[1]]$trend_par$yy[2:7])
fit_mod2$Lpar[[1]]$F_trend <- 
  make_function(fit_mod2$Lpar[[1]]$trend_par)
fit_mod2 <- xds_solve(fit_mod2, 2550, 10)

xds_plot_PR(fit_mod1, clrs = "darkred")
xds_plot_PR(fit_mod2, clrs = "darkblue", add=TRUE)
xds_plot_PR(fit_mod, add=TRUE)
segments(2175, 0, 2175, 1)
segments(2540, 0, 2540, 1, lty = 2)

If we used the linear model value for the baseline, the PBO net distribution would need to explain a 20% reduction in PfPR (darkred). If we assume regression to the mean from the previous six years (darkblue), the reduction was about 6%; the observed value in the seventh year is close to the average value observed in the previous six.

fit_mod1$Lpar[[1]]$trend_par$yy[8]/
  fit_mod$Lpar[[1]]$trend_par$yy[8]

[1] 1.208367

mean(fit_mod$Lpar[[1]]$trend_par$yy[2:7])/
  fit_mod$Lpar[[1]]$trend_par$yy[8]

[1] 1.060474

We need to make a decision, now, about how to handle the effect size of PBO 2 on malaria prevalence. This history already assumes that PBO 1 had no effect, and a 6% change in PfPR is not outside the normal variability. A reasonable conclusion is thus that it had no effect.

Flu Fu

686 days after PBO 2, on day 2,861, the district got IRS with Fludora Fusion. Once again, we create a baseline by setting the values from the previous 7 years, we simulate forward one year and we get this picture (blue is the putative baseline).

fit_mod3 <- fit_mod
fit_mod3$Lpar[[1]]$trend_par$yy[9:10] <- 
  mean(fit_mod$Lpar[[1]]$trend_par$yy[2:8])
fit_mod3$Lpar[[1]]$F_trend <- 
  make_function(fit_mod3$Lpar[[1]]$trend_par)
fit_mod3 <- xds_solve(fit_mod3, 3230, 10)
xds_plot_PR(fit_mod3, clrs = "darkblue") 
xds_plot_PR(fit_mod, clrs = grey(0.9), add=TRUE) 
segments(2861, 0, 2861, 1)
segments(3226, 0, 3226, 1)

ACT 1 + RG

In the last step, In modeling the effect sizes of the interfNow, we have a different problem. We have a massive change in malaria following the implementation of two kinds of vector control, but we are not certain about which one did the work.

Act 2

While there are no data, we can make a forecast of the effects of the actellic IRS round, Act 2, that could help us to resolve some of our open questions raised by the concurrence of RG and Act 1. With these data, at least, we will not be able to evaluate Act 2. Maybe next year.

Algorithm

We want to write an algorithm that implements, in some reasonable ways, all the steps described above:

Pseudocode

INPUTS

We pass a PfPR time series;
We pass a list of the vector control interventions:
- IRS rounds by type, timing, and coverage;
- Bed Net rounds by type, timing, and coverage;

ALGORITHM

We set up a base model and fit it to the time series, a history of malaria;
We reconstruct the history of malaria as a changing baseline modified by control;

OUTPUTS

We return two models:

A counterfatual baseline model. This is the counterfactual baseline – what would have happened with no interventions;
A full model, including the counterfactual baseline plus a configured set of interventions with effect sizes. The full model should fit the data about as well as the history model.

The algorithm works like this:

Fit a model to the data: the history model.
Set up a model with multiple rounds of control, setting coverage to 0.
For every intervention, in order modify the history model:
- Replace the knot that follows the mass distribution
- Set coverage to the observed value
- Fit the contact parameter
Return the two models:
- the baseline, modified by control model, as fitted in 3
- the baseline with no interventions.

Flexibility and Constraints

The algorithm works like the process we described, but we should acknowledge two concerns:

There are usually several reasonable ways to reconstruct the baseline. The replacement knot value is:
- the mean of the previous \(n\) knots
- the median of the previous \(n\) knots
- a linear projection of the previous \(n\) knots
- subsampled from the previous \(n\) knots
- drawn from a distribution fitted to the previous \(n\) knots
- done in some other reasonable way.
When two or more modes of vector control are implemented at the same time, we can not estimate the effect sizes for the individual interventions, so we estimate a joint effect size.

Example

We have already fitted a model, so now we set up:

The model fitting was described in [History]

fit_mod <- readRDS("fit_mod.rds")

fit_mod$Lpar[[1]]$trend_par

$tt
 [1] -216  149  514  879 1244 1609 1974 2339 2704 3069 3434 3799

$yy
 [1] 0.8289127 0.8289127 0.9162959 0.8887440 0.9419302 0.9866906 1.0365552
 [8] 0.8799728 0.9380405 0.7780241 0.1847500 1.7420531

attr(,"class")
[1] "spline2"

We do only two rounds of IRS. The last one is too late to be evaluated.

get_irs_jdates(dname)

[1] 2861 3256 3622

make_irs_rounds(dname)

[[1]]
[1] "Terego District"

$t_init
[1] 2861 3256

$coverage
[1] 0 0

$type
[1] "fludora_fusion" "actellic"

Jinja District never had any IRS rounds, so make_irs_rounds returns a null object.

make_irs_rounds("Jinja District")

$dname
[1] "Jinja District"

$t_init
NULL

$coverage
NULL

$type
NULL

Apac District had some IRS rounds, but all of them happened before our evaluation period, so make_irs_rounds returns a null object.

make_irs_rounds("Apac District")

$dname
[1] "Apac District"

$t_init
NULL

$coverage
NULL

$type
NULL

irs_rounds <- make_irs_rounds(dname)
irs_rounds

[[1]]
[1] "Terego District"

$t_init
[1] 2861 3256

$coverage
[1] 0 0

$type
[1] "fludora_fusion" "actellic"

work_mod <- setup_irs(fit_mod, coverage_name="multiround", coverage_opts=irs_rounds, effect_sizes_name = "simple")

get_itn_jdates(dname)

[1] 3240 -200  776 2175

make_bednet_rounds(dname) -> itn_rounds
itn_rounds

[[1]]
[1] "Terego District"

$t_init
[1] 3240  776 2175

$coverage
[1] 0 0 0

$type
[1] "pbo" "pbo" "pbo"

work_mod <- setup_bednets(work_mod, coverage_name = "multiround", coverage_opts = itn_rounds, effect_sizes_name = "lemenach")

work_mod <- make_vc_event_list(work_mod)
work_mod$vc_events$summary

  time ix    type conflict
1  776  4 bednets    FALSE
2 2175  8 bednets    FALSE
3 2861 10     irs    FALSE
4 3240 11 bednets     TRUE
5 3256 11     irs     TRUE

recon <- reconstruct_baseline(work_mod, prts$pfpr, prts$jdate, irs_rounds, itn_rounds, "mean")
saveRDS(recon, "recon.RDS")

recon <- readRDS("recon.rds")

baseline <- setup_irs(recon, coverage_name="multiround", coverage_opts=irs_rounds, effect_sizes_name = "simple")
baseline <- setup_bednets(baseline, coverage_name = "multiround", coverage_opts = itn_rounds, effect_sizes_name = "lemenach")
baseline <- xds_solve(baseline, times = c(-365, 0, prts$jdate))

recon <- xds_solve(recon, times = c(-365, 0, prts$jdate))
profile_pr(recon, prts)

xds_plot_PR(recon)
xds_plot_PR(baseline, clrs="darkblue", add=TRUE)