---
title: "SISG Bayes: Key 7"
author: "Ken Rice"
date: "`r Sys.Date()`"
output: pdf_document
---

```{r setup, echo=FALSE, message=FALSE, warning=FALSE}
library(knitr)
opts_chunk$set(collapse=TRUE, fig.align='center', tidy=TRUE, tidy.opts=list(blank=TRUE, width.cutoff=40), warning=FALSE,message=FALSE,cache=TRUE)
```

# Q1 

*The site also contains an R script fitting a widely-used Bayesian model, in Stan or JAGS, using data from a blood pressure GWAS meta-analysis, combining linear regression estimates across 6 studies.  It assumes the sampling model for the observed $\hat{\beta}_i$ is $N(\beta_i, \sigma_i^2)$ with known $\sigma_i^2$. The prior for the $\beta_i$ is specified hierarchically, and (as the data have been suitably scaled) can be written as*
\begin{eqnarray*}
		\beta_i & \sim & N(\mu,\tau^2)\mbox{ for each }i,\\
		\mu &\sim & N(0, 1), \\
		\tau & \sim & U(0, T),
\end{eqnarray*}
*i.e. a flat distribution between 0 and a known, fixed value $T$.
For $T$=1/5,1 and 5, what is the posterior for $\beta_F=(\sum_{i=1}^6\beta_i/\sigma_i^2)/(\sum_{i=1}^61/\sigma_i^2)$? For the same values of $T$, what is the posterior for $\mu$, the mean of the random effects distribution?*

In the code below, we perform non-Bayesian meta-analysis, and set up the Stan version of the meta-analysis code supplied;

```{r}
study.names <- c("AGES","ARIC","CHS","FHS","RS","RES")
betahat <- c( -0.37, -0.23, -0.31, -0.79, -0.13, -0.96) 
stderr  <- c(  0.26,  0.16,  0.33,  0.16,  0.25,  0.39)
n       <- c(  3219,  8047,  3277,  8096,  4737,  1760)

#install.packages("metafor")
library("metafor")
# Non-Bayesian meta-analysis, default Fixed Effects and Random Effects versions:
rma1 <- rma.uni(yi=betahat, vi=stderr^2, method="FE")
rma2 <- rma.uni(yi=betahat, vi=stderr^2, method="DL")
forest(rma1)
forest(rma2)

# Setting up Bayes meta in Stan

cat(file="metaanalysis.stan", "
data {
  int p; //the number of studies
  vector[p] betahat; // study-specific estimates
  vector[p] sigma; // study-specific standard errors
  real T; // upper bound of uniform prior on tau
}
parameters {
  vector[p] beta; // study-specific parameters  
  real mu; // mean of the mixing distribution for beta[i]'s
  real tau; // SD of the mixing distribution
}
transformed parameters {
  vector[p] invsigma2;
  vector[p] betafpart;
  real betaf;
  for(i in 1:p){ invsigma2[i] = pow(sigma[i],-2);}
  for(i in 1:p){ betafpart[i] = beta[i]*invsigma2[i];}
  betaf = sum(betafpart[1:p])/sum(invsigma2[1:p]); // precision-weighted mean of study-specific effects
}
model {  
  for(i in 1:p){
	betahat[i] ~ normal(beta[i], sigma[i]);
	beta[i] ~ normal(mu, tau);
   }
   mu ~ normal(0,1);
   tau ~ uniform(0,T);
}
")
```

Now to actually do the MCMC, storing the results, for each value of $T$:

```{r}
library("rstan")
stan.meta1 <- stan(file = "metaanalysis.stan", 
data = list(betahat=betahat, sigma=stderr, p=6, T=1),
iter = 100000, chains = 1, pars=c("betaf", "mu"))
stan.meta0.2 <- stan(file = "metaanalysis.stan", 
data = list(betahat=betahat, sigma=stderr, p=6, T=0.2),
iter = 100000, chains = 1, pars=c("betaf", "mu"))
stan.meta5 <- stan(file = "metaanalysis.stan", 
data = list(betahat=betahat, sigma=stderr, p=6, T=5),
iter = 100000, chains = 1, pars=c("betaf", "mu"))

print(stan.meta1)
print(stan.meta0.2)
print(stan.meta5)
```

And writing some code to plot the credible intervals:

```{r}
do.one <- function(mmm){
	mypos <- 6:1 + c(-1,1,-1,1,-1,1)*0.2
	plot(0,0,xlim=c(-0.9,0), ylim=c(1, 6), axes=FALSE, xlab="value", ylab="")
	segments(mmm[,2], mypos, mmm[,3], mypos)
	points(y=mypos, x=mmm[,1], pch=19)
	axis(side=1)
	axis(side=2, las=1, at=mypos, c(
		expression(beta[F]), expression(mu),
		expression(beta[F]), expression(mu),
		expression(beta[F]), expression(mu)
))
	mtext(side=2, las=1, line=2, at=c(5.5, 3.5, 1.5), c("T=0.2","T=1","T=5"))
	}

#par(mar=c(4,6,0,0)+0.2)
do.one( rbind(
	summary(stan.meta0.2)$summary[1:2,c(6,4,8)],
	summary(stan.meta1)$summary[1:2,c(6,4,8)],
	summary(stan.meta5)$summary[1:2,c(6,4,8)])
)
```

Compare these to the default non-Bayesian meta-analysis' estimates of $\beta_F$ (fixed effects) and $\mu$ (random effects)

```{r}
round(unlist(rma1[2:7]),2)
round(unlist(rma2[2:7]),2)
```

We see that, with respect to the prior on $\tau$, inference on $\beta_F$ is *much* more stable than inference on $\mu$.

# Q2

*Using each of the same priors as in Q1, what is the marginal prior for any individual $\beta_i$? For the same values of $T$ what is the marginal prior for any $\beta_i-\beta_j$, i.e. the difference between two study parameters? Do any differences surprise you?*

*Hint: for Q2 use the Monte Carlo method in base R: take a large sample from the prior, and make a histogram with many bars.*

The code below, for a specified value of $T$, samples a million observations from the prior for $\beta_i$ and $\beta_j$ and makes the histograms suggested; of the distributions $\beta_i$ and also $\beta_i-\beta_j$.  

To show it can be done in this setting, we also give numeric integration code to do the same thing -- working out the marginal distribution of $\beta_i$ and $\beta_i-\beta_j$, i.e. averaging over the prior on $\mu, \tau$.

```{r}

par(mfrow=c(3,2))
par(mar=c(4,4,1,1))
B <- 1E6
	set.seed(4)
do.one <- function(T){
	mu  <- rnorm(B, 0, 1)
	tau <- runif(B, 0, T)
	beta1 <- rnorm(B,mean=mu,sd=tau)
	beta2 <- rnorm(B,mean=mu,sd=tau)
	v1 <- function(x){
	integrate(function(t){dnorm(x,0,sqrt(1+t^2))*dunif(t,0,T)},0,T)$value }
	v1 <- Vectorize(v1)
	hist(beta1, xlab=expression(beta[1]), main="", freq=FALSE, ylim=c(0,v1(0)),n=60)
#	curve(dnorm(x,0,sqrt(1+T^2)), min(beta1), max(beta1), add=TRUE, col=4)
	curve(v1, min(beta1), max(beta1), add=TRUE, col=2)
	legend("topleft", bty="n", lty=0, paste("T =",T)) 

	v2 <- function(x){
	integrate(function(t){dnorm(x,0,sqrt(2*t^2))*dunif(t,0,T)},0,T)$value }
	v2 <- Vectorize(v2)
	hist(beta1-beta2, xlab=expression(beta[1]-beta[2]), main="", freq=FALSE, ylim=c(0,v2(0.01)), n=60)
#	curve(dnorm(x,0,sqrt(2*T^2)), min(beta1-beta2), max(beta1-beta2), add=TRUE, col=4)
	curve(v2, min(beta1-beta2), max(beta1-beta2), add=TRUE, col=2)
	legend("topleft", bty="n", lty=0, paste("T =",T)) 
}
```

And running the code for the three values of $T$:

```{r}
do.one(1/5)
```

```{r}
do.one(1)
```

```{r}
do.one(5)
```

The marginal prior on each $\beta_i$ is very close to Normal for small $T$, and slightly heavier-tailed at larger $T$. The impact on the *difference* $\beta_i-\beta-j$ is more noticeable: the marginal prior is strongly peaked near zero, particularly for large $T$, with a non-smooth mode there. 

This may seem non-intuitive, but note that the prior on $\tau$ indicates quite strong support for all the $\beta_i$ being very close ($\tau\approx 0$) and also strong support for them all being far apart ($\tau \gg 0$) -- so it follows that the prior on $\beta_i-\beta_j$ will have something like a spike near zero, and a heavy tail on both sides.