---
title: "Key 2"
author: "Ken Rice"
date: "`r Sys.Date()`"
output:
  pdf_document: default
  html_document:
    df_print: paged
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Prob theory and Binomial Sampling  (Lecture 2)

## Q1
Suppose we observe data with N = 20, y = 20 and we assume a binomial likelihood with probability theta.

* What are the MLE and standard error of the MLE?

\[
MLE=\hat\theta=20/20=1 \mbox{ and estimated }se = 0
\]

* Plot the posterior distribution under a Beta(2,2) prior

```{r}
y <- n <- 20
a <- b <- 2
theta <- seq(0,1,.01)
plot(dbeta(theta,y+a,n-y+b)~theta,type="l", ylab="density", main="posterior under beta(2,2) prior")
```

* Simulate from the posterior and draw a histogram of the posterior samples.

```{r}
set.seed(4)
thsamp <- rbeta(10000,y+a,n-y+b)
hist(thsamp,xlim=c(0,1), main="sample from posterior, under beta(2,2) prior", freq=FALSE)
```

* What is the posterior median, and give a 90\% credible interval for $\theta$: evaluate these quantities in two ways, one via the `qbeta` function and the other via sampling.

```{r}
qbeta(p=c(0.05,.5,.95),y+a,n-y+b)  # exact answer
quantile(thsamp,p=c(0.05,.5,0.95)) # using posterior sample
```

* What is the MLE for the odds $\theta/(1 -\theta)$?

MLE for odds is undefined

* What is the posterior probability that the odds are greater than 100?

```{r}
oddssamp <- thsamp/(1-thsamp)
mean(oddssamp>100) # proportion of sample with value > 100
hist(oddssamp) # long right tail!
hist(log(oddssamp))
```

## Q2 

Redo the seroprevalence example with a Beta(1,9) prior on the prevalence.

```{r}
y <- 50; n <- 3300
delta <- 0.8; gamma <- 0.995
A <- delta + gamma - 1; B <- 1-gamma
MLE <- (y - n*B)/(n*A)
loglik <- function(n,y,prev,delta,gamma){
  A <- delta + gamma - 1; B <- 1-gamma
  p <- prev*A + B
  loglik <- y*log(p) + (n-y)*log(1-p)
  loglik
}
maxl <- loglik(n,y,MLE,delta,gamma)
nsim <- 1000
success <- 0
a <- 1
b <- 9

thvals <- seq(0,1,l=101)
plot( loglik(n,y,thvals,delta, gamma)~thvals, type="l")

post <- rep(NA, nsim)
set.seed(4)
while(success<nsim+1){
  U <- runif(1); theta <- rbeta(1,a,b)
  test <- loglik(n,y,theta,delta,gamma)
  if (log(U) < test - maxl) {
    success <- success + 1
    post[success] <- theta
  }
}
mean(post)
quantile(post,p=c(0.05,0.5,0.95))
hist(post, freq=FALSE)

# adding prior and (scaled) likelihood
hist(post, freq=FALSE)
curve(dbeta(x,a,b), col="blue", add=TRUE)
curve( exp(loglik(n,y,x,delta, gamma)+264), add=TRUE, col="red")

# adding prior and (scaled) likelihood, full range
hist(post, freq=FALSE, breaks=seq(0,1,l=101))
curve(dbeta(x,a,b), col="blue", add=TRUE)
curve( exp(loglik(n,y,x,delta, gamma)+264), add=TRUE, col="red")
```