This document demonstrates the implementation of the Empirical Bootstrap methods discussed in Section 2 of the lecture notes. We will focus on estimating the population median \(\theta\) and constructing various confidence intervals.
We will generate synthetic data from a skewed distribution (Chi-squared) to highlight the differences between the interval methods, particularly how they handle asymmetry.
We simulate \(n=50\) observations from a \(\chi^2_3\) distribution. The true median of a \(\chi^2_k\) distribution is approximately \(k(1 - \frac{2}{9k})^3\) (the Wilson–Hilferty approximation).
n <- 50
df <- 3
X <- rchisq(n, df = df)
# True population median
true_median <- qchisq(0.5, df = df)
# Our estimator: Sample Median
theta_hat <- median(X)
cat("Sample Size (n):", n, "\n")
## Sample Size (n): 50
cat("Sample Median:", round(theta_hat, 4), "\n")
## Sample Median: 1.7681
cat("True Population Median:", round(true_median, 4), "\n")
## True Population Median: 2.366
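As a quick sanity check, the closed-form approximation quoted above can be compared against R's exact quantile function; nothing here is assumed beyond base R.

```r
# Check the approximation k * (1 - 2/(9k))^3 against the exact median
k <- 3
approx_median <- k * (1 - 2 / (9 * k))^3
exact_median  <- qchisq(0.5, df = k)
cat("Approximate median:", round(approx_median, 4), "\n")  # ~2.3815
cat("Exact median:", round(exact_median, 4), "\n")         # ~2.366
```

For \(k = 3\) the approximation is off by only about 0.015, which is why it is a common shortcut.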
As described in Section 2, we generate \(B\) bootstrap samples by sampling with replacement from the original data \(X_1, \dots, X_n\). For each sample, we compute the bootstrap estimator \(\hat{\theta}^*_n\).
B <- 2000 # Number of bootstrap replicates
# Generate B bootstrap samples and compute medians
# This creates the distribution of \hat{\theta}^*
boot_theta <- replicate(B, {
  X_star <- sample(X, size = n, replace = TRUE)
  median(X_star)
})
# Visualizing the Bootstrap Distribution
hist(boot_theta, breaks = 30, main = "Bootstrap Distribution of Sample Median",
     xlab = expression(hat(theta)^"*"), freq = FALSE, col = "lightblue")
abline(v = theta_hat, col = "red", lwd = 2, lty = 2)
legend("topright", legend = c("Original Est."), col = c("red"), lty = 2, lwd = 2)
We can estimate the variance and Mean Squared Error (MSE) of our estimator using the bootstrap replicates.
# Bootstrap Mean
theta_bar_star <- mean(boot_theta)
# Bootstrap Variance
var_boot <- sum((boot_theta - theta_bar_star)^2) / (B - 1)
# Bootstrap MSE
# Note: this averages squared deviations of the bootstrap replicates from the
# original estimate \hat{\theta}_n, which stands in for the unknown \theta;
# it is a plug-in proxy for the true MSE of \hat{\theta}_n.
mse_boot <- mean((boot_theta - theta_hat)^2)
cat("Bootstrap Variance estimate:", round(var_boot, 5), "\n")
## Bootstrap Variance estimate: 0.22474
cat("Bootstrap MSE estimate:", round(mse_boot, 5), "\n")
## Bootstrap MSE estimate: 0.22836
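Because the data are simulated and the true density is known, we can also compare the bootstrap variance with the asymptotic variance of the sample median, \(1/(4 n f(\theta)^2)\). This check is only possible in a simulation setting; the block restates \(n\), the degrees of freedom, and the true median so it runs on its own.

```r
# Asymptotic variance of the sample median: 1 / (4 * n * f(theta)^2),
# where f is the true chi-squared density (known only because we simulated).
n <- 50; df <- 3                          # values from the simulation above
true_median <- qchisq(0.5, df = df)
f_at_median <- dchisq(true_median, df = df)
var_asymptotic <- 1 / (4 * n * f_at_median^2)
cat("Asymptotic variance of the median:", round(var_asymptotic, 5), "\n")
```

The bootstrap estimate will typically not match this exactly: the resampled median can only take values among the order statistics of the original sample, so the bootstrap variance of a median is comparatively noisy at \(n = 50\).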
We will now construct the \(1-\alpha\) confidence intervals (\(\alpha = 0.05\)) using the four methods described in Section 2.1.
alpha <- 0.05
This uses the \(\alpha/2\) and \(1-\alpha/2\) percentiles of the bootstrap distribution directly. It is transformation invariant, but it implicitly assumes that the bootstrap distribution of \(\hat{\theta}^*_n\) mirrors the sampling distribution of \(\hat{\theta}_n\).
# Quantiles of the bootstrap distribution
ci_percentile <- quantile(boot_theta, probs = c(alpha/2, 1 - alpha/2))
print(ci_percentile)
## 2.5% 97.5%
## 1.228389 2.872790
This method enforces symmetry around the original estimate \(\hat{\theta}_n\). We compute the absolute deviations \(D^{*(b)} = |\hat{\theta}^{*(b)}_n - \hat{\theta}_n|\) and take their \(1-\alpha\) quantile \(s_{1-\alpha}\).
# 1. Compute absolute deviations
D_star <- abs(boot_theta - theta_hat)
# 2. Find the (1 - alpha) quantile of these deviations
s_val <- quantile(D_star, probs = 1 - alpha)
# Using the strict notation from lecture notes:
ci_symmetric <- c(theta_hat - s_val, theta_hat + s_val)
print(ci_symmetric)
## 95% 95%
## 0.7497969 2.7863820
This relies on the asymptotic normality of the estimator. We use the standard normal critical value \(z_{1-\alpha/2}\) and the bootstrap estimate of the standard error.
z_score <- qnorm(1 - alpha/2)
se_boot <- sqrt(var_boot) # Derived from standard deviation of boot_theta
ci_normal <- c(theta_hat - z_score * se_boot,
               theta_hat + z_score * se_boot)
print(ci_normal)
## [1] 0.8389374 2.6972415
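As a cross-check on the hand-rolled intervals, the percentile and normal constructions are also available from the boot package (a recommended package bundled with R). This is a sketch: the `exists()` guard regenerates a sample only so the chunk can run standalone, and the endpoints will differ slightly from those above because fresh resamples are drawn.

```r
library(boot)  # recommended package, bundled with R

# Regenerate a sample only if this chunk is run outside the session above
if (!exists("X")) { set.seed(1); X <- rchisq(50, df = 3) }

med_fun <- function(data, idx) median(data[idx])   # statistic for boot()
boot_out <- boot(X, statistic = med_fun, R = 2000)
boot.ci(boot_out, conf = 0.95, type = c("norm", "perc"))
```

`boot.ci` also offers the basic and BCa types, which can be worth comparing on skewed data like this.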
This method generally offers the best coverage accuracy (\(O(n^{-1})\)). However, it requires a standard error estimate \(\hat{\sigma}\) for each bootstrap sample.
Note: for the median there is no simple plug-in \(\hat{\sigma}\); the asymptotic standard error involves the density \(f(\theta)\) at the median, so each bootstrap sample would need a kernel density estimate or an inner (double) bootstrap, both computationally expensive.
To illustrate the mechanics of the studentized interval without this complexity, we apply it to the sample mean of the same dataset, where \(\hat{\sigma} = s/\sqrt{n}\) is straightforward to compute.
# --- Switching to Mean for Studentized Demonstration ---
mu_hat <- mean(X)
sigma_hat <- sd(X) / sqrt(n) # Standard error of the mean
# Generate Studentized Bootstrap Statistics (T*)
t_star_replicates <- replicate(B, {
  # 1. Resample
  X_star <- sample(X, size = n, replace = TRUE)
  # 2. Compute bootstrap estimators
  mu_star <- mean(X_star)
  sigma_star <- sd(X_star) / sqrt(n)  # SE estimate for this specific bootstrap sample
  # 3. Compute t-statistic
  (mu_star - mu_hat) / sigma_star
})
# Determine quantiles of T*
# Note: the quantiles are swapped in the interval formula -- the upper
# quantile of T* gives the lower endpoint, and vice versa.
eta_lower <- quantile(t_star_replicates, probs = alpha/2)
eta_upper <- quantile(t_star_replicates, probs = 1 - alpha/2)
# Construct the interval
ci_studentized_mean <- c(mu_hat - sigma_hat * eta_upper,
                         mu_hat - sigma_hat * eta_lower)
cat("Studentized Interval for the MEAN:", ci_studentized_mean, "\n")
## Studentized Interval for the MEAN: 2.05061 3.495127
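For completeness, here is a minimal sketch of the double bootstrap mentioned earlier, applied to the median: an inner bootstrap estimates the standard error for each outer resample. The replicate counts `B_outer` and `B_inner` are hypothetical choices kept deliberately small, so this illustrates the mechanics rather than giving an accurate interval.

```r
# Sketch: studentized interval for the MEDIAN via a double bootstrap.
# WARNING: O(B_outer * B_inner) resamples -- expensive at realistic sizes.
if (!exists("X")) { set.seed(1); X <- rchisq(50, df = 3) }
n <- length(X); theta_hat <- median(X); alpha <- 0.05

B_outer <- 200   # outer resamples (small, for illustration only)
B_inner <- 50    # inner resamples used to estimate each SE

# SE of the median for the original sample, via a plain bootstrap
se_hat <- sd(replicate(B_inner, median(sample(X, n, replace = TRUE))))

t_star_med <- replicate(B_outer, {
  X_star  <- sample(X, size = n, replace = TRUE)
  # inner bootstrap: SE of the median within this particular resample
  se_star <- sd(replicate(B_inner, median(sample(X_star, n, replace = TRUE))))
  (median(X_star) - theta_hat) / se_star
})

eta <- quantile(t_star_med, probs = c(alpha/2, 1 - alpha/2))
ci_studentized_median <- c(theta_hat - se_hat * eta[2],
                           theta_hat - se_hat * eta[1])
print(ci_studentized_median)
```

In practice one would use far larger replicate counts, or a kernel density estimate of \(f(\hat{\theta})\) in place of the inner bootstrap.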
Finally, we visualize the intervals for the median computed in Sections 3.1–3.3.
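One compact way to draw the three median intervals on a common axis with base graphics (a sketch; the setup block merely recreates the objects from the sections above if the chunk is run standalone):

```r
# Standalone setup: recreate the intervals if not already in the session
if (!exists("ci_percentile")) {
  set.seed(1)
  n <- 50; X <- rchisq(n, df = 3)
  theta_hat <- median(X); true_median <- qchisq(0.5, df = 3); alpha <- 0.05
  boot_theta <- replicate(2000, median(sample(X, n, replace = TRUE)))
  ci_percentile <- quantile(boot_theta, probs = c(alpha/2, 1 - alpha/2))
  s_val <- quantile(abs(boot_theta - theta_hat), probs = 1 - alpha)
  ci_symmetric <- c(theta_hat - s_val, theta_hat + s_val)
  ci_normal <- theta_hat + c(-1, 1) * qnorm(1 - alpha/2) * sd(boot_theta)
}

# One horizontal segment per method, plus the point estimate and true value
cis <- rbind(Percentile = ci_percentile,
             Symmetric  = ci_symmetric,
             Normal     = ci_normal)
plot(NULL, xlim = range(cis), ylim = c(0.5, 3.5), yaxt = "n",
     xlab = expression(theta), ylab = "",
     main = "95% Bootstrap CIs for the Median")
axis(2, at = 1:3, labels = rownames(cis), las = 1)
segments(x0 = cis[, 1], y0 = 1:3, x1 = cis[, 2], lwd = 2)
abline(v = theta_hat, col = "red", lty = 2)     # original estimate
abline(v = true_median, col = "blue", lty = 3)  # true median (simulation only)
```

On skewed data the percentile interval typically sits asymmetrically around \(\hat{\theta}_n\), while the symmetric and normal intervals are centred on it by construction.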