I have this bootstrap:
library(ggplot2)
n <- 30
set.seed(1)
orig_mean <- 1
orig_sd <- 2
X <- rnorm(n, mean = orig_mean, sd = orig_sd)
set.seed(NULL)
# Bootstrapping
m_reps <- 20000
boot_means <- numeric(m_reps)  # preallocate instead of growing with append()
boot_vars <- numeric(m_reps)
for (i in seq_len(m_reps)) {
  # resample the observed data with replacement
  current_sample <- sample(X, n, replace = TRUE)
  boot_means[i] <- mean(current_sample)
  boot_vars[i] <- var(current_sample)
}
boot_means <- as.data.frame(boot_means) #for plotting
boot_vars <- as.data.frame(boot_vars)
The central limit theorem suggests that the bootstrapped means $\bar{x}$ should be approximately normal, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n} \right)$. We can check this by plotting the density of $N\left(1, \frac{2^2}{30} \right)$ on top of the histogram of the bootstrapped means:
ggplot(boot_means, aes(x = boot_means)) +
  geom_histogram(
    aes(y = after_stat(density)), bins = 20, color = "black", fill = "white") +
  stat_function(fun = dnorm, args = list(
    mean = orig_mean, sd = orig_sd / sqrt(n)))
The bell curve does not precisely line up with the histogram, but it is close.
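One likely reason for the mismatch: each bootstrap resample is drawn from X rather than from the population, so conditional on X the bootstrapped means are centred on mean(X) with variance ((n - 1) / n) * var(X) / n. A minimal sketch of that comparison follows. It uses a smaller m_reps than above purely for speed, keeps the seed fixed for reproducibility, and the names boot_centre, boot_se and p_mean are made up for illustration:

```r
library(ggplot2)

# Recreate the sample from the question
n <- 30
set.seed(1)
X <- rnorm(n, mean = 1, sd = 2)

# Bootstrap the mean (fewer replicates than in the question, for speed)
m_reps <- 5000
boot_means <- replicate(m_reps, mean(sample(X, n, replace = TRUE)))

# Conditional on X, the bootstrap mean has expectation mean(X) and
# variance ((n - 1) / n) * var(X) / n
boot_centre <- mean(X)
boot_se <- sqrt((n - 1) / n * var(X) / n)

# Same histogram as before, but the curve uses the sample-based values
p_mean <- ggplot(data.frame(boot_means), aes(x = boot_means)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 color = "black", fill = "white") +
  stat_function(fun = dnorm,
                args = list(mean = boot_centre, sd = boot_se))
print(p_mean)
```

If this curve tracks the histogram more closely than the one built from orig_mean and orig_sd, the gap seen above is mostly the difference between the sample and the population.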
I would like to do something like this with the variance. The variance histogram is:
ggplot(boot_vars, aes(x = boot_vars)) +
  geom_histogram(
    bins = 20, color = "black", fill = "white")
This is where I'm stuck. Is that a chi-squared distribution? And how do I set up a "target" curve like I did with the bootstrapped means?
What I'm really looking for is the distribution of the bootstrapped variances. From there the coding should be fairly straightforward.
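For reference, if the data are normal, the sample variance satisfies $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$, so one candidate target is a scaled chi-squared density. The sketch below assumes that this result carries over at least roughly to the bootstrapped variances. That need not hold, because resampling from X reflects the sample's own variance and tail behaviour. dvar_chisq is a hypothetical helper written for this example, not a function from any package, and m_reps is reduced for speed:

```r
library(ggplot2)

# Recreate the sample from the question
n <- 30
set.seed(1)
orig_sd <- 2
X <- rnorm(n, mean = 1, sd = orig_sd)

# Bootstrap the variance (fewer replicates than in the question, for speed)
m_reps <- 5000
boot_vars <- replicate(m_reps, var(sample(X, n, replace = TRUE)))

# Hypothetical helper: density of S^2 when (n - 1) * S^2 / sigma2 ~ chi^2_{n-1}.
# The factor (n - 1) / sigma2 is the Jacobian of the change of variables.
dvar_chisq <- function(v, n, sigma2) {
  dchisq(v * (n - 1) / sigma2, df = n - 1) * (n - 1) / sigma2
}

# Assumption: use the population variance as the target, as was done for
# the means; var(X) could be substituted to centre the curve on the sample.
sigma2 <- orig_sd^2

# The histogram must be on the density scale for the overlay to be comparable
p_var <- ggplot(data.frame(boot_vars), aes(x = boot_vars)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 color = "black", fill = "white") +
  stat_function(fun = dvar_chisq, args = list(n = n, sigma2 = sigma2))
print(p_var)
```

The curve and histogram may well disagree in both location and spread; how far apart they sit is itself informative about whether the scaled chi-squared is a reasonable description here.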