Should my bootstrap function return the test statistic calculated for each sample, or the estimate?
We can bootstrap either the coefficient estimates or the test statistics, but it is better to bootstrap the $t$-statistics. If we take care how we calculate the $t$-statistic in each bootstrap sample, we increase the power of the test, as discussed in Correct creation of the null distribution for bootstrapped p-values.
Should I calculate the proportion of the test statistic/estimate above 0 or above the point estimate of the base model?
This is perhaps the most confusing step because the reasoning behind the p-value calculation differs depending on whether we bootstrap coefficient estimates or test statistics.
The bootstrap principle states that the bootstrap distribution of $\beta^*$ is close to the sampling distribution of $\hat{\beta}$, and that $\hat{\beta}$ itself is close to the true value $\beta$. This is helpful because we can construct confidence intervals for $\beta$. However, unless the true $\beta$ is indeed equal to 0, the bootstrap samples are not simulated under the null hypothesis $H_0:\beta = 0$. Instead we can "invert" a confidence interval to compute a p-value; for example, $\operatorname{Pr}\left\{\beta^* \geq 0\right\}$ is the p-value for the one-sided left-tail test (alternative $H_1:\beta < 0$).
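As a minimal sketch of this inversion, with made-up draws standing in for real bootstrapped estimates (the names `beta.star` etc. are illustrative, not taken from the code further down):

```r
# Made-up bootstrap draws: pretend the observed estimate is about 0.5.
set.seed(42)
beta.star <- rnorm(10000, mean = 0.5, sd = 0.2)

# Invert the percentile interval by asking how often the bootstrap
# distribution lands on either side of the hypothesized value 0.
mean(beta.star <= 0) # small here, since the draws sit well above 0
mean(beta.star >= 0) # close to 1
```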
The bootstrap principle also states that the distribution of $t^* = (\beta^* - \hat{\beta}) / \operatorname{se}(\beta^*)$ is close to the distribution of $t = (\hat{\beta} - \beta) / \operatorname{se}(\hat{\beta})$. This is even more helpful because the $t$-statistic is (approximately) pivotal. A pivot is a random variable whose distribution doesn't depend on the parameters. In this case, the distribution of the $t$-statistic doesn't depend on the true value of $\beta$. So while $\operatorname{E}\hat{\beta} = 0$ under the null and $\operatorname{E}\hat{\beta}\neq 0$ under the alternative, the $t$-statistic has the same distribution under the null and under the alternative. The p-value for the one-sided right-tail test is $\operatorname{Pr}\left\{t^* \geq \hat{t}\right\}$ where the $t^*$s are the bootstrapped test statistics and $\hat{t}$ is the observed test statistic.
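Here is a sketch of the pivoting with made-up numbers (in the real code below, the bootstrapped estimates and their standard errors come from refitting the model on each resample):

```r
set.seed(42)
beta.hat <- 0.5 # observed estimate (made up)
se.hat <- 0.2   # its standard error (made up)
beta.star <- rnorm(10000, beta.hat, se.hat) # stand-in bootstrapped estimates
se.star <- rep(se.hat, 10000)               # stand-in bootstrapped std errors

# Centre each t* at beta.hat, not at 0; this is what makes the distribution
# of t* mimic that of t = (beta.hat - beta) / se(beta.hat).
t.star <- (beta.star - beta.hat) / se.star

t.hat <- beta.hat / se.hat # observed t-statistic for H0: beta = 0
mean(t.star >= t.hat)      # one-sided right-tail p-value
```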
Should I multiply the result by 2 because the test is bilateral or use absolute values?
To report a two-sided p-value, calculate both tail probabilities and multiply the smaller one (the tail in the "more extreme" direction) by 2.
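The two conventions can be compared on stand-in numbers; for a roughly symmetric bootstrap distribution they agree closely, while doubling the smaller tail remains valid even when the distribution is skewed:

```r
set.seed(42)
t.star <- rnorm(10000) # stand-in bootstrap t-statistics
t.hat <- 1.7

# Convention 1: double the smaller of the two tail probabilities.
p.double <- 2 * min(mean(t.star >= t.hat), mean(t.star <= t.hat))
# Convention 2: compare absolute values.
p.abs <- mean(abs(t.star) >= abs(t.hat))
```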
I use the same example as @risingStar: a linear regression for US divorce rate as a function of six predictors + an intercept. @risingStar shows how to bootstrap the coefficient estimates (+1); I show how to bootstrap the $t$-statistics. The p-values for all but the last predictor, military, are pretty much the same with both methods.
bootstrap.summary(beta.hat, t.stat, p)
#> # A tibble: 7 × 4
#>   Name        Estimate `t value` `Pr(>|t|)`
#>   <chr>          <dbl>     <dbl>      <dbl>
#> 1 (Intercept) 380.          3.83   0.000200
#> 2 year         -0.203      -3.81   0.000200
#> 3 unemployed   -0.0493     -0.917  0.292
#> 4 femlab        0.808       7.03   0.000200
#> 5 marriage      0.150       6.29   0.000200
#> 6 birth        -0.117      -7.96   0.000200
#> 7 military     -0.0428     -3.12   0.00160
Aside: None of the p-values are exactly 0 because I use the bias-corrected formula for the p-values as described in After bootstrapping regression analysis, all p-values are multiple of 0.001996.
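With that +1 correction and $B = 10000$ replicates, the smallest attainable p-value is bounded away from 0, which is why $0.000200$ recurs in the table above:

```r
B <- 10000
p.min.one.sided <- (0 + 1) / (B + 1) # even if no t* is as extreme as t.hat
p.min.two.sided <- 2 * p.min.one.sided
p.min.two.sided # approximately 0.0002
```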
And finally I plot histograms of the bootstrap distributions of the coefficient estimate [left] and the test statistic [right] for military. These nicely illustrate the effect of "bootstrap pivoting".

R code to bootstrap p-values:
library("tidyverse")
data(divusa, package = "faraway")
model <- function(data) {
  lm(divorce ~ ., data = data)
}
simulator <- function(data) {
  # Resample the rows of the data with replacement.
  rows <- sample(nrow(data), nrow(data), replace = TRUE)
  data[rows, ]
}
estimator <- function(data) {
  coefficients(model(data))
}
test <- function(data, b.test) {
  # t-statistics for H0: beta = b.test, one per coefficient.
  fit <- model(data)
  b <- coefficients(fit)
  var <- diag(vcov(fit))
  (b - b.test) / sqrt(var)
}
pvalue <- function(t.star, t.hat, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  # Add 1 to numerator and denominator so that p-values are never exactly 0.
  p.upper <- (sum(t.star >= t.hat) + 1) / (length(t.star) + 1)
  p.lower <- (sum(t.star <= t.hat) + 1) / (length(t.star) + 1)
  if (alternative == "greater") {
    p.upper
  } else if (alternative == "less") {
    p.lower
  } else {
    # The two-tailed p-value is twice the smaller of the two one-tailed p-values.
    2 * min(p.upper, p.lower)
  }
}
bootstrap.summary <- function(b, t, p) {
  tibble(
    Name = names(b),
    Estimate = b,
    `t value` = t,
    `Pr(>|t|)` = p
  )
}
set.seed(1234)
B <- 10000
These are the coefficient estimates, $\hat{\beta}_i$, and the $t$-statistics, respectively.
We can also get them with the `summary` function.
beta.hat <- estimator(divusa)
beta.hat
t.stat <- test(divusa, 0) # Calculate (beta.hat - 0) / se(beta.hat)
t.stat
Bootstrap the coefficient estimates.
boot.estimate <- replicate(B, estimator(simulator(divusa)))
Bootstrap the t statistics.
boot.statistic <- replicate(B, test(simulator(divusa), beta.hat)) # Calculate (beta.star - beta.hat) / se(beta.star)
Bootstrapped p-values computed two ways:
p <- NULL
for (i in seq_along(beta.hat)) {
  p <- c(p, pvalue(boot.estimate[i, ], 0))
}
bootstrap.summary(beta.hat, t.stat, p)
p <- NULL
for (i in seq_along(t.stat)) {
  p <- c(p, pvalue(boot.statistic[i, ], t.stat[i]))
}
bootstrap.summary(beta.hat, t.stat, p)
The 7th coefficient is the estimate for x = military.
i <- 7
pvalue(boot.estimate[i, ], 0)
pvalue(boot.statistic[i, ], t.stat[i])
par(mfrow = c(1, 2))
hist(boot.estimate[i, ],
  breaks = 50, freq = TRUE,
  xlab = NULL, ylab = NULL,
  main = paste0("Histogram of β* (x = ", names(beta.hat)[i], ")"),
  font.main = 1
)
hist(boot.statistic[i, ],
  breaks = 50, freq = TRUE,
  xlab = NULL, ylab = NULL,
  main = paste0("Histogram of t* (x = ", names(t.stat)[i], ")"),
  font.main = 1
)