Should my bootstrap function return the test statistic calculated for each sample, or the estimate?
We can bootstrap either the coefficient estimates or the test statistics, but it is better to bootstrap the $t$-statistics. If we take care how we calculate the $t$-statistic in each bootstrap sample, we increase the power of the test, as discussed in Correct creation of the null distribution for bootstrapped p-values.
Should I calculate the proportion of the test statistic/estimate above 0 or above the point estimate of the base model?
This is perhaps the most confusing step because the reasoning behind the p-value calculation differs depending on whether we bootstrap coefficient estimates or test statistics.
The bootstrap principle states that the bootstrap distribution of $\beta^*$ is close to the sampling distribution of $\hat{\beta}$, and that $\hat{\beta}$ itself is close to the true value $\beta$. This is helpful because we can construct confidence intervals for $\beta$. However, unless the true $\beta$ is indeed equal to 0, the bootstrap samples are not simulated under the null hypothesis $H_0:\beta = 0$. Instead we can "invert" a confidence interval to compute a p-value; for example, $\operatorname{Pr}\left\{\beta^* \geq 0\right\}$ is the p-value for the one-sided left-tail test (alternative $H_1:\beta < 0$).
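As a minimal sketch of this inversion, with made-up draws standing in for real bootstrapped estimates (the names `beta.star` etc. are illustrative, not taken from the code further down):

```r
# Made-up bootstrap draws: pretend the observed estimate is about 0.5.
set.seed(42)
beta.star <- rnorm(10000, mean = 0.5, sd = 0.2)

# Invert the percentile interval by asking how often the bootstrap
# distribution lands on either side of the hypothesized value 0.
mean(beta.star <= 0) # small here, since the draws sit well above 0
mean(beta.star >= 0) # close to 1
```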
The bootstrap principle also states that the distribution of $t^* = (\beta^* - \hat{\beta}) / \operatorname{se}(\beta^*)$ is close to the distribution of $t = (\hat{\beta} - \beta) / \operatorname{se}(\hat{\beta})$. This is even more helpful because the $t$-statistic is (approximately) pivotal. A pivot is a random variable whose distribution doesn't depend on the parameters. In this case, the distribution of the $t$-statistic doesn't depend on the true value of $\beta$. So while $\operatorname{E}\hat{\beta} = 0$ under the null and $\operatorname{E}\hat{\beta}\neq 0$ under the alternative, the $t$-statistic has the same distribution under the null and under the alternative. The p-value for the one-sided right-tail test is $\operatorname{Pr}\left\{t^* \geq \hat{t}\right\}$ where the $t^*$s are the bootstrapped test statistics and $\hat{t}$ is the observed test statistic.
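Here is a sketch of the pivoting with made-up numbers (in the real code below, the bootstrapped estimates and their standard errors come from refitting the model on each resample):

```r
set.seed(42)
beta.hat <- 0.5 # observed estimate (made up)
se.hat <- 0.2   # its standard error (made up)
beta.star <- rnorm(10000, beta.hat, se.hat) # stand-in bootstrapped estimates
se.star <- rep(se.hat, 10000)               # stand-in bootstrapped std errors

# Centre each t* at beta.hat, not at 0; this is what makes the distribution
# of t* mimic that of t = (beta.hat - beta) / se(beta.hat).
t.star <- (beta.star - beta.hat) / se.star

t.hat <- beta.hat / se.hat # observed t-statistic for H0: beta = 0
mean(t.star >= t.hat)      # one-sided right-tail p-value
```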
Should I multiply the result by 2 because the test is bilateral or use absolute values?
To report a two-sided p-value, calculate both tail probabilities and multiply the smaller one (the tail in the "more extreme" direction) by 2.
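The two conventions can be compared on stand-in numbers; for a roughly symmetric bootstrap distribution they agree closely, while doubling the smaller tail remains valid even when the distribution is skewed:

```r
set.seed(42)
t.star <- rnorm(10000) # stand-in bootstrap t-statistics
t.hat <- 1.7

# Convention 1: double the smaller of the two tail probabilities.
p.double <- 2 * min(mean(t.star >= t.hat), mean(t.star <= t.hat))
# Convention 2: compare absolute values.
p.abs <- mean(abs(t.star) >= abs(t.hat))
```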
I use the same example as @risingStar: a linear regression for US divorce rate as a function of six predictors + an intercept. @risingStar shows how to bootstrap the coefficient estimates (+1); I show how to bootstrap the $t$-statistics. The p-values for all but the last predictor, military, are pretty much the same with both methods.
bootstrap.summary(beta.hat, t.stat, p)
#> # A tibble: 7 × 4
#>   Name        Estimate `t value` `Pr(>|t|)`
#>   <chr>          <dbl>     <dbl>      <dbl>
#> 1 (Intercept) 380.          3.83   0.000200
#> 2 year         -0.203      -3.81   0.000200
#> 3 unemployed   -0.0493     -0.917  0.292
#> 4 femlab        0.808       7.03   0.000200
#> 5 marriage      0.150       6.29   0.000200
#> 6 birth        -0.117      -7.96   0.000200
#> 7 military     -0.0428     -3.12   0.00160
Aside: None of the p-values are exactly 0 because I use the bias-corrected formula for the p-values as described in After bootstrapping regression analysis, all p-values are multiple of 0.001996.
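With that +1 correction and $B = 10000$ replicates, the smallest attainable p-value is bounded away from 0, which is why $0.000200$ recurs in the table above:

```r
B <- 10000
p.min.one.sided <- (0 + 1) / (B + 1) # even if no t* is as extreme as t.hat
p.min.two.sided <- 2 * p.min.one.sided
p.min.two.sided # approximately 0.0002
```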
And finally I plot histograms of the bootstrap distributions of the coefficient estimate [left] and the test statistic [right] for military. These nicely illustrate the effect of "bootstrap pivoting".

R code to bootstrap p-values:
library("tidyverse")
data(divusa, package = "faraway")
model <- function(data) {
  lm(divorce ~ ., data = data)
}
simulator <- function(data) {
  # Resample the rows of the data with replacement.
  rows <- sample(nrow(data), nrow(data), replace = TRUE)
  data[rows, ]
}
estimator <- function(data) {
  coefficients(model(data))
}
test <- function(data, b.test) {
  # t-statistics for H0: beta = b.test, one per coefficient.
  fit <- model(data)
  b <- coefficients(fit)
  var <- diag(vcov(fit))
  (b - b.test) / sqrt(var)
}
pvalue <- function(t.star, t.hat, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  # Add 1 to numerator and denominator so that p-values are never exactly 0.
  p.upper <- (sum(t.star >= t.hat) + 1) / (length(t.star) + 1)
  p.lower <- (sum(t.star <= t.hat) + 1) / (length(t.star) + 1)
  if (alternative == "greater") {
    p.upper
  } else if (alternative == "less") {
    p.lower
  } else {
    # The two-tailed p-value is twice the smaller of the two one-tailed p-values.
    2 * min(p.upper, p.lower)
  }
}
bootstrap.summary <- function(b, t, p) {
  tibble(
    Name = names(b),
    Estimate = b,
    `t value` = t,
    `Pr(>|t|)` = p
  )
}
set.seed(1234)
B <- 10000
These are the coefficient estimates, $\hat{\beta}_i$, and the $t$-statistics, respectively.
We can also get them with the `summary` function.
beta.hat <- estimator(divusa)
beta.hat
t.stat <- test(divusa, 0) # Calculate (beta.hat - 0) / se(beta.hat)
t.stat
Bootstrap the coefficient estimates.
boot.estimate <- replicate(B, estimator(simulator(divusa)))
Bootstrap the t statistics.
boot.statistic <- replicate(B, test(simulator(divusa), beta.hat)) # Calculate (beta.star - beta.hat) / se(beta.star)
Bootstrapped p-values computed two ways:
p <- NULL
for (i in seq_along(beta.hat)) {
  p <- c(p, pvalue(boot.estimate[i, ], 0))
}
bootstrap.summary(beta.hat, t.stat, p)
p <- NULL
for (i in seq_along(t.stat)) {
  p <- c(p, pvalue(boot.statistic[i, ], t.stat[i]))
}
bootstrap.summary(beta.hat, t.stat, p)
The 7th coefficient is the estimate for x = military.
i <- 7
pvalue(boot.estimate[i, ], 0)
pvalue(boot.statistic[i, ], t.stat[i])
par(mfrow = c(1, 2))
hist(boot.estimate[i, ],
  breaks = 50, freq = TRUE,
  xlab = NULL, ylab = NULL,
  main = paste0("Histogram of β* (x = ", names(beta.hat)[i], ")"),
  font.main = 1
)
hist(boot.statistic[i, ],
  breaks = 50, freq = TRUE,
  xlab = NULL, ylab = NULL,
  main = paste0("Histogram of t* (x = ", names(t.stat)[i], ")"),
  font.main = 1
)