3

My model is here. I'm running a bootstrapping mixed-effects model with "case" type with lmersmapler. The thing is, now I'm using bootstrap_pvals()to obtain p-values and the results are not matching. Based on the 95% CIs, I should get all p < 0.05 except for the interaction, but the CIs and p-values are not matching. I need some help.

  • the model:
mod1 <- lmer(CONT_Y ~ YEAR * MY_GROUP + (1|PARTICIPANTS), data = data)

mod1_boot <- bootstrap(mod1, .f = fixef, type = "case", B = 1000, resample = c(TRUE, FALSE))

> confint(mod1_boot, type = "norm")

A tibble: 4 x 6

term estimate lower upper type level <chr> <dbl> <dbl> <dbl> <chr> <dbl> 1 (Intercept) 17.6 16.7 18.5 norm 0.95 ##DOESN'T CONTAIN 0 2 YEAR2 1.14 0.186 2.13 norm 0.95 ## DOESN'T CONTAIN 0 3 GROUPB 0.915 0.155 1.68 norm 0.95 ## DOESN'T CONTAIN 0 4 YEAR2:GROUPB -0.602 -1.80 0.577 norm 0.95 ## CONTAINS 0!!! >

  • obtaining p-values:

bootstrap_pvals(mod1, type = "case", B = B, resample = c(TRUE, F),
                         aux.dist = norm)

$coefficients

A tibble: 4 x 7

term Estimate Std. Error df t value Pr(&gt;|t|) p.value <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) 17.6 0.393 79.7 44.8 2.67e-58 0.389 2 YEAR2 1.14 0.517 63.0 2.21 3.06e- 2 0.565 3 GROUPB 0.915 0.517 63.0 1.77 8.17e- 2 0.589 4 YEAR2:GROUPB -0.602 0.731 63.0 -0.824 4.13e- 1 0.568

$B [1] 1000

ps: conf.int doesn't work on this object

  • It doesn't make sense to me, the interaction should be ns and the other coefficient should be sig. Any ideas?

  • how can I get the 95 % CI for a bootstrap_pvals object ?

  • Edit 1: Trying to convert the factor variables into binary numerical ones:

num_data <- data %>% 
  mutate(YEAR_num = case_when( 
           YEAR == "A" ~ 0, YEAR == "B" ~ 1),
         GROUP_num = case_when(
           GROUP == "G1" ~ 0, GROUP == "G2" ~ 1))

check

> class(num_data$YEAR_num) [1] "numeric" > class(num_data$GROUP_num) [1] "numeric"

refit:

mod1 <- lmer(CONT_Y ~ YEAR_num * GROUP_num + (1|PARTICIPANTS), data = num_data, REML = FALSE)

bootstrap it:

mod_p <- bootstrap_pvals(mod1, type = "case", B = 1000, resample = c(TRUE, F), aux.dist = norm)

check:

mod_p

> mod_p $coefficients

A tibble: 4 x 7

term Estimate Std. Error df t value Pr(&gt;|t|) p.value <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) 17.6 0.393 79.7 44.8 2.67e-58 0.395 2 YEAR_num 1.14 0.517 63.0 2.21 3.06e- 2 0.559 3 GROUP_num 0.915 0.517 63.0 1.77 8.17e- 2 0.582 4 YEAR_num:GROUP_num -0.602 0.731 63.0 -0.824 4.13e- 1 0.574

  • DATA:
data <- structure(list(PARTICIPANTS = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                                        3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 
                                        7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 
                                        10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 
                                        14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 17L, 
                                        17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 20L, 20L, 
                                        20L, 20L, 21L, 21L, 21L, 21L), CONT_Y = c(19.44, 20.07, 19.21, 
                                                                                  16.35, 11.37, 12.82, 19.42, 18.94, 19.59, 20.01, 19.7, 17.92, 
                                                                                  18.78, 19.21, 19.27, 18.46, 19.52, 20.02, 16.19, 19.97, 13.83, 
                                                                                  15.93, 14.79, 21.55, 18.8, 19.42, 19.27, 19.37, 17.14, 14.45, 
                                                                                  17.63, 20.01, 20.28, 17.93, 19.36, 20.15, 16.06, 17.04, 19.16, 
                                                                                  20.1, 16.44, 18.39, 18.01, 19.05, 18.04, 19.69, 19.61, 16.88, 
                                                                                  19.02, 20.42, 18.27, 18.43, 18.08, 17.1, 19.98, 19.43, 19.71, 
                                                                                  19.93, 20.11, 18.41, 20.31, 20.1, 20.38, 20.29, 13.6, 18.92, 
                                                                                  19.05, 19.13, 17.75, 19.15, 20.19, 18.3, 19.43, 19.8, 19.83, 
                                                                                  19.53, 16.14, 21.14, 17.37, 18.73, 16.51, 17.51, 17.06, 19.42
                                        ), CATEGORIES = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
                                                                    1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
                                                                    1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
                                                                    1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
                                                                    1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
                                                                    1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", 
                                                                                                                            "B"), class = "factor"), MY_GROUP = structure(c(1L, 2L, 1L, 2L, 
                                                                                                                                                                            1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
                                                                                                                                                                            1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
                                                                                                                                                                            1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
                                                                                                                                                                            1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
                                                                                                                                                                            1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L
                                                                                                                            ), .Label = c("G1", "G2"), class = "factor")), row.names = c(NA, 
                                                                                                                                                                                         -84L), class = c("tbl_df", "tbl", "data.frame"))

rename collumn:

data <- data %>% rename(., YEAR = CATEGORIES)

  • Why are there two columns p-values in the lower table? Please improve the formating of the tables. The value Pr(>|t|) is $0.413$ for the interaction so quite large and not significant at the 5% level. – COOLSerdash Jan 28 '23 at 20:29
  • @COOLSerdash , I honestly don't know. I've never used this bootstrap_pvals() before, I've read the doc, but it didn't quite answer that. However, it was the only function that gave me pvalues for my lmeresampler model – Larissa Cury Jan 28 '23 at 20:33
  • Ok, the documentation clearly states that the last column are bootstrap p-values. But as I've said: Both the "traditional" p-value ($0.413$) as well as the bootstrap p-value ($0.568$) are consistent with the confidence interval. – COOLSerdash Jan 28 '23 at 20:35
  • @COOLSerdash , still, tho, then YEAR2, GROUPB and the intercept should be significant. The last column says that they aren't, tho the CI shows otherwise. Any ideas? – Larissa Cury Jan 28 '23 at 20:37
  • 1
    I've had a look at the source code of bootstrap_pvals and especially bootstrap_pvals.merMod. I think it's a bug stemming from how the code updates the model. Try converting the group and year variables to binary numeric variables and refit the following model: lmer(CONT_Y ~ year_num + group_num + year_num:group_num + (1|PARTICIPANTS), data = data). Then run the bootstrap again. Now the p-values will be consistent (up to random error). I also strongly suggest submitting a bug report here: https://github.com/aloy/lmeresampler/issues – COOLSerdash Jan 28 '23 at 21:08
  • @COOLSerdash , thank you for the effort, I really appreciate it, I really really need to solve this! You mean changing it into zeros and ones, for example? Essentially, will it still make sense, since the variables are categorical ? (year and group of exam) ? – Larissa Cury Jan 28 '23 at 21:18
  • Yes, change YEAR into 0/1 (A = 0, B = 1) and the same with MY_GROUP (G1 = 0, G2 = 1). Just compare the models to see that they're identical. But please report the bug to improve the package! This will help you and others in the future. – COOLSerdash Jan 28 '23 at 21:22
  • Thank you, @COOLSerdash , I tried that, but it didn't work (see the edit above, please). Yes, if course, If I can help others, I'll do that. – Larissa Cury Jan 28 '23 at 21:37
  • ps: in addition to that, do you know why confi.int won't work for that? > confint(mod_p, type = "norm") Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "coef_tbl . Even if I get the right p-values, I'm not getting the CIs and I need to report both – Larissa Cury Jan 28 '23 at 21:38
  • I'll report the bug. The confidence intervals are calculated with the other function. Report those. – COOLSerdash Jan 28 '23 at 21:39

1 Answers1

3

I suspect that this question is written, in part, from an expectation that there is a one-to-one correspondence between confidence intervals and p-values. While some confidence intervals can be derived by inverting a hypothesis test (and then the one-to-one correspondence is exact), this is not the case for all confidence intervals.

With the (nonparametric) bootstrap we get resamples from the bootstrap distribution $\hat{F}$, so that we can calculate the statistic $T$ on each resample: $$ \begin{aligned} \hat{F} \longrightarrow \mathbf{x}^* \longrightarrow t^* \end{aligned} $$ Notice that no hypothesis, either null or alternative, makes an appearance.

On the other hand, a p-value is defined as a long-run frequency under the null hypothesis. Let's say that the null hypothesis is $T = 0$ and the p-value is two-sided. Then $$ \begin{aligned} p = \operatorname{Pr}\left\{|T| \geq |t| ~\big|~ H_0\right\} \end{aligned} $$

The "under null hypothesis" part is crucial to the meaning of p-values. So to bootstrap p-values, we would need to generate sample from the null hypothesis sampling distribution: $$ \begin{aligned} p = \operatorname{Pr}\left\{|T| \geq |t| ~\big|~ \hat{F}_0\right\} \end{aligned} $$ And this is challenging; we'll need to make assumptions in order to sample from $\hat{F}_0$.

An obvious approach is to assume that the observed data are generated by the linear mixed model we have in mind: $\mathcal{F} = \left\{f_{\boldsymbol{\beta},\mathbf{u},\sigma^2}\right\}$ where $\boldsymbol{\beta}$ are the fixed effects, $\mathbf{u}$ are the random effects and $\sigma^2$ is the error variance. And we use the parametric rather than the nonparametric bootstrap.

$$ \begin{aligned} f_{\hat{\boldsymbol{\beta}},\hat{\mathbf{u}},\hat{\sigma}^2} \longrightarrow \mathbf{x}^* \longrightarrow t^* \end{aligned} $$

Now that we have an explicit model for the data, we can write down the model under the null hypothesis that one specific coefficient, say $\beta_k$, is equal to 0. We can fit this restricted model to the data and then use it to generate resamples $\left\{\mathbf{x}^* | H_0\right\}$. Finally, we fit the full model to those resamples to simulate the distribution of $T_k = \beta_k/\operatorname{se}(\beta_k)$ under the null hypothesis that $\beta_k = 0$.

$$ \begin{aligned} p = \operatorname{Pr}\left\{|T_k| \geq |t_k| ~\big|~ f_{\hat{\boldsymbol{\beta}}_{-k},\beta_k=0,\hat{\mathbf{u}},\hat{\sigma}^2}\right\} \end{aligned} $$

From looking at the code, this is the theory behind the implementation of lmeresampler::bootstrap_pvals.

However, there is an issue with this approach when applied to a model with interactions or polynomial terms. A model that has no main effect for either Year or Group but includes the Year×Group interaction doesn't make sense. Similarly, a model that drops the linear term but leaves in the higher order polynomials (of the same continuous predictor) won't make sense. As a result, the p-values computed from those weird bootstrapping resamples might not be not interpretable either.

As a demonstration, I fit the model Y = Group + Year + (1 | ID) and then use your code to compute confidence intervals and p-values; note that now the CIs and the p-values agree because it's reasonable to propose a model that has only a main term for Year or a main term for Group.

mod0 <- lmer(
  Y ~ GROUP + YEAR + (1 | PARTICIPANT),
  data = data1,
  REML = FALSE
)

mod0_boot <- bootstrap( mod0, B = 2000, type = "case", resample = c(TRUE, FALSE), .f = fixef ) confint(mod0_boot, type = "norm") #> # A tibble: 3 × 6 #> term estimate lower upper type level #> 1 (Intercept) 17.8 16.9 18.7 norm 0.95 #> 2 GROUP2 0.614 -0.0146 1.22 norm 0.95 #> 3 YEARB 0.843 -0.0552 1.74 norm 0.95

mod0_pvals <- bootstrap_pvals( mod0, B = 2000, type = "case", resample = c(TRUE, FALSE), aux.dist = norm ) mod0_pvals #> term Estimate Std. Error t value p.value #> 1 (Intercept) 17.8 0.349 50.9 0.000500 #> 2 GROUP2 0.614 0.368 1.67 0.0625
#> 3 YEARB 0.843 0.368 2.29 0.0600

I take two lessons from this exercise.

Lesson 1: A regression which keeps an interaction but drops a main effect is not meaningful.

When you bootstrap the p-value of interaction from the model Y ~ Year * Group, you get a meaningful p-value only for the interaction because the model which "zeros out" the interaction but leaves in the intercept and the two main effects is a meaningful model. You don't get sensible p-values for the interaction and the main effects because zeroing out these coefficients while keeping the interaction doesn't give a meaningful model.

Lesson 2: Estimation is (often) more meaningful than hypothesis testing.

It seems to me that bootstrapping p-values from a linear mixed model is a strange thing to do as it requires making more assumptions that computing (nonparametric) bootstrap confidence intervals. Most importantly, we have to decide how to we generate bootstrap resamples under $H_0$. So I suggest to stick with estimation (this is what you get with the bootstrap confidence intervals) than hypothesis testing (this is what you get with the bootstrap p-values).

dipetkov
  • 9,805