
Assume there are $n$ subjects and, for $1\leq i\leq n$, subject $i$ is measured at time points $t_{i1}<\dots<t_{im_i}$. Consider the GLM-type regression model $E[g(y_{ij})\mid u_i]=\beta\cdot x_{ij}+u_i$, where $u_i$ is a normally distributed random effect for subject $i$ and $y_{ij}$ is the measurement at time $t_{ij}$.
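For concreteness, data from such a model can be simulated as follows (a minimal sketch; the logit link applied to the conditional mean, the coefficient values, and the dimensions are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 50, 10          # n subjects, m measurements each (illustrative)
beta = 0.8             # true fixed-effect coefficient (assumed)
sigma_u = 1.0          # SD of the normal random intercept u_i

x = rng.normal(size=(n, m))            # covariates x_{ij}
u = rng.normal(0.0, sigma_u, size=n)   # random effect u_i, one per subject

# linear predictor on the link scale: beta * x_ij + u_i
eta = beta * x + u[:, None]

# example with a logit link: y_ij ~ Bernoulli(expit(eta))
p = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, p)
```

Rows share the same $u_i$, which is what induces the within-subject correlation the bootstrap has to respect.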

If $g$ is the identity link, then ignoring $u_i$ does not lead to biased estimation of $\beta$. This is no longer true for a non-identity link.

Suppose I want to obtain confidence intervals for the $\beta$'s by bootstrap, where the bootstrap has a nested structure in accordance with the random effects $u_i$.

  1. Fit the GLM without the random effects $u_i$, then apply the bootstrap to obtain confidence intervals for the $\beta$'s.
  2. Fit the GLM with the random effects $u_i$, then apply the bootstrap to obtain confidence intervals for the $\beta$'s.
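In the identity-link case, procedure 1 with a subject-level (cluster) bootstrap can be sketched as follows (plain least squares stands in for the GLM fit without random effects; the sample sizes, coefficient values, and percentile CI are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# simulate from the random-intercept model (identity link, illustrative values)
n, m, beta, sigma_u = 40, 8, 0.8, 1.0
x = rng.normal(size=(n, m))
u = rng.normal(0.0, sigma_u, size=n)
y = beta * x + u[:, None] + rng.normal(0.0, 0.5, size=(n, m))

def fit_beta(xs, ys):
    """Least-squares slope, ignoring the random effects (procedure 1)."""
    X = np.column_stack([np.ones(xs.size), xs.ravel()])
    return np.linalg.lstsq(X, ys.ravel(), rcond=None)[0][1]

# subject-level bootstrap: resample whole subjects with replacement,
# which preserves the within-subject correlation induced by u_i
B = 500
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = fit_beta(x[idx], y[idx])

ci = np.percentile(boot, [2.5, 97.5])   # percentile bootstrap CI for beta
```

Because whole subjects are resampled, the bootstrap distribution reflects both the residual noise and the between-subject variation from $u_i$, even though the fit itself ignores $u_i$.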

Procedure 2's confidence intervals should attain nominal coverage (at least asymptotically) under the assumption that the true model includes the random effects. However, it is not clear that procedure 1's confidence intervals will.

$Q1:$ In the linear model case, I expect procedure 1 to have nominal coverage and to be median-unbiased as well. When should I expect the bootstrap estimator to be median-unbiased? And when should I expect the bootstrap CI to attain nominal coverage?

$Q2:$ In case procedure 1 is biased, would its CI still have close-to-nominal coverage?

user45765
  • How can not applying random effects, when the true model has random effects, give you a correct confidence interval? – Sextus Empiricus May 15 '23 at 14:32
  • @SextusEmpiricus I would assume that I do not know that the true model has a random effect. I would be interested in whether the bootstrap yields a valid CI for the model without the random effect. For a general link function, I do not expect this to be true. The bootstrap accounts for the random effect by nesting, so the bootstrap of the model without random effects should yield a CI that accounts for the random effects. I am not interested in obtaining a CI directly from the model's Fisher information matrix; I am interested in whether the bootstrap remains generally valid under a misspecified covariance structure. – user45765 May 15 '23 at 14:39
  • 1
    is your question maybe inverted and is it like, 'given that the true model has no random effect, does fitting and bootstrapping while assuming random effects give the correct confidence intervals'? If you do not know the true model, then your Q1 is still not clear to me. If your true model has no random effects, then fitting and bootstrapping without random effects will give correct confidence intervals. But if you do not have random effects, then it won't. – Sextus Empiricus May 15 '23 at 14:48
  • Also your Q2 is not so clear. Bootstrapping never gives the exactly correct confidence interval; it is an approximation method (it is the asymptotic behaviour for infinite sample size that equals the nominal level). – Sextus Empiricus May 15 '23 at 14:50
  • @SextusEmpiricus I am interested in the case where the true model has a random effect and the bootstrap is applied to a fitted model without random effects. However, from what you are suggesting, it seems that bootstrapping under a misspecified model covariance will lead to an incorrect CI. Is that correct? – user45765 May 15 '23 at 14:50
  • @SextusEmpiricus I would always assume very large bootstrap samples so that there is asymptotic approximation for close to nominal coverage if possible. – user45765 May 15 '23 at 14:51
  • So your question is 'if I apply a wrong model, do I get a correct confidence interval'? – Sextus Empiricus May 15 '23 at 15:00
  • @SextusEmpiricus Yes. Wrong model here means regression without random effects and the correct one has random effects. However, bootstrap will be applied in accordance with experimental design. – user45765 May 15 '23 at 15:01
  • "bootstrap will be applied in accordance with experimental design" I don't see how the model of the error distribution is something that can be in accordance with the experimental design. You state that your true model has this random intercept $u_i$, you can not take that away with your experimental design. (Actually an experimental design that samples random effects only one time, like here on the right image, takes away the influence of the random effect) – Sextus Empiricus May 15 '23 at 15:04
  • @SextusEmpiricus Suppose I have n subjects with measurements $n_i$ times. If $n_i$ is even, I apply bootstrap by consecutive block of size 2. If $n_i$ is odd, I sample a consecutive block of size 3 and consecutive blocks of size 2 for the remainder. That will keep correlation within blocks. You could choose other block sizes. The bootstrap procedure is dependent upon experimental design. By misspecification, I mean model misspecified not in accordance with design and this leads to covariance misspecification. – user45765 May 15 '23 at 15:12
  • I do not entirely follow why you are using blocks in your bootstrapping procedure. Could you explain that bootstrapping procedure for Q1, assuming no random effects, in more detail? It is not a resampling of residuals, but instead some blockwise resampling? – Sextus Empiricus May 15 '23 at 15:15
  • @SextusEmpiricus Sure. Suppose everyone is measured 10 times. Within each subject, I sample 5 consecutive measurements, say at $t_3,t_4,t_5,t_6,t_7$. Then I need to sample 2 such blocks to obtain 10 bootstrapped measurements for a subject. Repeat this for all subjects to yield a single bootstrapped sample. I am not resampling residuals here. – user45765 May 15 '23 at 15:17
  • So, it seems like the difference between 1 and 2 is that you apply a different fitting, but the bootstrapping is in both cases performed in the same way and assuming a nested error structure? – Sextus Empiricus May 15 '23 at 15:18
  • @SextusEmpiricus That is correct. – user45765 May 15 '23 at 15:19
  • I am still confused how you perform the bootstrapping. Are you resampling directly from the raw measurements? – Sextus Empiricus May 15 '23 at 15:22
  • @SextusEmpiricus I am sampling directly from raw measurements. – user45765 May 15 '23 at 15:23
  • Re-sampling raw measurements might not be the best indicator of the sampling variation in the observed statistic. You get less information in your bootstrap samples than in your original data (and therefore overestimate the confidence interval size). In addition, resampling the data only within individuals ignores the between-individual variation. – Sextus Empiricus May 15 '23 at 15:27
  • @SextusEmpiricus I see. I also need to resample at the subject level, as I have ignored subject-level variation. But I think in that case I would underestimate the CI width or have low coverage. I think when you conduct a parametric bootstrap of residuals, you should almost always detect that the CI is incorrect for a misspecified model. – user45765 May 15 '23 at 15:32
  • But I get the idea of your question now: "Can we characterise the sampling variation of a misspecified model by bootstrapping?". (And my last comment was more a critique of the bootstrapping method, not so much of the principle of the question.) – Sextus Empiricus May 15 '23 at 15:33
  • @SextusEmpiricus Yes. Sorry for long winding discussion. Thanks for clarifications and clarifying my misunderstanding for ignoring subject level sampling during bootstrap. – user45765 May 15 '23 at 15:34
  • Note that parameters are defined relative to the model in which they appear. This means that if your underlying model differs from the fitted one, you are estimating a different parameter (even if it appears in what looks like the "same place" and has the same name). This means that it isn't well defined what the question "Is the confidence interval correct?" even means, as it is an interval for a parameter that doesn't exist in the other model. Of course you can ask what coverage probability it has for that other parameter, but that's a nonstandard question. – Christian Hennig May 15 '23 at 15:56
  • (Ctd.) I'm not saying it can't or shouldn't be answered, but it will require some effort at least. – Christian Hennig May 15 '23 at 15:56
  • @ChristianHennig I think in the misspecified case, the fitted model converges to the member of the misspecified family with the smallest KL divergence from the true model, as in the Bayesian case. The coverage calculation usually conditions on correct specification. However, I think in the linear regression case it does not matter too much, as the random-effect parameter gets absorbed into the variance parameter of the misspecified model, and that CI is exactly the same CI as usual. It might get worse in a non-linear misspecified case. – user45765 May 15 '23 at 16:10
  • @user45765 That's fair enough in principle, although I'm not sure whether the terms "correct" or "incorrect" would apply then, because the CI is not meant to do this in what you'd call the "correct" way. If you want to check whether your belief that "it doesn't matter much" is correct, you can simulate artificial data and see how this plays out. – Christian Hennig May 15 '23 at 16:14
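The within-subject consecutive-block resampling described in the comments above might be sketched as follows (the block size and toy series are illustrative, and the helper name is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def block_resample_subject(values, block=5):
    """Resample one subject's series by concatenating consecutive blocks,
    preserving short-range within-subject correlation."""
    m = len(values)
    out = []
    while len(out) < m:
        start = rng.integers(0, m - block + 1)   # random block start
        out.extend(values[start:start + block])
    return np.array(out[:m])

# e.g. each subject measured 10 times; two blocks of 5 give one bootstrap series
series = np.arange(10)
boot_series = block_resample_subject(series, block=5)
```

As noted in the comments, applying this within subjects only captures within-subject variation; resampling at the subject level would also be needed to reflect the variation from $u_i$.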

0 Answers