I have two nested, non-linear models for the same data, and I want to test whether the more complex model explains significantly more variance. Due to a necessary smoothing step, my data aren't independent, so I can't use standard tests for this (e.g. an F-test or likelihood-ratio test). I naturally thought of using a permutation test, because I can permute the data before the smoothing step. This way I would introduce the same dependencies in the permuted data that also exist in the observed data, so that the simulated null distribution is fair. However, I can't quite come up with the right way to do this.
I can think of a way to test whether either model explains significant variance. In this case, my algorithm would be (a rough code sketch follows the list):

1. Fit the model to the smoothed observed data, and calculate $R^2$.
2. For 10,000 iterations, repeat steps 3-5:
3. Randomly permute the observed data (i.e. shuffle the response values relative to the predictor values, so that their true relationship is destroyed).
4. Apply smoothing to this permuted data, to introduce the same dependencies that the observed $R^2$-value suffers from.
5. Fit the model to the smoothed permuted data and calculate $R^2$.
6. Compare the observed $R^2$-value to the thus-constructed null distribution of $R^2$-values for this data & model.
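For concreteness, here is a minimal Python sketch of steps 1-6. The moving-average `smooth` and the exponential `model` below are just hypothetical placeholders for my actual smoothing step and non-linear model:

```python
import numpy as np
from scipy.optimize import curve_fit

def smooth(y, window=5):
    """Placeholder smoothing step (simple moving average);
    substitute whatever smoothing the real pipeline applies."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")

def model(x, a, b):
    """Hypothetical non-linear model f(x; theta); substitute the real one."""
    return a * np.exp(-b * x)

def r_squared(x, y):
    """Fit the model to (x, y) and return R^2."""
    theta, _ = curve_fit(model, x, y, p0=[1.0, 0.1], maxfev=10_000)
    ss_res = np.sum((y - model(x, *theta)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def permutation_test(x, y_raw, n_perm=10_000, seed=None):
    rng = np.random.default_rng(seed)
    # Step 1: fit to the smoothed *observed* data.
    r2_obs = r_squared(x, smooth(y_raw))
    # Step 2: build the null distribution over n_perm permutations.
    r2_null = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y_raw)           # step 3: destroy the x-y link
        y_perm_smooth = smooth(y_perm)            # step 4: reintroduce dependencies
        r2_null[i] = r_squared(x, y_perm_smooth)  # step 5: refit and score
    # Step 6: one-sided permutation p-value (with the usual +1 correction).
    p_value = (1 + np.sum(r2_null >= r2_obs)) / (1 + n_perm)
    return r2_obs, p_value
```

The +1 correction in the p-value just keeps it from being exactly zero when the observed $R^2$ exceeds every permuted value.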
The point being that in this case it's easy to construct the null distribution, because the null hypothesis holds that there is no relationship between the predictors in the model and the outcome variable, and it is obvious how we can permute the data to emulate this. (Formally, I guess the null hypothesis is that the values of the predictors are exchangeable w.r.t. the values of the outcome variable.)
However, what I want to do is estimate a null distribution for the increase in $R^2$ from one model to the next. The null hypothesis here is that the added parameter in the more complex model is meaningless, i.e. that the two models are exchangeable. However, it seems to me that this is an exchangeability hypothesis on the models, rather than on (some aspect of) the data, so I just don't see what I would permute in a permutation test in order to simulate this. Can anyone help me out? Am I missing something, or is this just not possible, so that a bootstrap is my only option here?
Update: in the original version of this question, I failed to specify that my models were non-linear. That is, they don't have the form $Y=X\beta+\epsilon$ (where $X$ is a matrix of predictors, $\beta$ is a vector of linear coefficients and $\epsilon$ is random noise). Instead, they have the more general form $Y=f(X; \theta)+\epsilon$, where $\theta$ is a vector of model parameters. The simpler model can be obtained by fixing the value of one of the parameters of the more complex/general model.
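For instance (with a made-up functional form; my actual $f$ is different), the nesting might look like this:

```python
import numpy as np

def f_full(x, a, b, c):
    """Hypothetical complex model: f(x; a, b, c) = a * exp(-b*x) + c."""
    return a * np.exp(-b * x) + c

def f_simple(x, a, b):
    """Simpler model, obtained from the complex one by fixing c = 0."""
    return f_full(x, a, b, 0.0)
```

The simpler model is nested in the complex one because it corresponds to fixing one of its parameters ($c$ here) at a constant value.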
https://stats.stackexchange.com/questions/213895/how-to-check-permutation-testing-exchangeability-assumption-when-using-a-general (can't flag because of the bounty). – eric_kernfeld Aug 23 '17 at 15:37