
I want to compare two models of the form:

model of interest: y ~ measure_of_interest1 + measure_of_interest2 + confound1 + confound2
control model: y ~ confound1 + confound2

I will collect a test data set for the model comparison.

I am thinking of using the models fitted on the training set and then computing $R^2$ on the test data. But then I'm not certain how to compare the two models. I was wondering whether I should compute the difference in the $R^2$ of the two models and then compare it to 1000 permutations (shuffling the y values), roughly as sketched below. Is that a sensible method?
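
Concretely, something like this minimal sketch is what I have in mind (the data frame names `train` and `test` and the column names are just placeholders):

    # Fit both models on the training data
    fit_full    <- lm(y ~ measure_of_interest1 + measure_of_interest2 + confound1 + confound2,
                      data = train)
    fit_control <- lm(y ~ confound1 + confound2, data = train)

    # R^2 of a fitted model on the test data, for a given response vector
    test_r2 <- function(fit, data, y) {
      pred <- predict(fit, newdata = data)
      1 - sum((y - pred)^2) / sum((y - mean(y))^2)
    }

    observed_diff <- test_r2(fit_full, test, test$y) - test_r2(fit_control, test, test$y)

    # Permutation distribution: shuffle the test y values and recompute the difference
    perm_diff <- replicate(1000, {
      y_perm <- sample(test$y)
      test_r2(fit_full, test, y_perm) - test_r2(fit_control, test, y_perm)
    })
    p_value <- mean(perm_diff >= observed_diff)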

I'd be very grateful for any pointers.

  • Welcome to Cross Validated! 1) How do you plan to calculate out-of-sample $R^2$? 2) What do you hope to learn from $R^2$ that you wouldn't get from mean squared error? – Dave Jul 05 '22 at 11:56
  • $R^2$ and adjusted $R^2$ are great ways to compare models. However, if I am not mistaken, this only applies when the models are nested. If that is the case for you, you should just be able to look at the model summaries (in R it's just summary(model)) and compare the $R^2$ values. – Andy Jul 05 '22 at 12:04
  • @Dave I don't know much about the properties of either - I thought of $R^2$ because '% variance explained' sounded to me like an easy-to-understand quantity, whereas for mean squared error I was not sure how it translates to something intuitive. – JacquieS Jul 05 '22 at 14:52
  • @Andy, when you say compare the $R^2$ values - I think this is what I'm trying to figure out how to do. Say one model has e.g. $R^2 = 0.2$ and the other $R^2 = 0.22$: is that different from chance? That's what I thought a permutation test could answer. – JacquieS Jul 05 '22 at 15:01
  • I believe my post here could help you out. If your models are nested, there are more things than just the $R^2$ values you can look at.

    I detail a few of the methods there, to help you decide as objectively as possible which model is best to keep: https://stats.stackexchange.com/questions/571180/does-my-predictor-in-my-multiple-regression-have-too-many-variables/571186?noredirect=1#comment1055632_571186

    – Andy Jul 05 '22 at 15:07
  • $1)$ The "proportion of variance explained" interpretation of $R^2$ only exists in special circumstances, and I have not seen such a circumstance for an out-of-sample $R^2$. $2)$ How you calculate an out-of-sample $R^2$ will matter, so how are you calculating that value? Are you using a function from an existing software package? If so, which one? Are you writing the function yourself? If so, what equation are you coding, and why do you believe that to be a good definition of out-of-sample $R^2$? – Dave Jul 05 '22 at 15:11
  • @Dave Thanks, I did not know this about $R^2$! I've computed it as $R^2 = 1 - SS_{res}/SS_{tot}$, with $SS_{res}$ the sum of squared errors (true minus predicted) and $SS_{tot}$ the total sum of squares (true minus the mean):

        ss_res <- sum((measured.data - H1Ai.predict.behav$Estimate)^2)  # residual sum of squares
        ss_tot <- sum((measured.data - mean(measured.data))^2)          # total sum of squares
        R_sq_behav <- 1 - ss_res / ss_tot

    Let's say I used mean squared error instead; how would I compare the two models? Take the difference and compare it to a permutation distribution?

    – JacquieS Jul 05 '22 at 15:23
  • Now that I know better what terms to use for my search, I've found this discussion, which explains how I had computed out-of-sample $R^2$ incorrectly (https://stats.stackexchange.com/questions/228540/how-to-calculate-out-of-sample-r-squared). I've changed my computation accordingly (see the sketch after these comments).

    The only things I still need to decide are:

    • whether comparing $R^2$ to a permutation distribution is appropriate
    • whether to use the difference in $R^2$ or, say, the difference in RMSE between the two models (again compared against a permutation distribution for significance)

    – JacquieS Jul 06 '22 at 08:36
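
(For reference, the corrected out-of-sample $R^2$ described in the linked thread can be computed along these lines; the key change is that the benchmark in the denominator is the training-set mean rather than the test-set mean. Object names are placeholders, reusing the fits from the sketch in the question.)

    # Out-of-sample R^2: the baseline prediction is the mean of the *training* response
    pred   <- predict(fit_full, newdata = test)
    ss_res <- sum((test$y - pred)^2)
    ss_tot <- sum((test$y - mean(train$y))^2)   # training mean as the benchmark
    r2_oos <- 1 - ss_res / ss_tot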

1 Answer


What I would do (and you may choose differently) is to avoid comparing $R^2$ or any of its variants and to look at everything graphically: a parallel box plot of the residuals of the two models, maybe a Bland-Altman plot (aka Tukey mean-difference plot), a quantile-quantile plot, and comparisons of the actual and predicted values.
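
For instance, in R these comparisons could look something like the following (a sketch only; object and data names are illustrative, reusing the fits from the question):

    # Test-set residuals from each model
    res_full    <- test$y - predict(fit_full, newdata = test)
    res_control <- test$y - predict(fit_control, newdata = test)

    # Parallel box plot of the residuals of the two models
    boxplot(list(full = res_full, control = res_control), ylab = "Test-set residual")

    # Quantile-quantile plot of one set of residuals against the other
    qqplot(res_control, res_full,
           xlab = "Control model residuals", ylab = "Full model residuals")
    abline(0, 1)

    # Actual vs. predicted values for the full model
    plot(predict(fit_full, newdata = test), test$y,
         xlab = "Predicted (full model)", ylab = "Actual")
    abline(0, 1)

    # Bland-Altman (Tukey mean-difference) plot comparing the two sets of predictions
    p1 <- predict(fit_full, newdata = test)
    p2 <- predict(fit_control, newdata = test)
    plot((p1 + p2) / 2, p1 - p2,
         xlab = "Mean of predictions", ylab = "Difference of predictions")
    abline(h = 0)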

Peter Flom
  • Thanks. I should have explained that this is part of a pre-registration: based on the 'validation' ('discovery') sample, I've made the hypotheses, with each hypothesis being a regression. So I now need to specify what will make me think that my hypothesis is 'true' or 'not true'. This is psychology, I should add.

    In your approach, is there something that could be used to give 'one answer'? I can of course show many other things as further information, but I need something that gives the 'yes/no' answer.

    – JacquieS Jul 05 '22 at 14:50
  • Since the models are nested (one has two variables, the other has those PLUS some others), you can do as Andy suggested in his comment on the other answer (see the sketch after these comments).

    But you seem to have some misconception about how hypothesis testing works. Briefly, you specify a null hypothesis (usually, that nothing is happening) and then test it. You can either reject the null or fail to do so. You never say the null is true. There are some fundamental problems with this procedure, but that's another topic.

    – Peter Flom Jul 05 '22 at 14:55
  • Agreed. Sorry, wording was sloppy on the NHT. – JacquieS Jul 05 '22 at 15:00
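
(For completeness, the nested-model comparison mentioned above amounts to a partial F-test on the training fits. A minimal sketch in R, reusing the placeholder names from the question:)

    # Partial F-test comparing nested models fitted to the same training data
    fit_control <- lm(y ~ confound1 + confound2, data = train)
    fit_full    <- lm(y ~ measure_of_interest1 + measure_of_interest2 + confound1 + confound2,
                      data = train)
    anova(fit_control, fit_full)  # tests whether the two measures of interest add explanatory power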