
I have two datasets of the same size, $\{\vec{Y}_{1},\vec{X}_1\}$ and $\{\vec{Y}_{2},\vec{X}_2\}$.

I fit the same regression model to both datasets and calculate the coefficient of determination from that model for both datasets, $R^{2}_{1}$ and $R^{2}_{2}$.

I would like to test whether these two values are "the same".

More formally, let the null hypothesis be that the model fits both datasets equally well; what is the probability of observing a difference between $R^{2}_{1}$ and $R^{2}_{2}$ at least as large as the one seen, under that null hypothesis?

I don't think I can use an F-test, since I'm not looking at a nested model versus the full model. What else is out there that I can use? Alternatively, is there some non-parametric or bootstrap/permutation test I can consider?

Pablo
  • A comparison of $R^2$ isn't even meaningful unless the explanatory variables have the same (empirical) variance, or nearly so: see https://stats.stackexchange.com/a/13317/919 for an explanation. – whuber Jan 24 '24 at 19:18

1 Answer


While the sampling distribution of $R^2$ is awkward to work with directly, there are a few approaches you can take. You have already identified two: bootstrap and permutation tests. Here are those, along with some other alternatives, for comparing two $R^2$ values from separate regressions.

  1. Bootstrapping: As you identified, bootstrapping is one way. Here's how to do it:

    • Resample each dataset with replacement (simple random sampling with replacement, SRSWR) and fit the regression model to the bootstrap sample.
    • Calculate $R^2$ for that bootstrap sample.
    • Repeat this many times (a few thousand resamples is typical) to build up an empirical distribution of $R^2$ for each dataset.
    • Finally, compare the two distributions of $R^2$ to see whether they overlap substantially or whether one tends to have higher $R^2$ values than the other.

    This doesn't give you a p-value in the traditional sense, but it provides a way to see whether one model consistently fits better than the other (see the sketch below).
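    A minimal sketch in Python, assuming OLS via scikit-learn's `LinearRegression` and placeholder data standing in for your two datasets:

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    def bootstrap_r2(X, y, n_boot=2000):
        """Bootstrap distribution of in-sample R^2: resample rows with
        replacement, refit the model, and score on the resample."""
        n = len(y)
        r2 = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)          # SRSWR over observations
            fit = LinearRegression().fit(X[idx], y[idx])
            r2[b] = fit.score(X[idx], y[idx])         # R^2 on the resample
        return r2

    # Placeholder data standing in for the two datasets:
    X1, y1 = rng.normal(size=(100, 3)), rng.normal(size=100)
    X2, y2 = rng.normal(size=(100, 3)), rng.normal(size=100)

    r2_1, r2_2 = bootstrap_r2(X1, y1), bootstrap_r2(X2, y2)
    print((r2_1 > r2_2).mean())   # share of draws where dataset 1 fits better
    ```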

  2. Permutation Test: A permutation test involves pooling the two datasets and then repeatedly dividing the pool at random into two groups of the original sizes, fitting the model to each group, and calculating $R^2$ for each. The idea is to see how often you get a difference in $R^2$ values as extreme as the one you observed if the null hypothesis (that the model fits both datasets equally well) is true; see the sketch below.
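    A minimal sketch, assuming both datasets use the same predictors so their rows can be pooled (`r2` is a placeholder helper that refits OLS on a given subset):

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)

    def r2(X, y):
        """In-sample R^2 of an OLS fit (stand-in for your model)."""
        return LinearRegression().fit(X, y).score(X, y)

    def permutation_p(X1, y1, X2, y2, n_perm=2000):
        """p-value for |R2_1 - R2_2| under the null that the split is arbitrary."""
        observed = abs(r2(X1, y1) - r2(X2, y2))
        X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])
        n1, hits = len(y1), 0
        for _ in range(n_perm):
            idx = rng.permutation(len(y))     # random regrouping of pooled rows
            a, b = idx[:n1], idx[n1:]
            hits += abs(r2(X[a], y[a]) - r2(X[b], y[b])) >= observed
        return (hits + 1) / (n_perm + 1)      # add-one smoothing
    ```

    Note the null here is that the group labels are exchangeable, which is somewhat stronger than "equal fit"; keep that in mind when interpreting the p-value.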

  3. Fisher's Z-Transformation: Fisher's Z-transformation maps a correlation $r$ into a quantity that is approximately normally distributed, after which the two transformed values can be compared with a standard parametric (z) test. The transformation is given by:

    $$Z = 0.5 \cdot \log{\frac{1 + r}{1 - r}}$$

    where $r$ is the square root of $R^2$ (i.e., the multiple correlation coefficient). After transforming both values, compare the difference $Z_1 - Z_2$ against its approximate standard error $\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$, where $n_1$ and $n_2$ are the sample sizes, using a two-sided z-test. Bear in mind that treating a multiple correlation like a Pearson correlation is only an approximation; a sketch follows below.
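    A sketch of that comparison (the $R^2$ values and sample sizes below are made up for illustration):

    ```python
    import numpy as np
    from scipy.stats import norm

    def fisher_z_p(r2_1, n1, r2_2, n2):
        """Two-sided p-value comparing two R^2 values via Fisher's Z.
        Treats r = sqrt(R^2) like a Pearson correlation -- an approximation."""
        # np.arctanh(r) == 0.5 * log((1 + r) / (1 - r))
        z1, z2 = np.arctanh(np.sqrt(r2_1)), np.arctanh(np.sqrt(r2_2))
        se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # approx. SE of z1 - z2
        return 2 * norm.sf(abs(z1 - z2) / se)

    print(fisher_z_p(0.64, 100, 0.49, 100))   # illustrative values only
    ```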

  4. Confidence Interval Overlap: Treat each $R^2$ as a point estimate and compute an empirical confidence interval for it (e.g., from the bootstrap distribution in point 1). Then check whether the two intervals overlap. Keep in mind that non-overlap of two 95% intervals is a conservative criterion, and this method assumes the $R^2$ values are comparable across datasets, which is not always the case; see the short sketch below.
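    Continuing from the bootstrap sketch under point 1 (this assumes the draws `r2_1` and `r2_2` from that code are available):

    ```python
    import numpy as np

    # r2_1, r2_2: arrays of bootstrap R^2 draws from the sketch under point 1
    ci_1 = np.percentile(r2_1, [2.5, 97.5])   # 95% percentile interval, dataset 1
    ci_2 = np.percentile(r2_2, [2.5, 97.5])   # 95% percentile interval, dataset 2
    print(ci_1, ci_2, "overlap:", ci_1[0] <= ci_2[1] and ci_2[0] <= ci_1[1])
    ```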

My bias would be to use bootstrapping, the permutation test, and Fisher's transformation, in that order. But you do you!

Harshvardhan
  • My concern with bootstrapping the data, i.e., sampling with replacement, is that in the bootstrapped populations you will see repeats of the data. This means the overall dataset the regression is performed on will have much lower variance than the original, and I will see incorrectly high estimates of the $R^2$. – Pablo Jan 24 '24 at 18:47
  • Repeats aren't a problem. But when the two sets of explanatory variables have different spreads or don't cover essentially the same range of values, comparing $R^2$ tells you next to nothing. – whuber Jan 24 '24 at 19:19
  • Multigroup analysis within the framework of structural equation modeling (SEM; regression is a special case of SEM) is another option that would allow you to formally test a variety of coefficients for equality between groups--including regression coefficients, variances/residual (error) variances, and R-squared values by using nested model (chi-square) tests. – Christian Geiser Jan 24 '24 at 19:35