How to get predicted R-squared from statsmodels?

Question

How can I get the predicted R-squared, along with R-squared and adjusted R-squared, in statsmodels?

code

import statsmodels.api as sm
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats import anova
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
df = pd.DataFrame(mtcars)
model = smf.ols(formula='mpg ~ wt', data=mtcars).fit()
print(model.summary())

output

Here is a Python gist obtaining $R^2$ and $R^2_{\text{adj}}$ in statsmodels. Not sure what you mean by "predicted R-square" vs "R-square" since $R^2$ uses the residuals... — Galen, Oct 18 '22 at 02:21
@Galen You can find a description of predicted R-squared here: https://stats.stackexchange.com/questions/242770/difference-between-adjusted-r-squared-and-predicted-r-squared — ferrelwill, Oct 18 '22 at 05:17
Interesting. It looks like a sort of leave-one-out $R^2$. I have updated the Python gist to calculate that. — Galen, Oct 18 '22 at 05:41

Galen · Accepted Answer · 2022-10-18T16:08:58.477

The question "Difference between Adjusted R Squared and Predicted R Squared" gives the following procedure:

1. A data point from your dataset is removed.
2. A refitted linear regression model is generated.
3. The removed data point is plugged into the refitted linear model, generating a predicted value.
4. The removed data point is placed back into your dataset. Repeat from step 1 for the next data point until all data points have had a chance to be removed.
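
In symbols, the steps above compute the PRESS statistic (prediction error sum of squares), and the predicted $R^2$ compares it to the total sum of squares. As a sketch, writing $\hat{y}_{(i)}$ for the prediction of $y_i$ from the model refitted without point $i$ (this notation is an assumption made here, not taken from the linked question):

$$\text{PRESS} = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2, \qquad R^2_{\text{pred}} = 1 - \frac{\text{PRESS}}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$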

Modifying your example, we can use the following:

import statsmodels.api as sm
import numpy as np
import pandas as pd
from statsmodels.stats import anova
from sklearn.model_selection import LeaveOneOut

# Data
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
y = mtcars.mpg.to_numpy()
X = sm.add_constant(mtcars.wt.to_numpy())

# Calculate PRESS and pred_rsq
loo_res = []  # leave-one-out prediction errors
for train_index, test_index in LeaveOneOut().split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model = sm.OLS(y_train, X_train)
    results = model.fit()
    # Each test split holds exactly one point, so unpacking yields a scalar
    loo_res.append(*(y_test - results.predict(X_test)))
# np.var(y) * y.size is the total sum of squares around the mean
pred_rsq = 1 - np.sum(np.square(loo_res)) / (np.var(y) * y.size)

# rsquared on all data
model = sm.OLS(y, X)
results = model.fit()

# Print results
print(results.rsquared, results.rsquared_adj, pred_rsq)
