The question *Difference between Adjusted R Squared and Predicted R Squared* gives a procedure as follows:
- A data point from your dataset is removed
- A refitted linear regression model is generated
- The removed data point is plugged into the refitted linear model, generating a predicted value
- The removed data point is placed back into your dataset. Repeat from step 1 for the next data point until all data points have had a chance to be removed.
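The steps above can be sketched with plain NumPy, independent of any modeling library (the toy data here is illustrative, not from any real dataset):

```python
import numpy as np

# Toy data: y roughly linear in x (illustrative values only)
rng = np.random.default_rng(0)
x = np.arange(10.0)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=10)
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept

press = 0.0
for i in range(len(y)):
    # 1. remove data point i
    X_i = np.delete(X, i, axis=0)
    y_i = np.delete(y, i)
    # 2. refit the linear regression without it
    beta, *_ = np.linalg.lstsq(X_i, y_i, rcond=None)
    # 3. predict the held-out point and accumulate the squared error
    press += (y[i] - X[i] @ beta) ** 2
    # 4. the point is "put back" implicitly on the next iteration

# Predicted R-squared: 1 - PRESS / total sum of squares
pred_rsq = 1 - press / np.sum((y - y.mean()) ** 2)
```

This is exactly what `LeaveOneOut` automates below.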
Modifying your example, we can use the following:
```python
import statsmodels.api as sm
import numpy as np
from sklearn.model_selection import LeaveOneOut
```
Data:

```python
# mtcars: predict mpg from weight (wt), with an intercept
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
y = mtcars.mpg.to_numpy()
X = sm.add_constant(mtcars.wt.to_numpy())
```
Calculate PRESS and the predicted R-squared:

```python
loo_res = []
for train_index, test_index in LeaveOneOut().split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Refit on all points except the held-out one
    results = sm.OLS(y_train, X_train).fit()
    # Store the leave-one-out prediction error for the held-out point
    loo_res.append((y_test - results.predict(X_test)).item())

# 1 - PRESS / SST
pred_rsq = 1 - np.sum(np.square(loo_res)) / np.var(y) / y.size
```
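A note on the denominator: since `np.var` defaults to the population variance (`ddof=0`), multiplying it by `y.size` recovers the total sum of squares, so the expression computes $1 - \text{PRESS}/\text{SST}$. A quick check on arbitrary illustrative values:

```python
import numpy as np

y = np.array([21.0, 22.8, 18.7, 24.4, 19.2])  # arbitrary illustrative values
sst = np.sum((y - y.mean()) ** 2)             # total sum of squares
# np.var uses ddof=0, so var * n equals SST exactly
assert np.isclose(np.var(y) * y.size, sst)
```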
R-squared and adjusted R-squared on the full data, printed alongside the predicted R-squared:

```python
model = sm.OLS(y, X)
results = model.fit()
print(results.rsquared, results.rsquared_adj, pred_rsq)
```