
How can I get the predicted R-squared along with the R-squared and adjusted R-squared in statsmodels?

code

import statsmodels.api as sm
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats import anova

mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
df = pd.DataFrame(mtcars)
model = smf.ols(formula='mpg ~ wt', data=mtcars).fit()
model.summary()


  • Here is a Python gist obtaining $R^2$ and $R^2_{\text{adj}}$ in statsmodels. Not sure what you mean by "predicted R-square" vs "R-square" since $R^2$ uses the residuals... – Galen Oct 18 '22 at 02:21
  • @Galen You can find a description of predicted R-squared here: https://stats.stackexchange.com/questions/242770/difference-between-adjusted-r-squared-and-predicted-r-squared – ferrelwill Oct 18 '22 at 05:17
  • Interesting. It looks like a sort of leave-one-out $R^2$. I have updated the Python gist to calculate that. – Galen Oct 18 '22 at 05:41

1 Answer


The question "Difference between Adjusted R Squared and Predicted R Squared" gives the following leave-one-out procedure:

  • A data point from your dataset is removed
  • A refitted linear regression model is generated
  • The removed data point is plugged into the refitted linear model, generating a predicted value
  • The removed data point is placed back into your dataset. Repeat from step 1 for the next data point until all data points have had a chance to be removed.
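In symbols, the procedure above amounts to the following, where $\hat{y}_{(i)}$ is my own notation (not from the linked question) for the prediction of point $i$ from the model fitted without it, and $\bar{y}$ is the mean of all $n$ responses:

```latex
\mathrm{PRESS} = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2,
\qquad
R^2_{\text{pred}} = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The denominator is the total sum of squares, which equals `np.var(y) * y.size` when `np.var` is used with its default `ddof=0`.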

Modifying your example, we can use the following:

import statsmodels.api as sm
import numpy as np
import pandas as pd
from statsmodels.stats import anova
from sklearn.model_selection import LeaveOneOut

Data

mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
y = mtcars.mpg.to_numpy()
X = sm.add_constant(mtcars.wt.to_numpy())

Calculate PRESS and pred_rsq

loo_res = []
for train_index, test_index in LeaveOneOut().split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model = sm.OLS(y_train, X_train)
    results = model.fit()
    loo_res.append(*(y_test - results.predict(X_test)))

pred_rsq = 1 - np.sum(np.square(loo_res)) / np.var(y) / y.size

rsquared on all data

model = sm.OLS(y, X)
results = model.fit()

Print results

print(results.rsquared, results.rsquared_adj, pred_rsq)
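For ordinary least squares specifically, the refit loop is not strictly necessary: the leave-one-out residual of point $i$ equals its ordinary residual divided by $1 - h_{ii}$, where $h_{ii}$ is the $i$-th diagonal entry of the hat matrix. The sketch below is a different technique from the loop above, not a replacement for it. It uses only NumPy and synthetic data (an assumption, so that it runs without downloading mtcars), and the function names `predicted_r_squared` and `predicted_r_squared_loop` are hypothetical helpers, not part of statsmodels.

```python
import numpy as np


def predicted_r_squared(X, y):
    """Predicted R^2 = 1 - PRESS / SST via the hat-matrix shortcut.

    X must already contain the intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # OLS coefficients and in-sample residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix H = X @ pinv(X).
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    # Leave-one-out residual of point i is resid_i / (1 - h_ii).
    press = np.sum((resid / (1.0 - leverage)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / sst


def predicted_r_squared_loop(X, y):
    """Same quantity by explicitly refitting without each point."""
    n = len(y)
    loo_resid = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        loo_resid[i] = y[i] - X[i] @ beta
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(loo_resid ** 2) / sst


# Synthetic data (illustrative only): y = 2 + 3x + noise.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=30)
X = np.column_stack([np.ones_like(x), x])

fast = predicted_r_squared(X, y)
slow = predicted_r_squared_loop(X, y)
print(fast, slow)
```

The two values should agree up to floating-point error. The shortcut needs a single fit, which matters mainly for large datasets; for the 32 rows of mtcars the explicit loop is fast enough.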

Galen