Equivalent of R-squared in Generalized Linear Model Regression Results?

Question

How do we assess goodness of fit in a Generalized Linear Model (GLM), since R-squared is not reported? For example, the following are the results of a regression on the iris dataset using statsmodels, with the code: smf.glm('SL~Species+SW+PL', data=irisdf, family=sm.families.Gaussian(sm.families.links.log)).fit()

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                     SL   No. Observations:                  150
Model:                            GLM   Df Residuals:                      145
Model Family:                Gaussian   Df Model:                            4
Link Function:                    log   Scale:                        0.096285
Method:                          IRLS   Log-Likelihood:                -34.807
Date:                Fri, 17 Jul 2020   Deviance:                       13.961
Time:                        12:44:23   Pearson chi2:                     14.0
No. Iterations:                     6                                         
Covariance Type:            nonrobust                                         
=========================================================================================
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                 1.1842      0.046     26.024      0.000       1.095       1.273
Species[T.versicolor]    -0.1335      0.035     -3.772      0.000      -0.203      -0.064
Species[T.virginica]     -0.2046      0.046     -4.405      0.000      -0.296      -0.114
SW                        0.0713      0.014      5.118      0.000       0.044       0.099
PL                        0.1244      0.010     12.214      0.000       0.104       0.144
=========================================================================================
==================== Summary2() ====================
                  Results: Generalized linear model
=====================================================================
Model:                 GLM                AIC:              79.6145
Link Function:         log                BIC:              -712.5809
Dependent Variable:    SL                 Log-Likelihood:   -34.807
Date:                  2020-07-17 12:44   LL-Null:          -492.86
No. Observations:      150                Deviance:         13.961
Df Model:              4                  Pearson chi2:     14.0
Df Residuals:          145                Scale:            0.096285
Method:                IRLS

                       Coef.  Std.Err.    z    P>|z|   [0.025  0.975]
Intercept              1.1842   0.0455 26.0241 0.0000  1.0950  1.2734
Species[T.versicolor] -0.1335   0.0354 -3.7717 0.0002 -0.2028 -0.0641
Species[T.virginica]  -0.2046   0.0464 -4.4051 0.0000 -0.2956 -0.1136
SW                     0.0713   0.0139  5.1183 0.0000  0.0440  0.0986
PL                     0.1244   0.0102 12.2141 0.0000  0.1044  0.1443
=====================================================================

What is the equivalent of R-squared in the above analysis?

There is no equivalent, but there are a number of pseudo $R^2$. This question has been asked before, have a look here, here, or here, for example. — matteo, Jul 17 '20 at 08:10
Does this answer your question? How to calculate goodness of fit in glm (R) — matteo, Jul 17 '20 at 08:10
Thanks for the links. I will go through them and try to determine how good is fit in above example. — rnso, Jul 17 '20 at 09:50

score 1 · Accepted Answer · answered Jul 17 '20 at 07:53

First, the answer: you should be able to calculate R2 for your model by hand (statsmodels sometimes provides a pseudo R2 as well):

import numpy as np
# y: observed response; your_model: the fitted GLM results object
sst = np.sum((y - np.mean(y)) ** 2)            # total sum of squares
sse = np.sum(your_model.resid_response ** 2)   # residual sum of squares
r2 = 1.0 - sse / sst
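As a minimal, self-contained sketch of the same calculation, the version below makes the order of operations explicit. The helper name `r_squared` and the toy numbers are hypothetical and not taken from the question. It assumes the predictions are on the response scale, which is what resid_response (observed minus fitted mean) is based on even with a log link.

```python
def r_squared(y, y_hat):
    """Return 1 - SSE/SST for observed y and response-scale predictions y_hat."""
    y_mean = sum(y) / len(y)
    sst = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual sum of squares
    return 1.0 - sse / sst                                  # divide first, then subtract from 1


# Toy data, purely illustrative.
y_toy = [1.0, 2.0, 3.0, 4.0]
y_hat_toy = [1.1, 1.9, 3.2, 3.8]
r2_toy = r_squared(y_toy, y_hat_toy)
print(round(r2_toy, 4))
```

Predicting the mean of y for every observation gives a value of 0 with this definition, and perfect predictions give 1.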

That said, I do not think R2 is the best way to assess your regression model in this case. Why not use AIC? There is much discussion about whether R2 really is the 'gold standard' for assessing a regression. An interesting post about common misunderstandings in statistics lists this one about R2: "Equating a high R2 with a "good model" (or equivalently, lamenting - or, in the case of referees of papers, criticizing - that R2 is "too" low)."

Maybe look into this discussion here and here and re-consider how to report your model performance.
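Since AIC is suggested above, here is a small sketch of how the AIC in the question's readout relates to the reported log-likelihood. It assumes the usual definition AIC = 2k - 2*logL, with k counted as Df Model + 1 (the four slope terms plus the intercept); the variable names are illustrative.

```python
# Values copied from the summary output in the question.
log_likelihood = -34.807
df_model = 4

k = df_model + 1  # assumption: the intercept counts as one extra parameter
aic = 2 * k - 2 * log_likelihood
print(round(aic, 3))
```

This lands on the reported 79.6145 up to rounding of the log-likelihood. The number only has meaning relative to other models fitted to the same data, where lower is better.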

Thomas

528

Specifically, what would you say is the performance of the example model that I have given in my question above? I would like to judge it from the readout there rather than doing any more calculations. – rnso Jul 17 '20 at 09:52
Also, just to be clear, it is 1.0 - (sse/sst) and not (1.0 - sse)/sst? – rnso Jul 17 '20 at 09:54
If I had to assess the performance of your model, I would definitely calculate the RMSE, as it tells you more about the average error the model makes when predicting the outcome for an observation. Just look at the residuals between observation and prediction. And, to my knowledge, it is SSE / SST, which you then subtract from 1. – Thomas Jul 17 '20 at 11:21
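
The RMSE mentioned in the last comment can be sketched from the readout alone, with no refitting. This relies on one assumption: for the Gaussian family the deviance is the sum of squared response residuals, so the reported Deviance plays the role of SSE. Variable names are illustrative.

```python
import math

# Values copied from the summary output in the question.
deviance = 13.961   # Gaussian family: sum of squared response residuals
n_obs = 150
df_resid = 145

rmse = math.sqrt(deviance / n_obs)              # root mean squared error
resid_std_err = math.sqrt(deviance / df_resid)  # degrees-of-freedom corrected version
print(round(rmse, 3), round(resid_std_err, 3))
```

Both come out at roughly 0.3, in the units of SL, and the square of the second value agrees with the reported Scale of 0.096285 up to rounding.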
