Are the $1-SSe/SSt$ and $cor^2$ calculations of $R^2$ always equivalent?

Question

I am trying to calculate the $R^2$ value for a production constrained spatial interaction model, using Fotheringham and O'Kelly (1989) as my guide.

I get dramatically different values for $R^2$ depending on whether I calculate it as r.square <- 1 - SSe/SSt or as r.square <- cor(x, y)^2. Is this result expected? Of course, I may well be miscalculating something along the way.

I want to use $R^2$ as a (flawed but nevertheless useful and widely understood) measure of goodness of fit, as recommended by Fotheringham & Knudsen (1987).

A reproducible example is below. I've saved my model output to a csv, to save space here.

predobs <- read.csv("http://dl.dropbox.com/u/66606821/pred_obs.csv")
sst <- sum((predobs$obs - mean(predobs$obs))^2)
sse <- sum((predobs$obs - predobs$pred)^2)
(r.square.1 <- 1 - (sse/sst))
(r.square.2 <- cor(predobs$obs, predobs$pred)^2)

Both have been suggested as types of pseudo R-squared for non-OLS regression models. Looking up the literature on the different proposed pseudo R-squared measures may be fruitful. — Andy W, May 01 '12 at 02:59
Thanks @AndyW I'll take that as a "no". Before I hit the books, is there any standard nomenclature to differentiate these two measures? — fmark, May 01 '12 at 04:13
Off the cuff I don't remember; the different pseudo R-squared measures are sometimes named after the people who suggested them. See this question for examples and discussion of the ones used for logistic regression models: http://stats.stackexchange.com/q/3559/1036. — Andy W, May 01 '12 at 04:31
These should only be the same if you are performing a linear regression! Looking at your data, if I try lfit <- lm(predobs$obs ~ predobs$pred), the model I get has a slope of 0.56, and an intercept of -0.0387. If you incorporate this affine shift into your prediction, then the two forms should be equal. You can check this via r.square.3 <- 1 - (sum((lfit$residuals)^2) / sst) — shabbychef, May 01 '12 at 04:38
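
The affine-shift point in the last comment can be illustrated with a small sketch. The data below are synthetic (an assumption, not the asker's pred_obs.csv), and the names obs, pred, r2_sse, r2_cor and r2_recal are made up for the illustration. The hypothetical predictions track the observations closely but sit on a shifted and stretched scale, so the two measures disagree until the predictions are recalibrated by regressing obs on pred.

```r
# Synthetic stand-in for the model output (assumption: not the real data)
set.seed(1)
obs  <- rnorm(200, mean = 10, sd = 3)
# Hypothetical predictions: well correlated with obs, but biased in scale and level
pred <- 2 + 1.8 * obs + rnorm(200, sd = 2)

sst <- sum((obs - mean(obs))^2)

# The two definitions from the question, applied to the raw predictions
r2_sse <- 1 - sum((obs - pred)^2) / sst
r2_cor <- cor(obs, pred)^2

# Estimate the affine recalibration obs ~ a + b * pred by least squares
lfit <- lm(obs ~ pred)
r2_recal <- 1 - sum(residuals(lfit)^2) / sst

print(c(r2_sse = r2_sse, r2_cor = r2_cor, r2_recal = r2_recal))
```

Here r2_recal matches r2_cor, while r2_sse is lower and can even be negative. Because the least-squares line minimises the residual sum of squares over all affine transformations of pred, including leaving pred unchanged, 1 - SSe/SSt computed on the raw predictions can never exceed cor^2.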

score 3 · Accepted Answer · answered May 01 '12 at 06:01

As long as your Gaussian linear model contains an intercept, the R squared always equals the squared correlation between the observations and the predicted values:

> y <- runif(100)
> x <- rpois(100,5)
> w <- gl(4,25)
> 
> # first model with quantitative covariate :
> fit <- lm(y~x)
> summary(fit)$r.squared
[1] 0.01387019
> pred <- fit$fitted
> cor(y,pred)^2
[1] 0.01387019
> 
> # second model with quantitative covariates :
> fit <- lm(y~x+I(x^2))
> summary(fit)$r.squared
[1] 0.01930005
> pred <- fit$fitted
> cor(y,pred)^2
[1] 0.01930005
> 
> # model with qualitative factor :
> fit <- lm(y~w)
> summary(fit)$r.squared
[1] 0.01269687
> pred <- fit$fitted
> cor(y,pred)^2
[1] 0.01269687

This fact is sometimes called the "materialization of the R squared".
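
A short sketch of why the equality holds, using notation that is not in the answer itself: $\hat y_i$ for the fitted values, $e_i = y_i - \hat y_i$ for the residuals, and $SSr = \sum_i (\hat y_i - \bar y)^2$. For a least-squares fit with an intercept, the normal equations give $\sum_i e_i = 0$ and $\sum_i e_i \hat y_i = 0$, so the fitted values have mean $\bar y$ and

$$\sum_i (y_i - \bar y)(\hat y_i - \bar y) = \sum_i (\hat y_i - \bar y)^2 + \sum_i e_i (\hat y_i - \bar y) = SSr .$$

The same two identities give $SSt = SSr + SSe$, and therefore

$$cor(y, \hat y)^2 = \frac{SSr^2}{SSt \cdot SSr} = \frac{SSr}{SSt} = 1 - \frac{SSe}{SSt} .$$

Both steps rely on the residuals summing to zero and being orthogonal to the fitted values, which is why the result can fail for predictions that do not come from a least-squares fit with an intercept.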

score 2 · Answer 2 · answered May 01 '12 at 04:41

They are equivalent when one is performing linear regression with an intercept term.
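
A minimal sketch of what happens when the intercept is dropped, on synthetic data (an assumption, with made-up names such as fit0 and r2_sse0). Both measures are computed by hand here, because summary() of a no-intercept lm reports an R squared based on the uncentred total sum of squares.

```r
# Synthetic data with a clearly non-zero intercept (assumption for illustration)
set.seed(2)
x <- runif(100, 1, 10)
y <- 5 + 0.5 * x + rnorm(100)
sst <- sum((y - mean(y))^2)

# Regression through the origin: no intercept term
fit0  <- lm(y ~ x - 1)
pred0 <- fitted(fit0)
r2_sse0 <- 1 - sum((y - pred0)^2) / sst
r2_cor0 <- cor(y, pred0)^2

# Same data, with an intercept
fit1  <- lm(y ~ x)
pred1 <- fitted(fit1)
r2_sse1 <- 1 - sum((y - pred1)^2) / sst
r2_cor1 <- cor(y, pred1)^2

print(c(r2_sse0 = r2_sse0, r2_cor0 = r2_cor0,
        r2_sse1 = r2_sse1, r2_cor1 = r2_cor1))
```

With the intercept the two values coincide. Without it, the squared correlation is unchanged (the fitted values are just a rescaling of x), but 1 - SSe/SSt drops because the fit through the origin leaves larger residuals.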

shabbychef

They are always equivalent whenever the model contains an intercept. See my answer. – Stéphane Laurent May 01 '12 at 07:01
