Questions tagged [r-squared]

The coefficient of determination, usually symbolized by $R^2$, is the proportion of the total response variance explained by a regression model. The tag also covers the various pseudo-$R^2$ statistics proposed for logistic regression and other models.

The coefficient of determination, usually symbolized by $R^2$, is the proportion of the total response variance explained by a regression model. In the case of simple linear regression it is the square of the Pearson product-moment correlation coefficient between the predictor and response variables. It is equivalently calculated as:

$$ R^2 = \frac{SS_{\rm total} - SS_{\rm resid}}{SS_{\rm total}} $$
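
As a rough illustration of the formula above, the following pure-Python sketch fits a simple linear regression by least squares and checks that $R^2$ computed from the sums of squares matches the squared Pearson correlation. The data and all function names are made up for the example.

```python
def mean(values):
    return sum(values) / len(values)

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(y, fitted):
    """(SS_total - SS_resid) / SS_total, with SS_total centered at mean(y)."""
    my = mean(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return (ss_total - ss_resid) / ss_total

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Illustrative data, not taken from any question on this page.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]
r2 = r_squared(y, fitted)
r2_from_corr = pearson_r(x, y) ** 2
print(r2, r2_from_corr)  # the two values agree up to rounding error
```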

$R^2$ never decreases, and typically increases (i.e., looks better), when variables are added to a multiple regression model fit by least squares, even if those variables are irrelevant. To counteract this, an adjusted $R^2$ statistic has been developed:

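$$ R^2_\text{adj} = 1-(1-R^2)\frac{N-1}{N-p-1} $$

where $N$ is the number of observations and $p$ is the number of predictors, not counting the intercept.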
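
A minimal sketch of the adjustment, assuming $N$ counts observations and $p$ counts predictors excluding the intercept. The function name and the choice to raise an error when $N - p - 1 \le 0$ are decisions made for this example, not part of any library.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - p - 1).

    n_predictors excludes the intercept. This sketch refuses to compute
    a value when the residual degrees of freedom N - p - 1 are not positive.
    """
    dof_resid = n_obs - n_predictors - 1
    if dof_resid <= 0:
        raise ValueError("adjusted R^2 needs N > p + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof_resid

# Same raw R^2 and sample size, more predictors: the adjusted value drops.
for p in (1, 5, 20):
    print(p, adjusted_r_squared(0.50, n_obs=50, n_predictors=p))
```

With the raw $R^2$ held fixed, the penalty factor $(N-1)/(N-p-1)$ grows with $p$, so the adjusted value shrinks as predictors are added.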

$R^2$ in the form given above is appropriate for models with normally distributed errors. It is not appropriate for other models, such as logistic regression. A variety of 'pseudo-$R^2$' statistics have been developed to provide similar information outside the context of linear models.
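
One such statistic, named in a question further down this page, is McFadden's pseudo-$R^2$, defined as $1 - \ln L_\text{model} / \ln L_\text{null}$, where the null model contains only an intercept. The sketch below computes it for a binary outcome from a list of fitted probabilities. The probabilities are made-up numbers rather than the output of an actual logistic regression fit, and the function names are hypothetical.

```python
from math import log

def bernoulli_log_likelihood(y, probs):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities of y = 1."""
    return sum(log(p) if yi == 1 else log(1.0 - p) for yi, p in zip(y, probs))

def mcfadden_r_squared(y, fitted_probs):
    """McFadden's pseudo-R^2: 1 - logL(model) / logL(null).

    The null (intercept-only) model predicts the observed base rate
    for every observation.
    """
    base_rate = sum(y) / len(y)
    ll_model = bernoulli_log_likelihood(y, fitted_probs)
    ll_null = bernoulli_log_likelihood(y, [base_rate] * len(y))
    return 1.0 - ll_model / ll_null

# Illustrative outcomes and made-up fitted probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 1]
fitted_probs = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

pseudo_r2 = mcfadden_r_squared(y, fitted_probs)
print(pseudo_r2)
```

A model that predicts no better than the base rate gets a value of 0 under this definition.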

1107 questions
7
votes
3 answers

What are the lower and upper bounds on $R^2$ with no intercept?

With an intercept in linear regression, I know that $R^2$ is bounded by [0,1]. Without an intercept, I know that $R^2$ can be negative. What is the lower bound for this case? And is the upper bound still 1 for this case?
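
The negative case is easy to reproduce. The sketch below assumes $R^2$ is still computed as $1 - SS_\text{resid}/SS_\text{total}$ with $SS_\text{total}$ centered at the mean of the response, which is only one of the conventions in use for no-intercept models (some software switches to an uncentered total sum of squares instead). The data and function names are illustrative.

```python
def fit_through_origin(x, y):
    """Least-squares slope for the no-intercept model y = b * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def r_squared_centered(y, fitted):
    """1 - SS_resid / SS_total, with SS_total centered at mean(y)."""
    my = sum(y) / len(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_resid / ss_total

# The response sits far from zero, so a line forced through the origin
# fits worse than simply predicting the mean of y.
x = [1.0, 2.0, 3.0]
y = [10.0, 11.0, 12.0]

slope = fit_through_origin(x, y)
fitted = [slope * xi for xi in x]
r2 = r_squared_centered(y, fitted)
print(r2)  # negative under this convention
```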
6
votes
1 answer

Why does the glm function not return an R^2 value?

The lm function in R returns an R^2 value. The glm function, even when applied to a Gaussian family, does not return an R^2 value. What is the reason for this? Thank you!
5
votes
1 answer

Is it true that adjusted-$R^2$ is not a measure of fit? Why or why not?

Previously I've read that adjusted-$R^2$ is not a measure of fit. Recently, though, I wanted to substantiate that piece of knowledge by understanding the reason why, but I couldn't find any substantive sources to back this up. The Wikipedia article…
5
votes
3 answers

Is the adjusted R-squared score still appropriate when the number of regressors is larger than the sample size?

So I have a really small sample size of 50, and I have 80 regressors. The $R^2$ score is about 0.1, and according to the following equation on Wikipedia about how to compute adjusted $\bar{R}^2$, $$ \bar{R}^2 = R^2 - (1-R^2)\frac{p}{n-p-1} \\ R^2 =…
longtengaa
  • 213
5
votes
1 answer

Explanation for R-squared as ratio of covariances and variances

I have code that calculates $R^2$ with summations $$R^2 = \frac{(\sum xy - \frac1n \sum x \sum y)^2}{(\sum x^2 - \frac1n \sum x \sum x) (\sum y^2 - \frac1n \sum y \sum y)},$$ which is equivalent to $$R^2 = \frac{cov(x, y) \cdot cov(x, y)}{var(x)…
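
The summation form can be checked numerically against the standard identity $r^2 = \operatorname{cov}(x,y)^2 / (\operatorname{var}(x)\operatorname{var}(y))$. The following sketch does this in plain Python; the data and function names are made up for the example.

```python
def r2_from_sums(x, y):
    """R^2 from raw sums, following the summation formula in the question."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    numerator = (sxy - sx * sy / n) ** 2
    denominator = (sxx - sx * sx / n) * (syy - sy * sy / n)
    return numerator / denominator

def cov(a, b):
    """Sample covariance with the n - 1 divisor; cov(a, a) is the variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def r2_from_cov(x, y):
    """R^2 as cov(x, y)^2 / (var(x) * var(y)); the n - 1 divisors cancel."""
    return cov(x, y) ** 2 / (cov(x, x) * cov(y, y))

# Illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2_sums = r2_from_sums(x, y)
r2_cov = r2_from_cov(x, y)
print(r2_sums, r2_cov)  # the two values agree up to rounding error
```

The two forms agree because $\sum xy - \frac1n \sum x \sum y = \sum (x - \bar{x})(y - \bar{y})$, and likewise for the two terms in the denominator.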
4
votes
1 answer

Coefficient of determination ($R^2$) and sample size

Is there any relationship between $R^2$ and sample size - does the $R^2$ increase with sample size? And does the adjusted $R^2$?
guest99
  • 125
3
votes
1 answer

Do we maximize explained sum of squares with OLS?

I know that with OLS we minimize the sum of squared residuals but does that imply that we maximize SSE? From the following r-squared formula $$ R^2 = \frac{SSE}{SST} = 1- \frac{SSR}{SST}$$ and the fact that SST = SSE + SSR, it really seems like it…
Popopo
  • 31
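
A small numeric check of the decomposition, using the question's notation (SSE for the explained sum of squares, SSR for the residual sum of squares). The sketch fits a simple regression with an intercept and then evaluates a second, deliberately non-OLS line for comparison; the data and names are illustrative.

```python
def sums_of_squares(y, fitted):
    """Return (SST, SSE, SSR): total, explained, and residual sums of squares."""
    my = sum(y) / len(y)
    sst = sum((yi - my) ** 2 for yi in y)
    sse = sum((fi - my) ** 2 for fi in fitted)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, sse, ssr

# Illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
mx, my = sum(x) / len(x), sum(y) / len(y)

# OLS fit with an intercept.
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
sst, sse, ssr = sums_of_squares(y, [intercept + slope * xi for xi in x])

# A steeper, non-OLS line through the point of means.
alt_slope = 3.0
alt_fitted = [my + alt_slope * (xi - mx) for xi in x]
_, sse_alt, ssr_alt = sums_of_squares(y, alt_fitted)

print(sst, sse + ssr)          # equal at the OLS fit
print(sse_alt, ssr_alt)        # both larger than their OLS counterparts
```

In this example SST = SSE + SSR holds at the OLS fit but not for the steeper line, whose SSE and SSR are both larger. The identity therefore cannot be used to trade SSE against SSR across arbitrary candidate lines.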
3
votes
1 answer

Coefficient of determination

I'm taking an online intro class on statistics and right now we are covering the relationship between quantitative variables. One of the subtopics is the coefficient of determination. Here is an excerpt from the book: The coefficient of…
flashburn
  • 311
2
votes
1 answer

R^2 accounting for mean shift: Is there a name for this metric?

When calculating the $R^2$ of some model on some testing set, we're effectively comparing the MSE of that model's predictions to the MSE of some naive baseline model that always predicts the sample mean of the target variable in the test set. But…
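
The comparison described here can be written as $R^2 = 1 - \mathrm{MSE}_\text{model} / \mathrm{MSE}_\text{baseline}$. The sketch below computes it on a made-up test set and then applies a constant offset to the predictions to show how a mean shift affects the score. The numbers and function names are assumptions for illustration only.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_vs_test_mean(y_test, y_pred):
    """1 - MSE(model) / MSE(baseline), baseline = mean of the test targets."""
    test_mean = sum(y_test) / len(y_test)
    baseline = [test_mean] * len(y_test)
    return 1.0 - mse(y_test, y_pred) / mse(y_test, baseline)

# Made-up test targets and predictions.
y_test = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

r2 = r2_vs_test_mean(y_test, y_pred)

# The same predictions with a constant offset of +2 (a pure mean shift).
y_pred_shifted = [p + 2.0 for p in y_pred]
r2_shifted = r2_vs_test_mean(y_test, y_pred_shifted)

print(r2, r2_shifted)
```

The shifted predictions track the targets just as closely in shape, yet the score drops sharply because the baseline is the test-set mean and the offset adds directly to the squared error.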
2
votes
1 answer

What is a good R^2 value?

I understand the answer to this question is that it entirely depends on the data set. However, this does not help people understand if their model is suitable or whether they should explore other variables. So I am attempting to get this…
2
votes
0 answers

McFadden's Pseudo R² - Comparability across different datasets

Several authors of applied research papers claim that McFadden's Pseudo R² cannot be used for comparing models that are based on different datasets. I have searched some statistics and econometrics textbooks (Wooldridge etc.) for guidance on this issue,…
Marc
  • 21
1
vote
1 answer

$R^2$ relative to a noiseless function

I am interested in computing the $R^2$ between a set of points $D_f = \{ (x,y)\} $ where $y = f(x)$ and a set of points $D' = \{(x',y') \}$ obtained by adding noise to $D_f$. I don't think I can use: $$ R^2 = 1 - \frac{\sum_i (y'_i - y_i)^2}{\sum_i…
Simone
  • 7,078
1
vote
1 answer

R-squared for user-defined prediction algorithms

I've been working on a machine learning project for a while and I've come up with an algorithm that does what I want it to do (predict some values). I wondered if it is possible to calculate R-squared for my results. My results are like this…
Sinead
  • 13
1
vote
1 answer

How to get predicted R-squared from statsmodels?

How can I get predicted R-squared along with R-squared and adjusted R-squared in statsmodels? Code: import statsmodels.api as sm import numpy as np import pandas as pd import statsmodels.formula.api as smf from statsmodels.stats import anova mtcars =…
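
Predicted R-squared is commonly defined as $1 - \mathrm{PRESS}/SS_\text{total}$, where PRESS is the sum of squared leave-one-out prediction errors. The sketch below computes it by brute-force refitting for a one-predictor model in plain Python. It does not use statsmodels or the mtcars data from the question; the data and function names are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def ordinary_r_squared(x, y):
    """In-sample R^2 = 1 - SS_resid / SS_total."""
    a, b = fit_simple_ols(x, y)
    my = sum(y) / len(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_resid / ss_total

def predicted_r_squared(x, y):
    """1 - PRESS / SS_total, with PRESS from leave-one-out refits."""
    my = sum(y) / len(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    press = 0.0
    for i in range(len(y)):
        # Refit without observation i, then predict it.
        a, b = fit_simple_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    return 1.0 - press / ss_total

# Illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2 = ordinary_r_squared(x, y)
pred_r2 = predicted_r_squared(x, y)
print(r2, pred_r2)  # predicted R^2 is never larger than ordinary R^2 for OLS
```

Refitting once per observation is fine for small data. For larger problems, the same PRESS residuals can be obtained from a single fit as $e_i / (1 - h_{ii})$, where $h_{ii}$ is the leverage of observation $i$.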
1
vote
0 answers

Does it make any sense to use R^2 as a measurement?

I was wondering whether it makes any sense to use R-squared for the graph/model. I think it does not, since it does not clearly explain a linear relation.
Wietze
  • 11