
Say I want to use lm() to estimate the means of y over k groups, where the groups are defined by a factor.

If I just run lm(y ~ factor), this gives me an intercept and coefficients for the other k-1 levels, but those coefficients are expressed as differences from the intercept. I want the group means directly instead.
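Under R's default treatment contrasts, the group means are already recoverable from lm(y ~ factor): the intercept is the mean of the first level, and each other coefficient is that level's mean minus the intercept. A minimal sketch on the built-in iris data; the names b, means_from_reg1 and group_means are illustrative only:

```r
# Default treatment contrasts: intercept = mean of the first level (setosa),
# other coefficients = (level mean - intercept)
reg1 <- lm(Sepal.Length ~ Species, data = iris)
b <- coef(reg1)

# Add the intercept back to every level (0 for the reference level)
means_from_reg1 <- b[1] + c(0, b[-1])
names(means_from_reg1) <- levels(iris$Species)
means_from_reg1

# Direct group means, for comparison
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
all.equal(unname(means_from_reg1), as.vector(group_means))
```

This only works cleanly for a single factor with the default contrasts; with other contrast codings the arithmetic changes.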

Is there a clean way to do this with contrasts in lm()? I am not sure what such a contrast would be called... orthogonal? I can obviously remove the intercept with lm(y ~ -1 + factor), but this gives me wrong R2 values.

reg1 <- lm(Sepal.Length ~ Species, data = iris)
reg2 <- lm(Sepal.Length ~ -1 + Species, data = iris)

## get coefs
coef(reg1) # not what I want
#>       (Intercept) Speciesversicolor  Speciesvirginica 
#>             5.006             0.930             1.582
coef(reg2) # what I want
#>     Speciessetosa Speciesversicolor  Speciesvirginica 
#>             5.006             5.936             6.588

## The models are equivalent:
all.equal(fitted(reg1), fitted(reg2))
#> [1] TRUE


# but the -1 trick will create problems for some stats, such as R2
summary(reg1)$r.squared
#> [1] 0.6187057
summary(reg2)$r.squared
#> [1] 0.9925426

Created on 2019-05-01 by the reprex package (v0.2.1)
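One way to get the group means without dropping the intercept is to keep reg1 and ask it for the fitted value at each level with predict(). This is a sketch assuming a single factor; new_levels and group_means are illustrative names:

```r
# Keep the intercept model, so summary() reports the usual R^2
reg1 <- lm(Sepal.Length ~ Species, data = iris)

# One row per level; the factor carries the same levels as in the fit
new_levels <- data.frame(Species = factor(levels(iris$Species),
                                          levels = levels(iris$Species)))

# Fitted mean of each group
group_means <- predict(reg1, newdata = new_levels)
names(group_means) <- levels(iris$Species)
group_means

# Unchanged, since reg1 still has an explicit intercept
summary(reg1)$r.squared
```

Adding interval = "confidence" to the predict() call would also return confidence intervals for each group mean.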

Zheyuan Li
Matifou
  • What do you mean by "wrong r2 values"? You can't have it both ways. This seems like maybe more of a statistics question than a programming question. If you need help understanding how linear regression models work, then you should ask instead at Cross Validated, where statistics questions are on topic. This might already explain it: https://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-increases-r2-in-linear-mo – MrFlick May 01 '19 at 17:22
  • Also a discussion of the same issue here: https://stats.stackexchange.com/questions/171240/how-can-r2-have-two-different-values-for-the-same-regression-without-an-inte/171250#171250 – MrFlick May 01 '19 at 17:23
  • The point about R2 is secondary; my main question is how to get the coefficients directly as group means with a factor. On the secondary point, note that the two regressions are exactly the same (only the coefficients are labelled differently) and hence give the same sum-of-squares decomposition, SS_tot = SS_pred + SS_res. So one would expect them to give the same R2. – Matifou May 01 '19 at 18:46
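The comment's claim can be checked numerically: with centered sums of squares, both fits give the same decomposition, because the indicator columns of reg2 sum to the constant column that reg1 contains explicitly. A sketch; ss_decomp is a hypothetical helper written for this check:

```r
reg1 <- lm(Sepal.Length ~ Species, data = iris)
reg2 <- lm(Sepal.Length ~ -1 + Species, data = iris)
y <- iris$Sepal.Length

# Hypothetical helper: centered total, predicted and residual sums of squares
ss_decomp <- function(fit, y) {
  f <- fitted(fit)
  c(tot  = sum((y - mean(y))^2),
    pred = sum((f - mean(y))^2),
    res  = sum(residuals(fit)^2))
}

d1 <- ss_decomp(reg1, y)
d2 <- ss_decomp(reg2, y)

# Same column space, so the decompositions coincide
all.equal(d1, d2)
# SS_tot = SS_pred + SS_res holds for both fits
all.equal(unname(d2["tot"]), unname(d2["pred"] + d2["res"]))
# Centered R^2, identical for both models
1 - d2["res"] / d2["tot"]
```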

1 Answer


It is not “orthogonal contrast” but “no contrast at all”.
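The difference is visible in the design matrices. With an intercept, R codes the factor through its contrast matrix (contr.treatment by default), giving k-1 dummy columns. Without an intercept, it uses one indicator column per level, so each coefficient is simply a group mean. A sketch on iris; X_treat and X_cell are illustrative names:

```r
# Default coding: intercept column + (k - 1) treatment dummies
X_treat <- model.matrix(~ Species, data = iris)
colnames(X_treat)

# No intercept: one indicator ("cell means") column per level
X_cell <- model.matrix(~ 0 + Species, data = iris)
colnames(X_cell)

# The contrast matrix behind the default coding
contrasts(iris$Species)

# Every row has exactly one 1, so each coefficient is the mean of its group
all(rowSums(X_cell) == 1)
```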

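Regarding the incorrect R squared: summary.lm computes this quantity differently depending on whether the model explicitly contains an intercept. Without one, it measures the total sum of squares around zero (sum of y^2) rather than around the mean (sum of (y - mean(y))^2), which is why reg2 reports a much larger value. You may want to compute R squared manually in this case: cor(iris$Sepal.Length, fitted(reg2))^2. See this comment.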
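Both numbers in the question can be reproduced by hand, which shows that only the choice of total sum of squares differs. A sketch; the r2_* names are illustrative:

```r
reg2 <- lm(Sepal.Length ~ -1 + Species, data = iris)
y   <- iris$Sepal.Length
rss <- sum(residuals(reg2)^2)

# No explicit intercept: summary.lm uses the uncentered total sum of squares
r2_uncentered <- 1 - rss / sum(y^2)
# The usual, centered version
r2_centered <- 1 - rss / sum((y - mean(y))^2)
# The workaround suggested above
r2_cor <- cor(y, fitted(reg2))^2

all.equal(r2_uncentered, summary(reg2)$r.squared)
all.equal(r2_centered, r2_cor)
```

The cor()^2 shortcut is valid here because the constant lies in the span of the indicator columns; for a genuinely intercept-free model it would not match the centered R squared.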

NelsonGon
Zheyuan Li
  • Thanks! So is there a way in R to specify "no contrast at all" without manually removing the intercept? That would avoid the manual R2 correction. – Matifou May 01 '19 at 19:00
  • @Matifou Contrasts can be disabled, see [this Q&A](https://stackoverflow.com/q/41032858/4891738). However, that will not achieve the result you hope for. The Q&A provides rich information on how factor covariates are handled in regression. – Zheyuan Li May 01 '19 at 19:10
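One way to switch contrasts off while keeping the intercept is to pass a full indicator matrix through lm()'s contrasts argument. This is not necessarily the method in the linked Q&A. The sketch shows why it does not help: the intercept plus one dummy per level are collinear, so one coefficient is returned as NA and the rest are still differences. reg3 and full_dummies are illustrative names:

```r
# Identity "contrast" matrix: one indicator column per level
full_dummies <- contrasts(iris$Species, contrasts = FALSE)

# Intercept + 3 indicators: the design matrix is rank deficient
reg3 <- lm(Sepal.Length ~ Species, data = iris,
           contrasts = list(Species = full_dummies))

# One coefficient is NA (aliased); the others are not the group means
coef(reg3)

# The fit itself is unchanged, so R^2 matches the default model
summary(reg3)$r.squared
```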