$y = f(x,...)$
$y$ is the total effort in hours to complete a mix of tasks of varying complexity (each $x$ is a different mix). The relationship might be non-linear, but for now I only have the count of tasks and no complexity data. I'd like to understand the correlation between total effort and task count.
A simple regression throws this up:
    > summary(lm(data=dat, y ~ x))

    Call:
    lm(formula = y ~ x, data = dat)

    Residuals:
         Min       1Q   Median       3Q      Max
    -2912.84  -189.26    12.88   148.09  3138.23

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -148.09     102.96  -1.438    0.155
    x             146.76      13.89  10.568 1.69e-15 ***

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 724.2 on 62 degrees of freedom
    Multiple R-squared:  0.643,  Adjusted R-squared:  0.6373
    F-statistic: 111.7 on 1 and 62 DF,  p-value: 1.687e-15

I then re-ran it with the intercept forced to zero, which gives:
    > summary(lm(data=dat, y ~ 0 + x))

    Call:
    lm(formula = y ~ 0 + x, data = dat)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2775.5  -287.4  -118.5     0.0  3389.7

    Coefficients:
      Estimate Std. Error t value Pr(>|t|)
    x   137.24      12.31   11.15   <2e-16 ***

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 730.3 on 63 degrees of freedom
    Multiple R-squared:  0.6635,  Adjusted R-squared:  0.6582
    F-statistic: 124.2 on 1 and 63 DF,  p-value: < 2.2e-16

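Since the residual standard error is $\sqrt{\mathrm{RSS}/\mathrm{df}}$, the residual sum of squares (RSS) behind each fit can be recovered from the printed output. A rough back-calculation from the rounded figures in the two summaries:

$$
\begin{aligned}
\mathrm{RSS}_{\text{intercept}} &\approx 724.2^2 \times 62 \approx 3.25 \times 10^7 \\
\mathrm{RSS}_{\text{zero}} &\approx 730.3^2 \times 63 \approx 3.36 \times 10^7
\end{aligned}
$$

So forcing the intercept to zero raises the RSS by roughly 3%. It can never lower it, because the zero-intercept model is a special case of the model with an intercept.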
Question: Which of the two models above is the better fit? Should the residual standard error* (RSE) be the guide, or should I simply go with the second model because a zero intercept makes sense in this case?
Note: The post shared by Nick Cox states that $R^2$ should not be relied upon in zero-intercept cases. I don't fully understand the math described there, but I'm hoping it is okay to rely on the residual standard error in a zero-intercept scenario. Could someone please confirm this as well?
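For context, the quantities involved can be written out as follows. This is a sketch of the definitions as R's `summary.lm` applies them, where $\hat y_i$ are the fitted values, $n = 64$ is the number of observations, and $p$ is the number of estimated coefficients (2 with an intercept, 1 without):

$$
\mathrm{RSE} = \sqrt{\frac{\sum_i (y_i - \hat y_i)^2}{n - p}}, \qquad
R^2_{\text{intercept}} = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}, \qquad
R^2_{\text{zero}} = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i y_i^2}
$$

The zero-intercept $R^2$ uses the uncentered total $\sum_i y_i^2$ as its baseline, which is never smaller than the centered total $\sum_i (y_i - \bar y)^2$, so 0.6635 and 0.643 are measured against different baselines. The RSE, by contrast, has the same definition in both fits and stays on the scale of $y$ (hours).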
*Reference: This article says RSE is also a goodness-of-fit measure.
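One way to put the two fits on the same footing is sketched below. The real `dat` is not available here, so the data frame is a simulated, hypothetical stand-in with 64 rows; the names `fit_int`, `fit_zero`, `rse` and `cmp` are illustrative only. The sketch compares the RSE via `sigma()`, runs the nested-model F test via `anova()`, and prints the AIC of both fits.

```r
# Hypothetical stand-in for the real `dat` (simulated; 64 rows as in the output above).
set.seed(1)
dat <- data.frame(x = rpois(64, 6) + 1)
dat$y <- 140 * dat$x + rnorm(64, sd = 700)

fit_int  <- lm(y ~ x, data = dat)      # intercept estimated
fit_zero <- lm(y ~ 0 + x, data = dat)  # intercept forced to zero

# Residual standard error, sqrt(RSS / residual df), on the scale of y.
rse <- c(with_intercept = sigma(fit_int), zero_intercept = sigma(fit_zero))
print(rse)

# The models are nested, so an F test asks whether the intercept is needed.
# With a single restricted parameter, its p-value equals that of the
# intercept's t test in summary(fit_int).
cmp <- anova(fit_zero, fit_int)
print(cmp)

# AIC is comparable across the two fits because the response is identical.
print(AIC(fit_int, fit_zero))
```

With the real data, the F test would simply reproduce the intercept's p-value of 0.155 from the first summary, so this mainly makes explicit that the question is whether the extra parameter earns its place.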
    A = c(1,2,3,4,5,6,7,8,9)
    B = c(2,4,5,3,6,5,7,9,8)
    model = lm(B ~ 0 + A)
    summary(model)
    library(rcompanion)
    accuracy(list(model))
    library(performance)
    r2_efron(model)

– Sal Mangiafico Oct 28 '22 at 16:09