
I am using R to estimate the error in the slope of a regression line. (I will later use the slope to calculate something.) I have some data, call them $x$ and $y$, and will fit a linear regression such as $y = mx + c$. If $y$ has some error associated with it, say $\delta y = 0.1$, what is the error in the slope?

The residual standard error given by the lm() function in R does not take this error in $y$ into account. So how do I combine the residual standard error with the error in $y$ that I have?
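In code, my setup is roughly like this (toy numbers; the real $x$, $y$, and the 0.1 error come from my data):

    # Toy version of my setup -- the real x, y, and the 0.1 error come from my data
    x  <- c(1, 2, 3, 4, 5)
    y  <- c(1.1, 1.9, 3.2, 3.9, 5.1)
    dy <- 0.1                # error in each y value, used for the error bars

    fit <- lm(y ~ x)
    coef(fit)["x"]           # the slope m
    summary(fit)$sigma       # the residual standard error lm() reports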

  • Could you please explain what you mean by "gradient"? One would have supposed you mean the coefficient $m$, but you contradict that where you claim lm does not account for the errors in y. – whuber Jan 04 '17 at 19:13
  • Yes, the coefficient $m$ – Joe Wragg Jan 04 '17 at 19:14
  • Okay. What is the reason you claim "the lm() function in R does not take into account this error in y"? Because that's untrue, we ought to explore the origin of this misbelief because it might help us understand what question you're actually trying to formulate. – whuber Jan 04 '17 at 19:18
  • So from my understanding, the lm function takes your x and y values and applies a linear regression to them, fitting a linear model to the data. When I plot my abline to that data using the regression coefficients given by the lm function, the gradient is calculated by least squares, right? – Joe Wragg Jan 04 '17 at 19:21
  • If I have some error in y, say +/- 0.1, that I use to plot error bars on my points, the lm function does not take in that value when called in R. – Joe Wragg Jan 04 '17 at 19:23
  • I don't see anything that is incorrect in that description when it's properly interpreted. The squares of the residuals automatically incorporate any variation in the responses $y$: that's part and parcel of what least squares does. I suspect you will need to explain, as clearly as possible, what you mean by "error bars on my points." – whuber Jan 04 '17 at 19:24
  • @whuber, I think he means that OLS assumes your data are measured w/o error, & that the resulting estimated error variance will be the sum of the true error variance and the variance of the measurement error. – gung - Reinstate Monica Jan 04 '17 at 19:29
  • @gung I am arriving at a similar guess. What we have seems to be a form of ANOVA using summary statistics. – whuber Jan 04 '17 at 20:02
  • @gung yes this is what I mean – Joe Wragg Jan 04 '17 at 20:09
  • So can you help me then: how do I add the error to the regression's calculation? – Joe Wragg Jan 04 '17 at 20:12
  • I have edited your title & post rather extensively for clarity based on what I think you are asking. Please ensure that it reflects what you want to know. If not, we can roll it back and try again. – gung - Reinstate Monica Jan 05 '17 at 01:08
  • Yes this is what I want answered – Joe Wragg Jan 05 '17 at 16:20

1 Answer


Let's consider the simplest case: There is measurement error in $Y$ only. It is normally distributed, centered on the true value of each $y_i$, and independent (of the value of $y_i$, $x_i$, etc.). We'll say the standard deviation of the measurement error is $0.1$. That seems to be the situation you have in mind.

In this case, the variance of the residuals will be inflated relative to the true variance of the errors. (Note that "errors" is the traditional, if unfortunate, name for the random part of the data generating process; they are not erroneous in the everyday sense, and they are not the same as the measurement errors.) Specifically, variances add, so the expected variance of the residuals is the true variance of the errors plus the variance of the measurement error: $\sigma^2_{\text{residuals}} \approx \sigma^2_{\text{errors}} + \sigma^2_{\text{measurement}}$. What R calls the "residual standard error" is (somewhat bizarrely) the standard deviation of the residuals (cf., here), so you need to square it to get the variance. If you knew the variance of the measurement error a priori, or had some independent estimate of it, you could simply subtract that value from the result. If you'd like, you could take the square root of the difference to get an estimate of the SD of the errors.
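To make that arithmetic concrete, here is a minimal sketch in R. The data are simulated and the object names are just for illustration; it assumes the known measurement-error SD of 0.1 from your question:

    set.seed(1)

    # Simulated data: a linear trend, true error (SD 0.3), plus measurement error in y (SD 0.1)
    x <- 1:20
    y <- 2 + 0.5 * x + rnorm(20, sd = 0.3) + rnorm(20, sd = 0.1)

    fit <- lm(y ~ x)

    resid_var <- summary(fit)$sigma^2   # square the "residual standard error" to get a variance
    meas_var  <- 0.1^2                  # known measurement-error variance

    # Estimated variance of the true errors, with the measurement error removed;
    # in small samples the difference can come out negative, so guard against that
    err_var <- max(resid_var - meas_var, 0)
    err_sd  <- sqrt(err_var)            # estimated SD of the true errors
    err_sd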

On the other hand, the standard error of the slope is based on the variability in your observed data. Since that variability is what was used to fit the slope, slopes fit on data with measurement error will bounce around more widely than slopes fit on data measured without measurement error. As a result, I would not advise you to try to remove the variance of the measurement error from the standard errors of your regression parameters: they are correct already.
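For completeness, here is where those (already correct) standard errors live, using the hypothetical fit object from the sketch above:

    # Standard error of the slope as reported by lm(); no adjustment needed
    coef(summary(fit))["x", "Std. Error"]

    # Or a confidence interval for the slope
    confint(fit)["x", ]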