
Given an existing regression curve, how do I properly account for the known variance of the dependent variable when back-calculating for the (nominally) independent variable? If I had an observation $Y_{new}=1.09$ with a variance $\sigma_Y^2=0.012$ then how should I incorporate that information into my final answer for $X_{new}$ ?

I can build a (very simple) regression model with two vectors. Using R notation:

x <- c(8, 10, 50, 200, 350, 500, 1000, 2000)
y <- c(0.012, 0.016, 0.078, 0.333, 0.583, 0.799, 1.643, 3.002)
simple.lm <- lm(y ~ x,data=data.frame(x,y))

And back-calculation of any new value of X given a value of Y can be found by inverting the linear equation. Easy enough.

A <- coef(simple.lm)[1]
B <- coef(simple.lm)[2]
predict_X <- (y - A) / B  # back-calculated x for each observed y

Y_new <- 1.09
X_new <- (Y_new - A) / B

But that doesn't deliver the prediction interval for the $X_{new}$ from that regression curve. It also assumes I either don't know, or don't care to include, the variability attached to my observation of Y. When I have $\sigma_Y^2$, I would like to carry that forward into the reported variance of X, from which I can calculate the prediction interval for X. This is basically the inverse of my prior question but I wanted to lay it out, and answer it, because this is the problem more likely to be searched for.


1 Answer


Consider three possible values for the variance of a new observation $y_0$: the first treated as "no (known) variance," the second a low variance roughly on par with the experimentally expected values for the described problem, and the third the high variance proposed in the question.

Rearranging the regression equation to solve for $x$, we can form the row vector $X_p$ of partial derivatives of that inverted equation with respect to the coefficients, along with its transpose. The covariance matrix of the coefficients can be taken from the regression fit and gives $\sigma_\text{regression}^2$, while the back-calculation residuals give $\sigma_\text{residuals}^2$. The prediction interval of a back-calculated concentration is then

$$ PI_\text{multiplier} \sqrt{\sigma_\text{regression}^2 + \sigma_\text{residuals}^2} $$
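To make that concrete, for the inverted equation $x_0 = (y_0 - A)/B$ the delta-method pieces used in the code below are

$$ X_p = \begin{pmatrix} \dfrac{\partial x_0}{\partial A} & \dfrac{\partial x_0}{\partial B} \end{pmatrix} = \begin{pmatrix} -\dfrac{1}{B} & \dfrac{A - y_0}{B^2} \end{pmatrix}, \qquad \sigma_\text{regression}^2 = X_p \, V \, X_p^\top, \qquad \sigma_\text{residuals}^2 = \frac{\sum_i (x_i - \hat{x}_i)^2}{n - 2} $$

where $V$ is the coefficient covariance matrix (`vcov(simple.lm)`) and $\hat{x}_i$ are the back-calculated values of $x$ from the observed $y_i$.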

The known variance of the new $y_0$ is converted to a coefficient of variation and added to the overall standard error:

$$
\begin{aligned}
\sigma_x^2 &= \sigma_\text{regression}^2 + \sigma_\text{residuals}^2 \\
CV_y &= \frac{\sqrt{\sigma_y^2}}{\bar{y}} \\
se_y = \sigma_{x(\text{vary})} &= \sigma_x + CV_y \\
x_\text{PI} &= PI_\text{multiplier} \times se_y
\end{aligned}
$$

Y_new <- 1.09
variance_Y <- c(0, 7.68E-8, 1.2E-2)

lvl <- 0.95
deg_freedom <- 6  # n - 2
pi_multiplier_knownDF <- qt((1 - lvl) / 2, deg_freedom, lower.tail = FALSE)

# Gradient of x0 = (Y_new - A) / B with respect to (A, B)
Xp <- matrix(c(-1 / B, (A - Y_new) / (B^2)), ncol = 2)
V <- vcov(simple.lm)
variance_regression <- diag(Xp %*% V %*% t(Xp))
variance_residuals <- sum((x - predict_X)^2) / deg_freedom

# Known variance of Y_new enters as a coefficient of variation
cv_Y <- sqrt(variance_Y) / mean(y)
se_var <- sqrt(variance_regression + variance_residuals) + cv_Y
x_PI_with_yvar <- pi_multiplier_knownDF * se_var
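As a quick sketch of how these pieces combine into a reported answer (using only the objects already defined above), the point estimate and the interval half-widths for each assumed $\sigma_y^2$ could be tabulated as:

X_new <- (Y_new - A) / B
data.frame(variance_Y = variance_Y,
           X_new = as.numeric(X_new),
           lower = as.numeric(X_new - x_PI_with_yvar),
           upper = as.numeric(X_new + x_PI_with_yvar))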

This is comparable to the calculations done in an errors-in-variables model, where the reliability ratio $\lambda$ is applied as a divisor to the slope.
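As a rough sketch of that connection (the measurement-error variance `sigma_u2` below is a hypothetical illustration, not a value from the question):

# Reliability-ratio (attenuation) correction from an errors-in-variables model.
# sigma_u2 is a hypothetical measurement-error variance for x.
sigma_u2 <- 25
lambda <- (var(x) - sigma_u2) / var(x)  # Var(true x) / Var(observed x)
B_corrected <- B / lambda               # slope divided by the reliability ratio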
