
Consider data generated from a model $Y = \alpha A + \beta U$, where $U$ is a confounder, i.e. $\langle A,U\rangle \neq 0$. We don't measure $U$, but rather a noisy version of it, $U' = U+\epsilon$, where the entries of $\epsilon$ are Gaussian$(0,1)$. We fit a regression model,

$$Y = \alpha' A +\beta' U',$$

and obtain an estimate $\alpha'$ of $\alpha$ that is not fully adjusted for confounding, since $U'$ only imperfectly controls for $U$.

Are there any results concerning the bias $\alpha-\alpha'$ as a function of $\epsilon$? That is, can we bound the bias in the estimate of a coefficient in linear regression by the error in a confounding variable?
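To make the setup concrete, here is a minimal simulation sketch (the coefficient values, correlation, and sample size are illustrative assumptions, not part of the question):

# Sketch of the setup: A and U correlated, U observed only through U' = U + epsilon
set.seed(1)
n <- 1e5
U <- rnorm(n)
A <- 0.5 * U + sqrt(1 - 0.5^2) * rnorm(n)  # Cor(A, U) = 0.5
Y <- 2 * A + 3 * U                         # alpha = 2, beta = 3
U.prime <- U + rnorm(n)                    # noisy measurement of U
coef(lm(Y ~ A + U.prime - 1))              # estimate of alpha is biased away from 2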

This paper has results if $U$ is binary. How about if it is continuous? If not, what is the difficulty?

user310374

1 Answer


Linear projection results tell us that, letting $x=(A\quad U')'$, the OLS slope coefficients will tend to (there is no constant in the regression)
$$
\text{plim}\,\begin{pmatrix}\hat\alpha'\\\hat\beta'\end{pmatrix}=E(xx')^{-1}E(xy).
$$
We have
$$
E(xx')=\begin{pmatrix}E(A^2)&E(AU')\\E(AU')&E(U'^2)\end{pmatrix}
$$
and hence
$$
E(xx')^{-1}=\frac{1}{E(A^2)E(U'^2)-[E(AU')]^2}\begin{pmatrix}E(U'^2)&-E(AU')\\-E(AU')&E(A^2)\end{pmatrix}.
$$
Also,
$$
E(xy)=\begin{pmatrix}E(A(\alpha A + \beta U))\\E(U'(\alpha A + \beta U))\end{pmatrix}=\begin{pmatrix}\alpha E(A^2)+ \beta E(AU)\\\alpha E(U'A) + \beta E(U'U)\end{pmatrix},
$$
so that
\begin{eqnarray*}
\text{plim}\,\hat\alpha'&=&\frac{E(U'^2)(\alpha E(A^2)+ \beta E(AU))-E(AU')(\alpha E(U'A) + \beta E(U'U))}{E(A^2)E(U'^2)-[E(AU')]^2}\\
&=&\alpha+\frac{E(U'^2)\beta E(AU)-E(AU')\beta E(U'U)}{E(A^2)E(U'^2)-[E(AU')]^2}.
\end{eqnarray*}
If $\epsilon$ is independent of all other random variables (you only state the marginal, but not the joint distribution), we have $E(U'U)=E((U+\epsilon)U)=E(U^2)$ and, similarly, $E(U'^2)=E(U^2)+1$ and $E(AU')=E(AU)$. Thus,
\begin{eqnarray*}
\text{plim}\,\hat\alpha'&=&\alpha+\frac{\beta (E(U^2)+1)E(AU)-\beta E(AU)E(U^2)}{E(A^2)(E(U^2)+1)-[E(AU)]^2}\\
&=&\alpha+\frac{\beta E(AU)}{E(A^2)(E(U^2)+1)-[E(AU)]^2},
\end{eqnarray*}
where the last step uses $(E(U^2)+1)E(AU)-E(AU)E(U^2)=E(AU)$. So the asymptotic bias is proportional to $\beta$ and to $E(AU)$.
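For convenience, the simplified expression can be wrapped in a small helper (a sketch of mine; `plim.alpha` and its moment arguments are made-up names), assuming $\epsilon\sim N(0,1)$ independent of $A$ and $U$:

# Sketch: asymptotic (plim) value of alpha-hat from population moments,
# assuming Var(epsilon) = 1 and epsilon independent of A and U
plim.alpha <- function(alpha, beta, EA2, EU2, EAU) {
  alpha + beta * EAU / (EA2 * (EU2 + 1) - EAU^2)
}
plim.alpha(alpha = 2, beta = 3, EA2 = 1, EU2 = 1, EAU = 0.5)  # 2.857143, cf. below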

Illustration:

library(mvtnorm)

n <- 10000
cm <- matrix(c(1, 0.5, 0.5, 1), ncol = 2)  # covariance matrix of (A, U)
v <- rmvnorm(n, sigma = cm)
alpha <- 2
beta <- 3
epsilon <- rnorm(n)                        # measurement noise, N(0,1)

A <- v[, 1]
U <- v[, 2]
y <- alpha * A + beta * U                  # outcome from the true model

U.prime <- U + epsilon                     # noisy proxy for U

The asymptotic value from the formula above,
$$
\alpha +\frac{\beta (E(U^2)+1)E(AU)-\beta E(AU)E(U^2)}{E(A^2)(E(U^2)+1)-[E(AU)]^2},
$$
evaluates to:

> alpha + (beta*(cm[2,2]+1)*cm[1,2] - beta*cm[1,2]*cm[2,2]) / (cm[1,1]*(cm[2,2]+1) - cm[1,2]^2)
[1] 2.857143

> lm(y~A+U.prime-1)

Call:
lm(formula = y ~ A + U.prime - 1)

Coefficients:
      A  U.prime
  2.836    1.265
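The estimated coefficient on A, 2.836, is close to the asymptotic value 2.857143 computed above. As a further check (a sketch, not part of the original output), rerunning with a larger sample should move the estimate even closer to the plim:

# Sketch: with more observations the OLS estimate should approach the plim, 2.857143
n <- 1e6
v <- rmvnorm(n, sigma = cm)
A <- v[, 1]
U <- v[, 2]
y <- alpha * A + beta * U
U.prime <- U + rnorm(n)
coef(lm(y ~ A + U.prime - 1))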