My answer would be that you cannot prove it, because the claim is, as I understand it, not correct in general. The edit at the bottom provides an analysis of why we nevertheless see unbiasedness in your code in your special case and derives some more general lessons from this.
Consider a "true" DGP (with a "structural" error, i.e. one that is at least uncorrelated with the $x_j$)
$$
y=x_1\beta_1+x_2\beta_2+x_3\beta_3+u
$$
As we regress on $x_1$ and $x_2$ only, the plim of our regression coefficients will be, letting $x=(x_1\quad x_2\quad x_3)$ and (for "short" regression) $x_s=(x_1\quad x_2)$, the linear projection coefficients
$$
\gamma=(E(x_sx_s'))^{-1}E(x_sy)
$$
Take, e.g., a covariance matrix
$$
\Sigma=E(xx')=\begin{pmatrix}1& 0.5& 0.25\\ 0.5& 1& 0\\ 0.25& 0& 1\end{pmatrix}
$$
This should be consistent with your setting, in which $x_1$ and $x_3$ are correlated but $x_2$ and $x_3$ are not. They are, of course (and that is the point of this answer), "indirectly" correlated via $x_1$.
Now, given our numbers,
$$
(E(x_sx_s'))^{-1}=\begin{pmatrix}4/3& -2/3\\-2/3& 4/3\end{pmatrix}
$$
and
$$
E(x_sy)=E\begin{pmatrix}x_1(x_1\beta_1+x_2\beta_2+x_3\beta_3+u)\\x_2(x_1\beta_1+x_2\beta_2+x_3\beta_3+u)
\end{pmatrix}=\begin{pmatrix}\beta_1+0.5\beta_2+0.25\beta_3\\0.5\beta_1+\beta_2\end{pmatrix},
$$
where the second equality uses uncorrelatedness with the "structural error" $u$. Hence, all in all,
$$
\gamma=\frac{1}{3}\begin{pmatrix}4(\beta_1+0.5\beta_2+0.25\beta_3)-2(0.5\beta_1+\beta_2)\\-2(\beta_1+0.5\beta_2+0.25\beta_3)+4(0.5\beta_1+\beta_2)\end{pmatrix}
$$
Hence, there is in general no reason to suppose that $\gamma_2=\beta_2$.
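Spelling out the algebra for these numbers, $\gamma_1=\beta_1+\beta_3/3$ and $\gamma_2=\beta_2-\beta_3/6$, so that, e.g., for $\beta=(1,2,1)'$ (the values used in the code below), $\gamma=(4/3,\ 11/6)'\approx(1.33,\ 1.83)'$, and both coefficients are inconsistent for their structural counterparts.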
Illustration:
library(mvtnorm)
n <- 50000
CovMat <- matrix(c(1, 0.5, 0.25, 0.5, 1, 0, 0.25, 0, 1), ncol=3)
X <- rmvnorm(n, sigma = CovMat)
u <- rnorm(n)
beta <- 1:2
e <- X[,3] + u # i.e. \beta_3=1
y <- X[,1]*beta[1] + X[,2]*beta[2] + e
summary(lm(y~X[,1]+X[,2]-1)) # i.e. the coefficients are not close to the true values 1 and 2, but close to gamma
(gamma <- c(solve(CovMat[1:2,1:2]) %*% c(CovMat[1,1]*beta[1]+CovMat[1,2]*beta[2]+CovMat[1,3]*1, CovMat[1,2]*beta[1]+CovMat[2,2]*beta[2])))
solve(crossprod(X[,1:2])/n) # to illustrate the inverse
crossprod(X[,1:2], y)/n # to illustrate E(x_sy)
EDIT: An analysis of what went "wrong" for the OP. A somewhat subtle story with, at least to me, a surprising result. I cannot quite make sense of whether there is something "deeper" to it that "had to" happen or if it was just a coincidence.
The OP generated $K=2$ correlated regressors and an error term (unlike $K=3$ in my example above, with one regressor going into the error term) via chol(covMat). Now, chol returns an upper triangular matrix $L'$. Hence, when writing $L'm$ for $m\sim N(0,I)$, the covariance of $L'm$ will be $L'L$, and not $LL'$, the Cholesky decomposition of covMat.
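To see the orientation issue concretely, here is a minimal sketch (with a hypothetical positive definite covMat ordered $(U,V,Z)$ with $\sigma_{UV}=0$, not the OP's actual numbers):
covMat <- matrix(c(1, 0, 0.5, 0, 1, 0.5, 0.5, 0.5, 1), ncol=3) # hypothetical values
R <- chol(covMat)                      # upper triangular factor, i.e. L' in the notation here
isTRUE(all.equal(t(R) %*% R, covMat))  # TRUE: covMat = LL' with L = t(R)
isTRUE(all.equal(R %*% t(R), covMat))  # FALSE: L'L is a different matrix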
Write the elements of covMat, using the zeros employed by the OP (which, I think, matter for the result to go through), as
$$
\begin{pmatrix}
\sigma_{UU} & \sigma_{UV} & \sigma_{UZ}\\
\sigma_{UV} & \sigma_{VV} & \sigma_{VZ}\\
\sigma_{UZ} & \sigma_{VZ} & \sigma_{ZZ}
\end{pmatrix}=\begin{pmatrix}
\sigma_{UU} & 0 & \sigma_{UZ}\\
0 & \sigma_{VV} & \sigma_{VZ}\\
\sigma_{UZ} & \sigma_{VZ} & \sigma_{ZZ}
\end{pmatrix}
$$
with the "southeastern" block (the ordering of the variables is of course arbitrary) collecting the (co-)variances of the two regressors.
Using the formula for a $3\times 3$ Cholesky decomposition, when exploiting the zeros,
$$
L=\begin{pmatrix}
\sqrt{\sigma_{UU}} & 0 & 0\\
0 & \sqrt{\sigma_{VV}} & 0\\
\sigma_{UZ}/\sqrt{\sigma_{UU}} & \sigma_{VZ}/\sqrt{\sigma_{VV}} & \sqrt{\sigma_{ZZ}-\sigma_{UZ}^2/\sigma_{UU}-\sigma_{VZ}^2/\sigma_{VV}}=:L_{33}
\end{pmatrix}
$$
so that the "wrong" Cholesky decomposition yields a new covariance matrix of the variables (symmetric, with the upper triangular elements omitted) as in
$$
L'L=\begin{pmatrix}
\sigma_{UU}+\frac{\sigma_{UZ}^2}{\sigma_{UU}} & & \\
\frac{\sigma_{VZ}\sigma_{UZ}}{\sqrt{\sigma_{VV}\sigma_{UU}}} & \sigma_{VV}+\frac{\sigma_{VZ}^2}{\sigma_{VV}} & \\
L_{33}\frac{\sigma_{UZ}}{\sqrt{\sigma_{UU}}} & L_{33}\frac{\sigma_{VZ}}{\sqrt{\sigma_{VV}}} & L_{33}^2
\end{pmatrix}\equiv
\begin{pmatrix}
\theta_{U}&&\\
\theta_{UV}&\theta_{V}&\\
\theta_{UZ}&\theta_{VZ}&\theta_{Z}
\end{pmatrix}
$$
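A small numerical check of these closed-form expressions (again with hypothetical numbers satisfying $\sigma_{UV}=0$; only the structure, not the values, is taken from the OP's setup):
sUU <- 1; sVV <- 1; sZZ <- 1; sUZ <- 0.5; sVZ <- 0.5 # hypothetical values
covMat <- matrix(c(sUU, 0, sUZ, 0, sVV, sVZ, sUZ, sVZ, sZZ), ncol=3)
L33 <- sqrt(sZZ - sUZ^2/sUU - sVZ^2/sVV)
L <- matrix(c(sqrt(sUU), 0, sUZ/sqrt(sUU),   # first column of L
              0, sqrt(sVV), sVZ/sqrt(sVV),   # second column
              0, 0, L33), ncol=3)            # third column
isTRUE(all.equal(L, t(chol(covMat))))        # TRUE, i.e. L matches the closed form above
t(L) %*% L                                   # the "wrong" covariance matrix L'L = Theta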
Now, by an argument analogous to the one above, for $x_s = (V\; Z)$ we may express the plim of the OLS estimator with two regressors (this holds for a general covariance matrix $\Theta$, not only this particular one) in terms of the following (co-)variances:
$$
(E(x_sx_s'))^{-1}=\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}\begin{pmatrix}
\theta_{Z}&-\theta_{VZ}\\
-\theta_{VZ}&\theta_{V}
\end{pmatrix}
$$
and
$$
E(x_sy)=\begin{pmatrix}
\beta_V\theta_V+\beta_Z\theta_{VZ}+\theta_{UV}\\
\beta_V\theta_{VZ}+\beta_Z\theta_{Z}+\theta_{UZ}
\end{pmatrix}
$$
That is, we now have, for the first coefficient estimator, that
$$
\begin{eqnarray*}
\text{plim}\,\hat\beta_V&=&\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}[\theta_Z(\beta_V\theta_V+\beta_Z\theta_{VZ}+\theta_{UV})-\theta_{VZ}(\beta_V\theta_{VZ}+\beta_Z\theta_{Z}+\theta_{UZ})]\\
&=&\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}[\beta_V(\theta_Z\theta_V-\theta_{VZ}^2)+\theta_{Z}\theta_{UV}-\theta_{VZ}\theta_{UZ}]\\
&=&\beta_V+\frac{\theta_{Z}\theta_{UV}-\theta_{VZ}\theta_{UZ}}{\theta_Z\theta_V-\theta_{VZ}^2}
\end{eqnarray*}
$$
That is, the estimator is consistent if
$$
\theta_{Z}\theta_{UV}=\theta_{VZ}\theta_{UZ}
$$
That will hold in the "canonical" case in which the error $U$ is uncorrelated with both regressors ($\theta_{UV}=\theta_{UZ}=0$), but it can also occur when these error correlations scale suitably with the variance of the endogenous regressor ($\theta_{Z}$) and the covariance of the two regressors ($\theta_{VZ}$).
(The result suggests that the equality cannot be satisfied when only one error correlation is zero but the regressors are correlated, i.e., when $\theta_{VZ}\neq0$.)
For example, we would get consistency on the first regressor if we modify my initial example to
library(mvtnorm)
n <- 50000
CovMat <- matrix(c(1, 0.5, 0.25, 0.5, 1, 0.5, 0.25, 0.5, 1), ncol=3)
X <- rmvnorm(n, sigma = CovMat)
u <- rnorm(n)
beta <- 1:2
e <- X[,3] + u # i.e. \beta_3=1
y <- X[,1]*beta[1] + X[,2]*beta[2] + e
CovMat[2,2]*CovMat[1,3]==CovMat[1,2]*CovMat[2,3] # TRUE, i.e. the validity condition theta_Z*theta_UV = theta_VZ*theta_UZ for the first regressor is satisfied
summary(lm(y~X[,1]+X[,2]-1)) # i.e. the coefficient on the first regressor is now correct
Now, this turns out to be precisely the case here with the "wrong" Cholesky decomposition:
$$
\theta_{Z}\theta_{UV}=L_{33}^2\frac{\sigma_{VZ}\sigma_{UZ}}{\sqrt{\sigma_{VV}\sigma_{UU}}}=L_{33}\frac{\sigma_{VZ}}{\sqrt{\sigma_{VV}}}L_{33}\frac{\sigma_{UZ}}{\sqrt{\sigma_{UU}}}=\theta_{VZ}\theta_{UZ}
$$
Hence,
$$
\begin{eqnarray*}
\text{plim}\,\hat\beta_V&=&\beta_V
\end{eqnarray*}
$$
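For a numerical illustration of this, here is a minimal sketch that mimics the "wrong" transformation with a hypothetical covMat (ordered $(U,V,Z)$, $\sigma_{UV}=0$; the values are not the OP's):
set.seed(1)
n <- 50000
covMat <- matrix(c(1, 0, 0.5, 0, 1, 0.5, 0.5, 0.5, 1), ncol=3) # hypothetical values
m <- matrix(rnorm(3*n), nrow=3)
x <- t(chol(covMat) %*% m)                    # the "wrong" transformation L'm, covariance L'L
Theta <- chol(covMat) %*% t(chol(covMat))     # = L'L
Theta[3,3]*Theta[1,2] - Theta[2,3]*Theta[1,3] # theta_Z*theta_UV - theta_VZ*theta_UZ, (numerically) zero
U <- x[,1]; V <- x[,2]; Z <- x[,3]
y <- 1*V + 2*Z + U                            # beta_V = 1, beta_Z = 2, error U
coef(lm(y ~ V + Z - 1))                       # coefficient on V close to 1, coefficient on Z biased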
Let us generalize the insights from this analysis a little. Consider a model
$$
y=X\beta+e
$$
in which $X$ and $e$ are potentially correlated. We may then write the OLS estimator, upon plugging in the model and simplifying, as
$$
\hat\beta=\beta+(X'X/n)^{-1}\frac{X'e}{n},
$$
which, by a LLN, tends to an expression of the form
$$
\text{plim}\,\hat\beta=\beta+\Sigma^{-1}\sigma_{Xe},
$$
where $\Sigma:=\text{plim}\,X'X/n$ and $\sigma_{Xe}:=\text{plim}\,X'e/n$. The plim for coefficient $j$ therefore is $\beta_j$ plus the $j$th element of $\Sigma^{-1}\sigma_{Xe}$. Under exogeneity ($\sigma_{Xe}=0$), the estimator will be consistent.
However, it is, as we have seen, possible that the $j$th element of $\Sigma^{-1}\sigma_{Xe}$, which is a linear combination of the error covariances in $\sigma_{Xe}$ with weights given by the $j$th row of $\Sigma^{-1}$, equals zero for suitable (if, I suppose, rare) patterns of $\Sigma$ and $\sigma_{Xe}$.
What this analysis also seems to reveal is that, for full-rank $X$, it is not possible for all coefficients to be estimated consistently if $\sigma_{Xe}$ is not a zero vector: if they were, we would have $\Sigma^{-1}\sigma_{Xe}=0$ with $\sigma_{Xe}\neq0$, which would imply that the columns of $\Sigma^{-1}$ are not linearly independent, so that it has reduced rank, and so would $\Sigma$.
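As a quick numerical check of this decomposition of the plim, using the numbers from the first example above (regressors $x_1,x_2$ and error $e=x_3+u$):
Sigma <- matrix(c(1, 0.5, 0.5, 1), ncol=2)  # covariance of the included regressors
sigma_Xe <- c(0.25, 0)                      # Cov(x_1, e) and Cov(x_2, e) for e = x_3 + u
beta <- 1:2
beta + solve(Sigma) %*% sigma_Xe            # plim of OLS: 4/3 and 11/6, matching gamma above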
For example, in our above bivariate example, the plim for $\hat\beta_Z$ is, along the same lines,
$$
\begin{eqnarray*}
\text{plim}\,\hat\beta_Z&=&\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}[\theta_{V}(\beta_V\theta_{VZ}+\beta_Z\theta_{Z}+\theta_{UZ})-\theta_{VZ}(\beta_V\theta_V+\beta_Z\theta_{VZ}+\theta_{UV})]\\
&=&\frac{1}{\theta_Z\theta_V-\theta_{VZ}^2}[\beta_Z(\theta_{V}\theta_{Z}-\theta_{VZ}\theta_{VZ})+\theta_{V}\theta_{UZ}-\theta_{VZ}\theta_{UV}]\\
&=&\beta_Z+\frac{\theta_{V}\theta_{UZ}-\theta_{VZ}\theta_{UV}}{\theta_Z\theta_V-\theta_{VZ}^2},
\end{eqnarray*}
$$
so that the condition for consistency of $\hat\beta_Z$ despite error correlation becomes
$$
\theta_{V}\theta_{UZ}=\theta_{VZ}\theta_{UV}.
$$
If we normalize the variances to one for simplicity, the condition for $\hat\beta_Z$ would entail that
$$
\theta_{VZ}=\frac{\theta_{UZ}}{\theta_{UV}},
$$
whereas the condition for $\hat\beta_V$ would entail
$$
\theta_{VZ}=\frac{\theta_{UV}}{\theta_{UZ}},
$$
which can only simultaneously be true if the error correlations are the same ($\theta_{UZ}=\theta_{UV}$), call this value $\theta_{e}$, or have opposite signs ($\theta_{UZ}=-\theta_{UV}$). In the former case (the latter is analogous), $\theta_{VZ}=1$, so that
$$
\Theta=\begin{pmatrix}
1&\theta_{e}&\theta_{e}\\
\theta_{e}&1&1\\
\theta_{e}&1&1
\end{pmatrix},
$$
so that the southeastern matrix corresponding to the regressors clearly has reduced rank. More generally, if we substitute
$$
\theta_{VZ}=\frac{\theta_{Z}\theta_{UV}}{\theta_{UZ}},
$$
into
$$
\theta_{V}=\frac{\theta_{VZ}\theta_{UV}}{\theta_{UZ}},
$$
we obtain
$$
\theta_{V}=\frac{\theta_{Z}\theta_{UV}^2}{\theta_{UZ}^2},
$$
and hence a southeastern block of $\Theta$ of
$$
\theta_{Z}\begin{pmatrix}
\frac{\theta_{UV}^2}{\theta_{UZ}^2}&\frac{\theta_{UV}}{\theta_{UZ}}\\
\frac{\theta_{UV}}{\theta_{UZ}}&1
\end{pmatrix},
$$
which again has rank 1 only.
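A quick numerical check of this rank-1 claim, with arbitrary (hypothetical) values for the $\theta$'s:
th_Z <- 2; th_UV <- 0.3; th_UZ <- 0.6         # arbitrary hypothetical values
B <- th_Z * matrix(c(th_UV^2/th_UZ^2, th_UV/th_UZ,
                     th_UV/th_UZ, 1), ncol=2) # the southeastern block from above
c(det(B), qr(B)$rank)                         # determinant (numerically) zero, rank 1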