How to isolate the impact of one variable via regression?

Question

I know the dependent variable $Y$ is a function of $X$ and $Z$, while $X$ and $Z$ are orthogonal. I want to quantify the impact of $X$ on $Y$, but the problem is that $X$ is unobservable to me. Is it plausible to regress $Y$ on $Z$ and take the residuals or $1-R^2$ to estimate the impact of $X$? To be more specific, let's assume a simple linear relationship between $Y$, $X$, and $Z$: $$Y = X +aZ $$ Here $Y$ and $Z$ are observable, but $X$ and $a$ are not. So in this case how can I estimate the value of $X$? Can it be done via regressing $Y$ on $Z$?

What do you mean by "estimate the value of $X$"? Isn't $X$ a random variable? — Adrià Luz, Feb 04 '23 at 14:15
@ShawnHemelstrand It means that I have no way to directly measure the value of $X$. — Xiangyu WANG, Feb 05 '23 at 02:16
@AdriàLuz Yes it is a random variable. What I want is to get the value of each $X_t$ given $Y_t$ and $Z_t$. — Xiangyu WANG, Feb 05 '23 at 02:19
@dimitriy This question also puzzles me. By the definition of $Y$, there is no error term or an intercept. But in practice I guess there can be because of the measurement error or other disturbances existing in the sample data. For the reasons mentioned here, I think it is better to include an intercept when fitting the real data: https://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-a-linear-regression-model — Xiangyu WANG, Feb 05 '23 at 02:38
If it is unobservable, couldn't this just be included as a latent factor in path analysis? — Shawn Hemelstrand, Feb 05 '23 at 02:45
@dimitriy What I am now thinking about is to run a regression of $Y$ on $Z$ and extract $\hat{a}$. Then I get $\hat{x}_t$ as $y_t - \hat{a}z_t$. I think this is reasonable as long as $\hat{a}$ is a consistent estimate of $a$. Even if there is in fact an intercept, I can still compare $\hat{x}_i$ with $\hat{x}_j$ because the intercept is constant across observations. But I am not quite sure and would welcome your suggestions. — Xiangyu WANG, Feb 05 '23 at 02:47
@ShawnHemelstrand Thanks a lot for your comments. I am not familiar with path analysis and how to extract the effect of latent variables through this analysis. Could you please kindly provide some more details? Like in $Y = X+aZ$, how to measure the impact of latent variable $X$ on $Y$? — Xiangyu WANG, Feb 05 '23 at 03:02
Only in an unrealistic mathematical situation where no other variable relates to $Y$ and the model is a perfect fit could you possibly justify attributing all residual variation to $X.$ Since, in all applications, there is an assumed random error term in this model, you have no way of distinguishing the $X$ coefficient from the contribution of that error. — whuber, Feb 05 '23 at 15:15

score 3 · Accepted Answer · answered Feb 05 '23 at 04:09

If $X$ is truly orthogonal to $Z$, the coefficient on $X$ is truly equal to 1, and there is no error (i.e., $X$ and $Z$ are truly the only causes of $Y$), then regress $Y$ on $Z$ (with no intercept) and use the residuals from the model as $X$. That is, fit the model $$ Y = a Z + \varepsilon $$ and set $\hat{X} = Y - \hat{a} Z$. This works because $\hat{a}$ is identified from the fact that $X$ and $Z$ are orthogonal, so the bias due to omitting $X$ is 0: the no-intercept estimate is $\hat{a} = \sum_t z_t y_t / \sum_t z_t^2 = a + \sum_t z_t x_t / \sum_t z_t^2$, and the last term goes to 0 as the sample grows when $X$ and $Z$ are orthogonal.
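
A minimal simulation sketch of this procedure, in case it helps. Everything here is an assumption made for illustration: the value `a_true = 2.5`, the sample size, and drawing $X$ and $Z$ as independent mean-zero normals (one way to make them orthogonal). None of it comes from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
a_true = 2.5  # hypothetical coefficient, unknown to the analyst

# Independent, mean-zero draws, so X and Z are orthogonal in expectation.
z = rng.normal(size=n)
x = rng.normal(size=n)   # unobserved in practice; kept only to check the result
y = x + a_true * z       # deterministic relationship, no error term

# No-intercept least squares of Y on Z: a_hat = sum(z*y) / sum(z^2).
a_hat = (z @ y) / (z @ z)

# The residuals serve as the estimate of X.
x_hat = y - a_hat * z

print("a_hat:", a_hat)
print("corr(x, x_hat):", np.corrcoef(x, x_hat)[0, 1])
```

The only gap between `x_hat` and `x` is `(a_true - a_hat) * z`, so the recovery is only as good as the estimate of $a$, which improves with sample size.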

This requires an insane and highly unlikely set of assumptions and so would have no practical utility. But under the assumptions you wrote in your post (unreasonable as they are), estimating $X$ is straightforward. This is possibly related to this somewhat similar post involving an impossible set of assumptions.
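
To illustrate why the no-error assumption carries so much weight, here is a rough sketch in which a hypothetical error term `e` is added to the model. The error term, its variance, and `a_true` are all assumptions made for this example and are not part of the question's setup. The residual then tracks $X$ plus the error, not $X$ alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
a_true = 2.5  # hypothetical coefficient

z = rng.normal(size=n)
x = rng.normal(size=n)   # unobserved variable of interest
e = rng.normal(size=n)   # hypothetical error term, NOT in the question's model
y = x + a_true * z + e

# Same no-intercept regression of Y on Z as before.
a_hat = (z @ y) / (z @ z)
resid = y - a_hat * z

# The residual recovers x + e, so it cannot separate X from the error.
corr_with_x = np.corrcoef(resid, x)[0, 1]
corr_with_x_plus_e = np.corrcoef(resid, x + e)[0, 1]

print("corr(resid, x):    ", corr_with_x)
print("corr(resid, x + e):", corr_with_x_plus_e)
```

With equal variances for `x` and `e`, the correlation between the residual and `x` should sit near $1/\sqrt{2}$ in theory, while the correlation with `x + e` stays close to 1.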

answered Feb 05 '23 at 04:09

Noah

My only issue with this answer is that $X$ is unobservable as specified by OP, so estimating $X$ with a vanilla regression would not be straightforward. – Shawn Hemelstrand Feb 05 '23 at 05:06
@ShawnHemelstrand I don't think so. $Y$ and $Z$ are observed, $a$ is identifiable from the data, and that leaves an equation with one unobserved quantity, $X$. This is unlike any real problem in statistics because OP claims there is a deterministic relationship among the variables with no error at all. – Noah Feb 05 '23 at 07:08
Now that you frame it like that it makes more sense. +1 – Shawn Hemelstrand Feb 05 '23 at 07:24
Thanks a lot for the clarification. For a real-world case where such a deterministic relationship may hold, I can think of the alpha return of a portfolio (admittedly not a pure statistics or causal inference problem, since the relationship is imposed by definition). To calculate the unobservable alpha return, 1) we regress the total return on the market return to get $\beta$, and 2) we take the regression residual as the measure of the alpha return. It may not sound very reasonable statistically, but I see this procedure used in the financial literature. – Xiangyu WANG Feb 06 '23 at 03:30