9

Statistical learning and its results are currently pervasive in the social sciences. A couple of months ago, Guido Imbens said: "LASSO is the new OLS".

I have studied machine learning a little, and I know that its main goal is prediction. I also agree with Leo Breiman's distinction between the two cultures of statistics. So, from my point of view, causality is to some extent opposed to prediction.

Considering that sciences usually try to identify and understand causal relations, is machine learning useful for this goal? In particular, what are the advantages of LASSO for causal analysis?

Are there any researchers (and papers) addressing those questions?

Guilherme Duarte
  • Well, OLS will not produce estimates of causal effects very often, so if LASSO is to replace OLS, it does not have the "burden" of discovering causal relations. That said, have a look at this page for some recent research in econometrics on causal effects and sparse methods: http://www.mit.edu/~vchern/ – Christoph Hanck Feb 04 '16 at 05:03
  • For me the more natural distinction here would be that by Shmueli ("To Explain or to Predict", 2010) rather than Breiman's, but perhaps Breiman's distinction is also fine. – Richard Hardy Feb 04 '16 at 07:30
  • I don't disagree there: in cases in which OLS can be used to estimate causal effects, I do not see why lasso should not also be applicable. – Christoph Hanck Feb 04 '16 at 11:06

2 Answers

3

I don't know all of them, I'm sure, so I hope no one will mind if we do this wiki-style.

One important point, though, is that the LASSO is biased (source: Wasserman, in lecture, sorry), which, while acceptable in prediction, is a problem in causal inference. If you want causality, you probably want it for Science, so you're not just trying to estimate the most useful parameters (which happen, strangely, to predict well); you're trying to estimate the TRUE(!) parameters.
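
A minimal sketch of that bias (the data-generating process, sample sizes, and penalty `alpha=0.5` are my own illustrative choices, not from the original answer): over repeated samples, OLS recovers the true coefficients on average, while the LASSO estimates are systematically shrunk toward zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
beta = np.array([2.0, 0.0, -1.0])  # hypothetical true coefficients

ols_est, lasso_est = [], []
for _ in range(500):
    X = rng.normal(size=(100, 3))
    y = X @ beta + rng.normal(size=100)
    ols_est.append(LinearRegression().fit(X, y).coef_)
    lasso_est.append(Lasso(alpha=0.5).fit(X, y).coef_)

# OLS is (approximately) unbiased; the LASSO shrinks toward zero.
print("true: ", beta)
print("OLS:  ", np.mean(ols_est, axis=0).round(2))
print("LASSO:", np.mean(lasso_est, axis=0).round(2))
```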

  • Perhaps! That's why I'm eager to have other people chime in. – one_observation Feb 04 '16 at 17:02
  • @GuilhermeDuarte, It is the overall error that matters, not bias. Under square loss we care about MSE, and $\text{MSE} = \text{Bias}^2 + \text{Variance}$. Lasso may deliver a good tradeoff with relatively small MSE despite some bias, and as such should be more useful for causal analysis than unbiased estimation with high MSE. The real problem with lasso is that it is difficult to get confidence intervals for it; currently that is an active research area. – Richard Hardy Mar 13 '17 at 19:17
  • @GuilhermeDuarte, just as in prediction, in causality we need precise estimates of model coefficients. Precision can be measured in terms of absolute error, squared error, etc., but not bias. For example, you can have low bias and high estimation error at the same time. So looking at bias you would think you are doing fine, but that would be misleading, as the estimation error (absolute, squared, or whichever) is high. It is estimation error, not bias, that matters when you consider effect sizes, statistical significance, etc. in causal inference. – Richard Hardy Mar 14 '17 at 13:43
  • This is something so basic that it is difficult to find references for. I thought the way you are thinking until Glen_b said to me what I said to you, and it has made perfect sense to me ever since :) I think the best way to understand this is through counterexamples showing that an unbiased estimator with high variance works worse than a biased estimator with low variance, as long as the MSE of the latter is below the MSE of the former (see the sketch after these comments). And it does not really matter whether we are after causal or predictive modelling. – Richard Hardy Mar 14 '17 at 16:48
  • My doubts are strongly related to GuilhermeDuarte's. For example, @RichardHardy writes: "just as in prediction, in causality we need precise estimates of model coefficients." I strongly disagree with this sentence. In prediction we are not focused on the coefficients; we can even ignore them completely. Exactly for this reason we can, for example, compare the predictive performance of regression and neural network models. Only the predicted values per se matter. – markowitz Feb 06 '20 at 11:33
  • @RichardHardy, this is not the place for a long discussion, but if you would like to defend your position above, let me know. If so, I will open a new question. – markowitz Feb 06 '20 at 11:34
  • @markowitz, for any given model, imprecise estimation of coefficients will yield large prediction errors. Yes, in prediction we can ignore the point estimates of parameters, but if the estimates are off, the prediction will be off. In that sense we want precise estimates. Some confusion in this discussion might be coming from possibly different definitions of the true parameters (structural/causal vs. reduced-form); we might be talking about different things. See e.g. https://stats.stackexchange.com/questions/265739/. Opening a new question is not a bad idea. Please notify me if you do, thanks. – Richard Hardy Feb 06 '20 at 11:54
  • Clarified like that, it sounds much better to me. If I open another related question, I will notify you. – markowitz Feb 06 '20 at 15:33
  • @markowitz, I am not sure myself anymore. There might be a confusion of structural/causal vs. reduced-form parameters in my argumentation. I would have to think deeper about it before concluding. – Richard Hardy Feb 07 '20 at 13:33
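
A minimal sketch of the counterexample mentioned in the comments above (the normal-mean setup and the shrinkage factor 0.7 are my own illustrative choices, not from the thread): the shrunken estimator is biased, yet its MSE is lower than that of the unbiased sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 1.0, 10, 100_000

# Unbiased estimator: the sample mean of n draws with sd 2.
xbar = rng.normal(mu, 2.0, size=(reps, n)).mean(axis=1)
# Biased estimator: the same sample mean shrunk toward zero.
shrunk = 0.7 * xbar

for name, est in [("unbiased", xbar), ("shrunken", shrunk)]:
    bias = est.mean() - mu
    mse = ((est - mu) ** 2).mean()
    print(f"{name:9s} bias = {bias:+.3f}   MSE = {mse:.3f}")
```

Here the unbiased sample mean has MSE $\approx 0.4$, while the shrunken estimator, despite its bias of $-0.3$, has MSE $\approx 0.29$; this is exactly the $\text{MSE} = \text{Bias}^2 + \text{Variance}$ tradeoff at work.
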
0

I guess that, 6 years later, I can perfectly understand the answer to my own question. Suppose a causal model in which $Y$ is an effect of $X$ and of other covariates $L$.

LASSO will help someone come up with an estimate of $E[Y|X, L_1, L_2, \ldots]$. If one proves that $E[Y|do(X)] = E[Y|X, L_1, \ldots]$, then it's all right. That is not always the case.

A common case is that $E[Y|do(X)] = E_{L_1, \ldots}[E[Y|X, L_1, \ldots]]$ when $L_1, L_2, \ldots$ are observed confounders and backdoor adjustment is sufficient for identification. In this case, one could use something like the g-formula on top of the LASSO, and then estimate the causal effect.
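
As a rough sketch of that g-computation step (the binary treatment, the simulated confounded data, and the penalty `alpha=0.01` are all illustrative assumptions, not part of the original answer):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n = 5_000

# Simulated DGP: L confounds X and Y; the true effect of X on Y is 1.5.
L = rng.normal(size=(n, 3))
X = (L[:, 0] + rng.normal(size=n) > 0).astype(float)  # binary treatment
Y = 1.5 * X + L @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)

# Outcome model: estimate E[Y | X, L_1, L_2, L_3] with the LASSO.
model = Lasso(alpha=0.01).fit(np.column_stack([X, L]), Y)

# g-formula: average the fitted values over the observed distribution of L,
# setting X to 1 and then to 0 for everyone, i.e. do(X = 1) vs do(X = 0).
y1 = model.predict(np.column_stack([np.ones(n), L])).mean()
y0 = model.predict(np.column_stack([np.zeros(n), L])).mean()
print("estimated E[Y|do(X=1)] - E[Y|do(X=0)]:", round(y1 - y0, 2))
```

Because this outcome model is linear without interactions, the g-formula contrast reduces to the LASSO coefficient on $X$; with a richer (e.g. interacted) outcome model, the averaging over $L$ would do real work.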

Of course, this will always depend on a proof of identifiability. Estimation $\neq$ identifiability. LASSO is a model for estimation.

Guilherme Duarte