Where does multilevel modeling fit in with causal inference?

Question

I am just now exploring the world of multilevel modeling (MLM) and I am wondering how to contextualize it within the broader toolkit of causal inference techniques. In one of my graduate econometrics courses, I was taught the fixed effects vs. random effects dichotomy that Huntington-Klein helpfully breaks down and criticizes (random effects are only plausible when the group-level effects are uncorrelated with the right-hand-side variables). In my brief exploration of Bayesian statistics via McElreath's Statistical Rethinking, I found that he argues MLMs should probably be the default over the standard regression model in most disciplines.

Conceptually, some of these ideas are fairly new to me, especially how MLMs fit into the causal inference toolkit. As a result, I have three questions on the topic:

If MLMs incorporate fixed effects, should one consider using fixed effects anymore as a tool to make causal inferences?
If MLMs provide more detail on group or unit-specific intercepts and slopes, should users consider using standard regression adjustment anymore?
Where do MLMs fit into the broader causal inference toolkit? Given that regression is still used for estimation with most strategies (matching, DID, IV, RDD, etc.), can one use MLMs instead of the standard regression model in these contexts? Should one consider using MLMs?

Graham Wright · Accepted Answer · 2023-06-06T13:06:00.983

I think this question is conflating a few distinct issues.

First of all, the terms "multilevel modeling," "random effects," and "fixed effects" are all used in different ways by different people. This post outlines FIVE different ways people define the difference between fixed and random effects.

Second, the most common use of MLM is for when you have observations "nested" at multiple levels (so students nested within schools, or observations nested within people in a longitudinal dataset). The question there is how you should deal with the higher level "units" (schools or people). One approach is to treat them as "fixed effects" (basically include a dummy variable for each "group"). On one hand, this approach controls for ALL possible bias at the group level, so that's good. On the other hand, precisely because of that, it doesn't allow you to actually analyze the effect of any group level variable (like school size, or "race" in a longitudinal dataset), because such a variable does not vary within a group. Treating the groups as "random effects" (allowing the intercept and/or one or more coefficients to vary randomly at the group level) allows you to control for other group level variables (and to do various other cool things like empirical Bayes estimation of group level characteristics), but also opens you up to group level bias if you haven't controlled for all of the important group level factors (which is always the case to some extent).

So in a nutshell, that's the trade-off between fixed and random effects when using MLM to analyze clustered data. How you navigate that trade-off depends on your research question and how the data are set up.
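To make the trade-off concrete, here is a minimal simulation sketch using only the Python standard library. Everything in it is an assumption made for illustration: the school and student counts, the true slope of 2.0, the unobserved school-level factor `u_j`, and the helper names `ols_slope` and `demean_within`. It compares a pooled regression that ignores schools with the "fixed effects" (within-school) estimate, and it shows why a school-level variable such as size cannot be analyzed once school dummies are in the model.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from a one-predictor least-squares fit (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_within(values, groups):
    """Subtract each group's mean from its members.

    For the slope on a student-level predictor this gives the same
    estimate as including one dummy variable per group.
    """
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


rng = random.Random(0)
TRUE_SLOPE = 2.0            # assumed true effect of effort on score
N_SCHOOLS, N_STUDENTS = 50, 40

school, effort, score, size = [], [], [], []
for j in range(N_SCHOOLS):
    u_j = rng.gauss(0, 1)             # unobserved school-level confounder
    size_j = rng.uniform(100, 1000)   # observed school-level variable
    for _ in range(N_STUDENTS):
        x = u_j + rng.gauss(0, 1)     # effort is correlated with u_j
        y = TRUE_SLOPE * x + 3.0 * u_j + rng.gauss(0, 1)
        school.append(j)
        effort.append(x)
        score.append(y)
        size.append(size_j)

# Pooled OLS ignores schools, so the slope absorbs the confounder u_j.
pooled = ols_slope(effort, score)

# The within estimator uses only variation inside each school.
within = ols_slope(demean_within(effort, school), demean_within(score, school))

# School size is constant within a school, so demeaning wipes it out.
size_within = demean_within(size, school)

print(f"pooled slope: {pooled:.2f}")
print(f"within slope: {within:.2f}")
print(f"largest within-school deviation in size: {max(abs(v) for v in size_within):.2e}")
```

In this setup the within estimate should land near the assumed true slope while the pooled estimate is pulled away from it. A random-intercept model that omits `u_j` would fall somewhere between the two, which is the group-level bias described above; in exchange, it could estimate a coefficient on `size`.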

Now, as you note, some Bayesians (like Andrew Gelman and perhaps also McElreath) advocate using MLM (the "random effects" approach) even when there is no "nesting" of data, because Bayesians see all model parameters as inherently "random." But this is a more complicated approach and, in my experience, isn't yet super common among day-to-day statisticians due to various philosophical and logistical issues.

Also, any time you run a normal OLS model and include dummy variables for race, you could also correctly say that you are including "fixed effects" for race... but people don't usually consider that "multilevel modeling."

What does all of this have to do with causal inference? Nothing and everything. Causal inference is really tough, and "running a regression model" on observational data is generally regarded as a pretty suboptimal way to establish causality... although sometimes it's all we've got. The extent to which we can interpret the results of a model in causal terms depends both on the model specification and the underlying theory behind it. MLM is just one way of specifying models to deal with particular problems that might contribute to bias or error in our estimates of coefficients and/or standard errors. If deployed well, MLM might make a causal interpretation of a particular coefficient in a particular model more defensible, or it might not. But like any kind of model specification, MLM (either fixed or random effects) has no inherent power to make a model's results causally interpretable, any more than "including an interaction term," "including a control for age," or any other way we might modify the specification of a model.

Scriddie · Answer 2 · 2023-06-06T12:15:35.620

In causal inference, this is the question of conditional treatment effects

"Regression" simply describes the procedure of fitting predictions from a set of features to a target value. MLM is no less regression than OLS, so there is no dichotomy. MLM proposes to use dummy variables for group memberships that can be used for group-specific intercepts or in interaction terms with other variables. These variables are predictors as much as any other, and how they should be included in causal inference depends on the directed acyclic graph assumed to describe the data generating process.

OLS. Assume a student's effort $X$ causes their test performance $Y$, but is confounded by the classroom $C$ they are in. One could use $C$ as a control variable such that $Y_i \sim \beta_1 X_i + \beta_2 C_i$.

MLM. Alternatively, one could use an MLM approach. Let $Y_{ij}$ be the test performance of student $i$ in class $j$, and $X_{ij}$ that student's effort. We can then model student performance as $Y_{ij} \sim \beta_{j} X_{ij}$, where $\beta_j$ is the slope for classroom $j$.

In this case, instead of assuming that classroom $C$ confounds a universal causal relationship, the MLM approach models a classroom-specific slope. The underlying causal model is that the classroom variable $C$ interacts with the causal effect of student effort $X$ on performance $Y$.
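The contrast between one universal slope and classroom-specific slopes $\beta_j$ can be sketched as follows. This is a stdlib-only illustration under assumed values: the three classrooms, their true slopes, the noise level, and the helper `ols_slope` are all hypothetical. Note also that fitting each classroom separately is a "no pooling" simplification; a full multilevel model would additionally shrink the classroom slopes toward a common mean.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from a one-predictor least-squares fit (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)

# Assumed classroom-specific effects of effort X on performance Y.
true_slopes = {"A": 0.5, "B": 1.5, "C": 3.0}

data = {}
for classroom, beta_j in true_slopes.items():
    xs, ys = [], []
    for _ in range(200):
        x = rng.gauss(0, 1)                      # student effort
        y = beta_j * x + rng.gauss(0, 0.5)       # student performance
        xs.append(x)
        ys.append(y)
    data[classroom] = (xs, ys)

# One slope per classroom: the beta_j in Y_ij ~ beta_j * X_ij.
class_slopes = {c: ols_slope(xs, ys) for c, (xs, ys) in data.items()}

# A single slope for everyone, ignoring classroom membership.
all_x = [x for xs, _ in data.values() for x in xs]
all_y = [y for _, ys in data.values() for y in ys]
pooled_slope = ols_slope(all_x, all_y)

for c, b in class_slopes.items():
    print(f"classroom {c}: estimated slope {b:.2f} (assumed true {true_slopes[c]})")
print(f"single pooled slope: {pooled_slope:.2f}")
```

The pooled slope is a blend of the classroom slopes and describes none of the classrooms well when the effect truly differs across them.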

In causal inference: CATE vs. ATE

The idea that a causal effect may be heterogeneous across different groups is well known in the causal inference literature. The primary estimand used to capture this concept is the "conditional average treatment effect" (CATE), which contrasts with the "average treatment effect" (ATE) that is the standard estimand for causal effects. Estimating the CATE is very similar to MLM. The idea is to estimate a causal effect conditional on other features, which can be understood as estimating the slope within subgroups in a linear model. Thus, what MLM is to OLS in statistical inference, CATE is to ATE in causal inference. Note that estimating the CATE poses a lot of additional difficulties compared to estimating the ATE, which complicates its application.
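For reference, the two estimands can be written in potential-outcomes notation. The notation is an assumption here, since nothing above fixes one: $Y(1)$ and $Y(0)$ denote the outcomes a unit would have with and without a binary treatment (a simplification of the continuous effort variable $X$ used earlier), and $C$ is the conditioning variable, such as the classroom.

$$
\text{ATE} = E\big[Y(1) - Y(0)\big], \qquad
\text{CATE}(c) = E\big[Y(1) - Y(0) \mid C = c\big], \qquad
\text{ATE} = E_C\big[\text{CATE}(C)\big].
$$

The last identity follows from the law of iterated expectations: the ATE is the average of the group-specific effects, weighted by how common each value of $C$ is in the population.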

The CATE/ATE comparison is very useful. It does leave me with questions, though, about the comparative advantage of modeling heterogeneous treatment effects with an MLM instead of a standard OLS or GLM model with an interaction term. — Brian Lookabaugh, Jun 06 '23 at 14:55
