6

Consider a situation where a public formula specifies the amount of treatment (the dollar amount of financial aid) students receive and I am interested in estimating the causal effect of treatment on a given outcome.

For example, say that I know that treatment ($X$) is generated by the following simplistic formula $X$ = $Z_1$ + ($Z_2$ / $Z_1$). Also assume that $Z_1$ and $Z_2$ also have some impact on the outcome so that $Z_1$ and $Z_2$ are confounders.

Since treatment assignment is fully understood, do I simply need to adjust for the formula above (and maybe with some predictors of outcome to increase statistical precision) to identify a causal effect?

Alternatively, if $Z_1$ and $Z_2$ are not confounders - but do determine treatment assignment - is any adjustment required? I understand that no confounders basically should imply that no adjustment is necessary. However, something just feels off about this approach.

  • So X is continuous and is the amount of aid given, and Z1 + Z2/Z1 is the formula, such as need and merit. I don't understand. X is perfectly predicted by Z1 + Z2/Z1 and you want to include Z1 and Z2 as variables in the model? – AdamO Nov 08 '23 at 21:24

2 Answers2

11

If you are certain that $X$ is caused only by $Z_1$ and $Z_2$ (however the data-generating model for $X$ is), then the causal effect of $X$ on $Y$ is nonparametrically identified. However, there is a problem with this system as you have described it: $X$ is deterministically caused by $\{Z_1, Z_2\}$, which means another critical assumption, positivity, is violated. Adjusting for $Z_1$ and $Z_2$ (e.g., by stratifying on them) eliminates all variation in $X$, which means you can't compute the adjusted association between $X$ and $Y$.

There are two ways this can be avoided: 1) there is another variable, e.g., $U_X$, that causes variation in $X$ and is independent of $Y$ given $Z_1$ and $Z_2$, or $X$ is non-deterministic given $Z_1$ and $Z_2$ (e.g., is stochastic), in which case positivity is not necessarily violated and you can use standard methods (including stratification or IPW) to estimate the effect of $X$ on $Y$, and 2) you can correctly specific the outcome model and the treatment model is not nested within it, in which case there is still variation in $X$ after adjusting for $Z_1$ and $Z_2$ but the confounding has been adequately removed, so you can estimate the effect of $X$ on $Y$ using an outcome regression model (parametric g-formula).

If neither of these conditions are met, then you will not be able to use standard covariate adjustment methods to estimate the effect of $X$ on $Y$; it is impossible to disentangle the effect of $X$ from the effects of $Z_1$ and $Z_2$.

Below is an example using a simulated dataset in R.

set.seed(300)
n <- 1e5
Z1 <- rbinom(n, 4, .5) + 1
Z2 <- rbinom(n, 8, .4) + 1

Deterministic X

X <- Z1 + Z2/Z1

Confoudned Y; True effect = 1

Y <- X + Z1^2 - Z2 + rnorm(n)

Stratification estimate, deterministic X

ZZ <- interaction(Z1, Z2) sapply(levels(ZZ), function(z) { lm.fit(cbind(1, X[ZZ==z]), Y[ZZ==z])$coef[2] * mean(ZZ==z) }) |> table(useNA = "i") #> #> <NA> #> 45

Regression estimate, deterministic X

lm.fit(cbind(1, X, Z1^2, Z2), Y)$coef[2] #> X #> 1.008688

Stochastic X

Xs <- X + rnorm(n) Ys <- Xs + Z1^2 - Z2 + rnorm(n)

#Stratification estimate, stochastic X sapply(levels(ZZ), function(z) { lm.fit(cbind(1, Xs[ZZ==z]), Ys[ZZ==z])$coef[2] * mean(ZZ==z) }) |> sum() #> [1] 1.001835

Regression estimate, stochastic X

lm.fit(cbind(1, Xs, Z1^2, Z2), Ys)$coef[2] #> Xs #> 1.000814

IPW estimate, stochastic X

w <- WeightIt::weightit(Xs ~ Z1 + I(Z2/Z1))$weights

lm.wfit(cbind(1, Xs), Ys, w = w)$coef[2] #> Xs #> 1.012656

Created on 2023-11-13 with reprex v2.0.2

With a deterministic $X$, stratification doesn't work (all estimates are NA within strata of $Z_1$ and $Z_2$ because there is no variation in $X$). But using the true outcome model (which the treatment model is not nested within) we can get the correct treatment effect estimate. With a stochastic $X$ (simply adding noise to $X$ independent of $Y$), we get the correct answer using stratification, regression and IPW.

Noah
  • 33,180
  • 3
  • 47
  • 105
2

From your description controlling for the confounders should suffice to identify the causal effect. If you are unsure, you can always check the identification of the causal effect using tools like DoWhy, DAGitty, or others. As a general rule, when there is no confounding, there is no need for adjustment. In any case I'd recommend always writing down the structural equation model in full and drawing the DAG just to be sure nothing got lost along the way.

Scriddie
  • 2,244
  • 6
  • 13
  • 2
    Only the first sentence is an attempt to answer the question. I think more detail is needed. – AdamO Nov 08 '23 at 21:21