
Consider a scenario where the treatment is a U.S. state-level policy (some states adopted the policy while others did not) and the outcome consists of individual-level survey responses collected across American states. To make this scenario less abstract, let's say that the policy is gun-reform legislation and the outcome is individual perceptions of safety. As a result, treatment and outcome are measured at different levels of aggregation.

Immediately, I can see an issue with this approach as it relates to identifying confounders to adjust for. For example, ideology seems like a clear confounder in this case, but ideology aggregated in what way? The ideological make-up of the state will impact the probability of adopting gun-reform legislation and the ideology of an individual will impact their perception of safety, but the ideology of a state and the ideology of an individual are two separate (but related) concepts. Also, if this is the case, is "ideology" even a confounder?

Naively, I can say that:

Gun Reform $\leftarrow$ Ideology $\rightarrow$ Safety Perception

But this isn't really true, is it? What I'm actually assuming is two entirely separate measures of ideology:

Gun Reform $\leftarrow$ State's Ideological Makeup

Safety Perception $\leftarrow$ Individual's Ideology

How might one handle situations such as these, where the problems seem to be driven solely by the different levels of aggregation of treatment and outcome? One obvious course might be to average the outcome responses and collapse them to the state level, but this has the serious drawback of severely reducing N. Is there any way to move forward without re-aggregating?

  • Generally speaking the standard causal inference pipeline assumes that the variables are given, so from that perspective this could be seen as a question of domain knowledge. If we are dealing with quantities at the state level, and the state's ideological makeup is something other than the aggregate of the individuals' ideological make-up, then those should probably be two different variables - I'm not sure this is a problem of aggregation. – Scriddie Dec 28 '23 at 12:03
  • @Scriddie My issue with adjustment for two different variables is that only "ideology" - in this example - is a true confounder. An individual's ideology impacts their individual perception of safety but it alone does not cause a change in Pr(Gun Reform). Likewise, a state's ideological makeup impacts Pr(Gun Reform), but it does not have a direct effect (I don't think) on an individual's ideology. Conceptualized at a unit-agnostic level, ideology is a clear confounder, but, accepting the reality of the unit of analysis issue, "ideology" itself does not work... I don't think. – Brian Lookabaugh Dec 28 '23 at 15:52
  • As for the measurement levels, perhaps it helps to think of what a hypothetical experiment would look like. Would you like to randomly assign different gun laws to different individuals, or would you like to assign different gun laws to different states? That should help answer what level of aggregation you are interested in. Once you have chosen a level of aggregation, it seems to me that all variables would have to work on that level for your structural causal model to be able to generate samples. – Scriddie Dec 28 '23 at 16:17

1 Answer


One of the advantages of building models in general is that it costs relatively little to test different formulations to see which one works best for your objectives. There are advantages and disadvantages to modelling each level of aggregation, and the choice must be informed by:

  • the answers that you want your model to provide you;
  • the data that you have or have the resources to collect; and
  • domain knowledge, i.e. knowledge about the shape of the causal model.

In this case, like you, I (someone lacking any domain knowledge, so take this with a grain of salt) see two possibilities:

  1. The first is similar to your first one, where the residents' "ideology" influences both gun reform legislation and perception of safety. I would argue that gun reform legislation also affects the perception of safety, like so:

[Figure: causal model, one level]

  2. The second is similar to your second one, where there is a distribution of "ideology" among the voters (not residents) of each state, which influences the "ideology" of the legislators of that state, which influences the probability that they pass gun reform legislation. Both the ideology of the voters and whether or not gun reform has been codified into law influence the perception of safety among the residents of each state:

[Figure: causal model, two levels]

The first model is simpler. If you have any data on each state's "ideology" (be it that of its residents, its voters, its legislators, etc.), you can plug it into the "ideology" node. However, because each "state ideology" is poorly defined (e.g. in one state you might have a phone survey of a sample of registered voters, in another you might have an internet poll of residents), there might be a lot of model uncertainty in whatever quantity you're interested in estimating.
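As a rough illustration of what "plugging in" a state-level ideology measure could look like under the first model, here is a minimal sketch on invented toy data. Everything in it is an assumption made for illustration: the binary indicator `S` for state ideology, the effect sizes, the three respondents per state, and the helper names. It keeps the outcome at the individual level and adjusts for the state-level confounder by stratification, rather than collapsing the responses first.

```python
# Toy data: every number here is invented for illustration only.
# Each state has an ideology indicator S (1 = more liberal electorate),
# a treatment indicator T (1 = adopted gun reform), and three respondents.
STATES = [  # (S, T)
    (1, 1), (1, 1), (1, 1), (1, 0),
    (0, 1), (0, 0), (0, 0), (0, 0),
]
DEVIATIONS = [-1.0, 0.0, 1.0]  # respondent-level deviations, summing to zero
BASELINE = 5.0        # assumed baseline perception of safety
TRUE_EFFECT = 1.0     # assumed effect of gun reform on perceived safety
IDEOLOGY_SHIFT = 2.0  # assumed effect of state ideology on perceived safety

# One row per respondent; treatment and ideology are constant within a state.
rows = [
    {"state": i, "S": s, "T": t,
     "y": BASELINE + TRUE_EFFECT * t + IDEOLOGY_SHIFT * s + d}
    for i, (s, t) in enumerate(STATES)
    for d in DEVIATIONS
]


def mean(values):
    return sum(values) / len(values)


def diff_in_means(data):
    """Treated-minus-control difference in mean outcome."""
    treated = [r["y"] for r in data if r["T"] == 1]
    control = [r["y"] for r in data if r["T"] == 0]
    return mean(treated) - mean(control)


def stratified_diff(data, key):
    """Average the within-stratum differences, weighted by stratum size."""
    estimate = 0.0
    for level in sorted({r[key] for r in data}):
        subset = [r for r in data if r[key] == level]
        estimate += len(subset) / len(data) * diff_in_means(subset)
    return estimate


naive = diff_in_means(rows)            # ignores state ideology
adjusted = stratified_diff(rows, "S")  # adjusts for state ideology

print(naive)     # 2.0, inflated because liberal states adopt more often
print(adjusted)  # 1.0, matches TRUE_EFFECT in this noise-free toy
```

In this toy the adjustment works only because state ideology is the sole confounder and is measured without error; the measurement problem described above is exactly what would break that in practice.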

The second model is more explicit in its description of how each variable influences the others. It also makes clear that what directly matters for gun reform legislation is the legislators' ideology; if you have data on lawmakers' opinions (e.g. a survey), it makes data on residents' preferences less influential (although not completely irrelevant, since it might reveal something about their perception of safety). Also, because it is more explicit and specific, you can focus your data collection resources on what matters (voters and lawmakers), and the model uncertainty in your estimate will probably (but not necessarily) be smaller. On the other hand, if you are limited to using existing datasets, it might be harder to find data that you can just plug into each node without further analysis and modification.
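To make the two graphs concrete, here is a minimal sketch that writes them down as plain dictionaries and checks which adjustment sets block the backdoor paths from gun reform to safety perception. The node names and helper functions are my own illustrative choices, and the edges are only my reading of the two descriptions above, so treat this as an assumption-laden sketch rather than a definitive encoding.

```python
# Parent -> children edges, as I read the two models described in the text.
ONE_LEVEL = {
    "ideology": ["gun_reform", "safety_perception"],
    "gun_reform": ["safety_perception"],
    "safety_perception": [],
}
TWO_LEVEL = {
    "voter_ideology": ["legislator_ideology", "safety_perception"],
    "legislator_ideology": ["gun_reform"],
    "gun_reform": ["safety_perception"],
    "safety_perception": [],
}


def parents(graph, node):
    return [p for p, children in graph.items() if node in children]


def descendants(graph, node):
    seen, stack = set(), list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def backdoor_paths(graph, treatment, outcome):
    """Simple paths to the outcome that start with an arrow INTO the treatment."""
    found = []

    def walk(path):
        last = path[-1]
        if last == outcome:
            found.append(path)
            return
        # Follow edges in either direction, never revisiting a node.
        for nxt in sorted(set(graph[last]) | set(parents(graph, last))):
            if nxt not in path:
                walk(path + [nxt])

    for parent in parents(graph, treatment):
        walk([treatment, parent])
    return found


def is_blocked(graph, path, adjust):
    """d-separation check along one path, given the adjustment set."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = node in graph[prev] and node in graph[nxt]
        if collider:
            # A collider blocks unless it or one of its descendants is adjusted for.
            if node not in adjust and not (descendants(graph, node) & adjust):
                return True
        elif node in adjust:
            # A chain or fork node blocks once it is adjusted for.
            return True
    return False


def satisfies_backdoor(graph, treatment, outcome, adjust):
    adjust = set(adjust)
    if adjust & descendants(graph, treatment):
        return False  # never adjust for a descendant of the treatment
    return all(is_blocked(graph, p, adjust)
               for p in backdoor_paths(graph, treatment, outcome))


for candidate in ([], ["voter_ideology"], ["legislator_ideology"]):
    ok = satisfies_backdoor(TWO_LEVEL, "gun_reform", "safety_perception", candidate)
    print(candidate, ok)
```

Under this reading of the second model, adjusting for either the voters' or the legislators' ideology closes the only backdoor path, which fits the point that data on lawmakers makes data on residents less influential for identification. If the real graph has extra arrows (for instance, an individual's own ideology as a separate node), the answer can change.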

LmnICE