Presenting DAGs in Journal-Quality Research

Question

One of the benefits of DAGs is that they openly state the causal assumptions a researcher is making, allowing for greater transparency. This is nice in theory. In practice, however, the DAGs I have personally created are a tangled web of nodes and arrows. Visually speaking, I find that as the complexity of a DAG increases, its transparency benefit decreases.

With that being said, does anyone have an idea of how to present the DAG (or the information that the DAG generates) in a visually friendly manner? The idea is to include a DAG in a peer-reviewed journal that, by and large, does not publish overtly causal research, so a highly complex DAG is going to be a hard sell.

Just wondering how I can retain the visual transparency of a DAG without bombarding the reader with a highly complex figure. There is no prior example in my field for presenting DAGs in a journal article (beyond educational material where a researcher uses a very simple DAG as a teaching element), so I do not have precedent to guide me.

I don't have an over-arching solution, but I can recommend simplifying your DAG as much as possible. E.g., might it be possible to coalesce some nodes, where you're not really interested in any interaction among those nodes, but only how that group of nodes affects other things? — Adrian Keister, Mar 02 '23 at 17:34
@AdrianKeister Unfortunately, most of the nodes are confounders that represent distinct "things". Failure to include any of these would most likely result in a reviewer commenting, "Why didn't you include X established variable in the graph?" or something along those lines. — Brian Lookabaugh, Mar 02 '23 at 17:54
Interactions are implied but not explicitly represented in DAGs. — Alexis, Mar 17 '23 at 17:51

Scriddie · Answer 1 · 2023-07-20T12:04:15.383

How to simplify the visual presentation of a DAG

Note of caution: You can only reasonably simplify the presentation if some parts of the DAG can be grouped together, or if not all variables are (equally) important. If things can very easily be grouped, you may want to check if you are using the right level of representation. If not everything is (equally) important, check if you really need/want to use all variables.

Once you have decided on a set of variables, here are some strategies to simplify the visual presentation of a DAG connecting them.

1. Multi-dimensional variables. Say you have 10 variables $(X_1, \dots, X_{10})$. Perhaps the first 5 describe one concept (e.g., health indicators), and the other 5 another (e.g., education measures). If the role of variables within the groups is similar enough, you could present them as two high-level variables $H$ (health) and $E$ (education).

2. Group by DAG position. Another way to reduce variables is to group by their position in the DAG.

For example, if you have multiple confounders, you can simplify by using a placeholder that indicates that there are multiple variables with the same function (this is frequently used for unobservable variables, of which we do not know how many there may be).

[Comment: Some people (especially in stats) use variable names without a surrounding circle to denote observed variables, whereas variables inside circles are taken to be unobserved. Others place observed variables in circles (as seen above), and distinguish unobserved variables by name, color, etc., so there are definitely a few different notation conventions. It may be worth checking other research works using DAGs in your area, but as long as it's consistent, it should be fine.]
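
As a rough illustration of grouping by position, here is a minimal Python sketch that merges nodes sharing exactly the same parents and the same children into one labelled node. The function name `group_by_position` and the example graph (four confounders `L1` to `L4` of a treatment `A` and an outcome `Y`) are assumptions made for illustration, not taken from any particular library.

```python
from collections import defaultdict


def group_by_position(edges):
    """Merge nodes that share exactly the same parents and the same children.

    `edges` is a list of (source, target) pairs describing a DAG.
    Returns a de-duplicated edge list over the merged node labels.
    """
    parents, children = defaultdict(set), defaultdict(set)
    nodes = []  # keeps first-seen order so labels are stable
    for src, dst in edges:
        children[src].add(dst)
        parents[dst].add(src)
        for n in (src, dst):
            if n not in nodes:
                nodes.append(n)

    # Nodes with an identical (parents, children) signature occupy the
    # same position in the graph and can share one placeholder node.
    by_signature = defaultdict(list)
    for n in nodes:
        signature = (frozenset(parents[n]), frozenset(children[n]))
        by_signature[signature].append(n)

    label = {
        n: ", ".join(members)
        for members in by_signature.values()
        for n in members
    }

    grouped = []
    for src, dst in edges:
        edge = (label[src], label[dst])
        if edge not in grouped:
            grouped.append(edge)
    return grouped


# Hypothetical example: L1..L4 each cause both A and Y, and A causes Y.
edges = [(f"L{i}", t) for i in range(1, 5) for t in ("A", "Y")] + [("A", "Y")]
simplified = group_by_position(edges)
print(simplified)
# [('L1, L2, L3, L4', 'A'), ('L1, L2, L3, L4', 'Y'), ('A', 'Y')]
```

The nine original arrows collapse to three, and no variable is dropped: the merged node's label still lists every confounder.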

3. Other visual aids. You could visually group variables into a containing shape, use colors, or different types of arrows or nodes. This could be used, for example, to delineate context variables from model variables.
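
One possible way to get containing shapes and colors, sketched here under the assumption that you render the figure with Graphviz, is to generate DOT text in which each group becomes a `cluster` subgraph. The helper `to_dot`, the group names, and the colors below are all hypothetical choices for illustration.

```python
def to_dot(edges, groups, colors):
    """Build Graphviz DOT text with one boxed cluster per variable group.

    edges:  list of (source, target) pairs
    groups: dict mapping a group title to the list of its member nodes
    colors: dict mapping a group title to a fill color name
    """
    lines = ["digraph G {", "  rankdir=LR;", "  node [shape=ellipse];"]
    for i, (title, members) in enumerate(groups.items()):
        # Graphviz draws a box around subgraphs whose name starts with "cluster".
        lines.append(f"  subgraph cluster_{i} {{")
        lines.append(f'    label="{title}";')
        lines.append("    style=dashed;")
        fill = colors.get(title, "white")
        for member in members:
            lines.append(f'    "{member}" [style=filled, fillcolor="{fill}"];')
        lines.append("  }")
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: two context variables and two model variables.
edges = [("Region", "A"), ("Year", "A"), ("Region", "Y"), ("A", "Y")]
groups = {"Context": ["Region", "Year"], "Model": ["A", "Y"]}
colors = {"Context": "lightgrey", "Model": "lightblue"}
dot = to_dot(edges, groups, colors)
print(dot)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command-line tool, for example `dot -Tpdf dag.dot -o dag.pdf`.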

Non-visual presentation

If your DAG is really big (e.g., in the 100s of nodes), it might make more sense to provide it in a machine-readable format like an edge list or adjacency matrix.
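
For a sense of what such a supplement could look like, the following is a small Python sketch, using only the standard library, that writes a DAG as a CSV edge list and as an adjacency matrix. The function names, the column headers `source` and `target`, and the three-node example are assumptions for illustration.

```python
import csv
import io


def to_edge_list_csv(edges):
    """Return the DAG as CSV text with one 'source,target' row per arrow."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buf.getvalue()


def to_adjacency_matrix(edges):
    """Return (nodes, matrix) where matrix[i][j] == 1 means nodes[i] -> nodes[j]."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return nodes, matrix


# Hypothetical example: one confounder L of treatment A and outcome Y.
edges = [("L", "A"), ("L", "Y"), ("A", "Y")]

print(to_edge_list_csv(edges))
nodes, matrix = to_adjacency_matrix(edges)
print(nodes)   # ['A', 'L', 'Y']
print(matrix)  # [[0, 0, 1], [1, 0, 1], [0, 0, 0]]
```

An edge list tends to stay compact for sparse graphs, whereas an adjacency matrix grows with the square of the number of nodes, so the edge list is probably the more practical choice for a supplementary file.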

+1 Brian Lookabaugh, note that "grouping variables together" does not mean "omitting the variables" even if the variables are implicated in backdoor path confounding, etc. All the grouping does is simplify communication of the structure of specific causal relationships. For example, if $L_1, L_2, L_3,$ and $L_4$ all confound a relationship between $A$ and $Y$ because they are all individually direct causes of both $A$ and $Y$, you need not draw all 8 arrows from the $L$s to $A$ and $Y$, but can label a single node as "$L_1, L_2, L_3,$ and $L_4$" with one arrow to $A$ and one arrow to $Y$. — Alexis, Mar 17 '23 at 17:55
