I'm doing my first formal causal analysis and I'm a bit puzzled about what to include in the causal diagram and how.
I've read a few questions and guides about how to do it, but I have no formal theoretical background (I haven't read research papers, though I have been reading books on the topic such as Causal Inference: The Mixtape). Some sources say that each node should be an event or condition, such as an ambulance going by and a dog barking, or birth defects and being underweight. Others use variables, such as sex, weight, or more abstract quantities. Some books explicitly say to relate variables and not outcomes.
In my case, I am studying the effect of a viral infection in people and measuring the outcome in a sequencing data set (RNA expression and DNA methylation sequencing). Each person's environment and condition affect the outcome; we cannot control these factors, but we can record a few of them. For example, it is known that age affects DNA methylation. At the same time, two people of the same age and sex can respond differently in the outcome variable. In this analysis I have several samples from the same person at different time points, so age changes between them.
There is also a batch effect introduced by how the samples were processed before measuring the outcome, but it is unrelated to the other variables. You can see here my causal diagram of the variables I am considering:
If two variables are associated but there is no causal relation between them, how should that be represented in a DAG? Should these properties be treated as variables that depend on the patient, or just as independent variables?
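
To make the question concrete, the relations described above can be written as a cause-to-effect edge list. This is only a minimal sketch in Python, not my actual diagram: the node names (`infection`, `age`, `sex`, `batch`, `patient_factors`, `outcome`) are hypothetical labels, and lumping the unmeasured patient-level environment into a single latent node `patient_factors` is an assumption I am unsure about.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each pair is (cause, effect). Node names are hypothetical labels.
EDGES = [
    ("infection", "outcome"),        # the effect under study
    ("age", "outcome"),              # age is known to affect DNA methylation
    ("sex", "outcome"),
    ("batch", "outcome"),            # processing batch, unrelated to the rest
    ("patient_factors", "outcome"),  # ASSUMPTION: unmeasured, one latent node
]


def parents_of(edges):
    """Map every node to the set of its direct causes (parents)."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(cause, set())
        parents.setdefault(effect, set()).add(cause)
    return parents


PARENTS = parents_of(EDGES)

# static_order() raises graphlib.CycleError if the graph is not acyclic,
# so getting a list back confirms that the edge list is a valid DAG.
ORDER = list(TopologicalSorter(PARENTS).static_order())

for node in ORDER:
    print(node, "<-", sorted(PARENTS[node]))
```

In this encoding `batch` has no parents and shares no ancestor with the other variables, which is how I understand "unrelated to the other variables". What I do not know is whether a non-causal association between two measured variables should instead be drawn as an extra shared parent of both.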
