Directed Acyclic Graph of Stan Model

Question

I have the following Stan model:

```{stan output.var=""}
data{
  int N;
  vector[N] x;
  vector[N] y;
  int N_rep;
}
parameters{
  real<lower = 0> mu_x; 
  real<lower = 0> mu_y; 
  real<lower=0> sigma;
}
transformed parameters{
  real prob_positive_diff;

  prob_positive_diff = mu_y >= mu_x;
}
model {
  x ~ normal(mu_x, sigma);
  y ~ normal(mu_y, sigma);

  sigma ~ normal(0, 10);
}
generated quantities{
  real p_x_new;
  real p_y_new;

  {
    vector[N_rep] x_new_ind;
    vector[N_rep] y_new_ind;
    for(n in 1:N_rep){
      x_new_ind[n] = normal_rng(mu_x, sigma) > 15 ? 1.0 : 0.0;
      y_new_ind[n] = normal_rng(mu_y, sigma) > 15 ? 1.0 : 0.0;
    }
    p_x_new = sum(x_new_ind)/N_rep;
    p_y_new = sum(y_new_ind)/N_rep;
  }

}
```

I want to construct the directed acyclic graph that represents this model.

My attempt at doing so is as follows:

Note that $y_{new}$ and $x_{new}$ in the graph correspond to y_new_ind and x_new_ind in the code, respectively.

Based on my research, I have done the following:

The nodes represent random variables.
The directed edges represent dependencies.
The graph has no loops/cycles.

So we have that

$y$ depends on $\mu_y$ and $\sigma$,
$x$ depends on $\mu_x$ and $\sigma$,
$y_{new}$ depends on $\mu_y$ and $\sigma$, and
$x_{new}$ depends on $\mu_x$ and $\sigma$.
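
The dependencies listed above can be written down as a parent map and checked for cycles. The following is a minimal Python sketch, not part of the original model: the node names are just labels chosen here for the variables in the Stan program, and `topological_order` is a hypothetical helper built on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter, CycleError

# Map each node to the set of its parents (the variables it depends on).
# Node names are illustrative labels, not identifiers required by Stan.
parents = {
    "x": {"mu_x", "sigma"},
    "y": {"mu_y", "sigma"},
    "x_new": {"mu_x", "sigma"},
    "y_new": {"mu_y", "sigma"},
}


def topological_order(parent_map):
    """Return the nodes ordered so that every parent precedes its children.

    Raises graphlib.CycleError if the graph contains a cycle,
    i.e. if it is not a valid DAG.
    """
    return list(TopologicalSorter(parent_map).static_order())


order = topological_order(parents)
print(order)
```

If the call returns without raising `CycleError`, the graph is acyclic, and the returned order lists the parameter nodes before the nodes that depend on them.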

Assuming I did everything else correctly, there are two other questions that I am unsure about:

Why did I not include prob_positive_diff in the graph?

My reasoning for this is that prob_positive_diff is not, based on my understanding, a random variable per se; rather, it is a variable that is simply holding a single value: the probability that mu_y $\ge$ mu_x, where mu_y and mu_x are random variables.

Why did I not include p_x_new and p_y_new in the graph?

Honestly, I'm not sure about this one. I didn't include them because they were derived from x_new_ind and y_new_ind using simple operations, but I have a suspicion that, since they are random variables, they should be included below y_new_ind and x_new_ind with directed arrows going to them, respectively.

I would greatly appreciate it if people could please take the time to review my directed acyclic graph of this model.

Does this help you: https://stats.stackexchange.com/questions/215034/bayesian-errors-in-variables-model-definition-in-jags-and-symbolically/215043#215043 ? — Tim, May 22 '18 at 11:22
@Tim Thanks for the response. I just read through it, and it is indeed relevant and seems to confirm that most of my graph is correct. However, the parts that are causing me the most anxiety are the parts that aren't included in that example. So, for instance, (1) how do I handle vector[N_rep] x_new_ind;, vector[N_rep] y_new_ind;, p_x_new, and p_y_new? (2) What about prob_positive_diff? At the moment, I've only graphed y_new_ind and x_new_ind as $y_{new}$ and $x_{new}$, but I've left out vector[N_rep] x_new_ind; and vector[N_rep] y_new_ind;; And I'm unsure as to whether ... — The Pointer, May 22 '18 at 11:32
these parts are correct. In fact, I think it is likely that I've made an error, but I'm not familiar enough with Bayesian networks and their corresponding directed acyclic graphs to tell. — The Pointer, May 22 '18 at 11:33
DAGs describe the statistical model, while Stan is a programming language designed to estimate such models, so there is no 1:1 translation. You can do things in Stan that go beyond the definition of the model (much of what happens in generated quantities or transformed parameters; also, the parameters section is just about defining internal Stan objects) – Tim, May 22 '18 at 11:35
@Tim Ahh, that makes sense. I was thinking there should be a 1:1 translation, so the extra variables I had in the transformed parameters and generated quantities block were causing me anxiety. What do you think of my graph? Does it look "correct" to you? Or is there anything I should be adding/removing to, in your opinion, make it more accurate? For instance, should I be adding any of the aforementioned variables to the graph, such as prob_positive_diff? — The Pointer, May 22 '18 at 11:38

Tim · Accepted Answer · 2018-05-22T12:01:14.020

In this answer of mine you can find an example of a DAG for a Bayesian model described in the JAGS language. The thing to remember is that DAGs describe the statistical model, while Stan is a programming language designed to describe and then estimate such models. It follows that there is no one-to-one translation between the code and the probabilistic model. Stan can be used to define many things that go beyond the definition of the model (much of what happens in generated quantities or transformed parameters; also, the parameters section is just about defining internal Stan objects).

Your DAG looks almost fine, but there are two things that are incorrect:

You don't need the nodes for x_new and y_new, since they are just random draws from your model; they are neither parameters nor data.
Your definition of the priors for mu_x and mu_y is incorrect: you declare them only in the parameters section, which for Stan means that you assume a uniform distribution for them (rather than normal, as in your DAG).
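
To make the second point concrete, here is a sketch of the posterior density that the Stan program, as written, appears to encode. It reads the `lower = 0` declarations as flat priors on the positive half-line; the index $i$ and the indicator notation $\mathbb{1}[\cdot]$ are introduced here for illustration and do not appear in the code:

$$
p(\mu_x, \mu_y, \sigma \mid x, y) \;\propto\; \mathbb{1}[\mu_x > 0]\,\mathbb{1}[\mu_y > 0]\,\mathbb{1}[\sigma > 0]\;\text{normal}(\sigma \mid 0, 10)\,\prod_{i=1}^{N} \text{normal}(x_i \mid \mu_x, \sigma)\,\text{normal}(y_i \mid \mu_y, \sigma)
$$

Each factor corresponds to one node of the DAG given its parents: the likelihood terms for $x$ and $y$ have $\mu_x$ or $\mu_y$ and $\sigma$ as parents, $\sigma$ has a half-normal prior, and $\mu_x$ and $\mu_y$ contribute only a constant on their support, so no normal prior for them appears anywhere in the product.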

edited May 22 '18 at 12:01

answered May 22 '18 at 11:48


Excellent answer! Thank you for the clarification. :) Indeed, I have made an error in specifying normal priors for mu_x and mu_y. So, in reality, the DAG should list them as $\mu_x \sim \text{uniform}(0, \infty)$ and $\mu_y \sim \text{uniform}(0, \infty)$? – The Pointer May 22 '18 at 12:01
@ThePointer yes, but some would argue that uniform distribution needs to have finite bounds, otherwise this is an improper distribution (some people seem to be calling it "flat" to avoid confusion) – Tim May 22 '18 at 12:07
Understood. Thank you again for the clarification; I really appreciate it. – The Pointer May 22 '18 at 12:08
