Let us recall Bayes' theorem, where $\mathcal{D}$ is the data and $\theta$ is our parameter(s) of interest:
$ \operatorname{p}(\theta|\mathcal{D}) = \frac{\operatorname{p}(\mathcal{D}, \theta)}{\operatorname{p}(\mathcal{D})} = \frac{\operatorname{p}(\mathcal{D}|\theta)\,\operatorname{p}(\theta)}{\operatorname{p}(\mathcal{D})} $
We then have:
- The conditional probability $\operatorname{p}(\theta|\mathcal{D})$. As a conditional probability, it is a function of $\theta$ only; $\mathcal{D}$ is assumed to be given.
- The joint probability $\operatorname{p}(\mathcal{D}, \theta)$. As a joint probability, it is a function of two random variables, thus $\mathcal{D}$ is not given, but random.
- The marginal probability $\operatorname{p}(\mathcal{D})$. As a marginal probability, it is a function of a random variable, thus $\mathcal{D}$ is not given, but random.
It seems that $\mathcal{D}$ plays a dual role. On the left-hand side of the equation it is a constant, whereas on the right-hand side it is a (vector of) random variable(s). Is the posterior a function of both $\theta$ and $\mathcal{D}$ (as the right-hand side suggests), or is it a function of $\theta$ alone (as the left-hand side suggests)?
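To make the two readings concrete, here is a minimal numeric sketch with a hypothetical discrete parameter (a coin whose heads-probability is either 0.3 or 0.7), checking that the posterior computed from the joint, $\operatorname{p}(\mathcal{D}, \theta)/\operatorname{p}(\mathcal{D})$, matches the likelihood-times-prior factorization:

```python
# Hypothetical two-value parameter: the coin's heads-probability is 0.3 or 0.7,
# with a uniform prior over the two values.
prior = {0.3: 0.5, 0.7: 0.5}

# Observed data: a single heads, so the likelihood p(D | theta) equals theta.
likelihood = {theta: theta for theta in prior}

# Joint p(D, theta) = p(D | theta) p(theta); marginal p(D) sums out theta.
joint = {theta: likelihood[theta] * prior[theta] for theta in prior}
marginal = sum(joint.values())

# Posterior via the joint: p(theta | D) = p(D, theta) / p(D).
posterior = {theta: joint[theta] / marginal for theta in prior}
print(posterior)
```

Once the single observed dataset is plugged in, everything on the right-hand side collapses to numbers, and the result is a function of $\theta$ alone.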
Auxiliary Questions
I have omitted the case of the likelihood $\operatorname{p}(\mathcal{D}|\theta)$, but perhaps it is at the core of what is happening. Once the data have been observed, the likelihood is a function of $\theta$ (https://stats.stackexchange.com/a/138707/180158). However, it is also used as a probability density function of the data given $\theta$; otherwise, we could not use the chain rule to factorize the joint distribution $\operatorname{p}(\mathcal{D}, \theta)$ into the conditional and marginal distributions $\operatorname{p}(\mathcal{D}|\theta)\operatorname{p}(\theta)$. Does this dual behavior depend on the presence of the other terms in Bayes' theorem?
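The dual behavior of the likelihood can be sketched with the same expression read both ways, using a hypothetical Bernoulli model for two coin flips: fixing $\theta$ and summing over all possible datasets gives 1 (a density in $\mathcal{D}$), while fixing the data and varying $\theta$ traces out the likelihood function:

```python
from itertools import product

def p(data, theta):
    """The expression p(D | theta) for a sequence of Bernoulli(theta) flips."""
    heads = sum(data)
    tails = len(data) - heads
    return theta**heads * (1 - theta)**tails

# Reading 1: fixed theta, varying data -- a density in D that sums to 1
# over all 2-flip datasets.
theta = 0.4
total = sum(p(d, theta) for d in product([0, 1], repeat=2))
print(total)

# Reading 2: fixed data, varying theta -- the likelihood function of theta.
data = (1, 0)
curve = [p(data, t) for t in (0.2, 0.5, 0.8)]
print(curve)
```

The same formula serves both roles; which argument is held fixed determines which object you are looking at.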
In practice we tend to work only with the unnormalized (log) posterior (e.g. in Stan): $\operatorname{p}(\theta|\mathcal{D}) \propto \operatorname{p}(\theta, \mathcal{D})$. Does using the unnormalized (log) posterior change anything about our terminology, or about what we assume $\mathcal{D}$ to be?
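For concreteness, here is a hedged sketch of what "working with the unnormalized log posterior" amounts to, using a hypothetical Beta-binomial model (Beta(2, 2) prior, 7 heads out of 10 flips): the $\theta$-free constant $\log \operatorname{p}(\mathcal{D})$ is simply dropped, and differences of log densities, which are all that MCMC acceptance ratios or mode-finding need, are unaffected:

```python
import math

# Hypothetical model: Beta(a, b) prior on theta, binomial data (heads out of n).
a, b, heads, n = 2.0, 2.0, 7, 10

def log_unnormalized_posterior(theta):
    # log p(theta, D) up to a theta-free constant: log p(D) and the binomial
    # coefficient and Beta normalizer are all dropped.
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    log_lik = heads * math.log(theta) + (n - heads) * math.log(1 - theta)
    return log_prior + log_lik

# The missing log p(D) cancels in any difference of log densities, so
# comparisons between theta values are exactly as in the normalized case.
diff = log_unnormalized_posterior(0.7) - log_unnormalized_posterior(0.5)
print(diff)
```

Note that $\mathcal{D}$ enters only as fixed numbers here, which is exactly how Stan treats the data block: constants that parameterize a function of $\theta$.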