I have a problem that is formulated as a latent variable model. Here we have an observed vector $\bar{x}$, a latent scalar variable y, and an observed scalar z.
The graphical model looks like $ \bar{x} \rightarrow y \rightarrow z $, i.e. z depends on the latent variable y, and y depends on the vector x. Both y and z are assumed to be Gaussian distributed.
The inference problem is to find the distribution of y given x, i.e. $ \mu_{y|x}$ and $\sigma_{y|x} $.
With this I would get a normal distribution which can be sampled and used.
I have come across some theory which says the conditional distribution can be represented in a parameterised form and solved for the parameters using gradient descent. Let us consider a very simple parameterisation where $ \mu_{y|x}$ and $\sigma_{y|x} $ are linear combinations of x.
Hence they can be represented as $$ \mu_{y|x} = w_\mu^Tx $$ $$ \sigma_{y|x} = w_\sigma^Tx $$
Now $p(z|y)$ can be parameterised in the same way:
$$ \mu_{z|y} = ay + b $$ $$ \sigma_{z|y} = cy + d $$
where a, b, c and d are the parameters. Since y is a latent variable, it is not observed and hence cannot be directly substituted into the objective function to compute the gradient. Computing the gradient analytically is not so important, as we can rely on tensorflow for that.
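For concreteness, this is roughly how I picture the two parameterised conditionals in code (just a sketch; the helper names are mine, and w_mu, w_sigma, a, b, c, d are the parameters defined above):

```python
import numpy as np

def y_given_x_params(x, w_mu, w_sigma):
    """Mean and std of p(y|x) under the linear parameterisation."""
    mu_y = np.dot(w_mu, x)        # mu_{y|x} = w_mu^T x
    sigma_y = np.dot(w_sigma, x)  # sigma_{y|x} = w_sigma^T x
    return mu_y, sigma_y

def z_given_y_params(y, a, b, c, d):
    """Mean and std of p(z|y) under the linear parameterisation."""
    return a * y + b, c * y + d   # mu_{z|y}, sigma_{z|y}
```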
But in this case, since y is not observed, we need to take the overall likelihood of x, y and z, i.e. $ p(x, y, z) = p(z|y)p(y|x)p(x) $.
Now the overall joint likelihood can be given as
$$ p(x, y, z) = p(x) \times \frac{1}{\sqrt{2\pi (w_{\sigma}^T x)^2}} \exp\left[\frac{-(y-w_{\mu}^T x)^2}{2(w_{\sigma}^T x)^2}\right] \times \frac{1}{\sqrt{2\pi (cy+d)^2}} \exp\left[\frac{-(z-(ay+b))^2}{2(cy + d)^2}\right] $$
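If y were observed, I would simply code up the log of this joint density and let tensorflow differentiate it. A rough sketch of what I have in mind (the function name log_joint is mine, and the $p(x)$ term is dropped because it does not involve the parameters):

```python
import math
import tensorflow as tf

def log_joint(x, y, z, w_mu, w_sigma, a, b, c, d):
    """log p(y|x) + log p(z|y); log p(x) is dropped since it has no parameters."""
    mu_y = tf.reduce_sum(w_mu * x)        # mu_{y|x} = w_mu^T x
    sigma_y = tf.reduce_sum(w_sigma * x)  # sigma_{y|x} = w_sigma^T x
    mu_z = a * y + b                      # mu_{z|y}
    sigma_z = c * y + d                   # sigma_{z|y}
    # Gaussian log densities (note: nothing constrains sigma_y, sigma_z > 0 here)
    log_p_y_given_x = (-0.5 * tf.math.log(2.0 * math.pi * sigma_y ** 2)
                       - (y - mu_y) ** 2 / (2.0 * sigma_y ** 2))
    log_p_z_given_y = (-0.5 * tf.math.log(2.0 * math.pi * sigma_z ** 2)
                       - (z - mu_z) ** 2 / (2.0 * sigma_z ** 2))
    return log_p_y_given_x + log_p_z_given_y
```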
Since the value of y is not observed we cannot directly optimise the above objective function. Hence we need to use Expectation Maximisation to solve this.
In the Expectation step, we need to initialise all the parameters, find $ p(y|x,z) $, and also find the expected log likelihood $$ \mathop{\mathbb{E}}_{p(y|x,z)}[\log p(x,y,z)] $$
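Assuming for a moment that I could draw samples of y from $p(y|x,z)$ (which is exactly the part I am stuck on below), I imagine the expected log likelihood would be approximated by Monte Carlo, roughly like this (log_joint is the hypothetical helper sketched above):

```python
import tensorflow as tf

def expected_log_joint(y_samples, x, z, w_mu, w_sigma, a, b, c, d):
    """Monte Carlo estimate of E_{p(y|x,z)}[log p(x,y,z)],
    assuming y_samples were somehow drawn from p(y|x,z)."""
    vals = [log_joint(x, y, z, w_mu, w_sigma, a, b, c, d) for y in y_samples]
    return tf.reduce_mean(tf.stack(vals))
```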
Now the question is how do I find $p(y | x,z)$ ?
I have a couple of ideas; please critique them.
- Since x and z are observed, they can be assumed to be independent, so we can re-write
$$ p(y|x,z) = p(y|x)p(y|z) $$
In the above equation, finding $ p(y|x)$ is easy since it is already parameterised; finding $p(y|z)$ would be a challenge, as I think we need to use Bayes' theorem for it:
$$ p(y|z) = \frac{p(z|y)p(y)}{p(z)} $$ Finding $p(y)$ and $p(z)$ is a challenge here
- Rewrite
$$ p(y|x,z) = \frac{p(x, y , z)}{p(x)p(z)} = \frac{p(z|y)p(y|x)p(x)}{p(x)p(z)} = \frac{p(z|y)p(y|x)}{p(z)} $$
In the above equation, how is it possible to get $p(z)$?
Or is it straightforward to find $p(z)$ from the observed variable z?
I have been thinking a lot about this; if there is any solution that I missed, or if I am thinking about this problem in the wrong way, please let me know. Thanks for reading such a long question, and thanks in advance for providing an answer.
The maximisation step would then be straightforward gradient descent on this objective function.
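For the M-step I imagine something along these lines (only a sketch; it reuses the hypothetical expected_log_joint helper from above, and the choice of Adam is arbitrary):

```python
import tensorflow as tf

# w_mu, w_sigma, a, b, c, d are tf.Variable's initialised before the first E-step
params = [w_mu, w_sigma, a, b, c, d]
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)

def m_step(y_samples, x, z):
    """One gradient step on the negative expected log likelihood."""
    with tf.GradientTape() as tape:
        loss = -expected_log_joint(y_samples, x, z, *params)
    grads = tape.gradient(loss, params)
    optimizer.apply_gradients(zip(grads, params))
    return loss
```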