
I have a problem that is formulated as a latent variable model. There is an observed vector $\bar{x}$, a latent scalar $y$, and an observed scalar $z$.

The graphical model is $ \bar{x} \rightarrow y \rightarrow z $, i.e. $z$ depends on the latent variable $y$, and $y$ depends on the vector $\bar{x}$. Both $y$ and $z$ are assumed to be Gaussian distributed.

The inference problem is to find the distribution of $y$ given $\bar{x}$, i.e. $ \mu_{y|x}$ and $\sigma_{y|x} $.

With these I would get a normal distribution which can be sampled and used.

I have come across some theory which says the conditional distribution can be represented in a parameterised form, and the parameters can be solved for using gradient descent. Let us consider a very simple parameterisation in which $ \mu_{y|x}$ and $\sigma_{y|x} $ are linear combinations of $\bar{x}$.

Hence they can be represented as $$ \mu_{y|x} = w_\mu^Tx $$ $$ \sigma_{y|x} = w_\sigma^Tx $$

Similarly, $p(z|y)$ can be parameterised in the same way:

$$ \mu_{z|y} = ay + b $$ $$ \sigma_{z|y} = cy + d $$

where $a$, $b$, $c$ and $d$ are the parameters. Since $y$ is a latent variable and is not observed, it cannot be directly substituted into the objective function to compute the gradient. Computing the gradient analytically is not so important, as we can rely on TensorFlow for that.
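For concreteness, here is a minimal NumPy/SciPy sketch of these two parameterised CPDs (the function and variable names are just illustrative; note that the linear forms do not guarantee positive standard deviations, which comes up in the comments below):

```python
import numpy as np
from scipy.stats import norm

def p_y_given_x(x, w_mu, w_sigma):
    """Gaussian CPD p(y | x): mean and std are linear in x."""
    mu = w_mu @ x        # mu_{y|x} = w_mu^T x
    sigma = w_sigma @ x  # sigma_{y|x} = w_sigma^T x (not guaranteed positive)
    return norm(loc=mu, scale=sigma)

def p_z_given_y(y, a, b, c, d):
    """Gaussian CPD p(z | y): mean a*y + b, std c*y + d."""
    return norm(loc=a * y + b, scale=c * y + d)  # std not guaranteed positive
```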

But since $y$ is not observed, we need to work with the overall likelihood of $x$, $y$ and $z$, i.e. $ p(x, y, z) = p(z|y)p(y|x)p(x) $.

Now the overall joint likelihood (dropping $p(x)$, which does not depend on the parameters) can be written as

$$ \frac{1}{\sqrt{2\pi (w_{\sigma}^T x)^2}} \exp\left[\frac{-(y-w_{\mu}^T x)^2}{2(w_{\sigma}^T x)^2}\right] \times \frac{1}{\sqrt{2\pi (cy+d)^2}} \exp\left[\frac{-(z-(ay+b))^2}{2(cy + d)^2}\right] $$
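Written as a complete-data log-likelihood in code, just to make explicit that it requires a value of the latent $y$ (a minimal sketch with illustrative names):

```python
import numpy as np

def complete_data_log_lik(x, y, z, w_mu, w_sigma, a, b, c, d):
    """log p(y|x) + log p(z|y); note that it needs a value for the latent y."""
    mu_y, sig_y = w_mu @ x, w_sigma @ x
    mu_z, sig_z = a * y + b, c * y + d
    ll_y = -0.5 * np.log(2 * np.pi * sig_y**2) - (y - mu_y)**2 / (2 * sig_y**2)
    ll_z = -0.5 * np.log(2 * np.pi * sig_z**2) - (z - mu_z)**2 / (2 * sig_z**2)
    return ll_y + ll_z
```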

Since the value of $y$ is not observed, we cannot directly optimise the above objective function. Hence we need to use Expectation Maximisation (EM) to solve this.

In the Expectation step, we initialise all the parameters, find $ p(y|x,z) $, and compute the expected log-likelihood $$ \mathop{\mathbb{E}}_{p(y|x,z)} [\log p(x,y,z)] $$
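As an illustration of the quantity being computed, here is a short sketch of the expected log-likelihood as a Monte Carlo average over samples of $y$ (it reuses the `complete_data_log_lik` sketch above; the samples would have to come from the posterior, which is exactly what I am asking about below):

```python
import numpy as np

def e_step_objective(x, z, y_samples, params):
    """Monte Carlo estimate of E_{p(y|x,z)}[log p(x, y, z)], up to the constant log p(x),
    using complete_data_log_lik from the sketch above and posterior samples of y."""
    return np.mean([complete_data_log_lik(x, y, z, *params) for y in y_samples])
```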

Now the question is: how do I find $p(y|x,z)$?

I have a couple of ideas; please critique them.

  1. Since x and z are observed, they can be assumed to be independent, so we can rewrite

$$ p(y|x,z) = p(y|x)p(y|z) $$

In the above equation, finding $ p(y|x)$ is easy since it is already parameterised; finding $p(y|z)$ would be a challenge, as I think we need to use Bayes' theorem:

$$ p(y|z) = \frac{p(z|y)p(y)}{p(z)} $$ Finding $p(y)$ and $p(z)$ is a challenge here.

  2. Rewrite

$$ p(y|x,z) = \frac{p(x, y , z)}{p(x)p(z)} = \frac{p(z|y)p(y|x)p(x)}{p(x)p(z)} = \frac{p(z|y)p(y|x)}{p(z)} $$

In the above equation, how is it possible to get $p(z)$?

Or is it straightforward to find $p(z)$ from the observed variable $z$?
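One purely numerical idea I had (I am not sure it is the right way to think about it) is to exploit the fact that $y$ is a scalar and approximate the normalising constant $\int p(z|y)\,p(y|x)\,dy$ on a grid; a rough sketch with illustrative names:

```python
import numpy as np
from scipy.stats import norm

def posterior_y_on_grid(x, z, w_mu, w_sigma, a, b, c, d, y_grid):
    """Numerically normalised p(z|y) p(y|x) over a 1-D grid of y values."""
    prior = norm.pdf(y_grid, loc=w_mu @ x, scale=w_sigma @ x)    # p(y|x)
    lik = norm.pdf(z, loc=a * y_grid + b, scale=c * y_grid + d)  # p(z|y)
    unnorm = prior * lik
    # The integral of the numerator over y approximates the denominator.
    # (Negative scales from the linear std will give NaNs, as flagged in the comments.)
    normaliser = np.trapz(unnorm, y_grid)
    return unnorm / normaliser
```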

I have been thinking a lot about this; if there is any solution that I missed, or if I am thinking about this problem in the wrong way, please let me know. Thanks for reading such a long question, and thanks in advance for providing an answer.

The Maximisation step would then be straightforward gradient descent on the objective function.
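For what I have in mind for the M-step, here is a minimal TensorFlow sketch (all names, initial values and hyperparameters are illustrative; the loss is the negative expected log-likelihood, estimated from whatever samples of $y$ the E-step provides):

```python
import math
import tensorflow as tf

# Illustrative trainable parameters (x assumed 3-dimensional here).
w_mu = tf.Variable(tf.zeros([3]))
w_sigma = tf.Variable(tf.ones([3]))
a = tf.Variable(1.0)
b = tf.Variable(0.0)
c = tf.Variable(0.0)
d = tf.Variable(1.0)
params = [w_mu, w_sigma, a, b, c, d]

opt = tf.keras.optimizers.Adam(learning_rate=1e-2)

def neg_expected_log_lik(x, z, y_samples):
    """-E_q[log p(y|x) + log p(z|y)], averaged over E-step samples of y."""
    mu_y = tf.tensordot(w_mu, x, 1)
    sig_y = tf.tensordot(w_sigma, x, 1)
    mu_z = a * y_samples + b
    sig_z = c * y_samples + d
    ll_y = -0.5 * tf.math.log(2 * math.pi * sig_y**2) - (y_samples - mu_y)**2 / (2 * sig_y**2)
    ll_z = -0.5 * tf.math.log(2 * math.pi * sig_z**2) - (z - mu_z)**2 / (2 * sig_z**2)
    return -tf.reduce_mean(ll_y + ll_z)

# Dummy observed data and E-step samples, just to show one gradient step.
x = tf.constant([0.5, -1.0, 2.0])
z = tf.constant(0.3)
y_samples = tf.constant([0.1, 0.2, -0.1])

with tf.GradientTape() as tape:
    loss = neg_expected_log_lik(x, z, y_samples)
grads = tape.gradient(loss, params)
opt.apply_gradients(zip(grads, params))
```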

  • Hi Jaswant, can you provide the conditional probability distributions (CPDs) of $y|\mathbf{x}$ and $z|y$? You mentioned that they are Gaussian, please be more specific about how the Gaussian parameters are defined for each CPD. – Haotian Chen Apr 24 '20 at 12:27
  • @HaotianChen The CPDs are p(y|x) = Gaussian($\mu_{y|x}$, $\sigma_{y|x}$) = Gaussian($w_\mu^Tx$, $w_\sigma^Tx$) and p(z|y) = Gaussian($\mu_{z|y}$, $\sigma_{z|y}$) = Gaussian(ay + b, cy + d). The exact forms of these parameters are given in the question, so the final parameters would be $w_\mu$, $w_\sigma$, a, b, c, d. Does that answer the question? – Jaswant Apr 24 '20 at 15:42
  • Yes, that's clearer, but there's something wrong with your CPDs. You can't use $w^Tx$ and $cy+d$ to model the variances, because they are not guaranteed to be greater than 0. – Haotian Chen Apr 24 '20 at 16:23
  • Yes @HaotianChen, you are right. Maybe I can replace them with an exponential linear unit applied to those values. The main question remains: how do I learn the parameters? – Jaswant Apr 24 '20 at 16:31
  • Your equation for the posterior is also wrong; it should be $p(y|xz)=p(z|y)p(y|x)/p(z|x)$. There are different ways to get the posterior parameters, depending on your definition of the CPDs; for example, if $p(y|x)$ is a conjugate prior for $p(z|y)$, then you can easily get the posterior in closed form. Please take some time to learn the basic concepts in Bayesian statistics and graphical models such as "CPD", "conjugate prior", "posterior", "marginal likelihood", "conditional independence"... before making wild decisions based on limited knowledge. It will help you in the long term. – Haotian Chen Apr 24 '20 at 16:59
  • Thanks for the comment, but $p(z|x)$ in the denominator seems a bit complicated for me. Can we not assume independence between the observed variables z and x, which would turn that into p(z)? Also, how do we get a closed-form solution if p(y|x) is a conjugate prior for p(z|y)? Can you please post an answer elaborating on this? I was reading a lot about graphical models, Expectation Maximization and variational inference, but I am not able to get a concrete solution to the problem. – Jaswant Apr 24 '20 at 17:18
  • Just because $p(z|x)$ is complicated, people invented conjugate priors to simplify the problem. For example, if the CPDs are defined as $p(z|y): z \sim N(y,\Sigma)$ and $p(y|x): y \sim N(x,S_0)$, with $\Sigma$ and $S_0$ known, then after observing $z_i,i=1:N$, the posterior has the closed form $p(y|x,z_{1:N}): y \sim N((S_0^{-1} + N \Sigma^{-1})^{-1}(S_0^{-1}x + N\Sigma^{-1}\bar z),(S_0^{-1} + N \Sigma^{-1})^{-1})$, where $\bar z = \sum_{i=1}^N z_i/N$ is the "sufficient statistic". – Haotian Chen Apr 24 '20 at 17:41
  • I built an R package recently for facilitating Bayesian modelling, the name is "bbricks", there are some examples in the package vignette that can help you understand some basic concepts and conventions. – Haotian Chen Apr 24 '20 at 17:43
  • I still have a question about how $\Sigma$ and $S_0$ are known. Is there any blog post I can read to learn more about latent variable modelling? Can you please point me to it? – Jaswant Apr 24 '20 at 18:38
  • Of course you can make them unknown and give them prior distributions! I chose that example because it's simple. I don't know of blogs, but there are plenty of classic textbooks. Keep in mind that there's no shortcut; merely reading some blogs without getting your hands dirty won't help. – Haotian Chen Apr 24 '20 at 18:58
  • Thanks. Can you please point me to any good textbooks? – Jaswant Apr 25 '20 at 03:10
  • Probabilistic Graphical Models, and Machine Learning: A Probabilistic Perspective. Two of my favorites. – Haotian Chen Apr 26 '20 at 00:08
