
I have been trying to derive the conditional distribution for the parameters of a linear transformation (represented as a matrix), and I had a lot of help in this thread yesterday. However, I have realised I may have made a serious mistake.

So, I have a vector-valued observation $y$, which is modelled as a linear transformation of another observation $x$.

The way $y$ is modelled is:

$$ y \sim \mathrm{N} (Ax, \Sigma). $$

The thing to note here is that $A$ is a matrix. Now, I want to put a normal prior on the transformation parameters, i.e. the entries of $A$, which I write as:

$$ A \sim \mathrm{N} (A_0, \nabla). $$

Here, however, $A$ is treated as a vector. When I tried to get the conditional distribution of $A$ by multiplying the two Gaussians, I ran into trouble because of this matrix-vector discrepancy: I could not separate the terms properly, as was suggested by Glen_b in that thread (my question contained the mistake and I did not realize it at the time).

I was wondering if there is a way to deal with this so that I can still derive the conditional distribution in closed form. Perhaps what I have done is valid and I just need some linear algebra tricks to make it work; however, I suspect that is not the case.

On a more hopeful note, I see that there is the matrix normal distribution. Would it be possible to use this as a prior for the transformation matrix $A$, and would a closed-form conditional posterior still be attainable? I am sure this object is more complex to manipulate, but perhaps someone with more expertise can confirm whether this is a good road to go down.
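For what it is worth, one way to convince yourself the matrix-normal route is consistent with the vectorised view is a small numerical check. The sketch below (NumPy, with made-up dimensions and covariances) verifies that the $\mathcal{MN}(M, U, V)$ log-density matches a multivariate normal on $\mathrm{vec}(X)$ with mean $\mathrm{vec}(M)$ and covariance $V \otimes U$, using column-stacking $\mathrm{vec}$:

```python
import numpy as np

def matnorm_logpdf(X, M, U, V):
    """Log-density of the matrix normal MN(M, U, V) for an n x p matrix X."""
    n, p = X.shape
    D = X - M
    # tr[V^{-1} D^T U^{-1} D]
    quad = np.trace(np.linalg.solve(V, D.T) @ np.linalg.solve(U, D))
    # log|V kron U| = n log|V| + p log|U|
    logdet = p * np.linalg.slogdet(U)[1] + n * np.linalg.slogdet(V)[1]
    return -0.5 * (quad + logdet + n * p * np.log(2.0 * np.pi))

def mvn_logpdf(x, mu, S):
    """Log-density of a multivariate normal, written out with NumPy only."""
    d = x - mu
    return -0.5 * (d @ np.linalg.solve(S, d)
                   + np.linalg.slogdet(S)[1]
                   + len(x) * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
n, p = 2, 3
M = rng.normal(size=(n, p))
B, C = rng.normal(size=(n, n)), rng.normal(size=(p, p))
U = B @ B.T + n * np.eye(n)          # SPD row covariance
V = C @ C.T + p * np.eye(p)          # SPD column covariance
X = rng.normal(size=(n, p))

# With column-stacking vec, MN(M, U, V) is N(vec(M), V kron U) on vec(X).
lhs = matnorm_logpdf(X, M, U, V)
rhs = mvn_logpdf(X.flatten(order='F'), M.flatten(order='F'), np.kron(V, U))
assert np.allclose(lhs, rhs)
```

So working with a matrix-normal prior on $A$ is, density-wise, the same as working with an MVN prior on $\mathrm{vec}(A)$ whose covariance has this Kronecker structure.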

Luca

1 Answer


If you observe that $Ax = (x^\top \otimes I_m)\,\mathrm{vec}(A)$, then you can put a prior on $\mathrm{vec}(A)$ (the matrix-normal prior is exactly such a prior on $\mathrm{vec}(A)$) and do proper Gaussian inference in closed form. As far as I am aware, you need to put a special structure on the prior covariance matrix to make the inference tractable in matrix form. Regarding posterior updates, you can find exactly what you are looking for here (Sec. II-B, Inference section); note the special structure on the covariance matrix.
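To make the vectorised update concrete, here is a minimal NumPy sketch. The dimensions, prior, and noise covariance are made up, and I use a general $\Sigma$ and prior covariance rather than the special structure mentioned above; it checks the Kronecker identity and then performs one conjugate Gaussian update for $\mathrm{vec}(A)$ from a single observation pair $(x, y)$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 4                               # y is m-dimensional, x is p-dimensional
A = rng.normal(size=(m, p))               # "true" transformation, for simulation
x = rng.normal(size=p)

# Identity: A @ x == kron(x^T, I_m) @ vec(A), with column-stacking vec.
Phi = np.kron(x[None, :], np.eye(m))      # design matrix, shape (m, m*p)
vecA = A.flatten(order='F')               # column-stacking vec(A)
assert np.allclose(A @ x, Phi @ vecA)

# Conjugate update: prior vec(A) ~ N(a0, P0), likelihood y ~ N(Phi vec(A), Sigma).
a0 = np.zeros(m * p)
P0 = 10.0 * np.eye(m * p)                 # wide prior (assumed)
Sigma = 0.1 * np.eye(m)                   # observation noise covariance (assumed)
y = A @ x + rng.multivariate_normal(np.zeros(m), Sigma)

P0_inv = np.linalg.inv(P0)
Sig_inv = np.linalg.inv(Sigma)
P_post = np.linalg.inv(P0_inv + Phi.T @ Sig_inv @ Phi)   # posterior covariance
a_post = P_post @ (P0_inv @ a0 + Phi.T @ Sig_inv @ y)    # posterior mean
A_post = a_post.reshape(m, p, order='F')  # posterior mean back in matrix form
```

With the special Kronecker/identity structure on $P_0$ and $\Sigma$, these same updates can be rearranged so that you never have to vectorise at all; the sketch above is just the generic vectorised form.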

  • One thing I am confused about is why is the matrix-normal prior on $vec(A)$ rather than $A$ itself? – Luca Nov 04 '15 at 14:34
  • 1
    My motivation came from converting $Ax$ into a model over $A$; it seemed easy to do so. I am not sure what will happen if you derive everything directly in matrix form. If you do, please post the derivation here! :) –  Nov 04 '15 at 15:05
  • 1
    I am going through the paper now and will update the thread on my progress! Since we are putting a prior over $vec(A)$, is it not enough to use an MVN prior rather than a matrix normal? – Luca Nov 04 '15 at 15:08
  • 1
    Assuming a special covariance structure on the MVN prior on $vec(A)$ enabled me to go back to the matrix form (and, as you see on Wikipedia, it also corresponds to assuming a matrix-normal prior with $U = I$). If you assume an MVN prior with full covariance on $vec(A)$, I suspect the posterior is not tractable in matrix form. I wanted to avoid vectorising and de-vectorising after each observation, and wanted to keep the matrix form along the updates. (I will try to obtain a direct matrix form as well!) –  Nov 04 '15 at 15:18
  • and note also that I also assume $\Sigma = \lambda I$ for a scalar $\lambda$. This also has an effect on tractability in my case. I wonder what happens with a general observation covariance and a general prior covariance as well. –  Nov 04 '15 at 15:22
  • yes, I also have these independence assumptions. I would be happy to get something working with a simple model in the first instance :) – Luca Nov 04 '15 at 15:23
  • May I ask a follow-up clarification? If the elements of my matrix $A$ are modelled as independent, then what would the matrix-normal prior look like to allow for a wide range of values? In the ordinary Gaussian case, this would be the equivalent of a wide prior. I guess I can take both $V$ and $U$ to be diagonal with large diagonal entries? – Luca Nov 05 '15 at 18:24
  • 1
    Yes, it seems so. But if you assume $vec(A) \sim \mathcal{N}(\mu, sI)$ with $s$ some scalar and $I$ the identity matrix, I don't know if there is a way to specify $V$ and $U$ uniquely (they will be constant multiples of the identity, but the constants are not easy to pin down, I guess). –  Nov 06 '15 at 23:56