
I had this question when reading section 3.5.3 on page 170 of "Pattern Recognition and Machine Learning" written by Christopher M. Bishop:

[Image: excerpt from section 3.5.3 of the book, which uses the fact that $\beta\mathbf\Phi^\top\mathbf\Phi$ is positive definite.]

Here $\mathbf\Phi$ represents the design matrix and $\beta$ is the precision, which is positive.

My question is: why is $\mathbf\Phi^\top\mathbf\Phi$ a positive definite matrix?

I tried to prove it by showing that the error function defined in equation (3.80) of the same book has a local minimum at $\mathbf m_N$. But I had no luck: I can show that $\nabla E$ vanishes at $\mathbf m_N$, but not that $\mathbf m_N$ is necessarily a local minimum, and even if it were, I cannot establish positive definiteness of $\mathbf\Phi^\top\mathbf\Phi$ from that of $\mathbf A$. So, can you please help me prove that $\mathbf\Phi^\top\mathbf\Phi$ is positive definite? Thanks a lot.

Edit: The columns of the design matrix $\mathbf\Phi$ are linearly independent almost surely, because the entries of this matrix are (functions of) random variables. This condition is essential: it is what rules out mere positive semi-definiteness. As an aside, it is usually assumed that the number of samples $N$ is larger than the number of basis functions $M$.
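A minimal numerical sketch of this point (my own illustration, not from the book; the polynomial basis and variable names below are just illustrative assumptions): drawing $N > M$ inputs from a continuous distribution yields a design matrix of full column rank almost surely, so the smallest eigenvalue of $\mathbf\Phi^\top\mathbf\Phi$ is strictly positive.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 20, 5                    # N samples, M basis functions, N > M
x = rng.uniform(-1.0, 1.0, N)   # random inputs: distinct with probability one

# Design matrix for the polynomial basis phi_j(x) = x**j, j = 0, ..., M-1
Phi = np.vander(x, M, increasing=True)

print(np.linalg.matrix_rank(Phi) == M)        # True: full column rank (a.s.)
print(np.linalg.eigvalsh(Phi.T @ Phi).min())  # strictly positive
```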

zzzhhh
• Can you say more about $\Phi$? In general, for a real-valued matrix $X$, we have that $X^\mathsf T X$ is positive semi-definite: $$ z^\mathsf T X^\mathsf T X z = \| X z \|_2^2 \ge 0 $$ for any vector $z$. The intuition is that a sum of squares of real numbers must be non-negative. So there must be some additional property of $\Phi$. As a hint, the key is to work out what property $\Phi$ must have to make the claim true. – Sycorax Oct 15 '22 at 14:34
• The product of $X^\mathsf T$ and $X$ is a Gram matrix (https://en.m.wikipedia.org/wiki/Gram_matrix). You can show such matrices are positive semi-definite, and positive definite when the underlying vectors are linearly independent. – Demetri Pananos Oct 15 '22 at 14:55

2 Answers


Note that we have $$ v^\top \left(\Phi^\top\Phi\right) v = \left(v^\top \Phi^\top\right) \left(\Phi v\right) = \left(\Phi v \right)^\top \left(\Phi v\right) = \|\Phi v \|_2^2 \geq 0 $$ for all $\Phi \in \mathbb{R}^{N \times M}$ and $v \in \mathbb{R}^M$ (with $M, N \in \mathbb{N}_{>0}$). This establishes that $\Phi^\top\Phi$ is always positive semi-definite.
Moreover, $$ \|\Phi v \|_2^2 = 0 \iff \|\Phi v \|_2 = 0 \iff \Phi v = 0_{\mathbb{R}^N}, $$ which means that $\Phi^\top\Phi$, and hence $\beta\Phi^\top\Phi$ (with $\beta \in \mathbb{R}_{>0}$), is positive definite if (and only if) the columns of $\Phi$ are linearly independent.
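To make this dichotomy tangible, here is a quick sketch of my own (assuming a Gaussian random design; none of the names below come from the answer) that compares the spectrum of $\Phi^\top\Phi$ for a full-column-rank $\Phi$ with that of a $\Phi$ containing a duplicated column:

```python
import numpy as np

rng = np.random.default_rng(1)

Phi = rng.normal(size=(10, 3))           # columns independent almost surely
print(np.linalg.eigvalsh(Phi.T @ Phi))   # all eigenvalues > 0: positive definite

# Duplicate the first column so the columns become linearly dependent
Phi_dep = np.column_stack([Phi, Phi[:, 0]])
print(np.linalg.eigvalsh(Phi_dep.T @ Phi_dep))  # smallest eigenvalue is 0 (up to rounding)

# An explicit null direction: v = (1, 0, 0, -1) gives Phi_dep @ v = 0
v = np.array([1.0, 0.0, 0.0, -1.0])
print(np.allclose(Phi_dep @ v, 0.0))     # True, so ||Phi_dep v||^2 = 0 with v != 0
```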

statmerkur

Observation 1. For any real matrix $\mathbf A$, the symmetric matrix $\mathbf A^\mathsf T\mathbf A$ is positive semi-definite.

Note that

$$\mathbf v^\mathsf T\mathbf A^\mathsf T\mathbf A\mathbf v= (\mathbf A\mathbf v)^\mathsf T(\mathbf A\mathbf v)=\|\mathbf A\mathbf v\|_2^2\geq 0\tag 1$$ for every vector $\mathbf v$.

Moreover, equality in $(1)$ holds precisely when $\mathbf A\mathbf v=\mathbf 0$, so $\mathbf A^\mathsf T\mathbf A$ is positive definite exactly when $\mathcal N(\mathbf A) =\{\mathbf 0\}$, i.e., when $\mathbf A$ has linearly independent columns.
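For a concrete instance of the semi-definite case: take $\mathbf A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$, whose columns are linearly dependent. Then $\mathbf A^\mathsf T\mathbf A = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$ has eigenvalues $4$ and $0$, and $\mathbf v = (1, -1)^\mathsf T \in \mathcal N(\mathbf A)$ attains equality in $(1)$, so $\mathbf A^\mathsf T\mathbf A$ is positive semi-definite but not positive definite.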

User1865345