
I have a data matrix $X$ of size $n\times p$ with $n < p$, where $n$ is the number of observations and $p$ is the number of dimensions.

My question is: why does $n < p$ result in a covariance matrix that is not positive definite?

(By the way, I want to use this data in a factor analysis model. Do you know of any Matlab code implementing a standard factor analysis for this kind of data when $n < p$?)

pierre
  • You cannot do factor analysis (most algorithms and implementations won't allow it) on a singular correlation matrix (and when n < p, it is necessarily singular), nor on an indefinite matrix (one with negative eigenvalues, which can sometimes arise with pairwise deletion of missing values). – ttnphns Feb 25 '16 at 19:19
  • @ttnphns: Is there any solution to the problem, or do I simply have to forget factor analysis? – pierre Feb 26 '16 at 11:49
  • This is a theoretical problem (see Pt 6). With a relatively low n, the correlations cannot be differentiated well enough from one another, so the factor model cannot play out in full. Forget FA. Practically, it is good to have n > p by a factor of at least 3-5. – ttnphns Feb 26 '16 at 11:56
  • Then in my case, which method of dimension reduction would you suggest? And can you also propose standard Matlab code for that method? – pierre Mar 02 '16 at 10:34
  • +1 but your second question (about the Matlab code) is off-topic here. – amoeba Mar 03 '16 at 00:06
  • I don't see why you couldn't at least look at your data via principal components and rotations, to see patterns or to reduce to a smaller number of components. (Of course, a true factor-analytic model expects item-specific variance, which you cannot model with your data.) – Gottfried Helms Mar 04 '16 at 05:39

1 Answer


This result is a direct, simple consequence of the fact that the rank of the $p\times p$ matrix $X^\prime X$ cannot exceed the smaller of $n$ and $p$, which is strictly less than $p$ in this case. That makes $X^\prime X$ singular, which is equivalent to the existence of a nonzero $x$ for which $X^\prime X x = 0$. Consequently $$x^\prime X^\prime X x = x^\prime 0 = 0$$ demonstrates that $X^\prime X$ is not positive definite (it remains positive semi-definite, since $x^\prime X^\prime X x = \|Xx\|^2 \ge 0$ for every $x$, but it cannot be definite).
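
Here is a minimal numerical sketch of this argument, in Python/NumPy for concreteness (the thread mentions Matlab, but the linear algebra is the same in any language); the sizes $n=10$, $p=25$ are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 25                        # n < p, as in the question
X = rng.standard_normal((n, p))

G = X.T @ X                          # the p x p matrix X'X
print(np.linalg.matrix_rank(G))      # 10: rank is at most min(n, p) = n < p
print(np.linalg.eigvalsh(G).min())   # ~0 (up to floating-point rounding)
```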

Although I referenced $X$ in this argument, the column-centered version of $X$ that is used in computing the covariance matrix also has dimensions $n\times p$, so the same conclusions apply to it.
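
The same check, applied to the covariance matrix itself, is sketched below; note that centering costs one more degree of freedom, so the rank is at most $n-1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 25
X = rng.standard_normal((n, p))

S = np.cov(X, rowvar=False)          # p x p covariance matrix of the columns
print(S.shape)                       # (25, 25)
print(np.linalg.matrix_rank(S))      # 9: centering reduces the rank to n - 1
```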


Definitions

The rank of a matrix $X$ is the dimension of its image, defined to be the set of all $Xx$ as $x$ ranges among all possible vectors.

The column-centered version of a matrix is obtained by subtracting the arithmetic mean of each column from the entries in that column.

The covariance matrix of $X$ is proportional to $Y^\prime Y$ where $Y$ is the column-centered version of $X$. (Depending on convention, the factor of proportionality is $1/n$ or $1/(n-1)$.)
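
A short NumPy sketch of this definition and the previous one, assuming the $1/(n-1)$ convention (which is also `np.cov`'s default):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 25
X = rng.standard_normal((n, p))

Y = X - X.mean(axis=0)               # column-centered version of X
S = Y.T @ Y / (n - 1)                # covariance with the 1/(n-1) convention
print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```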

A square matrix $A$ is singular when it has no multiplicative inverse. Equivalently, there is a nonzero vector $x$ for which $Ax=0$. ($A$ has a nontrivial kernel.) Equivalently, the rank of $A$ is strictly less than the number of rows (and columns) of $A$.
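
One way to exhibit such a kernel vector numerically is via the singular value decomposition: the right-singular vectors belonging to (numerically) zero singular values span the kernel. A sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 25))    # n < p
G = X.T @ X                          # singular 25 x 25 matrix, as above

_, s, Vt = np.linalg.svd(G)
x = Vt[-1]                           # right-singular vector for the smallest singular value
print(s[-1])                         # ~0: G has no inverse
print(np.linalg.norm(G @ x))         # ~0: the nonzero vector x satisfies G x = 0
```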

A square matrix $A$ is semi-definite when all numbers of the form $x^\prime A x$ have the same sign (or are zero), regardless of what the vector $x$ might be. According to the sign, $A$ would be called negative semi-definite or positive semi-definite.

A semi-definite square matrix $A$ is definite when the only vector $x$ for which $x^\prime A x = 0$ is the zero vector itself.
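
For a symmetric matrix, these two definitions reduce to a test on the signs of its eigenvalues: all $\ge 0$ means positive semi-definite, all $> 0$ means positive definite. A minimal sketch (the tolerance is an arbitrary floating-point cushion):

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(A)
    if np.all(w > tol):
        return "positive definite"
    if np.all(w > -tol):
        return "positive semi-definite but singular, hence not definite"
    return "not positive semi-definite"

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 25))    # n < p
print(classify(X.T @ X))             # semi-definite but singular
print(classify(np.eye(25)))          # positive definite
```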

whuber