
I do not have a strong math background, but I am currently working on a project that requires me to use a covariance matrix, and it is my first time touching on this topic. I am reading a note, which states that the estimator

$\Sigma_p = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^T$

will be very bad when the dimension is larger than the number of samples ($n < p$). So far I have read the article, and I understand the proof of why the covariance matrix becomes singular when $n < p$, but is this the reason why the estimator is no longer good?
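To make the singularity claim concrete, here is a minimal NumPy sketch. It assumes synthetic standard normal data, and the sizes `n = 5`, `p = 20`, the seed, and the helper name `sample_covariance` are arbitrary choices for illustration, not taken from the note. It only checks that the estimate has rank at most $n - 1$ when $n < p$; it does not by itself settle whether singularity is the reason the estimator is considered bad.

```python
import numpy as np

def sample_covariance(X):
    """Covariance estimate with 1/n normalization; X has shape (n, p)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)      # subtract the sample mean from every row
    return Xc.T @ Xc / n         # p x p matrix

rng = np.random.default_rng(0)   # arbitrary seed (assumption)
n, p = 5, 20                     # fewer samples than dimensions (n < p)
X = rng.standard_normal((n, p))  # synthetic data; true covariance is the identity

S = sample_covariance(X)
rank = np.linalg.matrix_rank(S)
eigvals = np.linalg.eigvalsh(S)  # eigenvalues in ascending order
num_zero = int(np.sum(eigvals < 1e-10))

print("shape:", S.shape)
print("rank:", rank, "| upper bound n - 1 =", n - 1)
print("near-zero eigenvalues:", num_zero, "of", p)
```

Centering removes one degree of freedom, so the $p \times p$ estimate is built from at most $n - 1$ independent directions and must have at least $p - n + 1$ zero eigenvalues. In this synthetic setup every true eigenvalue equals 1, so the estimate cannot be inverted and its spectrum differs from the true one.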

hengJC
  • Although the note to which you refer is correct and interesting, it concerns an extremely special circumstance where the number of variables almost always exceeds the number of observations, no matter how many observations you make! When that's not the case--that is, eventually $p\lt n$--there is no problem, as shown here at https://stats.stackexchange.com/questions/59478. – whuber Jun 15 '22 at 13:16
  • My main concern is why the estimator no longer works when $p\gt n$. Is this related to the singularity of the covariance matrix? – hengJC Jun 16 '22 at 03:00
  • As the link I provided shows, the estimator does not fail in the case $p\gt n.$ The statement in the note you reference is purely an asymptotic one concerning a case where both $n$ and $p$ increase without bound. – whuber Jun 16 '22 at 11:08
