
I have a general question regarding LDA (Fisher's linear discriminant analysis).

What happens if the sample size $n$ is smaller than the dimensionality $p$ (the number of predictors)? Is it still possible to perform LDA, and if so, what happens?

amoeba
    Very good question. In short, it depends on how you regularize. If you use an L2 regularizer, no; if you use an L1 regularizer, yes. The latter has interesting properties. Check out this book (PDF): http://web.stanford.edu/~hastie/StatLearnSparsity/ – Vladislavs Dovgalecs Jan 11 '16 at 18:08
  • @xeon Why wouldn't an L2 penalty work as well? "Regularized LDA" uses a shrinkage estimator of the within-class covariance and can be seen as arising from an L2 penalty. – amoeba Jan 11 '16 at 18:25
  • Maybe this helps: http://perso.ens-lyon.fr/patrick.flandrin/LedoitWolf_JMA2004.pdf. You ask what happens when p >> n: in that case the sample variance-covariance matrix is singular, and the LDA method needs the inverse of that matrix (see section 4.3 of http://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf); a short numerical sketch of this appears after these comments. –  Jan 11 '16 at 18:32
  • @amoeba It will, but the parameters learned with an L2 penalty are not estimated as reliably (when p >> n). The L1 penalty is special in that it still gives a convex optimization problem AND does feature selection AND learns the parameters. – Vladislavs Dovgalecs Jan 11 '16 at 18:37
  • @xeon I was under the impression that if one does not care about feature selection, then there is no advantage of L1 over L2. Can you give a reference for this specific claim? By the way, thanks for the reference to Statistical Learning with Sparsity; I did not know about this book (even though I use The Elements all the time), it looks very interesting. – amoeba Jan 11 '16 at 18:40
  • @amoeba This claim comes from that very same (brand new) book by Hastie et al.; I believe it is in Chapter 2. I was reading it this weekend with great interest. – Vladislavs Dovgalecs Jan 11 '16 at 18:41

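The comments above point at the mechanical reason: with $p > n$ the pooled within-class sample covariance is rank-deficient, so the inverse that classical LDA requires does not exist, and a regularized (shrunken) estimate is needed. Below is a minimal numerical sketch of this, using scikit-learn's shrinkage-based LDA as one concrete regularized variant; the data and parameter choices are illustrative, not something taken from the thread.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical toy data with p >> n: 20 samples, 100 predictors, two classes.
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], n // 2)

# The pooled within-class sample covariance has rank at most n - 2 (< p),
# so it is singular and the inverse that plain LDA needs does not exist.
Xc = np.vstack([X[y == k] - X[y == k].mean(axis=0) for k in (0, 1)])
S = Xc.T @ Xc / (n - 2)
print("rank(S) =", np.linalg.matrix_rank(S), "but p =", p)

# A shrinkage (L2-style) estimator blends S with the identity, making it
# invertible; scikit-learn's lsqr solver with Ledoit-Wolf shrinkage fits
# an LDA this way even though p > n.
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
lda.fit(X, y)
print("training accuracy:", lda.score(X, y))
```

Without the `shrinkage` argument (i.e. with the default unregularized estimate), the same fit would rely on a singular covariance estimate, which is exactly the situation the question asks about.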
0 Answers