I'm reading this article on the difference between Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA), also known as Linear Discriminant Analysis (LDA), and I'm trying to understand why you would ever use PCA rather than MDA/LDA.

The explanation is summarized as follows:

Roughly speaking, in PCA we are trying to find the axes with maximum variance, where the data is most spread out (within a class, since PCA treats the whole data set as one class), while in MDA we are additionally maximizing the spread between classes.

Wouldn't you always want to both maximize the variance and maximize the spread between classes?

chris
    sorry, I meant multiple discriminant analysis which seems to also be called multiple Linear Discriminant Analysis – chris Aug 17 '16 at 20:25
    You should clarify your question, because as of now it's trivial: you should prefer PCA over MDA when there are no classes to be discriminated in your data. I think you should specify this is about classification in the question. – Firebug Aug 17 '16 at 20:30
    LDA is a much much more common term than MDA. There is no need to say "multiple linear", "linear" is enough. – amoeba Aug 17 '16 at 20:43

3 Answers


You are missing something deeper: PCA isn't a classification method.

PCA in machine learning is treated as a feature-engineering method. When you apply PCA to your data, the resulting features are guaranteed to be uncorrelated, which many classification algorithms benefit from.
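The decorrelation claim is easy to check numerically. A minimal sketch, using synthetic data with two strongly correlated features (any dataset would do):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two strongly correlated features
x = rng.normal(size=200)
X = np.column_stack([x, 0.9 * x + 0.1 * rng.normal(size=200)])

# Project onto the principal components
Z = PCA(n_components=2).fit_transform(X)

# The off-diagonal covariance of the components is zero
# up to floating-point error
cov = np.cov(Z, rowvar=False)
print(round(abs(cov[0, 1]), 6))
```

The printed off-diagonal entry is (numerically) zero: the transformed features carry no linear correlation, regardless of how correlated the inputs were.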

Always keep in mind that algorithms may make assumptions about the data; if those assumptions don't hold, they may underperform.

LDA must invert a covariance matrix to project the data (check these threads and answers: Should PCA be performed before I do classification? and Does it make sense to combine PCA and LDA?). If you have few data points, this inversion is unstable and the projections overfit your data points, i.e. the within-class covariance matrix becomes singular. PCA is commonly used to avoid that by reducing the dimensionality of the problem.

So the answer is: you never use PCA to do classification itself, but you can use it to try to improve the performance of LDA.
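A minimal sketch of this PCA-then-LDA recipe, assuming a small-sample, high-dimensional setting (fewer samples than features, exactly where the within-class covariance estimate gets unstable); the data here is synthetic:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n, p = 40, 100                       # fewer samples than features
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)       # two classes

# Reduce dimensionality first so LDA's covariance estimate is stable
clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
clf.fit(X, y)

preds = clf.predict(X)
print(preds.shape)
```

The pipeline first projects the 100-dimensional data down to 10 PCA components, and only then fits LDA in that smaller, better-conditioned space.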

Firebug

While the previous answer by Firebug is correct, I want to add another perspective:

Unsupervised vs. supervised learning:

LDA is very useful for finding dimensions that separate known clusters, so you have to know the classes beforehand. LDA is not necessarily a classifier, but it can be used as one. Thus LDA can only be used in supervised learning.

PCA is a general approach for denoising and dimensionality reduction that does not require any further information, such as the class labels used in supervised learning. Therefore it can be used in unsupervised learning.
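The supervision difference shows up directly in the API: PCA's `fit` takes only the data `X`, while LDA's `fit` additionally requires the labels `y`. A small sketch with synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = rng.integers(0, 3, size=60)      # three classes

pca = PCA(n_components=2).fit(X)              # unsupervised: no labels needed
lda = LinearDiscriminantAnalysis().fit(X, y)  # supervised: labels required

needs_labels = False
try:
    LinearDiscriminantAnalysis().fit(X)       # forgetting y fails
except TypeError:
    needs_labels = True
print("LDA needs labels:", needs_labels)
```

Calling LDA without `y` raises a `TypeError`, whereas PCA fits happily on the unlabeled data.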


LDA is used to carve up multidimensional space.

PCA is used to collapse multidimensional space.

PCA allows the collapsing of hundreds of spatial dimensions into a handful of lower spatial dimensions while usually preserving 70% - 90% of the important information.
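You can read off exactly how much information survives the collapse from `explained_variance_ratio_`. A sketch using scikit-learn's bundled digits dataset (64 pixel features) as an assumed example:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 64-dimensional pixel data, reduced to 10 components
X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=10).fit(X)

# Fraction of the total variance kept by the 10 components
retained = pca.explained_variance_ratio_.sum()
print(f"{retained:.0%} of variance kept by 10 of 64 dimensions")
```

The cumulative `explained_variance_ratio_` is also the usual way to pick how many dimensions to keep for a target such as 90%.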

PCA: 3D objects cast 2D shadows. We can see the shape of an object from its shadow, but we can't know everything about the shape from a single shadow. By having a small collection of shadows from different (globally optimal) angles, we can know most things about the shape and size of an object. PCA helps reduce the 'Curse of Dimensionality' when modelling.

LDA is for classification; it almost always outperforms logistic regression when modelling small data with well-separated clusters. It's also good at handling multi-class data and class imbalances.

Brad