I'm reading this article on the difference between Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA), also known as Linear Discriminant Analysis (LDA), and I'm trying to understand why you would ever use PCA rather than MDA/LDA.

The explanation is summarized as follows:

Roughly speaking, in PCA we are trying to find the axes with maximum variance, where the data is most spread out (within a class, since PCA treats the whole data set as one class), while in MDA we are additionally maximizing the spread between classes.

Wouldn't you always want to both maximize the variance and maximize the spread between classes?

chris
    sorry, I meant multiple discriminant analysis which seems to also be called multiple Linear Discriminant Analysis – chris Aug 17 '16 at 20:25
    You should clarify your question, because as of now it's trivial: you should prefer PCA over MDA when there are no classes to be discriminated in your data. I think you should specify this is about classification in the question. – Firebug Aug 17 '16 at 20:30
    LDA is a much much more common term than MDA. There is no need to say "multiple linear", "linear" is enough. – amoeba Aug 17 '16 at 20:43

3 Answers


You are missing something deeper: PCA isn't a classification method.

PCA in machine learning is treated as a feature-engineering method. When you apply PCA to your data, the resulting features are guaranteed to be uncorrelated, which many classification algorithms benefit from.
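The decorrelation claim is easy to check numerically. A minimal sketch, using synthetic data with two strongly correlated features (any dataset would do):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two strongly correlated features
x = rng.normal(size=200)
X = np.column_stack([x, 0.9 * x + 0.1 * rng.normal(size=200)])

# Project onto the principal components
Z = PCA(n_components=2).fit_transform(X)

# The off-diagonal covariance of the components is zero
# up to floating-point error
cov = np.cov(Z, rowvar=False)
print(round(abs(cov[0, 1]), 6))
```

The printed off-diagonal entry is (numerically) zero: the transformed features carry no linear correlation, regardless of how correlated the inputs were.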

Always keep in mind that algorithms may make assumptions about the data; if those assumptions don't hold, they may underperform.

LDA must invert a covariance matrix to project the data (check these threads and answers: Should PCA be performed before I do classification? and Does it make sense to combine PCA and LDA?). If you have few data points, this inversion is unstable and the projections overfit your data points, i.e. the within-class covariance matrix becomes singular. PCA is commonly used to avoid that by reducing the dimensionality of the problem.

So the answer is: you never use PCA to do classification itself, but you can use it to try to improve the performance of LDA.
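A minimal sketch of this PCA-then-LDA recipe, assuming a small-sample, high-dimensional setting (fewer samples than features, exactly where the within-class covariance estimate gets unstable); the data here is synthetic:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n, p = 40, 100                       # fewer samples than features
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)       # two classes

# Reduce dimensionality first so LDA's covariance estimate is stable
clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
clf.fit(X, y)

preds = clf.predict(X)
print(preds.shape)
```

The pipeline first projects the 100-dimensional data down to 10 PCA components, and only then fits LDA in that smaller, better-conditioned space.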

Firebug

While the previous answer by Firebug is correct, I want to add another perspective:

Unsupervised vs. supervised learning:

LDA is very useful for finding dimensions that separate known clusters, so you have to know the classes beforehand. LDA is not necessarily a classifier, but it can be used as one. Thus LDA can only be used in supervised learning.

PCA is a general approach for denoising and dimensionality reduction that does not require any further information, such as the class labels used in supervised learning. Therefore it can be used in unsupervised learning.
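The supervision difference shows up directly in the API: PCA's `fit` takes only the data `X`, while LDA's `fit` additionally requires the labels `y`. A small sketch with synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = rng.integers(0, 3, size=60)      # three classes

pca = PCA(n_components=2).fit(X)              # unsupervised: no labels needed
lda = LinearDiscriminantAnalysis().fit(X, y)  # supervised: labels required

needs_labels = False
try:
    LinearDiscriminantAnalysis().fit(X)       # forgetting y fails
except TypeError:
    needs_labels = True
print("LDA needs labels:", needs_labels)
```

Calling LDA without `y` raises a `TypeError`, whereas PCA fits happily on the unlabeled data.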


LDA is used to carve up multidimensional space.

PCA is used to collapse multidimensional space.

PCA allows the collapsing of hundreds of spatial dimensions into a handful of lower spatial dimensions while usually preserving 70% - 90% of the important information.
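You can read off exactly how much information survives the collapse from `explained_variance_ratio_`. A sketch using scikit-learn's bundled digits dataset (64 pixel features) as an assumed example:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 64-dimensional pixel data, reduced to 10 components
X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=10).fit(X)

# Fraction of the total variance kept by the 10 components
retained = pca.explained_variance_ratio_.sum()
print(f"{retained:.0%} of variance kept by 10 of 64 dimensions")
```

The cumulative `explained_variance_ratio_` is also the usual way to pick how many dimensions to keep for a target such as 90%.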

PCA: 3D objects cast 2D shadows. We can see the shape of an object from its shadow, but we can't know everything about the shape from a single shadow. By having a small collection of shadows from different (globally optimal) angles, we can know most things about the shape and size of an object. PCA helps reduce the 'Curse of Dimensionality' when modelling.

LDA is for classification; it almost always outperforms logistic regression when modelling small data with well-separated clusters. It's also good at handling multi-class data and class imbalances.

Brad