Questions tagged [discriminant-analysis]

Linear Discriminant Analysis (LDA) is a dimensionality reduction and classification method. It finds the low-dimensional subspace with the strongest class separation and uses it to perform classification. Use this tag for quadratic DA (QDA) too.

Given multivariate data split into several subsamples (classes), LDA finds linear combinations of variables, called discriminant functions, which discriminate between classes and are uncorrelated. The functions are then applied to assign old or new observations to the classes. Discriminant analysis is both a dimensionality reduction and a classification technique.

Suppose we are given a multivariate dataset split into $K$ classes. The objective is to find the posterior probability, $P(Y=k|X=x)$, that a data point belongs to class $k$. Let $f_{k}(x)$ be the class-conditional density of $X$ in class $k$ and let $\pi_k$ be the prior probability of being in class $k$. By Bayes' rule we have:

$$P(Y=k|X=x) = \frac{f_{k}(x)\pi_k}{\sum_{i=1}^K f_{i}(x)\pi_i}$$
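As a concrete illustration, the posterior above can be computed directly once the class densities and priors are specified. The following is a minimal Python sketch with two 1-D Gaussian classes; the means, variances, and priors are hypothetical numbers chosen only for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, mus, sigmas, priors):
    """P(Y=k | X=x) for each class k via Bayes' rule."""
    # Numerator of Bayes' rule: f_k(x) * pi_k for each class.
    joint = [gaussian_pdf(x, m, s) * p for m, s, p in zip(mus, sigmas, priors)]
    # Denominator: sum over all classes, which normalizes the posteriors.
    total = sum(joint)
    return [j / total for j in joint]

# Two classes with means -1 and 2, unit variance, equal priors.
post = posterior(0.0, mus=[-1.0, 2.0], sigmas=[1.0, 1.0], priors=[0.5, 0.5])
```

Since $x=0$ lies closer to the first class mean, the first posterior dominates, and the two posteriors sum to one by construction.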

LDA makes the following assumptions:

  1. $f_{k}(x)$ follows a Gaussian density with mean $\mu_k$ and covariance $\Sigma_k$
  2. $\Sigma_k = \Sigma$ for all $k$

The second assumption, a common covariance matrix across classes, is what makes this a linear discriminant. The linearity in $x$ can be derived by taking the log-ratio of the posterior probabilities of two classes:

$$\log \left( \frac{P(Y=k|X=x)}{P(Y=l|X=x)} \right) = \log\frac{\pi_k}{\pi_l} - \frac12(\mu_k +\mu_l)^T\Sigma^{-1}(\mu_k - \mu_l) + x^T\Sigma^{-1}(\mu_k - \mu_l)$$

If the covariance matrices are not assumed equal, the quadratic terms in $x$ no longer cancel, and the discriminant function becomes quadratic in $x$, leading to Quadratic Discriminant Analysis (QDA).
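The linearity claim above can be checked numerically in the 1-D case, where the normalizer of the posterior cancels in the log-ratio. A minimal sketch (all parameter values are hypothetical): with a shared variance the log-ratio is an affine function of $x$, so its second differences vanish; with class-specific variances a quadratic term survives.

```python
import math

def log_gauss(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_posterior_ratio(x, mu_k, mu_l, sigma_k, sigma_l, pi_k=0.5, pi_l=0.5):
    """log( P(Y=k|X=x) / P(Y=l|X=x) ); the Bayes denominator cancels."""
    return (log_gauss(x, mu_k, sigma_k) + math.log(pi_k)
            - log_gauss(x, mu_l, sigma_l) - math.log(pi_l))

xs = (-1.0, 0.0, 1.0)

# Shared variance (LDA): the log-ratio is linear in x,
# so the second difference over equally spaced points is zero.
lda = [log_posterior_ratio(x, 0.0, 1.0, 1.0, 1.0) for x in xs]
second_diff_lda = lda[2] - 2 * lda[1] + lda[0]

# Distinct variances (QDA): a quadratic term in x remains,
# so the second difference is nonzero.
qda = [log_posterior_ratio(x, 0.0, 1.0, 1.0, 2.0) for x in xs]
second_diff_qda = qda[2] - 2 * qda[1] + qda[0]
```

The second difference acts as a discrete second derivative: zero exactly when the boundary is linear, which is the defining distinction between LDA and QDA.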

478 questions
16 votes, 1 answer

Deriving total (within class + between class) scatter matrix

I was fiddling with PCA and LDA methods and I am stuck at a point, I have a feeling that it is so simple that I can't see it. Within-class ($S_W$) and between-class ($S_B$) scatter matrices are defined as: $$ S_W = \sum_{i=1}^C\sum_{t=1}^N(x_t^i -…
nimcap
14 votes, 1 answer

Bayesian and Fisher's approaches to linear discriminant analysis

I know of two approaches to LDA: the Bayesian approach and Fisher's approach. Suppose we have the data $(x,y)$, where $x$ is the $p$-dimensional predictor and $y$ is the dependent variable with $K$ classes. In the Bayesian approach, we compute the…
avocado
7 votes, 2 answers

The discriminant function in linear discriminant analysis

This is the Bayesian posterior probability for class $k$ ($\Sigma$ is the same for all $K$ classes); we assign the observation to the class with maximum $p_k(x)$, where $\pi_k$ is the probability of the response variable belonging to the $k$th class. This is a statement from the book…
Fenil
5 votes, 2 answers

What is the correct formula for covariance matrix in quadratic discriminant analysis (QDA)?

I know that in quadratic discriminant analysis (QDA) we use the variance of each class, so is the formula different than that in linear discriminant analysis (LDA)? Is it $$\frac{1}{N-K} \sum (x - \mu)(x - \mu)^T$$ or $$\frac{1}{N} \sum (x - \mu)(x…
datatista
4 votes, 0 answers

Coefficients of linear discriminants in the lda() function from package MASS in R

I was reading Chapter 4 (LDA) of the book Introduction to Statistical learning with R (http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf) and could not understand the "coefficients of linear discriminants" part in the output of the…
3 votes, 1 answer

Linear Discriminant Analysis' predictions newbie question

When I use predict.lda in R (MASS package), which discriminant function does the software choose? Say I have 4 classes and 3 discriminant functions: does the software always use the first discriminant function (highest trace), or does it use an…
3 votes, 1 answer

Fisher LDA - What is the difference between a discriminant function and a linear decision boundary?

I am studying Fisher LDA, the case where there are K=2 classes of data. It is my understanding that Fisher LDA looks for the 1-dimensional space onto which the data should be projected in order to maximize the ratio of the between class variance to…
TYBG
3 votes, 0 answers

What happens to linear discriminant analysis when $p>n$?

I have a general question regarding LDA (Fisher's linear discriminant analysis). What happens if the sample size $n$ is smaller than the dimensionality $p$ (number of predictors)? Is it possible to perform LDA and what will happen?
2 votes, 0 answers

LDA classifier function

In Introduction to Statistical Learning the function given for LDA classification (for more than one predictor) is: $$\delta_k(x) = x^T\Sigma ^{-1}\mu_k-\frac{1}{2}\mu^T_k\Sigma^{-1}\mu_k+\log\pi_k$$ Now, the posterior probability is given by:…
Mooncrater
1 vote, 0 answers

Coefficients of Linear Discriminants in R

I've read the answers in What are "coefficients of linear discriminants" in LDA?, but I still don't understand what the coefficients of linear discriminants in R's output mean. What are they? (How) Are they related to the decision boundary? nb: my…
hans-t
1 vote, 1 answer

Why is using $\mathbf{\eta}^T \mathbf{\mu_\mathscr{l}}$ to calculate Fisher's rule easier than using $\mu_\mathscr{l}$?

I am currently studying discriminant analysis. Fisher's discriminant $\mathscr{D}$ is defined as follows: $$\mathscr{D} = \max_{\{ \mathbf{e} \ : \ \vert\vert \mathbf{e} \vert \vert = 1 \}} \mathscr{q} ( \mathbf{e} ) = \max_{\{ \mathbf{e} \ : \…
The Pointer
1 vote, 1 answer

How to show it?

Could you give me some hints on how to show that the function $$f(a):=\frac{(\mathbb E [a^t X_1]-\mathbb E [a^t X_2])^2}{\mathrm {Var} (a^t X_1-a^t X_2)}$$ is maximised by a vector $$a \propto C^{-1} (\mu_1-\mu_2) ?$$ $X_1 \sim N (\mu_1, C)$ and $X_2…
P Lrc
1 vote, 0 answers

Classification of very large number of samples using LDA

I have a dataset with tens of thousands of samples and only 7 features. I want to use linear discriminant analysis to classify these samples into 2 classes, using scikit-learn's LDA implementation in Python. However, the results are very strange, as…
Elkan
1 vote, 0 answers

Range of standardized coefficients in a discriminant analysis

I want to run a discriminant analysis on different motion capture measures to see which of the measures distinguishes best between my two conditions. The problem is that some of the standardized discriminant function coefficients are >1 and <-1. How…
1 vote, 0 answers

Classifying after solving discriminant function

I have a set of data (7 variables) that are used as input to ultimately making a decision to approve or not approve something. I've generated two discriminant functions that I'm happy with - one for Approval and one for Not Approval, but now…
Tee