Questions tagged [discriminant-analysis]

Linear Discriminant Analysis (LDA) is a dimensionality reduction and classification method. It finds the low-dimensional subspace with the strongest class separation and uses it to perform classification. Use this tag for quadratic DA (QDA) too.

Given multivariate data split into several subsamples (classes), LDA finds linear combinations of variables, called discriminant functions, which discriminate between classes and are uncorrelated. The functions are then applied to assign old or new observations to the classes. Discriminant analysis is both a dimensionality reduction and a classification technique.

Suppose we are given a multivariate dataset split into $K$ classes. The objective is to find the posterior probability, $P(Y=k|X=x)$, that a data point belongs to class $k$. Let $f_{k}(x)$ be the class-conditional density of $X$ in class $k$ and let $\pi_k$ be the prior probability of being in class $k$. By Bayes' rule we have:

$$P(Y=k|X=x) = \frac{f_{k}(x)\pi_k}{\sum_{i=1}^K f_{i}(x)\pi_i}$$
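As a concrete illustration, the posterior above can be computed directly once the class densities and priors are specified. The following is a minimal Python sketch with two 1-D Gaussian classes; the means, variances, and priors are hypothetical numbers chosen only for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, mus, sigmas, priors):
    """P(Y=k | X=x) for each class k via Bayes' rule."""
    # Numerator of Bayes' rule: f_k(x) * pi_k for each class.
    joint = [gaussian_pdf(x, m, s) * p for m, s, p in zip(mus, sigmas, priors)]
    # Denominator: sum over all classes, which normalizes the posteriors.
    total = sum(joint)
    return [j / total for j in joint]

# Two classes with means -1 and 2, unit variance, equal priors.
post = posterior(0.0, mus=[-1.0, 2.0], sigmas=[1.0, 1.0], priors=[0.5, 0.5])
```

Since $x=0$ lies closer to the first class mean, the first posterior dominates, and the two posteriors sum to one by construction.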

LDA makes the following assumptions:

  1. $f_{k}(x)$ follows a Gaussian density with mean $\mu_k$ and covariance $\Sigma_k$
  2. $\Sigma_k = \Sigma$ for all $k$

The second assumption, a common covariance matrix across classes, is what makes this a linear discriminant. The linearity in $x$ can be derived by taking the log-ratio of the posterior probabilities of two classes:

$$\log \left( \frac{P(Y=k|X=x)}{P(Y=l|X=x)} \right) = \log\frac{\pi_k}{\pi_l} - \frac12(\mu_k +\mu_l)^T\Sigma^{-1}(\mu_k - \mu_l) + x^T\Sigma^{-1}(\mu_k - \mu_l)$$

If the covariance matrices are not assumed equal, the quadratic terms in $x$ no longer cancel, and the discriminant function becomes quadratic in $x$, leading to Quadratic Discriminant Analysis (QDA).
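The linearity claim above can be checked numerically in the 1-D case, where the normalizer of the posterior cancels in the log-ratio. A minimal sketch (all parameter values are hypothetical): with a shared variance the log-ratio is an affine function of $x$, so its second differences vanish; with class-specific variances a quadratic term survives.

```python
import math

def log_gauss(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_posterior_ratio(x, mu_k, mu_l, sigma_k, sigma_l, pi_k=0.5, pi_l=0.5):
    """log( P(Y=k|X=x) / P(Y=l|X=x) ); the Bayes denominator cancels."""
    return (log_gauss(x, mu_k, sigma_k) + math.log(pi_k)
            - log_gauss(x, mu_l, sigma_l) - math.log(pi_l))

xs = (-1.0, 0.0, 1.0)

# Shared variance (LDA): the log-ratio is linear in x,
# so the second difference over equally spaced points is zero.
lda = [log_posterior_ratio(x, 0.0, 1.0, 1.0, 1.0) for x in xs]
second_diff_lda = lda[2] - 2 * lda[1] + lda[0]

# Distinct variances (QDA): a quadratic term in x remains,
# so the second difference is nonzero.
qda = [log_posterior_ratio(x, 0.0, 1.0, 1.0, 2.0) for x in xs]
second_diff_qda = qda[2] - 2 * qda[1] + qda[0]
```

The second difference acts as a discrete second derivative: zero exactly when the boundary is linear, which is the defining distinction between LDA and QDA.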

478 questions
16 votes, 1 answer

Deriving total (within class + between class) scatter matrix

I was fiddling with PCA and LDA methods and I am stuck at a point, I have a feeling that it is so simple that I can't see it. Within-class ($S_W$) and between-class ($S_B$) scatter matrices are defined as: $$ S_W = \sum_{i=1}^C\sum_{t=1}^N(x_t^i -…
nimcap
14 votes, 1 answer

Bayesian and Fisher's approaches to linear discriminant analysis

I know of two approaches to LDA: the Bayesian approach and Fisher's approach. Suppose we have the data $(x,y)$, where $x$ is the $p$-dimensional predictor and $y$ is the dependent variable with $K$ classes. In the Bayesian approach, we compute the…
avocado
7 votes, 2 answers

The discriminant function in linear discriminant analysis

This is the Bayesian posterior probability for class $k$ ($\Sigma$ is the same for all $K$ classes); we assign the observation to the class with maximum $p_k(x)$, where $\pi_k$ is the probability of the response variable belonging to the $k$th class. This is a statement from the book…
Fenil
5 votes, 2 answers

What is the correct formula for covariance matrix in quadratic discriminant analysis (QDA)?

I know that in quadratic discriminant analysis (QDA) we use the variance of each class, so is the formula different than that in linear discriminant analysis (LDA)? Is it $$\frac{1}{N-K} \sum (x - \mu)(x - \mu)^T$$ or $$\frac{1}{N} \sum (x - \mu)(x…
datatista
4 votes, 0 answers

Coefficients of linear discriminants in the lda() function from package MASS in R

I was reading Chapter 4 (LDA) of the book Introduction to Statistical learning with R (http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf) and could not understand the "coefficients of linear discriminants" part in the output of the…
3 votes, 1 answer

Linear Discriminant Analysis' predictions newbie question

When I use predict.lda in R (MASS package), which discriminant function does the software choose? Say I have 4 classes and 3 discriminant functions: does the software always use the first discriminant function (highest trace), or does it use an…
3 votes, 1 answer

Fisher LDA - What is the difference between a discriminant function and a linear decision boundary?

I am studying Fisher LDA, the case where there are K=2 classes of data. It is my understanding that Fisher LDA looks for the 1-dimensional space onto which the data should be projected in order to maximize the ratio of the between class variance to…
TYBG
3 votes, 0 answers

What happens to linear discriminant analysis when $p>n$?

I have a general question regarding LDA (Fisher's linear discriminant analysis). What happens if the sample size $n$ is smaller than the dimensionality $p$ (number of predictors)? Is it possible to perform LDA and what will happen?
2 votes, 0 answers

LDA classifier function

In Introduction to Statistical Learning the function given for LDA classification (for more than one predictor) is: $$\delta_k(x) = x^T\Sigma ^{-1}\mu_k-\frac{1}{2}\mu^T_k\Sigma^{-1}\mu_k+\log\pi_k$$ Now, the posterior probability is given by:…
Mooncrater
1 vote, 0 answers

Coefficients of Linear Discriminants in R

I've read the answers in What are "coefficients of linear discriminants" in LDA?, but I still don't understand what the coefficients of linear discriminants in R's output mean. What are they? (How) Are they related to the decision boundary? nb: my…
hans-t
1 vote, 1 answer

Why is using $\mathbf{\eta}^T \mathbf{\mu_\mathscr{l}}$ to calculate Fisher's rule easier than using $\mu_\mathscr{l}$?

I am currently studying discriminant analysis. Fisher's discriminant $\mathscr{D}$ is defined as follows: $$\mathscr{D} = \max_{\{ \mathbf{e} \ : \ \vert\vert \mathbf{e} \vert \vert = 1 \}} \mathscr{q} ( \mathbf{e} ) = \max_{\{ \mathbf{e} \ : \…
The Pointer
1 vote, 1 answer

How to show it?

Could you give me some hints on how to show that the function $$f(a):=\frac{(\mathbb E [a^t X_1]-\mathbb E [a^t X_2])^2}{\mathrm {Var} (a^t X_1-a^t X_2)}$$ is maximised by a vector $$a \propto C^{-1} (\mu_1-\mu_2) ?$$ $X_1 \sim N (\mu_1, C)$ and $X_2…
P Lrc
1 vote, 0 answers

Classification of very large number of samples using LDA

I have a dataset with tens of thousands of samples and only 7 features. I want to use linear discriminant analysis to classify these samples into 2 classes, using scikit-learn's LDA implementation in Python. However, the results are very strange, as…
Elkan
1 vote, 0 answers

Range of standardized coefficients in a discriminant analysis

I want to run a discriminant analysis on different motion capture measures to see which of the measures distinguishes best between my two conditions. The problem is that some of the standardized discriminant function coefficients are >1 and <-1. How…
1 vote, 0 answers

Classifying after solving discriminant function

I have a set of data (7 variables) that are used as input to ultimately making a decision to approve or not approve something. I've generated two discriminant functions that I'm happy with - one for Approval and one for Not Approval, but now…
Tee