I am trying to understand how the coefficients of linear discriminants are calculated in lda().
Consider the following data set.
library(MASS)
# two groups of 25 points drawn from bivariate normals with a common
# covariance matrix S and means (0, 0) and (3, 2)
S <- matrix(c(2, .5, .5, 1), 2, 2)
set.seed(1)
X <- data.frame(rbind(mvrnorm(25, c(0, 0), S), mvrnorm(25, c(3, 2), S)),
                Class = c(rep("First", 25), rep("Second", 25)))
lda.fit <- lda(Class ~ X1 + X2, data = X)
Printing lda.fit gives the following output.
Call:
lda(Class ~ X1 + X2, data = X)

Prior probabilities of groups:
 First Second 
   0.5    0.5 

Group means:
               X1         X2
First  -0.2205177 -0.1224064
Second  2.7965638  1.8489960

Coefficients of linear discriminants:
         LD1
X1 0.3476010
X2 0.7330707
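The same numbers can be extracted programmatically; as far as I can tell, coef(lda.fit) and lda.fit$scaling both return the LD1 column shown above.

coef(lda.fit)  # same matrix as lda.fit$scaling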
It seems that the vector of coefficients should be calculated using the formula $$ {\bf w}\propto{\bf S}_W^{-1}({\bf m}_2-{\bf m}_1), $$ where ${\bf S}_W^{-1}$ is the inverse of the pooled within-group covariance matrix and ${\bf m}_1$ and ${\bf m}_2$ are the sample means of the two groups (the formula comes from page 189 of Pattern Recognition and Machine Learning by Christopher M. Bishop).
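Here I estimate the pooled within-group covariance matrix as $$ {\bf S}_W=\frac{(n_1-1){\bf S}_1+(n_2-1){\bf S}_2}{n_1+n_2-2}, $$ where ${\bf S}_1$ and ${\bf S}_2$ are the sample covariance matrices of the two groups and $n_1=n_2=25$: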
# pooled within-group covariance matrix (divisor n1 + n2 - 2 = 48)
Sh <- ((25 - 1) * cov(X[1:25, 1:2]) + (25 - 1) * cov(X[26:50, 1:2])) / (50 - 2)
w <- solve(Sh) %*% (lda.fit$means[2, ] - lda.fit$means[1, ])
w is equal to
        [,1]
X1 0.8668882
X2 1.8282180
and this does not coincide with the coefficients reported in lda.fit. However, the two vectors point in the same direction: w is a scalar multiple of coef(lda.fit).
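In fact, if I rescale w so that the pooled within-group variance of the projected scores equals one, i.e. so that ${\bf w}^\top{\bf S}_W{\bf w}=1$, I recover the lda() coefficients: the scaling factor $\sqrt{{\bf w}^\top{\bf S}_W{\bf w}}\approx 2.4939$ is exactly the ratio between the entries of w and those of coef(lda.fit). I do not know whether this is the normalization that lda() actually uses, though.

w/sqrt(drop(t(w) %*% Sh %*% w))  # numerically equal to coef(lda.fit)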
Could someone explain how the coefficients of linear discriminants are calculated? How is the scaling factor chosen for coef(lda.fit)?
Any help is much appreciated!
lda() calculates the coefficients of linear discriminants. I guess there might be different ways, but I would like to know how lda() does that. If you could post a link to an answer that explains this, I would be very grateful. – Cm7F7Bb Jul 06 '19 at 07:19