
Resources about LDA usually say that the number of components is bounded by the number of classes minus 1; e.g., in the binary case, only one component can be found.

In LDA, the first discriminant direction $\Phi_1$ is calculated as the argmax of $\frac{\Phi_1^T S_b \Phi_1}{\Phi_1^T S_w \Phi_1}$, where $S_b$ and $S_w$ are the between-class and within-class covariance matrices, respectively. Why can't we continue this way and compute the $i$th direction $\Phi_i$ as the argmax of $\frac{\Phi_i^T S_b \Phi_i}{\Phi_i^T S_w \Phi_i}$ under the constraint of orthogonality to $\Phi_1, \Phi_2, \dots, \Phi_{i-1}$, as is done in PCA?

Ostensibly, in the binary case, where each $\Phi_i$ is a vector, one could do this $n$ times if the inputs are $n$-dimensional vectors.
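For concreteness, here is a minimal NumPy sketch of this setup (synthetic two-class data with arbitrary, illustrative parameters): it builds $S_w$ and $S_b$ and inspects the eigenvalues of $S_w^{-1} S_b$, whose eigenvectors are the candidate discriminant directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                      # input dimension
X0 = rng.normal(loc=0.0, size=(100, n))    # class 0
X1 = rng.normal(loc=1.0, size=(120, n))    # class 1, shifted mean
X = np.vstack([X0, X1])
mu = X.mean(axis=0)

# Within-class and between-class scatter matrices
Sw = np.zeros((n, n))
Sb = np.zeros((n, n))
for Xc in (X0, X1):
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    Sb += len(Xc) * np.outer(mc - mu, mc - mu)

# The stationary points of the ratio are the eigenvectors of Sw^{-1} Sb,
# and the eigenvalues are the attained values of the ratio.
evals = np.linalg.eigvals(np.linalg.solve(Sw, Sb))
print(np.sort(evals.real)[::-1])
# Only one eigenvalue is far from zero: with two classes, Sb has rank 1.
```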

conjectures
  • https://stats.stackexchange.com/a/190821/3277 "Then q=g−1=2 independent dimensions will suffice to predict the class membership as precisely as formerly" – ttnphns Feb 02 '20 at 12:23
  • If I understand this answer correctly, $c-1$ directions ($c$ is the number of classes) are enough to maximize accuracy under the normality and same-covariance assumption, but if one wants to perform dimensionality reduction, one can continue calculating all directions $\Phi_1, \Phi_2 \dots \Phi_n$? – user1767774 Feb 02 '20 at 12:48
  • If by dimensionality reduction you mean PCA, then yes, you might extract up to n components. But if you mean dim. reduction by means of linear discriminants, then their max. number is min(n, c-1). This is the dimensionality spanned by the $S_w^{-1}S_b$ matrix. – ttnphns Feb 02 '20 at 13:04
  • I meant dimensionality reduction via the LDA criterion. Can you please elaborate on the last sentence: what do you mean by the dimensionality spanned by $S_w^{-1}S_b$? If I have 2 classes, can't I find the direction $\Phi_2$, which is the second-best direction in minimizing the LDA loss? – user1767774 Feb 02 '20 at 14:04
  • The number of nonzero eigenvalues of the aforementioned matrix is no greater than min(n, c-1). You could compute any number of eigenvectors, but all those corresponding to nonexistent eigenvalues (dimensions) are just numeric rubbish. – ttnphns Feb 02 '20 at 14:35
  • Algebra of LDA https://stats.stackexchange.com/a/48859/3277 – ttnphns Feb 02 '20 at 14:37
  • "which is the second-best direction in minimizing the LDA loss?" Please return to my first link. If you have 2 data clouds with identical cov matrices (i.e. identical shape and orientation in space), there is no "LDA loss" beyond the single dimension. One dimension suffices. LDA "loss" is separability loss, not variability loss as in PCA. – ttnphns Feb 02 '20 at 15:08
  • OK, thanks. Can you please share a reference regarding the number of nonzero eigenvalues of $S_w^{-1}S_b$? – user1767774 Feb 02 '20 at 15:55
  • To follow your notation, n is the number of variables and c the number of groups. Then the rank of Sw (assuming no multicollinearity) is n, and the rank of Sb (when the data are centered, and LDA centers the data) is c-1. Hence the rank of $S_w^{-1}S_b$ is min(n, c-1). – ttnphns Feb 02 '20 at 16:39
  • Understood, thanks! – user1767774 Feb 02 '20 at 21:12
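To make the point about identical-covariance clouds in the comments above concrete, here is a rough sketch using synthetic Gaussian data and scikit-learn (all sizes, means, and seeds are illustrative choices): a classifier fit on the single LDA component reaches essentially the same test accuracy as one fit on all $n$ original features, i.e. nothing useful for separating the two classes is lost beyond that one dimension.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 10
# Two classes with identical covariance and different means
X0 = rng.multivariate_normal(np.zeros(n), np.eye(n), size=2000)
X1 = rng.multivariate_normal(np.full(n, 0.8), np.eye(n), size=2000)
X = np.vstack([X0, X1])
y = np.r_[np.zeros(2000), np.ones(2000)]
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Classifier on all n features vs. classifier on the single LDA component
full = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
lda = LinearDiscriminantAnalysis(n_components=1).fit(Xtr, ytr)
one_d = LogisticRegression(max_iter=1000).fit(lda.transform(Xtr), ytr)

print(full.score(Xte, yte))                  # roughly 0.90
print(one_d.score(lda.transform(Xte), yte))  # essentially the same
```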

1 Answer


The rank of the between-class scatter matrix $S_B$ for the whole data set is at most $c-1$ ($c$ is the number of classes), while the rank of the between-class scatter matrix $S_{Bi}$ of a single class is at most $1$. The former is a weighted sum of the latter rank-one matrices, $S_B=\sum_{i=1}^{c} n_i (\mu_i-\mu)(\mu_i-\mu)^T$, where $n_i$ is the size of class $i$, $\mu_i$ its mean, and $\mu$ the overall mean. A sum of $c$ rank-one matrices could have rank up to $c$, but the weighted deviations satisfy $\sum_{i=1}^{c} n_i (\mu_i-\mu)=0$, so one of them is a linear combination of the others and the rank of $S_B$ is at most $c-1$.

Since $\operatorname{rank}(AB)\le\min(\operatorname{rank}(A),\operatorname{rank}(B))$, you have $\operatorname{rank}(S_W^{-1}S_B)\le\operatorname{rank}(S_B)\le c-1$.
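A quick numerical check of this bound, on synthetic data with illustrative dimensions ($n=6$ features, $c=4$ classes):

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 6, 4                                  # n features, c classes
sizes = [50, 60, 70, 80]
means = 3 * rng.normal(size=(c, n))          # class means in general position
Xs = [m + rng.normal(size=(s, n)) for m, s in zip(means, sizes)]
X = np.vstack(Xs)
mu = X.mean(axis=0)

# S_B is a weighted sum of c rank-1 matrices whose weighted deviations
# n_i (mu_i - mu) sum to zero, so its rank is at most c - 1.
Sw = sum((Xc - Xc.mean(0)).T @ (Xc - Xc.mean(0)) for Xc in Xs)
Sb = sum(len(Xc) * np.outer(Xc.mean(0) - mu, Xc.mean(0) - mu) for Xc in Xs)

evals = np.sort(np.linalg.eigvals(np.linalg.solve(Sw, Sb)).real)[::-1]
print(np.linalg.matrix_rank(Sb))   # 3 = c - 1
print(np.round(evals, 6))          # exactly c - 1 = 3 eigenvalues are nonzero
```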

seralouk