I'm using Linear Discriminant Analysis to do dimensionality reduction on multi-class data. What is the best method to determine the "correct" number of dimensions? Can I use a method similar to PCA, choosing the dimensions that explain 90% or so of the variance? Can I use AIC or BIC for this task?
In more mathematical terms, assume $N$ data points of the form $(x_i, y_i)$, such that $x_i\in \mathbb{R}^n$ is the $i$-th feature vector, and $y_i\in\{1,...,d\}$ is its class. The purpose of Linear Discriminant Analysis (LDA) is to maximize the following objective: $$J(w)=\frac{w^T S_B w}{w^T S_W w}$$ where $S_B$ is the "between classes" scatter matrix, and $S_W$ is the "within classes" scatter matrix. In plain terms, we try to make each cluster as compact as possible, and as far away from the other clusters as possible.
The scatter matrices are defined as: $$\begin{eqnarray} S_B & = & \sum_c{(\mu_c-\bar{x})(\mu_c-\bar{x})^T} \\ S_W &=& \sum_c\sum_{i\in c}{(x_i-\mu_c)(x_i-\mu_c)^T} \end{eqnarray}$$ where $\bar{x}=\frac{1}{N}\sum_i x_i$ is the mean of all feature vectors, and $\mu_c = \frac{1}{N_c}\sum_{i \in c}x_i$ is the mean of the feature vectors belonging to class $c$ (with $N_c$ the number of points in that class).
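To make the notation concrete, here is a minimal numpy sketch of the two scatter matrices and of the objective $J(w)$ for a single direction $w$ (the array names `X`, `y` and the helper functions are just my own placeholders, not part of any library):

```python
import numpy as np

def scatter_matrices(X, y):
    """S_W and S_B for data X of shape (N, n) and class labels y of shape (N,)."""
    n = X.shape[1]
    x_bar = X.mean(axis=0)                           # overall mean of the features
    S_W = np.zeros((n, n))
    S_B = np.zeros((n, n))
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)                      # mean of class c
        S_W += (X_c - mu_c).T @ (X_c - mu_c)         # within-class scatter
        S_B += np.outer(mu_c - x_bar, mu_c - x_bar)  # between-class scatter
    return S_W, S_B

def J(w, S_W, S_B):
    # Rayleigh-quotient objective from above, evaluated at one direction w
    return (w @ S_B @ w) / (w @ S_W @ w)
```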
The solution to this optimization problem is given by $$S_W^{-1} S_B v=\lambda v,$$ i.e. the optimal projection directions are eigenvectors of $S_W^{-1} S_B$, so it can be solved as an eigenvalue problem.
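Numerically, this can be solved for example with scipy's generalized eigensolver (a sketch reusing `S_W`, `S_B` from the snippet above; solving $S_B v = \lambda S_W v$ directly avoids forming $S_W^{-1}$ explicitly):

```python
import numpy as np
from scipy.linalg import eig

# Generalized problem S_B v = lambda S_W v, equivalent to S_W^{-1} S_B v = lambda v
eigvals, eigvecs = eig(S_B, S_W)

# The problem is real, so drop the (numerically zero) imaginary parts and sort
eigvals = eigvals.real
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```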
Once we compute the eigenvectors and corresponding eigenvalues, we can perform dimensionality reduction by keeping only the $m<n$ eigenvectors with the largest eigenvalues. These eigenvectors are not necessarily orthogonal (since $S_W^{-1}S_B$ is not symmetric in general), so they span a space of dimension $m$ or less. My question is: how do I choose the most appropriate $m$?
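In code, the selection step and the PCA-style cut-off I'm asking about would look roughly like this (continuing from the snippet above; the 0.9 threshold is just the hypothetical figure from the top of the question, and whether this eigenvalue-mass ratio is actually a sound criterion for $m$ is exactly what I'd like to know):

```python
import numpy as np

# Keep the m leading directions and project the data onto them
# (eigvals/eigvecs are assumed sorted in decreasing order as above)
m = 2                                    # hypothetical choice, e.g. for a 2-D plot
W = eigvecs[:, :m].real                  # (n, m) projection matrix
X_reduced = X @ W                        # (N, m) projected data

# PCA-like heuristic: cumulative share of the eigenvalue mass,
# picking the smallest m that exceeds some threshold such as 0.9
cum_ratio = np.cumsum(eigvals) / eigvals.sum()
m_90 = int(np.searchsorted(cum_ratio, 0.9)) + 1
```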