I'm using Linear Discriminant Analysis to do dimensionality reduction on multi-class data. What is the best method to determine the "correct" number of dimensions? Can I use a method similar to PCA, choosing the dimensions that explain 90% or so of the variance? Can I use AIC or BIC for this task?
In more mathematical terms, assume $N$ data points of the form $(x_i, y_i)$, such that $x_i\in \mathbb{R}^n$ is the $i$-th feature vector, and $y_i\in\{1,...,d\}$ is its class. The purpose of Linear Discriminant Analysis (LDA) is to maximize the following objective: $$J(w)=\frac{w^T S_B w}{w^T S_W w}$$ where $S_B$ is the "between classes" scatter matrix, and $S_W$ is the "within classes" scatter matrix. In plain terms, we try to make each cluster as compact as possible, and as far away from the other clusters as possible.
The scatter matrices are defined as: $$\begin{eqnarray} S_B & = & \sum_c{(\mu_c-\bar{x})(\mu_c-\bar{x})^T} \\ S_W &=& \sum_c\sum_{i\in c}{(x_i-\mu_c)(x_i-\mu_c)^T} \end{eqnarray}$$ where $\bar{x}=\frac{1}{N}\sum_i x_i$ is the mean of all feature vectors, and $\mu_c = \frac{1}{N_c}\sum_{i \in c}x_i$ is the mean of the feature vectors belonging to class $c$ (with $N_c$ the number of points in that class).
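To make the notation concrete, here is a minimal numpy sketch of the two scatter matrices and of the objective $J(w)$ for a single direction $w$ (the array names `X`, `y` and the helper functions are just my own placeholders, not part of any library):

```python
import numpy as np

def scatter_matrices(X, y):
    """S_W and S_B for data X of shape (N, n) and class labels y of shape (N,)."""
    n = X.shape[1]
    x_bar = X.mean(axis=0)                           # overall mean of the features
    S_W = np.zeros((n, n))
    S_B = np.zeros((n, n))
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)                      # mean of class c
        S_W += (X_c - mu_c).T @ (X_c - mu_c)         # within-class scatter
        S_B += np.outer(mu_c - x_bar, mu_c - x_bar)  # between-class scatter
    return S_W, S_B

def J(w, S_W, S_B):
    # Rayleigh-quotient objective from above, evaluated at one direction w
    return (w @ S_B @ w) / (w @ S_W @ w)
```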
The solution to this optimization problem is given by $$S_W^{-1} S_B v=\lambda v,$$ i.e. the optimal projection directions are eigenvectors of $S_W^{-1} S_B$, so it can be solved as an eigenvalue problem.
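Numerically, this can be solved for example with scipy's generalized eigensolver (a sketch reusing `S_W`, `S_B` from the snippet above; solving $S_B v = \lambda S_W v$ directly avoids forming $S_W^{-1}$ explicitly):

```python
import numpy as np
from scipy.linalg import eig

# Generalized problem S_B v = lambda S_W v, equivalent to S_W^{-1} S_B v = lambda v
eigvals, eigvecs = eig(S_B, S_W)

# The problem is real, so drop the (numerically zero) imaginary parts and sort
eigvals = eigvals.real
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```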
Once we compute the eigenvectors and corresponding eigenvalues, we can perform dimensionality reduction by keeping only the $m<n$ eigenvectors with the largest eigenvalues. These eigenvectors are not necessarily orthogonal (since $S_W^{-1}S_B$ is not symmetric in general), so they span a space of dimension $m$ or less. My question is: how do I choose the most appropriate $m$?
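In code, the selection step and the PCA-style cut-off I'm asking about would look roughly like this (continuing from the snippet above; the 0.9 threshold is just the hypothetical figure from the top of the question, and whether this eigenvalue-mass ratio is actually a sound criterion for $m$ is exactly what I'd like to know):

```python
import numpy as np

# Keep the m leading directions and project the data onto them
# (eigvals/eigvecs are assumed sorted in decreasing order as above)
m = 2                                    # hypothetical choice, e.g. for a 2-D plot
W = eigvecs[:, :m].real                  # (n, m) projection matrix
X_reduced = X @ W                        # (N, m) projected data

# PCA-like heuristic: cumulative share of the eigenvalue mass,
# picking the smallest m that exceeds some threshold such as 0.9
cum_ratio = np.cumsum(eigvals) / eigvals.sum()
m_90 = int(np.searchsorted(cum_ratio, 0.9)) + 1
```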