According to Wikipedia, canonical correlation analysis (CCA) finds pairs of canonical variables. CCA has also been used in many cases as a dimensionality reduction tool to find low-dimensional subspaces. I am wondering how this subspace is found, and how it is related to the pairs of canonical variables.
1 Answer
This question was based on a false premise that CCA finds one "common subspace". It does not.
CCA deals with two datasets $X$ and $Y$ of $n$ points each: points from dataset $X$ are $p$-dimensional and live in $\mathbb R^p$ and points from dataset $Y$ are $q$-dimensional and live in $\mathbb R^q$. Let $\mathbf X$ and $\mathbf Y$ be two centered data matrices of $n\times p$ and $n\times q$ size respectively.
CCA finds $m=\min(p,q)$ pairs of canonical axes. The first pair $(\mathbf w_1, \mathbf v_1)$ consists of one canonical axis $\mathbf w_1 \in \mathbb R^p$ and one canonical axis $\mathbf v_1 \in \mathbb R^q$. Projections of the data onto these axes (called "canonical components", "canonical variates", or "canonical variables") are given by $\mathbf X \mathbf w_1$ and $\mathbf Y \mathbf v_1$, and they have the highest possible correlation with each other. Projections of the data onto the next pair, $\mathbf w_2$ and $\mathbf v_2$, have the second highest correlation, subject to being uncorrelated with the projections onto the first pair, and so on.
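As a sketch of what "highest possible correlation" means here (using the centered matrices defined above; the index $k$ and the $\operatorname{corr}$ notation are introduced only for this illustration), the first pair solves

$$(\mathbf w_1, \mathbf v_1) = \underset{\mathbf w \in \mathbb R^p,\ \mathbf v \in \mathbb R^q}{\arg\max}\ \operatorname{corr}(\mathbf X \mathbf w, \mathbf Y \mathbf v) = \underset{\mathbf w,\ \mathbf v}{\arg\max}\ \frac{\mathbf w^\top \mathbf X^\top \mathbf Y \mathbf v}{\sqrt{\mathbf w^\top \mathbf X^\top \mathbf X \mathbf w}\,\sqrt{\mathbf v^\top \mathbf Y^\top \mathbf Y \mathbf v}},$$

and the $k$-th pair $(\mathbf w_k, \mathbf v_k)$ maximizes the same ratio subject to $\mathbf X \mathbf w_k$ being uncorrelated with $\mathbf X \mathbf w_j$ and $\mathbf Y \mathbf v_k$ being uncorrelated with $\mathbf Y \mathbf v_j$ for all $j < k$.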
So the first pair of canonical axes defines a 1-dimensional subspace in each space, but these are two different subspaces in two different spaces. The first two pairs define a 2-dimensional subspace in each space, but these are again two different subspaces. There is never a "common subspace", because the spaces $\mathbb R^p$ and $\mathbb R^q$ are different to begin with.
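To make this concrete, here is a minimal NumPy sketch of one common way to compute the canonical axes (QR decomposition of each centered matrix followed by an SVD). It assumes $n > \max(p, q)$ and that both data matrices have full column rank; the function name `cca`, the toy data, and all variable names are made up for illustration.

```python
import numpy as np

def cca(X, Y):
    """Return canonical axes W (p x m), V (q x m) and canonical correlations s.

    Illustrative sketch: assumes n > max(p, q) and full column rank.
    """
    # Center both data matrices column-wise.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of X and Y.
    Qx, Rx = np.linalg.qr(X)
    Qy, Ry = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Map back to the original coordinates: X @ W = Qx @ U, Y @ V = Qy @ Vt.T.
    W = np.linalg.solve(Rx, U)
    V = np.linalg.solve(Ry, Vt.T)
    return W, V, s

# Toy data: X lives in R^5, Y lives in R^3, both driven by one shared latent signal.
rng = np.random.default_rng(0)
n, p, q = 500, 5, 3
z = rng.normal(size=(n, 1))
X = z @ rng.normal(size=(1, p)) + rng.normal(size=(n, p))
Y = z @ rng.normal(size=(1, q)) + rng.normal(size=(n, q))

W, V, s = cca(X, Y)
print(W.shape, V.shape)  # axes for X are p-dimensional, axes for Y are q-dimensional
r1 = np.corrcoef(X @ W[:, 0], Y @ V[:, 0])[0, 1]
print(r1, s[0])          # empirical correlation of the first pair vs. first singular value
```

The columns of `W` are vectors in $\mathbb R^p$ and the columns of `V` are vectors in $\mathbb R^q$, so the two sets of axes cannot span the same subspace; what the pairs share is only the correlation between their projections.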
I read multiple claims that "CCA finds common subspace for two different dataset". In a sense, this is true: if we initially embed both datasets in a common superspace then of course canonical variates lie in a common subspace. But this idea is of little heuristic value. Even in usual regression both Y and the Xs share some space (in which the error variable lies, by the way). And so what? What's really interesting is another point: CCA creates a pair of variables - one in set1 space and the other in set2 space - which correlate maximally. Then goes the 2nd pair.... etc – ttnphns Jan 15 '15 at 17:51