How does CCA find a low-dimensional common subspace?

Question

According to Wikipedia, canonical correlation analysis (CCA) finds pairs of canonical variables. CCA has also been used in many cases as a dimensionality reduction tool to find low-dimensional subspaces. How is the subspace found, and how is it related to the pairs of canonical variables?

Related or even possible duplicate: http://stats.stackexchange.com/q/65692/3277 — ttnphns, Jan 15 '15 at 05:05
Are you asking to explain mathematically how CCA works? Or are you asking about intuition behind CCA? Or are you asking specifically about the connection between canonical variables and the subspace? This last connection is like the connection between principal components and principal axes (eigenvectors of the covariance matrix) in PCA. — amoeba, Jan 15 '15 at 09:59
If this is of interest: I've done a demonstration how the canonical correlations are computed from correlations resp. from the factor-loadings matrix, and have even added a short discussion when the varimax-rotation might even be superior over the usual principal axes-solution. It is a small (old) "living letter" to a friend using my 1996 Dos-program "Inside-R" without deeper introduction. One downloads the zip-file, unzips it in a directory where a Dos-process can work on, opens a Dos-Box and starts the demo. See http://go.helms-net.de/stat/ir/cc.zip If you have questions, ask here — Gottfried Helms, Jan 15 '15 at 11:03
@amoeba, I am asking about the connection between canonical variables and the subspace. Is it simply that the canonical variables span the subspace? If I have n pairs of variables, with no orthogonality within a pair but orthogonality between the pairs, is the spanned subspace still of dimension n? — fast tooth, Jan 15 '15 at 15:00
Say you have two datasets with $n$ points each, $X$ with $p$ dimensions and $Y$ with $q$ dimensions. CCA will find you $m=\min(p,q)$ canonical pairs; this means it will find $m$ canonical axes in the $p$-dimensional $X$ space and $m$ canonical axes in the $q$-dimensional $Y$ space such that projections of the data onto each pair of these axes (called a pair of canonical variables) are maximally correlated. If you take e.g. first 2 pairs only, then first two axes span a 2d subspace in space $X$ and a 2d subspace in space $Y$. Does this make sense? — amoeba, Jan 15 '15 at 15:08
@amoeba, yes, it makes sense, but are those two subspaces the same? Where is the COMMON subspace? — fast tooth, Jan 15 '15 at 15:10
They cannot be the same, because they live in two different spaces! You have to imagine two spaces, $\mathbb R^p$ where data points from $X$ live and $\mathbb R^q$ where data points from $Y$ live. Continuing example from my previous comment, CCA finds 1d (or 2d) subspace in one of them and a 1d (or 2d) subspace in another of them, such that projections onto these subspaces are highly correlated. It is always a pair of subspaces, there is no "common" subspace. — amoeba, Jan 15 '15 at 15:38
@amoeba, the common subspace was my confusion. I read multiple claims that "CCA finds common subspace for two different dataset", for example: http://www.cvc.uab.es/~almazan/wp-content/uploads/2012/07/iccv13_poster.pdf . If there is no common subspace, how does CCA help to compare two datasets? By projecting one dataset entirely onto one of the newly found subspaces? Thank you — fast tooth, Jan 15 '15 at 15:58
Yes, I think talking about "common subspace" is a bit sloppy and should be avoided. But I can see why people do it. Imagine that you find 1d subspace in $X$ and 1d subspace in $Y$ such that projections on these subspaces have correlation $r=0.99$. This means that these projections are pretty much the same (up to a scaling), and one is tempted to talk about the "common projection". By the way, this can be made more precise in a framework called "probabilistic CCA", but this is beyond the scope of classic CCA. I am not sure I understand your last question. — amoeba, Jan 15 '15 at 16:06
@amoeba, my last question is: in practice, CCA is used to find what is common between two datasets, but how? Do we project the two original datasets onto one of the subspaces and compare them within that subspace? — fast tooth, Jan 15 '15 at 16:11
As I said, you cannot project both datasets to the same subspace! Let's talk about 1d subspaces, i.e. first pair of canonical axes. You project one dataset onto its first canonical axis (first axis in a pair) and another dataset onto its first canonical axis (second axis in a pair). Now you have two 1d projections, $x$ and $y$. Their correlation is maximal. You can plot them one against each other, or "compare" in any other way. The rest I guess depends on the application... — amoeba, Jan 15 '15 at 16:16
@amoeba, thank you very much for your patient explanation, I am now more comfortable with CCA and what it does. Do you mind to write a short answer so I can label it as the right answer? — fast tooth, Jan 15 '15 at 16:25
fast tooth: I read multiple claims that "CCA finds common subspace for two different dataset". In a sense, this is true: if we initially embed both datasets in a common superspace then of course canonical variates lie in a common subspace. But this idea is of little heuristic value. Even in usual regression both Y and the Xs share some space (in which the error variable lies, by the way). And so what? What's really interesting is another point: CCA creates a pair of variables - one in set1 space and the other in set2 space - which correlate maximally. Then goes the 2nd pair.... etc — ttnphns, Jan 15 '15 at 17:51

score 11 · Accepted Answer · answered Jan 15 '15 at 16:57

This question was based on a false premise that CCA finds one "common subspace". It does not.

CCA deals with two datasets $X$ and $Y$ of $n$ points each: points from dataset $X$ are $p$-dimensional and live in $\mathbb R^p$ and points from dataset $Y$ are $q$-dimensional and live in $\mathbb R^q$. Let $\mathbf X$ and $\mathbf Y$ be two centered data matrices of $n\times p$ and $n\times q$ size respectively.

CCA finds $m=\min(p,q)$ pairs of canonical axes. The first pair $(\mathbf w_1, \mathbf v_1)$ consists of one canonical axis $\mathbf w_1 \in \mathbb R^p$ and one canonical axis $\mathbf v_1 \in \mathbb R^q$. Projections of the data onto these axes (called "canonical components", "canonical variates", or "canonical variables") are given by $\mathbf X \mathbf w_1$ and $\mathbf Y \mathbf v_1$, and they have the highest possible correlation with each other. Projections of the data onto the next pair, $\mathbf w_2$ and $\mathbf v_2$, have the highest possible correlation subject to being uncorrelated with the projections onto the first pair, and so on.
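
For concreteness, the first pair can be written as the solution of an optimization problem. This is the standard textbook formulation for centered data matrices, given here as a sketch rather than a full derivation:

$$(\mathbf w_1, \mathbf v_1) = \underset{\mathbf w \in \mathbb R^p,\ \mathbf v \in \mathbb R^q}{\arg\max}\ \operatorname{corr}(\mathbf X \mathbf w, \mathbf Y \mathbf v) = \underset{\mathbf w,\ \mathbf v}{\arg\max}\ \frac{\mathbf w^\top \mathbf X^\top \mathbf Y \mathbf v}{\sqrt{\mathbf w^\top \mathbf X^\top \mathbf X \mathbf w}\,\sqrt{\mathbf v^\top \mathbf Y^\top \mathbf Y \mathbf v}}$$

Note that $\mathbf w$ and $\mathbf v$ are optimized over two different spaces; only the projections $\mathbf X \mathbf w$ and $\mathbf Y \mathbf v$, which are both vectors of length $n$, are compared.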

So the first pair of canonical axes defines a 1-dimensional subspace in each space, but these are two different subspaces in two different spaces. The first two pairs define a 2-dimensional subspace in each space, but these are again two different subspaces. There is never a "common subspace", because the spaces $\mathbb R^p$ and $\mathbb R^q$ in which $X$ and $Y$ live are different to begin with.
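
To make this concrete, here is a minimal NumPy sketch. It computes the canonical axes with a QR decomposition followed by an SVD, which is one common way to do it and not the only one. The function name `cca_axes` and the toy data are my own illustrative choices, and the code assumes both data matrices have full column rank and $n > \max(p,q)$.

```python
import numpy as np

def cca_axes(X, Y):
    """Sketch of CCA via QR + SVD (assumes full column rank, n > max(p, q)).

    Returns W (p x m), V (q x m) and the m canonical correlations,
    where m = min(p, q).
    """
    # Center both data matrices column-wise.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Thin QR: columns of Qx (Qy) are an orthonormal basis of the
    # column space of X (Y).
    Qx, Rx = np.linalg.qr(X)
    Qy, Ry = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, rho, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Map back to axes in the original coordinates of each space.
    W = np.linalg.solve(Rx, U)      # axes in R^p
    V = np.linalg.solve(Ry, Vt.T)   # axes in R^q
    return W, V, rho

# Hypothetical toy data: one shared latent signal plus noise.
rng = np.random.default_rng(0)
n, p, q = 500, 5, 3
z = rng.normal(size=(n, 1))
X = z @ rng.normal(size=(1, p)) + rng.normal(size=(n, p))
Y = z @ rng.normal(size=(1, q)) + rng.normal(size=(n, q))

W, V, rho = cca_axes(X, Y)
print(W.shape, V.shape)  # (5, 3) (3, 3)

# First pair of canonical variables: two vectors of length n.
# (Correlation ignores constant offsets, so uncentered X, Y are fine here.)
x1 = X @ W[:, 0]
y1 = Y @ V[:, 0]
print(np.corrcoef(x1, y1)[0, 1], rho[0])  # the two numbers agree
```

The shapes show the point of the answer: the columns of `W` are vectors in $\mathbb R^5$ and the columns of `V` are vectors in $\mathbb R^3$, so they span subspaces of two different spaces. What can be compared are the projections `x1` and `y1`, which both have length $n$.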
