
I know there are a couple of posts asking why we whiten the data for ICA. I understand that we whiten to remove the scaling ambiguity between the sources and to increase computational efficiency.

But most answers say something about wanting to decorrelate our data before running ICA, or that it should be orthogonal. Could someone please explain the intuition behind this, as well as the example at this link: https://arnauddelorme.com/ica_for_dummies/?

Specifically, why does our data need to be decorrelated? The ICA assumption is that the sources are statistically independent. Decorrelating our data does not create statistical independence; it only removes linear dependence.

As for the example in the link, I do not understand it at all. If someone could break it down clearly, that would be a huge help. One of the many questions I have about it is: why do they apply a linear transformation to the data in the example?

A clear geometric intuition would be beyond appreciated. Thank you for the clarification.

Edit/Update

I saw someone else had a similar question, so after I did some research and felt like I had a decent intuition, I provided an explanation that I think should clear up both questions. It can be found here: "Whitening/Decorrelation - why does it work?".

Please take all of this with a grain of salt until someone can confirm it.

1 Answer


The TL;DR is that whitening isn't essential, but it does simplify the task: whitening the data reduces the number of parameters to be estimated.

"Independent Component Analysis: Algorithms and Applications" by Aapo Hyvärinen and Erkki Oja. Neural Networks, 13(4-5):411-430, 2000.

Here we see that whitening reduces the number of parameters to be estimated. Instead of having to estimate the $n^2$ parameters that are the elements of the original matrix $A$, we only need to estimate the new, orthogonal mixing matrix $\tilde{A}$. An orthogonal matrix contains $n(n-1)/2$ degrees of freedom. For example, in two dimensions, an orthogonal transformation is determined by a single angle parameter. In larger dimensions, an orthogonal matrix contains only about half of the number of parameters of an arbitrary matrix. Thus one can say that whitening solves half of the problem of ICA. Because whitening is a very simple and standard procedure, much simpler than any ICA algorithms, it is a good idea to reduce the complexity of the problem this way.
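To make this parameter-counting argument concrete, here is a minimal NumPy sketch (the sources, the mixing matrix, and the sample size are all made up for illustration): independent unit-variance sources are mixed by an arbitrary, non-orthogonal matrix $A$; after whitening the mixtures, the effective mixing matrix $\tilde{A} = VA$ comes out (approximately) orthogonal, so only a rotation remains to be estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two independent non-Gaussian sources; Uniform(-sqrt(3), sqrt(3)) has
# zero mean and unit variance.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # arbitrary (non-orthogonal) mixing matrix
x = A @ s                    # observed mixtures

# Whitening matrix V = C^(-1/2) built from the empirical covariance of x
C = np.cov(x)
eigvals, E = np.linalg.eigh(C)
V = E @ np.diag(eigvals ** -0.5) @ E.T
z = V @ x                    # whitened data: cov(z) = identity

# Effective mixing matrix after whitening
A_tilde = V @ A
print(np.round(A_tilde @ A_tilde.T, 2))  # close to the 2x2 identity,
                                         # i.e. A_tilde is (nearly) orthogonal
```

In 2-D, that residual orthogonal matrix is pinned down by a single rotation angle, which is the "half of the problem" the quote refers to.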


The geometric intuition is that whitening doesn't remove any geometric information: it just rotates and rescales the data. Correlations only make sense in a specific coordinate system; if we rotate the coordinates, we can remove the correlation.

[Figure: grid of scatter plots illustrating various correlation coefficients, from Wikipedia.] The second row of this diagram, taken from Wikipedia, contains several lines, but the center one has 0 correlation, even though it's just a rotation of the other lines in that row.
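This rotation idea is easy to check numerically. Here's a small sketch (the 0.8 correlation and sample size are arbitrary choices) showing that mapping a correlated Gaussian cloud into the eigenbasis of its covariance, an orthogonal change of basis with no rescaling, drives the sample correlation to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Strongly correlated 2-D Gaussian cloud (a tilted ellipse)
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

r_before = np.corrcoef(x)[0, 1]   # close to 0.8

# Rotate into the eigenbasis of the empirical covariance
# (an orthogonal map: distances and angles are preserved)
_, E = np.linalg.eigh(np.cov(x))
y = E.T @ x

r_after = np.corrcoef(y)[0, 1]    # essentially zero: same cloud, new axes
print(r_before, r_after)
```

The cloud itself is unchanged up to the change of axes; only the coordinate description of it loses its correlation.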

Sycorax
  • Thank you for the reply. A couple of follow-ups:
    1. "The geometric intuition is that whitening doesn't remove any information." This is not necessarily true, right? Specifically, we are removing the first two moments (mean and covariance). However, in the case of ICA this is fine, because our final answer is statistically independent, so it should ultimately have 0 covariance. I just think it might be important to make the distinction that whitening does cause a loss of information, because in other contexts (other ML models) it might be wise to think twice about whitening.
    – user19402204 Sep 26 '23 at 20:08
  • But in the case of ICA we can view whitening as killing two birds with one stone: it removes the scaling ambiguity from each component and also gives us an uncorrelated representation of our data, which can be thought of as an informed prior or rough estimate of our final answer.
    2. Secondly, I am still not understanding the visual in the link I posted. I get the scatter plots I've seen in other sources that show that after whitening we just have a circle or blob.
    – user19402204 Sep 26 '23 at 20:11
  • This makes sense: we have removed linear relationships, so we should no longer be able to see any linear correlation between the variables in the scatter plots.

    However, I have no clue what is going on in that link so if someone could break that down, that would be incredibly helpful.

    Thanks

    – user19402204 Sep 26 '23 at 20:14
  • The mean and variance aren't geometric information. Take the axis labels off of a plot of the data and it will look the same if you shift it or rescale it by a positive scalar. // The only question in your post is about the geometric intuition of whitening for ICA. If you're specifically interested in understanding a particular portion of the link, you should post a new question quoting the passage you want to know about and clearly articulate what part you're having trouble understanding. – Sycorax Sep 26 '23 at 20:24
  • That makes sense, thanks! Just to confirm, you do lose information but not necessarily geometric information (and the statistical information lost in ICA does not make a difference)? – user19402204 Sep 26 '23 at 20:34
  • Geometry cares about distances and angles. Shifting and rescaling by constants preserves all angles among any 3 points. Shifting by constants preserves distances. Rescaling changes distances only by a multiple of the scale. Subtracting means and dividing by standard deviations only rescales and shifts, which doesn't matter for ICA or geometry. – Sycorax Sep 27 '23 at 03:06
  • That's a nice way of putting it, thank you. Taking it outside of the geometric context and ICA context, is the following true: whitening removes information (correlation between points)?

    I just want to confirm this (especially somewhere people can see) because I feel like this is often not made explicit.

    – user19402204 Sep 27 '23 at 08:00
  • Whitening removes the mean, variance and correlations. After whitening, you won't be able to recover that information using the whitened data alone. – Sycorax Sep 27 '23 at 13:06
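As a quick numerical check of this last point, here is a sketch (the two example distributions are made up) showing that clouds with very different means and covariances become indistinguishable in those respects after whitening, so the original mean and covariance cannot be read back off the whitened data alone:

```python
import numpy as np

rng = np.random.default_rng(2)

def whiten(x):
    """Center x (variables in rows) and map its covariance to the identity."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, E = np.linalg.eigh(np.cov(x))
    return E @ np.diag(eigvals ** -0.5) @ E.T @ x

# Two clouds with very different means and covariances
a = rng.multivariate_normal([5.0, -3.0], [[4.0, 1.0], [1.0, 2.0]], size=5000).T
b = rng.multivariate_normal([0.0, 10.0], [[1.0, -0.9], [-0.9, 1.0]], size=5000).T

for x in (whiten(a), whiten(b)):
    # Both report mean ~ [0, 0] and covariance ~ identity: whitening has
    # erased the first two moments of the original data.
    print(np.round(x.mean(axis=1), 6), np.round(np.cov(x), 6))
```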