I have some multidimensional data. I would like to measure how "one directional" the data is. For example, the rows in this sample

    ┌          ┐
X = │  2    1  │
    │  4    2  │
    │ -2   -1  │
    └          ┘

are just scaled versions of [ 2 1 ].

Now let's say I add some noise so the rows are no longer perfect multiples. My first idea is to compute the SVD and take the ratio of the first singular value to the sum of all singular values. I think this would give a number that can be used to compare different samples, as long as their dimensions are the same. However, that requires quite a bit of computation, and I would like to do this for a million samples.
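For reference, the ratio described above could be sketched in NumPy roughly as follows. The function name `sv_ratio` and the noise level are illustrative assumptions, not part of any library or of my real data.

```python
import numpy as np

def sv_ratio(X):
    """Ratio of the largest singular value to the sum of all singular values."""
    # compute_uv=False returns only the singular values, sorted in descending order
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] / s.sum()

X = np.array([[2.0, 1.0], [4.0, 2.0], [-2.0, -1.0]])
ratio_clean = sv_ratio(X)  # essentially 1 for exactly rank-one data

rng = np.random.default_rng(0)  # hypothetical noise, for illustration only
X_noisy = X + 0.1 * rng.standard_normal(X.shape)
ratio_noisy = sv_ratio(X_noisy)  # drops below 1 once noise is added

print(ratio_clean, ratio_noisy)
```

The ratio lies between 1/min(rows, columns) and 1, which is why it is only comparable across samples of the same dimensions.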

Any ideas for a better or faster way to measure or estimate how "one directional" my data is? I do not care what that direction is (don't need the basis vectors). My actual data is larger than 2D.

kexu
  • A truncated SVD will quickly compute the first singular value. One ordinarily uses the squared singular values because they represent variances (at least for column-centered matrices). The sum of the squared singular values can be found quickly through direct calculation because it's the sum of squares of the entries of $X.$ – whuber Apr 28 '22 at 21:16
  • Thanks @whuber. I heard of the truncated SVD but did not know about the sum of squared singular values property. – kexu Apr 28 '22 at 23:19
  • In that case, here's a demonstration. Let the SVD for a matrix $A$ be $A=UDV^\prime.$ Then the sum of squares of entries of $A$ is $$\operatorname{tr}(A^\prime A)=\operatorname{tr}(VDU^\prime UD^\prime V^\prime)=\operatorname{tr}(VD^2V^\prime)=\operatorname{tr}(D^2VV^\prime)=\operatorname{tr}(D^2),$$ manifestly the sum of squared singular values. – whuber Apr 29 '22 at 12:25
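A minimal sketch of the approach suggested in the comments, assuming NumPy only. A plain power iteration stands in here for a library truncated-SVD routine, and the function names `top_sv_squared` and `one_directional_score`, the iteration count, and the noise level are hypothetical choices made for illustration.

```python
import numpy as np

def top_sv_squared(X, n_iter=100, seed=0):
    """Estimate the largest squared singular value of X by power iteration on X'X."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = X.T @ (X @ v)          # apply X'X without forming it explicitly
        v = w / np.linalg.norm(w)  # assumes X is not the zero matrix
    # Rayleigh quotient v'X'Xv with |v| = 1
    return float(np.sum((X @ v) ** 2))

def one_directional_score(X, n_iter=100):
    """Share of the total sum of squares carried by the first singular direction."""
    total = float(np.sum(X * X))   # tr(X'X), the sum of squared singular values
    return top_sv_squared(X, n_iter=n_iter) / total

X = np.array([[2.0, 1.0], [4.0, 2.0], [-2.0, -1.0]])
rng = np.random.default_rng(1)     # hypothetical noise, for illustration only
X_noisy = X + 0.1 * rng.standard_normal(X.shape)

# Compare against a full SVD, which is only affordable here because X is tiny.
s = np.linalg.svd(X_noisy, compute_uv=False)
print(np.isclose(np.sum(X_noisy ** 2), np.sum(s ** 2)))  # trace identity check
print(one_directional_score(X_noisy), s[0] ** 2 / np.sum(s ** 2))
```

Power iteration converges slowly when the two largest singular values are close, so a fixed `n_iter` may underestimate the score for samples that are far from one directional. For column-centered data the score can be read as the fraction of variance along the leading direction.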
