
Let $X = (X_1, \dots, X_m)$ be a random vector, so that $\operatorname{Var}(X)$ is the $m \times m$ variance-covariance matrix. Is there an accepted one-dimensional statistic, analogous to the variance, that can be extracted from the sample variance-covariance matrix of data representing a 2-d random variable?

(For an explanation of the 2-d equivalent of a sample variance, see this related question: How to find variance between multidimensional points?)

Will
  • What do you want this statistic to do or represent? – whuber Oct 24 '13 at 21:42
  • Something comparable to the 1-d variance. In other words, if we have $n$ individuals with $m$ measurements each, what is the total variance of that sample? – Will Oct 24 '13 at 22:17
  • The thing which is comparable is the $2\times 2$ variance/covariance matrix. You could take the mean of the squared Euclidean distances between the values and their mean, but that is unlikely to be as informative in most circumstances. – Henry Oct 24 '13 at 22:23
  • It's not clear what you want. Are you after something like the variance of the first principal component? – Glen_b Oct 24 '13 at 23:43
  • Sorry, I'm not being clear. The $n$ measurements would determine the variance of each of the $m$ PCs, if you like. So $X = (X_1, \dots, X_m)$, and I believe that $\operatorname{Var}(X)$ is the $m \times m$ variance-covariance matrix, as seen in the answers to the other question above. I want to know if there is a standard, accepted way of collapsing that $m \times m$ matrix into a single value in R. For example, would it be legitimate to take a weighted mean of the separate variances of the $X_i$, i.e. trace(Var(X))/m (see the R sketch after this comment thread)? Or is there a better way? – Will Oct 25 '13 at 04:57
  • The thrust of our comments, Will, is that because there are myriad ways to summarize an $n$ by $n$ matrix with a single number, in order to answer this question we need to know why you need just a single number and how you intend to interpret it. Please edit your question to clarify. – whuber Oct 25 '13 at 16:03
  • @whuber, the three meaningful summaries that I can think of are all based on the spectrum of this matrix, and are commonly used in multivariate analysis: $l_\infty$, the largest eigenvalue; $l_1$, the trace of the matrix; and an analogue of the entropy, the determinant of the matrix. If you reopen the question, I'd be happy to post this as an answer; I don't think it is an unreasonable question per se. – StasK Oct 25 '13 at 17:15
  • @StasK It might be a reasonable question on [math.se], where nobody cares what anything means or how it is applied, but on this site we need more information, because the question asks for a statistic and statistics have a purpose: they are used to compare things in particular ways. In the absence of any context, we are left having to guess what comparison is intended. The OP only needs to give us a glimmer of their intention for this question to become answerable; surely there is no harm in waiting for that information. – whuber Oct 25 '13 at 17:45
  • The variance is the expected squared deviation from the mean. My question is whether there is a 2-d equivalent of the variance. Implicitly, I want to know if there is something similar to the first central moment as well. In response to the question about application: I have $m$ PC vectors of length $n$ (one for each of $n$ individuals). – Will Oct 29 '13 at 01:05
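
To make the suggestions above concrete, here is a minimal R sketch (the data matrix X is hypothetical, with $n = 100$ rows and $m = 3$ columns) of the scalar summaries mentioned in the comments: the mean of the per-variable variances, trace(Var(X))/m, and the largest eigenvalue of the sample covariance matrix.

    set.seed(1)
    X <- matrix(rnorm(100 * 3), nrow = 100)  # hypothetical data: n = 100 individuals, m = 3 variables
    S <- cov(X)                              # m x m sample variance-covariance matrix
    sum(diag(S)) / ncol(X)                   # trace(Var(X)) / m: mean of the per-variable variances
    eigen(S, symmetric = TRUE)$values[1]     # largest eigenvalue (the l_infinity summary above)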

2 Answers


I don't believe there is an answer to your question. You can't scale down a variance-covariance matrix; if you did (say, by taking the determinant), you would lose all the useful information -- it would no longer be explaining the variance.

As suggested in the comments, you can run a principal component analysis (PCA) to reduce the dimension of your data to one. The inertia (not the variance) of the first component can then provide a variance-like summary of the full data.

However, this is different from the variance, because you will be calculating the variation with respect to a newly derived principal component rather than the original variables. It is therefore no longer the variance, but the inertia.
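
The following is a rough R sketch (with hypothetical data) of this idea: compute the principal components and take the variance carried by the first one, together with its share of the total.

    set.seed(1)
    X   <- matrix(rnorm(100 * 3), nrow = 100)  # hypothetical data: n = 100 individuals, m = 3 variables
    pca <- prcomp(X)                           # principal component analysis (centred by default)
    pca$sdev[1]^2                              # variance carried by the first principal component
    pca$sdev[1]^2 / sum(pca$sdev^2)            # its share of the total variance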

Drew75

I guess it really depends on what you are doing. If you are investigating the relationship between the two variables, the answer would be the covariance (the off-diagonal element of your $2 \times 2$ matrix).

In case you are trying to track the uncertainty of estimates from two different systems using their covariance matrices, you could use:

$\det(\Sigma)$ or $\operatorname{tr}(\Sigma)$, where $\Sigma$ is the covariance matrix.

That is, the determinant of the covariance matrix is a 1-d measure for tracking uncertainty, and there is some theory about how and why it might make sense. Intuitively, the determinant tells you by how much the covariance matrix would scale the volume of a region of space if applied as a linear transformation.
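
As a small illustration (the data are hypothetical), both summaries are one-liners in R once you have the sample covariance matrix:

    set.seed(1)
    X     <- matrix(rnorm(100 * 2), nrow = 100)  # hypothetical data: two variables, 100 observations
    Sigma <- cov(X)                              # 2 x 2 sample covariance matrix
    det(Sigma)                                   # determinant (sometimes called the generalized variance)
    sum(diag(Sigma))                             # trace: total variance, the sum of the individual variances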

  • Just read Drew's answer. He is right about the information loss. Having said that, in the end, what is "accepted" as a measure really depends on the field you work in. I would refrain from re-inventing the wheel: look at what papers in your field are using and do that. In SLAM research in robotics, the determinant and trace have been used as optimality criteria for a long time, with varying success, and Google returns many papers to prove this. – means-to-meaning Oct 25 '13 at 11:29
  • Very interesting information. I' – Drew75 Oct 25 '13 at 17:17