Suppose $x,y$ are i.i.d. random vectors in $d$ dimensions. Consider the following quantity:

$$R=\frac{(E\|x\|^2)^2}{E\langle x, y\rangle^2}$$

Equivalently, for the empirical distribution with data matrix $X$:

$$R=\frac{(\operatorname{Tr}X'X)^2}{\operatorname{Tr}X'XX'X}$$
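As a quick numerical sanity check that the two expressions above agree, here is a minimal NumPy sketch. It assumes the rows of $X$ are the observations and that $x,y$ are drawn independently (with replacement) from those rows; the function names and the test data are made up for illustration.

```python
import numpy as np

def effective_rank_trace(X):
    """R = (Tr X'X)^2 / Tr(X'X X'X) for an n-by-d data matrix X."""
    G = X.T @ X  # d-by-d unnormalized second-moment matrix
    return np.trace(G) ** 2 / np.trace(G @ G)

def effective_rank_pairs(X):
    """Same quantity from empirical expectations over all pairs of rows.

    Assumption: rows of X are observations, and the pair (x, y) ranges
    over all (i, j), including i == j (sampling with replacement).
    """
    sq_norms = np.sum(X ** 2, axis=1)  # ||x_i||^2 for each row
    inner = X @ X.T                    # <x_i, x_j> for all i, j
    return np.mean(sq_norms) ** 2 / np.mean(inner ** 2)

# Hypothetical data: 5 coordinates with very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 1.0, 1.0, 0.1, 0.1])

r_trace = effective_rank_trace(X)
r_pairs = effective_rank_pairs(X)
print(r_trace, r_pairs)
```

The normalizing factors of $1/n$ cancel between numerator and denominator, which is why the trace form needs no explicit sample size. For an identity matrix the value equals the dimension, and for a rank-one matrix it equals 1.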

Does this quantity come up in statistics literature?

This came up recently in ML as the "effective rank $R_0$" (Bartlett, Lemma 5). It is similar to the quantity mentioned on Wikipedia under "Satterthwaite approximation", except that the hat matrix $H=X(X'X)^{-1}X'$ is replaced with $X$.
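To make the "effective rank" reading concrete, here is a short derivation sketch. It assumes finite second moments and writes $\Sigma=E[xx']$ for the uncentered second-moment matrix with eigenvalues $\lambda_1,\dots,\lambda_d\ge 0$; these symbols are introduced here for illustration only. Using independence of $x$ and $y$:

$$E\langle x,y\rangle^2 = E\operatorname{Tr}(xx'yy') = \operatorname{Tr}\left(E[xx']\,E[yy']\right) = \operatorname{Tr}\Sigma^2, \qquad E\|x\|^2=\operatorname{Tr}\Sigma$$

so that

$$R=\frac{(\operatorname{Tr}\Sigma)^2}{\operatorname{Tr}\Sigma^2}=\frac{\left(\sum_i\lambda_i\right)^2}{\sum_i\lambda_i^2}$$

By Cauchy-Schwarz, $1\le R\le\operatorname{rank}\Sigma\le d$, with $R$ equal to the rank exactly when all nonzero eigenvalues are equal.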

Richard Hardy
Yaroslav Bulatov
  • What do you mean by "degrees of freedom" here? It is usually meant as a property of a statistical model, not the data. – Tim Sep 26 '22 at 08:28
  • Good question, trying to nail down the terminology, but one example is chi-squared distribution -- it has a degrees of freedom parameter, hence, we can talk about corresponding dataset having a "degrees of freedom" statistic – Yaroslav Bulatov Sep 26 '22 at 08:36
  • OK, so how do you define "degrees of freedom" here? – Tim Sep 26 '22 at 08:38
  • This talks about degrees of freedom as a property of the data https://en.m.wikipedia.org/wiki/Degrees_of_freedom_(statistics)#Regression_effective_degrees_of_freedom – Yaroslav Bulatov Sep 26 '22 at 08:41
  • No, it doesn't, it still discusses statistical models, as does the whole article, which starts with the definition "In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary." – Tim Sep 26 '22 at 08:46
  • In that link, under "Satterthwaite approximation", it provides the following degrees of freedom statistic $$\frac{(\operatorname{Tr}H'H)^2}{\operatorname{Tr}(H'HH'H)}$$ It is similar to the $R$ statistic in the question in that it is computed for a particular dataset – Yaroslav Bulatov Sep 26 '22 at 08:52
  • The one described under "regression effective degrees of freedom, appropriate definitions can include the trace of the hat matrix"? So you mean calculating degrees of freedom for a linear regression model? – Tim Sep 26 '22 at 08:59
  • Thanks for clarification, yes, I'm looking for variations of what the article calls "regression effective degrees of freedom" statistics – Yaroslav Bulatov Sep 26 '22 at 09:01
  • BTW, this earlier post characterizes degrees of freedom as the number of independent degrees of information that go into an estimate. In that sense, degrees of freedom is a property of 1) data 2) model 3) estimating procedure. A dataset which is all zeros would have 0 independent pieces of information – Yaroslav Bulatov Sep 26 '22 at 19:48

1 Answer

This quantity seems to reduce to the "Welch-Satterthwaite degrees of freedom" in the limit as the variance of $x,y$ goes to 0.

Yaroslav Bulatov