
I obtained a series of values from some process, and I want to compute the variance of the mean of this series. The series is made up of contiguous sub-series. Within each sub-series the values are correlated. All sub-series follow the same pattern, but they have different lengths.

I read on Wikipedia that, for correlated variables with equal variance $\sigma^2$, the following holds:

$$\operatorname{Var}\left(\overline{X}\right) = \frac{\sigma^2}{n} + \frac{n - 1}{n}\rho\sigma^2 $$

where $\rho$ is the average correlation.

Q1: What does "average correlation" mean, and how is it computed?

Intuitively, I would expect it to be something like the autocorrelation function of the series at some lag, or summed over all lags, but I don't know.

Q2: Is this approach reasonable?


EDIT: I checked these questions:

  • Variance of a sum of identically distributed random variables that are not independent
  • Variance of sum of dependent random variables

and this article.

According to comments, I should use:

$$\operatorname{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \operatorname{Var}\left(X_i\right) + 2\sum_{1\le i<j\le n}\operatorname{Cov}\left(X_i, X_j\right)$$

but I am not sure how to compute $\operatorname{Cov}\left(X_i, X_j\right)$.

If it helps, the data arises from a stationary process (although it always starts with a complete sub-series).

1 Answer


The general formula is

$$\operatorname{Var}\left(\sum_{k=1}^n a_k X_k\right) = \sum_{k=1}^n a_k^2 \operatorname{Var}\left(X_k\right) + \sum_{\substack{k,l=1 \\ k\neq l}}^n a_k a_l \operatorname{Cov}\left(X_k,X_l\right) $$

When all the variances are equal to $\sigma^2$, so that $\operatorname{Cov}\left(X_k,X_l\right) = \rho_{k,l}\sigma^2$, and all $a_k = \frac{1}{n}$, this becomes

$$\sum_{k=1}^n a_k^2 \operatorname{Var}\left(X_k\right) + \sum_{\substack{k,l=1 \\ k\neq l}}^n a_k a_l \operatorname{Cov}\left(X_k,X_l\right) = \sum_{k=1}^n \frac{1}{n^2} \sigma^2 + \sum_{\substack{k,l=1 \\ k\neq l}}^n \frac{1}{n^2} \rho_{k,l}\sigma^2 = \frac{1}{n} \cdot \sigma^2 + \frac{1}{n^2} \left( \sum_{\substack{k,l=1 \\ k\neq l}}^n \rho_{k,l} \right) \cdot \sigma^2$$

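It is this sum of all the pairwise correlations $\rho_{k,l}$ that is being replaced by the average. There are $n(n-1)$ ordered pairs with $k \neq l$, so the average correlation $\bar\rho$ satisfies

$$\sum_{\substack{k,l=1 \\ k\neq l}}^n \rho_{k,l} = \bar\rho \cdot n \cdot (n-1)$$

Multiplying this sum by $\frac{\sigma^2}{n^2}$ gives the term $\frac{n-1}{n}\bar\rho\sigma^2$ from the Wikipedia formula, where it is written with $\rho$ instead of $\bar\rho$.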
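As a quick numerical check of the identity above, here is a minimal Python sketch. The function names and the example correlation matrix (correlations decaying as $0.5^{|k-l|}$) are illustrative assumptions, not something taken from the question. It compares the direct quadratic form $a^\top \Sigma a$ with the formula that uses $\bar\rho$.

```python
import numpy as np

def average_correlation(corr):
    """Mean of the n*(n-1) off-diagonal entries of a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    off_diag_sum = corr.sum() - np.trace(corr)
    return off_diag_sum / (n * (n - 1))

def var_of_mean_from_rho(sigma2, n, rho_bar):
    """sigma^2 / n + (n - 1) / n * rho_bar * sigma^2 (equal variances assumed)."""
    return sigma2 / n + (n - 1) / n * rho_bar * sigma2

# Hypothetical example: n = 5 variables with common variance 2.0 and
# correlations rho_{k,l} = 0.5 ** |k - l| (an assumption for illustration).
n, sigma2 = 5, 2.0
idx = np.arange(n)
corr = 0.5 ** np.abs(idx[:, None] - idx[None, :])

rho_bar = average_correlation(corr)

# Direct computation of Var(mean) as a^T Sigma a with a_k = 1/n.
a = np.full(n, 1.0 / n)
direct = a @ (sigma2 * corr) @ a

via_rho = var_of_mean_from_rho(sigma2, n, rho_bar)
print(rho_bar, direct, via_rho)
```

Both routes should agree up to floating-point error, since the average-correlation formula is just a rearrangement of the general one.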

> but I am not sure on how to compute the $\operatorname{Cov}\left(X_i, X_j\right)$.

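See the definition of the Pearson correlation coefficient, $\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\cdot \operatorname{Var}(Y)}}$, which rearranges to $\operatorname{Cov}(X,Y) = \rho_{X,Y}\sqrt{\operatorname{Var}(X)\cdot \operatorname{Var}(Y)}$.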
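For data from a stationary process, as mentioned in the question, $\operatorname{Cov}(X_i, X_j)$ depends only on the lag $|i-j|$, so one possible way to estimate it is the sample autocovariance. The sketch below is only an illustration under that assumption: the function names and the truncation parameter `max_lag` are hypothetical choices, and it treats the whole series as one stationary stretch, ignoring the sub-series structure.

```python
import numpy as np

def sample_autocovariance(x, max_lag):
    """Biased sample autocovariance gamma_hat[k] for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    return np.array([np.dot(d[: n - k], d[k:]) / n for k in range(max_lag + 1)])

def var_of_mean_stationary(x, max_lag):
    """Estimate Var(mean) = (1/n) * [gamma_0 + 2 * sum_k (1 - k/n) * gamma_k].

    The sum is truncated at max_lag (an assumption; high-lag estimates are
    noisy because they are based on few pairs of observations).
    """
    n = len(x)
    gamma = sample_autocovariance(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    return (gamma[0] + 2.0 * np.sum((1.0 - lags / n) * gamma[1:])) / n

# Hypothetical usage with made-up numbers.
x = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 7.0])
print(var_of_mean_stationary(x, max_lag=2))
```

With `max_lag=0` this reduces to the usual uncorrelated estimate $\hat\sigma^2 / n$; how large `max_lag` should be depends on how quickly the correlations in the data die out.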

  • +1 Thank you! Could you take a look at this related question? https://stats.stackexchange.com/questions/588821/how-to-compute-the-variance-for-this-process – user1420303 Sep 14 '22 at 21:40