How to compute the variance for this process?

Question

I have a sequence of iid (independent, identically distributed) random positive integers $\boldsymbol X=[n_1,n_2,\dots,n_N]$ sampled from some (unknown) distribution.

From it, I built another sequence in the following way:

$$ [1,2,3,...,n_1, 1,2,3,...,n_2, ...,1,2,3,...,n_N]$$

That is, I replaced each $n_i$ with the list $1,2,3,\dots,n_i$ (note: the smallest allowed value of $n_i$ is 1).
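A minimal Python sketch of this construction, assuming the $n_i$ are available as a plain list of integers (the function name `expand` and the toy values are hypothetical, for illustration only):

```python
def expand(ns):
    """Replace each n_i with the run 1, 2, ..., n_i (requires n_i >= 1)."""
    out = []
    for n in ns:
        if n < 1:
            raise ValueError("each n_i must be at least 1")
        out.extend(range(1, n + 1))
    return out

X = [3, 1, 2]                # toy values of n_1, ..., n_N
Y = expand(X)
print(Y)                     # [1, 2, 3, 1, 1, 2]
print(sum(Y) / len(Y))       # mean of the new list: 10 / 6
```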

Q: How can I get the variance of the mean of the new list? If possible, I would prefer a solution in terms of the $n_i$.

For the original list I simply used $\operatorname{Var}(\bar X)=\frac{\operatorname{Var}(X_i)}{N}$. I tried to solve the new problem with the same approach, but I do not get sensible results (sometimes the variance even comes out negative).
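For reference, the iid formula used for the original list can be estimated from data as the sample variance of the $n_i$ divided by $N$. This sketch (hypothetical function name, toy values) applies only to the $n_i$ themselves; the entries of the expanded list are neither independent nor identically distributed, so the same plug-in cannot be carried over to them directly.

```python
import statistics

def var_of_mean_iid(xs):
    """Estimate Var(mean of xs) as s^2 / N, valid when the xs are iid.

    statistics.variance uses the unbiased N - 1 denominator for s^2.
    """
    return statistics.variance(xs) / len(xs)

X = [3, 1, 2]              # toy values of n_1, ..., n_N (illustrative only)
print(var_of_mean_iid(X))  # s^2 = 1 here, so about 1/3
```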

The number of values in the new sequence is
$$\nu = \sum_{i=1}^N n_i.$$

Their sum is

$$S = \sum_{i=1}^N \frac{(n_i+1)n_i}{2}.$$

By definition, the mean is

$$m = \frac{S}{\nu} = \frac{\sum_{i=1}^N (n_i+1)n_i}{2\sum_{i=1}^N n_i} = \frac{1}{2} + \frac{1}{2}\frac{\sum_{i=1}^N n_i^2}{\sum_{i=1}^N n_i}.$$

This does not simplify. That's as far as you can go in this general setting. If you would like to specify the distribution of the $n_i,$ we could give more specific results. — whuber, Sep 14 '22 at 21:54
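The closed form for $m$ can be checked numerically against the direct mean of the expanded list. A sketch using exact rational arithmetic (`fractions.Fraction`) so that $\nu$, $S$ and both expressions for $m$ agree exactly; the function names are illustrative, not from the thread.

```python
from fractions import Fraction

def mean_closed_form(ns):
    """m = S / nu, checked against 1/2 + (1/2) * sum(n_i^2) / sum(n_i)."""
    nu = sum(ns)                                  # number of values, nu
    s = sum(n * (n + 1) // 2 for n in ns)         # their sum, S
    m = Fraction(s, nu)
    # Same value via the rearranged right-hand side of the formula for m.
    m_alt = Fraction(1, 2) + Fraction(sum(n * n for n in ns), 2 * nu)
    assert m == m_alt
    return m

def mean_direct(ns):
    """Mean of the expanded list [1..n_1, 1..n_2, ..., 1..n_N]."""
    values = [k for n in ns for k in range(1, n + 1)]
    return Fraction(sum(values), len(values))

X = [3, 1, 2]
print(mean_closed_form(X), mean_direct(X))  # 5/3 5/3
```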
@whuber Thank you. I do not know the underlying distribution; the values arise from a chemical process. I was wondering whether it would be possible to decompose the variance as some kind of sum of variances, that is, by dividing the sequence into blocks, one for each $n_i$: $1,2,3,\dots,n_i$. If I am not wrong, the variance of each block would be $(n_i+1)(2n_i+1)/6-(n_i+1)^2/4$ (it gives the same result as the usual expression for the variance of a set of numbers). I am not looking for a unique result; I am looking for some way to compute it for each experiment. Do you think there is any chance of doing this? — user1420303, Sep 14 '22 at 23:05
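The block formula in the comment above is the population variance (denominator $n_i$) of $1,2,\dots,n_i$, and it simplifies to $(n_i^2-1)/12$. A small sketch checking this exactly for a few block lengths (function names are illustrative):

```python
from fractions import Fraction

def block_variance_formula(n):
    """(n+1)(2n+1)/6 - (n+1)^2/4: E[k^2] minus (E[k])^2 for k uniform on 1..n."""
    return Fraction((n + 1) * (2 * n + 1), 6) - Fraction((n + 1) ** 2, 4)

def block_variance_direct(n):
    """Population variance (denominator n) of the list 1, 2, ..., n."""
    values = range(1, n + 1)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n

for n in range(1, 8):
    # All three expressions agree, including 0 for a block of length 1.
    assert block_variance_formula(n) == block_variance_direct(n) == Fraction(n * n - 1, 12)
```

This only confirms the within-block variance; it does not account for the shared denominator $\sum n_i$ in $m$.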
@whuber I had in mind that, since the $n_i$ are uncorrelated, knowing the variance of each sublist might lead me to the variance of the entire list. Does that make sense? — user1420303, Sep 14 '22 at 23:07
Each term in the sum for $m$ involves all the $N$ random variables in its denominator. This creates a strong interdependency, as well as creating algebraic difficulties in the evaluation of variances even when you do stipulate the common distribution of the $n_i.$ (Your formula neglects that dependency and therefore is incorrect, although it might sometimes be an OK approximation.) Out of curiosity, what aspect of a chemical process can be modeled in this way? The use of integral random variables is unusual. Possibly a more tractable model can be found. — whuber, Sep 14 '22 at 23:31
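One distribution-free way to get a number for a single experiment, not proposed in this thread and offered only as a sketch, is a nonparametric bootstrap over the observed $n_i$: resample them with replacement, recompute $m$ each time (so the shared denominator $\sum n_i$ is recomputed too, respecting the dependency described above), and take the variance of the replicates. It assumes the $n_i$ are iid, and bootstrap estimates of a ratio can be unreliable for small $N$; the function names, `reps` and `seed` are hypothetical choices.

```python
import random
import statistics

def ratio_mean(ns):
    """m = sum(n_i (n_i + 1) / 2) / sum(n_i), the mean of the expanded list."""
    return sum(n * (n + 1) / 2 for n in ns) / sum(ns)

def bootstrap_var_of_m(ns, reps=2000, seed=0):
    """Bootstrap estimate of Var(m): resample the n_i with replacement,
    recompute m (numerator and denominator together) for each replicate,
    and return the sample variance of the replicates."""
    rng = random.Random(seed)
    N = len(ns)
    replicates = [ratio_mean(rng.choices(ns, k=N)) for _ in range(reps)]
    return statistics.variance(replicates)

X = [3, 1, 2, 5, 4, 1, 2, 7]   # illustrative values only
print(bootstrap_var_of_m(X))
```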
@whuber I understand. The formula is just for the variance of an increasing sequence, and of course it is only part of the numerator. Regarding your question: it is associated with the duration of adsorbates on a surface. I think that, in statistical terms, what I want to do is compute the variance of the mean residual time. — user1420303, Sep 14 '22 at 23:46
It is likely that an integral random variable is not a good model for a duration. — whuber, Sep 15 '22 at 14:18
Thanks @whuber. It seems to me that my problem was already solved in https://arxiv.org/pdf/1707.02484.pdf (the last single-line equation on page 2), but unfortunately it is expressed in mathematical terms that are inaccessible to me. — user1420303, Sep 15 '22 at 14:32
The notation in that paper obscures the basic simplicity of the equations. But I cannot see any connection between it and the situation you describe here. Why not ask us about the problem you really have, rather than about a model of it that might or might not be useful or tractable? — whuber, Sep 16 '22 at 14:25
