
I have a sequence of random iid (hence uncorrelated) positive integers sampled from some unknown distribution: $\boldsymbol X=[n_1,n_2,\dots,n_N]$.

From it, I built another sequence in the following way:

$$ [1,2,3,...,n_1, 1,2,3,...,n_2, ...,1,2,3,...,n_N]$$

That is, I replaced each $n_i$ with a list $1,2,3,..,n_i$ (note: the lowest allowed value for $n_i$ is 1).

Q: How can I get the variance of the mean of this last list? If possible, I would prefer a solution in terms of the $n_i$.

For the first list I just used $\operatorname{Var}(\bar X)=\frac{\operatorname{Var}(X_i)}{N}$. I tried to solve the problem for the second list following the same approach, but I do not get good results (sometimes I get a negative result).
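To make the construction concrete, here is a minimal sketch in Python. The sample `ns` and the helper name `expand` are made up for illustration; nothing here is specific to the real data.

```python
from statistics import mean, variance


def expand(ns):
    """Replace each n_i with the run 1, 2, ..., n_i and concatenate the runs."""
    return [k for n in ns for k in range(1, n + 1)]


ns = [3, 1, 4]            # hypothetical sample of the n_i (each n_i >= 1)
seq = expand(ns)          # the second list: [1, 2, 3, 1, 1, 2, 3, 4]

# First list: Var(X_bar) = Var(X_i) / N, using the sample variance of the n_i.
var_mean_first = variance(ns) / len(ns)

# Second list: its mean is easy to compute; the variance of that mean is the open question.
mean_second = mean(seq)
```

Note that the length of `seq` is itself random (it is the sum of the $n_i$), so the second list is not a fixed-size iid sample.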

User1865345
  • 8,202
  • 1
    The number of values in the new sequence is

    $$\nu = \sum_{i=1}^N n_i.$$

    Their sum is

    $$S = \sum_{i=1}^N \frac{(n_i+1)n_i}{2}.$$

    By definition, the mean is

    $$m = \frac{S}{\nu} = \frac{\sum_{i=1}^N (n_i+1)n_i}{2\sum_{i=1}^N n_i} = \frac{1}{2} + \frac{1}{2}\frac{\sum_{i=1}^N n_i^2}{\sum_{i=1}^N n_i}.$$

    This does not simplify. That's as far as you can go in this general setting. If you would like to specify the distribution of the $n_i,$ we could give more specific results.

    – whuber Sep 14 '22 at 21:54
  • @whuber Thank you. I do not know the underlying distribution; the values arise from a chemical process. I was wondering whether it would be possible to decompose it as some kind of sum of variances, that is, by dividing the sequence into one block $1,2,3,\dots,n_i$ for each $n_i$. The variance of each block would be $(n_i+1)(2n_i+1)/6-(n_i+1)^2/4$, if I am not wrong (it gives me the same result as the usual expression for the variance of a set of numbers). I am not looking for a unique result; I am looking for some way to compute it for each experiment. Do you think there is some chance of that? – user1420303 Sep 14 '22 at 23:05
  • @whuber I had in mind that, as the $n_i$ are uncorrelated, maybe knowing the variances of each sublist would lead me to the variance of the entire list. Does that make sense? – user1420303 Sep 14 '22 at 23:07
  • 1
    Each term in the sum for $m$ involves all the $N$ random variables in its denominator. This creates a strong interdependency, as well as creating algebraic difficulties in the evaluation of variances even when you do stipulate the common distribution of the $n_i.$ (Your formula neglects that dependency and therefore is incorrect, although it might sometimes be an OK approximation.) Out of curiosity, what aspect of a chemical process can be modeled in this way? The use of integral random variables is unusual. Possibly a more tractable model can be found. – whuber Sep 14 '22 at 23:31
  • @whuber I understand. The formula is just for the variance of an increasing sequence, but of course that is just part of the numerator. Regarding your question: it is associated with the duration of adsorbates on a surface. I think that, in statistical terms, what I want to do is compute the variance of the mean residual time. – user1420303 Sep 14 '22 at 23:46
  • It is likely that an integral random variable is not a good model for a duration. – whuber Sep 15 '22 at 14:18
  • Thanks @whuber. It seems to me that my problem was already solved in https://arxiv.org/pdf/1707.02484.pdf (the last one-line equation on page 2), but unfortunately it is expressed in mathematical terms that are inaccessible to me. – user1420303 Sep 15 '22 at 14:32
  • The notation in that paper obscures the basic simplicity of the equations. But I cannot see any connection between it and the situation you describe here. Why not ask us about the problem you really have, rather than about a model of it that might or might not be useful or tractable? – whuber Sep 16 '22 at 14:25
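The closed form for $m$ given in the comments can be checked numerically. A first-order (delta method) linearization of the ratio $\sum n_i^2/\sum n_i$ also gives a rough variance for $m$ in terms of the $n_i$ alone. This is the standard ratio-estimator approximation, not something derived in this thread. It assumes the $n_i$ are iid and $N$ is large, and it is only a sketch. The function names are hypothetical.

```python
def mean_expanded(ns):
    """Mean of the expanded list: m = 1/2 + (1/2) * sum(n_i^2) / sum(n_i)."""
    return 0.5 + 0.5 * sum(n * n for n in ns) / sum(ns)


def approx_var_mean_expanded(ns):
    """Delta-method approximation to Var(m); assumes iid n_i and large N."""
    N = len(ns)
    if N < 2:
        raise ValueError("need at least two n_i to estimate a variance")
    xbar = sum(ns) / N                            # sample mean of the n_i
    r = sum(n * n for n in ns) / sum(ns)          # ratio sum(n_i^2) / sum(n_i)
    # Linearized residuals n_i^2 - r * n_i; their sample variance drives Var(r).
    s2 = sum((n * n - r * n) ** 2 for n in ns) / (N - 1)
    var_r = s2 / (N * xbar ** 2)
    return 0.25 * var_r                           # m = 1/2 + r/2, so Var(m) = Var(r)/4


ns = [3, 1, 4]                                    # hypothetical sample
m = mean_expanded(ns)
v = approx_var_mean_expanded(ns)
```

Because the approximation is a sum of squares divided by positive quantities, it cannot come out negative. Unlike a block-by-block sum of variances, it keeps the dependence that arises because every term shares the same random denominator. How accurate it is for a given $N$ and distribution would have to be checked, for example by simulation.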

0 Answers