I am trying to work out the mean and variance of the fan speed of a turbine across its operational life.

The data I have is the timeseries output of N simulations, each simulating a different load/environmental case. Each of the simulations has a different number of datapoints due to the timestep being varied to achieve convergence. In addition, each of the load cases has a different probability of occurrence.

To frame the question generally: I believe I have N samples, each with a different sample size n and its own sample mean and sample variance. However, each sample carries a weighting that applies to the sample as a whole rather than to its individual points. I suspect the sampling is biased, since the samples are not drawn at random from the full space of design cases.

I think the population mean is simple - just the weighted average of the sample means. But I am finding it hard to reflect the sample weightings in the population variance. I believe this is different to this case because the samples are weighted not just by the sample size but also by the probability of occurrence.
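For reference, here is how I would write this down, as a sketch under assumptions rather than a settled answer. I am assuming the operational life can be treated as a mixture of the N load cases with probabilities $p_i$ summing to 1, that the datapoints within one simulation count equally, and that $\bar{x}_i$ and $s_i^2$ are the mean and the variance (with divisor $n_i$) of simulation $i$. The symbols $p_i$, $\bar{x}_i$, $s_i^2$, $\mu$ and $\sigma^2$ are my own notation. Under those assumptions the law of total variance gives:

$$\mu = \sum_{i=1}^{N} p_i \, \bar{x}_i$$

$$\sigma^2 = \sum_{i=1}^{N} p_i \left[ s_i^2 + \left( \bar{x}_i - \mu \right)^2 \right]$$

The first term inside the bracket is the spread within each load case and the second is the spread between load cases. The sample sizes $n_i$ only enter through $\bar{x}_i$ and $s_i^2$, so a simulation with more datapoints does not gain extra weight.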

I have access to every single datapoint, so I could apply the probability-of-occurrence weighting to each datapoint and then calculate the mean and variance, but the result would be biased by the number of datapoints in each simulation. My one idea is to compress the simulation data so that each sample is the same size, but this might be an unacceptable loss of information.
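One way the per-datapoint idea might avoid the sample-size bias without discarding data is to divide each case's probability by its number of datapoints, so every simulation contributes a total weight equal to its probability. The sketch below assumes NumPy, assumes datapoints within a simulation count equally (which may not hold if the timestep varies a lot), and uses a hypothetical function name `weighted_stats_per_point` and made-up inputs.

```python
import numpy as np

def weighted_stats_per_point(samples, probs):
    """Weighted mean and variance using one weight per datapoint.

    samples: list of 1-D sequences, one per simulation (lengths may differ).
    probs:   probability of occurrence of each load case.
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()  # normalise so the case probabilities sum to 1

    values = np.concatenate([np.asarray(s, dtype=float) for s in samples])
    # Each datapoint in case i gets weight p_i / n_i, so the case as a
    # whole carries weight p_i no matter how many points it has.
    weights = np.concatenate(
        [np.full(len(s), p_i / len(s)) for s, p_i in zip(samples, p)]
    )

    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    return mean, var
```

With this weighting, doubling the number of datapoints in one simulation halves the weight of each of its points, so the case's overall influence stays the same.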

Thank you in advance for any help.

Francis
  • It is not clear to me precisely what is going on here, but it seems you know how to estimate the population mean $E[X]$ where $X$ is the fan speed. If you estimated $E[X^2]$ the same way then perhaps you could use $E[X^2] - (E[X])^2$ as an estimator of the variance, possibly using something like Bessel's correction if you feel it appropriate – Henry Oct 28 '22 at 11:45
  • @Henry Yes! Thanks, that seems a mathematically sound approach. I'll see if that value is sensible. I guess my problem is that there are multiple valid estimators, and my stats knowledge is too limited to know which one to choose. – Francis Oct 28 '22 at 12:27
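
A minimal sketch of the $E[X^2] - (E[X])^2$ suggestion from the comments, assuming NumPy and assuming $E[X]$ and $E[X^2]$ are both estimated as probability-weighted averages of per-simulation means. The function name `moment_based_variance` is hypothetical, and no Bessel-type correction is applied here.

```python
import numpy as np

def moment_based_variance(samples, probs):
    """Estimate mean and variance from weighted first and second moments.

    samples: list of 1-D sequences, one per simulation (lengths may differ).
    probs:   probability of occurrence of each load case.
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()  # normalise so the case probabilities sum to 1

    # Averaging within each simulation first removes the dependence on
    # how many datapoints that simulation happened to produce.
    m1 = np.array([np.mean(np.asarray(s, dtype=float)) for s in samples])
    m2 = np.array([np.mean(np.asarray(s, dtype=float) ** 2) for s in samples])

    mean = np.sum(p * m1)           # estimate of E[X]
    second = np.sum(p * m2)         # estimate of E[X^2]
    return mean, second - mean**2   # Var[X] = E[X^2] - (E[X])^2
```

For speeds that are large compared with their spread, subtracting two nearly equal numbers can lose floating-point precision, so it may be worth checking the result against a calculation based on deviations from the mean.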
