I am trying to work out the mean and variance of the fan speed of a turbine across its operational life.
The data I have is the timeseries output of N simulations, each simulating a different load/environmental case. Each of the simulations has a different number of datapoints due to the timestep being varied to achieve convergence. In addition, each of the load cases has a different probability of occurrence.
To frame the question generally, I believe I have N samples, each with a different sample size n and a corresponding sample mean and sample variance. However, each of these samples has a weighting applied to the sample as a whole rather than to the sample points individually. I guess the sampling is biased, as each sample is not randomly selected from the full space of possible design cases.
I think the population mean is simple: just the weighted average of the sample means. But I am finding it hard to reflect the sample weightings in the population variance. I believe this is different to this case because the samples are weighted not just by the sample size but also by the probability of occurrence.
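To make the setup concrete, here is a minimal sketch of one candidate calculation. It assumes the population can be treated as a mixture of the load cases, with the probabilities of occurrence as the mixture weights. Under that assumption the law of total variance gives the overall variance as the weighted within-case variance plus the weighted spread of the case means about the overall mean. The function name `combine_cases` and the numbers are hypothetical, and the per-case variances are taken to be population variances (divisor n, not n - 1).

```python
import numpy as np

def combine_cases(probs, means, variances):
    """Combine per-case means and variances, assuming a mixture of load cases.

    probs     : probability of occurrence of each case (normalised here)
    means     : mean fan speed of each case
    variances : population variance (ddof=0) of each case
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()  # make sure the weights sum to one
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)

    overall_mean = np.sum(p * m)
    within = np.sum(p * v)                          # weighted within-case variance
    between = np.sum(p * (m - overall_mean) ** 2)   # spread of the case means
    return overall_mean, within + between

# Hypothetical example values, not real turbine data.
mean, var = combine_cases([0.5, 0.3, 0.2], [100.0, 120.0, 90.0], [4.0, 9.0, 1.0])
print(mean, var)
```

Note that the sample sizes do not appear in this calculation: each case contributes only through its own mean and variance and its probability of occurrence.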
I have access to every single datapoint, so I could apply the probability-of-occurrence weighting to each datapoint and then calculate the mean and variance, but the result would be biased by the number of sample points in each simulation. My one idea is to compress the simulation data so that each sample is the same size, but this might be an unacceptable loss of information.
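One possible way around the sample-size bias without resampling, sketched here under assumptions, is to give each datapoint the weight p_i / n_i, where p_i is the probability of occurrence of its case and n_i is the number of points in that simulation. Each case then carries total weight p_i regardless of how many points it has. The function name `pooled_weighted_stats` and the data are hypothetical. The sketch also assumes every point within a simulation counts equally; if the timestep varies within a run, weighting each point by its timestep instead may be more appropriate.

```python
import numpy as np

def pooled_weighted_stats(series, probs):
    """Weighted mean and variance over all datapoints.

    series : list of 1-D arrays, one per simulation (lengths may differ)
    probs  : probability of occurrence of each simulation's load case
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()  # normalise the case probabilities

    values = np.concatenate([np.asarray(s, dtype=float) for s in series])
    # Each point in case i gets weight p_i / n_i, so case i sums to p_i.
    weights = np.concatenate(
        [np.full(len(s), p_i / len(s)) for s, p_i in zip(series, p)]
    )

    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    return mean, var

# Hypothetical example: two simulations with different lengths.
series = [np.array([1.0, 2.0, 3.0]), np.array([10.0, 12.0])]
probs = [0.25, 0.75]
pooled_mean, pooled_var = pooled_weighted_stats(series, probs)
print(pooled_mean, pooled_var)
```

With these weights the pooled result does not change if a simulation is rerun with more or fewer points but the same distribution of values, which is the property the compression idea was trying to achieve.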
Thank you in advance for any help.