I estimated a custom statistic $\Phi$ by bootstrapping: I drew 1000 resamples from the original dataset to generate 1000 values of $\Phi$. The issue is that saving those 1000 bootstrap statistics was not at all memory efficient, so I decided to save only summary statistics of the bootstrap distribution. Saving the raw samples is also impractical because the bootstrap runs for many iterations.
So for example, in each iteration I draw 1000 bootstrap resamples of my dataset, compute 1000 values of $\Phi$, and from those calculate the summary statistics (mean, median, etc.).
I would eventually report the summary statistics, but I also have to analyze the distribution of my bootstrap estimates.
I have saved the mean, standard deviation, median, first and third quartiles, minimum and maximum, as well as the 5th and 95th percentiles. I presume that the bootstrap distribution should look roughly normal. What I want is a robust way to regenerate my 1000 samples for further analysis based on these summary statistics alone.
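For concreteness, the per-iteration reduction looks something like the sketch below; the function name `summarize` and the dictionary keys are illustrative, not from my actual code:

```python
import numpy as np

def summarize(phis):
    """Collapse one iteration's 1000 bootstrap statistics into the saved summary."""
    p05, q1, med, q3, p95 = np.percentile(phis, [5, 25, 50, 75, 95])
    return {"mean": np.mean(phis), "std": np.std(phis, ddof=1),
            "min": np.min(phis), "max": np.max(phis),
            "p05": p05, "q1": q1, "median": med, "q3": q3, "p95": p95}

# usage: summary = summarize(bootstrap_phis)   # bootstrap_phis: array of 1000 values
```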
Here is what I have tried so far.
Assuming the bootstrap sampling distribution is normal, I tried using `scipy.stats.truncnorm` in Python, specifying the min, max, mean, and standard deviation, to generate 1000 samples.
Then I find the indices corresponding to the saved percentiles (the median and the others) in those 1000 sorted samples and overwrite them with the summary statistics I have.
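Here is a minimal sketch of this attempt (the summary values below are placeholders, not my real numbers):

```python
import numpy as np
from scipy.stats import truncnorm

# Saved summary statistics from one bootstrap run (placeholder values)
mean, std = 0.50, 0.08
lo, hi = 0.21, 0.83                      # saved min and max
saved_quantiles = {0.05: 0.37, 0.25: 0.44, 0.50: 0.50,
                   0.75: 0.56, 0.95: 0.63}

# truncnorm expects its truncation bounds in standard-normal units
a, b = (lo - mean) / std, (hi - mean) / std

rng = np.random.default_rng(0)
samples = truncnorm.rvs(a, b, loc=mean, scale=std,
                        size=1000, random_state=rng)

# Sort, then overwrite the values at the saved quantile positions
samples = np.sort(samples)
for q, value in saved_quantiles.items():
    idx = int(round(q * (len(samples) - 1)))
    samples[idx] = value
```

Note that `truncnorm` takes its bounds in standard-normal units, hence the `(lo - mean) / std` conversion; also, overwriting the sorted values can slightly break monotonicity if the saved quantiles disagree with the fitted normal.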
I searched various Stack Overflow threads for an answer, but this is the best I have come up with so far.
It would be helpful if I could get further insights on this.
I did indeed set a seed, but why would I repeat the calculation of my custom statistic all over again? It takes over two days on my big dataset. – cwanderroycbooks May 31 '23 at 14:04
Since I cannot save those 1000 statistics each time, I instead save their summary. But I need to analyze and plot the distribution, so I need to regenerate it from the summary. I have modified my question to reflect this. – cwanderroycbooks Jun 01 '23 at 02:30