I have some existing data and want to run a power analysis for a second, confirmatory study. Specifically, I want to estimate how many participants I would need to reach a power of .9 while keeping the same items I already have. For context, I am analyzing my data with linear mixed-effects models. My current dataset is huge, and the t value for the predictor of interest is 25, so I expect to need fewer participants in the second study.
As I understand it, the standard approach to simulation-based power analysis (e.g. in simr) is:
- Select a number of participants N
- Simulate the response variable for some subset of the data with N participants
- Refit the model on the simulated data
- Calculate the power for N participants as the proportion of simulations that yielded a significant result
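To make the steps above concrete, here is a minimal sketch in plain Python. It is not how simr works internally: as a simplifying assumption, the mixed model is replaced by a test on per-participant effects with a normal approximation to the critical value, and every name and parameter (`simulate_power`, `beta`, `sd_slope`, `sd_resid`) is hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def simulate_power(n_participants, n_items, beta, sd_slope, sd_resid,
                   n_sims=500, alpha=0.05, seed=0):
    """Power = proportion of simulated datasets with a significant effect.

    Assumption: each participant contributes one observed effect, made of the
    true effect `beta`, a random by-participant slope, and residual noise
    averaged over `n_items` items. This stands in for a full mixed model.
    """
    rng = random.Random(seed)
    # Normal approximation to the t cutoff (assumption, fine for larger N)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n_significant = 0
    for _ in range(n_sims):
        # Step 2: simulate the response for N participants
        effects = []
        for _ in range(n_participants):
            slope = beta + rng.gauss(0, sd_slope)
            noise = rng.gauss(0, sd_resid / n_items ** 0.5)
            effects.append(slope + noise)
        # Step 3: "refit", here a test of the mean effect against zero
        se = stdev(effects) / n_participants ** 0.5
        if abs(mean(effects) / se) > z_crit:
            n_significant += 1
    # Step 4: proportion of significant simulations
    return n_significant / n_sims

# Step 1: try several candidate values of N (parameter values are made up)
for n in (10, 20, 40, 80):
    print(n, simulate_power(n, n_items=20, beta=0.3, sd_slope=1.0, sd_resid=1.0))
```

In a real analysis the parameter values would come from the model fitted to the existing data, and the refit step would be the mixed model itself.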
I was wondering, why do we need to simulate the response data? Why can't I just:
- Select a number of participants N
- Draw N1 random subsets of my data, each containing N participants
- Refit the mixed-effects model on each of the N1 downsampled datasets (as opposed to simulated data)
- Calculate the power for N participants as the proportion of the N1 models that yield a significant result
Also, if this second approach makes sense, should I sample participants with or without replacement?
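For comparison, the downsampling idea can be sketched as follows. Again this is only an illustration under stated assumptions: the existing data are reduced to one effect estimate per participant, the list `existing_effects` is synthetic (it is not my real data), the model refit is replaced by a test of the mean effect with a normal approximation, and `downsample_power` and its `replace` flag are hypothetical names.

```python
import random
from statistics import NormalDist, mean, stdev

def downsample_power(effects, n_participants, n_resamples=500,
                     replace=False, alpha=0.05, seed=0):
    """Proportion of resampled subsets of size N with a significant effect."""
    rng = random.Random(seed)
    # Normal approximation to the t cutoff (assumption)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n_significant = 0
    for _ in range(n_resamples):
        if replace:
            # With replacement: a participant can appear more than once
            subset = rng.choices(effects, k=n_participants)
        else:
            # Without replacement: N distinct participants
            subset = rng.sample(effects, n_participants)
        se = stdev(subset) / n_participants ** 0.5
        # se == 0 (all values identical) is counted as not significant
        if se > 0 and abs(mean(subset) / se) > z_crit:
            n_significant += 1
    return n_significant / n_resamples

# Synthetic stand-in for per-participant effects from the existing study
data_rng = random.Random(1)
existing_effects = [data_rng.gauss(0.3, 1.0) for _ in range(2000)]

for n in (25, 50, 100, 200):
    print(n,
          downsample_power(existing_effects, n, replace=False),
          downsample_power(existing_effects, n, replace=True))
```

One thing the sketch makes visible: without replacement, N cannot exceed the number of participants I already have, and when N equals that number every subset is the same dataset, so the estimate can only be 0 or 1.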