I have two large lists and I want to compute a p-value for their similarity. The similarity algorithm is a black box, but for this application we will trust that it gives accurate p-values. My problem is that the lists are too large and the algorithm won't return an answer for them. However, I think that if I take a smaller random sample of each list, the algorithm will give a p-value. Is it permissible to randomly sample both lists a large number of times and average the resulting p-values?
Should I randomly sample with or without replacement?
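
To make the procedure concrete, here is a minimal sketch of the repeated-subsampling idea. The black-box algorithm is unknown, so `similarity_p_value` is a hypothetical stand-in (a simple permutation test on the difference in means); `subsampled_p_values`, `sample_size`, `n_rounds`, and the `replace` flag are likewise names made up for illustration. The sketch only shows the mechanics of both sampling schemes; whether averaging the resulting p-values is statistically valid is exactly what is being asked.

```python
import random
from statistics import mean


def similarity_p_value(a, b, n_perm=200, rng=None):
    """Hypothetical stand-in for the black-box algorithm.

    Runs a permutation test on the absolute difference in means.
    The real algorithm is unknown; this only returns *a* p-value.
    """
    rng = rng or random.Random(0)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def subsampled_p_values(list_a, list_b, sample_size, n_rounds,
                        replace=False, seed=0):
    """Draw a subsample of each list n_rounds times and collect p-values."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_rounds):
        if replace:
            # With replacement: the same element may be drawn repeatedly.
            sub_a = rng.choices(list_a, k=sample_size)
            sub_b = rng.choices(list_b, k=sample_size)
        else:
            # Without replacement: each element appears at most once.
            sub_a = rng.sample(list_a, sample_size)
            sub_b = rng.sample(list_b, sample_size)
        p_values.append(similarity_p_value(sub_a, sub_b, rng=rng))
    return p_values


# Synthetic data for illustration only.
data_rng = random.Random(42)
list_a = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]
list_b = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]

p_without = subsampled_p_values(list_a, list_b, sample_size=50, n_rounds=20)
p_with = subsampled_p_values(list_a, list_b, sample_size=50, n_rounds=20,
                             replace=True)
print("mean p-value, without replacement:", mean(p_without))
print("mean p-value, with replacement:", mean(p_with))
```

Returning the full list of p-values rather than only their mean keeps the distribution available for inspection, in case a summary other than the plain average turns out to be more appropriate.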