I am trying to make an argument that if my field collected larger samples, we would be able to build better models with higher predictive accuracy. However, there's also the possibility that we are reaching an asymptote because the data themselves are relatively noisy.
Is there an appropriate way to estimate how much more I could improve classification accuracy if I were to increase my sample size? I gave this a shot by taking random subsamples of my data for training and testing (non-overlapping), using 20-100% of the original sample, thereby simulating smaller datasets.
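For concreteness, here is a minimal sketch of the kind of subsampling procedure I mean (the classifier, the synthetic dataset, and the subsample fractions are placeholders, not my actual setup):

```python
import numpy as np
from sklearn.datasets import make_classification  # placeholder for the real data
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

fractions = np.linspace(0.2, 1.0, 9)  # 20% to 100% of the original sample
n_repeats = 50                        # repeat to average over subsampling noise
results = []

for frac in fractions:
    accs = []
    for _ in range(n_repeats):
        # draw a random subsample of the full dataset
        idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
        # split the subsample into non-overlapping train and test sets
        X_tr, X_te, y_tr, y_te = train_test_split(X[idx], y[idx], test_size=0.3)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    results.append((int(frac * len(y)), np.mean(accs)))

for n, acc in results:
    print(f"n = {n:4d}  mean accuracy = {acc:.3f}")
```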
From that, I found that accuracy does indeed increase sharply at first and then levels off, such that doubling my dataset would only buy me a small increase in accuracy.
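To put a rough number on that "small increase," one option (an assumption on my part, not something I have validated) is to fit an inverse power-law learning curve, acc(n) ≈ a − b·n^(−c), to the observed points and evaluate it at twice the current sample size. The data points below are purely hypothetical placeholders:

```python
import numpy as np
from scipy.optimize import curve_fit

# hypothetical learning-curve points: (subsample size, mean accuracy)
ns   = np.array([200, 300, 400, 500, 600, 700, 800, 900, 1000])
accs = np.array([0.71, 0.74, 0.76, 0.775, 0.785, 0.79, 0.795, 0.798, 0.80])

def power_law(n, a, b, c):
    # a is the asymptotic accuracy; b and c control how fast it is approached
    return a - b * n ** (-c)

params, _ = curve_fit(power_law, ns, accs, p0=[0.85, 1.0, 0.5], maxfev=10000)
a, b, c = params

print(f"estimated asymptote: {a:.3f}")
print(f"predicted accuracy at n = 2000: {power_law(2000, a, b, c):.3f}")
```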
Does anyone have thoughts on whether what I did is valid?