In Géron's book "Hands-On Machine Learning with Scikit-Learn and TensorFlow" there is this sentence on page 187: "By default a BaggingClassifier samples m training instances with replacement (bootstrap=True), where m is the size of the training set. This means that only about 63% of the training instances are sampled on average for each predictor." And in the footnote he mentions this ratio approaches $1-\exp(-1)$ as $m$ grows (the author means as $m\to\infty$, I think). How should I prove this? I have no idea apart from: there are $m^m$ possible training sets.
The probability that a particular training instance does not appear in the bootstrap sample is $$p=\left(\frac{m-1}{m}\right)^m=\left(1-\frac{1}{m}\right)^m\rightarrow e^{-1}$$ as $m\to\infty$, using the standard limit $\left(1+\frac{x}{m}\right)^m\rightarrow e^{x}$. That means the probability of this particular instance being present in the bootstrap sample is $1-p=1-1/e\approx 0.63$.
This means that, on average, about $63\%$ of the distinct training instances are expected to appear in the bootstrapped sample.
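A quick Monte Carlo check of this result (a sketch, not from the book): draw bootstrap samples of size $m$ and count the fraction of distinct indices that appear at least once; it should hover around $1-1/e\approx 0.632$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000        # training-set size
n_trials = 100    # number of bootstrap samples to average over

fractions = []
for _ in range(n_trials):
    # a bootstrap sample: m indices drawn uniformly with replacement
    sample = rng.integers(0, m, size=m)
    # fraction of distinct training instances appearing at least once
    fractions.append(np.unique(sample).size / m)

print(np.mean(fractions))   # close to 1 - 1/e
print(1 - np.exp(-1))       # ≈ 0.6321
```

For moderate $m$ the empirical fraction already matches the limit to two decimal places, which is why the "about 63%" figure is quoted without qualification.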
gunes