
I am learning about bagging ensemble techniques like Random Forests and the concepts of Row Sampling, Pasting, Random Subspaces, and Random Patches. What I understood is that bagging involves creating a bootstrap dataset from the original dataset. According to Wikipedia, the bootstrap dataset should be the same size as the original dataset. In contrast, under the definition of bootstrapping in statistics, a bootstrap dataset is a smaller sample of the original population.

I want to confirm: is the concept of bootstrapping different when we talk about bagging versus statistics, as I have described above?

Tim
tanmay

1 Answer


They are the same. In both cases, when resampling, you create a bootstrap dataset of the same size as the original data. The technique is called bootstrap aggregating precisely because it uses the bootstrap known from statistics.

The size of the bootstrap sample is equal to the size of the data, not the population, because you want to imitate sampling your data from the population; this accounts for the uncertainty of the sampling process. It wouldn't make much sense to draw from your data a number of samples equal to the size of a population that is much larger than your sample (e.g. a few billion people, or infinite). Such a sample would contain many repeated observations and so would have much less variability than the population it is intended to represent.
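The point above can be sketched in a few lines of NumPy. This is a minimal illustration, not a bagging implementation: the dataset here is synthetic, and the size `n = 100` is an arbitrary choice. It shows that a bootstrap sample has the same size as the original data, and that sampling with replacement necessarily repeats some observations while leaving others out (on average about 36.8% of observations are omitted, since (1 - 1/n)^n → e^(-1)).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "original dataset": n = 100 observations.
data = rng.normal(size=100)
n = len(data)

# One bootstrap sample: draw n indices WITH replacement,
# so the resample has the same size as the data, as in bagging.
idx = rng.integers(0, n, size=n)
bootstrap_sample = data[idx]

assert len(bootstrap_sample) == n  # same size as the original data

# Because we sample with replacement, some observations repeat
# and others are left out of this particular resample.
unique_frac = len(np.unique(idx)) / n
print(f"fraction of distinct observations: {unique_frac:.2f}")
```

In bagging, each base learner is fit on one such resample, and the repeated draws are what introduce the variability between the trees.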

Tim