random forest and bootstrap out of bag

Question

If you can explain these, it will be highly appreciated! Thanks in advance

1) Here is the code:

    ## S3 method for class 'formula'
    randomForest(formula, data=NULL, ..., subset, na.action=na.fail)
    ## Default S3 method:
    randomForest(x, y=NULL, xtest=NULL, ytest=NULL, ntree=500,
                 mtry=if (!is.null(y) && !is.factor(y))
                 max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x))),
                 replace=TRUE, classwt=NULL, cutoff, strata,
                 sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),...

What I understand is that a random forest consists of many trees, each built on a bootstrap sample. If we sample with replacement, the sample size is nrow(x), which is the total number of observations, but some of the draws are duplicates. Long story short, if we have 600 observations in total, the sample will contain 400 unique observations and 200 duplicates. So each decision tree is trained on a randomly selected set of observations (i.e. 400 unique observations) drawn with replacement from the whole training set (i.e. 600 observations). This process is known as bootstrapping. The observations that are not selected to train a given tree are called out-of-bag.

Please correct me if I am wrong.
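The 400/200 figures above can be sanity-checked with a small simulation. This is only a minimal Python sketch of drawing nrow(x) indices with replacement, not the randomForest implementation; the function name `bootstrap_split` and the seed are made up for illustration. The expected number of unique observations in one bootstrap sample of size $N$ is $N(1 - (1 - 1/N)^N) \approx 0.632 N$, so for 600 observations it is roughly 379 rather than 400, and the exact count varies from tree to tree.

```python
import random


def bootstrap_split(n, seed=0):
    """Draw n indices with replacement (hypothetical helper).

    Returns the set of distinct in-bag indices and the set of
    out-of-bag indices, i.e. those never drawn.
    """
    rng = random.Random(seed)
    drawn = [rng.randrange(n) for _ in range(n)]  # n draws, duplicates allowed
    in_bag = set(drawn)                           # distinct observations used
    out_of_bag = set(range(n)) - in_bag           # observations never drawn
    return in_bag, out_of_bag


n = 600
in_bag, oob = bootstrap_split(n)

# Expected number of distinct observations in one bootstrap sample.
expected_unique = n * (1 - (1 - 1 / n) ** n)

print("unique in-bag:", len(in_bag))
print("out-of-bag:", len(oob))
print("expected unique:", round(expected_unique, 1))
```

Changing the seed gives a different in-bag count each time, which is why every tree in the forest ends up with its own out-of-bag set.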

Also, in the else case (replace=FALSE), the sample size is ceiling(.632*nrow(x)). Do we still have out-of-bag observations in that case? And is each decision tree still trained on a randomly selected subset of the whole training set?

There are $N$ objects and you sample $.632 N$ of them without replacement. How many objects are not sampled? — Sycorax, Apr 20 '23 at 16:30
(1-0.632)N are not sampled. So, are these still out-of-bag observations in "else" case? Thanks! — shawn, Apr 20 '23 at 16:37
Yes, they're out-of-bag because they are not sampled & not used to train the tree. — Sycorax, Apr 20 '23 at 16:39
Thank you, Sycorax. Also, can you please verify if #1 is correct? thanks in advance — shawn, Apr 20 '23 at 16:47
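
The replace=FALSE case discussed in the comments above can be sketched the same way. Again this is only an illustration in Python under the assumption that the tree simply gets ceiling(.632*nrow(x)) distinct rows; the helper name `subsample_split` is hypothetical and nothing here calls randomForest itself.

```python
import math
import random


def subsample_split(n, fraction=0.632, seed=0):
    """Sample ceil(fraction * n) indices without replacement (hypothetical helper).

    Returns the in-bag set and the out-of-bag set.
    """
    rng = random.Random(seed)
    size = math.ceil(fraction * n)              # mirrors ceiling(.632*nrow(x))
    in_bag = set(rng.sample(range(n), size))    # all drawn indices are distinct
    out_of_bag = set(range(n)) - in_bag         # rows the tree never sees
    return in_bag, out_of_bag


n = 600
in_bag, oob = subsample_split(n)
print(len(in_bag), len(oob))  # 380 220
```

Unlike the bootstrap case, the in-bag and out-of-bag counts are fixed here (380 and 220 for 600 rows); only which rows land in each set changes from tree to tree.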
