I would really appreciate it if someone could explain the following. Thanks in advance!
1) Here is the function usage:
    ## S3 method for class 'formula'
    randomForest(formula, data=NULL, ..., subset, na.action=na.fail)
    ## Default S3 method:
    randomForest(x, y=NULL, xtest=NULL, ytest=NULL, ntree=500,
                 mtry=if (!is.null(y) && !is.factor(y))
                 max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x))),
                 replace=TRUE, classwt=NULL, cutoff, strata,
                 sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),...
My understanding is that a random forest consists of many trees, each built on a bootstrap sample. If we sample with replacement, the sample size is nrow(x), the total number of observations, but some of the draws are duplicates. For example, with 600 observations in total, a sample might contain about 400 unique observations and 200 duplicates. Each decision tree is then trained on its own randomly drawn sample (i.e. those roughly 400 unique observations), taken with replacement from the whole training set (i.e. 600 observations). This process is known as bootstrapping. The observations that are not selected for a given tree are called out-of-bag.
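For reference, the bootstrap step described above can be simulated in base R without the randomForest package. This is only a sketch of the sampling idea, not the package's internal code. The names n, in_bag and oob are made up for illustration, and 600 is the example size from the text.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
n <- 600                           # example number of observations

# Draw n row indices with replacement
# (mirrors replace = TRUE with sampsize = nrow(x))
in_bag <- sample(n, size = n, replace = TRUE)

n_unique <- length(unique(in_bag))   # distinct rows that ended up in the sample
oob <- setdiff(seq_len(n), in_bag)   # rows never drawn: out-of-bag for this tree

cat("rows drawn: ", length(in_bag), "\n")
cat("unique rows:", n_unique, "\n")
cat("out-of-bag: ", length(oob), "\n")

# Expected fraction of distinct rows in a bootstrap sample of size n
cat("expected unique fraction:", 1 - (1 - 1/n)^n, "\n")
```

The expected fraction of distinct rows is about 0.632, so for 600 observations the unique count tends to land near 380 rather than 400, and it varies from one draw to the next. In a forest this draw would be repeated independently for every tree, so each tree has its own out-of-bag set.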
Please correct me if I am wrong.
- Also, in the else branch (replace=FALSE), the sample size is ceiling(.632*nrow(x)). Do we still have out-of-bag observations in that case? And is each decision tree still trained on a randomly selected subset of observations from the whole training set?
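The replace=FALSE case can be sketched the same way. Assuming the default sampsize = ceiling(.632*nrow(x)) from the usage above, each tree would get a subsample of distinct rows, and whatever is left out would be out-of-bag for that tree. Again this uses base R only, with hypothetical names, and is not the package's own code.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
n <- 600                           # example number of observations

# Default subsample size when sampling without replacement
sampsize <- ceiling(0.632 * n)     # 380 for n = 600

# Draw sampsize distinct row indices (no duplicates possible)
in_bag <- sample(n, size = sampsize, replace = FALSE)
oob <- setdiff(seq_len(n), in_bag) # rows left out: out-of-bag for this tree

cat("rows drawn: ", length(in_bag), "\n")
cat("unique rows:", length(unique(in_bag)), "\n")
cat("out-of-bag: ", length(oob), "\n")
```

Under these assumptions there are still out-of-bag rows (220 of the 600 here), and the number is the same for every tree, whereas with replacement the out-of-bag count varies from sample to sample.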