The key idea is the bagging procedure, not making the trees random per se. In detail, each tree is built on a bootstrap sample of objects drawn with replacement from the original training set; thus each tree leaves out some objects it has never seen, which makes the ensemble more heterogeneous and therefore better at generalizing.
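To make the "objects it hasn't seen" point concrete, here is a minimal sketch of the bootstrap step in plain NumPy (the code is my illustration, not taken from any RF implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                               # size of the original training set
indices = rng.integers(0, n, size=n)   # draw n objects with replacement
in_bag = np.unique(indices)            # objects this tree actually sees
oob_fraction = 1 - in_bag.size / n     # objects it never sees ("out-of-bag")
print(f"out-of-bag fraction: {oob_fraction:.3f}")  # ~0.368, since (1 - 1/n)^n -> 1/e
```

Each tree therefore misses roughly a third of the data, and those out-of-bag objects can also be used to estimate the ensemble's generalization error for free.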
Furthermore, the trees are weakened so that at each split only M (or mtry) randomly selected attributes are considered; M is usually the square root of the number of attributes in the set. Since the trees are grown unpruned, this randomization is what keeps them from overfitting too badly. You can find more details here.
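As a hedged sketch of how this looks in practice (I'm assuming scikit-learn here; the answer itself doesn't name a library), `max_features='sqrt'` plays the role of mtry:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data: make_classification is purely illustrative.
X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# max_features='sqrt' considers ~sqrt(25) = 5 randomly chosen attributes per split;
# oob_score=True reuses the out-of-bag objects described above as a validation set.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)
print(f"OOB accuracy: {rf.oob_score_:.3f}")
```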
On the other hand, there is a variant of RF called Extremely Randomized Trees (Extra-Trees), in which the trees are built in an even more random way (split thresholds are drawn at random rather than optimized) -- consult, I think, this reference.
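A corresponding sketch, again assuming scikit-learn; the practical difference from the block above is that split thresholds are sampled at random and, by default, no bootstrap sampling is used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Same illustrative toy data as before.
X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# Extra-Trees: for each candidate attribute a split threshold is drawn at random
# and the best of these random splits is kept; bootstrap=False is the default.
et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt", random_state=0)
et.fit(X, y)
print(f"training accuracy: {et.score(X, y):.3f}")
```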
Can you be more precise about where to find the details "here"?
– robin girard Jul 22 '10 at 10:04