
The two results I got for bagging and random forest are shown below. Calculating the mean MSE from bootstrapping also gives a lower mean MSE for bagging than for random forest. Is bagging the better predictive model in this case?

best_bag_model_all

Call:
 randomForest(formula = Balance ~ ., data = bank, mtry = 60, importance = TRUE)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 60

          Mean of squared residuals: 259811.3
                    % Var explained: 97.8

best_rf_model_all

Call:
 randomForest(formula = Balance ~ ., data = bank, mtry = 8, importance = TRUE)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 8

          Mean of squared residuals: 279642.4
                    % Var explained: 97.63

1 Answer


Because you are not splitting the data into training and test sets. Bagging will always give a better in-sample fit, because it considers every predictor at each split (mtry = 60 here), while a random forest, which restricts each split to a random subset of predictors (mtry = 8 here), tends to give better out-of-sample predictions.
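A sketch of what such a split might look like, staying with the randomForest calls from the question. The data frame bank, the response Balance, and the mtry values are taken from the output above; the 70/30 split fraction is my own arbitrary choice:

```r
library(randomForest)
set.seed(1)

# Hold out 30% of the rows as a test set (split fraction is arbitrary)
train <- sample(nrow(bank), size = floor(0.7 * nrow(bank)))

# Bagging: mtry equals the number of predictors, so every split sees all variables
bag <- randomForest(Balance ~ ., data = bank[train, ], mtry = 60)
# Random forest: only a random subset of predictors is tried at each split
rf  <- randomForest(Balance ~ ., data = bank[train, ], mtry = 8)

# Compare test-set MSE rather than the fit on the data the models were trained on
mean((predict(bag, newdata = bank[-train, ]) - bank$Balance[-train])^2)
mean((predict(rf,  newdata = bank[-train, ]) - bank$Balance[-train])^2)
```

Comparing the two test-set MSE values (rather than the training-set output printed above) is what tells you which model actually predicts better on unseen data.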