In all the (regression) random forest papers I've read, when it comes time to aggregate the predictions of the individual trees, we take the average value as the forest's prediction.
My question is: why do we do that?
Is there a statistical justification for taking the average?
EDIT: To clarify the question, I know it's possible to use other aggregation functions (we use the mode for classification); I'm mostly interested in whether there is some theoretical justification behind the choice of the average.
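To make the question concrete, here is a minimal sketch (with made-up per-tree predictions) of the standard statistical argument I've seen hinted at: under squared-error loss, the arithmetic mean is the point estimate that minimizes the total loss over the tree predictions, whereas, say, the median minimizes absolute loss instead. Is this the justification, or is there something deeper?

```python
from statistics import mean, median

# Hypothetical predictions from four fitted regression trees for one input.
tree_preds = [4.2, 3.8, 4.0, 4.9]

def sq_loss(c, preds):
    """Total squared-error loss of summarizing preds with the single value c."""
    return sum((p - c) ** 2 for p in preds)

m = mean(tree_preds)

# The mean minimizes squared loss among all candidate summaries,
# including the median:
assert sq_loss(m, tree_preds) <= sq_loss(median(tree_preds), tree_preds)
```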
Here's a reference on the probabilities vs. class labels part: http://sebastianraschka.com/Articles/2014_ensemble_classifier.html#2-prediction-based-on-predicted-probabilities-equal-weights-weights111 – PauAI Feb 13 '18 at 22:53