I am taking a course that introduced me to sklearn.ensemble.RandomForestClassifier. The course first uses n_estimators at its default value of 10, and the resulting accuracy comes out around 0.28. If I change n_estimators to 15, the accuracy goes up to 0.32.
Here's some of the code:
pl = Pipeline([
    ('union', FeatureUnion(
        transformer_list=[
            ('numeric_features', Pipeline([
                ('selector', get_numeric_data),
                ('imputer', Imputer())
            ])),
            ('text_features', Pipeline([
                ('selector', get_text_data),
                ('vectorizer', CountVectorizer())
            ]))
        ]
    )),
    ('clf', RandomForestClassifier())
])
I thought that increasing the number of trees (n_estimators) in the RandomForestClassifier would give better accuracy, but even with a value of 100 I sometimes only get between 0.30 and 0.32. Could someone please explain? How do you find the smallest value of n_estimators that gives the highest possible accuracy?
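One common way to approach the last question is to sweep n_estimators over a grid and score each setting with cross-validation. The sketch below uses a built-in toy dataset and a bare classifier as stand-ins, since the original data and selector functions aren't shown:

```python
# Hypothetical sketch: sweep n_estimators and score with 5-fold CV.
# load_digits stands in for the original dataset, which isn't shown.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

for n in [10, 25, 50, 100, 200]:
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"n_estimators={n}: mean accuracy {scores.mean():.3f}")
```

Typically the curve rises quickly and then flattens; past the plateau, extra trees cost training time without a reliable accuracy gain, so the "smallest good value" is roughly where the curve levels off.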
There is no n_elements argument in sklearn's RandomForestClassifier; if you mean n_estimators, this has a default value of 100, and not 10. Please clarify, as your shown code is actually irrelevant to the question. – desertnaut Oct 19 '20 at 23:44

The general rule with n_estimators is that more trees reduces variance in the predictions (and takes more time to train). Any other apparent effect on performance is only due to random effects. https://datascience.stackexchange.com/q/1028/55122 – Ben Reiniger Oct 20 '20 at 14:10
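The variance-reduction point in the comments can be checked with a quick experiment: refit the same forest under different random seeds and compare the spread of test accuracies for a small versus a large ensemble. Everything here (the synthetic dataset, the seed range) is illustrative, not from the original question:

```python
# Illustrative experiment: more trees -> lower variance across seeds.
# make_classification is a synthetic stand-in for the real data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def accuracy_spread(n_trees, seeds=range(10)):
    # Refit the same forest with different seeds; the standard deviation
    # of the resulting accuracies reflects prediction variance.
    accs = [RandomForestClassifier(n_estimators=n_trees, random_state=s)
            .fit(X_tr, y_tr).score(X_te, y_te) for s in seeds]
    return np.std(accs)

print("spread with 10 trees: ", accuracy_spread(10))
print("spread with 200 trees:", accuracy_spread(200))
```

The spread with 200 trees is usually noticeably smaller than with 10, which is why a single accuracy reading at n_estimators=100 can land anywhere in a narrow band (e.g. 0.30–0.32) without meaning the larger forest is worse.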