The title essentially says it all. Below are some details regarding my data and model.
This is the current class distribution within my training set:
0 1353849
1 26217
Name: binary, dtype: int64
My training set includes 104 features.
My current recall is 94% and my current precision is 20%.
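To put those two numbers in context, here is a rough back-of-the-envelope sketch of what they imply in raw counts. It assumes the metrics were measured on data with the same class balance as the training set above, which may not hold for a separate validation set. All variable names are illustrative.

```python
# Class counts taken from the distribution above.
neg, pos = 1_353_849, 26_217
total = neg + pos

# Share of positives: the precision you would get by flagging every row.
prevalence = pos / total
# Negatives per positive.
ratio = neg / pos

# Reported metrics (ASSUMPTION: measured at this same class balance).
recall, precision = 0.94, 0.20

# recall = TP / pos, so TP = recall * pos.
tp = recall * pos
# precision = TP / (TP + FP), so FP = TP * (1 - precision) / precision.
fp = tp * (1 - precision) / precision
# Fraction of all negatives that get flagged as positive.
fpr = fp / neg

print(f"prevalence          = {prevalence:.4f}")
print(f"negatives/positive  = {ratio:.1f}")
print(f"implied TP          = {tp:.0f}")
print(f"implied FP          = {fp:.0f}")
print(f"implied FP rate     = {fpr:.4f}")
```

Under that assumption, 20% precision is roughly ten times the 1.9% base rate, and it corresponds to a false positive rate of about 7%. With about 52 negatives per positive, even a small false positive rate produces several false alarms for every true hit.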
Here are the hyperparameters for my XGBoost model:
nrounds = 500, eta = 0.2, max_depth = 20, subsample = 0.8, colsample_bytree = 0.2, reg_alpha = 0.1, reg_lambda = 0.8
I've tried SMOTE, but it isn't working well, likely because of the high dimensionality. If you have any recommendations, they would be much appreciated.
- Are you looking at a single precision/recall point? Precision and recall actually trace out a curve, and where you land on it depends on the threshold you choose.
- Have you tried hyperparameter tuning? See https://stats.stackexchange.com/questions/171043/how-to-tune-hyperparameters-of-xgboost-trees
– Alex R. Feb 27 '18 at 19:40
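The threshold point in the first comment can be illustrated with a minimal pure-Python sketch. The labels and scores below are made-up toy values, not output from the model in the question, and the helper names are hypothetical. In practice `sklearn.metrics.precision_recall_curve` computes the whole curve from true labels and predicted probabilities.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting 1 for every score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lowest_threshold_for_precision(y_true, scores, min_precision):
    """Lowest threshold whose precision meets the target, or None.

    Recall never increases as the threshold rises, so the lowest
    qualifying threshold also keeps the most recall.
    """
    for t in sorted(set(scores)):
        p, r = precision_recall_at(y_true, scores, t)
        if p >= min_precision:
            return t, p, r
    return None


# Toy data (ASSUMPTION: purely illustrative labels and predicted probabilities).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]

print(precision_recall_at(y_true, scores, 0.5))   # default-style cutoff
print(precision_recall_at(y_true, scores, 0.8))   # stricter cutoff
print(lowest_threshold_for_precision(y_true, scores, 0.75))
```

On this toy data, raising the cutoff from 0.5 to 0.8 lifts precision from 0.6 to 1.0 while recall drops from 1.0 to about 0.67. A model with 94% recall and 20% precision may have similar room to trade recall for precision without any retraining.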