
I have a dataframe with several numeric predictive features and one binary target. 63% of the rows belong to class 0 and the rest to class 1, so the data is imbalanced, but not severely.

I aim to build a classifier using random forests or xgboost.

I'm using cross validation with GridSearchCV. I tried with and without feature selection (for both algorithms), and I tuned the hyperparameters as much as I could, trying many different values for each parameter (max_depth, min_samples_leaf, and more) via the param_grid argument.

I'm using roc_auc_score to evaluate the models, and the best outcome so far was AUC = 0.61 in both algorithms. What could be the reason for such poor results in your opinion?
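
For reference, here is a minimal sketch of my setup. The synthetic data is just a stand-in for my real dataframe (with roughly the same 63/37 class balance), and the grid values are examples of the ranges I searched, not my exact grid:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data with roughly the same 63/37 class balance as my dataframe.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.63],
                           random_state=0)

# Example parameter ranges; my real grid was larger.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",  # same metric I report above
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)  # cross-validated AUC of the best candidate
```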

EDIT:

I also tried kernel SVM and logistic regression. They did worse, with AUC = 0.6 and 0.55 respectively. Random forests and gradient boosting machines (including xgboost) were the best among the models I tried, although they still produce poor results. I still can't figure out what improvements I can make.
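
Here is a rough sketch of how I compared the additional models, using the same kind of stand-in data as above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same stand-in data as in the sketch above.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.63],
                           random_state=0)

models = {
    # Feature scaling matters for SVM and logistic regression,
    # so both sit behind a StandardScaler in a pipeline.
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "rbf_svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

for name, model in models.items():
    # The roc_auc scorer falls back to decision_function for SVC,
    # so probability=True is not needed.
    scores = cross_val_score(model, X, y, scoring="roc_auc", cv=5)
    print(name, scores.mean())
```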
