I have a computer science background but I am trying to learn how to apply ML by solving small problems.
I have been working on this problem for the last couple of days and I cannot find a solution. I have a dataset with just 10 samples (5 belong to class A and 5 to class B) and 30000 features. I can reduce the number of features to about 100, and I would like to use a random forest to identify the most important features among those 100.
I split the dataset into a train set and a test set (test_size=0.20, so the training set is even smaller than the initial dataset). Unfortunately, and as expected, the model overfits. I tried tuning it with GridSearchCV over several parameters (max_depth, n_estimators, min_samples_leaf, criterion). However, I still get 100% accuracy. Is there anything else I can try?
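To make the sizes concrete, here is a minimal sketch of the split arithmetic. It assumes the test share is rounded up, which is my understanding of how scikit-learn's train_test_split treats a float test_size; the variable names are only for illustration.

```python
import math
from fractions import Fraction

n_samples = 10    # from the question: 5 of class A, 5 of class B
test_size = 0.20  # from the question

# Assumption: the test share is rounded up to a whole number of samples.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Every accuracy value a test set of this size can produce.
possible_accuracies = [float(Fraction(k, n_test)) for k in range(n_test + 1)]

# Probability that independent coin-flip guesses get every test sample right.
chance_all_correct = 0.5 ** n_test

print(n_train, n_test)        # 8 2
print(possible_accuracies)    # [0.0, 0.5, 1.0]
print(chance_all_correct)     # 0.25
```

With only 2 test samples the accuracy can only be 0%, 50%, or 100%, so a score of 100% on this split says very little on its own.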
Thank you in advance for your help.