I have trained a couple of models that I'm experimenting with: one is Logistic Regression and the other a Random Forest. My dataset has tens of thousands of samples (with 4 features), and I've experimented with how many samples give the best out-of-sample test accuracy. I have done 10-fold cross-validation and some grid-search optimisation of hyperparameters, and I'm consistently(**) getting about 82% accuracy on test data. I split the dataset 70:30, train on the 70% and then test on the unseen 30%; both models give me roughly 82% accuracy on that test set. I thought this was a good result, and since the cross-validation scores line up with the test accuracy, I assumed I was neither overfitting nor underfitting. But I must be ...
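For reference, this is roughly the setup I described, sketched with scikit-learn (the data here is just a synthetic placeholder for my real 4-feature dataset, and the hyperparameter grids are illustrative, not the exact ones I searched):

```python
# Roughly the setup described above, sketched with scikit-learn.
# X, y are a synthetic placeholder for my real dataset (tens of thousands of
# samples, 4 features); the parameter grids are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=20000, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

# 70:30 split, models tuned on the 70% and scored on the unseen 30%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

candidates = {
    "logistic regression": (LogisticRegression(max_iter=1000),
                            {"C": [0.01, 0.1, 1, 10]}),
    "random forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [100, 300], "max_depth": [None, 10]}),
}

for name, (model, grid) in candidates.items():
    # 10-fold cross-validation + grid search on the training portion only
    search = GridSearchCV(model, grid, cv=10, scoring="accuracy")
    search.fit(X_train, y_train)
    print(name, "CV accuracy:", search.best_score_,
          "test accuracy:", accuracy_score(y_test, search.predict(X_test)))
```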
... when I try predicting on new data samples captured very soon after I train the model ... I am getting nowhere near 82% accuracy. In fact, I'm getting a success rate below 40% when I compare the model's predictions with the outcomes that actually transpire.
So I guess my model does not generalise well. Where can I go from here? I would first of all like to confirm what exactly the problem is. Is the 82% accuracy misleading? How can my live results be so much worse? Could it be that the 4 features are simply not good enough? If so, how can I still get 82% accuracy in testing? Are there tests I can run on the model(s) to gain insights for further work?
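One check I can think of for confirming the problem (again just a scikit-learn sketch with placeholder data; I haven't actually run this) is to re-evaluate with a chronological split instead of a random one: train on the earliest 70% of samples and test on the most recent 30%, which should mimic the live situation more closely. If that score drops well below 82%, would that confirm the random split is the misleading part?

```python
# Idea for confirming the problem, not something I've run yet: evaluate with a
# chronological split so the test set looks like "new data captured after
# training". Assumes scikit-learn; X, y below are a synthetic placeholder for
# my real samples, which would be sorted by capture time (oldest first).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=20000, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

split = int(0.7 * len(X))             # no shuffling: earliest 70% vs most recent 30%
X_old, X_new = X[:split], X[split:]
y_old, y_new = y[:split], y[split:]

model = RandomForestClassifier(random_state=0)
model.fit(X_old, y_old)

print("chronological test accuracy:",
      accuracy_score(y_new, model.predict(X_new)))
```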
(** I retrain the model quite often as new data arrives in real time.)