
I'm interested in building a super learner. Unfortunately, I have access to the submodels' predictions on new data, but not necessarily to their out-of-bag (OOB) predictions on the original training data, or to the training data itself. Can I still train a linear regression on the predictions I do have and treat the result as a model stack/ensemble?

Perhaps a better way to ask this: is there any difference between training a meta-model on a submodel's predictions for new data and training it on the submodel's OOB predictions for its own training data?

Nate



As long as you have the prediction target for the new data (and assuming that the new data is as relevant to the original problem, that there is no overlap between the new data and the original training data, etc.), there's no obvious reason why you couldn't train a new model on this new data, using the outputs of the original models as inputs. In fact, doing so slightly reduces the overfitting concerns that are always a bit of a worry when stacking on out-of-fold (or out-of-bag) predictions, due, amongst other things, to hyperparameters often being tuned on the out-of-fold performance.
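A minimal sketch of that idea, assuming you have a matrix of the submodels' predictions on the new data plus the matching targets. The data below are simulated stand-ins, and `fit_stacker` / `predict_stacker` are hypothetical helper names. The meta-learner here is plain least squares with an intercept (non-negative least squares is a common alternative), and part of the new data is held out so the stack is scored on rows it was not fitted on.

```python
import numpy as np

def fit_stacker(base_preds, y):
    """Fit y ~ b0 + sum_j w_j * base_preds[:, j] by ordinary least squares."""
    X = np.column_stack([np.ones(len(y)), base_preds])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_stacker(coef, base_preds):
    """Apply the fitted linear meta-learner to new submodel predictions."""
    X = np.column_stack([np.ones(base_preds.shape[0]), base_preds])
    return X @ coef

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# Simulated stand-in for "new data": targets plus two hypothetical submodels'
# predictions (noisy versions of the target with different noise levels).
rng = np.random.default_rng(0)
n = 1000
y = rng.normal(size=n)
preds = np.column_stack([
    y + rng.normal(scale=0.5, size=n),  # more accurate submodel
    y + rng.normal(scale=1.0, size=n),  # less accurate submodel
])

# Fit the stack on one half of the new data, evaluate on the other half.
idx = rng.permutation(n)
fit_idx, eval_idx = idx[: n // 2], idx[n // 2 :]
coef = fit_stacker(preds[fit_idx], y[fit_idx])

stack_mse = mse(predict_stacker(coef, preds[eval_idx]), y[eval_idx])
best_single = min(mse(preds[eval_idx, j], y[eval_idx]) for j in range(preds.shape[1]))
print("intercept and weights:", coef)
print("stacked MSE:", stack_mse, "best single-model MSE:", best_single)
```

If the new data set is small, the same fit/evaluate split could be repeated as a K-fold loop over the new data rather than a single half/half split.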

Björn