
I am extracting features from time series data using different parameters and then combining all of the features into a single feature-based data set to perform classification.

If I wanted to create separate feature sets corresponding to the different parameters and train a classification model for each smaller feature set, how should the separate classification models be combined into a single overall model?

Is voting a better approach, or is there some variant of stacking for this kind of problem?

I am aware that in stacking or voting the data is generally the same and the classification models differ. In the above setting, should one of those approaches be applied, or is a combined feature set a better option?

Currently, I am using Random forests or Extra Trees as classifiers.

Thanks.


1 Answer


There aren't really strict rules for this - you could always try both majority voting and stacking and pick whichever works better based on a validation dataset.
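Here is a minimal sketch of hard majority voting across models trained on separate feature subsets. The toy data from `make_classification` and the three column slices are hypothetical stand-ins for the feature sets your different parameter settings produce:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data; the three column groups stand in for feature sets
# extracted with different parameter settings.
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
feature_sets = [slice(0, 10), slice(10, 20), slice(20, 30)]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One random forest per feature subset.
models = [
    RandomForestClassifier(n_estimators=200, random_state=0)
    .fit(X_train[:, cols], y_train)
    for cols in feature_sets
]

# Each model casts one vote per test sample; the most common label wins.
votes = np.stack([m.predict(X_test[:, cols])
                  for m, cols in zip(models, feature_sets)]).astype(int)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("majority-vote accuracy:", (majority == y_test).mean())
```

Note that scikit-learn's `VotingClassifier` expects every estimator to see the same input matrix, so models trained on different feature subsets are easiest to combine by hand as above (or by wrapping each base model in a `Pipeline` that selects its columns).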

For classification problems, people often use a logistic regression as the stacking meta-learner; if you want to try stacking, I would start with that approach. That is, feed the outputs of the different random forest classifiers into the logistic regression.
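A sketch of that stacking setup, under the same assumptions as above (toy data and hypothetical column slices): out-of-fold class probabilities from each per-feature-set random forest become the input features of a logistic regression meta-model, which avoids the meta-model seeing predictions the base models made on their own training data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
feature_sets = [slice(0, 10), slice(10, 20), slice(20, 30)]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [RandomForestClassifier(n_estimators=200, random_state=0)
               for _ in feature_sets]

# Out-of-fold probabilities on the training set, one block of columns
# per base model, concatenated into the meta-model's training matrix.
meta_train = np.hstack([
    cross_val_predict(m, X_train[:, cols], y_train,
                      cv=5, method="predict_proba")
    for m, cols in zip(base_models, feature_sets)
])

# Refit each base model on all training data for use at test time.
for m, cols in zip(base_models, feature_sets):
    m.fit(X_train[:, cols], y_train)
meta_test = np.hstack([m.predict_proba(X_test[:, cols])
                       for m, cols in zip(base_models, feature_sets)])

meta_model = LogisticRegression().fit(meta_train, y_train)
print("stacked accuracy:", meta_model.score(meta_test, y_test))
```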

I would naively assume, however, that using the combined feature set would give better performance. I've noticed anecdotally that the performance gains from stacking with an ensemble classifier as one of the base classifiers are rather marginal: it will probably be better to train a single ensemble classifier on the richer combined feature set, but this is all off the top of my head.