Strategies for dealing with near zero variance

Question

I am trying to build a predictive model for future stock returns. At a high level, I'd like to explore the idea that the stock market is dynamic, so a predictive model should shift and evolve through time.

What I've been considering is fitting one model (e.g. a random forest or an SVM) per month. To predict, I would average the predictions of the models from the last 12 months. For example, I might take the independent variables as of January 2015, feed them into each of the last 12 models, and average the predictions for each stock. For this to work, all 12 models must use the same set of variables.

Here's the problem I'm running into. I've been studying Max Kuhn's Applied Predictive Modeling and am aware of the issues caused by variables with near zero variance (nzv). I start with almost 700 features. In the first month, about 14 have nzv, and I'm thinking about removing them. But in later months a different set will have nzv. If a variable consistently has nzv, I don't mind removing it. But I can't remove a variable one month and then use it the next, because the predict function only works if the set of variables is identical across models.

Would you leave the nzv variables in or remove them?
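To make the setup concrete, here is a minimal Python sketch of one option I could imagine: flag nzv per month, then drop only the features that are flagged in a chosen fraction of months, so every monthly model sees the same columns. The function names, the `min_fraction` parameter, and the toy data are all hypothetical. The flagging rule (frequency ratio above 95/5 and at most 10% unique values) is my understanding of the defaults in caret's `nearZeroVar`, reimplemented by hand, so treat it as an assumption rather than the library's exact behavior.

```python
from collections import Counter


def is_near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a column as nzv (assumed rule, modeled on caret's defaults)."""
    top_two = Counter(values).most_common(2)
    if len(top_two) < 2:
        return True  # zero or one distinct value: no variance at all
    # Ratio of the most common value's count to the second most common's.
    freq_ratio = top_two[0][1] / top_two[1][1]
    # Distinct values as a percentage of the number of observations.
    percent_unique = 100.0 * len(set(values)) / len(values)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


def consistently_nzv(monthly_data, min_fraction=1.0):
    """Return the features flagged in at least min_fraction of the months.

    monthly_data maps a month label to {feature name: list of values}.
    """
    months = list(monthly_data.values())
    features = list(months[0].keys())
    flagged = Counter()
    for month in months:
        for name in features:
            if is_near_zero_variance(month[name]):
                flagged[name] += 1
    return {n for n in features if flagged[n] / len(months) >= min_fraction}


def average_prediction(models, row):
    """Average the predictions of several models for one observation."""
    return sum(model(row) for model in models) / len(models)


# Toy data (hypothetical): feature "a" is nzv in two of three months.
monthly_data = {
    "2014-01": {"a": [0] * 99 + [1], "b": list(range(100))},
    "2014-02": {"a": [0] * 100, "b": list(range(100))},
    "2014-03": {"a": [0] * 50 + [1] * 50, "b": list(range(100))},
}

# Drop a feature only if it is nzv in at least 60% of the months,
# then use the same remaining columns for every monthly model.
drop = consistently_nzv(monthly_data, min_fraction=0.6)
keep = [name for name in monthly_data["2014-01"] if name not in drop]
```

With `min_fraction=1.0` a feature is removed only if it is nzv in every month, which leaves occasional nzv features in some monthly models. Lower values remove more features everywhere. Which trade-off is better is exactly what I'm unsure about.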


0 Answers