This is a fairly generic question. Suppose I want to build a model that predicts the next observation from the previous $N$ observations ($N$ can be a parameter to tune experimentally). So we basically have a sliding window of input features from which to predict the next observation.
I can use a Hidden Markov Model approach: Baum-Welch to estimate the model parameters, then Viterbi to infer the current state from the last $N$ observations, then predict the most likely next state given the current state, and finally predict the next observation from the most likely next state and the HMM parameters (or a variant, such as computing the full predictive distribution of the next observation).
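To make the "predictive distribution" variant concrete, here is a minimal sketch of what I have in mind. It assumes a discrete-emission HMM whose parameters have already been estimated (e.g. by Baum-Welch), and it uses forward filtering over the window rather than Viterbi. The function name `predict_next_observation` and the toy parameter values are made up for illustration; plain Python is used only to keep it dependency-free.

```python
def predict_next_observation(pi, A, B, window):
    """Return p(next observation | window) for a discrete-emission HMM.

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][m] : probability of emitting symbol m in state i
    window  : the last N observed symbols, as integers in 0..M-1
    """
    n_states = len(pi)
    n_symbols = len(B[0])

    def normalize(v):
        # Raises ZeroDivisionError if the window is impossible under the model.
        total = sum(v)
        return [x / total for x in v]

    def step(belief):
        # Push a state distribution one step through the transition matrix.
        return [sum(belief[i] * A[i][j] for i in range(n_states))
                for j in range(n_states)]

    # Forward filtering: belief[i] = p(current state = i | observations so far).
    belief = normalize([pi[i] * B[i][window[0]] for i in range(n_states)])
    for obs in window[1:]:
        predicted = step(belief)
        belief = normalize([predicted[j] * B[j][obs] for j in range(n_states)])

    # One-step-ahead state distribution, then marginalize over the states.
    next_state = step(belief)
    return [sum(next_state[j] * B[j][m] for j in range(n_states))
            for m in range(n_symbols)]


# Toy two-state, two-symbol model (hypothetical numbers).
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.2], [0.3, 0.7]]
print(predict_next_observation(pi, A, B, [0, 0, 1]))
```

The Viterbi-based variant described above would instead commit to a single most likely current state before predicting, which discards the uncertainty that the filtered belief keeps.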
Or I can use a much simpler approach: a stateless model that takes the previous $N$ observations as input, e.g. an SVM, linear regression, splines, regression trees, nearest neighbors, etc. Such models are typically fit by minimizing some prediction error over the training set and are therefore, conceptually, much simpler than a hidden-state model.
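As an illustration of the stateless setup, here is a rough sketch using nearest neighbors on sliding windows, assuming a one-dimensional numeric series. The helper names `make_windows` and `knn_predict`, the squared-Euclidean distance, and the default `k` are my own choices for the example; any of the other regressors listed above could be dropped in on the same (window, next value) pairs.

```python
def make_windows(series, n):
    """Turn a series into (previous n values, next value) training pairs."""
    return [(series[i:i + n], series[i + n]) for i in range(len(series) - n)]


def knn_predict(series, n, k=3):
    """Predict the value following `series` from its last n values.

    Averages the targets of the k training windows that are closest
    (squared Euclidean distance) to the most recent window.
    """
    pairs = make_windows(series, n)
    query = series[-n:]

    def sq_dist(window):
        return sum((a - b) ** 2 for a, b in zip(window, query))

    nearest = sorted(pairs, key=lambda pair: sq_dist(pair[0]))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy periodic series: the last window [1, 2] has always been followed by 3.
series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
print(knn_predict(series, n=2, k=1))
```

Here both $N$ (`n`) and `k` would be tuned on a validation set, in line with treating the window length as a parameter.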
Can someone share their experience in dealing with such a modelling choice? What would speak in favour of the HMM, and what in favour of a regression approach? Intuitively, one should take the simplest possible model to avoid over-fitting, which speaks in favour of a stateless approach. We also have to consider that both approaches get the same input data for training. I think this implies that unless we incorporate additional domain knowledge into the hidden-state model (e.g. by fixing certain states and transition probabilities), there is no reason why a hidden-state model should perform better. In the end one can of course try both approaches and see which performs better on a validation set, but some heuristics based on practical experience would also be helpful.
Note: for me it is important to predict only certain events. I prefer a model that predicts a few "interesting/rare" events well over one that predicts "average/frequent" events well but the interesting ones poorly. Perhaps this has implications for the modelling choice. Thanks.