Resources to deal with non-stationary data

Question

I am currently reading Introduction to Statistical Learning and Applied Predictive Modelling. I'm also doing the Johns Hopkins course on Data Science. I haven't completed any of these yet, but one thing they seem to have in common is that they all deal with well-behaved data sets where the out-of-sample data behave the same as the in-sample data. In most instances this makes sense. In biological data, expressions are going to be similar or mostly the same.

But I am working with very non-stationary data, specifically financial data. To give an example, profitable companies usually perform better than unprofitable companies. Except in 1999, when unprofitable companies outperformed profitable companies by 98%. A seemingly conservative and safe strategy of going long (owning) profitable companies and going short (simplistically, owning negative numbers of shares of) unprofitable companies would have lost 98%. Over the next 3 years the markets rectified this with absolute vengeance: unprofitable companies were slaughtered. However, this is hard to model because of the non-stationary nature of the data.

Are there any good resources on the Internet or in book form on how to deal with non-stationary data? I don't know the answer to even the simplest of questions, for instance: should I use the most accurate models (random forest, SVM, etc.) or looser models?

What you describe in the example is not necessarily non-stationarity. It could potentially be explained by a stationary process with an exogenous shock. Also, your example sounds like an anecdote to me, not a real phenomenon. Maybe you should give a reference to a reputable source so that we know exactly what you're talking about. – Aksakal, Dec 10 '14 at 17:19
My understanding of non-stationary is that the rules that apply to one set of data or one time frame might not work for a different time frame or a different set of the same data. With biological data, for instance, the out-of-sample data is going to follow the same rules as the in-sample data. But with financial data, if the data is split on a time basis, the out-of-sample data will follow different rules than the in-sample data. This is what I mean by non-stationary, although there may be a better term for it. You are right, I am using an anecdote; it just seemed like an easy example. – Graeme, Dec 11 '14 at 17:16
Stationarity usually refers to whether the error mean and covariance change over time. The problem is that errors are not observable. We try to estimate them and get residuals. However, the residuals depend on the model specification. That's why it's often very difficult to establish stationarity of the data: it depends on your modeling view. The same data may look stationary or not depending on how you structured your model and variables. – Aksakal, Dec 11 '14 at 18:33

score 2 · Answer 1 · answered Dec 10 '14 at 17:29

Most of the time, people deal with nonstationarity by first identifying the nature of the nonstationarity and then applying a transformation that makes the new data stationary. For example, stock prices are nonstationary, but daily returns are typically stationary. As another example, trending data are nonstationary, so people de-trend the data.
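To make the two transformations above concrete, here is a minimal sketch using only the Python standard library. The simulated price series, its parameters, and the helper names (`simulate_prices`, `log_returns`, `detrend`, `half_means`) are assumptions for illustration, not real market data or anyone's established method. Comparing the mean of each half of a series is only a rough eyeball check, not a formal stationarity test.

```python
import math
import random
import statistics

def simulate_prices(n, start=100.0, drift=0.0005, daily_vol=0.01, seed=0):
    """Geometric random walk: a hypothetical stand-in for a stock price series."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(drift, daily_vol)))
    return prices

def log_returns(prices):
    """Difference the log prices: r_t = log(p_t / p_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def detrend(series):
    """Subtract a least-squares straight line fitted against the time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

def half_means(series):
    """Mean of the first and second half, as a crude check for a drifting level."""
    mid = len(series) // 2
    return statistics.fmean(series[:mid]), statistics.fmean(series[mid:])

prices = simulate_prices(2000)
returns = log_returns(prices)
print("price means by half: ", half_means(prices))
print("return means by half:", half_means(returns))
```

On a simulated series like this, the two halves of the price series will usually have quite different means, while the two halves of the return series should be close to each other. Real returns can still show changing volatility, so this kind of transformation is a starting point rather than a guarantee.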

Check out Elements of Forecasting by FX Diebold.

EDIT: Additionally, the commenter is right: your situation doesn't necessarily imply nonstationarity; it could just be a negative shock. You might consider looking up ARIMA processes and learning more about the technical definition of nonstationarity in time series.

wolfsatthedoor

I think I may have used the term non-stationary incorrectly. I understood it to refer to data where the rules underlying the data change over time. Although I gave a single example above which could be understood as an exogenous shock, in fact every year could be defined by its own atypical behaviour. The regular 'exogenous shocks' are an implicit part of normal market behaviour. – Graeme Dec 11 '14 at 17:28
The tools to deal with so-called "regime changes" are pretty complicated. Check out Hidden Markov Models and Kalman Filters. – wolfsatthedoor Dec 11 '14 at 18:02
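As a rough illustration of the Kalman filter idea mentioned in the comment above, here is a minimal one-dimensional sketch in plain Python. It assumes a simple "local level" model, where a hidden level follows a random walk and each observation is that level plus noise. The function name `kalman_local_level`, the variance settings, and the simulated jump from 0 to 1 are all hypothetical choices for the example, not something taken from the thread.

```python
import random

def kalman_local_level(observations, process_var, obs_var,
                       init_mean=0.0, init_var=1e6):
    """Filter the model level_t = level_{t-1} + w_t, y_t = level_t + v_t.

    process_var is the assumed variance of w_t (how fast the level may move);
    obs_var is the assumed variance of v_t (how noisy each observation is).
    Returns the filtered estimate of the level after each observation.
    """
    mean, var = init_mean, init_var
    estimates = []
    for y in observations:
        var += process_var              # predict: the level may have drifted
        gain = var / (var + obs_var)    # weight given to the new observation
        mean += gain * (y - mean)       # update: move toward the observation
        var *= (1.0 - gain)             # uncertainty shrinks after the update
        estimates.append(mean)
    return estimates

# Simulated series whose underlying level jumps from 0 to 1 halfway through.
rng = random.Random(1)
true_level = [0.0] * 100 + [1.0] * 100
observed = [level + rng.gauss(0.0, 0.2) for level in true_level]
estimates = kalman_local_level(observed, process_var=0.01, obs_var=0.04)
print("estimate just before the jump:", estimates[99])
print("estimate at the end:          ", estimates[-1])
```

A larger `process_var` relative to `obs_var` makes the filter adapt faster after a shift but also makes it chase noise more. A Hidden Markov Model takes a different approach by modelling a small number of discrete regimes instead of a continuously drifting level.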
