How to select GARCH lag for forecasting purpose (AIC+likelihood ratio)?

Question

I modified this question to be more specific.

I'm fitting an ARIMA-GARCH model to the daily log return series of my hedge fund index. I used the ACF, PACF, Ljung-Box test and ARCH test to check for autocorrelation and conditional heteroskedasticity. The ACF and PACF of the returns themselves show no significant autocorrelation (those of the squared returns do), and the Ljung-Box test agrees (it returns h=0, i.e. the null of no autocorrelation is not rejected), so I exclude autocorrelation from the mean process. To double-check, I fitted an ARIMA(1,0,1), and neither the AR nor the MA coefficient is statistically significant. So I dropped them and went with a GARCH-only model.

I did check other similar questions about GARCH lag selection (for example, here), and it seems that for forecasting it is better to choose the model with the lowest AIC rather than the lowest BIC. So I first compare the AIC values and then check further using a likelihood ratio test.

Here are my results

I highlighted the two lowest values of AIC and of BIC. I then used likelihood ratio tests to compare the highlighted models; in every comparison GARCH(5,4) came out best at the 5% significance level. So my questions are:

1. I only tried lags up to 5, and it looks as though AIC would keep decreasing if I increased the lags further. Should I test more lags? I am also concerned that a model with many lags is not suitable for forecasting, which leads to my second question.
2. I'm new to this field, but as far as I know parsimony matters: too many parameters are undesirable because they can increase forecast error. In that case, should I just go for a GARCH(1,1), as suggested by the answer in the link I shared above? Or should I restrict the maximum lag, say to 2, which would give GARCH(2,2)? And how should I choose that bound?
3. Regarding AIC and BIC, I'm not sure whether I should compute them for the whole model, ARMA(p,q)-GARCH(p,q), or just for the GARCH(p,q) part. In this case there is no ARMA component, so I guess using only the GARCH part is sufficient. For other time series that do have an ARMA component I tried both: the whole model gave lower AIC and BIC values, but the ranking did not change. So does it matter whether I use the entire model to compute AIC and BIC?

Here are the results from MATLAB for ARMA(0,0)-GARCH(5,4), just to provide more information

I would appreciate it if anyone could help me with these questions! References are welcome as well :)

Please do not cross post at multiple Stack Exchange sites. Also, have you checked related questions, e.g. these? — Richard Hardy, Jun 18 '17 at 15:44
@RichardHardy I've deleted the other one and modified this post as well:) — nahsel, Jun 21 '17 at 10:13
You can also consider cross validation. It is better than information criteria. Also since you removed ARIMA terms, your forecasts are only interesting for asymmetric loss functions. — Cagdas Ozgenc, Jun 21 '17 at 12:50
Cross-validation is an alternative to information criteria. It is neither better nor worse in general, and it is not clear whether it would be better or worse in this example. But trying it out would do no harm. — Richard Hardy, Jun 22 '17 at 11:19
@RichardHardy In my experience with financial series, skipping lags doesn't do any good for out-of-sample forecasting. Information criteria are misleading most of the time because they suggest holes in the lag structure. In general, though, CV is better because it assumes much less than information criteria, which are very demanding. — Cagdas Ozgenc, Jun 22 '17 at 21:03
@CagdasOzgenc, thanks for sharing this real world experience. In general CV is inferior to information criteria in the time series setting in the sense that CV significantly reduces the sample size (as we discussed at some point before, and as Rob J. Hyndman agreed with me on this one), but that might be compensated if the assumptions of the information criteria are violated strongly enough. So if it worked for you in practice, that is a point in favour. — Richard Hardy, Jun 22 '17 at 21:29
@CagdasOzgenc thanks for the comments. By cross-validation, do you mean a rolling window? If so, I'm not sure how it works. Here is my understanding: choose a window length, estimate the model on the in-sample data in that window, produce an out-of-sample forecast, then roll the window forward and re-estimate the parameters each time. Then compare the forecasts with the observed out-of-sample data and choose the model that forecasts best as the model for this time series? — nahsel, Jun 23 '17 at 13:35
@RichardHardy Thanks! Regarding CV and information criteria, how large should the sample be for CV to give reasonable results? In other words, is there some minimum sample size below which it would be better to use information criteria instead of CV? — nahsel, Jun 23 '17 at 14:00
@nahsel, there is no general answer to that. All depends on how badly the assumptions are violated for AIC or BIC, and you do not know that unless you have generated the data yourself. — Richard Hardy, Jun 23 '17 at 14:01
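
The rolling-window procedure discussed in the comments above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the commenters' own code. Several things here are assumptions made for the example: the data are simulated, the two forecasters (a rolling sample variance and an EWMA with decay 0.94) are simple stand-ins for the GARCH(p,q) candidates one would actually re-estimate on each window, and the QLIKE loss is just one possible way of scoring variance forecasts against squared returns.

```python
import math
import random

def simulate_garch11(n, omega=1e-6, alpha=0.08, beta=0.90, seed=0):
    """Simulate zero-mean GARCH(1,1) returns (hypothetical parameters)."""
    rng = random.Random(seed)
    var = omega / (1 - alpha - beta)  # start at the unconditional variance
    out = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        out.append(r)
        var = omega + alpha * r * r + beta * var
    return out

def sample_variance_forecast(window):
    """Stand-in model 1: mean of squared returns in the window."""
    return sum(r * r for r in window) / len(window)

def ewma_forecast(window, lam=0.94):
    """Stand-in model 2: exponentially weighted moving average of squares."""
    var = window[0] ** 2
    for r in window[1:]:
        var = lam * var + (1 - lam) * r * r
    return var

def rolling_qlike(returns, forecaster, window_len):
    """Average one-step-ahead QLIKE loss over a rolling estimation window."""
    losses = []
    for t in range(window_len, len(returns)):
        # Only observations before t are used to forecast the variance at t.
        h = forecaster(returns[t - window_len:t])
        losses.append(math.log(h) + returns[t] ** 2 / h)
    return sum(losses) / len(losses)

returns = simulate_garch11(1500)
candidates = {"sample variance": sample_variance_forecast, "EWMA": ewma_forecast}
scores = {name: rolling_qlike(returns, f, 500) for name, f in candidates.items()}
best = min(scores, key=scores.get)  # lower average loss is better
print(scores, "->", best)
```

In a real application each forecaster would refit the candidate model on the window and return its one-step-ahead conditional variance. Note the trade-off raised in the comments: every window uses only part of the sample for estimation.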

score 1 · Accepted Answer · answered Jun 21 '17 at 11:43

In general, building an ARMA-GARCH model in a stepwise fashion based on diagnostics such as the ACF, PACF and Ljung-Box test is problematic, because these diagnostics do not have standard null distributions when applied to returns whose conditional variance is nonconstant, or to squared returns whose conditional mean is nonconstant. Thus the following will not work exactly as you expect (though hopefully the distortion is not too large and you can still trust the results to some extent):

I'm fitting an ARIMA-GARCH model to the daily log return series of my hedge fund index. I used the ACF, PACF, Ljung-Box test and ARCH test to check for autocorrelation and conditional heteroskedasticity. The ACF and PACF of the returns themselves show no significant autocorrelation (those of the squared returns do), and the Ljung-Box test agrees (it returns h=0, i.e. the null of no autocorrelation is not rejected), so I exclude autocorrelation from the mean process. To double-check, I fitted an ARIMA(1,0,1), and neither the AR nor the MA coefficient is statistically significant. So I dropped them and went with a GARCH-only model.

Going forward,

I did check other similar questions about GARCH lag selection (for example, here), and it seems that for forecasting it is better to choose the model with the lowest AIC rather than the lowest BIC. So I first compare the AIC values and then check further using a likelihood ratio test.

AIC, BIC and LR all address different questions and serve different goals. You should not expect all of them to point in the same direction, and you should choose the appropriate one based on your modelling goal. If the goal is forecasting, AIC is the most relevant choice.
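
To illustrate how the three can disagree, here is a small sketch using only the Python standard library. The log-likelihoods, parameter counts and sample size are made-up numbers chosen for illustration, not results from the question. The LR p-value assumes the usual chi-square approximation for nested models, which should be treated as approximate when variance parameters may lie on the boundary of the parameter space.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * loglik

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum
    total = 0.0
    term = math.sqrt(2.0 * x / math.pi)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def lr_test(loglik_restricted, loglik_full, extra_params):
    """Likelihood ratio statistic and approximate p-value for nested models."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical inputs: n observations, two nested zero-mean GARCH models.
n = 2500
small = {"loglik": 9000.0, "k": 3}  # e.g. GARCH(1,1): omega, alpha, beta
large = {"loglik": 9004.0, "k": 5}  # e.g. GARCH(2,2): two more parameters

for name, m in (("small", small), ("large", large)):
    print(name, aic(m["loglik"], m["k"]), bic(m["loglik"], m["k"], n))

stat, pval = lr_test(small["loglik"], large["loglik"], large["k"] - small["k"])
print("LR statistic:", stat, "approx. p-value:", pval)
```

With these made-up numbers AIC is lower for the larger model while BIC is lower for the smaller one, which is the kind of disagreement described above.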

Regarding your Q1, experience in finance tells us that high-order GARCH models do not tend to beat low-order GARCH models. I would stick to a relatively parsimonious model unless I had reasons to believe the time series is somehow special and unlike other financial time series. I do not see a sound theoretical reason to select a model that has higher AIC than another model (when there are not that many models being compared, like in your case), but the experience in finance points to a different solution.

Regarding your Q2, see above.

Regarding your Q3, it does matter that you consider the full model. Considering only part of the model does not make sense. (You could construct examples where you choose really poor models over much better models only because you happen to look at part of the picture instead of the whole picture.)
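
To make "the full model" concrete, here is a sketch under two assumptions that are not stated in the thread: Gaussian innovations, and the convention that q counts ARCH lags and p counts GARCH lags (conventions differ between packages, so check your software's documentation).

$$r_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$

$$\ln L = -\frac{1}{2} \sum_{t=1}^{n} \left[ \ln(2\pi) + \ln \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2} \right], \qquad \mathrm{AIC} = 2k - 2 \ln L$$

Here k counts every estimated parameter: those in the conditional mean \mu_t plus the 1 + p + q parameters of the variance equation. Because \sigma_t^2 depends on \varepsilon_t = r_t - \mu_t, the likelihood is joint in the mean and variance parameters, so the criteria are naturally computed from this single likelihood rather than from one part of the model.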

Thanks a lot, Richard, for the explanations! I'm still confused about a few points. You mentioned that building a model in a stepwise fashion with the ACF, PACF and Ljung-Box test is problematic, so why do quite a few researchers still use this method? For example, [Füss, R., Kaiser, D. & Adams, Z. J Deriv Hedge Funds (2007) 13: 2. doi:10.1057/palgrave.jdhf.1850048]. I also don't understand why the authors of this paper first use the LBQ test on residuals and squared residuals to show an ARCH effect, and later use the LBQ test on squared returns to show an ARCH effect again. — nahsel, Jun 23 '17 at 13:26
@nahsel, There is lots of technically wrong research on GARCH, especially when it comes to applications. Whether these technical mistakes have a large impact in practice is not always obvious. (I cannot tell whether the particular paper you refer to still gets a sensible result in the end or not.) Regarding the LB test, it is again the same story: it gets abused routinely in GARCH research. But when looking for ARCH effects, you can use it on squared returns or residuals as long as the conditional mean is constant over time. — Richard Hardy, Jun 23 '17 at 13:39
The paper only concludes that the model reduces the ARCH effect but does not eliminate it, so does that mean it is almost impossible to define a best-fitting model? It is common to still have ARCH effects after fitting an ARMA-GARCH model, and even if the model appears to eliminate them, that conclusion rests on diagnostic tests that may not give appropriate results if their assumptions have already been violated. Also, I see some people running the tests on standardized residuals. Is it necessary to standardize the residuals before running the tests? — nahsel, Jun 23 '17 at 13:53
Whether you will be able to get rid of ARCH effects depends on the data at hand. I have seen plenty of studies that claim to be able to do that. Now regarding standardized vs. non-standardized residuals, that could be a new question. But I will answer it here: the assumptions of ARMA-GARCH are on standardized residuals, thus you test whether these assumptions are met on the standardized residuals, not on something else. — Richard Hardy, Jun 23 '17 at 13:57
Thanks Richard! You explained very clearly, for now I have no more questions:) — nahsel, Jun 23 '17 at 14:11
@nahsel, I am glad I could help. The topic is quite complex, plus there are so many mistaken sources that it is very hard to keep the understanding straight. — Richard Hardy, Jun 23 '17 at 14:14
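
The point about standardized residuals in the comments above can be illustrated with a small sketch using only the Python standard library. Everything in it is an assumption made for the example: the residuals are simulated from a GARCH(1,1) with hypothetical parameters, and the same parameters are reused as if they were the fitted values. The Ljung-Box Q statistic is computed by hand, and no p-values are reported because, as the answer notes, its reference distribution is not standard in this setting.

```python
import math
import random

def garch11_variances(resid, omega, alpha, beta):
    """Conditional variances implied by GARCH(1,1) parameters and residuals."""
    var = omega / (1 - alpha - beta)  # start at the unconditional variance
    out = []
    for e in resid:
        out.append(var)
        var = omega + alpha * e * e + beta * var
    return out

def ljung_box_q(x, lags):
    """Ljung-Box Q = n(n+2) * sum_k rho_k^2 / (n-k) for k = 1..lags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        rho = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho * rho / (n - k)
    return n * (n + 2) * q

# Simulated residuals with hypothetical GARCH(1,1) parameters.
rng = random.Random(1)
omega, alpha, beta = 1e-6, 0.10, 0.85
var = omega / (1 - alpha - beta)
resid = []
for _ in range(3000):
    e = math.sqrt(var) * rng.gauss(0.0, 1.0)
    resid.append(e)
    var = omega + alpha * e * e + beta * var

# Standardize: divide each residual by its conditional standard deviation.
sigma2 = garch11_variances(resid, omega, alpha, beta)
z = [e / math.sqrt(s) for e, s in zip(resid, sigma2)]

q_raw = ljung_box_q([e * e for e in resid], 10)  # squared raw residuals
q_std = ljung_box_q([v * v for v in z], 10)      # squared standardized residuals
print("Q on squared raw residuals:         ", q_raw)
print("Q on squared standardized residuals:", q_std)
```

If the variance model is adequate, the remaining dependence should show up (or not) in the squared standardized residuals, which is why the diagnostics are applied to them rather than to the raw residuals.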
