I'm fitting ARIMA models to two different data sets (different metrics of fish abundance and distribution from two different sites) to see which model orders and covariates best describe the data from each site and would be suitable for forecasting.
To do so, I'm using the auto.arima function. I'm running auto.arima with different combinations of covariates and looking at the AICc. I fixed d=1 so that every model is fitted to the same differenced series, which makes the AICc values comparable across models.
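For concreteness, here is a minimal sketch of the procedure described above. It runs on simulated data, not my real series, and the covariate names `temp` and `flow` are made up for illustration. It fits auto.arima with d fixed at 1 for every non-empty subset of covariates and collects the selected order and the AICc.

```r
library(forecast)

set.seed(1)
n <- 120

# Hypothetical covariates (placeholders, not the real data)
X <- cbind(temp = rnorm(n), flow = rnorm(n))

# Simulated response: depends on temp only, plus a random-walk error
y <- ts(2 * X[, "temp"] + cumsum(rnorm(n)))

# Every non-empty subset of the covariate columns
subsets <- unlist(
  lapply(seq_len(ncol(X)), function(k) combn(colnames(X), k, simplify = FALSE)),
  recursive = FALSE
)

# One auto.arima fit per subset, with d fixed at 1 in every fit
fits <- lapply(subsets, function(cols) {
  auto.arima(y, d = 1, xreg = X[, cols, drop = FALSE])
})

# Selected (p,d,q) and AICc for each covariate combination
results <- data.frame(
  covariates = sapply(subsets, paste, collapse = " + "),
  pdq        = sapply(fits, function(f) paste(arimaorder(f), collapse = ",")),
  aicc       = sapply(fits, function(f) f$aicc)
)

# Lowest AICc first
print(results[order(results$aicc), ])
```

The `pdq` column is where I see the selected orders change from one covariate combination to the next.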
The orders of the ARIMA output are typically different depending on the covariate(s) I include. Am I doing this right? Or should I fix the orders p, d and q of the ARIMA and then evaluate the different combinations of covariates?
Or am I totally wrong, and should I just run auto.arima() with all the possible covariates in xreg and see what comes out? I tried this and got a coefficient for each variable, but I'm not sure whether that means all the variables are important or whether auto.arima is forcing them to be included in the final model.
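A minimal sketch of that second approach, again on simulated data with made-up covariate names (`temp`, `flow`, `depth`), where only `temp` actually drives the response. It passes all covariates in xreg at once and prints the estimate and standard error reported for each column.

```r
library(forecast)

set.seed(2)
n <- 120

# Hypothetical covariates; only temp enters the simulated response
X <- cbind(temp = rnorm(n), flow = rnorm(n), depth = rnorm(n))
y <- ts(1.5 * X[, "temp"] + cumsum(rnorm(n)))

# Single fit with every covariate supplied in xreg
fit <- auto.arima(y, d = 1, xreg = X)

# Estimates and standard errors for all fitted coefficients
est <- coef(fit)
se  <- sqrt(diag(fit$var.coef))
coef_table <- cbind(estimate = est, std_error = se)

# Rows for the regression coefficients only
print(coef_table[colnames(X), ])
```

In this sketch a coefficient is reported for `flow` and `depth` as well, even though they play no role in generating `y`.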
auto.arima certainly uses all the regressors supplied in xreg; there is no selection within xreg built into auto.arima. I think you can state that unconditionally and shorten the first six lines. Regarding the last paragraph, some say that sample splitting for model selection is an inefficient alternative to selection based on full sample via information criteria. But perhaps that relies on assumptions of a constant data generating process over time, and in practice I am sympathetic to the idea of cross validation for model selection. – Richard Hardy May 10 '17 at 09:03