I have weekly sales data over many years, and the data shows clear seasonality plus a few other well-defined spikes. For instance, there are always spikes around major holidays like Christmas and Thanksgiving. I tried using auto.arima to fit a model and it worked well and captured most of the monthly variations. I figured that I could add two exogenous variables that indicate whether Christmas and Thanksgiving fall in the week being predicted, and that this should help capture the holiday spikes as well. But what results is an ARIMA(0,0,0) model (the original model was ARIMA(3,1,2)) with some coefficients for the two holiday variables. Effectively, the model fits a constant for all non-Thanksgiving/Christmas weeks and some delta for these weeks! The model with the exogenous variables is poorer by all measures (lower log-likelihood, higher AIC/BIC, poorer fit on the training data). Am I doing something incorrect with the model specification?
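To make the setup concrete, here is a minimal sketch of how the two holiday indicators could be built. It is written in Python with only the standard library and is not the code I actually ran. The names `thanksgiving`, `holiday_dummies`, and `week_starts` are hypothetical, and it assumes each observation covers the seven days starting on a Monday and that Thanksgiving means the US holiday (fourth Thursday of November).

```python
from datetime import date, timedelta

def thanksgiving(year):
    """US Thanksgiving: the fourth Thursday of November."""
    nov1 = date(year, 11, 1)
    # weekday(): Monday=0 ... Thursday=3; days until the first Thursday
    offset = (3 - nov1.weekday()) % 7
    return nov1 + timedelta(days=offset + 21)

def holiday_dummies(week_starts):
    """Return two 0/1 lists (christmas, thanksgiving), one entry per week.

    Assumption: each week covers the 7 days beginning at its start date.
    """
    xmas, tg = [], []
    for start in week_starts:
        end = start + timedelta(days=6)
        # A week can straddle New Year, so check both calendar years
        years = {start.year, end.year}
        xmas.append(int(any(start <= date(y, 12, 25) <= end for y in years)))
        tg.append(int(any(start <= thanksgiving(y) <= end for y in years)))
    return xmas, tg

# Hypothetical weekly index: 156 Mondays starting 2018-01-01
week_starts = [date(2018, 1, 1) + timedelta(weeks=i) for i in range(156)]
xmas, tg = holiday_dummies(week_starts)
```

The two lists correspond to the two columns of the regressor matrix that would be passed to the model through its exogenous-variable argument (`xreg` in auto.arima), with matching future values supplied at forecast time.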
Unfortunately, the underlying data is proprietary and I cannot share it here. I tried to replicate a similar pattern with dummy data and the same approach yields good results along expected lines there (the addition of regressors gives a better model).
I apologize for not sharing the actual data but if you can share any pointers/ideas to understand this further, I would be most grateful.
ts objects don't really work well with long weekly series, because there is a non-integer number of weeks in each year. Anyway, collinearity should not really be a problem, because your model is actually a regression with ARIMA errors, so it is fitted in a two-step process. – Stephan Kolassa Oct 02 '20 at 18:28
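The point about weeks per year can be checked with a little arithmetic. The snippet below is only an illustrative sketch in Python (standard library only); the helper name `iso_weeks_in_year` is hypothetical, and it uses ISO 8601 week numbering as one possible convention for counting weeks.

```python
from datetime import date

# Average number of weeks in a year: roughly 52.18, not an integer
avg_weeks_per_year = 365.25 / 7

def iso_weeks_in_year(year):
    """Number of ISO weeks in a year (52 or 53).

    December 28 always falls in the last ISO week of its year.
    """
    return date(year, 12, 28).isocalendar()[1]

# Week counts for a few years; most have 52, some have 53
counts = {y: iso_weeks_in_year(y) for y in range(2015, 2022)}
```

Because the count alternates between 52 and 53, a fixed integer seasonal period cannot stay aligned with the calendar over many years of weekly data.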