
I have a time series dataset with 7 independent variables. I fitted a multiple linear regression (MLR) using 3 of these independent variables and two lagged variables. When I checked the residuals for normality using the Shapiro-Wilk test in R, the p-value was above 0.05, so normality was not rejected.

I then modeled the residuals of the MLR using ARIMA (without xreg, since this model is for the residuals only).

My questions are:

  1. Can I forecast the MLR and the ARIMA model separately and add the two forecasts to evaluate on the test set? (I did that, and the test MAPE is really high.)
  2. How can I fit a "regression with ARIMA errors" in R without using the auto.arima function?
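For reference, the two-step approach in question 1 might look roughly like the sketch below. It runs on simulated data, so the variable names (`x1`, `x2`), the 100/20 train/test split, and the AR(1) order for the residuals are all assumptions for illustration, not the actual model. It also assumes the predictor values are known over the test period.

```r
# Simulated data (hypothetical): y depends on x1 and x2, with AR(1) errors
set.seed(1)
n  <- 120
x1 <- rnorm(n)
x2 <- rnorm(n)
e  <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y  <- 2 + 1.5 * x1 - 0.8 * x2 + e
dat <- data.frame(y = y, x1 = x1, x2 = x2)

train <- dat[1:100, ]
test  <- dat[101:n, ]
h     <- nrow(test)

# Step 1: ordinary regression on the training set
fit_lm <- lm(y ~ x1 + x2, data = train)

# Step 2: ARIMA on the regression residuals (no xreg);
# the order c(1, 0, 0) is assumed here, not selected
fit_res <- arima(residuals(fit_lm), order = c(1, 0, 0), include.mean = FALSE)

# Forecast each part over the test horizon and add them
fc_lm  <- predict(fit_lm, newdata = test)
fc_res <- predict(fit_res, n.ahead = h)$pred
fc     <- as.numeric(fc_lm) + as.numeric(fc_res)

# Test-set MAPE in percent (unstable when test$y is close to zero)
mape <- mean(abs((test$y - fc) / test$y)) * 100
```

Note that the residual forecast decays toward zero as the horizon grows, so for longer horizons the combined forecast is driven almost entirely by the regression part.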
Stephan Kolassa
dan
  • Seven predictors is a lot. I hope you have a lot of data, because this could overfit very easily. 1. Yes, you can do this. Note that "MAPE is high" does not really tell us anything, since we have no way of knowing what MAPE is even achievable on your dataset, see here. If you get a MAPE above 100%, then something is indeed off, but anything smaller may indeed be the best you can do. Also, note that the MAPE has a number of "interesting" properties. ... – Stephan Kolassa Jul 02 '23 at 18:31
  • ... 2. auto.arima() would be my go-to function. Can you explain why you don't want to use it, and what you can use? Can you use auto.arima() to get a model form, then feed this model into arima() to fit the actual parameters? Or are you constrained not to use R at all (and if so, what can you use?)? – Stephan Kolassa Jul 02 '23 at 18:33
  • Thank you so much. I can use R software, but not the auto.arima() function. – dan Jul 03 '23 at 05:50
  • Hm. Then you will need to take a look at the code of auto.arima to understand how it decides on whether to use a seasonal model or not, and how it determines the order of integration. After that, you can use a similar stepwise approach to find a good model per the AICc, by fitting multiple models using arima(). – Stephan Kolassa Jul 03 '23 at 06:31
  • Thank you so much, Stephan! Will do that. – dan Jul 03 '23 at 06:40
  • @StephanKolassa, the question is about obtaining a forecast, not about model selection. – Richard Hardy Jul 04 '23 at 06:46
  • @RichardHardy: yes of course. I assumed that the model selection step was also part of the question. Once the OP has selected a model (based on AICc or whatever), they can simply use this one and forecast out. – Stephan Kolassa Jul 04 '23 at 06:48
  • @StephanKolassa, right, and the question (1.) is how exactly to do that. (Just a clarification from my side.) – Richard Hardy Jul 04 '23 at 07:57
  • @RichardHardy: that is their question 1, which I answered in my very first comment, since they are interested in regression with ARIMA errors, not ARIMAX, https://robjhyndman.com/hyndsight/arimax/. – Stephan Kolassa Jul 04 '23 at 08:14
  • @StephanKolassa, got it, thank you! – Richard Hardy Jul 04 '23 at 13:28
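The approach suggested in the comments, fitting several candidate models with arima() and comparing them by AICc, could be sketched as follows. In base R, passing `xreg` to `arima()` fits a regression with ARIMA errors. Everything else here is an assumption for illustration: the data are simulated, the order of integration is fixed at `d = 0` rather than chosen by a unit-root test, the search is a small exhaustive grid rather than auto.arima's stepwise procedure, no seasonal terms are considered, and `aicc()` is a hypothetical helper that applies the usual small-sample correction to the AIC.

```r
# Simulated data (hypothetical): two regressors and ARMA(1,1) errors
set.seed(2)
n <- 150
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
e <- as.numeric(arima.sim(model = list(ar = 0.5, ma = 0.3), n = n))
y <- 1 + 2 * X[, "x1"] - X[, "x2"] + e

n_train <- 130
y_tr <- y[1:n_train]
X_tr <- X[1:n_train, , drop = FALSE]
y_te <- y[(n_train + 1):n]
X_te <- X[(n_train + 1):n, , drop = FALSE]

# Hypothetical helper: AICc from a fitted stats::arima object
aicc <- function(fit) {
  k  <- length(coef(fit)) + 1   # +1 for the innovation variance
  nn <- fit$nobs
  fit$aic + 2 * k * (k + 1) / (nn - k - 1)
}

# Assumed: d fixed at 0, so AICc values are comparable across candidates
d    <- 0
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aicc <- NA_real_

for (i in seq_len(nrow(grid))) {
  fit <- tryCatch(
    arima(y_tr, order = c(grid$p[i], d, grid$q[i]), xreg = X_tr, method = "ML"),
    error = function(err) NULL   # skip candidates that fail to fit
  )
  if (!is.null(fit)) grid$aicc[i] <- aicc(fit)
}

# Refit the candidate with the lowest AICc and forecast the test period
best     <- grid[which.min(grid$aicc), ]
fit_best <- arima(y_tr, order = c(best$p, d, best$q), xreg = X_tr, method = "ML")
fc       <- predict(fit_best, n.ahead = nrow(X_te), newxreg = X_te)$pred
```

Unlike forecasting the regression and the residual ARIMA separately, this estimates the regression coefficients and the ARMA parameters jointly, and `predict()` needs the future regressor values through `newxreg`.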