
I want to apply an ARIMAX model to Google Trends data. I used a Python package to get daily data. However, this data contains a lot of zeros, so when I take the first difference of the logs, I get a lot of inf values in Python.
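To make the problem concrete, here is a minimal sketch with made-up values (the array `y` is hypothetical, not real Trends data). `np.log(0)` is `-inf`, so any difference involving a zero day becomes `+inf`, `-inf`, or `nan`. The `log1p` line shows one common workaround, taking `log(1 + y)`; that offset is an assumption on my part and not the only option.

```python
import numpy as np

# Hypothetical daily Google Trends values (0-100 scale) containing zeros
y = np.array([12.0, 0.0, 7.0, 0.0, 0.0, 9.0])

# Naive first difference of logs: log(0) = -inf, so the differences
# become +inf, -inf, or nan (-inf minus -inf)
with np.errstate(divide="ignore", invalid="ignore"):
    naive = np.diff(np.log(y))

# One common workaround (an assumption, not the only choice): log(1 + y),
# which is finite at y = 0
safe = np.diff(np.log1p(y))

print(naive)
print(safe)
```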

How can I solve this? Ideally I would like a method that works for a single keyword search, but a fast way to build a weighted index (with rescaling) might also help.
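A weighted index of several keywords could be sketched as follows. The function name `weighted_index` and the choice of user-supplied weights are hypothetical. The sketch assumes the keyword series already share a common scale (for example, they were fetched in the same Trends request) and rescales the weighted sum so its peak equals 100, mirroring how Google Trends reports each query relative to its own maximum.

```python
import numpy as np


def weighted_index(series, weights):
    """Weighted sum of keyword series, rescaled so the maximum equals 100.

    series:  2-D array-like, shape (n_keywords, n_days)
    weights: length n_keywords (hypothetical, user-chosen)
    Assumes all series are on a common scale before weighting.
    """
    series = np.asarray(series, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()             # normalise the weights to sum to 1
    combined = w @ series       # weighted sum for each day
    peak = combined.max()
    if peak == 0:
        return combined         # all zeros: nothing to rescale
    return 100.0 * combined / peak


# Two hypothetical keywords over three days, equally weighted
idx = weighted_index([[0, 50, 100], [20, 0, 40]], [1, 1])
print(idx)
```

Note that rescaling does not remove zeros: a day is zero in the index only if it is zero for every keyword with positive weight, so combining keywords can reduce, but not guarantee to remove, the log problem.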

edited by Richard Hardy; asked by Arri
  • If you have lots of zeros in your data, ARIMA(X) is probably not the right tool, since the assumption of normally distributed innovations is not met. What are you trying to do? What do you plan on doing with the resulting time series model? – Stephan Kolassa May 15 '23 at 06:07
  • Regarding the formulation: you apply a model to data, not data to a model. I have edited accordingly. – Richard Hardy May 15 '23 at 06:47
  • @StephanKolassa I want to compare the forecasting performance of 3 models. I saw some papers in Elsevier journals that apply ARIMAX to Google Trends / number of news articles. But my data has many zeros (around 20-60 zeros, and 1900 non-zero values). – Arri May 15 '23 at 09:05
  • Ah. That is not really all that many zeros. In that case, you are probably good with ARIMA (except that it does not really give all that good forecasts). I'll write up an answer. – Stephan Kolassa May 15 '23 at 09:45

1 Answer


You are probably good with an ARIMA model, although ARIMA is notorious for not really giving good forecasts.

However, I would discourage "rolling your own" model; it is better to use a good automatic ARIMA model selection algorithm. This will take care of log-transforming (or applying other Box-Cox transformations) and of the back-transformations, which are not trivial!

Alternatively, consider exponential smoothing, e.g., ets() in the forecast package for R. This is still a very good benchmark.
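To show what exponential smoothing does, here is a minimal pure-Python sketch of simple exponential smoothing, the simplest member of the family that `ets()` chooses from. It is not `ets()` itself: there is no trend, no seasonality, and no automatic model selection, and the smoothing parameter `alpha` is picked by a plain grid search on the one-step-ahead squared errors (my assumption). Because it works on the raw scale, zeros cause no trouble here.

```python
def ses(y, alpha, level0=None):
    """Simple exponential smoothing.

    Returns the one-step-ahead fitted values and the final level, which is
    the (flat) point forecast for every future horizon.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = y[0] if level0 is None else level0
    fitted = []
    for obs in y:
        fitted.append(level)                       # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level  # update the level
    return fitted, level


def fit_ses(y):
    """Choose alpha on a grid of 0.01..1.00 by minimising the one-step SSE."""
    best = None
    for i in range(1, 101):
        alpha = i / 100
        fitted, level = ses(y, alpha)
        sse = sum((obs - f) ** 2 for obs, f in zip(y, fitted))
        if best is None or sse < best[0]:
            best = (sse, alpha, level)
    return best[1], best[2]


# Hypothetical daily values, zeros included
fitted, level = ses([10, 0, 8, 6], alpha=0.5)
alpha, forecast = fit_ses([10, 0, 8, 6, 0, 9, 7])
print(fitted, level, alpha, forecast)
```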

This thread contains pointers to general literature about forecasting: "Resources/books for project on forecasting models"

answered by Stephan Kolassa