I am new to forecasting and have been using the fpp3 package and Hyndman's textbook to practice on some fabricated data I obtained online. I have a specific challenge related to handling 0 values in my dataset.

Here's the context: I have three years of daily sales data for a retail store. The store is closed on weekends (Saturday and Sunday) as well as holidays such as Thanksgiving and Christmas. For these closed days, I have 0 values in the sales column.

My ultimate goal is to predict one month of sales for the store. However, I'm having difficulty dealing with the 0 values. Initially, I removed them from the dataset, but that caused problems because some functions then treated the removed days as missing dates in the data.

I would appreciate guidance on how to handle the 0 values for the days when the store is closed. How can I manage these values without causing problems in my forecasting process?

Thank you for any advice or direction you can provide!

1 Answer

In my opinion, it makes the most sense to treat store closure days as missing data. After all, if the store had been open that day, we would probably have seen some sales. (Alternatively, one could treat these data points as censored at zero. However, I have rarely seen a forecast that does it this way, and this happens rarely enough that the two approaches probably don't make much of a difference.)

Now, your problem is that "classical" forecasting methods can't deal with missing values very well (see here, though: Fitting ARIMA to time series with missing values). One simple approach would be to impute before you fit: fill the missing observation with an average of the sales from the same day of week a few weeks before and after. This at least makes the models work. Then forecast, and of course, set the forecasts for such future closure dates to zero.
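As a rough illustration of that impute-then-zero idea, here is a minimal sketch in plain Python rather than fpp3. The helper names `impute_closed_days` and `zero_out_closures`, the two-week window, and the toy sales numbers are all assumptions made up for this example. Days that are closed every week (such as weekends) have no open same-weekday neighbours, so this sketch leaves them as `None`.

```python
from datetime import date, timedelta
from statistics import mean


def impute_closed_days(sales, closed, weeks=2):
    """Return a copy of `sales` where each closed date is replaced by the mean
    of open-day sales on the same weekday, up to `weeks` weeks before and after.
    Closed dates without any open neighbour are set to None (still missing)."""
    filled = dict(sales)
    for day in closed:
        neighbours = []
        for k in range(1, weeks + 1):
            for candidate in (day - timedelta(weeks=k), day + timedelta(weeks=k)):
                if candidate in sales and candidate not in closed:
                    neighbours.append(sales[candidate])
        filled[day] = mean(neighbours) if neighbours else None
    return filled


def zero_out_closures(forecast, future_closed):
    """Overwrite forecasts with zero on dates when the store is known to be closed."""
    return {d: (0.0 if d in future_closed else v) for d, v in forecast.items()}


# Toy data (hypothetical): five weeks starting Monday 2023-11-06.
start = date(2023, 11, 6)
thanksgiving = date(2023, 11, 23)
sales = {}
for i in range(35):
    d = start + timedelta(days=i)
    is_open = d.weekday() < 5 and d != thanksgiving
    sales[d] = 100 + 10 * d.weekday() + i // 7 if is_open else 0

# Treat every zero-sales day as a closure, then impute.
closed = {d for d, v in sales.items() if v == 0}
filled = impute_closed_days(sales, closed)

# Forecast values here are placeholders standing in for real model output.
forecast = {date(2023, 12, 25): 120.0, date(2023, 12, 26): 125.0}
adjusted = zero_out_closures(forecast, {date(2023, 12, 25)})
```

In this toy series the Thanksgiving gap is filled from the four surrounding Thursdays, while the Saturdays and Sundays stay missing. Whether a two-week window is appropriate depends on how fast the sales level drifts.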

Stephan Kolassa
  • Out of curiosity, why not just drop the weekends, i.e. treat Mondays as the days following Fridays? I can imagine the pros and cons of such an approach, so I'm curious why you didn't mention it at all. – Tim Jul 12 '23 at 07:07
  • @Tim: weekends can certainly be dropped if all weekends are closed. For instance, in Germany, most stores are closed on Sunday... but then are open on a very few Sundays, about four per year depending on your local regulations. And of course such "special openings" are extremely interesting and important from a forecasting point of view. ... – Stephan Kolassa Jul 12 '23 at 07:15
  • ... In addition, if it were only (all) weekends that are closed, we could remove them and set the seasonal frequency to 5 - but then you have Thanksgiving (always a Thursday, but not on the same date every year - there goes your yearly seasonality) and Christmas (which falls on a different day of the week from year to year), so if you remove these, your day-of-week seasonality is shot. – Stephan Kolassa Jul 12 '23 at 07:16
  • Anyway, given that retailers often have to deal with multiple seasonalities (day of week and day of year, or even others), this section in our textbook might be of interest. – Stephan Kolassa Jul 12 '23 at 07:17
  • Thank you very much! Yes, the data set is missing sales for all weekends. After removing all the weekends, I only have 6 holiday closures. Additionally, while utilizing the fpp3 package, I've encountered difficulties with functions due to the non-sequential nature of the time series after removing weekends. Considering this, would it be advisable to continue using the ts_object, or would it be more flexible to switch to tsibble? – vashfive Jul 12 '23 at 12:52
  • I don't think that would make all that much of a difference... what kind of issues did you have precisely? After all, the time series would still be sequential if all you did was remove (really remove, not set to NA) weekends. Did you remember to change the seasonal frequency to 5? – Stephan Kolassa Jul 12 '23 at 13:07
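
The point made in the comments about a seasonal frequency of 5 can be checked with a small sketch. This is plain Python with made-up helper names (`business_days`, `weekdays_by_position`), not anything from fpp3. It drops weekends, then looks at which weekdays land in each position of a 5-step cycle. With only weekends removed, every position holds exactly one weekday. Once a holiday is removed as well, the positions after it shift by one, and a period of 5 no longer lines up with the day of the week.

```python
from datetime import date, timedelta


def business_days(start, n_days, holidays=frozenset()):
    """List the dates in the window that are neither weekends nor holidays."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    return [d for d in days if d.weekday() < 5 and d not in holidays]


def weekdays_by_position(days, period=5):
    """Map each cycle position (index mod period) to the set of weekdays found there."""
    slots = {}
    for i, d in enumerate(days):
        slots.setdefault(i % period, set()).add(d.weekday())
    return slots


start = date(2023, 11, 6)  # a Monday

# Only weekends removed: position in the cycle identifies the weekday.
weekends_only = weekdays_by_position(business_days(start, 35))

# Weekends and Thanksgiving removed: later observations slide into other positions.
with_holiday = weekdays_by_position(
    business_days(start, 35, holidays={date(2023, 11, 23)})
)
```

This is one reason to keep closure days in the series as missing values instead of deleting them: the calendar positions stay intact.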