Why does Exponential Smoothing model on all data produces NAN forecasts when given parameters from Training and Testing model in Python s

Question

I am using statsmodels.tsa.holtwinters.ExponentialSmoothing to perform Holt Winters' Additive method, first on training dataset and later on the whole dataset. After training and testing, I take the parameters of the exponential smoothing instance and assign it to the model that will be fit on the entire data, but then the outcome ends up having NaNs for the forecasts, for the levels, trends, and seasonal values.

Training Data:

Date                                     
2016-07-31                           3349
2016-08-31                            401
2016-09-30                            314
2016-10-31                            473
2016-11-30                           1415
2016-12-31                           2351
2017-01-31                           1834
2017-02-28                           1924
2017-03-31                           1291
2017-04-30                           2737
2017-05-31                           2919
2017-06-30                           1098
2017-07-31                           3032
2017-08-31                           1973
2017-09-30                           1196
2017-10-31                           1611
2017-11-30                            832
2017-12-31                            768
2018-01-31                           3051
2018-02-28                           1100
2018-03-31                           1606
2018-04-30                            526
2018-05-31                            808
2018-06-30                            788
2018-07-31                           5040
2018-08-31                            304
2018-09-30                           1709
2018-10-31                            479
2018-11-30                           1884
2018-12-31                            681
2019-01-31                            806
2019-02-28                           1083
2019-03-31                           1338
2019-04-30                           1293
2019-05-31                           1926
2019-06-30                            700
2019-07-31                            322
2019-08-31                            298
2019-09-30                            932
2019-10-31                           2211
2019-11-30                           1611
2019-12-31                            892
2020-01-31                           1189
2020-02-29                           7015
2020-03-31                           2609
2020-04-30                           6072
2020-05-31                           9651
2020-06-30                          13114

Dictionary output of Training Data:

{'No. of Participants Series': {Timestamp('2016-07-31 00:00:00', freq='M'): 3349, Timestamp('2016-08-31 00:00:00', freq='M'): 401, Timestamp('2016-09-30 00:00:00', freq='M'): 314, Timestamp('2016-10-31 00:00:00', freq='M'): 473, Timestamp('2016-11-30 00:00:00', freq='M'): 1415, Timestamp('2016-12-31 00:00:00', freq='M'): 2351, Timestamp('2017-01-31 00:00:00', freq='M'): 1834, Timestamp('2017-02-28 00:00:00', freq='M'): 1924, Timestamp('2017-03-31 00:00:00', freq='M'): 1291, Timestamp('2017-04-30 00:00:00', freq='M'): 2737, Timestamp('2017-05-31 00:00:00', freq='M'): 2919, Timestamp('2017-06-30 00:00:00', freq='M'): 1098, Timestamp('2017-07-31 00:00:00', freq='M'): 3032, Timestamp('2017-08-31 00:00:00', freq='M'): 1973, Timestamp('2017-09-30 00:00:00', freq='M'): 1196, Timestamp('2017-10-31 00:00:00', freq='M'): 1611, Timestamp('2017-11-30 00:00:00', freq='M'): 832, Timestamp('2017-12-31 00:00:00', freq='M'): 768, Timestamp('2018-01-31 00:00:00', freq='M'): 3051, Timestamp('2018-02-28 00:00:00', freq='M'): 1100, Timestamp('2018-03-31 00:00:00', freq='M'): 1606, Timestamp('2018-04-30 00:00:00', freq='M'): 526, Timestamp('2018-05-31 00:00:00', freq='M'): 808, Timestamp('2018-06-30 00:00:00', freq='M'): 788, Timestamp('2018-07-31 00:00:00', freq='M'): 5040, Timestamp('2018-08-31 00:00:00', freq='M'): 304, Timestamp('2018-09-30 00:00:00', freq='M'): 1709, Timestamp('2018-10-31 00:00:00', freq='M'): 479, Timestamp('2018-11-30 00:00:00', freq='M'): 1884, Timestamp('2018-12-31 00:00:00', freq='M'): 681, Timestamp('2019-01-31 00:00:00', freq='M'): 806, Timestamp('2019-02-28 00:00:00', freq='M'): 1083, Timestamp('2019-03-31 00:00:00', freq='M'): 1338, Timestamp('2019-04-30 00:00:00', freq='M'): 1293, Timestamp('2019-05-31 00:00:00', freq='M'): 1926, Timestamp('2019-06-30 00:00:00', freq='M'): 700, Timestamp('2019-07-31 00:00:00', freq='M'): 322, Timestamp('2019-08-31 00:00:00', freq='M'): 298, Timestamp('2019-09-30 00:00:00', freq='M'): 932, Timestamp('2019-10-31 00:00:00', freq='M'): 2211, Timestamp('2019-11-30 00:00:00', freq='M'): 1611, Timestamp('2019-12-31 00:00:00', freq='M'): 892, Timestamp('2020-01-31 00:00:00', freq='M'): 1189, Timestamp('2020-02-29 00:00:00', freq='M'): 7015, Timestamp('2020-03-31 00:00:00', freq='M'): 2609, Timestamp('2020-04-30 00:00:00', freq='M'): 6072, Timestamp('2020-05-31 00:00:00', freq='M'): 9651, Timestamp('2020-06-30 00:00:00', freq='M'): 13114}}

Testing Data:

Date                                     
2020-07-31                          16693
2020-08-31                          14797
2020-09-30                           7066
2020-10-31                          11157
2020-11-30                           5737
2020-12-31                          11147
2021-01-31                          14031
2021-02-28                           1847
2021-03-31                           6549
2021-04-30                          14614
2021-05-31                           8315
2021-06-30                           4372

Dictionary output of Testing Data:

{'No. of Participants Series': {Timestamp('2020-07-31 00:00:00', freq='M'): 16693, Timestamp('2020-08-31 00:00:00', freq='M'): 14797, Timestamp('2020-09-30 00:00:00', freq='M'): 7066, Timestamp('2020-10-31 00:00:00', freq='M'): 11157, Timestamp('2020-11-30 00:00:00', freq='M'): 5737, Timestamp('2020-12-31 00:00:00', freq='M'): 11147, Timestamp('2021-01-31 00:00:00', freq='M'): 14031, Timestamp('2021-02-28 00:00:00', freq='M'): 1847, Timestamp('2021-03-31 00:00:00', freq='M'): 6549, Timestamp('2021-04-30 00:00:00', freq='M'): 14614, Timestamp('2021-05-31 00:00:00', freq='M'): 8315, Timestamp('2021-06-30 00:00:00', freq='M'): 4372}}

The entire dataset is essentially both of the above combined.

My code for the entire modeling process, along with the data are below:

# Importing required packages
import pandas as pd
from pandas import Timestamp
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
Getting data from dictionary and turning to Pandas Dataframe
train_data_dict = {'Count of Participants': {Timestamp('2016-07-31 00:00:00'): 3349, Timestamp('2016-08-31 00:00:00'): 401, Timestamp('2016-09-30 00:00:00'): 314, Timestamp('2016-10-31 00:00:00'): 473, Timestamp('2016-11-30 00:00:00'): 1415, Timestamp('2016-12-31 00:00:00'): 2351, Timestamp('2017-01-31 00:00:00'): 1834, Timestamp('2017-02-28 00:00:00'): 1924, Timestamp('2017-03-31 00:00:00'): 1291, Timestamp('2017-04-30 00:00:00'): 2737, Timestamp('2017-05-31 00:00:00'): 2919, Timestamp('2017-06-30 00:00:00'): 1098, Timestamp('2017-07-31 00:00:00'): 3032, Timestamp('2017-08-31 00:00:00'): 1973, Timestamp('2017-09-30 00:00:00'): 1196, Timestamp('2017-10-31 00:00:00'): 1611, Timestamp('2017-11-30 00:00:00'): 832, Timestamp('2017-12-31 00:00:00'): 768, Timestamp('2018-01-31 00:00:00'): 3051, Timestamp('2018-02-28 00:00:00'): 1100, Timestamp('2018-03-31 00:00:00'): 1606, Timestamp('2018-04-30 00:00:00'): 526, Timestamp('2018-05-31 00:00:00'): 808, Timestamp('2018-06-30 00:00:00'): 788, Timestamp('2018-07-31 00:00:00'): 5040, Timestamp('2018-08-31 00:00:00'): 304, Timestamp('2018-09-30 00:00:00'): 1709, Timestamp('2018-10-31 00:00:00'): 479, Timestamp('2018-11-30 00:00:00'): 1884, Timestamp('2018-12-31 00:00:00'): 681, Timestamp('2019-01-31 00:00:00'): 806, Timestamp('2019-02-28 00:00:00'): 1083, Timestamp('2019-03-31 00:00:00'): 1338, Timestamp('2019-04-30 00:00:00'): 1293, Timestamp('2019-05-31 00:00:00'): 1926, Timestamp('2019-06-30 00:00:00'): 700, Timestamp('2019-07-31 00:00:00'): 322, Timestamp('2019-08-31 00:00:00'): 298, Timestamp('2019-09-30 00:00:00'): 932, Timestamp('2019-10-31 00:00:00'): 2211, Timestamp('2019-11-30 00:00:00'): 1611, Timestamp('2019-12-31 00:00:00'): 892, Timestamp('2020-01-31 00:00:00'): 1189, Timestamp('2020-02-29 00:00:00'): 7015, Timestamp('2020-03-31 00:00:00'): 2609, Timestamp('2020-04-30 00:00:00'): 6072, Timestamp('2020-05-31 00:00:00'): 9651, Timestamp('2020-06-30 00:00:00'): 13114}}
train_data = pd.DataFrame.from_dict(train_data_dict)
test_data_dict = {'Count of Participants': {Timestamp('2020-07-31 00:00:00'): 16693, Timestamp('2020-08-31 00:00:00'): 14797, Timestamp('2020-09-30 00:00:00'): 7066, Timestamp('2020-10-31 00:00:00'): 11157, Timestamp('2020-11-30 00:00:00'): 5737, Timestamp('2020-12-31 00:00:00'): 11147, Timestamp('2021-01-31 00:00:00'): 14031, Timestamp('2021-02-28 00:00:00'): 1847, Timestamp('2021-03-31 00:00:00'): 6549, Timestamp('2021-04-30 00:00:00'): 14614, Timestamp('2021-05-31 00:00:00'): 8315, Timestamp('2021-06-30 00:00:00'): 4372}}
test_data = pd.DataFrame.from_dict(test_data_dict)
full_data_dict = {'Count of Community Participants': {Timestamp('2016-07-31 00:00:00'): 3349, Timestamp('2016-08-31 00:00:00'): 401, Timestamp('2016-09-30 00:00:00'): 314, Timestamp('2016-10-31 00:00:00'): 473, Timestamp('2016-11-30 00:00:00'): 1415, Timestamp('2016-12-31 00:00:00'): 2351, Timestamp('2017-01-31 00:00:00'): 1834, Timestamp('2017-02-28 00:00:00'): 1924, Timestamp('2017-03-31 00:00:00'): 1291, Timestamp('2017-04-30 00:00:00'): 2737, Timestamp('2017-05-31 00:00:00'): 2919, Timestamp('2017-06-30 00:00:00'): 1098, Timestamp('2017-07-31 00:00:00'): 3032, Timestamp('2017-08-31 00:00:00'): 1973, Timestamp('2017-09-30 00:00:00'): 1196, Timestamp('2017-10-31 00:00:00'): 1611, Timestamp('2017-11-30 00:00:00'): 832, Timestamp('2017-12-31 00:00:00'): 768, Timestamp('2018-01-31 00:00:00'): 3051, Timestamp('2018-02-28 00:00:00'): 1100, Timestamp('2018-03-31 00:00:00'): 1606, Timestamp('2018-04-30 00:00:00'): 526, Timestamp('2018-05-31 00:00:00'): 808, Timestamp('2018-06-30 00:00:00'): 788, Timestamp('2018-07-31 00:00:00'): 5040, Timestamp('2018-08-31 00:00:00'): 304, Timestamp('2018-09-30 00:00:00'): 1709, Timestamp('2018-10-31 00:00:00'): 479, Timestamp('2018-11-30 00:00:00'): 1884, Timestamp('2018-12-31 00:00:00'): 681, Timestamp('2019-01-31 00:00:00'): 806, Timestamp('2019-02-28 00:00:00'): 1083, Timestamp('2019-03-31 00:00:00'): 1338, Timestamp('2019-04-30 00:00:00'): 1293, Timestamp('2019-05-31 00:00:00'): 1926, Timestamp('2019-06-30 00:00:00'): 700, Timestamp('2019-07-31 00:00:00'): 322, Timestamp('2019-08-31 00:00:00'): 298, Timestamp('2019-09-30 00:00:00'): 932, Timestamp('2019-10-31 00:00:00'): 2211, Timestamp('2019-11-30 00:00:00'): 1611, Timestamp('2019-12-31 00:00:00'): 892, Timestamp('2020-01-31 00:00:00'): 1189, Timestamp('2020-02-29 00:00:00'): 7015, Timestamp('2020-03-31 00:00:00'): 2609, Timestamp('2020-04-30 00:00:00'): 6072, Timestamp('2020-05-31 00:00:00'): 9651, Timestamp('2020-06-30 00:00:00'): 13114, Timestamp('2020-07-31 00:00:00'): 16693, Timestamp('2020-08-31 00:00:00'): 14797, Timestamp('2020-09-30 00:00:00'): 7066, Timestamp('2020-10-31 00:00:00'): 11157, Timestamp('2020-11-30 00:00:00'): 5737, Timestamp('2020-12-31 00:00:00'): 11147, Timestamp('2021-01-31 00:00:00'): 14031, Timestamp('2021-02-28 00:00:00'): 1847, Timestamp('2021-03-31 00:00:00'): 6549, Timestamp('2021-04-30 00:00:00'): 14614, Timestamp('2021-05-31 00:00:00'): 8315, Timestamp('2021-06-30 00:00:00'): 4372}}
full_data = pd.DataFrame.from_dict(full_data_dict)
Training model - Exponential Smoothing Holt Winters' Additive Method
train_test_model = ExponentialSmoothing(train_data, trend='add', damped_trend=False, seasonal='add').fit(smoothing_level=None, smoothing_trend=None, smoothing_seasonal=None)
Print the Train Model Summary
print("TRAIN MODEL SUMMARY")
print(train_test_model.summary())
Retrieving the train model's parameters
trend = train_test_model.model.trend # add or mul
seasonal = train_test_model.model.seasonal # Would be None if it isn't, otherwise 'add' or 'mul'
smoothing_level = train_test_model.params['smoothing_level']
smoothing_trend = train_test_model.params['smoothing_trend']
damped_trend = False
damping_trend = train_test_model.params['damping_trend'] # None since damped_trend is set to False
smoothing_seasonal = train_test_model.params['smoothing_seasonal']
initial_level = train_test_model.params['initial_level']
initial_trend = train_test_model.params['initial_trend']
initial_seasons = train_test_model.params['initial_seasons']
if damping_trend:
    damped_trend = True
Fitting the model on entire dataset using train model parameters
model = ExponentialSmoothing(full_data, trend=trend, damped_trend=damped_trend, seasonal=seasonal, initialization_method='known', initial_level= initial_level, initial_trend=initial_trend, initial_seasonal=initial_seasons, seasonal_periods=12).fit(smoothing_level=smoothing_level, smoothing_trend=smoothing_trend, damping_trend = damping_trend, smoothing_seasonal=smoothing_seasonal, optimized=False, method=None)
forecast = model.forecast(60)
forecast = pd.DataFrame({'Count of Participants': forecast.copy()})
Print the Forecasting Model Summary
print("MODEL FOR FORECASTING SUMMARY")
print(model.summary())
print('Forecasts:\n', forecast)

Can you edit your post to make your analysis easier for us to replicate, especially in terms of us using the exact same time series data as you, without having to read in a plaintext table? https://stackoverflow.com/q/47450931/452096 and https://stackoverflow.com/q/22418895/452096 — Stephan Kolassa, Nov 15 '22 at 15:05
Thank you for the suggestion, I will edit my post with dictionary output of the data. — Nagusameta, Nov 15 '22 at 15:20
Right now, running your code breaks for me because Timestamps is undefined. If you ensure your code is working on its own in a new Python console, you may get more replies out of people who rarely use Python and don't want to figure out what to import to make the code work: a Minimal Working Example. — Stephan Kolassa, Nov 15 '22 at 15:41
I'm really sorry I have edited the post with code that can be executed to reproduce the output easily. Thank you for your inputs! — Nagusameta, Nov 15 '22 at 16:08

Stephan Kolassa · Accepted Answer · 2022-11-15T17:46:35.497

2

model.summary()

yields nan in some of the trend variables:

...
initial_trend                       nan                  b.0                False
damping_trend                       nan                  phi                False
...

So it looks like there is a trend-related problem.

Looking at your call model = ExponentialSmoothing(..) etc. shows that one of the parameters is nan, specifically damping_trend. Your code contains a comment that this should be None since damped_trend is set, but precisely when we want to use a dampened trend, we need to specify by how much we dampen, so this parameter needs to be between 0 and 1.

If we set damping_trend=0.5 and then refit model = ExponentialSmoothing(..) etc. and re-forecast, we get non-nan forecasts:

model.forecast(60)
Out[24]: 
2021-07-31    8807.417748
2021-08-31    6355.072201
2021-09-30    6120.073955
2021-10-31    6465.294132
2021-11-30    6351.564278
2021-12-31    6265.477816
2022-01-31    6742.345887

Incidentally, I just fed your data into R, which has much better facilities for time series forecasting with the forecast and the fable package. Applying ets() to your training data suggests a model with multiplicative error, no trend and no seasonality. Plotting your data reveals that there is a huge change in variability just at the end of the training data, indicated by a red vertical line in the plot below. Something fundamentally changed the data generating process at this time. I would be very careful about fitting a model to the initial period and applying it to the whole dataset for forecasting. Likely enough, the best forecast would be the historical average of the data after about April or June 2020, with very wide prediction intervals.

R code:

train_data <- structure(c(3349, 401, 314, 473, 1415, 2351, 1834, 1924, 1291, 
2737, 2919, 1098, 3032, 1973, 1196, 1611, 832, 768, 3051, 1100, 
1606, 526, 808, 788, 5040, 304, 1709, 479, 1884, 681, 806, 1083, 
1338, 1293, 1926, 700, 322, 298, 932, 2211, 1611, 892, 1189, 
7015, 2609, 6072, 9651, 13114), tsp = c(2016.5, 2020.41666666667, 
12), class = "ts")
test_data <- structure(c(16693, 14797, 7066, 11157, 5737, 11147, 14031, 1847, 
6549, 14614, 8315, 4372), tsp = c(2020.5, 2021.41666666667, 12
), class = "ts")
library(forecast)
ets(train_data)
all_data <- ts(c(train_data,test_data),frequency=12,start=c(2016,7))
plot(all_data)
abline(v=2020.5,col="red")

edited Nov 15 '22 at 17:46

answered Nov 15 '22 at 16:32

Stephan Kolassa

123,354

Thank you so much for the response! I have also tried the damped method, as well as the multiplicative method (setting seasonal='mul' instead of 'add'). Like the additive method in my post above, multiplicative also produced NaN forecasts while it was only the damped method that had non-NaNs. Hence why I was more curious with the additive method in this post, so I had the damped_trend parameter set to False. Perhaps its simply the behavior of the data that makes the train model parameters produce NaNs when fitted on the entire dataset, but I am honestly not knowledgeable enough to be sure. – Nagusameta Nov 15 '22 at 16:49
1

Hm. If I refit the model using seasonal='mul' (and damping_trend not nan, of course), then I get non-nan forecasts. (Nonsensical negative ones, to be sure.) Which is as it should be. You might want to look at the automatic forecasting tools in the forecast and fable packages for R, and the accompanying textbook: https://otexts.com/fpp3/ – Stephan Kolassa Nov 15 '22 at 16:57
May I also ask, is there a fundamental reason for why the forecasts (and other parameters) end up as NaNs for the additive and multiplicative methods (seasonal='add' or 'mul', with no damping_trend)? I have found these methods from https://otexts.com/fpp3/holt-winters.html with more details on what function parameters to set to use the methods through https://otexts.com/fpp3/taxonomy.html – Nagusameta Nov 15 '22 at 17:05
1

If you include a dampened trend (damped_trend=True in Exponential_Smoothing), then you need to specify by how much the trend should be dampened, so you need to specify the damping_trend parameter to the fit() method. This is the case for both additive and multiplicative trends and seasonalities. If you do not want to dampen your trend (damped_trend=False in Exponential_Smoothing), then you don't need to specify the dampening factor, and the damping_trend parameter to the fit() method can be nan or None or whatever, it will be ignored. – Stephan Kolassa Nov 15 '22 at 17:22
1

In that taxonomy, the value of damping_trend would be $\varphi$. (Or possibly $1-\varphi$, there are different conventions, and I don't know which one statsmodels follows.) – Stephan Kolassa Nov 15 '22 at 17:23
Thank you for your additional analysis! The organization had an increase in activity after March 2020 (due to COVID and stay-at-home setups) so their other datasets for other areas also show the same change which you have described. When I arbitrarily modeled only using data from 2020-2021, I noticed in a lot of cases that the forecast MAPE would be better than if I used the entire dataset but I could not quite explain why. May I ask for concepts I should explore to be able to write more on why the forecasts are better modeling only from 2020? Or is it already about the change in variability? – Nagusameta Nov 18 '22 at 06:20
1

One possible reason why MAPEs are "better" (more precisely: lower) if you use the full data set could be that using all data, your forecasts are systematically lower - and the MAPE prefers forecasts that are below the expectation. I would be very, very careful about using the MAPE - it looks intuitively easy to explain and understand, but it can lead you badly astray in forecasting, depending on what functional of the future distribution you want to elicit. See What are the shortcomings of the Mean Absolute Percentage Error (MAPE)? – Stephan Kolassa Nov 18 '22 at 06:24

Why does Exponential Smoothing model on all data produces NAN forecasts when given parameters from Training and Testing model in Python s

Getting data from dictionary and turning to Pandas Dataframe

Training model - Exponential Smoothing Holt Winters' Additive Method

Print the Train Model Summary

Retrieving the train model's parameters

Fitting the model on entire dataset using train model parameters

Print the Forecasting Model Summary

1 Answers1