
In Holt's linear trend method, given a time series $y_1,\ldots,y_t$, the forecast for time $t+h$ based on data up to time $t$ is given by $$ \hat{y}_{t+h|t}= \ell_t + h b_t \tag{1} $$ where $\ell_t$ is an estimate of the level of the series at time $t$, which satisfies $$ \ell_{t}=\alpha y_t + (1-\alpha)(\ell_{t-1}+b_{t-1}) \tag{2} $$ and $b_t$ is an estimate of the trend (slope) of the series at time $t$, which satisfies $$ b_{t}=\beta \color{blue}{(\ell_t-\ell_{t-1})}+ (1-\beta)b_{t-1} \tag{3} $$

$\alpha \in [0,1]$ and $\beta \in [0,1]$ are the smoothing parameters.
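For concreteness, here is a minimal Python sketch of the recursions $(1)$ to $(3)$. The function name `holt_forecast` and the starting values `level0` and `trend0` are assumptions made for illustration; the equations above do not say how $\ell_0$ and $b_0$ should be initialized.

```python
def holt_forecast(y, alpha, beta, level0, trend0, h=1):
    """Run Holt's recursions over y and return the h-step-ahead forecast.

    level0 and trend0 are assumed starting values for the level and trend;
    how to choose them is outside the scope of equations (1)-(3).
    """
    level, trend = level0, trend0
    for obs in y:
        prev_level = level
        # Equation (2): update the level estimate.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Equation (3): update the trend from the change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Equation (1): extrapolate h steps ahead.
    return level + h * trend
```

On a noise-free line such as $y_t = 2 + 3t$ for $t = 1,\ldots,10$, started at `level0=2` and `trend0=3`, the recursions track the line, so the two-step forecast should come out at about $38$ (up to floating-point rounding).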

The trend equation $(3)$ shows that $b_t$ is a weighted average of the estimated trend at time $t$, based on $\ell_t-\ell_{t-1}$, and the previous estimate of the trend, $b_{t-1}$. My question is: why not base this average on the actual change in the series between $t-1$ and $t$ instead, as follows?

$$ b_{t}=\beta \color{red}{(y_t-y_{t-1})}+ (1-\beta)b_{t-1} \tag{3'} $$

Kuifje

1 Answer


You could of course do that if you chose to, so I assume you are asking why Holt did not choose that formula.

Suppose that the series $y_t$ actually satisfies a linear trend model, say $$y_t = a_0 + a_1 t + \epsilon_t$$ where $\epsilon_t$ is i.i.d. random noise (mean 0, constant variance $\sigma^2$). Let's also assume that we are far enough along in applying whichever smoothing formulas we chose so that $\ell_{\tau} \approx a_0 + a_1 \tau$ and $b_{\tau}\approx a_1$ for $\tau < t.$

According to Holt's level equation,
\begin{align*} \ell_{t} & =\alpha(a_{0}+a_{1}t+\epsilon_{t})+(1-\alpha)(\ell_{t-1}+b_{t-1})\\ & \approx\alpha(a_{0}+a_{1}t+\epsilon_{t})+(1-\alpha)(a_{0}+a_{1}[t-1]+a_{1})\\ & =a_{0}+a_{1}t+\alpha\epsilon_{t}. \end{align*}

Using Holt's trend equation as stated, we get
\begin{align*} b_{t} & =\beta(\ell_{t}-\ell_{t-1})+(1-\beta)b_{t-1}\\ & \approx\beta(a_{0}+a_{1}t+\alpha\epsilon_{t}-[a_{0}+a_{1}(t-1)])+(1-\beta)a_{1}\\ & =a_{1}+\beta\alpha\epsilon_{t}, \end{align*}

whereas with your proposed formula we get
\begin{align*} b_{t} & =\beta(y_{t}-y_{t-1})+(1-\beta)b_{t-1}\\ & \approx\beta(a_{1}+\epsilon_{t}-\epsilon_{t-1})+(1-\beta)a_{1}\\ & =a_{1}+\beta(\epsilon_{t}-\epsilon_{t-1}). \end{align*}

Both estimates of slope are unbiased, but yours contains more noise: under these approximations the noise term has variance $\alpha^2\beta^2\sigma^2$ for Holt's formula versus $2\beta^2\sigma^2$ for yours, and $\alpha \le 1.$

Addendum: That was a bit hand-wavy even by my lax standards, so let me try to make it slightly more rigorous. Let $\lambda_t$ and $\eta_t$ be the errors in $\ell_t$ and $b_t$ respectively, i.e., $$\ell_t = a_0 + a_1 t +\lambda_t$$ and $$b_t = a_1 + \eta_t.$$ The Holt formula for $b_t$ reduces to $$b_t = a_1 + \beta(\lambda_t - \lambda_{t-1}) + (1-\beta)\eta_{t-1}$$ and the proposed alternative reduces to $$b_t = a_1 + \beta(\epsilon_t - \epsilon_{t-1}) + (1-\beta)\eta_{t-1}.$$ Assuming that the exponential smoothing is actually smoothing things, we expect $\lambda_t$ to have lower variance than $\epsilon_t.$ I'm pretty sure you can prove that via an induction argument.
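If you would rather check this numerically than prove it, here is a rough simulation sketch in Python. It generates data from the linear trend model above and tracks the slope estimate under both trend updates. The function name `slope_variances`, the parameter defaults, the burn-in length, and the choice to start both recursions on the true line are all assumptions of mine for illustration, not part of Holt's method.

```python
import random
import statistics


def slope_variances(alpha=0.5, beta=0.2, a0=10.0, a1=2.0, sigma=1.0,
                    n=20000, burn_in=500, seed=0):
    """Return (variance of Holt's b_t, variance of the alternative b_t).

    Data follow y_t = a0 + a1*t + eps_t with Gaussian noise (an assumed
    noise distribution). Both recursions start at the true line.
    """
    rng = random.Random(seed)
    level, b_holt = a0, a1   # Holt's level and trend estimates
    b_alt = a1               # trend estimate from raw differences
    y_prev = a0              # noise-free y_0, used only for the first difference
    holt, alt = [], []
    for t in range(1, n + 1):
        y = a0 + a1 * t + rng.gauss(0.0, sigma)
        # Holt: update the level, then the trend from the level change.
        new_level = alpha * y + (1 - alpha) * (level + b_holt)
        b_holt = beta * (new_level - level) + (1 - beta) * b_holt
        level = new_level
        # Alternative: trend from the raw change in the series.
        # (This update does not involve the level at all.)
        b_alt = beta * (y - y_prev) + (1 - beta) * b_alt
        y_prev = y
        if t > burn_in:
            holt.append(b_holt)
            alt.append(b_alt)
    return statistics.pvariance(holt), statistics.pvariance(alt)


if __name__ == "__main__":
    v_holt, v_alt = slope_variances()
    print("Holt slope variance:       ", v_holt)
    print("Alternative slope variance:", v_alt)
```

With these defaults I would expect the second number to come out noticeably larger than the first, in line with the argument above, though a simulation is of course an illustration rather than a proof.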

prubin