There are already multiple questions asking where the lagged error terms in an MA(q) model come from and how the model can be fit without knowing the errors in advance. Textbooks offer several methods for dealing with this, but I've noticed that older texts, and even recent formal ones, seem to recommend constructing the likelihood for the entire MA(q) or ARMA(p, q) model by fixing initial values for the pre-sample errors $\varepsilon_0$ through $\varepsilon_{-q+1}$. We then iteratively compute
$$\varepsilon_t = y_t - \mu - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$
Once the initial errors are set (often to 0), everything in this recursion is known except the parameters, which can then be estimated by numerical optimization. For an MA(1), the recursion expands into a tedious but clean equation that anyone could derive with pencil and paper:
$$\varepsilon_t = (y_t - \mu) + (-1)^1 \theta_1^1 (y_{t-1} - \mu) + (-1)^2 \theta_1^2 (y_{t-2} - \mu) + \cdots + (-1)^{t-1} \theta_1^{t-1} (y_1 - \mu) + (-1)^t \theta_1^t c$$
where $c$ is whatever value we picked for $\varepsilon_0$.
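To convince myself the two forms agree, here is a quick numerical check (my own sketch, not something from Hamilton) that the recursion and the closed-form MA(1) expansion produce the same $\varepsilon_t$:

```python
import numpy as np

# Any series and parameter values will do for checking the algebraic identity.
rng = np.random.default_rng(0)
y = rng.normal(size=50)
mu, theta, c = 0.3, 0.6, 0.2      # c is the value chosen for eps_0

# Recursive form: eps_t = (y_t - mu) - theta * eps_{t-1}
eps = np.empty(len(y))
prev = c
for t in range(len(y)):
    eps[t] = (y[t] - mu) - theta * prev
    prev = eps[t]

# Closed form: eps_T = sum_{j=0}^{T-1} (-theta)^j (y_{T-j} - mu) + (-theta)^T c
T = len(y)
closed = sum((-theta) ** j * (y[T - 1 - j] - mu) for j in range(T)) + (-theta) ** T * c
assert np.isclose(eps[-1], closed)
```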
This is found, for example, on page 130 of Hamilton's *Time Series Analysis* (1994 edition). He even includes something like the expansion I wrote above for the $q = 1$ case. However, when I tried to derive a similarly clean equation for $q > 1$, it became clear that, while the recursion could easily be programmed, it is monstrous to write out.
This approach is also often recommended in texts for fitting ARMA(p, q) models: set initial errors, estimate the remaining errors with a variation on the recursion above, and optimize the resulting huge expression with respect to the parameters.
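To make that concrete, here is a minimal sketch of the conditional-sum-of-squares version of this idea, assuming pre-sample errors fixed at zero; the `css` function and the use of `scipy.optimize.minimize` are my own illustration, not any text's reference implementation:

```python
import numpy as np
from scipy.optimize import minimize

def css(params, y, p, q):
    """Recursively build the errors with pre-sample values fixed at zero,
    then return the conditional sum of squares."""
    mu, phi, theta = params[0], params[1:1 + p], params[1 + p:]
    eps = np.zeros(len(y))
    for t in range(len(y)):
        ar = sum(phi[i] * (y[t - 1 - i] - mu) for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps[t] = (y[t] - mu) - ar - ma
    return np.sum(eps ** 2)

# Simulate a small ARMA(1, 1) series and fit it by minimizing the CSS.
rng = np.random.default_rng(1)
e = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + e[t] + 0.3 * e[t - 1]

res = minimize(css, x0=np.zeros(3), args=(y, 1, 1), method="Nelder-Mead")
print(res.x)  # roughly (mu, phi_1, theta_1) = (0, 0.5, 0.3)
```

Because the errors are rebuilt inside the objective at every evaluation, the optimizer never needs them in advance, which is exactly why the expression balloons when written out symbolically.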
This contrasts sharply with most blogs, such as this one, which seem to describe something like the Hannan-Rissanen (1982) algorithm. That approach fits an AR model first, then uses that model's residuals to fit the MA portion; the final ARMA model is just the two pieces put together. The Hannan-Rissanen algorithm in particular fits an ARMA model this way and then repeats the fitting with updated errors until convergence.
The sequential method seems much, much simpler than the simultaneous method I described above. Estimating the parameters of an AR model is simple, and once the lagged error terms are treated as known, fitting the MA model becomes simple as well.
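For comparison, here is a rough sketch of the two-stage idea, with both stages done by ordinary least squares. This is my paraphrase of the procedure described above, not the exact algorithm from the 1982 paper; in particular, the default long-AR order `m` is an arbitrary choice rather than their order-selection rule:

```python
import numpy as np

def hannan_rissanen(y, p, q, m=None):
    """Two-stage sketch: a long AR(m) supplies proxy errors, then OLS of
    y_t on p lags of y and q lags of those proxies gives the ARMA fit."""
    if m is None:
        m = max(p, q) + 5           # order of the long autoregression
    # Stage 1: long AR(m) by OLS; residuals stand in for the unobserved errors.
    X = np.column_stack([y[m - 1 - i:len(y) - 1 - i] for i in range(m)])
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    eps_hat = y[m:] - X @ beta
    # Stage 2: regress y_t on its own lags and the lagged proxy errors.
    yy = y[m:]                      # aligned with eps_hat
    k = max(p, q)
    Z = np.column_stack(
        [yy[k - 1 - i:len(yy) - 1 - i] for i in range(p)]
        + [eps_hat[k - 1 - j:len(eps_hat) - 1 - j] for j in range(q)]
    )
    Z = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z, yy[k:], rcond=None)
    return coef[0], coef[1:1 + p], coef[1 + p:]  # intercept, AR, MA estimates
```

Repeating stage 2 with errors recomputed from the fitted ARMA gives the iterate-until-convergence refinement mentioned above. Both stages are plain linear regressions, which is what makes the method so cheap.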
Many of these texts were published in the '50s and '60s, when computational power was hard to come by, yet they recommend the more difficult method. And even after Hannan and Rissanen published their paper, many texts and lectures seem to have all but ignored it and continued referencing Box and Jenkins or Hamilton.
Is there any practical reason one would ever use the simultaneous optimization method instead of some variant of the Hannan-Rissanen algorithm? What are the relative merits of these two fitting methods?