
How do I transform ARIMA(2,1,0) into MA infinity process? I tried to compute it as

$$y_t=\frac{1}{(1-\phi_1 B-\phi_2 B^2)(1-B)}\epsilon_t$$

But then I cannot find a way to separate this denominator...

Richard Hardy
John

1 Answer


In this (conventional) notation, $\mathcal E = (\epsilon_t)$ is a sequence of random variables indexed by integers ("times") $t=\ldots -2,-1,0,1,2,\ldots.$ Such sequences are known as time series processes.

Mathematical preliminaries

This section emphasizes the way in which familiar algebraic methods apply directly, without any appreciable change, to expressions involving the backshift operator $B.$

Time series processes (defined on a common sample space) can be added term by term and multiplied by scalars (real or complex numbers) term by term. These operations satisfy the vector space axioms. Let $\mathbb T$ be such a vector space.

The lag or "backshift" operator $B$ shifts a time series process $\mathcal X = (X_t)$ forward one step in time: that is,

$$(B\mathcal X)_t = X_{t-1}.$$
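For intuition, the action of $B$ on a finite window of a series can be sketched in Python (a toy illustration only: a true process is doubly infinite, so the first value of the shifted window is undefined here and marked `nan`):

```python
import numpy as np

# A finite window of a time series process: X_0, ..., X_4 (illustrative values).
x = np.array([2.0, 5.0, 3.0, 7.0, 1.0])

# (B X)_t = X_{t-1}: each value is replaced by the previous one.
# The value at t = 0 would come from t = -1, outside the window.
bx = np.empty_like(x)
bx[0] = np.nan
bx[1:] = x[:-1]

print(bx)  # first entry undefined; the rest are X_0, ..., X_3
```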

The set of linear operators $A:\mathbb T \to \mathbb T$ forms an algebra because any two such operators can be multiplied via composition (applying one to the result of the other). This algebra has a unit, written "$\mathbf 1,$" which is the identity map: it sends every time series process to itself.

Any operator, like $B,$ generates a subalgebra of linear operators formed by linear combinations of $\mathbf 1$ and its powers. Letting $\mathcal R$ be the scalars (real or complex numbers), the subalgebra generated by $B$ consists of expressions of the form

$$a_0 \mathbf 1 + a_1 B + a_2 B^2 + \cdots + a_n B^n$$

where the $a_i\in\mathcal R$ are scalars. Such a linear combination acts on time series processes in the obvious way; namely,

$$((a_0 \mathbf 1 + a_1 B + a_2 B^2 + \cdots + a_n B^n)\mathcal X)_t = a_0 X_t + a_1 (B\mathcal X)_t + \cdots + a_n (B^n\mathcal X)_t.$$

In the special case of the lag operator $B$ this has the especially simple and relevant form

$$((a_0 \mathbf 1 + a_1 B + a_2 B^2 + \cdots + a_n B^n)\mathcal X)_t = a_0 X_t + a_1 X_{t-1} + \cdots + a_n X_{t-n}.$$
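This action is just a one-sided weighted sum of the current and past values. A minimal sketch, with hypothetical coefficients and data (the helper `apply_lag_poly` is introduced here only for illustration):

```python
import numpy as np

def apply_lag_poly(a, x, t):
    """Compute ((a_0*1 + a_1*B + ... + a_n*B^n) X)_t = sum_j a_j * X_{t-j}.

    `x` holds X_0, ..., X_{T-1}; requires t >= len(a) - 1 so that all
    lagged values fall inside the window.
    """
    return sum(a_j * x[t - j] for j, a_j in enumerate(a))

x = np.array([1.0, 4.0, 2.0, 8.0, 5.0])  # illustrative values of X_t
a = [1.0, -0.5, 0.25]                    # a_0, a_1, a_2 (illustrative)

# At t = 3: a_0 X_3 + a_1 X_2 + a_2 X_1 = 8 - 1 + 1 = 8.
print(apply_lag_poly(a, x, 3))
```

Because this is a discrete convolution of the coefficient sequence with the data, `numpy.convolve` gives the same numbers.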

The question concerns inverting such operators. In particular, it concerns the inverse of

$$P(B)=(\mathbf1-\phi_1B-\phi_2B^2)(\mathbf1+B) = \mathbf1 + (1-\phi_1)B - (\phi_1+\phi_2)B^2 - \phi_2B^3$$

where $\phi_i\in\mathcal R$ are fixed scalars.

Inverting the operator $P(B).$

Ordinarily, such operators are not evidently invertible, and that is the crux of the question. There are two ways to proceed. Both rely on making sense of infinite sums of such operators, thereby enlarging the algebra generated by $B.$

Rather than get involved in the analytical niceties of this, I propose to skip all such issues for the moment and merely work formally with infinite power sums, pretending that they always converge. (There is a sense in which they do; it relies on defining an ultrametric in which higher powers of $B$ grow smaller.)

When this is the case, familiar algebraic techniques apply. In particular,

  1. We can factor polynomials in $B,$ provided we allow for complex roots. That is,

    $$\mathbf 1 -\phi_1 B - \phi_2 B^2 = (\mathbf 1 -\lambda_1 B)(\mathbf 1 - \lambda_2 B)$$ where, upon expanding the right hand side, we see

    $$\lambda_1 + \lambda_2 = \phi_1;\quad \lambda_1\lambda_2 = -\phi_2.$$

    You find the $\lambda_i$ in the same way you would solve any such polynomial equations.

  2. We can perform a partial fractions decomposition. This is how you "separate" the denominator. In particular, when the $\lambda_i$ are distinct and neither equals $-1,$ then

    $$\frac{1}{(1 - \lambda_1 x)(1-\lambda_2x)(1+x)} = \frac{A_1}{1 - \lambda_1 x} + \frac{A_2}{1 - \lambda_2 x} + \frac{A_0}{1+x}$$

    where the $A_i$ are numbers that are readily computed in terms of the $\lambda_i.$ Formally, the same result holds for any polynomial in $B,$ because all the rules of algebra are the same.

    (One easy way to find the $A_i$ is illustrated by computing $A_0:$ multiply both sides by $1+x$ and take the limit as $x\to -1.$ The right hand side reduces to $A_0$ while the left hand side is the limit of $1/((1-\lambda_1 x)(1-\lambda_2 x)),$ obtained simply by plugging $x=-1$ in (because this fraction is a continuous function of $x$ at $-1$), giving $$A_0 = \frac{1}{(1+\lambda_1)(1+\lambda_2)}.$$ The other $A_i$ are found the same way. When there are repeated roots the work is a little more involved but leads to similar results.)

  3. Thus, inversion of the original operator $(1-\phi_1 B - \phi_2 B^2)(1+B)$ is (formally) reduced to the problem of inverting simple expressions of the form $1 - \lambda_i B.$ Newton's (generalized) Binomial Theorem tells us what the inverse must be (if it is to exist at all), because for any $\lambda\ne 0,$

$$\begin{aligned} \frac{1}{\mathbf 1-\lambda B} &= (\mathbf1 + (-\lambda)B)^{-1} \\ &= \sum_{j=0}^\infty \binom{-1}{j}(-\lambda)^j\,B^j \\ &= \mathbf 1 + \lambda B + \lambda^2 B^2 + \cdots + \lambda^j B^j + \cdots \end{aligned}$$

(You ought to recognize this expression as the sum of a geometric series; the common ratio is $\lambda B.$)
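Steps $(1)$ and $(3)$ can be sketched numerically. The conditions $\lambda_1+\lambda_2=\phi_1$ and $\lambda_1\lambda_2=-\phi_2$ say the $\lambda_i$ are the roots of $t^2-\phi_1 t-\phi_2,$ which `numpy.roots` finds; truncating the geometric series then inverts a single factor (illustrative $\phi$ values; `n` terms kept):

```python
import numpy as np

phi1, phi2 = 0.5, 0.3  # illustrative AR coefficients

# 1 - phi1 z - phi2 z^2 = (1 - lam1 z)(1 - lam2 z) exactly when the
# lam_i are the roots of t^2 - phi1 t - phi2.
lam = np.roots([1.0, -phi1, -phi2])
assert np.isclose(lam.sum(), phi1) and np.isclose(lam.prod(), -phi2)

# Invert one factor by the geometric series: 1/(1 - lam B) ~ sum_j lam^j B^j.
n = 30
lam1 = lam[0]
inv = lam1 ** np.arange(n)  # coefficients of B^0, ..., B^{n-1}

# Check: (1 - lam1 B) * inv = 1, up to the truncation order.
prod = np.convolve([1.0, -lam1], inv)[:n]
print(np.round(prod, 12))  # ~ [1, 0, 0, ...]
```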

Combining results $(2)$ and $(3)$ tells us the inverse of $P(B)$ is a sum of powers of $B$ where the coefficient of the $j^\text{th}$ power is given by the partial fraction coefficients $A_i$ and roots $\lambda_i$ as

$$\frac{1}{P(B)} = \mathbf 1 + \cdots + \left[A_0 (-1)^j + A_1 \lambda_1^j + A_2 \lambda_2^j\right]B^j + \cdots \tag{*}$$

Generally (and still formally), this is the so-called "$\operatorname{MA}(\infty)$" expansion of $1/P(B).$ It will be applied to the time series process $(\epsilon_t)$ to produce an explicit "infinite moving average" expression for each $Y_t$ in terms of $\epsilon_t,$ $\epsilon_{t-1},$ and so on.
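As a numeric sanity check of $(*)$ (with illustrative $\phi$'s, and assuming the roots are distinct and neither equals $-1$), the coefficients produced by the partial-fraction formula agree with a direct term-by-term inversion of $P(B)$:

```python
import numpy as np

phi1, phi2 = 0.5, 0.3  # illustrative AR coefficients
lam1, lam2 = np.roots([1.0, -phi1, -phi2])

# Partial-fraction coefficients for 1/((1 - lam1 x)(1 - lam2 x)(1 + x)),
# each found by the multiply-and-take-a-limit trick described above.
A0 = 1.0 / ((1 + lam1) * (1 + lam2))
A1 = 1.0 / ((1 - lam2 / lam1) * (1 + 1 / lam1))
A2 = 1.0 / ((1 - lam1 / lam2) * (1 + 1 / lam2))

n = 20
j = np.arange(n)
psi = A0 * (-1.0) ** j + A1 * lam1 ** j + A2 * lam2 ** j  # formula (*)

# Direct inversion: the coefficients of 1/P(B) solve P(B) psi(B) = 1,
# giving psi_0 = 1 and psi_k = -sum_{i>=1} p_i psi_{k-i}.
p = np.convolve([1.0, -phi1, -phi2], [1.0, 1.0])  # coefficients of P
direct = np.zeros(n)
direct[0] = 1.0
for k in range(1, n):
    direct[k] = -sum(p[i] * direct[k - i] for i in range(1, min(k, 3) + 1))

print(np.allclose(psi, direct))  # True
```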


When considerations of stationarity (essentially, convergence of infinite sums) come up, we can dodge them by expressing $(1-B)\mathcal Y$ (the "first difference" of $\mathcal Y$) as an $\operatorname{MA}$ process using these same techniques, because

$$(1-B)\mathcal Y = \left((\mathbf 1 - \lambda_1B)(\mathbf 1 - \lambda_2 B)\right)^{-1}\mathcal E.$$

When we expand the right hand side, what makes the resulting infinite sequences converge analytically (not just formally) is the assumption that $|\lambda_i|\lt 1,$ thereby causing the terms $\lambda_i^j$ in $(*)$ to shrink to zero rapidly as $j$ grows. Assuming $B$ is a bounded operator (according to some useful metric on $\mathbb T$), this will assure convergence in that metric.
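A quick numeric sketch of this last point (again with illustrative $\phi$'s): when $|\lambda_i|<1,$ the MA weights of the stationary factor, which satisfy the recursion $\psi_j = \phi_1\psi_{j-1}+\phi_2\psi_{j-2}$ obtained from $(1-\phi_1B-\phi_2B^2)\,\psi(B)=\mathbf 1,$ shrink geometrically and their partial sums converge:

```python
import numpy as np

phi1, phi2 = 0.5, 0.3  # illustrative AR coefficients
lam = np.roots([1.0, -phi1, -phi2])
assert np.all(np.abs(lam) < 1)  # stationarity of the AR(2) factor

# MA(infinity) weights of 1/(1 - phi1 B - phi2 B^2):
# matching coefficients in (1 - phi1 B - phi2 B^2) psi(B) = 1 gives
# psi_0 = 1, psi_1 = phi1, and psi_j = phi1 psi_{j-1} + phi2 psi_{j-2}.
n = 60
psi = np.zeros(n)
psi[0] = 1.0
psi[1] = phi1
for j in range(2, n):
    psi[j] = phi1 * psi[j - 1] + phi2 * psi[j - 2]

print(abs(psi[-1]))                       # tiny: geometric decay
print(psi.sum(), 1 / (1 - phi1 - phi2))   # partial sums approach the limit
```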

whuber
  • Thank you so much. So regarding your last method, $I(1)$: how do we compute the variance of $y_t$ (the variance of the ARIMA(2,1,0) process)? – John Mar 22 '22 at 21:23
  • It depends on what you assume about $(\epsilon_t).$ Most of the time one makes assumptions that imply $\operatorname{Var}(\epsilon_t)=\sigma^2$ is a constant regardless of $t$ and $\operatorname{Cov}(\epsilon_t, \epsilon_s)=0$ when $t\ne s.$ This is enough to apply the basic rules of working with covariances to finite linear combinations of the $\epsilon_t;$ you then have to take limits. – whuber Mar 22 '22 at 21:26
  • I mean, is it possible to calculate the variance of the AR(2) first, then use this result for the ARIMA(2,1,0)? If yes, how? Thanks. – John Mar 22 '22 at 21:26