I think I finally understand the point of the 'moving average' part of ARMA/ARIMA, but I wanted to confirm here, just in case I am still off.
Idea 1: Autoregressive processes are easy to motivate: future prices could reasonably be predicted from current prices, and the prices before today might paint an even more complete picture.
Idea 2: (Finite) moving average processes may be less natural to motivate, but they have a lot of "niceness" properties because the autocovariance vanishes beyond the specified window: for an $MA(q)$ process, $\gamma(h)=0$ whenever $|h|>q$.
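As a quick sanity check on Idea 2, here's a simulation sketch (the coefficients $\theta_1=0.6,\ \theta_2=0.3$ are arbitrary, chosen only for illustration):

```python
import numpy as np

# Simulate an MA(2) process: x_t = e_t + 0.6 e_{t-1} + 0.3 e_{t-2}
# (the theta values are arbitrary, chosen only for illustration)
rng = np.random.default_rng(0)
n = 200_000
e = rng.standard_normal(n + 2)
x = e[2:] + 0.6 * e[1:-1] + 0.3 * e[:-2]

# Empirical autocovariances at lags 0..5
xc = x - x.mean()
for h in range(6):
    print(f"lag {h}: {np.mean(xc[h:] * xc[:n - h]): .4f}")
# Lags 0-2 come out clearly nonzero (~1.45, 0.78, 0.30);
# lags 3+ hover near 0, matching gamma(h) = 0 for |h| > q.
```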
Idea 3: Any stationary (causal) autoregressive process can be viewed as an $MA(\infty)$ process, and any invertible moving average process can be viewed as an $AR(\infty)$ process. My understanding so far is that it's also a bit easier to deduce nice properties of $MA(\infty)$ processes from their coefficients than it is for $AR(\infty)$ processes, even though the two representations cover essentially the same class of processes (which is why Wold's decomposition involves an $MA(\infty)$ term rather than an $AR(\infty)$ one).
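Here's a tiny illustration of the first half of Idea 3 using statsmodels (the value $\phi=0.5$ is arbitrary): inverting an AR(1) yields $MA(\infty)$ weights $\phi^j$.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

# AR(1): (1 - phi L) x_t = e_t, in lag-polynomial form.
# statsmodels expects the polynomial coefficients [1, -phi].
phi = 0.5  # arbitrary illustrative value
psi = arma2ma(ar=[1, -phi], ma=[1], lags=6)

print(psi)                  # [1. 0.5 0.25 0.125 0.0625 0.03125]
print(phi ** np.arange(6))  # the MA(inf) weights are exactly phi**j
```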
Key idea: The following is a reasonable model:
$x_t = \varepsilon_t + \sum\limits_{i>0} \alpha^i x_{t-i}$
for some $0<\alpha<\tfrac{1}{2}$ and white noise $\varepsilon_t$ (the upper bound $\tfrac{1}{2}$ rather than $1$ is what stationarity ends up requiring; see below). Using $L$ for the lag operator and summing the geometric series $\sum_{i>0} (\alpha L)^i = \frac{1}{1-\alpha L} - 1$, this can be written:
$x_t=\varepsilon_t + \frac{1}{1-\alpha L} x_t - x_t$
and multiplying both sides by $(1-\alpha L)$ and collecting terms:
$(1-2\alpha L)x_t=(1-\alpha L)\varepsilon_t$
which gives us
$x_t = 2\alpha x_{t-1} + \varepsilon_t - \alpha \varepsilon_{t-1}$
which is an $ARMA(1,1)$ model (stationary precisely when $2\alpha<1$, which is where the restriction $\alpha<\tfrac{1}{2}$ above comes from).
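As a numerical check on this algebra (statsmodels again; $\alpha=0.3$ is an arbitrary value in $(0,\tfrac{1}{2})$), the $ARMA(1,1)$ above should invert back to $AR(\infty)$ weights of exactly $\alpha^i$:

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ar

alpha = 0.3  # arbitrary value in (0, 1/2)

# ARMA(1,1): (1 - 2 alpha L) x_t = (1 - alpha L) e_t
pi = arma2ar(ar=[1, -2 * alpha], ma=[1, -alpha], lags=6)

# pi holds the AR lag polynomial pi(L), so pi = [1, -alpha, -alpha^2, ...]
# corresponds to x_t = alpha x_{t-1} + alpha^2 x_{t-2} + ... + e_t
print(pi)
print(-(alpha ** np.arange(1, 6)))  # matches pi[1:]
```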
Punchline: Sometimes $AR(\infty)$ processes are better approximated by rational functions in the $L$ operator than by polynomials with the same number of parameters, and multiplying both sides by the denominator of the rational function produces the "MA" side of the process.
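To make this concrete (same hypothetical $\alpha$ as above): with two parameters, the rational form reproduces all of the $\alpha^i$ weights, whereas a pure $AR(2)$, also two parameters, simply truncates everything past lag 2.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ar

alpha = 0.3
true_weights = alpha ** np.arange(1, 11)  # alpha^1 .. alpha^10

# Two-parameter rational approximation: the ARMA(1,1) derived above.
arma_weights = -arma2ar(ar=[1, -2 * alpha], ma=[1, -alpha], lags=11)[1:]

# Two-parameter polynomial approximation: AR(2) keeping alpha, alpha^2.
ar2_weights = np.zeros(10)
ar2_weights[:2] = [alpha, alpha ** 2]

print(np.abs(arma_weights - true_weights).max())  # ~0: exact
print(np.abs(ar2_weights - true_weights).max())   # alpha^3: truncation error
```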
Is there a bigger picture I'm missing?