If I were to rephrase the argument of the original Box-Cox paper in my own words, it would be something like this: given a model $$ y = x \beta + \varepsilon , $$ if the residuals do not appear to be normally distributed, we can transform the response $y \to y^{(\lambda)}$ so that the residuals of the model $$ y^{(\lambda)} = x \beta + \varepsilon $$ do satisfy the usual assumptions. The optimal value of $\lambda$ can then be found by some procedure (maximum likelihood, I gather). (I don't understand all of the details, and perhaps this is the source of my confusion.)
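For concreteness (and in case I have misread the paper), the transformation I have in mind is the usual one-parameter family, with $\lambda$ chosen to maximise the likelihood under the assumption that the transformed residuals are normal:
$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\
\log y, & \lambda = 0.
\end{cases}
$$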
What I am having trouble understanding is how this applies to a time series considered on its own. That is, I often read something to the effect of "a log transformation stabilises the variance of a time series". How does the Box-Cox argument above apply here? Specifically, if $y$ represents the time series in the equation above, what is $x$?
Moreover, I fail to understand what exactly is being transformed so that it looks as though it comes from a normal distribution. Many time series exhibit clear trends, so it makes little sense to me to speak of transforming them to look normally distributed when the mean is clearly trending. But perhaps this isn't a problem anyway, since what we usually want is for the returns or differences to be normally distributed. Is that what the Box-Cox transformation accomplishes? For instance, if I run a standard Box-Cox routine in R or Python on a time series (see the snippet at the end of the post for what I mean) and it turns out that the square root transformation "maximises the likelihood", what does this mean for the series? Does it mean that
$$
\sqrt{y_t} \sim \mathcal{N}(0, \sigma^2)
$$
or
$$
\sqrt{y_t} - \sqrt{y_{t-1}} \sim \mathcal{N}(0, \sigma^2)
$$
or
$$
\sqrt{y_t - y_{t-1}} \sim \mathcal{N}(0, \sigma^2)
$$
or something else?
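
To make this concrete, here is roughly what I mean by "running a standard routine": a minimal sketch using `scipy.stats.boxcox`, where the series `y` is purely synthetic and only there so the snippet runs.

```python
import numpy as np
from scipy import stats

# Synthetic positive, trending series whose variance grows with its level
# (a stand-in for the kind of series I have in mind, not real data).
rng = np.random.default_rng(0)
t = np.arange(200)
y = np.exp(0.01 * t + 0.1 * rng.standard_normal(t.size))

# With lmbda=None, scipy picks lambda by maximising the Box-Cox
# log-likelihood of the transformed values, which (as far as I can tell)
# treats them as i.i.d. normal observations.
y_transformed, lam = stats.boxcox(y)
print(f"lambda that maximises the likelihood: {lam:.3f}")
```

If `lam` comes out near 0.5, that is the "square root" case I am asking about above.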