If I were to rephrase the argument of the original Box-Cox paper in my own words, it would be something like this: given a model $$ y = x \beta + \varepsilon , $$ if the residuals do not appear to be normally distributed, we can transform the response $y \to y^{(\lambda)}$ so that the residuals of the model $$ y^{(\lambda)} = x \beta + \varepsilon $$ do satisfy the usual assumptions. The optimal value of $\lambda$ can then be found by some procedure (maximum likelihood, I gather). (I don't understand all of the details, and perhaps this is the source of my confusion.)
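For concreteness (and in case I have misread the paper), the transformation I have in mind is the usual one-parameter family, with $\lambda$ chosen to maximise the likelihood under the assumption that the transformed residuals are normal:
$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\
\log y, & \lambda = 0.
\end{cases}
$$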
What I am having trouble understanding is how this applies to a time series considered on its own. That is, I often read something to the effect of "a log transformation stabilises the variance of a time series". How does the Box-Cox argument above apply here? Specifically, if $y$ represents the time series in the equation above, what is $x$?
Moreover, I fail to understand what exactly is being transformed so that it looks as though it comes from a normal distribution. Many time series exhibit clear trends, so it makes little sense to me to speak of transforming them to look normally distributed when the mean is clearly trending. But perhaps this isn't a problem anyway, since what we usually want is for the returns or differences to be normally distributed. Is that what the Box-Cox transformation accomplishes? For instance, if I run a standard Box-Cox routine in R or Python on a time series (see the snippet at the end of the post for what I mean) and it turns out that the square root transformation "maximises the likelihood", what does this mean for the series? Does it mean that
$$
\sqrt{y_t} \sim \mathcal{N}(0, \sigma^2)
$$
or
$$
\sqrt{y_t} - \sqrt{y_{t-1}} \sim \mathcal{N}(0, \sigma^2)
$$
or
$$
\sqrt{y_t - y_{t-1}} \sim \mathcal{N}(0, \sigma^2)
$$
or something else?
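
To make this concrete, here is roughly what I mean by "running a standard routine": a minimal sketch using `scipy.stats.boxcox`, where the series `y` is purely synthetic and only there so the snippet runs.

```python
import numpy as np
from scipy import stats

# Synthetic positive, trending series whose variance grows with its level
# (a stand-in for the kind of series I have in mind, not real data).
rng = np.random.default_rng(0)
t = np.arange(200)
y = np.exp(0.01 * t + 0.1 * rng.standard_normal(t.size))

# With lmbda=None, scipy picks lambda by maximising the Box-Cox
# log-likelihood of the transformed values, which (as far as I can tell)
# treats them as i.i.d. normal observations.
y_transformed, lam = stats.boxcox(y)
print(f"lambda that maximises the likelihood: {lam:.3f}")
```

If `lam` comes out near 0.5, that is the "square root" case I am asking about above.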