
The following is the table of $\lambda$ values that describes what the resulting dataset would look like after a Box-Cox transformation:

[image: table of Box-Cox $\lambda$ values and the corresponding transformations]

What is the equivalent table of $\lambda$ values for the Yeo-Johnson transformation? I can't seem to find it online.

asked by Katsu; edited by Stephan Kolassa
  • The Yeo-Johnson transformation, like Box-Cox, is explicitly defined in terms of powers, so if you know what this transformation is, you have an answer. See https://en.wikipedia.org/wiki/Power_transform, for instance. – whuber Mar 10 '23 at 20:39
  • Thanks, but I'm having trouble understanding the Wikipedia page, which is why I'm looking for a more digestible answer here. – Katsu Mar 10 '23 at 22:04
  • Could you elaborate on what you don't understand? It explicitly tabulates the possibilities. – whuber Mar 10 '23 at 22:18

1 Answer


A table adds little, but a picture can add a lot more to our understanding. I offer two pictures.


Unlike the Box-Cox transformation, which applies to positive numbers, the Yeo-Johnson transformation applies to all numbers. It does so by splitting the real line at zero, shifting the positive values by $1$ and the negative values by $-1,$ and applying a Box-Cox transformation to the absolute values, negating them when the argument is negative. In effect, it sews two Box-Cox transformations together. However, they have "inverse" Box-Cox parameters. The natural origin of the Box-Cox parameters is $\lambda = 1$ and the "inverse" parameter is

$$\lambda^\prime = 2 - \lambda,$$

reflecting the parameter line around $\lambda = 1.$ The sewing is smooth (as you will see in the first plot below) because all Box-Cox transformations are by design made to agree with the identity transformation at $x = 1.$

For pictures of the Box-Cox transformations and some explanation of their construction, see https://stats.stackexchange.com/a/467525/919. These transformations are given by

$$\operatorname{BC}(x;\lambda) = \frac{x^\lambda - 1}{\lambda}$$

(which has the limiting value of $\log(x)$ when $\lambda = 0$). They can be inverted: when $y$ is the transformed value, the original $x$ is recovered by

$$\operatorname{BC}^{-1}(y;\lambda) = (1 + \lambda y)^{1/\lambda}$$

(limiting to the exponential function when $\lambda = 0$).
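
For completeness, both limits follow from writing $x^\lambda = e^{\lambda \log x}$ and expanding the exponential (a routine step not spelled out in the answer):

$$\frac{x^\lambda - 1}{\lambda} = \frac{e^{\lambda\log x} - 1}{\lambda} = \log x + \frac{\lambda}{2}(\log x)^2 + \cdots \to \log x \quad\text{as } \lambda \to 0,$$

and likewise $(1 + \lambda y)^{1/\lambda} = e^{\log(1+\lambda y)/\lambda} = e^{y - \lambda y^2/2 + \cdots} \to e^y.$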

The Yeo-Johnson transformation is

$$\operatorname{YJ}(x;\lambda) = \left\{\begin{aligned}\operatorname{BC}(1+x;\lambda), && x \ge 0\\ -\operatorname{BC}(1-x;\lambda^\prime),&& x \lt 0.\end{aligned} \right.$$
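
This makes the smoothness claimed earlier easy to verify (a short check, not in the original answer): both pieces vanish at $x = 0$ because $\operatorname{BC}(1;\lambda) = 0$ for every $\lambda,$ and their slopes agree there because

$$\frac{d}{dx}\operatorname{BC}(1+x;\lambda)\bigg|_{x=0} = (1+x)^{\lambda-1}\bigg|_{x=0} = 1 = (1-x)^{1-\lambda}\bigg|_{x=0} = \frac{d}{dx}\left[-\operatorname{BC}(1-x;\lambda^\prime)\right]\bigg|_{x=0}.$$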

These can all be inverted by inverting the positive and negative values separately.

The implementation in any programming language is thereby simple. In R, for instance, it is

# Box-Cox; lambda = 0 is the limiting case log(x)
BC <- function(x, lambda) if (lambda == 0) log(x) else (x^lambda - 1) / lambda
# Yeo-Johnson: Box-Cox on 1 + y when y >= 0, reflected Box-Cox with 2 - lambda when y < 0
YJ <- function(y, lambda) ifelse(y >= 0, BC(y + 1, lambda), -BC(1 - y, 2 - lambda))
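
The inverses are just as short. Because $\operatorname{YJ}$ is increasing and sends $0$ to $0,$ the sign of a transformed value tells which branch to undo. Here is a minimal sketch (my code, not from the original answer; `BCinv` and `YJinv` are names I chose):

BCinv <- function(y, lambda) if (lambda == 0) exp(y) else (1 + lambda * y)^(1 / lambda)
# YJ(x) >= 0 exactly when x >= 0, so branch on the sign of y
YJinv <- function(y, lambda) ifelse(y >= 0, BCinv(y, lambda) - 1, 1 - BCinv(-y, 2 - lambda))

A round trip such as `all.equal(YJinv(YJ(x, 0.5), 0.5), x)` recovers the data up to floating point.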

The graphs of $\operatorname{YJ}$ show the effects on the data for various $\lambda$:

[figure: graphs of $\operatorname{YJ}(x;\lambda)$ for a range of $\lambda$ values]

Here's what they do to a reference (Normal) distribution (the green distribution for $\lambda = 1$ in the middle panel):

[figure: densities of the reference Normal distribution after $\operatorname{YJ}$ transformation for a range of $\lambda$ values]

Like the Box-Cox family, these transformations make a distribution more positively skewed when $\lambda \gt 1$ and more negatively skewed when $\lambda \lt 1.$
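
A small simulation makes this concrete (my sketch, reusing the `YJ` function above; the moment-based `skew` helper is mine, and `sd = 0.3` matches the reference distribution described in the comments):

set.seed(17)
x <- rnorm(1e5, sd = 0.3)  # symmetric reference data
skew <- function(z) mean((z - mean(z))^3) / sd(z)^3
skew(YJ(x, 3))   # positive: lambda > 1 pushes skewness up
skew(YJ(x, -1))  # negative: lambda < 1 pushes skewness down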

answered by whuber
  • Thank you for taking the time to create this. I will still need to go through and understand this piece by piece, but it looks great! – Katsu Mar 14 '23 at 17:01
  • +1. It might be helpful to explicitly point out for future readers that the expectation of a transformed random variable is not equal to the transform of the expectation - one needs to correct for bias. This is frequently gotten wrong when people transform a time series, forecast it out, and then naively back-transform the expectation forecast, ending up with bias. See the bottom here: https://otexts.com/fpp3/ftransformations.html. – Stephan Kolassa Mar 15 '23 at 09:27 [illustrated in the sketch after these comments]
  • @Stephan Thank you. That issue is tangential to this thread, but I investigated it anyway. In the second figure it looks to me like these transformed distributions do have means near zero. As you say, they are not exactly zero. However, the original zero-mean distribution ($\lambda=1$) has a standard deviation of $0.3$ and all the transformed distributions have means between $-0.08$ and $+0.08,$ which is not a huge change. (For $-2\le\lambda\le4,$ $E[y]\approx0.14\operatorname{SD}(x)$) Thus, provided one doesn't apply a strong transformation, this might not be an important concern. – whuber Mar 15 '23 at 13:25
  • Let's consider a log-transformation ($\lambda=0$) of a lognormal (which always yields normality) under the usual parameterization. When $\sigma$ is small (e.g. $\sigma=0.1$, say), the bias in simply backtransforming the log-scale mean is small (here I'm specifically talking in percentage error terms rather than the usual raw difference, so only $\sigma$ will be relevant). When $\sigma$ is large ($\sigma>1$ say), the effect may be huge. We can see that the same strength of transformation (if we treat $\lambda$ as strength) might have either small or large effects on this percentage-error bias. – Glen_b Mar 16 '23 at 00:54
  • @Glen_b All true -- but log transformations are not part of this family. The intended application is to data that are centered near zero, not to positive data. – whuber Mar 16 '23 at 13:21
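
To illustrate the back-transformation bias discussed in these comments, here is a minimal sketch (mine, not from the thread), reusing `YJ` and the hypothetical `YJinv` defined above:

set.seed(42)
x <- rnorm(1e5, sd = 0.3)  # mean 0 by construction
z <- YJ(x, 3)              # a fairly strong transformation
YJinv(mean(z), 3)          # naive back-transformed mean: roughly 0.08
mean(x)                    # true mean: roughly 0

The gap arises because $E[\operatorname{YJ}^{-1}(Z)] \ne \operatorname{YJ}^{-1}(E[Z])$ for a nonlinear transformation, consistent with the magnitudes whuber reports above.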