The difference between $y^{\lambda}$ and $\frac{y^{\lambda}-1}{\lambda}$ is a linear transformation, which by itself does not affect the shape of the distribution of $y$; it only changes the mean and variance. The whole point of the Box-Cox transformation is the non-linear part, which squeezes or stretches the distribution's tails.
In fact, the original paper describes both $\frac{y^{\lambda}-1}{\lambda}$ (1) and $y^{\lambda}$ (3):
> Note that since an analysis of variance is unchanged by a linear transformation (1) is equivalent to (3); the form (1) is slightly preferable for theoretical analysis because it is continuous at $\lambda=0$.
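To illustrate the equivalence, here is a small numpy sketch (the gamma-distributed sample and $\lambda=0.5$ are arbitrary choices for illustration): form (1) is an affine function of form (3), so after standardizing both, the transformed values are identical.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.0, size=1000)  # positive, right-skewed data
lam = 0.5

t1 = (y**lam - 1) / lam   # form (1)
t3 = y**lam               # form (3)

# t1 = t3/lam - 1/lam is an affine function of t3 with positive slope,
# so standardizing both yields exactly the same values.
z1 = (t1 - t1.mean()) / t1.std()
z3 = (t3 - t3.mean()) / t3.std()
print(np.allclose(z1, z3))  # True
```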
In practice, if you are going to transform a variable linearly anyway, it is more common to use $\frac{y-\bar y}{\sqrt{\text{Var}(y)}}$, which yields a standardized variable with mean 0 and variance 1. Procedures that estimate the optimal $\lambda$, such as R's MASS::boxcox, work with form (1).
I should point out that while standardization can definitely be a good idea (it stabilizes many numerical procedures, not least the fitting of covariance matrices), Box-Cox tends to be applied rather mindlessly (all data must follow a normal distribution, right?).
As Stephan Kolassa raised in the comments, regressions on linearly transformed variables are also readily back-transformed (e.g. their predictions); this is not the case for non-linear transformations.
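A minimal numpy sketch of the back-transformation issue (the log-normal sample is an assumed example; $\lambda=0$ corresponds to the log transform): inverting a linear transform of a mean recovers the mean exactly, while naively exponentiating the mean of $\log y$ recovers something like the median, not the mean, by Jensen's inequality.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.lognormal(0.0, 1.0, size=100_000)

# Linear transform: back-transforming the mean of z recovers the mean of y exactly.
z = (y - y.mean()) / y.std()
print(np.isclose(z.mean() * y.std() + y.mean(), y.mean()))  # True

# Non-linear transform (lambda = 0, i.e. log): exponentiating the mean of
# log(y) gives the geometric mean (~1 here), well below the mean (~1.65).
print(np.exp(np.log(y).mean()), y.mean())
```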