I have seen the min-max normalization formula, but it normalizes values to the range [0, 1]. How would I normalize my data between -1 and 1? My data matrix contains both negative and positive values.
2 Answers
With: $$ x' = \frac{x - \min{x}}{\max{x} - \min{x}} $$ you normalize your feature $x$ to $[0,1]$.
To normalize in $[-1,1]$ you can use:
$$ x'' = 2\frac{x - \min{x}}{\max{x} - \min{x}} - 1 $$
In general, you can always get a new variable $x'''$ in $[a,b]$:
$$ x''' = (b-a)\frac{x - \min{x}}{\max{x} - \min{x}} + a $$
And if you want to bring a variable back to its original values, you can, because these are linear transformations and thus invertible. For example:
$$ x = (x''' - a)\frac{(\max{x} - \min{x})}{b-a} + \min{x} $$
An example in Python:
import numpy as np

x = np.array([1, 3, 4, 5, -1, -7])

# goal: range [0, 1]
x1 = (x - x.min()) / (x.max() - x.min())
print(x1)
# [0.66666667 0.83333333 0.91666667 1.         0.5        0.        ]
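The same recipe extends to $[-1,1]$, to a general $[a,b]$, and to the inverse transformation. A minimal sketch following the formulas above (the particular $a$ and $b$ are just for illustration):

# goal: range [-1, 1]
x2 = 2 * (x - x.min()) / (x.max() - x.min()) - 1
print(x2)
# [ 0.33333333  0.66666667  0.83333333  1.          0.         -1.        ]

# goal: general range [a, b]
a, b = -5.0, 5.0
x3 = (b - a) * (x - x.min()) / (x.max() - x.min()) + a

# invert, using the min and max of the *original* data
x_back = (x3 - a) * (x.max() - x.min()) / (b - a) + x.min()
print(x_back)
# [ 1.  3.  4.  5. -1. -7.]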
Honestly I don't have citations for this. It is just a linear transformation of a random variable. Have a look at the effect of linear transformations on the support of a random variable. – Simone Oct 19 '17 at 04:40
@Simone: Is there a way to renormalize all the values? i.e. bring them back to their original values? – Srivatsan Jun 26 '20 at 16:56
@ThePredator this is a linear transformation of a random variable, so it is invertible. But you need to know the original $\max{x}$ and $\min{x}$. If you have $x''$ (as in the formula above) in $[-1,1]$ you can get back to $x$ with $(\max{x} - \min{x})\frac{x''+1}{2} + \min{x}$. – Simone Jun 28 '20 at 09:49
or in general: $x=\frac{(x'''-a)(\max{x}-\min{x})}{b-a}+\min{x}$. I advise keeping your original and normalised datasets; you can then find $\max{x}$ and $\min{x}$ easily by just looking at your original dataset again. – GMSL Oct 11 '21 at 11:22
Just a remark: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html performs the same normalization as in $x'''$ – Simone Feb 15 '22 at 16:02
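For reference, a minimal sketch of that scikit-learn call (MinMaxScaler expects a 2-D array, so the single feature is reshaped into a column; the variable names are mine):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([1, 3, 4, 5, -1, -7]).reshape(-1, 1)  # one feature as a column
scaler = MinMaxScaler(feature_range=(-1, 1))       # target range [a, b]
x_scaled = scaler.fit_transform(x)                 # same result as x'' above
x_original = scaler.inverse_transform(x_scaled)    # undoes the transformation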
I tested on randomly generated data, and
\begin{equation} X_{out} = (b-a)\frac{X_{in} - \min{X_{in}}}{\max{X_{in}} - \min{X_{in}}} + a \end{equation}
does not preserve the shape of the distribution. I would really like to see a proper derivation of this using functions of random variables.
The approach that did preserve the shape for me was using:
\begin{equation} X_{out} = \frac{X_{in} - \mu_{in}}{\sigma_{in}} \cdot \sigma_{out} + \mu_{out} \end{equation}
where
\begin{equation} \sigma_{out} = \frac{b-a}{6} \end{equation}
(I admit that using 6 is a bit dirty) and
\begin{equation} \mu_{out} = \frac{b+a}{2} \end{equation}
and
$a$ and $b$ are the bounds of the desired range; per the original question, $a=-1$ and $b=1$.
I arrived at this result by equating the standardized variables:
\begin{equation} Z_{out} = Z_{in} \end{equation}
\begin{equation} \frac{X_{out} - \mu_{out}}{\sigma_{out}} = \frac{X_{in} - \mu_{in}}{\sigma_{in}} \end{equation}
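A minimal numpy sketch of this standardize-and-rescale approach, as I understand it from the formulas above (variable names are mine; note that, unlike min-max scaling, this does not guarantee the output lies inside $[a,b]$, as the comments below discuss):

import numpy as np

rng = np.random.default_rng(42)
x_in = rng.normal(loc=10.0, scale=3.0, size=1000)  # example input data

a, b = -1.0, 1.0                  # desired range, as in the original question
mu_out = (b + a) / 2              # target mean
sigma_out = (b - a) / 6           # target standard deviation (the "6" heuristic)

x_out = (x_in - x_in.mean()) / x_in.std() * sigma_out + mu_out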
Are you sure that this guarantees the transformed data will lie within the bounds? In R, try: set.seed(1); scale(rnorm(1000))*.333. I get a max of 1.230871. Your method seems to be just a tweak on standardizing data, rather than normalizing them as requested. Note that the question does not ask for a method that preserves the shape of the distribution (which would be a strange requirement for normalization). – gung - Reinstate Monica Jul 17 '19 at 17:01
I'm not sure how the original transformation could fail to preserve the shape of the data. It's equivalent to subtracting a constant and then dividing by a constant, which is what your proposal does, and which doesn't change the shape of the data. Your proposal assumes all the data falls within three standard deviations of the mean, which may be somewhat reasonable with small, approximately normally distributed samples, but not with big or non-normal samples. – Noah Jul 17 '19 at 17:01
@Noah It's not equivalent to subtracting and dividing by constants, because the min and max of the data are random variables. Indeed, for most underlying distributions they are pretty variable--more variable than the rest of the data--whence using them for any form of standardization is usually not a good idea. In this answer it's unclear what $a$ and $b$ mean or how they might be related to the data. – whuber Jul 17 '19 at 17:15
@whuber true, but I meant that in a given dataset (i.e., treating the data as fixed), they are constants, in the same way the sample mean and sample standard deviation function as constants when standardizing a dataset. My impression was that OP wanted to normalize a dataset, not a distribution. – Noah Jul 17 '19 at 17:57
@Noah I had the same impression, but I believe the present post may be responding to a different interpretation. – whuber Jul 17 '19 at 19:57
See also convertRange, shared by Giuseppe Canale. – Galen Oct 16 '22 at 15:50