I was under the impression that changing the scale of / normalizing the target variable in a regression task would not change the overall shape of the loss surface, but would simply translate it somewhere else. Therefore, putting bad weight initialization aside, the network should be able to converge the same way regardless of the scale of the output: it should make no difference whether the target variable is in the range [0, 1] or in the range [1000, 10000].
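As a sanity check on that intuition in the simplest possible case, with a single weight $w$, input $x$ and target $t$, the squared error

$$z(w) = (wx - t)^2 = x^2\left(w - \frac{t}{x}\right)^2$$

is just a parabola in $w$ whose minimum shifts with $t$ while its curvature $2x^2$ stays the same, i.e. changing the target only translates the curve.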
However, I was playing around with a 3D visualizer to see how the loss surface changes under different scales of the output variable, and the shape of the graph did actually seem to change.
I was trying to model a simple neural network with one input, two weights, and one output, i.e. the prediction for an input $x$ is $\hat{y} = w_2 w_1 x$.
Therefore the mean squared error loss function would be something like:
$$z = (5 w_1 w_2 - 2)^2$$
where $z$ is the loss, $w_2$ is the second weight, $w_1$ is the first weight, the value of 5 is the input $x$ for one data sample, and the value of 2 is the true value of that sample.
When plotting this, I got something like:
[3D surface plot of $z$ over $(w_1, w_2)$ for the target value 2]
When I change the scale of the output so that the true value is now 20 instead of 2, representing a different scale of outputs, the equation becomes:
$$z = (5 w_1 w_2 - 20)^2$$
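Taking the gradients of these two surfaces exactly as written above (with $t = 2$ or $t = 20$ as the target), only the constant changes:

$$\frac{\partial z}{\partial w_1} = 10 w_2 (5 w_1 w_2 - t), \qquad \frac{\partial z}{\partial w_2} = 10 w_1 (5 w_1 w_2 - t)$$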
The two 3D plots definitely seem to have some shape dissimilarities, particularly in their gradients, and are not simply translations of each other.
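In case it helps, here is a rough numpy/matplotlib sketch that reproduces the two surfaces I was looking at (this is only meant to be equivalent to what my visualizer shows; the grid range and resolution are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

x, targets = 5.0, [2.0, 20.0]  # one input sample and the two target scales
w1, w2 = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))

fig = plt.figure(figsize=(10, 4))
for i, t in enumerate(targets, start=1):
    z = (x * w1 * w2 - t) ** 2  # squared-error surface over (w1, w2)
    ax = fig.add_subplot(1, 2, i, projection="3d")
    ax.plot_surface(w1, w2, z, cmap="viridis")
    ax.set_title(f"target = {t}")
    ax.set_xlabel("w1"); ax.set_ylabel("w2"); ax.set_zlabel("loss")
plt.show()
```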
My question is: shouldn't we always be normalizing our target variables in regression tasks if it leads to a differently shaped loss surface and would probably make converging easier, or is there a particular reason why it might not matter to normalize the target variables?

