
I was under the impression that changing the scale of (normalizing) the target variable in a regression task would not change the overall shape of the loss function, but would simply translate it somewhere else. Therefore, putting bad weight initialization aside, the network would be able to converge the same way regardless of the scale of the output: whether the target variable lies in the range [0, 1] in one instance or in the range [1000, 10000] in another should make no difference.

However, I was playing around with a 3D visualizer to see how the loss function changes under different scales of the output variable, and the shape of the graph did in fact seem to change.

I was trying to model a simple neural network with one input, two weights, and one output, which looked like this:

[network diagram: input x → weight w1 → sigmoid → weight w2 → output]

Therefore the mean squared error loss function would be something like: z = (w2 * (1 / (1 + e^(-5*w1))) - 2)^2

where z represents the loss, w2 is the second weight, w1 is the first weight, the value 5 represents the input x for one data sample, and the value 2 represents the true value of that sample.
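For reference, here is a minimal sketch (using the single sample x = 5 and target y = 2 from above; the grid ranges are arbitrary choices for plotting) of how this loss surface can be evaluated over the two weights:

```python
import numpy as np

def loss(w1, w2, x=5.0, y=2.0):
    """Squared error of the toy network: prediction = w2 * sigmoid(w1 * x)."""
    pred = w2 * (1.0 / (1.0 + np.exp(-w1 * x)))
    return (pred - y) ** 2

# Evaluate the loss on a grid of the two weights (this is the surface in the plots below).
w1_grid, w2_grid = np.meshgrid(np.linspace(-3.0, 3.0, 200),
                               np.linspace(-5.0, 5.0, 200))
z = loss(w1_grid, w2_grid)
```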

When plotting this, I got something like: [3D plot of the loss surface over (w1, w2) for target value 2]

When I change the scale of the output so that, say, the target is now 20 instead of 2 (representing a different scale of outputs), the equation becomes: z = (w2 * (1 / (1 + e^(-5*w1))) - 20)^2

The plot now looks like this: [3D plot of the loss surface over (w1, w2) for target value 20]

The two 3D plots definitely seem to differ in shape in terms of their gradients, and are not simply translations of each other.
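A quick numerical check makes this concrete (a rough illustration only, not a proof; the evaluation point is arbitrary and x = 5 is the sample from above): at the same weight values, the gradients are far larger when the target is 20 than when it is 2, so rescaling the target rescales the slopes of the surface rather than merely shifting it.

```python
import numpy as np

def grads(w1, w2, x=5.0, y=2.0):
    """Analytic gradient of (w2 * sigmoid(w1 * x) - y)^2 with respect to (w1, w2)."""
    s = 1.0 / (1.0 + np.exp(-w1 * x))   # sigmoid(w1 * x)
    err = w2 * s - y                     # prediction error
    dw1 = 2.0 * err * w2 * s * (1.0 - s) * x
    dw2 = 2.0 * err * s
    return dw1, dw2

print(grads(0.5, 1.0, y=2.0))    # small-scale target: moderate gradients
print(grads(0.5, 1.0, y=20.0))   # large-scale target: much steeper gradients
```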

My question is: shouldn't we always normalize our target variables in regression tasks, if doing so leads to a differently shaped loss surface and would probably make convergence easier? Or is there a particular reason why normalizing the target variables might not matter?


1 Answer


It's hard to generalize across all optimization procedures, but if we're talking about neural networks and gradient descent, it's usually a good idea to normalize so that the gradient updates behave well. In general, it's hard to come up with a case where you're worse off after normalizing.
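As a concrete (hypothetical) illustration of what that looks like in practice, one common choice is to standardize the targets before training and map the predictions back afterwards; the array y_raw and the helper unscale below are made up for the example:

```python
import numpy as np

# Hypothetical targets on a large scale, e.g. the [1000, 10000] range from the question.
y_raw = np.array([1000.0, 2500.0, 4000.0, 10000.0])

# Standardize the targets before training ...
y_mean, y_std = y_raw.mean(), y_raw.std()
y_train = (y_raw - y_mean) / y_std

# ... fit the network to y_train instead of y_raw ...

def unscale(pred):
    """Map predictions made on the standardized scale back to the original scale."""
    return pred * y_std + y_mean
```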

The following post gathers different opinions and might be useful; I tend to agree with the answers given at the end: Is it necessary to scale the target value in addition to scaling features for regression analysis?

gunes
  • Thank you for your answer. In general, would you agree that there is no real rule of thumb on whether to normalize outputs or not, but that for target variables on a really large scale there is no downside to normalizing, since it can only help? (The link you provided definitely had numerous differing opinions on whether to normalize or not.) – Kiran Manicka Mar 31 '23 at 16:20