I was under the impression that changing the scale of / normalizing the target variable in a regression task would not change the overall shape of the loss surface, but would simply translate it somewhere else. Therefore, putting bad weight initialization aside, the network should be able to converge the same way regardless of the scale of the output: it should make no difference whether the target variable is in the range [0, 1] or in the range [1000, 10000].
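As a sanity check on that intuition in the simplest possible case, with a single weight $w$, input $x$ and target $t$, the squared error

$$z(w) = (wx - t)^2 = x^2\left(w - \frac{t}{x}\right)^2$$

is just a parabola in $w$ whose minimum shifts with $t$ while its curvature $2x^2$ stays the same, i.e. changing the target only translates the curve.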
However, I was playing around with a 3D visualizer to see how the loss surface changes under different scales of the output variable, and the shape of the graph did actually seem to change.
I was trying to model a simple neural network with one input, two weights, and one output, i.e. the prediction for an input $x$ is $\hat{y} = w_2 w_1 x$.
Therefore the mean squared error loss function would be something like:
$$z = (5 w_1 w_2 - 2)^2$$
where $z$ is the loss, $w_2$ is the second weight, $w_1$ is the first weight, the value of 5 is the input $x$ for one data sample, and the value of 2 is the true value of that sample.
When plotting this, I got something like:
[3D surface plot of $z$ over $(w_1, w_2)$ for the target value 2]
When I change the scale of the output so that the true value is now 20 instead of 2, representing a different scale of outputs, the equation becomes:
$$z = (5 w_1 w_2 - 20)^2$$
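Taking the gradients of these two surfaces exactly as written above (with $t = 2$ or $t = 20$ as the target), only the constant changes:

$$\frac{\partial z}{\partial w_1} = 10 w_2 (5 w_1 w_2 - t), \qquad \frac{\partial z}{\partial w_2} = 10 w_1 (5 w_1 w_2 - t)$$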
The two 3D plots definitely seem to have some shape dissimilarities, particularly in their gradients, and are not simply translations of each other.
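In case it helps, here is a rough numpy/matplotlib sketch that reproduces the two surfaces I was looking at (this is only meant to be equivalent to what my visualizer shows; the grid range and resolution are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

x, targets = 5.0, [2.0, 20.0]  # one input sample and the two target scales
w1, w2 = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))

fig = plt.figure(figsize=(10, 4))
for i, t in enumerate(targets, start=1):
    z = (x * w1 * w2 - t) ** 2  # squared-error surface over (w1, w2)
    ax = fig.add_subplot(1, 2, i, projection="3d")
    ax.plot_surface(w1, w2, z, cmap="viridis")
    ax.set_title(f"target = {t}")
    ax.set_xlabel("w1"); ax.set_ylabel("w2"); ax.set_zlabel("loss")
plt.show()
```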
My question is: shouldn't we always be normalizing our target variables in regression tasks if it leads to a differently shaped loss surface and would probably make converging easier, or is there a particular reason why it might not matter to normalize the target variables?

