I'm trying to wrap my head around loss surfaces in PyTorch. This is for work, not a homework assignment.
Let's say we have a model:
y = model(x)
error = y - y_label
The simplest loss function, mean absolute error:
error.abs().mean().backward()
The "industry standard" loss function looks like this
(error * error).mean().backwards() # error.pow(2) also works
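For concreteness, here's a minimal runnable version of the setup (the toy linear model and data are made up by me) that prints the gradient each loss produces:

```python
import torch

# hypothetical toy setup: a single linear weight, fixed inputs/targets
x = torch.tensor([1.0, 2.0, 3.0])
y_label = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(1.5, requires_grad=True)

# mean absolute error: the gradient only carries the sign of each error
y = w * x
error = y - y_label
error.abs().mean().backward()
mae_grad = w.grad.clone()
print(mae_grad)

w.grad = None  # reset before the second backward pass

# mean squared error: the gradient is proportional to the error itself
y = w * x
error = y - y_label
(error * error).mean().backward()
mse_grad = w.grad.clone()
print(mse_grad)
```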
In my mind, two things are happening here:
- The errors are being weighted by their own magnitude
- The loss is now non-linear (quadratic) in the error, so the gradient is proportional to the error rather than just its sign
So my question is: can anyone tell me (like I'm a 5 year old) what their intuition is about the difference between mse_loss and the function below?
(error * error.detach()).mean().backward()
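For reference, here's a quick experiment I put together comparing the two gradients (the toy model and data are my own invention). Since autograd treats error.detach() as a constant, only one branch of the product rule fires, so numerically the detached version seems to give the same direction with half the magnitude:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y_label = torch.tensor([2.0, 4.0, 6.0])

def grad_of(loss_fn):
    # fresh leaf tensor each time so gradients don't accumulate
    w = torch.tensor(1.5, requires_grad=True)
    error = w * x - y_label
    loss_fn(error).backward()
    return w.grad

g_mse = grad_of(lambda e: (e * e).mean())
g_detached = grad_of(lambda e: (e * e.detach()).mean())

# d/de (e * e) = 2e, but d/de (e * const) = const, and here const == e
print(g_mse, g_detached)
```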

