The difference is simply that the squared residuals are weighted by the reciprocal of the fitted values. It's not something I have seen before. One reason for doing this could be to make the value more comparable across datasets on different scales.
The value is unfortunately unbounded: if any $\hat{y}_i$ is very small and positive, the corresponding term blows up. Worse, if $\hat{y}_i<0$, the ratio is negative, and if this happens for "many" $i$, the entire expression under the square root can be negative.
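As a concrete illustration (assuming the loss has the form $\sqrt{\frac{1}{n}\sum_i \frac{(y_i-\hat{y}_i)^2}{\hat{y}_i}}$, i.e., the squared residuals weighted by the reciprocal of the fit as described above): with $y_i = 1$, the term $\frac{(y_i-\hat{y}_i)^2}{\hat{y}_i}$ tends to $+\infty$ as $\hat{y}_i \to 0^+$, and equals $\frac{(1-(-1))^2}{-1} = -4$ at $\hat{y}_i = -1$.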
Note that minimizing this expression does not incentivize $\hat{y}_i$ to be the conditional expectation of $y_i$: increasing $\hat{y}_i$ a little inflates the denominator and makes the ratio a little smaller, so the optimal $\hat{y}_i$ under this loss will be somewhat higher than $E(y_i)$. This is similar to a property of the Mean Absolute Percentage Error (MAPE). (Actually, some people use a variant of the MAPE with the prediction in the denominator, which is similar to your loss function, only with an absolute value rather than the square in the numerator.)
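To make the bias concrete, here is a minimal simulation sketch, assuming the loss takes the form $\sqrt{\frac{1}{n}\sum_i \frac{(y_i-\hat{y}_i)^2}{\hat{y}_i}}$ and that we fit a single constant prediction $\hat{y}$ (both assumptions for illustration; the Gamma distribution and the search grid below are arbitrary choices). Setting the derivative of $\frac{1}{\hat{y}}\sum_i (y_i-\hat{y})^2$ with respect to $\hat{y}$ to zero gives $\hat{y}^* = \sqrt{\frac{1}{n}\sum_i y_i^2}$, the root mean square of the sample, which exceeds the sample mean whenever the $y_i$ vary:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # positive targets, mean 2

# Assumed loss for a single constant prediction yhat:
# sqrt( mean( (y_i - yhat)^2 / yhat ) )
def loss(yhat):
    return np.sqrt(np.mean((y - yhat) ** 2 / yhat))

# Brute-force minimization over a grid of candidate predictions
grid = np.linspace(0.5, 5.0, 2001)
best = grid[np.argmin([loss(g) for g in grid])]

print(f"mean(y)            = {y.mean():.3f}")                # ~2.00
print(f"rms(y)             = {np.sqrt(np.mean(y**2)):.3f}")  # analytic minimizer, ~2.45
print(f"argmin of the loss = {best:.3f}")                    # matches rms(y), above the mean
```

With $y_i \sim \text{Gamma}(2,1)$ the mean is $2$ but the minimizer is near $\sqrt{E(y^2)} = \sqrt{6} \approx 2.45$, so the loss indeed pulls the prediction above the conditional expectation.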