I know that the residual sum of squares (RSS) measures the deviation between a model and the actual data by summing the squared differences between them. I came across the following metric in one of the questions my instructor gave me. How does it differ from the RSS? Is its value bounded?

$$\sqrt{\frac{1}{n-p-1}\sum_{i=1}^n \frac{(y_i-\hat{y}_i)^2}{\hat{y}_i}}$$

Bijay
  • 103

1 Answer

The difference is simply that each squared residual is weighted by the reciprocal of the corresponding fitted value $\hat{y}_i$ before summing; the sum is then divided by $n-p-1$ and square-rooted. It's not something I have seen before. One reason for doing this could be to make the value more comparable between datasets on different scales.

Unfortunately, the value is unbounded: it becomes arbitrarily large if some $\hat{y}_i$ is very close to zero while $y_i$ is not. If $\hat{y}_i<0$, then the corresponding ratio is negative, and if this happens for "many" $i$, the entire sum under the square root can be negative, in which case the expression is not a real number.
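To make both failure modes concrete, here is a minimal Python sketch. The function name `scaled_rse` and the toy data are my own choices for illustration, not something from the question, and returning NaN for a negative sum is just one possible convention.

```python
import numpy as np

def scaled_rse(y, y_hat, p):
    """Square root of the fit-weighted RSS divided by n - p - 1.

    `scaled_rse` is a hypothetical helper name; p is the number of predictors.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    weighted_rss = np.sum((y - y_hat) ** 2 / y_hat)
    if weighted_rss < 0:
        # The square root of a negative number is not real; report NaN.
        return float("nan")
    return float(np.sqrt(weighted_rss / (n - p - 1)))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy observations (assumed)

# Shrinking one fitted value toward zero makes the metric grow without bound.
for eps in (1e-1, 1e-3, 1e-6):
    y_hat = np.array([eps, 2.0, 3.0, 4.0, 5.0])
    print(eps, scaled_rse(y, y_hat, p=1))

# Negative fitted values can push the sum under the root below zero.
y_hat_neg = np.array([-1.0, -2.0, 3.0, 4.0, 5.0])
print(scaled_rse(y, y_hat_neg, p=1))
```

The printed values increase as `eps` shrinks, and the last call returns NaN because the weighted sum is negative.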

Note that minimizing this expression does not incentivize $\hat{y}_i$ to be the conditional expectation of $y_i$: by increasing $\hat{y}_i$ a little, we can make the ratio a little smaller, so the optimal $\hat{y}_i$ under this loss is higher than $E(y_i)$. Specifically, for positive predictions, $E\big((y_i-\hat{y}_i)^2/\hat{y}_i\big)$ is minimized at $\hat{y}_i=\sqrt{E(y_i^2)}\geq E(y_i)$. This resembles a property of the Mean Absolute Percentage Error (MAPE), which is also not minimized by the conditional expectation. (Actually, some people use a variant of the MAPE with the prediction in the denominator, which is similar to your loss function, only with an absolute value rather than the square in the numerator.)
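A quick numerical check of this bias, as a sketch: for a single constant prediction $c>0$, the expected loss $E\big((y-c)^2/c\big) = E(y^2)/c - 2E(y) + c$ is minimized at $c=\sqrt{E(y^2)}$, which exceeds $E(y)$ whenever $y$ has positive variance. The gamma distribution, sample size, and grid below are arbitrary choices of mine for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated positive outcomes (assumed distribution, for illustration only).
y = rng.gamma(shape=2.0, scale=3.0, size=20_000)

# Evaluate the average loss (y - c)^2 / c over a grid of constant predictions c.
candidates = np.linspace(1.0, 15.0, 1401)
loss = np.array([np.mean((y - c) ** 2 / c) for c in candidates])
best = candidates[np.argmin(loss)]

print("sample mean:        ", y.mean())
print("loss-minimizing c:  ", best)
print("sqrt of mean of y^2:", np.sqrt(np.mean(y ** 2)))
```

The loss-minimizing constant lands next to the root mean square of `y`, above the sample mean.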

Stephan Kolassa
  • 123,354