2

Is taking the average of different RMSE values valid? For example, average RMSE = (rmse1 + rmse2 + rmse3)/3. Thank you for your help!

angelo

2 Answers

4

I actually wasn't sure about this either so I tested it out with a short example:

## Create a simple function to calculate the RMSE of an error vector
rmse <- function(error){sqrt(mean(error^2))}

## Define two example error vectors
error1 <- c(0.4, 0.2, 0.01)
error2 <- c(0.1, 0.3, 0.79)

## Find the RMSE of each error vector
rmse1 <- rmse(error1)
rmse2 <- rmse(error2)

## Compare the RMSE variants
print(rmse_all <- rmse(c(error1, error2)))
[1] 0.3924708
## mean() averages only its first argument, so wrap the values in c()
print(rmse_avg <- mean(c(rmse1, rmse2)))
[1] 0.3747771

So we can see that they are not equal: the average of the two RMSEs differs from the RMSE of the pooled errors.
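A short derivation of why the two quantities differ, written as a sketch. The symbols $n_i$ (number of errors in group $i$) and $e_{ij}$ (the $j$-th error in group $i$) are notation introduced here, not taken from the original post. Squaring each RMSE recovers that group's sum of squared errors:

$$
\mathrm{RMSE}_i = \sqrt{\frac{1}{n_i}\sum_{j=1}^{n_i} e_{ij}^2}
\quad\Longrightarrow\quad
\sum_{j=1}^{n_i} e_{ij}^2 = n_i\,\mathrm{RMSE}_i^2 ,
$$

so the RMSE over all errors combined is

$$
\mathrm{RMSE}_{\text{all}} = \sqrt{\frac{\sum_i n_i\,\mathrm{RMSE}_i^2}{\sum_i n_i}} .
$$

When all $n_i$ are equal, this is the quadratic mean (root mean square) of the individual RMSEs, which is never smaller than their arithmetic mean and equals it only when every $\mathrm{RMSE}_i$ is the same.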

## As described by @whuber in the comments:
a <- rmse1^2 * length(error1)               # - square each rmse & multiply
b <- rmse2^2 * length(error2)               #   it by its associated count
c <- sum(a, b)                              # - sum that stuff up,
d <- c/sum(length(error1), length(error2))  # - divide by the total count,
print(total_rmse <- sqrt(d))                # - take the square root.
[1] 0.3924708
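The same steps can be wrapped in a small function that accepts any number of groups. This is only a sketch: `pool_rmse` and its arguments `rmses` and `counts` are hypothetical names chosen for illustration, and it assumes you know how many errors went into each RMSE.

```r
## Hypothetical helper (not from the original answer): combine per-group
## RMSEs into the RMSE of all errors, given the number of errors per group.
pool_rmse <- function(rmses, counts) {
  stopifnot(length(rmses) == length(counts), all(counts > 0))
  ## n_i * rmse_i^2 is each group's sum of squared errors
  sqrt(sum(rmses^2 * counts) / sum(counts))
}

rmse <- function(error) sqrt(mean(error^2))
error1 <- c(0.4, 0.2, 0.01)
error2 <- c(0.1, 0.3, 0.79)

## Pool from the summary statistics only
pooled <- pool_rmse(c(rmse(error1), rmse(error2)),
                    c(length(error1), length(error2)))

## Compare with the RMSE computed directly from the combined errors
direct <- rmse(c(error1, error2))
print(all.equal(pooled, direct))
```

Because the counts enter as weights, the function also handles groups of different sizes, where a plain average of the RMSEs would be furthest off.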

Nic
n1k31t4
  • The answer lies in front of you. Since you have defined rmse as the root of the mean of certain values, then just square each rmse, multiply it by its associated count, sum that stuff up, divide by the total count, and take the square root. The "slight bastardization" can be dreadfully wrong. – whuber Jan 14 '16 at 21:51
  • @whuber - I have edited to show what you describe. Please feel free edit my post with comments to add clarity. – n1k31t4 Jan 14 '16 at 22:41
0

In the case where you do not have the actual errors, only the RMSE statistic, I guess an improvement on taking the simple mean might be to use the geometric mean. This makes the assumption that each RMSE score deserves the same weight.

rmse1
[1] 0.2582634
rmse2
[1] 0.4912908
(rmse1*rmse2)^(1/2)
[1] 0.3562056
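For comparison, here is a minimal sketch that computes three equal-weight summaries from the two RMSE values alone. The variable names are illustrative, and the last line rests on an assumption the answer does not state: that both RMSEs were computed from the same number of errors. Under that assumption the quadratic mean reproduces the RMSE of the pooled errors, so it may be a more natural choice than the geometric mean.

```r
## Only the RMSE statistics are assumed to be available
rmses <- c(0.2582634, 0.4912908)

arith <- mean(rmses)            # simple average of the RMSEs
geo   <- exp(mean(log(rmses)))  # geometric mean, same as (rmse1*rmse2)^(1/2)
quad  <- sqrt(mean(rmses^2))    # quadratic mean (root mean square)

## Assumption: equal group sizes, so quad equals the pooled RMSE
print(c(arithmetic = arith, geometric = geo, quadratic = quad))
```

If the group sizes differ and are known, weighting each squared RMSE by its count is the safer route.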