
There is the MSE (quadratic cost) function: $C = \frac{1}{2n} \sum \text{length}(y - a)^2$

Why not just use $C = \sum \text{length}(y - a)$ instead?

(where "length" is the vector's length, "y" - ideal network's output, "a" - current network output)

Sycorax
  • 90,934

2 Answers

1

You're talking about the L1 norm and the L2 norm. Both work for neural networks, but they behave differently.

Without more information, I can't comment on whether the L2 norm is better (or worse) for your problem.

SmallChess
  • 7,211
-1

Short answer: both can be used.

Longer answer: both measures are in active use. The first measure is based on the Euclidean distance, the second on the taxi-cab distance, or more formally the $L_2$ distance and the $L_1$ distance.
Intuitively, the Euclidean distance prefers many small/medium errors over a few big errors, while the taxi-cab distance is more forgiving of a few large errors. Which one is preferable depends on the context and what you are trying to achieve.
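As a rough illustration (a hypothetical numpy sketch, not from the original answer), compare the two costs on an error that is spread out versus the same total error concentrated in one case:

```python
import numpy as np

# Two hypothetical error profiles with the same total absolute error (illustration only).
many_small = np.array([1.0, 1.0, 1.0, 1.0])  # error spread evenly over four cases
few_large = np.array([0.0, 0.0, 0.0, 4.0])   # the same total error concentrated in one case

for name, errors in [("many small", many_small), ("few large", few_large)]:
    l1 = np.sum(np.abs(errors))   # taxi-cab style cost (no squaring)
    l2 = np.sum(errors ** 2)      # Euclidean-squared style cost
    print(f"{name}: L1 = {l1}, L2 = {l2}")
# many small: L1 = 4.0, L2 = 4.0
# few large:  L1 = 4.0, L2 = 16.0
```

The L1-style cost scores both profiles the same, while the squared cost penalizes the concentrated error much more heavily.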

dimpol
  • 1,032
  • You are talking about the length function, but I'm asking why the result of the length function should be raised to the power of 2. – Dmytro Nalyvaiko Mar 09 '17 at 11:59
  • That could be because you want to "punish" a big error on a single training case more than the same total error spread over multiple training cases. Suppose that over 2 training cases one algorithm has errors of 0 and 3 respectively, and another algorithm has an error of 2 on both training cases. Squaring the errors would make the second algorithm preferable; not squaring would make the first algorithm look better. The right choice depends on the context. – dimpol Mar 09 '17 at 12:12
  • "Squaring the errors would make the second algorithm preferable"

    $0^2 + 3^2 = 9$; $2^2 + 2^2 = 8$

    Why is the second algorithm preferable?

    – Dmytro Nalyvaiko Mar 09 '17 at 13:43
  • An error score of 8 is lower than an error score of 9, and a lower error score is preferable. – dimpol Mar 10 '17 at 22:20
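To spell out the arithmetic from this exchange (just the numbers already given in the comments, in a short Python sketch):

```python
# Per-case errors from the comment example.
algo_1 = [0, 3]  # error 0 on one case, 3 on the other
algo_2 = [2, 2]  # error 2 on both cases

print(sum(algo_1), sum(algo_2))                                  # 3 4 -> algorithm 1 wins without squaring
print(sum(e ** 2 for e in algo_1), sum(e ** 2 for e in algo_2))  # 9 8 -> algorithm 2 wins with squaring
```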