5

Let's assume I have actual numbers (randomly generated) between 1 and 1000. Let's further assume my prediction model tries to predict the actual numbers. My "forecasting" model is a monkey that presses buttons truly at random, producing numbers between 1 and 1000. Essentially, it is the same function that is used to generate the actual numbers. I just put two sets of randomly generated numbers next to each other and calculate their difference (the "error").

If I follow this logic, calculate the mean absolute error of the monkey model, and normalize it by the actual numbers, the result is around 66%. Just to confirm, the formula I use is: SUM(ABS(Actual - Forecast)) / SUM(Actual).

You can create this model easily in Excel.

What is an intuitive explanation for why this value converges to exactly 66.666% the more numbers I generate? My understanding is that, on average, I mispredict an actual figure by 66%.
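The setup is just as easy to reproduce outside Excel; here is a minimal Python sketch of the same experiment (the seed, sample size, and variable names are arbitrary choices):

```python
import random

# Both "actuals" and "monkey forecasts" are independent uniform
# draws on 1..1000, exactly as described above.
random.seed(42)
n = 200_000
actual = [random.randint(1, 1000) for _ in range(n)]
forecast = [random.randint(1, 1000) for _ in range(n)]

# SUM(ABS(Actual - Forecast)) / SUM(Actual)
normalized_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)
print(normalized_mae)  # close to 2/3 = 0.666...
```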

Peter Mortensen
UDE_Student
    The MAE would be the average of ABS(Actual-Forecast). Instead of averaging, you take the sum and divide by the sum of the actuals. This is a kind of a weighted MAPE (Kolassa & Schütz, 2007). – Stephan Kolassa Mar 18 '24 at 19:55
  • I used the German Wikipedia article for the formula of the mean absolute error, and the first normalization shown in that article is basically what I am doing: the MAE is divided by the average actual value, which is essentially the same as dividing the sum of absolute errors by the sum of actuals. – UDE_Student Mar 18 '24 at 20:01
  • 4
    Well, that is a normalized MAE, not the MAE, there is a difference. – Stephan Kolassa Mar 18 '24 at 20:04
  • You are right; maybe I should have mentioned that in the title. In my explanation I mentioned that I normalize the MAE with the actual numbers. – UDE_Student Mar 18 '24 at 20:05
  • 1
    This is answered at https://stats.stackexchange.com/questions/52909 for the two-dimensional analog -- which reduces to this question for an arbitrarily skinny rectangle. – whuber Mar 18 '24 at 20:14
  • Why not assume nothing, go back and list what you actually know? – Robbie Goodwin Mar 23 '24 at 00:15
  • What do you mean? – UDE_Student Mar 24 '24 at 08:54

2 Answers

17

You can consider this as randomly drawing a point in a square and computing the vertical distance to the diagonal.

[Figure: example of computing the vertical distance to the diagonal]

Then this relates to the mean of the triangular distribution.

If we fill in the formula for the mean of that distribution, with $a = c = 0$ and $b = 1$, we get $MAE = \frac{1}{3}$, and dividing by the mean actual value $\frac{1}{2}$ turns this into a relative error of $\frac{2}{3}$.
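A quick numerical check of this picture (a sketch, assuming NumPy; the sample size is arbitrary): sample points uniformly on the unit square and compare the mean vertical distance to the diagonal against the triangular-distribution mean $\frac{1}{3}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500_000)   # "actual"
y = rng.uniform(0, 1, 500_000)   # "forecast"

dist = np.abs(x - y)             # vertical distance to the diagonal y = x
mean_dist = dist.mean()          # mean of a Triangular(a=0, c=0, b=1) variable
relative = mean_dist / x.mean()  # divide by the mean actual value (1/2)

print(mean_dist, relative)       # close to 1/3 and 2/3
```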


As an alternative viewpoint, you can also consider the perpendicular distance to the diagonal, which is the residual vector in estimating the mean, $(x-\mu, y-\mu)$ with $\mu = \frac{x+y}{2}$.

[Figure: projection onto the diagonal]

As yet another viewpoint, consider that the plot of $(x-y)$ against $(x+y)$, two values that appear in your computation, is a rotation, shift, and scaling of the square.

[Figure: the rotated, shifted, and scaled square]
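To make the rotation concrete, a minimal sketch (assuming NumPy) applies the map $(x, y) \mapsto (x - y, x + y)$ to the corners of the unit square; it sends the diagonal $y = x$ to the vertical axis and scales all lengths by $\sqrt{2}$.

```python
import numpy as np

pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)  # unit square corners
rot = np.column_stack([pts[:, 0] - pts[:, 1], pts[:, 0] + pts[:, 1]])

print(rot)  # a square rotated by 45 degrees, with side length sqrt(2)
```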

  • Why can we "consider this as randomly drawing a number on a square and computing the distance to the diagonal"? To me, it's more like "randomly drawing two points from a square and computing the distance between them". – Zhanxiong Mar 18 '24 at 22:11
  • @Zhanxiong It's like the transformation to coordinates $x-y$ and $x+y$ which scales, rotates and shifts the square. It gives a geometric interpretation to why $|x-y|$ resembles a triangular distribution. – Sextus Empiricus Mar 18 '24 at 23:34
  • 2
    @Zhanxiong if you take your approach, then it is like randomly drawing two points on a line (instead of a square). You can do it like that and compute the integral, but it gives less of a geometrical intuition how you get this triangle. – Sextus Empiricus Mar 18 '24 at 23:43
  • Alternatively to my representation, you can also consider the projection straight down to the diagonal. – Sextus Empiricus Mar 18 '24 at 23:44
  • I posted, IMHO, a more straightforward (in that there is no "discrete-to-continuous" or projection transformations) proof. – Zhanxiong Mar 19 '24 at 00:02
  • @Zhanxiong yes, it is also easily proven by working out a sum/integral for discrete/continuous cases. $$\frac{\frac{1}{n^2}\sum_{x=1}^n \sum_{y=1}^n |x-y|}{\frac{1}{n}\sum_{x=1}^n x} = \dots = \frac{2}{3}$$ I did the indirect less straightforward approach on purpose, to make the algebraic steps more intuitive. It's like deriving $\sum_{k=1}^{n} k = k(k+1)/2$ which can be done with a geometric approach (like imagining a square of $k$ by $k+1$ and splitting it into two triangles). – Sextus Empiricus Mar 19 '24 at 07:32
  • Also, I didn't expect this post to get so popular. It's simple and I didn't create a full complete treatment. See my first version of the answer which was just a pen and paper solution and a hint that it relates to a triangle distribution. – Sextus Empiricus Mar 19 '24 at 07:36
  • Correction $\sum_{k=1}^{n} k = n(n+1)/2$ – Sextus Empiricus Mar 19 '24 at 07:49
  • The distance to the diagonal is the error divided by $\sqrt 2$, isn't it? If the pair is $(0, 1)$, then the error is $1$ but the distance to the diagonal is $\sqrt 2 / 2$. – Tanner Swett Mar 19 '24 at 14:19
  • @TannerSwett The diagonal distance has indeed a factor $\sqrt{2}$. I regarded only the difference in the horizontal component. I have changed it now into a different viewpoint that avoids the potential confusion (although the diagonal projection to the diagonal line has some nice aspect as it occurs in other problems; it is namely the residual vector of a computation of the mean; example: https://stats.stackexchange.com/a/365070/) – Sextus Empiricus Mar 19 '24 at 17:06
7

Below is how I interpreted and confirmed your empirical study.

The experiment can be described as follows:

  1. Draw two independent random variables $X$ (representing "actual") and $Y$ (representing "monkey prediction") from the discrete uniform distribution on $\{1, 2, \ldots, c\}$. In your case $c = 1000$.
  2. Repeat this $n$ times, so that you collect the data $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$.

Your conjecture is that \begin{align*} MAE_n = \frac{\sum_{i = 1}^n|X_i - Y_i|}{\sum_{i = 1}^n X_i} \to \frac{2}{3} \end{align*} as $n \to \infty$ (in some mode of convergence). I will show that $MAE_n \to 2/3$ with probability $1$.

To this end, by SLLN, with probability $1$: \begin{align*} & \frac{1}{n}\sum_{i = 1}^n|X_i - Y_i| \to E[|X - Y|], \\ & \frac{1}{n}\sum_{i = 1}^n X_i \to E[X]. \tag{1}\label{1} \end{align*}

It thus suffices to evaluate $E[X]$ and $E[|X - Y|]$ respectively, which are: \begin{align*} & E[X] = c^{-1}\sum_{k = 1}^c k = \frac{1 + c}{2}, \\ & E[|X - Y|] = c^{-2}\sum_{k = 1}^c \sum_{l = 1}^c |k - l| = \frac{c^2 - 1}{3c}. \tag{2}\label{2} \end{align*}

It then follows by $\eqref{1}$ and $\eqref{2}$ that \begin{align*} MAE_n \to \frac{\frac{c^2 - 1}{3c}}{\frac{1 + c}{2}} = \frac{2}{3}\times\frac{c - 1}{c} \text{ with probability } 1. \end{align*} Hence when $c = 1000$, the limit is of course very close to $\frac{2}{3}$.
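The closed forms in $\eqref{2}$ and the resulting limit can be checked by brute-force summation (a sketch using exact rational arithmetic via `fractions`, with small $c$ so the double sum stays cheap):

```python
from fractions import Fraction

def exact_limit(c: int) -> Fraction:
    # E[X] and E[|X - Y|] for X, Y independent, uniform on {1, ..., c}
    ex = Fraction(sum(range(1, c + 1)), c)
    e_abs = Fraction(sum(abs(k - l)
                         for k in range(1, c + 1)
                         for l in range(1, c + 1)), c * c)
    return e_abs / ex  # the limiting normalized MAE

# matches (2/3) * (c - 1) / c for several values of c
for c in (2, 10, 100):
    assert exact_limit(c) == Fraction(2 * (c - 1), 3 * c)

print(exact_limit(100))  # 33/50 = 0.66
```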

Zhanxiong
  • Very nice. The only thing is that it's more MAPE than MAE, if you take error proportional to the actual. – Pawel Mar 19 '24 at 19:33