I'm trying to understand autodiff better, and specifically the connection between autodiff and dual numbers, and why dual numbers are needed in the first place.
The PyTorch help pages about autodiff [1][2], for example, do not mention dual numbers at all. The Wikipedia page, as well as other sources, suggests that dual numbers are only used in forward-mode automatic differentiation.
My question is: why do we need dual numbers at all?
My intuition is that they are just an elegant way to store both the function value and its derivative. But I think this can be done with any data structure where, for each operation, you store the function evaluation and its derivative (based on elementary rules like the product rule, quotient rule, etc., and on the derivatives of primitives like polynomials, exponentials, etc.).
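To make concrete what I mean by "any data structure", here is a minimal Python sketch. The names `ValueWithDerivative`, `_lift`, `exp` and `derivative_at` are hypothetical ones I made up for illustration; this is not how PyTorch or any other library does it, just a plain pair of numbers that is updated with the usual differentiation rules.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueWithDerivative:
    """Hypothetical container: a function value and its derivative at one point."""
    value: float  # f(x0)
    deriv: float  # f'(x0)

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (f + g)' = f' + g'
        return ValueWithDerivative(self.value + other.value,
                                   self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (f * g)' = f' * g + f * g'
        return ValueWithDerivative(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = _lift(other)
        # Quotient rule: (f / g)' = (f' * g - f * g') / g^2
        return ValueWithDerivative(
            self.value / other.value,
            (self.deriv * other.value - self.value * other.deriv)
            / other.value ** 2)


def _lift(x):
    """Treat a plain number as a constant, i.e. derivative 0."""
    if isinstance(x, ValueWithDerivative):
        return x
    return ValueWithDerivative(float(x), 0.0)


def exp(x):
    """Primitive with a known derivative, combined with the chain rule."""
    e = math.exp(x.value)
    return ValueWithDerivative(e, e * x.deriv)


def derivative_at(f, x0):
    """Seed the input with derivative 1 (dx/dx = 1) and read off f'(x0)."""
    return f(ValueWithDerivative(x0, 1.0)).deriv
```

With this, `derivative_at(lambda x: x * x * x + 2 * x, 3.0)` gives `29.0`, which matches 3x² + 2 at x = 3, and nothing in the code mentions dual numbers explicitly.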
I'm failing to see the actual benefit of the dual-number representation.