It is most easily understood by considering the linear problem
$$
Ax = b
$$
where $A$ and $b$ are the problem data, and $x$ is the vector of parameters we want to estimate. In practice the measurements $b$ contain errors, and these errors propagate into the estimate of $x$ through $A$. How?
Assume we have errors only in the measurements, $b$, and denote by $\delta b$ and $\delta x$ the resulting errors in the measurements and in the estimate, respectively.
Because of linearity,
$$
\delta b = A \delta x
$$
To see how much the measurement errors are magnified by the matrix $A$, you can compute the ratio of relative errors,
$$
\frac{||\delta x ||}{||x||}/\frac{||\delta b ||}{||b||}
$$
This ratio is bounded above by the condition number of $A$,
$$
\mathrm{cond}(A) = \frac{\lambda_{1}}{\lambda_{n}}
$$
where $\lambda_{1}$ and $\lambda_{n}$ are the largest and smallest eigenvalues of $A$ (more generally, its largest and smallest singular values), respectively. Hence, the larger the condition number, the more the measurement errors are magnified in the estimate.
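As a quick numerical sanity check, here is a minimal sketch in NumPy with a hypothetical $2 \times 2$ matrix (all numbers chosen purely for illustration); the measured amplification of the relative error comes out on the same order as $\mathrm{cond}(A)$:

```python
import numpy as np

# Hypothetical system with a large condition number (cond(A) = 1e4).
A = np.array([[1.0, 0.0],
              [0.0, 1e-4]])
b = np.array([1.0, 1e-4])           # exact data, so that x = [1, 1]
x = np.linalg.solve(A, b)

# Perturb the measurements slightly and solve again.
db = np.array([0.0, 1e-6])          # small error in b
dx = np.linalg.solve(A, b + db) - x

amplification = (np.linalg.norm(dx) / np.linalg.norm(x)) / \
                (np.linalg.norm(db) / np.linalg.norm(b))

print("cond(A)       =", np.linalg.cond(A))   # 1e4
print("amplification =", amplification)       # same order of magnitude as cond(A)
```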
In optimization, the same picture appears when $A$ plays the role of the Hessian of the loss: a small eigenvalue corresponds to a direction in which the gradient is small and the loss is almost flat. The step size must be chosen small enough for the steepest directions, so with a high condition number gradient descent oscillates along the steep directions while crawling along the flat ones, which leads to slow convergence.
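A minimal sketch of this effect, using plain gradient descent on the quadratic $f(x) = \tfrac{1}{2} x^\top A x$ with a hypothetical diagonal $A$ (the numbers are only for illustration):

```python
import numpy as np

# Ill-conditioned quadratic: f(x) = 0.5 * x^T A x, with cond(A) = 1e4.
A = np.diag([100.0, 0.01])
x = np.array([1.0, 1.0])

# The step size must respect the steep direction (eigenvalue 100), so it is
# capped near 2/100; this leaves the flat direction (eigenvalue 0.01) crawling.
lr = 1.9 / 100.0

for _ in range(1000):
    grad = A @ x          # gradient of the quadratic
    x = x - lr * grad

print(x)  # steep coordinate oscillates its way to ~0; flat coordinate has barely moved
```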
This issue has motivated a lot of research on the optimization of neural networks (as you already point out), which has led to techniques like momentum (see *On the importance of initialization and momentum in deep learning*) and early stopping. This blog entry provides a very nice description of the topic.
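For comparison, here is a sketch of classical (heavy-ball) momentum on the same hypothetical quadratic as above; the learning rate and momentum coefficient are illustrative choices, not tuned values. The velocity accumulates along the flat direction while the oscillations in the steep direction partly cancel out:

```python
import numpy as np

A = np.diag([100.0, 0.01])          # same ill-conditioned quadratic as above
lr, beta = 1.9 / 100.0, 0.9         # beta is the momentum coefficient

x, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(1000):
    v = beta * v - lr * (A @ x)     # accumulate a velocity (heavy-ball update)
    x = x + v

print(x)  # the flat coordinate ends up much closer to 0 than with plain gradient descent
```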