For the sake of notation, let's suppose that $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ (i.e., it's a vector-valued function that takes a vector as input and outputs a vector of the same size). There are two concerns: computational cost and numerical accuracy.
Calculating the derivative $\mathrm{D}f(x)$ (the Jacobian matrix, $J(x)$, or $(\nabla f(x))^{T}$, or whatever you prefer) using one-sided finite differences requires $n$ function evaluations in addition to $f(x)$ itself, because, by definition, column $i$ of the Jacobian is
\begin{align}
\mathrm{D}f(x)e_{i} = \lim_{\varepsilon \rightarrow 0} \frac{f(x + \varepsilon e_{i}) - f(x)}{\varepsilon}
\end{align}
for each $i = 1, \ldots, n$, assuming you don't do any sort of "smart finite differencing" (like Curtis-Powell-Reid) because you know (or can detect) the sparsity pattern of $\mathrm{D}f$. If $n$ is large, that could be a lot of function evaluations. If you have an analytical expression for $\mathrm{D}f$, then calculating it could be cheaper. Automatic (also known as algorithmic) differentiation methods can also be used in some cases to calculate $\mathrm{D}f$ at roughly 3 to 5 times the cost of a function evaluation.
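To make the cost concrete, here is a minimal sketch of the one-sided finite-difference Jacobian in Python/NumPy; the loop over the coordinate directions $e_i$ is exactly where the $n$ extra function evaluations come from. The helper name `fd_jacobian` and the test function are illustrative, not from any particular library:

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-8):
    """Approximate Df(x) column by column with forward differences:
    one evaluation of f at the base point, plus n more, one per e_i."""
    x = np.asarray(x, dtype=float)
    fx = f(x)                        # 1 evaluation
    J = np.empty((fx.size, x.size))
    for i in range(x.size):          # n more evaluations
        x_step = x.copy()
        x_step[i] += eps
        J[:, i] = (f(x_step) - fx) / eps
    return J

# Example: f(x) = (x0**2 + x1, sin(x0*x1)); compare against the exact Jacobian
# [[2*x0, 1], [x1*cos(x0*x1), x0*cos(x0*x1)]] at x = (1, 2).
f = lambda x: np.array([x[0]**2 + x[1], np.sin(x[0] * x[1])])
print(fd_jacobian(f, np.array([1.0, 2.0])))
```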
There are also numerical concerns. On a computer, we can't actually take the limit as $\varepsilon \rightarrow 0$, so when we approximate $\mathrm{D}f$, we're really picking $\varepsilon$ to be "small" and calculating
\begin{align}
\mathrm{D}f(x)e_{i} \approx \frac{f(x + \varepsilon e_{i}) - f(x)}{\varepsilon},
\end{align}
where $\approx$ means it's an approximation that we hope is a good one. Calculating this approximation in floating point arithmetic is tricky: pick $\varepsilon$ too large and truncation error makes the approximation poor, but pick $\varepsilon$ too small and rounding error from subtracting nearly equal values of $f$ dominates. A common rule of thumb for forward differences is to take $\varepsilon$ on the order of $\sqrt{u}$ (roughly $10^{-8}$ in double precision, where $u$ is the unit roundoff), scaled by the magnitude of $x$. The Wikipedia article on numerical differentiation covers these effects briefly; more detailed references can be found within the article.
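A quick way to see the tradeoff is to scan $\varepsilon$ over several orders of magnitude for a scalar function whose derivative is known exactly; a minimal demo sketch:

```python
import numpy as np

# Scalar demo: f(x) = exp(x), so the exact derivative at x is exp(x) too.
f = np.exp
x = 1.0
exact = np.exp(x)

for k in range(1, 17):
    eps = 10.0 ** (-k)
    approx = (f(x + eps) - f(x)) / eps
    print(f"eps = 1e-{k:02d}   |error| = {abs(approx - exact):.3e}")
```

In double precision, the error typically bottoms out near $\varepsilon \approx 10^{-8}$ (about $\sqrt{u}$) and then grows again as rounding error takes over, matching the rule of thumb above.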
If the error in the Jacobian matrix $\mathrm{D}f$ isn't too large, Newton-Raphson iterations will still converge locally, though the convergence rate may degrade from quadratic toward linear. For a detailed theoretical analysis, see Chapter 25 of Accuracy and Stability of Numerical Algorithms by Nick Higham, or the paper by Françoise Tisseur on which it is based.
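For intuition (this is no substitute for the analysis in those references), here is a bare-bones Newton-Raphson sketch using the finite-difference Jacobian from above; the safeguards a real library would add (line searches, trust regions, scaling) are deliberately omitted:

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-8):
    """Forward-difference Jacobian (same sketch as earlier)."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += eps
        J[:, i] = (f(x_step) - fx) / eps
    return J

def newton_fd(f, x0, tol=1e-10, maxit=50):
    """Newton-Raphson for f(x) = 0 with an approximate (FD) Jacobian.
    Newton tolerates modest Jacobian error, usually at the price of a
    slower convergence rate."""
    x = np.array(x0, dtype=float)
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        x -= np.linalg.solve(fd_jacobian(f, x), fx)
    return x

# Solve the 2-by-2 system x0**2 + x1 = 1, x0 - x1 = 0; the root is
# x0 = x1 = (sqrt(5) - 1)/2 ~= 0.6180.
f = lambda x: np.array([x[0]**2 + x[1] - 1.0, x[0] - x[1]])
print(newton_fd(f, np.array([1.0, 1.0])))
```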
Libraries generally take care of these algorithmic details for you, and library implementations of the Newton-Raphson algorithm (or variants thereof) usually converge quite nicely. Every so often, though, a problem will cause trouble due to the drawbacks above. In the scalar case $(n = 1)$, I'd use Brent's method, owing to its robustness and good convergence rate in practice.
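In SciPy, for instance, Brent's method is available as `scipy.optimize.brentq`; it needs only a bracket $[a, b]$ with $f(a)$ and $f(b)$ of opposite signs, and no derivative at all:

```python
from scipy.optimize import brentq

# Find the root of f(x) = x**3 - 2*x - 5 on [2, 3], where f(2) = -1 < 0 < f(3) = 16.
f = lambda x: x**3 - 2*x - 5
root = brentq(f, 2.0, 3.0)
print(root, f(root))   # root ~= 2.0945515, residual ~ 0
```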