In my opinion, people prefer the first estimate because it arises naturally from the Galerkin orthogonality of the FEM, the approximation property of interpolation, and, most importantly, the coercivity of the bilinear form (for the boundary value problem of the Poisson equation, coercivity is equivalent to the Poincaré/Friedrichs inequality for $H^1_0$ functions):
$$
\begin{aligned}
\|u - u_h\|^2_{H^1(\Omega)} &\leq c_1 \| \nabla (u - u_h) \|^2_{L^2(\Omega)}
\\
\| \nabla (u - u_h) \|^2_{L^2(\Omega)} &= \int_{\Omega} \nabla(u- u_h)\cdot \nabla(u- u_h)
\\
&= \int_{\Omega} \nabla(u- u_h)\cdot \nabla(u- \mathcal{I}u)
\\
&\leq \| \nabla (u - u_h) \|_{L^2(\Omega)} \| \nabla (u - \mathcal{I}u) \|_{L^2(\Omega)}
\\
\Rightarrow \| \nabla (u - u_h) \|_{L^2(\Omega)} &\leq \| \nabla (u - \mathcal{I}u) \|_{L^2(\Omega)} \leq c_2 h\| u\|_{H^2(\Omega)}
\end{aligned}
$$
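where $c_1$ depends on the constant in the Poincaré/Friedrichs inequality for $H^1_0$ functions, $\mathcal{I}u$ is the interpolant of $u$ in the finite element space, and $c_2$ depends on the minimum angle of the mesh. The second equality is Galerkin orthogonality, $\int_{\Omega} \nabla(u- u_h)\cdot \nabla v_h = 0$ for every $v_h$ in the finite element space, applied with $v_h = \mathcal{I}u - u_h$; the inequality after it is Cauchy-Schwarz.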
The elliptic regularity estimate $\|u \|_{H^2(\Omega)}\leq c\|f\|_{L^2(\Omega)}$, by contrast, lives purely at the PDE level and has nothing to do with the approximation. Moreover, the argument above holds even when $f\in H^{-1}$ is only a distribution.
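The first-order rate in the last line of the derivation can be checked numerically. The following is a minimal sketch, not a general FEM code: it assumes the 1D model problem $-u'' = f$ on $(0,1)$ with homogeneous Dirichlet data, a manufactured solution $u = \sin(\pi x)$, piecewise-linear elements on a uniform mesh, and hypothetical helper names `solve_poisson_p1` and `h1_seminorm_error`.

```python
import numpy as np

GAUSS = np.array([-1.0, 1.0]) / np.sqrt(3.0)  # 2-point Gauss nodes on [-1, 1]


def solve_poisson_p1(n):
    """P1 FEM for -u'' = f on (0,1), u(0)=u(1)=0, with n uniform elements.

    Assumed manufactured data: u = sin(pi x), so f = pi^2 sin(pi x).
    """
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    f = lambda t: np.pi**2 * np.sin(np.pi * t)
    # Stiffness matrix on the interior nodes: tridiag(-1, 2, -1) / h.
    A = (2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h
    # Load vector, assembled element by element with 2-point Gauss quadrature.
    b = np.zeros(n + 1)
    for k in range(n):
        xm = 0.5 * (x[k] + x[k + 1])
        for gp in GAUSS:
            xq = xm + 0.5 * h * gp
            w = 0.5 * h  # each reference weight is 1, scaled by h/2
            b[k] += w * f(xq) * (x[k + 1] - xq) / h      # left hat function
            b[k + 1] += w * f(xq) * (xq - x[k]) / h      # right hat function
    u = np.zeros(n + 1)  # boundary values stay zero
    u[1:-1] = np.linalg.solve(A, b[1:-1])
    return x, u


def h1_seminorm_error(x, u):
    """L2 norm of (u - u_h)' for the manufactured solution u = sin(pi x)."""
    du_exact = lambda t: np.pi * np.cos(np.pi * t)
    err_sq = 0.0
    for k in range(len(x) - 1):
        h = x[k + 1] - x[k]
        slope = (u[k + 1] - u[k]) / h  # u_h' is constant on each element
        xm = 0.5 * (x[k] + x[k + 1])
        for gp in GAUSS:
            xq = xm + 0.5 * h * gp
            err_sq += 0.5 * h * (du_exact(xq) - slope) ** 2
    return np.sqrt(err_sq)


ns = [8, 16, 32, 64]
errors = [h1_seminorm_error(*solve_poisson_p1(n)) for n in ns]
# Observed order of convergence between consecutive mesh halvings.
rates = [np.log2(errors[i] / errors[i + 1]) for i in range(len(ns) - 1)]
print("errors:", errors)
print("rates: ", rates)
```

Under these assumptions the observed rates should sit close to 1, which is what the bound $c_2 h\|u\|_{H^2(\Omega)}$ predicts for a smooth solution. The experiment says nothing about the size of $c_2$ itself.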
As for why a posteriori error estimates are widely used, the main reasons are:

- They are computable: there is no generic constant in the expression of the estimate.
- The estimator has a local form, which can serve as the local error indicator in an adaptive mesh refinement procedure. Problems with singularities or really "bad" geometries can therefore be dealt with.
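To make the idea of a local indicator driving refinement concrete, here is a minimal 1D sketch. It is an illustration under stated assumptions, not a full adaptive FEM: it uses only the element residual term $\eta_K = h_K \|f\|_{L^2(K)}$ of the standard residual estimator (for piecewise-linear elements $u_h'' = 0$ on each element, so the interior residual reduces to $f$), omits the jump terms, and skips the solve step entirely. The right-hand side `f`, the marking parameter `theta`, and the function names are hypothetical choices.

```python
import numpy as np


def f(x):
    # Hypothetical right-hand side with a sharp peak at x = 0.5.
    return np.exp(-1000.0 * (x - 0.5) ** 2)


def local_indicators(x):
    """Return eta_K = h_K * ||f||_{L2(K)} for each element K = [x[k], x[k+1]]."""
    h = np.diff(x)
    mid = 0.5 * (x[:-1] + x[1:])
    # 3-point Gauss-Legendre rule on the reference interval [-1, 1].
    pts, wts = np.polynomial.legendre.leggauss(3)
    xq = mid[:, None] + 0.5 * h[:, None] * pts[None, :]
    f_l2_sq = 0.5 * h * (f(xq) ** 2 @ wts)  # ||f||^2 on each element
    return h * np.sqrt(f_l2_sq)


def refine(x, theta=0.5):
    """Bisect every element whose indicator is at least theta * max indicator."""
    eta = local_indicators(x)
    marked = eta >= theta * eta.max()
    mid = 0.5 * (x[:-1] + x[1:])
    return np.sort(np.concatenate([x, mid[marked]]))


x = np.linspace(0.0, 1.0, 11)  # start from 10 uniform elements
for _ in range(6):
    x = refine(x)

h = np.diff(x)
smallest = int(np.argmin(h))
print("number of elements:", len(h))
print("smallest element:", (x[smallest], x[smallest + 1]))
```

The point of the sketch is that each $\eta_K$ is computed from data on one element only, so the marking step can concentrate new nodes where the indicator is large (here, around the peak of `f`) while leaving the rest of the mesh coarse.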
Both of the a priori estimates you listed are valid, and they tell us the order of convergence. However, neither can serve as a local error indicator for a single triangle/tetrahedron: neither is computable, because of the unknown constant, and neither is defined locally.
EDIT: For a more general view of the FEM for elliptic PDEs, I highly recommend reading Chapter 0 of Brenner and Scott's book, The Mathematical Theory of Finite Element Methods. It is only 20 pages long and briefly covers almost every aspect of finite element methods, from deriving the Galerkin formulation from the PDE to the motivation for using adaptive FEM to tackle certain problems. I hope this helps.