I think you're mixing up ideas. The best thing here is to know the foundations, and then "implicit FEM" will be a trivial idea (which is why there won't be a book specifically about that).
Finite Element Methods are a form of spatial discretization where you use some kind of structured or unstructured mesh to rewrite the problem as a system of equations using something like a Galerkin approximation. The common way is to take basis functions defined on a mesh of triangles and use them to get an equation at each node.
Time-Independent
If the problem is time-independent and linear, this results in a linear system:
$$ Au = b$$
where $u$ is the vector of the values at the nodes of the triangles. You solve this linear system with any method for linear systems, like direct solvers or iterative solvers. If instead the time-independent problem is nonlinear, then you have a system of implicit equations at each node like:
$$ Au = f(u)$$
or more generally (if you let $F(u)= Au -f(u)$ in the above example),
$$F(u)=0 $$
that you have to solve for the vector $u$ of values at the nodes of the triangles, given some function $f$ (or $F$). You have to solve this with a root-finding method like Newton's method.
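To make that concrete, here is a minimal sketch of Newton's method on a tiny instance of $Au = f(u)$. Everything specific here is my own assumption for illustration: the 2x2 matrix $A$, the nonlinearity $f(u)_i = 1 - u_i^3$, and the helper name `newton_2x2`. A real FEM code would have thousands of unknowns and a sparse linear solver in place of Cramer's rule, but the loop has the same shape: evaluate $F$, solve $J\,\delta = -F$, update $u$.

```python
# Toy instance of A u = f(u) with 2 unknowns (all values are illustrative).
# A = [[2, -1], [-1, 2]] and f(u)_i = 1 - u_i**3, so F(u) = A u - f(u).

def F(u):
    u0, u1 = u
    return (2.0 * u0 - u1 - (1.0 - u0**3),
            -u0 + 2.0 * u1 - (1.0 - u1**3))

def J(u):
    # Jacobian of F: A plus the diagonal matrix of d/du_i (u_i**3) = 3 u_i**2.
    u0, u1 = u
    return ((2.0 + 3.0 * u0**2, -1.0),
            (-1.0, 2.0 + 3.0 * u1**2))

def newton_2x2(F, J, u, tol=1e-12, max_iter=50):
    """Newton's method for a 2-unknown system F(u) = 0."""
    for _ in range(max_iter):
        r0, r1 = F(u)
        if max(abs(r0), abs(r1)) < tol:
            break
        (a, b), (c, d) = J(u)
        det = a * d - b * c
        # Solve J * delta = -F(u) by Cramer's rule (only sensible for 2x2).
        d0 = (-r0 * d + b * r1) / det
        d1 = (-a * r1 + c * r0) / det
        u = (u[0] + d0, u[1] + d1)
    return u

u_star = newton_2x2(F, J, (0.0, 0.0))
print(u_star, F(u_star))
```

By symmetry both components converge to the real root of $u^3 + u - 1 = 0$, which gives a quick sanity check on the result.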
Time-Dependent
If you had started with a time-dependent problem, then at every node of the triangles you get an ordinary differential equation, i.e. instead of the above you have:
$$ u_t = f(u) $$
where $u_t = \frac{du}{dt}$ is a useful shorthand. This is known as a system of ordinary differential equations because the equations at the different nodes are coupled: how one value changes in time can depend on the others. At this point, you have to make another choice: how do you discretize in time? That choice gives you either explicit timestepping methods or implicit timestepping methods. They each have their pros and cons.
The explicit timestepping methods are like Euler's method. For example, that would be doing
$$ u_{n+1} = u_{n} + \Delta t f(u_{n}) $$
to go from time $t_n = t_0 + n\Delta t$ to $t_{n+1}$. This is simply saying "change $u$ by how much $f(u)$ is now", almost a direct translation of what the ODE says. This is the simplest approximation and can work in certain cases. However, if your problem is "stiff" (that's tough to define succinctly, I will just say "different parts of the problem move at different speeds"), this will fail to be "stable" unless $\Delta t$ is very small. Therefore you will use an implicit timestepping method. For example, this could be the Implicit Euler method:
$$ u_{n+1} = u_n + \Delta t f(u_{n+1}) $$
This is saying that the best guess for how $u$ changes in the time interval is to use $u_{n+1}$ (since the ODE itself doesn't have "timesteps" and takes place with infinitely small steps, they are essentially equivalent in the original formulation, but cause a difference in the approximation for non-infinitesimal $\Delta t$). A reference on numerical ODEs will go into detail on how this is "more stable". However, this is an implicit equation since $u_{n+1}$ is inside the function $f$, and so we can't just isolate $u_{n+1}$ on the left-hand side of the equation and get a solution for it in terms of $u_n$. Indeed, this is another way of getting an implicit equation, since if you let
$$F(u_{n+1})=u_{n+1} - u_n - \Delta t f(u_{n+1})$$
then you'll notice that at each timestep we have to solve
$$ F(u_{n+1}) = 0 $$
(Note that at time $n$ we already know $u_n$ so it's a constant, which is why the only variable is $u_{n+1}$).
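Here is a small sketch of the two timestepping choices side by side. The test problem $u_t = -\lambda u$ with $\lambda = 50$, the step size $\Delta t = 0.1$, and all function names are assumptions I picked for illustration, not anything from a particular library. The implicit step solves $F(u_{n+1}) = u_{n+1} - u_n - \Delta t f(u_{n+1}) = 0$ with a scalar Newton iteration, exactly as described above.

```python
LAM = 50.0  # illustrative decay rate; large LAM * dt is what makes this hard

def f(u):
    return -LAM * u

def df(u):
    # Derivative of f, needed for the Newton iteration in the implicit step.
    return -LAM

def explicit_euler_step(u_n, dt):
    # u_{n+1} = u_n + dt * f(u_n): no equation to solve.
    return u_n + dt * f(u_n)

def implicit_euler_step(u_n, dt, tol=1e-12, max_iter=50):
    # Solve F(v) = v - u_n - dt * f(v) = 0 for v = u_{n+1} by Newton's method.
    v = u_n  # initial guess: the previous value
    for _ in range(max_iter):
        F = v - u_n - dt * f(v)
        if abs(F) < tol:
            break
        dF = 1.0 - dt * df(v)
        v -= F / dF
    return v

def integrate(step, u0, dt, n_steps):
    u = u0
    for _ in range(n_steps):
        u = step(u, dt)
    return u

u_explicit = integrate(explicit_euler_step, 1.0, 0.1, 20)
u_implicit = integrate(implicit_euler_step, 1.0, 0.1, 20)
print("explicit:", u_explicit)
print("implicit:", u_implicit)
```

With these values each explicit step multiplies $u$ by $1 - \lambda\Delta t = -4$, so the iterates oscillate and blow up, while each implicit step multiplies $u$ by $1/(1 + \lambda\Delta t) = 1/6$, so they decay like the true solution does. Because this $f$ is linear, Newton converges in a single iteration here; a nonlinear $f$ would need several.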
Conclusion
As you can see, time-dependent or time-independent, the problem boils down to solving $F(x)=0$, which is an implicit equation. If you had a time-dependent problem, you have to solve this at each timestep, whereas time-independent problems solve this once. But as you can see, it can arise for very different reasons. However, because all of these problems (and many more) boil down to solving such a simple equation $F(x)=0$, there has been a lot of research into methods for solving this problem. These "root finding" methods (more generally, this falls under optimization), which include Newton, Broyden, and trust-region methods, are their own course of study.
That should help you find the right reference. Either the knowledge you're missing is in FEM discretizations, methods for numerical ODEs, or in the root finding methods ("Newton methods") used to solve the resulting equation.