This question has been open for a long time, but I think it still deserves to be answered.
The fundamental problem with the use of Krylov-space solvers on individual blocks as inner preconditioners is that they are not linear operators. To understand this, let's denote by $\tilde x = K(A,P,\tau,N; b)$ the vector you get as a solution by running a Krylov space method $K$ on the linear system $Ax=b$ for at most $N$ iterations or until a tolerance $\tau$ is reached, using a preconditioner $P\approx A^{-1}$. In other words, you can think of $K$ as an operator that acts on $b$.
Now note that $K(A,P,0,\infty;\cdot)$ is a linear operator: it amounts to solving $Ax=b$ exactly, i.e., $K(A,P,0,\infty;b)=A^{-1}b$, which is linear in $b$. A fixed number of iterations starting from a zero vector is also linear in $b$ for methods whose coefficients do not depend on $b$, such as Richardson or Chebyshev iterations with fixed parameters. But in methods such as CG or GMRES, the step lengths and the sequence of Krylov vectors depend on the starting residual $r^{(0)}=b-Ax^{(0)}$, so the operator $K(A,P,\tau,N; \cdot)$ is in general not a linear operator for finite $N$ or nonzero $\tau$.
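To see the nonlinearity concretely, here is a minimal NumPy sketch. It assumes unpreconditioned CG (so $P=I$) on a small random symmetric positive definite matrix; `truncated_cg` and `additivity_defect` are names I made up for illustration. It compares $K(b_1+b_2)$ with $K(b_1)+K(b_2)$, once for a solve cut off after two iterations and once for a solve run (nearly) to convergence.

```python
import numpy as np

def truncated_cg(A, b, tol, max_iter):
    """Unpreconditioned CG on A x = b, starting from x = 0.

    Stops after max_iter steps or once ||r|| <= tol * ||b||.
    """
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    stop = (tol * np.linalg.norm(b)) ** 2
    for _ in range(max_iter):
        if rr <= stop:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)        # step length depends on b through r and p
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# Hypothetical test problem: a random SPD matrix and two right-hand sides.
rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)
b1 = rng.standard_normal(n)
b2 = rng.standard_normal(n)

def additivity_defect(tol, max_iter):
    """Relative size of K(b1 + b2) - (K(b1) + K(b2))."""
    lhs = truncated_cg(A, b1 + b2, tol, max_iter)
    rhs = truncated_cg(A, b1, tol, max_iter) + truncated_cg(A, b2, tol, max_iter)
    return np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)

defect_truncated = additivity_defect(tol=0.0, max_iter=2)
defect_converged = additivity_defect(tol=1e-13, max_iter=500)
print("defect after 2 iterations:", defect_truncated)
print("defect at convergence:    ", defect_converged)
```

The truncated solve should show a defect far above round-off, while the converged solve should be additive up to the solver tolerance.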
What this means is that if you use $K(A,P,\tau,N; \cdot)$ as part of a preconditioner for a linear system in which $A$ is one block, then you end up with a preconditioner that does not act as a linear operator.
This is in contrast to many other methods that are used to precondition: for example, one SSOR step is a linear operation on the vector to which you apply it, as is any other method that applies a fixed number of steps of a stationary fixed point iteration.
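For comparison, the following sketch checks that one SSOR sweep from a zero starting vector really is linear. The helper `ssor_step` is my own illustrative implementation (dense matrices, relaxation parameter `omega`), not taken from any library, and the test matrix is again an arbitrary random SPD matrix.

```python
import numpy as np

def ssor_step(A, b, omega=1.0):
    """One SSOR sweep (forward, then backward) on A x = b, starting from x = 0."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    U = np.triu(A, k=1)
    x = np.zeros_like(b)
    # Forward sweep, written as a residual correction.
    x = x + np.linalg.solve(D / omega + L, b - A @ x)
    # Backward sweep, same form with the upper triangle.
    x = x + np.linalg.solve(D / omega + U, b - A @ x)
    return x

rng = np.random.default_rng(1)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)
b1 = rng.standard_normal(n)
b2 = rng.standard_normal(n)

# Additivity holds up to round-off.
lhs = ssor_step(A, b1 + b2)
rhs = ssor_step(A, b1) + ssor_step(A, b2)
ssor_defect = np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)

# Because the step is linear, it is represented by a fixed matrix,
# which we can assemble by applying the step to the unit vectors.
P_ssor = np.column_stack([ssor_step(A, e) for e in np.eye(n)])
matrix_defect = np.linalg.norm(P_ssor @ b1 - ssor_step(A, b1))

print("additivity defect:", ssor_defect)
print("matrix representation defect:", matrix_defect)
```

Both numbers should be at the level of round-off, which is exactly what fails for a truncated CG or GMRES solve.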
The issue now is that most Krylov space methods require the preconditioner to be a linear operator that stays the same from one iteration to the next. If it is not, their convergence theory no longer applies, and in practice they may converge slowly, stagnate, or not converge at all, which explains your observation. On the other hand, there are variations of some Krylov space methods, typically prefixed by the word "Flexible", such as FGMRES ("Flexible GMRES"), that work around this and can deal with preconditioners that are not linear operators. These flexible variants of the original methods will still converge, and are often powerful methods when coupled with good (but nonlinear) preconditioners.
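To illustrate the flexible idea, here is a small, unrestarted FGMRES sketch in NumPy. It is a teaching sketch under simplifying assumptions, not a production implementation: it uses dense matrices, a zero initial guess, a nonzero right-hand side, and it re-solves the small least-squares problem in every iteration. For brevity the nonlinear preconditioner is five CG steps applied to the whole (SPD) matrix rather than to one block of a larger system; `fgmres`, `inner_cg`, and the test problem are all hypothetical names and data. The key difference from ordinary GMRES is that the preconditioned vectors $z_j$ are stored and the solution is assembled from them.

```python
import numpy as np

def inner_cg(A, b, max_iter):
    """A fixed number of unpreconditioned CG steps from x = 0 (nonlinear in b)."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if rr == 0.0:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def fgmres(A, b, apply_prec, tol=1e-10, max_iter=30):
    """Unrestarted flexible GMRES with zero initial guess; assumes b != 0."""
    n = b.shape[0]
    beta = np.linalg.norm(b)
    V = np.zeros((n, max_iter + 1))    # orthonormal basis of the residual space
    Z = np.zeros((n, max_iter))        # stored preconditioned vectors z_j
    H = np.zeros((max_iter + 1, max_iter))
    V[:, 0] = b / beta
    x = np.zeros(n)
    for j in range(max_iter):
        Z[:, j] = apply_prec(V[:, j])  # may be a different operator each time
        w = A @ Z[:, j]
        for i in range(j + 1):         # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        # Minimize || beta * e_1 - H y || over the current subspace.
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], rhs, rcond=None)
        x = Z[:, :j + 1] @ y           # solution is built from Z, not from V
        if np.linalg.norm(b - A @ x) <= tol * beta or H[j + 1, j] == 0.0:
            return x, j + 1
        V[:, j + 1] = w / H[j + 1, j]
    return x, max_iter

rng = np.random.default_rng(2)
n = 30
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x, iters = fgmres(A, b, lambda v: inner_cg(A, v, 5))
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print("outer iterations:", iters, " relative residual:", rel_res)
```

Storing the extra set of vectors `Z` is the price one pays for flexibility: memory per iteration roughly doubles compared to standard right-preconditioned GMRES.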