People usually do not plot the first-stage coefficient on $Z_1$ against $X$ because that coefficient is just a single constant, so the plot will not tell you much. If you want to show an association between your instrument and the endogenous variable, consider the first-stage F-statistic (which in the one-endogenous-variable, one-instrument case is just the square of the t-statistic on the instrument in the first stage) or the pairwise correlation between the instrument and the endogenous variable. For visual inspection you could do a scatter plot of the instrument against the endogenous variable.
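As a rough sketch of those diagnostics, the snippet below regresses a simulated endogenous variable on a simulated instrument and reports the first-stage slope, its t-statistic, the implied F-statistic and the pairwise correlation. The data, the coefficient 0.8 and the helper name `first_stage_strength` are made up for illustration, and it assumes one instrument and no further covariates.

```python
import math
import random


def first_stage_strength(z, d):
    """OLS of d on a constant and z (one instrument, no other covariates)."""
    n = len(z)
    z_bar = sum(z) / n
    d_bar = sum(d) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_dd = sum((di - d_bar) ** 2 for di in d)
    s_zd = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d))

    slope = s_zd / s_zz
    intercept = d_bar - slope * z_bar

    # Residual sum of squares and the usual homoskedastic standard error.
    rss = sum((di - intercept - slope * zi) ** 2 for zi, di in zip(z, d))
    se = math.sqrt(rss / (n - 2) / s_zz)
    t_stat = slope / se

    return {
        "slope": slope,
        "t": t_stat,
        "F": t_stat ** 2,  # with a single instrument, F is the squared t
        "corr": s_zd / math.sqrt(s_zz * s_dd),
    }


# Simulated data (assumption): the instrument shifts d with coefficient 0.8.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(500)]
d = [0.8 * zi + random.gauss(0, 1) for zi in z]

result = first_stage_strength(z, d)
print(result)
```

With real data you would pass your own instrument and endogenous variable in place of `z` and `d`; with additional exogenous regressors the F-statistic should come from a first stage that includes them.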
For the second part of your question, I have written a similar answer here, so with your permission I will copy it.
If you have an endogenous variable $D$ (in your case this would be $X_1$; I just use different notation), an outcome $Y$, an instrument $Z$ and other exogenous variables $X$, the problem of the regression
$$Y_i = \alpha + \delta D_i + X'_i \beta + \epsilon_i$$
is that $\text{Cov}(D_i,\epsilon_i) \neq 0$, which may be due to omitted variable bias, measurement error, simultaneity, or a mix of those. For example, let's say that $\epsilon_i = e_i + W_i$, where $e_i$ is a stochastic error and $W_i$ is the omitted variable. Because $\text{Cov}(D_i,W_i)\neq 0$, this $W_i$ is the source of the endogeneity problem. Say you then estimate your first stage with the instrument,
$$D_i = a + \pi Z_i + X'_i \gamma + \eta_i$$
where $\eta_i = u_i + W_i$ and $u_i$ is again a stochastic error term. Remember that the regression splits the variation in the dependent variable into an explained and an unexplained part, i.e.
$$\text{total sum of squares} = \text{explained sum of squares} + \text{residual sum of squares}$$
When you predict $\widehat{D}_i = \widehat{a} + \widehat{\pi} Z_i + X'_i \widehat{\gamma}$ using the estimated first-stage coefficients, this separates the "good" variation of $D_i$ (that's the variation explained by variables that you know are exogenous, namely $Z$ and the $X$) from the "bad" variation in $\eta_i$. In the second stage,
$$Y_i = \alpha + \delta \widehat{D}_i + X'_i \beta + \epsilon_i$$
you still have $W_i$ in the error term, but this time it's not a problem because $\text{Cov}(\widehat{D}_i,W_i) = 0$: $\widehat{D}_i$ is built only from $Z_i$ and $X_i$, which are exogenous. This solves your endogeneity problem and allows for consistent estimation of $\delta$.
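To see the argument at work, here is a small simulation sketch of the two stages done by hand. Everything in it is an assumption for illustration: the true $\delta = 2$, the first-stage coefficient of 1, the normal errors, and the absence of other exogenous variables $X$. $W_i$ enters both $D_i$ and $Y_i$ but is left out of every regression.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equally long lists."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    return cov(x, y) / cov(x, x)


random.seed(42)
n = 20000
delta = 2.0  # true effect of D on Y (chosen for this simulation)

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
w = [random.gauss(0, 1) for _ in range(n)]  # omitted variable W
# D depends on Z and on W, so D is correlated with the error in Y.
d = [1.0 * zi + wi + random.gauss(0, 1) for zi, wi in zip(z, w)]
# Y depends on D and on W; W is not observed by the researcher.
y = [1.0 + delta * di + wi + random.gauss(0, 1) for di, wi in zip(d, w)]

# Naive OLS of Y on D: picks up the effect of W as well.
naive = ols_slope(d, y)

# First stage: regress D on Z and keep the fitted values.
pi_hat = ols_slope(z, d)
a_hat = mean(d) - pi_hat * mean(z)
d_hat = [a_hat + pi_hat * zi for zi in z]

# Second stage: regress Y on the fitted values.
tsls = ols_slope(d_hat, y)

print("Cov(D, W)     =", round(cov(d, w), 3))
print("Cov(D_hat, W) =", round(cov(d_hat, w), 3))
print("naive OLS     =", round(naive, 3))
print("2SLS          =", round(tsls, 3))
```

The naive estimate should come out clearly above 2 while the two-stage estimate should be close to it. Note that doing the second stage by hand like this only gives the right point estimate; the standard errors from that regression are not correct, so in practice you would use a dedicated IV routine.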