If you understand how the 2 sample $t$-test works, the 2 sample $z$-test works the same way. We generally cover this material in introductory stats classes ('stats 101') by starting with the 1 sample $z$-test, where the population SD is known a priori, then moving to the 1 sample $t$-test, where the SD is estimated from the data, and finally to the 2 sample $t$-test, because that's usually the easiest way to learn the ideas. It's also possible to have a 2 sample $z$-test, if the population SDs are known a priori. But since we never really know the population SD a priori, this is typically skipped, and the 1 sample $z$-test is taught only as a stepping stone to get people to the point where they can understand the 2 sample $t$-test.
With binomial data (number of heads out of a known total number of coin flips), the SD is a function of the proportion. What that implies is that if you can assume the normal approximation of the binomial sampling distribution holds (by somewhat arbitrary convention, people say this is OK so long as there are at least $5$ of the less commonly occurring outcome), then testing proportions means you can use the $z$-test. But really, the underlying ideas and machinery are the same as for the 2 sample $t$-test.
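To make the "SD is a function of the proportion" point concrete: a single flip with probability $p$ of heads has SD $\sqrt{p(1-p)}$, so the standard error of a sample proportion $\hat{p}$ from $n$ flips is as follows (the symbols $p$, $\hat{p}$, and $n$ are my notation here, not something from your formula):

$$
SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
$$

Roughly speaking, once you have a value for $p$ (hypothesized under the null, or estimated from the data), there is no separate SD left to estimate, which is why the $z$-test rather than the $t$-test is used.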
Regarding the jump from a 1 sample test to a 2 sample test, the extra complexity is that you now have two numbers. So which should you use? The insight is that you get back to a single number via the magic of subtraction. By subtracting the mean of group 2 from the mean of group 1, or subtracting the proportion in group 2 from the proportion in group 1, you get a single number: the mean difference, or the difference in proportions. You can then test that difference by figuring out its sampling distribution. To do so, you need to assume a distributional form (e.g., the $t$-distribution, or the normal [i.e., $z$] distribution), and you need to calculate the standard error. The denominator of the formula you were given for the test statistic is the calculation that gets you the standard error, and using a $z$-test means you are assuming the sampling distribution is normal. And that's it, you're done. As I said, if you understand the 2 sample $t$-test, this is really the same.
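The steps above (subtract, compute a standard error, compare to a normal distribution) can be sketched in a few lines of Python. This is only an illustration under assumptions: I don't know exactly which formula you were given, so I've assumed the common version that pools the two proportions to get the standard error under the null, and the function name `two_proportion_z_test` and the counts in the usage lines are made up for the example.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 (hypothetical helper for illustration).

    x1, x2 are success counts; n1, n2 are the group sizes.
    Assumes the normal approximation holds and uses the pooled standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2                      # the single number we test
    p_pool = (x1 + x2) / (n1 + n2)      # common proportion under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se                       # difference in standard-error units
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, se, z, p_value

# Made-up counts: 60 heads out of 100 flips vs. 45 heads out of 100 flips
diff, se, z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"diff={diff:.3f}, se={se:.4f}, z={z:.3f}, p={p_value:.4f}")
```

The only thing that would change for a 2 sample $t$-test is how `se` is computed and which distribution the statistic is compared against.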
Regarding a figure, you can use the one you already have. That shows a normal distribution for the sampling distribution of the difference under the null, and another normal distribution, shifted some amount to the side, for the sampling distribution of the difference under some possible alternative hypothesis. The shaded regions would work the same way.