OK, let's start by addressing your questions piecemeal. First, how is $q$, called the jumping (or proposal) distribution, chosen? It's up to you, the modeler. A reasonable default, as always, is a Gaussian centered at the current point, but this may change according to the problem at hand. The choice of jumping distribution will change how you walk around, of course, but it is largely an arbitrary choice.
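For concreteness, here is a minimal sketch of a Gaussian random-walk proposal in Python (the name `proposal_scale` and the function itself are my own illustrative choices, not from your question):

```python
import numpy as np

def propose(x, proposal_scale=1.0):
    # Symmetric Gaussian jumping distribution centered on the current point:
    # q(x' | x) = Normal(x, proposal_scale^2).
    # A larger scale takes bigger jumps (more rejections);
    # a smaller scale explores the distribution more slowly.
    return x + proposal_scale * np.random.randn()
```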
Now, the core of Metropolis-Hastings is the choice of $\alpha$. You can think of $\alpha$ as the way you control the sampling procedure. The main idea behind MCMC is that, in order to estimate an unknown distribution, you 'walk around' it such that the amount of time spent in each location is proportional to the height of the distribution. What $\alpha$ does is ask, 'compared to our previous location, how much higher or lower are we?' If the new point is higher, we are more likely to move there; if it is lower, we are more likely to stay where we are (this is Step 3 of the algorithm you reference). The precise functional form of $\alpha$ can be derived; fundamentally, it comes from requiring that the target distribution be the stationary distribution of the chain.
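For reference, the standard form is: with target density $p$ and a proposal $x'$ drawn from $q(x' \mid x)$,

$$\alpha(x' \mid x) = \min\left(1,\ \frac{p(x')\,q(x \mid x')}{p(x)\,q(x' \mid x)}\right),$$

which reduces to $\min\left(1,\ p(x')/p(x)\right)$ when $q$ is symmetric (the original Metropolis algorithm). This ratio is exactly what makes the chain satisfy detailed balance with respect to $p$, and note that it only needs $p$ up to a normalizing constant.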
Next, let's discuss your final question. Generally speaking, this accept/reject idea goes beyond Metropolis-Hastings; you should google 'rejection sampling.' If you've heard of that, the mechanism here is closely related (with the twist that a rejected proposal means the chain repeats its current point rather than discarding a draw). The point is to ensure that you fully explore the distribution and don't get 'stuck' in one place.
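Putting the pieces together, here is a minimal sketch of a Metropolis sampler in Python (the target, an unnormalized standard Gaussian, and all names are my own illustrative choices):

```python
import numpy as np

def target(x):
    # Unnormalized target density: MCMC only needs p up to a constant.
    return np.exp(-0.5 * x**2)

def metropolis(n_samples, x0=0.0, proposal_scale=1.0):
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        # Step 1: propose a move from the symmetric Gaussian jumping distribution.
        x_new = x + proposal_scale * np.random.randn()
        # Step 2: acceptance probability (q is symmetric, so it cancels out).
        alpha = min(1.0, target(x_new) / target(x))
        # Step 3: accept with probability alpha; otherwise stay where we are.
        if np.random.rand() < alpha:
            x = x_new
        samples[i] = x
    return samples

draws = metropolis(10_000)
print(draws.mean(), draws.std())  # should be close to 0 and 1
```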
Hopefully this has given you some greater intuition behind the algorithm. I do recommend spending some time delving into the math; my approach here is very casual and focused on interpretability. Though the math can be intimidating, it's the best way to build intuition. Working through a software implementation, like the sketch above, may also help. As always, The Elements of Statistical Learning and Bishop's Pattern Recognition and Machine Learning are great references, and there are a plethora of online resources you can find to further your understanding. Cheers!