My understanding is that rejection sampling for some target distribution $p_{X}(x)$ and proposal distribution $\tilde p_{\tilde X} (x)$ follows the process below:
0. If there is some scalar $c$ such that $c \times \tilde p_{\tilde X}(x) \geq p_{X}(x)$ for all $x$, then proceed to step 1; otherwise, rejection sampling based on $\tilde p_{\tilde X}$ will not work.
1. Sample $\tilde X_j \sim \tilde p_{\tilde X} (\tilde x)$.
2. Keep $X_i=\tilde X_j$ with probability $\frac{p_{X}(\tilde X_j)}{c \, \tilde p_{\tilde X} (\tilde X_j)}$.
3. Discard $\tilde X_j$ if it was not kept, and return to step 1 as many times as desired.
where our proposal distribution is easier to sample from than the target distribution. But we cannot do the acceptance step (step 2) without knowing $p_{X}(\tilde X_j)$. So if we are ultimately computing $p_{X}(\tilde X_j)$ anyway, why can't we just sample from the target pdf directly and forget all the proposal distribution stuff?
I believe I'm confusing "sampling" from a distribution with evaluating its pdf at a given value.
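To make the distinction concrete, here is a minimal sketch of the procedure as I understand it. Everything specific in it is an assumption chosen for illustration: the target is $p_X(x) = 6x(1-x)$ on $[0, 1]$, the proposal is Uniform$(0, 1)$, and $c = 1.5$ (the maximum of the target). The names `target_pdf`, `proposal_pdf`, `proposal_sample`, and `rejection_sample` are hypothetical. Note that the target only ever appears as a function evaluated at a point that the proposal produced; no draw is ever made from it directly.

```python
import random

def target_pdf(x):
    """Evaluate the assumed target density p_X(x) = 6x(1-x) on [0, 1]."""
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

def proposal_pdf(x):
    """Evaluate the assumed proposal density, Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def proposal_sample(rng):
    """Draw from the proposal; this is the only place a candidate is generated."""
    return rng.random()

# Assumed scalar c: the target peaks at x = 0.5 with value 1.5,
# so c * proposal_pdf(x) >= target_pdf(x) for all x.
C = 1.5

def rejection_sample(n, rng):
    """Collect n accepted draws following the sample / keep / discard steps."""
    kept = []
    while len(kept) < n:
        x = proposal_sample(rng)                             # sample a candidate
        accept_prob = target_pdf(x) / (C * proposal_pdf(x))  # evaluate, do not sample
        if rng.random() < accept_prob:                       # keep with this probability
            kept.append(x)
        # otherwise the candidate is discarded and the loop repeats
    return kept

rng = random.Random(0)
samples = rejection_sample(10_000, rng)
sample_mean = sum(samples) / len(samples)
```

Under these assumptions, `sample_mean` should land near 0.5, the mean of the assumed target, even though the only random numbers the code ever generates are uniform.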