
My understanding is that rejection sampling for some target distribution $p_{X}(x)$ and proposal distribution $\tilde p_{\tilde X} (x)$ follows the process below:

  1. If there is some scalar $c$ such that $c \times \tilde p_{\tilde X}(x) \geq p_X(x)$ for all $x$, then proceed to step 2; otherwise, rejection sampling based on $\tilde p_{\tilde X}$ will not work
  2. Sample $\tilde X_j \sim \tilde p_{\tilde X} (\tilde x)$
  3. Keep $X_i=\tilde X_j$ with probability $\frac{p_{X}(\tilde X_j)}{c \tilde p_{\tilde X} (\tilde X_j)}$
  4. Discard $\tilde X_j$ if it was not kept, and return to step 2 as desired

where our proposal distribution is easier to sample from than the target distribution. But we cannot do step 3 without knowing $p_{X}(\tilde X_j)$. So if we are ultimately computing $p_{X}(\tilde X_j)$ anyway, why can't we just sample from the target pdf and forget all the proposal distribution stuff?

I believe I'm confusing "sampling" from a distribution with evaluating its pdf at a given value.
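To make the distinction concrete, here is a minimal sketch of the process as I understand it. Everything specific in it is my own assumption for illustration: the target is taken to be Beta(2, 2) with density $6x(1-x)$ on $[0, 1]$, the proposal is Uniform(0, 1), and $c = 1.5$ is the maximum of the density ratio. The names `target_pdf`, `proposal_pdf`, `proposal_sample` and `rejection_sample` are hypothetical. Note that the target only ever appears through pointwise pdf evaluation, while all random draws come from the proposal.

```python
import random


def target_pdf(x):
    """Assumed target: Beta(2, 2), p(x) = 6 x (1 - x) on [0, 1]. Only evaluated, never sampled."""
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def proposal_pdf(x):
    """Assumed proposal: Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def proposal_sample(rng):
    """Drawing from the proposal is the only sampling primitive we need."""
    return rng.random()


# Envelope constant: c * proposal_pdf(x) >= target_pdf(x) for all x.
# For this assumed pair the ratio peaks at x = 0.5, where it equals 1.5.
C = 1.5


def rejection_sample(n, rng):
    """Return n accepted draws and the total number of proposals made."""
    samples = []
    proposed = 0
    while len(samples) < n:
        x = proposal_sample(rng)  # sample a candidate from the proposal
        proposed += 1
        accept_prob = target_pdf(x) / (C * proposal_pdf(x))
        if rng.random() < accept_prob:  # keep the candidate with this probability
            samples.append(x)
        # otherwise the candidate is discarded and we propose again
    return samples, proposed


rng = random.Random(0)
samples, proposed = rejection_sample(20000, rng)
mean = sum(samples) / len(samples)
acceptance_rate = len(samples) / proposed
print(mean, acceptance_rate)
```

Under these assumptions the sample mean should come out close to 0.5 (the mean of Beta(2, 2)) and the acceptance rate close to $1/c = 2/3$.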

Ator
  • 33

1 Answer


I think there are two answers to this:

  1. If $X$ has several dimensions, it's not straightforward to sample from $p_X$ even if you can easily calculate $p_X(x)$

  2. You don't actually need the normalized $p_X(\tilde X_j)$. It is enough to evaluate something proportional to $p_X$, because any constant factor can be absorbed into the scalar $c$. Bayesian statistics produces such unnormalized densities (almost) every time you multiply prior and likelihood, which is kind of the core idea of Bayesian statistics. Here's an explanation: https://stats.stackexchange.com/a/444845/341520
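A short derivation of the second point, in notation of my own that is not in the question: write $q(x) = Z \, p_X(x)$ for the unnormalized density you can evaluate, with $Z > 0$ unknown, and let $c'$ be any scalar with $c' \, \tilde p_{\tilde X}(x) \geq q(x)$ for all $x$. If you accept $\tilde X_j$ with probability $q(\tilde X_j) / (c' \, \tilde p_{\tilde X}(\tilde X_j))$, then

$$
\begin{aligned}
P(\text{accept}) &= \int \tilde p_{\tilde X}(x) \, \frac{q(x)}{c' \, \tilde p_{\tilde X}(x)} \, dx = \frac{1}{c'} \int q(x) \, dx = \frac{Z}{c'}, \\
p(x \mid \text{accept}) &= \frac{\tilde p_{\tilde X}(x) \, \dfrac{q(x)}{c' \, \tilde p_{\tilde X}(x)}}{Z / c'} = \frac{q(x)}{Z} = p_X(x).
\end{aligned}
$$

So the accepted draws follow $p_X$ even though $Z$ was never computed; the unknown constant only affects the acceptance rate $Z / c'$.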

Lukas Lohse
  • 2,482