
By a simple change of measure, we have $$ E_P[f(X)] = \sum_x f(x)p(x) = \sum_x f(x)\frac{p(x)}{q(x)}q(x) = E_Q\left[f(X)\frac{p(X)}{q(X)}\right], $$ which can be approximated by Monte Carlo sampling in two ways.
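
As a quick sanity check, the change-of-measure identity can be verified exactly on a toy discrete distribution (the particular $p$, $q$ and $f$ below are arbitrary choices, used only for illustration):

```python
import numpy as np

# Toy discrete example (arbitrary choices): X takes values 0..3.
x = np.arange(4)
p = np.array([0.1, 0.2, 0.3, 0.4])      # target pmf p(x)
q = np.array([0.25, 0.25, 0.25, 0.25])  # proposal pmf q(x)
f = x ** 2                              # any test function, here f(x) = x^2

lhs = np.sum(f * p)            # E_P[f(X)]
rhs = np.sum(f * (p / q) * q)  # E_Q[f(X) p(X)/q(X)]
print(lhs, rhs)                # both equal 5.0
```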

1. Ordinary (normal) importance sampling: Let $X_1, \cdots, X_n$ be i.i.d. samples from $Q$, and $\mu_1$ the resulting estimator: $$ \mu_1 = \frac{1}{n} \sum_{i=1}^n f(X_i)\frac{p(X_i)}{q(X_i)} $$ Throughout the rest of this post, we denote the importance weight by $w(x) = p(x)/q(x)$, so the above estimator can also be written as $$ \mu_1 = \frac{1}{n} \sum_{i=1}^n w(X_i)f(X_i) $$ One can prove that $\mu_1$ is an unbiased estimator by showing $$ E_Q[\mu_1] = \frac{1}{n} \sum_{i=1}^n E_Q\left[f(X_i)\frac{p(X_i)}{q(X_i)}\right] = \frac{1}{n} \sum_{i=1}^n E_P[f(X)] = E_P[f(X)] $$ (A Python sketch of this estimator is given after this list.)

2. Weighted importance sampling: Sometimes, however, we only know $P(X)$ up to a normalizing constant, i.e. we know $g(x)$ such that $p(x) = g(x)/Z$. This situation is more common in the continuous case, where $Z$ is an intractable integral, but we take the discrete case as an example here. The importance weight is now revised to $w'(x) = g(x)/q(x)$, and consequently so is the estimator: $$ \mu_2 = \frac{\frac{1}{n} \sum_{i=1}^n w'(X_i)f(X_i)}{\frac{1}{n} \sum_{i=1}^n w'(X_i)} $$ One can show this estimator is consistent by applying the law of large numbers to the numerator and denominator separately: $$ \begin{split} \lim_{n \to \infty} \frac{\frac{1}{n} \sum_{i=1}^n w'(X_i)f(X_i)}{\frac{1}{n} \sum_{i=1}^n w'(X_i)} & = \frac{\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n w'(X_i)f(X_i)} {\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n w'(X_i)} \\ & = \frac{\sum_x \frac{g(x)}{q(x)} f(x)\, q(x)} {\sum_x \frac{g(x)}{q(x)}\, q(x)} \\ & = \frac{Z\,E_P[f(X)]}{Z} = E_P[f(X)] \end{split} $$ A fun fact is that $\frac{1}{n} \sum_{i=1}^n w'(X_i)$ is also an unbiased estimator of the normalizing constant $Z$, i.e. an estimate of the integral. (A sketch of this estimator also follows the list.)
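
As referenced above, here is a minimal Python sketch of the ordinary estimator $\mu_1$; the Gaussian target/proposal pair is an assumed example chosen only for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def importance_sampling(f, p, q, sample_q, n):
    """Ordinary importance sampling estimate mu_1 of E_P[f(X)].

    p and q are density (or pmf) functions; sample_q draws n samples from Q.
    """
    x = sample_q(n)
    w = p(x) / q(x)           # importance weights w(x) = p(x)/q(x)
    return np.mean(w * f(x))  # mu_1

# Assumed example: estimate E_P[X] for P = N(1, 1) with proposal Q = N(0, 2^2).
p = norm(loc=1.0, scale=1.0).pdf
q = norm(loc=0.0, scale=2.0).pdf
sample_q = lambda n: rng.normal(loc=0.0, scale=2.0, size=n)

print(importance_sampling(lambda x: x, p, q, sample_q, 100_000))  # close to 1.0
```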
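
And a corresponding sketch of the self-normalized estimator $\mu_2$, which only uses the unnormalized $g$; here $g$ is taken to be 3 times a Gaussian density (so $Z = 3$), again purely as an illustration. Note that the mean of the weights recovers $Z$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def weighted_importance_sampling(f, g, q, sample_q, n):
    """Self-normalized estimate mu_2 of E_P[f(X)], where p(x) = g(x)/Z
    and only the unnormalized g is available."""
    x = sample_q(n)
    w = g(x) / q(x)                     # w'(x) = g(x)/q(x)
    mu_2 = np.sum(w * f(x)) / np.sum(w)
    z_hat = np.mean(w)                  # estimate of the normalizing constant Z
    return mu_2, z_hat

# Assumed example: g(x) = 3 * N(1, 1) density, so Z = 3 and P = N(1, 1);
# proposal Q = N(0, 2^2).
g = lambda x: 3.0 * norm(loc=1.0, scale=1.0).pdf(x)
q = norm(loc=0.0, scale=2.0).pdf
sample_q = lambda n: rng.normal(loc=0.0, scale=2.0, size=n)

mu_2, z_hat = weighted_importance_sampling(lambda x: x, g, q, sample_q, 100_000)
print(mu_2, z_hat)  # mu_2 close to 1.0, z_hat close to 3.0
```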

Nevertheless, can anyone show that $\mu_2$ is a biased estimator (it is often claimed to trade bias for variance)? I cannot find any formal proof and have failed to prove it on my own.


Some references:

  1. The original paper on weighted importance sampling, where I did not find a proof that the estimator is unbiased.
  2. A possibly related answer by Xi'an, though it seems he is jumping over a few steps.

1 Answer


Sorry if I was "jumping over a few steps"; the argument seems simple enough to me: if $N$ and $D$ are (not necessarily independent) random variables such that $\mathbb E[N]=\nu$ and $\mathbb E[D]=\delta$, then $$\mathbb E[1/D]\ne \frac1\delta\quad\text{and}\quad\mathbb E[N/D]\ne\frac\nu\delta$$ except in highly specialised circumstances. For one thing, the left-hand side may be undefined, e.g. when $D$ is a Normal variate (a fact linked with the Cauchy variate having no mean). For another, assuming $D>0$, the inequality is a consequence of Jensen's inequality: since the mapping $\iota: x\mapsto 1/x$ is strictly convex, $\mathbb E[\iota(D)]>\iota(\mathbb E[D])$.
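
A quick simulation makes the Jensen gap concrete; the choice $D \sim \text{Gamma}(2,1)$ below is purely illustrative, picked because then $\mathbb E[1/D] = 1$ while $1/\mathbb E[D] = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(2)

# D ~ Gamma(shape=2, scale=1): E[D] = 2 and E[1/D] = 1/(shape - 1) = 1,
# so E[1/D] = 1 > 1/E[D] = 0.5.
d = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)
print(np.mean(1.0 / d), 1.0 / np.mean(d))  # roughly 1.0 vs 0.5
```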

For a mathematical study of the self- (or auto-) normalised importance sampling estimator, see Agapiou et al. (2014).

Xi'an
  • Hi Professor, glad you saw this post :-) I can understand Jensen's inequality in your original post and here. However, my question is: $N$ and $D$ in our case are the weighted samples and the corresponding weights respectively, so they are actually one (jointly distributed) random variable (up to different evaluation functions). Therefore, instead of $E[N/D]$, our case is pretty much like $E[f(X)/g(X)]$, where $E[f(X)] = \nu$ and $E[g(X)] = \delta$, and the bias of WIS seems to be provably $O(1/n)$ (I saw this somewhere else but unfortunately forgot the source). Could you please add some comments, thank you! – Fernando Zhu Mar 28 '24 at 08:58
  • Similarly, $$\mathbb E[f(X)/g(X)]\ne\mathbb E[f(X)]\big/\mathbb E[g(X)]$$ in most cases, hence the bias. This bias naturally decreases with the number $n$ of Monte Carlo simulations since the estimator is convergent. – Xi'an Mar 28 '24 at 09:08
  • Dear Professor, sorry to bother you again. I can understand that the expectation of the ratio does not necessarily equal the ratio of the expectations. But I am really eager to figure out the gap (bias) in this special case of auto-normalized importance sampling. [con't] – Fernando Zhu Mar 31 '24 at 11:52
  • [con't] So I read this [2014 paper] on my own and looked into the proof details. The proof in appendix 6.3 for Theorem 2.1 should resolve my confusion. However, I got stuck on the inequalities in the first paragraph of page 44. I can understand what leads to it and what follows, but just cannot get this very part: (1) in the second inequality, why does the second absolute-value expectation become a probability, and (2) how to get the third inequality; it looks like something involving the variance of the sampling distribution? Could you please kindly give me some hint? – Fernando Zhu Mar 31 '24 at 11:54