
I've been trying to approximate $\int_a^b f(x)\hspace{1pt} dx$ with the weight function $W(x)=A(e^{x^2}-1)$, where $A$ is a normalization constant (for $x\in[0,1]$, $A=2.16145...$). I have no problems when the integration limits are $x \in [0,1]$, but the general case $x \in [a,b]$ is cumbersome.

This is the procedure I followed for the $x \in [0,1]$ case:

  1. Choose a starting point $x_0$ in $[0,1]$.
  2. Set the trial step $x_t=x_0+\delta$, with $\delta$ drawn uniformly from $[-h,h]$ (Random[{-h,h}]).
  3. Compute the ratio $r=\frac{W(x_t)}{W(x_0)}$ and compare it to a uniform random number $\Omega_i \in [0,1]$.
  4. If $r\geq\Omega_i$ then $x_t$ is accepted (i.e. $x_0=x_t$); if $r<\Omega_i$ then $x_t$ is rejected and the chain stays at $x_0$.
  5. Repeat, discarding the first $m$ steps. (That is, to remove any dependence on the starting point, the algorithm is run for some large number of initial steps which are then discarded.)
  6. Now calculate the integral $\int_0^1 f(x)\hspace{1pt} dx$ as:

$\int_0^1 f(x)\hspace{1pt} dx=\int_0^1 \frac{f(x)}{W(x)}\hspace{1pt}W(x)dx\approx \frac{1}{(N-m)} \sum\limits_{i=1}^{N-m} \frac{f(x_i)}{W(x_i)}$.
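The six steps above can be sketched in runnable Python (a minimal sketch only; the function name `metropolis_estimate`, the seed, and the grid-based computation of $A$ are my own choices, with $W$, $f=x^2$, $N$, $m$, $h$ taken from the question):

```python
import numpy as np

rng = np.random.default_rng(1)

def W_unnorm(x):
    # unnormalized weight e^{x^2} - 1
    return np.exp(x**2) - 1.0

# Normalizing constant A = 1 / integral_0^1 (e^{x^2} - 1) dx,
# approximated on a fine grid (the question reports A ≈ 2.16145)
xs = np.linspace(0.0, 1.0, 100_001)
A = 1.0 / np.mean(W_unnorm(xs))

def metropolis_estimate(f, N=200_000, m=20_000, h=0.8):
    x = 0.5                               # step 1: a point in [0, 1]
    total = 0.0
    for i in range(N):
        xt = x + rng.uniform(-h, h)       # step 2: trial step
        # steps 3-4: accept if W(x_t)/W(x_0) beats a uniform draw;
        # proposals outside [0, 1] are rejected (W = 0 there)
        if 0.0 < xt < 1.0 and W_unnorm(xt) / W_unnorm(x) >= rng.uniform():
            x = xt
        if i >= m:                        # step 5: discard first m steps
            total += f(x) / (A * W_unnorm(x))   # step 6: weighted average
    return total / (N - m)

print(metropolis_estimate(lambda x: x**2))  # should be close to 1/3
```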

I first run the program with a small $N$ to tune $h$ so that the acceptance ratio of $x_t$ is $\approx 50\%$, and then re-run with a large $N$ using this $h$.

With $W(x)=A(e^{x^2}-1)$, $f(x)=x^2$, $N=1\times 10^6$, $m=1\times 10^5$ and $h=0.8$ (which gives an acceptance ratio of $50.8\%$), I get $\int_0^1 x^2 dx\approx 0.333305$, which is close enough to the actual value $\frac{1}{3}$.

What do I have to do to account for general integration limits? Do you have any suggestions? Can you recommend a book or paper I could read? Thank you in advance for your time.

P.S. English is not my first language.

Leinad

2 Answers


The exact same procedure (which qualifies as importance sampling implemented via Metropolis-Hastings) applies to any integration range $(a,b)$.

  • Select an importance probability density $w(\cdot)$ over $(a,b)$ [for instance, $w(x)\propto e^{x^2}-1$]
  • Simulate a Markov chain $(X_t)_t$ with stationary distribution $w(\cdot)$
  • Approximate the integral $\int_a^b f(x)\text{d}x$ by the empirical average$$\frac{1}{T}\sum_{t=1}^T f(x_t)/w(x_t)$$or, if the normalising constant is unknown, by the self-normalised version$$(b-a)\,\frac{\sum_{t=1}^T f(x_t)/w(x_t)}{\sum_{t=1}^T 1/w(x_t)}$$

The bottom approximation is based on the identity \begin{align*}1&=\int_a^b \frac{1}{b-a}\text{d}x\\&=\int_a^b \frac{1}{b-a}\frac{w(x)}{w(x)}\text{d}x\\&=\int_a^b \frac{A}{(b-a)w(x)}\frac{w(x)}{A}\text{d}x\\&=\mathbb{E}\left[\frac{A}{(b-a)w(X)}\right]\end{align*} (It is a safe version of the harmonic mean estimator.)
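The self-normalised estimator can be sketched in Python with a Metropolis chain targeting $w(x)\propto e^{x^2}-1$. The range $(a,b)=(0.5,1.5)$, the test function $f(x)=x^3$ (whose integral is $1.25$), and all names and tuning values below are illustrative choices of mine, not part of the answer:

```python
import numpy as np

rng = np.random.default_rng(7)

def w(x):
    # unnormalized importance density, as in the question
    return np.exp(x**2) - 1.0

def self_normalised_estimate(f, a, b, T=200_000, burn=20_000, h=0.5):
    x = 0.5 * (a + b)
    num = den = 0.0
    for t in range(T):
        xt = x + rng.uniform(-h, h)
        # Metropolis step targeting w(.); proposals outside (a, b) rejected
        if a < xt < b and w(xt) / w(x) >= rng.uniform():
            x = xt
        if t >= burn:
            num += f(x) / w(x)
            den += 1.0 / w(x)
    # (b - a) * sum(f/w) / sum(1/w): no normalising constant needed
    return (b - a) * num / den

est = self_normalised_estimate(lambda x: x**3, 0.5, 1.5)
print(est)   # should be close to 1.25
```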

As for textbooks on MCMC integration, there are many to pick from (avoiding citing my own books!).

Xi'an
  • Thank you very much for your answer. Well, I made two mistakes. One is that I inverted the weight function incorrectly; it should be $W(x)=W(\beta(b-a)+a)$ with $\beta \in [0,1]$.

    Second, and for this one I am not sure why, you must calculate $\int_a^b f(x)dx$ as $\int_a^b f(x)dx\approx \sum\limits_i \frac{f(\beta_i(b-a)+a)}{W(\beta_i(b-a)+a)}$ and not as $\int_a^b f(x)dx\approx(b-a)\sum\limits_i \frac{f(\beta_i(b-a)+a)}{W(\beta_i(b-a)+a)}$.

    – Leinad Aug 26 '20 at 14:57
  • Silly question, but what's the difference between f(x) and w(x) in your write-up? I gather w(x) is proportional to the function OP supplied, without the normalizing constant. (And so f(x) is the actual function with the constant...?) – jbuddy_13 Aug 26 '20 at 16:39
  • @jbuddy_13 f(x) is the function I wanted to integrate. W(x) is the PDF I used in the Metropolis algorithm. While it's true one doesn't need it for the accept/reject step on $x_t$ (because of the ratio $\frac{W(x_t)}{W(x_0)}$ involved), one does need the normalization constant $A$ to obtain the correct integral. – Leinad Aug 26 '20 at 16:52
  • @jbuddy_13: $f$ is the integrand and a given, while $w$ is the density of the importance function, a probability density up to a constant. – Xi'an Aug 27 '20 at 05:07
  • @Leinad: Rescaling $(a,b)$ to $(0,1)$ to use the original algorithm is a correct solution, but it requires recalibrating the bandwidth $h$. Your "mistake" does not relate to the original question, so it should be included either as an edit to the original question or in another question. – Xi'an Aug 27 '20 at 05:10
  • Naïve question: how does convergence enter the calculation? The MC could remain near a particular "accepted" value, and thus not explore all of $[a, b]$? – Single Malt Aug 27 '20 at 06:58
  • @SingleMalt: it is not a matter that can be answered in a comment. You should check one of the textbooks cited above. – Xi'an Aug 27 '20 at 13:10
  • @Xi'an Thank you again for your answer and your time. Yes, indeed one must recalibrate h in order to get a 50% acceptance ratio as in the $[0,1]$ case. With $f(x)=x^3$ , $h=0.44$, $[a=0,b=2]$, $N=3\times 10^4$, $m=3\times 10^3$, I get $\int_0^2 x^3 dx=3.991\pm0.0091$. – Leinad Aug 27 '20 at 15:03

As Xi'an pointed out, the MH algorithm is appropriate for this task. One neat part about MH is that the normalizing constant isn't required. The algorithm starts from an initial position, which we'll call current. A disturbance is then drawn, i.e. you move up or down by some random amount (you can tune this amount; I've used 0.25 below), and the new candidate, called movement, is current + disturbance.

Now we evaluate whether the move is reasonable. In MH, we look at the likelihood ratio of the two positions, using the PDF; for your purposes, this is simply the y-value associated with your function. Comparing these values gives a number in the range [0, inf). We then draw a random event bounded by [0, 1]: if the event is <= the ratio, we accept the movement; otherwise, we remain at the current position. After some predefined number of iterations, we return the samples.

Here is some python code to illustrate.

import numpy as np

def func(x, a, b):
    # unnormalized weight e^{x^2} - 1 on (a, b), zero outside
    if a < x < b:
        return np.exp(x**2) - 1
    else:
        return 0

def areaMCMC(a, b, hops=10_000):
    current = np.random.uniform(a, b)
    samples = []
    for i in range(hops):
        samples.append(current)
        disturbance = np.random.uniform(-0.25, 0.25)
        movement = current + disturbance

        current_lik = func(x=current, a=a, b=b)
        movement_lik = func(x=movement, a=a, b=b)
        ratio = movement_lik / current_lik

        event = np.random.uniform(0, 1)
        if event <= ratio:
            current = movement
    return samples

samples = areaMCMC(a=0,b=3)

import matplotlib.pyplot as plt
plt.hist(samples, density=True)

(histogram of the samples)

sum(samples)/len(samples)
>>> 2.7379448212941697

The average we've just computed approximates the area of your function over the range (0,3). Note, if you want this range to be inclusive rather than exclusive, adjust func() to use <= rather than <. However, at a high enough number of iterations, the effect of the difference should be moot.

Lastly, and as touched on earlier, adjust the 0.25 disturbance as necessary. When sampling (0,3), 0.25 is probably appropriate; when sampling (-1000,1000), you probably want bigger disturbances to adequately explore the function.

Edit1: For your case, it might not be true that you can disregard the normalizing constant. When approximating probability distributions, we already know that the distribution integrates to 1 and since we're only using the PDF for likelihood of a current position relative to another, we don't need the normalizing constant; it doesn't affect shape, only scale. But in your case, the area does not sum to 1. Nonetheless, if you multiply the output of this code by the normalizing constant, you'll scale the estimated area correctly.
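The normalizing constant mentioned here, $A = 1/\int_a^b (e^{x^2}-1)\,dx$, can be approximated numerically, for instance with a simple trapezoidal rule (a sketch; `normalizing_constant` is a hypothetical helper name, not from the answer):

```python
import numpy as np

def normalizing_constant(a, b, n=200_001):
    # A = 1 / integral_a^b (e^{x^2} - 1) dx via the trapezoidal rule
    xs = np.linspace(a, b, n)
    ys = np.exp(xs**2) - 1.0
    dx = (b - a) / (n - 1)
    integral = dx * (ys.sum() - 0.5 * (ys[0] + ys[-1]))
    return 1.0 / integral

print(normalizing_constant(0, 1))   # ≈ 2.16145, the value quoted in the question
```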

Edit2: To account for integration limits, as you've asked, you need to return a y-value of 0 for any x-value supplied outside of this range. In other words, if the probability distribution is not defined here, the likelihood returned by the PDF will be 0.

Hope this helps!

jbuddy_13
  • Thank you very much. I did make some mistakes, but I have now got the correct answer in both cases. By the way, your program is neater than mine (I wrote mine in Mathematica). – Leinad Aug 26 '20 at 15:53
  • @Leinad, ah I see; I've never used Mathematica. (Are you a student/researcher?) I'm curious, how is your normalizing constant computed? I gather it's dependent on a and b, but I have no idea beyond that. – jbuddy_13 Aug 26 '20 at 16:35
  • "Are you a student/researcher?" It's complicated. "How is your normalizing constant computed?" Well, it's easy: $A=\frac{1}{\int_a^b (e^{x^2}-1)\, dx}$, which means I have to compute $A$ by a method other than Metropolis Monte Carlo. I used NIntegrate (a built-in Mathematica function that gives the result with the precision I needed) to obtain A. – Leinad Aug 26 '20 at 16:45