1

I have implemented the Markov Chain Monte Carlo (MCMC) with the Metropolis-Hastings sample selection criteria.

Basically, as I understand it and as I have implemented it is:

 n = number of accepted samples
 current_x = 0.5

 while i < n:
     new_x = random()
     new_f = CDF(new_x)
     current_f = CDF(current_x)
     a = abs(new_f / current_f)
     transition_prob = min(1, a)

     if random() <= transition_prob:  

         sampled_x[i] = new_x
         sampled_data[i] = new_f
         curr_x = new_x
         i += 1

acceptance_rate = i / trials 

Using n=20 and sampling a N(0,1) CDF, I get this picture, which looks fine, with an acceptance rate of 51% (I know I should be sampling millions to get an accurate Monte Carlo, but for the sake of the example let's use n=20)

![enter image description here

I would like to ask the following:

  • Is my understanding of the MCMC implementation correct?
  • In the sampling I have rejected 49% of actual function evaluations. Those rejected evaluations could have provided me a bigger insight of the objective function. Is that so?
  • Is MCMC better than MC for "mapping" a very complicated black box function, given that I would reject many costly function evaluations?

I hope my questions are clear enough.

EDIT

In my conceptual code I call CDF to a function that given a probability x it returns the value of a "real life" variable in "real life" units.

For instance, if my CDF was to represent the electric consumption of a house over the year, I would call CDF_house(x=0.1) and it would return 0.4 kWh.

I construct the CDF by sorting the real life measurements and linking the resulting array (P(x)) with another array of equal number of elements ranging from 0 to 1 (x). I build an interpolation function with the two arrays and I call it CDF, but probably it is an inverse CDF as I have been told in the answers below.

  • Are you asking us to check your code? In general that would be seen as off topic (we're not a code review site) $:$ 2. MCMC is not universally better than ordinary simulation.
  • – Glen_b Jul 24 '16 at 10:24
  • 1
    @Glen_b even if it is about the code then still I'd say that there are statistical issues that are core of the question rather the coding issues, so I'd say that this is on-topic. – Tim Jul 24 '16 at 10:25
  • @Tim The question could be asked in terms of a plain algorithm, though, which would not strike the problem that the on topic page seems to exclude it as it stands. Indeed, I'd like to encourage the OP to take steps to avoid the risk of closure because it I think it would otherwise be a good question – Glen_b Jul 24 '16 at 10:29
  • I don't think the code here is particularly obtuse so don't think the question would greatly benefit from changing it to pseudo-code, though annotation with comments might be a good idea. – Silverfish Jul 24 '16 at 11:35
  • Indeed the code is just illustrative, I am asking about conceptual issues. – Santi Peñate-Vera Jul 25 '16 at 09:24
  • Your CDF is the quantile function, hence the inverse of the actual cumulative distribution function. – dv_bn Jul 25 '16 at 11:54