
The problem I have is as follows:

I have a 1-D list of integers (or an np.array) with 3 values:

l = [0,1,2]

I have a 2-D list of probabilities (for simplicity, we'll use two rows):

P = [[0.8, 0.1, 0.1],
     [0.3, 0.3, 0.4]]

What I want is the effect of numpy.random.choice(a=l, p=P), where each row of P (a probability distribution) is applied to l. That is, a random sample should first be drawn from [0, 1, 2] with probabilities [0.8, 0.1, 0.1], then with probabilities [0.3, 0.3, 0.4], giving me two outputs.

===== Update =====

I can use a for loop or a list comprehension, but I am looking for a fast, vectorized solution.
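For reference, the loop-based baseline I'd like to avoid would look something like this (a minimal sketch; one np.random.choice call per row of P):

import numpy as np

l = [0, 1, 2]
P = [[0.8, 0.1, 0.1],
     [0.3, 0.3, 0.4]]

# Correct, but makes a separate choice() call per row -- not vectorized.
samples = np.array([np.random.choice(a=l, p=row) for row in P])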

max_max_mir

1 Answer


Here's one way.

Here's the array of probabilities:

In [161]: p
Out[161]: 
array([[ 0.8 ,  0.1 ,  0.1 ],
       [ 0.3 ,  0.3 ,  0.4 ],
       [ 0.25,  0.5 ,  0.25]])

c holds the cumulative distributions:

In [162]: c = p.cumsum(axis=1)
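Since each row of p sums to 1, c is fully determined; it would be

array([[ 0.8 ,  0.9 ,  1.  ],
       [ 0.3 ,  0.6 ,  1.  ],
       [ 0.25,  0.75,  1.  ]])

with every row ending in 1.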

Generate a set of uniformly distributed samples...

In [163]: u = np.random.rand(len(c), 1)

...and then see where they "fit" in c:

In [164]: choices = (u < c).argmax(axis=1)

In [165]: choices
Out[165]: array([1, 2, 2])
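Collected into a single self-contained sketch (indexing into the value array at the end generalizes this; with values [0, 1, 2] the chosen indices happen to equal the values):

import numpy as np

l = np.array([0, 1, 2])              # values to sample from
p = np.array([[0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4],
              [0.25, 0.5, 0.25]])    # one probability distribution per row

c = p.cumsum(axis=1)                 # row-wise cumulative distributions
u = np.random.rand(len(c), 1)        # one uniform sample per row
choices = l[(u < c).argmax(axis=1)]  # first index where the CDF exceeds u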
Warren Weckesser
  • Lovely thought there! – Divakar Nov 07 '16 at 21:57
  • Pretty neat! Thank you! – max_max_mir Nov 07 '16 at 23:15
  • As speed was part of the question, is argmax the right solution? Maybe searchsorted would make more sense? – graffe Dec 06 '16 at 19:22
  • 1
    In theory, `searchsorted` would make sense, but `searchsorted` doesn't have an `axis` argument to allow operating along the axis of a 2-d array, so you would have to write a loop in Python, and that is slow. But for *large* arrays, it might be faster than `argmax`. Give it a shot, and if it looks good, add another answer. – Warren Weckesser Dec 06 '16 at 19:27
  • if you are working with a pd.DataFrame, prefer using `choices = (u < c).idxmax(axis=1)` instead of `choices = (u < c).argmax(axis=1)` – jpetot Nov 24 '21 at 13:23
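A minimal sketch of the `searchsorted` variant discussed in the comments above (the helper name row_choices_searchsorted is made up for illustration; since searchsorted has no axis argument, the loop over rows stays in Python):

import numpy as np

def row_choices_searchsorted(p):
    c = p.cumsum(axis=1)        # row-wise cumulative distributions
    u = np.random.rand(len(c))  # one uniform sample per row
    # side='right' matches (u < c).argmax(axis=1); ties have probability 0.
    return np.array([np.searchsorted(c[i], u[i], side='right')
                     for i in range(len(c))])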