Choose list variable given probability of each variable

Question

I've been trying to code a program that uses the softmax activation function in the middle.

Right now, I have a list of probabilities like this:

P[0.10,0.25,0.60,0.05]

The sum of all the variables in P is always 1.

I wanted a way to pick the index of the list given the probability attached to it. Or, in other words, a function that returned

0 - 10% of the time
1 - 25% of the time
2 - 60% of the time
3 - 5% of the time

I've absolutely no idea where to start on this. Any help would be appreciated. :)

What you are looking for is a **weighted** random generation, and while it isn't built-in, there are [a lot of standard recipes for this](http://www.google.ca/search?q=python+weighted+random). — Karl Knechtel, Dec 14 '10 at 08:41
Actually, starting from Python 3.6 there is `random.choices` (note the 's' at the end) which allows submitting relative weights. — Nick stands with Ukraine, Aug 03 '20 at 03:44

score 62 · Answer 1 · edited Dec 30 '20 at 17:56

62

You can easily achieve this with numpy. It has a choice function which accepts the parameter of probabilities.

np.random.choice(
  ['pooh', 'rabbit', 'piglet', 'Christopher'], 
  5,
  p=[0.5, 0.1, 0.1, 0.3]
)

edited Dec 30 '20 at 17:56

a.t.

1,411
1
15
49

answered Nov 12 '16 at 11:22

Salvador Dali

199,541
138
677
738

Concise, although I think `numpy` is overkill here, especially if the script had no dependencies beyond the standard library – slezica May 16 '20 at 23:29

score 15 · Answer 2 · answered Dec 14 '10 at 18:21

Basically, make a cumulative probability distribution (CDF) array. Basically, the value of the CDF for a given index is equal to the sum of all values in P equal to or less than that index. Then you generate a random number between 0 and 1 and do a binary search (or linear search if you want). Here's some simple code for it.

from bisect import bisect
from random import random

P = [0.10,0.25,0.60,0.05]

cdf = [P[0]]
for i in xrange(1, len(P)):
    cdf.append(cdf[-1] + P[i])

random_ind = bisect(cdf,random())

of course you can generate a bunch of random indices with something like

rs = [bisect(cdf, random()) for i in xrange(20)]

yielding

[2, 2, 3, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2]

(results will, and should vary). Of course, binary search is rather unnecessary for so few of possible indices, but definitely recommended for distributions with more possible indices.

score 12 · Answer 3 · answered Dec 14 '10 at 08:39

12

Hmm interesting, how about...

Generate a number between 0 and 1.
Walk the list substracting the probability of each item from your number.
Pick the item that, after substraction, took your number down to 0 or below.

That's simple, O(n) and should work :)

answered Dec 14 '10 at 08:39

slezica

69,920
24
96
160

It will help if the probabilities are pre-sorted in a descending way - the iteration is likely to terminate faster. – Nick stands with Ukraine Aug 04 '20 at 00:02

score 6 · Answer 4 · answered May 16 '14 at 20:28

This problem is equivalent to sampling from a categorical distribution. This distribution is commonly conflated with the multinomial distribution which models the result of multiple samples from a categorical distribution.

In numpy, it is easy to sample from the multinomial distribution using numpy.random.multinomial, but a specific categorical version of this does not exist. However, it can be accomplished by sampling from the multinomial distribution with a single trial and then returning the non-zero element in the output.

import numpy as np
pvals = [0.10,0.25,0.60,0.05]
ind = np.where(np.random.multinomial(1,pvals))[0][0]

I think using argmax() instead of where()[0][0] is simpler and does the same. — Hawk, Oct 16 '18 at 19:09

sje397 · Answer 5 · 2012-09-27T12:57:02.940

4

import random

probs = [0.1, 0.25, 0.6, 0.05]
r = random.random()
index = 0
while(r >= 0 and index < len(probs)):
  r -= probs[index]
  index += 1
print index - 1

edited Sep 27 '12 at 12:57

answered Dec 14 '10 at 08:39

sje397

40,327
7
84
103

Haha and here I thought ~2 seconds before you posted that I was being original – slezica Dec 14 '10 at 08:41
This always takes O(n) time where n is the len(probs). Can we do better? – Sush Aug 28 '19 at 20:31
@Sush Yes: we can sort the probs and do a binary search. This would reduce to O(log n). – Nick stands with Ukraine Aug 04 '20 at 00:04

Choose list variable given probability of each variable

5 Answers5

Linked