How to randomly choose 1 sample for each category in numpy array (integer encoded)

Question

I've used integer encoding to represent categories in a numpy array. However, I can't figure out how to take 1 random sample for each category and return the index value.

For instance, I have an array like:

np.array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 1, 0, 0, 2, 2, 1])

How would I randomly sample a 0, 1 and 2 and return the index of each of the samples?

Divakar · Answer 1 · 2019-07-25T10:22:58.107

A vectorized approach for a generic number of integer labels:

import numpy as np

# https://stackoverflow.com/a/51915131/ @Divakar
def random_num_per_grp(L):
    # For each element in L pick a random number within range specified by it
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)),L)
    offset = np.r_[0,np.cumsum(L[:-1])]
    return r1.argsort()[offset] - offset

# a is input array
sidx = a.argsort()   # indices that sort a, so equal labels become contiguous
c = np.bincount(a)   # number of occurrences of each label
out = sidx[random_num_per_grp(c) + np.r_[0,c[:-1].cumsum()]]

For this case, the offset subtraction at the end of random_num_per_grp can be skipped, because the same offsets are added straight back when computing out. The function would then end with return r1.argsort()[offset], and out becomes sidx[random_num_per_grp(c)].
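
A minimal sketch of that simplified variant, for illustration only. The function name random_sorted_pos_per_group and the use of np.random.default_rng are assumptions that are not in the original answer, and the sketch assumes every label from 0 to the maximum occurs at least once, so that no count is zero.

```python
import numpy as np

def random_sorted_pos_per_group(counts, rng=None):
    # Hypothetical helper: counts[i] is the size of group i (assumed > 0).
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts)
    # Keys of group i fall in [i, i + 1), so sorting them keeps the groups
    # contiguous while shuffling the positions inside each group.
    keys = rng.random(counts.sum()) + np.repeat(np.arange(len(counts)), counts)
    starts = np.r_[0, np.cumsum(counts[:-1])]
    # No offset subtraction: these are positions in the label-sorted order.
    return keys.argsort()[starts]

a = np.array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 1, 0, 0, 2, 2, 1])
sidx = a.argsort()
counts = np.bincount(a)
out = sidx[random_sorted_pos_per_group(counts)]  # one index per label 0, 1, 2
```

Here a[out] is always [0, 1, 2], while the indices in out change from run to run.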

For negative labels, subtract the minimum label from the array first so that all labels are non-negative, which np.bincount requires.
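
As a rough illustration of that offsetting, the sketch below shifts a made-up array containing negative labels before counting. The example array, the variable names, and the use of np.random.default_rng are assumptions, and it still assumes every integer between the minimum and maximum label is present.

```python
import numpy as np

rng = np.random.default_rng()
labels = np.array([-1, 1, 0, -1, 1, 1, 0, -1])  # hypothetical input

shifted = labels - labels.min()   # labels -1, 0, 1 become 0, 1, 2
counts = np.bincount(shifted)     # bincount only accepts non-negative ints
sidx = shifted.argsort()          # sorting order is unchanged by the shift

# Same random-key trick as in the answer, applied to the shifted labels.
keys = rng.random(counts.sum()) + np.repeat(np.arange(len(counts)), counts)
starts = np.r_[0, np.cumsum(counts[:-1])]
out = sidx[keys.argsort()[starts]]  # indices into the original array
```

The resulting indices refer to the original array, so labels[out] gives [-1, 0, 1].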

score 1 · Accepted Answer · answered Jul 25 '19 at 09:42

You can use np.where and np.random.choice()

x = np.array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 1, 0, 0, 2, 2, 1])
ind0 = np.random.choice(np.where(x==0)[0])
ind1 = np.random.choice(np.where(x==1)[0])
ind2 = np.random.choice(np.where(x==2)[0])

np.where returns a tuple containing one index array per dimension, so for a 1-D array you need element 0 of the tuple to get the array of indices.
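
To make the tuple behaviour concrete, here is a small sketch using the same array. The np.flatnonzero call is shown only as an alternative that returns the index array directly; it is not part of the original answer.

```python
import numpy as np

x = np.array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 1, 0, 0, 2, 2, 1])

result = np.where(x == 0)      # a tuple with one array, since x is 1-D
idx = result[0]                # positions where x equals 0: 13, 15, 16
flat = np.flatnonzero(x == 0)  # same positions without the tuple wrapper
```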

score 1 · Answer 3 · answered Jul 25 '19 at 09:46

import numpy as np

array = np.array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 1, 0, 0, 2, 2, 1])
categories = [0,1,2]
for category in categories:
    indices = np.where(array  == category)[0]
    random_index = np.random.choice(indices)  # index of one random sample of this category
    print(category, random_index)

1) Get the indices where the condition is true (array == category).
2) Make a random choice from those indices.

score 1 · Answer 4 · answered Jul 25 '19 at 09:47

If you do not know the categories in advance, you can do the following:

import numpy as np
from numpy.random import choice

a = np.array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 1, 0, 0, 2, 2, 1])

samples = [choice(np.argwhere(a==s).flatten()) for s in np.unique(a)]

Answered by Boris W

Best solution due to being a single line and universal – zacha2 Jul 25 '19 at 09:49

score 0 · Answer 5 · answered Jul 25 '19 at 10:05

First, you need to know what the categories are; set(iterable) is a convenient way to get them.

Then, np.where gives all the indices of a given value in the array.

Finally, choose one index at random for each category.

import random
import numpy as np

def random_index_each(array):
    def random_index(item):
        return (item, random.choice(np.where(array == item)[0]))
    return dict(map(random_index, set(array)))

if __name__ == '__main__':
    array = np.array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 1, 0, 0, 2, 2, 1])
    for _ in range(4):
        print(random_index_each(array))

Output:

{0: 16, 1: 12, 2: 8}
{0: 15, 1: 14, 2: 6}
{0: 15, 1: 19, 2: 6}
{0: 15, 1: 11, 2: 2}

If you do not care which category each index was picked from, you can store the results in a list instead, or produce them lazily as an iterable.
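
One possible shape for the list or iterable variant is sketched below. The generator name iter_random_indices is hypothetical, and np.unique, np.flatnonzero and np.random.default_rng are swapped in for set, np.where and random.choice used above.

```python
import numpy as np

def iter_random_indices(array, rng=None):
    # Hypothetical generator: yields one random index per distinct value,
    # in ascending order of the values, without recording the category.
    rng = np.random.default_rng() if rng is None else rng
    for item in np.unique(array):
        yield int(rng.choice(np.flatnonzero(array == item)))

array = np.array([2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 0, 1, 0, 0, 2, 2, 1])
indices = list(iter_random_indices(array))  # materialize as a list if needed
```

Because np.unique returns sorted values, the position of an index in the list still tells you which category it came from.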
