
Given the following NumPy array,

> a = array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])

it's simple enough to shuffle a single row,

> shuffle(a[0])
> a
array([[4, 2, 1, 3, 5],[1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])

Is it possible to use indexing notation to shuffle each of the rows independently, or do you have to iterate over the array? I had in mind something like,

> numpy.shuffle(a[:])
> a
array([[4, 2, 3, 5, 1],[3, 1, 4, 5, 2],[4, 2, 1, 3, 5]]) # Not the real output

though this clearly doesn't work.

Sebastian
lafras

2 Answers


Vectorized solution with rand+argsort trick

We could generate unique indices along the specified axis and index into the input array with advanced indexing. To generate the unique indices, we use the random float generation + sort trick: drawing random floats and taking their argsort along the axis yields an independent random permutation of indices for each slice, which gives us a vectorized solution. We can also generalize it to n-dimensional arrays and arbitrary axes with np.take_along_axis. The final implementation would look something like this -

import numpy as np

def shuffle_along_axis(a, axis):
    # argsort of random floats gives an independent random permutation per slice
    idx = np.random.rand(*a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

Note that this shuffle won't be in-place and returns a shuffled copy.

Sample run -

In [33]: a
Out[33]: 
array([[18, 95, 45, 33],
       [40, 78, 31, 52],
       [75, 49, 42, 94]])

In [34]: shuffle_along_axis(a, axis=0)
Out[34]: 
array([[75, 78, 42, 94],
       [40, 49, 45, 52],
       [18, 95, 31, 33]])

In [35]: shuffle_along_axis(a, axis=1)
Out[35]: 
array([[45, 18, 33, 95],
       [31, 78, 52, 40],
       [42, 75, 94, 49]])
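As a quick sanity check on the claims above (the result is a copy, and each slice along the chosen axis keeps its own values), here is a minimal sketch. The variable names `original`, `shuffled`, `input_unchanged` and `rows_preserved` are only illustrative, and the function is repeated so the snippet runs on its own.

```python
import numpy as np

def shuffle_along_axis(a, axis):
    # Same rand + argsort trick as in the answer.
    idx = np.random.rand(*a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

a = np.array([[18, 95, 45, 33],
              [40, 78, 31, 52],
              [75, 49, 42, 94]])
original = a.copy()

shuffled = shuffle_along_axis(a, axis=1)

# The input array is left untouched (the shuffle is not in-place) ...
input_unchanged = np.array_equal(a, original)
# ... and every row still holds the same values, only reordered.
rows_preserved = np.array_equal(np.sort(shuffled, axis=1), np.sort(a, axis=1))
```

Both flags should come out `True` regardless of the random draw, since sorting each row undoes any permutation within it.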
Divakar
  • Interesting solution! However, I made a quick experiment and it was way slower (on the order of 1000x) than the naive solution below, which repeatedly invokes rng.shuffle. Can anyone confirm this? Why is it so slow? – Nils Mar 24 '22 at 13:42

You have to call numpy.random.shuffle() several times because you are shuffling several sequences independently. numpy.random.shuffle() works on any mutable sequence and is not actually a ufunc. The shortest and most efficient code to shuffle all rows of a two-dimensional array a separately is probably

list(map(numpy.random.shuffle, a))

Some people prefer to write this as a list comprehension instead:

[numpy.random.shuffle(x) for x in a]
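Because shuffle() returns None, both one-liners build a throwaway list of None values; a plain loop avoids that. The sketch below shows the loop with the newer `Generator` interface, plus `Generator.permuted`, which takes an `axis` argument and, as far as I know, is only available in NumPy 1.20 and later. The seed and the names `rng`, `b` and `c` are illustrative choices, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make runs repeatable

a = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])

# Plain loop: each `row` is a view into `a`, so shuffling it changes `a`.
for row in a:
    rng.shuffle(row)

# Alternative: permuted() shuffles every slice along `axis` independently
# and returns a new array; passing out=b would shuffle `b` in place instead.
b = np.array([[1, 2, 3, 4, 5]] * 3)
c = rng.permuted(b, axis=1)
```

After this runs, `a` has been shuffled in place row by row, `b` is unchanged, and `c` is a row-wise shuffled copy of `b`.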
Sven Marnach
  • Thanks, simple and clean solution. – lafras Feb 21 '11 at 11:22
  • At least for Python 3.5 and NumPy 1.10.2, this doesn't work; `a` remains unchanged. – drevicko Mar 16 '16 at 17:22
  • @drevicko: What dimension does your array have? This answer is for shuffling all rows of a two-dimensional array (and I'm sure it also works with your combination of Python and Numpy versions). – Sven Marnach Mar 16 '16 at 22:12
  • Aha! I see what happened: in Python 3.5, map is lazy, producing an iterator, and doesn't do the mapping until you iterate through it. If you do e.g.: `for _ in map(...): pass` it'll work. – drevicko Mar 21 '16 at 15:40
  • @drevicko That makes sense. It might be best to write that code as `for x in a: numpy.random.shuffle(x)` then. – Sven Marnach Mar 21 '16 at 15:57
  • I guess so.. You do get a view when you iterate over `a`, don't you? There's also a messy one-liner: `list(map(...))` if `a` isn't too big, but a for loop starts to look more attractive ;) – drevicko Mar 21 '16 at 16:09
  • @drevicko The for loop basically does the same as `map()`: it uses Python's iterator protocol to iterate over `a`. It calls `a.__iter__()` to retrieve an iterator for `a`, and then calls the `__next__()` method on that iterator until `StopIteration` is raised. In this particular case, with `a` being a two-dimensional Numpy array, the items returned by the `__next__()` method are indeed views for the respective rows. In case of a one-dimensional array, you'd simply get the values of the elements. – Sven Marnach Mar 21 '16 at 21:11
  • or `[*map(numpy.random.shuffle, a)]` to be simpler. – Frost-Lee May 26 '20 at 11:57
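The point made in the comments about iteration yielding views can be checked directly. This is a small sketch with illustrative names (`rows`, `items`, `row_is_view`, `item_is_scalar`); it relies only on `np.shares_memory` and ordinary iteration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Iterating over a 2-D array yields its rows, and each row is a view.
rows = list(a)
row_is_view = np.shares_memory(rows[0], a)

# Writing through the row therefore changes `a` itself, which is why
# shuffling each row in a loop shuffles the array in place.
rows[0][0] = 99

# Iterating over a 1-D array yields plain scalar values, not views.
items = list(np.arange(3))
item_is_scalar = isinstance(items[0], np.integer)
```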