47

When I run this code:

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
print(np.where(a > 2))

it would be natural to get an array of indices where a > 2, i.e. [2, 3, 4, 5], but instead we get:

(array([2, 3, 4, 5], dtype=int64),)

i.e. a tuple with empty second member.

Then, to get the "natural" answer from numpy.where, we have to do:

np.where(a > 2)[0]

What's the point in this tuple? In which situation is it useful?

Note: I'm speaking here only about the use case numpy.where(cond) and not numpy.where(cond, x, y) that also exists (see documentation).

jpp
Basj
  • Note: I already read https://stackoverflow.com/questions/5642457/how-does-python-numpy-where-work but it doesn't explain this part. – Basj Jun 01 '18 at 14:53
  • `(3,)` is a single element tuple. There's no 'second element', empty or not. – hpaulj Jun 01 '18 at 15:29
  • Yes @hpaulj, but if `w = ([13, 14, 15], )`, then we still have to do `w[0][1]` to get 14 and not `w[1]`, that's why I meant: the `[0]` is to retrieve the *first element of the tuple*. – Basj Jun 01 '18 at 15:31
  • `np.where` uses `np.nonzero`, which is a little more explicit about returning a tuple, one array per dimension. It doesn't special-case 1d. – hpaulj Jun 01 '18 at 15:35
  • Originally I thought that this allows tuple indexing with e.g. `a[np.where(a > 2)]`, but actually that works for 1-d case with or without the 1-element tuple. – wim Jun 01 '18 at 15:45

3 Answers

30

numpy.where returns a tuple because each element of the tuple refers to a dimension.

Consider this example in 2 dimensions:

a = np.array([[1, 2, 3, 4, 5, 6],
              [-2, 1, 2, 3, 4, 5]])

print(np.where(a > 2))

(array([0, 0, 0, 0, 1, 1, 1], dtype=int64),
 array([2, 3, 4, 5, 3, 4, 5], dtype=int64))

As you can see, the first element of the tuple holds the indices of the matching elements along the first dimension (rows); the second element holds their indices along the second dimension (columns).
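To make the pairing concrete, here is a minimal sketch that reuses the array above. The names `rows`, `cols` and `coords` are only illustrative; the point is that the two arrays line up position by position into (row, column) coordinates.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6],
              [-2, 1, 2, 3, 4, 5]])

# One index array per dimension of a
rows, cols = np.where(a > 2)

# Pair the arrays position by position to get (row, column) coordinates
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)
# [(0, 2), (0, 3), (0, 4), (0, 5), (1, 3), (1, 4), (1, 5)]

# The same two arrays select the matching values
print(a[rows, cols])  # [3 4 5 6 3 4 5]
```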

This is a convention numpy often uses. You will also see it when you ask for the shape of an array: the shape of a 1-dimensional array is a tuple with 1 element:

a = np.array([[1, 2, 3, 4, 5, 6],
              [-2, 1, 2, 3, 4, 5]])

print(a.shape, a.ndim)  # (2, 6) 2

b = np.array([1, 2, 3, 4, 5, 6])

print(b.shape, b.ndim)  # (6,) 1
jpp
8

From the documentation of np.where:

If only condition is given, return the tuple condition.nonzero(), the indices where condition is True

So we look into the documentation of np.nonzero:

Returns a tuple of arrays, one for each dimension of a, containing the indices of the non-zero elements in that dimension. The values in a are always tested and returned in row-major, C-style order. The corresponding non-zero values can be obtained with:
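As a small illustration of what the quoted passage describes (the array `a` below is made up for this sketch), indexing an array with the tuple returned by np.nonzero gives back the non-zero values in row-major order, and np.where(cond) returns the same tuple as cond.nonzero():

```python
import numpy as np

a = np.array([[3, 0, 0],
              [0, 4, 0],
              [5, 6, 0]])

# Tuple of two index arrays: (row indices, column indices)
idx = np.nonzero(a)

# Indexing with that tuple returns the non-zero values, row by row
print(a[idx])  # [3 4 5 6]

# With only a condition, np.where gives the same result as cond.nonzero()
cond = a > 3
print(a[np.where(cond)])  # [4 5 6]
```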

So why is it useful for np.where/np.nonzero to return a tuple of arrays? I think it is related to indexing multi-dimensional arrays.

Taking the example from the documentation, if we have

y = np.arange(35).reshape(5,7)

We can do

y[np.array([0,2,4]), np.array([0,1,2])]

to select y[0, 0], y[2, 1], y[4, 2].

In this case, if the index arrays have a matching shape, and there is an index array for each dimension of the array being indexed, the resultant array has the same shape as the index arrays, and the values correspond to the index set for each position in the index arrays. In this example, the first index value is 0 for both index arrays, and thus the first value of the resultant array is y[0,0]. The next value is y[2,1], and the last is y[4,2].
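The indexing described above can be run as follows. This is only a sketch; `rows`, `cols` and `idx` are illustrative names. It also shows that passing the index arrays as one tuple is the same as passing them separated by commas, which is why the tuple returned by np.where can be used as an index without unpacking.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

rows = np.array([0, 2, 4])
cols = np.array([0, 1, 2])

# Selects y[0, 0], y[2, 1] and y[4, 2]
print(y[rows, cols])  # [ 0 15 30]

# A tuple of index arrays behaves exactly like comma-separated indices
print(y[(rows, cols)])  # [ 0 15 30]

# So the tuple from np.where works as an index directly
idx = np.where(y % 10 == 0)
print(y[idx])  # [ 0 10 20 30]
```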

I hope this kind of multi-dimensional indexing justifies why np.nonzero/np.where return a tuple of arrays: the result can be used directly to select elements later on.

Tai
  • 2
    For the 1-d case, the fancy indexing works regardless of whether the return value is packed inside a 1-element tuple or not. – wim Jun 01 '18 at 16:33
  • @wim that's true! – Tai Jun 01 '18 at 17:53
  • 3
    The text from the documentation that you quoted seems to have been removed, there is no reference to tuples at all. – vcovo Nov 01 '20 at 18:42
  • @vcovo Indeed, this looks like a "documentation regression". It looks like the information has been removed in favor of that note to use `nonzero` directly. The behavior still seems to exist, but now it is documented less clearly :( – bluenote10 Dec 29 '21 at 15:34
5

For consistency: the length of the tuple matches the number of dimensions of the input array.

>>> np.where(np.ones((1)) > 0)
(array([0]),)
>>> np.where(np.ones((1,1)) > 0)
(array([0]), array([0]))
>>> np.where(np.ones((1,1,1)) > 0)
(array([0]), array([0]), array([0]))

Making the 1-d case return an array instead of a tuple would cause inhomogeneous return types. If the caller code is dealing with input data of arbitrary shape, then the programmer would have to special-case handling for 1-d inputs in the return value.
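Here is a minimal sketch of such shape-agnostic caller code. The helper `clip_above` is hypothetical and exists only for this illustration. Because np.where always returns a tuple with one array per dimension, the same lines work for 1-d, 2-d and 3-d input with no special case.

```python
import numpy as np

def clip_above(arr, limit):
    """Return a copy of arr with every value above limit replaced by limit."""
    out = arr.copy()
    idx = np.where(out > limit)  # always a tuple, with len(idx) == out.ndim
    out[idx] = limit             # the tuple is a valid index for any ndim
    return out

for shape in [(6,), (2, 3), (1, 2, 3)]:
    data = np.arange(6).reshape(shape)
    clipped = clip_above(data, 3)
    # Prints the shape, the tuple length, and the flattened result
    print(shape, len(np.where(data > 3)), clipped.ravel())
```

The tuple length printed in each iteration equals the number of dimensions of the input (1, 2 and 3 respectively).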

wim