I have an np.where problem in Pandas that is driving me crazy, and I can't seem to solve it through Google, the documentation, etc.

I'm hoping someone has insight. I'm sure it isn't complex.

I have a df where I'm checking the value in one column, and if that value is 'n/a' (the literal string, not a null as detected by .isnull()), I want to change it to another value.

Full_Names_Test_2['MarketCap'] == 'n/a'

returns:

70      True
88     False
90      True
145     True
156     True
181     True
191     True
200     True
219     True
223    False
Name: MarketCap, dtype: bool

so that part works.

but this:

Full_Names_Test_2['NewColumn'] = np.where(Full_Names_Test_2['MarketCap'] == 'n/a', 7)

returns:

ValueError: either both or neither of x and y should be given

What is going on?

DSM
Windstorm1981

1 Answer

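You need to pass the boolean mask and both value arguments: x, which is used where the mask is True, and y, which is used where it is False. To get 7 where the value is 'n/a' and the original value everywhere else:

np.where(Full_Names_Test_2['MarketCap'] == 'n/a', 7)
# should be
np.where(Full_Names_Test_2['MarketCap'] == 'n/a', 7, Full_Names_Test_2['MarketCap'])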

See the np.where docs.
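
For anyone who wants to try the three-argument form in isolation, here is a minimal sketch. The small `df` and its values are made up for illustration, since the real Full_Names_Test_2 isn't shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for Full_Names_Test_2.
df = pd.DataFrame({"MarketCap": ["n/a", "1.2B", "n/a", "350M"]})

# Boolean mask: True where the cell holds the literal string 'n/a'.
is_na = df["MarketCap"] == "n/a"

# x (7) is taken where the mask is True, y (the original column) where it is False.
df["NewColumn"] = np.where(is_na, 7, df["MarketCap"])

print(df)
```

Note that the result mixes numbers and strings, so the new column ends up with object dtype.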

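Alternatively, use the Series where method. It keeps the values where the condition is True and replaces the rest, so the condition has to be inverted:

Full_Names_Test_2['MarketCap'].where(Full_Names_Test_2['MarketCap'] != 'n/a', 7)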
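
One thing that is easy to trip over: Series.where keeps values where the condition is True and replaces the others, while Series.mask does the opposite. A small sketch on made-up data (the Series `s` is hypothetical, and it is given object dtype here as an assumption so that strings and numbers can share a column):

```python
import pandas as pd

# Hypothetical sample data; object dtype so 7 can sit next to the strings.
s = pd.Series(["n/a", "1.2B", "n/a", "350M"], name="MarketCap", dtype=object)

# where: keep values where the condition is True, replace the rest with 7.
via_where = s.where(s != "n/a", 7)

# mask: replace values where the condition is True, keep the rest.
via_mask = s.mask(s == "n/a", 7)

print(via_where.tolist())
print(via_mask.tolist())
```

Both calls should give the same result, so which one reads better mostly depends on whether the condition is more natural to state as "keep" or as "replace".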
Andy Hayden
  • I'm such an idiot. I think I didn't grasp the basic syntax of the np.where method. Now I see clearly. thanks again! – Windstorm1981 Oct 21 '15 at 18:42
  • @Windstorm1981 fwiw, I think the docs on this method & this error message could be a LOT clearer. It's not obvious (enough, IMO) that the 2nd argument is required. – szeitlin Jan 27 '16 at 12:20
  • This error also arises if you put `x` and `y` in an iterable (i.e. `[x,y]`) even though the Docstring contains `numpy.where(condition, [x, y])` – johnDanger Jun 16 '20 at 18:17
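
To illustrate the last two comments: the square brackets in the docstring signature mark x and y as optional, they are not a list to pass in. A rough sketch of the forms with a made-up `cond` array:

```python
import numpy as np

cond = np.array([True, False, True])  # hypothetical mask for illustration

# One-argument form: returns a tuple of index arrays where cond is True.
(idx,) = np.where(cond)

# Three-argument form: picks 7 where cond is True, 0 otherwise.
picked = np.where(cond, 7, 0)

# Wrapping x and y in one list supplies only x, so NumPy raises ValueError.
try:
    np.where(cond, [7, 0])
    raised = False
except ValueError:
    raised = True

print(idx, picked, raised)
```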