I have a DataFrame df:

  A       B        C
1 Red     Cyan     Web
2 Green   Magenta  Web
3 Blue    Yellow   Print

I want a vectorised way to create a column D that takes its value from column A or column B, depending on the value in column C:

if row C == 'Web': row A

if row C == 'Print': row B

Output should be:

  A       B        C      D
1 Red     Cyan     Web    Red
2 Green   Magenta  Web    Green
3 Blue    Yellow   Print  Yellow

I can do this already with:

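def find_colour(row):
    # row is a single DataFrame row, passed in by apply(axis=1)
    if row['C'] == 'Web':
        return row['A']
    else:
        return row['B']

df['D'] = df.apply(find_colour, axis=1)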

but it is very slow on my dataset with millions of rows.

Is there a faster way to do this using vectorised pandas or NumPy operations?

The question "Pandas conditional creation of a series/dataframe column" is similar, but there the returned value is a static constant rather than a value taken from another column of the df. That changes the output, so this is not a duplicate.
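
For reference, one vectorised approach that may fit here is numpy.where, which picks element-wise between two arrays based on a boolean mask. The sketch below rebuilds the example frame (the 1 to 3 index is assumed from the table above) and assumes C only ever holds 'Web' or 'Print'; any other value would fall through to column B.

```python
import numpy as np
import pandas as pd

# Example frame from the question; the 1..3 index is an assumption
df = pd.DataFrame(
    {
        "A": ["Red", "Green", "Blue"],
        "B": ["Cyan", "Magenta", "Yellow"],
        "C": ["Web", "Web", "Print"],
    },
    index=[1, 2, 3],
)

# Element-wise choice: take A where C == 'Web', otherwise take B
df["D"] = np.where(df["C"] == "Web", df["A"], df["B"])
```

If C can hold more than two categories, numpy.select with a list of conditions and a default value would be the analogous sketch.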

Ryan Ward