I have a DataFrame df:
           A        B      C
    1    Red     Cyan    Web
    2  Green  Magenta    Web
    3   Blue   Yellow  Print
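To make this reproducible, the frame above can be built roughly as follows. This is a minimal sketch; I am assuming the leading 1, 2, 3 are index labels and that all three columns hold plain strings.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: the leading 1, 2, 3 are the index labels, not a data column.
df = pd.DataFrame(
    {
        "A": ["Red", "Green", "Blue"],
        "B": ["Cyan", "Magenta", "Yellow"],
        "C": ["Web", "Web", "Print"],
    },
    index=[1, 2, 3],
)
print(df)
```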
I want a vectorised method to create column D that would choose from columns A or B based on the value of C.
if row C == 'Web': row A
if row C == 'Print': row B
Output should be:
           A        B      C       D
    1    Red     Cyan    Web     Red
    2  Green  Magenta    Web   Green
    3   Blue   Yellow  Print  Yellow
I can do this already with:
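    def find_colour(row):
        # Receives one row at a time, so it runs once per row
        if row['C'] == 'Web':
            return row['A']
        else:
            return row['B']

    # Presumed usage: apply the function row by row
    df['D'] = df.apply(find_colour, axis=1)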
but it is very slow on my dataset with millions of rows.
Is there a faster, vectorised way to do this with pandas or NumPy?
The question "Pandas conditional creation of a series/dataframe column" is similar, but there the values being assigned are static constants rather than values taken from other columns of the df. That changes the solution, so I don't think this is a duplicate.