I have a pandas data frame as shown below,
df = pd.DataFrame({'group_1':['A','A','A', 'B', 'B','B','C','C','C','B'],
'group_2':['a', 'b', 'a', 'b', 'a', 'b','a','c','b','c'],
'col_1':np.random.randint(0,100,10),
'col_2':np.random.randint(0,100,10),
'col_3':np.random.randint(0,100,10)},
)
df
group_1 group_2 col_1 col_2 col_3
A a 37 79 83
A b 3 36 0
A a 4 11 55
B b 95 22 47
B a 44 66 78
B b 77 55 36
C a 28 94 35
C c 30 70 15
C b 35 5 33
B c 98 81 68
Now I'm adding another column whose values depend upon the values from group_2. In the original dataframe I'm working with, I'm adding this column by merging this dataframe with another dataframe.
My dataframe now becomes
df
group_1 group_2 col_1 col_2 col_3 col_to_pick
A a 37 79 83 col_1
A b 3 36 0 col_2
A a 4 11 55 col_1
B b 95 22 47 col_2
B a 44 66 78 col_1
B b 77 55 36 col_2
C a 28 94 35 col_1
C c 30 70 15 col_3
C b 35 5 33 col_2
B c 98 81 68 col_3
For the purpose of reproducing above dataframe, col_to_pick can be added using
df['col_to_pick'] = np.where(df['group_2'] == 'a', 'col_1',np.where(df['group_2']=='b','col_2','col_3'))
Now I want to add a another column col_4 to my dataframe that takes numbers from col_1 or col_2 or col_3 depending upon col_to_pick value of the corresponding row. For a small dataframe like this, I can acheive this using something like
df['col_4'] = np.where(df['col_to_pick'] == 'col_1', df['col_1'],np.where(df['col_to_pick']=='col_2',df['col_2'],df['col_3']))
and my final dataframe becomes,
df
group_1 group_2 col_1 col_2 col_3 col_to_pick col_4
A a 37 79 83 col_1 37
A b 3 36 0 col_2 36
A a 4 11 55 col_1 4
B b 95 22 47 col_2 22
B a 44 66 78 col_1 44
B b 77 55 36 col_2 55
C a 28 94 35 col_1 28
C c 30 70 15 col_3 15
C b 35 5 33 col_2 5
B c 98 81 68 col_3 68
However, the dataframe I'm working with is very large and it has multiple columns like col_1, col_2 etc. So, methods like np.where are too difficult to implement. How can get col_4 values effeciently for large dataframe with large number of columns?
Edit: The reason I have added group_1 in example is, once I have got my col_4, I want to apply
df['col_4'] = df.groupby(['group_1','group_2'])['col_4'].transform('sum')