I've struggled with a problem and finally found a solution that works. In fact, I found two. My question is: what is the difference between the two approaches? I followed the instructions here: Get column name based on condition in pandas
In the first solution, with .dot, I do not understand why the result gives the same value in every row. When I removed the patient column, the results were correct. I want all the values in the same cell.
I also tried the second solution, with mapping, in the hope of not having to remove any columns. It worked, but here the values are given in a list (of lists?).
What is the difference between the two formats, and what can I do with them?
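To make the difference between the two formats concrete, here is a minimal sketch in plain Python rather than pandas. It assumes the first approach yields one semicolon-delimited string per row, and the second yields a plain Python list of column names per row (each printed line being a tuple of the row index and that list, not a list of lists). The variable names are illustrative only.

```python
# The two result formats from the question, shown for row 0 of the data.
as_string = 'K1;K3;K5'          # format produced by the .dot approach
as_list = ['K1', 'K3', 'K5']    # format produced by the groupby loop

# A delimited string is a single value: handy for display or for one CSV cell.
print(as_string)

# A list keeps the items separate: they can be counted, tested and iterated.
print(len(as_list), 'K3' in as_list)

# The two formats convert into each other with split and join.
assert as_string.split(';') == as_list
assert ';'.join(as_list) == as_string

# Caveat: a substring test on the string can give false positives,
# e.g. 'K1' is also found inside a hypothetical label 'K10'.
print('K1' in 'K10;K3')
```

In pandas, Series.str.split(';') and Series.str.join(';') should apply the same conversions to a whole column at once.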
Data and code:
import numpy as np
import pandas as pd

df = pd.DataFrame({'patient': [11, 12, 13, 14, 15],
                   'K1': [1, 0, 1, 0, 1],
                   'K2': [0, 0, 0, 1, 0],
                   'K3': [1, 0, 0, 0, 0],
                   'K4': [0, 0, 0, 0, 1],
                   'K5': [1, 1, 0, 0, 0]})
print(df)

# Solution 1: tried with the 'patient' column and without it
df2 = df.dot(df.columns + ';').str.rstrip(';')
print(df2)

# Solution 2: with the 'patient' column
df_dict = dict(list(df.groupby(df.index)))
for k, v in df_dict.items():
    check = v.columns[(v == 1).any()]
    if len(check) > 0:
        print((k, check.to_list()))
Result:
Solution 1 with patient column:
   patient  K1  K2  K3  K4  K5
0       11   1   0   1   0   1
1       12   0   0   0   0   1
2       13   1   0   0   0   0
3       14   0   1   0   0   0
4       15   1   0   0   1   0
0 patient;patient;patient;patient;patient;patien...
1 patient;patient;patient;patient;patient;patien...
2 patient;patient;patient;patient;patient;patien...
3 patient;patient;patient;patient;patient;patien...
4 patient;patient;patient;patient;patient;patien...
dtype: object
Solution 1 without patient column:
0 K1;K3;K5
1 K5
2 K1
3 K2
4 K1;K4
dtype: object
Solution 2:
(0, ['K1', 'K3', 'K5'])
(1, ['K5'])
(2, ['K1'])
(3, ['K2'])
(4, ['K1', 'K4'])
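One possible reading of the Solution 1 output with the patient column, sketched in plain Python: a dot product multiplies each cell by the matching column label and adds the products along the row, and for strings, multiplying by an integer means repetition while adding means concatenation. The helper dot_with_labels below is hypothetical and only emulates that arithmetic for the data in the question; it is not how pandas implements .dot internally.

```python
# Plain-Python emulation of df.dot(df.columns + ';') for the data in the question.
columns = ['patient', 'K1', 'K2', 'K3', 'K4', 'K5']
rows = [
    [11, 1, 0, 1, 0, 1],
    [12, 0, 0, 0, 0, 1],
    [13, 1, 0, 0, 0, 0],
    [14, 0, 1, 0, 0, 0],
    [15, 1, 0, 0, 1, 0],
]


def dot_with_labels(row, labels, sep=';'):
    """Hypothetical helper: 'sum' of value * (label + sep) over one row."""
    # int * str repeats the string; joining the pieces plays the role of the sum.
    joined = ''.join(value * (label + sep) for value, label in zip(row, labels))
    return joined.rstrip(sep)


# With the patient column: 'patient;' is repeated 11..15 times before any K label.
with_patient = [dot_with_labels(row, columns) for row in rows]

# Without the patient column: only labels of columns holding a 1 remain.
without_patient = [dot_with_labels(row[1:], columns[1:]) for row in rows]

# Show a truncated view of the long string next to the short one.
for full, short in zip(with_patient, without_patient):
    print(full[:30] + '...', '|', short)
```

If this reading is right, every row begins with patient; repeated 11 to 15 times and the K labels only appear at the very end, so the truncated display looks identical for all rows. Restricting the call to the K columns, for example with df.drop(columns='patient'), keeps patient out of the product.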