Add a new columns to a dataframe by transposing rows of another dataframe

Question

I need to add new columns to a dataframe by transposing rows of another dataframe.

Here is my DF1:

ID1  Value1  Value2
11   xxx      yyy
12   aaa      bbb

Here is DF2:

Index ID2   KEY     VALUE
1     123   ID1     11
2     123   FOLDER  folder1
3     123   FILE    file1
4     456   ID1     12
5     456   FOLDER  folder2
6     456   FILE    file2
7     456   FOLDER  folder3
8     456   FILE    file3

Note that ID2 456 has duplicate KEY's.

I want to ignore all duplicate keys and pick the values for last KEY's (with highest value of Index).

What I want:

Look for the KEY = ID1 in DF2.
Transpose all the remaining keys with identical ID2

Output:

ID1  Value1  Value2  FOLDER   FILE
11   xxx      yyy    folder1  file1
12   aaa      bbb    folder3  file3

richardec · Accepted Answer · 2022-03-22T18:00:28.840

2

You can first do a pivot, then a merge:

new_df = df1.merge(df2.drop_duplicates(keep='last', subset=['ID2', 'KEY']).pivot(index='ID2', columns='KEY', values='VALUE').astype({'ID2': int}))

Output:

>>> new_df
   ID1 Value1 Value2   FILE   FOLDER
0   11    xxx    yyy  file1  folder1
1   12    aaa    bbb  file3  folder3

edited Mar 22 '22 at 18:00

answered Mar 22 '22 at 16:33

richardec

14,202
6
23
49

Thank you @Richardec for the answer. But, I just noticed that my DF2 has duplicate KEY's. Meaning, I have 2 values of FOLDERNAME for ID2 = 456 in above example. Due to this, I'm getting `ValueError: Index contains duplicate entries, cannot reshape` error. How can I take last value of the KEY? – ShitalSavekar Mar 22 '22 at 17:28
1

Will you please provide a sample of such a case in the question? Just so that it's easier for me to understand. – richardec Mar 22 '22 at 17:31
1

Updated the question. – ShitalSavekar Mar 22 '22 at 17:36
@Shital done. Check now :) – richardec Mar 22 '22 at 17:40
The duplicate key error is gone but now I'm getting this error - `KeyError: "Only a column name can be used for the key in a dtype mappings argument. 'ID1' not found in columns."` – ShitalSavekar Mar 22 '22 at 17:58
Weird. Try now. – richardec Mar 22 '22 at 18:00
1

Sorry, My bad. There was problem in my dataframe. Your solution works perfectly. Accepting as right answer. – ShitalSavekar Mar 22 '22 at 18:30

Add a new columns to a dataframe by transposing rows of another dataframe

1 Answers1