In a pandas DataFrame, find duplicates based on one column and keep only one row, combining the values from the other column

Asked Jul 06 '21 at 18:10

Active Jul 06 '21 at 20:47

Suppose this is my DataFrame:

In [1]: import pandas as pd
   ...: df = pd.DataFrame(
   ...:     {
   ...:         "A": ["A0", "A1", "A0", "A2", "A3", "A2"],
   ...:         "B": ["B0", "B1", "B4", "B2", "B3", "B5"],
   ...:     }
   ...: )
   ...: df
Out[1]:
    A   B
0  A0  B0
1  A1  B1
2  A0  B4
3  A2  B2
4  A3  B3
5  A2  B5

I want to deduplicate this DataFrame on column 'A': keep only one row per value of 'A', and in that row join all the corresponding values from column 'B', separated by spaces. That is, I want my output to look like this:

Out[2]:
    A      B
0  A0  B0 B4
1  A1     B1
2  A2  B2 B5
3  A3     B3

The first thing that came to my mind was using DataFrame.duplicated(), but I couldn't figure out how to make it combine the 'B' values.
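For context on why `DataFrame.duplicated()` alone doesn't get there: it only returns a boolean mask that flags rows whose 'A' value has already appeared, so filtering with it keeps one row per key but throws away the other 'B' values. A minimal sketch using the sample data above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["A0", "A1", "A0", "A2", "A3", "A2"],
        "B": ["B0", "B1", "B4", "B2", "B3", "B5"],
    }
)

# duplicated() marks every row whose "A" value already occurred earlier
# (keep="first" leaves the first occurrence unmarked). It combines nothing.
mask = df.duplicated(subset="A", keep="first")
print(mask.tolist())  # [False, False, True, False, False, True]

# Dropping the flagged rows leaves one row per "A" value,
# but the "B" values of the dropped rows (B4 and B5) are lost.
deduped = df[~mask]
print(deduped["B"].tolist())  # ['B0', 'B1', 'B2', 'B3']
```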

Edited Jul 06 '21 at 20:47 by Laura Elvira Hernández Lara; asked Jul 06 '21 at 18:10 by kiran pradeep.


[groupby aggregate](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html) -> `df.groupby('A', as_index=False).agg({'B': ' '.join})` – Henry Ecker Jul 06 '21 at 18:11
Also [How to combine multiple rows into a single row with pandas](https://stackoverflow.com/q/36392735/15497888) – Henry Ecker Jul 06 '21 at 18:18

Yeah, that does answer my question, though I found your answer much simpler (I'm a novice). Thank you so much! – kiran pradeep Jul 06 '21 at 18:29
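A runnable sketch of the `groupby` + `agg` approach from the first comment, applied to the sample data from the question. `" ".join` is called once per group on that group's 'B' values, and `as_index=False` keeps 'A' as an ordinary column rather than the index:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["A0", "A1", "A0", "A2", "A3", "A2"],
        "B": ["B0", "B1", "B4", "B2", "B3", "B5"],
    }
)

# One output row per distinct "A" value; the "B" strings in each group
# are joined with a single space. Groups are sorted by "A" by default.
out = df.groupby("A", as_index=False).agg({"B": " ".join})

print(out["A"].tolist())  # ['A0', 'A1', 'A2', 'A3']
print(out["B"].tolist())  # ['B0 B4', 'B1', 'B2 B5', 'B3']
```

Passing `sort=False` to `groupby` keeps the groups in order of first appearance instead of sorted order, and a different separator (for example `", ".join`) works the same way. `str.join` only accepts strings, so if 'B' may hold numbers or NaN, convert it first (e.g. `df["B"].astype(str)`).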
