Merge multiple column values into one column in python pandas

Question

I have a pandas data frame like this:

   Column1  Column2  Column3  Column4  Column5
 0    a        1        2        3        4
 1    a        3        4        5
 2    b        6        7        8
 3    c        7        7

What I want to do now is getting a new dataframe containing Column1 and a new columnA. This columnA should contain all values from columns 2 -(to) n (where n is the number of columns from Column2 to the end of the row) like this:

  Column1  ColumnA
0   a      1,2,3,4
1   a      3,4,5
2   b      6,7,8
3   c      7,7

How could I best approach this issue? Any advice would be helpful. Thanks in advance!

EdChum · Accepted Answer · 2019-07-26T19:09:38.707

133

You can call apply pass axis=1 to apply row-wise, then convert the dtype to str and join:

In [153]:
df['ColumnA'] = df[df.columns[1:]].apply(
    lambda x: ','.join(x.dropna().astype(str)),
    axis=1
)
df

Out[153]:
  Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7

Here I call dropna to get rid of the NaN, however we need to cast again to int so we don't end up with floats as str.

edited Jul 26 '19 at 19:09

answered Oct 13 '15 at 09:05

EdChum

339,461
188
752
538

For some reason this doesnt work for me. I get duplicates. Therefore row 0 columnA is 1,2,3,4,1,2,3,4 – Sade Feb 09 '21 at 14:57
It seems like using iloc works for me. Theres no duplicates. df['ColumnA'] = df.iloc[:,source_col_loc+1:source_col_loc+4].apply( lambda x: ",".join(x.astype(str)), axis=1) – Sade Feb 09 '21 at 15:08
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead – Kaustuv Mar 21 '21 at 14:02

score 15 · Answer 2 · edited Sep 12 '19 at 08:36

15

I propose to use .assign

df2 = df.assign(ColumnA = df.Column2.astype(str) + ', ' + \
  df.Column3.astype(str) + ', ' df.Column4.astype(str) + ', ' \
  df.Column4.astype(str) + ', ' df.Column5.astype(str))

it's simple, maybe long but it worked for me

edited Sep 12 '19 at 08:36

Derlin

9,003
2
25
47

answered Apr 12 '18 at 08:27

Amin Salgado

151
1
3

Also, if you are doing it for tonnes of data, it is much faster than lambda – Amin Salgado Apr 12 '18 at 08:30

Om Prakash · Answer 3 · 2018-12-14T06:45:28.960

If you have lot of columns say - 1000 columns in dataframe and you want to merge few columns based on particular column name e.g. -Column2 in question and arbitrary no. of columns after that column (e.g. here 3 columns after 'Column2 inclusive of Column2 as OP asked).

We can get position of column using .get_loc() - as answered here

source_col_loc = df.columns.get_loc('Column2') # column position starts from 0

df['ColumnA'] = df.iloc[:,source_col_loc+1:source_col_loc+4].apply(
    lambda x: ",".join(x.astype(str)), axis=1)

df

Column1  Column2  Column3  Column4  Column5  ColumnA
0       a        1        2        3        4  1,2,3,4
1       a        3        4        5      NaN    3,4,5
2       b        6        7        8      NaN    6,7,8
3       c        7        7      NaN      NaN      7,7

To remove NaN, use .dropna() or .fillna()

Hope it helps!

Merge multiple column values into one column in python pandas

3 Answers3

Linked

Related