Pandas split column into multiple columns by comma

Question

I am trying to split a column into multiple columns based on comma/space separation.

My dataframe currently looks like

     KEYS                                                  1
0   FIT-4270                                          4000.0439
1   FIT-4269                                          4000.0420, 4000.0471
2   FIT-4268                                          4000.0419
3   FIT-4266                                          4000.0499
4   FIT-4265                                          4000.0490, 4000.0499, 4000.0500, 4000.0504,

I would like

   KEYS                                                  1           2            3        4 
0   FIT-4270                                          4000.0439
1   FIT-4269                                          4000.0420  4000.0471
2   FIT-4268                                          4000.0419
3   FIT-4266                                          4000.0499
4   FIT-4265                                          4000.0490  4000.0499  4000.0500  4000.0504

My code currently removes The KEYS column and I'm not sure why. Could anyone improve or help fix the issue?

v = dfcleancsv[1]

#splits the columns by spaces into new columns but removes KEYS?

dfcleancsv = dfcleancsv[1].str.split(' ').apply(Series, 1)

You want `pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)` IIUC — EdChum, Jun 02 '16 at 20:11
I know you are a pominent figure with lots of reputation on here, but if you make it an official answer ill accept. It worked great — Anekdotin, Jun 02 '16 at 22:12
@Eddwinn EdChum does this. Make sure you up-vote his comment at least. — piRSquared, Jun 02 '16 at 23:13
@Eddwinn Also, post his answer as your own giving EdChum the credit. You won't get rep for the answer when you select it as such in 2 days, which is good. If others up vote the answer after that c'est la vie, EdChum knew what he was doing. — piRSquared, Jun 02 '16 at 23:19
I didn't post an answer as it was late and I couldn't confirm whether this is what you wanted, also sometimes I'm pretty busy — EdChum, Jun 03 '16 at 08:19

Anthony R · Accepted Answer · 2019-12-04T18:02:02.877

62

In case someone else wants to split a single column (deliminated by a value) into multiple columns - try this:

series.str.split(',', expand=True)

This answered the question I came here looking for.

Credit to EdChum's code that includes adding the split columns back to the dataframe.

pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)

Note: The first argument df[[0]] is DataFrame.

The second argument df[1].str.split is the series that you want to split.

split Documentation

concat Documentation

edited Dec 04 '19 at 18:02

answered Oct 20 '17 at 19:50

Anthony R

2,359
1
11
10

While splitting, I got 350 columns, many of them are blank, is there any dynamic way of handling the split? – SUSHMA KUMARI Oct 14 '20 at 10:26

score 12 · Answer 2 · answered Jun 02 '16 at 23:33

12

Using Edchums answer of

pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)

I was able to solve it by substituting my variables.

dfcleancsv = pd.concat([dfcleancsv['KEYS'], dfcleancsv[1].str.split(', ', expand=True)], axis=1)

answered Jun 02 '16 at 23:33

Anekdotin

1,386
3
17
39

Paul Rougieux · Answer 3 · 2021-04-16T18:16:40.303

The OP had a variable number of output columns. In the particular case of a fixed number of output columns another elegant solution to name the resulting columns is to use a multiple assignation.

Load a sample dataset and reshape it to long format to obtain a variable called organ_dimension.

import seaborn
iris = seaborn.load_dataset('iris')
df = iris.melt(id_vars='species', var_name='organ_dimension', value_name='value')

Split the organ_dimension variable in 2 variables organ and dimension based on the _ separator.

df[['organ', 'dimension']] = df['organ_dimension'].str.split('_', expand=True)
df.head()

Out[10]: 
  species organ_dimension  value  organ dimension
0  setosa    sepal_length    5.1  sepal    length
1  setosa    sepal_length    4.9  sepal    length
2  setosa    sepal_length    4.7  sepal    length
3  setosa    sepal_length    4.6  sepal    length
4  setosa    sepal_length    5.0  sepal    length

Based on this answer "How to split a column into two columns?"

In case warning "FutureWarning: Columnar iteration over characters will be deprecated in future releases.. Use df[['organ', 'dimension']] = df['organ_dimension'].str.split('_', expand=True) — Mark K, Feb 02 '21 at 08:53
@MarkK thanks I updated the answer to use the assignation you suggested. — Paul Rougieux, Apr 16 '21 at 18:17

Haeyoon J. · Answer 4 · 2022-05-01T21:28:41.637

4

The simplest way to use is, vectorization

df = df.apply(lambda x:pd.Series(x))

edited May 01 '22 at 21:28

answered Jan 22 '20 at 15:11

Haeyoon J.

112
1
8

score 2 · Answer 5 · answered Jun 05 '16 at 02:42

2

maybe this should work:

df = pd.concat([df['KEYS'],df[1].apply(pd.Series)],axis=1)

answered Jun 05 '16 at 02:42

Siraj S.

3,051
3
27
42

score 0 · Answer 6 · edited May 30 '20 at 19:50

Check this out

Responder_id    LanguagesWorkedWith
0   1   HTML/CSS;Java;JavaScript;Python
1   2   C++;HTML/CSS;Python
2   3   HTML/CSS
3   4   C;C++;C#;Python;SQL
4   5   C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA
... ... ...
87564   88182   HTML/CSS;Java;JavaScript
87565   88212   HTML/CSS;JavaScript;Python
87566   88282   Bash/Shell/PowerShell;Go;HTML/CSS;JavaScript;W...
87567   88377   HTML/CSS;JavaScript;Other(s):
87568   88863   Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...`
###Split the LanguagesWorkedWith column into  multiple columns  by using` data= data1['LanguagesWorkedWith'].str.split(';').apply(pd.Series)`.###
` data1 = pd.read_csv('data.csv', sep=',')
data1.set_index('Responder_id',inplace=True)
data1
data1.loc[1,:]
data= data1['LanguagesWorkedWith'].str.split(';').apply(pd.Series)
data.head()`

score -1 · Answer 7 · answered Jun 15 '21 at 05:24

You may also want to try datar, a package ports dplyr, tidyr and related R packages to python:

>>> df
         i       j              A
  <object> <int64>       <object>
0       AR       5    Paris,Green
1      For       3  Moscow,Yellow
2      For       4  NewYork,Black
>>> from datar import f
>>> from datar.tidyr import separate
>>> separate(df, f.A, ['City', 'Color'])
         i       j     City    Color
  <object> <int64> <object> <object>
0       AR       5    Paris    Green
1      For       3   Moscow   Yellow
2      For       4  NewYork    Black

Pandas split column into multiple columns by comma

7 Answers7

Linked

Related