How to split a string in a column within a pandas dataframe?

Question

This is an example of the file I have:

Name     Att1     Att2     Att3
AB_EN    1        2        3
CD       5        6        7
FG_EN    7        8        9

So, in the column 'Name', where '_EN' is present, I want to remove the '_EN' part. The output should be as follows:

Name     Att1     Att2     Att3
AB       1        2        3
CD       5        6        7
FG       7        8        9

This is what I was trying:

name = df['Name']

for entry in name:
    if "_EN" in entry:
       entry = entry.split('_')[0]

However, this is not working. What is a good way to do this?

Possible duplicate of [Replacing few values in a pandas dataframe column with another value](https://stackoverflow.com/questions/27060098/replacing-few-values-in-a-pandas-dataframe-column-with-another-value) — PV8, Oct 10 '19 at 11:50
several duplicates are around: https://stackoverflow.com/questions/58303305/replacing-few-values-in-a-column-based-on-a-list-in-python — PV8, Oct 10 '19 at 11:50

score 1 · Accepted Answer · answered Oct 10 '19 at 11:39

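Use str.split to split each name on "_" and keep the first piece.

Example:

import pandas as pd

df = pd.DataFrame({"Name": ["AB_EN", "CD", "FG_EN"]})
# Split on "_" and keep the part before the first underscore
df['Name'] = df['Name'].str.split("_").str[0]
print(df)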

Output:

  Name
0   AB
1   CD
2   FG
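
One caveat worth sketching: splitting on "_" drops everything after the first underscore, not just the "_EN" suffix. If some names could contain other underscores, an anchored str.replace may be closer to what the question asks for. The extra row "X_Y_EN" below is a hypothetical value added only to show the difference; it is not in the original data.

```python
import pandas as pd

# Sample data from the question plus one hypothetical name ("X_Y_EN")
# that contains an extra underscore.
df = pd.DataFrame({"Name": ["AB_EN", "CD", "FG_EN", "X_Y_EN"]})

# Keeps only the text before the first underscore
by_split = df["Name"].str.split("_").str[0]

# Removes only a trailing "_EN"; regex=True makes "$" act as an end anchor
by_replace = df["Name"].str.replace(r"_EN$", "", regex=True)

print(by_split.tolist())    # ['AB', 'CD', 'FG', 'X']
print(by_replace.tolist())  # ['AB', 'CD', 'FG', 'X_Y']
```

For the data shown in the question both approaches give the same result.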

answered Oct 10 '19 at 11:39

Rakesh

score 0 · Answer 2 · answered Oct 10 '19 at 11:40

In your case, split on the underscore and keep the first part:

df['Name'] = (
    df['Name']
    .str.split('_')  # split on "_"
    .str[0]          # keep only the first part of the split
)

answered Oct 10 '19 at 11:40

Ivo Merchiers

score 0 · Answer 3 · answered Oct 10 '19 at 11:47

This should work for you:

df['Name'] = [name.split('_')[0] for name in df['Name']]

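The loop in the question only rebinds the local variable entry; it never writes anything back to the dataframe. Assigning the new values back to the column, as above, is what actually changes it.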
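
A minimal sketch of the difference, using the sample names from the question (the variable names unchanged and changed are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["AB_EN", "CD", "FG_EN"]})

# The loop from the question: `entry` is just a local name, so rebinding it
# does not touch the values stored in the column.
for entry in df["Name"]:
    if "_EN" in entry:
        entry = entry.split("_")[0]
unchanged = df["Name"].tolist()

# Assigning the new values back to the column updates the dataframe.
df["Name"] = [name.split("_")[0] for name in df["Name"]]
changed = df["Name"].tolist()

print(unchanged)  # ['AB_EN', 'CD', 'FG_EN']
print(changed)    # ['AB', 'CD', 'FG']
```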

answered Oct 10 '19 at 11:47

Harsh Patel
