4

Currently I am using the following code to make replacements which is a little cumbersome:

df1['CompanyA'] = df1['CompanyA'].str.replace('.','')
df1['CompanyA'] = df1['CompanyA'].str.replace('-','')
df1['CompanyA'] = df1['CompanyA'].str.replace(',','')
df1['CompanyA'] = df1['CompanyA'].str.replace('ltd','limited')
df1['CompanyA'] = df1['CompanyA'].str.replace('&','and')
df1['Address1A'] = df1['Address1A'].str.replace('.','')
df1['Address1A'] = df1['Address1A'].str.replace('-','')
df1['Address1A'] = df1['Address1A'].str.replace('&','and')
df1['Address1A'].str.replace(r'\brd\b', 'road')
df1['Address2A'] = df1['Address2A'].str.replace('.','')
df1['Address2A'] = df1['Address2A'].str.replace('-','')
df1['Address2A'] = df1['Address2A'].str.replace('&','and')
df1['Address2A'].str.replace(r'\brd\b', 'road')

In order to make changing on the fly easier my ideal scenario would be something like:

df1['CompanyA'] = df1['CompanyA'].str.replace(('&','and'), ('.', ''), ('-','')....)
df1['Address1A'] = df1['Address1A'].str.replace(('&','and'), ('.', ''), ('-','')....)
df1['Address2A'] = df1['Address2A'].str.replace(('&','and'), ('.', ''), ('-','')....)

This is so I could just input/change what I wanted to replace for a particular column without having to adjust multiple lines of code.

Is this possible at all?

Georgy
  • 9,972
  • 7
  • 57
  • 66
Manesh Halai
  • 121
  • 1
  • 2
  • 9

5 Answers5

16

You can create a dictionary and pass it to the function replace() without needing to chain or name the function so many times.

replacers = {',':'','.':'','-':'','ltd':'limited'} #etc....
df1['CompanyA'] = df1['CompanyA'].replace(replacers)
Celius Stingher
  • 14,458
  • 5
  • 18
  • 47
  • 2
    I like this neat answer, also there is nothing wrong with a for loop, more readable and friendly for python rookies – E.Serra Jun 17 '20 at 13:11
  • 3
    Yes it might be more readable and friendly for rookies, however I don't think think passing a simple dictionary is too complex. I'd say dictionaries should be learnt before for loops and pandas together, but that's an opinion of course, thanks for your comment. – Celius Stingher Jun 17 '20 at 13:14
  • 1
    yes, but knowing that you can pass a dictionary to replace has nothing to do with pandas or dictionaries themselves, it is just some internal magic, that was my point, it looks really clean though – E.Serra Jun 17 '20 at 13:16
  • 1
    Amazing! This is exactly the sort of funtionality I was after. – Manesh Halai Jun 17 '20 at 13:27
  • 6
    Can you pass a dictionary to `replace()`? I'm confused here. With Python 3.8.5, evaluating `"abcdefghi".replace({'b':'B', 'g':'G'})` gives `TypeError: replace expected at least 2 arguments, got 1`. Am I missing something? – bitinerant Apr 06 '21 at 18:26
  • 1
    This is pandas replace we are using, not python's built-in replace :) – Celius Stingher Apr 06 '21 at 20:25
2

you could chain the replacings:

df1['CompanyA'] = df1['CompanyA'].str.replace('.','').replace('-','').replace(',','').replace('ltd','limited').replace('&','and')
...
bigbear3001
  • 480
  • 6
  • 18
1

Replace function accepts values as dictionaries as well. You can do something like this:

df1.replace({'CompanyA' : { '&' : 'and', '.': '' , '-': ''}},regex=True)
LazyCoder
  • 1,409
  • 8
  • 23
1

You can use a dictionary to map the characters for each column:

to_replace = {'.': '',
              ',': '',
              'foo': 'bar'
             }

for k, v in to_replace.items():
    df1['CompanyA'] = df1['CompanyA'].str.replace(k, v)
PApostol
  • 1,644
  • 2
  • 8
  • 15
1

most likely you use pd.Dataframe so i suggest to make universal remover

def remover(row, replaces):
    for k,v in replacers.items():
        if k in row:
            row = row.replace(k, v)
    return row      


replacers = {',' : "",
         '.':'',
         '-':'',
         'ltd':'limited'
        }

for column in df.columns:
    df[column] = df[column].apply(lambda row: remover(row, replacers))

or you can specify specific column names to modify

Igor Tischenko
  • 560
  • 7
  • 19