Remove punctuation in pandas

Question

Code:

df['review'].head()

Output:

index   review
0       These flannel wipes are OK, but in my opinion

I want to remove punctuation from the review column of the DataFrame and store the result in a new column.

Code:

import string

def remove_punctuations(text):
    return text.translate(None, string.punctuation)

df["new_column"] = df['review'].apply(remove_punctuations)

Error:
  return text.translate(None,string.punctuation)
  AttributeError: 'float' object has no attribute 'translate'

I am using Python 2.7. Any suggestions would be helpful.
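A likely cause, inferred from the traceback rather than stated in the question: missing reviews are stored as NaN, and NaN is a Python float, so apply passes a float to remove_punctuations. Separately, text.translate(None, string.punctuation) is the Python 2 signature of str.translate; Python 3 takes a single table built with str.maketrans. A minimal sketch for Python 3, assuming the column mixes strings and NaN (the sample DataFrame and the PUNCT_TABLE name are made up for illustration):

```python
import string

import numpy as np
import pandas as pd

# Hypothetical sample data: one review plus a missing value
df = pd.DataFrame({"review": ["These flannel wipes are OK, but in my opinion", np.nan]})

# The missing value is a float, which is why .translate fails on it
print(type(df["review"].iloc[1]))  # <class 'float'>

# Python 3: a translation table that deletes every ASCII punctuation character
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_punctuations(text):
    # Pass missing values (and any other non-strings) through unchanged
    if not isinstance(text, str):
        return text
    return text.translate(PUNCT_TABLE)

df["new_column"] = df["review"].apply(remove_punctuations)
print(df["new_column"].tolist())
```

Building the table once outside the function avoids recreating it for every row.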

You want to have a new column with the same string values but without the punctuation? Why? – Joe T. Boka, Sep 30 '16 at 02:30

score 58 · Accepted Answer · edited Sep 30 '16 at 06:52


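Using pandas str.replace and a regex (the str methods leave NaN values as NaN, so missing reviews no longer raise an error):

df["new_column"] = df['review'].str.replace(r'[^\w\s]', '', regex=True)

On pandas 2.0 and later, regex=True is required, because str.replace now treats the pattern as a literal string by default.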
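A small check of what the pattern does, using a made-up Series: [^\w\s] deletes every character that is neither a word character (letter, digit or underscore) nor whitespace, so spaces survive, digits and underscores are kept, and NaN entries come back as NaN instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, including a missing review
reviews = pd.Series([
    "These flannel wipes are OK, but in my opinion",
    "5 stars!!! (would_buy again)",
    np.nan,
])

# regex=True is needed on pandas >= 2.0, where literal matching became the default
cleaned = reviews.str.replace(r"[^\w\s]", "", regex=True)
print(cleaned.tolist())
```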

answered Sep 30 '16 at 02:59 by Bob Haffner · edited Sep 30 '16 at 06:52 by nalzok

@Bob Haffner, thank you for this, but how would I preserve the spaces that previously existed? – bernando_vialli Jun 12 '19 at 15:16
Hi @bob-haffner, I want to remove only the dot `.`, and only when it follows the letter `c` or `p`. How can I do that? – Roy Mar 10 '22 at 23:57

score 26 · Answer 2 · answered Aug 09 '17 at 20:45


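You can build a regex from the string module's punctuation list. Pass it through re.escape first; otherwise the `\]` sequence inside string.punctuation is read as an escaped bracket, and backslashes are never removed:

import re
import string

df['review'].str.replace('[{}]'.format(re.escape(string.punctuation)), '', regex=True)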
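To see why escaping matters, here is a comparison of the unescaped and escaped character classes on a made-up Series (Python 3). In the unescaped version, the backslash in string.punctuation escapes the following `]`, so a literal backslash is not part of the class.

```python
import re
import string

import pandas as pd

# Hypothetical sample values: a backslash, a bracket/caret, and ordinary punctuation
s = pd.Series(["a\\b", "x]y^z", "OK, fine."])

naive = "[{}]".format(string.punctuation)
escaped = "[{}]".format(re.escape(string.punctuation))

naive_result = s.str.replace(naive, "", regex=True).tolist()
escaped_result = s.str.replace(escaped, "", regex=True).tolist()
print(naive_result)    # the backslash in "a\\b" survives
print(escaped_result)  # every ASCII punctuation character is removed
```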

answered Aug 09 '17 at 20:45 by David C

score 12 · Answer 3 · answered Sep 30 '16 at 02:23

I solved the problem by looping over string.punctuation and removing each character in turn:

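import string

def remove_punctuations(text):
    if isinstance(text, float):  # NaN (a missing review) is a float; leave it as-is
        return text
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    return text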

You can call the function the same way you did:

df["new_column"] = df['review'].apply(remove_punctuations)
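The loop and the [^\w\s] regex from the accepted answer do not remove exactly the same characters: string.punctuation holds only the 32 ASCII punctuation characters and includes `_`, while [^\w\s] on a Python 3 string also removes non-ASCII punctuation such as the curly apostrophe `’` but keeps `_`. A quick comparison on a made-up string (Python 3):

```python
import string

import pandas as pd

s = pd.Series(["don\u2019t_stop!"])  # contains a curly apostrophe (U+2019)

def remove_punctuations(text):
    # Loop approach: strip each ASCII punctuation character in turn
    for punctuation in string.punctuation:
        text = text.replace(punctuation, "")
    return text

loop_result = s.apply(remove_punctuations).iloc[0]
regex_result = s.str.replace(r"[^\w\s]", "", regex=True).iloc[0]
print(loop_result)   # don’tstop
print(regex_result)  # dont_stop
```

Which behaviour is right depends on whether the reviews contain non-ASCII text and whether underscores should be kept.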
