2

I'm working on a project that involves parsing pages of text. I've written the following function to remove certain punctuation from a word and convert it to lowercase:

def format_word(word):
    return word.replace('.', '').replace(',', '').replace('\"', '').lower()

Is there any way to combine all of the calls to .replace() into one? This looks rather ugly the way it is! The only way I can think of doing it is as follows:

def format_word(word):
    for punct in '.,\"':
        word.replace(punct, '')
    return word.lower()
falsetru
Ryan
  • unrelated: you don't need to escape `"` inside `'` string literals. – jfs Jan 06 '15 at 08:41
  • related: [Best way to strip punctuation from a string in Python](http://stackoverflow.com/q/265960/4279) – jfs Jan 06 '15 at 08:45
  • related: [Remove punctuation from Unicode formatted strings](http://stackoverflow.com/q/11066400/4279) – jfs Jan 06 '15 at 08:45

4 Answers

8

You can use str.translate if you want to remove characters:

In Python 2.x:

>>> 'Hello, "world".'.translate(None, ',."')
'Hello world'

In Python 3.x:

>>> 'Hello, "world".'.translate(dict.fromkeys(map(ord, ',."')))
'Hello world'
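
In Python 3 the deletion table can also be built with `str.maketrans`, whose third argument lists the characters to drop. A minimal sketch, assuming the same three punctuation characters as in the question; the name `_PUNCT_TABLE` is made up here for illustration:

```python
# Translation table that deletes '.', ',' and '"' (Python 3 only).
_PUNCT_TABLE = str.maketrans('', '', '.,"')

def format_word(word):
    # Strip the punctuation in a single pass, then lowercase the result.
    return word.translate(_PUNCT_TABLE).lower()

print(format_word('Hello, "World".'))  # hello world
```

Building the table once at module level avoids recreating it on every call, which may matter when parsing many pages of text.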
glglgl
falsetru
  • upvote. You could also emulate `.lower()` in the same `.translate()` call. – jfs Jan 06 '15 at 08:42
  • [code example that shows how to use `translate()` to emulate `lower()` for ascii data](http://ideone.com/VpW59X) – jfs Jan 06 '15 at 08:59
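
The idea in the comments above could look roughly like this in Python 3. This is only a sketch, not the linked example: the names `_TABLE` and `format_word_ascii` are hypothetical, and only ASCII letters are lowercased, so it is not a full replacement for `.lower()` on Unicode text.

```python
import string

# One table that maps A-Z to a-z and deletes '.', ',' and '"' (Python 3).
# Non-ASCII uppercase letters are NOT lowercased by this table.
_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase, '.,"')

def format_word_ascii(word):
    # A single translate() call both strips punctuation and lowercases ASCII.
    return word.translate(_TABLE)

print(format_word_ascii('Hello, "WORLD".'))  # hello world
```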
4

You can use the re module for that:

>>> import re
>>> def format_word(word):
...     return re.sub(r'[,."]', "", word)
...
>>> print(format_word('asdf.,"asdf'))
asdfasdf
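
If the function is called for every word on a page, the pattern can be compiled once and reused, with the lowercasing from the question folded in. A sketch under those assumptions; `_PUNCT_RE` is a name invented for this example:

```python
import re

# Compile once; the character class matches '.', ',' or '"'.
_PUNCT_RE = re.compile(r'[.,"]')

def format_word(word):
    # Remove every matched punctuation character, then lowercase.
    return _PUNCT_RE.sub('', word).lower()

print(format_word('Asdf.,"asdf'))  # asdfasdf
```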
nu11p01n73R
0

You are quite close. Strings are immutable, so .replace() returns a new string instead of modifying word in place. Assign the result back to word and you are done:

def format_word(word):
    for punct in '.,\"':
        word = word.replace(punct, '')
    return word.lower()
glglgl
0

You can do this using regular expressions:

import re
re.sub("[.,\"]", "", "\"wo,rd.")  # returns 'word'
bigblind