I found this thread: Best way to strip punctuation from a string in Python

But I was hoping to find a way to do this without stripping out the periods in links. So if the string is

I love using stackoverflow.com on Fridays, Saturdays and Mondays!

It would return

I love using stackoverflow.com on Fridays Saturdays and Mondays

In fact, ideally I would be able to pass in a list of common link endings such as .com, .net, .ly, etc.

Community
JiminyCricket

3 Answers

You can use negative look-aheads:

[,!?]|\.(?!(com|org|ly))
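
As a rough sketch of how this pattern could be applied in Python, and how the list of link endings from the question could be passed in: the function name strip_punctuation and the endings parameter are made up for illustration, and the \b word boundary is an addition to the pattern above so that something like ".company" is not mistaken for ".com".

```python
import re

def strip_punctuation(text, endings=("com", "org", "ly")):
    # Hypothetical helper: build the alternation from the caller's list.
    # re.escape guards against regex metacharacters in an ending, and
    # lstrip(".") lets callers pass either "com" or ".com".
    alternation = "|".join(re.escape(e.lstrip(".")) for e in endings)
    # Remove , ! ? everywhere, and remove a period only when it is NOT
    # followed by one of the endings as a whole word.
    pattern = re.compile(r"[,!?]|\.(?!(?:%s)\b)" % alternation)
    return pattern.sub("", text)

print(strip_punctuation(
    "I love using stackoverflow.com on Fridays, Saturdays and Mondays!"
))
# I love using stackoverflow.com on Fridays Saturdays and Mondays
```

Other punctuation such as ; or : would need to be added to the character class if it should be stripped as well.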
Jacob

Convention suggests that you put a space after . , ! and similar characters. If you can count on correct typing, you can create a regex which strips these characters only if they are followed by whitespace. (Or at least do this for the full stop character.)

The following regex will identify these:

[.,!?-](\s|$)
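
One thing to watch when substituting with this pattern is that the group also consumes the whitespace, so deleting the match would glue neighbouring words together. A minimal sketch of one way around that, assuming a lookahead is acceptable in place of the capturing group (the names TRAILING_PUNCT and strip_trailing_punctuation are made up for illustration):

```python
import re

# Match punctuation only when whitespace or end-of-string follows.
# The lookahead (?=...) checks for the whitespace without consuming it,
# so the words stay separated after the substitution.
TRAILING_PUNCT = re.compile(r"[.,!?-](?=\s|$)")

def strip_trailing_punctuation(text):
    # Hypothetical helper name, not from the answer above.
    return TRAILING_PUNCT.sub("", text)

print(strip_trailing_punctuation(
    "I love using stackoverflow.com on Fridays, Saturdays and Mondays!"
))
# I love using stackoverflow.com on Fridays Saturdays and Mondays
```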

Another possibility is to use a list of legal TLD names, prefixes like www., or other patterns like @, and keep the original punctuation around them.
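
That second idea could look roughly like the sketch below, which works token by token. The list of endings and the helper names are assumptions made for illustration, not a complete TLD list, and the test for what counts as a link is deliberately crude.

```python
PUNCTUATION = ",.!?;:"
# Hypothetical, deliberately short list; extend it as needed.
LINK_ENDINGS = (".com", ".net", ".org", ".ly")

def looks_like_link(token):
    # Ignore punctuation stuck to the edges before testing the token.
    core = token.strip(PUNCTUATION).lower()
    return core.startswith("www.") or "@" in core or core.endswith(LINK_ENDINGS)

def strip_outside_links(text):
    remove_all = str.maketrans("", "", PUNCTUATION)
    cleaned = []
    for token in text.split():
        if looks_like_link(token):
            # Links and addresses: trim only the edges, keep inner periods.
            cleaned.append(token.strip(PUNCTUATION))
        else:
            # Ordinary words: drop every punctuation character.
            cleaned.append(token.translate(remove_all))
    return " ".join(cleaned)

print(strip_outside_links("Mail me@example.org, or see www.python.org; e.g. now!"))
# Mail me@example.org or see www.python.org eg now
```

Note that splitting and re-joining collapses runs of whitespace into single spaces.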

vbence

How about this (which is pretty much what Felix Kling already suggested):

original = 'I love using stackoverflow.com on Fridays, Saturdays and Mondays!'
unwanted_chars = ',.!?;:'

# Split on whitespace, then trim punctuation from both ends of each word only,
# so the period inside "stackoverflow.com" is left alone.
bits = original.split()
cleaned_up = ' '.join(bit.strip(unwanted_chars) for bit in bits)
print(cleaned_up)
# I love using stackoverflow.com on Fridays Saturdays and Mondays

Edit: 'cleaned_up' then holds the string with the punctuation removed.

martineau
HumanCatfood