
Given strings like:

'Foo他有一支 20 老枪 16。'
'Bar他有一支 20老枪 16。'
'Baz他有一支20 老枪 16。'

How can I use re.sub to remove the numbers, and any spaces surrounding them, so that I get:

'Foo他有一支老枪。'
'Bar他有一支老枪。'
'Baz他有一支老枪。'

I would like to keep the Chinese and English text.

Matthew Moisen
  • `re.sub(r'\s*\d+\s*', '', text)` – cs95 Dec 14 '18 at 03:42
  • @coldspeed Great! – Matthew Moisen Dec 14 '18 at 03:44
  • ^ No, find a better one please. – cs95 Dec 14 '18 at 03:56
  • Sorry, I don't have a duplicate for this particular regex. The same goes for regex `a*b*c*`, for regex `12345`, and so on. Should we create new questions for all possible regexes? –  Dec 14 '18 at 03:58
  • @dyukha I don't know, but maybe try not to... downvote every single answer to a regex question? I'm getting sick of answering this tag, every other answer is downvoted by users on a crusade to close every single question as duplicate, even if it's not. – cs95 Dec 14 '18 at 04:01
  • @dyukha Even regex questions on the pandas tag are closed as duplicates of non-pandas questions, it's quite frankly irritating. Maybe invest a little time into making a few canonical regex questions (not like [this one](https://stackoverflow.com/a/2759417/4909087), it doesn't count) to a few common problems that you can use to close questions. That would be a better use of time than downvoting. – cs95 Dec 14 '18 at 04:06
  • @dyukha Even though this is not a duplicate of the question posted, I've closed it. Just because it is easier than getting into these petty arguments. No offense directed at you in particular, just fed up with the mentality here. Respect, boundaries, and civility. We are all unpaid volunteers on this site, is it too much to ask for a little mutual respect? – cs95 Dec 14 '18 at 04:09
  • @coldspeed, I don't understand any of your argument. What downvoting? What's wrong with the mentality? This question shows zero research, which makes it unacceptable, and I don't understand why it doesn't have a negative score (again, what downvoting are you talking about?). What's the purpose of creating a canonical question? You think that if a person didn't google it, he will try to find the answer here? –  Dec 14 '18 at 04:14
  • @dyukha Please understand that many users do try things that don't work out before posting here; they just don't believe their broken solutions are worth posting. Of course, there are lazy ones who don't try, but if you've been around here as long as I have, it becomes easier to tell the difference: the user's rep and the tag are considerations. Pandas is quite similar. Now, I don't have a problem with you downvoting questions for whatever reason, but I cannot support downvoting good answers just because you don't like the question. – cs95 Dec 14 '18 at 04:18
  • Feel free to mark as duplicate, and ping the user who answered. I am one of few people who actually delete their answers if the question is identified as duplicate as I don't believe in fragmenting information. You can ask Wiktor Stribizew (who also enjoys downvoting other answers) that this is what I do, because I have had this argument with him before. – cs95 Dec 14 '18 at 04:19

1 Answer

You can use a regex pattern that matches a run of digits together with any whitespace on either side of it: r'\s*\d+\s*'.

Code example:

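import re
text = 'Foo他有一支 20 老枪 16。'
# Replace every run of digits, plus any whitespace around it, with nothing.
text_clean = re.sub(r'\s*\d+\s*', '', text)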

See the regex101 demo (in particular, the "Substitution" panel).
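As a quick check, here is a minimal sketch that runs the same pattern over the three strings from the question. The names `NUMBER_WITH_SPACES` and `strip_numbers` are made up for this illustration and are not part of the `re` module.

```python
import re

# A run of digits plus any whitespace on either side of it.
# \s matches spaces, tabs and newlines, not only the space character.
NUMBER_WITH_SPACES = re.compile(r'\s*\d+\s*')


def strip_numbers(text):
    """Return text with each number and its surrounding whitespace removed."""
    return NUMBER_WITH_SPACES.sub('', text)


samples = [
    'Foo他有一支 20 老枪 16。',
    'Bar他有一支 20老枪 16。',
    'Baz他有一支20 老枪 16。',
]

for sample in samples:
    print(strip_numbers(sample))
# Foo他有一支老枪。
# Bar他有一支老枪。
# Baz他有一支老枪。
```

One thing to keep in mind: because the whitespace on both sides is dropped, a number sitting between two space-separated English words will glue them together ('a 20 b' becomes 'ab'). That is what the question asks for with Chinese text, but for space-delimited text you may prefer to substitute a single space instead of the empty string.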

cs95