2

I have strings similar to

text='Studied b-tech from college in 2010-13'

Using

text.replace('-', ' ')

will produce

Studied b tech from college in 2010 13

But what I want is:

Studied b tech from college in 2010-13

I have prepared the pattern below for matching tokens like 2010-13, but how do I use it in my code?

regex_pattern = r'(\d{4}-\d{2,4})'
TT--
user3560077

6 Answers

1

I think what you are looking for is:

>>> import re
>>> text = "Studied b-tech from college in 2010-13"

>>> re.sub(r"-([a-zA-Z]+)", r" \1", text)
'Studied b tech from college in 2010-13'

[a-zA-Z] does not match a digit, so a hyphen followed by a number (as in 2010-13) is left alone. You can find more about re.sub in the documentation for Python's re module.

Ozgur Vatansever
  • 1
    This is the correct answer. I thought `.replace()` would work but with the conditional, it gets too crazy. – Jeremy Jun 02 '17 at 16:42
1

You have to describe the two possibilities for your hyphen using negative lookarounds:

  • not preceded by four digits: (?<!\b[0-9]{4})
  • not followed by two or four digits: (?![0-9]{2}(?:[0-9]{2})?\b)

( "not preceded by A or not followed by B" is the negation of "preceded by A and followed by B" )

example:

import re

text = 'Studied b-tech from college in 2010-13'

result = re.sub(r'-(?:(?<!\b[0-9]{4}-)|(?![0-9]{2}(?:[0-9]{2})?\b))', ' ', text)


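( writing - (?: (?<! ... - ) | (?! ... ) ) is more efficient than (?<! ... )-|-(?! ... ) because the regex engine only tests the lookarounds at positions where a hyphen has already been found; that's why the hyphen is repeated inside the lookbehind )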
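As a rough sketch of how the pattern above behaves on a few inputs, here is the same regex written with re.VERBOSE so each lookaround can carry a comment. The names NON_DATE_HYPHEN and replace_non_date_hyphens are made up for this illustration, and the second and third sample strings are taken from the comments elsewhere in this thread rather than from the question.

```python
import re

# Hypothetical name; the pattern itself is the one from the answer above,
# only spread over several lines with re.VERBOSE.
NON_DATE_HYPHEN = re.compile(
    r"""
    -                                   # the hyphen itself
    (?:
        (?<!\b[0-9]{4}-)                # not preceded by a four-digit token
      |
        (?![0-9]{2}(?:[0-9]{2})?\b)     # or not followed by two or four digits
    )
    """,
    re.VERBOSE,
)


def replace_non_date_hyphens(text, replacement=" "):
    """Replace every hyphen except those inside tokens like 2010-13."""
    return NON_DATE_HYPHEN.sub(replacement, text)


samples = [
    "Studied b-tech from college in 2010-13",
    "Studied in 2010-13 b-tech from college",
    "Studied b-tech from college in 2010-13 at B-college",
]
results = [replace_non_date_hyphens(s) for s in samples]
for line in results:
    print(line)
```

Unlike a counted str.replace, this does not depend on where the date appears in the string.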

Casimir et Hippolyte
0

There is a third, optional argument to replace, count, that limits how many occurrences are replaced, counting from the left. With a count of 1, only the first hyphen is replaced:

text.replace('-',' ', 1) 
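A small sketch of what the counted replace does, and where it stops working. The second string is the reordered example raised in the comments on this answer, not part of the original question; the variable names are just for illustration.

```python
text = 'Studied b-tech from college in 2010-13'
# Only the first hyphen (in "b-tech") is replaced; the date is untouched.
first_result = text.replace('-', ' ', 1)
print(first_result)   # Studied b tech from college in 2010-13

# If the date comes first, the same call replaces the wrong hyphen.
reordered = 'Studied in 2010-13 b-tech from college'
second_result = reordered.replace('-', ' ', 1)
print(second_result)  # Studied in 2010 13 b-tech from college
```

So this only fits inputs where the hyphens to replace always come before the date and their number is known in advance.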
etemple1
  • Would this still work if the string was: `text='Studied in 2010-13 b-tech from college'` – Jeremy Jun 02 '17 at 16:32
  • I assume you mean `text='Studied b-tech from college in 2010-13 at B-college'` ? If so, no it will not still work. You've changed your requirements, please update your original question. – etemple1 Jun 02 '17 at 16:36
  • This is not my question :) I'm just thinking that if the OP wants a way to remove ALL instances of the hyphen that are not for dates, there should be a better way than replacing the first instance. – Jeremy Jun 02 '17 at 16:37
  • 1
    My apologies, I didn't initially notice you were not the original poster. That is correct, the `replace` above is only for the first instance and assumes the date is not first. He would need a regex for more instances but ignore the date. OP did not specify if the order would change. – etemple1 Jun 02 '17 at 16:39
  • 2
    No problem :) I think @Ozgur has the perfect answer here. – Jeremy Jun 02 '17 at 16:41
0

Python's str.replace takes an optional count argument: the maximum number of occurrences to replace.

If you want to replace just the first one, use text.replace('-', ' ', 1)

Pythonista
0

I would use Python's .replace() over the regex here.

Something like:

str.replace(old, new[, count])

where count is the maximum number of occurrences you want to replace. If you only want to replace hyphens in non-numeric strings, though, I would adapt the approach from this question: How do I check if a string is a number (float) in Python?, changing it to check whether the characters next to the hyphen are digits.
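One possible reading of that last suggestion, without a regex: walk the string and keep a hyphen only when the characters on both sides are digits. This is a sketch under that assumption; the function name replace_non_numeric_hyphens is invented here, and the rule is looser than the four-digit date pattern in the question (it would also keep the hyphen in something like 3-4).

```python
def replace_non_numeric_hyphens(text, replacement=' '):
    """Replace each hyphen unless it sits directly between two digits."""
    pieces = []
    for i, ch in enumerate(text):
        if ch == '-':
            prev_is_digit = i > 0 and text[i - 1].isdigit()
            next_is_digit = i + 1 < len(text) and text[i + 1].isdigit()
            if prev_is_digit and next_is_digit:
                pieces.append(ch)           # looks like 2010-13: keep it
            else:
                pieces.append(replacement)  # ordinary hyphen: replace it
        else:
            pieces.append(ch)
    return ''.join(pieces)


cleaned = replace_non_numeric_hyphens('Studied b-tech from college in 2010-13')
print(cleaned)  # Studied b tech from college in 2010-13
```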

Jeremy
0

You just need to match the anti-pattern

regex: (\d{0,3}(?:\D|^)\d{0,3})-(\d?(?:\D|$)\d?)
replace: $1 $2
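In Python the group references in the replacement are written \1 and \2 rather than $1 and $2. A minimal sketch of the same regex with re.sub follows; the variable name anti_pattern is only for illustration.

```python
import re

# Same regex as above; group 1 is the text just before the hyphen,
# group 2 the text just after it.
anti_pattern = re.compile(r'(\d{0,3}(?:\D|^)\d{0,3})-(\d?(?:\D|$)\d?)')

text = 'Studied b-tech from college in 2010-13'
result = anti_pattern.sub(r'\1 \2', text)
print(result)  # Studied b tech from college in 2010-13

# The match consumes the characters on both sides of the hyphen, so
# back-to-back hyphens are not all caught in a single pass.
chained = anti_pattern.sub(r'\1 \2', 'a-b-c')
print(chained)  # a b-c
```

If chained hyphens can occur in the input, one of the lookaround-based patterns in the other answers may be a better fit, since lookarounds do not consume the neighbouring characters.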

Tezra