
I'm trying to split an extremely long string by commas. I have two requirements, however:

  1. the comma cannot be followed by a space
  2. the comma cannot be followed by a '+' symbol

So, for example, given the input:

text = "hello,+how are you?,I am fine, thanks"

the desired output is:

['hello,+how are you?', 'I am fine, thanks']

i.e. the only comma that separated the values was the one not followed by a '+' or a space.

I have handled requirement 1 as follows:

import re
re.split(r',(?=[^\s]+)', text)

I cannot figure out how to add requirement 2.
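For reference, a minimal check of what the question's pattern currently produces on the sample input (assuming Python 3's standard re module). Because '+' is not whitespace, the lookahead [^\s] accepts it, so the comma in ",+" is still treated as a separator:

```python
import re

text = "hello,+how are you?,I am fine, thanks"

# The pattern from the question: split on a comma followed by non-whitespace.
# '+' is not whitespace, so the comma before '+' is still a split point.
print(re.split(r',(?=[^\s]+)', text))
# ['hello', '+how are you?', 'I am fine, thanks']
```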

Georgy
Callum Brown

3 Answers


The simplest solution is to describe only the pattern you don't want and exclude it, using a negative lookahead in the regular expression.

>>> import re
>>> text = "hello,+how are you?,I am fine, thanks"
>>> re.split(r',(?![+ ])', text)
['hello,+how are you?', 'I am fine, thanks']

This matches a comma unless it is followed by either a literal + or a space.
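Since the question mentions an extremely long string, one option (an assumption about the use case, not part of this answer) is to compile the same pattern once and yield the pieces lazily with re.finditer instead of building the whole list with re.split. The helper name iter_split is hypothetical. This only avoids materializing the list of pieces; the input string itself is still held in memory.

```python
import re
from typing import Iterator

# Same pattern as above: a comma not followed by '+' or a space.
SPLIT_COMMA = re.compile(r',(?![+ ])')

def iter_split(text: str) -> Iterator[str]:
    """Yield, one at a time, the pieces SPLIT_COMMA.split(text) would return."""
    start = 0
    for m in SPLIT_COMMA.finditer(text):
        yield text[start:m.start()]  # piece before this split comma
        start = m.end()              # skip past the comma itself
    yield text[start:]               # piece after the last split comma

text = "hello,+how are you?,I am fine, thanks"
print(list(iter_split(text)))  # ['hello,+how are you?', 'I am fine, thanks']
```

For a pattern without capture groups, like this one, the pieces match what re.split returns, including an empty final piece when the string ends with a split comma.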

Hampus Larsson

Try adding + to the excluded characters in your lookahead:

re.split(r',(?=[^\s +])',text)
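This positive lookahead and the negative lookahead (?![+ ]) are not quite equivalent. Here is a minimal comparison, assuming Python 3's re module. The positive form needs some character after the comma, so a trailing comma is never a split point, and \s also rules out tabs and newlines. The negative form splits on a trailing comma and only excludes a literal space. (Inside [^\s +] the space is redundant, since \s already matches it.)

```python
import re

# Positive lookahead: the comma must be followed by a character that is
# neither whitespace nor '+'; end of string has no such character.
positive = r',(?=[^\s+])'

# Negative lookahead: the comma must not be followed by '+' or a literal space;
# end of string (and a tab) both satisfy this.
negative = r',(?![+ ])'

print(re.split(positive, "a,b,"))   # ['a', 'b,']
print(re.split(negative, "a,b,"))   # ['a', 'b', '']
print(re.split(positive, "a,\tb"))  # ['a,\tb']
print(re.split(negative, "a,\tb"))  # ['a', '\tb']
```

Which behaviour is right depends on whether "a space" in the requirements means only ' ' or any whitespace, and on how trailing commas should be treated.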

I suggest you go with @HampusLarsson's answer, but I'd like to squeeze in an answer that doesn't use imported modules:

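s = "hello,+how are you?,I am fine, thanks"

# Positions of the split commas: commas not followed by ' ' or '+'.
# s[i+1:i+2] (instead of s[i+1]) avoids an IndexError on a trailing comma.
cuts = [i for i, v in enumerate(s)
        if v == ',' and s[i+1:i+2] not in (' ', '+')]

# Slice from just after one split comma to just before the next, so commas
# that belong to the text (e.g. a leading ",+") are kept intact.
parts = [s[i+1:j] for i, j in zip([-1] + cuts, cuts + [len(s)])]

print(parts)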

Output:

['hello,+how are you?', 'I am fine, thanks']
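To check a hand-rolled splitter against the regex answer, one can wrap the comma-index idea in a function and compare outputs on a few edge cases. This sketch slices from just after each split comma (s[i+1:j]) rather than stripping commas afterwards, so commas that are part of the text survive; the name split_without_re and the sample strings are hypothetical.

```python
import re

def split_without_re(s):
    # Indices of commas not followed by ' ' or '+' (slice-safe at end of string).
    cuts = [i for i, v in enumerate(s)
            if v == ',' and s[i+1:i+2] not in (' ', '+')]
    # Pieces lie between consecutive split commas.
    return [s[i+1:j] for i, j in zip([-1] + cuts, cuts + [len(s)])]

samples = [
    "hello,+how are you?,I am fine, thanks",
    "a,, b",   # first comma splits, second is followed by a space
    ",+a,b",   # leading ",+" must stay intact
    "a,b,",    # trailing comma
    "",
]
for s in samples:
    assert split_without_re(s) == re.split(r',(?![+ ])', s), s
print("all samples agree")
```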
Ann Zen