
I have this Python script that uses a regular expression. I want to split the string s on commas, while ignoring any commas that are inside the brackets.

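import re

s = """aa,bb,(cc,dd),m(ee,ff)"""
# flags must be passed by keyword: re.split's third positional argument is maxsplit
splits = re.split(r'\s*(\([^)]*\)|[^,]+)', s, flags=re.M | re.S)
print('\n'.join(splits))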
Actual output:
    aa
    ,
    bb
    ,
    (cc,dd)
    ,
    m(ee
    ,
    ff)
Desired output: 
    aa
    bb
    (cc,dd)
    m(ee,ff)

So the pattern breaks as soon as there is text in front of the brackets, as in m(ee,ff). I was hoping someone could help me out.
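A quick way to see where the original pattern goes wrong (a small diagnostic sketch, not part of the original question) is to run the same alternation through re.findall. At the m of m(ee,ff) the \( branch cannot start, so the [^,]+ branch takes over and runs up to the next comma, cutting the bracketed group in half. (cc,dd) only survives because its ( is the first character of the field.

```python
import re

s = """aa,bb,(cc,dd),m(ee,ff)"""

# Same alternation as in the question, minus the leading \s* and the
# capturing group, so findall returns just the matched pieces.
pieces = re.findall(r'\([^)]*\)|[^,]+', s)
print(pieces)  # ['aa', 'bb', '(cc,dd)', 'm(ee', 'ff)']
```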

h33

3 Answers


You may use this regex with a lookahead for split:

>>> import re
>>> s = """aa,bb,(cc,dd),m(ee,ff)"""
>>> print(re.split(r',(?![^()]*\))', s))
['aa', 'bb', '(cc,dd)', 'm(ee,ff)']


RegEx Details:

  • ,: Match a comma
  • (?![^()]*\)): A negative lookahead assertion that makes sure we don't match a comma inside (...) by asserting that the comma is not followed by zero or more non-bracket characters and then a ).
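The same split pattern can be written with re.VERBOSE so each part carries its own comment. This is a sketch; the helper name split_top_level is hypothetical. Because the lookahead only checks for a ) before the next bracket character, it assumes brackets are not nested; with nested brackets it splits inside the outer group, as the second call shows.

```python
import re

# Hypothetical helper; the pattern is the answer's ,(?![^()]*\)) in verbose form.
TOP_LEVEL_COMMA = re.compile(r"""
    ,                # a comma ...
    (?!              # ... that is NOT followed by
        [^()]*       #     zero or more non-bracket characters
        \)           #     and then a closing bracket
    )
""", re.VERBOSE)


def split_top_level(text):
    """Split text on commas outside (...), assuming brackets are not nested."""
    return TOP_LEVEL_COMMA.split(text)


print(split_top_level("aa,bb,(cc,dd),m(ee,ff)"))
# ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']

# Nested brackets break the assumption: the comma after b is followed by '(',
# not ')', so the lookahead lets it through.
print(split_top_level("a,f(b,(c,d)),e"))
# ['a', 'f(b', '(c,d))', 'e']
```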
anubhava

Consider using findall instead: repeat a group that matches either a ( followed by characters other than ( and then a ), or a single non-comma character:

import re

s = """aa,bb,m(cc,dd)"""
matches = re.findall(r'(?:\([^(]+\)|[^,])+', s, re.M|re.S)
print('\n'.join(matches))

If speed is an issue, you can make it a bit more efficient by adding ( to the other negated character set and trying that alternative first:

(?:[^(,]+|\([^(]+\))+
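Both findall patterns can be checked side by side in a short sketch (assuming the same non-nested input as the question). One difference from the split approach worth knowing: these patterns can never match an empty string, so an empty field such as the one in a,,b is dropped rather than returned as ''.

```python
import re

s = """aa,bb,(cc,dd),m(ee,ff)"""

# Original findall pattern: a bracketed group or a single non-comma character, repeated.
basic = re.findall(r'(?:\([^(]+\)|[^,])+', s)

# Faster variant: runs of characters that are neither '(' nor ',' are consumed
# in one step, and that alternative is tried first.
faster = re.findall(r'(?:[^(,]+|\([^(]+\))+', s)

print(basic)   # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']
print(faster)  # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']

# Empty fields: findall drops them, the lookahead split keeps them.
print(re.findall(r'(?:[^(,]+|\([^(]+\))+', 'a,,b'))  # ['a', 'b']
print(re.split(r',(?![^()]*\))', 'a,,b'))            # ['a', '', 'b']
```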
CertainPerformance
  • Check the number of steps it takes on https://regex101.com/r/UXdHRe/2 (272 steps) vs. my suggested split regex (146 steps). – anubhava Mar 12 '19 at 06:47

Try: r',([^,()][(][^()][)][^,])|([^,]+)'

Tested on regex101: https://regex101.com/r/pJxRwQ/1

shikai ng