Does * have a side effect in Python regular expression matching?

Question

I'm learning Python's regular expressions. The following works as I expected:

>>> import re
>>> re.split('\s+|:', 'find   a str:s2')
['find', 'a', 'str', 's2']

But when I change + to *, the output looks weird to me:

>>> re.split('\s*|:', 'find  a str:s2')
['find', 'a', 'str:s2']

How is such a pattern interpreted in Python?

Also see [Reference - What does this regex mean?](http://stackoverflow.com/q/22937618) — Martijn Pieters, Jun 24 '14 at 14:54

Martijn Pieters · Accepted Answer · 2014-06-24T15:02:25.223

9

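The 'side effect' you are seeing is that re.split() only splits on matches that are longer than 0 characters. (This describes Python versions before 3.7; from 3.7 on, re.split() also splits on empty matches, so the outputs shown here differ there.)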

The \s*|: pattern matches either zero or more whitespace characters or a :, and the alternatives are tried from left to right. But zero whitespace characters match everywhere, so \s* succeeds at every position. Only in those locations where more than zero whitespace characters matched is the split made.

Because \s* succeeds at every position that is considered for splitting, the second alternative, :, is never tried.
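A minimal sketch of this, probing single positions with Pattern.match(string, pos) instead of calling re.split(), so it does not depend on how split treats empty matches. The variable names are illustrative and not from the question:

```python
import re

text = 'find  a str:s2'
colon_at = text.index(':')

spaces_first = re.compile(r'\s*|:')   # the pattern from the question
colon_first = re.compile(r':|\s*')    # same alternatives, reversed

# At 'f' the first alternative, \s*, succeeds with zero characters.
at_f = spaces_first.match(text, 0).span()

# At the two spaces after 'find', \s* consumes both: a non-empty match.
at_spaces = spaces_first.match(text, 4).group()

# At ':' \s* again succeeds with zero characters, so ':' is never tried...
at_colon = spaces_first.match(text, colon_at).group()

# ...unless ':' is listed as the first alternative.
at_colon_reversed = colon_first.match(text, colon_at).group()

print(at_f, repr(at_spaces), repr(at_colon), repr(at_colon_reversed))
```

The empty string reported at the colon is the zero-width match that keeps : from ever being used as a delimiter.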

Splitting on non-empty matches is called out explicitly in the re.split() documentation:

Note that split will never split a string on an empty pattern match.

If you reverse the pattern, : is considered, as it is the first choice:

>>> re.split(':|\s*', 'find  a str:s2')
['find', 'a', 'str', 's2']
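If the goal is simply to split on runs of whitespace or on a colon, one option is a pattern that cannot match the empty string at all, which sidesteps the question of how re.split() handles empty matches. A small sketch; the character-class variant is an assumption about what is wanted, since it also merges adjacent delimiters such as ': ' into one:

```python
import re

text = 'find  a str:s2'

# \s+ needs at least one whitespace character, so where there is none
# it fails and the second alternative, ':', gets its turn.
parts = re.split(r'\s+|:', text)

# Alternative: one or more characters that are whitespace or ':'.
parts_class = re.split(r'[\s:]+', text)

print(parts)
print(parts_class)
```

Both calls should give ['find', 'a', 'str', 's2'] for this input.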

edited Jun 24 '14 at 15:02

answered Jun 24 '14 at 14:57

Martijn Pieters


So for the first character 'f' in the string, can I say it matches the pattern, but no split happens because it is an "empty pattern match"? – Deqing Jun 24 '14 at 15:20
@Deqing: For `f`, the `\s*` part matches. It is a 0-width match, so no split takes place. Next, `i` is tested, and it too matches `\s*`, etc. – Martijn Pieters Jun 24 '14 at 15:22
Thanks, that makes sense. – Deqing Jun 25 '14 at 05:18

score -4 · Answer 2 · answered Jun 24 '14 at 14:56


If you meant to do "or" for your matching, then you have to do something like this: `re.split('(\s*|:)', 'find a str:s2')`. In short: `+` means "at least one" of the preceding item; `*` means any number of them (or none).
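For reference, the parentheses do not change how | chooses between alternatives. What a capturing group does in re.split() is make the matched delimiters appear in the result. A sketch of that difference, using \s+ rather than the \s* above so the output does not depend on empty-match handling:

```python
import re

text = 'find a str:s2'

# No group: the delimiters are dropped from the result.
without_group = re.split(r'\s+|:', text)

# Capturing group: each matched delimiter is kept as its own list item.
with_group = re.split(r'(\s+|:)', text)

print(without_group)
print(with_group)
```

The second list interleaves the pieces with the delimiters that separated them: ['find', ' ', 'a', ' ', 'str', ':', 's2'].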


nochkin

