Why do Python regex spans extend one place past the actual match?

Question

Looking at the spans returned from my regex matches, I noticed that the end index is always one past the actual match, e.g. in this example from the Regular Expression HOWTO:

>>> print(p.match('::: message'))
None
>>> m = p.search('::: message'); print(m)  
<_sre.SRE_Match object at 0x...>
>>> m.group()
'message'
>>> m.span()
(4, 11)

The resulting span in the example is (4, 11) vs. the actual location (4, 10). This causes some trouble for me as the left-hand and right-hand boundaries have different meanings and I need to compare the relative positions of the spans.

Is there a good reason for this or can I go ahead and modify the spans to my liking by subtracting one from the right boundary?

This answer should help your understanding: http://stackoverflow.com/a/509297/659346 – GP89, Oct 10 '14 at 10:31

score 5 · Accepted Answer · answered Oct 10 '14 at 10:28

Because in Python slicing and ranges, the end value is always exclusive, and '::: message'[4:11] reflects the actual matched text:

>>> '::: message'[4:11]
'message'

Thus, you can use the MatchObject.span() results to slice the matched text from the original string:

>>> import re
>>> s = '::: message'
>>> p = re.compile('[a-z]+')  # the pattern used in the HOWTO example
>>> match = p.search(s)
>>> match.span()
(4, 11)
>>> s[slice(*match.span())]
'message'

Martijn Pieters

I see, so the designers wanted a way to slice nothing while still providing indexes. I.e. they wanted 'message'[x:x] to return empty. – Toaster Oct 10 '14 at 10:34
@Colin: exactly; also see [Python's slice notation](http://stackoverflow.com/q/509211) for some helpful diagrams. – Martijn Pieters Oct 10 '14 at 10:45
