Looking at the spans returned from my regex matches, I noticed that the end index is always one past the last matched character, e.g. in this example from the Regular Expression HOWTO:

>>> print(p.match('::: message'))
None
>>> m = p.search('::: message'); print(m)  
<_sre.SRE_Match object at 0x...>
>>> m.group()
'message'
>>> m.span()
(4, 11)

The resulting span in the example is (4, 11) vs. the actual location (4, 10). This causes some trouble for me as the left-hand and right-hand boundaries have different meanings and I need to compare the relative positions of the spans.

Is there a good reason for this or can I go ahead and modify the spans to my liking by subtracting one from the right boundary?

jonrsharpe
Toaster

1 Answer

Because in Python slicing and ranges, the end value is always exclusive, and '::: message'[4:11] reflects the actual matched text:

>>> '::: message'[4:11]
'message'
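As a rough sketch of why the exclusive end is convenient, the following uses the same string and the span values from the question. The variable names `start` and `end` are only illustrative.

```python
s = '::: message'
start, end = 4, 11  # the span reported for 'message' in the question

# The length of the matched text is simply end - start.
matched = s[start:end]
length = end - start

# Cutting the string at the span boundaries leaves no gap and no overlap:
# the three pieces join back into the original string.
rejoined = s[:start] + s[start:end] + s[end:]

# An empty slice is expressed by start == end.
empty = s[start:start]
```

With an inclusive end index, each of these would need a `+ 1` or `- 1` correction somewhere.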

Thus, you can use the MatchObject.span() results to slice the matched text from the original string:

>>> import re
>>> p = re.compile('[a-z]+')  # the pattern used in the HOWTO example
>>> s = '::: message'
>>> match = p.search(s)
>>> match.span()
(4, 11)
>>> s[slice(*match.span())]
'message'
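
Since the question mentions comparing the relative positions of spans, here is a minimal sketch of how that might look while keeping the half-open `(start, end)` values as they are. The helper names `overlaps` and `gap` are hypothetical and not part of the `re` module, the sample string is made up, and the pattern is assumed to be the one from the HOWTO example.

```python
import re

# Assumed pattern: runs of lowercase letters, as in the HOWTO example.
p = re.compile('[a-z]+')
s = 'one two  three'

# Collect the half-open (start, end) span of every match.
spans = [m.span() for m in p.finditer(s)]

def overlaps(a, b):
    """Return True if half-open spans a and b share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def gap(a, b):
    """Return the number of characters between span a and a later span b."""
    return b[0] - a[1]

first, second, third = spans
```

Because the end is exclusive, two spans that merely touch (one ends where the next begins) are not reported as overlapping, and `gap` returns 0 for them without any off-by-one adjustment.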
Martijn Pieters
  • I see, so the designers wanted a way to slice nothing while still providing indexes. I.e. they wanted 'message'[x:x] to return empty. – Toaster Oct 10 '14 at 10:34
  • @Colin: exactly; also see [Python's slice notation](http://stackoverflow.com/q/509211) for some helpful diagrams. – Martijn Pieters Oct 10 '14 at 10:45