35

I have a Python regular expression that contains a group which can occur zero or many times - but when I retrieve the list of groups afterwards, only the last one is present. Example:

re.search(r"(\w)*", "abcdefg").groups()

This returns the tuple ('g',)

I need it to return ('a','b','c','d','e','f','g',)

Is that possible? How can I do it?

Brian Tompsett - 汤莱恩
John B

2 Answers

40
re.findall(r"\w","abcdefg")
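As a minimal sketch of how this one-liner produces the result the question asks for: re.findall returns a list, so it is wrapped in tuple() here to match the requested shape. The variable names are only illustrative.

```python
import re

text = "abcdefg"

# findall returns every non-overlapping match of the pattern as a list of strings.
letters = tuple(re.findall(r"\w", text))
print(letters)  # ('a', 'b', 'c', 'd', 'e', 'f', 'g')

# finditer yields match objects instead, which is useful when positions are needed too.
positions = [(m.group(), m.start()) for m in re.finditer(r"\w", text)]
```

Note that this matches each character independently rather than as repetitions of one group inside a larger pattern, so it only fits cases where the repeated part can be searched for on its own.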
Douglas Leeder
32

In addition to Douglas Leeder's solution, here is the explanation:

In regular expressions the group count is fixed. Placing a quantifier after a group does not increase the group count (imagine how all later group indexes would shift if an earlier group matched more than once).

Groups with quantifiers are a way of treating a complex sub-expression as a single unit when it needs to match more than once. The regex engine has no choice but to store only the last match in the group. In short: there is no way to achieve what you want with a single "unarmed" regular expression, and you have to find another way.
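The fixed group count can be checked directly with the standard re module. The sketch below also shows one possible workaround, which is an assumption on my part rather than something from the answers: capture the whole repeated run in a single group and split it afterwards.

```python
import re

text = "abcdefg"
pattern = re.compile(r"(\w)*")

# The number of groups is a property of the compiled pattern, not of the match.
group_count = pattern.groups

# Each repetition overwrites group 1, so only the final iteration survives.
match = pattern.search(text)
last_only = match.groups()

# Workaround sketch: capture the entire run once, then split it in Python.
run = re.search(r"(\w*)", text).group(1)
chars = tuple(run)

print(group_count, last_only, chars)
```

Splitting after the match works here because each repetition is a single character; for longer repeated sub-expressions a second findall over the captured run would be needed.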

Community
Tomalak
  • 2
    As an addition: Modern regex implementations like the one in .NET allow you to access previous occurrences of a group besides the last one. Therefore, the above statement is not universally true, but it still holds for most implementations. – Tomalak Jun 07 '11 at 18:21
  • 4
    For the record, there's a regex implementation for Python which also permits access to all of the matches of a capture group: http://pypi.python.org/pypi/regex – MRAB Sep 03 '12 at 01:18
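A short sketch of what the last comment describes, assuming the third-party regex package is installed (pip install regex). Its match objects provide a captures() method that returns every capture of a group, not just the last. The import guard is only there so the snippet still runs when the package is missing.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

text = "abcdefg"

if regex is not None:
    match = regex.search(r"(\w)*", text)
    last_capture = match.group(1)      # same behaviour as re: the last iteration only
    all_captures = match.captures(1)   # every iteration of group 1, as a list
else:
    last_capture = None
    all_captures = None

print(last_capture, all_captures)
```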