I wanted to combine multiple regex patterns. When I did this, I noticed that in some cases empty matches appear. I wasn't aware of this behaviour:
import re
s = 'some test set of words'
# if I use round brackets as capturing groups and a pipe (|) to combine them, empty matches appear
re.findall('(some)|(test)', s, flags=re.IGNORECASE)
[('some', ''), ('', 'test')]
# no empty matches by avoiding the round brackets
re.findall('some|test', s, flags=re.IGNORECASE)
['some', 'test']
# no empty matches if round brackets are used with a single pattern.
re.findall('(some)', s, flags=re.IGNORECASE)
['some']
Can someone explain this behaviour?
The documentation mentions that it will include empty matches:
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
But this doesn't explain the behaviour, only that it is intended to have empty matches.
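To narrow it down, here is a minimal sketch that inspects each match with re.finditer instead of re.findall. It reuses the string and pattern from above; the variable names (groups_per_match, as_findall, whole_matches) are only mine for illustration. As far as I can tell, every match carries one slot per capturing group, and a group that did not take part in the match is None on the match object, which re.findall appears to report as an empty string.

```python
import re

s = 'some test set of words'
pattern = re.compile('(some)|(test)', flags=re.IGNORECASE)

# One tuple per match, with one slot per capturing group.
# A group that did not take part in the match is None here.
groups_per_match = [m.groups() for m in pattern.finditer(s)]
print(groups_per_match)  # [('some', None), (None, 'test')]

# Replacing None with '' reproduces the tuples that re.findall returned above.
as_findall = [tuple(g or '' for g in groups) for groups in groups_per_match]
print(as_findall)  # [('some', ''), ('', 'test')]

# Non-capturing groups (?:...) keep the grouping without adding a slot,
# so re.findall goes back to returning the whole match as a string.
whole_matches = re.findall('(?:some)|(?:test)', s, flags=re.IGNORECASE)
print(whole_matches)  # ['some', 'test']
```

If that reading is right, the empty strings are placeholders for the alternative that did not match, and m.group(0) or a non-capturing group avoids them.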