I don't see the point of non-capturing groups, as you can simply leave that part of the regex outside any group and it has the same effect. A non-capturing group also doesn't appear to exclude the enclosed text when it is matched inside another capturing group.

Am I missing a key application of them?

-- Example just in case --

Given the string '15nice12' with the aim of extracting '1512'.

Using the regex: (\d+)(?:[a-zA-Z]+)(\d+) or (\d+)[a-zA-Z]+(\d+)

Match 1: '15nice12'

Group 1: '15'

Group 2: '12'

These two regex patterns return the same thing, which makes the non-capturing group redundant. (The groups are also not in the desired form, where the two numbers would be captured within the same output group.)

Using the regex: (\d+(?:[a-zA-Z]+)\d+)

Match 1: '15nice12'

Group 1: '15nice12'

Here, the non-capturing group does not stop the word from being included in the enclosing group.

1 Answer

It doesn't "have the same effect" - in one case the group is captured and accessible, in the other it is only used to complete the match.
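To make that difference concrete, here is a minimal sketch using the string from the question. The variable names are only illustrative, and joining the groups afterwards is just one assumed way of getting to '1512'.

```python
import re

s = '15nice12'

# All three parts captured: the letters occupy group 2.
all_captured = re.fullmatch(r'(\d+)([a-zA-Z]+)(\d+)', s).groups()

# Letters matched but not captured: only the two numbers are stored.
numbers_only = re.fullmatch(r'(\d+)(?:[a-zA-Z]+)(\d+)', s).groups()

# A single group cannot skip text in the middle of its span, so the
# pieces are joined afterwards to get '1512'.
joined = ''.join(numbers_only)

print(all_captured)
print(numbers_only)
print(joined)
```

The non-capturing group still takes part in the match; it only avoids creating a numbered group for the letters.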

People use non-capturing groups when they are not interested in accessing the value of the group - to save space in situations with many matches, but also for better performance in cases where the regex engine is optimised for it.
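A typical case where the value of the group is not interesting is when parentheses are needed only to scope an alternation or a quantifier. The sketch below assumes a made-up list of file names and hypothetical variable names; it only illustrates how the result of re.findall changes.

```python
import re

text = 'img01.png img02.jpeg notes.txt'

# The parentheses are needed to scope the alternation, but only the
# file stem is of interest, so the extension group is non-capturing.
stems = re.findall(r'(\w+)\.(?:png|jpeg)\b', text)

# With a capturing group for the extension, findall returns tuples.
pairs = re.findall(r'(\w+)\.(png|jpeg)\b', text)

print(stems)
print(pairs)
```

Here the group cannot simply be left out, because the alternation would then apply to the whole pattern.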

A useless example in Python to illustrate the point:

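from timeit import timeit
import re

# Build a 100,000-character string that repeats 'abcdefghij'.
chars = 'abcdefghij'
s = ''.join(chars[i % len(chars)] for i in range(100000))


def capturing():
    # Ten nested capturing groups: every match stores ten substrings.
    re.findall(r'(a(b(c(d(e(f(g(h(i(j))))))))))', s)


def noncapturing():
    # Same nesting, but only the innermost group (j) is captured.
    re.findall(r'(?:a(?:b(?:c(?:d(?:e(?:f(?:g(?:h(?:i(j))))))))))', s)


# Time 1,000 runs of each variant.
print(timeit(capturing, number=1000))
print(timeit(noncapturing, number=1000))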

Output:

5.8383678999998665
1.0528525999998237

Note: this is in spite of PyCharm (if you happen to use it) warning "Unnecessary non-capturing group" - the warning is correct, but clearly not the whole truth. The groups are logically unneeded, but they definitely do not have the same practical effect.

If the reason you wanted to get rid of them was to suppress such warnings, PyCharm allows you to do so with this:

# noinspection RegExpUnnecessaryNonCapturingGroup
re.findall('(?:a(?:b(?:c(?:d(?:e(?:f(?:g(?:h(?:i(j))))))))))', s)

Another note for the pedantic: the examples above aren't strictly logically equivalent either. They match the same strings, but return different results.

c = re.findall('(a(b(c(d(e(f(g(h(i(j))))))))))', s)
nc = re.findall('(?:a(?:b(?:c(?:d(?:e(?:f(?:g(?:h(?:i(j))))))))))', s)

c is a list of 10-tuples ([('abcdefghij', 'bcdefghij', ..), ..]), while nc is a list of single strings (['j', ..]).
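The shapes can be checked on a much shorter input. This is a small sketch, assuming a string of just two repetitions rather than the 100,000-character one used for timing:

```python
import re

# Two repetitions are enough to see the shape of each result.
s = 'abcdefghij' * 2

c = re.findall(r'(a(b(c(d(e(f(g(h(i(j))))))))))', s)
nc = re.findall(r'(?:a(?:b(?:c(?:d(?:e(?:f(?:g(?:h(?:i(j))))))))))', s)

print(len(c), len(c[0]))  # number of matches, groups per match
print(c[0][0], c[0][-1])  # outermost and innermost group
print(nc)
```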

Grismar