7

How can I create a regex character class that is the intersection of two other character classes? For example, how can I search for consonants using [a-z] and [^aeiou], without explicitly constructing a character class containing all the consonants like so:

[bcdfghjlkmnpqrstvwxyz] # explicit consonant regex class
Malik Brahimi

3 Answers

8

This regex should do the trick : (?=[^aeiou])(?=[a-z]).

The first lookahead, (?=[^aeiou]), asserts that [^aeiou] can match at the current position without consuming any input. Matching then resumes from that same position with the second lookahead, (?=[a-z]), which works the same way. Together they act like a logical AND: the whole regex matches only if both expressions match.
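A minimal sketch of the lookahead approach with the standard re module, assuming the trailing dot in the pattern above is part of the regex (it consumes the character that both lookaheads accepted). The second pattern is a shorter variant, not taken from the answer, where [a-z] itself does the consuming:

```python
import re

# Both lookaheads are zero-width, so the final "." consumes the character
# that passed both tests (assumption: the trailing dot belongs to the pattern).
two_lookaheads = re.compile(r"(?=[^aeiou])(?=[a-z]).")

# Variant: one lookahead filters out vowels, [a-z] consumes the character.
one_lookahead = re.compile(r"(?=[^aeiou])[a-z]")

text = "abcde xyz!"
print(two_lookaheads.findall(text))  # ['b', 'c', 'd', 'x', 'y', 'z']
print(one_lookahead.findall(text))   # ['b', 'c', 'd', 'x', 'y', 'z']
```

The space and the exclamation mark pass [^aeiou] but fail [a-z], so only characters in both classes are matched.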

Community
6

As an alternative to Python's re module, you can do this explicitly with the third-party regex library, which supports set operations for character classes when the VERSION1 flag is enabled:

The operators, in order of increasing precedence, are:

|| for union (“x||y” means “x or y”)

~~ (double tilde) for symmetric difference (“x~~y” means “x or y, but not both”)

&& for intersection (“x&&y” means “x and y”)

-- (double dash) for difference (“x--y” means “x but not y”)

So to match only consonants, your regular expression could be:

>>> import regex
>>> regex.findall('[[a-z]&&[^aeiou]]+', 'abcde', regex.VERSION1)
['bcd']

Or equivalently using set difference:

>>> regex.findall('[[a-z]--[aeiou]]+', 'abcde', regex.VERSION1)
['bcd']
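If installing the regex library is not an option, the same set arithmetic can be done in plain Python and the result handed to re as an ordinary character class. This is only a sketch of that idea, not something the regex library does internally; the variable names are illustrative:

```python
import re
import string

lowercase = set(string.ascii_lowercase)  # plays the role of [a-z]
vowels = set("aeiou")                    # plays the role of [aeiou]

# Python's set difference mirrors the "--" operator described above;
# "&", "|" and "^" would mirror "&&", "||" and "~~" in the same way.
consonants = lowercase - vowels

# Escape each character so the approach also works for sets that
# contain class metacharacters such as "]" or "-".
char_class = "[" + "".join(re.escape(c) for c in sorted(consonants)) + "]"

print(char_class)                             # [bcdfghjklmnpqrstvwxyz]
print(re.findall(char_class + "+", "abcde"))  # ['bcd']
```

The generated class is the explicit consonant class from the question, but it is derived rather than typed by hand.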
artu-hnrq
Alex Riley
0

Character class difference and intersection are not available in the re module, so what can you do?

using ranges:

[bcdfghj-np-tv-z]

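using the \w character class (note that this also matches uppercase consonants and, in Python 3 without re.ASCII, non-ASCII letters):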

[^\W0-9_aeiouAEIOU]

a lookahead (not very efficient since you need to make a test for each character):

(?:(?![eiou])[b-z])

using the new regex module that has the difference feature:

[[b-z]--[eiou]]
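As a quick sanity check, the three patterns above that work with the standard re module can be compared against the lowercase ASCII alphabet. This is a minimal sketch; the dictionary keys are just labels chosen for the printout:

```python
import re
import string

patterns = {
    "ranges": r"[bcdfghj-np-tv-z]",
    "negated \\w": r"[^\W0-9_aeiouAEIOU]",
    "lookahead": r"(?:(?![eiou])[b-z])",
}

# Reference answer: every lowercase ASCII letter that is not a vowel.
expected = [c for c in string.ascii_lowercase if c not in "aeiou"]

for name, pattern in patterns.items():
    matched = re.findall(pattern, string.ascii_lowercase)
    print(name, matched == expected)  # True for all three
```

The patterns agree on lowercase ASCII input only; on other input the \w variant matches more characters than the other two.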
Casimir et Hippolyte