
I would like to find all alternating digits in a string using regular expressions. An alternating digit is defined as two equal digits having a digit in between; for example, 1212 contains 2 alternations (121 and 212) and 1111 contains 2 alternations as well (111 and 111). I have the following regular expression code:

import re
s = "1212"
re.findall(r'(\d)(?:\d)(\1)+', s)  # [('1', '1')]: finds 121 but misses 212

This works for strings like "121656", but not for "1212". I think the problem is overlapping matches. How can I deal with that?
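One way to confirm the overlap diagnosis is to print the span of each match. A minimal sketch using re.finditer on the same pattern, minus the repeated group:

```python
import re

s = "1212"
# finditer reports where each match starts and ends; the next search
# begins where the previous match ended, so matches cannot overlap.
for m in re.finditer(r'(\d)\d\1', s):
    print(m.group(0), m.span())  # prints: 121 (0, 3)
```

The only match covers indices 0 to 3, so the scan resumes at index 3, where just "2" is left, and 212 is never tried.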

Peter Mortensen
user1879926
  • Specific answer given by @vks; also see the answer to a similar question at http://stackoverflow.com/a/320478/43774. – rivy Jan 03 '16 at 05:45

4 Answers

(?=((\d)\d\2))

Use a lookahead to get all overlapping matches. Use re.findall and take the first element of each tuple. See the demo:

https://regex101.com/r/fM9lY3/54
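A minimal sketch of this lookahead approach with Python 3's re module (the function name overlapping_alternations is hypothetical). The whole pattern is a zero-width lookahead, so findall tries it at every index; group 1 of each tuple holds the three-digit window and group 2 the repeated outer digit:

```python
import re

def overlapping_alternations(s):
    # The lookahead consumes no characters, so every start position is tested.
    # findall returns (group 1, group 2) tuples; group 1 is the full window.
    return [window for window, _outer in re.findall(r'(?=((\d)\d\2))', s)]

print(overlapping_alternations("1212"))  # ['121', '212']
print(overlapping_alternations("1111"))  # ['111', '111']
```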

Peter Mortensen
vks

You can use a lookahead to allow for overlapping matches:

r'(\d)(?=(\d)\1)'

To reconstruct full matches from this:

import re
s = "1212"
matches = re.findall(r'(\d)(?=(\d)\1)', s)  # [('1', '2'), ('2', '1')]
[a + b + a for a, b in matches]  # ['121', '212']

Also, to prevent other Unicode digits such as ١ from being matched (assuming you don't want them), you should use [0-9] instead of \d.
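A small sketch of the difference, assuming Python 3 str patterns, where \d matches any Unicode decimal digit by default. The re.ASCII flag used on the last line is an alternative to [0-9] that this answer does not mention:

```python
import re

# '1', ARABIC-INDIC DIGIT TWO (U+0662), '1'
s = "1\u06621"

unicode_pat = r'(\d)(?=(\d)\1)'       # \d also matches U+0662
ascii_pat = r'([0-9])(?=([0-9])\1)'   # only the ASCII digits 0-9

print([a + b + a for a, b in re.findall(unicode_pat, s)])  # one match: '1', U+0662, '1'
print(re.findall(ascii_pat, s))                          # []
print(re.findall(unicode_pat, s, flags=re.ASCII))        # [] (re.ASCII restricts \d to [0-9])
```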

Ry-

With the third-party regex module you don't need a trick to get overlapping matches, since it has an overlapped option for that:

import regex  # third-party module: pip install regex
s = "1212"
res = [x.group(0) for x in regex.finditer(r'(\d)\d\1', s, overlapped=True)]
# res == ['121', '212']

If s contains only digits, you can also do this:

res = [s[i-2:i+1] for i in range(2, len(s)) if s[i] == s[i-2]]

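If s may also contain non-digit characters, the slicing approach above would also return windows such as "a1a". A sketch of a variant that additionally requires all three characters to be ASCII digits (the function name ascii_digit_alternations is hypothetical):

```python
def ascii_digit_alternations(s):
    digits = set("0123456789")
    # Slide a three-character window over s; keep it only when the outer
    # characters are equal and both the outer and middle ones are ASCII digits.
    return [
        s[i - 2:i + 1]
        for i in range(2, len(s))
        if s[i] == s[i - 2] and s[i] in digits and s[i - 1] in digits
    ]

print(ascii_digit_alternations("a1a1212"))  # ['121', '212']
```

The plain slicing version would return ['a1a', '1a1', '121', '212'] for the same input.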
Casimir et Hippolyte

A non-regex approach, if your string is made up of just digits:

from itertools import islice as isl

s = "121231132124123"
# Three iterators offset by 0, 1 and 2 characters yield every three-character window
out = [a + b + c for a, b, c in zip(isl(s, 0, None), isl(s, 1, None), isl(s, 2, None)) if a == c]

Output:

['121', '212', '212']

It is actually quite a bit faster than a regex approach.
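To check the speed claim on your own machine, here is a rough timeit sketch comparing this approach with the lookahead regex from the first answer. The input size and repetition count are arbitrary assumptions, and the timings will vary with Python version and input:

```python
import re
import timeit
from itertools import islice

# Hypothetical larger all-digit input, built by repeating the example string
s = "121231132124123" * 1000

def with_zip(s):
    # Three offset iterators give every three-character window; keep a == c.
    return [a + b + c for a, b, c in zip(islice(s, 0, None), islice(s, 1, None), islice(s, 2, None)) if a == c]

PATTERN = re.compile(r'(?=((\d)\d\2))')

def with_regex(s):
    # Zero-width lookahead; group 1 of each tuple is an overlapping window.
    return [window for window, _outer in PATTERN.findall(s)]

# On an all-digit string both functions should agree exactly.
assert with_zip(s) == with_regex(s)
print("zip:  ", timeit.timeit(lambda: with_zip(s), number=20))
print("regex:", timeit.timeit(lambda: with_regex(s), number=20))
```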

Padraic Cunningham