5

A period p of a string w is any positive integer p such that w[i]=w[i+p] whenever both sides of this equation are defined. Let per(w) denote the size of the smallest period of w . We say that a string w is periodic iff per(w) <= |w|/2.

So informally a periodic string is just a string that is made up from a prefix repeated at least twice. The only complication is that at the end of the string we don't require a full copy of the prefix.

For, example consider the string x = abcab. per(abcab) = 3 as x[1] = x[1+3] = a, x[2]=x[2+3] = b and there is no smaller period. The string abcab is therefore not periodic. However, the string ababa is periodic as per(ababa) = 2.

As more examples, abcabca, ababababa and abcabcabc are also periodic.

Is there a regex to determine if a string is periodic or not?

I don't really mind which flavor of regex but if it makes a difference, anything that Python re supports.

graffe
  • 17,307
  • 35
  • 112
  • 239

3 Answers3

5

What you need is backreference

\b(\w*)(\w+\1)\2+\b

This matches even abcabca and ababababa.

\1 and \2 are used to match the first and second capturing groups, respectively.

horcrux
  • 6,493
  • 6
  • 27
  • 40
  • This looks very nice! I am just playing with it. – graffe Jul 12 '16 at 08:22
  • I think that if you consider periodic the empty string too, the right expression is: `\b(\w*)(\w*\1)\2+\b` – horcrux Jul 12 '16 at 08:26
  • 1
    Very nice. Shorter than mine, but maybe a bit harder to grasp. IMHO you should use `^...$` instead of `\b...\b` to ensure that the entire string is periodic. – tobias_k Jul 12 '16 at 08:41
  • This is really nice. It even handles sequences *cut* in the beginning, like `cdabcdabcdabcd`. Thumbs up – SamWhan Jul 12 '16 at 10:52
4

You could use Regex back references.

For example (.+)\1+. This pattern will match a group () formed of at least one character .+. This group \1 (back reference) must repeat at least one time for a match.

The string ababa matches and it finds ab as the 1st group.

The string abcab is not a match.

Later edit

If you want a prefix that is repeated at least twice, you can change the pattern to: ^(.+)\1+. The problem is that I don't think you can match the end of the string to a substring of the prefix. So any string that starts with a repeating pattern will match but it will ignore the ending of the string.

Even later edit

Inspired from @tobias_k answer, here is how I would do it ^((.+)(?:.*))\1+\2?$. It looks for a string that has a prefix (it looks for the longest prefix it can find) that repeats at least twice and the ending must be the starting part of the prefix.

The first capturing group from the match will be the prefix that is repeating.

https://regex101.com/r/jQ3yY1/2

If you want the shortest prefix that repeats, you can use this pattern ^((.+?)(?:.*?))\1+\2?$.

Andrei Tătar
  • 7,213
  • 17
  • 34
4

You can use a regex like ^(.+)(.*)(\1\2)+\1?$.

  • ^...$ from start to end of string
  • (.+) part of period that is always repeated (e.g. a in ababa)
  • (.*) optional part of period that is repeated except at the end (e.g. b in ababa)
  • (\1\2)+ one or more repetitions of the entire period
  • \1? optional final repetition of first part of the period

In Python:

>>> p = r"^(.+)(.*)(\1\2)+\1?$"
>>> re.match(p, "abcab")
None
>>> re.match(p, "abcabca")
<_sre.SRE_Match at 0x7f5fde6e51f8>

Note that this does not match the empty string "" though, which could also be considered periodic. If the empty string should be matched, you will have to treat it separately, e.g. by simply appending |^$ at the end of the regex.

tobias_k
  • 78,071
  • 11
  • 109
  • 168