3

I have a string like this:

string = 'aaabbbcccddd'

and I want a list of ALL the substrings that are 3 characters long, overlapping ones included:

aaa, aab, abb, bbb, bbc, bcc, ccc, ccd, cdd, ddd

How do I get there? re.finditer and re.findall only return non-overlapping matches, and I need the overlapping ones.

Martijn Pieters
RonaldN

3 Answers

5

A simple way is to zip the string with copies of itself shifted by one and two characters:

>>> for a, b, c in zip(string, string[1:], string[2:]):
...     print(a, b, c)
...
a a a
a a b
a b b
b b b
b b c
b c c
c c c
c c d
c d d
d d d

The same idea as a list comprehension:

>>> ["".join(var) for var in zip(string, string[1:], string[2:])]
['aaa', 'aab', 'abb', 'bbb', 'bbc', 'bcc', 'ccc', 'ccd', 'cdd', 'ddd']
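
The zip approach above hard-codes a width of 3. A minimal sketch of how it might be generalized to any width follows; the name `windows_zip` and its `n` parameter are made up for illustration and are not part of the answer.

```python
def windows_zip(s, n=3):
    """Return all overlapping substrings of length n (hypothetical helper)."""
    # Build n copies of s, each shifted one character further to the right.
    shifted = (s[i:] for i in range(n))
    # zip stops at the shortest copy, so incomplete windows are never produced.
    return ["".join(chars) for chars in zip(*shifted)]


print(windows_zip('aaabbbcccddd', 3))
```

If the string is shorter than `n`, the shortest shifted copy is empty and the result is an empty list.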
Games Brainiac
4

You want to create a sliding window over the string:

from itertools import islice

def window(seq, n=2):
    """Return a sliding window (of width n) over data from the iterable.

    s -> (s0, s1, ..., s[n-1]), (s1, s2, ..., sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        # Drop the oldest element and append the newest one.
        result = result[1:] + (elem,)
        yield result

print([''.join(chunk) for chunk in window(string, 3)])

This produces:

>>> string = 'aaabbbcccddd'
>>> [''.join(chunk) for chunk in window(string, 3)]
['aaa', 'aab', 'abb', 'bbb', 'bbc', 'bcc', 'ccc', 'ccd', 'cdd', 'ddd']
Martijn Pieters
3

An alternative using plain slicing, which can probably be improved:

>>> s = 'aaabbbcccddd'
>>> [s[i:i+3] for i in range(len(s)-2)]
['aaa', 'aab', 'abb', 'bbb', 'bbc', 'bcc', 'ccc', 'ccd', 'cdd', 'ddd']
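
The slicing one-liner above also works for other widths. Since the question asks about `re`, it may be worth noting that a capturing group inside a zero-width lookahead lets `re.findall` return overlapping matches too. The sketch below shows both; the function names and the `n` parameter are assumptions made for illustration.

```python
import re


def substrings_by_slice(s, n=3):
    """Overlapping substrings of length n via slicing (hypothetical helper)."""
    # range() is empty when len(s) < n, so short inputs give an empty list.
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def substrings_by_regex(s, n=3):
    """Same result via a lookahead, which consumes no characters itself."""
    # (?=(...)) matches at every position where n characters follow and
    # captures them; DOTALL lets '.' match newlines, as slicing would.
    pattern = r'(?=(.{%d}))' % n
    return re.findall(pattern, s, flags=re.DOTALL)


s = 'aaabbbcccddd'
print(substrings_by_slice(s))
print(substrings_by_regex(s))
```

For plain strings the slicing version is likely the simpler choice; the lookahead is mainly useful when each window has to match a more specific pattern.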
Martijn Pieters
Robert