
This question is similar to Slicing a list into a list of sub-lists, but in my case I want to include the last element of each previous sub-list as the first element of the next sub-list. I also have to make sure that the last sub-list always has at least two elements.

For example:

list_ = ['a','b','c','d','e','f','g','h']

The result for a size 3 sub-list:

resultant_list = [['a','b','c'],['c','d','e'],['e','f','g'],['g','h']]
Peter Mortensen
efirvida

4 Answers


The list comprehension in the answer you linked is easily adapted to support overlapping chunks by simply shortening the "step" parameter passed to the range:

>>> list_ = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> n = 3  # group size
>>> m = 1  # overlap size
>>> [list_[i:i+n] for i in range(0, len(list_), n-m)]
[['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g'], ['g', 'h']]
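
To see why shortening the step produces the overlap, it can help to look at the start indices directly. This sketch reuses the question's list_ with n = 3 and m = 1 and checks that every pair of neighbouring chunks shares exactly m elements; starts and groups are just illustrative names.

```python
list_ = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
n = 3  # group size
m = 1  # overlap size

# Each chunk starts n - m positions after the previous one,
# so the last m items of a chunk are re-read as the first m of the next.
starts = list(range(0, len(list_), n - m))
print(starts)  # [0, 2, 4, 6]

groups = [list_[i:i + n] for i in starts]

# Check the overlap property on every pair of neighbouring chunks.
for prev, nxt in zip(groups, groups[1:]):
    assert prev[-m:] == nxt[:m]
```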

Other visitors to this question mightn't have the luxury of working with an input list (sliceable, known length, finite). Here is a generator-based solution that can work with arbitrary iterables:

from collections import deque

def chunks(iterable, chunk_size=3, overlap=0):
    # we'll use a deque to hold the values because it automatically
    # discards any extraneous elements if it grows too large
    if chunk_size < 1:
        raise ValueError("chunk size too small")
    if overlap >= chunk_size:
        raise ValueError("overlap too large")
    queue = deque(maxlen=chunk_size)
    it = iter(iterable)
    i = 0
    try:
        # start by filling the queue with the first group
        for i in range(chunk_size):
            queue.append(next(it))
        while True:
            yield tuple(queue)
            # after yielding a chunk, get enough elements for the next chunk
            for i in range(chunk_size - overlap):
                queue.append(next(it))
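    except StopIteration:
        # if the iterator is exhausted, yield the i elements not yet emitted
        # together with the overlap; skip it when i == 0 so that neither a
        # stub of already-seen elements nor an empty tuple is produced
        if i > 0:
            yield tuple(queue)[-(i + overlap):]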

Note: I've since released this implementation in wimpy.util.chunks. If you don't mind adding the dependency, you can `pip install wimpy` and use `from wimpy import chunks` rather than copy-pasting the code.

wim
    This method can result in unnecessary "stubs" being left over. For example, if you run your first example on `['a', 'b', 'c', 'd', 'e', 'f', 'g']`, it produces `[['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g'], ['g']]`. To avoid such unnecessary chunks that contain only elements already captured in previous chunks, subtract the overlap size `m` from the list length when calculating the range, i.e. `[list_[i:i+n] for i in range(0, len(list_)-m, n-m)]` – Jojanzing May 26 '22 at 19:33
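
The fix from this comment can be wrapped in a small helper. The name overlapping_chunks and its argument checks are my own assumptions, not part of the answer. The sketch also guards the edge case where a non-empty list is no longer than the overlap m, which the bare len(list_)-m comprehension would turn into no chunks at all.

```python
def overlapping_chunks(seq, n, m):
    """Split a sliceable sequence into chunks of size n where neighbouring
    chunks share m elements (hypothetical helper, not from the answer)."""
    if n < 1:
        raise ValueError("chunk size must be at least 1")
    if not 0 <= m < n:
        raise ValueError("overlap must satisfy 0 <= m < n")
    if not seq:
        return []
    # Stop m elements early so the last chunk always holds at least one
    # element the previous chunk did not; max(..., 1) keeps one chunk
    # for inputs that are no longer than the overlap.
    stop = max(len(seq) - m, 1)
    return [seq[i:i + n] for i in range(0, stop, n - m)]


print(overlapping_chunks(list("abcdefg"), 3, 1))
# [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']]
```

With m = 1, the last chunk starts at most two positions before the end, so it has at least two elements whenever the input does, which is the requirement stated in the question.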

more_itertools has a windowing tool for overlapping iterables.

Given

import more_itertools as mit


iterable = list("abcdefgh")
iterable
# ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

Code

windows = list(mit.windowed(iterable, n=3, step=2))
windows
# [('a', 'b', 'c'), ('c', 'd', 'e'), ('e', 'f', 'g'), ('g', 'h', None)]

If required, you can drop the None fillvalue by filtering the windows:

# compare with `is not None` so falsy items such as 0 or '' are kept
[[x for x in w if x is not None] for w in windows]
# [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g'], ['g', 'h']]

See also the more_itertools docs for details on more_itertools.windowed.

pylang
  • I really like the fact that it fills the groups with `None`; is there any way to do it with the standard library? – Mattwmaster58 Apr 17 '19 at 02:03
  • @MattM. Sure. Here is an alternative using itertools: `list(itertools.islice(itertools.zip_longest(s, s[1:], s[2:]), None, None, 2))`, where `s = "abcdefgh"`. Notice, `None` is also controlled by a `fillvalue` parameter in `zip_longest`. – pylang Apr 17 '19 at 02:56
  • that's a bit cleaner than my list comprehension version. Thanks. – Mattwmaster58 Apr 17 '19 at 03:56
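
For arbitrary n and step, the standard-library route can be generalised beyond the three-slice one-liner. The sketch below uses only collections.deque and itertools; windowed_stdlib is a hypothetical name, and it is written to mimic the padding behaviour shown above for more_itertools.windowed, not copied from that library. Unlike the one-liner, it does not emit a trailing window such as ('g', None, None) for the 7-item input "abcdefg". I have only checked it against the examples listed here.

```python
from collections import deque
from itertools import chain, islice


def windowed_stdlib(iterable, n, step, fillvalue=None):
    """Yield n-sized tuples whose start positions are `step` apart,
    padding the final tuple with `fillvalue` (hypothetical helper)."""
    if n < 1 or step < 1:
        raise ValueError("n and step must both be at least 1")
    it = iter(iterable)
    window = deque(islice(it, n), maxlen=n)
    if not window:
        return  # empty input: no windows at all
    if len(window) < n:
        # input shorter than one window: pad it and stop
        yield tuple(window) + (fillvalue,) * (n - len(window))
        return
    yield tuple(window)
    # Append just enough padding that a partially filled last window is
    # still completed, but never enough to complete a window with no new items.
    padding = (fillvalue,) * (n - 1 if step >= n else step - 1)
    count = 0
    for item in chain(it, padding):
        window.append(item)  # maxlen=n drops the oldest item automatically
        count += 1
        if count == step:
            yield tuple(window)
            count = 0


print(list(windowed_stdlib("abcdefgh", n=3, step=2)))
# [('a', 'b', 'c'), ('c', 'd', 'e'), ('e', 'f', 'g'), ('g', 'h', None)]
```

Reading the input in step-sized batches into a deque with maxlen=n keeps memory bounded to a single window, so this also works on iterators of unknown length.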
[list_[i:i+n] for i in range(0, len(list_), n-m)]  # Python 3 range; this was xrange in Python 2

Here's what I came up with:

l = [1, 2, 3, 4, 5, 6]
x = zip(l[:-1], l[1:])  # pair each element with the next one
for i in x:
    print(i)

(1, 2)
(2, 3)
(3, 4)
(4, 5)
(5, 6)

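zip accepts any number of iterables; there is also itertools.zip_longest, which pads the shorter iterables with a fill value (None by default) instead of stopping at the shortest one.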
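
The same zip idea extends to the question's size-3 chunks by zipping n shifted copies of the list and keeping every (n - m)-th tuple. This is a sketch built on the answer's approach rather than something the answer itself shows; it also illustrates how zip and itertools.zip_longest differ at the tail.

```python
from itertools import zip_longest

list_ = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
n, m = 3, 1  # window size and overlap, as in the question

# n shifted copies of the list: list_[0:], list_[1:], list_[2:]
shifted = [list_[k:] for k in range(n)]

# zip stops at the shortest input, so the trailing ['g', 'h'] is lost
print(list(zip(*shifted))[::n - m])
# [('a', 'b', 'c'), ('c', 'd', 'e'), ('e', 'f', 'g')]

# zip_longest pads the shorter inputs with None instead
print(list(zip_longest(*shifted))[::n - m])
# [('a', 'b', 'c'), ('c', 'd', 'e'), ('e', 'f', 'g'), ('g', 'h', None)]
```

With zip_longest, a 7-item list would end in ('g', None, None), a window holding only the overlap element, so the padded tail may still need trimming. On Python 3.10 and later, itertools.pairwise(l) yields the same pairs as zip(l[:-1], l[1:]) above.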

Ibolit