Splitting a list based on a delimiter word

Question

I have a list containing various string values. I want to split the list whenever I see WORD. The result should be a list of lists (sublists of the original list), each containing at most one instance of WORD. I can do this with a loop, but is there a more pythonic way to achieve this?

Example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']

result = [['A'], ['WORD','B','C'],['WORD','D']]

This is what I have tried, but it does not achieve what I want, since it puts WORD in a different list than the one it should be in:

def split_excel_cells(delimiter, cell_data):
    result = []
    temp = []

    for cell in cell_data:
        if cell == delimiter:
            temp.append(cell)
            result.append(temp)
            temp = []
        else:
            temp.append(cell)

    return result

score 37 · Answer 1 · edited Oct 10 '16 at 08:00

import itertools

lst = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
w = 'WORD'

spl = [list(y) for x, y in itertools.groupby(lst, lambda z: z == w) if not x]

This creates a split list without the delimiters, which looks more logical to me:

[['A'], ['B', 'C'], ['D']]
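To see why filtering on `x` drops the delimiters, it may help to look at what `groupby` yields before the filter is applied. The following is only an illustrative sketch; the names `runs`, `is_delim`, and `items` are mine and do not appear in the answer. Note that `groupby` is a function in the `itertools` module, not a method of `list`.

```python
import itertools

lst = ['A', 'WORD', 'B', 'C', 'WORD', 'D']
w = 'WORD'

# groupby yields (key, group) pairs for each run of consecutive items
# that share the same key; here the key is "is this item the delimiter?"
runs = [(is_delim, list(items))
        for is_delim, items in itertools.groupby(lst, lambda z: z == w)]
print(runs)
# [(False, ['A']), (True, ['WORD']), (False, ['B', 'C']), (True, ['WORD']), (False, ['D'])]

# Keeping only the runs whose key is False removes the delimiters.
spl = [items for is_delim, items in runs if not is_delim]
print(spl)
# [['A'], ['B', 'C'], ['D']]
```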

If you insist on the delimiters being included, this should do the trick:

spl = [[]]
for x, y in itertools.groupby(lst, lambda z: z == w):
    if x: spl.append([])
    spl[-1].extend(y)
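`groupby` merges adjacent items with the same key into one run, so with the loop above two consecutive delimiters end up in the same sublist. One possible adjustment, offered as a sketch rather than as part of the original answer, is to open a new sublist for every delimiter in a run (the names `is_delim` and `items` are illustrative):

```python
import itertools

lst = ['A', 'WORD', 'WORD', 'B', 'C', 'WORD', 'D']
w = 'WORD'

spl = [[]]
for is_delim, items in itertools.groupby(lst, lambda z: z == w):
    if is_delim:
        # Start one new sublist per delimiter, even when several are adjacent.
        for d in items:
            spl.append([d])
    else:
        spl[-1].extend(items)

print(spl)
# [['A'], ['WORD'], ['WORD', 'B', 'C'], ['WORD', 'D']]
```

As with the loop above, an input that starts with the delimiter leaves an empty list at the front of `spl`, which may need to be filtered out.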

edited Oct 10 '16 at 08:00

Drake Guan


answered Mar 12 '13 at 10:14

georg


Strongly suggest using this answer, as it's much more pythonic with the built-in `itertools` module! – Drake Guan Oct 10 '16 at 08:01
Unfortunately, the second version gives an incorrect result if the delimiter is repeated. – Ilya V. Schurov Oct 08 '17 at 12:18
AttributeError: 'list' object has no attribute 'groupby' – Deepa MG Sep 04 '19 at 10:54

score 23 · Accepted Answer · answered Mar 12 '13 at 09:54

I would use a generator:

def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

ex = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
result = list(group(ex, 'WORD'))
print(result)

This prints

[['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

The code accepts any iterable and produces a generator, which you don't have to convert into a list if you don't want to.
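As a rough illustration of that point, the sketch below feeds `group` a generator instead of a list and consumes the chunks one at a time. The names `cells`, `lengths`, and `leading` are mine. It also shows one edge case worth knowing about: when the input starts with the separator, the first chunk is empty.

```python
def group(seq, sep):
    """Yield sublists of seq, starting a new one at each occurrence of sep."""
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

# The input can be any iterable, here a generator rather than a list.
cells = (c for c in ['A', 'WORD', 'B', 'C', 'WORD', 'D'])

# Each chunk is consumed as soon as it is yielded.
lengths = [len(chunk) for chunk in group(cells, 'WORD')]
print(lengths)
# [1, 3, 2]

# Edge case: a leading separator makes the first chunk empty.
leading = list(group(['WORD', 'A'], 'WORD'))
print(leading)
# [[], ['WORD', 'A']]
```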

answered Mar 12 '13 at 09:54

NPE

Note that if you want to exclude the delimiter from the results, you can add a continue statement inside the if statement in the `group` function. – tjysdsg Aug 06 '19 at 13:35
Note that if you exclude the stop-word, you will yield an empty list if the stop-word is at the end of your input. – norok2 Dec 06 '19 at 12:15
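The two comments above can be sketched together as follows. This is an assumed variant of the `group` function, renamed `group_without_sep` here for illustration; it is not code from the answer itself.

```python
def group_without_sep(seq, sep):
    """Like group(), but the separator is dropped from the output."""
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
            continue  # skip the separator instead of appending it
        g.append(el)
    yield g

# The input ends with the separator, so the final chunk is empty.
result = list(group_without_sep(['A', 'WORD', 'B', 'C', 'WORD'], 'WORD'))
print(result)
# [['A'], ['B', 'C'], []]
```

If empty chunks are unwanted, one option is to filter them afterwards, for example with `[chunk for chunk in result if chunk]`.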

score 3 · Answer 3 · edited Jul 02 '20 at 21:33

@NPE's solution looks very pythonic to me. This is another one using itertools. The code below is written for Python 3; on Python 2, itertools.izip can be used in place of zip.

from itertools import chain
example = ['A', 'WORD', 'B', 'C', 'WORD', 'D']
# positions of every delimiter in the list
indices = [i for i, x in enumerate(example) if x == "WORD"]
# pair each start index with the next start index (None means "to the end")
pairs = zip(chain([0], indices), chain(indices, [None]))
result = [example[i:j] for i, j in pairs]

This code is mainly based on this answer.
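To make the index pairing easier to follow, here is a step-by-step sketch of the same idea in Python 3 with the intermediate values written out. The names `starts` and `stops` are introduced only for illustration.

```python
from itertools import chain

example = ['A', 'WORD', 'B', 'C', 'WORD', 'D']

# Positions of the delimiter in the list.
indices = [i for i, x in enumerate(example) if x == "WORD"]
print(indices)
# [1, 4]

# Each slice starts at 0 or at a delimiter, and stops at the next
# delimiter; a stop of None means "slice to the end of the list".
starts = chain([0], indices)     # 0, 1, 4
stops = chain(indices, [None])   # 1, 4, None
pairs = list(zip(starts, stops))
print(pairs)
# [(0, 1), (1, 4), (4, None)]

result = [example[i:j] for i, j in pairs]
print(result)
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
```

Because the approach relies on slicing, it needs a sequence such as a list, not an arbitrary iterable.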

edited Jul 02 '20 at 21:33

Trenton McKinney


answered Mar 12 '13 at 10:03

A. Rodas


Thanks, I also attempted to split based on indices but wasn't sure how to pair them up. This is a very nice way. – Cemre Mengü Mar 12 '13 at 11:00

score 3 · Answer 4 · answered Jul 14 '18 at 00:06

Given

import more_itertools as mit


iterable = ["A", "WORD", "B" , "C" , "WORD" , "D"]
pred = lambda x: x == "WORD"

Code

list(mit.split_before(iterable, pred))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

more_itertools is a third-party library, installable via `pip install more_itertools`.
