26

I have a list containing various string values. I want to split the list whenever I see WORD. The result should be a list of lists (sublists of the original list), each containing exactly one instance of WORD. I can do this using a loop, but is there a more Pythonic way to achieve this?

Example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']

result = [['A'], ['WORD','B','C'],['WORD','D']]

This is what I have tried, but it does not achieve what I want, since it puts WORD in a different list than it should be in:

def split_excel_cells(delimiter, cell_data):

    result = []

    temp = []

    for cell in cell_data:
        if cell == delimiter:
            temp.append(cell)
            result.append(temp)
            temp = []
        else:
            temp.append(cell)

    return result
Georgy
Cemre Mengü

4 Answers

37
import itertools

lst = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
w = 'WORD'

spl = [list(y) for x, y in itertools.groupby(lst, lambda z: z == w) if not x]

This creates a split list without delimiters, which looks more logical to me:

[['A'], ['B', 'C'], ['D']]

If you insist on the delimiters being included, this should do the trick:

spl = [[]]
for x, y in itertools.groupby(lst, lambda z: z == w):
    if x: spl.append([])
    spl[-1].extend(y)
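
For reference, the loop above can be wrapped in a function and checked against the example from the question. This is only a sketch, and the name `split_keep_delimiter` is made up for illustration. One thing worth knowing: `groupby` merges consecutive delimiters into a single group, so two adjacent 'WORD' entries end up in the same sublist.

```python
import itertools


def split_keep_delimiter(lst, w):
    # Hypothetical helper name; the body is the loop from the answer above.
    spl = [[]]
    for is_delim, items in itertools.groupby(lst, lambda z: z == w):
        if is_delim:
            spl.append([])  # a delimiter run starts a new sublist
        spl[-1].extend(items)
    return spl


print(split_keep_delimiter(['A', 'WORD', 'B', 'C', 'WORD', 'D'], 'WORD'))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

# Adjacent delimiters are grouped together by groupby:
print(split_keep_delimiter(['A', 'WORD', 'WORD', 'B'], 'WORD'))
# [['A'], ['WORD', 'WORD', 'B']]
```

If every delimiter has to start its own sublist, the generator approach in the next answer is probably the safer choice.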
Drake Guan
georg
23

I would use a generator:

def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

ex = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
result = list(group(ex, 'WORD'))
print(result)

This prints

[['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

The code accepts any iterable and produces a generator (which you don't have to convert into a list if you don't want to).
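
To illustrate the point about iterables, here is a small sketch that feeds the same `group` function a generator instead of a list and consumes the chunks one at a time. The `cells` generator is made-up sample data.

```python
def group(seq, sep):
    # Same generator as in the answer above.
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g


# Made-up input: a generator rather than a list, so it can only be read once.
cells = (c for c in ['A', 'WORD', 'B', 'C', 'WORD', 'D'])

chunks = []
for chunk in group(cells, 'WORD'):
    # Each chunk becomes available once the next separator (or the end) is reached.
    chunks.append(chunk)

print(chunks)
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
```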

NPE
  • Note that if you want to exclude the delimiter from the results, you can add a continue statement inside the if statement in the `group` function. – tjysdsg Aug 06 '19 at 13:35
  • Note that if you exclude the stop-word, you will yield an empty list if the stop-word is at the end of your input. – norok2 Dec 06 '19 at 12:15
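
The two comments above can be sketched as follows. `group_without_sep` is a hypothetical name for the variant with the `continue` statement; the second call shows the empty trailing list that appears when the input ends with the separator.

```python
def group_without_sep(seq, sep):
    # Variant of `group` that drops the separator, as suggested in the comment.
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
            continue  # skip appending the separator itself
        g.append(el)
    yield g


print(list(group_without_sep(['A', 'WORD', 'B', 'C', 'WORD', 'D'], 'WORD')))
# [['A'], ['B', 'C'], ['D']]

# Input ending with the separator: the last chunk is empty.
print(list(group_without_sep(['A', 'WORD', 'B', 'WORD'], 'WORD')))
# [['A'], ['B'], []]
```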
3
@NPE's solution looks very Pythonic to me. This is another one using itertools.

Note: izip exists only in Python 2. In Python 3 the built-in zip is already lazy, so the code below uses zip.

from itertools import chain

example = ['A', 'WORD', 'B', 'C', 'WORD', 'D']
# Positions of every delimiter
indices = [i for i, x in enumerate(example) if x == "WORD"]
# Pair each slice start with the next one (None means "to the end of the list")
pairs = zip(chain([0], indices), chain(indices, [None]))
result = [example[i:j] for i, j in pairs]
Trenton McKinney
A. Rodas
3

Given

import more_itertools as mit


iterable = ["A", "WORD", "B" , "C" , "WORD" , "D"]
pred = lambda x: x == "WORD"

Code

list(mit.split_before(iterable, pred))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

more_itertools is a third-party library, installable via `pip install more_itertools`.

See also split_at and split_after.
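
For readers who don't have the library installed, here is a plain-Python sketch of the "split after" idea, where each chunk ends with the delimiter instead of starting with it. It is not the more_itertools implementation, the name `split_after_sketch` is made up, and edge cases may differ from the library.

```python
def split_after_sketch(iterable, pred):
    # Hypothetical stand-in, not the more_itertools implementation.
    chunk = []
    for item in iterable:
        chunk.append(item)
        if pred(item):
            yield chunk  # the delimiter closes the current chunk
            chunk = []
    if chunk:
        yield chunk  # leftover items after the last delimiter


data = ["A", "WORD", "B", "C", "WORD", "D"]
print(list(split_after_sketch(data, lambda x: x == "WORD")))
# [['A', 'WORD'], ['B', 'C', 'WORD'], ['D']]
```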

pylang