
I want to split a string by a list of indices, where the split segments begin with one index and end before the next one.

Example:

s = 'long string that I want to split up'
indices = [0,5,12,17]
parts = [s[index:] for index in indices]
for part in parts:
    print part

This prints:

long string that I want to split up
string that I want to split up
that I want to split up
I want to split up

I'm trying to get:

long
string
that
I want to split up

smci
Yarin

3 Answers

s = 'long string that I want to split up'
indices = [0,5,12,17]
parts = [s[i:j] for i,j in zip(indices, indices[1:]+[None])]

returns

['long ', 'string ', 'that ', 'I want to split up']

which you can print using:

print '\n'.join(parts)

Another possibility (without copying indices) would be:

s = 'long string that I want to split up'
indices = [0,5,12,17]
indices.append(None)
parts = [s[indices[i]:indices[i+1]] for i in xrange(len(indices)-1)]
eumiro
  • Another way is `[s[i:j] for i,j in izip_longest(indices,indices[1:])]`, but I like your way better! – jamylak Jun 01 '12 at 13:51
  • This copies the indices list with `indices[1:]` and creates a new list with double size by the `zip` function -> Bad performance and memory consumption. – schlamar Jun 01 '12 at 13:58
  • @ms4py This is fine, performance is not an issue in this case, this is a very readable solution. If performance is an issue my suggestion can be used. – jamylak Jun 01 '12 at 14:01
  • eumiro- thank you, this works great. Can you explain how the +[None] part works? – Yarin Jun 01 '12 at 14:06
  • @ms4py - ok, there's an updated version without copying of the list and without zip. Although your `itertools` version is probably more performant. – eumiro Jun 01 '12 at 14:06
  • @Yarin - `indices[1:] + [None]` copies the list without the first element and adds a `None` at the end. So for your `indices` it looks like `[5,12,17,None]`. I am using it to be able to access the last part of the string with `s[17:None]` (the same as `s[17:]`, just using two variables I have anyway). – eumiro Jun 01 '12 at 14:08
  • @Yarin `[1:None]` for example is the same as `[1:]` – jamylak Jun 01 '12 at 14:08
  • @ms4py What do you mean by that? – jamylak Jun 01 '12 at 14:11
  • Not sure it's your forte, but how would one do this in NodeJs? – lonewarrior556 Apr 23 '20 at 15:35
  • This had been hectic for me for an hour and a half. Thanks @eumiro – Siva Sankar Apr 17 '22 at 13:29
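
The `+ [None]` idea explained in the comments above can be sketched in Python 3 as follows. This is only an illustrative sketch: the helper name `split_at` is hypothetical and not part of the answer, and it assumes `indices` is a list of ascending start offsets.

```python
def split_at(s, indices):
    """Split s at the given start offsets (hypothetical helper name)."""
    # Pair each start with the following start. The last start is paired
    # with None, and s[i:None] is the same slice as s[i:].
    ends = indices[1:] + [None]
    return [s[i:j] for i, j in zip(indices, ends)]


s = 'long string that I want to split up'
parts = split_at(s, [0, 5, 12, 17])
print('\n'.join(parts))
```

Note that the segments keep their trailing space, exactly as in the answer's output.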

Here is a short solution that relies heavily on the itertools module. The tee function is used to iterate pairwise over the indices. See the Recipes section of the itertools documentation for more help.

>>> from itertools import tee, izip_longest
>>> s = 'long string that I want to split up'
>>> indices = [0,5,12,17]
>>> start, end = tee(indices)
>>> next(end)
0
>>> [s[i:j] for i,j in izip_longest(start, end)]
['long ', 'string ', 'that ', 'I want to split up']

Edit: This is a version that does not copy the indices list, so it should be faster.

flywire
schlamar
  • Thanks for the alt approach- I'll have to check out itertools sometime – Yarin Jun 01 '12 at 14:10
  • Neat approach, learned something new. Is there an easy way to get rid of the extra blank at the end of the first 3 strings inside the expression? I tried `s[i:j].strip()` but that didn't work at all (not sure why not) – Levon Jun 01 '12 at 14:11
  • If you are gonna use this you may as well use the pairwise function straight from the itertools docs. Also using `next(end)` is preferred to `end.next()` for python 3 compatibility. – jamylak Jun 01 '12 at 14:34
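
For readers on Python 3, where `izip_longest` is named `zip_longest`, the same tee-based pairing might look like the sketch below. The helper name `pairwise_longest` is hypothetical, and calling `.rstrip()` on each slice is one possible way to drop the trailing blank asked about in the comments; neither is taken from the answer itself.

```python
from itertools import tee, zip_longest


def pairwise_longest(iterable):
    """Yield (x0, x1), (x1, x2), ..., (xn, None) without copying the input."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one position
    return zip_longest(a, b)  # the missing final partner is filled with None


s = 'long string that I want to split up'
indices = [0, 5, 12, 17]
# rstrip() removes the trailing space that each slice would otherwise keep.
parts = [s[i:j].rstrip() for i, j in pairwise_longest(indices)]
print(parts)
```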

You can write a generator if you don't want to make any modifications to the list of indices:

>>> def split_by_idx(S, list_of_indices):
...     left, right = 0, list_of_indices[0]
...     yield S[left:right]
...     left = right
...     for right in list_of_indices[1:]:
...         yield S[left:right]
...         left = right
...     yield S[left:]
... 
>>> s = 'long string that I want to split up'
>>> indices = [5,12,17]
>>> [i for i in split_by_idx(s, indices)]
['long ', 'string ', 'that ', 'I want to split up']
Zhou Shao
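
Note that the generator above is called with `indices = [5,12,17]`, without the leading `0` used in the question; with a leading `0` it would yield an empty first segment, and with an empty list it would raise an `IndexError`. A possible variant that tolerates both cases is sketched below. It assumes the indices are sorted in ascending order, and the name `split_by_idx_safe` is hypothetical.

```python
def split_by_idx_safe(s, indices):
    """Yield the pieces of s cut at each index (assumes ascending indices)."""
    left = 0
    for right in indices:
        # Skip cut points that would not advance, such as a leading 0,
        # so that no empty segment is produced for them.
        if right > left:
            yield s[left:right]
            left = right
    yield s[left:]  # whatever remains after the last cut point


s = 'long string that I want to split up'
print(list(split_by_idx_safe(s, [0, 5, 12, 17])))
print(list(split_by_idx_safe(s, [5, 12, 17])))
```

Both calls produce the same four segments, and the input list is never modified or copied.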