
I'm trying to split a string at the positions given in a list and collect the parts in a new list. I start with:

seq = 'ATCGATCGATCG'
seq_new = []
seq_cut = [2, 8, 10]

I would like to get:

seq_new = ['AT', 'CGATCG', 'AT', 'CG'] 

The list with the positions is variable in size and values. How can I process my data like this?

davidism

2 Answers


Use zip to create indexes for slicing:

seq_new = [seq[start:end] for start, end in zip([None] + seq_cut, seq_cut + [None])]

This zips [None, 2, 8, 10] with [2, 8, 10, None] to produce the index pairs [(None, 2), (2, 8), (8, 10), (10, None)]. In a slice, None as the start index means the beginning of the sequence, and None as the end index means the end of the sequence being sliced.
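As a quick check of the explanation above, here is a minimal sketch that builds the index pairs explicitly before slicing. The intermediate names `starts`, `ends` and `bounds` are only illustrative and are not part of the original answer.

```python
seq = 'ATCGATCGATCG'
seq_cut = [2, 8, 10]

# Shift the cut positions against each other to get (start, end) pairs.
starts = [None] + seq_cut   # [None, 2, 8, 10]
ends = seq_cut + [None]     # [2, 8, 10, None]
bounds = list(zip(starts, ends))

# A None bound in a slice means "from the beginning" or "to the end".
seq_new = [seq[start:end] for start, end in bounds]

print(bounds)   # [(None, 2), (2, 8), (8, 10), (10, None)]
print(seq_new)  # ['AT', 'CGATCG', 'AT', 'CG']
```

Note that this assumes `seq_cut` is sorted in ascending order; unsorted positions would produce empty or overlapping pieces.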

Steven Rumbalski
  • Darn it - was just going to copy/paste from my editor almost exactly the same :p – Jon Clements Mar 13 '15 at 17:20
  • For added symmetry you can use `None` on the first one (like I just did! :P) – DSM Mar 13 '15 at 17:20
  • This can be made even more elegant by using a modified version of the "pairwise" recipe (left as exercise to the reader) described in [this post](http://stackoverflow.com/a/21303303/4621513)! The resulting expression would be `[seq[start:end] for start, end in pairwise(seq_cut)]` – mkrieger1 Mar 13 '15 at 17:52
  • @mkrieger1: An implementation of `pairwise` can be found in the recipes section of the `itertools` docs. I don't think it's worth it here. The code is roughly the same: `seq_new = [seq[start:end] for start, end in pairwise([None] + seq_cut + [None])]` – Steven Rumbalski Mar 13 '15 at 18:01
  • Yes, I also think it's not worth it unless it's going to be used more than only a few times. And I meant to append `None` in the front and back inside a modified version of `pairwise`, to hide that little detail from the user. – mkrieger1 Mar 13 '15 at 21:18
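The `pairwise` variant discussed in these comments could look roughly like the sketch below. `pairwise` follows the well-known `itertools` recipe; `padded_pairwise` is a hypothetical name for the "modified version" that hides the `None` padding, not something defined in the thread.

```python
from itertools import tee

def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one element
    return zip(a, b)

def padded_pairwise(positions):
    """Hypothetical helper: pairwise with None added at both ends."""
    return pairwise([None, *positions, None])

seq = 'ATCGATCGATCG'
seq_cut = [2, 8, 10]

seq_new = [seq[start:end] for start, end in padded_pairwise(seq_cut)]
print(seq_new)  # ['AT', 'CGATCG', 'AT', 'CG']
```

On Python 3.10 and later, `itertools.pairwise` can replace the hand-written recipe.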

Use slicing:

seq = "ATCGATCGATCG"
seq_new = []
seq_cut = [2, 8, 10]

last = 0
for idx in seq_cut:
    seq_new.append(seq[last:idx])
    last = idx
seq_new.append(seq[last:])
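Since the list of positions varies in size and values, the same loop can be wrapped in a reusable function. This is only a sketch: the name `split_at` is hypothetical, and it assumes the cut positions are sorted in ascending order.

```python
def split_at(seq, cuts):
    """Split seq at each position in cuts (assumed sorted ascending)."""
    parts = []
    last = 0
    for idx in cuts:
        parts.append(seq[last:idx])  # piece between previous cut and this one
        last = idx
    parts.append(seq[last:])         # remainder after the final cut
    return parts

print(split_at('ATCGATCGATCG', [2, 8, 10]))  # ['AT', 'CGATCG', 'AT', 'CG']
print(split_at('ATCGATCGATCG', []))          # ['ATCGATCGATCG']
```

With an empty list of positions the function returns the whole string as a single piece, because only the final `append` runs.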
orlp
  • Python slicing syntax can be daunting to people unfamiliar with it, but it's super powerful (+1) https://docs.python.org/2.3/whatsnew/section-slices.html – TeneCursum Mar 13 '15 at 17:14