530

Is it possible to split a string every nth character?

For example, suppose I have a string containing the following:

'1234567890'

How can I get it to look like this:

['12','34','56','78','90']
Georgy
Brandon L Burnett
  • 1
    The list equivalent of this question: [How do you split a list into evenly sized chunks?](https://stackoverflow.com/q/312443/6045800) (while some answers overlap and apply for both, there are some unique for each) – Tomerikoo Dec 28 '21 at 15:05

17 Answers

718
>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']
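The same slice-by-step idea can be wrapped in a small helper. This is only a sketch: the name `chunk_string` is hypothetical and not part of the answer above. It also shows that a remainder shorter than `n` is kept as a final, shorter chunk.

```python
def chunk_string(text, n):
    """Split text into pieces of n characters; the last piece may be shorter."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # range(0, len(text), n) yields the start index of every chunk.
    return [text[i:i + n] for i in range(0, len(text), n)]


print(chunk_string('1234567890', 2))  # ['12', '34', '56', '78', '90']
print(chunk_string('123456789', 2))   # ['12', '34', '56', '78', '9']
print(chunk_string('', 2))            # []
```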
satomacoto
  • 1
    @TrevorRudolph It only does exactly what you tell it. The above answer is really only just a for loop but expressed pythonically. Also, if you need to remember a "simplistic" answer, there are at least hundreds of thousands of ways to remember them: starring the page on stackoverflow; copying and then pasting into an email; keeping a "helpful" file with stuff you want to remember; simply using a modern search engine whenever you need something; using bookmarks in (probably) every web browser; etc. – dylnmc Nov 02 '14 at 04:03
  • It is easier to understand but it has the downside that you must reference 'line' twice. – Damien Jan 05 '16 at 14:33
  • 1
    Great for breaking up long lines for printing, e.g. ``for i in range(0, len(string), n): print(string[i:i+n])`` – PatrickT Aug 06 '21 at 06:12
  • follows the philosophy, keeping it simple; that's pythonic elegance! – Minhaj Dec 16 '21 at 10:24
292

Just to be complete, you can do this with a regex:

>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']

For an odd number of characters, you can do this:

>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']

You can also do the following, to simplify the regex for longer chunks:

>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']

And you can use re.finditer if the string is long to generate chunk by chunk.
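A minimal sketch of the `re.finditer` variant, assuming you want a lazy generator rather than a full list. The function name `iter_chunks` is hypothetical. The `re.DOTALL` flag (the same flag as `re.S`) is included so that `.` also matches newline characters.

```python
import re


def iter_chunks(text, n):
    """Lazily yield successive pieces of up to n characters from text."""
    # re.DOTALL lets '.' match newlines, so they are not silently skipped.
    pattern = re.compile(r'.{1,%d}' % n, flags=re.DOTALL)
    for match in pattern.finditer(text):
        yield match.group(0)


print(list(iter_chunks('123456789', 2)))  # ['12', '34', '56', '78', '9']
print(list(iter_chunks('ab\ncd', 2)))     # ['ab', '\nc', 'd']
```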

Georgy
the wolf
  • 9
    This is by far the best answer here and deserves to be on top. One could even write `'.'*n` to make it more clear. No joining, no zipping, no loops, no list comprehension; just find the next two characters next to each other, which is exactly how a human brain thinks about it. If Monty Python were still alive, he'd love this method! – SO_fix_the_vote_sorting_bug Dec 12 '18 at 01:27
  • 1
    This is the fastest method for reasonably long strings too: https://gitlab.com/snippets/1908857 – Ralph Bolton Oct 30 '19 at 16:03
  • 10
    This won't work if the string contains newlines. This needs `flags=re.S`. – Aran-Fey Nov 14 '19 at 17:17
  • 1
    Yeah this is not a good answer. Regexes have so many gotchas (as Aran-Fey found!) that you should use them *very sparingly*. You definitely don't need them here. They're only faster because they're implemented in C and Python is crazy slow. – Timmmm Mar 22 '22 at 15:17
  • This is fast but more_itertools.sliced seems more efficient. – FifthAxiom Jun 01 '22 at 04:42
235

Python's standard library already has a function for this:

>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']

This is what the docstring for wrap says:

>>> help(wrap)
'''
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''
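Because `wrap` is designed for laying out text, it treats `width` as a maximum and handles whitespace specially. The sketch below compares it with plain slicing on a string that contains spaces. `drop_whitespace` and `break_on_hyphens` are real `TextWrapper` options; the variable names are just for illustration.

```python
from textwrap import wrap

text = '0 1 2 3 4 5'

# Default behaviour: whitespace at the edges of each line is dropped.
stripped = wrap(text, 2)
print(stripped)  # ['0', '1', '2', '3', '4', '5']

# Keep whitespace and do not treat hyphens as break points.
kept = wrap(text, 2, drop_whitespace=False, break_on_hyphens=False)
print(kept)

# Plain slicing for comparison: every chunk has exactly 2 characters,
# except possibly the last one.
sliced = [text[i:i + 2] for i in range(0, len(text), 2)]
print(sliced)  # ['0 ', '1 ', '2 ', '3 ', '4 ', '5']
```

If exact fixed-width chunks are the goal, slicing is the more predictable choice; `wrap` fits best when the input has no whitespace or when display-oriented wrapping is what you want.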
Diptangsu Goswami
  • 3
    print(wrap('12345678', 3)) splits the string into groups of 3 digits, but starts in front and not behind. Result: ['123', '456', '78'] – Atalanttore May 20 '19 at 19:20
  • 4
    It is interesting to learn about 'wrap' yet it is not doing exactly what was asked above. It is more oriented towards displaying text, rather than splitting a string to a fixed number of characters. – Oren Jun 05 '19 at 15:21
  • 6
    `wrap` may not return what is asked for if the string contains space. e.g. `wrap('0 1 2 3 4 5', 2)` returns `['0', '1', '2', '3', '4', '5']` (the elements are stripped) – satomacoto Jun 20 '19 at 09:22
  • 3
    This indeed answers the question, but what happens if there's spaces and you want them maintained in the split characters? wrap() removes spaces if they fall straight after a split group of characters – Iron Attorney Jul 05 '19 at 18:56
  • 2
    This works poorly if you want to split text with hyphens (the number you give as argument is actually the MAXIMUM number of characters, not exact one, and it breaks i.e. on hyphens and white spaces). – MrVocabulary Aug 06 '19 at 14:11
  • `wrap()` appears to be pretty slow (and much slower than say the regex solution): https://gitlab.com/snippets/1908857 – Ralph Bolton Oct 30 '19 at 16:01
  • I can't believe this isn't the top answer. This is by far the best solution! – EmilioAK Jun 29 '20 at 00:10
  • 1
    you can use `drop_whitespace=False` and `break_on_hyphens=False` to prevent the issues stated by satomacoto and MrVocabulary. See the [full documentation](https://docs.python.org/3/library/textwrap.html#textwrap.TextWrapper) – bmurauer Mar 25 '21 at 08:40
  • 1
    @Atalanttore Just do the following: `".".join(wrap(str(12345678)[::-1], 3))[::-1]` and you end up with `12.345.678`. – Gilfoyle May 02 '22 at 07:43
  • This is so slow. more_itertools.sliced and re.findall are much faster. – FifthAxiom Jun 01 '22 at 04:38
92

Another common way of grouping elements into n-length groups (in Python 3, `map` returns an iterator, so it is wrapped in `list`):

>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))
['12', '34', '56', '78', '90']

This method comes straight from the docs for zip().
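One caveat with this idiom: `zip()` stops as soon as any input is exhausted, so a trailing group with fewer than n characters is dropped. The sketch below contrasts it with `itertools.zip_longest` and `fillvalue=''`, which keeps the tail. The helper names are hypothetical.

```python
from itertools import zip_longest


def chunks_strict(s, n):
    # [iter(s)] * n holds n references to the SAME iterator, so each
    # tuple produced by zip consumes n consecutive characters.
    return list(map(''.join, zip(*[iter(s)] * n)))


def chunks_keep_tail(s, n):
    # zip_longest pads the incomplete last group with '' instead of dropping it.
    return list(map(''.join, zip_longest(*[iter(s)] * n, fillvalue='')))


print(chunks_strict('123456789', 2))     # ['12', '34', '56', '78']
print(chunks_keep_tail('123456789', 2))  # ['12', '34', '56', '78', '9']
```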

Andrew Clark
  • 2
    In [19]: a = "hello world"; list( map( "".join, zip(*[iter(a)]*4) ) ) get the result ['hell', 'o wo']. – truease.com Apr 18 '13 at 15:54
  • 20
    If someone finds `zip(*[iter(s)]*2)` tricky to understand, read [How does `zip(*[iter(s)]*n)` work in Python?](http://stackoverflow.com/questions/2233204/how-does-zipitersn-work-in-python). – Grijesh Chauhan Jan 11 '14 at 14:49
  • 17
    This does not account for an odd number of chars, it'll simply drop those chars: `>>> map(''.join, zip(*[iter('01234567')]*5))` -> `['01234']` – Bjorn Sep 15 '14 at 19:39
  • 4
    To also handle odd number of chars just replace `zip()` with `itertools.zip_longest()`: `map(''.join, zip_longest(*[iter(s)]*2, fillvalue=''))` – user222758 Jun 08 '17 at 07:44
  • Also useful: docs for [`map()`](https://docs.python.org/3/library/functions.html#map) – winklerrr Apr 23 '19 at 11:17
73

I think this is shorter and more readable than the itertools version:

def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]

print(list(split_by_n('1234567890', 2)))
Diptangsu Goswami
Russell Borogove
  • 8
    but not really efficient: when applied to strings: too many copies – Eric Aug 27 '15 at 21:17
  • 1
    It also doesn't work if seq is a generator, which is what the itertools version is _for_. Not that OP asked for that, but it's not fair to criticize itertool's version not being as simple. – mikenerone Jun 28 '17 at 20:47
35

Using more-itertools from PyPI:

>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']
Tim Diels
31

I like this solution:

s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]
vlk
17

You could use the grouper() recipe from itertools:

Python 2.x:

from itertools import izip_longest    

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Python 3.x:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

These functions are memory-efficient and work with any iterables.
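Note that `grouper` yields tuples of characters, not strings. A hedged usage sketch for the string case, using the Python 3 version and passing `fillvalue=''` so that an incomplete last group joins cleanly:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Join each tuple of characters back into a string.
print([''.join(g) for g in grouper('1234567890', 2, fillvalue='')])
# ['12', '34', '56', '78', '90']

# With a visible fill value the last group is padded.
print([''.join(g) for g in grouper('ABCDEFG', 3, fillvalue='x')])
# ['ABC', 'DEF', 'Gxx']

# It also works on a generator, which cannot be sliced.
digits = (str(d) for d in range(7))
print([''.join(g) for g in grouper(digits, 3, fillvalue='')])
# ['012', '345', '6']
```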

Eugene Yarmash
12

This can be achieved by a simple for loop.

a = '1234567890a'
result = []

for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)

The output looks like ['12', '34', '56', '78', '90', 'a']

Sunil Purushothaman
Kasem777
  • 3
    While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – β.εηοιτ.βε May 22 '20 at 18:41
  • 4
    This is the same solution as here: https://stackoverflow.com/a/59091507/7851470 – Georgy May 22 '20 at 20:23
  • 1
    This is the same solution as the top voted answer - except for the fact that the top answer is using list comprehension. – Leonardus Chen Dec 07 '20 at 04:54
9

I was stuck in the same scenario.

This worked for me:

x = "1234567890"
n = 2
chunks = []  # not named `list`, which would shadow the built-in type
for i in range(0, len(x), n):
    chunks.append(x[i:i+n])
print(chunks)

Output

['12', '34', '56', '78', '90']
Strick
8

Try the following code:

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

s = '1234567890'
print(list(split_every(2, list(s))))  # each piece is a list of characters
enderskill
  • Your answer doesn't meet OP's requirement, you have to use `yield ''.join(piece)` to make it work as expected: https://eval.in/813878 – user222758 Jun 08 '17 at 08:15
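Following the comment above, here is a sketch of the same `islice` approach that yields strings instead of lists. The name `split_every_str` is hypothetical, and it assumes the iterable produces strings (for example the characters of a `str`).

```python
from itertools import islice


def split_every_str(n, iterable):
    """Yield successive strings of up to n items taken from iterable."""
    it = iter(iterable)
    piece = ''.join(islice(it, n))
    while piece:  # an empty string means the iterator is exhausted
        yield piece
        piece = ''.join(islice(it, n))


print(list(split_every_str(2, '1234567890')))  # ['12', '34', '56', '78', '90']
print(list(split_every_str(3, '12345678')))    # ['123', '456', '78']
```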
7

Try this:

s='1234567890'
print([s[idx:idx+2] for idx,val in enumerate(s) if idx%2 == 0])

Output:

['12', '34', '56', '78', '90']
U12-Forward
6
>>> from functools import reduce
>>> from operator import add
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]
['123', '456', '789']
ben w
5

As always, for those who love one-liners:

n = 2
line = "this is a line split into n characters"
line = [line[i * n:i * n+n] for i,blah in enumerate(line[::n])]
Sqripter
  • When I run this in Python Fiddle with a `print(line)` I get `this is a line split into n characters` as the output. Might you be better putting: `line = [line[i * n:i * n+n] for i,blah in enumerate(line[::n])]`? Fix this and it's a good answer :). – Peter David Carter May 20 '16 at 20:24
  • Can you explain the `,blah` and why it's necessary? I notice I can replace `blah` with any alpha character/s, but not numbers, and can't remove the `blah` or/and the comma. My editor suggests adding whitespace after `,` :s – toonarmycaptain Jul 17 '17 at 20:11
  • `enumerate` yields (index, item) pairs, so you need two names to unpack them. But you don't actually need the item for anything in this case. – Daniel F Jul 27 '17 at 09:18
  • 1
    Rather than `blah` I prefer to use an underscore or double underscore, see: https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python – Andy Royal Aug 15 '17 at 10:39
3

A simple recursive solution for short strings:

def split(s, n):
    if len(s) < n:
        return []
    else:
        return [s[:n]] + split(s[n:], n)

print(split('1234567890', 2))

Or in such a form:

def split(s, n):
    if len(s) < n:
        return []
    elif len(s) == n:
        return [s]
    else:
        return split(s[:n], n) + split(s[n:], n)

This second form shows the typical divide-and-conquer pattern of a recursive approach more explicitly, though in practice it is not necessary to write it this way.
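Both versions above return an empty list once fewer than `n` characters remain, so a trailing remainder is dropped (for example, the final '9' of '123456789' with n = 2). A sketch of a variant that keeps the remainder, with the hypothetical name `split_keep_tail`:

```python
def split_keep_tail(s, n):
    """Recursively split s into chunks of n characters, keeping a short tail."""
    if not s:
        return []
    # s[:n] is the next chunk; recurse on whatever is left.
    return [s[:n]] + split_keep_tail(s[n:], n)


print(split_keep_tail('1234567890', 2))  # ['12', '34', '56', '78', '90']
print(split_keep_tail('123456789', 2))   # ['12', '34', '56', '78', '9']
```

Each chunk adds one stack frame, so this is only suitable for short strings; long inputs will hit Python's recursion limit.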

englealuze
2

more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:

import more_itertools as mit

s = "1234567890"

# Recent more_itertools releases take the iterable first: grouper(iterable, n)
["".join(c) for c in mit.grouper(s, 2)]

["".join(c) for c in mit.chunked(s, 2)]

["".join(c) for c in mit.windowed(s, 2, step=2)]

# Splits after every even digit, so this one only works for this particular input
["".join(c) for c in mit.split_after(s, lambda x: int(x) % 2 == 0)]

Each of these options produces the following output:

['12', '34', '56', '78', '90']

Documentation for discussed options: grouper, chunked, windowed, split_after

pylang
2

A solution with groupby:

from itertools import groupby, chain, repeat, cycle

text = "wwworldggggreattecchemggpwwwzaz"
n = 3
c = cycle(chain(repeat(0, n), repeat(1, n)))
res = ["".join(g) for _, g in groupby(text, lambda x: next(c))]
print(res)

Output:

['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']
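The key function above works because `groupby` calls it once per character, and the cycle emits n zeros followed by n ones, so the key flips every n characters. A sketch of an alternative that avoids the stateful key by grouping on `index // n` instead (the name `chunk_by_index` is hypothetical):

```python
from itertools import groupby


def chunk_by_index(text, n):
    """Group characters whose index falls in the same block of n."""
    return [
        ''.join(ch for _, ch in group)
        for _, group in groupby(enumerate(text), key=lambda pair: pair[0] // n)
    ]


print(chunk_by_index("wwworldggggreattecchemggpwwwzaz", 3))
# ['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']
```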
TigerTV.ru