593

I'm looking for the Python equivalent of

String str = "many   fancy word \nhello    \thi";
String whiteSpaceRegex = "\\s+";
String[] words = str.split(whiteSpaceRegex);

["many", "fancy", "word", "hello", "hi"]
Martin Thoma
siamii

4 Answers

1092

Calling str.split() without an argument splits on runs of whitespace and discards leading and trailing whitespace:

>>> "many   fancy word \nhello    \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']
Boris Verkhovskiy
Sven Marnach
  • Also good to know: if you want to split off the first word only (which means passing `1` as the second argument), you can use `None` as the first argument: `s.split(None, 1)` – yak Nov 13 '11 at 19:00
  • If you only want the first word, use `str.partition`. – Raymond Hettinger Nov 13 '11 at 19:11
  • @yak: Could you please edit your comment? As written, it sounds as if `s.split(None, 1)` returns the first word only. It actually returns a list of two items: the first word and the rest of the string. `s.split(None, 1)[0]` returns the first word only. – user3527975 Feb 25 '16 at 21:43
  • Also, the default split trims whitespace from both ends, so you don't have to call str.strip(), e.g. `" asdf asdf \t\n ".split()` returns `['asdf', 'asdf']` – lee penkman Nov 24 '16 at 01:32
  • Does `str.split()` do something like `re.split('\s+', string)` behind the scenes? – galois Dec 20 '16 at 23:05
  • @galois No, it uses a custom implementation (which is faster). Also note that it handles leading and trailing whitespace differently. – Sven Marnach Dec 21 '16 at 07:53
  • Sven, in my case a line could contain quoted names like `'Kishor Pawar' 'Sven Marnach'`. What would you suggest? – Kishor Pawar Jan 02 '19 at 07:46
  • @KishorPawar It's rather unclear to me what you are trying to achieve. Do you want to split on whitespace, but disregard whitespace inside single-quoted substrings? If so, you can look into [`shlex.split()`](https://docs.python.org/3/library/shlex.html#shlex.split), which may be what you are looking for. Otherwise I suggest asking a new question – you will get a much quicker and more detailed answer. – Sven Marnach Jan 02 '19 at 10:12
  • Thank you @SvenMarnach. You guessed the case correctly. I will take a look at shlex.split() – Kishor Pawar Jan 02 '19 at 10:22
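The three suggestions from the comments above can be sketched side by side. This is only an illustration: the variable names (`first_and_rest`, `head`, `tail`, `names`) are made up here, and the quoted input is the one from Kishor Pawar's comment.

```python
import shlex

s = "many   fancy word \nhello    \thi"

# split(None, 1): split on whitespace at most once.
# The result is a two-item list: [first word, rest of the string].
first_and_rest = s.split(None, 1)
first_word = first_and_rest[0]

# partition() takes an explicit separator and splits at its first
# occurrence only; it does not collapse runs of whitespace.
head, sep, tail = s.partition(" ")

# shlex.split() applies shell-like rules, so quoted substrings stay together.
names = shlex.split("'Kishor Pawar' 'Sven Marnach'")

print(first_and_rest)     # ['many', 'fancy word \nhello    \thi']
print(first_word)         # many
print((head, sep, tail))  # ('many', ' ', '  fancy word \nhello    \thi')
print(names)              # ['Kishor Pawar', 'Sven Marnach']
```

Note that `partition(" ")` leaves the extra spaces at the start of the tail, so it is a good fit for grabbing the first word but not for tokenizing the rest.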
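82

import re
s = "many   fancy word \nhello    \thi"
# Use a raw string so the backslash in \s reaches the regex engine unchanged.
re.split(r'\s+', s)  # ['many', 'fancy', 'word', 'hello', 'hi']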
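One caveat, as a minimal sketch: when the input has leading or trailing whitespace, `re.split` returns empty strings at the ends, whereas `str.split()` does not. The `padded` string below is a made-up example, not the one from the question.

```python
import re

padded = "  many   fancy word \n"

# str.split() with no argument ignores leading and trailing whitespace.
with_str_split = padded.split()

# re.split() treats the leading and trailing runs as separators too,
# which leaves an empty string at each end of the result.
with_re_split = re.split(r"\s+", padded)

# Stripping first makes the two agree for this input.
with_re_split_stripped = re.split(r"\s+", padded.strip())

print(with_str_split)          # ['many', 'fancy', 'word']
print(with_re_split)           # ['', 'many', 'fancy', 'word', '']
print(with_re_split_stripped)  # ['many', 'fancy', 'word']
```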
Óscar López
28

Using split() is the most Pythonic way to split a string on whitespace.

It's also useful to remember that if you call split() on a string that contains no whitespace, you get back a list containing just that string.

Example:

>>> "ark".split()
['ark']
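A few related edge cases, sketched with made-up variable names: an empty or whitespace-only string yields an empty list, while passing an explicit separator behaves differently.

```python
no_whitespace = "ark".split()       # the whole string, in a list
empty = "".split()                  # nothing to return
only_whitespace = " \t\n".split()   # whitespace alone yields no words
explicit_sep = "".split(" ")        # an explicit separator keeps the empty string

print(no_whitespace)    # ['ark']
print(empty)            # []
print(only_whitespace)  # []
print(explicit_sep)     # ['']
```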
digitalnomd
21

Another method uses the re module. It does the reverse operation: it matches all the words instead of splitting the whole sentence on whitespace.

>>> import re
>>> s = "many   fancy word \nhello    \thi"
>>> re.findall(r'\S+', s)
['many', 'fancy', 'word', 'hello', 'hi']

The regex above matches runs of one or more non-whitespace characters.

Avinash Raj