1

I have a string that I need to parse. It contains a number of words separated by hyphens (-), and it also ends with a hyphen.

For example: one-two-three-

Now, if I want to look at the words on their own, I split the string into a list.

wordstring = "one-two-three-"
wordlist = wordstring.split('-')

for word in wordlist:
    print(word)

Output

one
two
three
#empty element

What I don't understand is why the final element of the resulting list is an empty string. How can I omit this empty element?

Should I simply truncate the list or is there a better way to split the string?

strpeter
SaAtomic
  • 1
    Possible duplicate of [python split function -avoids last empy space](http://stackoverflow.com/questions/10780423/python-split-function-avoids-last-empy-space) – Chris_Rands Feb 15 '17 at 14:17

9 Answers

4

You get an empty string because splitting on the last - character produces an empty string on its right-hand side. You can strip all leading and trailing '-' characters from the string before splitting:

wordlist = wordstring.strip('-').split('-')
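
As a small sketch of what strip('-') does at both ends: the input below is a hypothetical variant of the question's string with an extra leading hyphen, which the question itself does not mention. strip('-') removes hyphens from both ends, while rstrip('-') removes only the trailing ones, so it may be the closer fit if a leading hyphen should still produce an empty first element.

```python
# Hypothetical input: same as the question's string, plus a leading hyphen
wordstring = "-one-two-three-"

# strip('-') removes hyphens from both ends before splitting
both_ends = wordstring.strip('-').split('-')

# rstrip('-') removes only trailing hyphens, so the leading one still yields ''
right_only = wordstring.rstrip('-').split('-')

print(both_ends)
print(right_only)
```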
Moses Koledoye
3

If the final character is always a -, you can drop it with the slice [:-1], which takes every character of the string except the last one.

Then, proceed to split it as you did:

wordlist = wordstring[:-1].split('-')
print(wordlist)
['one', 'two', 'three']
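
If the trailing hyphen is not guaranteed, slicing unconditionally would cut off the last letter of the final word. A minimal sketch of a guarded version follows; the helper name split_words and its sep parameter are made up for illustration and are not part of the answer above.

```python
def split_words(wordstring, sep='-'):
    """Split on sep, first dropping one trailing sep if it is present."""
    # Only slice when the string really ends with the separator
    if wordstring.endswith(sep):
        wordstring = wordstring[:-len(sep)]
    return wordstring.split(sep)

print(split_words("one-two-three-"))
print(split_words("one-two-three"))
```

Both calls give the same three words, because the slice is applied only when the separator is actually there.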
Dimitris Fasarakis Hilliard
2

You can use a regex to do this:

import re
wordlist = re.findall("[a-zA-Z]+(?=-)", wordstring)

Output:

['one', 'two', 'three']
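
To see how the pattern behaves on less tidy input, here is a rough comparison with the "[^-]+" pattern suggested in the comments. The input string is a hypothetical one containing a digit and repeated hyphens; the question does not say which characters are legal in a word.

```python
import re

# Hypothetical input: one word contains a digit, and there are repeated hyphens
wordstring = "one-two2---three-"

# Pattern from the answer: a run of ASCII letters that is followed by a hyphen
letters_only = re.findall(r"[a-zA-Z]+(?=-)", wordstring)

# Pattern from the comment: any run of characters that are not hyphens
non_hyphen = re.findall(r"[^-]+", wordstring)

print(letters_only)
print(non_hyphen)
```

The letters-only pattern skips "two2" entirely because no run of letters in it is directly followed by a hyphen, whereas "[^-]+" keeps it. Neither pattern produces empty strings for the repeated hyphens.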
Jarvis
  • 1
    Nice idea, but I'd just use `"[^-]+"` instead. We don't know what are the "legal" characters, just that it's not a `-`. – tobias_k Feb 15 '17 at 13:45
  • This answer also handles the `wordstring = "one-two---three-"` case correctly (assuming that in this case there should be no empty strings either) – tobias_k Feb 15 '17 at 13:48
  • 1
    Sure, with "this" I meant _this answer_, not _my suggestion_. – tobias_k Feb 15 '17 at 13:50
1

You should call the string's strip method before splitting it. E.g.:

wordstring = "one-two-three-"
wordlist = wordstring.strip('-').split('-')
LucG
1

I believe .split() treats whatever follows the last - as another element, and here that is an empty string.

Are you open to removing the dash in wordstring before splitting it?

wordstring = "one-two-three-"
wordlist = wordstring[:-1].split('-')
print(wordlist)

OUT: ['one', 'two', 'three']
NickBraunagel
1

This is explained in the docs:

... If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). ...

If you know your strings will always end in '-', then just remove the last (empty) element of the resulting list by doing wordlist.pop().

If you need something more complicated you may want to learn about regular expressions.
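
A short sketch of the documented behaviour and of the pop() approach. The check on the last element is an added safeguard, not something the docs prescribe, so that a real word is not removed when the string does not end in a hyphen.

```python
# The example from the docs: an explicit separator keeps empty strings
print('1,,2'.split(','))

wordlist = "one-two-three-".split('-')
print(wordlist)  # the trailing '-' delimits a final empty string

# Drop the last element only if it really is the empty string
if wordlist and wordlist[-1] == '':
    wordlist.pop()

print(wordlist)
```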

daphtdazz
1

Just for the variety of options:

wordlist = [x for x in wordstring.split('-') if x]

Note that the above also handles cases such as: wordstring = "one-two--three-" (double hyphen)
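
To make the double-hyphen case concrete, here is a small sketch comparing the comprehension with the strip-then-split approach from the other answers. The filter(None, ...) line is an equivalent spelling added for illustration; it is not part of this answer.

```python
wordstring = "one-two--three-"

# strip() only touches the ends, so the inner double hyphen leaves an empty string
stripped = wordstring.strip('-').split('-')

# The comprehension drops every empty string, wherever it occurs
filtered = [x for x in wordstring.split('-') if x]

# Equivalent: filter(None, ...) keeps only truthy (non-empty) items
also_filtered = list(filter(None, wordstring.split('-')))

print(stripped)
print(filtered)
print(also_filtered)
```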

Ma0
  • An inefficient option – Chris_Rands Feb 15 '17 at 13:47
  • @Chris_Rands That is why the disclaimer is there. – Ma0 Feb 15 '17 at 13:47
  • 1
    IMO not worth listing worse alternatives to existing solutions, but here's another then: `wordstring.replace('-','\n').splitlines()` – Chris_Rands Feb 15 '17 at 13:50
  • @Chris_Rands 1) you are free to dv if it messes with your aesthetics. 2) this comprehension might not be so interesting here because this is a very simple case, but the construct `[x for x in y if f(x)]` is valuable in many cases and often encountered. – Ma0 Feb 15 '17 at 14:02
  • I'm not gonna dv but 2) is irrelevant to *this* question – Chris_Rands Feb 15 '17 at 14:08
1

First strip(), then split():

wordstring = "one-two-three-"
x = wordstring.strip('-')
y = x.split('-')

for word in y:
    print(word)
mtt2p
-1

Strip/trim the hyphens from the string before splitting, for example with strip('-'). This way you will remove the trailing "-" and you should be fine.

Chobeat