23

I have a string in Python like this:

u'\u200cHealth & Fitness'

How can I remove the

\u200c

part from the string?

joanis
V.Anh

5 Answers

46

You can encode it into ascii and ignore errors:

u'\u200cHealth & Fitness'.encode('ascii', 'ignore')

Output:

'Health & Fitness'
Arount
  • 5
    This obviously works in the above example but you are forcing the string into ascii losing all unicode chars, which obviously is not a solution that works for all – Martin Massera Jul 28 '19 at 14:05
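A minimal sketch of the caveat raised in the comment above, assuming Python 3 (where `encode` returns `bytes`, so a `decode` step is needed to get a `str` back). The helper name `ascii_only` and the second sample string are made up for illustration.

```python
# Assumes Python 3; ascii_only is a hypothetical helper name.
def ascii_only(text):
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode('ascii', 'ignore').decode('ascii')

# The zero-width non-joiner (U+200C) is dropped, as intended.
print(repr(ascii_only('\u200cHealth & Fitness')))

# But accented letters are dropped too, which is usually not wanted.
print(repr(ascii_only('Café \u200cZürich')))
```

If the text may contain any legitimate non-ASCII characters, removing only the unwanted code point is likely the safer choice.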
29

If you have a string that contains a non-ASCII Unicode character, like

s = "Airports Council International \u2013 North America"

then you can try:

newString = (s.encode('ascii', 'ignore')).decode("utf-8")

and the output will be (note the double space left where the dash was removed):

Airports Council International  North America


Hayat
16

I just use replace, since I don't need the character:

varstring.replace('\u200c', '')

Or in your case:

u'\u200cHealth & Fitness'.replace('\u200c', '')
joanis
  • 5
    This is actually better than the accepted answer in most strings. The \u200c is a zero width non joiner, which is an unusual whitespace-type character that `strip()` ignores. In most cases with unicode strs you do not want to `encode(ascii, ignore)`. – Chet Mar 28 '19 at 15:41
  • 1
    This is the general solution, since encoding to ascii may remove some other Unicode characters as well. – prosti Dec 03 '19 at 14:31
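When more than one invisible character may be present, the same targeted idea can be extended. The sketch below is not from the answers above; the helper names are hypothetical, and the choice of which characters count as unwanted is an assumption to adjust per use case.

```python
import unicodedata

# Hypothetical helper: delete only an explicit set of zero-width characters
# (zero width space, non-joiner, joiner, and the byte order mark).
ZERO_WIDTH = dict.fromkeys(map(ord, '\u200b\u200c\u200d\ufeff'))

def strip_zero_width(text):
    # str.translate deletes every code point that maps to None.
    return text.translate(ZERO_WIDTH)

# Hypothetical, broader helper: drop every character whose Unicode
# general category is "Cf" (format), which includes U+200C.
def strip_format_chars(text):
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')

print(repr(strip_zero_width('\u200cHealth & Fitness')))
print(repr(strip_format_chars('Caf\u00e9\u200c')))
```

Both helpers leave ordinary non-ASCII letters alone. Note that the joiner and non-joiner are meaningful in some scripts and in emoji sequences, so removing them everywhere may not suit every text.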
3

For me, the following worked:

mystring.encode('ascii', 'ignore').decode('unicode_escape')
Diana
  • 2
    You could improve your answer by explaining _why_ this code works, and what you're doing here. That way, others can be educated. – RyanZim Dec 11 '18 at 13:44
  • tbh, that was a 'Frankenstein' version of all answers that I had previously found but which didn't work. I can't really explain why this one worked over the rest of solutions in my case.. – Diana Oct 23 '19 at 11:19
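One possible explanation, offered as an assumption rather than a confirmed diagnosis of the answerer's data: the `encode('ascii', 'ignore')` step drops real non-ASCII characters, while `decode('unicode_escape')` additionally interprets any backslash escapes written out as literal text. The sketch below assumes Python 3; the helper name and sample strings are illustrative only.

```python
# Assumes Python 3; encode_then_unescape is a hypothetical helper name.
def encode_then_unescape(text):
    return text.encode('ascii', 'ignore').decode('unicode_escape')

real_char = '\u200cHealth & Fitness'      # contains the actual U+200C character
literal_text = '\\u200cHealth & Fitness'  # contains a backslash followed by "u200c"

# The real character is dropped by the encode step.
print(repr(encode_then_unescape(real_char)))

# The literal escape survives encoding and is then turned into a real U+200C.
print(repr(encode_then_unescape(literal_text)))

# Side effect: any other backslash sequence in the text is interpreted as well.
print(repr(encode_then_unescape('C:\\new')))
```

Because of that side effect on backslashes, this combination seems best reserved for input known to contain escaped text.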
1

In the specific case in the question, where the string is prefixed with a single u'\u200c' character, the solution is as simple as taking a slice that does not include the first character.

original = u'\u200cHealth & Fitness'
fixed = original[1:]

If the leading character may or may not be present, str.lstrip may be used (it strips every leading occurrence of the character, not just the first):

original = u'\u200cHealth & Fitness'
fixed = original.lstrip(u'\u200c')

The same solutions will work in Python 3. From Python 3.9, str.removeprefix is also available:

original = u'\u200cHealth & Fitness'
fixed = original.removeprefix(u'\u200c')
snakecharmerb
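The three approaches above differ when the prefix is missing or repeated. A small comparison sketch follows; `remove_one_prefix` is a hypothetical stand-in with the same idea as `str.removeprefix`, used here so the example also runs on Python versions before 3.9.

```python
ZWNJ = '\u200c'

def remove_one_prefix(text, prefix):
    """Remove prefix once if present; otherwise return text unchanged."""
    return text[len(prefix):] if text.startswith(prefix) else text

for s in ('Health', ZWNJ + 'Health', ZWNJ * 2 + 'Health'):
    # Columns: slice, lstrip, single-prefix removal.
    print(repr(s[1:]), repr(s.lstrip(ZWNJ)), repr(remove_one_prefix(s, ZWNJ)))
```

The slice always drops one character, even a wanted one; `lstrip` removes all leading occurrences; the prefix helper removes at most one.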