-1

How to remove special characters in a Python dictionary?

output = [{'title': 'title 1\u200c',
  'subject': ['subject1\u200c','a']},
{'title': 'title 1\u200c',
  'subject': ['subject1\u200c','a','b']}]

This is what I tried:

output['title'] = s.replace("\u200c", "") for s in output['title']
user12217822
  • I changed your `output` dictionary based on what you have written in your code, because the dictionary as you had written it wasn't valid (`SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 7-11: truncated \uXXXX escape`). Please check that it is correct. – Pranav Hosangadi Sep 27 '21 at 21:30
  • I get syntax error because of "for" – user12217822 Sep 27 '21 at 21:32
  • Does this answer your question? [Replace non-ASCII characters with a single space](https://stackoverflow.com/questions/20078816/replace-non-ascii-characters-with-a-single-space) – Woodford Sep 27 '21 at 21:34

2 Answers

2

What are you iterating for? You just need to remove the character from the string using str.replace().

output['title'] = output['title'].replace("\u200c", "")

This only changes the value of the 'title' key of output:

{'title': 'title 1', 'subject': 'subject1\u200c'}

If you want to remove the character from all items in output, you need a loop:

for key, value in output.items():
    output[key] = value.replace("\u200c", "")

Or, as a dict comprehension:

output = {key: value.replace("\u200c", "") for key, value in output.items()}
 {'title': 'title 1', 'subject': 'subject1'}

Addressing your comments

I got this error for part one list indices must be integers or slices, not str

I got this error for second answer: 'list' object has no attribute 'items'

Its array of objects

Let's say output looks like this:

output = [{'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
          {'title': 'title 2\u200c', 'subject': 'subject2\u200c'}]

You want to do what I showed above to each dict in output. Just replace output from before with elem:

for elem in output:
    elem['title'] = elem['title'].replace("\u200c", "")
[{'title': 'title 1', 'subject': 'subject1\u200c'},
 {'title': 'title 2', 'subject': 'subject2\u200c'}]

Or, using a list and dict comprehension:

output = [
    {key: value.replace("\u200c", "") for key, value in elem.items()}
    for elem in output
    ]
[{'title': 'title 1', 'subject': 'subject1'},
 {'title': 'title 2', 'subject': 'subject2'}]
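The snippets above assume every value is a string. If some values are lists of strings, as in the second dict of the question, calling .replace() on them would fail. The following is a minimal sketch of one way to handle that case; strip_zwnj is a hypothetical helper name, and it assumes the data contains only strings, lists, and dicts (anything else is returned unchanged).

```python
ZWNJ = "\u200c"  # zero-width non-joiner, the character from the question


def strip_zwnj(value):
    """Return a copy of value with ZWNJ removed from every string inside it.

    Hypothetical helper: handles str, list and dict; other types pass through.
    """
    if isinstance(value, str):
        return value.replace(ZWNJ, "")
    if isinstance(value, list):
        return [strip_zwnj(item) for item in value]
    if isinstance(value, dict):
        # Only the values are cleaned; keys are kept as they are.
        return {key: strip_zwnj(item) for key, item in value.items()}
    return value  # numbers, None, etc. are left untouched


output = [
    {"title": "title 1\u200c", "subject": "subject1\u200c"},
    {"title": "title 1\u200c", "subject": ["subject1\u200c", "a", "b"]},
]
cleaned = strip_zwnj(output)
print(cleaned)
```

The function builds new containers instead of modifying output in place, so the original data stays available for comparison.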
Pranav Hosangadi
2

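This isn't just a special character: \u200c is a non-ASCII Unicode character (the zero-width non-joiner). To remove non-ASCII characters you can use the str.encode() method with the "ignore" error handler. encode() returns a bytes object, which you can turn back into a string with decode(). Note that this drops every non-ASCII character in the string, not only \u200c.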

In [1]: title = "subject1\u200c"

In [2]: title.encode("ascii", "ignore")
Out[2]: b'subject1'

In [3]: title.encode("ascii", "ignore").decode()
Out[3]: 'subject1'
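To make the trade-off concrete, here is a small sketch comparing the encode/decode approach with a targeted str.replace(). The sample string is made up for illustration: it contains an accented letter that should arguably be kept, plus the zero-width non-joiner.

```python
# Hypothetical sample: "café" followed by a zero-width non-joiner.
text = "caf\u00e9\u200c"

# Drops every character that is not ASCII, including the accented "é".
ascii_only = text.encode("ascii", "ignore").decode("ascii")

# Removes only the zero-width non-joiner and keeps the rest.
targeted = text.replace("\u200c", "")

print(repr(ascii_only))  # 'caf'
print(repr(targeted))    # 'café'
```

If the data may contain legitimate non-ASCII text, the targeted replace is the safer choice.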

For your list of dicts, what you need is something like:

In [15]: output = [{'title': 'title 1\u200c',
    ...:   'subject': 'subject1\u200c'}, {'title': 'title 1\u200c',
    ...:   'subject': 'subject1\u200c'}]

In [16]: decoded_output = [value["title"].encode("ascii", "ignore").decode() for value in output]

In [17]: decoded_output
Out[17]: ['title 1', 'title 1']

EDIT:

In [20]: for i in output:
    ...:     for key, value in i.items():
    ...:         # Strings are immutable, so store the cleaned string back in the dict
    ...:         i[key] = value.encode("ascii", "ignore").decode()
    ...:         print(i[key])
    ...: 
title 1
subject1
title 1
subject1

Because you have a list of dicts, you have to iterate over the list, and for each item (a dict) iterate again over its key/value pairs using the items() dict method.
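The nested iteration can also be written as a function that returns a new list and leaves the input unchanged. This is only a sketch: to_ascii and ascii_only_records are hypothetical names, and it assumes every value in every dict is a string.

```python
def to_ascii(text):
    """Drop all non-ASCII characters from text."""
    return text.encode("ascii", "ignore").decode("ascii")


def ascii_only_records(records):
    """Return a new list of dicts with every string value reduced to ASCII.

    Assumes each record is a dict whose values are all strings.
    """
    return [
        {key: to_ascii(value) for key, value in record.items()}  # inner: each dict
        for record in records                                    # outer: the list
    ]


output = [
    {"title": "title 1\u200c", "subject": "subject1\u200c"},
    {"title": "title 1\u200c", "subject": "subject1\u200c"},
]
decoded_output = ascii_only_records(output)
print(decoded_output)
```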