I am working with a bunch of labeled text data generated by customers of my company. I often come across strings with weird characters like this: .
My machine learning models don't handle these characters well, so I've resorted to simply removing them from strings, following this answer on Stack Overflow. This often partially or completely destroys the meaning of the data I'm working with.
What I'd like to do is this:
normal_text = glitchy_to_ascii(' ')
print(normal_text)
# expected output: My name is Sam
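Here is a minimal sketch of what `glitchy_to_ascii` could look like, using only the standard-library `unicodedata` module. It assumes the weird characters are Unicode combining marks stacked on top of ordinary letters (the effect commonly called "Zalgo" text). The function name comes from the question; the implementation itself is hypothetical and will not catch other kinds of noise.

```python
import unicodedata


def glitchy_to_ascii(text: str) -> str:
    # Hypothetical implementation; assumes the "glitch" is made of combining marks.
    # NFKD splits precomposed letters into base + mark (e.g. "é" -> "e" + U+0301)
    # and folds compatibility forms (e.g. fullwidth "Ｍ" -> "M").
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop every combining mark (Unicode categories Mn, Mc, Me); stacked marks
    # like these are what produce the glitchy look.
    base_only = "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )
    # Anything that is still not ASCII (emoji, CJK, etc.) is discarded.
    return base_only.encode("ascii", "ignore").decode("ascii")


zalgo = "M\u0334\u0353y\u0336 n\u0337a\u0338m\u0335e\u0489 is Sa\u0300m"
print(glitchy_to_ascii(zalgo))  # My name is Sam
```

Because the marks are removed after decomposition rather than before, accented letters keep their base letter ("café" becomes "cafe") instead of vanishing. If you want to keep non-ASCII text such as Cyrillic or CJK, return `base_only` instead of the ASCII-encoded result.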
Is there a standard way of doing this? Are there Python packages that do it? Also, what is this sort of text called? I've seen it called 'glitchy' text on various websites.