I'm working with a JSON which has some fields filled in by humans in different countries. Some scripts share a letter, but the each script has distinct unicode IDs.
Here are some examples of this:
Р is the Cyrillic script has the unicode ID 0x420.P in the Latin alphabet has the unicode ID 0x50.
е and e have unicode IDs 0x435 and 0x45 respectively.
М and M have unicode IDs 0x41c and 0x4d.
This results in hard to understand behavior:
'МРе' in 'MPe'
False
'MPe' in 'MPe'
True
I don't have any way to control the incoming data.
What is the best way to handle alike unicode characters in Python? Phrased differently, what is the best way to make 'MPe' in 'MPe' is True regardless of the script that the data comes in?