Python 3: weird characters in UTF-8 strings that I can't remove

Question

I have a problem: I am trying to get a string to compare equal in Python 3 and in MySQL. I expected it to be UTF-8 on both sides, but the two values are not the same.

I have this string:

station√¶r pc > station√¶r pc

and what I want is for it to look like this:

stationr pc > stationr pc

I have tried `bytes(string, 'utf-8').decode('utf-8')` and a lot of other things.

I hope someone here can help me strip all the weird characters out of my strings so I can work with them more easily. The data comes from external files, so I can't control the encoding.

Shouldn't this actually be "stationær pc"? This looks exactly like mojibake for interpreting UTF-8 data with the Mac Roman codec. I can reproduce it with `'stationær'.encode('utf8').decode('macroman')`. — lenz, Jan 10 '18 at 13:09
In general, there's no need to control the encoding of input data. It's important to *know* what encoding was used, then you can always decode accordingly. — lenz, Jan 10 '18 at 13:16
If you really want to convert "stationær pc" to "stationr pc", you can do `"stationær pc".encode('ascii', errors='ignore').decode('ascii')`. — lenz, Jan 10 '18 at 13:20
Thanks, it works this way. I need to ignore the characters by using `bytes(cat['Title'],'utf-8').decode('utf8').encode('ascii', errors='ignore').strip()`. Thanks a lot :) Will you post it as an answer? — ParisNakitaKejser, Jan 10 '18 at 13:23
I'm sure there are dozens of duplicates of this question, no need for another duplicate answer. Searching for "python remove non-ascii characters", I found [this answer](https://stackoverflow.com/a/18430817/1698431), for example. — lenz, Jan 10 '18 at 13:43
Btw, `bytes(x, 'utf8').decode('utf8') == x` for any x, so you can skip that. — lenz, Jan 10 '18 at 13:44
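
The two points made in the comments can be checked with a short sketch. The helper name `strip_non_ascii` is made up for illustration. It wraps the `encode('ascii', errors='ignore')` call suggested above, and the last check shows why the extra `bytes(...).decode(...)` round trip changes nothing.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# The string from the question, with the mojibake characters still in it.
damaged = "station√¶r pc > station√¶r pc"
print(strip_non_ascii(damaged))  # stationr pc > stationr pc

# Encoding to UTF-8 and decoding again returns the same string,
# so that step can be left out.
print(bytes(damaged, "utf8").decode("utf8") == damaged)  # True
```

Note that this throws information away: the "æ" is lost for good, which may not be what you want if the string has to match the value stored in MySQL.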

score 0 · Answer 1 · answered Jan 11 '18 at 01:10


As lenz found out, you have "Mojibake": UTF-8 data that was interpreted with CHARACTER SET macroman instead of utf8.

See this for ways that Mojibake can happen. (It reads "latin1" instead of "macroman".)
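
If that diagnosis is right, the damage can usually be reversed on the Python side instead of stripped. The following is a minimal sketch, assuming the text really is UTF-8 that was decoded as Mac Roman; `fix_macroman_mojibake` is a hypothetical helper name, not a library function.

```python
def fix_macroman_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as Mac Roman.

    If the text does not survive the round trip, it was probably not
    damaged in this particular way, so it is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them with the right codec.
        return text.encode("mac_roman").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


repaired = fix_macroman_mojibake("station√¶r pc")
print(repaired == "stationær pc")  # True
```

The cleaner fix is still to decode the external files with the correct encoding when reading them, so the mojibake never appears in the first place.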

Rick James
