Weird encoding on JSON non-English characters (such as accents). How can I decode it?

Asked Jul 18 '21 at 10:48

Active Jul 18 '21 at 11:01

Viewed 39 times

I've got a JSON file (a chat from my Instagram data dump), and I'm parsing it in Java (Processing), importing it as a JSONObject and looping through it, getting the Strings I need.

It works fine for standard English characters, but it has a problem with any non-English characters, such as accented letters. For example, the word inútil is written as in\u00c3\u00batil. \u00c3\u00ba looks like a pair of Unicode escapes, but when Processing prints it to the console or draws it on the screen, the output is Ãº instead of the ú it should be.
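To make the symptom concrete, here is a minimal sketch in plain Java (no Processing or JSON library involved). The class name EscapeInspection and the helper toHex are made up for illustration, and it assumes the JSON parser turns each escape into exactly one char, which is what the output above suggests. It prints the two char values that end up in the String and, for comparison, the bytes that ú takes when encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class EscapeInspection {
    // Hypothetical helper: formats bytes as uppercase hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed result of parsing the JSON text: one char per escape.
        String parsed = "in\u00c3\u00batil";

        // Seven chars instead of the six in the intended word.
        System.out.println(parsed.length());                         // 7

        // The two suspicious chars, printed as hex code units.
        System.out.println(Integer.toHexString(parsed.charAt(2)));   // c3
        System.out.println(Integer.toHexString(parsed.charAt(3)));   // ba

        // The UTF-8 encoding of the single character U+00FA (u with acute).
        System.out.println(toHex("\u00fa".getBytes(StandardCharsets.UTF_8))); // C3 BA
    }
}
```

The two char values match the two UTF-8 bytes of ú one for one, which at least hints that each byte was written out as if it were its own character.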

I found a similar question already on Stack Overflow (Decode UTF-8 encoding in JSON string), and the problem is solved if I encode and then decode each String, using a temporary variable:

String temp = new String(json.getString("text").getBytes("latin1"), "utf8");
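The same workaround as a self-contained sketch, with the JSON lookup replaced by a hard-coded String so it runs on its own. MojibakeFix and fixMojibake are hypothetical names, and StandardCharsets constants stand in for the "latin1" and "utf8" charset names (this also avoids the checked UnsupportedEncodingException). It assumes every char in the input is at most U+00FF; anything higher cannot be represented in ISO-8859-1 and would come out as a question mark.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeFix {
    // Hypothetical helper: turns each char back into one byte (ISO-8859-1),
    // then reads that byte sequence as UTF-8.
    static String fixMojibake(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for json.getString("text") from the question.
        String garbled = "in\u00c3\u00batil";
        String fixed = fixMojibake(garbled);

        // Compare against the intended word, written with an escape so the
        // result does not depend on the source file's encoding.
        System.out.println(fixed.equals("in\u00fatil")); // true
        System.out.println(fixed.length());              // 6
    }
}
```

Pure ASCII text passes through unchanged, so applying the helper to every String from the file should be harmless as long as the whole file is garbled in the same way.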

All I've understood from that other thread is that the JSON was probably encoded incorrectly and needs to be re-encoded and decoded as I've done, but I'd like to actually understand why.

Could anyone clear it up a bit for me?

Thanks!

edited Jul 18 '21 at 11:01

asked Jul 18 '21 at 10:48

Hubbit200

It's not clear here how 'encoding' is done, or 'decoding' for that matter… – g00se Jul 18 '21 at 10:54


0 Answers