I've downloaded a JSON archive of my Facebook conversations, but I'm stuck on its odd encoding.

Example of the JSON:

{
  "sender_name": "Micha\u00c5\u0082",
  "timestamp": 1411741499,
  "content": "b\u00c4\u0099d\u00c4\u0099",
  "type": "Generic"
},

It should be something like this:

{
  "sender_name": "Michał",
  "timestamp": 1411741499,
  "content": "będę",
  "type": "Generic"
},

I'm trying to deserialize it like this:

var result = File.ReadAllText(jsonPath, encodingIn);
JavaScriptSerializer serializer = new JavaScriptSerializer();
serializer.MaxJsonLength = Int32.MaxValue;
var conversation = serializer.Deserialize<Conversation>(System.Net.WebUtility.HtmlDecode(result));

Unfortunately, the output looks like this:

{
  "sender_name": "MichaÅ\u0082",
  "timestamp": 1411741499,
  "content": "bÄ\u0099dÄ\u0099",
  "type": "Generic"
},

Does anyone know how Facebook encodes the JSON? I've tried several methods without success.

Thanks for your help.

Lavoriel
  • Check [How to decode a Unicode character in a string](https://stackoverflow.com/questions/9303257/how-to-decode-a-unicode-character-in-a-string) – Fabjan Jun 11 '18 at 13:48
  • What is encodingIn? – Prany Jun 11 '18 at 14:22
  • I could not find your Latin characters in the encoding you mentioned: http://etutorials.org/Programming/actionscript/Appendix+A.+Unicode+Escape+Sequences+for+Latin+1+Characters/ – Prany Jun 11 '18 at 15:43
  • That's not encoding, that is Unicode character escaping as defined in the JSON standard: http://www.json.org/ -> https://stackoverflow.com/a/27516892 as well as https://tools.ietf.org/html/rfc7159#section-7. The standard states that in the `\uXXXX` escape sequence, the hex digits `XXXX` correspond to a **Unicode code point**. And U+00C5 really is [LATIN CAPITAL LETTER A WITH RING ABOVE](https://www.fileformat.info/info/unicode/char/00c5/index.htm) so the JSON is being parsed and interpreted correctly. Thus the JSON must have been mangled somehow, can you show how you obtained it? – dbc Jun 11 '18 at 18:30
  • See also https://stackoverflow.com/questions/50008296/facebook-json-badly-encoded – asmaier Jul 03 '18 at 16:11

1 Answer

Here is the answer:

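// Requires: using System.Text;
private string DecodeString(string text)
{
    // ISO-8859-1 maps each char in U+0000..U+00FF to the byte with the same value.
    Encoding targetEncoding = Encoding.GetEncoding("ISO-8859-1");
    // Turn literal \uXXXX escape sequences into the characters they name.
    var unescapedText = System.Text.RegularExpressions.Regex.Unescape(text);
    // Recover the original bytes, then decode them as the UTF-8 they really are.
    return Encoding.UTF8.GetString(targetEncoding.GetBytes(unescapedText));
}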

I collected the suggestions from the comments, combined them, and this is the result. Thank you.
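
To see why this works, here is a minimal sketch. It assumes the export wrote every UTF-8 byte as its own \u00XX escape, which is what the sample in the question suggests. The class name MojibakeDemo and the method name Repair are hypothetical and only there for illustration.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: treats each char in U+0000..U+00FF as one raw byte,
    // then decodes the resulting byte sequence as UTF-8.
    public static string Repair(string mangled)
    {
        byte[] rawBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(mangled);
        return Encoding.UTF8.GetString(rawBytes);
    }

    public static void Main()
    {
        // This is the string a JSON parser produces for "Micha\u00c5\u0082":
        // the C# compiler resolves the same \u escapes in this literal.
        string mangled = "Micha\u00c5\u0082";

        // Bytes 0xC5 0x82 are the UTF-8 encoding of U+0142 (ł).
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Repair(mangled)); // prints "Michał"
    }
}
```

If the helper runs on values that a JSON parser has already deserialized, the Regex.Unescape step should not be needed, because the parser has resolved the escapes by then. It is probably best to apply the repair only to strings known to be mangled, since characters above U+00FF cannot be represented in ISO-8859-1 and would not survive the round trip.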

Lavoriel
  • This worked well for me in fixing the mess that you get with Facebook data. – Phil John Sep 20 '18 at 08:22
  • I was searching for a while on how to accomplish this in C#. Thanks for this. It's been a few years and Facebook still hasn't fixed their JSON files. – SparkleStep Sep 30 '21 at 04:47