I have some binary data which is in Python in the form of an array of byte strings.

Is there a portable way to serialize this data that other languages could read?

JSON loses because I just found out that it has no real way to store binary data; its strings are expected to be Unicode.

I don't want to use pickle because I don't want the security risk, and that limits its use to other Python programs.

Any advice? I would really like to use a builtin library (or at least one that's part of the standard Anaconda distribution).

Russia Must Remove Putin
Jason S

1 Answer

If you just need the binary data in the strings and can easily recover the boundaries between the individual strings, you could just write them to a file directly, as raw bytes.
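One common way to make the boundaries recoverable is to write each string's length in front of it. The sketch below is an illustration, not something this answer prescribes: the helper names `write_chunks` and `read_chunks` and the 4-byte big-endian length header are assumptions chosen for the example.

```python
import io
import struct

def write_chunks(stream, chunks):
    """Write each byte string preceded by its length as a 4-byte big-endian integer."""
    for chunk in chunks:
        stream.write(struct.pack(">I", len(chunk)))
        stream.write(chunk)

def read_chunks(stream):
    """Read length-prefixed byte strings until the stream is exhausted."""
    chunks = []
    while True:
        header = stream.read(4)
        if not header:
            break  # clean end of stream
        if len(header) != 4:
            raise ValueError("truncated length header")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated payload")
        chunks.append(payload)
    return chunks

# Round trip through an in-memory buffer; a file opened in binary mode works the same way.
buf = io.BytesIO()
write_chunks(buf, [b"abc\xf3\x9c\xc6", b"xyz"])
buf.seek(0)
restored = read_chunks(buf)
```

A reader in another language only needs to read four bytes, interpret them as an unsigned big-endian integer, and then read that many bytes.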

If you can't recover the string boundaries easily, JSON seems like a good option:

import json

a = [b"abc\xf3\x9c\xc6", b"xyz"]
# latin1 maps every byte value to one code point, so decoding never fails
serialised = json.dumps([s.decode("latin1") for s in a])
print([s.encode("latin1") for s in json.loads(serialised)])

will print

[b'abc\xf3\x9c\xc6', b'xyz']

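The trick here is that latin1 maps every byte value from 0 to 255 to exactly one Unicode code point, so arbitrary binary strings can always be decoded to Unicode and encoded back to the original bytes. A reader in another language has to apply the same latin1 encoding step after parsing the JSON.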
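To check that the round trip really is lossless, it can be run over every possible byte value. This is a minimal sketch; the helper names `dumps_bytes` and `loads_bytes` are made up for the example.

```python
import json

def dumps_bytes(chunks):
    """Serialise a list of byte strings as a JSON array of latin1-decoded strings."""
    return json.dumps([c.decode("latin1") for c in chunks])

def loads_bytes(text):
    """Inverse of dumps_bytes: parse the JSON and encode each string back to bytes."""
    return [s.encode("latin1") for s in json.loads(text)]

# Exercise all 256 byte values plus the empty string.
every_byte = bytes(range(256))
text = dumps_bytes([every_byte, b""])
restored = loads_bytes(text)
```

With the default `ensure_ascii=True`, `json.dumps` writes bytes outside printable ASCII as `\u00XX` escapes, so the output is plain ASCII but those bytes take up to six characters each.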

Sven Marnach
  • The boundaries aren't impossible to recreate, but they're not easy to get either (the [packet-framing problem](http://www.embeddedrelated.com/showarticle/113.php)). So yeah, I can live with JSON's overhead given its widespread nature. – Jason S Mar 24 '14 at 22:22