
I have an ASCII string, "abcdefghijk", and I want to write it to a binary file in binary format using Python.

I tried the following:

str  = "abcdefghijk"
fp = file("test.bin", "wb")
hexStr = "".join( (("\\x%s") % (x.encode("hex"))) for x in str)
fp.write(hexStr)
fp.close()

However, when I open test.bin, I see the following ASCII text instead of binary data:

\x61\x62\x63\x64\x65\x66\x67

I understand this happens because of the two backslashes here ("\\x%s"). How can I resolve this issue? Thanks in advance.

Update:

The following gives me the expected result:

file = open("test.bin", "wb")
file.write("\x61\x62\x63\x64\x65\x66\x67")
file.close() 

But how do I achieve this starting from the ASCII string "abcdef"?

aMa
  • 1
    You *very carefully* encode the characters as hex - why are you expecting to see anything else? – jonrsharpe Mar 19 '15 at 17:28
  • What output did you expect then? I'm not sure you understood what binary mode *does*, or even what Python uses `\xhh` notation for (and it is just *syntax*, a way to produce a value, not the value itself). – Martijn Pieters Mar 19 '15 at 17:30
  • @jonrsharpe, I want to write "\x61\x62\x63\x64\x65\x66\x67" as binary to test.bin (not as ascii string). How can I do that? And finally the test.bin should be a binary file. – aMa Mar 19 '15 at 17:30
  • 1
    @aMa: all files are binary. Opening a file in text mode only enables special handling of newlines (and on Windows, causes 0x1a to be interpreted as the end of the file). As such, just write `'abcd'`. On Python 3, you'd need to encode text to bytes with `str.encode('ascii')`. – Martijn Pieters Mar 19 '15 at 17:33
  • 2
    @aMa: However, a binary file is *not* a sequence of hexadecimal digits! Some hex *editors* may *display* contents as hex, but that's just a *representation*, not the actual value in the file. – Martijn Pieters Mar 19 '15 at 17:35
  • @aMa perhaps *exactly equivalent* wasn't clear enough? – Joran Beasley Mar 19 '15 at 17:36

2 Answers


You misunderstood what \xhh does in Python strings. Using \x notation in Python strings is just syntax to produce certain codepoints.

You can use '\x61' to produce a string, or you can use 'a'; both are just two ways of saying "give me a string containing the character with hexadecimal value 61", i.e. the ASCII character a:

>>> '\x61'
'a'
>>> 'a'
'a'
>>> 'a' == '\x61'
True

The \xhh syntax then, is not the value; there is no \ and no x and no 6 and 1 character in the final result.
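A minimal sketch of why the code in the question ended up writing backslashes, using Python 3 syntax. The variable names are chosen only for illustration, and the `built` expression is an approximate Python 3 stand-in for the `x.encode("hex")` line in the question, not the original code:

```python
# "\x61" is an escape sequence: the literal produces ONE character, 'a'.
escaped = "\x61"

# "\\x61" begins with an escaped backslash, so the literal produces FOUR
# characters: a backslash, 'x', '6' and '1'.
literal = "\\x61"

# Approximate Python 3 stand-in for the question's join expression:
# it builds the *text* of escape sequences, not the characters themselves.
built = "".join("\\x%02x" % ord(ch) for ch in "abc")

print(len(escaped), repr(escaped))
print(len(literal), repr(literal))
print(len(built), built)
```

The string built at run time is never re-parsed as source code, so the backslashes stay in the data and end up in the file.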

You should just write your string:

somestring = 'abcd'

with open("test.bin", "wb") as file:
    file.write(somestring)

There is nothing magical about binary files; the only difference with a file opened in text mode is that a binary file will not automatically translate \n newlines to the line separator standard for your platform; e.g. on Windows writing \n produces \r\n instead.
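The newline translation can be observed on any platform with a small Python 3 sketch. It assumes the `newline` parameter of the built-in `open()` to force Windows-style translation; the helper `write_and_read_raw` is a hypothetical name introduced here only for the demonstration:

```python
import os
import tempfile


def write_and_read_raw(data, mode, **kwargs):
    """Write data to a temporary file in the given mode, then return
    the raw bytes that actually ended up on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode, **kwargs) as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Text mode, forcing "\n" to be written as "\r\n" regardless of platform.
text_bytes = write_and_read_raw("ab\ncd", "w", newline="\r\n")

# Binary mode: the bytes are written exactly as given.
binary_bytes = write_and_read_raw(b"ab\ncd", "wb")

print(repr(text_bytes))
print(repr(binary_bytes))
```

The text-mode file gains a carriage return before the newline, while the binary-mode file contains exactly the bytes that were passed to `write()`.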

You certainly do not have to produce hexadecimal escapes to write binary data.

On Python 3, strings are Unicode data and cannot be written to a binary file without encoding them first, whereas on Python 2 the str type already holds encoded bytes. So on Python 3 you'd use:

somestring = 'abcd'

with open("test.bin", "wb") as file:
    file.write(somestring.encode('ascii'))

or you'd use a bytes literal, b'abcd'.
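As a rough Python 3 check using the string from the question, the sketch below writes the encoded text, reads the file back, and shows that a hex dump is only one way of displaying the same bytes. The temporary directory and the variable names are assumptions made for illustration:

```python
import os
import tempfile

text = "abcdefghijk"

# Use a throwaway directory so the example does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "test.bin")

with open(path, "wb") as f:
    f.write(text.encode("ascii"))

with open(path, "rb") as f:
    raw = f.read()

# The hex digits are produced here, for display only; the file itself
# holds 11 bytes, one per character.
hex_view = raw.hex()

print(len(raw), raw)
print(hex_view)
```

The bytes read back compare equal to `b"\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b"`, which is the same value as `b"abcdefghijk"` spelled with escapes.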

Martijn Pieters
  • 1
    Good clarification; I always forget about the string changes in Python 3. – Joran Beasley Mar 19 '15 at 17:44
  • you can go even further `'a' is '\x61'` – Joran Beasley Mar 19 '15 at 17:46
  • 2
    @JoranBeasley: **no you can not**. The CPython interpreter *may* choose to optimise and you *could* end up with the same string object (and then `is` works), but that is **not** advisable and you should never count on it. – Martijn Pieters Mar 19 '15 at 17:47
  • OK, fair point. I thought it might be safe with single characters; in the shell it has worked every time for me, but that was just a test. Are you sure `is` isn't safe with single characters in the single-byte range (i.e. up to 255), just like small ints? – Joran Beasley Mar 19 '15 at 17:47
  • 1
    @JoranBeasley: see [About the changing id of a Python immutable string](https://stackoverflow.com/a/24245514) for the current state of Python string interning and constant objects; it is an implementation detail and should never be relied upon. – Martijn Pieters Mar 19 '15 at 17:49

I think you don't quite understand what binary and ASCII are. All files are binary in the sense that they are just bits, and ASCII is just one representation of some of those bits. Nearly every file editor will display your bits as ASCII if it can, provided no other encoding is declared in the file itself.

fp.write("abcd") 

is exactly equivalent to

fp.write("\x61\x62\x63\x64")
Joran Beasley