35

I would love some help with a Bash script loop that will show all the differences between two binary files, using just

cmp file1 file2

It only shows the first change. I would like to use cmp because it gives an offset and a line number for where each change is, but if you think there's a better command I'm open to it :) Thanks

Lewis Denny
  • The offset is valid, but the line number will not be valid when comparing binary files, as they have no concept of lines (only text files have lines). – Some programmer dude Dec 05 '11 at 13:01
  • Yeah, I understand. In this case I use the line number as a reference into a hexdump of the binary, so I can read what's around the differing offset :) – Lewis Denny Dec 05 '11 at 13:11

3 Answers

43

I think cmp -l file1 file2 might do what you want. From the manpage:

-l  --verbose
      Output byte numbers and values of all differing bytes.

The output is a table of the offset, the byte value in file1 and the value in file2 for all differing bytes. It looks like this:

4531  66  63
4532  63  65
4533  64  67
4580  72  40
4581  40  55
[...]

So the first difference is at offset 4531, where file1's byte value is octal 66 and file2's is octal 63.
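Since the asker cross-references a hexdump, here is a minimal sketch that converts the `cmp -l` table into hexdump-style numbers. It assumes the usual `cmp -l` format (1-based decimal byte number, then the two byte values in octal); the function name `cmp_hex` is made up for illustration.

```shell
# Hypothetical helper: print every differing byte as a 0-based hex offset
# followed by the byte values of file1 and file2 in hex.
cmp_hex() {
  cmp -l "$1" "$2" | while read -r offset old new; do
    # cmp -l counts bytes from 1, so subtract 1 to match hexdump offsets.
    # Prefixing the values with 0 makes printf read them as octal.
    printf '%08x  %02x  %02x\n' "$((offset - 1))" "0$old" "0$new"
  done
}
```

Calling `cmp_hex file1 file2` prints one line per differing byte. Like plain `cmp -l`, it compares position by position and does not detect inserted or deleted bytes.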

fdermishin
rwos
  • +1: this is 'the way to do it', but the problem with it is that `cmp` does not look for inserted or deleted material; it just checks 'is the byte at offset N in file1 the same as the byte at offset N in file2? If yes, print nothing; else print the difference'. So the files have to be very similar (e.g., differing only in some bytes of the Unix timestamp recording when the object files were compiled, which is built into some object files), while the rest needs to be the same. Add 3 bytes to a constant string and everything after that is different. – Jonathan Leffler Dec 05 '11 at 15:39
  • Thanks heaps, this is just what I wanted. I tried that in the past, but I didn't know that the numbers on the side were the offsets :) Thanks heaps! – Lewis Denny Dec 05 '11 at 20:14
  • I've edited the answer by adding a correction about the format of the bytes that differ. This is a not so well documented feature of cmp. I hope that the edit is appropriate. – fdermishin Feb 07 '21 at 21:52
5

Method that works for single byte addition/deletion

diff <(od -An -tx1 -w1 -v file1) \
     <(od -An -tx1 -w1 -v file2)

Generate a test case with a single removal of byte 64:

for i in $(seq 128); do printf "%02x" "$i"; done | xxd -r -p > file1
for i in $(seq 128); do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2

Output:

64d63
<  40

If you also want to see the ASCII version of the character:

bdiff() (
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)

bdiff file1 file2

Output:

64d63
<   40   @

Tested on Ubuntu 16.04.

I prefer od over xxd because:

  • od is POSIX; xxd is not (it ships with Vim)
  • od has the -An option to remove the address column, so no awk is needed.

Command explanation:

  • -An removes the address column. This is important; otherwise all lines would differ after a byte addition / removal.
  • -w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, -w is not in POSIX, but GNU od supports it.
  • -tx1 is the representation you want; you can change it to any other supported format, as long as you keep 1 byte per line.
  • -v prevents od from abbreviating repeated lines with an asterisk (*), which might interfere with the diff.
  • paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: Concatenating every other line with the next
  • we use parentheses () to define bdiff instead of {} to limit the scope of the inner function f, see also: How to define a function inside another function in Bash?
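Because -w1 is a GNU extension, here is a rough sketch of how the same one-byte-per-line dump could be produced with only POSIX od options. The function name `hexlines` is hypothetical, and the approach assumes od separates the hex values with spaces.

```shell
# Hypothetical helper: dump a file as one hex byte per line
# without relying on the GNU-only -w1 option.
hexlines() {
  # od prints several bytes per line; tr turns each run of spaces into a
  # single newline, and sed drops the empty line left by the leading space.
  od -An -v -tx1 "$1" | tr -s ' ' '\n' | sed '/^$/d'
}
```

With that in place, `diff <(hexlines file1) <(hexlines file2)` should report the same kind of single-byte additions and deletions as the GNU version above.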

2

The most efficient workaround I've found is to translate binary files to some form of text using od.

Then any flavour of diff works fine.
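As one possible illustration of this idea, the sketch below dumps both files with od and runs a unified diff on the dumps. The function name `hexdiff` and the use of mktemp for temporary files are assumptions, not part of the answer. With several bytes per line, this works best for files of equal length, because an inserted byte shifts every later line.

```shell
# Hypothetical helper: unified diff of two od hex dumps (with hex addresses).
# The body runs in a subshell so its variables do not leak into the caller.
hexdiff() (
  a=$(mktemp) || exit 2
  b=$(mktemp) || exit 2
  od -v -Ax -tx1 "$1" > "$a"   # -v: never abbreviate repeated lines
  od -v -Ax -tx1 "$2" > "$b"
  diff -u "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  exit "$status"               # 0 if identical, 1 if different
)
```

Running `hexdiff file1 file2` shows the changed dump lines prefixed with - and +, each with its address at the start of the line.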

mouviciel
  • Yep, it really depends on what the OP wants to do with the diff. A diff of a hexdump is probably of more value for humans, while a `cmp` may be easier for programs to parse/use. – rwos Dec 05 '11 at 16:03