How to use "cmp" to compare two binaries and find all the byte offsets where they differ?

Question

I would love some help with a Bash script loop that will show all the differences between two binary files, using just

cmp file1 file2

It only shows the first change. I would like to use cmp because it gives an offset and a line number for each change, but if you think there's a better command, I'm open to it :) Thanks

The offset is valid, but the line number is not meaningful when comparing binary files, as they have no concept of lines (only text files have lines). — Some programmer dude, Dec 05 '11 at 13:01
Yeah, I understand. In this case I use the line number to refer to a hexdump of the binary, so I can read what's around the differing offset :) — Lewis Denny, Dec 05 '11 at 13:11

score 43 · Accepted Answer · edited Feb 07 '21 at 21:47


I think cmp -l file1 file2 might do what you want. From the manpage:

-l  --verbose
      Output byte numbers and values of all differing bytes.

The output is a table of the byte number (in decimal, counting from 1), the byte value in file1 and the byte value in file2 (both in octal) for every differing byte. It looks like this:

4531  66  63
4532  63  65
4533  64  67
4580  72  40
4581  40  55
[...]

So the first difference is at offset 4531, where file1's byte value is 66 and file2's is 63, both in octal.
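Since the question asks for a Bash loop and the comments mention looking offsets up in a hexdump, here is a minimal sketch that post-processes `cmp -l` output into 0-based hex offsets and hex byte values. It assumes bash (for the `8#N` octal arithmetic); the function name `cmp_hex` and the sample files are hypothetical. Like `cmp -l` itself, it only reports bytes that differ at the same offset.

```shell
#!/usr/bin/env bash
# Sketch: loop over `cmp -l` output and print 0-based hex offsets plus hex
# byte values, which are easier to match against a hexdump (e.g. xxd or
# `od -Ax -tx1`). cmp -l prints 1-based decimal byte numbers and octal bytes.
# cmp_hex is a made-up name for this example.
cmp_hex() {
  cmp -l "$1" "$2" | while read -r pos a b; do
    # 8#N tells Bash arithmetic to read N as an octal number
    printf '0x%08x  %02x  %02x\n' "$((pos - 1))" "$((8#$a))" "$((8#$b))"
  done
}

# Usage: "hello" and "hallo" differ only in the second byte.
tmp=$(mktemp -d)
printf 'hello' > "$tmp/a"
printf 'hallo' > "$tmp/b"
cmp_hex "$tmp/a" "$tmp/b"   # prints: 0x00000001  65  61
```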

edited Feb 07 '21 at 21:47 by fdermishin · answered Dec 05 '11 at 15:31 by rwos


+1: this is 'the way to do it', but the problem with it is that `cmp` does not look for inserted or deleted material; it just checks 'is the byte at offset N in file1 the same as the byte at offset N in file2? If yes, print nothing; otherwise print the difference'. So the files have to be very similar (e.g., differing only in a few bytes of the Unix timestamp recording when the object files were compiled, which is built into some object files), with everything else identical. Add 3 bytes to a constant string and everything after that point is different. – Jonathan Leffler Dec 05 '11 at 15:39
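A minimal sketch of the limitation this comment describes, using two small hypothetical files: a single inserted byte makes `cmp -l` flag every later position.

```shell
#!/usr/bin/env bash
# Sketch: inserting one byte shifts every later byte, so cmp -l reports a
# difference at every position after the insertion point.
tmp=$(mktemp -d)
printf 'ABCDEF'  > "$tmp/a"
printf 'ABXCDEF' > "$tmp/b"   # one byte ("X") inserted after "AB"
# stderr is discarded because it only carries the "EOF on ..." notice;
# exit status 1 just means "files differ"
cmp -l "$tmp/a" "$tmp/b" 2>/dev/null || true   # reports byte numbers 3, 4, 5 and 6
```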
Thanks heaps, this is just what I wanted! I tried that in the past, but I didn't know the numbers on the side were the offsets :) Thanks heaps! – Lewis Denny Dec 05 '11 at 20:14

I've edited the answer to add a correction about the format of the differing bytes (they are printed in octal). This is a not-so-well-documented feature of cmp. I hope that the edit is appropriate. – fdermishin Feb 07 '21 at 21:52

Ciro Santilli Путлер Капут 六四事 · Answer 2 · 2021-04-14T07:18:21.450

Method that works for single byte addition/deletion

diff <(od -An -tx1 -w1 -v file1) \
     <(od -An -tx1 -w1 -v file2)

Generate a test case with a single removal of byte 64:

for i in $(seq 128); do printf "%02x" "$i"; done | xxd -r -p > file1
for i in $(seq 128); do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2

Output:

64d63
<  40

If you also want to see the ASCII version of the character:

bdiff() (
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)

bdiff file1 file2

Output:

64d63
<   40   @

Tested on Ubuntu 16.04.
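In the outputs above, diff's line numbers are od line numbers; with one byte per line, line N is the byte at 0-based offset N-1 (so `64d63` refers to the byte at offset 63). A small sketch that does this translation for every hunk, assuming bash (process substitution), GNU od (`-w1`) and diff's default "normal" output format; the function name `offsets` and the sample files are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): turn the hunk headers of the
# od | diff approach into 0-based byte offsets within file1.
# With -w1, od output line N holds byte N-1 of the file.
offsets() {
  diff <(od -An -tx1 -w1 -v "$1") <(od -An -tx1 -w1 -v "$2") |
    awk '/^[0-9]/ {
      match($0, /[acd]/)                            # position of the operation letter
      op = substr($0, RSTART, 1)
      n = split(substr($0, 1, RSTART - 1), l, ",")  # file1 line range: l[1][,l[2]]
      if (op == "a") {
        # "Na..." adds bytes after file1 line N, i.e. before file1 offset N
        printf "insert before file1 offset %d\n", l[1]
      } else {
        last = (n > 1) ? l[2] : l[1]
        printf "%s file1 offsets %d-%d\n", (op == "d" ? "delete" : "change"), l[1] - 1, last - 1
      }
    }'
}

# Usage: the byte "C" (offset 2) is missing from the second file.
tmp=$(mktemp -d)
printf 'ABCDEF' > "$tmp/file1"
printf 'ABDEF'  > "$tmp/file2"
offsets "$tmp/file1" "$tmp/file2"   # prints: delete file1 offsets 2-2
```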

I prefer od over xxd because:

it is POSIX, while xxd is not (it comes with Vim)
it has -An to remove the address column without awk.

Command explanation:

-An removes the address column. This is important; otherwise all lines would differ after a byte addition/removal.
-w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, -w is not POSIX, but GNU od has it.
-tx1 selects the representation (one byte in hex); change it to any other type you want, as long as you keep one byte per line.
-v stops od from abbreviating repeated lines with an asterisk (*), which would interfere with the diff.
paste -d '' - - joins every two lines. We need it because the hex and ASCII representations go on separate adjacent lines. Taken from: Concatenating every other line with the next
We use parentheses () to define bdiff instead of {} to limit the scope of the inner function f; see also: How to define a function inside another function in Bash?
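A quick sketch of why -v matters, assuming GNU `head -c` and GNU od (`-w1`): without -v, od collapses a run of identical lines into a single `*` line, so the dump no longer has one line per byte.

```shell
#!/usr/bin/env bash
# Sketch: 16 zero bytes dumped one byte per line, with and without -v.
# Without -v, repeated lines are replaced by "*", so line N of the dump
# would no longer correspond to byte N-1 of the file.
head -c 16 /dev/zero | od -An -tx1 -w1    | wc -l   # fewer than 16 lines
head -c 16 /dev/zero | od -An -tx1 -w1 -v | wc -l   # 16 lines, one per byte
```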


This has the inherent flaw that it does not stream the data but loads everything into RAM, meaning you will need at least 2 to 3 times the size of the files in memory, which is how most binary diff tools behave. The only one I found that doesn't behave like this is xdelta3... — Izzy, Nov 09 '17 at 11:58
@Izzy add it to an answer showing how to use it and why, and get upvotes :-) — Ciro Santilli Путлер Капут 六四事, Nov 09 '17 at 12:03
Sadly, to my knowledge, it can't, at least not in the way you'd expect. It produces VCDIFF output, which is a highly compressed binary delta. So you can just diff, patch and view the command structure. My comment was more of a "be aware that this answer will blow your main memory with a 5GB file". — Izzy, Nov 14 '17 at 08:48

score 2 · Answer 3 · answered Dec 05 '11 at 15:48


The most efficient workaround I've found is to translate the binary files into some form of text using od.

Then any flavour of diff works fine.
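The answer doesn't name specific od options; one possible choice, assumed here, is a hexdump with hex addresses (`-Ax`), which suits files of the same length that differ only in substituted bytes. Each changed row keeps its address, so it can be matched against a full hexdump. For insertions or deletions, rows shift just like with `cmp`, and the one-byte-per-line variant from the previous answer is the better fit. The function name `hexdiff` and the sample files are hypothetical; bash is assumed for process substitution.

```shell
#!/usr/bin/env bash
# Sketch, assuming same-length files with substituted bytes only.
# od prints 16 bytes per line; -Ax labels each line with its hex offset,
# -v disables the "*" abbreviation of repeated lines.
hexdiff() {
  diff <(od -Ax -tx1 -v "$1") <(od -Ax -tx1 -v "$2")
}

# Usage: two 32-byte files that differ only at offset 20 (0x14).
tmp=$(mktemp -d)
head -c 32 /dev/zero > "$tmp/a"
{ head -c 20 /dev/zero; printf 'Z'; head -c 11 /dev/zero; } > "$tmp/b"
hexdiff "$tmp/a" "$tmp/b" || true   # diff exits with 1 when the dumps differ
```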

answered by mouviciel

Yep, it really depends on what the OP wants to do with the diff. A diff of a hexdump is probably of more value for humans, while a `cmp` may be easier for programs to parse/use. – rwos Dec 05 '11 at 16:03
