13

Why do we use hex representation as the default for a hash function's output?

Take SHA-256, for example: its output in hex representation is 64 characters long, while Base64-encoding the raw output produces only 44 characters.

Demo:

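<?php
$password = "password";

// Default output: 64 lowercase hex characters.
$sha256 = hash('sha256', $password);
echo 'sha256('.strlen($sha256).'): '.$sha256.'<br />';

// Raw 32-byte digest (third argument true), then Base64-encoded: 44 characters.
$sha256Base64 = base64_encode(hash('sha256', $password, true));
echo 'sha256('.strlen($sha256Base64).'): '.$sha256Base64.'<br />';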

Output:

sha256(64): 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
sha256(44): XohImNooBHFR0OVvjcYpJ3NgPQ1qq73WKhHvch0VQtg=
Patriot
Neil Yoga Crypto
  • My guess: because hex is more common for people and usually easier to understand and parse than base64. – SEJPM May 01 '16 at 19:25
  • This is more of an implementation question, and not really cryptography I think - but many programming languages understand hex, while base64 usually is a higher level function. Note that some cryptographic protocols use other bases since they give shorter representations (see e.g. base 58 in Bitcoin and Monero) – aegbert May 01 '16 at 19:40
  • In addition to all mentioned reasons, it's easy to see how many bytes a given hex string represents. – Florian Wendelborn May 04 '16 at 02:37
  • @Dodekeract Yes, I also noticed that a SHA-256 Base64 representation of a string does not always produce 44 characters, sometimes more (for example, 58 characters) – Neil Yoga Crypto May 04 '16 at 11:25
  • @maarten Seriously? Do you have an example of this? Base64 uses 6 bits per character, so dividing 256 by 6 gives a little over 42, and since you can't have a fraction of a character you round up to 43. The equals sign at the end is just padding: Base64 encodes 3 bytes as 4 characters, so the byte count (32) needs to be divisible by 3, which means 1 padding byte gets added. So it wouldn't be possible to have a SHA-256 hash with more than 44 characters in Base64. – My1 Sep 21 '17 at 10:53
  • Base32, in contrast, encodes 5 bytes as 8 characters (it is usually used when a human has to relay the data, because there is no mixed case and so on). Your 32-byte SHA-256 hash would be padded to 35 bytes; multiplying by 8 gives a bit count of 280, and dividing by 5 gives 56 characters including all the padding. Base32 is often used without padding, though: after 30 bytes or 240 bits (the last fully encodable piece of data) you have 16 bits or 2 bytes left, which are encoded using 4 instead of 8 characters, giving 52 characters total. That is more than Base64 but still less than 58... – My1 Sep 21 '17 at 11:01
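The length arithmetic in the comments above can be checked with a short sketch. The helper names (hexLength, base64Length, base32Length) are hypothetical and exist only for illustration; the formulas assume the usual groupings of 4 bits per hex character, 3 bytes per 4 Base64 characters, and 5 bytes per 8 Base32 characters.

```php
<?php
// Hypothetical helpers: encoded length in characters for $bytes raw bytes.

function hexLength(int $bytes): int
{
    return 2 * $bytes;                       // one character per 4-bit nibble
}

function base64Length(int $bytes, bool $padded = true): int
{
    return $padded
        ? 4 * intdiv($bytes + 2, 3)          // round up to whole 3-byte groups
        : intdiv(8 * $bytes + 5, 6);         // ceil(bits / 6)
}

function base32Length(int $bytes, bool $padded = true): int
{
    return $padded
        ? 8 * intdiv($bytes + 4, 5)          // round up to whole 5-byte groups
        : intdiv(8 * $bytes + 4, 5);         // ceil(bits / 5)
}

$digestBytes = 32;                           // SHA-256 produces 32 bytes

echo hexLength($digestBytes), "\n";            // 64
echo base64Length($digestBytes), "\n";         // 44
echo base64Length($digestBytes, false), "\n";  // 43
echo base32Length($digestBytes), "\n";         // 56
echo base32Length($digestBytes, false), "\n";  // 52
```

None of the standard encodings of a 32-byte digest comes out at 58 characters, which supports the suspicion that the 58-character value was something other than a plain Base64 SHA-256 digest.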

3 Answers

14

Hexadecimal is traditional: the first command-line tools used hexadecimal for their output, and other people using the hash functions then saw fit to stick to hexadecimal, if only to be able to compare their values with the output of those tools. That's how traditions get established: a more-or-less random choice at the start, then the need for interoperability and backward compatibility kicks in.

In the case of hexadecimal in cryptographic algorithms, one can probably trace it to the use of the C language for reference implementations. Most algorithms are described with a specification (a mathematical description, usually typeset in LaTeX) and a reference implementation that produces basic test vectors. For better or worse, the reference implementation is usually in C (or sometimes C++). In C, there is no standard facility for Base64 encoding (some programming platforms or external libraries offer one, but it is not standard), whereas hexadecimal is easily obtained with a simple printf() and a "%08x" format string, which prints a 32-bit word as eight hex digits. As a very classic example, consider the MD5 specification (RFC 1321), which contains a reference implementation that does hexadecimal output.
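To illustrate how little machinery the hex route needs, here is a minimal sketch in PHP, the language of the question's demo, rather than C. PHP's sprintf() accepts the same "%08x" conversion as C's printf(). The sketch assumes a 64-bit PHP build, so that unpacked 32-bit words are non-negative integers; it is not taken from any reference implementation.

```php
<?php
// Raw 32-byte SHA-256 digest of the question's sample input.
$raw = hash('sha256', 'password', true);

// Split the digest into eight big-endian 32-bit words and print each one
// with the printf-style "%08x" conversion (8 hex digits, zero-padded).
$hex = '';
foreach (unpack('N*', $raw) as $word) {
    $hex .= sprintf('%08x', $word);
}

echo $hex, "\n";
// The result matches PHP's built-in hex output for the same digest.
var_dump($hex === hash('sha256', 'password'));
```

A Base64 equivalent would need a lookup table plus handling for the bytes left over at the end, which is presumably why it rarely shows up in small reference programs.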

The tradition is well entrenched; for the SHA-3 competition, NIST actually asked for reference implementations in C, and known-answer tests with a fully-specified text format that was hexadecimal throughout.

It must also be said that hexadecimal is convenient for debugging: a human developer can easily read hexadecimal output and map it to individual bits by doing the simple conversion in their head. Base64 is not as simple, because it entails 64 glyphs instead of 16, including some that are prone to visual confusion (1 vs I vs l, 0 vs O...). Also, many algorithms internally use 32-bit or 64-bit words, which map well to CPU registers; 32 and 64 are multiples of 4 but not of 6, so Base64 encoding again implies some non-trivial splitting.
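The word-alignment point can be made concrete with a small sketch. It takes the first 32-bit word of the digest shown in the question (5e884898) as sample input; the variable names are illustrative only.

```php
<?php
// First 32-bit word of the SHA-256 digest from the question.
$word = hex2bin('5e884898');

// Each hex digit maps to exactly 4 bits, so 8 digits cover the word exactly.
$nibbles = [];
foreach (str_split(bin2hex($word)) as $digit) {
    $nibbles[] = str_pad(base_convert($digit, 16, 2), 4, '0', STR_PAD_LEFT);
}
$bits = implode(' ', $nibbles);
echo $bits, "\n";   // 0101 1110 1000 1000 0100 1000 1001 1000

// 32 is not a multiple of 6: Base64 needs 6 characters (36 bits, the last
// 4 bits being filler) plus 2 padding characters for the same word.
$b64 = base64_encode($word);
echo $b64, "\n";    // XohImA==
```

In the full digest the sixth Base64 character is N rather than A, because it also carries bits from the next word. A Base64 character can straddle a word boundary; a hex digit never does.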

Thomas Pornin
  • IBM S/360 and its assembler BAL 'standardized' 8-bit bytes in hex (including BCD and floating-point using 4-bit fields) for much of the computer industry almost a decade before C. And I believe (but can't find a good source) Lucifer started as an option on S/360. OTOH C supports octal about equally to hex -- and did you know that if you give an IPv4 address string like 010.020.030.040 to most Unix software (starting with BSD) it's octal and means 8.16.24.32? – dave_thompson_085 May 06 '16 at 08:24
3

In short: Hexadecimal is virtually a gold standard for radix 16 encoding. Base64 isn't standard at all.

Hex (quoting):-

the letters A–F or a–f represent the values 10–15, while the numerals 0–9 are used to represent their usual values.

And each character represents a nibble. So exactly two characters per byte.

Now consider Base64. There may or may not be padding. The 62nd and 63rd characters can vary according to protocol. Sometimes there's even a cyclic redundancy check automatically included. Let me just list part of the Base64 Wiki page contents:-

3 Examples
3.1 Output padding
3.2 Decoding Base64 with padding
3.3 Decoding Base64 without padding

4 Implementations and history

4.1 Variants summary table
4.2 Privacy-enhanced mail
4.3 MIME
4.4 UTF-7
4.5 OpenPGP
4.6 RFC 3548
4.7 RFC 4648
4.8 The URL applications
4.9 HTML
4.10 Other applications
4.11 Radix-64 applications not compatible with Base64

It's a protocol-dependent mess. And notice §4.11! Hex, by contrast, is simply simpler: it leaves less room for differing implementations and interpretations, and so for variations and errors.
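A minimal sketch of that variation, using a two-byte sample input chosen only because its Base64 form contains both of the characters that differ between variants. The URL-safe alphabet and the optional padding shown here are the ones described in RFC 4648; PHP has no built-in URL-safe encoder, so the strtr()/rtrim() conversion is a common idiom rather than a library API.

```php
<?php
$raw = "\xfb\xff";                          // sample bytes, illustrative only

$standard     = base64_encode($raw);        // standard alphabet, padded
$urlSafe      = strtr($standard, '+/', '-_'); // URL-safe alphabet, padded
$urlSafeNoPad = rtrim($urlSafe, '=');       // URL-safe alphabet, unpadded
$hex          = bin2hex($raw);              // hex: only one form to choose

echo $standard, "\n";      // +/8=
echo $urlSafe, "\n";       // -_8=
echo $urlSafeNoPad, "\n";  // -_8
echo $hex, "\n";           // fbff
```

The same two bytes have three plausible Base64 spellings here, and a comparison of hash strings fails unless both sides agree on the variant. With hex the only remaining choice is letter case.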

Paul Uszak
-4

Because in JS, numbers are always represented as 64-bit floating-point values (sign bit, mantissa and exponent), and for the SHA-256 hash function, hex (that is, base 16) always generates a 64-bit encoded representation.

Patriot
yarn
  • Cryptography is independent of any programming language if you mean JavaScript by JS. Also, you're confusing 64-bit and 64-character. – DannyNiu Jul 17 '21 at 10:05