I'm wondering why nearly every database schema seems to store hashed passwords and other hexadecimal data as char(x)/varchar(x).

MySQL (before 8.0) uses the latin1 character set with the latin1_swedish_ci collation by default, where each character takes 8 bits.

When storing hexadecimal data as strings, you're only using 4 bits of those 8 bits you can use, as there are only 16 usable characters.

So am I missing something here? Is there a character encoding designed for this kind of data that uses 4 bits per character? Or do some companies actually make use of all the reserved space when storing hexadecimal data, by whatever means?

  • Why would you store binary data as a string? Hexadecimal is a less verbose way to represent the data as a string; it would be unusual / silly to actually store that string when the database supports binary types. – TZHX Oct 08 '17 at 18:12
  • Looking at all the "tutorials" on the internet I've seen, nobody really seems to care about it.. – JonasAnon Oct 08 '17 at 18:16
  • I would avoid those “tutorials” then. – TZHX Oct 08 '17 at 18:19
  • Nobody cares about saving 30 bytes per user in exchange for additional complexity. – CodesInChaos Oct 08 '17 at 19:19
  • Even when storing the hashes as strings of hexadecimal characters, the "wasted" storage is negligible. Most systems will have maybe a few thousand users, so you are looking at less than 100 KB of overhead. Even with millions of users, it's only a few megabytes of overhead. If you're Facebook and you have billions of users, then you are looking at gigabytes, but you are also then putting up hundreds of terabytes of RAM just for caching (and your user database is likely highly partitioned). – Derek Elkins left SE Oct 08 '17 at 19:21
  • Database development (okay, not development, but production and administration) involves a surprising amount of manual data modification requests, plus troubleshooting. This is where the convenience of strings trumps software development concerns. – rwong Oct 08 '17 at 22:55
  • Funny people would say it is more complex to store binary data as binary data. The chances for error increase when you round-trip binary data to strings; it is always better to store binary data as binary data, if you have that option. – Frank Hileman Oct 09 '17 at 16:42

1 Answer

Firstly, hexadecimal is not really a storage format... it is a display format. You can choose to store hexadecimal strings in a database, if you wish, but the more natural (and more common) technique is to store the actual numbers in a database using the database's native numeric integer format, and then format the retrieved numbers as hexadecimal for display purposes.
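As a rough sketch of "store the number, format it on retrieval", here is a small Python example using the standard-library `sqlite3` module with an in-memory database. The `colors` table and its columns are made up for illustration; any database with a native integer type would behave similarly.

```python
import sqlite3

# Hypothetical table: the value is stored as a native INTEGER, not as hex text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (name TEXT PRIMARY KEY, rgb INTEGER NOT NULL)")
conn.execute("INSERT INTO colors VALUES (?, ?)", ("orange", 0xFFA500))

(rgb,) = conn.execute(
    "SELECT rgb FROM colors WHERE name = ?", ("orange",)
).fetchone()

print(rgb)           # 16753920  (what the database holds)
print(f"{rgb:06X}")  # FFA500    (hex is applied only when displaying)
```

The hexadecimal form never touches the table; it only appears in the final formatting step.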

Hash functions return a set of bits. Traditionally, these bits are provided as hexadecimal because it is easier to read than zeroes and ones. In addition, many hashes contain a larger number of bits than native numeric types can support (MD5 is 128 bits, for example), and so hex is a more natural choice. Given an arbitrary number of hex digits, it's simpler and easier to just store the hex that the hash function provides.
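To make the sizes concrete, the sketch below (Python, standard library only) takes an MD5 digest both as raw bytes and as hex text, then stores each form in a throwaway SQLite table. The `hashes` table and its column names are assumptions for illustration, and MD5 appears only because it is the example above, not as a suggestion for password hashing.

```python
import hashlib
import sqlite3

h = hashlib.md5(b"example input")
raw = h.digest()          # the actual bits, as bytes
hex_text = h.hexdigest()  # the same bits rendered as hex characters

print(h.digest_size * 8)        # 128 bits, wider than a 64-bit integer column
print(len(raw), len(hex_text))  # 16 32

# Hypothetical table holding both representations side by side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (as_text CHAR(32), as_blob BLOB)")
conn.execute("INSERT INTO hashes VALUES (?, ?)", (hex_text, raw))

text_len, blob_len, blob_as_hex = conn.execute(
    "SELECT length(as_text), length(as_blob), hex(as_blob) FROM hashes"
).fetchone()

print(text_len, blob_len)               # 32 16
print(blob_as_hex.lower() == hex_text)  # True: the BLOB can still be shown as hex
```

Both columns carry the same information; the text column is twice as long, while the binary column needs a formatting step (here SQLite's `hex()`) before a human can read it.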

Further Reading
Why do we use hex output for hash functions?
Why do most hashing functions produce hashes that have characters a-f 0-9?

Robert Harvey
  • While hex uses 4 bits per byte, base64 uses 6; if efficiency of storing / transferring long bit strings as plain text is important, base64 or ASCII85 can be used to improve efficiency. – 9000 Oct 08 '17 at 19:54
  • The hex is for humans but the binary data is for the database, if possible. – Frank Hileman Oct 09 '17 at 16:43
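
The trade-off between the text encodings mentioned in these comments can be checked with a short Python sketch using the standard `base64` and `hashlib` modules. SHA-256 is an arbitrary choice here, picked only to get 32 bytes of hash output.

```python
import base64
import hashlib

digest = hashlib.sha256(b"example input").digest()  # 32 raw bytes

encodings = {
    "raw": digest,
    "hex": digest.hex().encode("ascii"),     # 4 bits of data per character
    "base64": base64.b64encode(digest),      # 6 bits of data per character
    "ascii85": base64.a85encode(digest),     # 4 bytes of data per 5 characters
}

for name, encoded in encodings.items():
    print(f"{name:8} {len(encoded):3} bytes")

# raw is 32 bytes, hex is 64, base64 is 44 (including padding).
# ascii85 is 40 unless a 4-byte group is all zeros, which it abbreviates.
```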