423

I know what Base64 encoding is and how to compute it in C#. However, I have noticed several times that when I convert a string to Base64, the result ends with an =.

A few questions came up:

  1. Does a base64 string always end with =?
  2. Why does an = get appended at the end?
Sergey
santosh singh
  • 13
    This has absolutely nothing to do with C#. – BoltClock Aug 03 '11 at 08:48
  • 26
    Actually it is related to c#, not all languages will include the =, for example many perl libraries omit the =, so knowing the environment the user is using is actually relevant. – Jay Feb 20 '14 at 00:12
  • It kind of seems like this makes it a less effective method of obfuscation in some cases as it is quite detectable. – dgo Feb 25 '17 at 03:27
  • 17
    @user1167442 Base64 is not for obfuscation. It is for transporting binary data (or strings with unicode and other special characters) as a string. – NH. Aug 18 '17 at 16:24
  • @jay, I'm sorry but I have to disagree. According to the documentation (https://perldoc.perl.org/MIME::Base64), Perl does use padding, as it conforms to RFC 2045 - MIME (https://datatracker.ietf.org/doc/html/rfc2045). – ThE uSeFuL Mar 24 '22 at 18:02

9 Answers

580

Q: Does a base64 string always end with =?

A: No. For example, the word usb is base64 encoded into dXNi, with no = at all.

Q: Why does an = get appended at the end?

A: As a short answer:
One or two = signs are appended only as padding, in the final step of encoding, when the number of input bytes is not a multiple of 3.

You will not have an = sign if your string's length is a multiple of 3 bytes, because Base64 encoding takes each group of three bytes (here one character = 1 byte, as in ASCII text) and represents it as four printable ASCII characters.

Example:

(a) If you want to encode

ABCDEFG <=> [ABC] [DEF] [G]

Base64 encodes the first and second blocks directly, producing 4 characters each, because they are complete. The third block holds only one byte, which yields 2 characters, so == is appended to complete the required 4 characters. Thus, the result will be QUJD REVG Rw== (without spaces).

[ABC] => QUJD

[DEF] => REVG

[G] => Rw==

(b) If you want to encode ABCDEFGH <=> [ABC] [DEF] [GH]

Similarly, the last block holds two bytes, which yield 3 characters, so a single = is appended to complete the 4 characters.

The result will be QUJD REVG R0g= (without spaces).

[ABC] => QUJD

[DEF] => REVG

[GH] => R0g=
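
The block-by-block behaviour above can be checked with a minimal Python sketch. It uses only the standard `base64` module; the helper name `encode_in_blocks` is made up for illustration, and the input is assumed to be plain ASCII so that one character is one byte.

```python
import base64

def encode_in_blocks(text):
    """Split ASCII text into 3-byte blocks and Base64-encode each block separately."""
    data = text.encode("ascii")
    blocks = [data[i:i + 3] for i in range(0, len(data), 3)]
    return [base64.b64encode(block).decode("ascii") for block in blocks]

print(encode_in_blocks("ABCDEFG"))    # ['QUJD', 'REVG', 'Rw==']
print(encode_in_blocks("ABCDEFGH"))   # ['QUJD', 'REVG', 'R0g=']
print(encode_in_blocks("usb"))        # ['dXNi']
```

Only the last block can be shorter than 3 bytes, so padding can only appear at the very end of the output.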

Alexis Wilke
Badr Bellaj
  • 55
    This is more complete and clear than other answer and even Wikipedia and should deserve more votes than the accepted answer which does nothing but point to wikipedia link. Kudos to you! Upvoted! – ANewGuyInTown Jan 30 '18 at 23:14
  • 3
    @ANewGuyInTown the wikipedia link in accepted solution is incorrect, it has nothing to do with padding on base64. [Correct page](https://en.wikipedia.org/wiki/Base64#Output_padding) was linked by Legolas in his [answer below](https://stackoverflow.com/a/6916845/1369473) – Fr0zenFyr Jul 05 '19 at 10:55
  • 2
    This is the best answer. – renatoaraujoc Jun 23 '21 at 13:27
  • 1
    The word "_USB_" does **not** encode into "dXNi", "**usb**" does. "USB" encodes to "VVNC". – user5532169 Feb 19 '22 at 13:06
  • @user5532169 you are right. it was a typo thanks for the correction – Badr Bellaj Feb 19 '22 at 19:53
315

It serves as padding.

A more complete answer is that a base64 encoded string doesn't always end with a =. It ends with one or two = only if they are required to pad the string out to a multiple of 4 characters.

Andrew Hare
  • 5
    "One case in which padding characters are required is concatenating multiple Base64 encoded files." – André Puel Nov 30 '14 at 19:41
  • 1
    @AndréPuel: resynch one single `=` would suffice. If you want to find the boundaries back then a terminator should always be present (and still only one char is needed). The whole padding concept of Base64 is just a brainfart... – 6502 Aug 20 '15 at 19:07
  • 11
    That link is completely irrelevant to base64, though. – NH. Aug 18 '17 at 16:25
  • 2
    I just wish a relevant and reliable link was posted that explains about padding in `base64` efficiently with illustrations and examples. The present link to wikipedia is absolutely irrelevant like @NH. mentioned. – Fr0zenFyr Jul 05 '19 at 10:48
  • 4
    @Fr0zenFyr If you want a link, https://en.wikipedia.org/wiki/Base64#Output_padding is pretty good. But the [answer by Badr](https://stackoverflow.com/a/36571117/1739000) is really a better one (it just hasn't caught up in votes yet). – NH. Jul 09 '19 at 18:58
  • I already understand how padding works for `base64`. I just was concerned about irrelevant link in accepted answer. But thanks. Cheers – Fr0zenFyr Jul 10 '19 at 03:04
74

From Wikipedia:

The final '==' sequence indicates that the last group contained only one byte, and '=' indicates that it contained two bytes.

Thus, this is some sort of padding.

Jeff
Legolas
19

It's defined in RFC 2045 as a special padding character, used if fewer than 24 bits are available at the end of the data being encoded.

iandotkelly
18
  1. No.
  2. To pad the Base64-encoded string to a multiple of 4 characters in length, so that it can be decoded correctly.
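
As a rough illustration of the "multiple of 4" rule, the number of = signs and the total output length follow directly from the input length in bytes. The function names below are hypothetical, and the sketch assumes standard padded Base64 as produced by Python's `base64.b64encode`.

```python
import base64

def padding_length(n_bytes):
    """Number of '=' characters appended for an input of n_bytes bytes (0, 1 or 2)."""
    return (3 - n_bytes % 3) % 3

def encoded_length(n_bytes):
    """Length of the padded Base64 output: 4 characters per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)

for n in (6, 7, 8):
    encoded = base64.b64encode(b"x" * n).decode("ascii")
    print(n, encoded, padding_length(n), encoded_length(n))
```

The loop prints the encoded string next to the predicted padding and length for inputs of 6, 7 and 8 bytes.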
Ian Kemp
13

The equals sign (=) is used as padding in certain forms of base64 encoding. The Wikipedia article on base64 has all the details.

Sam Holloway
  • 2
    Could you explain the logic of why "==" is 1 byte and "=" is 2 bytes? I just can't understand it. How come input: "any carnal pleasure." could get result "YW55IGNhcm5hbCBwbGVhc3VyZS4=", while "any carnal pleasure" could get result "YW55IGNhcm5hbCBwbGVhc3VyZQ==" ? – null Mar 21 '13 at 06:25
  • 16
    It's not the case that '==' is 1 byte and '=' is 2 bytes. It's the case that the entire encoded string must always have a length that is a multiple of 4 characters. So you pad with '=' signs until you get that. The first string has one more character than the second string, so one fewer '=' of padding is required. – Sam Holloway Mar 27 '13 at 13:31
  • 3
    Is this answer supposed to be a comment? – Fr0zenFyr Jul 05 '19 at 10:57
10

It's padding. From http://en.wikipedia.org/wiki/Base64:

In theory, the padding character is not needed for decoding, since the number of missing bytes can be calculated from the number of Base64 digits. In some implementations, the padding character is mandatory, while for others it is not used. One case in which padding characters are required is concatenating multiple Base64 encoded files.

Thomas Leonard
  • 2
    The part about "One case in which padding characters are required is concatenating multiple Base64 encoded files." is wrong. For example, when concatenating two base64 files where the source bytes for each file are 3 bytes long, the base64 strings will be 4 characters long and have no padding bytes. When you concatenate these two base64 strings there will be no way to tell where one starts and one stops based solely on the concatenated string. So relying on base64 padding to help with that is not going to work. This issue will exist for any file with byte lengths evenly divisible by 3. – RonC Feb 10 '17 at 14:51
  • 1
    I guess it means the case where the final result should be the concatenation of the inputs. e.g. `decode(encode(A)+encode(B))=A+B` works with padding but not without. – Thomas Leonard Feb 11 '17 at 16:26
  • perhaps but such limited use doesn't allow the padding char(s) to be relied on for the general case of separating encoded strings when the encoded strings are concatenated together. I only mention it to help developers that may be thinking they can use it that way. – RonC Feb 13 '17 at 13:52
  • 1
    I think your objection really just highlights the difference between the concepts of padding and delimiting. The results of concatenation aren't generally expected to include enough information to make it reversible. You won't know if "c3dpenpsZXJz" was originally "c3dpenps" + "ZXJz" or "c3dp" + "enpsZXJz". But you also don't know if "swizzlers" was originally "swi" + "zzlers" or "swizzl" + "ers". – GargantuChet Apr 21 '17 at 21:22
  • 1
    Copying my comment from a related [Base64 padding answer](https://stackoverflow.com/questions/4080988/why-does-base64-encoding-require-padding-if-the-input-length-is-not-divisible-by#comment79055772_26632221): > Base64 concatenation [with '=' padding] allows encoders to process large chunks in parallel without the burden of aligning the chunk sizes to a multiple of three. Similarly, as an implementation detail, there might be an encoder out there that needs to flush an internal data buffer of a size that is not a multiple of three. – Andre D Sep 05 '17 at 06:39
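
The concatenation property discussed in these comments, `decode(encode(A)+encode(B))=A+B`, can be sketched in Python. The helper `decode_concatenated` is hypothetical: it decodes 4 characters at a time, which is one simple way to handle padding in the middle of the input, and it does not assume anything about how a particular library's decoder treats interior = signs.

```python
import base64

def decode_concatenated(encoded):
    """Decode 4 characters at a time; with padding, every 4-character group is self-contained."""
    quads = [encoded[i:i + 4] for i in range(0, len(encoded), 4)]
    return b"".join(base64.b64decode(quad) for quad in quads)

a, b = b"A", b"B"

# With padding: "QQ==" + "Qg==" still decodes to the concatenation of the inputs.
padded = base64.b64encode(a).decode("ascii") + base64.b64encode(b).decode("ascii")
print(decode_concatenated(padded) == a + b)     # True

# Without padding: "QQ" + "Qg" becomes "QQQg", which reads as one 3-byte group.
unpadded = padded.replace("=", "")
print(decode_concatenated(unpadded) == a + b)   # False
```

As the comments point out, this is about keeping the decoder in sync, not about recovering where one input ended and the next began.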
8

http://www.hcidata.info/base64.htm

Encoding "Mary had" to Base 64

In this example we are using a simple text string ("Mary had") but the principle holds no matter what the data is (e.g. graphics file). To convert each 24 bits of input data to 32 bits of output, Base 64 encoding splits the 24 bits into 4 chunks of 6 bits. The first problem we notice is that "Mary had" is not a multiple of 3 bytes - it is 8 bytes long. Because of this, the last group of bits is only 4 bits long. To remedy this we add two extra bits of '0' and remember this fact by putting a '=' at the end. If the text string to be converted to Base 64 was 7 bytes long, the last group would have had 2 bits. In this case we would have added four extra bits of '0' and remember this fact by putting '==' at the end.
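
The bit-level description above can be followed step by step in a small Python sketch. This is an illustrative re-implementation, not how real libraries do it; `encode_by_bits` and `ALPHABET` are names chosen here, and the result is compared against the standard `base64` module.

```python
import base64

# Standard Base64 alphabet: index 0..63 maps to one output character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_by_bits(data):
    """Encode bytes by cutting the bit string into 6-bit groups, zero-filling the last one."""
    bits = "".join(f"{byte:08b}" for byte in data)
    zero_fill = (6 - len(bits) % 6) % 6          # 0, 2 or 4 extra '0' bits
    bits += "0" * zero_fill
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Two filler bits are recorded as '=', four filler bits as '=='.
    return "".join(chars) + "=" * (zero_fill // 2)

print(encode_by_bits(b"Mary had"))   # 8 bytes: last group has 4 bits, one '='
print(encode_by_bits(b"Mary ha"))    # 7 bytes: last group has 2 bits, '=='
print(encode_by_bits(b"Mary had") == base64.b64encode(b"Mary had").decode("ascii"))
```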

Dev
3

= is a padding character. If the input stream's length is not a multiple of 3 bytes, padding characters are added. The decoder requires this: without padding, the last byte would have an incorrect number of zero bits.

Better and deeper explanation here: https://base64tool.com/detect-whether-provided-string-is-base64-or-not/

Vladimir Ignatyev
  • 1
    To expand on this, while standard base64 specifies padding, it is not because it can't be decoded without it. It is possible to make a base64 implementation whose decoder does not require padding, and the decoder can still obtain all the same information from the position of the end of the string. Padding allows the following extra benefits: 1) that base64 strings will all be a multiple of 4 characters long, which may simplify decoder design, and 2) that you can concatenate two base64 strings without re-encoding and there is enough information at the break to properly get back into sync. – thomasrutter Apr 23 '21 at 08:08
  • This is not true for JavaScript (and maybe other languages). When you call `btoa('ipsum')` in your console you get `aXBzdW0=`. But removing the = sign and decoding `atob('aXBzdW0')` still results in `ipsum`. Maybe `atob` pads internally? – Sjeiti Jun 03 '22 at 09:08
  • @Sjeiti if you looked into my implementation of base64 encoding and decoding, you could see that padding is an optional thing. Check out https://github.com/vladignatyev/base64tool/blob/master/modules/base64/src/decode.worker.js#L7 – Vladimir Ignatyev Jun 04 '22 at 10:20
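
As the comments above suggest, the missing padding can be worked out from the length of the string alone. A minimal Python sketch follows; `decode_unpadded` is a hypothetical helper, and it assumes the input is otherwise valid standard Base64. Python's `base64.b64decode` is used here because, to my knowledge, it rejects input whose padding is missing, so the helper restores it first.

```python
import base64

def decode_unpadded(encoded):
    """Restore the '=' padding implied by the length, then decode."""
    missing = -len(encoded) % 4      # 0, 1 or 2 for valid input
    return base64.b64decode(encoded + "=" * missing)

print(decode_unpadded("aXBzdW0"))    # b'ipsum'
print(decode_unpadded("aXBzdW0="))   # b'ipsum' (already padded, nothing added)
```

A length that leaves a remainder of 1 when divided by 4 cannot come from a valid encoding, so this sketch makes no attempt to repair it.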