1

We're using a file-system/URL-safe variant of Base64 encoding in which:

"=" replaced with ""  
"+" replaced with "-"  
"/" replaced with "_"  

We are now using Azure blob storage, which does not allow "_" in container names.

We are Base64 encoding a Guid. If I were to replace the underscore with, say, a "0", would I be at risk of collisions?

Update

Not sure why the downvote, but to clarify:

Why not just use a Guid?

  1. The Guid is the id of an entity within my application. Since the paths are public, I don't really like exposing the Id, hence why I'm encoding it.
  2. I want shorter and more friendly looking paths. Contrary to one of the comments below, the base 64 encoding is NOT longer:

    Guid: 5b263cdd-2bc2-485d-83d4-81b96930dc5a
    Base64 Encoded: 3TwmW8IrXUiD1IG5aTDcWg== (even shorter after removing ==)
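
The length comparison above can be reproduced with a minimal sketch. Python is used here only for illustration, and the assumption is that the Guid bytes are in the order .NET's Guid.ToByteArray() returns, which Python exposes as bytes_le. With that assumption the Base64 string above comes out exactly.

```python
import base64
import uuid

guid = uuid.UUID("5b263cdd-2bc2-485d-83d4-81b96930dc5a")

# Assumption: bytes_le mirrors the mixed-endian layout of .NET's Guid.ToByteArray().
raw = guid.bytes_le

# Standard Base64, as in the example above.
standard = base64.b64encode(raw).decode("ascii")

# URL-safe variant: "+" -> "-", "/" -> "_", padding "=" removed.
url_safe = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

print(len(str(guid)), len(standard), len(url_safe))
```

This particular Guid happens to contain no "+" or "/", so the URL-safe form differs from the standard one only by the missing padding.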

(Another) Update

Seems there is some confusion about what it is I'm trying to achieve (so sorry about that). Here's the short version.

  • I have a Guid that represents an entity in my application.
  • I need to create a publicly accessible directory for the entity (via a Url).
  • I don't want to use the Guid as the directory name, for the reasons above.
  • I asked previously on SO about how I could generate a friendlier looking Url that guaranteed uniqueness and did not expose the original Guid. The suggestion was Base64 encoding.
  • This has worked fine until recently when we needed to use Azure blob storage, which does not allow underscores "_" in its directory (Container) names.

This is where I'm at.

Ben Foster

4 Answers

7

Just "encode" the GUID in base16. The only characters it uses are the hex digits 0-9 and a-f (the "N" format emits them in lowercase), which should be safe for most purposes.

var encoded = guid.ToString("N");
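
For comparison, here is a rough Python equivalent of the same idea (a sketch, not part of the original answer). It also shows that the hex form is a plain one-to-one mapping: the Guid can be read straight back out of it.

```python
import uuid

guid = uuid.UUID("5b263cdd-2bc2-485d-83d4-81b96930dc5a")

# 32 lowercase hex digits, i.e. the Guid with the hyphens removed.
encoded = guid.hex

# The mapping is reversible, so distinct Guids always give distinct names.
decoded = uuid.UUID(hex=encoded)

print(encoded, decoded == guid)
```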
R. Martinho Fernandes
  • Using Base16 results in a 33% longer string than using Base64. Having said that, if the OP finds a 24-character random-ish string "short and friendly" then I'm sure they wouldn't have too much trouble with a 32-character string either. – LukeH Jul 28 '11 at 11:36
  • @LukeH but it's 400% friendlier because it uses fewer distinct characters! :) – R. Martinho Fernandes Jul 28 '11 at 11:39
  • @Martinho, agree. I've also updated my question as to why I was encoding. Do I have any risk of collisions with this? – Ben Foster Jul 28 '11 at 11:43
  • @Ben: It's a 1-to-1 map, so you only have collisions if you have colliding GUIDs. – R. Martinho Fernandes Jul 28 '11 at 11:45
  • Just realized, this doesn't really mask the original Guid. Seems equivalent to just removing the "-"? – Ben Foster Jul 28 '11 at 11:59
  • @Ben and what's the problem with that? – R. Martinho Fernandes Jul 28 '11 at 12:01
  • @Ben: I thought you wanted some way of encoding a GUID that didn't have any invalid characters. This one fits that purpose. I won't suggest an alternative if you don't tell what other requirements you have. – R. Martinho Fernandes Jul 28 '11 at 12:06
  • To be fair, the question was that I'm *already* base64 encoding the Guid but I can't use underscores. However, I realize I should have been a bit more descriptive (hence my update). – Ben Foster Jul 28 '11 at 12:14
  • In the end I just went for the base 16 representation. Trying to make a shorter/masked version whilst guaranteeing uniqueness was just not worth the hassle. – Ben Foster Jul 28 '11 at 22:51
3

The base 64 character set is

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=

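So you can't use "0" as the replacement: it is already part of the alphabet, so the substitution would no longer be one-to-one and two different GUIDs could end up with the same name.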
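
A small sketch of such a collision, in Python for illustration. The helper name container_name and the two byte patterns are made up for the demonstration (they are not valid version-4 GUIDs, just two distinct 128-bit values).

```python
import base64
import uuid

def container_name(guid: uuid.UUID) -> str:
    """URL-safe Base64 without padding, with "_" naively replaced by "0"."""
    encoded = base64.urlsafe_b64encode(guid.bytes).rstrip(b"=").decode("ascii")
    return encoded.replace("_", "0")

# First 6 bits 111111 encode to "/" (i.e. "_" in the URL-safe alphabet).
a = uuid.UUID(bytes=bytes([0xFC] + [0] * 15))
# First 6 bits 110100 encode to a genuine "0".
b = uuid.UUID(bytes=bytes([0xD0] + [0] * 15))

print(a != b, container_name(a) == container_name(b))
```

Both values end up as "0" followed by 21 "A" characters, even though the inputs differ.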

David Heffernan
0

Encoding your identifiers does not encrypt them. Any technically savvy observer can Base64-decode an identifier. If you want to make your paths opaque, then either encrypt them or hash them with a salt. If you do want to keep your paths transparent, just use hex without any hyphens or braces. That way, your UUID is serialized to 32 code points, whereas Azure container names can be up to 63 characters long.
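
One way to read "hash them with a salt" is a keyed hash (HMAC). The following Python sketch assumes a secret key kept on the server; SECRET_KEY and opaque_name are hypothetical names, and the truncation to 32 hex digits is a choice made here to stay within the 63-character limit, not something the platform requires.

```python
import hashlib
import hmac
import uuid

# Hypothetical secret; in practice it would come from configuration, not source code.
SECRET_KEY = b"replace-with-a-real-secret"

def opaque_name(guid: uuid.UUID, key: bytes = SECRET_KEY) -> str:
    """Derive a lowercase hex name from the GUID that cannot be reversed without the key."""
    digest = hmac.new(key, guid.bytes, hashlib.sha256).hexdigest()
    # A full SHA-256 hex digest is 64 characters, one too many; keep 128 bits.
    return digest[:32]

guid = uuid.UUID("5b263cdd-2bc2-485d-83d4-81b96930dc5a")
name = opaque_name(guid)
print(name)
```

Unlike the hex form, this is one-way: the name can be recomputed from the GUID, but the GUID cannot be recovered from the name. Uniqueness is no longer strictly guaranteed, although a collision on 128 bits of a keyed hash is extremely unlikely.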


If you really want shorter and funnier container names, and if Azure supports internationalized domain names, Braille encoding fits the bill as the least typable option. Here's a Haskell one-liner for generating a UUIDv4, mapping each octet of the UUID to a braille letter and encoding the resulting string in UTF-16BE (for a total of 32 octets).

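import Data.Binary (encode)
import Data.ByteString.Lazy (ByteString, intersperse, cons)
import Data.Functor ((<&>))
import Data.UUID.V4 (nextRandom)

-- 40 is 0x28, the high byte of every code point in the Braille block (U+2800..U+28FF).
-- Prefixing each of the 16 UUID octets with it yields 32 octets of UTF-16BE.
braille :: IO ByteString
braille = nextRandom <&> encode <&> intersperse 40 <&> cons 40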

(In F#, |> would be used instead of <&>.)

For your amusement, see the following gist for how to convert an octet-stream into UTF-16LE or UTF-8 encoded braille strings which makes each bit literally stand out.

https://gist.github.com/bjartur/ea5db281f0b88128455ed79621abbd1d

0

Instead of taking Base64 and changing 4 characters, you could encode your data in base 60.

Your base60 char list doesn't contain the 4 chars you don't like and so there's no need to replace anything.
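
A minimal sketch of what that could look like, in Python for illustration. The answer does not say which 60 characters to use, so the alphabet below (the 62 ASCII alphanumerics minus "O" and "l") is purely an assumption, as are the function names.

```python
import string
import uuid

# Hypothetical alphabet: the 62 ASCII alphanumerics minus "O" and "l" gives 60 symbols.
ALPHABET = "".join(c for c in string.digits + string.ascii_letters if c not in "Ol")
WIDTH = 22  # 60**22 > 2**128, so 22 symbols are always enough for a GUID

def encode_base60(guid: uuid.UUID) -> str:
    """Write the GUID's 128-bit integer value as a fixed-width base-60 string."""
    n = guid.int
    chars = []
    for _ in range(WIDTH):
        n, remainder = divmod(n, 60)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))

def decode_base60(text: str) -> uuid.UUID:
    """Inverse of encode_base60."""
    n = 0
    for c in text:
        n = n * 60 + ALPHABET.index(c)
    return uuid.UUID(int=n)

guid = uuid.UUID("5b263cdd-2bc2-485d-83d4-81b96930dc5a")
name = encode_base60(guid)
print(name, decode_base60(name) == guid)
```

Because the encoding is reversible, it cannot produce collisions. Note that a mixed-case alphabet may still be a problem if the target only accepts lowercase names.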

VVS