
Are there any other characters except A-Za-z0-9 that can be used to shorten links without getting into trouble? :)

I was thinking about +,;- or something.

Is there a defined standard regarding what characters can be used in a URL that browser vendors respect?

d-_-b
Florian Fida

2 Answers


A path segment (one of the parts of a path separated by /) in an absolute URI path can contain zero or more pchar characters, where RFC 3986 defines pchar as follows:

  pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
  pct-encoded = "%" HEXDIG HEXDIG
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

So it’s basically A-Z, a-z, 0-9, -, ., _, ~, !, $, &, ', (, ), *, +, ,, ;, =, :, @, as well as % that must be followed by two hexadecimal digits. Any other character/byte needs to be encoded using the percent-encoding.
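As a rough illustration of the rule above, here is a minimal Python sketch that checks whether a string is a syntactically valid path segment. The names `PCHAR`, `SEGMENT_RE` and `is_valid_segment` are made up for this example; only the character set comes from the grammar quoted above.

```python
import re

# One pchar: an unreserved character, a sub-delim, ":" or "@",
# or a percent sign followed by exactly two hexadecimal digits.
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# A segment is zero or more pchar (hypothetical helper name).
SEGMENT_RE = re.compile(rf"{PCHAR}*")


def is_valid_segment(segment: str) -> bool:
    """Return True if `segment` consists only of pchar productions."""
    return SEGMENT_RE.fullmatch(segment) is not None


if __name__ == "__main__":
    for candidate in ["abc123", "x;y=1,2", "%7E", "a/b", "a b", "%7"]:
        print(repr(candidate), is_valid_segment(candidate))
```

Note that this only checks syntax. It says nothing about whether a given browser or link parser will leave those characters untouched.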

Although these are 79 characters in total that can be used in a path segment literally, some user agents do encode some of these characters as well (e.g. %7E instead of ~). That’s why many use just the 62 alphanumeric characters (i.e. A-Z, a-z, 0-9) or the Base 64 Encoding with URL and Filename Safe Alphabet (i.e. A-Z, a-z, 0-9, -, _).
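For the alphanumeric-only option, a link shortener could map a numeric ID to a short token roughly like this. This is only a sketch: the digit order in `ALPHABET` and the function names `encode_base62` / `decode_base62` are arbitrary choices for the example, not a standard.

```python
import string

# 62 alphanumeric characters; the ordering is an assumption of this example.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)


def encode_base62(number: int) -> str:
    """Encode a non-negative integer using only 0-9, A-Z, a-z."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_base62(token: str) -> int:
    """Inverse of encode_base62; raises ValueError on other characters."""
    number = 0
    for char in token:
        number = number * BASE + ALPHABET.index(char)
    return number


if __name__ == "__main__":
    token = encode_base62(123456789)
    print(token, decode_base62(token))
```

The URL-safe Base64 alphabet works the same way with two extra symbols (- and _), which shortens tokens slightly at the cost of leaving the purely alphanumeric range.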

Steffo
Gumbo
  • @Joey: Not in a path segment as it’s the path segment delimiter. – Gumbo Jan 12 '11 at 14:19
  • Ok, I was kinda assuming the OP was talking about the whole path of a URI, not only a single segment. At least, URI shorteners usually work in the way of `http://domain.foo/` where it doesn't need to be restricted to a single segment. – Joey Jan 12 '11 at 14:24
  • So this means the path part of a URI can contain `&`, right? But that symbol is usually used as the parameter delimiter in the query part of a URI. – 23W Apr 21 '21 at 14:20

According to RFC 3986 the valid characters for the path component are:

a-z A-Z 0-9 . - _ ~ ! $ & ' ( ) * + , ; = : @

as well as percent-encoded characters and of course, the slash /.

Keep in mind, though, that many applications (not necessarily browsers) that attempt to parse URIs to make them clickable, for example, may support a much smaller set of characters. This is akin to parsing e-mail addresses where most attempts also don't catch all addresses allowed by the standard.
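To show what "percent-encoded" means for everything outside the list above, here is a small Python sketch using the standard library's `urllib.parse.quote`. The constant `PCHAR_SAFE` and the helper `encode_segment` are names invented for this example. The slash is deliberately left out of the safe set so that a literal / inside one segment gets encoded rather than treated as a delimiter. It assumes Python 3.7 or later, where `quote` leaves ~ unescaped.

```python
from urllib.parse import quote, unquote

# sub-delims plus ":" and "@"; letters, digits and "-._~" are never
# escaped by quote(), so they do not need to be listed here.
PCHAR_SAFE = "!$&'()*+,;=:@"


def encode_segment(text: str) -> str:
    """Percent-encode everything that is not allowed literally in a segment."""
    return quote(text, safe=PCHAR_SAFE)


if __name__ == "__main__":
    for raw in ["x;y=1", "a b/c", "é"]:
        encoded = encode_segment(raw)
        print(repr(raw), "->", encoded, "->", repr(unquote(encoded)))
```

Non-ASCII text is encoded as its UTF-8 bytes, which is the default behaviour of `quote`.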

Steffo
Joey
  • Sorry - where are you referencing this, specifically in the spec? https://tools.ietf.org/html/rfc3986#page-22 - I don't see any call-outs for character constraints on the path or segments. – Jmoney38 Jul 03 '19 at 21:06
  • @Jmoney38: See the definition of `pchar`. – Joey Jul 04 '19 at 19:36