When is a space in a URL encoded to +, and when is it encoded to %20?
4 Answers
From Wikipedia (emphasis and link added):
When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.
So, true percent-encoding uses %20, while form data in URLs uses a modified form that encodes spaces as +. You are therefore most likely to see + only in the query string of a URL, after the ?.
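As a rough illustration of the two conventions, Python's standard library exposes both: `urllib.parse.quote` follows the generic percent-encoding rules, while `urllib.parse.quote_plus` follows the form-data variant. This is only a sketch, and the sample string is made up for the example.

```python
from urllib.parse import quote, quote_plus

text = "hello world"  # hypothetical sample value

percent_encoded = quote(text)    # generic URI percent-encoding
form_encoded = quote_plus(text)  # application/x-www-form-urlencoded style

print(percent_encoded)  # hello%20world
print(form_encoded)     # hello+world
```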
-
So + encoding would technically be multipart/form-data encoding, while percent encoding is application/x-www-form-urlencoded? – BC. Oct 27 '09 at 23:34
-
@BC: no - `multipart/form-data` uses MIME encoding; `application/x-www-form-urlencoded` uses `+` and properly encoded URIs use `%20`. – McDowell Oct 27 '09 at 23:41
-
"So you're most likely to only see + in URLs in the query string after an ?" is an understatement. You should never see "+" in the path part of the URL, because it will not do what you expect (it will not be read as a space). – Adam Gent Jul 22 '11 at 17:37
-
@McDowell your response to the comment from BC was very helpful to me, along with the input from Adam Gent – Chris Marisic Jul 09 '12 at 17:39
-
Hi, I am confused too, sometime I saw the book use "+" but sometime "%20", When user submit the form, how the form encode the space ? with which character? Is the result browser-dependent? – Sam YC Nov 07 '12 at 06:32
-
So basically: the target of a GET form submission looks like `http://www.bing.com/search?q=hello+world`, and a resource with a space in its name looks like `http://camera.phor.net/cameralife/folders/2012/2012-06%20Pool%20party/` – William Entriken Apr 13 '13 at 23:55
-
[Data URIs](http://www.ietf.org/rfc/rfc2397.txt) use the same encoding as [URIs](http://www.ietf.org/rfc/rfc2396.txt). After reading that RFC I can confidently say I'm not smart enough to decipher whether encoding of a space should be allowed as a + character. I can, however, say that if you use + instead of %20 the data URI won't work in browsers. – Rob Murphy Apr 28 '14 at 14:22
-
@Rob: It probably isn't allowed in data URIs, indeed. Because as stated, it's only in the query part where the `+` is used. – Joey Apr 28 '14 at 18:34
-
Note that for email links, you do need %20 and not + after the ?. For example, `mailto:support@example.org?subject=I%20need%20help`. If you try that with +, the email will open with + signs instead of spaces. – Sygmoral Feb 19 '15 at 00:30
-
This helped me, https://stackoverflow.com/questions/5572718/php-convert-spaces-in-string-into-20 – zeros-and-ones Dec 19 '17 at 19:58
-
The problem with using plus shows up when you want to accept plus signs distinctly from spaces, such as in ?search=The A+ School – Curtis Jul 31 '19 at 23:35
This confusion is because URLs are still 'broken' to this day.
From a blog post:
Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs actually have a very well-defined structure since the first specification in 1994.
We can extract detailed information about the "http://www.google.com" URL:
+---------------+-------------------+
| Part          | Data              |
+---------------+-------------------+
| Scheme        | http              |
| Host          | www.google.com    |
+---------------+-------------------+
If we look at a more complex URL such as:
"https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"
we can extract the following information:
+-------------------+---------------------+
| Part              | Data                |
+-------------------+---------------------+
| Scheme            | https               |
| User              | bob                 |
| Password          | bobby               |
| Host              | www.lunatech.com    |
| Port              | 8080                |
| Path              | /file;p=1           |
| Path parameter    | p=1                 |
| Query             | q=2                 |
| Fragment          | third               |
+-------------------+---------------------+

https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
\___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
  |      |    |          |          |    |   \_/  |    |
Scheme User Password    Host       Port Path  |   | Fragment
        \_____________________________/       | Query
                       |                Path parameter
                   Authority

The reserved characters are different for each part.
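The breakdown above can be reproduced, at least approximately, with Python's `urllib.parse.urlparse`. This is a sketch rather than part of the quoted blog post. One difference to be aware of: `urlparse` reports the path parameter separately, so its `path` is `/file` rather than `/file;p=1`.

```python
from urllib.parse import urlparse

parts = urlparse("https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third")

print(parts.scheme)    # https
print(parts.username)  # bob
print(parts.password)  # bobby
print(parts.hostname)  # www.lunatech.com
print(parts.port)      # 8080
print(parts.path)      # /file
print(parts.params)    # p=1
print(parts.query)     # q=2
print(parts.fragment)  # third
```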
For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.
Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".
This means that the "blue+light blue" string has to be encoded differently in the path and query parts:
"http://example.com/blue+light%20blue?blue%2Blight+blue".
From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.
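The "blue+light blue" example can be sketched in Python as follows. This assumes `urllib.parse.quote` (with `+` marked as safe) is an acceptable stand-in for path encoding and `quote_plus` for form-style query encoding; the variable names are only illustrative.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

raw = "blue+light blue"

# Path part: space becomes %20, and "+" may stay literal (passed via `safe`).
path_segment = quote(raw, safe="+")
# Query part: space becomes "+", so a literal "+" has to become %2B.
query_string = quote_plus(raw)

url = f"http://example.com/{path_segment}?{query_string}"
print(url)  # http://example.com/blue+light%20blue?blue%2Blight+blue

# Decoding needs the same awareness of which part is being read.
decoded_path = unquote(path_segment)         # blue+light blue
decoded_query = unquote_plus(query_string)   # blue+light blue
wrong_decoding = unquote_plus(path_segment)  # blue light blue (the "+" is lost)
print(decoded_path, decoded_query, wrong_decoding, sep=" | ")
```

Applying the query decoder to the path silently turns the literal "+" into a space, which is the practical reason a fully assembled URL cannot be encoded or decoded in one pass.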
This boils down to:
You should have %20 before the ? and + after.
-
>> you should have %20 before the ? and + after. Sorry for the silly question. I know that a hashtag parameter is used after the "?" question mark parameter, though it is somewhat different because using "#" does not reload the page. I've been trying to use %20 and the + sign after the "#" hashtag, and neither seems to work. Which one needs to be used after "#"? – Philcyb Dec 22 '15 at 01:59
-
@Philcyb You might wanna read this https://en.wikipedia.org/wiki/Percent-encoding – Matas Vaitkevicius Dec 23 '15 at 08:56
-
Does the query part actually have an "official" standard? I thought that part is basically application specific. 99.99% of apps use `key1=value1&key1=value2` where keys and values are encoded with whatever rules `encodeURIComponent` follows, but AFAIK the contents of the query part are entirely up to the app. Other than that it only goes up to the first `#`, there's no official encoding. – gman Jul 26 '18 at 20:38
-
Thanks for pointing out that the confusing inconsistency is due to legacy broken design. – wlnirvana Apr 09 '20 at 01:39
-
Actually, I just took a look at the LunaTech blog article, which you kindly referenced, and the take-home message seems to be more like: **You must use %20 and not + before the `?`, but after the `?` it is simply a matter of taste**. For the love of God, people, just always use the percent sign-based encoding and clear out some brain space for more important stuff. – nydame Dec 06 '20 at 18:19
I would recommend %20.
Are you hard-coding them?
This is not very consistent across languages, though.
If I'm not mistaken, in PHP urlencode() treats spaces as + whereas Python's urlencode() treats them as %20.
EDIT:
It seems I'm mistaken. Python's urlencode() (at least in 2.7.2) uses quote_plus() instead of quote() and thus encodes spaces as "+".
It seems also that the W3C recommendation is the "+" as per here: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
And in fact, you can follow this interesting debate on Python's own issue tracker about what to use to encode spaces: http://bugs.python.org/issue13866.
EDIT #2:
I understand that the most common way of encoding " " is as "+", but (and it may be just me) I find this a bit confusing:
from urllib.parse import urlencode  # Python 3; in Python 2.7 this was urllib.urlencode
print(urlencode({' ': '+ '}))
# prints: +=%2B+
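For anyone who prefers %20 in query strings, Python 3's `urllib.parse.urlencode` accepts a `quote_via` argument, so the default `quote_plus` can be swapped for `quote`. A minimal sketch, with made-up parameter names:

```python
from urllib.parse import urlencode, quote

params = {"q": "hello world", "grade": "A+"}  # hypothetical parameters

default_style = urlencode(params)                   # encodes values with quote_plus
percent_style = urlencode(params, quote_via=quote)  # encodes values with quote

print(default_style)  # q=hello+world&grade=A%2B
print(percent_style)  # q=hello%20world&grade=A%2B
```

In both styles the literal "+" is escaped to %2B, so the two outputs differ only in how the space is written.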
-
Not hardcoding. Trying to determine from an aesthetic perspective what my urls containing spaces will look like. – BC. Oct 27 '09 at 23:36
-
Hi, I am confused too, When user submit the html form, how the form encode the space ? with which character? Is the result browser-dependent? – Sam YC Nov 07 '12 at 06:34
-
And the `URLEncoder.encode()` method in Java converts it to `+` as well. – рüффп Oct 24 '14 at 12:48
-
And then the question arises as to how to treat encoding in the body of a POST request: "Content-Type: application/x-www-form-urlencoded" where the parameters are in the form of "a=b&c=d", but aren't in a URL at all, just the body of the "document." They made a real mess out of this issue, and it's darned difficult to find definitive answers. – fyngyrz Dec 05 '14 at 19:50
-
A problem with %20 is if you add it to a url then your server redirects that to a new url with the same query, it might encode the percent and you end up with %2520 instead of %20 – Curtis Jul 31 '19 at 23:32
-
In Python 3, the method urllib.parse.urlencode has a parameter called quote_via, that accepts a function. Its default value is urllib.parse.quote_plus, but it's possible to choose another function. So, we can use urllib.parse.quote, and thus encodes spaces as "%20". – wensiso Mar 23 '22 at 22:15
A space may be encoded as "+" only in the query part of a URL, where the key-value pairs follow the "application/x-www-form-urlencoded" content type. In my opinion, this is a may, not a must. Everywhere else in a URL, a space is encoded as %20.
In my opinion, it's better to always encode spaces as %20, not as "+", even in the query part of a URL, because the "+" convention comes from the HTML specification (RFC 1866) rather than from the URI specification: RFC 1866 specifies that space characters should be encoded as "+" in "application/x-www-form-urlencoded" content-type key-value pairs (see paragraph 8.2.1, subparagraph 1).
This way of encoding form data is also given in later HTML specifications. For example, look for relevant paragraphs about application/x-www-form-urlencoded in HTML 4.01 Specification, and so on.
Here is a sample URL in which the HTML specification allows spaces to be encoded as pluses: "http://example.com/over/there?name=foo+bar". So spaces can be replaced by pluses only after the "?". In all other cases, spaces should be encoded as %20. But since it's hard to determine the context correctly, the best practice is to never encode spaces as "+".
I would recommend percent-encoding all characters except the "unreserved" ones defined in RFC 3986, section 2.3:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
The implementation depends on the programming language that you chose.
If your URL contains national characters, first encode them to UTF-8 and then percent-encode the result.
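In Python, for example, this recommendation could be sketched with `urllib.parse.quote` and an empty `safe` set. The helper name `encode_component` is made up for illustration, and the sketch assumes Python 3.7 or later, where `~` is left unescaped.

```python
from urllib.parse import quote, unquote

def encode_component(value: str) -> str:
    """Percent-encode everything except RFC 3986 unreserved characters."""
    # safe="" removes "/" from quote()'s default safe set; quote() encodes the
    # string as UTF-8 first and never escapes ALPHA, DIGIT, "-", ".", "_", "~".
    return quote(value, safe="", encoding="utf-8")

original = "München/Straße a+b~1"  # hypothetical sample value
encoded = encode_component(original)

print(encoded)           # M%C3%BCnchen%2FStra%C3%9Fe%20a%2Bb~1
print(unquote(encoded))  # München/Straße a+b~1
```

Because "/" is also escaped, a helper like this is meant for a single path segment or query value, not for a whole URL.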
-
Why should anyone care about the HTML specification if the requested resource isn't HTML? I've seen "+" in some Web APIs which don't respond with HTML, e.g. when you request a PDF. I consider it wrong that they don't use "%20". – The incredible Jan Oct 12 '17 at 14:09
-
@TheincredibleJan, I agree with you. That's what my reply is about. – Maxim Masiutin Apr 02 '18 at 16:31
-
@MaximMasiutin When your answer says "This is a MAY, not a MUST", which spec are you referring to? I'm struggling to find a spec that has it as a may. In https://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.13.4.1 using '+' (in the query section) is within a 'must' section of the spec. – JosephH May 07 '19 at 13:19
-
@JosephH - thank you for your note. The MAY is my personal opinion. I have edited the post. What I meant is that the HTML specification you quoted defines "+", but in the URL context other rules apply, which also permit encoding spaces as %20. – Maxim Masiutin Jun 03 '19 at 10:10