25

I'm trying to percent-encode non-ASCII characters so I can put them in a URL and pass it to urlopen. I want the same encoding that JavaScript produces, which for example encodes ó as %C3%B3:

encodeURIComponent('ó')
'%C3%B3'

But urllib.quote in Python encodes ó as %F3:

urllib.quote('ó')
'%F3'

How can I get the same encoding as JavaScript's encodeURIComponent in Python? And can I also encode characters outside ISO 8859-1, such as Chinese? Thanks!

Bono
Saúl Pilatowsky-Cameo
  • related: http://stackoverflow.com/questions/6338469/how-to-url-safe-encode-a-string-with-python-and-urllib-quote-is-wrong – Wooble Jun 21 '11 at 20:05

2 Answers

36

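Make sure you start from a unicode string and encode it to UTF-8 before quoting. urllib.quote percent-encodes whatever bytes it receives, so a Latin-1 byte string gives %F3 while UTF-8 bytes give %C3%B3.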

Example:

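# -*- coding: utf-8 -*-
import urllib

s = u"ó"
# Encode to UTF-8 bytes first; quote() then percent-encodes each byte (Python 2)
print urllib.quote(s.encode("utf-8"))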

Outputs:

%C3%B3
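To make the difference between %F3 and %C3%B3 concrete, here is a small sketch written for Python 3's urllib.parse (not the Python 2 urllib used above). It assumes nothing beyond the standard library and only shows that the output depends on which byte encoding is applied before quoting.

```python
from urllib.parse import quote

# "ó" is U+00F3. Latin-1 stores it as the single byte 0xF3,
# while UTF-8 stores it as the two bytes 0xC3 0xB3.
latin1_quoted = quote("ó", encoding="latin-1")
utf8_quoted = quote("ó".encode("utf-8"))  # quote() also accepts bytes

print(latin1_quoted)  # %F3
print(utf8_quoted)    # %C3%B3
```

JavaScript's encodeURIComponent always works on the UTF-8 bytes, which is why it matches the second result.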

Bryan
33

In Python 3, urllib.quote has been moved to urllib.parse.quote.

Also, in Python 3 all strings are Unicode strings (byte strings are called bytes), and quote encodes str input as UTF-8 by default.

Example:

from urllib.parse import quote

print(quote('ó'))
# output: %C3%B3
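As a rough sketch of how to get closer to encodeURIComponent in Python 3: quote leaves "/" unescaped by default, whereas encodeURIComponent escapes it but leaves !~*'() alone. The helper name encode_uri_component below is hypothetical (it is not part of urllib), and the safe set is my reading of the characters encodeURIComponent does not escape.

```python
from urllib.parse import quote


def encode_uri_component(text):
    """Hypothetical helper approximating JavaScript's encodeURIComponent.

    Passing safe="!~*'()" replaces quote()'s default safe="/" so that
    "/" is escaped, while the punctuation JavaScript leaves alone stays as-is.
    Letters, digits and "-_." are never escaped by quote().
    """
    return quote(text, safe="!~*'()")


print(encode_uri_component("ó"))      # %C3%B3
print(encode_uri_component("中文"))    # %E4%B8%AD%E6%96%87
print(encode_uri_component("a/b c"))  # a%2Fb%20c
```

Because the input is encoded as UTF-8 before escaping, characters outside ISO 8859-1, such as Chinese, work the same way as ó.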
Messa