
This code works as expected:

from urllib.request import urlopen 
with urlopen('https://mr.wikipedia.org/s/4jp4') as f:
    f.read().decode('utf-8')

But similar code raises an error, even though both URLs point to the same wiki article.

from urllib.request import urlopen 
with urlopen('https://mr.wikipedia.org/wiki/किशोरावस्था') as f:
    f.read().decode('utf-8')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-20: ordinal not in range(128)

I need to use only Python built-in modules, so I cannot use the requests module.


This works, but in my case the URL comes from an API and I do not know in advance which part needs quoting. Is there a more general solution, like the one requests provides?

from urllib.parse   import quote
from urllib.request import urlopen

url = 'https://mr.wikipedia.org/wiki/' + quote("किशोरावस्था")
content = urlopen(url).read()
shantanuo

1 Answer


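The non-ASCII characters in the URL path cause the error: the request line is encoded as ASCII before it is sent, so those characters must be percent-encoded first. Try quoting them with urllib.parse.quote: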

from urllib.parse import quote
from urllib.request import urlopen

# Percent-encode only the non-ASCII article title, not the whole URL.
with urlopen('https://mr.wikipedia.org/wiki/' + quote('किशोरावस्था')) as f:
    html = f.read().decode('utf-8')
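
If the URL arrives as a whole string (for example from an API) and you do not know which part contains non-ASCII text, one possible approach is to split it and quote each component separately. The sketch below is only an illustration using the standard library: the helper name quote_url and the choice of safe characters are my own assumptions, and it does not handle non-ASCII host names.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Hypothetical helper, not part of urllib. '%' is listed as safe so that
    sequences that are already percent-encoded are not encoded a second time.
    The host name (netloc) is passed through unchanged.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%:@")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


encoded = quote_url('https://mr.wikipedia.org/wiki/किशोरावस्था')
plain = quote_url('https://mr.wikipedia.org/s/4jp4')
```

The result can then be passed to urlopen in place of the raw URL. A URL that is already pure ASCII, such as the short link in the question, comes back unchanged.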
Joshua
  • lol ok the question closed with the right answer. I found this answer before that. Sorry if this was late – Joshua Apr 12 '20 at 16:15