Python 3 Get HTTP page

Question

How can I get Python to fetch the contents of an HTTP page? So far all I have is the request, and I have imported http.client.

Greg Hewgill · Accepted Answer · 2010-01-07T21:53:54.137

56

Using urllib.request is probably the easiest way to do this:

import urllib.request
f = urllib.request.urlopen("http://stackoverflow.com")
print(f.read())
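Note that f.read() returns bytes, not str, and the snippet above never closes the response. A minimal sketch of one way to handle both, assuming the page declares its charset in the Content-Type header (the helper name fetch_text and its default values are mine, not part of the answer):

```python
import urllib.request


def fetch_text(url, timeout=10, default_charset="utf-8"):
    """Return the body of `url` decoded to str (hypothetical helper)."""
    # The context manager closes the response even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, not str
        # Use the charset declared in Content-Type, if there is one;
        # falling back to UTF-8 is an assumption, not a guarantee.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset, errors="replace")
```

Calling print(fetch_text("http://stackoverflow.com")[:200]) should then print the first 200 characters of the page as text rather than as a bytes literal.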

edited Jan 07 '10 at 21:53

answered Jan 07 '10 at 21:48

Greg Hewgill


Tried that and I got "AttributeError: 'module' object has no attribute 'urlopen'" – BiscottiGummyBears Jan 07 '10 at 21:52

Sorry, I just noticed that you were using Python 3. I've updated my example to match. – Greg Hewgill Jan 07 '10 at 21:53

@Davide Gualano: The Python 2.x `urllib2` module has been rolled into the Python 3.x `urllib` set of modules: http://docs.python.org/library/urllib2.html – Greg Hewgill Jan 07 '10 at 21:58
@Greg: my bad, I didn't read the question title carefully enough :) – Davide Gualano Jan 07 '10 at 22:08

score 14 · Answer 2 · edited Dec 26 '19 at 10:54

Using the built-in module "http.client"

import http.client

connection = http.client.HTTPSConnection("api.bitbucket.org", timeout=2)
connection.request('GET', '/2.0/repositories')
response = connection.getresponse()
print('{} {} - a response on a GET request by using "http.client"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "http.client"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Using the third-party library "requests"

import requests

response = requests.get("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "requests"'.format(response.status_code, response.reason))
content = response.content.decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "requests"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Using the built-in module "urllib.request"

import urllib.request

response = urllib.request.urlopen("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "urllib.request"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "urllib.request"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Notes:

Tested with Python 3.4.
The three approaches will most likely differ only in the content of the response.

score 1 · Answer 3 · answered May 18 '16 at 06:04


You can also use the requests library. I found this particularly useful because it was easier to retrieve and display the HTTP header.

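import requests

source = 'http://www.pythonlearn.com/code/intro-short.txt'
r = requests.get(source)

print('Display actual page\n')
# Iterating over r directly yields fixed-size byte chunks, not lines,
# so split the decoded text into lines instead.
for line in r.text.splitlines():
    print(line.strip())

print('\nDisplay all headers\n')
print(r.headers)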
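If installing requests is not an option, the response headers are also reachable with the standard library alone. A small sketch, assuming a UTF-8 body (the helper name fetch_with_headers is hypothetical):

```python
import urllib.request


def fetch_with_headers(url, timeout=10):
    """Return (headers_dict, body_text) for `url` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers behaves like an email.message.Message;
        # items() yields (name, value) pairs. Converting to a dict keeps
        # only the last value when a header name is repeated.
        headers = dict(response.headers.items())
        body = response.read().decode("utf-8", errors="replace")
    return headers, body
```

Header names keep whatever capitalization the server sent, so compare them case-insensitively when looking one up in the returned dict.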


dimsum88


Is this Python 3? – Nam G VU Oct 01 '17 at 07:21

score 1 · Answer 4 · answered Nov 09 '18 at 19:08


pip install requests

import requests

r = requests.get('https://api.spotify.com/v1/search?type=artist&q=beyonce')
r.json()
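For comparison, roughly the same JSON decoding can be done without a third-party package by combining urllib.request with the json module. This is only a sketch: the helper name fetch_json is hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch `url` and parse the body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when Content-Type carries no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```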


Anthony Awuley


score 0 · Answer 5 · edited Oct 15 '15 at 13:30


read() returns bytes; decode them into a string so the output is human-readable:

text = f.read().decode('utf-8')


kenorb


answered Oct 15 '15 at 07:53

SKGoC


score 0 · Answer 6 · answered Oct 21 '17 at 20:23

https://stackoverflow.com/a/41862742/8501970 Check this out instead. It's about the same issue you have, and the solution is very simple and takes only a few lines of code. It helped me when I realized that Python 3 cannot simply use get_page.

This is a fine alternative. (Hope this helps, cheers!)
