Given a URL to a text file, what is the simplest way to read the contents of the text file?

Question

In Python, when given the URL for a text file, what is the simplest way to access the contents of the text file and print them out locally, line by line, without saving a local copy of the text file?

target_url = "http://www.myhost.com/SomeFile.txt"
# read the file
# print first line
# print second line
# etc.

score 140 · Accepted Answer · edited Jan 09 '20 at 14:03

Edit 09/2016: In Python 3 and up use urllib.request instead of urllib2

Actually the simplest way is:

import urllib2  # the lib that handles the url stuff

data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

You don't even need "readlines", as Will suggested. You could even shorten it to the following (a Python 3 version is given at the end of this answer):

import urllib2

for line in urllib2.urlopen(target_url):
    print line

But remember in Python, readability matters.

However, this is the simplest way, not the safest one: in network programming you usually cannot be sure that the server will send only the amount of data you expect. It is generally better to read a fixed, reasonable amount of data, enough for what you expect but bounded so that your script cannot be flooded:

import urllib2

data = urllib2.urlopen("http://www.google.com").read(20000) # read only 20 000 chars
data = data.split("\n") # then split it into lines

for line in data:
    print line

* The second example in Python 3:

import urllib.request  # the lib that handles the url stuff

for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8')) #utf-8 or iso8859-1 or whatever the page encoding scheme is
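
For Python 3, the capped read shown earlier could look roughly like the sketch below. The helper name read_lines_capped, the 10-second timeout and the UTF-8 default are assumptions made for illustration and are not part of the original answer. Note that in Python 3 read() counts bytes, not characters.

```python
from urllib.request import urlopen

MAX_BYTES = 20_000  # assumed cap, mirroring the 20 000 used in the Python 2 example


def read_lines_capped(url, max_bytes=MAX_BYTES, encoding="utf-8", timeout=10):
    """Return the decoded lines of at most max_bytes bytes fetched from url."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read(max_bytes)  # never reads more than max_bytes bytes
    # The cap can cut a multi-byte character in half; replace it instead of raising.
    return raw.decode(encoding, errors="replace").splitlines()
```

Calling read_lines_capped(target_url) then gives a list of strings that can be looped over and printed.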

score 59 · Answer 2 · answered Sep 08 '17 at 21:33

I'm a newbie to Python and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is

import urllib.request
data = urllib.request.urlopen(target_url)

for line in data:
    ...

or alternatively

from urllib.request import urlopen
data = urlopen(target_url)

Note that just import urllib does not work.
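
In Python 3, iterating over the response yields bytes rather than str, so each line usually needs decoding before printing. Here is a minimal sketch, assuming the server either declares a charset in its Content-Type header or the text is UTF-8; the generator name iter_text_lines and the default_encoding parameter are made up for this example.

```python
from urllib.request import urlopen


def iter_text_lines(url, default_encoding="utf-8"):
    """Yield the lines of the resource at url as str, without line endings."""
    with urlopen(url) as response:
        # Use the charset announced by the server, if any; otherwise fall back.
        charset = response.headers.get_content_charset() or default_encoding
        for raw_line in response:  # each raw_line is a bytes object
            yield raw_line.decode(charset).rstrip("\r\n")
```

Usage would be something like: for line in iter_text_lines(target_url): print(line). Nothing is written to disk, and lines are decoded one at a time.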

score 46 · Answer 3 · answered Jun 19 '18 at 16:39

The requests library has a simpler interface and works with both Python 2 and 3.

import requests

response = requests.get(target_url)
data = response.text

answered by leafmeal

score 28 · Answer 4 · answered Sep 08 '09 at 20:55

There's really no need to read line-by-line. You can get the whole thing like this:

import urllib
txt = urllib.urlopen(target_url).read()

answered by Ken Kinder

It doesn't work: _AttributeError: module 'urllib' has no attribute 'urlopen'_ – Iratzar Carrasson Bores Feb 16 '18 at 09:06

This answer only works in Python 2. EDIT: see [Andrew Mao's answer](https://stackoverflow.com/a/46124819/7830612) for Python 3. – leafmeal Jun 19 '18 at 16:01

For Python 3 it would be: txt = urllib.request.urlopen(target_url).read() – delimiter Mar 16 '20 at 01:09

score 12 · Answer 5 · answered Sep 08 '09 at 11:02

import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line

answered by Fabian

score 6 · Answer 6 · answered Sep 08 '09 at 10:59

import urllib2

f = urllib2.urlopen(target_url)
for l in f.readlines():
    print l

answered by Will

+1, but please note that it's the simplest way, NOT THE SAFEST. If an error occurs on the server side and it keeps delivering content forever, you could end up in an infinite loop. – e-satis Sep 08 '09 at 11:03

score 5 · Answer 7 · answered Jun 19 '18 at 16:27

Another way in Python 3 is to use the urllib3 package.

import urllib3

http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')

This can be a better option than urllib, since urllib3 advertises the following features:

Thread safety.

Connection pooling.

Client-side SSL/TLS verification.

File uploads with multipart encoding.

Helpers for retrying requests and dealing with HTTP redirects.

Support for gzip and deflate encoding.

Proxy support for HTTP and SOCKS.

100% test coverage.

answered by leafmeal

The [requests](https://2.python-requests.org/en/master/) library is partly based on urllib3. – floydn Jun 14 '19 at 17:30
Actually this is the only one of the above answers that will install (urllibx) for the latest version of Python to date. – Abstract Space Crack Jan 24 '20 at 00:26

score 5 · Answer 8 · answered Mar 12 '20 at 09:43

For me, none of the above responses worked right away. Instead, I had to do the following (Python 3):

from urllib.request import urlopen

data = urlopen("[your url goes here]").read().decode('utf-8')

# Do what you need to do with the data.
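
If the fetch itself can fail (a bad URL, a server error), it may help to catch urllib.error.URLError around the call. The following is only a sketch: the function name read_text, the fallback_encoding parameter, the timeout value and the choice to return None on failure are all assumptions, not something from the answer above.

```python
from urllib.error import URLError
from urllib.request import urlopen


def read_text(url, fallback_encoding="utf-8", timeout=10):
    """Return the whole body of url as str, or None if it cannot be fetched."""
    try:
        with urlopen(url, timeout=timeout) as response:
            # Prefer the charset from the Content-Type header when present.
            charset = response.headers.get_content_charset() or fallback_encoding
            return response.read().decode(charset)
    except URLError as exc:  # also covers HTTPError, which subclasses URLError
        print(f"could not fetch {url}: {exc}")
        return None
```

With this, data = read_text(target_url) is either the decoded text or None, so the caller can check for None before processing.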

answered by bmiselis

score 4 · Answer 9 · answered Mar 16 '20 at 01:13

Just updating here the solution suggested by @ken-kinder for Python 2 to work for Python 3:

import urllib.request  # a bare "import urllib" does not load the request submodule
urllib.request.urlopen(target_url).read()

edited Jan 13 '22 at 20:12

answered Mar 16 '20 at 01:13

delimiter


score 4 · Answer 10 · answered Sep 22 '20 at 18:11

The requests package works really well and has a simple interface, as @Andrew Mao suggested.

import requests
response = requests.get('http://lib.stat.cmu.edu/datasets/boston')
data = response.text
for i, line in enumerate(data.split('\n')):
    print(f'{i}   {line}')

Output:

0    The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
1    prices and the demand for clean air', J. Environ. Economics & Management,
2    vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
3    ...', Wiley, 1980.   N.B. Various transformations are used in the table on
4    pages 244-261 of the latter.
5   
6    Variables in order:

Check out the Kaggle notebook on how to extract a dataset/dataframe from a URL.

score 4 · Answer 11 · answered Feb 24 '21 at 15:35

I do think requests is the best option. Also note the possibility of setting encoding manually.

import requests
response = requests.get("http://www.gutenberg.org/files/10/10-0.txt")
# response.encoding = "utf-8"  # uncomment to set the encoding manually
text = response.text

answered by xiaoou wang
