I am trying to download some data from the web (web scraping). I have a list of URLs, and a few of them take too much time, so the loop gets stuck there. I am implementing a function that times out after a certain threshold so that the loop can continue with the next URL.
For example, if the downloading_source looks like this:
import time
import numpy as np

def downloading_source(x):
    wt = np.random.randint(1, 50)
    print("waiting time", wt)
    time.sleep(wt)
    return x**2
For the demo, I am using random values for time.sleep, and the timeout function looks like this:
import errno
import os
import signal
import functools

class TimeoutError(Exception):
    pass

def timeout(seconds=10, error_message=os.strerror(errno.ETIME)):
    def decorator(func):
        def _handle_timeout(signum, frame):
            raise TimeoutError(error_message)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # schedule SIGALRM to fire after `seconds`
            signal.signal(signal.SIGALRM, _handle_timeout)
            signal.alarm(seconds)
            try:
                result = func(*args, **kwargs)
            finally:
                # cancel the pending alarm whether the call finished or timed out
                signal.alarm(0)
            return result

        return wrapper

    return decorator
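Building on the decorator above, a quick sanity check (just a minimal sketch; slow_call is a made-up stand-in) raises TimeoutError the way I expect:

@timeout(2)
def slow_call(x):
    time.sleep(5)  # sleeps longer than the 2-second limit
    return x

try:
    print(slow_call(1))
except TimeoutError as e:
    print("timed out:", e)  # this branch is taken after ~2 seconds

As far as I understand, this signal-based approach only works on Unix-like systems (SIGALRM is not available on Windows) and only in the main thread.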
The download loop:
# Timeout after 5 seconds
@timeout(5)
def long_running_function2(x):
    return downloading_source(x)

all_urls = list(range(1, 100))
downloaded_data = []
for url in all_urls:
    try:
        print(url)
        down_data = long_running_function2(url)
        downloaded_data.append(down_data)  # only keep results that finished in time
    except Exception:
        pass  # skip URLs that timed out (or otherwise failed)
It's working, but I was wondering: is there a better way to do this?
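For example, since the real downloading_source hits the network, would it be cleaner to rely on a per-request timeout instead of signals? Something like the following rough sketch, assuming the actual download can be done with requests (my demo code above only simulates it with time.sleep):

import requests

downloaded_data = []
for url in all_urls:  # assuming these are real URL strings, not the integers from the demo
    try:
        # timeout here applies to connect/read, not the total request time
        resp = requests.get(url, timeout=5)
        downloaded_data.append(resp.content)
    except requests.exceptions.RequestException:
        continue  # timed out or failed; move on to the next URL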