
I am trying to download some data from the web (web scraping). I have a list of URLs, and a few of them take too long, so the loop gets stuck there. I am implementing a function that times out after a certain threshold so the loop can continue with the next URL.

For example, suppose downloading_source looks like this:

import time
import numpy as np

def downloading_source(x):
    wt = np.random.randint(1,50)
    print("waiting time", wt)
    time.sleep(wt)
    return x**2

For the demo, I am passing random values to time.sleep. The timeout function looks like this:

import errno
import os
import signal
import functools

class TimeoutError(Exception):
    pass

def timeout(seconds=10, error_message=os.strerror(errno.ETIME)):
    def decorator(func):
        def _handle_timeout(signum, frame):
            raise TimeoutError(error_message)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            signal.signal(signal.SIGALRM, _handle_timeout)
            signal.alarm(seconds)
            try:
                result = func(*args, **kwargs)
            finally:
                signal.alarm(0)
            return result

        return wrapper

    return decorator
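
As a quick sanity check of the decorator on its own (this assumes a Unix-like OS, since signal.SIGALRM is not available on Windows; slow_call is just a throwaway name for the demo), a deliberately slow call should raise TimeoutError once the threshold passes:

@timeout(2)
def slow_call():
    time.sleep(10)     # sleeps far longer than the 2-second limit
    return "done"

try:
    slow_call()
except TimeoutError as e:
    print("timed out:", e)   # raised after roughly 2 seconds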

The download loop:

# Timeout after 5 seconds
@timeout(5)
def long_running_function2(x):
    return downloading_source(x)

all_urls = list(range(1,100))

downloaded_data = []

for url in all_urls:

    try:
        print(url)
        down_data = long_running_function2(url)
    except Exception:
        down_data = None  # don't re-append the previous URL's result when this one fails

    downloaded_data.append(down_data)

It's working, but I was wondering: is there a better way to do this?
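
One alternative I'm aware of is to drop the signal machinery and put the timeout on concurrent.futures instead; a rough sketch (untested against my real sources) would look like this. Note that, unlike signal.alarm, result(timeout=...) only stops waiting; the worker thread keeps running, so with a single worker a slow task can still delay the following URLs:

from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

downloaded_data = []
with ThreadPoolExecutor(max_workers=1) as executor:
    for url in all_urls:
        future = executor.submit(downloading_source, url)
        try:
            down_data = future.result(timeout=5)   # wait at most 5 seconds for this URL
        except FuturesTimeout:
            down_data = None                       # give up waiting; the thread keeps running in the background
        downloaded_data.append(down_data)

Would something like this (or, for plain HTTP downloads, simply passing timeout=5 to requests.get) be considered better practice than the signal-based decorator?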
