
I have code that launches Selenium in multiple threads:

ThreadPool(2).map(identifier.check_type, data)

threadLocal = threading.local()

def get_driver():
    driver = getattr(threadLocal, 'driver', None)
    if driver is None:
        chrome_options = Options()
        ua = UserAgent()
        userAgent = ua.random
        chrome_options.add_argument(f'user-agent={userAgent}')
        # chrome_options.headless = True
        driver = webdriver.Chrome(options=chrome_options)
        driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
            "source":
                "const newProto = navigator.__proto__;"
                "delete newProto.webdriver;"
                "navigator.__proto__ = newProto;"
        })
        driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
            "source": """
                               Object.defineProperty(navigator, 'webdriver', {
                                 get: () => undefined
                               })
                             """
        })
        driver.delete_all_cookies()
        setattr(threadLocal, 'driver', driver)

    return driver


class TypeIdentifier():
    def __init__(self):
        pass
        #self.driver = self.launch_driver()

    def check_type(self, input_data):
        driver = get_driver()
        url = input_data['start_url']
        print(url)
        driver.get(url)
        type1 = type1_crawler.my_type(driver)

How do I keep track of the driver instances and manage them properly, so that I can close any instance or change the URL of any driver instance? Right now, whenever a URL fails to load or some other exception occurs while crawling, the thread stops responding and does not move on to new URLs.

Booboo
Waqar
  • What does it even mean to "change the url of any driver instance"? You are calling `map` passing an *iterable* so method `check_type` gets invoked repeatedly for each element of that *iterable* bound to argument *input_data* and that element is determining the URL that will be fetched. You just need to catch possible exceptions thrown by your driver. You also need a mechanism for terminating all the browsers when you are done running `map`. See [this answer](https://stackoverflow.com/questions/53475578/python-selenium-multiprocessing#64513719). – Booboo Nov 09 '21 at 16:40
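The two suggestions in the comment above (catch driver exceptions inside the worker, and keep a way to terminate every browser once `map` returns) could be sketched roughly as follows. This is a minimal sketch, not the code from the linked answer: `all_drivers`, `default_factory`, `quit_all_drivers` and `run` are hypothetical names introduced here, the `factory` parameter exists only so a different driver constructor can be swapped in, and the user-agent and CDP setup from the question is left out for brevity.

```python
import threading
from multiprocessing.pool import ThreadPool

thread_local = threading.local()
all_drivers = []                      # hypothetical registry of every driver created
all_drivers_lock = threading.Lock()   # guards the registry across worker threads


def default_factory():
    # Imported lazily so a stand-in factory can be used without Selenium installed.
    from selenium import webdriver
    return webdriver.Chrome()


def get_driver(factory=default_factory):
    """Return this thread's driver, creating and registering it on first use."""
    driver = getattr(thread_local, "driver", None)
    if driver is None:
        driver = factory()
        thread_local.driver = driver
        with all_drivers_lock:
            all_drivers.append(driver)
    return driver


def check_type(input_data, factory=default_factory):
    """Fetch one URL and return (url, error) instead of letting errors escape."""
    url = input_data["start_url"]
    try:
        driver = get_driver(factory)
        driver.get(url)
        return url, None
    except Exception as exc:  # broad on purpose: one bad URL should not end the run
        return url, exc


def quit_all_drivers():
    """Quit every browser that any worker thread started."""
    with all_drivers_lock:
        while all_drivers:
            driver = all_drivers.pop()
            try:
                driver.quit()
            except Exception:
                pass  # the browser may already be gone


def run(data, factory=default_factory, workers=2):
    """Map check_type over data, then always shut the browsers down."""
    pool = ThreadPool(workers)
    try:
        return pool.map(lambda item: check_type(item, factory), data)
    finally:
        pool.close()
        pool.join()
        quit_all_drivers()
```

Calling `run(data)` returns one `(url, error)` pair per input, in input order, so failed URLs can be logged or retried afterwards. Because each driver is also recorded in `all_drivers`, any single instance can be looked up and closed from outside its worker thread if needed.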
