I am trying to detect the Google reCAPTCHA page while scraping Google search results with Selenium. Here is part of the scraping code I wrote:
```python
def spider(search_term, intext_term, include_term, target_site):
    driver = open_webdriver()
    driver.implicitly_wait(10)
    records = []  # collected results
    num_records_scraped = 0
    # Google paginates with start=0, 10, 20, ...
    for page in range(0, Max_Page, 10):
        search_url = target_url(search_term, intext_term, include_term, target_site, page)
        driver.get(search_url)
        items = select_wholePage(driver)
        for item in items:
            record = get_result(item)
            if record:
                records.append(record)
                num_records_scraped += 1
        time_interval()  # pause between result pages
    driver.quit()
    return records
```
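`target_url` is not shown above. As a hypothetical sketch (the name `build_search_url` and the query layout are assumptions; `include_term` is left out because its meaning is not clear here), a Google results URL paginated through the `q` and `start` query parameters could be built like this:

```python
from urllib.parse import urlencode


def build_search_url(search_term, intext_term=None, target_site=None, start=0):
    """Hypothetical stand-in for target_url(): build a Google search URL.

    Assumes the intext: and site: query operators plus the q/start parameters.
    """
    parts = [search_term]
    if intext_term:
        parts.append(f"intext:{intext_term}")
    if target_site:
        parts.append(f"site:{target_site}")
    query = " ".join(parts)
    # urlencode percent-encodes ':' and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": query, "start": start})


# start=0, 10, 20 -> the first three result pages
for start in range(0, 30, 10):
    print(build_search_url("selenium", "captcha", "example.com", start))
```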
The `start` parameter begins at 0 and increases by 10 for each results page (`for page in range(0, Max_Page, 10)`). After about 10 pages, Google usually redirects to the CAPTCHA page. I verified that the CAPTCHA page contains an element with the ID `recaptcha-token`, so I used this:
```python
recaptcha = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "recaptcha-token")))
```
and tried it like this:
```python
for page in range(0, Max_Page, 10):
    search_url = target_url(search_term, intext_term, include_term, target_site, page)
    driver.get(search_url)
    recaptcha = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "recaptcha-token")))
    if recaptcha:
        print('This is recaptcha')
    else:
        items = select_wholePage(driver)
        for item in items:
            record = get_result(item)
            if record:
                records.append(record)
                num_records_scraped += 1
    time_interval()
driver.quit()
```
But it raises a timeout error:

```
    recaptcha = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "recaptcha-token")))
  in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
```
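The traceback is consistent with how `WebDriverWait.until` behaves: it keeps calling the condition until it returns a truthy value and raises `TimeoutException` once the timeout expires, so it never returns a falsy value. On a normal results page (no `recaptcha-token` element) the `if recaptcha:` check is therefore never reached. The sketch below is a simplified pure-Python model of that behavior, not Selenium's actual code; `until` and `WaitTimeout` are hypothetical names.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def until(condition, timeout, poll=0.05):
    """Simplified model of WebDriverWait.until: poll until truthy, else raise."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# On a normal results page the element never appears, so the condition stays falsy
try:
    recaptcha = until(lambda: None, timeout=0.2)
    print("returned", recaptcha)  # never reached
except WaitTimeout as exc:
    print("raised:", exc)
```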
I suspect my logic for detecting the CAPTCHA element ID is wrong, but I can't see where. Please help me.
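One way to avoid the exception is to check for the element without an explicit wait: `driver.find_elements` returns an empty list when nothing matches instead of raising. (Keeping `WebDriverWait` and wrapping it in `try`/`except TimeoutException` also works.) A sketch under these assumptions: `recaptcha-token` is the marker you observed, and `has_recaptcha`, `scrape_until_captcha` and `FakeDriver` are hypothetical names. The fake driver exists only so the sketch runs without a browser; `"id"` is the string value of `By.ID`.

```python
def has_recaptcha(driver):
    """Return True if the current page has an element with ID recaptcha-token.

    find_elements returns [] when nothing matches instead of raising.
    "id" is the value of selenium's By.ID.
    """
    return len(driver.find_elements("id", "recaptcha-token")) > 0


def scrape_until_captcha(driver, page_urls, parse_page):
    """Visit each URL and stop as soon as a reCAPTCHA page is detected.

    page_urls and parse_page are hypothetical stand-ins for target_url(...)
    and select_wholePage/get_result in the question.
    """
    records = []
    for url in page_urls:
        driver.get(url)
        if has_recaptcha(driver):
            print("reCAPTCHA detected at", url)
            break
        records.extend(parse_page(driver))
    return records


class FakeDriver:
    """Minimal stand-in for a WebDriver so the sketch runs without a browser."""

    def __init__(self, captcha_urls):
        self.captcha_urls = set(captcha_urls)
        self.current_url = None

    def get(self, url):
        self.current_url = url

    def find_elements(self, by, value):
        if by == "id" and value == "recaptcha-token" and self.current_url in self.captcha_urls:
            return ["<recaptcha element>"]
        return []


urls = [f"page?start={s}" for s in range(0, 50, 10)]
driver = FakeDriver(captcha_urls={"page?start=30"})
print(scrape_until_captcha(driver, urls, lambda d: [d.current_url]))
```

With `driver.implicitly_wait(10)` in effect, a real `find_elements` call waits up to 10 seconds before returning an empty list, so every normal results page would pay that delay; lowering the implicit wait before the check avoids it.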