-2

I am scraping a few websites on Debian with Python 2.7, but my script sometimes stops on its own, either when a page does not load in time (it freezes) or when there is no internet connection.

Is there a way to handle this, so that the script skips the failing URL and continues with the next one? At the moment, whenever this happens the script just stops.

Here is my code:

#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from selenium import webdriver
import urllib2
import subprocess
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
import MySQLdb
import re
import contextlib
import selenium.webdriver.support.ui as ui
import numpy as np
from datetime import datetime, timedelta
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By 
import pyautogui 
from pykeyboard import PyKeyboard

reload(sys)
sys.setdefaultencoding('utf-8')


cols = ['MYCOLS..'] 

browser = webdriver.Firefox()
datatable=[]

browser.get('LINK1')
time.sleep(5)

browser.find_element_by_xpath('//button[contains(text(), "CLICK EVENT")]').click()
time.sleep(5)
browser.find_element_by_xpath('//button[contains(text(), "CLICK EVENT")]').click()
html = browser.page_source
soup=BeautifulSoup(html,"html.parser")
table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })    

for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
    temp_data = []  # start a fresh row; temp_data was never initialized in the posted code
    for data in record.find_all("td"):
        temp_data.append(data.text.encode('utf-8'))
    newlist = filter(None, temp_data)
    datatable.append(newlist)
       
time.sleep(10) 
browser.close()

# HERE I INSERT MY DATA INTO MYSQL (NOT IMPORTANT); MY SECOND LINK STARTS HERE

browser = webdriver.Firefox()
datatable=[]
   
browser.get('LINK2')
browser.find_element_by_xpath('//button[contains(text(), "LCLICK EVENT")]').click()
time.sleep(5)
html = browser.page_source
soup=BeautifulSoup(html,"html.parser")
table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })

for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
    temp_data = []  # start a fresh row; temp_data was never initialized in the posted code
    for data in record.find_all("td"):
        temp_data.append(data.text.encode('utf-8'))
    newlist = filter(None, temp_data)
    datatable.append(newlist)
       
time.sleep(10) 
browser.close()

# MYSQLDB PART AGAIN... AND THEN THE NEXT LINK FOLLOWS.

EDIT:

The script also stops when it cannot find this CLICK EVENT button. Why? How can I avoid this?

Harley
  • Two things: 1. make your code modular; 2. use exception handling. – Gaurang Shah Oct 05 '17 at 10:49
  • Welcome to Stack Overflow! See: [How do I do X?](https://meta.stackoverflow.com/questions/253069/whats-the-appropriate-new-current-close-reason-for-how-do-i-do-x) The expectation on SO is that the user asking a question not only does research to answer their own question but also shares that research, code attempts, and results. This demonstrates that you’ve taken the time to try to help yourself, it saves us from reiterating obvious answers, and most of all it helps you get a more specific and relevant answer! See also: [ask] – JeffC Oct 05 '17 at 17:32

2 Answers

0

With Selenium, you can configure the driver (your browser object) to wait for specific elements or conditions. You can then use a regular try/except to handle errors such as TimeoutException.

The Selenium documentation explains the wait system well.
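To make the try/except idea concrete for the "skip to the next URL" case in the question, here is a minimal sketch. The names `scrape_all` and `scrape_one` are hypothetical and not part of Selenium; `scrape_one` stands for whatever function loads one page and returns its rows. The code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function


def scrape_all(urls, scrape_one, errors=(Exception,)):
    """Call scrape_one(url) for every URL; record failures instead of stopping.

    scrape_one is a caller-supplied (hypothetical) function that scrapes a
    single page. errors is the tuple of exception types treated as "skip
    this URL and carry on".
    """
    results, failed = {}, []
    for url in urls:
        try:
            results[url] = scrape_one(url)
        except errors as exc:
            # Log the problem and move on to the next URL.
            print('Skipping %s: %r' % (url, exc))
            failed.append(url)
    return results, failed
```

With Selenium you would probably narrow `errors` to something like `(TimeoutException, NoSuchElementException, WebDriverException)` from `selenium.common.exceptions`, so that unrelated bugs still surface. For pages that freeze while loading, `browser.set_page_load_timeout(seconds)` makes `browser.get()` raise a `TimeoutException` instead of hanging.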

Here is a code snippet for exception handling with Selenium:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

try:
    # Wait for any element or condition; you can even pass a lambda if you wish
    WebDriverWait(browser, 10).until(
        EC.visibility_of_all_elements_located((By.ID, 'my-item'))
    )
except TimeoutException:
    # Here I raise an error, but you can do whatever you want, such as exiting cleanly or logging something
    raise RuntimeError('No Internet connection')
rak007
  • For example, can I wait for a button? This is the example webpage: https://www.flightradar24.com/data/airports/grz/departures and my code: browser.find_element_by_xpath('//button[contains(text(), "Load earlier flights")]').click() . How can I build this into my code, and where do I have to put it? – Harley Oct 05 '17 at 12:29
  • I won't do everything for you; I gave you a tip. With Selenium you can easily build this, just see their documentation – rak007 Oct 05 '17 at 12:30
  • I just asked with a correct example..thank you anyway. – Harley Oct 05 '17 at 12:32
  • Just use WebDriverWait(browser, 10).until( EC.visibility_of_all_elements_located((By.XPATH, 'YOUR XPATH')) ) in a try/except. If nothing is wrong, then you know you can get your element – rak007 Oct 05 '17 at 12:35
0

The last comment on this page has some hints towards an answer: Cant login into Nike with python selenium

Apparently Nike uses Akamai as bot protection, which has so far been very good at derailing all our attempts to get into Nike :) I have been attempting the same with the code below. Please update or comment with any solution as and when you find one. I will keep an eye out here.

#automating nike purchase for a bud

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep as sl
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC 
import datetime as dt

#setting path to driver

PATH = r'C:\Program Files (x86)\chromedriver.exe'  # raw string so the backslashes are taken literally

#creating instance of the driver

driver = webdriver.Chrome(PATH)

#target date to be set manually for initiation of get requests

tg_day = dt.datetime(2021, 5, 22, 1, 8, 30, 1)
print(tg_day)

#match now() to target date then proceeds

def time_check():
    date = dt.datetime.now()
    print(date)

    if tg_day.hour == date.hour and tg_day.minute == date.minute and tg_day.second == date.second:
        print("Activated")
        nike_website()  # runs the login flow once the target time is reached
    else:
        print("Waiting a bit longer....")
        sl(1)  # wait 1 second

        

#driver and actionable methods here

def nike_website():

    driver.get('https://gs.nike.com/?checkoutId=c8195822-6539-4651-bfc7-eb8bf4156237&launchId=e18f55bb-c188-4fab-83c0-cc19107f59e7&skuId=28a2f06b-565d-5fc1-9dbd-6b1bfaf4acd3&country=IN&locale=en-GB&appId=com.nike.commerce.snkrs.web&returnUrl=https:%2F%2Fwww.nike.com%2Fin%2Flaunch%2Ft%2Flebron-18-low-mimi-plange-higher-learning%2F')
    print(driver.title)

    driver.switch_to.default_content()
    wait = WebDriverWait(driver,10)
    wait.until(EC.element_to_be_clickable((By.NAME, 'emailAddress')))

    #for added buffer sleep 5 seconds, redundant too yes!

    sl(5)

    #finding the email element via name

    email = driver.find_element_by_name("emailAddress")
    email.send_keys("<PUT YOUR EMAIL HERE>")
    email.send_keys(Keys.RETURN)

    #finding the password element via name

    password = driver.find_element_by_name("password")
    password.send_keys("<PUT YOUR PASSWORD HERE>")
    password.send_keys(Keys.RETURN)

    #buffer sleep to watch response

    sl(10)

#activation

while True:
    time_check()
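One caveat about the timing loop above: it compares only hour, minute and second, so it can miss the target if an iteration takes longer than a second, and it never stops looping on its own. A possible alternative, shown here only as a sketch with the hypothetical helper name `wait_until`, is to block until the clock has passed the target and then call the login function once. The `now` and `sleep` parameters exist only so the helper can be exercised without real waiting.

```python
import datetime as dt
import time


def wait_until(target, now=dt.datetime.now, sleep=time.sleep, poll=1.0):
    """Block until now() >= target; return how many times the loop slept."""
    polls = 0
    while now() < target:
        sleep(poll)
        polls += 1
    return polls
```

Usage would look like `wait_until(tg_day)` followed by a single `nike_website()` call.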
zora