I am trying to collect the ASINs of products from the Amazon.in website. My code opens a web driver, searches for a product name, and navigates to the first page of results. It collects the data for the first page only; how do I move to the next page and collect the same data? Here is my code:

import time
import json
import re
import numpy as np
from bs4 import BeautifulSoup
from selenium import webdriver
import urllib.request
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
import pandas as pd


temp = []


def init_driver():
    driver = webdriver.Chrome(executable_path=r"C:\Users\Desktop\chromedriver")
    driver.wait = WebDriverWait(driver, 10)
    return driver


def get_asin(driver):

    driver.get("https://www.amazon.in")
    print ('Getting the URL')
    HTML = driver.page_source
    search_button = driver.find_element_by_id("twotabsearchtextbox")
    search_button.send_keys("Mobiles")
    select_button = driver.find_element_by_class_name("nav-input")
    select_button.click()
    HTML1=driver.page_source
    soup = BeautifulSoup(HTML1, "html.parser")


    styles = soup.find_all('li')
    #print(styles)
    #print(type(styles))
    ASIN=[]
    for link in styles:
        if link.has_attr('data-asin'):
            ASIN.append(link['data-asin'])

    return ASIN
    #print(ASIN)


if __name__ == "__main__":
    driver = init_driver()
    ASIN_NO = get_asin(driver)
    #time.sleep(3)
    #print ('opening search page')
    #for i in range(0,len(ASIN_NO)):
        #scrape(driver,ASIN_NO[i])

    print (ASIN_NO)
    time.sleep(5)

I have tried both of the following, which raise the errors shown:

select_button = driver.find_element_by_id('pagnNextString')
select_button.click()

Exception in logs:

WebDriverException: Message: unknown error: Element ... is not clickable at point (778, 606). Other element would receive the click:

select_button = driver.find_element_by_class_name('srSprite pagnNextArrow')
select_button.click()

InvalidSelectorException: Message: invalid selector: Compound class names not permitted

Please help with the correct way. Thanks in advance.

POOFY

2 Answers


I think you have to maximize the window; the element is not viewable, which is why the "element is not clickable" error appears.

driver.maximize_window()

For the InvalidSelectorException: find_element_by_class_name takes a single class name, so a compound value like 'srSprite pagnNextArrow' is rejected. Use an XPath for the Next button instead:

//*[@id='pagnNextString']

I don't have much knowledge of Python, but this Java code works fine on my system; convert it to Python:

WebDriver driver=new FirefoxDriver();
driver.get("https://www.amazon.in");
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
WebElement search_txt=driver.findElement(By.xpath("//*[@id='twotabsearchtextbox']"));
search_txt.sendKeys("Mobiles");
driver.manage().window().maximize();
driver.findElement(By.xpath(".//*[@id='nav-search']/form/div[2]/div/input")).click();
WebElement select_btn=driver.findElement(By.xpath("//*[@id='pagnNextString']"));
select_btn.click();
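A rough Python translation of the Java snippet above (the locators are copied verbatim and may not match the current Amazon page; the function wrapper and its name are mine, and it uses the same Selenium-2-era `find_element_by_*` API as the question's code):

```python
def search_and_go_next(driver, keyword="Mobiles"):
    """Open Amazon.in, search for a keyword, and click the Next page link.

    Sketch translated from the Java answer above; locators taken verbatim
    and may need updating against the live page.
    """
    driver.get("https://www.amazon.in")
    driver.implicitly_wait(10)                       # matches implicitlyWait(10, SECONDS)
    search_txt = driver.find_element_by_xpath("//*[@id='twotabsearchtextbox']")
    search_txt.send_keys(keyword)
    driver.maximize_window()
    # submit the search, then click the Next pagination link
    driver.find_element_by_xpath(".//*[@id='nav-search']/form/div[2]/div/input").click()
    select_btn = driver.find_element_by_xpath("//*[@id='pagnNextString']")
    select_btn.click()
```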
iamsankalp89

To be able to click the Next button you can use the code below:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

next_button = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "pagnNextString")))
next_button.location_once_scrolled_into_view
next_button.click()

This waits until the button appears on the page, scrolls down to it, and clicks it successfully.
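One way to chain this into a loop over all result pages is sketched below. Assumptions flagged: the `//li[@data-asin]` locator from the question, a `max_pages` cap of my choosing, and a crude `time.sleep` in place of the explicit wait so the snippet stays self-contained; in real use, keep the `WebDriverWait` shown above.

```python
import time

def collect_all_asins(driver, max_pages=20):
    """Collect data-asin values page by page, clicking Next until it is gone.

    Sketch only: uses find_elements_* plus a short sleep instead of
    WebDriverWait so it is self-contained; locators may need updating
    against the live page.
    """
    asins = []
    for _ in range(max_pages):
        # every result <li> carries its ASIN in the data-asin attribute
        for li in driver.find_elements_by_xpath("//li[@data-asin]"):
            asins.append(li.get_attribute("data-asin"))
        buttons = driver.find_elements_by_id("pagnNextString")
        if not buttons:                               # no Next button: last page
            break
        buttons[0].location_once_scrolled_into_view   # scroll the button into view
        buttons[0].click()
        time.sleep(1)                                 # crude wait for next page render
    return asins
```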

Andersson
  • yes. The code works nicely to navigate through the next pages. But I am facing a problem here: all the ASIN numbers were extracted for the 1st page, but while navigating through the next pages only the first 6 ASIN numbers are extracted and then it stops. Can you help in this case? – POOFY Sep 20 '17 at 10:33
  • IMHO, there is no need to use BeautifulSoup, as you can simply use Selenium's built-in methods to parse the current page. Try `styles = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//li[@data-asin]")))` – Andersson Sep 20 '17 at 10:47
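Following that comment, the BeautifulSoup pass in the question could be replaced by reading the attribute straight off the WebElements. A minimal sketch (the function name is mine; in real use, wrap the element lookup in `WebDriverWait` as the comment shows):

```python
def asins_on_page(driver):
    """Read every data-asin value straight from the live DOM, no BeautifulSoup.

    Uses the //li[@data-asin] locator from the comment above; call this once
    per result page after the page has finished loading.
    """
    return [li.get_attribute("data-asin")
            for li in driver.find_elements_by_xpath("//li[@data-asin]")]
```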