I am trying to parse Amazon product pages to compile a list of prices, as part of a bigger statistics project. However, I am stumped. Can anyone review my code and tell me where I went wrong?

#!/usr/bin/python
# -*- coding:  utf-8 -*-
import mechanize
from bs4 import BeautifulSoup

URL_00 = "http://www.amazon.co.uk/Call-Duty-Black-Ops-PS3/dp/B007WPF7FE/ref=sr_1_2?ie=UTF8&qid=1352117194&sr=8-2"

bro = mechanize.Browser()
resp = bro.open(URL_00)
html = resp.get_data()
soup_00 = BeautifulSoup(html)
price = soup_00.find('b', {'class':'priceLarge'})
print price #this should return at the very least the text enclosed in a tag

According to the screenshot, what I wrote above should work, shouldn't it?

http://i.imgur.com/bPVe1.png (cannot post an image as a newbie..)

Well, all I get in the printout is "[]". If I change the second-to-last line to this:

 price = soup_00.find('b', {'class':'priceLarge'}).contents[0].string

or

price = soup_00.find('b', {'class':'priceLarge'}).text

I get a "NoneType" error.
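
For reference, here is a minimal sketch of what the "NoneType" error usually means with BeautifulSoup: `find` returns `None` when no matching tag exists in the parsed HTML, so calling `.text` or `.contents` on the result fails. The sample HTML strings and the `extract_price` helper are made up for illustration, and the sketch is written for Python 3 rather than the Python 2 used above.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the text of <b class="priceLarge">, or None if the tag is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("b", {"class": "priceLarge"})
    if tag is None:
        # find() found nothing; accessing tag.text here would raise AttributeError
        return None
    return tag.get_text(strip=True)


# Hypothetical snippets: one with the price tag, one without it
with_price = '<div><b class="priceLarge">29.99</b></div>'
without_price = '<div><span>no price here</span></div>'

print(extract_price(with_price))
print(extract_price(without_price))
```

If the second case is what happens with the real page, the tag is simply not in the HTML the script received.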

I am quite confused as to why this is happening. Chrome reports the page encoding for this URL as UTF-8, which matches the coding declaration in line #2 of my script. I have also tried changing it to ISO (as declared in the page's own HTML), but this makes no difference, so I am confident that encoding is not the issue here.

Also, I don't know if this is relevant at all, but my system locale on Linux is UTF-8. That should not cause a problem, should it?

Any ideas would be welcome.

NopeNopeNope
  • And you have confirmed that Mechanize is given the exact same HTML as your browser? Do *not* assume that Amazon will send the exact same response to different user agents. – Martijn Pieters Nov 05 '12 at 12:19
  • Just so you don't think I ignored the comment =) I have added a user agent to mechanize to match mine. A version of the page downloaded from my browser is identical to the one mechanize pulls, so this should not be an issue... – NopeNopeNope Nov 05 '12 at 12:26
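
The check discussed in these comments can be sketched roughly as follows: fetch the page with an explicit User-Agent header, then test whether the raw HTML contains the `priceLarge` class at all before handing it to a parser. This is only an illustration using the Python 3 standard library instead of mechanize; `fetch_html`, `has_marker`, and both sample snippets are hypothetical, and the network call is defined but not executed here.

```python
import urllib.request


def fetch_html(url, user_agent):
    """Download a page while sending an explicit User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def has_marker(html, marker="priceLarge"):
    """Report whether the raw HTML mentions the class we intend to search for."""
    return marker in html


# Offline demonstration with made-up snippets (no request is sent)
browser_html = '<b class="priceLarge">29.99</b>'
script_html = '<div id="captcha">Enter the characters you see</div>'

print(has_marker(browser_html))
print(has_marker(script_html))
```

If `has_marker` is false for the HTML the script actually received, the server sent a different page than the browser saw, and no parser setting will recover the price.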

1 Answer

There's no need to do this, as Amazon provides an API:

https://affiliate-program.amazon.co.uk/gp/advertising/api/detail/main.html

The Product Advertising API helps you advertise Amazon products using product search and look up capability, product information and features such as Customer Reviews, Similar Products, Wish Lists and New and Used listings.

More detail here: Amazon API library for Python?

I'm using the API, and it is so much easier and more reliable than scraping the data from the webpage, even with BS. You will also get access to a list of prices for new, second-hand, etc., and not just the "headline" price.

Community
Paul Collingwood
  • I will check this out. However since I am not doing this "officially", but more as a personal side-project, I was actually hoping for something quick and dirty. But thank you for this. – NopeNopeNope Nov 05 '12 at 12:29
  • It's literally a matter of moments to sign up, and the interface you get is then really usable. Trust me, it'll be much quicker than doing it the way you are attempting (it's why BS exists!) – Paul Collingwood Nov 05 '12 at 12:30
  • So the three main choices are pyaws, bottlenose or pyamazon? Which of those would you suggest? – NopeNopeNope Nov 05 '12 at 12:42
  • Marking this as the answer, because it got me to this - https://github.com/yoavaviram/python-amazon-simple-product-api Very simple and easy API that gets what I want. Many thanks! – NopeNopeNope Nov 05 '12 at 13:02