I am trying to parse Amazon to compile a list of prices as part of a bigger statistics project. However, I am stumped. Could anyone review my code and tell me where I went wrong?
#!/usr/bin/python
# -*- coding: utf-8 -*-
import mechanize
from bs4 import BeautifulSoup
URL_00 = "http://www.amazon.co.uk/Call-Duty-Black-Ops-PS3/dp/B007WPF7FE/ref=sr_1_2?ie=UTF8&qid=1352117194&sr=8-2"
bro = mechanize.Browser()
resp = bro.open(URL_00)
html = resp.get_data()
soup_00 = BeautifulSoup(html)
price = soup_00.find('b', {'class':'priceLarge'})
print price #this should return at the very least the text enclosed in a tag
According to the screenshot, what I wrote above should work, shouldn't it?
Well, all I get in the printout is "[]". If I change the line before last to this:
price = soup_00.find('b', {'class':'priceLarge'}).contents[0].string
or
price = soup_00.find('b', {'class':'priceLarge'}).text
I get a "NoneType" error.
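To show the failure in isolation, here is a minimal sketch that needs no network access. It assumes that `find()` returns `None` when nothing matches, which would explain an error on `.text` or `.contents`. The helper name `extract_price` and both HTML snippets are hypothetical and are not the real Amazon markup.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the text of <b class="priceLarge">, or None if the tag is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("b", {"class": "priceLarge"})
    if tag is None:
        # find() gave back None, so tag.text would raise AttributeError here
        return None
    return tag.get_text(strip=True)


# Hypothetical snippets for illustration only
with_price = '<div><b class="priceLarge">12.99</b></div>'
without_price = '<div><span class="price">12.99</span></div>'

found = extract_price(with_price)        # text inside the matching tag
missing = extract_price(without_price)   # None, because no tag matched
```

If the second case is what happens with the real page, the tag I see in the browser is presumably not in the HTML that the script parsed.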
I am quite confused as to why this is happening. Chrome reports the page encoding as UTF-8, which matches the coding declaration on line #2 of my script. I also tried changing it to ISO (as per the inner HTML of the page), but this made no difference, so I am positive encoding is not the issue here.
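One way to take encoding out of the picture: the class name `priceLarge` is plain ASCII, so it has the same bytes in UTF-8 and ISO-8859-1 and can be searched for in the undecoded response. Below is a minimal sketch of that check. The helper name `marker_in_raw_html` and the sample byte strings are hypothetical; in the script above, the input would be the `html` value returned by `resp.get_data()`.

```python
def marker_in_raw_html(raw, marker=b"priceLarge"):
    """Return True if the ASCII marker occurs in the undecoded response bytes."""
    if not isinstance(raw, bytes):
        # Assumption: text input is encoded back to bytes just for this check
        raw = raw.encode("utf-8")
    return marker in raw


# Hypothetical samples: the same markup in two encodings, plus one without the tag
sample_utf8 = u'<b class="priceLarge">\u00a312.99</b>'.encode("utf-8")
sample_latin1 = u'<b class="priceLarge">\u00a312.99</b>'.encode("iso-8859-1")
sample_missing = b'<span class="price">12.99</span>'

in_utf8 = marker_in_raw_html(sample_utf8)
in_latin1 = marker_in_raw_html(sample_latin1)
in_missing = marker_in_raw_html(sample_missing)
```

If the check comes back false for the fetched page, the markup that reached the script differs from what the browser shows, whatever the encoding. If it comes back true, the problem is more likely in the parsing step.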
Also, I don't know if this is relevant at all, but my system locale on Linux is UTF-8. That should not cause a problem, should it?
Any ideas would be welcome.