
I'm trying to scrape Google Finance, and get the "Related Stocks" table, which has id "cc-table" and class "gf-table" based on the webpage inspector in Chrome. (Sample Link: https://www.google.com/finance?q=tsla)

But when I run .find("table") or .findAll("table"), this table does not come up. I can find JSON-looking objects containing the table's contents in the HTML in Python, but I don't know how to extract the data from them. Any ideas?

user399034

2 Answers


The page is rendered with JavaScript. There are several ways to render and scrape it.

I can scrape it with Selenium. First install Selenium:

sudo pip3 install selenium

Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads

import bs4 as bs
from selenium import webdriver

browser = webdriver.Chrome()
url = "https://www.google.com/finance?q=tsla"
browser.get(url)
html_source = browser.page_source  # full DOM after JavaScript has run
browser.quit()

soup = bs.BeautifulSoup(html_source, "lxml")
for el in soup.find_all("table", {"id": "cc-table"}):
    print(el.get_text())

Alternatively, PyQt5:

from PyQt5.QtCore import QUrl
from PyQt5.QtWebKitWidgets import QWebPage
from PyQt5.QtWidgets import QApplication
import bs4 as bs
import sys

class Render(QWebPage):
    """Load a URL and keep the rendered frame once loading finishes."""
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

url = "https://www.google.com/finance?q=tsla"
r = Render(url)
result = r.frame.toHtml()  # HTML after JavaScript rendering
soup = bs.BeautifulSoup(result, "lxml")
for el in soup.find_all("table", {"id": "cc-table"}):
    print(el.get_text())

Alternatively, dryscrape:

import bs4 as bs
import dryscrape

url = "https://www.google.com/finance?q=tsla"
session = dryscrape.Session()
session.visit(url)
html_source = session.body()  # rendered HTML
soup = bs.BeautifulSoup(html_source, "lxml")
for el in soup.find_all("table", {"id": "cc-table"}):
    print(el.get_text())

All three print the same output:

Valuation▲▼Company name▲▼Price▲▼Change▲▼Chg %▲▼d | m | y▲▼Mkt Cap▲▼TSLATesla Inc328.40-1.52-0.46%53.69BDDAIFDaimler AG72.94-1.50-2.01%76.29BFFord Motor Company11.53-0.17-1.45%45.25BGMGeneral Motors Co...36.07-0.34-0.93%53.93BRNSDFRENAULT SA EUR3.8197.000.000.00%28.69BHMCHonda Motor Co Lt...27.52-0.18-0.65%49.47BAUDVFAUDI AG NPV840.400.000.00%36.14BTMToyota Motor Corp...109.31-0.53-0.48%177.79BBAMXFBAYER MOTOREN WER...94.57-2.41-2.48%56.93BNSANYNissan Motor Co L...20.400.000.00%42.85BMMTOFMITSUBISHI MOTOR ...6.86+0.091.26%10.22B
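The get_text() dump above runs every cell together. Pulling the cells out row by row gives structured data instead. A sketch of that step with the same bs4 approach; the HTML string here is a small stand-in for the rendered page source returned by any of the three methods:

```python
import bs4 as bs

# Stand-in for the JavaScript-rendered page source (html_source above)
html_source = """
<table id="cc-table" class="gf-table">
  <tr><th>Company name</th><th>Price</th><th>Change</th></tr>
  <tr><td>Tesla Inc</td><td>328.40</td><td>-1.52</td></tr>
  <tr><td>Ford Motor Company</td><td>11.53</td><td>-0.17</td></tr>
</table>
"""

# html.parser is stdlib; the answers above use "lxml", which also works
soup = bs.BeautifulSoup(html_source, "html.parser")
table = soup.find("table", {"id": "cc-table"})

rows = []
for tr in table.find_all("tr"):
    # each row becomes a list of stripped cell strings
    rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])

print(rows)
# → [['Company name', 'Price', 'Change'], ['Tesla Inc', '328.40', '-1.52'],
#    ['Ford Motor Company', '11.53', '-0.17']]
```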

EDIT

QtWebKit got deprecated upstream in Qt 5.5 and removed in 5.6.

You can switch to PyQt5.QtWebEngineWidgets.
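A sketch of that switch, following the same Render pattern as the QtWebKit example above (untested here; the main difference is that toHtml() is asynchronous in QtWebEngine, so the HTML arrives via a callback):

```python
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication
import bs4 as bs
import sys

class Render(QWebEnginePage):
    """QtWebEngine replacement for the removed QtWebKit-based Render."""
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = None
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, ok):
        # toHtml() is asynchronous: pass a callback instead of reading a frame
        self.toHtml(self._htmlReady)

    def _htmlReady(self, html):
        self.html = html
        self.app.quit()

r = Render("https://www.google.com/finance?q=tsla")
soup = bs.BeautifulSoup(r.html, "lxml")
for el in soup.find_all("table", {"id": "cc-table"}):
    print(el.get_text())
```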

Dan-Dev
  • Could you name some of the several ways to render and scrape JavaScript? I thought the only way to deal with it was Selenium. – 0xMH Jul 22 '17 at 23:05
  • @Mohamed In my examples I have shown ways using dryscrape, PyQt5 (using QtWebKit), and Selenium; you can use all 3 examples separately. Dryscrape is my favourite but doesn't run on Windows; PyQt5 is my next favourite, but I find Selenium clunky. There are 3 examples here and I expect there are others; check out scrapy-splash for one example. – Dan-Dev Jul 22 '17 at 23:13
  • Can dryscrape, like requests and similar tools in python, get you banned on google for scraping? (That is without using any proxies) – programmerskillz Apr 19 '19 at 14:57
  • I didn't get banned when I developed the script, but I know Google has advanced anti-robot tools. If you don't want to get banned you could try using a proxy with Selenium; see my answer to https://stackoverflow.com/questions/55130791/how-to-enable-built-in-vpn-in-operadriver/55203283#55203283 – Dan-Dev Apr 28 '19 at 15:54

Most website owners don't like scrapers because they take data the company values, use up a whole bunch of their server time and bandwidth, and give nothing in return. Big companies like Google may have entire teams employing a whole host of methods to detect and block bots trying to scrape their data.

There are several ways around this:

  • Scrape from another less secured website.
  • See if Google or another company has an API for public use.
  • Use a more advanced scraper like Selenium (and probably still be blocked by Google).
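Whichever route you take, it helps to detect blocking and back off rather than hammering the site. A minimal sketch of a retry-delay helper; the specific status codes and delays are illustrative assumptions, not anything Google documents:

```python
def backoff_delay(status_code, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retrying, or None to proceed normally.

    403/429 are the usual "blocked" / "rate-limited" signals; delay
    doubles per attempt (exponential backoff) up to a cap.
    """
    if status_code in (403, 429):
        return min(cap, base * (2 ** attempt))
    return None

print(backoff_delay(429, 0))  # → 1.0
print(backoff_delay(429, 3))  # → 8.0
print(backoff_delay(200, 0))  # → None
```

In a real scraper you would call this after each response and `time.sleep()` for the returned delay before retrying.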
cdo256