How do I make Selenium click on elements and scrape data before the page has fully loaded? My internet connection is quite terrible, so it sometimes takes forever to load the page entirely. Is there any way around this?
See: [How do I do X?](https://meta.stackoverflow.com/questions/253069/whats-the-appropriate-new-current-close-reason-for-how-do-i-do-x) The expectation on SO is that the user asking a question not only does research to answer their own question but also shares that research, code attempts, and results. This demonstrates that you’ve taken the time to try to help yourself, it saves us from reiterating obvious answers, and most of all it helps you get a more specific and relevant answer! See also: [ask] – JeffC Sep 20 '17 at 15:22
Does this answer your question? [How to make Selenium not wait till full page load, which has a slow script?](https://stackoverflow.com/questions/44770796/how-to-make-selenium-not-wait-till-full-page-load-which-has-a-slow-script) – Aug 23 '20 at 20:18
3 Answers
ChromeDriver 77.0 (which supports Chrome version 77) now supports eager as pageLoadStrategy.
Resolved issue 1902: Support eager page load strategy [Pri-2]
Since your question is about clicking on elements and scraping data before the page has fully loaded, the pageLoadStrategy capability can help. When Selenium loads a page/URL, pageLoadStrategy is set to normal by default. Selenium can start executing the next line of code at different document readiness states. Currently Selenium supports three document readiness states, which can be configured through pageLoadStrategy as follows:
- none (undefined)
- eager (page becomes interactive)
- normal (complete page load)
Here is the code block to configure the pageLoadStrategy:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
binary = r'C:\Program Files\Mozilla Firefox\firefox.exe'
caps = DesiredCapabilities().FIREFOX
# caps["pageLoadStrategy"] = "normal" # complete
caps["pageLoadStrategy"] = "eager" # interactive
# caps["pageLoadStrategy"] = "none" # undefined
driver = webdriver.Firefox(capabilities=caps, firefox_binary=binary, executable_path="C:\\Utility\\BrowserDrivers\\geckodriver.exe")
driver.get("https://google.com")
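As a rough sketch of how the three strategies relate to the document readiness states listed above, the mapping can be written down as plain data. `READY_STATE_BY_STRATEGY` and `build_capabilities` are hypothetical names used only for illustration and are not part of Selenium; the `browserName` and `pageLoadStrategy` keys are the capability names.

```python
# Hypothetical helper for illustration only; not part of Selenium.
# Maps each pageLoadStrategy value to the document.readyState that the
# driver waits for before returning from get() (None means it does not wait).
READY_STATE_BY_STRATEGY = {
    "normal": "complete",     # wait for the full page load
    "eager": "interactive",   # wait until the DOM is interactive
    "none": None,             # return immediately
}


def build_capabilities(browser_name, strategy="normal"):
    """Return a capabilities dict with a validated pageLoadStrategy."""
    if strategy not in READY_STATE_BY_STRATEGY:
        raise ValueError(f"unknown pageLoadStrategy: {strategy!r}")
    return {"browserName": browser_name, "pageLoadStrategy": strategy}


caps = build_capabilities("firefox", "eager")
print(caps)
```

Validating the value up front is a design choice made here so that a typo such as "eagre" fails early instead of being passed on to the driver.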
@nonein Using `DesiredCapabilities`, you can implement it with any browser: Chrome, IE, Safari, Edge, etc. Please accept the answer if it addressed your question. – undetected Selenium Sep 25 '17 at 03:22
Do I simply just add capabilities=caps to chrome as well? or do I use the argument function? – no nein Sep 26 '17 at 05:52
is it also possible to use a strategy where it fully waits for the page to load? – no nein Nov 01 '20 at 18:08
For ChromeDriver it works the same as in @DebanjanB's answer; however, the 'eager' page load strategy was not supported before ChromeDriver 77.
So for chromedriver you get:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().CHROME
# caps["pageLoadStrategy"] = "normal" # Waits for full page load
caps["pageLoadStrategy"] = "none" # Do not wait for full page load
driver = webdriver.Chrome(desired_capabilities=caps, executable_path="path/to/chromedriver.exe")
Note that when using the 'none' strategy you most likely have to implement your own wait method to check if the element you need is loaded.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
WebDriverWait(driver, timeout=10).until(
    ec.visibility_of_element_located((By.ID, "your_element_id"))
)
Now you can start interacting with your element before the page is fully loaded!
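If there is no single element to wait for, another option is to poll `document.readyState` yourself. The following is a minimal sketch, assuming only that the driver object exposes `execute_script` as Selenium drivers do; `wait_for_ready_state` and its parameters are hypothetical names, not Selenium API.

```python
import time


def wait_for_ready_state(driver, wanted=("interactive", "complete"),
                         timeout=10.0, poll=0.25):
    """Poll document.readyState until it is one of `wanted`.

    Hypothetical helper: `driver` is any object with an
    execute_script(script) method, such as a Selenium WebDriver.
    Returns the state that was reached, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state in wanted:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document.readyState was still {state!r} after {timeout}s"
            )
        time.sleep(poll)
```

With a driver started under the 'none' strategy, calling `wait_for_ready_state(driver)` right after `driver.get(...)` would return once the DOM is at least interactive, which is roughly what the 'eager' strategy waits for.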
Same as above, for those who use Chrome: I used "eager" in the capabilities. It works perfectly and sped up my run time greatly.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().CHROME
# caps["pageLoadStrategy"] = "normal" # Waits for full page load
caps["pageLoadStrategy"] = "eager" # Wait only until the page is interactive
driver = webdriver.Chrome(desired_capabilities=caps, executable_path="path/to/chromedriver.exe")