
Overview

I am using a proxy network and want to configure it with Selenium in Python. I have seen many posts that use the HOST:PORT method, but proxy networks use the "URL method" of http://USER:PASSWORD@PROXY:PORT.
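To make the difference between the two formats concrete, here is a minimal sketch that splits a "URL method" string into the pieces a HOST:PORT setup would need. The helper name `split_proxy_url` and the host `proxy.example.com:8000` are placeholders for illustration, not part of any proxy provider's API.

```python
from urllib.parse import urlsplit, unquote


def split_proxy_url(proxy_url):
    """Split scheme://USER:PASSWORD@HOST:PORT into its components."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("expected scheme://USER:PASSWORD@HOST:PORT")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials may be percent-encoded inside a URL, so decode them.
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        # The plain HOST:PORT form that many Selenium examples use.
        "host_port": f"{parts.hostname}:{parts.port}",
    }


# Placeholder proxy address, used only for illustration.
parts = split_proxy_url("http://USER:PASSWORD@proxy.example.com:8000")
print(parts["host_port"])
```

The HOST:PORT form drops the credentials, which is why it cannot be used as-is with a proxy network that authenticates through the URL.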

SeleniumWire

I found that SeleniumWire can connect the "URL method" of proxy networks to a Selenium scraper. A basic SeleniumWire configuration:

from seleniumwire import webdriver

options = {
    'proxy':
    {
        'http': 'http://USER:PASSWORD@PROXY:PORT',
        'https': 'http://USER:PASSWORD@PROXY:PORT'
    },
}

driver = webdriver.Chrome(seleniumwire_options=options)
driver.get("https://some_url.com")

This correctly adds a proxy to the driver and cycles IPs, but on many websites the scraper is quickly blocked by Cloudflare. This blocking does not happen when running on my local IP. After searching through the issues in SeleniumWire's GitHub repository, I found that this is caused by TLS fingerprinting and that there is currently no solution.

Selenium Options

I tried to configure the proxy the conventional Selenium way:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://USER:PASSWORD@PROXY:PORT")
driver = webdriver.Chrome(options=options)
driver.get("https://some_url.com")

A browser instance opens but fails with a network error, and it never loads the requested URL.
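As far as I can tell, Chrome's --proxy-server flag does not accept credentials embedded in the URL, which would explain the network error. One workaround that is often suggested is to load a small Chrome extension that sets the proxy and answers the authentication prompt. The following is a minimal sketch of that idea, not a confirmed fix: the function name `build_proxy_auth_extension` and the host `proxy.example.com:8000` are hypothetical, and it assumes a Chrome build that still loads Manifest V2 extensions and is not running in a mode that disables extensions.

```python
import io
import json
import zipfile
from urllib.parse import urlsplit

# Manifest V2 is assumed here; newer Chrome releases may refuse to load it.
MANIFEST = {
    "name": "Proxy auth helper (sketch)",
    "version": "1.0.0",
    "manifest_version": 2,
    "permissions": ["proxy", "webRequest", "webRequestBlocking", "<all_urls>"],
    "background": {"scripts": ["background.js"]},
}

# Background script: route traffic through the proxy and supply credentials
# whenever the proxy asks for authentication.
BACKGROUND_JS = """
var settings = %(settings)s;
chrome.proxy.settings.set({
  value: {
    mode: "fixed_servers",
    rules: {
      singleProxy: {scheme: settings.scheme, host: settings.host, port: settings.port}
    }
  },
  scope: "regular"
}, function() {});
chrome.webRequest.onAuthRequired.addListener(
  function(details) {
    return {authCredentials: {username: settings.username, password: settings.password}};
  },
  {urls: ["<all_urls>"]},
  ["blocking"]
);
"""


def build_proxy_auth_extension(proxy_url):
    """Return the bytes of a zipped extension configured for proxy_url."""
    parts = urlsplit(proxy_url)
    settings = {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,
        "password": parts.password,
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.json", json.dumps(MANIFEST))
        # json.dumps escapes the values so they are valid JavaScript literals.
        archive.writestr("background.js", BACKGROUND_JS % {"settings": json.dumps(settings)})
    return buffer.getvalue()


# Usage sketch (requires Selenium and Chrome, so it is left commented out):
# with open("proxy_auth.zip", "wb") as fh:
#     fh.write(build_proxy_auth_extension("http://USER:PASSWORD@proxy.example.com:8000"))
# options = webdriver.ChromeOptions()
# options.add_extension("proxy_auth.zip")
# driver = webdriver.Chrome(options=options)
```

Because this keeps traffic inside Chrome rather than routing it through a Python-side proxy the way SeleniumWire does, it may behave differently with respect to TLS fingerprinting, but I have not verified that.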

Docker Configuration

The end goal is to run this Python code in a Docker container inside a Lambda function. I don't know whether that introduces another level of abstraction.
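Since the code is meant to run in a container, one option is to keep the proxy credentials out of the source and pass them in through the environment. This is a minimal sketch under that assumption: the variable name `PROXY_URL` and the function `seleniumwire_options_from_env` are hypothetical, and the returned dictionary follows the same shape as the SeleniumWire options shown earlier, plus SeleniumWire's `no_proxy` key.

```python
import os


def seleniumwire_options_from_env(var_name="PROXY_URL"):
    """Build a SeleniumWire options dict from an environment variable."""
    proxy_url = os.environ.get(var_name)
    if not proxy_url:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,
            # Keep local traffic (e.g. the driver's own connection) off the proxy.
            "no_proxy": "localhost,127.0.0.1",
        }
    }


# Placeholder value so the sketch runs outside a container; in Docker or
# Lambda the variable would be set in the container or function configuration.
os.environ.setdefault("PROXY_URL", "http://USER:PASSWORD@proxy.example.com:8000")
options = seleniumwire_options_from_env()

# Usage sketch (requires selenium-wire, so it is left commented out):
# from seleniumwire import webdriver
# driver = webdriver.Chrome(seleniumwire_options=options)
```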

Summary

What other resources can I use to correctly configure my Selenium scraper to use the "URL method" of IP cycling?

Versions

  • python 3.9
  • selenium 3.141.0
  • docker 20.10.11

Support Tickets

GitHub: https://github.com/SeleniumHQ/selenium/issues/10605

ChromeDriver: https://bugs.chromium.org/p/chromedriver/issues/detail?id=4118

Luke Hamilton

1 Answer


Try using DesiredCapabilities instead of ChromeOptions:

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

# Proxy given as HOST:PORT (no scheme, no credentials)
proxy_url = "127.0.0.1:9009"
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': proxy_url,
    'sslProxy': proxy_url,
    'noProxy': ''})

# Copy the defaults so the shared CHROME dictionary is not mutated
capabilities = webdriver.DesiredCapabilities.CHROME.copy()
proxy.add_to_capabilities(capabilities)

driver = webdriver.Chrome(desired_capabilities=capabilities)
driver.get("https://some_url.com")
Asmoun
    Appreciate the answer! I changed the proxy_url variable to the proxy network configuration, http://USER:PASSWORD@PROXY:PORT, but this does not seem to add a proxy at all. I got the same IP address whether I added the desired capabilities or ran a plain webdriver. Why is the proxy_url variable in this answer a standard host:port configuration? – Luke Hamilton Apr 25 '22 at 14:57
    The IP is also the same as my local machine's IP. – Luke Hamilton Apr 25 '22 at 15:14