38

Google detects that my browser is being manipulated/controlled/automated by software, and because of that I get a reCAPTCHA. When I start Chromium manually and do the same steps, the reCAPTCHA doesn't appear.

Question 1)

Is it possible to solve the captcha programmatically, or to avoid it altogether, when using Puppeteer?

Question 2)

Does this happen only when running without the headless option, i.e.

const browser = await puppeteer.launch({
  headless: false
})

Or is this simply something we have to accept and move on?

rinold simon
  • Check out this blogpost. It is close to your own situation. https://medium.com/@jsoverson/bypassing-captchas-with-headless-chrome-93f294518337 – Paula Livingstone Apr 14 '19 at 17:48
  • I already came across that blog. He uses `2captcha` which is not FREE :P – rinold simon Apr 14 '19 at 17:52
  • 2
    Your accepted answer relies on a PAID service from 2captcha.com. If you are willing to pay, then why use Headless Chrome + Puppeteer? Why don't you just use `CURL`? – Cyborg Jan 25 '20 at 23:08

4 Answers

51

Try generating a random user agent with the user-agents npm package. This usually gets past user-agent-based protection.

In Puppeteer, a page can override the browser's user agent with page.setUserAgent

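const UserAgent = require('user-agents');
...
// user-agents exports a class; create an instance to get a random, realistic user agent
const userAgent = new UserAgent();
await page.setUserAgent(userAgent.toString());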

Additionally, you can add these two plugins:

puppeteer-extra-plugin-recaptcha - Solves reCAPTCHAs automatically, using a single line of code: page.solveRecaptchas()

NOTE: puppeteer-extra-plugin-recaptcha relies on 2captcha, which is a paid service.

puppeteer-extra-plugin-stealth - Applies various evasion techniques to make detection of headless puppeteer harder.
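
A minimal sketch of how the two plugins might be registered together, assuming `puppeteer-extra` and both plugins are installed. The helper names `buildRecaptchaOptions` and `openWithPlugins` are hypothetical, and the option shape follows the reCAPTCHA plugin's README as I understand it, so check it against the version you install.

```javascript
// Build the options object for puppeteer-extra-plugin-recaptcha.
// `buildRecaptchaOptions` is a hypothetical helper, not part of the plugin.
function buildRecaptchaOptions(token) {
  if (!token) {
    throw new Error('A 2captcha API token is required (paid service)');
  }
  return {
    provider: { id: '2captcha', token },
    visualFeedback: true, // optional: highlight reCAPTCHAs on the page
  };
}

// Hypothetical wrapper: launch a browser with both plugins and solve
// any reCAPTCHAs found on the given URL.
async function openWithPlugins(url, token) {
  // Required lazily so the helper above works without the packages installed.
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');

  puppeteer.use(StealthPlugin());
  puppeteer.use(RecaptchaPlugin(buildRecaptchaOptions(token)));

  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url);
  await page.solveRecaptchas(); // method added to the page by the reCAPTCHA plugin
  return { browser, page };
}
```

The stealth plugin is free on its own; only `page.solveRecaptchas()` needs the paid token.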

rinold simon
  • 3
    This solution uses a PAID service from 2captcha.com. You don't need ALL this stuff if you use the 2captcha PAID service! – Cyborg Jan 25 '20 at 23:09
  • 2
    Yes. It's not just one line of code. You also need to sign up for the service and pay for every captcha solved – Tim Kozak Nov 11 '20 at 11:08
38

Here is a list of things I do to bypass captchas and similar blocks:

  • Enable stealth mode (via puppeteer-extra-plugin-stealth)
  • Randomize the user agent or set a valid one (via random-useragent)
  • Randomize Viewport size
  • Skip images/styles/fonts loading for better performance
  • Pass "WebDriver check"
  • Pass "Chrome check"
  • Pass "Notifications check"
  • Pass "Plugins check"
  • Pass "Languages check"

Link to full code is here

    const randomUseragent = require('random-useragent');

    //Enable stealth mode
    const puppeteer = require('puppeteer-extra')
    const StealthPlugin = require('puppeteer-extra-plugin-stealth')
    puppeteer.use(StealthPlugin())
    
    const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';
    
    async function createPage (browser,url) {

        //Randomize User agent or Set a valid one
        const userAgent = randomUseragent.getRandom();
        const UA = userAgent || USER_AGENT;
        const page = await browser.newPage();

        //Randomize viewport size
        await page.setViewport({
            width: 1920 + Math.floor(Math.random() * 100),
            height: 3000 + Math.floor(Math.random() * 100),
            deviceScaleFactor: 1,
            hasTouch: false,
            isLandscape: false,
            isMobile: false,
        });

        await page.setUserAgent(UA);
        await page.setJavaScriptEnabled(true);
        await page.setDefaultNavigationTimeout(0);

        //Skip images/styles/fonts loading for performance
        await page.setRequestInterception(true);
        page.on('request', (req) => {
            const blockedTypes = ['stylesheet', 'font', 'image'];
            if (blockedTypes.includes(req.resourceType())) {
                req.abort();
            } else {
                req.continue();
            }
        });

        await page.evaluateOnNewDocument(() => {
            // Pass webdriver check
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
            });
        });

        await page.evaluateOnNewDocument(() => {
            // Pass chrome check
            window.chrome = {
                runtime: {},
                // etc.
            };
        });

        await page.evaluateOnNewDocument(() => {
            // Pass notifications check
            // Bind the original method so it keeps its `this` when called later
            const originalQuery = window.navigator.permissions.query.bind(window.navigator.permissions);
            window.navigator.permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                    Promise.resolve({ state: Notification.permission }) :
                    originalQuery(parameters)
            );
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'plugins', {
                // This just needs to have `length > 0` for the current test,
                // but we could mock the plugins too if necessary.
                get: () => [1, 2, 3, 4, 5],
            });
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `languages` property to use a custom getter.
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en'],
            });
        });

        await page.goto(url, { waitUntil: 'networkidle2',timeout: 0 } );
        return page;
    }
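
The viewport randomization and the resource filter above can also be pulled out into small pure helpers, which makes them easy to check without launching a browser. This is only a sketch: `shouldBlock` and `randomViewport` are hypothetical names, and the default sizes simply mirror the values used in the answer.

```javascript
// Resource types that are skipped for performance.
const BLOCKED_RESOURCE_TYPES = new Set(['stylesheet', 'font', 'image']);

// Returns true when a request of this resource type should be aborted.
function shouldBlock(resourceType) {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Returns a viewport whose width and height are the base size
// plus a random offset in the range [0, jitter).
function randomViewport(baseWidth = 1920, baseHeight = 3000, jitter = 100) {
  return {
    width: baseWidth + Math.floor(Math.random() * jitter),
    height: baseHeight + Math.floor(Math.random() * jitter),
    deviceScaleFactor: 1,
    hasTouch: false,
    isLandscape: false,
    isMobile: false,
  };
}
```

Inside `createPage` these would be used as `await page.setViewport(randomViewport())` and, in the request handler, `shouldBlock(req.resourceType()) ? req.abort() : req.continue()`.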
Tim Kozak
9

Have you tried setting the browser's user agent?

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');
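
Instead of hard-coding a version that goes stale, one option is to start from the browser's own user agent and only remove the headless marker. The sketch below assumes the default headless user agent contains the token `HeadlessChrome`, which is the usual case but worth verifying for your Chromium build. `normalizeUserAgent` and `applyNormalizedUserAgent` are hypothetical helper names.

```javascript
// Replace the "HeadlessChrome" token with "Chrome"; other strings are returned unchanged.
function normalizeUserAgent(ua) {
  return ua.replace('HeadlessChrome', 'Chrome');
}

// Read the browser's default user agent and apply the normalized
// version to a page. `browser` and `page` are Puppeteer objects.
async function applyNormalizedUserAgent(browser, page) {
  const defaultUa = await browser.userAgent();
  await page.setUserAgent(normalizeUserAgent(defaultUa));
}
```

This keeps the reported Chrome version in sync with the Chromium that Puppeteer actually runs, but it does not address the other detection signals.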
Hellonearthis
  • No, but I will give it a try. What happens if the user agent is always the same, though? Shouldn't the `UserAgent` be `random`? Could you explain briefly? – rinold simon Apr 15 '19 at 09:06
  • Leaving the agent at its default says you are using Puppeteer, so setting it to Chrome (like above) gets you past the basic test. But you will still end up with the captcha at some point. Logging in might also help keep your scraper working for a bit. – Hellonearthis Apr 16 '19 at 10:16
  • 6
    Even after setting a specific user agent, we ended up with the captcha after some logins, so I tried generating a random user agent each time using an npm package (https://www.npmjs.com/package/random-useragent). Now it is working fine. – rinold simon Apr 16 '19 at 11:53
  • 1
    This answer is great, but the user agent is quite old (69.0.3497 is a few months old at the time I'm writing this). The latest one right now is `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.89 Safari/537.36`. See https://developers.whatismybrowser.com/useragents/explore/software_name/brave/ if you want other versions – Dr4kk0nnys May 26 '21 at 15:17
3

After a few tests, a couple of packages helped me avoid reCAPTCHA:

//const puppeteer = require('puppeteer');
const puppeteerExtra = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
const randomUseragent = require('random-useragent');

class PuppeteerService {

    constructor() {
        this.browser = null;
        this.page = null;
        this.pageOptions = null;
        this.waitForFunction = null;
        this.isLinkCrawlTest = null;
    }

    async initiate(countsLimitsData, isLinkCrawlTest) {
        this.pageOptions = {
            waitUntil: 'networkidle2',
            timeout: countsLimitsData.millisecondsTimeoutSourceRequestCount
        };
        this.waitForFunction = 'document.querySelector("body")';
        puppeteerExtra.use(pluginStealth());
        //const browser = await puppeteerExtra.launch({ headless: false });
        this.browser = await puppeteerExtra.launch({ headless: false });
        this.page = await this.browser.newPage();
        await this.page.setRequestInterception(true);
        this.page.on('request', (request) => {
            if (['image', 'stylesheet', 'font', 'script'].indexOf(request.resourceType()) !== -1) {
                request.abort();
            } else {
                request.continue();
            }
        });
        this.isLinkCrawlTest = isLinkCrawlTest;
    }

    async crawl(link) {
        const userAgent = randomUseragent.getRandom();
        const crawlResults = { isValidPage: true, pageSource: null };
        try {
            await this.page.setUserAgent(userAgent);
            await this.page.goto(link, this.pageOptions);
            await this.page.waitForFunction(this.waitForFunction);
            crawlResults.pageSource = await this.page.content();
        }
        catch (error) {
            crawlResults.isValidPage = false;
        }
        if (this.isLinkCrawlTest) {
            await this.close();
        }
        return crawlResults;
    }

    async close() {
        // Close the browser only if one was actually launched
        if (this.browser) {
            await this.browser.close();
            this.browser = null;
        }
    }
}

const puppeteerService = new PuppeteerService();
module.exports = puppeteerService;
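
A hedged usage sketch for a service with the same `initiate`/`crawl` shape. The helper name `crawlOnce` and the 30-second default timeout are assumptions for illustration. The service is passed in as an argument so the helper can be exercised with a stub instead of a real browser.

```javascript
// Initialize the service, crawl one link, and return the page source,
// or null when the crawl failed. `service` is expected to expose the
// initiate(countsLimitsData, isLinkCrawlTest) and crawl(link) methods above.
async function crawlOnce(service, link, timeoutMs = 30000) {
  // Passing `true` makes crawl() close the browser when it finishes.
  await service.initiate({ millisecondsTimeoutSourceRequestCount: timeoutMs }, true);
  const result = await service.crawl(link);
  return result.isValidPage ? result.pageSource : null;
}
```

With the module above this would be called as `const html = await crawlOnce(require('./puppeteer.service'), 'https://example.com');`, where the file name is a placeholder.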
Or Assayag