19

I'm running Puppeteer on Express/Node/Ubuntu as follows:

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

/* GET home page. */
router.get('/', function(req, res, next) {
    (async () => {
        headless = true;
        const browser = await puppeteer.launch({headless: true, args:['--no-sandbox']});
        const page = await browser.newPage();
        url = req.query.url;
        await page.goto(url);
        let bodyHTML = await page.evaluate(() => document.body.innerHTML);
        res.send(bodyHTML)
        await browser.close();
    })();
});

Running this script multiple times leaves hundreds of zombie processes:

$ pgrep chrome | wc -l
133

This clogs the server.

How do I fix this?

Could running kill from an Express script solve it?

Is there a better way to get the same result other than puppeteer and headless chrome?

Flame
Elia Weiss

8 Answers

29

Ahhh! This is a simple oversight. If an error occurs, your await browser.close() never executes, leaving you with zombie processes.

Using shell.js seems to be a hacky way of solving this issue.

The better practice is to use try..catch..finally, because you want the browser to be closed whether the request succeeds or an error is thrown. And unlike the other code snippet, you don't have to close the browser in both the catch block and the finally block: the finally block always executes, whether or not an error is thrown.

So, your code should look like this:

const puppeteer = require('puppeteer');
const express = require('express');

const router = express.Router();

/* GET home page. */
router.get('/', function(req, res, next) {
  (async () => {
    const browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox'],
    });

    try {
      const page = await browser.newPage();
      const url = req.query.url; // const avoids an implicit global shared across requests
      await page.goto(url);
      const bodyHTML = await page.evaluate(() => document.body.innerHTML);
      res.send(bodyHTML);
    } catch (e) {
      console.log(e);
    } finally {
      await browser.close();
    }
  })();
});

Hope this helps!

andromeda
Ramaraja Ramanujan
  • 1
    Can you explain whats happening here? Your code sample looks the same to me as the original one. It's only in a try-catch-finally block, but the order of execution is the same right? Also, what do you mean with `"execution time stops"`? Execution doesn't stop unless you use `return`. You can do any amount of work after calling `res.send()`. – Flame Jun 18 '20 at 22:33
  • 3
    `browser` needs to be declared in an outer scope so it's in scope for the `finally` block, otherwise you'll get `ReferenceError: browser is not defined`. This code also makes various globals, `headless` and `url`. Always use `const` or `let` to scope variables correctly. – ggorlen Jun 09 '21 at 18:30
21

Wrap your code in try-catch-finally like this and see if it helps:

const browser = await puppeteer.launch({headless: true, args:['--no-sandbox']});
try {
  const page = await browser.newPage();
  const url = req.query.url;
  await page.goto(url);
  const bodyHTML = await page.evaluate(() => document.body.innerHTML);
  res.send(bodyHTML);
} catch (error) {
  console.log(error);
} finally {
  // Runs after try or catch, so this is the only close() call needed.
  await browser.close();
}
Mukesh
  • 2
    Good point, thanks. However, I think the `try` should come after `const browser =...` so that `browser` can be used in the `catch/finally`, shouldn't it? – Elia Weiss Dec 28 '18 at 09:38
  • UPDATE: I tried both solutions. I have significantly fewer zombies, but I still have about 20 left every day – Elia Weiss Dec 30 '18 at 09:40
  • 1
    `await browser.close();` in the `try` and `catch` blocks is redundant. `finally` will run after both blocks regardless of whether an error is thrown. Use `const url` and `const headless` to avoid unnecessary, error-prone globals. – ggorlen Jun 09 '21 at 19:58
  • Yes, it was redundant – Mukesh Jun 21 '21 at 08:17
8

I solved it with https://www.npmjs.com/package/shelljs:

var shell = require('shelljs');
shell.exec('pkill chrome')
Elia Weiss
  • 1
    this works for me, somehow browser.close doesn't work but this command works. – Pencilcheck Jun 19 '19 at 21:31
  • 1
    Will it kill all Chrome instances running on the system, or only the Chrome instances the Puppeteer script started? – Austin Aug 26 '19 at 05:12
  • 1
    It is supposed to kill all Chrome processes – Elia Weiss Aug 27 '19 at 09:11
  • 2
    can't believe this is the accepted answer. Not for the answer itself, but for the fact that we are almost forced to do this to simply close puppeteer processes – cub33 Jan 22 '21 at 13:43
  • 1
    This answer is terrible, because it will kill all chrome instances. In the original code, it was clear that puppeteer was running in an API, meaning that if there are more than 1 api request at the same time, the other requests will fail. – Joel'-' Sep 16 '21 at 08:22
7

In my experience, the browser may take some time to shut down after close is called. You can call browser.process() to check whether the browser's process still exists and force-kill it:

if (browser && browser.process() != null) browser.process().kill('SIGINT');
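The same idea can be wrapped in a helper that first tries a normal close and only kills the process if that hangs or fails. This is a rough sketch, not a tested recipe: `closeBrowser` and `timeoutMs` are names made up for illustration, and it uses `SIGKILL` rather than the `SIGINT` shown above. Unlike `pkill chrome`, it only targets the process Puppeteer spawned for this particular browser.

```javascript
// Sketch: close a Puppeteer-like browser, force-killing its own process if
// close() hangs or rejects. `closeBrowser` and `timeoutMs` are hypothetical.
async function closeBrowser(browser, timeoutMs = 5000) {
  if (!browser) return "no-browser";

  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });

  let outcome;
  try {
    // Race the graceful close against the timer.
    outcome = await Promise.race([
      browser.close().then(() => "closed"),
      timeout,
    ]);
  } catch (err) {
    outcome = "error"; // close() itself rejected
  } finally {
    clearTimeout(timer);
  }

  if (outcome === "closed") return outcome;

  // process() returns null when there is no spawned child process to kill.
  const child = browser.process();
  if (child != null) {
    child.kill("SIGKILL");
    return "killed";
  }
  return outcome;
}
```

In a route you would call `await closeBrowser(browser)` inside the `finally` block instead of `await browser.close()`.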

I'm also posting the full code of my Puppeteer resource manager below. Take a look at the bw.on('disconnected', async () => { handler:

const puppeteer = require('puppeteer-extra')
const randomUseragent = require('random-useragent');
const StealthPlugin = require('puppeteer-extra-plugin-stealth')

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';
puppeteer.use(StealthPlugin())

function ResourceManager(loadImages) {
    let browser = null;
    const _this = this;
    let retries = 0;
    let isReleased = false;

    this.init = async () => {
        isReleased = false;
        retries = 0;
        browser = await runBrowser();
    };

    this.release = async () => {
        isReleased = true;
        if (browser) await browser.close();
    }

    this.createPage = async (url) => {
        if (!browser) browser = await runBrowser();
        return await createPage(browser,url);
    }

    async function runBrowser () {
        const bw = await puppeteer.launch({
            headless: true,
            devtools: false,
            ignoreHTTPSErrors: true,
            slowMo: 0,
            args: ['--disable-gpu','--no-sandbox','--no-zygote','--disable-setuid-sandbox','--disable-accelerated-2d-canvas','--disable-dev-shm-usage', "--proxy-server='direct://'", "--proxy-bypass-list=*"]
        });

        bw.on('disconnected', async () => {
            if (isReleased) return;
            console.log("BROWSER CRASH");
            if (retries <= 3) {
                retries += 1;
                if (browser && browser.process() != null) browser.process().kill('SIGINT');
                await _this.init();
            } else {
                throw "===================== BROWSER crashed more than 3 times";
            }
        });

        return bw;
    }

    async function createPage (browser,url) {
        const userAgent = randomUseragent.getRandom();
        const UA = userAgent || USER_AGENT;
        const page = await browser.newPage();
        await page.setViewport({
            width: 1920 + Math.floor(Math.random() * 100),
            height: 3000 + Math.floor(Math.random() * 100),
            deviceScaleFactor: 1,
            hasTouch: false,
            isLandscape: false,
            isMobile: false,
        });
        await page.setUserAgent(UA);
        await page.setJavaScriptEnabled(true);
        await page.setDefaultNavigationTimeout(0);
        if (!loadImages) {
            await page.setRequestInterception(true);
            page.on('request', (req) => {
                if(req.resourceType() == 'stylesheet' || req.resourceType() == 'font' || req.resourceType() == 'image'){
                    req.abort();
                } else {
                    req.continue();
                }
            });
        }

        await page.evaluateOnNewDocument(() => {
            //pass webdriver check
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
            });
        });

        await page.evaluateOnNewDocument(() => {
            //pass chrome check
            window.chrome = {
                runtime: {},
                // etc.
            };
        });

        await page.evaluateOnNewDocument(() => {
            //pass permissions check
            const originalQuery = window.navigator.permissions.query;
            return window.navigator.permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                    Promise.resolve({ state: Notification.permission }) :
                    originalQuery(parameters)
            );
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'plugins', {
                // This just needs to have `length > 0` for the current test,
                // but we could mock the plugins too if necessary.
                get: () => [1, 2, 3, 4, 5],
            });
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `languages` property to use a custom getter.
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en'],
            });
        });

        await page.goto(url, { waitUntil: 'networkidle2',timeout: 0 } );
        return page;
    }
}

module.exports = {ResourceManager}
ggorlen
Tim Kozak
0

Try closing the browser before sending the response:

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

router.get('/', function(req, res, next) {
    (async () => {
        const browser = await puppeteer.launch({headless: true});
        const page = await browser.newPage();
        const url = req.query.url;
        await page.goto(url);
        let bodyHTML = await page.evaluate(() => document.body.innerHTML);
        await browser.close();
        res.send(bodyHTML);
    })();
});
0

I ran into the same issue and while your shelljs solution did work, it kills all chrome processes, which might interrupt one that is still processing a request. Here is a better solution that should work.

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

router.get('/', function (req, res, next) {
    (async () => {
        await puppeteer.launch({ headless: true }).then(async browser => {
            const page = await browser.newPage();
            const url = req.query.url;
            await page.goto(url);
            let bodyHTML = await page.evaluate(() => document.body.innerHTML);
            await browser.close();
            res.send(bodyHTML);
        });
    })();
});
voidmind
  • I adapted his code to one of the sample code found on the [Puppeteer web site](https://pptr.dev/#?product=Puppeteer&version=v1.19.0&show=api-class-browser). To be honest, I'm kind of new to async/await and Promises so I don't fully understand why executing the browser object code in a then() is any different than doing a series of statements preceded by await. All I know is that my code is very similar and changing it to that fixed the issue. – voidmind Aug 17 '19 at 06:25
0

I use the following basic setup for running Puppeteer:

const puppeteer = require("puppeteer");

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();

  /* use the page */
  
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

Here, the finally block guarantees the browser will close correctly regardless of whether an error was thrown. Errors are logged (if desired). I like .catch and .finally as chained calls because the mainline Puppeteer code is one level flatter, but this accomplishes the same thing:

const puppeteer = require("puppeteer");

(async () => {
  let browser;

  try {
    browser = await puppeteer.launch();
    const [page] = await browser.pages();

    /* use the page */
  }
  catch (err) {
    console.error(err);
  }
  finally {
    await browser?.close();
  }
})();

There's no reason to call newPage because Puppeteer starts with a page open.


As for Express, you need only place the entire code above, including let browser; and excluding require("puppeteer"), into your route, and you're good to go, although you might want to use an async middleware error handler.
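Here is roughly what that could look like as a route handler. Treat it as a sketch under a few assumptions: `makeRenderHandler` is a hypothetical helper, and the `launch` parameter is expected to behave like `puppeteer.launch`. It is passed in only so the handler can be exercised without a real browser. Errors are forwarded to `next` so that Express's error middleware decides what the client sees.

```javascript
// Sketch of a per-request route handler. `makeRenderHandler` is hypothetical;
// `launch` is assumed to behave like puppeteer.launch.
function makeRenderHandler(launch) {
  return async function renderHandler(req, res, next) {
    let browser;
    try {
      browser = await launch({ headless: true, args: ["--no-sandbox"] });
      const [page] = await browser.pages();
      await page.goto(req.query.url);
      res.send(await page.content());
    } catch (err) {
      next(err); // hand the failure to Express's error middleware
    } finally {
      await browser?.close(); // runs on success and on failure
    }
  };
}

// Wiring, assuming express and puppeteer are installed:
//   const puppeteer = require("puppeteer");
//   router.get("/", makeRenderHandler((opts) => puppeteer.launch(opts)));
```

Because `browser` is declared before the `try`, it is still in scope in `finally`, and `browser?.close()` is a no-op if the launch itself failed.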

You ask:

Is there a better way to get the same result other than puppeteer and headless chrome?

That depends on what you're doing and what you mean by "better". If your goal is to get document.body.innerHTML and the page content you're interested in is baked into the static HTML, you can dump Puppeteer entirely and just make a request to get the resource, then use Cheerio to extract the desired information.
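A browser-free version might look something like the following. It assumes Node 18+ (for the global `fetch`) and that the `cheerio` package is installed. `fetchBodyHtml` and its `deps` parameter are hypothetical names, included so the fetch and parse steps can be swapped out. Keep in mind that this only sees the HTML the server sends, so anything rendered by client-side JavaScript will be missing.

```javascript
// Sketch: fetch static HTML and read <body> without launching a browser.
// Assumes Node 18+ (global fetch) and the cheerio package; `fetchBodyHtml`
// and `deps` are hypothetical names used for illustration.
async function fetchBodyHtml(url, deps = {}) {
  const fetchImpl = deps.fetch ?? globalThis.fetch;
  const load = deps.load ?? require("cheerio").load;

  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Parse the markup and return the inner HTML of <body>.
  // No page scripts are executed, so this is the static markup only.
  const $ = load(await response.text());
  return $("body").html();
}
```

A route could then do `res.send(await fetchBodyHtml(req.query.url))` with no Chrome process involved at all.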

Another consideration is that you may not need to load and close a whole browser per request. If you can use one new page per request, consider the following strategy:

const express = require("express");
const puppeteer = require("puppeteer");

const asyncHandler = fn => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next)
;
const browserReady = puppeteer.launch({
  args: ["--no-sandbox", "--disable-setuid-sandbox"]
});

const app = express();
app
  .set("port", process.env.PORT || 5000)
  .get("/", asyncHandler(async (req, res) => {
    const browser = await browserReady;
    const page = await browser.newPage();

    try {
      await page.goto(req.query.url || "http://www.example.com");
      return res.send(await page.content());
    }
    catch (err) {
      return res.status(400).send(err.message);
    }
    finally {
      await page.close();
    }
  }))
  .use((err, req, res, next) => res.sendStatus(500))
  .listen(app.get("port"), () =>
    console.log("listening on port", app.get("port"))
  )
;

This is still heavy work, and although Puppeteer runs Chromium as a subprocess, you might want to consider unloading this job onto a task queue such as Bull and running it in the background.
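If a full task queue is more than you need, a much simpler stopgap is to cap how many pages are processed at once inside the Node process. To be clear, the sketch below is not Bull and gives you no persistence, retries, or background workers. `createLimiter` is a hypothetical helper written only to illustrate the idea.

```javascript
// Sketch: an in-process queue that caps how many jobs run at once.
// This is NOT Bull; `createLimiter` is a hypothetical helper.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const runNext = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active += 1;
    const { job, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(job) // start the job; sync throws become rejections
      .then(resolve, reject) // settle the caller's promise
      .finally(() => {
        active -= 1; // free the slot and start the next queued job
        runNext();
      });
  };

  // Returns a function that queues a job and resolves with its result.
  return (job) =>
    new Promise((resolve, reject) => {
      waiting.push({ job, resolve, reject });
      runNext();
    });
}
```

In the route above you could create `const limit = createLimiter(4)` once and wrap the page work in `await limit(async () => { /* newPage, goto, content, close */ })`, so that at most four pages are open at a time.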

ggorlen
0

Use:

 (await browser).close()

This is needed because browser contains a promise, which you have to resolve first. I struggled with this for a long time; I hope it helps.
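To illustrate the situation where this applies, here is a small sketch. It assumes the launch result was stored without `await`, for example in a variable created outside the route. The `launch` function here is a hypothetical stub standing in for `puppeteer.launch` so the snippet runs on its own.

```javascript
// Sketch: why `(await browser).close()` can be needed. `launch` is a
// hypothetical stub standing in for puppeteer.launch.
const launch = async () => ({ close: async () => "closed" });

// Assigned without `await`, so this variable holds a Promise, not a Browser.
const browserPromise = launch();

async function shutdown() {
  // A Promise has no close() method...
  const hasCloseBefore = typeof browserPromise.close === "function"; // false
  // ...but the value it resolves to does.
  const browser = await browserPromise;
  const result = await browser.close();
  return { hasCloseBefore, result };
}
```

If you already wrote `const browser = await puppeteer.launch()`, then `browser` is the resolved object and the extra `await` changes nothing.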

  • 1
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Feb 01 '22 at 01:01
  • 1
    This is wrong. `browser` is an object of type `Browser`, as you can see in the documentation: https://puppeteer.github.io/puppeteer/docs/puppeteer.browser – lezhumain Apr 20 '22 at 11:17
  • I don't know how, but this actually worked for me. I was creating a Puppeteer instance outside of my router function and then calling it inside the router function. I was getting errors like browser.newPage() is not a function. – Md. Hasan Mahmud May 02 '22 at 06:46
  • Then your "browser" object probably was a `Promise` in your case and thus didn't have the "close" method, but that's not how it usually is. – lezhumain May 17 '22 at 22:20