91

I am working on generating a PDF from a web page.

The application I am working on is a single-page application.

I tried many of the options and suggestions in https://github.com/GoogleChrome/puppeteer/issues/1412

But none of them work. This is my current code:

const browser = await puppeteer.launch({
    executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
    ignoreHTTPSErrors: true,
    headless: true,
    devtools: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
});

const page = await browser.newPage();

await page.goto(fullUrl, {
    waitUntil: 'networkidle2'
});

await page.type('#username', 'scott');
await page.type('#password', 'tiger');

await page.click('#Login_Button');
await page.waitFor(2000);

await page.pdf({
    path: outputFileName,
    displayHeaderFooter: true,
    headerTemplate: '',
    footerTemplate: '',
    printBackground: true,
    format: 'A4'
});

I want to generate the PDF report as soon as the page has loaded completely.

I don't want to write any kind of fixed delay, i.e. await page.waitFor(2000);

I cannot use waitForSelector because the page has charts and graphs that are rendered after calculations.

Any help will be appreciated.

i.brod
n.sharvarish

10 Answers

105

You can use page.waitForNavigation() to wait for the new page to load completely before generating a PDF:

await page.goto(fullUrl, {
  waitUntil: 'networkidle0',
});

await page.type('#username', 'scott');
await page.type('#password', 'tiger');

// Start waiting for the navigation before clicking so the event is not missed.
await Promise.all([
  page.waitForNavigation({
    waitUntil: 'networkidle0',
  }),
  page.click('#Login_Button'),
]);

await page.pdf({
  path: outputFileName,
  displayHeaderFooter: true,
  headerTemplate: '',
  footerTemplate: '',
  printBackground: true,
  format: 'A4',
});

If there is a certain element that is generated dynamically that you would like included in your PDF, consider using page.waitForSelector() to ensure that the content is visible:

await page.waitForSelector('#example', {
  visible: true,
});
Grant Miller
  • Where is the documentation for the signal 'networkidle0'? – Chilly Code Aug 27 '19 at 17:31
  • 'networkidle0' is documented here https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagegotourl-options – sapeish Oct 04 '19 at 20:31
  • Should `page.waitForSelector` be called after `page.goto` or before? Could you answer a similar question I asked https://stackoverflow.com/questions/58909236/pupeteer-script-does-not-wait-for-the-selector-to-get-loaded-and-i-get-a-blank-h ? – Amanda Nov 18 '19 at 07:01
  • Why would I use networkidle0 when I could use the default load event? Is it faster to use networkidle0? – Gary Feb 06 '21 at 09:29
  • If you're clicking something that triggers navigation, there's a race condition if [`Promise.all`](https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pageclickselector-options) isn't used, e.g. `Promise.all([page.click(...), page.waitForNavigation(...)])` – ggorlen Mar 16 '22 at 03:23
  • @Gary See [this comment](https://github.com/puppeteer/puppeteer/issues/1666#issuecomment-354224942) by a (former) core Puppeteer developer. – ggorlen Mar 16 '22 at 03:59
80

The networkidle events do not always indicate that the page has completely loaded. A few scripts may still be modifying the content on the page. Watching for the browser to finish modifying the HTML source tends to yield better results. Here's a function you could use:

const waitTillHTMLRendered = async (page, timeout = 30000) => {
  const checkDurationMsecs = 1000;
  const maxChecks = timeout / checkDurationMsecs;
  let lastHTMLSize = 0;
  let checkCounts = 1;
  let countStableSizeIterations = 0;
  const minStableSizeIterations = 3;

  while(checkCounts++ <= maxChecks){
    let html = await page.content();
    let currentHTMLSize = html.length; 

    let bodyHTMLSize = await page.evaluate(() => document.body.innerHTML.length);

    console.log('last: ', lastHTMLSize, ' <> curr: ', currentHTMLSize, " body html size: ", bodyHTMLSize);

    if(lastHTMLSize != 0 && currentHTMLSize == lastHTMLSize) 
      countStableSizeIterations++;
    else 
      countStableSizeIterations = 0; //reset the counter

    if(countStableSizeIterations >= minStableSizeIterations) {
      console.log("Page rendered fully..");
      break;
    }

    lastHTMLSize = currentHTMLSize;
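    // page.waitForTimeout() is deprecated and missing from newer Puppeteer releases; a plain timer works in any version.
    await new Promise((resolve) => setTimeout(resolve, checkDurationMsecs));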
  }  
};

You could use this after the page load / click function call and before you process the page content. e.g.

await page.goto(url, {'timeout': 10000, 'waitUntil':'load'});
await waitTillHTMLRendered(page)
const data = await page.content()
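
The same polling idea can be written as a generic helper that is not tied to HTML length. This is only a sketch: `waitUntilStable`, `sleep`, and the option names are made up for illustration and are not part of Puppeteer. The helper samples any async value and resolves once it has stopped changing for a few consecutive checks.

```javascript
// Illustrative helper (not a Puppeteer API): resolves to true once `sample()`
// returns the same value `stableChecks` times in a row, or to false when
// `timeoutMs` has elapsed without the value settling.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntilStable(
  sample,
  { intervalMs = 1000, stableChecks = 3, timeoutMs = 30000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  let previous = await sample();
  let stableCount = 0;

  while (Date.now() < deadline) {
    await sleep(intervalMs);
    const current = await sample();
    // Count consecutive unchanged samples; any change resets the counter.
    stableCount = current === previous ? stableCount + 1 : 0;
    if (stableCount >= stableChecks) return true;
    previous = current;
  }
  return false;
}

// Possible use with Puppeteer (assumes an existing `page` object):
//   const settled = await waitUntilStable(
//     () => page.evaluate(() => document.body.innerHTML.length)
//   );
```

Returning a boolean instead of just breaking out of the loop lets the caller decide what to do when the page never settles within the timeout.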
Arel
Anand Mahajan
    I'm not sure why this answer hasn't gotten more "love". In reality, a lot of the time we really just need to make sure JavaScript is done messing with the page before we scrape it. Network events don't accomplish this, and if you have dynamically generated content, there isn't always something you can reliably do a "waitForSelector/visible:true" on – Jason May 14 '20 at 14:35
  • Thanks @roberto - btw I just updated the answer, you could use this with the 'load' event rather than 'networkidle2' . Thought it would be little more optimal with that. I have tested this in production and can confirm it works well too! – Anand Mahajan Sep 20 '20 at 16:08
  • Great solution, and it should be part of the Puppeteer library. However, please note that waitFor is deprecated and will be removed in a future release: https://github.com/puppeteer/puppeteer/issues/6214 – Michael Paccione Apr 02 '21 at 23:48
  • I tried setting `checkDurationMsecs` to 200ms, and the bodyHTMLSize keeps changing and gives huge numbers. I am using Electron and React as well; very strange. – Ambroise Rabier Apr 29 '21 at 13:15
  • OK, I found that ridiculously hard-to-catch bug. If you are lucky enough to capture that 100k-long HTML page, you notice CSS classes like `CodeMirror` (must be https://codemirror.net/), meaning `document.body.innerHTML` is capturing the dev console too! Just remove `mainWindow.webContents.openDevTools();` for e2e testing. I hope I don't get any more bad surprises. – Ambroise Rabier Apr 29 '21 at 13:41
  • Solved a headache for me on a high latency connection .. Well Done – Mark Cupitt Jun 08 '21 at 02:41
44

In some cases, the best solution for me was:

await page.goto(url, { waitUntil: 'domcontentloaded' });

Some other options you could try are:

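await page.goto(url, { waitUntil: 'load' });             // the load event has fired
await page.goto(url, { waitUntil: 'domcontentloaded' }); // the DOMContentLoaded event has fired
await page.goto(url, { waitUntil: 'networkidle0' });     // no network connections for at least 500 ms
await page.goto(url, { waitUntil: 'networkidle2' });     // no more than 2 network connections for at least 500 ms

You can check these in the Puppeteer documentation: https://pptr.dev/#?product=Puppeteer&version=v11.0.0&show=api-pagewaitfornavigationoptions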

Eduardo Conte
30

I always like to wait for selectors, as many of them are a great indicator that the page has fully loaded:

await page.waitForSelector('#blue-button');
Nicolás A.
  • You are a genius. This is such an obvious solution, especially when you are waiting for specific elements, and I can't believe I didn't think of it myself. Thank you! – Arch4Arts Feb 28 '21 at 12:54
  • @Arch4Arts you should create your own clicking function that does the waiting for you as well as clicking – Nicolás A. Apr 02 '21 at 20:01
10

Wrap page.click and page.waitForNavigation in a Promise.all so that the wait for the navigation has already started when the click triggers it:

  await Promise.all([
    page.click('#submit_button'),
    page.waitForNavigation({ waitUntil: 'networkidle0' })
  ]);
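
If you click in many places, the pattern can be wrapped in a small helper. This is a sketch only: `clickAndWaitForNavigation` is a made-up name, and it assumes the click really does trigger a navigation (otherwise `page.waitForNavigation` will eventually time out).

```javascript
// Hypothetical helper (not part of Puppeteer): waits for the element to be
// visible, then clicks it while already listening for the navigation that
// the click is expected to trigger.
async function clickAndWaitForNavigation(page, selector, waitUntil = 'networkidle0') {
  await page.waitForSelector(selector, { visible: true });

  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil }), // start listening before the click
    page.click(selector),
  ]);
  return response;
}

// Possible use (assumes an existing `page` object):
//   await clickAndWaitForNavigation(page, '#submit_button');
```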
Mark Swardstrom
  • Is `page.waitForNavigation({ waitUntil: 'networkidle0' })` the same as `page.waitForNetworkIdle()`? – milos Oct 20 '21 at 11:50
10

In the latest Puppeteer version, networkidle2 worked for me:

await page.goto(url, { waitUntil: 'networkidle2' });
attacomsian
6

I encountered the same issue with networkidle when I was working on an offscreen renderer. I needed a WebGL-based engine to finish rendering and only then make a screenshot. What worked for me was a page.waitForFunction() method. In my case the usage was as follows:

await page.goto(url);
await page.waitForFunction("renderingCompleted === true")
const imageBuffer = await page.screenshot({});

In the rendering code, I was simply setting the renderingCompleted variable to true, when done. If you don't have access to the page code you can use some other existing identifier.
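
The same wait can also be written with a function instead of a string, which allows passing the flag name and a timeout. Treat this as a sketch: `waitForPageFlag` is a made-up helper, and it assumes the page's own code sets a global flag such as `renderingCompleted` when it is done.

```javascript
// Hypothetical helper (not part of Puppeteer): resolves once
// `window[flagName] === true` inside the page, or rejects after `timeout` ms.
function waitForPageFlag(page, flagName, timeout = 30000) {
  return page.waitForFunction(
    (name) => window[name] === true, // evaluated in the browser context
    { timeout },
    flagName // passed through to the page function as `name`
  );
}

// Possible use (assumes an existing `page` object):
//   await waitForPageFlag(page, 'renderingCompleted');
//   const imageBuffer = await page.screenshot({});
```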

Dharman
Tali Oat
5

You can also wait on the wildcard selector, which resolves once an element matching it is present in the DOM:

await page.waitFor('*')

Reference: https://github.com/puppeteer/puppeteer/issues/1875

Phat Tran
  • `waitFor` is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code. – kenberkeley Dec 16 '20 at 23:59
3

As of December 2020, the waitFor function is deprecated, as the warning it emits says:

waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.

You can use:

// Resolves after the given number of milliseconds (returns immediately for 0 or undefined).
function sleep(millisecondsCount) {
    if (!millisecondsCount) {
        return;
    }
    return new Promise(resolve => setTimeout(resolve, millisecondsCount));
}

And use it:

(async () => {
    await sleep(1000);
})();
Or Assayag
  • just use page.waitForTimeout(1000) – Viacheslav Dobromyslov Dec 10 '20 at 15:41
  • Will check it out. Thanks. – Or Assayag Dec 10 '20 at 15:44
  • The GitHub issue states that they just deprecated the "magic" waitFor function. You can still use one of the specific waitFor*() functions. Hence your sleep() code is needless. (Not to mention that it's overcomplicated for what it does, and it's generally a bad idea to tackle concurrency problems with programmatic timeouts.) – lxg Dec 20 '20 at 13:20
0

I can't leave comments, but I made a python version of Anand's answer for anyone who finds it useful (i.e. if they use pyppeteer).

import asyncio

from pyppeteer.page import Page


async def waitTillHTMLRendered(page: Page, timeout: int = 30000) -> None:
    """Poll the page HTML until its length stays the same for several checks."""
    check_duration_m_secs = 1000
    max_checks = timeout // check_duration_m_secs
    last_html_size = 0
    check_counts = 1
    count_stable_size_iterations = 0
    min_stable_size_iterations = 3

    while check_counts <= max_checks:
        check_counts += 1
        html = await page.content()
        current_html_size = len(html)

        if last_html_size != 0 and current_html_size == last_html_size:
            count_stable_size_iterations += 1
        else:
            count_stable_size_iterations = 0  # reset the counter

        if count_stable_size_iterations >= min_stable_size_iterations:
            break

        last_html_size = current_html_size
        # Pause between checks (asyncio.sleep takes seconds, not milliseconds).
        await asyncio.sleep(check_duration_m_secs / 1000)