
I'm trying to do some web scraping with Puppeteer, and I need to get the scraped value into a website I'm building.

I have tried to load the Puppeteer script from the HTML file as if it were an ordinary JavaScript file, but I keep getting an error. However, if I run it in a cmd window, it works fine.

Scraper.js:
getPrice();
function getPrice() {
    const puppeteer = require('puppeteer');
    void (async () => {
        try {
            const browser = await puppeteer.launch()
            const page = await browser.newPage()              
            await page.goto('http://example.com') 
            await page.setViewport({ width: 1920, height: 938 })        
            await page.waitForSelector('.m-hotel-info > .l-container > .l-header-section > .l-m-col-2 > .m-button')
            await page.click('.m-hotel-info > .l-container > .l-header-section > .l-m-col-2 > .m-button')
            await page.waitForSelector('.modal-content')
            await page.click('.tile-hsearch-hws > .m-search-tabs > #edit-search-panel > .l-em-reset > .m-field-wrap > .l-xs-col-4 > .analytics-click')
            await page.waitForNavigation();
            await page.waitForSelector('.tile-search-filter > .l-display-none')
            const innerText = await page.evaluate(() => document.querySelector('.tile-search-filter > .l-display-none').innerText);
            console.log(innerText)
        } catch (error) {
            console.log(error)
        }

    })()
}
index.html:
<html>
  <head></head>
  <body>
    <script src="../js/scraper.js" type="text/javascript"></script>
  </body>
</html>

The expected result should be this one in the console of Chrome:

But I'm getting this error instead:

What am I doing wrong?

Let Me Tink About It
  • puppeteer is a headless browser, you can't load it inside a web browser. For other packages that can be run in the browser take a look at: https://stackoverflow.com/questions/19059580/client-on-node-uncaught-referenceerror-require-is-not-defined – Roland Starke Feb 12 '19 at 10:18
  • You can translate Puppeteer commands to respective browser API but some likely won't work. `waitForNavigation` - how do you expect it to work? You navigate to another page. There will be another page that will be unaware of this script. The reason why Node packages like Puppeteer exist is that some things cannot be achieved with a browser alone. – Estus Flask Feb 12 '19 at 10:37
  • Thanks to you both! I have tried Browserify and I keep getting this error: https://i.imgur.com/LfWOlyv.png I guess puppeteer doesn't work with Browserify? I use waitForNavigation because I need to get a value that appears after clicking a button, so I have to use it or otherwise I wouldn't be able to get that value because it doesn't wait until the site is completely loaded by itself. All I need is to send the value of the constant innerText to JavaScript, so then I will be able to use it in the website I'm building with HTML, CSS and JS but I have no idea about how to achieve that. –  Feb 12 '19 at 12:51

1 Answer


EDIT: Since puppeteer removed support for puppeteer-web, I moved it out of the repo and tried to patch it a bit.

It does work in the browser. The package is called puppeteer-web, and it was made specifically for such cases.

The main point, though, is that an instance of Chrome must already be running on some server; only then can you connect to it.

You can then use puppeteer-web in your web page to drive that browser instance through its WebSocket endpoint:

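<script src="https://unpkg.com/puppeteer-web">
</script>

<script>
  // `await` is not allowed at the top level of a classic script,
  // so the calls are wrapped in an async function.
  (async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: `ws://0.0.0.0:8080`, // <-- connect to a server running somewhere
      ignoreHTTPSErrors: true
    });

    const pagesCount = (await browser.pages()).length;
    const browserWSEndpoint = await browser.wsEndpoint();
    console.log({ browserWSEndpoint, pagesCount });
  })().catch(console.error); // report connection failures in the console
</script>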
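If all you need is the scraped value on the page (as in the asker's last comment), a simpler route than driving a remote browser may be to keep Puppeteer in Node and expose only the result over HTTP. This is a different technique from puppeteer-web, not part of it. The sketch below is a minimal example under assumptions: `createScrapeHandler`, `createScrapeServer`, the `/price` path and port 3000 are hypothetical names, and `scrape` stands for the routine from Scraper.js changed to return `innerText` instead of logging it.

```javascript
const http = require('http');

// Hypothetical helper: turns any async scraper function into a request handler.
// `scrape` is assumed to resolve with the scraped string.
function createScrapeHandler(scrape) {
  return async (req, res) => {
    // Let a page served from another origin call this endpoint.
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Content-Type', 'application/json');

    // Only GET /price is served; everything else is a 404.
    if (req.method !== 'GET' || req.url !== '/price') {
      res.statusCode = 404;
      res.end(JSON.stringify({ error: 'not found' }));
      return;
    }

    try {
      const value = await scrape(); // runs on the server, in Node
      res.statusCode = 200;
      res.end(JSON.stringify({ value }));
    } catch (err) {
      res.statusCode = 500;
      res.end(JSON.stringify({ error: String(err) }));
    }
  };
}

// Hypothetical helper: wraps the handler in a Node HTTP server.
function createScrapeServer(scrape) {
  return http.createServer(createScrapeHandler(scrape));
}

// Usage on the server (assumes Scraper.js exports an async getPrice()):
//   const { getPrice } = require('./Scraper');
//   createScrapeServer(getPrice).listen(3000);
//
// Usage in index.html:
//   fetch('http://localhost:3000/price')
//     .then((r) => r.json())
//     .then(({ value }) => console.log(value));
```

The scraper is passed in as a parameter so the HTTP part can be tried with a stub before Puppeteer is wired in. Every request triggers a full scrape in this sketch, so caching the value would probably be worth adding for real use.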

I had some fun with puppeteer and webpack,

See these answers for full understanding of creating the server and more,

Md. Abu Taher
  • Well, that's kinda hard in my opinion... I'm trying out another alternative that is called Goutte, it seems to fit better for my needs and it's easier when it comes to show the results in a web browser since it uses PHP and it doesn't require an external command. Anyway, I will save your answer in case I need it in the future. Thank you so much!! –  Feb 13 '19 at 09:36
  • You are welcome. However, since your question was regarding puppeteer, and not about scraping in general, I am sure the answer fits your question perfectly :D . – Md. Abu Taher Feb 13 '19 at 09:40
  • Official links 404'd – TheMaster May 22 '20 at 14:36
  • Here's the old [link](https://github.com/puppeteer/puppeteer/tree/6522e4f524bdbc1f1b9d040772acf862517ed507/utils/browser) from a past commit. It looks like they removed it in this [pull request](https://github.com/puppeteer/puppeteer/pull/5750) – tchan May 24 '20 at 11:33
  • Looks like I need to find an alternative for it soon and update the answers. :D Gimme a bit of time, I'll check after weekends. – Md. Abu Taher May 24 '20 at 15:15
  • This is very interesting. Thanks for sharing. Is it possible to extract the part of puppeteer that creates a PNG/JPG/PDF screenshot of HTML and create a separate library that could run in the browser to allow for browser-based screenshots? – Crashalot Mar 12 '21 at 06:33