
I am trying to pass the value stored in the variable `getWebsite` into the callback given to `await page.evaluate(() => { ... })`. I am using the Node.js library Puppeteer, and this code is used for scraping.

async function getData(page) { // get data from url
    // regex to check whether the text ends in .com, .net, .org or .edu
    const regexWebsite = new RegExp('(.*\\.)(com|net|org|edu)$');

    for (let i = 2; i < 6; i++) {
        var getWebsite = await page.$eval('#pane > div > div > div > div > div:nth-child(7) > div:nth-child(' + i + ') > button > div > div > div.fontBodyMedium'[0], el => el.textContent);
        if (regexWebsite.test(getWebsite)) {
            break;
        }
        else {
            continue;
        }
    }

    console.log(getWebsite);
    const results = await page.evaluate(() => {
        return ({
            scraper_job_id: new Date().getFullYear(),
            company: document.querySelectorAll('#pane > div > div > div > div > div > div > div > div > h1 > span:nth-child(1)')[0].textContent,
            address: document.querySelectorAll('#pane > div > div > div > div > div:nth-child(7) > div:nth-child(1) > button > div > div > div.fontBodyMedium')[0].textContent,
            website: getWebsite,
            created_at: new Date().getFullYear() + '-' + (new Date().getMonth() + 1) + '-' + new Date().getDate() + ' ' + new Date().getHours() + ':' + new Date().getMinutes() + ':' + new Date().getSeconds(),
            updated_at: new Date().getFullYear() + '-' + (new Date().getMonth() + 1) + '-' + new Date().getDate() + ' ' + new Date().getHours() + ':' + new Date().getMinutes() + ':' + new Date().getSeconds()
        })
    });

    // insert into db
    con.query('INSERT INTO google_businesses SET ?', results, function (err) {
        if (err) throw err;
        console.log(results.company, 'inserted');
    });

    return results;
}
HumzaXSN
  • `results.website = getWebsite;` but it should already contain the value. – jabaa May 09 '22 at 13:58
  • `website: getWebsite,` won't work because `getWebsite` was never passed into the `evaluate` function. But it seems pointless to do so since you don't appear to be using it in the browser context, so you might as well wait until the rest of the object comes back and then attach it. But there's no site here, so if anything else is wrong, for example, with a selector, I can't help. Please share a [mcve] with a clear goal and problem description. Thanks. `') > button > div > div > div.fontBodyMedium'[0]` also looks like a mistake -- that only takes the `")"`. – ggorlen May 09 '22 at 14:01
  • @ggorlen I have updated the code. Please tell me if it is clear or not. – HumzaXSN May 09 '22 at 14:13
  • Thanks, but it's still incomplete. I have no website to reference. All of my remarks above still stand. – ggorlen May 09 '22 at 14:15
  • @ggorlen Why do you have to pass `getWebsite`? It's in the scope of the anonymous function. – jabaa May 09 '22 at 14:16
  • It's not in scope -- this is a common gotcha. Puppeteer serializes the `evaluate` callback and runs it in the browser console, which is a completely different process. Any variables that you want `evaluate` to see need to be passed in: `page.evaluate(getWebsite => /* now use getWebsite in the browser */, getWebsite)`. See [How can I pass variable into an evaluate function?](https://stackoverflow.com/questions/46088351/how-can-i-pass-variable-into-an-evaluate-function) – ggorlen May 09 '22 at 14:17
  • @jabaa I am passing it so that I can insert the value in Database. – HumzaXSN May 09 '22 at 14:20
  • Passing a variable to the browser console doesn't help you get something in the database. As mentioned above, you can simply add it to the result inside the Node process. Or, if you really want to send it to the browser, add it as a parameter to `evaluate` as shown above. – ggorlen May 09 '22 at 14:22
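
The points raised in the comments can be put together in a minimal sketch. It assumes the selectors from the question still match the page, that the regex was meant to accept text ending in `.com`, `.net`, `.org` or `.edu`, and that `page` is a Puppeteer `Page`. The helpers `rowSelector` and `timestamp` are hypothetical names introduced only for illustration, and the database insert is left out. `getWebsite` is handed to `page.evaluate` as an extra argument because the callback is serialized and run in the browser, where Node-side variables are not visible.

```javascript
// Assumed intent: accept text that ends in one of these top-level domains.
const websitePattern = /\.(com|net|org|edu)$/;

// Hypothetical helper: selector for the i-th row of the info panel.
// The whole selector is one string; nothing is indexed with [0].
function rowSelector(i) {
    return '#pane > div > div > div > div > div:nth-child(7) > div:nth-child(' + i +
        ') > button > div > div > div.fontBodyMedium';
}

// Hypothetical helper: 'YYYY-MM-DD HH:MM:SS' in local time.
// getMonth() is zero-based, so 1 is added.
function timestamp(d) {
    const pad = n => String(n).padStart(2, '0');
    return d.getFullYear() + '-' + pad(d.getMonth() + 1) + '-' + pad(d.getDate()) +
        ' ' + pad(d.getHours()) + ':' + pad(d.getMinutes()) + ':' + pad(d.getSeconds());
}

async function getData(page) {
    let getWebsite = null; // stays null if no row looks like a website

    for (let i = 2; i < 6; i++) {
        // $eval rejects if the selector matches nothing, as in the original code.
        const text = (await page.$eval(rowSelector(i), el => el.textContent)).trim();
        if (websitePattern.test(text)) {
            getWebsite = text;
            break;
        }
    }

    // Arguments after the callback are serialized and passed to it in the browser.
    const results = await page.evaluate(website => ({
        company: document.querySelector('#pane > div > div > div > div > div > div > div > div > h1 > span:nth-child(1)').textContent,
        address: document.querySelector('#pane > div > div > div > div > div:nth-child(7) > div:nth-child(1) > button > div > div > div.fontBodyMedium').textContent,
        website: website
    }), getWebsite);

    // Values that do not depend on the page can be attached in Node.
    const now = new Date();
    results.scraper_job_id = now.getFullYear();
    results.created_at = timestamp(now);
    results.updated_at = timestamp(now);

    return results;
}
```

Since the value is not used inside the browser, the alternative mentioned in the comments works just as well: drop the parameter and set `results.website = getWebsite` in Node after `evaluate` returns.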

0 Answers