Here's the kicker: all we are going to use is one script and exactly 50 lines of code. How are we going to do this, you may ask? The magic of JavaScript, plus a Node library called Puppeteer and a web service called Browserless. If you don't know much about these tools, I'd highly recommend checking them out further. They offer a lot of neat capabilities, and we will go over some of them in this article.
npm i puppeteer
In your Browserless account you can find your session usage chart, prepaid balance, API key, and all of the other account goodies.
const puppeteer = require("puppeteer");
const fs = require("fs");
const scrape = async () => {
try {
const browser = await puppeteer.launch({headless: false});
// const browser = await puppeteer.connect({
// browserWSEndpoint:
// "wss://chrome.browserless.io?token=[ADD BROWSERLESS API TOKEN HERE]",
// });
const page = await browser.newPage();
await page.goto("https://shopping.google.com/");
await page.type(
"input[placeholder='What are you looking for?']",
"board game"
);
await page.click("button[aria-label='Google Search']");
await page.waitForSelector("#pnnext");
// Go to next results page
await page.click("#pnnext");
await page.waitForSelector("#pnnext");
// Gather product title
const title = await page.$$eval("div.sh-dgr__grid-result h4", (nodes) =>
nodes.map((n) => n.innerText)
);
// Gather price
const price = await page.$$eval(
"div.sh-dgr__grid-result a.shntl div span span[aria-hidden='true'] span:nth-child(1)",
(nodes) => nodes.map((n) => n.innerText)
);
// Consolidate product search data
const googleShoppingSearchArray = title.slice(0, 10).map((value, index) => {
return {
title: title[index],
price: price[index],
};
});
const jsonData = JSON.stringify(googleShoppingSearchArray, null, 2);
fs.writeFileSync("googleShoppingSearchResults.json", jsonData);
// await browser.close();
} catch (err) {
console.error(err);
}
};
scrape();
const puppeteer = require("puppeteer");
const fs = require("fs");
The next section is the start of our actual scrape() function. At the very beginning of this arrow function, we hook up to the Browserless API using the API key you obtained from Browserless.
const browser = await puppeteer.connect({
browserWSEndpoint:
"wss://chrome.browserless.io?token=[ADD BROWSERLESS API TOKEN HERE]",
});
Alternatively, you could replace this section of code with puppeteer.launch, like so:
const browser = await puppeteer.launch({ headless: false });
This is good to do in your dev environment to test out the script. It saves your Browserless resources and helps with debugging if there is an issue with your code. One thing I want to point out is the { headless: false } option being passed in. This tells Puppeteer to launch Chrome in a visible window so you can watch how the code interacts with the webpage. Pretty cool to watch, and extra useful when building out scripts!
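If you find yourself swapping between the two setups often, one option is to let an environment variable decide. The following is only a sketch: the browserConfig helper and the BROWSERLESS_TOKEN variable name are my own assumptions for illustration, not something Puppeteer or Browserless requires.

```javascript
// Hypothetical helper: choose between a local launch and a remote
// Browserless connection. BROWSERLESS_TOKEN is an assumed variable name.
function browserConfig(env) {
  const token = env.BROWSERLESS_TOKEN;
  if (token) {
    // A token is present: connect to Browserless over WebSocket.
    return {
      method: "connect",
      options: {
        browserWSEndpoint: `wss://chrome.browserless.io?token=${token}`,
      },
    };
  }
  // No token: launch a visible local Chrome window for debugging.
  return { method: "launch", options: { headless: false } };
}

// Possible usage inside scrape(), assuming puppeteer is already required:
//   const { method, options } = browserConfig(process.env);
//   const browser = await puppeteer[method](options);
```

With this in place, leaving the variable unset keeps you on local Chrome during development, and setting it switches the same script over to Browserless.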
const page = await browser.newPage();
await page.goto("https://shopping.google.com/");
await page.type(
"input[placeholder='What are you looking for?']",
"board game"
);
await page.click("button[aria-label='Google Search']");
await page.waitForSelector("#pnnext");
await page.click("#pnnext");
await page.waitForSelector("#pnnext");
Alrighty, now to gather the data we want from the page! The following piece of code finds the product title and the price of each board game. I do want to add a little disclaimer here. When inspecting Google Shopping's search page, you'll notice their compiled markup is a little convoluted. If these specific selectors aren't working for you, make sure to inspect the search results page yourself and find the HTML attributes that work for your scenario. This is where using the puppeteer.launch({ headless: false }) option can help.
// Gather product title
const title = await page.$$eval("div.sh-dgr__grid-result h4", (nodes) =>
nodes.map((n) => n.innerText)
);
// Gather price
const price = await page.$$eval(
"div.sh-dgr__grid-result a.shntl div span span[aria-hidden='true'] span:nth-child(1)",
(nodes) => nodes.map((n) => n.innerText)
);
// Consolidate product search data
const googleShoppingSearchArray = title.slice(0, 10).map((value, index) => {
return {
title: title[index],
price: price[index],
};
});
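The consolidation step pairs titles and prices purely by position, so if the price selector matches fewer elements than the title selector, some entries end up with an undefined price. Here is a small sketch of a more defensive version. The pairResults helper is a hypothetical name of my own, and the sample titles and prices are made up rather than real scrape output.

```javascript
// Hypothetical helper: pair titles with prices by index, keeping at most
// `limit` entries and stopping at the shorter of the two arrays so that
// no entry is left with an undefined title or price.
function pairResults(titles, prices, limit = 10) {
  const count = Math.min(titles.length, prices.length, limit);
  const results = [];
  for (let i = 0; i < count; i++) {
    // trim() removes stray whitespace that innerText sometimes carries.
    results.push({ title: titles[i].trim(), price: prices[i].trim() });
  }
  return results;
}

// Made-up sample data: three titles but only two prices.
const sample = pairResults(
  ["Catan ", "Ticket to Ride", "Azul"],
  ["$44.99", "$39.99"]
);
console.log(sample);
```

Keep in mind this only guards against length mismatches. If a product in the middle of the page has no price, the remaining prices would still shift up by one, so checking a few entries by eye is worthwhile.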
Lastly, we want to display or save our data. If you simply want to see what results get returned, you can use console.log(googleShoppingSearchArray). But let's say you want to export this data to a JSON file and maybe use that JSON to make a chart of the results. To save your data into a JSON file, you'll use the code in our example.
const jsonData = JSON.stringify(googleShoppingSearchArray, null, 2);
fs.writeFileSync("googleShoppingSearchResults.json", jsonData);
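I added extra parameters to the stringify method for formatting purposes. The second parameter, null, is the replacer, which we don't need here. The last parameter is the space parameter, which indents the output by two spaces so the file is easier to read.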
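To see what those extra stringify arguments do, here is a quick comparison you can run on its own. The sample object is made up for illustration and is not real scrape output.

```javascript
// Made-up sample data, not real scrape results.
const sampleResults = [{ title: "Example Game", price: "$19.99" }];

// No extra arguments: compact, single-line output.
const compact = JSON.stringify(sampleResults);

// replacer = null (keep every property), space = 2 (indent two spaces per level).
const pretty = JSON.stringify(sampleResults, null, 2);

console.log(compact);
console.log(pretty);
```

The compact form is fine if only a program will read the file, while the indented form is friendlier when you want to open the JSON and skim it yourself.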
await browser.close();
As you can see, Puppeteer is a pretty cool API. Pair it with Browserless and you have a super scalable web scraping and automation tool. Did this article spark an idea or thought? Let me know in the comments, or on Twitter at @tylerreicks. Happy coding ❤️!