npm i puppeteer
https://www.google.com/maps/search/coffee/@28.6559457,77.1404218,11z
Here, coffee is our query, followed by the latitude and longitude. The number before the z at the end is the zoom level of Google Maps, which you can decrease or increase as you like. Its value ranges from 2.92, where the map is completely zoomed out, to 21, where the map is completely zoomed in.
Note: The latitude and longitude must be passed in the URL, but the zoom parameter is optional.
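To make the URL structure concrete, here is a minimal sketch that assembles it from its parts. The helper name buildMapsSearchUrl and its parameter names are our own choices for illustration, not part of Puppeteer or any Google API, and the zoom argument is treated as optional as described above.

```javascript
// Hypothetical helper: builds a Google Maps search URL from its parts.
const buildMapsSearchUrl = (query, latitude, longitude, zoom) => {
  // Encode the query so spaces and special characters are URL-safe.
  const base = `https://www.google.com/maps/search/${encodeURIComponent(query)}`;
  // Latitude and longitude are required; the zoom suffix is optional.
  const zoomPart = zoom === undefined ? "" : `,${zoom}z`;
  return `${base}/@${latitude},${longitude}${zoomPart}`;
};

console.log(buildMapsSearchUrl("coffee", 28.6559457, 77.1404218, 11));
```

Leaving out the last argument produces a URL without the zoom suffix.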
const getMapsData = async () => {
  // Launch a visible (non-headless) Chromium instance.
  const browser = await puppeteer.launch({
    headless: false,
    args: ["--disable-setuid-sandbox", "--no-sandbox"],
  });
  const page = await browser.newPage();
  await page.setExtraHTTPHeaders({
    "User-Agent":
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4882.194 Safari/537.36",
  });
  await page.goto("https://www.google.com/maps/search/Starbucks/@26.8484046,75.7215344,12z/data=!3m1!4b1", {
    waitUntil: "domcontentloaded",
    timeout: 60000,
  });
  // Give the results panel a few seconds to render.
  await page.waitForTimeout(3000);
  // Scroll the results feed until at least 2 items are loaded.
  const data = await scrollPage(page, ".m6QErb[aria-label]", 2);
  console.log(data);
  await browser.close();
};
Step-by-step explanation:
puppeteer.launch()
- This will launch the Chromium browser with the options we have set in our code. In our case, we are launching our browser in non-headless mode.
browser.newPage()
- This will open a new page or tab in the browser.
page.setExtraHTTPHeaders()
- It is used to pass HTTP headers with every request the page initiates.
page.goto()
- This will navigate the page to the specified target URL.
page.waitForTimeout()
- This pauses execution for 3 seconds before any further operations, giving the results time to render.
scrollPage()
- Finally, we call our infinite scroller to extract the data we need, passing the page, the selector of the scrollable div, and the number of items we want as parameters.
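One caveat: page.waitForTimeout() has been deprecated and is no longer available in newer Puppeteer releases, so the call may fail depending on the version you have installed. In that case a plain timer-based helper can stand in for it. This is a minimal sketch; sleep is a name we have chosen ourselves, not a Puppeteer API.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Usage inside an async function, in place of page.waitForTimeout(3000):
//   await sleep(3000);
```

Because it only relies on setTimeout, the helper works the same way regardless of the Puppeteer version.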
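const scrollPage = async (page, scrollContainer, itemTargetCount) => {
  let items = [];
  let previousHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
  while (itemTargetCount > items.length) {
    items = await extractItems(page);
    if (items.length >= itemTargetCount) break;
    // Scroll the results container to its current bottom.
    await page.evaluate(`document.querySelector("${scrollContainer}").scrollTo(0, document.querySelector("${scrollContainer}").scrollHeight)`);
    // Wait for new results to increase the container height; stop if nothing more loads.
    try {
      await page.waitForFunction(`document.querySelector("${scrollContainer}").scrollHeight > ${previousHeight}`, { timeout: 10000 });
    } catch (error) {
      break;
    }
    previousHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
    await page.waitForTimeout(2000);
  }
  return items;
};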
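Step-by-step explanation:
previousHeight
- The scroll height of the container, recorded before scrolling.
extractItems()
- Function to parse the scraped HTML.
On each pass, the container is scrolled down to its current scroll height, and we then wait 2 seconds for more results to load.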
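One thing to watch for with this kind of loop is a search that has fewer results than the requested count, since the container then stops growing. The sketch below shows the same scroll-and-collect idea without Puppeteer, with a guard that gives up after a few passes in which the height did not change. The names collectUntil, getItems, scrollOnce and maxStalls are hypothetical and exist only for this illustration.

```javascript
// Hypothetical, Puppeteer-free version of the scroll loop.
// getItems(): returns the items currently loaded.
// scrollOnce(): scrolls and returns the container's new scroll height.
const collectUntil = async (getItems, scrollOnce, targetCount, maxStalls = 3) => {
  let items = [];
  let previousHeight = 0;
  let stalls = 0;
  while (items.length < targetCount && stalls < maxStalls) {
    items = await getItems();
    const height = await scrollOnce();
    // Count consecutive passes in which scrolling loaded nothing new.
    stalls = height > previousHeight ? 0 : stalls + 1;
    previousHeight = height;
  }
  return items;
};

// Demo with a fake list that stops growing at 5 items.
const fake = { loaded: 2, height: 100 };
const getItems = async () => Array.from({ length: fake.loaded }, (_, i) => `item ${i + 1}`);
const scrollOnce = async () => {
  if (fake.loaded < 5) {
    fake.loaded += 1;
    fake.height += 50;
  }
  return fake.height;
};

collectUntil(getItems, scrollOnce, 10).then((items) => console.log(items.length));
```

In the demo, 10 items are requested but only 5 ever load, so the loop ends through the stall guard instead of running forever.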
const extractItems = async (page) => {
  const maps_data = await page.evaluate(() => {
    // Each result card in the feed carries the class Nv2PK.
    return Array.from(document.querySelectorAll(".Nv2PK")).map((el) => {
      // Fall back to an empty string so the split() calls below cannot throw.
      const link = el.querySelector("a.hfpxzc")?.getAttribute("href") ?? "";
      return {
        title: el.querySelector(".qBF1Pd")?.textContent.trim(),
        avg_rating: el.querySelector(".MW4etd")?.textContent.trim(),
        reviews: el.querySelector(".UY7F9")?.textContent.replace("(", "").replace(")", "").trim(),
        address: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(1) > span:last-child")?.textContent.replaceAll("·", "").trim(),
        description: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(2)")?.textContent.replace("·", "").trim(),
        website: el.querySelector("a.lcr4fd")?.getAttribute("href"),
        category: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(1) > span:first-child")?.textContent.replaceAll("·", "").trim(),
        timings: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(3) > span:first-child")?.textContent.replaceAll("·", "").trim(),
        phone_num: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(3) > span:last-child")?.textContent.replaceAll("·", "").trim(),
        // Collapse runs of whitespace into single spaces.
        extra_services: el.querySelector(".qty3Ue")?.textContent.replaceAll("·", "").replace(/\s+/g, " ").trim(),
        // The coordinates and data ID are embedded in the place link.
        latitude: link.split("!8m2!3d")[1]?.split("!4d")[0],
        longitude: link.split("!4d")[1]?.split("!16s")[0],
        link,
        dataId: link.split("!1s")[1]?.split("!8m")[0],
      };
    });
  });
  return maps_data;
};
Step-by-step explanation:
document.querySelectorAll()
- It returns all the elements that match the specified CSS selector. In our case, it is .Nv2PK.
getAttribute()
- This returns the value of the specified attribute of the element.
textContent
- It returns the text content inside the selected HTML element.
split()
- Used to split a string into substrings with the help of a specified separator and return them as an array.
trim()
- Removes whitespace from the start and end of the string.
replaceAll()
- Replaces every occurrence of the specified pattern in the string.
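Chained split() calls are compact, but they throw as soon as one separator is missing from the link. As an alternative, the same three values can be pulled out with a single regular expression. This is only a sketch: parsePlaceLink is a hypothetical helper, and the pattern assumes the link keeps the !1s…!8m2!3d…!4d… layout seen in the scraped links.

```javascript
// Hypothetical helper: pulls the data ID and coordinates out of a place link.
const parsePlaceLink = (link) => {
  // Expected shape: ...!1s<dataId>!8m2!3d<latitude>!4d<longitude>...
  const match = /!1s([^!]+)!8m2!3d(-?[\d.]+)!4d(-?[\d.]+)/.exec(link ?? "");
  // Return nulls instead of throwing when the layout is different.
  if (!match) return { dataId: null, latitude: null, longitude: null };
  const [, dataId, latitude, longitude] = match;
  return { dataId, latitude, longitude };
};

const sampleLink =
  "https://www.google.com/maps/place/The+Coffee+Bean+%26+Tea+Leaf/data=!4m7!3m6!1s0x390ce3a69997ad37:0xff83fd9a57a7a71e!8m2!3d28.6511983!4d77.1215014!16s%2Fg%2F11sgxr14tq";

console.log(parsePlaceLink(sampleLink));
```

A link that does not match yields null values, so one unusual result card does not abort the whole extraction.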
const puppeteer = require("puppeteer");

// Parse every result card currently present in the results feed.
const extractItems = async (page) => {
  const maps_data = await page.evaluate(() => {
    return Array.from(document.querySelectorAll(".Nv2PK")).map((el) => {
      // Fall back to an empty string so the split() calls below cannot throw.
      const link = el.querySelector("a.hfpxzc")?.getAttribute("href") ?? "";
      return {
        title: el.querySelector(".qBF1Pd")?.textContent.trim(),
        avg_rating: el.querySelector(".MW4etd")?.textContent.trim(),
        reviews: el.querySelector(".UY7F9")?.textContent.replace("(", "").replace(")", "").trim(),
        address: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(1) > span:last-child")?.textContent.replaceAll("·", "").trim(),
        description: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(2)")?.textContent.replace("·", "").trim(),
        website: el.querySelector("a.lcr4fd")?.getAttribute("href"),
        category: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(1) > span:first-child")?.textContent.replaceAll("·", "").trim(),
        timings: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(3) > span:first-child")?.textContent.replaceAll("·", "").trim(),
        phone_num: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(3) > span:last-child")?.textContent.replaceAll("·", "").trim(),
        // Collapse runs of whitespace into single spaces.
        extra_services: el.querySelector(".qty3Ue")?.textContent.replaceAll("·", "").replace(/\s+/g, " ").trim(),
        latitude: link.split("!8m2!3d")[1]?.split("!4d")[0],
        longitude: link.split("!4d")[1]?.split("!16s")[0],
        link,
        dataId: link.split("!1s")[1]?.split("!8m")[0],
      };
    });
  });
  return maps_data;
};
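
// Scroll the results container until enough items are loaded or nothing more appears.
const scrollPage = async (page, scrollContainer, itemTargetCount) => {
  let items = [];
  let previousHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
  while (itemTargetCount > items.length) {
    items = await extractItems(page);
    if (items.length >= itemTargetCount) break;
    await page.evaluate(`document.querySelector("${scrollContainer}").scrollTo(0, document.querySelector("${scrollContainer}").scrollHeight)`);
    // Wait for new results to increase the container height; stop if nothing more loads.
    try {
      await page.waitForFunction(`document.querySelector("${scrollContainer}").scrollHeight > ${previousHeight}`, { timeout: 10000 });
    } catch (error) {
      break;
    }
    previousHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
    await page.waitForTimeout(2000);
  }
  return items;
};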

const getMapsData = async () => {
  const browser = await puppeteer.launch({
    headless: false,
    args: ["--disable-setuid-sandbox", "--no-sandbox"],
  });
  // Reuse the tab that Chromium opens on launch.
  const [page] = await browser.pages();
  await page.setExtraHTTPHeaders({
    "User-Agent":
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4882.194 Safari/537.36",
  });
  await page.goto("https://www.google.com/maps/search/Starbucks/@26.8484046,75.7215344,12z/data=!3m1!4b1", {
    waitUntil: "domcontentloaded",
    timeout: 60000,
  });
  await page.waitForTimeout(5000);
  const data = await scrollPage(page, ".m6QErb[aria-label]", 2);
  console.log(data);
  await browser.close();
};

getMapsData();
Results:
[
{
title: 'The Coffee Bean & Tea Leaf',
avg_rating: '4.7',
reviews: '79',
address: 'The Coffee Bean & Tea Lea,Ground Floor, Epicuria Food Court, Plot No-10 Shivaji Place, Najafgarh Rd',
description: 'Chain coffee bar known for frozen drinks',
category: 'Coffee shop',
timings: 'Open ⋅ Closes 11PM',
phone_num: 'Open ⋅ Closes 11PM',
extra_services: 'Dine-in Drive-through No-contact delivery Reserve a table',
latitude: '28.6511983',
longitude: '77.1215014',
link: 'https://www.google.com/maps/place/The+Coffee+Bean+%26+Tea+Leaf/data=!4m7!3m6!1s0x390ce3a69997ad37:0xff83fd9a57a7a71e!8m2!3d28.6511983!4d77.1215014!16s%2Fg%2F11sgxr14tq!19sChIJN62XmabjDDkRHqenV5r9g_8?authuser=0&hl=en&rclk=1',
dataId: '0x390ce3a69997ad37:0xff83fd9a57a7a71e'
},
{
title: 'The Coffee Bean & Tea Leaf',
avg_rating: '4.0',
reviews: '271',
address: 'T320, Ambience Mall, Gurgaon - Delhi Expy',
description: 'Chain coffee bar known for frozen drinks',
category: 'Coffee shop',
timings: 'Open ⋅ Closes 11PM',
phone_num: 'Open ⋅ Closes 11PM',
extra_services: 'Dine-in Takeaway No-contact delivery',
latitude: '28.5041789',
longitude: '77.0970538',
link: 'https://www.google.com/maps/place/The+Coffee+Bean+%26+Tea+Leaf/data=!4m7!3m6!1s0x390d194c1a223247:0x611f25bf4fddaf08!8m2!3d28.5041789!4d77.0970538!16s%2Fg%2F11cs6ch67r!19sChIJRzIiGkwZDTkRCK_dT78lH2E?authuser=0&hl=en&rclk=1',
dataId: '0x390d194c1a223247:0x611f25bf4fddaf08'
},
.....
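Every field in this output is a string, which is inconvenient if you want to sort or filter the places later. A small post-processing step can convert the numeric fields. The sketch below is one possible approach; normalizePlace is a hypothetical helper, and the handling of thousands separators in the review count is an assumption about how larger counts are formatted.

```javascript
// Hypothetical post-processing step for one scraped record.
const normalizePlace = (place) => ({
  ...place,
  // "4.7" -> 4.7; a missing rating becomes null.
  avg_rating: place.avg_rating ? Number(place.avg_rating) : null,
  // "1,271" -> 1271; strip thousands separators before parsing.
  reviews: place.reviews ? parseInt(place.reviews.replace(/,/g, ""), 10) : null,
  // Coordinates arrive as strings taken from the place link.
  latitude: place.latitude ? Number(place.latitude) : null,
  longitude: place.longitude ? Number(place.longitude) : null,
});

const sample = {
  title: "The Coffee Bean & Tea Leaf",
  avg_rating: "4.7",
  reviews: "79",
  latitude: "28.6511983",
  longitude: "77.1215014",
};

console.log(normalizePlace(sample));
```

Fields that are not listed, such as title or address, are copied over unchanged.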
const axios = require("axios");

// Replace APIKEY with your own key; data_id is the dataId of the place.
axios.get("https://api.serpdog.io/reviews?api_key=APIKEY&data_id=0x89c25090129c363d:0x40c6a5770d25022b")
  .then((response) => {
    console.log(response.data);
  })
  .catch((error) => {
    console.error(error);
  });
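If you want to request reviews for the places scraped earlier, the request URL can be built from a dataId instead of being hard-coded. This is a minimal sketch that only reuses the endpoint and the api_key and data_id parameters from the request above; buildReviewsUrl is a hypothetical helper name, and we are assuming the API accepts the percent-encoded form of the data ID.

```javascript
// Hypothetical helper: builds the request URL for the reviews endpoint.
const buildReviewsUrl = (apiKey, dataId) => {
  // URLSearchParams percent-encodes reserved characters such as ":".
  const params = new URLSearchParams({ api_key: apiKey, data_id: dataId });
  return `https://api.serpdog.io/reviews?${params.toString()}`;
};

console.log(buildReviewsUrl("APIKEY", "0x89c25090129c363d:0x40c6a5770d25022b"));
```

The resulting string can be passed to axios.get() in place of the literal URL.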
Results:
"location_info": {
"title": "Statue of Liberty",
"address": "New York, NY",
"avgRating": "4.7",
"totalReviews": "83,109 reviews"
},
"reviews": [
{
"user": {
"name": "Vo Kien Thanh",
"link": "https://www.google.com/maps/contrib/6934504158?hl=en-US&sa=X&ved=2ahUKEwj7zY_J4cv4AhUID0QIHZCtC0cQvvQBegQIARAZ",
"thumbnail": "https://lh3.googleusercontent.com/a/AATXAJxv5_uPnmyIeoARlf7gMWCduHV1cNI20UnwPicE=s40-c-c0x00000000-cc-rp-mo-ba4-br100",
"localGuide": true,
"reviews": "111",
"photos": "329"
},
"rating": "Rated 5.0 out of 5,",
"duration": "5 months ago",
"snippet": "The icon of the U.S. 🗽🇺🇸. This is a must-see for everyone who visits New York City, you would never want to miss it.There’s only one cruise line that is allowed to enter the Liberty Island and Ellis Island, which is Statue Cruises. You can purchase tickets at the Battery Park but I’d recommend you purchase it in advance. For $23/adult it’s actually very reasonably priced. Make sure you go early because you will have to go through security at the port. Also take a look at the departure schedule available on the website to plan your trip accordingly.As for the Statue of Liberty, it was my first time seeing it in person so what I could say was a wow. It was absolutely amazing to see this monument. I also purchased the pedestal access so it was pretty cool to see the inside of the statue. They’re not doing the Crown Access due to Covid-19 concerns, but I hope it will be resumed soon.There are a gift shop, a cafeteria and a museum on the island. I would say it takes around 2-3 hours to do everything here because you would want to take as many photos as possible.I absolutely loved it here and I can’t wait to come back.The icon of the U.S. 🗽🇺🇸. This is a must-see for everyone who visits New York City, you would never want to miss it. …More",
"visited": "",
"likes": "91",
"images": [
"https://lh5.googleusercontent.com/p/AF1QipPOBhJtq17DAc9_ZTBnN2X4Nn-EwIEet61Y9JQo=w100-h100-p-n-k-no",
"https://lh5.googleusercontent.com/p/AF1QipPZ2ut1I7LnECqEB2vzrBk-PSXzBxaHEE4S54lk=w100-h100-p-n-k-no",
"https://lh5.googleusercontent.com/p/AF1QipM8nIogBhwcL-dUrd7KaIxZcc_SA6YnJpp50R0C=w100-h100-p-n-k-no",
"https://lh5.googleusercontent.com/p/AF1QipPQ-YP7uw_gHTNb1gGZSGRGRrzLMzOrvh98AmSN=w100-h100-p-n-k-no",
"https://lh5.googleusercontent.com/p/AF1QipOTqBzK30vQZi9lfuhpk5329bnx-twzgIVjwcI1=w100-h100-p-n-k-no",
"https://lh5.googleusercontent.com/p/AF1QipN0TWUE35ajoTdSKelspuUpK-ZTXlRRR9SfPbTa=w100-h100-p-n-k-no",
"https://lh5.googleusercontent.com/p/AF1QipPQH_4HtdXmSdkCiDTv2jO30LksCxpe9KQI4YKw=w100-h100-p-n-k-no",
"https://lh5.googleusercontent.com/p/AF1QipN_OfX2TgXVNry5fli5v-yExbyTAfV4K7SEy3T0=w100-h100-p-n-k-no",
"https://lh5.googleusercontent.com/p/AF1QipNWKl0TeBmnzMaR_W4-7skitDwHjjJxPePbiSyd=w100-h100-p-n-k-no"
]
},
........
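Several values in this response are human-readable strings, such as "Rated 5.0 out of 5," and "83,109 reviews", so they need a little parsing before you can compute with them. Here is a minimal sketch based only on the formats visible in the sample above; parseRating and parseCount are hypothetical helpers, and other responses may use formats these patterns do not cover.

```javascript
// Hypothetical helper: "Rated 5.0 out of 5," -> 5
const parseRating = (text) => {
  const match = /Rated\s+([\d.]+)\s+out of/.exec(text ?? "");
  return match ? Number(match[1]) : null;
};

// Hypothetical helper: "83,109 reviews" -> 83109, "91" -> 91
const parseCount = (text) => {
  // Keep only the digits, dropping separators and trailing words.
  const digits = (text ?? "").replace(/[^\d]/g, "");
  return digits ? parseInt(digits, 10) : null;
};

console.log(parseRating("Rated 5.0 out of 5,"));
console.log(parseCount("83,109 reviews"));
```

Both helpers return null when the text does not contain a number, which covers empty fields such as "visited" in the sample.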