This tool (Scrapingdog) will help us scrape dynamic websites through millions of rotating proxies so that we don’t get blocked. It also provides a CAPTCHA-clearing facility, and it uses headless Chrome to render dynamic websites.
BeautifulSoup is a Python library for pulling data out of HTML and XML files. Requests allows you to send HTTP requests very easily; we will use it to extract the HTML code of the target URL.
mkdir scraper
pip install beautifulsoup4
pip install requests
from bs4 import BeautifulSoup
import requests
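Before touching the live page, here is a minimal sketch of how the two libraries fit together. Since a live request needs network access and an API key, it parses an inline HTML string in place of a fetched page; the markup and values are made up for illustration.

```python
from bs4 import BeautifulSoup

# A tiny inline HTML string stands in for the page that requests would fetch.
html = "<html><body><td>Open</td><td>2,340.00</td></body></html>"

# BeautifulSoup turns the raw HTML string into a navigable tree.
soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; .text extracts the inner text of a tag.
cells = soup.find_all("td")
print(cells[0].text, cells[1].text)  # prints: Open 2,340.00
```

With a real page you would pass `requests.get(url).text` to `BeautifulSoup` instead of the inline string, exactly as we do below.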
1. Previous Close
2. Open
3. Bid
4. Ask
5. Day’s Range
6. 52 Week Range
7. Volume
8. Avg. Volume
9. Market Cap
10. Beta
11. PE Ratio
12. EPS
13. Earnings Date
14. Forward Dividend & Yield
15. Ex-Dividend Date
16. 1y Target Est
r = requests.get("https://api.scrapingdog.com/scrape?api_key=<your-api-key>&url=https://finance.yahoo.com/quote/AMZN?p=AMZN&.tsrc=fin-srch").text
soup = BeautifulSoup(r, 'html.parser')
Now, on the entire page, we have four “tbody” tags. We are interested in the first two because we currently don’t need the data available inside the third & fourth “tbody” tags.
First, we will find out all those “tbody” tags using variable “soup”.
alldata = soup.find_all("tbody")
As you can see, each of the first two “tbody” tags has 8 “tr” tags, and every “tr” tag has two “td” tags.
try:
    table1 = alldata[0].find_all("tr")
except:
    table1 = None
try:
    table2 = alldata[1].find_all("tr")
except:
    table2 = None
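To make the nesting concrete, here is a small self-contained sketch using hypothetical markup that mirrors the “tbody” → “tr” → “td” structure described above; the rows and values are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each tbody holds tr rows, each row holds
# a label td and a value td, just like the Yahoo Finance tables.
html = """
<table><tbody>
  <tr><td>Previous Close</td><td>2,317.80</td></tr>
  <tr><td>Open</td><td>2,340.00</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Drill down level by level: first tbody, then its rows, then each row's cells.
rows = soup.find_all("tbody")[0].find_all("tr")
for row in rows:
    tds = row.find_all("td")
    print(tds[0].text, "->", tds[1].text)
# prints:
# Previous Close -> 2,317.80
# Open -> 2,340.00
```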
At this point, we are going to declare a list and a dictionary before starting a for loop.
l = {}
u = list()
To keep the code simple, I will run two separate “for” loops, one for each table. First, for “table1”:
for i in range(0, len(table1)):
    try:
        table1_td = table1[i].find_all("td")
    except:
        table1_td = None
    l[table1_td[0].text] = table1_td[1].text
    u.append(l)
    l = {}
What we have done here is store all the “td” tags of a row in the variable “table1_td”, and then store the text of the first and second “td” tags as a key/value pair in the dictionary. We then push the dictionary into the list. Since appending the same dictionary object again would only duplicate data, we reset it to an empty dictionary at the end of each iteration. Similar steps will be followed for “table2”.
for i in range(0, len(table2)):
    try:
        table2_td = table2[i].find_all("td")
    except:
        table2_td = None
    l[table2_td[0].text] = table2_td[1].text
    u.append(l)
    l = {}
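The dict-then-reset pattern used in both loops can be demonstrated in isolation with hypothetical (label, value) pairs standing in for the scraped cell texts:

```python
# Hypothetical (label, value) pairs standing in for the td texts of each row.
pairs = [("Previous Close", "2,317.80"), ("Open", "2,340.00")]

u = []
l = {}
for label, value in pairs:
    l[label] = value   # one key/value per row
    u.append(l)        # push the one-entry dict into the list
    l = {}             # reset so the next row gets a fresh dict

print(u)  # prints: [{'Previous Close': '2,317.80'}, {'Open': '2,340.00'}]
```

Resetting `l` matters: without it, every `u.append(l)` would push a reference to the same dictionary, and each entry in the list would end up holding all the keys.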
Then, at the end, when you print the list “u” (wrapped under a “Yahoo finance” key), you get a JSON-like response:
{
  "Yahoo finance": [
    {"Previous Close": "2,317.80"},
    {"Open": "2,340.00"},
    {"Bid": "0.00 x 1800"},
    {"Ask": "2,369.96 x 1100"},
    {"Day's Range": "2,320.00–2,357.38"},
    {"52 Week Range": "1,626.03–2,475.00"},
    {"Volume": "3,018,351"},
    {"Avg. Volume": "6,180,864"},
    {"Market Cap": "1.173T"},
    {"Beta (5Y Monthly)": "1.35"},
    {"PE Ratio (TTM)": "112.31"},
    {"EPS (TTM)": "20.94"},
    {"Earnings Date": "Jul 23, 2020 — Jul 27, 2020"},
    {"Forward Dividend & Yield": "N/A (N/A)"},
    {"Ex-Dividend Date": "N/A"},
    {"1y Target Est": "2,645.67"}
  ]
}
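If you want output formatted exactly like the JSON above, rather than Python's default list repr, the standard-library json module can serialize the wrapped list; the two entries here are sample data standing in for the full scraped list.

```python
import json

# Sample of the scraped list; in the article, "u" holds one dict per table row.
u = [{"Previous Close": "2,317.80"}, {"Open": "2,340.00"}]

# Wrap the list under a top-level key and pretty-print it as JSON.
print(json.dumps({"Yahoo finance": u}, indent=4))
```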
In this article, we learned how to scrape data using Scrapingdog & BeautifulSoup, regardless of the type of website.
Feel free to comment and ask me anything. Thanks for reading, and please hit the like button! 👍