This tool (Scrapingdog) will help us scrape dynamic websites through millions of rotating proxies so that we don’t get blocked. It also provides a CAPTCHA-clearing facility, and it uses headless Chrome to render dynamic websites.
BeautifulSoup is a Python library for pulling data out of HTML and XML files. Requests allows you to send HTTP requests very easily; we will use it to extract the HTML code of the target URL.
mkdir scraper
pip install beautifulsoup4
pip install requests
from bs4 import BeautifulSoup
import requests
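Before touching the live page, here is a minimal sketch of how the two libraries fit together. Since a live request needs network access and an API key, it parses an inline HTML string in place of a fetched page; the markup and values are made up for illustration.

```python
from bs4 import BeautifulSoup

# A tiny inline HTML string stands in for the page that requests would fetch.
html = "<html><body><td>Open</td><td>2,340.00</td></body></html>"

# BeautifulSoup turns the raw HTML string into a navigable tree.
soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; .text extracts the inner text of a tag.
cells = soup.find_all("td")
print(cells[0].text, cells[1].text)  # prints: Open 2,340.00
```

With a real page you would pass `requests.get(url).text` to `BeautifulSoup` instead of the inline string, exactly as we do below.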
1. Previous Close
2. Open
3. Bid
4. Ask
5. Day’s Range
6. 52 Week Range
7. Volume
8. Avg. Volume
9. Market Cap
10. Beta
11. PE Ratio
12. EPS
13. Earnings Date
14. Forward Dividend & Yield
15. Ex-Dividend Date
16. 1y Target Est
r = requests.get("https://api.scrapingdog.com/scrape?api_key=<your-api-key>&url=https://finance.yahoo.com/quote/AMZN?p=AMZN&.tsrc=fin-srch").text
soup = BeautifulSoup(r, 'html.parser')
Now, on the entire page, we have four “tbody” tags. We are interested in the first two because we currently don’t need the data available inside the third & fourth “tbody” tags.
First, we will find out all those “tbody” tags using variable “soup”.
alldata = soup.find_all("tbody")
As you can see, each of the first two “tbody” tags has 8 “tr” tags, and every “tr” tag has two “td” tags.
try:
    table1 = alldata[0].find_all("tr")
except:
    table1 = None
try:
    table2 = alldata[1].find_all("tr")
except:
    table2 = None
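To make the nesting concrete, here is a small self-contained sketch using hypothetical markup that mirrors the “tbody” → “tr” → “td” structure described above; the rows and values are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each tbody holds tr rows, each row holds
# a label td and a value td, just like the Yahoo Finance tables.
html = """
<table><tbody>
  <tr><td>Previous Close</td><td>2,317.80</td></tr>
  <tr><td>Open</td><td>2,340.00</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Drill down level by level: first tbody, then its rows, then each row's cells.
rows = soup.find_all("tbody")[0].find_all("tr")
for row in rows:
    tds = row.find_all("td")
    print(tds[0].text, "->", tds[1].text)
# prints:
# Previous Close -> 2,317.80
# Open -> 2,340.00
```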
At this point, we are going to declare a list and a dictionary before starting a for loop.
l = {}
u = list()
To keep the code simple, I will run two separate “for” loops, one for each table. First, for “table1”:
for i in range(0, len(table1)):
    try:
        table1_td = table1[i].find_all("td")
    except:
        table1_td = None
    l[table1_td[0].text] = table1_td[1].text
    u.append(l)
    l = {}
What we have done here is store all the “td” tags of a row in the variable “table1_td”, and then store the text of the first and second “td” tags as a key/value pair in the dictionary. We then push the dictionary into the list. Since appending the same dictionary object again would only duplicate data, we reset it to an empty dictionary at the end of each iteration. Similar steps will be followed for “table2”.
for i in range(0, len(table2)):
    try:
        table2_td = table2[i].find_all("td")
    except:
        table2_td = None
    l[table2_td[0].text] = table2_td[1].text
    u.append(l)
    l = {}
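The dict-then-reset pattern used in both loops can be demonstrated in isolation with hypothetical (label, value) pairs standing in for the scraped cell texts:

```python
# Hypothetical (label, value) pairs standing in for the td texts of each row.
pairs = [("Previous Close", "2,317.80"), ("Open", "2,340.00")]

u = []
l = {}
for label, value in pairs:
    l[label] = value   # one key/value per row
    u.append(l)        # push the one-entry dict into the list
    l = {}             # reset so the next row gets a fresh dict

print(u)  # prints: [{'Previous Close': '2,317.80'}, {'Open': '2,340.00'}]
```

Resetting `l` matters: without it, every `u.append(l)` would push a reference to the same dictionary, and each entry in the list would end up holding all the keys.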
Then, at the end, when you print the list “u” (wrapped under a “Yahoo finance” key), you get a JSON-like response:
{
  "Yahoo finance": [
    {"Previous Close": "2,317.80"},
    {"Open": "2,340.00"},
    {"Bid": "0.00 x 1800"},
    {"Ask": "2,369.96 x 1100"},
    {"Day's Range": "2,320.00–2,357.38"},
    {"52 Week Range": "1,626.03–2,475.00"},
    {"Volume": "3,018,351"},
    {"Avg. Volume": "6,180,864"},
    {"Market Cap": "1.173T"},
    {"Beta (5Y Monthly)": "1.35"},
    {"PE Ratio (TTM)": "112.31"},
    {"EPS (TTM)": "20.94"},
    {"Earnings Date": "Jul 23, 2020 — Jul 27, 2020"},
    {"Forward Dividend & Yield": "N/A (N/A)"},
    {"Ex-Dividend Date": "N/A"},
    {"1y Target Est": "2,645.67"}
  ]
}
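If you want output formatted exactly like the JSON above, rather than Python's default list repr, the standard-library json module can serialize the wrapped list; the two entries here are sample data standing in for the full scraped list.

```python
import json

# Sample of the scraped list; in the article, "u" holds one dict per table row.
u = [{"Previous Close": "2,317.80"}, {"Open": "2,340.00"}]

# Wrap the list under a top-level key and pretty-print it as JSON.
print(json.dumps({"Yahoo finance": u}, indent=4))
```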
In this article, we learned how to scrape data using Scrapingdog & BeautifulSoup, regardless of the type of website.
Feel free to comment and ask me anything. Thanks for reading, and please hit the like button! 👍