Hello Pythonistas! In this tutorial you're going to learn how to track worldwide Coronavirus cases using the requests and BeautifulSoup libraries in Python.
Note: If you're new to web scraping, I recommend checking out an introductory web scraping tutorial first and then coming back to complete this one.
$ pip install requests
$ pip install beautifulsoup4
As stated above, we are going to track the number of cases worldwide by scraping the information from the web. There are many websites we could scrape from, but in this tutorial we will work with Worldometers.
Let's first understand the structure of the website we are about to scrape. Once you open it, scroll down and you will see a table similar to the one shown below.
In HTML, a table is usually represented with a table tag, where tr indicates a row and td indicates a specific column in that row, just as shown in the example below:
<table border = "1">
<tr>
<td>Row 1, Column 1</td>
<td>Row 1, Column 2</td>
</tr>
<tr>
<td>Row 2, Column 1</td>
<td>Row 2, Column 2</td>
</tr>
</table>
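To see how BeautifulSoup walks such a table, here is a minimal sketch that parses the sample table above and prints the cells of each row (it assumes beautifulsoup4 is already installed, as shown earlier):

```python
from bs4 import BeautifulSoup

sample = """
<table border="1">
  <tr><td>Row 1, Column 1</td><td>Row 1, Column 2</td></tr>
  <tr><td>Row 2, Column 1</td><td>Row 2, Column 2</td></tr>
</table>
"""

soup = BeautifulSoup(sample, 'html.parser')
for tr in soup.find_all('tr'):  # each <tr> is one table row
    cells = [td.get_text() for td in tr.find_all('td')]
    print(cells)
# prints:
# ['Row 1, Column 1', 'Row 1, Column 2']
# ['Row 2, Column 1', 'Row 2, Column 2']
```

This is exactly the pattern the scraper uses on the live site: find every tr, then pull the text out of its cells.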
That means every row in the table is represented by a tr tag; therefore we need to extract all the rows from the Worldometers table and then store them in a CSV file.
Below is the complete code of the scraper you're going to build in this tutorial, which scrapes the live Coronavirus numbers from the Worldometers website and then stores them in a CSV file.
import csv
import requests
from bs4 import BeautifulSoup

# Download the page and parse it
html = requests.get('https://www.worldometers.info/coronavirus/').text
html_soup = BeautifulSoup(html, 'html.parser')

# Every row in the page's tables is a <tr> element
rows = html_soup.find_all('tr')

def extract_text(row, tag):
    # Parse a single row and return the text of every matching cell
    element = BeautifulSoup(row, 'html.parser').find_all(tag)
    text = [col.get_text() for col in element]
    return text

# The first row holds the column headings (<th> cells)
heading = rows.pop(0)
heading_row = extract_text(str(heading), 'th')[1:9]

with open('corona.csv', 'w', newline='') as store:
    Store = csv.writer(store, delimiter=',')
    Store.writerow(heading_row)
    for row in rows:
        test_data = extract_text(str(row), 'td')[1:9]
        Store.writerow(test_data)
Now let's break the code down step by step. First, we download the page and collect all of its table rows:

import csv
import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.worldometers.info/coronavirus/').text
html_soup = BeautifulSoup(html, 'html.parser')
rows = html_soup.find_all('tr')
After extracting all the rows of the Coronavirus table, we need a way to parse the details in each column of a row, and that's what the function below does.
def extract_text(row, tag):
    element = BeautifulSoup(row, 'html.parser').find_all(tag)
    text = [col.get_text() for col in element]
    return text
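For example, feeding the function a hand-written row returns the text of each td cell (the country name and figures here are purely illustrative, not real data):

```python
from bs4 import BeautifulSoup

def extract_text(row, tag):
    element = BeautifulSoup(row, 'html.parser').find_all(tag)
    text = [col.get_text() for col in element]
    return text

# A made-up row for illustration only
sample_row = '<tr><td>Example Country</td><td>1,234</td><td>56</td></tr>'
print(extract_text(sample_row, 'td'))  # ['Example Country', '1,234', '56']
```

Passing 'th' instead of 'td' works the same way, which is how the heading row is handled next.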
Since we don't want to confuse the header names with the real stats, we need to pop the header off the row list, as shown below.
heading = rows.pop(0)
heading_row = extract_text(str(heading), 'th')[1:9]
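As a quick illustration of what pop(0) and the [1:9] slice are doing (the column names below are hypothetical stand-ins, not the table's exact headings):

```python
# pop(0) removes and returns the first element of the list
rows = ['<heading row>', '<data row 1>', '<data row 2>']
heading = rows.pop(0)
print(heading)  # '<heading row>'
print(rows)     # ['<data row 1>', '<data row 2>']

# The [1:9] slice keeps columns 1 through 8, skipping the leading row number
cells = ['#', 'Country', 'TotalCases', 'NewCases', 'TotalDeaths',
         'NewDeaths', 'TotalRecovered', 'NewRecovered', 'ActiveCases']
print(cells[1:9])  # everything except the '#' column
```

After the pop, the rows list contains only data rows, so we can loop over it safely when writing the CSV.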
Finally, our last job is to parse the individual details of every row in the table and then store them in the CSV file using the csv module, as shown below:

with open('corona.csv', 'w', newline='') as store:
    Store = csv.writer(store, delimiter=',')
    Store.writerow(heading_row)
    for row in rows:
        test_data = extract_text(str(row), 'td')[1:9]
        Store.writerow(test_data)
Congratulations, you have just built your own Coronavirus tracker in Python. Don't forget to share it with your fellow developers.