Example of Web Scraping using BeautifulSoup in Python

In this example, we scrape the points table of the 2018 T20I tri-series between India, Sri Lanka and Bangladesh and store the values in a DataFrame in the same format.

Here is exactly how the points table looks on the Cricbuzz website.

We extract the points table, along with its header and team names, and store these values in a DataFrame like this.

Prerequisites:

Basic working knowledge of Python programming
Knowledge of the pandas DataFrame
How to import modules in Python

If you are new to web scraping in Python, I suggest checking out this link first: Python Web Scraping.

To start, open your browser's developer tools. In Chrome, press the F12 key. I recommend Chrome because its developer tools make it quite easy to navigate through the code of a webpage.

Once you press F12, your browser will look like the screenshot below. In the panel that opens (the Elements tab), you can navigate through the page's HTML to the table. I suggest doing this yourself: search for the table tag with class table cb-srs-pnts and click on it to see how the table is structured.
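To see how that class lookup behaves without hitting the live site, here is a minimal sketch run against a small, hypothetical HTML snippet. It only reuses the class names mentioned in this article; the real Cricbuzz markup is larger and may have changed, and the cell texts are placeholders. It shows that class_='table cb-srs-pnts' matches the class attribute as an exact string, while the CSS selector table.cb-srs-pnts matches any table carrying that class.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: it only borrows the class names used in
# this article. Cell texts are placeholders, not real points-table data.
SAMPLE_HTML = """
<table class="table cb-srs-pnts">
  <tr>
    <td class="cb-srs-pnts-th">Header A</td>
    <td class="cb-srs-pnts-th">Header B</td>
  </tr>
  <tr>
    <td class="cb-srs-pnts-name">Team A</td>
    <td class="cb-srs-pnts-td">x</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A string containing a space matches the whole class attribute exactly, in this order.
exact = soup.find("table", class_="table cb-srs-pnts")

# A CSS selector matches any <table> that has the cb-srs-pnts class,
# whatever other classes it carries and in any order.
by_selector = soup.select_one("table.cb-srs-pnts")

# Searching inside the located table keeps other tables on the page out of the results.
header_cells = [td.get_text(strip=True) for td in exact.find_all("td", class_="cb-srs-pnts-th")]
print(header_cells)  # ['Header A', 'Header B']
```

If the site ever reorders the two classes, the exact-string form returns None while the selector form still finds the table.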

Before looking at the code, note that the requests library lets us easily download the source code of any webpage: its get method sends an HTTP GET request and returns a Response object, whose text attribute holds the page's HTML. If this is unclear, don't worry; it will make sense once you see the code.
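The download step can be wrapped in a small helper. This is a sketch, not the article's original code: fetch_soup is a hypothetical name, and the 10-second timeout and the call to raise_for_status() are added assumptions so that a stalled connection or an HTTP error page fails loudly instead of being parsed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Download a page and return its parsed HTML (hypothetical helper)."""
    # requests.get sends an HTTP GET request and returns a Response object.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses rather than parsing an error page.
    response.raise_for_status()
    # response.text is the body decoded to a str; "html.parser" ships with Python.
    return BeautifulSoup(response.text, "html.parser")
```

For example, soup = fetch_soup(url) gives a BeautifulSoup object that you can search with find and find_all, as in the script below.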


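from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import requests

# Download the points-table page; requests.get returns a Response object
url = "http://www.cricbuzz.com/cricket-series/2678/india-and-bangladesh-in-sri-lanka-t20i-tri-series-2018/points-table"
page = requests.get(url, timeout=10)
page.raise_for_status()  # stop on HTTP errors instead of parsing an error page

# Parse the HTML; naming the parser avoids BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(page.text, "html.parser")
#print(soup.prettify())

# Locate the points table by its class attribute
scoretable = soup.find('table', class_='table cb-srs-pnts')
if scoretable is None:
    raise RuntimeError("Points table not found; the page layout may have changed")

# Team names become the row labels (index) of the DataFrame
team_name = [tn.get_text(strip=True) for tn in scoretable.find_all('td', class_='cb-srs-pnts-name')]
#print(team_name)

# Header cells become the column labels; the scraped headers have no label
# for the points column, so 'pts' is inserted at position 5
table_head = [th.get_text(strip=True) for th in scoretable.find_all('td', class_='cb-srs-pnts-th')]
table_head.insert(5, 'pts')
#print(table_head)

# Score cells come back in document order, i.e. row by row; search only
# inside the points table, not the whole page
scores = [s.get_text(strip=True) for s in scoretable.find_all('td', class_='cb-srs-pnts-td')]

# Reshape the flat list into one row per team and one column per header
teams_point = np.array(scores).reshape(len(team_name), len(table_head))
#print(teams_point)

df = pd.DataFrame(teams_point, index=team_name, columns=table_head)
df.columns.name = 'Teams'
print(df)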
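The reshape step relies on find_all returning cells in document order, so the flat list of scores runs row by row through the table. Here is a minimal offline sketch of that idea using placeholder strings (hypothetical values, not real standings); it sizes the reshape from the lists themselves rather than hard-coding 3 and 7.

```python
import numpy as np
import pandas as pd

# Placeholder inputs standing in for the scraped lists (hypothetical values).
team_name = ["Team A", "Team B", "Team C"]
table_head = ["c1", "c2", "c3", "c4"]

# A flat, row-by-row list of cell texts, as find_all would return them.
scores = [f"{team}/{col}" for team in team_name for col in table_head]

# One row per team, one column per header; reshape raises ValueError
# if the number of cells does not equal rows * columns.
teams_point = np.array(scores).reshape(len(team_name), len(table_head))

df = pd.DataFrame(teams_point, index=team_name, columns=table_head)
df.columns.name = "Teams"  # printed in the top-left corner of the header row
print(df)
```

Passing the 2-D array straight to pd.DataFrame replaces listing each row by hand, so the same code keeps working if a table has a different number of teams.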

Note: Before trying this in your IDE, make sure BeautifulSoup, pandas, NumPy and requests are installed (for example, with pip install beautifulsoup4 pandas numpy requests).
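To check which of these packages are already available in your environment, a short standard-library check like the following can help. It is a sketch: missing_modules is a hypothetical helper built on importlib.util.find_spec, and it checks import names (bs4, not the PyPI package name).

```python
import importlib.util

# Import names of the packages used in this article (bs4 is BeautifulSoup).
REQUIRED = ["bs4", "numpy", "pandas", "requests"]


def missing_modules(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required packages are installed.")
```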

Output of the scraping script: