Example of Web Scraping using BeautifulSoup in Python - Python for Data Analytics

In this example we scrape the points table of the 2018 T20I tri-series between India, Sri Lanka and Bangladesh and store the values in a DataFrame in the same layout.

Here is exactly how the points table looks on the Cricbuzz website.


We will extract the points table, including the header row and the team names, and store these values in a DataFrame like this.


Prerequisites:
  • Basic working knowledge of programming in Python
  • Knowledge of the pandas DataFrame
  • How to import modules in Python
If you are new to web scraping in Python, I suggest you first check out this link: Python Web Scraping.

To start, we need to open the developer tools in our browser. To do that, press the F12 key in Chrome. I recommend Chrome because it makes it quite easy to navigate through the code of a webpage.

Once you press F12, your browser will look like the screenshot below. From the code panel that opens on the right you can navigate to the table. I suggest you try it yourself: search for the table tag with the class table cb-srs-pnts and click on it to see how it works.
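The lookup you just did by hand in the developer tools can also be done in code. The sketch below is only an illustration: the HTML fragment is a hypothetical stand-in for the real page (only the class names table cb-srs-pnts and cb-srs-pnts-name come from this post), and it shows two ways of finding the table with BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page; only the class
# names are taken from the post, everything else is made up.
SAMPLE_HTML = """
<div>
  <table class="other"><tr><td>ignore me</td></tr></table>
  <table class="table cb-srs-pnts">
    <tr><td class="cb-srs-pnts-name">Team A</td></tr>
    <tr><td class="cb-srs-pnts-name">Team B</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Option 1: match the class attribute as one exact string.
by_exact_class = soup.find("table", class_="table cb-srs-pnts")

# Option 2: a CSS selector, which matches whatever order the classes are in.
by_selector = soup.select_one("table.table.cb-srs-pnts")

# Read the text of every team-name cell inside the table that was found.
names = [td.get_text(strip=True)
         for td in by_exact_class.find_all("td", class_="cb-srs-pnts-name")]
print(names)
```

The exact-string form is the one used in the script in this post; the selector form is a little more forgiving if the site ever lists the classes in a different order.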


Before you read the script, you should know that the requests library in Python makes it easy to download the source code of a webpage: its get method returns a response object whose text attribute holds the page's HTML. If this is not clear yet, don't worry; you'll find out once you see the code.
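As a small sketch of that download step on its own, the helper below wraps requests.get. The function name download_page and the 10-second timeout are my own choices for illustration, not something the original script uses.

```python
import requests


def download_page(url, timeout=10):
    """Return the HTML of `url` as text, raising on HTTP errors.

    `download_page` and the default timeout are illustrative choices.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

You would call it as html = download_page(url) with the points-table URL and pass the returned text to BeautifulSoup.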


from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import requests

page = requests.get("http://www.cricbuzz.com/cricket-series/2678/india-and-bangladesh-in-sri-lanka-t20i-tri-series-2018/points-table")

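# Name the parser explicitly; without it BeautifulSoup guesses one and emits a warning
soup = BeautifulSoup(page.text, "html.parser")
#print(soup.prettify())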


# Locate the points table, then collect the team names from its name cells
scoretable = soup.find('table',class_='table cb-srs-pnts')
team_name = [tn.get_text() for tn in scoretable.find_all('td',class_='cb-srs-pnts-name')]
#team_name.insert(0,'Team')
#print(team_name)



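# Collect the header labels, then insert 'pts' at position 5 so the number of
# headers matches the number of value columns
table_head = [th.get_text() for th in scoretable.find_all('td',class_='cb-srs-pnts-th')]
table_head.insert(5,'pts')
#print(table_head)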



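# Collect the value cells from the points table only, then reshape the flat
# list into one row per team and one column per header
scores = [s.get_text() for s in scoretable.find_all('td',class_='cb-srs-pnts-td')]
teams_point = np.array(scores)
teams_point = teams_point.reshape(len(team_name), len(table_head))
#print(teams_point)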

df = pd.DataFrame(teams_point, index=team_name, columns=table_head)
df.columns.name = 'Teams'
print(df)
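The reshape step is easier to follow on data you can see. The sketch below runs offline on placeholder values: the team names, header labels and numbers are all invented for illustration and are not the real standings. It also converts the cells to numbers, which the script above does not do.

```python
import numpy as np
import pandas as pd

# Placeholder data: invented teams, labels and numbers, not the real table.
team_names = ["Team A", "Team B", "Team C"]
headers = ["Mat", "Won", "Lost", "Tied", "NR", "Pts", "NRR"]  # assumed labels
flat_cells = [
    "4", "3", "1", "0", "0", "6", "0.500",
    "4", "2", "2", "0", "0", "4", "0.000",
    "4", "1", "3", "0", "0", "2", "-0.500",
]

# One row per team; -1 lets NumPy work out the number of columns.
grid = np.array(flat_cells).reshape(len(team_names), -1)

df = pd.DataFrame(grid, index=team_names, columns=headers)

# Scraped cells arrive as strings; convert every column to a numeric type
# so the values can be sorted and summed.
df = df.apply(pd.to_numeric)
print(df)
```

Deriving the row count from the list of team names means the same code keeps working if the table ever has a different number of teams.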



Note: Before trying this in your IDE, make sure the requests, BeautifulSoup (bs4), pandas and NumPy packages are installed.
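If you are not sure which of these are already installed, a quick check along the following lines may help. The helper name missing_modules is hypothetical; it only asks Python whether each module can be found and does not install anything.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found here."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names used by the script in this post
# (bs4 is the import name of the BeautifulSoup package).
required = ["bs4", "numpy", "pandas", "requests"]
print(missing_modules(required))
```

Anything the check lists can be installed with pip, for example pip install beautifulsoup4 numpy pandas requests.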

Output:
