Python BeautifulSoup

Introduction to Python BeautifulSoup

Python BeautifulSoup is a library for pulling data out of markup files such as HTML and XML. It works on top of your preferred parser and provides idiomatic ways to navigate, search, and modify the resulting parse tree, which saves a good deal of time and effort. This tutorial shows how BeautifulSoup works and how to use it to get the data you want. It is most useful when a webpage shows data relevant to your research but offers no direct download; extracting that data from the page itself is known as web scraping.

Installation of Python BeautifulSoup

Install BeautifulSoup and the supporting libraries used in this tutorial with pip:

pip install beautifulsoup4
pip install requests
pip install lxml
pip install html5lib

Note: These commands assume that pip, the Python package installer, is already available on your system.

Accessing the HTML of a Webpage

import requests
URL = "https://www.educba.com/software-development/"
r = requests.get(URL)
print(r.content)

Here is what each line of the code does:

  • Import the requests library.
  • Specify the URL of the webpage you want to scrape.
  • Send an HTTP GET request to that URL and save the response in an object called r.
  • Print r.content, which holds the raw HTML of the webpage as bytes rather than as a string.
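
The point in the last bullet can be checked without any network access. The sketch below uses a hand-written bytes value named raw_html as a stand-in for r.content (an assumption made for illustration) and decodes it into a str, which is roughly what r.text gives you.

```python
# Stand-in for r.content: requests returns the response body as raw bytes.
raw_html = "<p>Café menu</p>".encode("utf-8")

# Decoding the bytes yields ordinary text (a str), similar to r.text.
decoded_html = raw_html.decode("utf-8")

print(type(raw_html).__name__)
print(type(decoded_html).__name__)
print(decoded_html)
```

BeautifulSoup accepts either form, which is why the examples in this tutorial pass r.content in one place and .text in another.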

Parsing the HTML Content

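import requests
from bs4 import BeautifulSoup
URL = "https://www.educba.com/software-development/"
r = requests.get(URL)

# Build the parse tree with the html5lib parser (needs: pip install html5lib)
soup = BeautifulSoup(r.content, "html5lib")
# Print the parsed document with readable indentation
print(soup.prettify())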

A nice feature of BeautifulSoup is that it is built on top of HTML parsing libraries such as html.parser, lxml, and html5lib, so you can choose the parser that suits your document.
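
As a minimal sketch of choosing a parser, the example below parses a small inline page with html.parser, which ships with Python. The sample_html string is hypothetical and stands in for a downloaded response.

```python
from bs4 import BeautifulSoup

# Hypothetical inline page, used here instead of a downloaded response.
sample_html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>Hello, soup!</p></body></html>"
)

# "html.parser" is part of the standard library; "lxml" or "html5lib"
# could be passed instead if those packages are installed.
soup = BeautifulSoup(sample_html, "html.parser")

page_title = soup.title.string
first_paragraph = soup.p.get_text()
print(page_title)
print(first_paragraph)
```

Swapping the second argument is all it takes to try a different parser, although parsers may repair badly formed HTML in slightly different ways.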

Understanding Python BeautifulSoup with Examples

Examples of Python BeautifulSoup are given below:

A Simple Quick Scrape: This is nothing more than using requests to fetch a specific HTML file by its URL and then applying a few regular expressions to extract data from it. Note that the HTML file in this example is full of names, emails, and phone numbers, all of which are generated placeholder data rather than real records.
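
A quick scrape of this kind might look like the following sketch. It works on a small inline sample instead of downloading anything; the sample_html contents and both patterns are assumptions chosen for illustration, not the file or regexes from the original example.

```python
import re

# Hypothetical sample standing in for the generated HTML file.
sample_html = """
<ul>
  <li>Ada Example, ada@example.com, 555-123-4567</li>
  <li>Bob Sample, bob@example.org, 555-987-6543</li>
</ul>
"""

# Deliberately simple patterns: NNN-NNN-NNNN phone numbers and a basic
# name@domain.tld email shape. Real-world data needs stricter rules.
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")
email_pattern = re.compile(r"[\w.]+@[\w.]+\.\w+")

phones = phone_pattern.findall(sample_html)
emails = email_pattern.findall(sample_html)
print(phones)
print(emails)
```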

All that is done here is to create two regular expressions: one that matches phone numbers in the format found in the file, and one that matches a very basic email format. Neither is a great regex, but they are enough for a quick scrape. Next, let us write code to grab the HTML from a webpage and see how to parse through it. The code below sends a GET request to the desired webpage and creates a BeautifulSoup object from the returned HTML.

import requests
from bs4 import BeautifulSoup
vgm_url = 'https://www.vgmusic.com/music/console/nintendo/nes/'
html_text = requests.get(vgm_url).text
soup = BeautifulSoup(html_text, 'html.parser')

The soup object lets you search and navigate the HTML for the data you want.
find() and find_all() are among its most powerful methods. Use soup.find() when you know there is only one matching element, such as the body tag. Use soup.find_all() for most web scraping work: for example, you can iterate through all the hyperlinks on a page and print their URLs. You can also pass tag attributes and other arguments, such as regular expressions, to find_all() to make the search as specific as you want.
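
The difference between the two methods can be sketched on a small inline page. The sample_html string and its links are hypothetical and only serve the illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical inline page with three hyperlinks.
sample_html = """
<html><body>
  <a href="/songs/intro.mid">Intro</a>
  <a href="/songs/boss.mid">Boss Theme</a>
  <a href="/about.html">About</a>
</body></html>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first match only (or None when nothing matches).
body = soup.find("body")

# find_all() returns every match; here, all hyperlinks on the page.
all_links = [a["href"] for a in soup.find_all("a")]

# Attribute filters accept regular expressions as well as plain strings.
midi_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"\.mid$"))]

print(body.name)
print(all_links)
print(midi_links)
```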

Before writing any parsing code, look at the HTML that the browser renders. Every webpage is different, so a little pattern recognition and experimentation is required for web scraping. Let us download a bunch of MIDI files. The developer tools available in modern browsers help when writing code to parse a webpage: inspecting the HTML shows you whether the data can be accessed programmatically.

We will use the find_all() method with regular expressions to go through the links, because our goal is to get only the links that point to MIDI files and whose text contains no parentheses. This allows us to exclude all the remixes and duplicates.

import re
import requests
from bs4 import BeautifulSoup
vgm_url = 'https://www.vgmusic.com/music/console/nintendo/nes/'
html_text = requests.get(vgm_url).text
soup = BeautifulSoup(html_text, 'html.parser')
if __name__ == '__main__':
    # Keep only links whose href ends in .mid
    attrs = {
        'href': re.compile(r'\.mid$')
    }
    # The string filter drops link texts that contain a parenthesis
    tracks = soup.find_all('a', attrs=attrs, string=re.compile(r'^((?!\().)*$'))
    count = 0
    for track in tracks:
        print(track)
        count += 1
    print(len(tracks))

This filters out everything except the MIDI files we want. Next, let us see how to download them all.
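
The text pattern used in the filter can be tried on its own, without fetching the page. In the sketch below the titles list is hypothetical; the regular expression is the one passed as the string argument in the code above.

```python
import re

# Matches only strings that contain no "(" anywhere: before consuming each
# character, the negative lookahead (?!\() rejects an opening parenthesis.
no_parenthesis = re.compile(r"^((?!\().)*$")

# Hypothetical link texts of the kind found on a track listing.
titles = ["Title Theme", "Title Theme (Remix)", "Ending", "Ending (2)"]

kept = [title for title in titles if no_parenthesis.search(title)]
print(kept)
```

Only the titles without a parenthesis survive, which is how remixes and numbered duplicates are excluded.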

To download the files, we iterate over all the matching MIDI links and fetch each one. Adding a small download_track function to the code above and calling it inside the loop downloads every file.

import re
import requests
from bs4 import BeautifulSoup
vgm_url = 'https://www.vgmusic.com/music/console/nintendo/nes/'
html_text = requests.get(vgm_url).text
soup = BeautifulSoup(html_text, 'html.parser')
def download_track(count, track_element):
    # Get the title of the track from the HTML element
    track_title = track_element.text.strip().replace('/', '-')
    download_url = '{}{}'.format(vgm_url, track_element['href'])
    file_name = '{}_{}.mid'.format(count, track_title)
    # Download the track
    r = requests.get(download_url, allow_redirects=True)
    with open(file_name, 'wb') as f:
        f.write(r.content)
    # Print to the console to keep track of how the scraping is coming along.
    print('Downloaded: {} ({})'.format(track_title, download_url))
if __name__ == '__main__':
    attrs = {
        'href': re.compile(r'\.mid$')
    }
    tracks = soup.find_all('a', attrs=attrs, string=re.compile(r'^((?!\().)*$'))
    count = 0
    for track in tracks:
        # Download each matching track, numbering the files as we go
        download_track(count, track)
        count += 1
    print(len(tracks))

The download_track function receives the BeautifulSoup object that represents the HTML element linking to a MIDI file, along with a unique number. The number is used in the file name to avoid possible naming collisions.
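
The naming scheme can be illustrated in isolation. In the sketch below, build_file_name is a hypothetical helper that mirrors the file name logic of download_track, and the titles are made up for the example.

```python
def build_file_name(count, title):
    """Prefix the title with a counter so that repeated titles stay unique."""
    # A "/" would be read as a directory separator, so swap it for "-".
    safe_title = title.strip().replace("/", "-")
    return "{}_{}.mid".format(count, safe_title)


# Hypothetical titles, including a duplicate and one containing a slash.
titles = ["Overworld", "Overworld", "Boss 1/2"]
file_names = [build_file_name(i, t) for i, t in enumerate(titles)]
print(file_names)
```

Because the counter differs for every track, two tracks with the same title still end up in separate files.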

Conclusion

If you want to get data out of a webpage, BeautifulSoup is here for you. It is a Python library that extracts data from markup languages such as XML and HTML and helps you overcome the coding hurdles of web scraping. Parsing the content is as simple as creating a BeautifulSoup object from it.
