EDUCBA


Scrapy selector


Definition of Scrapy selector

Extracting data from an HTML source is the most common activity when scraping web pages. To do so, we can use one of several libraries. BeautifulSoup is a popular web scraping library among Python programmers; it builds a Python object based on the structure of the HTML code and deals relatively well with faulty markup. However, it has one drawback: it is slow. lxml is an XML parsing library (which also parses HTML) with a pythonic API based on ElementTree.

What is a Scrapy selector?

  • lxml is not part of the Python standard library. Scrapy, however, comes with its own mechanism for extracting data. Its components are called selectors because they select specific parts of an HTML document, specified by either XPath or CSS expressions. Scrapy selectors are built on top of lxml (through the parsel library), so they are similar to it in speed and parsing accuracy.
  • XPath is a language for selecting nodes in XML documents, and it can also be used with HTML. CSS is a language for applying styles to HTML documents; it defines selectors that associate those styles with specific HTML elements.
  • As the name implies, Scrapy selectors are used to select elements. In CSS itself, a selector determines which HTML elements a given style rule applies to.

How to construct a Scrapy selector?

  • Selectors are used in Scrapy to specify which parts of a web page our spiders should extract. As a result, writing selectors that accurately target the data is critical if we are to scrape the correct data from the site.
  • A selector automatically chooses the best parsing rules (HTML or XML) based on the input type. The steps below show how to construct Scrapy selectors.

1) In this step, we install Scrapy using the pip command. In the example below, Scrapy is already installed on our system, so pip reports that the requirement is already satisfied and nothing more needs to be done.

pip install scrapy


2) After installing Scrapy, we start the Python shell using the python3 command.

python3


3) Once the Python shell is running, we import the Selector class from the scrapy.selector module using the import keyword, as shown below.

from scrapy.selector import Selector


4) Next, we import the HtmlResponse class from the scrapy.http module, as shown below.

from scrapy.http import HtmlResponse


5) In the example below, we construct a Scrapy selector from text. To do this, we first define the HTML body as a string. The first line assigns a small HTML document to body; the second line creates a Selector from that text and uses an XPath expression to extract the text of the span element.

Code:

body = '<html><body><span>scrapy</span></body></html>'
Selector(text=body).xpath('//span/text()').extract()  # returns ['scrapy']


Using Scrapy selectors

  • We can also use Scrapy selectors interactively in the Scrapy shell. The steps below use a sample HTML page to show how.

1) To use Scrapy selectors, we first need an HTML page. Our sample web page looks like this.

Code:

<html>
<head>
<base href = 'http://example.com/' />
<title>Example website</title>
</head>
<body>
<div id = 'images'>
<a href = 'image1.html'>Image 1 <br /><img src = 'image1_thumb.jpg' /></a>
<a href = 'image2.html'>Image 2 <br /><img src = 'image2_thumb.jpg' /></a>
<a href = 'image3.html'>Image 3 <br /><img src = 'image3_thumb.jpg' /></a>
<a href = 'image4.html'>Image 4 <br /><img src = 'image4_thumb.jpg' /></a>
<a href = 'image5.html'>Image 5 <br /><img src = 'image5_thumb.jpg' /></a>
</div>
</body>
</html>


2) After creating the HTML page, we open it in the Scrapy shell. A copy of this sample page is hosted in the Scrapy documentation, so we can pass its URL to the scrapy shell command:

scrapy shell http://doc.scrapy.org/en/latest/_static/selectors-sample1.html


3) Once the shell has loaded, the response is available as the response shell variable, and its attached selector is available as the response.selector attribute. Because we are working with HTML, the selector automatically uses an HTML parser. The example below uses an XPath expression to select the text of the title element.

response.selector.xpath('//title/text()')


4) Querying responses with XPath and CSS is so common that responses provide two shortcuts, response.xpath() and response.css(), as follows.

response.xpath('//title/text()')
response.css('title::text')


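5) The .xpath() and .css() methods return a list of selectors (a SelectorList), so calls can be chained to select nested data. The example below selects the img elements with CSS and then their src attributes with XPath.

response.css('img').xpath('@src').extract()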


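6) The queries above return selectors rather than data. To extract the actual text, we call the extract() method on the result (newer Scrapy versions also provide getall() and get() for the same purpose).

response.xpath('//title/text()').extract()
response.css('title::text').extract()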


7) The example below gets the base URL, the links to the image pages, and the thumbnail image sources.

response.xpath('//base/@href').extract()
response.css('base::attr(href)').extract()
response.xpath('//a[contains(@href, "image")]/@href').extract()
response.xpath('//a[contains(@href, "image")]/img/@src').extract()


Scrapy selector types

  • There are two main types of selectors in Scrapy. Both can select the same text or data, but the expressions passed to them are written in different syntaxes.

1) CSS selectors: Because CSS selectors are the standard way to target elements in an HTML document, we can use them in Scrapy to pick parts of an HTML file. Internally, Scrapy translates CSS expressions into XPath. The example below shows a CSS selector.

response.css('html').get()


2) XPath selectors: XPath is a language for selecting nodes in XML documents. Because an HTML document can also be represented as a tree of nodes, XPath works on HTML files as well. In the example below, the first line returns a SelectorList, and adding .get() returns the first matching text as a string.

response.xpath('//title/text()')
response.xpath('//title/text()').get()


Conclusion

Scrapy selectors are instances of the Selector class, constructed by passing either text or a TextResponse object. Extracting data from HTML is the most common activity when scraping web pages, and, as the name implies, selectors let us choose exactly the elements we need using XPath or CSS expressions.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
