
Introduction to Python urllib module

2022-01-30 15:34:32 Little cute in the circle of friends


Preface

In the previous issue, Computer Network Knowledge, we covered the basics of computer networking, such as the four-layer TCP/IP protocol model, URLs (uniform resource locators), and the HTTP/HTTPS protocols.

As everyone knows, Python is a high-level language that supports not only GUI programming but also network programming.

Python's standard library provides the urllib module for working with URLs, and the requests library (a third-party package that must be installed separately) is available for sending requests to servers.

So in this issue we will learn about urllib, one of the modules most commonly used in Python crawlers. Let's go~

1. urllib module overview

urllib is Python's built-in HTTP request library, so it can be used without installing anything.

urllib is very powerful and supports the following features:

  1. Sending web requests
  2. Getting responses
  3. Proxy and cookie settings
  4. Exception handling
  5. URL parsing

urllib consists of 4 modules:

  • request: the basic HTTP request module, used to simulate sending requests

  • error: the exception handling module; if a request goes wrong, you can catch these exceptions

  • parse: a utility module that provides many URL processing methods, such as splitting, parsing, and joining

  • robotparser: mainly used to parse a website's robots.txt file and determine which pages of the site may be crawled

Important note

  • In Python 2, you can simply import urllib
  • In Python 3, you need to name the submodule when importing, for example import urllib.request
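
As a minimal sketch of the Python 3 import style from the note above, the snippet below imports each submodule explicitly and then checks that it is reachable from the urllib package. The variable name loaded is only for illustration.

```python
# Python 3: import each urllib submodule explicitly.
import urllib.request
import urllib.parse
import urllib.error
import urllib.robotparser

# After the imports, each submodule is an attribute of the urllib package.
loaded = [name for name in ("request", "parse", "error", "robotparser")
          if hasattr(urllib, name)]
print(loaded)
```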

2. urllib.request module methods

The urllib.request module defines functions and classes for opening URLs (mainly HTTP) in a variety of situations, such as authentication, redirection, and cookie handling.

The most common use of urllib.request is sending a request to a server.

The commonly used request methods of urllib.request are as follows:

Method | Effect
urllib.request.urlopen(url, data) | Opens url, which can be a string or a Request object
urllib.request.build_opener() | Chains handlers together and returns an OpenerDirector
urllib.request.Request(url, data) | Represents a request to url; data is the data to send to the server
urllib.request.HTTPBasicAuthHandler() | Handles authentication with the remote host
urllib.request.ProxyHandler(proxies) | Sends requests through the specified proxies
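
To make the table more concrete, here is a small sketch that builds a Request carrying form data and an opener that uses a ProxyHandler, without sending anything over the network. The endpoint, the form fields, the User-Agent string, and the proxy address are all made-up placeholders for illustration.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, used only for illustration.
url = "https://example.com/search"
form = {"q": "urllib", "page": 1}

# urlencode builds the form body; Request expects bytes for data.
data = urllib.parse.urlencode(form).encode("utf-8")
req = urllib.request.Request(url, data=data,
                             headers={"User-Agent": "demo-client/0.1"})

print(req.get_method())  # the method becomes POST because data is not None
print(req.full_url)
print(req.data)

# build_opener chains handlers and returns an OpenerDirector.
# The proxy address is a placeholder, not a real proxy.
proxy = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"})
opener = urllib.request.build_opener(proxy)
print(type(opener).__name__)

# Sending the request would need network access, so it is left commented out:
# with opener.open(req, timeout=5) as resp:
#     print(resp.status)
```

Building the Request separately from sending it makes it easy to inspect the method, headers, and body before anything reaches the server.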

Key points

The common steps for using the urllib.request module are as follows:

  1. Import the module: import urllib.request
  2. Use the with context manager to read the contents of the web page
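import urllib.request

# urlopen returns a response object that also works as a context manager
with urllib.request.urlopen("https://juejin.cn/user/211521683863847/posts") as f:
    # Read the first 300 bytes; errors='replace' avoids a decode error
    # if the cut falls in the middle of a multi-byte character
    print(f.read(300).decode('utf-8', errors='replace'))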
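
The same two steps can be tried without a network connection. The sketch below assumes a data: URL (which urlopen also accepts) as a stand-in for a real web page, and reads the character set from the response headers instead of hard-coding it.

```python
import urllib.request

# A data: URL carries its own content, so no network connection is needed.
url = "data:text/plain;charset=utf-8,Hello%20urllib"

with urllib.request.urlopen(url) as resp:
    # Use the charset declared in the headers, falling back to UTF-8.
    charset = resp.headers.get_content_charset() or "utf-8"
    text = resp.read().decode(charset, errors="replace")

print(charset)
print(text)
```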

3. urllib.parse module methods

The urllib.parse module parses URLs, splitting a URL string into its components: scheme, network location, path, and so on.

It can also combine components back into a URL string and convert a relative URL into a complete absolute URL.

The methods defined in urllib.parse fall into two groups: URL parsing and URL quoting (transcoding).

The URL parsing methods of urllib.parse are as follows:

Method | Effect
urllib.parse.urlparse(urlstring) | Splits a URL into six parts: scheme://netloc/path;parameters?query#fragment
urllib.parse.parse_qs(qs) | Parses a query string given as a string argument (data of type application/x-www-form-urlencoded) and returns a dictionary
urllib.parse.urlunparse(parts) | Builds a URL from a tuple such as the one returned by urlparse()
urllib.parse.urlsplit(urlstring) | Parses a URL and returns a tuple whose parts can be accessed by index
urllib.parse.urljoin(base, url) | Combines base and url into a complete URL
urllib.parse.urldefrag(url) | Returns the URL without its fragment identifier, together with the fragment
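
The parsing methods in the table can be sketched as follows. The URL is a made-up example on example.com, and the variable names are only for illustration.

```python
from urllib.parse import (parse_qs, urldefrag, urljoin,
                          urlparse, urlsplit, urlunparse)

# Hypothetical URL used only for illustration.
url = "https://example.com/docs/guide/intro.html?lang=en&tag=a&tag=b#install"

# urlsplit returns a tuple-like result: fields work by name or by index.
parts = urlsplit(url)
print(parts.netloc, parts[2])

# parse_qs maps each key to a list, since a key may repeat.
query = parse_qs(parts.query)
print(query)

# urljoin resolves a relative reference against the base URL.
joined = urljoin(url, "setup.html")
print(joined)

# urldefrag separates the fragment identifier from the rest of the URL.
clean, fragment = urldefrag(url)
print(clean, fragment)

# urlunparse rebuilds a URL string from the six parts given by urlparse.
rebuilt = urlunparse(urlparse(url))
print(rebuilt == url)
```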

The URL transcoding methods of urllib.parse are as follows:

Method | Effect
urllib.parse.quote(string) | Replaces special characters in string with %xx escapes
urllib.parse.urlencode(query) | Converts a mapping or a sequence of two-element tuples to a percent-encoded ASCII text string
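
A short sketch of the transcoding methods follows. The strings are arbitrary sample values, and unquote is included here (it is not listed in the table) only to show the reverse of quote.

```python
from urllib.parse import quote, unquote, urlencode

raw = "hello world/你好"

# quote percent-encodes special characters; "/" is kept by default.
escaped = quote(raw)
print(escaped)

# unquote reverses the escaping.
restored = unquote(escaped)
print(restored == raw)

# urlencode turns a mapping into a query string (spaces become "+").
params = urlencode({"q": "python urllib", "page": 2})
print(params)
```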

Fields of a URL parsed by urllib.parse:

Attribute | Description
scheme | URL scheme (protocol)
netloc | Network location
path | Hierarchical path
query | Query component
fragment | Fragment identifier
username | User name
password | Password
hostname | Host name (lowercase)
port | Port number

Example

import urllib.parse

pa = urllib.parse.urlparse("https://juejin.cn/user/211521683863847/posts")

print("Scheme:", pa.scheme)
print("Network location:", pa.netloc)
print("Hierarchical path:", pa.path)
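
The example above only fills three of the fields. As a complementary sketch, the made-up URL below is chosen so that the remaining attributes from the field table are populated as well. The user name, password, host, and port are placeholders.

```python
from urllib.parse import urlparse

# Made-up URL chosen so that every field is populated.
pa = urlparse("https://user:secret@Example.COM:8443/posts;v=1?page=2#top")

print("Scheme:", pa.scheme)
print("Network location:", pa.netloc)
print("Hierarchical path:", pa.path)
print("Parameters:", pa.params)
print("Query:", pa.query)
print("Fragment:", pa.fragment)
print("User name:", pa.username)
print("Password:", pa.password)
print("Host name:", pa.hostname)  # hostname is lowercased
print("Port:", pa.port)           # port is an int, or None if absent
```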

4. urllib.error module

The urllib.error module defines the exception classes raised by urllib.request.

It mainly provides two exception classes, HTTPError and URLError:

Exception | Description
urllib.error.HTTPError | Raised when an HTTP error occurs
urllib.error.URLError | Raised by the handlers when they run into a problem

Important note

  • HTTPError is a subclass of URLError, designed for HTTP protocol errors
  • URLError is a subclass of OSError; its reason attribute holds the cause of the exception
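
Because HTTPError is a subclass of URLError, it has to be caught first. The sketch below wraps urlopen in a hypothetical helper, fetch_status, that is not part of urllib. The file: URL points at a path assumed not to exist, so the URLError branch can be seen without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5):
    """Open url and describe the outcome as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"ok: {len(resp.read())} bytes"
    except urllib.error.HTTPError as e:
        # More specific class first: the server answered with an error status.
        return f"HTTP error: {e.code} {e.reason}"
    except urllib.error.URLError as e:
        # Everything else: bad scheme, unreachable host, missing file, etc.
        return f"URL error: {e.reason}"


# A local path that is assumed not to exist, so no network is needed.
result = fetch_status("file:///no/such/file-for-urllib-demo")
print(result)
```

If the two except clauses were swapped, the URLError clause would also catch every HTTPError and the status code would never be reported.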

5. urllib.robotparser module methods

urllib.robotparser is used specifically to parse a site's robots.txt file and answer whether a particular URL on that site may be crawled.

Class | Effect
urllib.robotparser.RobotFileParser(url) | Provides methods for reading and parsing the robots.txt file at url and for answering questions about it

Method | Effect
set_url(url) | Sets the URL that points to the robots.txt file
read() | Reads the robots.txt URL and feeds it to the parser
parse(lines) | Parses the lines argument
can_fetch(useragent, url) | Returns True if useragent is allowed to fetch url according to the rules in the parsed robots.txt file
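
A minimal sketch of these methods is shown below. To stay offline it feeds made-up rules directly to parse() instead of calling set_url() and read(). The rules, the bot name demo-bot, and the example.com URLs are assumptions for illustration.

```python
import urllib.robotparser

# Made-up robots.txt content, used instead of downloading a real file.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)  # parse() takes the file content as a list of lines

# can_fetch answers whether the given user agent may crawl the URL.
blocked = rp.can_fetch("demo-bot", "https://example.com/private/data")
allowed = rp.can_fetch("demo-bot", "https://example.com/public/")
print(blocked, allowed)
```

For a real site, the usual flow is set_url() with the robots.txt address, then read(), then can_fetch().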

Summary

In this issue, we took a first look at the four modules of the urllib library (urllib.request, urllib.parse, urllib.error, and urllib.robotparser) and at how to use their main methods.

That's all for this issue. Likes and comments are welcome ღ( ´・ᴗ・` ), see you next time~

Copyright notice
Author: Little cute in the circle of friends. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201301534292959.html
