Introduction to Python urllib module
2022-01-30 15:34:32 【Little cute in the circle of friends】
Preface
In the previous issue, on computer network basics, we covered topics such as the four-layer TCP/IP protocol model, URLs (uniform resource locators), and the HTTP/HTTPS protocols.
As everyone knows, Python is a high-level language that supports not only GUI programming but also network programming.
Python's standard library provides the urllib module for working with URLs, and there is also the third-party requests library for sending requests to servers.
So in this issue we will study urllib, the module most commonly used when writing crawlers in Python. Let's go~
1. urllib module overview
urllib is Python's built-in HTTP request library. It can be used without any installation.
urllib is very powerful and supports the following features:
- Sending web requests
- Getting responses
- Proxy and cookie settings
- Exception handling
- URL parsing
urllib consists of 4 modules:
- request: the basic HTTP request module, used to simulate sending requests
- error: the exception handling module; if a request fails, you can catch these exceptions
- parse: a utility module that provides many URL handling functions, such as splitting, parsing, and joining
- robotparser: mainly used to read a website's robots.txt file and determine which pages may be crawled
Important note
- In Python 2, you can simply import urllib
- In Python 3, you need to specify the submodule when importing, for example import urllib.request
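The Python 3 import rule above can be sketched as follows. This is a minimal illustration, not the only way to import the package; the `submodules` and `available` names are made up for the example.

```python
# Python 3: import each urllib submodule explicitly.
import urllib.request
import urllib.parse
import urllib.error
import urllib.robotparser

# After the explicit imports, each submodule is reachable
# as an attribute of the urllib package.
submodules = ["request", "parse", "error", "robotparser"]
available = [name for name in submodules if hasattr(urllib, name)]
print(available)
```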
2. urllib.request module methods
The urllib.request module defines functions and classes for opening URLs (mostly HTTP) in a variety of situations, such as authentication, redirects, and cookie handling.
The most common use of urllib.request is sending a request to a server.
The commonly used functions and classes in urllib.request are as follows:
Method | Effect |
---|---|
urllib.request.urlopen(url, data) | Opens url; url can be a string or a Request object |
urllib.request.build_opener() | Chains the given handlers together and returns an OpenerDirector |
urllib.request.Request(url, data) | Represents a URL request; data is the data to send to the server |
urllib.request.HTTPBasicAuthHandler() | Handles authentication with the remote host |
urllib.request.ProxyHandler(proxies) | Sends requests through the specified proxies |
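As a rough sketch of how the entries in the table fit together, the snippet below builds a Request carrying form data and chains two handlers into an opener. The endpoint, form fields, User-Agent string, and proxy address are all placeholders invented for illustration, and the actual network call is left commented out so the sketch runs offline.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, used only for illustration.
url = "https://example.com/api/login"
form = urllib.parse.urlencode({"user": "alice", "lang": "python"}).encode("utf-8")

# Supplying data turns the request into a POST; headers are optional.
req = urllib.request.Request(url, data=form, headers={"User-Agent": "demo-client/0.1"})
print(req.get_method())
print(req.full_url)

# Chain handlers into an OpenerDirector. The proxy address is a placeholder.
proxy = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"})
auth = urllib.request.HTTPBasicAuthHandler()
opener = urllib.request.build_opener(proxy, auth)

# opener.open(req) would actually send the request; it is commented out
# here so the sketch does not need network access.
# with opener.open(req, timeout=10) as resp:
#     body = resp.read()
```

Using an opener instead of urlopen directly is what lets proxy and authentication handling be combined in a single object.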
Key points
The usual steps for using the urllib.request module are as follows:
- Import the module: import urllib.request
- Use a with statement (context manager) to read the contents of the web page
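    import urllib.request

    # Open the URL; the returned response object can be used in a with statement
    req = urllib.request.urlopen("https://juejin.cn/user/211521683863847/posts")
    with req as f:
        # Read the first 300 bytes; errors='replace' avoids a decode error
        # if the cut falls in the middle of a multi-byte character
        print(f.read(300).decode('utf-8', errors='replace'))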
3. urllib.parse module methods
The urllib.parse module parses URLs, splitting a URL string into its components: scheme, network location, path, and so on.
It can also combine the components back into a URL string and convert a relative URL into a complete absolute URL.
The functions defined in urllib.parse fall into two groups: URL parsing and URL quoting.
The urllib.parse functions for parsing URLs are as follows:
Method | Effect |
---|---|
urllib.parse.urlparse(urlstring) | Splits a URL into six components: scheme://netloc/path;parameters?query#fragment |
urllib.parse.parse_qs(qs) | Parses a query string given as a string argument (data of type application/x-www-form-urlencoded) and returns a dictionary |
urllib.parse.urlunparse(parts) | Builds a URL from a tuple of components and returns it |
urllib.parse.urlsplit(urlstring) | Parses a URL and returns a tuple-like result whose components can be accessed by index |
urllib.parse.urljoin(base, url) | Combines base and url into a complete URL |
urllib.parse.urldefrag(url) | Returns the URL without its fragment identifier, together with the fragment |
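The parsing functions in the table can be tried out on a single URL. The example.com address below is made up for illustration; the snippet is a minimal sketch of each call rather than a complete tour of their options.

```python
from urllib.parse import (
    parse_qs, urldefrag, urljoin, urlparse, urlsplit, urlunparse,
)

# Hypothetical URL containing all six components.
url = "https://example.com/docs/guide.html;v=1?lang=en&tag=a&tag=b#intro"

# urlparse: six components, with ;parameters split off from the path.
parts = urlparse(url)
print(parts.scheme, parts.netloc, parts.path, parts.params, parts.query, parts.fragment)

# urlunparse: rebuild the URL from the six components.
rebuilt = urlunparse(parts)

# parse_qs: query string -> dict mapping each name to a list of values.
query = parse_qs(parts.query)

# urlsplit: five components; the parameters stay inside the path.
split = urlsplit(url)
print(split[2])  # access the path by index

# urljoin: resolve a relative URL against a base URL.
joined = urljoin("https://example.com/docs/guide.html", "../img/logo.png")

# urldefrag: URL without the fragment, plus the fragment itself.
clean, fragment = urldefrag(url)
```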
The urllib.parse functions for quoting (encoding) URLs are as follows:
Method | Effect |
---|---|
urllib.parse.quote(string) | Replaces special characters in string with %xx escapes |
urllib.parse.urlencode(query) | Converts a mapping or a sequence of two-element tuples into a percent-encoded ASCII text string |
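A small sketch of the two quoting functions follows. The sample string, the query parameters, and the example.com address are invented for illustration; unquote, the standard-library counterpart of quote, is used only to show the round trip.

```python
from urllib.parse import quote, unquote, urlencode

# quote: percent-encode special characters ("/" is kept by default).
raw = "a b/c?d=é"
escaped = quote(raw)
print(escaped)

# unquote reverses the escaping.
restored = unquote(escaped)

# urlencode: build a query string from a dict (spaces become "+").
params = {"q": "python urllib", "page": 2}
query = urlencode(params)

# Hypothetical base URL, just to show where the query string goes.
full_url = "https://example.com/search?" + query
print(full_url)
```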
Attributes of the result returned when urllib.parse parses a URL:
Attribute | Description |
---|---|
scheme | URL scheme (protocol) |
netloc | Network location |
path | Hierarchical path |
query | Query component |
fragment | Fragment identifier |
username | User name |
password | Password |
hostname | Host name (lowercase) |
port | Port number |
Example

    import urllib.parse

    # Parse the URL and print some of its components
    pa = urllib.parse.urlparse("https://juejin.cn/user/211521683863847/posts")
    print("Scheme:", pa.scheme)
    print("Network location:", pa.netloc)
    print("Hierarchical path:", pa.path)
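The remaining attributes from the table (username, password, hostname, port) only carry values when the network location includes them. The URL below is made up to show this; it is a sketch, and the credentials are obviously placeholders.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials, a mixed-case host, and an explicit port.
result = urlparse("https://user:secret@Example.COM:8443/a/b?x=1#top")

print(result.netloc)    # the full network location, unchanged
print(result.username)  # part before ":" in the credentials
print(result.password)  # part after ":" in the credentials
print(result.hostname)  # host name, converted to lowercase
print(result.port)      # port number as an int
```

For a URL without credentials or an explicit port, these attributes are None.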
4. urllib.error module
The urllib.error module defines the exception classes raised by urllib.request.
It mainly provides two exceptions: HTTPError and URLError.
Exception | Effect |
---|---|
urllib.error.HTTPError | Raised for HTTP errors |
urllib.error.URLError | Raised by handlers when they run into a problem |
Important note
- HTTPError is a subclass of URLError, dedicated to HTTP protocol errors
- URLError is a subclass of OSError; its reason attribute holds the cause of the exception
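Because HTTPError is a subclass of URLError, it has to be caught first. The helper below is a minimal sketch of that ordering; `fetch_status` and its `opener` parameter are names invented for this example, and the injectable opener exists only so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen):
    """Open url and return a short description of the outcome."""
    try:
        with opener(url) as resp:
            return f"ok: {resp.status}"
    except urllib.error.HTTPError as exc:
        # The more specific subclass must be handled before URLError.
        return f"http error: {exc.code} {exc.reason}"
    except urllib.error.URLError as exc:
        # Other failures, such as an unreachable host.
        return f"url error: {exc.reason}"


# The class hierarchy described in the notes above.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(issubclass(urllib.error.URLError, OSError))
```

Calling `fetch_status("https://example.com/")` would perform a real request; the URL there is only an example.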
5. urllib.robotparser module methods
urllib.robotparser is dedicated to parsing robots.txt files and answering whether a crawler may fetch a specific URL on a website.
Class | Effect |
---|---|
urllib.robotparser.RobotFileParser(url) | Provides methods to read and parse the robots.txt file at url and answer questions about it |
Method | Effect |
---|---|
set_url(url) | Sets the URL that points to the robots.txt file |
read() | Reads the robots.txt URL and feeds it to the parser |
parse(lines) | Parses the given lines of a robots.txt file |
can_fetch(useragent, url) | Returns True if useragent is allowed to fetch url according to the rules in the parsed robots.txt file |
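The methods above can be sketched without touching the network by feeding robots.txt lines straight to parse(). The rules, the "MyCrawler" user agent, and the example.com URLs are all made up for illustration; with a real site you would call set_url() and read() instead.

```python
import urllib.robotparser

# Hypothetical robots.txt content, supplied as a list of lines.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Ask whether a given user agent may fetch specific URLs.
allowed = rp.can_fetch("MyCrawler", "https://example.com/index.html")
blocked = rp.can_fetch("MyCrawler", "https://example.com/private/data.html")
print(allowed, blocked)

# For a live site, the usual sequence would be:
# rp.set_url("https://example.com/robots.txt")
# rp.read()
```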
Summary
In this issue, we took a first look at the four modules provided by the urllib library (urllib.request, urllib.parse, urllib.error, and urllib.robotparser) and how to use their main methods.
That's all for this issue. Likes and comments are welcome ღ( ´・ᴗ・` ). See you next time ~
Copyright notice
Author: [Little cute in the circle of friends]. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201301534292959.html