You need to master these before learning Python crawlers
2022-02-01 03:06:27 【Internet Lao Xin】
This is day 17 of my participation in the November Update Challenge. For event details, see: 2021 Last Update Challenge
Common protocols
HTTP and HTTPS
HTTP: the Hypertext Transfer Protocol, a method for publishing and receiving HTML pages. The default port is 80.
HTTPS: the encrypted version of HTTP, which adds an SSL layer underneath HTTP. The default port is 443.
Take the official Meituan website as an example: it is served over HTTPS, so the port is 443.
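As a minimal sketch of the port rule above, the helper below reports which port a URL would use. The name `effective_port` and the `DEFAULT_PORTS` table are illustrative assumptions, and the URLs are only parsed as strings (no request is sent).

```python
from urllib.parse import urlsplit

# Default ports for the two protocols discussed above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port in the URL, or the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("https://www.meituan.com/"))   # 443
print(effective_port("http://example.com/"))        # 80
print(effective_port("http://example.com:8080/"))   # 8080
```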
URL and URI
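A URL is one kind of URI: it identifies a resource and also says how to reach it. As a rough illustration, Python's standard library can split a URL into its parts. The `example.com` address below is a made-up placeholder, not a URL from this article.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL used only to show the components.
url = "https://example.com/search?q=python&page=2#results"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # example.com
print(parts.path)      # /search
print(parts.query)     # q=python&page=2
print(parts.fragment)  # results

# The query string can be decoded into a dict of lists.
query = parse_qs(parts.query)
print(query)           # {'q': ['python'], 'page': ['2']}
```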
Common request methods
HTTP requires that a request method be chosen whenever the browser and the server exchange data. The protocol defines 8 request methods; the most common are GET and POST.
GET request: generally only fetches data from the server and has no effect on server resources.
When making a request, pay attention to:
- the URL
- the request method
- the request headers
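The three items above can be sketched with the standard library's `urllib.request.Request`. This only builds the request object; nothing is sent until it is passed to `urlopen()`. The User-Agent string is an illustrative assumption.

```python
from urllib.request import Request

url = "https://movie.douban.com/"
headers = {"User-Agent": "Mozilla/5.0 (example crawler)"}  # assumed value

req = Request(url, headers=headers, method="GET")

print(req.full_url)                  # the URL
print(req.get_method())              # the request method: GET
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))  # a request header
```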
POST request: sends data to the server (for example, logging in or uploading a file). POST is used when the request affects server resources.
However, some websites have anti-crawler mechanisms and use POST even when you are only querying information, so always analyze the website before writing a crawler.
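A POST request carries its data in the request body. The sketch below builds one with `urllib` without sending it. The login URL and form fields are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical login endpoint and form fields, for illustration only.
login_url = "https://example.com/login"
form = {"username": "alice", "password": "secret"}

body = urlencode(form).encode("utf-8")  # form data travels in the request body
req = Request(login_url, data=body)     # supplying data makes urllib use POST

print(req.get_method())  # POST
print(req.data)          # b'username=alice&password=secret'
```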
Where request data is carried:
In HTTP, the data sent to the server with a request can travel in three places:
- in the URL
- in the body (POST requests)
- in the headers
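To make the three locations concrete, the sketch below writes out the text of an HTTP/1.1 request by hand. `render_request` is a hypothetical helper written for this illustration; real crawlers normally let a library build this text.

```python
from urllib.parse import urlencode


def render_request(method, host, path, params=None, headers=None, body=""):
    """Build the text of an HTTP/1.1 request to show where each part lives."""
    target = path + ("?" + urlencode(params) if params else "")
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


text = render_request(
    "POST", "example.com", "/search",
    params={"q": "python"},                     # data in the URL
    headers={"User-Agent": "example-crawler"},  # data in the headers
    body="page=2",                              # data in the body
)
print(text)
```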
Common request header parameters:
- User-Agent: the name of the browser
- Referer: the URL from which the current request came
- Cookie: HTTP is stateless, meaning that when one person sends two requests, the server cannot tell whether they came from the same person. Cookies are used to identify the user across requests.
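The three headers above can be collected into a dict that most HTTP libraries accept. `build_headers` and every value passed to it are assumptions made for this sketch.

```python
def build_headers(user_agent, referer, cookies):
    """Assemble the three request headers discussed above into one dict."""
    # The Cookie header is a "; "-separated list of name=value pairs.
    cookie_value = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return {
        "User-Agent": user_agent,
        "Referer": referer,
        "Cookie": cookie_value,
    }


headers = build_headers(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    referer="https://movie.douban.com/",
    cookies={"sessionid": "abc123", "theme": "dark"},
)
print(headers["Cookie"])  # sessionid=abc123; theme=dark
```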
Common response status codes
- 200: the request succeeded and the server returned data normally
- 301: permanent redirect
- 404: the requested URL could not be found on the server
- 418: the request ran into the server's anti-crawler mechanism and the server refused to return data
- 500: internal server error, possibly a bug on the server
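A crawler usually branches on the class of the status code rather than on each individual value. The `classify` function below is a hypothetical helper sketching that idea. `http.HTTPStatus` is part of the standard library.

```python
from http import HTTPStatus


def classify(status):
    """Map a status code to a coarse category a crawler can act on."""
    if 200 <= status < 300:
        return "success"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "unknown"


for code in (200, 301, 404, 418, 500):
    print(code, classify(code))

print(HTTPStatus(404).phrase)  # Not Found
```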
The HTTP request and response process
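One way to watch a complete request and response cycle without touching a real website is to run a tiny server on the local machine. The sketch below is an illustration only: `HelloHandler` and `round_trip` are made-up names, and the server listens on a free local port chosen by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Server side: status line, headers, blank line, then the body.
        body = b"hello crawler"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def round_trip():
    """Start a local server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Client side: connect, send the request, read the response.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        resp = conn.getresponse()
        result = (resp.status, resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = round_trip()
print(status, body)
```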
Use your browser for website analysis
The website we want to analyze is: movie.douban.com
- Elements: used to analyze the structure of the page. Everything displayed on the page has a corresponding element under Elements.
- Console: log messages, warnings, and so on are printed here.
- Sources
- Network: all requests generated while the page is displayed
Headers: the header information of a request
Session and cookie
session: a session represents one conversation between the server and the browser. It is a server-side mechanism that stores the information a particular user's session needs, kept in memory, in a cache, or in a database.
cookie: a cookie is generated by the server and sent to the client, where it is stored.
How cookies work: 1) create the cookie 2) set and store the cookie 3) send the cookie 4) read the cookie
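The four steps can be simulated in one process with the standard library's `http.cookies`. This is a sketch under stated assumptions: the in-memory `sessions` dict, the `sessionid` cookie name, and the three helper functions are all invented for the example, and no network traffic is involved.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}  # server-side session store (in memory for this sketch)


def server_login(username):
    """Steps 1-2: create a session and return a Set-Cookie header value."""
    session_id = secrets.token_hex(8)
    sessions[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    return cookie["sessionid"].OutputString()


def client_store(set_cookie_value):
    """Client side: parse Set-Cookie and keep the name/value pairs."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return {name: morsel.value for name, morsel in jar.items()}


def server_read(cookie_header):
    """Step 4: read the Cookie header and look up the matching session."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "sessionid" not in cookie:
        return None
    return sessions.get(cookie["sessionid"].value)


set_cookie = server_login("alice")
stored = client_store(set_cookie)
# Step 3: the client sends the stored cookie back in the Cookie header.
cookie_header = "; ".join(f"{k}={v}" for k, v in stored.items())
print(server_read(cookie_header))  # {'user': 'alice'}
```

The session data stays on the server; the client only holds the identifier that points to it.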
Summary: network knowledge is essential for learning Python crawlers.
copyright notice
Author: Internet Lao Xin. Please include the original link when reprinting. Thank you.
https://en.pythonmana.com/2022/02/202202010306262893.html