
Python web crawler - Fundamentals (1)

2022-01-30 19:03:43 FizzH


Basic principles of web crawlers

1. Web page request process

(1) Request: every web page displayed in front of us goes through this step, in which the client sends an access request to the server. In Python, this requires importing the requests module:

import requests

(2) Response: after receiving the request, the server verifies that it is valid and then sends the response content back to the user. The client receives the server's response and displays the content. This is the familiar web request.
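The request/response cycle described above can be tried offline. The following is a minimal sketch, not the article's own code: it starts a throwaway local server and sends it one GET request. It uses the standard library's urllib instead of requests so that it runs without network access or extra packages; on the client side, requests.get(url) would play the same role. The names HelloHandler and fetch_once are hypothetical and exist only for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real web server."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)  # status line of the response
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # response body

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_once():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler bypasses any system proxy for this local demo.
    opener = build_opener(ProxyHandler({}))
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        with opener.open(url, timeout=5) as response:  # (1) send the request
            status = response.status                   # (2) read the response
            content_type = response.headers["Content-Type"]
            text = response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
    return status, content_type, text


status, content_type, text = fetch_once()
print(status, content_type)
print(text)
```

The status code and headers come back alongside the body, which is what a crawler usually checks before trying to parse the content.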

2. Web page request methods

(1) GET: the most common method. It is generally used to fetch or query resource information, and its parameters are passed in the URL.

(2) POST: passes parameters in the request body, so a request can carry much more data than with GET.
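To make the difference concrete, here is a small sketch that builds (but does not send) one request of each kind with the standard library. The address https://example.com/translate and the parameter values are placeholders chosen for illustration, not something taken from a real site.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"from": "zh", "to": "en"}
base = "https://example.com/translate"  # placeholder address

# GET: the parameters are appended to the URL as a query string.
get_req = Request(base + "?" + urlencode(params))

# POST: the same parameters travel in the request body instead.
post_req = Request(base, data=urlencode(params).encode("utf-8"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

With requests, the rough equivalents would be requests.get(base, params=params) and requests.post(base, data=params).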

2.1 Using GET to fetch data

The following uses GET to fetch the home page of Juejin (Nuggets). The code is as follows:

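import requests

url = 'https://juejin.cn/'
# requests.get() sends a GET request and returns a Response object
strhtml = requests.get(url)
# .text is the response body decoded as a string
print(strhtml.text)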

Running this prints the HTML source of the Juejin home page.

A later article will cover the relevant HTML background.

2.2 Using POST to fetch data

Continuing with the Juejin home page, press F12 to open the developer tools and click the "Network" tab.

Well, that backfired. There is no need to use the Juejin home page for this; let's try a translation website instead.

Look up "Juejin" on the translation website. In the Network tab, you can see that the request method is POST.

First, copy the URL from the Headers panel and assign it to url. The code is as follows:

url = "https://fanyi.baidu.com/v2transapi?from=zh&to=en"
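Note that the copied URL still carries a query string of its own, even though the request is a POST. As a quick check, this sketch splits the URL with the standard library to show which parts are the host, the path, and the query parameters:

```python
from urllib.parse import parse_qs, urlsplit

url = "https://fanyi.baidu.com/v2transapi?from=zh&to=en"

parts = urlsplit(url)
query = parse_qs(parts.query)  # each value is a list of strings

print(parts.netloc)  # host
print(parts.path)    # endpoint path
print(query)         # query-string parameters
```

Here from=zh and to=en stay in the URL, while the remaining parameters of the request are sent separately in the body.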

POST passes data differently from GET: GET can pass parameters in the URL, whereas POST parameters must be placed in the request body (shown as Form Data in the developer tools).

Build a dictionary from the request parameters listed under Form Data, then use the requests.post() method to submit the form data. The code is as follows:

import requests

# Form_data: dictionary built from the Form Data fields in the developer tools
response = requests.post(url, data=Form_data)
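Since the form fields themselves are not listed here, the following sketch shows the same idea against a local stand-in server instead of the real site. It is an illustration under assumptions: EchoFormHandler, post_form, the /translate path and the field names in form_data are all hypothetical, and a real site may expect additional fields. It uses the standard library's urllib so that it runs offline; requests.post(url, data=form_data) would send an equivalent form-encoded body.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode
from urllib.request import ProxyHandler, Request, build_opener


class EchoFormHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that echoes the submitted form as JSON."""

    def do_POST(self):
        # The form arrives in the request body, not in the URL.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("utf-8")
        form = {key: values[0] for key, values in parse_qs(raw).items()}
        body = json.dumps({"received": form}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def post_form(form_data):
    server = HTTPServer(("127.0.0.1", 0), EchoFormHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    opener = build_opener(ProxyHandler({}))  # bypass any system proxy
    try:
        request = Request(
            f"http://127.0.0.1:{server.server_port}/translate",
            data=urlencode(form_data).encode("utf-8"),  # a body makes it a POST
        )
        with opener.open(request, timeout=5) as response:
            return json.loads(response.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


form_data = {"from": "zh", "to": "en", "query": "hello"}  # hypothetical fields
result = post_form(form_data)
print(result)
```

The server only sees the fields because it reads them from the body, which is exactly where the developer tools display them under Form Data.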

Parse the response string as JSON, extract the data according to its structure, and print the translation result. The code is as follows:

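import json

# response.text is a JSON string; parse it into Python objects
content = json.loads(response.text)
# This key path must match the structure of the JSON the site returns
print(content['translateResult'][0][0]['tgt'])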
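A chained lookup like this raises an exception as soon as the response has a different shape, for example when the site returns an error object. The sketch below wraps the same key path in a small helper that returns None instead. The sample strings are made up for illustration and only mirror the key path used above; extract_translation is a hypothetical name.

```python
import json

# Hypothetical response text, shaped to match the key path used above.
sample_text = '{"translateResult": [[{"tgt": "hello"}]]}'


def extract_translation(text):
    """Return the first translated string, or None if the shape differs."""
    try:
        content = json.loads(text)
        return content["translateResult"][0][0]["tgt"]
    except (json.JSONDecodeError, KeyError, IndexError, TypeError):
        return None


print(extract_translation(sample_text))
print(extract_translation('{"error": 997}'))  # made-up error payload
```

Printing the whole parsed object once is usually the quickest way to find the right key path for a given site.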

That covers requesting web pages and handling their responses. If you know a better approach, please share!

copyright notice
Author: FizzH. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201301903411628.html
