
Python request module learning

2022-02-02 14:44:56 Wood sauce acridine

Reprinted from: https://blog.csdn.net/Byweiker/article/details/79234853

One. What is Requests?

Requests is an HTTP library written in Python, based on urllib, and released under the Apache2 License. It is more convenient than urllib, saves us a great deal of work, and fully meets the needs of HTTP testing.

Two. Installing the Requests library

Open a command prompt (Win+R, then enter cmd) and run:

Command: pip install requests

Then import it in your code: import requests
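
To confirm the installation worked, a quick check is to print the installed version:

# Verify that requests is importable and show its version.
import requests
print(requests.__version__)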

Three. Types of requests

import requests
# A convenience function exists for each HTTP verb:
requests.get('http://httpbin.org/get')
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
 

GET: Requests the specified page and returns the entity body.
HEAD: Requests only the headers of the page.
POST: Submits data to the server to create a new subordinate resource under the identified URI.
PUT: Replaces the content of the specified resource with the data sent from the client.
DELETE: Requests that the server delete the specified page.


GET and POST are the most common.
A GET request carries the submitted data in the URL's query string.
A POST request carries the submitted data in the request body.

(1) Basic GET request

import requests
response = requests.get('http://httpbin.org/get')
print(response.text)


Output

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "183.64.61.29", 
  "url": "http://httpbin.org/get"
}

(2) Parameterized GET request

import requests
response = requests.get("http://httpbin.org/get?name=germey&age=22")
print(response.text)

Output
{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "183.64.61.29", 
  "url": "http://httpbin.org/get?name=germey&age=22"
}


Or pass the query parameters via the params argument:

import requests
 
data = {
 'name': 'germey',
 'age': 22
}


response = requests.get("http://httpbin.org/get", params=data)
print(response.text)

(3) Parsing JSON

The return value can be displayed in JSON form:

import requests
import json
response = requests.get("http://httpbin.org/get")
print(type(response.text))
print(response.json())
print(type(response.json()))
print(json.loads(response.text))


Return value:

<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '183.64.61.29', 'url': 'http://httpbin.org/get'}
<class 'dict'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '183.64.61.29', 'url': 'http://httpbin.org/get'}
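
If the body is not valid JSON, response.json() raises an error; a minimal sketch of guarding against that (ValueError is the base class across requests versions):

import requests

response = requests.get('http://httpbin.org/get')
try:
    data = response.json()
except ValueError:
    # Raised when the body is not valid JSON
    # (newer requests versions raise a JSONDecodeError subclass of it).
    data = None
print(data)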

(4) Getting binary data

import requests
response = requests.get("https://github.com/favicon.ico")
print(type(response.text), type(response.content))
print(response.text)
print(response.content)
The return value is binary: response.content holds the raw bytes, while response.text is the decoded string.
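
A common use of response.content is saving the bytes to disk; a minimal sketch that stores the favicon fetched above:

import requests

# response.content holds raw bytes, suitable for writing to a binary file.
response = requests.get('https://github.com/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(response.content)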

(5) Adding headers


Some websites require browser-style request information; if you do not pass the right headers, the request fails, as follows:

import requests
response = requests.get("https://www.zhihu.com/explore")
print(response.text)
Return value:

<html><body><h1>500 Server Error</h1>
An internal server error occured.
</body></html>

When we add the headers:

import requests
headers = {
 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.get("https://www.zhihu.com/explore", headers=headers)
print(response.text)
The page source is now returned successfully (output omitted here).

(6) Basic POST request

import requests
data = {'name': 'germey', 'age': '22'}
response = requests.post("http://httpbin.org/post", data=data)
print(response.text)


Return value:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "18", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "183.64.61.29", 
  "url": "http://httpbin.org/post"
}
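
Besides form data, requests can send a JSON body directly via the json parameter, which sets Content-Type: application/json; a minimal sketch:

import requests

# json= serializes the dict to a JSON body instead of form encoding.
response = requests.post('http://httpbin.org/post', json={'name': 'germey', 'age': 22})
print(response.json()['json'])  # {'name': 'germey', 'age': 22}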

Four. Responses

Response attributes

import requests
response = requests.get('http://www.jianshu.com')
print(type(response.status_code), response.status_code)
print(type(response.headers), response.headers)
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)
 

Output

<class 'int'> 200
<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Thu, 01 Feb 2018 20:47:08 GMT', 'Server': 'Tengine', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'ETag': 'W/"9f70e869e7cce214b6e9d90f4ceaa53d"', 'Cache-Control': 'max-age=0, private, must-revalidate', 'Set-Cookie': 'locale=zh-CN; path=/', 'X-Request-Id': '366f4cba-8414-4841-bfe2-792aeb8cf302', 'X-Runtime': '0.008350', 'Content-Encoding': 'gzip', 'X-Via': '1.1 gjf22:8 (Cdn Cache Server V2.0), 1.1 PSzqstdx2ps251:10 (Cdn Cache Server V2.0)', 'Connection': 'keep-alive'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[<Cookie locale=zh-CN for www.jianshu.com/>]>
<class 'str'> https://www.jianshu.com/
<class 'list'> [<Response [301]>]
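
The history list above shows that the http:// URL was redirected (301) to https://. To inspect the redirect itself rather than follow it, requests offers allow_redirects=False; a small sketch, assuming the site still issues the same redirect:

import requests

# With allow_redirects=False, requests returns the 301 response itself
# instead of following it to the final page.
response = requests.get('http://www.jianshu.com', allow_redirects=False)
print(response.status_code)            # 301 (if the redirect is still in place)
print(response.headers.get('Location'))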


Checking status codes. The common status codes, together with the lookup names that requests defines for them:

100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '*'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),
 
# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect',
 'resume_incomplete', 'resume',), # These 2 to be removed in 3.0
 
# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),
 
# Server Error.
500: ('internal_server_error', 'server_error', '/o\\', '*'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication')
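
These aliases are exposed on requests.codes, so you can compare status codes by name instead of by number:

import requests

# Each alias in the table above resolves to its numeric code:
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
print(requests.codes.teapot)     # 418

response = requests.get('http://httpbin.org/get')
if response.status_code == requests.codes.ok:
    print('Request succeeded')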


Five. Advanced operations

(1) Upload files

With the Requests module, uploading a file is just as simple; the file type is handled automatically:

Example:

import requests
files = {'file': open('cookie.txt', 'rb')}
response = requests.post("http://httpbin.org/post", files=files)
print(response.text)


This test was run against the test site; the return value is as follows:

{
  "args": {}, 
  "data": "", 
  "files": {
    "file": "#LWP-Cookies-2.0\r\nSet-Cookie3: BAIDUID=\"D2B4E137DE67E271D87F03A8A15DC459:FG=1\"; path=\"/\"; domain=\".baidu.com\"; path_spec; domain_dot; expires=\"2086-02-13 11:15:12Z\"; version=0\r\nSet-Cookie3: BIDUPSID=D2B4E137DE67E271D87F03A8A15DC459; path=\"/\"; domain=\".baidu.com\"; path_spec; domain_dot; expires=\"2086-02-13 11:15:12Z\"; version=0\r\nSet-Cookie3: H_PS_PSSID=25641_1465_21087_17001_22159; path=\"/\"; domain=\".baidu.com\"; path_spec; domain_dot; discard; version=0\r\nSet-Cookie3: PSTM=1516953672; path=\"/\"; domain=\".baidu.com\"; path_spec; domain_dot; expires=\"2086-02-13 11:15:12Z\"; version=0\r\nSet-Cookie3: BDSVRTM=0; path=\"/\"; domain=\"www.baidu.com\"; path_spec; discard; version=0\r\nSet-Cookie3: BD_HOME=0; path=\"/\"; domain=\"www.baidu.com\"; path_spec; discard; version=0\r\n"
  }, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "909", 
    "Content-Type": "multipart/form-data; boundary=84835f570cfa44da8f4a062b097cad49", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "183.64.61.29", 
  "url": "http://httpbin.org/post"
}
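
One refinement worth making: open the file in a with block so the handle is closed once the upload finishes; a minimal sketch:

import requests

# The context manager closes the file handle after the request completes.
with open('cookie.txt', 'rb') as f:
    response = requests.post('http://httpbin.org/post', files={'file': f})
print(response.status_code)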

(2) Getting cookies

When you need the cookies, access response.cookies directly (response is the object returned by the request):

import requests
response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)

Output:

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315

(3) Session maintenance and simulated login

Cookie Version 0 specifies that special characters such as spaces, square brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, @ signs, colons, and semicolons cannot be used in cookie content.

If a response contains cookies, you can access them quickly:

import requests
r = requests.get('http://www.google.com.hk/')
print(r.cookies['NID'])
print(tuple(r.cookies))


To send your own cookies to the server, use the cookies parameter:

import requests
url = 'http://httpbin.org/cookies'
cookies = {'testCookies_1': 'Hello_Python3', 'testCookies_2': 'Hello_Requests'}
r = requests.get(url, cookies=cookies)
print(r.json())
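
For the session maintenance mentioned in this section's title, requests.Session() keeps cookies (and other settings) across requests automatically, which is the usual basis for a simulated login; a minimal sketch:

import requests

# A Session persists cookies between requests, so a cookie set by one
# request is sent automatically with the next.
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.json())  # {'cookies': {'sessioncookie': '123456789'}}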

(4) Certificate validation

Because 12306's certificate was not issued by a trusted authority, requesting its site produces the error below (a browser would flag the certificate as well):

import requests
response = requests.get('https://www.12306.cn')
print(response.status_code)
Return value: an SSLError (certificate verification failed).

To access such a site anyway, the code is as follows:

import requests
from requests.packages import urllib3
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
Setting verify=False is enough; the returned status code is 200.

urllib3.disable_warnings() suppresses the insecure-request warning messages.
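
Instead of disabling verification, verify can also point at a CA bundle file; a sketch for illustration (the path below is a hypothetical placeholder):

import requests

# verify may name a CA bundle file to validate against.
# The path here is a placeholder, not a real bundle.
response = requests.get('https://www.12306.cn',
                        verify='/path/to/ca-bundle.crt')
print(response.status_code)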

(5) Proxy settings

When crawling, the server sometimes blocks the crawler. The main countermeasures are to slow down the request rate or to access through proxy IPs, as follows:

import requests
proxies = {
 "http": "http://127.0.0.1:9743",
 "https": "https://127.0.0.1:9743",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
Proxy IPs can be scraped from the internet or purchased (e.g. on Taobao).

If the proxy requires a username and password, change the dictionary to the following:

proxies = {
    "http": "http://user:password@127.0.0.1:9999"
}

If your proxy works over SOCKS, first run pip install "requests[socks]", then:

proxies = {
    "http": "socks5://127.0.0.1:9999",
    "https": "socks5://127.0.0.1:8888"
}

(6) Timeouts

Requests to some sites may hang or time out; setting a timeout solves this:

import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
On a normal visit the status code 200 is printed; if the request takes longer than 0.5 s, 'Timeout' is printed instead.
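
timeout also accepts a (connect, read) tuple to bound the two phases separately:

import requests

# 3.05 s to establish the connection, 27 s to receive the response.
response = requests.get('http://httpbin.org/get', timeout=(3.05, 27))
print(response.status_code)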

(7) Authentication settings

If you encounter a site that requires authentication, you can use the requests.auth module:

import requests
from requests.auth import HTTPBasicAuth
response = requests.get("http://120.27.34.24:9001/",auth=HTTPBasicAuth("user","123"))
print(response.status_code)

Or, more concisely, pass a (user, password) tuple directly:

import requests
response = requests.get("http://120.27.34.24:9001/",auth=("user","123"))
print(response.status_code)

(8) exception handling

On network problems (such as a DNS failure or a refused connection), Requests raises a ConnectionError exception.

On a rare invalid HTTP response, Requests raises an HTTPError exception.

If the request times out, a Timeout exception is raised.

If the request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
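
A sketch tying these together (the except order matters: more specific exceptions first, with RequestException as the catch-all):

import requests
from requests.exceptions import (ConnectionError, Timeout,
                                 TooManyRedirects, RequestException)

try:
    response = requests.get('http://httpbin.org/get', timeout=5)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
except Timeout:
    print('The request timed out')
except ConnectionError:
    print('Network problem (DNS failure, refused connection, ...)')
except TooManyRedirects:
    print('Exceeded the maximum number of redirects')
except RequestException as e:
    print('Other request error:', e)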
