Huxiu 24-hour channel auto-liker: a case study with a handy crawler technique (Python crawler lesson 7-9)
2022-01-31 15:56:50 【Dream eraser】
Many platforms have a like button. The approach in this post carries over to most of them, so once you have it down you can build your own auto-liker. The target of this case is the 24-hour channel on Huxiu.
Analysis before crawling
Analyzing the source of the trending data
The analysis stage takes some time and is a little tricky, but once you understand it you will have learned a very common crawler-writing technique.
Scroll to the bottom of the page in the browser and capture the request that fires: https://moment-api.huxiu.com/web-v2/moment/feed
The request method is POST.
The request carries three values. The last two appear to be fixed. The first, last_dateline, is the one whose source we have to track down to solve this case.
The returned data is deeply nested and looks complicated, but it is plain JSON, so normal parsing is enough.
The key point is that the returned data itself provides the next last_dateline value. We can therefore keep requesting in a loop, feeding the value from each response into the next request, until a request fails to return data.
At this point the analysis of the data source is complete. The conclusions are:
- The request method is POST;
- The request address is moment-api.huxiu.com/web-v2/mome…
- The request parameters are last_dateline, platform, and is_ai;
- For the first request, last_dateline can be set to a fixed value;
- Looping over the requests retrieves all the trending items.
Analyzing the like interface
Click the like button and capture the request: https://moment-api.huxiu.com/web/moment/agree. This is the address of the like request.
The request method is again POST.
The moment_id parameter in the request data should be the moment_id of one of the trending items obtained above.
A successful request returns a short JSON response.
At this point the analysis of the like interface is complete. The conclusions are:
- The request method is POST;
- The request address is moment-api.huxiu.com/web/moment/…
- The request parameters are moment_id and platform;
- When testing the code, you can start with a single request and hard-code the moment_id value.
Writing the crawler
Introduction to POST requests in requests
Now for the actual crawler. First, let's look at how the requests library sends a POST request.
requests supports several ways of sending a POST, as follows.
The plain POST is the most common form: simply pass a dictionary to the data parameter.
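import requests
# httpbin.org is a site built for testing HTTP requests
url = 'http://httpbin.org/post'
data = {'key1': 'value1', 'key2': 'value2'}
r = requests.post(url, data=data)
print(r)          # the response object, e.g. <Response [200]>
print(r.text)     # response body decoded as text
print(r.content)  # response body as raw bytes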
If the output includes <Response [200]>, the request succeeded.
Passing a list of tuples to data
This is especially useful when several fields in the form share the same key.
import requests
# httpbin.org is a site built for testing HTTP requests
url = 'http://httpbin.org/post'
# Two values for the same key, which a dictionary cannot hold
payload = (('key1', 'value1'), ('key1', 'value2'))
r = requests.post(url, data=payload)
print(r.text)
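To see what the form body looks like in the two cases above, it can be reproduced with the standard library. This is only an illustration: requests does its own encoding internally, and urlencode is used here as a stand-in that produces the same application/x-www-form-urlencoded format.

```python
from urllib.parse import urlencode

# A dictionary becomes one key=value pair per entry.
dict_body = urlencode({'key1': 'value1', 'key2': 'value2'})
print(dict_body)

# A list of tuples keeps both values for the repeated key.
tuple_body = urlencode([('key1', 'value1'), ('key1', 'value2')])
print(tuple_body)
```

The second body carries key1 twice, which is exactly what the tuple form is for.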
Passing a JSON string
When the data should not be form-encoded, pass a string rather than a dictionary.
import requests
import json
# httpbin.org is a site built for testing HTTP requests
url_json = 'http://httpbin.org/post'
# dumps: serialize a Python object into a JSON string
data_json = json.dumps({'key1': 'value1', 'key2': 'value2'})
r_json = requests.post(url_json, data=data_json)
print(r_json.text)
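One detail worth knowing: when a string is passed through data, requests sends it as-is and does not label it as JSON, so some servers want an explicit Content-Type header. The sketch below only prepares the arguments. build_json_post_kwargs is a hypothetical helper name used for illustration, not part of requests.

```python
import json


def build_json_post_kwargs(payload):
    """Return keyword arguments for requests.post() that send payload as JSON."""
    return {
        "data": json.dumps(payload),  # the body is a JSON string
        "headers": {"Content-Type": "application/json"},  # label it as JSON
    }


kwargs = build_json_post_kwargs({'key1': 'value1', 'key2': 'value2'})
print(kwargs["data"])
# Usage (needs network access):
#   r = requests.post('http://httpbin.org/post', **kwargs)
```

requests also accepts a json= argument, which serializes the object and sets the header in one step.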
Uploading files
This goes a little beyond the scope of this crawler course, so we skip it for now.
Note that in requests, the practical difference between POST and GET is the data parameter. Now we can start coding.
Getting the data to like
First, fetch one page of items to like. The code is as follows:
import requests

def get_feed():
    # Build the request payload
    payload = {
        "last_dateline": "1605591900",
        "platform": "www",
        "is_ai": 0
    }
    r = requests.post(
        "https://moment-api.huxiu.com/web-v2/moment/feed", data=payload)
    data = r.json()
    # data["success"] tells whether the request succeeded
    # The items are deeply nested, so unpack them carefully
    datalist = data["data"]["moment_list"]["datalist"][0]["datalist"]
    print(datalist)
Testing shows that for the first request last_dateline can also be left empty. The loop for the subsequent pages is left to you. The crawler course is coming to an end, and the rest is for you to build out with basic Python syntax.
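For readers who want a starting point for that loop, here is one possible sketch. It assumes the next cursor is found at data["data"]["moment_list"]["last_dateline"] and that each item carries a moment_id key. Neither location is confirmed in this post, so check them against the real response. iter_moments and fetch_page are hypothetical names, and the fake pages exist only so the sketch runs without network access.

```python
def iter_moments(fetch_page, first_dateline="", max_pages=50):
    """Yield feed items page by page.

    fetch_page(last_dateline) must return the parsed JSON of one feed request.
    """
    last_dateline = first_dateline
    for _ in range(max_pages):  # hard upper bound instead of an endless loop
        page = fetch_page(last_dateline)
        if not page.get("success"):
            break
        moment_list = page["data"]["moment_list"]
        groups = moment_list.get("datalist") or []
        if not groups:
            break
        for group in groups:
            yield from group.get("datalist", [])
        # ASSUMPTION: the cursor for the next page is stored under this key.
        next_dateline = moment_list.get("last_dateline")
        if not next_dateline or next_dateline == last_dateline:
            break
        last_dateline = next_dateline


# Fake two-page feed (made-up data) keyed by the cursor that requests it.
_pages = {
    "": {"success": True, "data": {"moment_list": {
        "last_dateline": "100",
        "datalist": [{"datalist": [{"moment_id": 1}, {"moment_id": 2}]}]}}},
    "100": {"success": True, "data": {"moment_list": {
        "last_dateline": "",
        "datalist": [{"datalist": [{"moment_id": 3}]}]}}},
}
moment_ids = [m["moment_id"] for m in iter_moments(lambda d: _pages[d])]
print(moment_ids)
```

In real use, fetch_page would wrap the requests.post call from get_feed and return r.json(). Passing the fetcher in as an argument keeps the paging logic testable offline.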
Writing the like code
I ran into a small problem while writing the like code, and it involves cookies, an old and stubborn problem.
import requests

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}

def agree(moment_id):
    # Payload for the like interface
    data = {
        "moment_id": moment_id,
        "platform": "www",
    }
    r = requests.post(
        "https://moment-api.huxiu.com/web/moment/agree", data=data, headers=headers)
    res_data = r.json()
    print(res_data)
    # print(res_data["success"] is True)

if __name__ == "__main__":
    agree(130725)
Running the code above gives the following response:
{'success': False, 'message': ' Please turn on COOKIE'}
Checking the request captured in the developer tools again does turn up a distinctive cookie: huxiu_analyzer_wcy_id. The problem seems to be here. Obtaining this value is more troublesome, so we need a way to capture it.
Here is how I (Eraser) solved it. I hope the steps teach you a general way of tracking down where a cookie comes from.
Open Chrome in incognito mode and go directly to https://www.huxiu.com/moment/. Incognito mode guarantees that no cookie is cached.
Next, search for huxiu_analyzer_wcy_id in the developer tools.
The search shows where the cookie is set: it comes from the response of the checklogin interface.
The rest of the solution is crude: see how checklogin is called, request it, save the cookies from its response, and then call the like interface.
Finally, modify the like code to finish the task:
def agree(moment_id):
    # A Session keeps the cookies it receives across requests
    s = requests.Session()
    # Per the analysis above, checklogin sets the required cookie in its response
    r = s.get("https://www-api.huxiu.com/v1/user/checklogin", headers=headers)
    jar = r.cookies
    print(jar.items())
    data = {
        "moment_id": moment_id,
        "platform": "www",
    }
    r = s.post(
        "https://moment-api.huxiu.com/web/moment/agree", data=data, headers=headers, cookies=jar)
    res_data = r.json()
    print(res_data)
    # print(res_data["success"] is True)
Running the code above returns the message confirming that the like succeeded.
{'success': True, 'message': ' I like it '}
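To like several items in one run, the same idea can be wrapped in a loop that reuses a single session. The following is a rough sketch, not a finished tool: like_many is a hypothetical name, the session argument is expected to behave like requests.Session, and the one-second pause is an arbitrary choice to avoid hammering the server.

```python
import time


def like_many(session, moment_ids, headers=None, delay=1.0):
    """Like each moment through one shared session; return the ids that succeeded."""
    # The first request lets the server set its cookies on the session.
    session.get("https://www-api.huxiu.com/v1/user/checklogin", headers=headers)
    liked = []
    for moment_id in moment_ids:
        r = session.post(
            "https://moment-api.huxiu.com/web/moment/agree",
            data={"moment_id": moment_id, "platform": "www"},
            headers=headers,
        )
        # success is a boolean in the responses shown above, not the string "true"
        if r.json().get("success") is True:
            liked.append(moment_id)
        time.sleep(delay)  # pause between requests
    return liked
```

A call would look like like_many(requests.Session(), moment_ids, headers=headers). Because a session stores the cookies it receives, they do not need to be passed by hand on each request.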
Closing words
This post mainly introduces POST requests with the requests library, and along the way shows a general way of obtaining cookies. One reminder: when writing crawlers, the search in the developer tools comes in handy often, especially when dealing with JS encryption.
copyright notice
author[Dream eraser],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201311556460821.html