Python web crawler - crawling cloud music review (4)
2022-01-31 01:21:43 【FizzH】
This is day 4 of my participation in the November Update Challenge. For event details, see: 2021 Last Update Challenge.
Let's first review yesterday's article: 1. locate the target; 2. download the web page; 3. set the loading speed and find the target file. Now open any song on NetEase Cloud Music:
https://music.163.com/#/song?id=25723157
Each song's id is the string of digits at the end of the URL. So in principle we only need to collect the songs we want and open their play pages to get their id numbers. From there it is easy to write a loop that crawls the comments of many songs.
import requests


def get_comments(url):
    # Browser-like headers; the referer points back at the site itself
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36',
        'referer': 'http://music.163.com/'
    }
    # Encrypted form fields that the server expects
    params = "EuIF/+GM1OWmp2iaIwbVdYDaqODiubPSBToe5EdNp6LHTLf+aID/dWGU6bHWXS0jD9pPa/oY67TOwiicLygJ+BhMkOX/J1tZMhq45dcUIr6fLuoHOECYrOU6ySwH4CjxxdbW3lpVmksGEdlxbZevVPkTPkwvjNLDZHK238OuNCy0Csma04SXfoVM3iLhaFBT"
    encSecKey = "db26c32e0cd08a11930639deadefda2783c81034be6445ca8f4fbedd346e1f9567375083aeb1a85e6ad6d9ae4532a49752c2169db8bcc04d38a79f9bed7facea42ee23f1b33538c34f82741318d9b4b846663b53b0b808dd0499dccfbc6c61fbf180c6fb24b1c2dd3c2c450ce09917d74be9424dab836fd2e671988ffbc6ae1b"
    data = {
        "params": params,
        "encSecKey": encSecKey
    }
    # The song id is the number after '=' in the play-page URL
    name_id = url.split('=')[1]
    target_url = "http://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token=".format(name_id)
    res = requests.post(target_url, headers=headers, data=data)
    return res
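The `url.split('=')[1]` step in `get_comments` works for the play-page URL shown earlier, but it would pick up extra text if the URL carried more parameters. Below is a slightly more defensive sketch, assuming play-page URLs keep the `id` parameter after the `#` as in the example. The helper names `extract_song_id` and `build_comment_urls` are made up for illustration and are not part of the original article.

```python
from urllib.parse import urlparse, parse_qs

# Same endpoint pattern as target_url in get_comments()
COMMENT_URL = "http://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token="


def extract_song_id(url):
    """Return the value of the `id` parameter of a song play-page URL."""
    parsed = urlparse(url)
    # In 'https://music.163.com/#/song?id=...' the query sits inside the
    # fragment, so fall back to the fragment when there is no normal query.
    query = parsed.query or urlparse(parsed.fragment).query
    ids = parse_qs(query).get("id")
    if not ids:
        raise ValueError("no song id found in {!r}".format(url))
    return ids[0]


def build_comment_urls(song_urls):
    """Map each play-page URL to the comment endpoint for that song."""
    return [COMMENT_URL.format(extract_song_id(u)) for u in song_urls]


if __name__ == "__main__":
    for target in build_comment_urls(["https://music.163.com/#/song?id=25723157"]):
        print(target)
```

Looping over the returned list and POSTing to each URL is then enough to collect comments for several songs.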
params and encSecKey are the form fields the server expects, and both are encrypted. To see how they are produced, you need the core.js file. Because that JS file is huge, we can search it for the keyword encSecKey to locate the key code segment. Also, the two parameters POSTed above can be reused for other songs.
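Searching a huge script by keyword can also be done outside the browser. This is a minimal sketch, assuming core.js has been saved locally as text. The function name `find_keyword` and the stand-in string in the demo are illustrative only and are not the real contents of core.js.

```python
def find_keyword(source, keyword, context=60):
    """Yield (offset, snippet) for every occurrence of keyword in source."""
    start = source.find(keyword)
    while start != -1:
        lo = max(0, start - context)
        hi = min(len(source), start + len(keyword) + context)
        yield start, source[lo:hi]
        start = source.find(keyword, start + 1)


if __name__ == "__main__":
    # Stand-in text; in practice, read the saved core.js file into this variable
    js = "var a = 1; data.encSecKey = x; send(data);"
    for offset, snippet in find_keyword(js, "encSecKey", context=10):
        print(offset, snippet)
```

Each snippet shows the characters around a hit, which is usually enough to decide which occurrence is worth opening in the debugger.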
Once the comments are found, we need to extract the key data. As we can see, the crawled data is in JSON format. JSON is a lightweight data-interchange format that is often used for network transmission.
The json.loads() method restores the string to a Python data structure:
comments_json = json.loads(res.text)
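To see what `json.loads()` returns without touching the network, here is a small sketch on a made-up string. The sample contains only the fields used in this article; a real response has more keys, and the nickname and comment text are invented.

```python
import json

# Made-up sample shaped like the fields used in this article
sample_text = '{"hotComments": [{"user": {"nickname": "Alice"}, "content": "Great song"}]}'

comments_json = json.loads(sample_text)

# A JSON object becomes a Python dict, a JSON array becomes a Python list
print(type(comments_json).__name__)
print(type(comments_json["hotComments"]).__name__)
print(comments_json["hotComments"][0]["user"]["nickname"])
```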
In the resulting dictionary, the value of the "hotComments" key holds all the hot comments!
import json


def get_hot_comments(res):
    # Parse the response body into a Python dict
    comments_json = json.loads(res.text)
    hot_comments = comments_json['hotComments']
    # Write one "nickname / content" entry per hot comment
    with open('hot_comments.txt', 'w', encoding='utf-8') as file:
        for each in hot_comments:
            file.write(each['user']['nickname'] + ':\n\n')
            file.write(each['content'] + '\n')
            file.write("---------------------------------------\n")
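`get_hot_comments` raises a `KeyError` if the response has no "hotComments" key, which can happen when the request is rejected. A more tolerant variant might look like the sketch below. It only builds the text, so it can be tried on sample data; the function name `format_hot_comments` is hypothetical and the output layout copies the one used above.

```python
SEPARATOR = "---------------------------------------\n"


def format_hot_comments(comments_json):
    """Build the text that get_hot_comments() writes, skipping incomplete entries."""
    parts = []
    for each in comments_json.get("hotComments", []):
        nickname = each.get("user", {}).get("nickname")
        content = each.get("content")
        if nickname is None or content is None:
            continue  # entry is missing a field we need
        parts.append(nickname + ":\n\n" + content + "\n" + SEPARATOR)
    return "".join(parts)


if __name__ == "__main__":
    # Made-up data in the same shape as the parsed response
    sample = {"hotComments": [{"user": {"nickname": "Alice"}, "content": "Great song"}]}
    print(format_hot_comments(sample), end="")
```

Keeping the formatting separate from the file writing also makes it easy to reuse the same text for printing or logging.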
With that, combining the previous parts gives us the result we want!
In addition, the NetEase Cloud Music API endpoint has the following form: music.163.com/api/v1/reso…
Note that if the crawler sends requests too quickly or too many times, the server will block your IP. A more complete crawler project therefore needs proxies and an IP pool.
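One possible shape for this is sketched below, under several assumptions: the proxy addresses are placeholders, the delay bounds are arbitrary, and `ProxyRotator` and `polite_post` are names invented here. The `session` argument is expected to be a `requests.Session` (or any object with a compatible `post` method), whose `proxies` and `timeout` keyword arguments are used.

```python
import itertools
import random
import time


class ProxyRotator:
    """Hand out proxies from a fixed pool in round-robin order."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("proxy pool is empty")
        self._pool = itertools.cycle(proxy_urls)

    def next_proxies(self):
        proxy = next(self._pool)
        # requests expects a mapping from URL scheme to proxy URL
        return {"http": proxy, "https": proxy}


def polite_post(session, url, rotator, min_delay=1.0, max_delay=3.0, **kwargs):
    """Sleep for a random interval, then POST through the next proxy in the pool."""
    time.sleep(random.uniform(min_delay, max_delay))
    kwargs.setdefault("timeout", 10)
    return session.post(url, proxies=rotator.next_proxies(), **kwargs)


if __name__ == "__main__":
    # Placeholder addresses; replace them with proxies you actually control
    rotator = ProxyRotator(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
    print(rotator.next_proxies())
    print(rotator.next_proxies())
```

The random pause spreads requests out over time, and the rotation spreads them across addresses, so no single IP sends a burst of requests.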
Also, if you would rather not get stuck on cracking the POST form parameters, you can try python + selenium + PhantomJS to simulate user operations: click to turn pages, then parse the page elements directly. This achieves "if you can see it, you can crawl it", but it is slightly less efficient.
Copyright notice
Author: FizzH. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201310121410107.html