Which cat is the most popular? Crawling cat pictures from all over the web with Python. Which one is your favorite?
2022-01-30 23:43:42 【White and white I】
Enjoying cats with code! This article is taking part in the 【Meow Star Essay Solicitation Activity】.
Preface
Acquisition target
Web resource address: image.baidu.com/search/inde…
Tool preparation
Development tool: PyCharm
Development environment: Python 3.7, Windows 11
Toolkit used: requests
Project analysis
To build a crawler you first need a clear collection target. Here the goal is to collect all of the picture information on the current web page. Once the target is set, plan out the coding process. A crawler basically follows four steps (see the sketch after this list):
- Step 1: find the address of the web resource
- Step 2: send a network request to that address
- Step 3: extract the required data from the response
- The usual extraction methods are regular expressions, XPath, bs4, jsonpath and CSS selectors
- Step 4: save the data
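As a rough sketch of these four steps (the address and the "items" field below are placeholders for illustration, not the actual interface used later in this article):

import requests

# Step 1: the data address found with the browser tools (placeholder address)
url = "https://example.com/api/data"
# Step 2: send a network request to that address, disguised as a browser
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
# Step 3: extract the wanted information (here a hypothetical "items" list in a JSON body)
items = response.json().get("items", [])
# Step 4: save the information to a local file
with open("result.txt", "w", encoding="utf-8") as f:
    for item in items:
        f.write(str(item) + "\n")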
Step 1: Find the data address
Data is generally loaded in one of two ways: statically or dynamically. The data on this page keeps loading as the page is refreshed, so we can conclude that it is loaded dynamically. Dynamic data has to be located with the browser's capture tool: right-click and choose Inspect, or press the F12 shortcut, then find the address the data is loaded from.
Once you have found the matching request, click it and open the Preview tab in the panel that appears. The preview shows the data as it is presented to us, which is convenient when there is a lot of it. The data is obtained through a URL that is listed in the request, so the next step is to send a network request to that address.
Step 2: Send the network request in code
There are many toolkits for sending requests; at the introductory stage the requests toolkit is used most often. requests is a third-party package and needs to be installed first: pip install requests. One thing to note is that we are sending the request from code, and the web server uses the HTTP request headers to tell browsers apart from crawlers. Crawlers are not welcome, so the crawler code needs to disguise itself by sending headers along with the request. The headers are passed as a dictionary of key-value pairs, and the ua (User-Agent) field is the all-important "ID card" of the browser.
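For example, a minimal disguised request to the image interface could look like the sketch below. Only a trimmed-down set of query parameters is shown here, and whether that reduced set is enough for the live interface is not guaranteed; the complete query string used in this article is kept in the source code further down.

import requests

# A browser-style User-Agent so the server does not treat the request as a bare crawler
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36",
}

url = "https://image.baidu.com/search/acjson"
# A few of the interface's query parameters, passed as a dictionary
params = {"tn": "resultjson_com", "word": "猫咪", "pn": 0, "rn": 30}

response = requests.get(url, params=params, headers=headers)
print(response.status_code)  # 200 means the server accepted the request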
Step 3: Extract the data
The data acquired here is dynamic data, and dynamic data is generally JSON. JSON data can be extracted directly with jsonpath, or converted into a Python dictionary and read from there; either way, the ultimate goal of this step is to extract the url address of each image (a sketch follows below).
After extracting the new addresses, a request has to be sent to each of them again: what we want is the picture data, and the links only point to it. Sending a request to an image address returns the binary data of the picture.
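As a sketch of the dictionary approach, assuming the response body parses cleanly as JSON and the picture entries sit in a top-level "data" list (the source code below sidesteps parsing by matching the text with a regular expression instead):

import requests

headers = {"User-Agent": "Mozilla/5.0"}
# Shortened interface address for illustration; the full query string is in the source below
url = "https://image.baidu.com/search/acjson?tn=resultjson_com&word=猫咪&pn=0&rn=30"
response = requests.get(url, headers=headers)

# Convert the JSON text into a dictionary and collect every thumbnail address
data = response.json().get("data", [])
image_urls = [item["thumbURL"] for item in data if item.get("thumbURL")]
print(image_urls)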
Step 4: Save the data
Once the data is obtained it needs to be stored. Decide where you want the data to go and choose the write mode: the data we get back is binary, so the file must be opened in wb mode, and then the downloaded picture bytes are simply written into it. The file suffix must match the picture's suffix; you can name the file after the title, and here the last part of the URL is used as the name (a minimal sketch follows below).
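A minimal sketch of this save step, where new_url stands for one image address taken from the previous step (the value below is a placeholder):

import requests

# One image address extracted in step 3 (placeholder value)
new_url = "https://example.com/cats/abc123.jpg"
# Request the picture itself; .content gives the raw binary data
result = requests.get(new_url).content
# Use the tail of the URL as the file name, which keeps the picture's own suffix
name = new_url.split("/")[-1]
# wb: write the binary data to a file
with open(name, "wb") as f:
    f.write(result)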
Full source code
import os        # Used to create the save directory if it does not exist
import re        # Regular expression matching toolkit
import requests  # Toolkit for sending network requests

# Request headers: the User-Agent makes the request look like a normal browser
headers = {
    # The user agent
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36",
    # Where the request claims to come from (optional)
    # "Referer": "https://tupian.baidu.com/search/index",
    # "Host": "tupian.baidu.com"
}

key = input("Please enter the keyword of the pictures to download: ")
# The directory to save the pictures into
path = r"picture/"
os.makedirs(path, exist_ok=True)

# Request the data interface page by page (30 pictures per page)
for i in range(5, 50):
    url = ("https://image.baidu.com/search/acjson?tn=resultjson_com&logid=12114112735054631287"
           "&ipn=rj&ct=201326592&is=&fp=result&fr=&word=" + key + "&queryWord=" + key +
           "&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=&hd=&latest=&copyright="
           "&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&expermode=&nojc=&isAsync="
           "&pn=" + str(i * 30) + "&rn=30&gsm=78&1635836468641=")
    # Send a request to the interface
    response = requests.get(url, headers=headers)
    print(response.text)
    # Regular expression: pull every thumbURL out of the returned JSON text
    url_list = re.findall('"thumbURL":"(.*?)",', response.text)
    print(url_list)
    # Loop over the picture URLs
    for new_url in url_list:
        # Send a request to the picture itself to get its binary content
        result = requests.get(new_url, headers=headers).content
        # Split the URL and use its last part as the picture name
        name = new_url.split("/")[-1]
        print(name)
        # Write the binary data to a file
        with open(path + name, "wb") as f:
            f.write(result)
I am White and white i, a programmer who likes to share knowledge.
If you are interested, you can follow my official account: White and white Python. 【Thank you very much for your likes, favorites, follows and comments; the one-click triple support is appreciated】
Copyright notice
Author: [White and white I]. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201302343408305.html