Python: crawling and saving images from a weather website with multiple threads
2022-01-30 20:34:57 【Xiaosheng Fanyi】
This post is day 1 of my participation in the November writing challenge; see the activity details: 2021 Last Writing Challenge.
1.1 Task
Pick a website and crawl all the images on it, for example the China Weather website (www.weather.com.cn). Implement the crawl twice, once single-threaded and once multithreaded. (Limit the number of images crawled to the last three digits of your student number.)
Output: print the URL of each downloaded image to the console, save the downloaded images in an images subfolder, and provide a screenshot.
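The reason to compare the two versions is that image downloads are I/O-bound: most of the time is spent waiting on the network, and Python threads can wait at the same time even though the GIL stops them from executing Python code in parallel. A rough sketch, where time.sleep stands in for a real download (an assumption, so no network access is needed), makes the effect visible:

```python
import threading
import time


def fake_download(index, results):
    """Stand-in for a network download: sleeps to simulate waiting on I/O."""
    time.sleep(0.05)
    results.append(index)  # list.append is safe to call from several threads in CPython


def run_sequential(n):
    """Run n simulated downloads one after another; return (seconds, results)."""
    results = []
    start = time.perf_counter()
    for i in range(n):
        fake_download(i, results)
    return time.perf_counter() - start, results


def run_threaded(n):
    """Run n simulated downloads in n threads and wait for all of them."""
    results = []
    threads = [threading.Thread(target=fake_download, args=(i, results)) for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results
```

With 10 simulated downloads of 0.05 s each, the sequential run takes about half a second while the threaded run takes roughly one sleep interval; real timings depend on the server and the network.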
1.2 Approach
1.2.1 Send a request
- Construct the request headers
import re
import urllib.request  # "import urllib" alone does not guarantee urllib.request is loaded

import requests

# Browser-like headers (copied from desktop Chrome) so the site serves its normal page
headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}
url = "http://www.weather.com.cn/"
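Before anything is sent, the Request object can be inspected to confirm the headers are attached. A minimal sketch with a shortened, illustrative header set (an assumption, not the full set above); note that urllib stores header names through str.capitalize(), so 'User-Agent' has to be looked up as 'User-agent':

```python
import urllib.request

# Hypothetical, shortened header set for illustration
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}

request = urllib.request.Request("http://www.weather.com.cn/", headers=headers)

# urllib normalizes names with str.capitalize(), e.g. 'User-Agent' -> 'User-agent'
for name, value in request.header_items():
    print(f"{name}: {value}")

print(request.get_header('User-agent'))  # the key must use urllib's capitalization
print(request.get_method())              # GET, because no request body was given
```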
- Send a request
request = urllib.request.Request(url, headers=headers)
r = urllib.request.urlopen(request)
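r.read().decode() assumes the page is UTF-8. If a page is served in another encoding such as GBK, decode() raises UnicodeDecodeError, and inside a try/except crawl loop that page is silently lost. A hedged sketch of a helper (fetch_html is a hypothetical name) that uses the charset declared in the Content-Type header and falls back to UTF-8, replacing undecodable bytes instead of raising:

```python
import urllib.request


def fetch_html(url, headers=None, timeout=10):
    """Fetch a page and decode it with the charset the server declares.

    Assumptions: UTF-8 is used when no charset is declared, and undecodable
    bytes are replaced rather than raising, so one odd page does not abort a crawl.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')
```

Usage: html = fetch_html("http://www.weather.com.cn/", headers)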
1.2.2 Parse the page
Read and decode the page, and strip out the newline characters so that the regular expressions used later to find images can match tags that span several lines.
html = r.read().decode().replace('\n','')
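Removing newlines is one way to let a pattern match across line breaks; the re.S (re.DOTALL) flag, which the patterns in this article already pass, does the same job on its own because it lets '.' match '\n'. A small sketch on made-up markup (an assumption) shows the difference:

```python
import re

# Sample markup (made up for illustration) where an <img> tag spans several lines
html = '<div>\n<img class="logo"\n     src="http://example.com/a.png">\n</div>'

pattern = r'<img.*?src="(.*?)"'

# Without re.S, '.' does not match '\n', so the split tag is missed
print(re.findall(pattern, html))                    # []
# Option 1: remove newlines first, as the article does
print(re.findall(pattern, html.replace('\n', '')))  # ['http://example.com/a.png']
# Option 2: keep the newlines and let '.' match them via re.S
print(re.findall(pattern, html, re.S))              # ['http://example.com/a.png']
```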
1.2.3 Extract the links and images
Use regular expressions to first collect all the a tags (links) on the home page, then crawl every linked page and collect all the images on it.
# Collect the href of every <a> tag (the old pattern required a space after the closing quote)
urlList = re.findall(r'<a href="(.*?)"', html, re.S)
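The href values come back exactly as written in the page, so many are relative paths, fragments, or javascript: links that cannot be opened directly. A sketch of a helper (extract_links is a hypothetical name) that resolves them with urllib.parse.urljoin, drops fragments and non-http(s) schemes, and removes duplicates while keeping page order:

```python
import re
from urllib.parse import urldefrag, urljoin, urlparse


def extract_links(html, base_url):
    """Return the <a href> targets in html as absolute http(s) URLs, deduplicated."""
    links = []
    for href in re.findall(r'<a\s[^>]*?href="(.*?)"', html, re.S):
        # Resolve relative links against the page URL and drop any '#fragment'
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        # urlopen cannot fetch javascript: or mailto: links, so keep http(s) only
        if urlparse(absolute).scheme in ('http', 'https') and absolute not in links:
            links.append(absolute)
    return links
```

The pattern here also accepts attributes before href (for example class="nav"), which the simpler '<a href=' form skips.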
Then collect all the images from the linked pages:
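from urllib.parse import urljoin

allImageList = []
for k in urlList:
    try:
        # Resolve relative links such as "/forecast/" against the home page
        page_url = urljoin(url, k)
        request = urllib.request.Request(page_url, headers=headers)
        r = urllib.request.urlopen(request, timeout=10)
        html = r.read().decode().replace('\n', '')
        imgList = re.findall(r'<img.*?src="(.*?)"', html, re.S)
        # Make image URLs absolute so requests.get can download them later
        allImageList += [urljoin(page_url, img) for img in imgList]
    except Exception:
        # Skip pages that fail to load or decode
        continue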
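Regular expressions are fragile on HTML: the pattern <img.*?src="(.*?)" misses single-quoted src attributes, for instance. As an alternative (a swap to the standard library's html.parser, not the technique this article uses), a small parser subclass can collect image URLs; ImageCollector and collect_images are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute image URLs from <img src=...> tags.

    The parser handles tag case, quote style and line breaks inside tags,
    so no newline stripping is needed beforehand.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value)
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.images.append(urljoin(self.base_url, src))


def collect_images(html, base_url):
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images
```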
These page requests could also be made with multiple threads; that will be added in a follow-up.
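Fetching the linked pages concurrently can be sketched with concurrent.futures.ThreadPoolExecutor from the standard library, rather than the raw threading.Thread objects used later in the article. crawl_pages is a hypothetical helper, and fetch is passed in so any page-fetching function can be plugged in:

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_pages(urls, fetch, max_workers=8):
    """Fetch many pages concurrently and return {url: html} for the successes.

    fetch is any callable that takes a URL and returns the page text;
    pages whose fetch raises are skipped, like the try/except in the loop.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future, url in futures.items():
            try:
                results[url] = future.result()  # waits for this page if still running
            except Exception:
                continue
    return results
```

A thread pool also caps how many requests are in flight at once (max_workers), which is gentler on the target site than starting one thread per link.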
1.2.4 Save the data (single thread)
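import os

os.makedirs('./images', exist_ok=True)  # the task asks for an images subfolder
for i, img in enumerate(allImageList[:102]):  # limit of 102 images, per the student-number rule
    print(f"Saving image {i + 1}: {img}")
    resp = requests.get(img, timeout=10)
    # Name the file after the last path segment of the image URL
    with open(f'./images/{img.split("/")[-1]}', 'wb') as f:
        f.write(resp.content)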
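img.split("/")[-1] keeps query strings such as '?v=2' in the file name and yields an empty name for URLs that end in '/'. A hedged sketch of a naming helper (filename_for is a hypothetical name) that uses only the URL path and falls back to a numbered name:

```python
import os
from urllib.parse import urlparse


def filename_for(img_url, index):
    """Derive a local file name from an image URL.

    Only the path component is used, so query strings and fragments are
    dropped; an empty name falls back to '<index>.jpg'.
    """
    name = os.path.basename(urlparse(img_url).path)
    return name or f"{index}.jpg"
```

Two images with the same file name in different directories would still overwrite each other; prefixing the index (f"{i}_{name}") is one way around that.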
1.2.5 Save the data (multithreading)
- Import the threading module and start one thread per image
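import os
import threading

# Multithreading: start one download thread per image
def download_imgs(imgList, limit):
    os.makedirs('./images', exist_ok=True)  # the threads write into this folder
    threads = [
        threading.Thread(target=download, args=(img_url, i))
        for i, img_url in enumerate(imgList[:limit])  # exactly `limit` images
    ]
    for t in threads:
        t.start()
    return threads  # call join() on each thread to wait for all downloads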
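download_imgs starts every thread immediately and returns without waiting for them. A sketch of a variant (run_bounded is a hypothetical name) that caps how many downloads run at once with threading.BoundedSemaphore and joins every thread before returning:

```python
import threading


def run_bounded(items, worker, max_concurrent=8):
    """Run worker(item, index) in one thread per item, with at most
    max_concurrent workers active at a time; return once all have finished."""
    gate = threading.BoundedSemaphore(max_concurrent)

    def guarded(item, index):
        with gate:  # blocks while max_concurrent workers are already busy
            worker(item, index)

    threads = [threading.Thread(target=guarded, args=(item, i))
               for i, item in enumerate(items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
```

For instance, run_bounded(allImageList[:102], download, max_concurrent=8) would fetch the same images as the single-threaded loop, at most eight at a time.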
- Write the download function
def download(img_url, name):
    try:
        # Any network or file error is reported instead of crashing the thread
        resp = requests.get(img_url, timeout=10)
        with open(f'./images/{name}.jpg', 'wb') as f:
            f.write(resp.content)
    except Exception as e:
        print(f"Download failed: {name} {img_url} -> {e}")
    else:
        print(f"Download complete: {name} {img_url}")
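The download function always writes a '.jpg' file, even for PNG or GIF images, and would also save an HTML error page under an image name. A hedged sketch of choosing the extension from the response's Content-Type header (pick_extension and its mapping are hypothetical; with requests the header is available as resp.headers.get('Content-Type')):

```python
# Hypothetical mapping from MIME type to file extension; extend as needed
EXTENSIONS = {
    'image/jpeg': '.jpg',
    'image/png': '.png',
    'image/gif': '.gif',
    'image/webp': '.webp',
    'image/svg+xml': '.svg',
}


def pick_extension(content_type, default='.jpg'):
    """Choose a file extension from a Content-Type header value.

    Returns default when the header is missing or the image type is unknown,
    and None for non-image responses so the caller can skip saving them.
    """
    if not content_type:
        return default
    mime = content_type.split(';', 1)[0].strip().lower()
    if not mime.startswith('image/'):
        return None  # e.g. an HTML error page
    return EXTENSIONS.get(mime, default)
```

Calling resp.raise_for_status() before this check would additionally turn 4xx/5xx responses into exceptions that the except branch reports.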
Copyright notice
Author: Xiaosheng Fanyi. Please include a link to the original when reprinting. Thank you.
https://en.pythonmana.com/2022/01/202201302034565200.html