Python practical skills: task segmentation
2022-01-30 00:47:15 【cxapython】
Today let's talk about task segmentation in Python. Take a web crawler as an example: we read a txt file that stores URLs and get a list of URLs. We will call this URL list the big task.
Setting memory usage aside for now, let's split the big task above into small tasks. For example, suppose the constraint is that we may access at most 5 URLs per second.
    import os
    import time

    CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))


    def read_file():
        file_path = os.path.join(CURRENT_DIR, "url_list.txt")
        with open(file_path, "r", encoding="utf-8") as fs:
            result = [i.strip() for i in fs.readlines()]
        return result


    def fetch(url):
        print(url)


    def run():
        max_count = 5
        url_list = read_file()
        for index in range(0, len(url_list), max_count):
            start = time.time()
            fetch(url_list[index:index + max_count])
            end = time.time() - start
            if end < 1:
                time.sleep(1 - end)


    if __name__ == '__main__':
        run()
The key code is in the for loop. First, look at the third argument of range: it sets the iteration step to 5, so index increases by 5 each time, i.e. 0, 5, 10, ... Then we slice url_list, taking five elements at a time; which five elements we get changes as index grows. If fewer than five elements remain at the end, the slice simply returns whatever is left, so there is no index-out-of-range error.
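To see the range step and the slice working together without any file or sleep involved, here is a minimal sketch. The helper name chunk_by_slice and the example.com URLs are made up for illustration and are not part of the code above.

```python
def chunk_by_slice(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    # range(0, len(items), size) yields 0, size, 2*size, ...
    # A slice that runs past the end just returns the remaining elements.
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical URLs, only used to have 12 items to split.
urls = [f"https://example.com/page/{n}" for n in range(12)]
chunks = chunk_by_slice(urls, 5)
print([len(c) for c in chunks])  # [5, 5, 2]
```

The last chunk has only two elements, and no IndexError is raised.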
As the URL list grows, memory usage grows too. At this point we need to modify the code. Generators use less memory, so after the change the code looks like this.
    # -*- coding: utf-8 -*-
    # @ Time : 2019-11-23 23:47
    # @ author : Chen Xiangan
    # @ file name : g.py
    # @ official account : Python Learning to develop
    import os
    import time
    from itertools import islice

    CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))


    def read_file():
        file_path = os.path.join(CURRENT_DIR, "url_list.txt")
        with open(file_path, "r", encoding="utf-8") as fs:
            for i in fs:
                yield i.strip()


    def fetch(url):
        print(url)


    def run():
        max_count = 5
        url_gen = read_file()
        while True:
            url_list = list(islice(url_gen, 0, max_count))
            if not url_list:
                break
            start = time.time()
            fetch(url_list)
            end = time.time() - start
            if end < 1:
                time.sleep(1 - end)


    if __name__ == '__main__':
        run()
First, we changed how the file is read: instead of returning a list, read_file is now a generator. This saves a lot of memory, because the lines are no longer all loaded at once when the file-reading function is called.
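A rough way to see the difference is to compare the size of a list with the size of an equivalent generator object. This is only a sketch: the variable names are made up, and sys.getsizeof measures the container itself, not the elements it refers to, so treat the numbers as an indication rather than a full memory profile.

```python
import sys

# A list holds references to all 100,000 items at once.
numbers_list = [i for i in range(100_000)]
# A generator only keeps its current state and produces items on demand.
numbers_gen = (i for i in range(100_000))

list_size = sys.getsizeof(numbers_list)
gen_size = sys.getsizeof(numbers_gen)
print(list_size, gen_size)
print(list_size > gen_size)  # True
```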
Next we rework the for loop above. A generator cannot be indexed or sliced with [] and has no __len__, so the range-based for loop no longer works, and every iteration consumes elements from the generator. Instead we use islice from itertools to cut url_gen into segments. islice is slicing for iterators: each call here returns an iterator that yields at most 5 elements from url_gen. Because that iterator has no __len__ either, we convert it to a list and then check whether the list is empty to know whether the iteration should end.
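The islice loop can also be wrapped in a small reusable generator. The sketch below assumes a helper called iter_batches and a toy numbers generator, neither of which appears in the original code; it only shows the same islice idea in isolation.

```python
from itertools import islice


def iter_batches(iterable, size):
    """Yield lists of at most `size` items taken from any iterable."""
    iterator = iter(iterable)
    while True:
        # islice consumes up to `size` items from the shared iterator.
        batch = list(islice(iterator, size))
        if not batch:
            # An empty list means the underlying iterator is exhausted.
            return
        yield batch


def numbers():
    # Toy generator standing in for read_file().
    for i in range(12):
        yield i


print(list(iter_batches(numbers(), 5)))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

Each batch picks up where the previous one stopped, because islice keeps consuming the same iterator.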
The modified code saves a great deal of memory, and reading a file with tens of millions of lines is not a problem. Besides this, when writing an asynchronous crawler you may need to slice an asynchronous generator. Let's discuss asynchronous generator segmentation next.
Asynchronous generator segmentation
First, let's look at a simple asynchronous generator. We know that calling the following function returns a generator.
    def foo():
        for i in range(20):
            yield i
If we put async in front of def, calling it returns an asynchronous generator instead. The complete example code is as follows.
    import asyncio


    async def foo():
        for i in range(20):
            yield i


    async def run():
        async_gen = foo()
        async for i in async_gen:
            print(i)


    if __name__ == '__main__':
        asyncio.run(run())
Slicing with async for is a little more complicated. I recommend the aiostream module; with it, the code becomes the following.
    import asyncio

    from aiostream import stream


    async def foo():
        for i in range(22):
            yield i


    async def run():
        index = 0
        limit = 5
        while True:
            xs = stream.iterate(foo())
            ys = xs[index:index + limit]
            t = await stream.list(ys)
            if not t:
                break
            print(t)
            index += limit


    if __name__ == '__main__':
        asyncio.run(run())
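If you would rather not add a dependency, the same batching can be sketched with plain asyncio. This is an alternative technique, not the aiostream approach: the helper name async_batches is made up for illustration, and it consumes a single asynchronous generator once instead of slicing a new foo() on every pass.

```python
import asyncio


async def foo():
    for i in range(22):
        yield i


async def async_batches(async_gen, size):
    """Yield lists of at most `size` items from an asynchronous generator."""
    batch = []
    async for item in async_gen:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    # Hand over the final, shorter batch if anything is left.
    if batch:
        yield batch


async def collect():
    return [batch async for batch in async_batches(foo(), 5)]


result = asyncio.run(collect())
print([len(b) for b in result])  # [5, 5, 5, 5, 2]
```

In a real crawler you would await the requests for each batch inside the async for loop rather than collecting everything into a list first.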
The original content comes from my Zhihu column: zhuanlan.zhihu.com/p/93413442
Author: cxapython. Please include the original link when reprinting, thank you.