
Python concurrent programming: concurrent.futures or multiprocessing?

2022-01-31 19:12:46 somenzz

Concurrent programming is a common need. In Python, multithreading, coroutines, and multiprocessing are the three ways to achieve concurrent or parallel programming.

Multithreading and coroutines give you concurrency, while multiprocessing gives you parallelism. So what exactly is concurrency, and what is parallelism?

The difference between concurrency and parallelism

To borrow one user's answer:

You are halfway through a meal when the phone rings. If you don't answer until you have finished eating, you support neither concurrency nor parallelism.

You are halfway through a meal when the phone rings. If you stop eating to answer the call and then continue eating afterwards, you support concurrency.

You are halfway through a meal when the phone rings. If you keep eating while talking on the phone, you support parallelism.

The key to concurrency is the ability to handle multiple tasks, though not necessarily at the same instant. The key to parallelism is the ability to handle multiple tasks at the same instant.

Multithreading: In Python, because of the Global Interpreter Lock (GIL), multithreading is concurrency achieved by threads taking turns on the CPU. Only one thread runs at any given moment, and the operating system switches between threads at appropriate times; because switching is very fast, it feels as if multiple tasks are running at once. This pays off in I/O-intensive scenarios: after a thread switch, the I/O operation keeps going in the background, so while thread 1 is waiting on I/O, thread 2 can use the CPU for computation. Switching adds some overhead, but overall efficiency improves.
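
As a minimal sketch of this idea (not from the original article; the fetch() helper and URLs are illustrative), two threads overlap their network waits: while one blocks on I/O, the GIL is released and the other can run:

import threading
import urllib.request

def fetch(url):
    # urlopen blocks on network I/O; the GIL is released while waiting,
    # so other threads can run in the meantime
    with urllib.request.urlopen(url, timeout=10) as conn:
        print(url, len(conn.read()), 'bytes')

urls = ['http://www.example.com/', 'http://www.python.org/']
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()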

Coroutines: A coroutine is a lightweight thread. A single thread can run concurrent tasks because coroutines hand the right to switch over to the programmer, who decides at which points to yield. Coroutines can handle tens of thousands of concurrent tasks, which multithreading cannot: with that many threads the switching cost is too high and will exhaust the machine's resources (see the C10K problem).
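
A minimal asyncio sketch of this (illustrative; the task count and sleep time are arbitrary): each await is an explicit switch point chosen by the programmer, and a single thread can juggle thousands of coroutines:

import asyncio

async def worker(n):
    # explicit switch point: the event loop runs other coroutines here
    await asyncio.sleep(1)
    return n * n

async def main():
    # ten thousand coroutines in one thread; ten thousand OS threads would be far heavier
    results = await asyncio.gather(*(worker(i) for i in range(10000)))
    print(len(results), 'tasks finished')

asyncio.run(main())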

Multiprocessing: parallelism, with multiple tasks genuinely running at the same time. If you want to use multiple cores, choose multiprocessing.

Python has only one standard library for coroutines, asyncio, but there are two standard libraries that support multithreading and multiprocessing: concurrent.futures and multiprocessing. This article covers the differences between the two. Let's look at basic usage first.

multiprocessing

multiprocessing provides both a thread pool and a process pool. Simple usage looks like this:

Thread pool:

from multiprocessing.dummy import Pool as ThreadPool
with ThreadPool(processes=100) as executor:
    executor.map(func, iterable)

Process pool:

from multiprocessing import Pool as ProcessPool
with ProcessPool(processes=10) as executor:
    executor.map(func, iterable)

concurrent.futures

Thread pool:

from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(function, iterable)

Process pool:

from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(max_workers=5) as executor:
    executor.map(function, iterable)

Don't they look exactly alike? So why does Python officially provide two such standard libraries?

Difference between the two

In essence, the difference is not large; they differ only slightly in how they are called.

multiprocessing came first, concurrent.futures came later. The latter was added to make concurrent code easier to write, and its learning curve is lower.

As for speed, neither is inherently faster or slower. How much speedup you get (if any) depends on the hardware, on operating-system details, and in particular on how much inter-process communication a given task requires. Behind the scenes, all processes rely on the same OS primitives, and using a higher-level API over those primitives is not a major factor in speed. Now let's go through each in more detail.

About concurrent.futures

The official documentation describes the concurrent.futures module as a higher-level interface, mainly because it makes concurrent and parallel code simpler. The module provides the following objects and functions:

  • Future object: concurrent.futures.Future
  • Module function: concurrent.futures.wait
  • Executor objects: concurrent.futures.{Executor, ThreadPoolExecutor, ProcessPoolExecutor}

For example, when we call executor.submit(func) on an Executor, it schedules the wrapped func() for execution and returns the created Future instance, so that we can query it later.

Here are some commonly used methods. A Future's done() method indicates whether the corresponding operation has completed: True means completed, False means not yet. Note that done() is non-blocking and returns a result immediately. Correspondingly, add_done_callback(fn) registers a function fn that is notified and called once the Future completes.
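
A small sketch of these two methods (the square() helper is illustrative):

from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(square, 7)
    print(future.done())  # non-blocking; may print False if square() has not finished yet
    # the callback receives the finished future; it fires immediately if already done
    future.add_done_callback(lambda f: print('callback got', f.result()))
    print(future.result())  # blocks until the result is ready: prints 49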

Another important method is result(): once the future completes, it returns the corresponding result or raises the corresponding exception. And as_completed(fs), given an iterable of futures fs, returns an iterator that yields each future as it completes.

Here is the official ThreadPoolExecutor example:

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

Please note:

ProcessPoolExecutor is a subclass of Executor that uses a pool of processes to execute calls asynchronously. It uses multiprocessing to sidestep the Global Interpreter Lock, but this also means that only picklable (serializable) objects can be executed and returned, that the __main__ module must be importable by worker subprocesses, and consequently that ProcessPoolExecutor does not work in the interactive interpreter.
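
A minimal sketch of these constraints in practice (the cube() helper is illustrative): the worker function is defined at module level so it can be pickled, and the pool is created under the __main__ guard so child processes can import the module:

from concurrent.futures import ProcessPoolExecutor

def cube(x):  # module-level, hence picklable by worker processes
    return x ** 3

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        print(list(executor.map(cube, range(5))))  # [0, 1, 8, 27, 64]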

About multiprocessing

multiprocessing is a package that spawns processes and offers an API similar to the threading module. It provides both local and remote concurrency and, by using subprocesses instead of threads, effectively sidesteps the Global Interpreter Lock. The multiprocessing module therefore lets the programmer take full advantage of multiple cores on a machine. It runs on both Unix and Windows.
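
For instance (a sketch; the greet() helper is illustrative), multiprocessing.Process mirrors threading.Thread, with the same start()/join() pattern:

from multiprocessing import Process

def greet(name):
    print('hello from', name)

if __name__ == '__main__':
    p = Process(target=greet, args=('a child process',))
    p.start()
    p.join()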

multiprocessing also introduces APIs that have no counterpart in the threading module. A prime example is the Pool object, which offers a convenient way to parallelize a function over a series of input values, distributing the input data across processes (data parallelism). The following example demonstrates the common practice of defining such a function in a module so that child processes can successfully import it. This basic data-parallelism example uses Pool:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

Conclusion

So: for simple concurrent applications, use concurrent.futures; for more complex cases where you want to manage things yourself, use multiprocessing. Beginners can start directly with concurrent.futures.

References:

docs.python.org/zh-cn/3/lib…

docs.python.org/zh-cn/3/lib…

(End)
