
[Python crawler] Multithreading: daemon threads & join() blocking

2022-01-30 22:31:01 Dream, killer


Series
Python crawlers too slow? Learn about concurrent programming

Daemon threads

In a multithreaded Python program, when the main thread's code has finished while child threads are still running, the interpreter waits for those child threads before the process exits. This creates a problem: if a child thread runs an infinite (or very long) loop, the whole Python program cannot end. Let's take a look.

import threading
import time

#  Non-daemon thread 
def normal_thread():
    for i in range(10000):
        time.sleep(1)
        print(f'normal thread {i}')

print(threading.current_thread().name, ' Thread start ')
thread1 = threading.Thread(target=normal_thread)
thread1.start()
print(threading.current_thread().name, ' Thread end ')

As the output shows, the main thread (MainThread) has run its last statement, but the child thread keeps running; the process only really ends once the child thread finishes. If you want unfinished threads to stop when the main thread ends, mark them as daemon threads: when only daemon threads are left running, the Python program exits. The threading module provides two ways to mark a thread as a daemon thread.

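threading.Thread(target=daemon_thread, daemon=True), passing daemon=True to the constructor

thread.setDaemon(True), called before thread.start() (deprecated since Python 3.10 in favor of the attribute assignment thread.daemon = True)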
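As a rough check of how the flag behaves, the sketch below sets it through the constructor and through the daemon attribute, then tries to change it on a thread that has already been started. The work function and the variable names are made up for illustration.

```python
import threading

def work():
    pass  # placeholder task, finishes immediately

# Form 1: constructor argument
t1 = threading.Thread(target=work, daemon=True)

# Form 2: attribute assignment, done before start()
t2 = threading.Thread(target=work)
t2.daemon = True

print(t1.daemon, t2.daemon)

# Changing the flag after start() is rejected with RuntimeError
t3 = threading.Thread(target=work)
t3.start()
try:
    t3.daemon = True
    late_change_error = None
except RuntimeError as exc:
    late_change_error = exc
t3.join()

print(type(late_change_error).__name__)
```

Either form works as long as the flag is set before the thread is started.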

import threading
import time

# Daemon thread (sleeps 1 s before each print)
def daemon_thread():
    for i in range(5):
        time.sleep(1)
        print(f'daemon thread {i}')

# Non-daemon thread (no sleep)
def normal_thread():
    for i in range(5):
        print(f'normal thread {i}')

print(threading.current_thread().name, ' Thread start ')
thread1 = threading.Thread(target=daemon_thread, daemon=True)
thread2 = threading.Thread(target=normal_thread)
# Alternative to daemon=True; must be called before start():
# thread1.setDaemon(True)
thread1.start()
thread2.start()
print(threading.current_thread().name, ' Thread end ')

Here thread1 is a daemon thread. The program exits as soon as the main thread (MainThread) and the non-daemon thread have finished, so the print statements in daemon_thread(), which sleeps for a second first, never get a chance to run. In the output, lines from normal_thread() may still appear after "MainThread Thread end": running the last statement of the main thread does not end the process at once. The interpreter first waits for the non-daemon threads, and only then are the daemon threads stopped.
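One way to observe this without leaving a terminal hanging is to run a small script in a child interpreter. The sketch below starts a daemon thread that loops forever and checks that the process still exits once the main thread is done. The script text and the forever function are illustrative and not part of the example above.

```python
import subprocess
import sys

# Script run in a separate interpreter: a daemon thread that never returns
script = '''
import threading, time

def forever():
    while True:
        time.sleep(0.1)

threading.Thread(target=forever, daemon=True).start()
print("main thread done")
'''

# timeout guards against a hang in case the child does not exit
completed = subprocess.run(
    [sys.executable, '-c', script],
    capture_output=True,
    text=True,
    timeout=30,
)
print(completed.returncode, completed.stdout.strip())
```

With daemon=True removed from the script, the child process would keep running until the timeout expires.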

Inheritance of the daemon attribute

A new thread inherits the daemon attribute of the thread that creates it. The main thread is a non-daemon thread, so threads created in the main thread are non-daemon by default. A thread created inside a daemon thread inherits daemon=True and is therefore a daemon thread too, unless daemon is passed explicitly.
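The inheritance rule can be checked with a short sketch. The parent and child functions and the inherited dictionary are hypothetical names; each parent thread creates a child without a daemon argument and records the flag the child ends up with.

```python
import threading

inherited = {}

def child():
    pass  # placeholder task

def parent():
    # No daemon argument: the new thread copies the creating thread's flag
    t = threading.Thread(target=child)
    inherited[threading.current_thread().name] = t.daemon
    t.start()
    t.join()

for name, flag in (('daemon parent', True), ('normal parent', False)):
    p = threading.Thread(target=parent, name=name, daemon=flag)
    p.start()
    p.join()

print(inherited)
```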

join() blocking

In a multithreaded crawler, several threads usually fetch different pages at the same time, and the results are then parsed, aggregated, and stored together. That last step has to wait until all child threads have finished, and this is where the join() method comes in.

Calling thread.join() blocks the calling thread (in the examples here, the main thread) until that thread has finished running. Threads that are already running are not affected; a thread that has not been started yet stays unstarted only because the blocked caller has not reached its start() call. Look at an example.

import threading
import time

def block(second):
    print(threading.current_thread().name, ' The thread is running ')
    time.sleep(second)
    print(threading.current_thread().name, ' Thread end ')

print(threading.current_thread().name, ' The thread is running ')

thread1 = threading.Thread(target=block, name='thread test 1', args=[3])
thread2 = threading.Thread(target=block, name='thread test 2', args=[1])

thread1.start()
thread1.join()

thread2.start()

print(threading.current_thread().name, ' Thread end ')

Here join() is called only on thread1. Note where the call is: before thread2.start(). The main thread is blocked in thread1.join(), so thread2 has not even been started; only after thread1 finishes does the main thread go on to start thread2 and print its final line. Because thread2 is not a daemon thread, it keeps running after the main thread (MainThread) has run its last statement.

Notice the problem? Executed this way, the program has effectively become single-threaded, because thread2 cannot start until thread1 is done. That is caused by calling join() in the wrong place. Let's change the code a little.

import threading
import time

def block(second):
    print(threading.current_thread().name, ' The thread is running ')
    time.sleep(second)
    print(threading.current_thread().name, ' Thread end ')

print(threading.current_thread().name, ' The thread is running ')

thread1 = threading.Thread(target=block, name='thread test 1', args=[3])
thread2 = threading.Thread(target=block, name='thread test 2', args=[1])

thread1.start()
thread2.start()

thread1.join()
print(threading.current_thread().name, ' Thread end ')

Now the program is really multithreaded. thread1 and thread2 run at the same time, and join() blocks only the main thread. thread2 (1 s) finishes while the main thread is still waiting; once thread1 (3 s) finishes, the main thread continues.
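The difference between the two orderings can be measured. The following is a minimal sketch with shortened sleep times (0.3 s and 0.1 s instead of 3 s and 1 s); the function names join_before_start and start_all_then_join are made up, and unlike the examples above it also joins the second thread so that the total time can be measured.

```python
import threading
import time

def block(second):
    time.sleep(second)

def join_before_start():
    t1 = threading.Thread(target=block, args=[0.3])
    t2 = threading.Thread(target=block, args=[0.1])
    begin = time.perf_counter()
    t1.start()
    t1.join()   # caller waits here; t2 has not been started yet
    t2.start()
    t2.join()
    return time.perf_counter() - begin

def start_all_then_join():
    t1 = threading.Thread(target=block, args=[0.3])
    t2 = threading.Thread(target=block, args=[0.1])
    begin = time.perf_counter()
    t1.start()
    t2.start()  # both threads are now sleeping at the same time
    t1.join()
    t2.join()
    return time.perf_counter() - begin

serial = join_before_start()
concurrent = start_all_then_join()
print(f'join before start:    {serial:.2f}s')
print(f'start all, then join: {concurrent:.2f}s')
```

The first ordering takes roughly the sum of the two sleeps, the second roughly the longer of the two.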

Finally, note that join() blocks in the same way regardless of which thread it is called on: it does not matter whether the joined thread is a daemon thread, or whether the caller is the main thread. To get real concurrency, start all child threads first and only then call join(); otherwise the program degenerates into running one thread at a time!
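For the crawler scenario described earlier, the usual shape is two loops: one that starts every worker, then one that joins every worker, followed by the shared processing step. This is only a sketch under assumptions: fetch is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the page numbers are invented.

```python
import threading
import time

def fetch(page, results):
    # Stand-in for a real request (hypothetical; no network access)
    time.sleep(0.1)
    results[page] = f'content of page {page}'

results = {}
threads = [
    threading.Thread(target=fetch, args=(page, results))
    for page in range(1, 6)
]

for t in threads:   # start every worker first
    t.start()
for t in threads:   # then wait for all of them
    t.join()

# Every worker has finished, so the results can be processed together
for page in sorted(results):
    print(page, results[page])
```

Each worker writes to its own key in the shared dictionary, so no extra locking is used in this sketch; code that updates the same value from several threads would need a lock.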



Copyright notice
Author: Dream, killer. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201302231000486.html
