
Advanced Python You Once Knew

2022-01-29 22:46:05 i aog !

Processes and threads

Multi-core CPUs are now very common, but even the single-core CPUs of the past could run multiple tasks. Since a CPU executes code sequentially, how does a single-core CPU multitask?

The answer is that the operating system runs the tasks in turn: task 1 runs for 0.01 seconds, then the OS switches to task 2, which runs for 0.01 seconds, then to task 3 for 0.01 seconds, and so on. The tasks actually take turns, but because the CPU is so fast, it feels as if they are all running at the same time.

To the operating system, a task is a process (Process). For example, opening a browser starts a browser process, opening Notepad starts a Notepad process, opening two Notepads starts two Notepad processes, and opening Word starts a Word process.

Some processes do more than one thing at a time. Word, for example, can handle typing, spell checking, and printing simultaneously. To do several things at once inside one process, the process must run multiple "subtasks"; these subtasks within a process are called threads (Thread).

Because every process has at least one thing to do, every process has at least one thread. A complex process such as Word can have many threads executing at the same time. Multithreading works the same way as multiprocessing: the operating system switches rapidly among the threads, letting each run briefly in turn, so they appear to run simultaneously. Truly simultaneous execution of multiple threads requires a multi-core CPU.

What if we want to run multiple tasks at the same time?

There are two solutions:

One is to start multiple processes. Each process has only one thread, but together the processes can perform multiple tasks.

The other is to start a single process and create multiple threads inside it, so that the threads perform multiple tasks together.

There is also a third way: start multiple processes, each with multiple threads, so that even more tasks run at the same time. This model is more complex, however, and is rarely used in practice.

In summary, there are three ways to implement multitasking:

  • Multi-process mode;
  • Multi-thread mode;
  • Multi-process + multi-thread mode.

Tasks that run at the same time are usually not independent of one another; they need to communicate and coordinate. Sometimes task 1 must pause and wait for task 2 to finish before it can continue; sometimes tasks 3 and 4 must not run at the same time. As a result, multi-process and multi-threaded programs are far more complex than the single-process, single-threaded programs we wrote earlier.
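The kind of coordination described above, where task 1 must pause until task 2 is done, can be sketched with a threading.Event (threads are covered in detail later in this article). The function name run_in_order and the log messages are illustrative assumptions, not part of the original text.

```python
import threading

def run_in_order():
    log = []
    task2_done = threading.Event()   # starts in the "not set" state

    def task1():
        log.append("task 1 started")
        task2_done.wait()            # pause here until task 2 signals
        log.append("task 1 continues")

    def task2():
        log.append("task 2 done")
        task2_done.set()             # wake up every thread waiting on the event

    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return log
```

Whichever thread the OS schedules first, "task 1 continues" can only be logged after "task 2 done".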

Multi-process

First, some background on the operating system.

Unix/Linux operating systems provide a fork() system call, which is quite special. An ordinary function is called once and returns once, but fork() is called once and returns twice: the operating system automatically copies the current process (the parent process) to create a new one (the child process), and then returns in both the parent and the child.

In the child process fork() always returns 0, while in the parent it returns the child's process ID. The reason is that a parent can fork many children, so it needs to record each child's ID, whereas a child can get its parent's ID simply by calling getppid().

Python's os module wraps common system calls, including fork, so creating a child process from a Python program is easy:

import os

print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))

The output is as follows:

Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.

Because Windows has no fork call, the code above cannot run on Windows.

With fork, a process that receives a new task can copy itself into a child process to handle it. The well-known Apache server, for example, has the parent process listen on a port and forks a child to handle each new HTTP request.
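A minimal sketch of the fork-per-task pattern, assuming a Unix-like system (os.fork does not exist on Windows). The helper square_in_children is hypothetical and is not how Apache is implemented. Each child reports its result through its exit status, which can only hold values 0 to 255, so this is a toy for illustration rather than a real result channel.

```python
import os

def square_in_children(numbers):
    children = {}
    for n in numbers:
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then leave immediately with os._exit so the
            # child never runs the parent's remaining code.
            os._exit((n * n) % 256)
        children[pid] = n            # parent: remember which child got which task
    results = {}
    for pid, n in children.items():
        _, status = os.waitpid(pid, 0)          # wait for (reap) that child
        results[n] = os.WEXITSTATUS(status)     # extract the child's exit code
    return results
```

For example, square_in_children([1, 2, 3]) returns {1: 1, 2: 4, 3: 9}.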

multiprocessing

Since Python is cross-platform, it naturally offers cross-platform multi-process support as well. The multiprocessing module is that cross-platform version.

The multiprocessing module provides a Process class to represent a process object. The following example starts a child process and waits for it to finish:

from multiprocessing import Process
import os

# Code executed by the child process:
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')

The output is as follows:

Parent process 928.
Child process will start.
Run child process test (929)...
Child process end.

To create a child process, you only need to pass in the function to execute and its arguments, create a Process instance, and start it with start(). This is simpler than using fork() directly.

The join() method waits for the child process to finish before continuing, and is typically used to synchronize processes.

Pool

To start a large number of child processes, you can create them in batches with a process pool:

from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')

The output is as follows:

Parent process 669.
Waiting for all subprocesses done...
Run task 0 (671)...
Run task 1 (672)...
Run task 2 (673)...
Run task 3 (674)...
Task 2 runs 0.14 seconds.
Run task 4 (673)...
Task 1 runs 0.27 seconds.
Task 3 runs 0.86 seconds.
Task 0 runs 1.41 seconds.
Task 4 runs 1.91 seconds.
All subprocesses done.

Calling join() on a Pool object waits for all the child processes to finish. You must call close() before join(), and after close() no new tasks can be added to the pool.

Look closely at the output: tasks 0, 1, 2, and 3 start immediately, while task 4 must wait until one of the earlier tasks finishes. This is because the pool size defaults to 4 on my computer, so at most 4 processes run at once. This is a deliberate limit in Pool's design, not a limit of the operating system. If you change it to:

p = Pool(5)

then 5 processes can run at the same time.

Because Pool's default size is the number of CPU cores, if you happen to have an 8-core CPU, you must submit at least 9 tasks to see the waiting effect shown above.
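The example above throws away the tasks' return values. A minimal sketch of collecting them with the standard Pool.map and AsyncResult.get APIs; the helper names square and squares_with_pool are illustrative:

```python
from multiprocessing import Pool

def square(n):
    # Runs inside a worker process; it is a module-level function so that
    # it can be pickled and sent to the workers.
    return n * n

def squares_with_pool(numbers, processes=4):
    # Leaving the with-block terminates the pool, so collect results inside it.
    with Pool(processes) as pool:
        # map() blocks until every task has finished and keeps input order.
        mapped = pool.map(square, numbers)
        # apply_async() returns an AsyncResult; get() waits for that one result.
        single = pool.apply_async(square, (10,)).get(timeout=10)
    return mapped, single
```

As in the earlier examples, call squares_with_pool from inside an if __name__=='__main__': block when running it as a script, because on platforms that spawn new interpreters (Windows, and macOS by default) the child processes re-import the main module.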

Subprocesses

Often the child process is not the Python program itself but an external program. After creating it, we also need to control its input and output.

The subprocess module makes it easy to start a child process and then control its input and output.

The following example runs the command nslookup www.python.org from Python code, with the same effect as running it directly on the command line:

import subprocess

print('$ nslookup www.python.org')
r = subprocess.call(['nslookup', 'www.python.org'])
print('Exit code:', r)

Output:

$ nslookup www.python.org
Server:		192.168.19.4
Address:	192.168.19.4#53

Non-authoritative answer:
www.python.org	canonical name = python.map.fastly.net.
Name:	python.map.fastly.net
Address: 199.27.79.223

Exit code: 0

If the child process needs input, you can supply it with the communicate() method:

import subprocess

print('$ nslookup')
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
print(output.decode('utf-8'))
print('Exit code:', p.returncode)

The code above is equivalent to running nslookup on the command line and then typing:

set q=mx
python.org
exit

The output is as follows:

$ nslookup
Server:		192.168.19.4
Address:	192.168.19.4#53

Non-authoritative answer:
python.org	mail exchanger = 50 mail.python.org.

Authoritative answers can be found from:
mail.python.org	internet address = 82.94.164.166
mail.python.org	has AAAA address 2001:888:2000:d::a6


Exit code: 0
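subprocess.call and Popen.communicate still work, but Python 3.5 and later also offer subprocess.run, which starts the child, feeds it input, collects its output, and waits for it in a single call. The sketch below uses a small Python child process instead of nslookup so that it runs anywhere; run_upper is a hypothetical helper name.

```python
import subprocess
import sys

def run_upper(text):
    # Start a child Python interpreter that upper-cases whatever arrives on stdin.
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
        input=text,            # sent to the child's stdin, like communicate()
        capture_output=True,   # collect stdout and stderr through pipes
        text=True,             # exchange str instead of bytes
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout, completed.returncode
```

Calling run_upper("set q=mx\n") returns ("SET Q=MX\n", 0).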

Interprocess communication

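Processes inevitably need to communicate with one another, and the operating system provides many mechanisms for inter-process communication. Python's multiprocessing module wraps these low-level mechanisms and offers several ways to exchange data, such as Queue and Pipe. In the following example, the parent process creates two child processes: one writes data to a Queue and the other reads it: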

from multiprocessing import Process, Queue
import os, time, random

# Code executed by the writer process:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the reader process:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__=='__main__':
    # The parent process creates the Queue and passes it to each child:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start child process pw, which writes:
    pw.start()
    # Start child process pr, which reads:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # pr runs an infinite loop and never ends on its own, so terminate it:
    pr.terminate()

The output is as follows:

Process to write: 50563
Put A to queue...
Process to read: 50564
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
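Besides Queue, the multiprocessing module provides Pipe(), which returns a pair of connected Connection objects. A minimal sketch, assuming a None sentinel to tell the reader that no more data is coming (an illustrative convention, not part of the API); producer and collect_via_pipe are hypothetical names:

```python
from multiprocessing import Pipe, Process

def producer(conn, values):
    # Child process: send each value through its end of the pipe, then a sentinel.
    for v in values:
        conn.send(v)
    conn.send(None)                  # None marks "no more data"
    conn.close()

def collect_via_pipe(values):
    parent_end, child_end = Pipe()   # duplex by default; here only the child sends
    p = Process(target=producer, args=(child_end, values))
    p.start()
    received = []
    while True:
        item = parent_end.recv()     # blocks until the child sends something
        if item is None:
            break
        received.append(item)
    p.join()
    return received
```

The sentinel lets the reading side stop cleanly and finish with a normal join() instead of killing a process with terminate(). When running this as a script, call collect_via_pipe inside an if __name__=='__main__': block.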

Multithreading

Multitasking can be achieved with multiple processes, or with multiple threads inside one process.

Python's standard library provides two modules for this: _thread and threading. _thread is the low-level module, and threading is a higher-level module that wraps _thread. In most cases we only need threading.

To start a thread, pass a function in to create a Thread instance, then call start() to begin execution:

import time, threading

# Code executed by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)

The output is as follows:

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

Every process starts one thread by default, called the main thread, and the main thread can start new threads. Python's threading module has a current_thread() function that always returns the Thread instance for the current thread. The main thread's instance is named MainThread, while a child thread's name is set when it is created; here we named it LoopThread. The name is used only for display and has no other meaning. If you don't provide one, Python names threads automatically as Thread-1, Thread-2, and so on (recent Python versions also append the target function's name in parentheses).

Lock

The biggest difference between multithreading and multiprocessing is this: with multiple processes, each process has its own copy of every variable, and the copies do not affect each other; with multiple threads, all variables are shared by all threads, so any variable can be modified by any thread. The biggest danger of sharing data between threads is therefore that several threads modify the same variable at the same time and corrupt its contents.

Let's see how multiple threads operating on the same variable at once can corrupt it:

import threading

# Suppose this is your bank balance:
balance = 0

def change_it(n):
    # Deposit first, then withdraw the same amount; the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(2000000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)

We define a shared variable balance with an initial value of 0 and start two threads that each deposit and then withdraw, so in theory the result should be 0. However, thread scheduling is decided by the operating system, so when t1 and t2 run interleaved, given enough iterations, balance is not guaranteed to end up at 0.

The reason is that one statement in a high-level language is executed by the CPU as several steps. Even the simple calculation

balance = balance + n

takes two steps:

  1. Compute balance + n and store the result in a temporary variable;
  2. Assign the value of the temporary variable to balance.

In other words, it can be viewed as:

x = balance + n
balance = x
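You can see the separate load and store steps with the standard dis module (dis.dis(change_it) prints a full listing). The exact opcode names differ between CPython versions, but reading balance (LOAD_GLOBAL) and writing it back (STORE_GLOBAL) remain distinct instructions, and the language does not guarantee that no other thread runs between them. This sketch uses a one-line version of change_it:

```python
import dis

balance = 0

def change_it(n):
    global balance
    balance = balance + n

# Names of the bytecode instructions behind the single assignment above.
ops = [ins.opname for ins in dis.get_instructions(change_it)]
print(ops)
```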

When two threads deposit and withdraw at the same time, the balance can end up wrong, and you certainly don't want your bank balance to turn negative for no reason. So we must make sure that while one thread is modifying balance, no other thread can modify it.

To make sure balance is calculated correctly, we put a lock around change_it(). When a thread starts executing change_it(), we say it has acquired the lock, so no other thread can execute change_it() at the same time; the others must wait until the lock is released and they acquire it before they can make changes. Since there is only one lock, at most one thread holds it at any moment, no matter how many threads there are, so conflicting modifications cannot happen. A lock is created with threading.Lock():

import threading

balance = 0
lock = threading.Lock()

def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Now it is safe to modify balance (change_it is defined above):
            change_it(n)
        finally:
            # Always release the lock when done:
            lock.release()

When several threads call lock.acquire() at the same time, only one of them acquires the lock and continues executing; the others keep waiting until they get the lock.

The thread that holds the lock must release it when it is done; otherwise the threads waiting for the lock will wait forever and become dead threads. That is why we use try...finally to guarantee the lock is released.
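A Lock can also be used as a context manager: with lock: acquires it on entry and releases it on exit, even if an exception is raised, which is equivalent to the try...finally form above. A complete sketch of the bank-balance example written this way (run_both is an illustrative helper):

```python
import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n, rounds=100_000):
    for _ in range(rounds):
        # Acquired on entry, released on exit, even if change_it raises.
        with lock:
            change_it(n)

def run_both():
    threads = [threading.Thread(target=run_thread, args=(k,)) for k in (5, 8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance
```

With the lock in place, run_both() returns 0 every time.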

Multicore CPU

A multi-core CPU should be able to run several threads at the same time. What happens if we write an infinite loop?

To keep every core of an N-core CPU busy, you must start N threads, each running an infinite loop.

import threading, multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

for i in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()

Start N threads, as many as there are CPU cores. On a 4-core CPU, monitoring shows CPU usage of only about 102%, meaning only one core is being used.

Yet if you rewrite the same infinite loop in C, C++, or Java, it can saturate all the cores: 400% on a 4-core machine, 800% on an 8-core machine. Why can't Python do the same?

Although Python threads are real operating-system threads, the CPython interpreter executes code under a GIL (Global Interpreter Lock): every Python thread must acquire the GIL before it can run. The interpreter periodically makes the running thread release the GIL so that other threads get a chance to run; Python 2 did this every 100 bytecode instructions, while Python 3.2 and later use a time-based switch interval (5 ms by default). Because the GIL effectively locks the executing code of all threads, Python threads can only take turns; even 100 threads on a 100-core CPU can use only one core.
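The switch interval mentioned above can be inspected with sys.getswitchinterval() (and changed with sys.setswitchinterval()). This only reports CPython's setting; changing it does not make threads run in parallel:

```python
import sys

# Seconds a thread may run before CPython asks it to release the GIL.
interval = sys.getswitchinterval()
print(interval)   # 0.005 unless something has changed it
```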

Although Python cannot use multithreading to run work on multiple cores, it can do so with multiple processes. Each Python process has its own GIL, so the processes do not interfere with one another.
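A minimal sketch of spreading CPU-bound work over several processes with concurrent.futures.ProcessPoolExecutor, a higher-level interface built on multiprocessing. busy_sum and parallel_sums are illustrative names; any actual speedup depends on your core count and on how large each task is.

```python
from concurrent.futures import ProcessPoolExecutor

def busy_sum(n):
    # Pure-Python, CPU-bound loop: threads would just take turns on the GIL here.
    total = 0
    for i in range(n):
        total += i * i
    return total

def parallel_sums(sizes):
    # Each worker is a separate process with its own interpreter and its own GIL,
    # so several busy_sum calls can run on different cores at the same time.
    # With no max_workers argument, the pool size is based on the CPU count.
    with ProcessPoolExecutor() as executor:
        # map() returns results in the same order as the inputs.
        return list(executor.map(busy_sum, sizes))
```

For example, print(parallel_sums([10_000_000] * 4)) inside an if __name__ == '__main__': block should keep several cores busy at once, unlike the threaded loop above.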

ThreadLocal

In a multithreaded environment, each thread has its own data. A thread is better off using its own local variables than global variables, because local variables are visible only to that thread and cannot affect other threads, whereas modifying a global variable requires a lock.

But local variables have a problem too: passing them through every function call is tedious:

def process_student(name):
    std = Student(name)
    # std is a local variable, but every function uses it, so it must be passed in:
    do_task_1(std)
    do_task_2(std)

def do_task_1(std):
    do_subtask_1(std)
    do_subtask_2(std)

def do_task_2(std):
    do_subtask_2(std)
    do_subtask_2(std)

What if we use a global dict to store all the Student objects, keyed by the thread itself, so that each thread can look up its own Student object?

global_dict = {}

def std_thread(name):
    std = Student(name)
    # Put std into the global dict global_dict:
    global_dict[threading.current_thread()] = std
    do_task_1()
    do_task_2()

def do_task_1():
    # Instead of receiving std as an argument, look it up by the current thread:
    std = global_dict[threading.current_thread()]
    ...

def do_task_2():
    # Any function can find the current thread's std this way:
    std = global_dict[threading.current_thread()]
    ...

This approach works in theory, and its biggest advantage is that std no longer has to be passed through every layer of function calls. However, the code each function uses to fetch std is a little ugly.

Is there a simpler way?

This is where ThreadLocal comes in. Instead of looking things up in a dict yourself, let ThreadLocal do it for you automatically:

import threading

# Create a global ThreadLocal object:
local_school = threading.local()

def process_student():
    # Get the student associated with the current thread:
    std = local_school.student
    print('Hello, %s (in %s)' % (std, threading.current_thread().name))

def process_thread(name):
    # Bind the student to this thread through the ThreadLocal object:
    local_school.student = name
    process_student()

t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

Output:

Hello, Alice (in Thread-A)
Hello, Bob (in Thread-B)

The global variable local_school is a ThreadLocal object. Every thread can read and write its student attribute without affecting the others. You can think of local_school as a global variable, but each of its attributes, such as local_school.student, is local to the current thread, so threads can read and write it without interfering with one another and without managing locks; ThreadLocal handles that internally.

ThreadLocal is most often used to bind per-thread resources such as a database connection, the current HTTP request, or user identity information, so that every function called by that thread can easily access them.
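A sketch of the database-connection case, assuming sqlite3 as the database. By default a sqlite3 connection refuses to be used from a thread other than the one that created it, so one connection per thread is a natural fit. get_connection, handle_request, and run_threads are hypothetical helpers.

```python
import sqlite3
import threading

local_data = threading.local()

def get_connection():
    # Open one connection per thread on first use and cache it on the
    # thread-local object; later calls in the same thread reuse it.
    conn = getattr(local_data, "conn", None)
    if conn is None:
        conn = sqlite3.connect(":memory:")
        local_data.conn = conn
    return conn

def handle_request(results):
    conn = get_connection()
    conn.execute("SELECT 1")         # any query runs on this thread's own connection
    results.append(conn)

def run_threads(count=3):
    results = []
    threads = [threading.Thread(target=handle_request, args=(results,))
               for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread ends up with its own connection object, and no function had to receive the connection as an argument.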

Processes vs. Threads

To implement multitasking we usually use the Master-Worker pattern: the Master assigns tasks and the Workers carry them out. A multitasking setup therefore typically has one Master and several Workers.

If Master-Worker is implemented with multiple processes, the main process is the Master and the other processes are Workers.

If it is implemented with multiple threads, the main thread is the Master and the other threads are Workers.
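A minimal sketch of the multi-threaded Master-Worker layout using queue.Queue: the main thread (the Master) puts tasks on a task queue, and worker threads take them off and put results on a result queue. The function name master_worker and the None sentinel are illustrative choices:

```python
import queue
import threading

def master_worker(tasks, num_workers=3):
    task_q = queue.Queue()
    result_q = queue.Queue()

    def worker():
        while True:
            n = task_q.get()
            if n is None:            # sentinel: the Master has no more work
                break
            result_q.put((n, n * n))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for n in tasks:                  # the Master (main thread) hands out tasks
        task_q.put(n)
    for _ in workers:                # one sentinel per Worker so each one exits
        task_q.put(None)
    for w in workers:
        w.join()
    return sorted(result_q.get() for _ in range(len(tasks)))
```

For example, master_worker([3, 1, 2]) returns [(1, 1), (2, 4), (3, 9)].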

The biggest advantage of the multi-process model is stability: if a child process crashes, the main process and the other child processes are unaffected. (If the main process dies, everything goes down with it, but since the Master only assigns tasks, it is unlikely to crash.) The well-known Apache server originally used the multi-process model.

The drawback of the multi-process model is that creating a process is expensive. On Unix/Linux the fork call keeps the cost acceptable, but on Windows creating a process is very costly. In addition, the number of processes an operating system can run at once is limited by memory and CPU; with thousands of processes running simultaneously, even scheduling becomes a problem for the OS.

The multi-threaded model is usually somewhat faster than the multi-process model, but not dramatically so. Its fatal weakness is that a failure in any thread can crash the entire process, because all threads share the process's memory. On Windows, when the code running in a thread goes wrong, you often see a message like "The program has performed an illegal operation and will be closed". Usually only one thread had the problem, but the operating system forcibly terminates the whole process.

On Windows, multithreading is more efficient than multiprocessing, so Microsoft's IIS server uses the multi-threaded model by default. Because of the stability problems of multithreading, IIS is less stable than Apache. To mitigate this, both IIS and Apache now use a hybrid multi-process + multi-thread model, which makes things even more complicated.

Compute-intensive vs. I/O-intensive

The second consideration for multitasking is the type of task. Tasks can be divided into compute-intensive and I/O-intensive ones.

Compute-intensive tasks perform a large amount of computation and consume CPU resources, for example calculating digits of pi or decoding high-definition video; they depend entirely on the CPU's computing power. Such tasks can also be split into multiple tasks, but the more tasks there are, the more time is spent switching between them and the less efficiently the CPU does useful work. To make the most of the CPU, the number of compute-intensive tasks running at the same time should equal the number of CPU cores.

Because compute-intensive tasks mainly consume CPU resources, the efficiency of the code matters a great deal. A scripting language such as Python runs very slowly and is poorly suited to compute-intensive tasks; for those, it is best to use C.

The second type of task is I/O-intensive: tasks involving network or disk I/O. These tasks use little CPU and spend most of their time waiting for I/O to complete (because I/O is far slower than the CPU and memory). For I/O-intensive work, running more tasks raises CPU utilization, up to a limit. Most common tasks, such as web applications, are I/O-intensive.

During an I/O-intensive task, 99% of the time is spent on I/O and very little on the CPU, so replacing a slow scripting language like Python with a very fast language like C barely improves performance. For I/O-intensive tasks, the most suitable language is the one with the highest development efficiency (the least code): scripting languages are the first choice, and C is the worst.
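Threads remain useful in Python for I/O-intensive work, because a thread that is waiting on I/O releases the GIL. The sketch below uses time.sleep as a stand-in for a network or disk wait (an assumption for illustration; fake_io and run_io_tasks are hypothetical names). Ten 0.1-second waits run in a thread pool should finish in well under one second instead of about one second when run one after another:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    # time.sleep releases the GIL, much like waiting on a socket or a disk read.
    time.sleep(seconds)
    return seconds

def run_io_tasks(count=10, seconds=0.1):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as executor:
        results = list(executor.map(fake_io, [seconds] * count))
    return results, time.perf_counter() - start
```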

Distributed processes

Between Thread and Process, Process should be preferred, because processes are more stable and can be distributed across multiple machines, while threads can at most be spread over the multiple CPUs of a single machine.

Python's multiprocessing module not only supports multiple processes; its managers submodule also supports distributing processes across multiple machines. A server process can act as a scheduler and distribute tasks to other processes over the network. Because the managers module is well encapsulated, you can write distributed multi-process programs easily without knowing the details of network communication.

For example, suppose we already have a multi-process program that communicates through a Queue on a single machine. Because the task-processing work is heavy, we now want to run the process that sends tasks and the process that handles them on two different machines. How can we do that with distributed processes?

The original Queue can still be used, but by exposing it over the network through the managers module, processes on other machines can access it.

First, the server process. It creates the Queue, registers it on the network, and then writes tasks into it:

# task_master.py

import random, time, queue
from multiprocessing.managers import BaseManager

# Queue for sending tasks:
task_queue = queue.Queue()
# Queue for receiving results:
result_queue = queue.Queue()

# Module-level functions instead of lambdas: a lambda cannot be pickled,
# which breaks manager.start() when processes are spawned (e.g. on Windows):
def return_task_queue():
    return task_queue

def return_result_queue():
    return result_queue

# QueueManager inherits from BaseManager:
class QueueManager(BaseManager):
    pass

# Register both Queues on the network; callable links each name to its Queue object:
QueueManager.register('get_task_queue', callable=return_task_queue)
QueueManager.register('get_result_queue', callable=return_result_queue)

if __name__ == '__main__':
    # Bind to port 5000 and set the authentication key 'abc':
    manager = QueueManager(address=('', 5000), authkey=b'abc')
    # Start the manager's server process, which holds the Queues:
    manager.start()
    # Get proxies for the Queues, accessed through the network:
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    # Put some tasks in:
    for i in range(10):
        n = random.randint(0, 10000)
        print('Put task %d...' % n)
        task.put(n)
    # Read results from the result queue:
    print('Try get results...')
    for i in range(10):
        r = result.get(timeout=10)
        print('Result: %s' % r)
    # Shut down:
    manager.shutdown()
    print('master exit.')

Note that when you write a multi-process program on a single machine, the Queue you create can be used directly. In a distributed multi-process environment, however, you must not add tasks by operating on the original task_queue directly, because that bypasses the QueueManager's encapsulation; tasks must be added through the Queue interface obtained with manager.get_task_queue().

Next, start the worker process on another machine (it can also run on the same machine):

# task_worker.py

import time, sys, queue
from multiprocessing.managers import BaseManager

# Create a matching QueueManager:
class QueueManager(BaseManager):
    pass

# This QueueManager only fetches the Queues from the network, so only the names are registered:
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

# Connect to the server, i.e. the machine running task_master.py:
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# The port and authkey must match the settings in task_master.py exactly:
m = QueueManager(address=(server_addr, 5000), authkey=b'abc')
# Connect over the network:
m.connect()
# Get the Queue objects:
task = m.get_task_queue()
result = m.get_result_queue()
# Fetch tasks from the task queue and write the results to the result queue:
for i in range(10):
    try:
        n = task.get(timeout=1)
        print('run task %d * %d...' % (n, n))
        r = '%d * %d = %d' % (n, n, n*n)
        time.sleep(1)
        result.put(r)
    except queue.Empty:
        print('task queue is empty.')
# Processing finished:
print('worker exit.')

The worker process connects to the server process over the network, so it must specify the server's IP address.

Now let's see distributed processes in action. First start the task_master.py server process:

$ python3 task_master.py 
Put task 3411...
Put task 1605...
Put task 1398...
Put task 4729...
Put task 5300...
Put task 7471...
Put task 68...
Put task 4219...
Put task 339...
Put task 7866...
Try get results...

After sending the tasks, task_master.py starts waiting for results on the result queue. Now start the task_worker.py process:

$ python3 task_worker.py
Connect to server 127.0.0.1...
run task 3411 * 3411...
run task 1605 * 1605...
run task 1398 * 1398...
run task 4729 * 4729...
run task 5300 * 5300...
run task 7471 * 7471...
run task 68 * 68...
run task 4219 * 4219...
run task 339 * 339...
run task 7866 * 7866...
worker exit.

When task_worker.py finishes, the task_master.py process continues and prints the results:

Result: 3411 * 3411 = 11634921
Result: 1605 * 1605 = 2576025
Result: 1398 * 1398 = 1954404
Result: 4729 * 4729 = 22363441
Result: 5300 * 5300 = 28090000
Result: 7471 * 7471 = 55815841
Result: 68 * 68 = 4624
Result: 4219 * 4219 = 17799961
Result: 339 * 339 = 114921
Result: 7866 * 7866 = 61873956

What is this simple Master/Worker model good for? It is in fact a simple but genuine form of distributed computing. With small changes to the code and several workers started, tasks can be distributed across a few or even dozens of machines. For example, replace the n*n computation with code that sends email, and you have an asynchronous mail-sending queue.

Where are the Queue objects stored? Notice that task_worker.py contains no code that creates a Queue at all, so the Queue objects live in the task_master.py process:

                                             │
┌─────────────────────────────────────────┐     ┌──────────────────────────────────────┐
│task_master.py                           │  │  │task_worker.py                        │
│                                         │     │                                      │
│  task = manager.get_task_queue()        │  │  │  task = manager.get_task_queue()     │
│  result = manager.get_result_queue()    │     │  result = manager.get_result_queue() │
│              │                          │  │  │              │                       │
│              │                          │     │              │                       │
│              ▼                          │  │  │              │                       │
│  ┌─────────────────────────────────┐    │     │              │                       │
│  │QueueManager                     │    │  │  │              │                       │
│  │ ┌────────────┐ ┌──────────────┐ │    │     │              │                       │
│  │ │ task_queue │ │ result_queue │ │<───┼──┼──┼──────────────┘                       │
│  │ └────────────┘ └──────────────┘ │    │     │                                      │
│  └─────────────────────────────────┘    │  │  │                                      │
└─────────────────────────────────────────┘     └──────────────────────────────────────┘
                                             │
                                          Network
 Copy code 

The Queue can be accessed over the network because of the QueueManager. Since a QueueManager can manage more than one Queue, each Queue's network call interface is given a name, such as get_task_queue.

What is authkey for? It ensures that the two machines communicate normally, without malicious interference from other machines. If the authkey in task_worker.py differs from the authkey in task_master.py, the connection will fail.

Regular expressions

Regular expressions are a powerful tool for matching strings. The idea is to define a rule for strings in a descriptive language: any string that conforms to the rule is considered a "match"; otherwise, the string is considered invalid.

So, to decide whether a string is a valid email address, we:

  1. Create a regular expression that matches email addresses;
  2. Use that regular expression to match the user's input and decide whether it is valid.
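The two steps above can be sketched as follows. The pattern is a deliberately simplified assumption (real email syntax is far more permissive than this), and is_valid_email is a hypothetical helper name:

```python
import re

# Simplified email pattern (an assumption, not a full RFC 5322 validator):
# a name part of letters/digits/_ . + -, an @, then dot-separated domain labels.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(\.[\w-]+)+')

def is_valid_email(text):
    # fullmatch() requires the whole string to fit the pattern, not just a prefix
    return EMAIL_RE.fullmatch(text) is not None

print(is_valid_email('someone@gmail.com'))         # True
print(is_valid_email('bill.gates@microsoft.com'))  # True
print(is_valid_email('bob#example.com'))           # False: no @
print(is_valid_email('mr-bob@example'))            # False: no dot in the domain
```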

Because regular expressions are themselves written as strings, let's first learn how to describe characters with characters.

In a regular expression, a character given directly is matched exactly. \d matches a digit and \w matches a letter or digit (or underscore), so:

  • '00\d' matches '007', but not '00A';
  • '\d\d\d' matches '010';
  • '\w\w\d' matches 'py3';

. matches any character (except a newline), so:

  • 'py.' matches 'pyc', 'pyo', 'py!', and so on.

To match variable-length text, a regular expression uses * for any number (including 0) of the preceding character, + for at least one, ? for 0 or 1, {n} for exactly n, and {n,m} for n to m of them.
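A quick way to check these rules is to test a few patterns against whole strings with re.fullmatch(). This sketch reuses the examples above plus a few made-up ones for the quantifiers:

```python
import re

cases = [
    (r'00\d', '007'),      # \d: one digit
    (r'00\d', '00A'),      # 'A' is not a digit
    (r'\w\w\d', 'py3'),    # \w: letter, digit or underscore
    (r'py.', 'py!'),       # .: any character except a newline
    (r'ab*', 'a'),         # *: zero or more of the preceding 'b'
    (r'ab+', 'a'),         # +: at least one 'b' is required
    (r'ab?c', 'abc'),      # ?: zero or one 'b'
    (r'\d{3}', '010'),     # {n}: exactly n digits
    (r'\d{3,8}', '12'),    # {n,m}: between n and m digits
]

# fullmatch() succeeds only if the pattern covers the entire string
results = [re.fullmatch(pattern, text) is not None for pattern, text in cases]
for (pattern, text), ok in zip(cases, results):
    print(pattern, repr(text), ok)
```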

Let's take a more complex example: \d{3}\s+\d{3,8}.

Read it from left to right:

  1. \d{3} matches 3 digits, for example '010';
  2. \s matches a whitespace character (including Tab and the like), so \s+ means at least one space, for example ' ', '   ', and so on;
  3. \d{3,8} matches 3 to 8 digits, for example '1234567'.

Put together, this regular expression matches a phone number with an area code, separated by any number of spaces.
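The pattern can be tried on a few sample strings (made up for illustration); note that \s also accepts tabs:

```python
import re

PHONE_RE = re.compile(r'\d{3}\s+\d{3,8}')

samples = ['010 12345', '010\t\t1234567', '010-12345', '01 12345']
# '010-12345' fails (no whitespace), '01 12345' fails (only 2 leading digits)
matches = [PHONE_RE.fullmatch(s) is not None for s in samples]
print(matches)
```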

What if you want to match a number like '010-12345'? Outside [], '-' is an ordinary character and can be written as-is (escaping it as '\-' also works), so the rule becomes \d{3}-\d{3,8}.

However, this still can't match '010 - 12345', because of the spaces around the hyphen. So we need a more elaborate pattern.
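One way to also accept spaces around the hyphen (an assumption about which formats you want to allow) is to put \s* on each side of it:

```python
import re

# \s* allows zero or more whitespace characters on either side of the hyphen
PHONE_RE = re.compile(r'\d{3}\s*-\s*\d{3,8}')

samples = ['010-12345', '010 - 12345', '010  -12345', '010 12345']
matches = [PHONE_RE.fullmatch(s) is not None for s in samples]
print(matches)   # the last one has no hyphen at all, so it fails
```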

Advanced

For more precise matching, use [] to specify a range of characters, for example:

  • [0-9a-zA-Z_] matches a digit, letter, or underscore;
  • [0-9a-zA-Z_]+ matches a string of at least one digit, letter, or underscore, such as 'a100', '0_Z', 'Py3000';
  • [a-zA-Z_][0-9a-zA-Z_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores, that is, a legal Python variable name;
  • [a-zA-Z_][0-9a-zA-Z_]{0,19} additionally limits the variable name to 1-20 characters (1 leading character + at most 19 more).
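The two identifier patterns can be compared on a few sample names. These patterns are ASCII-only, so they are a simplification: Python 3 also allows non-ASCII letters in identifiers.

```python
import re

any_length = re.compile(r'[a-zA-Z_][0-9a-zA-Z_]*')
at_most_20 = re.compile(r'[a-zA-Z_][0-9a-zA-Z_]{0,19}')   # 1 + up to 19 characters

names = ['a100', '_Z', 'Py3000', '3abc', 'x' * 21]
any_len = [any_length.fullmatch(n) is not None for n in names]
max_20 = [at_most_20.fullmatch(n) is not None for n in names]
print(any_len)   # '3abc' fails: it starts with a digit
print(max_20)    # the 21-character name is now rejected as well
```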

A|B matches A or B, so (P|p)ython matches 'Python' or 'python'.

^ marks the beginning of a line; ^\d means the line must start with a digit.

$ marks the end of a line; \d$ means the line must end with a digit.

You may have noticed that py also matches 'python' (it matches the first two letters), but ^py$ turns it into a whole-line match, so it can only match 'py'.
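The effect of the anchors can be seen with the re module. Note that re.match() only tries to match at the start of the string, so a leading ^ is implicit there, while re.search() scans the whole string:

```python
import re

results = [
    bool(re.match(r'py', 'python')),          # True: 'py' matches the first two letters
    bool(re.match(r'^py$', 'python')),        # False: $ demands the end right after 'py'
    bool(re.match(r'^py$', 'py')),            # True: whole-line match
    bool(re.search(r'\d$', 'abc3')),          # True: the string ends with a digit
    bool(re.match(r'(P|p)ython', 'python')),  # True: alternation
]
print(results)
```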

re module

With this groundwork, we can use regular expressions in Python. Python provides the re module, which contains all regular expression functionality. Because Python strings also use \ for escaping, pay special attention:

s = 'ABC\\-001' # Python string
# The corresponding regular expression string becomes:
# 'ABC\-001'

So we strongly recommend using Python's r prefix, which means you don't have to think about escaping:

s = r'ABC\-001' # Python string
# The corresponding regular expression string is unchanged:
# 'ABC\-001'
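A quick check shows that both spellings produce the same 8-character string, and that the regex escape \- matches a literal hyphen:

```python
import re

escaped = 'ABC\\-001'   # ordinary string: '\\' stands for one backslash
raw = r'ABC\-001'       # raw string: the backslash is kept as-is

print(escaped == raw)   # True: both are the characters A B C \ - 0 0 1
print(len(raw))         # 8
print(re.match(raw, 'ABC-001') is not None)  # True: regex '\-' matches '-'
```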

Let's see how to check whether a regular expression matches:

>>> import re
>>> re.match(r'^\d{3}-\d{3,8}$', '010-12345')
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> re.match(r'^\d{3}-\d{3,8}$', '010 12345')
>>>

The match() method checks for a match: if it succeeds it returns a Match object, otherwise it returns None. The common way to check is:

import re

test = '010-12345'  # the string entered by the user
if re.match(r'^\d{3}-\d{3,8}$', test):
    print('ok')
else:
    print('failed')

Splitting strings

Splitting strings with regular expressions is more flexible than splitting on a fixed character. Look at an ordinary split:

>>> 'a b   c'.split(' ')
['a', 'b', '', '', 'c']

Hmm, consecutive spaces aren't handled. Try a regular expression:

>>> re.split(r'\s+', 'a b   c')
['a', 'b', 'c']

The string is split correctly no matter how many spaces there are. Now add , as a separator and try:

>>> re.split(r'[\s,]+', 'a,b, c  d')
['a', 'b', 'c', 'd']

Then add ; and try:

>>> re.split(r'[\s,;]+', 'a,b;; c  d')
['a', 'b', 'c', 'd']

Next time a user enters a set of tags, remember to use a regular expression to turn the irregular input into a proper list.
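A minimal sketch of such a tag parser (parse_tags is a hypothetical name). Leading or trailing separators make re.split() return empty strings at the ends, so those are filtered out:

```python
import re

def parse_tags(text):
    # Split on any run of whitespace, commas or semicolons,
    # then drop the empty strings left by leading/trailing separators.
    return [tag for tag in re.split(r'[\s,;]+', text) if tag]

print(parse_tags(' python, regex;; web  '))   # ['python', 'regex', 'web']
```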

Grouping

Besides simply checking for a match, regular expressions can also extract substrings. Parentheses () mark a group to be extracted (Group). For example:

^(\d{3})-(\d{3,8})$ defines two groups, which extract the area code and the local number directly from the matched string:

>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> m.group(0)
'010-12345'
>>> m.group(1)
'010'
>>> m.group(2)
'12345'

If groups are defined in a regular expression, you can extract the substrings with the group() method of the Match object.

Note that group(0) is always the entire matched text (here the whole input string), while group(1), group(2), ... are the 1st, 2nd, ... substrings.
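Match objects can also return several groups at once. The sketch below shows groups() and group(1, 2), plus named groups written (?P<name>...), a standard re feature that lets you refer to a group by name instead of by position (the names area and number are chosen for illustration):

```python
import re

m1 = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
print(m1.groups())      # -> ('010', '12345'): all groups at once
print(m1.group(1, 2))   # -> ('010', '12345'): several groups by number

# Named groups: (?P<name>...)
m2 = re.match(r'^(?P<area>\d{3})-(?P<number>\d{3,8})$', '010-12345')
print(m2.group('area'))  # -> 010
print(m2.groupdict())    # -> {'area': '010', 'number': '12345'}
```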

Extracting substrings is very useful. Here is a more fearsome example:

>>> t = '19:05:30'
>>> m = re.match(r'^(0[0-9]|1[0-9]|2[0-3]|[0-9]):(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9]):(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])$', t)
>>> m.groups()
('19', '05', '30')

This regular expression can recognize valid times directly. But sometimes a regular expression cannot do full validation, for example when recognizing dates:

'^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$'

Invalid dates such as '2-30' or '4-31' cannot be recognized with a regular expression, or the expression becomes very hard to write; in that case the program has to help with the validation.
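A sketch of that cooperation, assuming the month-day format above and an explicit year (needed to decide whether 2-29 exists); is_valid_month_day is a hypothetical name. The regex only checks the shape, and calendar.monthrange() supplies the number of days in the month:

```python
import calendar
import re

MONTH_DAY_RE = re.compile(r'^(\d{1,2})-(\d{1,2})$')

def is_valid_month_day(text, year):
    m = MONTH_DAY_RE.match(text)
    if m is None:
        return False                      # wrong shape, e.g. 'Feb-3'
    month, day = int(m.group(1)), int(m.group(2))
    if not 1 <= month <= 12:
        return False
    # monthrange() returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return 1 <= day <= days_in_month

print(is_valid_month_day('2-30', 2021))   # False
print(is_valid_month_day('4-31', 2021))   # False
print(is_valid_month_day('2-29', 2020))   # True (2020 is a leap year)
print(is_valid_month_day('12-25', 2021))  # True
```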

Greedy matching

Finally, note that regular expression matching is greedy by default, that is, it matches as many characters as possible. For example, to match the 0s after a number:

>>> re.match(r'^(\d+)(0*)$', '102300').groups()
('102300', '')

Because \d+ matches greedily, it swallows all the trailing 0s, so 0* can only match an empty string.

We must make \d+ non-greedy (that is, match as few characters as possible) so that the trailing 0s can be matched. Adding a ? makes \d+ non-greedy:

>>> re.match(r'^(\d+?)(0*)$', '102300').groups()
('1023', '00')
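The same difference shows up whenever a pattern could stop early or late. A common illustration (the HTML snippet is made up) is matching tags with .* versus .*?:

```python
import re

html = '<b>bold</b>'
greedy = re.findall(r'<.*>', html)   # .* runs to the last '>'
lazy = re.findall(r'<.*?>', html)    # .*? stops at the first '>'
print(greedy)   # ['<b>bold</b>']
print(lazy)     # ['<b>', '</b>']
```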

Graphical interface

Python supports several third-party GUI libraries, and its standard library includes Tkinter, which supports Tk. With Tkinter, no package needs to be installed; you can use it directly.

Tkinter

Using Tkinter is very simple. Let's write a GUI version of "Hello, world!".

The first step is to import everything from the tkinter package:

from tkinter import *

The second step is to derive an Application class from Frame, which is the parent container of all the Widgets:

class Application(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.pack()
        self.createWidgets()

    def createWidgets(self):
        self.helloLabel = Label(self, text='Hello, world!')
        self.helloLabel.pack()
        self.quitButton = Button(self, text='Quit', command=self.quit)
        self.quitButton.pack()

In a GUI, every Button, Label, input box, and so on is a Widget. A Frame is a Widget that can contain other Widgets, and all the Widgets together form a tree.

The pack() method adds a Widget to its parent container and handles the layout. pack() is the simplest layout; grid() can produce more complex layouts.

In the createWidgets() method we create a Label and a Button; when the Button is clicked, it calls self.quit() to exit the program.

The third step is to instantiate Application and start the message loop:

app = Application()
# Set the window title:
app.master.title('Hello World')
# Main message loop:
app.mainloop()

No module named '_tkinter' on macOS

If you get the error No module named '_tkinter' on a Mac, install python-tk with Homebrew:

brew install python-tk

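Text input

The next example adds an Entry input box and a button. When the button is clicked, hello() reads the text with self.nameInput.get() (falling back to 'world' if it is empty) and shows it in a message box: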

import tkinter
import tkinter.messagebox as messagebox

class Application(tkinter.Frame):
    def __init__(self, master=None):
        tkinter.Frame.__init__(self, master)
        self.alertButton = tkinter.Button(self, text='Hello', command=self.hello)
        self.nameInput = tkinter.Entry(self)
        self.pack()
        self.createWidgets()

    def createWidgets(self):
        self.nameInput.pack()
        self.alertButton.pack()

    def hello(self):
        name = self.nameInput.get() or 'world'
        messagebox.showinfo('Message', 'Hello, %s' % name)
app = Application()
# Set the window title:
app.master.title('Hello World')
# Main message loop:
app.mainloop()

Turtle graphics

In 1966, Seymour Papert and Wally Feurzeig invented LOGO, a language for teaching children programming. Its distinguishing feature is that you program a small turtle to draw on the screen.

Turtle Graphics was later ported to various high-level languages. Python has a built-in turtle library that replicates essentially all of the original Turtle Graphics functionality.

Let's look at a simple program that directs the turtle to draw a rectangle:

# Import everything from the turtle package:
from turtle import *

# Set the pen width:
width(4)

# Move forward:
forward(200)
# Turn right 90 degrees:
right(90)

# Set the pen color:
pencolor('red')
forward(100)
right(90)

pencolor('green')
forward(200)
right(90)

pencolor('blue')
forward(100)
right(90)

# Call done() so the window waits to be closed; otherwise it closes immediately:
done()

As the code shows, turtle graphics means directing the turtle to move forward and turn; the turtle's trail is the line that gets drawn. To draw a rectangle, the turtle just moves forward and turns right 90 degrees, four times over.

Call width() to set the pen width and pencolor() to set the color. For more operations, see the documentation of the turtle library.

When the drawing is finished, remember to call done() so the window enters the message loop and waits to be closed. Otherwise the Python process ends immediately and the window closes with it.

The turtle package itself is just a drawing library, but combined with Python code it can draw all kinds of complex figures. For example, use a loop to draw 5 five-pointed stars:

from turtle import *

def drawStar(x, y):
    pu()
    goto(x, y)
    pd()
    # set heading: 0
    seth(0)
    for i in range(5):
        fd(40)
        rt(144)

for x in range(0, 250, 50):
    drawStar(x, 0)

done()

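Recursion makes even more complex figures possible. For example, the following code draws a fractal tree:

import turtle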

# Set the color mode to 0-255 RGB values:
turtle.colormode(255)

turtle.lt(90)

lv = 14
l = 120
s = 45

turtle.width(lv)

# Initialize the RGB color:
r = 0
g = 0
b = 0
turtle.pencolor(r, g, b)

turtle.penup()
turtle.bk(l)
turtle.pendown()
turtle.fd(l)


def draw_tree(l, level):
    global r, g, b
    # save the current pen width
    w = turtle.width()

    # narrow the pen width
    turtle.width(w * 3.0 / 4.0)
    # set color:
    r = r + 1
    g = g + 2
    b = b + 3
    turtle.pencolor(r % 200, g % 200, b % 200)

    l = 3.0 / 4.0 * l

    turtle.lt(s)
    turtle.fd(l)

    if level < lv:
        draw_tree(l, level + 1)
    turtle.bk(l)
    turtle.rt(2 * s)
    turtle.fd(l)

    if level < lv:
        draw_tree(l, level + 1)
    turtle.bk(l)
    turtle.lt(s)

    # restore the previous pen width
    turtle.width(w)


turtle.speed("fastest")

draw_tree(l, 4)

turtle.done()

Network programming

Network communication is communication between two processes on two computers. For example, a browser process communicates with a Web service process on one of Sina's servers, and a QQ process communicates with a process on one of Tencent's servers.

A brief introduction to TCP/IP

To connect all the different kinds of computers in the world, there must be a global set of protocols. The Internet Protocol Suite is that universal standard for building the Internet. The word Internet combines inter and net; the original idea is a network that connects "networks". With the Internet, any private network can join as long as it supports these protocols.

The Internet protocol suite contains hundreds of protocol standards, but the two most important are TCP and IP, so the suite is usually called TCP/IP for short.

When communicating, both sides must know each other's identifier, just as you must know someone's email address to send them an email. The unique identifier of every computer on the Internet is its IP address, something like 123.123.123.123. If a computer is connected to two or more networks at once, as a router is, it has two or more IP addresses, so an IP address really corresponds to a computer's network interface, usually a network card.

The IP protocol is responsible for sending data over the network from one computer to another. Data is divided into small chunks and sent as IP packets. Because Internet links are complex, there are often several routes between two computers, so routers decide how to forward each IP packet. IP packets are sent in chunks and may travel different routes, with no guarantee that they arrive, or that they arrive in order.

An IP address is actually a 32-bit integer (called IPv4). The string form of an IP address, such as 192.168.0.1, writes the 32-bit integer as four numbers, one per 8-bit group, to make it easier to read.

An IPv6 address is actually a 128-bit integer. IPv6 is the upgraded version of the IPv4 currently in use, and its string form looks like 2001:0db8:85a3:0042:1000:8a2e:0370:7334.
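The relationship between the dotted string and the 32-bit integer can be checked directly. ipv4_to_int below is a hypothetical helper with no input validation; the standard library's ipaddress module gives the same value and also handles IPv6:

```python
import ipaddress

def ipv4_to_int(text):
    # Each of the four dotted numbers is one 8-bit group of the 32-bit integer.
    a, b, c, d = (int(part) for part in text.split('.'))
    return (a << 24) | (b << 16) | (c << 8) | d

n = ipv4_to_int('192.168.0.1')
print(n)                                          # 3232235521
print(int(ipaddress.IPv4Address('192.168.0.1')))  # same value from the standard library

v6 = ipaddress.IPv6Address('2001:0db8:85a3:0042:1000:8a2e:0370:7334')
print(v6.max_prefixlen)   # 128: the number of bits in an IPv6 address
print(v6)                 # leading zeros dropped: 2001:db8:85a3:42:1000:8a2e:370:7334
```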

TCP is built on top of IP. TCP is responsible for establishing a reliable connection between two computers and making sure packets arrive in order. TCP sets up a connection with a handshake, then numbers each IP packet so the other side receives them in order, and automatically resends any packet that is lost.

Many commonly used higher-level protocols are built on TCP, such as HTTP used by browsers and SMTP used for sending email.

Besides the data to be transmitted, a TCP packet also contains the source and destination IP addresses and the source and destination ports.

What is a port for? When two computers communicate, an IP address alone is not enough, because many network programs run on the same computer. When a TCP packet arrives, the port number tells whether it is meant for the browser or for QQ. Each network program requests a unique port number from the operating system, so a network connection between two processes on two computers needs each side's IP address and port number.

A process can also connect to several computers at the same time, so it may request many ports.

TCP Programming

A Socket is the abstraction used in network programming. A Socket usually represents "an open network connection", and opening a Socket requires the target computer's IP address and port number, plus the protocol type.

Client

Most connections are reliable TCP connections. When a TCP connection is created, the side that actively initiates it is called the client, and the side that passively accepts it is called the server.

# Import the socket library:
import socket

# Create a Socket: AF_INET selects IPv4, SOCK_STREAM selects the stream-oriented TCP protocol
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Establish the connection:
s.connect(('www.sina.com.cn', 80))

To actively initiate a TCP connection, the client must know the server's IP address and port number. Sina's IP address is resolved automatically from the domain name www.sina.com.cn, but how do we know the port number of Sina's server?

The answer is that a server providing a particular service must use a fixed port number. Since we want to visit a web page, Sina's Web server must listen on port 80, because 80 is the standard port for Web services. Other services have their own standard ports, for example SMTP uses port 25 and FTP uses port 21. Port numbers below 1024 are reserved for standard Internet services; ports above 1024 can be used freely.

Once the TCP connection is established, we can send a request to Sina's server asking for the content of the homepage:

# Send data:
s.send(b'GET / HTTP/1.1\r\nHost: www.sina.com.cn\r\nConnection: close\r\n\r\n')

A TCP connection creates a two-way channel, and both sides can send data to each other at the same time. But who sends first, who sends next, and how the two coordinate is decided by the specific protocol. For example, HTTP specifies that the client must send a request to the server first, and the server sends data to the client after receiving it.

The text sent must follow the HTTP standard. If the format is correct, we can then receive the data returned by Sina's server:

# Receive data:
buffer = []
while True:
    # Receive at most 1 KB (1024 bytes) per call:
    d = s.recv(1024)
    if d:
        buffer.append(d)
    else:
        break
data = b''.join(buffer)

When receiving data, call recv(max), which receives at most max bytes at a time. So we receive repeatedly in a while loop until recv() returns empty data, which means reception is complete, and then exit the loop.
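The receive loop can be wrapped in a reusable helper. This is a minimal sketch: recv_all is a hypothetical name, and socket.socketpair() creates two already-connected sockets so the loop can be tried without any network access (the canned response bytes are made up):

```python
import socket

def recv_all(sock, bufsize=1024):
    """Read from sock until the peer closes its end (recv() returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:          # empty bytes: the other side has closed
            break
        chunks.append(chunk)
    return b''.join(chunks)

# A connected pair of sockets stands in for a real server connection.
a, b = socket.socketpair()
a.sendall(b'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>')
a.close()                      # closing makes b's recv() eventually return b''
data = recv_all(b)
b.close()

header, html = data.split(b'\r\n\r\n', 1)
print(header.decode('utf-8'))
print(html)
```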

After the data has been received, call close() to close the Socket, and a complete network exchange is over:

# Close the connection:
s.close()

The received data contains both the HTTP header and the web page itself. We only need to separate the HTTP header from the page, print the header, and save the page content to a file:

header, html = data.split(b'\r\n\r\n', 1)
print(header.decode('utf-8'))
# Write the received page content to a file:
with open('sina.html', 'wb') as f:
    f.write(html)

Now just open sina.html in a browser to see Sina's homepage.

Server

The server process first binds a port and listens for connections from other clients. When a client connects, the server establishes a Socket connection with that client, and all subsequent communication goes through that Socket.

So the server opens a fixed port (such as 80) to listen on, and creates a new Socket for every client connection. Because a server has many connections from clients, it must be able to tell which client a Socket is bound to. A Socket is uniquely identified by 4 items: server address, server port, client address, and client port.
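The four items can be observed directly. In this sketch (loopback only, with port 0 letting the OS pick a free port, an assumption for the demo), getsockname() and getpeername() on the accepted Socket give the server end and the client end:

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))    # port 0: the OS picks a free port
listener.listen(1)
listen_port = listener.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', listen_port))
client_end = client.getsockname()  # the client's own (address, port)

conn, addr = listener.accept()     # a new Socket dedicated to this client
server_addr, server_port = conn.getsockname()
client_addr, client_port = conn.getpeername()
print('server %s:%s <-> client %s:%s' % (server_addr, server_port, client_addr, client_port))

for sock in (conn, client, listener):
    sock.close()
```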

The server also needs to respond to requests from multiple clients at the same time, so each connection needs a new process or a new thread to handle it; otherwise the server can only serve one client at a time.

Let's write a simple server program that accepts client connections, adds Hello to the string sent by the client, and sends it back.

First, create a Socket based on IPv4 and TCP:

import socket
import threading
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

Next, we need to bind the listening address and port. A server may have several network cards: it can bind to the IP address of one card, bind to all network addresses with 0.0.0.0, or bind to the local address with 127.0.0.1. 127.0.0.1 is a special IP address that means the local machine; if you bind to it, the client must also run on the local machine to connect, in other words, external computers cannot connect.

The port number must be specified in advance. Because the service we are writing is not a standard service, we use port 9999. Note that binding a port number below 1024 requires administrator privileges:

# Bind the listening address and port:
s.bind(('127.0.0.1', 9999))

Then call listen() to start listening on the port; the argument specifies the maximum number of pending connections:

s.listen(5)
print('Waiting for connection...')

Next, the server program accepts client connections in an endless loop; accept() waits for and returns a client connection:

while True:
    # Accept a new connection:
    sock, addr = s.accept()
    # Create a new thread to handle the TCP connection:
    t = threading.Thread(target=tcplink, args=(sock, addr))
    t.start()

Each connection must get its own new thread (or process) to handle it; otherwise, while the single thread is busy with one connection, it cannot accept connections from other clients:

def tcplink(sock, addr):
    print('Accept new connection from %s:%s...' % addr)
    sock.send(b'Welcome!')
    while True:
        data = sock.recv(1024)
        time.sleep(1)
        if not data or data.decode('utf-8') == 'exit':
            break
        sock.send(('Hello, %s!' % data.decode('utf-8')).encode('utf-8'))
    sock.close()
    print('Connection from %s:%s closed.' % addr)

After the connection is established, the server first sends a welcome message, then waits for data from the client, adds Hello to it, and sends it back. If the client sends the string exit, the server closes the connection.

To test this server program, we also need a client program:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Establish the connection:
s.connect(('127.0.0.1', 9999))
# Receive the welcome message:
print(s.recv(1024).decode('utf-8'))
for data in [b'Michael', b'Tracy', b'Sarah']:
    # Send data:
    s.send(data)
    print(s.recv(1024).decode('utf-8'))
s.send(b'exit')
s.close()

Open two command-line windows, run the server program in one and the client program in the other, and you can see the result:

┌────────────────────────────────────────────────────────┐
│Command Prompt                                    - □ x │
├────────────────────────────────────────────────────────┤
│$ python echo_server.py                                 │
│Waiting for connection...                               │
│Accept new connection from 127.0.0.1:64398...           │
│Connection from 127.0.0.1:64398 closed.                 │
│                                                        │
│       ┌────────────────────────────────────────────────┴───────┐
│       │Command Prompt                                    - □ x │
│       ├────────────────────────────────────────────────────────┤
│       │$ python echo_client.py                                 │
│       │Welcome!                                                │
│       │Hello, Michael!                                         │
└───────┤Hello, Tracy!                                           │
        │Hello, Sarah!                                           │
        │$                                                       │
        │                                                        │
        │                                                        │
        └────────────────────────────────────────────────────────┘

Note that the client program exits when it has finished, while the server program runs forever; press Ctrl+C to stop it.
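To try both sides without two windows, the server and client can share one script. This is a sketch under a few assumptions: the server listens on 127.0.0.1 with port 0 (the OS picks a free port), runs in daemon threads, and the demo time.sleep(1) is dropped. It also relies on each reply arriving in a single recv() call, which holds for these short messages on loopback but is not guaranteed by TCP in general; a real protocol would need its own message framing:

```python
import socket
import threading

def tcplink(sock, addr):
    # Same protocol as tcplink() above: greet, then answer each message
    # with 'Hello, <msg>!' until the client sends 'exit' or disconnects.
    with sock:
        sock.sendall(b'Welcome!')
        while True:
            data = sock.recv(1024)
            if not data or data.decode('utf-8') == 'exit':
                break
            sock.sendall(('Hello, %s!' % data.decode('utf-8')).encode('utf-8'))

def serve_forever(server):
    # Accept loop: one daemon thread per client connection.
    while True:
        sock, addr = server.accept()
        threading.Thread(target=tcplink, args=(sock, addr), daemon=True).start()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(5)
port = server.getsockname()[1]
threading.Thread(target=serve_forever, args=(server,), daemon=True).start()

# Client side: the same messages as the client program above.
replies = []
with socket.create_connection(('127.0.0.1', port)) as client:
    replies.append(client.recv(1024).decode('utf-8'))
    for name in [b'Michael', b'Tracy', b'Sarah']:
        client.sendall(name)
        replies.append(client.recv(1024).decode('utf-8'))
    client.sendall(b'exit')
print(replies)
```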

UDP Programming

TCP provides a reliable connection, and both sides can send data to each other as a stream. In contrast to TCP, UDP is a connectionless protocol.

With UDP, no connection is needed: you only need the other side's IP address and port number to send packets directly. However, there is no guarantee that they will arrive.

Although UDP transmission is unreliable, its advantage over TCP is speed. For data that does not need guaranteed delivery, UDP can be used.

Let's see how to transfer data over UDP. As with TCP, the two communicating parties are a client and a server. The server first binds a port:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Bind the port:
s.bind(('127.0.0.1', 9999))

When creating the Socket, SOCK_DGRAM specifies that this Socket's type is UDP. Binding the port works the same as for TCP, but there is no need to call listen(); the server receives data directly from any client:

print('Bind UDP on 9999...')
while True:
    # Receive data:
    data, addr = s.recvfrom(1024)
    print('Received from %s:%s.' % addr)
    s.sendto(b'Hello, %s!' % data, addr)

recvfrom() returns the data together with the client's address and port, so after receiving data the server can simply call sendto() to send a reply to the client over UDP.

Multithreading is omitted here because this example is very simple.

On the client side, first create a UDP-based Socket as before; there is no need to call connect(), just send data to the server directly with sendto():

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for data in [b'Michael', b'Tracy', b'Sarah']:
    # Send data:
    s.sendto(data, ('127.0.0.1', 9999))
    # Receive data:
    print(s.recv(1024).decode('utf-8'))
s.close()

Receiving data from the server still uses the recv() method.

Again, start the server and the client in two command-line windows; the result is as follows:

┌────────────────────────────────────────────────────────┐
│Command Prompt                                    - □ x │
├────────────────────────────────────────────────────────┤
│$ python udp_server.py                                  │
│Bind UDP on 9999...                                     │
│Received from 127.0.0.1:63823...                        │
│Received from 127.0.0.1:63823...                        │
│Received from 127.0.0.1:63823...                        │
│       ┌────────────────────────────────────────────────┴───────┐
│       │Command Prompt                                    - □ x │
│       ├────────────────────────────────────────────────────────┤
│       │$ python udp_client.py                                  │
│       │Hello, Michael!                                         │
│       │Hello, Tracy!                                           │
└───────┤Hello, Sarah!                                           │
        │$                                                       │
        │                                                        │
        │                                                        │
        └────────────────────────────────────────────────────────┘
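The UDP exchange can likewise be tried in a single script. A sketch, assuming a loopback server on a port chosen by the OS (port 0) that runs in a daemon thread; the client sets a timeout because UDP gives no delivery guarantee, so a lost reply raises a timeout error instead of blocking forever:

```python
import socket
import threading

def udp_server(sock):
    # Reply to every datagram with 'Hello, <data>!'; no listen()/accept() for UDP.
    while True:
        data, addr = sock.recvfrom(1024)
        sock.sendto(b'Hello, %s!' % data, addr)

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(('127.0.0.1', 0))
port = server.getsockname()[1]
threading.Thread(target=udp_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2)               # don't wait forever for a lost datagram
replies = []
for name in [b'Michael', b'Tracy', b'Sarah']:
    client.sendto(name, ('127.0.0.1', port))
    replies.append(client.recv(1024).decode('utf-8'))
client.close()
print(replies)
```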

Using UDP is similar to using TCP, except that no connection needs to be established. In addition, a server's UDP ports and TCP ports do not conflict: UDP port 9999 and TCP port 9999 can each be bound independently.
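This independence can be checked with a small sketch: bind a TCP socket to a free port (port 0, chosen by the OS), then bind a UDP socket to the same number. It is only an illustration on 127.0.0.1; in the rare case that another program already holds that UDP port, the second bind() would fail:

```python
import socket

tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.bind(('127.0.0.1', 0))             # the OS picks a free TCP port
port = tcp.getsockname()[1]

udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.bind(('127.0.0.1', port))          # same number, different protocol: no conflict

same_port = tcp.getsockname()[1] == udp.getsockname()[1]
print('TCP and UDP both bound to port', port, same_port)

tcp.close()
udp.close()
```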

Using MySQL

Installing MySQL

You can download the latest Community Server edition (5.6.x at the time of writing) directly from the MySQL website. MySQL is cross-platform; download the installer for your platform and install it.

During installation, MySQL asks for a password for the root user; be sure to remember it. If you're worried about forgetting it, set the password to password.

On Windows, select UTF-8 encoding during installation so that Chinese text is handled correctly.

On Mac or Linux, edit the MySQL configuration file to change the database's default encoding to UTF-8. By default, the MySQL configuration file is /etc/my.cnf or /etc/mysql/my.cnf:

[client]
default-character-set = utf8

[mysqld]
default-storage-engine = INNODB
character-set-server = utf8
collation-server = utf8_general_ci

After restarting MySQL, you can check the encoding from the MySQL command-line client:

$ mysql -u root -p
Enter password: 
Welcome to the MySQL monitor...
...

mysql> show variables like '%char%';
+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8                                                   |
| character_set_connection | utf8                                                   |
| character_set_database   | utf8                                                   |
| character_set_filesystem | binary                                                 |
| character_set_results    | utf8                                                   |
| character_set_server     | utf8                                                   |
| character_set_system     | utf8                                                   |
| character_sets_dir       | /usr/local/mysql-5.1.65-osx10.6-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)

If you see utf8 in these settings, the encoding is configured correctly.

Installing the MySQL driver

Because the MySQL server runs as a separate process and serves clients over the network, Python needs a MySQL driver to connect to it. MySQL officially provides the mysql-connector-python driver. Older pip versions required the --allow-external parameter to install it, as shown below; that option has since been removed from pip, so with a current pip drop it and run pip install mysql-connector-python:

$ pip install mysql-connector-python --allow-external mysql-connector-python

If the command above fails, try another driver:

$ pip install mysql-connector

The following session shows how to connect to the test database on the MySQL server:

# Import the MySQL driver:
>>> import mysql.connector
# Note: set password to your own root password:
>>> conn = mysql.connector.connect(user='root', password='password', database='test')
>>> cursor = conn.cursor()
# Create the user table:
>>> cursor.execute('create table user (id varchar(20) primary key, name varchar(20))')
# Insert a row; note that MySQL's placeholder is %s:
>>> cursor.execute('insert into user (id, name) values (%s, %s)', ['1', 'Michael'])
>>> cursor.rowcount
1
# Commit the transaction:
>>> conn.commit()
>>> cursor.close()
True
# Run a query:
>>> cursor = conn.cursor()
>>> cursor.execute('select * from user where id = %s', ('1',))
>>> values = cursor.fetchall()
>>> values
[('1', 'Michael')]
# Close the cursor and the connection:
>>> cursor.close()
True
>>> conn.close()

Web development

WSGI Interface

Once we understand the HTTP protocol and HTML documents, we understand what a Web application essentially does:

  1. The browser sends an HTTP request;
  2. The server receives the request and generates an HTML document;
  3. The server sends the HTML document to the browser as the body of the HTTP response;
  4. The browser receives the HTTP response, extracts the HTML document from the HTTP body, and displays it.
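To make steps 2 and 3 concrete, here is a minimal sketch of the bytes a server might send back when the body is an HTML document. The helper name build_response is hypothetical, and the header set is deliberately minimal (real servers typically add more, such as Date and Server).

```python
def build_response(html: str, status: str = "200 OK") -> bytes:
    """Wrap an HTML document in a minimal HTTP/1.1 response (hypothetical helper)."""
    body = html.encode("utf-8")
    headers = [
        f"HTTP/1.1 {status}",
        "Content-Type: text/html; charset=utf-8",
        f"Content-Length: {len(body)}",
    ]
    # Header lines end with CRLF; an empty line separates the headers from the body.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + body


response = build_response("<h1>Hello</h1>")
head, _, body = response.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(body)  # b'<h1>Hello</h1>'
```

The browser does the reverse in step 4: it splits the response at the first empty line and renders what follows as HTML.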

Therefore, the simplest Web application saves the HTML in a file and uses ready-made HTTP server software to receive user requests, read the HTML from the file, and return it. Common static servers such as Apache, Nginx, and Lighttpd do exactly this.

If you want to generate HTML dynamically, you have to implement the steps above yourself. However, accepting HTTP requests, parsing them, and sending HTTP responses is drudge work: if we wrote all that low-level code ourselves, we would spend a month reading the HTTP specification before writing a single line of dynamic HTML.

The right approach is to let dedicated server software handle the low-level code while we use Python to focus on generating HTML. Because we don't want to deal with TCP connections or the raw HTTP request and response formats, we need a unified interface that lets us concentrate on writing Web business logic in Python.

This interface is WSGI: Web Server Gateway Interface.

The WSGI interface definition is very simple: it only requires the Web developer to implement a function that responds to HTTP requests. Here is the simplest Web version of “Hello, web!”:

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']

The application() function above is an HTTP handler that conforms to the WSGI standard. It takes two parameters:

  • environ: a dict object containing all the HTTP request information;
  • start_response: a function that sends the HTTP response.

Inside application(), the call

start_response('200 OK', [('Content-Type', 'text/html')])

sends the HTTP response headers. Note that in normal use the headers can be sent only once, which means start_response() is called only once. start_response() takes two parameters: the HTTP status, such as '200 OK', and a list of HTTP headers, where each header is a tuple of two str values.

Usually we should send Content-Type to the browser, and many other common HTTP headers should be sent as well.
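As a sketch of sending more than a bare Content-Type, the variant below (application_with_headers is a hypothetical name) also declares the charset and a Content-Length computed from the encoded body. The recording start_response stub exists only to show what the server would receive; a real WSGI server supplies its own.

```python
def application_with_headers(environ, start_response):
    body = '<h1>Hello, web!</h1>'.encode('utf-8')
    # Each header is a (name, value) tuple of two str objects.
    headers = [
        ('Content-Type', 'text/html; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)
    return [body]


# A stand-in for the server's start_response, used here only to inspect the call.
captured = {}

def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = dict(headers)

result = application_with_headers({}, fake_start_response)
print(captured['status'])                     # 200 OK
print(captured['headers']['Content-Length'])  # 20
print(result)                                 # [b'<h1>Hello, web!</h1>']
```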

The function's return value, [b'<h1>Hello, web!</h1>'], is then sent to the browser as the body of the HTTP response.

With WSGI, all we care about is how to get the HTTP request information from the environ dict, build the HTML, send the headers via start_response(), and finally return the body.

The application() function itself involves no HTTP parsing at all. In other words, we don't have to write the low-level code; we only need to think about how to respond to requests at a higher level.

How does application() get called? If we called it ourselves, we would have no way to supply the two arguments environ and start_response, and the returned bytes would have no way of reaching the browser.

So application() must be called by a WSGI server. There are many WSGI-compliant servers to choose from. For now, though, we just want to check that our application() function really delivers HTML to the browser, so let's quickly pick the simplest WSGI server and get our Web app running.

The good news is that Python has a built-in WSGI server in the wsgiref module, a reference implementation of a WSGI server written in pure Python. “Reference implementation” means that it fully conforms to the WSGI standard but makes no attempt at efficiency; it is meant only for development and testing.
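wsgiref also ships small helpers for testing. As a sketch, wsgiref.util.setup_testing_defaults() fills an empty environ dict with plausible default values, so the hello application can be exercised in a test without starting a server. The recording start_response below is a stand-in for illustration, not what a real server passes.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']


environ = {}
setup_testing_defaults(environ)   # adds REQUEST_METHOD, PATH_INFO, wsgi.* keys, etc.
print(environ['REQUEST_METHOD'])  # GET

calls = []

def start_response(status, headers):
    # Record what the application sends instead of writing to a socket.
    calls.append((status, headers))

body = b''.join(application(environ, start_response))
print(calls)  # [('200 OK', [('Content-Type', 'text/html')])]
print(body)   # b'<h1>Hello, web!</h1>'
```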

Running the WSGI service

Let's write hello.py, which implements the Web application's WSGI handler function:

# hello.py

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']

Then write server.py, which starts the WSGI server and loads the application() function:

# server.py
# Import from the wsgiref module:
from wsgiref.simple_server import make_server
# Import the application function we wrote ourselves:
from hello import application

# Create a server: empty IP address (all interfaces), port 8000, handler function application:
httpd = make_server('', 8000, application)
print('Serving HTTP on port 8000...')
# Start listening for HTTP requests:
httpd.serve_forever()

Make sure both files are in the same directory, then run python server.py on the command line to start the WSGI server.

Once the server has started, open a browser and go to http://localhost:8000/ to see the result.

Press Ctrl+C to stop the server.
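Instead of opening a browser, you can also check the server from Python. This is a sketch for quick experiments, assuming the same hello application: it asks the OS for a free port (port 0 rather than 8000), serves exactly one request in a background thread, and fetches the page with urllib.request. QuietHandler is a hypothetical subclass that only silences the per-request log line.

```python
import threading
import urllib.request
from wsgiref.simple_server import make_server, WSGIRequestHandler


def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']


class QuietHandler(WSGIRequestHandler):
    # Suppress the request log line that WSGIRequestHandler writes to stderr.
    def log_message(self, format, *args):
        pass


# Port 0 asks the OS for any free port; server_port reports the one chosen.
httpd = make_server('127.0.0.1', 0, application, handler_class=QuietHandler)
port = httpd.server_port

# handle_request() serves exactly one request and then returns.
thread = threading.Thread(target=httpd.handle_request)
thread.start()

with urllib.request.urlopen(f'http://127.0.0.1:{port}/') as resp:
    status = resp.status
    page = resp.read()

thread.join()
httpd.server_close()
print(status, page)  # 200 b'<h1>Hello, web!</h1>'
```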

If this Web application seems too simple, you can modify it slightly to read PATH_INFO from environ and display more dynamic content:

# hello.py

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    body = '<h1>Hello, %s!</h1>' % (environ['PATH_INFO'][1:] or 'web')
    return [body.encode('utf-8')]

Now you can put a user name in the URL, and the page returns Hello, xxx! (for example, http://localhost:8000/michael shows Hello, michael!).
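One caveat with this version: whatever follows the slash in the URL is inserted into the page verbatim, so a URL containing HTML markup would be rendered as markup. A hedged sketch of a safer variant escapes the name with the standard library's html.escape() first; fake_start_response is a stand-in so the function can be called directly.

```python
import html


def application(environ, start_response):
    # Escape the user-controlled part of the URL before inserting it into HTML.
    name = html.escape(environ['PATH_INFO'][1:] or 'web')
    start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8')])
    body = '<h1>Hello, %s!</h1>' % name
    return [body.encode('utf-8')]


def fake_start_response(status, headers):
    pass  # stand-in for the server's start_response


print(application({'PATH_INFO': '/michael'}, fake_start_response))
# [b'<h1>Hello, michael!</h1>']
print(application({'PATH_INFO': '/<b>x</b>'}, fake_start_response))
# [b'<h1>Hello, &lt;b&gt;x&lt;/b&gt;!</h1>']
```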

Using a Web framework

Each URL can receive GET and POST requests, and of course PUT, DELETE and other methods as well, but we usually only consider the most common ones, GET and POST.

The simplest idea is to take the HTTP request information from the environ variable and check each case one by one:

def application(environ, start_response):
    method = environ['REQUEST_METHOD']
    path = environ['PATH_INFO']
    if method=='GET' and path=='/':
        return handle_home(environ, start_response)
    if method=='POST' and path=='/signin':
        return handle_signin(environ, start_response)
    ...

Code written this way quickly becomes unmaintainable. Although the WSGI interface is much higher-level than raw HTTP, it is still fairly low-level compared with Web App processing logic. We need a further abstraction on top of WSGI so that we can focus on writing one function per URL and leave the mapping from URLs to functions to a Web framework.
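A first step toward that abstraction can be sketched with a plain dict that maps (method, path) pairs to handler functions, so adding a URL means adding one table entry rather than another if. The handler names and the 404 fallback are hypothetical; real frameworks also handle path parameters, method-not-allowed responses, and more.

```python
# Hypothetical handlers; a real app would build full HTML pages.
def handle_home(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Home</h1>']


def handle_signin(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h3>Signed in</h3>']


def not_found(environ, start_response):
    start_response('404 Not Found', [('Content-Type', 'text/html')])
    return [b'<h1>Not Found</h1>']


# The routing table: (method, path) -> handler function.
ROUTES = {
    ('GET', '/'): handle_home,
    ('POST', '/signin'): handle_signin,
}


def application(environ, start_response):
    key = (environ['REQUEST_METHOD'], environ['PATH_INFO'])
    handler = ROUTES.get(key, not_found)  # unknown URLs fall back to 404
    return handler(environ, start_response)


statuses = []

def record_status(status, headers):
    statuses.append(status)

print(application({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}, record_status))
# [b'<h1>Home</h1>']
print(application({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/nope'}, record_status))
# [b'<h1>Not Found</h1>']
print(statuses)  # ['200 OK', '404 Not Found']
```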

Because writing a Web framework in Python is easy, there are hundreds of open-source Python Web frameworks. Rather than discuss their pros and cons here, we'll simply pick a popular one, Flask.

Writing a Web App with Flask is simpler than using the WSGI interface directly (obviously: if it were more complicated than WSGI, why use a framework?). First install Flask with pip:

$ pip install flask

Then write app.py to handle 3 URLs:

  • GET /: home page, returns Home;
  • GET /signin: login page, shows the login form;
  • POST /signin: processes the login form and shows the login result.

Note that the same URL, /signin, has both GET and POST requests, which map to two different handler functions.

Flask uses Python decorators to associate URLs with functions internally, so the code looks like this:

from flask import Flask
from flask import request

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def home():
    return '<h1>Home</h1>'

@app.route('/signin', methods=['GET'])
def signin_form():
    return '''<form action="/signin" method="post">
              <p><input name="username"></p>
              <p><input name="password" type="password"></p>
              <p><button type="submit">Sign In</button></p>
              </form>'''

@app.route('/signin', methods=['POST'])
def signin():
    # Read the form content from the request object:
    if request.form['username']=='admin' and request.form['password']=='password':
        return '<h3>Hello, admin!</h3>'
    return '<h3>Bad username or password.</h3>'

if __name__ == '__main__':
    app.run()
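To see how a decorator can tie URLs to functions, here is a toy sketch. MiniApp and dispatch() are hypothetical names, and this is not how Flask is actually implemented (Flask builds on Werkzeug's routing); it only shows the registration idea: route() returns a decorator that records the function in a table keyed by method and path.

```python
class MiniApp:
    """A toy, hypothetical imitation of the app.route() decorator idea."""

    def __init__(self):
        self.routes = {}  # (method, path) -> view function

    def route(self, path, methods=('GET',)):
        def decorator(func):
            for method in methods:
                self.routes[(method, path)] = func
            return func  # the decorated function itself is returned unchanged
        return decorator

    def dispatch(self, method, path):
        view = self.routes.get((method, path))
        if view is None:
            return '<h1>404 Not Found</h1>'
        return view()


app = MiniApp()

@app.route('/', methods=['GET', 'POST'])
def home():
    return '<h1>Home</h1>'

@app.route('/signin', methods=['GET'])
def signin_form():
    return '<h1>Sign In</h1>'

print(app.dispatch('GET', '/'))        # <h1>Home</h1>
print(app.dispatch('POST', '/'))       # <h1>Home</h1>
print(app.dispatch('GET', '/signin'))  # <h1>Sign In</h1>
print(app.dispatch('PUT', '/'))        # <h1>404 Not Found</h1>
```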

Run python app.py, and Flask's built-in server listens on port 5000:

$ python app.py 
 * Running on http://127.0.0.1:5000/

Open a browser and enter the home page address http://localhost:5000/.

The home page is displayed correctly!

Then enter http://localhost:5000/signin in the browser's address bar, and the login form is displayed.

Enter the preset user name admin and password password, and the login succeeds.

Enter any other user name or password, and the login fails.

A real Web App should take the user name and password, query the database, and compare them to decide whether the user can log in.
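A sketch of that database check, under stated assumptions: it uses the standard library's sqlite3 with an in-memory database instead of MySQL so it runs anywhere, and it stores a salted PBKDF2 hash rather than the plain password. The table layout and the helper names create_user and check_login are hypothetical. A signin() view could then call something like check_login(conn, request.form['username'], request.form['password']).

```python
import hashlib
import hmac
import os
import sqlite3


def hash_password(password, salt, iterations=100_000):
    # PBKDF2-HMAC-SHA256; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)


def create_user(conn, username, password):
    salt = os.urandom(16)
    conn.execute(
        'insert into users (username, salt, pwhash) values (?, ?, ?)',
        (username, salt, hash_password(password, salt)),
    )
    conn.commit()


def check_login(conn, username, password):
    row = conn.execute(
        'select salt, pwhash from users where username = ?', (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_password(password, salt), stored)


conn = sqlite3.connect(':memory:')
conn.execute('create table users (username text primary key, salt blob, pwhash blob)')
create_user(conn, 'admin', 'password')

print(check_login(conn, 'admin', 'password'))   # True
print(check_login(conn, 'admin', 'wrong'))      # False
print(check_login(conn, 'nobody', 'password'))  # False
```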

Besides Flask, common Python Web frameworks include:

  • Django: a full-featured Web framework;
  • web.py: a compact Web framework;
  • Bottle: a Web framework similar to Flask;
  • Tornado: Facebook's open-source asynchronous Web framework.

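Using templates

Rather than building HTML strings inside Python code, rewrite app.py so that each view renders a template: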

from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def home():
    return render_template('home.html')

@app.route('/signin', methods=['GET'])
def signin_form():
    return render_template('form.html')

@app.route('/signin', methods=['POST'])
def signin():
    username = request.form['username']
    password = request.form['password']
    if username=='admin' and password=='password':
        return render_template('signin-ok.html', username=username)
    return render_template('form.html', message='Bad username or password', username=username)

if __name__ == '__main__':
    app.run()

Flask renders templates with the render_template() function. As with Web frameworks, Python has many template engines. Flask's default template engine is jinja2, so install jinja2 directly (it is also installed automatically as a Flask dependency):

$ pip install jinja2

Then write the jinja2 templates. By default, Flask looks for them in a templates folder next to app.py:

home.html

The template that displays the home page:

<html>
<head>
  <title>Home</title>
</head>
<body>
  <h1 style="font-style:italic">Home</h1>
</body>
</html>

form.html

The template that displays the login form:

<html>
<head>
  <title>Please Sign In</title>
</head>
<body>
  {% if message %}
  <p style="color:red">{{ message }}</p>
  {% endif %}
  <form action="/signin" method="post">
    <legend>Please sign in:</legend>
    <p><input name="username" placeholder="Username" value="{{ username }}"></p>
    <p><input name="password" placeholder="Password" type="password"></p>
    <p><button type="submit">Sign In</button></p>
  </form>
</body>
</html>

signin-ok.html

The template shown after a successful login:

<html>
<head>
  <title>Welcome, {{ username }}</title>
</head>
<body>
  <p>Welcome, {{ username }}!</p>
</body>
</html>

Asynchronous IO

In the IO programming section, we learned that the CPU is much faster than disks, networks, and other IO. Within a thread, the CPU executes code extremely fast, but as soon as it hits an IO operation, such as reading or writing a file or sending network data, it has to wait for the IO to finish before it can continue. This is called synchronous IO.

Another way to solve the IO problem is asynchronous IO. When code needs to perform a time-consuming IO operation, it only issues the IO instruction and, without waiting for the result, goes on executing other code. Later, when the IO result is ready, the CPU is notified to process it.

The asynchronous IO model needs a message loop, in which the main thread repeats the “read message, process message” cycle over and over:

loop = get_event_loop()
while True:
    event = loop.get_event()
    process_event(event)
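The pseudocode above leaves get_event_loop(), get_event() and process_event() abstract. A toy, runnable version of the same read-process cycle can be sketched with a collections.deque as the message queue. Unlike the infinite loop above, it stops when the queue is empty; real event loops such as asyncio's also wait on sockets and timers through the OS. The names events, log and on_message are illustrative.

```python
from collections import deque

# A toy message loop (illustrative only): events are (handler, argument) pairs.
events = deque()
log = []


def on_message(text):
    log.append('processed ' + text)
    if text == 'start':
        # A handler may post new events instead of blocking to wait for a result.
        events.append((on_message, 'follow-up'))


events.append((on_message, 'start'))
events.append((on_message, 'second'))

# The loop: read a message, process it, repeat until nothing is left.
while events:
    handler, arg = events.popleft()
    handler(arg)

print(log)  # ['processed start', 'processed second', 'processed follow-up']
```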

asyncio

asyncio was added to the standard library in Python 3.4 and provides built-in support for asynchronous IO. A Hello world implemented with asyncio looks like this:

import asyncio

@asyncio.coroutine
def hello():
    print("Hello world!")
    # Call asyncio.sleep(1) asynchronously:
    r = yield from asyncio.sleep(1)
    print("Hello again!")

# Get the EventLoop:
loop = asyncio.get_event_loop()
# Run the coroutine:
loop.run_until_complete(hello())
loop.close()

@asyncio.coroutine marks a generator as a coroutine, and we then hand that coroutine to the EventLoop to run. (Note that @asyncio.coroutine was deprecated in Python 3.8 and removed in Python 3.11, so the generator-based examples in this section need Python 3.10 or earlier; the async/await syntax described later is the current way to write coroutines.)

hello() first prints Hello world!. The yield from syntax then lets us conveniently call another generator. Because asyncio.sleep() is also a coroutine, the thread does not wait for asyncio.sleep(); instead it suspends hello() and moves on to the next message in the loop. When asyncio.sleep() returns, the thread gets its return value (None here) from yield from and continues with the next line.

Think of asyncio.sleep(1) as an IO operation that takes 1 second. During that time, the main thread does not wait; it runs other coroutines in the EventLoop that are ready, which is how concurrent execution is achieved.

Let's try running two coroutines wrapped as Tasks:
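The mechanism behind yield from can be seen with plain generators, with no asyncio involved. In this sketch (function names are illustrative), outer() delegates to inner(): every value inner() yields goes straight out to whoever drives outer(), and when inner() returns, that return value becomes the value of the yield from expression. In the asyncio case, the driver is the event loop.

```python
def inner():
    # A sub-generator: it yields values and finally returns a result.
    yield 'step 1'
    yield 'step 2'
    return 'inner done'


def outer():
    # yield from passes inner()'s yields straight through to our caller
    # and evaluates to inner()'s return value once it finishes.
    result = yield from inner()
    yield 'outer got: ' + result


print(list(outer()))  # ['step 1', 'step 2', 'outer got: inner done']
```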

import threading
import asyncio

@asyncio.coroutine
def hello():
    print('Hello world! (%s)' % threading.current_thread())
    yield from asyncio.sleep(1)
    print('Hello again! (%s)' % threading.current_thread())

loop = asyncio.get_event_loop()
tasks = [hello(), hello()]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

Observe how it runs:

Hello world! (<_MainThread(MainThread, started 140735195337472)>)
Hello world! (<_MainThread(MainThread, started 140735195337472)>)
(pause of about 1 second)
Hello again! (<_MainThread(MainThread, started 140735195337472)>)
Hello again! (<_MainThread(MainThread, started 140735195337472)>)

The printed thread names show that both coroutines are executed concurrently by the same thread.

If asyncio.sleep() is replaced with real IO operations, multiple coroutines can likewise be executed concurrently by a single thread.

Let's use asyncio's asynchronous network connections to fetch the home pages of sina, sohu, and 163:

import asyncio

@asyncio.coroutine
def wget(host):
    print('wget %s...' % host)
    connect = asyncio.open_connection(host, 80)
    reader, writer = yield from connect
    header = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host
    writer.write(header.encode('utf-8'))
    yield from writer.drain()
    while True:
        line = yield from reader.readline()
        if line == b'\r\n':
            break
        print('%s header > %s' % (host, line.decode('utf-8').rstrip()))
    # Ignore the body, close the socket
    writer.close()

loop = asyncio.get_event_loop()
tasks = [wget(host) for host in ['www.sina.com.cn', 'www.sohu.com', 'www.163.com']]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

The output is as follows:

wget www.sohu.com...
wget www.sina.com.cn...
wget www.163.com...
(wait a while)
(sohu's headers are printed)
www.sohu.com header > HTTP/1.1 200 OK
www.sohu.com header > Content-Type: text/html
...
(sina's headers are printed)
www.sina.com.cn header > HTTP/1.1 200 OK
www.sina.com.cn header > Date: Wed, 20 May 2015 04:56:33 GMT
...
(163's headers are printed)
www.163.com header > HTTP/1.0 302 Moved Temporarily
www.163.com header > Server: Cdn Cache Server V2.0
...

So the 3 connections are completed concurrently by one thread through coroutines.
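Because @asyncio.coroutine no longer exists in current Python releases, here is a hedged sketch of the same idea written with async/await (covered in the next section) and asyncio.run() (Python 3.7+). To keep it runnable offline, it assumes a tiny local server started with asyncio.start_server() in place of sina, sohu and 163; fake_site and main are illustrative names, and asyncio.gather() runs the three requests concurrently on one thread.

```python
import asyncio


async def fake_site(reader, writer):
    # A tiny local stand-in for a web server: read the request headers, send a reply.
    while (await reader.readline()) not in (b'\r\n', b''):
        pass
    writer.write(b'HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>hi</h1>')
    await writer.drain()
    writer.close()


async def wget(host, port):
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(('GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host).encode('utf-8'))
    await writer.drain()
    headers = []
    while True:
        line = await reader.readline()
        if line in (b'\r\n', b''):  # blank line ends the headers; b'' means EOF
            break
        headers.append(line.decode('utf-8').rstrip())
    # Ignore the body, close the socket.
    writer.close()
    await writer.wait_closed()
    return headers


async def main():
    server = await asyncio.start_server(fake_site, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Three requests run concurrently in one thread.
        return await asyncio.gather(*(wget('127.0.0.1', port) for _ in range(3)))


results = asyncio.run(main())
for headers in results:
    print(headers)  # ['HTTP/1.0 200 OK', 'Content-Type: text/html']
```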

async/await

The @asyncio.coroutine decorator provided by asyncio marks a generator as a coroutine, and inside the coroutine you use yield from to call another coroutine and perform asynchronous operations.

To simplify asynchronous IO and make it easier to recognize, Python 3.5 introduced the new syntax async and await, which makes coroutine code more concise and readable.

Note that async and await are new syntax for coroutines. Switching to them takes only two simple replacements:

  1. Replace @asyncio.coroutine with async (so def becomes async def);
  2. Replace yield from with await.

Compare it with the code from the previous section:

@asyncio.coroutine
def hello():
    print("Hello world!")
    r = yield from asyncio.sleep(1)
    print("Hello again!")

Rewritten with the new syntax:

async def hello():
    print("Hello world!")
    r = await asyncio.sleep(1)
    print("Hello again!")

The rest of the code remains unchanged .
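On Python 3.7 and later, the loop-management lines can also be replaced: asyncio.run() creates the event loop, runs a coroutine and closes the loop, while asyncio.gather() runs several coroutines concurrently. Here is a sketch of the two-coroutine example in that style; the n parameter and the return value are added only to make the results checkable. The whole run should take roughly one second rather than two, because both sleeps overlap.

```python
import asyncio
import threading
import time


async def hello(n):
    print('Hello world! (%d, %s)' % (n, threading.current_thread().name))
    await asyncio.sleep(1)  # stands in for a 1-second IO operation
    print('Hello again! (%d)' % n)
    return n


async def main():
    # gather() schedules both coroutines as Tasks and waits for all of them.
    return await asyncio.gather(hello(1), hello(2))


start = time.perf_counter()
results = asyncio.run(main())  # creates the event loop, runs main(), closes the loop
elapsed = time.perf_counter() - start
print(results)                 # [1, 2]
print('took about %.1f s' % elapsed)
```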


Later I'll put together some small Python demos.

References

Mr. Liao Xuefeng's Python tutorial

The official Python website

Copyright notice
Author: i aog !. Please include a link to the original when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201292245588811.html
