
The most useful decorator in the Python standard library

2022-01-30 17:36:10 somenzz

As everyone knows, Python is a flexible, concise, programmer-friendly language, but its performance leaves something to be desired. A recursive Fibonacci function makes the point:

def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

On my MacBook Pro, computing fib(40) takes 33 seconds:

import time

def main():
    start = time.time()
    result = fib(40)
    end = time.time()
    cost = end - start
    print(f"{result = } {cost = :.4f}")

if __name__ == '__main__':
    main()

However, with one decorator from the standard library, the result is completely different:

from functools import lru_cache

@lru_cache
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


This time the cost comes out as 0 seconds. You read that right: I kept four decimal places, and everything beyond them is simply dropped.

How many times faster is that? I can't even work it out.
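
To put a rough number on it, we can time the cached version with time.perf_counter, which has far finer resolution than time.time. A minimal sketch (same fib as above):

import time
from functools import lru_cache

@lru_cache
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

def main():
    start = time.perf_counter()
    result = fib(40)
    cost = time.perf_counter() - start
    # prints something on the order of:
    # result = 102334155 cost = 0.0000310
    print(f"{result = } {cost = :.7f}")

if __name__ == '__main__':
    main()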

Why is the lru_cache decorator so effective? What does it actually do? Today, let's talk about this, the most useful decorator in the standard library.

If you have studied operating systems, LRU will be no stranger to you: it is the famous Least Recently Used cache eviction algorithm, which discards the entry that has gone unused the longest.

And lru_cache is a concrete implementation of this algorithm. (The algorithm comes up often in interviews; some interviewers even ask candidates to write it by hand on the spot.)
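
As a taste of that interview question, here is a minimal sketch of such a cache built on collections.OrderedDict. The class name LRUCache is made up for this illustration, and unlike the standard library version it is not thread-safe:

from collections import OrderedDict

class LRUCache:
    """A minimal LRU cache sketch (illustrative only, not thread-safe)."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)         # a hit makes this key the newest
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # evict the oldest (LRU) entry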

Now, let's take a look at the source code of lru_cache. I have translated the comments for you:

def lru_cache(maxsize=128, typed=False):
    """LRU  Cache decorator   If  *maxsize*  yes  None,  The cache will not be obsolete , There is no limit on the cache size   If  *typed*  yes  True,  Different types of parameters will be cached independently , such as  f(3.0) and f(3)  It will be considered as different function calls and cached on two cache nodes .  The arguments to the function must be  hash  The cache information is viewed using named tuples  (hits, misses, maxsize, currsize)  View cache information :user_func.cache_info().  Clean up cache information :user_func.cache_clear(). LRU  Algorithm : http://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU) """

    # the internal implementation of lru_cache is thread-safe

    if isinstance(maxsize, int):
        # negative maxsize is treated as 0
        if maxsize < 0:
            maxsize = 0
    elif callable(maxsize) and isinstance(typed, bool):
        # the decorated function was passed in directly as maxsize,
        # i.e. @lru_cache was used without parentheses
        user_function, maxsize = maxsize, 128
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    elif maxsize is not None:
        raise TypeError(
            'Expected first argument to be an integer, a callable, or None')

    def decorating_function(user_function):
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)

    return decorating_function


There are two parameters here. One is maxsize, the size of the cache: a negative value is converted to 0; if maxsize is not passed, it defaults to 128; and if it is set to None, the cache has no size limit and nothing is ever evicted. The other is typed: when typed is True, arguments of different types are cached under different keys.
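
Here is a small sketch of both parameters in action; the function square is made up for this example:

from functools import lru_cache

@lru_cache(maxsize=2, typed=True)
def square(x):
    print(f"computing square({x!r})")
    return x * x

square(3)     # miss: computed and cached
square(3)     # hit: served from the cache, nothing is printed
square(3.0)   # typed=True, so float 3.0 gets its own entry (another miss)
print(square.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=2, currsize=2)

Note that @lru_cache can also be applied without parentheses (since Python 3.8); the callable(maxsize) branch above is exactly what makes the bare form work.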

Next, the core of lru_cache lies in the function _lru_cache_wrapper, which is well worth reading closely (perhaps even reciting from memory). Let's look at its source code:


def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # constants shared by all lru_cache instances:
    sentinel = object()          # unique object used to signal a cache miss
    make_key = _make_key         # build a key from the function arguments
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields

    cache = {}
    hits = misses = 0
    full = False
    cache_get = cache.get    # bound method to look up a key in the cache
    cache_len = cache.__len__  # bound method to get the cache size
    lock = RLock()           # updates to the linked list are not thread-safe
    root = []                # root of the circular doubly linked list
    root[:] = [root, root, None, None]     # initialize prev/next to point at root itself

    if maxsize == 0:

        def wrapper(*args, **kwds):
            # no caching -- just update the statistics
            nonlocal misses
            misses += 1
            result = user_function(*args, **kwds)
            return result

    elif maxsize is None:

        def wrapper(*args, **kwds):
            # simple caching with no ordering and no size limit
            nonlocal hits, misses
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            misses += 1
            result = user_function(*args, **kwds)
            cache[key] = result
            return result

    else:

        def wrapper(*args, **kwds):
            # size-limited caching that tracks recent uses
            nonlocal root, hits, misses, full
            key = make_key(args, kwds, typed)
            with lock:
                link = cache_get(key)
                if link is not None:
                    # cache hit: move the link to the most-recently-used
                    # end of the circular doubly linked list
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
                misses += 1
            result = user_function(*args, **kwds)
            with lock:
                if key in cache:
                    # another thread computed and cached this key while the
                    # lock was released; the linked list is already up to
                    # date, so there is nothing to do but return the result
                    pass
                elif full:
                    # the cache is full: reuse the old root to store the new
                    # key and result, then make the oldest link the new root,
                    # so no node needs to be allocated or removed (clever,
                    # isn't it?)
                    oldroot = root
                    oldroot[KEY] = key
                    oldroot[RESULT] = result
                    root = oldroot[NEXT]
                    oldkey = root[KEY]
                    oldresult = root[RESULT]
                    root[KEY] = root[RESULT] = None
                    
                    # finally, remove the evicted key from the cache;
                    # it is no longer valid
                    del cache[oldkey]
                    # store the new entry (the reused old root) in the cache
                    cache[key] = oldroot
                else:
                    # not full yet: insert the new link at the
                    # most-recently-used end of the circular list
                    last = root[PREV]
                    link = [last, root, key, result]
                    last[NEXT] = root[PREV] = cache[key] = link
                    # use the cache_len bound method instead of the len()
                    # function, which could itself be wrapped in an lru_cache
                    full = (cache_len() >= maxsize)
            return result

    def cache_info():
        """ Report cache statistics """
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

    def cache_clear():
        """ Clean up cache information """
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper

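Notice the last few lines: cache_info and cache_clear are attached as attributes of the wrapper, which is why they can be called on any decorated function. For example, with the fib from the beginning of this article, in a fresh interpreter:

fib(40)
print(fib.cache_info())
# CacheInfo(hits=38, misses=41, maxsize=128, currsize=41)
fib.cache_clear()
print(fib.cache_info())
# CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)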

If you have read all the comments above, you may not need my commentary at all. But if it is still not clear, let me say a few more words; perhaps that will help.

  1. The cache is nothing more than memory, used for fast access. It is a hash table, in other words a Python dictionary, and every operation on it happens in memory:
    cache = {}
  2. If maxsize == 0, there is effectively no caching at all: every call just increments the miss counter by 1. The code logic looks like this:
def wrapper(*args, **kwds):
    nonlocal misses
    misses += 1  # every call counts as a miss
    result = user_function(*args, **kwds)
    return result
  3. If maxsize == None, the cache is unbounded and nothing ever needs to be evicted. This is very simple to implement; we could achieve the same thing with a plain dictionary inside the function, for instance:

cache = {}
def fib(n):

    if n in cache:
        return cache[n]

    if n <= 1:
        return n
    result = fib(n - 1) + fib(n - 2)
    cache[n] = result
    return result


The elapsed time is, once again, close to zero.

Once you understand that, the corresponding logic in the decorator is not hard to follow:

def wrapper(*args, **kwds):
    nonlocal hits, misses
    key = make_key(args, kwds, typed)
    result = cache_get(key, sentinel)
    if result is not sentinel:
        hits += 1
        return result
    misses += 1
    result = user_function(*args, **kwds)
    cache[key] = result
    return result

  4. The real cache eviction algorithm.

To evict cached key-value pairs, we need to keep them ordered by access time, which calls for a linked list: the head of the list holds the most recently inserted entry, the tail the oldest. When the number of cached entries reaches the maximum, we delete the tail node, the one that has gone unused the longest.

On a cache hit, we move the hit node to the head of the list, so the head always holds the recently used entries. To make that move cheap, the list needs to be doubly linked.

A doubly linked list can be expressed very compactly in Python:

PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields
root = []                # root of the circular doubly linked list
root[:] = [root, root, None, None]     # initialize by pointing to self

Some readers may be puzzled by the last line, root[:] = [root, root, None, None]. A rough diagram in place of the original picture makes it clear: at the start, the root's PREV and NEXT slots both refer back to root itself:
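
       root
        |
        v
      [ PREV | NEXT | None | None ]
          |      |
          +--+---+
             |
             v
        back to root itself (an empty circular list)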

The arrows stand for references between nodes. As entries are added, the ring grows around root, roughly like this:
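
      root <--> link1 <--> link2 <--> link3 <--> (back to root)

      root[NEXT] is link1, the oldest entry (the first to be evicted)
      root[PREV] is link3, the newest, most recently used entry
      each link is itself a list: [PREV, NEXT, KEY, RESULT]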

With this picture in mind, the source code should be easy to follow. This piece of logic in particular is a classic interview topic; if you can write a thread-safe LRU cache like this one, you will undoubtedly stand out.

Other implementations of the LRU algorithm

I have also written two LRU implementations of my own; you can find them here:

LRU cache eviction algorithm: doubly linked list + hash table

LRU cache eviction algorithm: Python ordered dictionary

Final words

The job of the lru_cache decorator is to save a function's computed results so that the next call can fetch them straight from the hash table, avoiding repeated computation. In the simplest case, a dictionary inside the function is enough; for the full treatment, read the lru_cache implementation. Conversely, one of the main reasons recursive functions are slow is precisely this repeated computation.

The source code of the Python standard library is the most nourishing raw material for learning to program. Whenever you are curious, take a peek at the source; I believe you will come away with something new. That's all for today. If you got something out of it, please like, share, and follow. Thank you for your support.

Copyright notice
Author: somenzz. Please include a link to the original when reprinting. Thank you.
https://en.pythonmana.com/2022/01/202201301736074527.html
