current position:Home>[Python] the characteristics of dictionaries and collections and the hash table behind them

[Python] the characteristics of dictionaries and collections and the hash table behind them

2022-02-01 07:31:30 Hazelnut

「 This is my participation 11 The fourth of the yuegengwen challenge 3 God , Check out the activity details :2021 One last more challenge

The characteristics of a dictionary

  • Key queries are fast : Dictionary types provide fast access regardless of the amount of data —— As long as the dictionary can be stored in memory .
  • When to dict When a hash conflict occurs when a new key is added to the , New keys may be scheduled to be stored in another location . But whether the dictionary is equivalent or not has nothing to do with the order in which the keys are added .
  • Adding new keys to a dictionary may change the order of existing keys : Whenever you add a new key to the dictionary ,Python The interpreter may decide to expand the dictionary . The result of the expansion is to create a larger hash table , And add the existing elements in the dictionary to the new table . New hash conflicts may occur in the process , Causes the order of the keys in the new hash table to change .
    • So don't iterate and modify the dictionary at the same time . If you want to scan and modify a dictionary , It's best to do it in two steps :
      1. First iterate over the dictionary , To get what needs to be added , Put these in a new dictionary ;
      2. After the iteration, update the original dictionary .
  • Because the dictionary uses hash tables , And the hash table must be sparse , This leads to its inefficiency in space . If you need to store a large number of records , It's a better choice to put it in a list of tuples or named tuples .
    • In a user-defined type ,__slots__ Property can change how instance properties are stored , from dict become tuple .

Set characteristics

  • The elements in the collection must be hashable .
  • Collections consume a lot of memory .
  • It is very efficient to determine whether an element exists in a collection .
  • The order of elements depends on the order in which they are added to the collection .
  • Add elements to the collection , It may change the order of the existing elements in the collection .

dict and set behind : Hash table

  • Use... In a dictionary with 10 million floating-point numbers in Finding a number takes about a third of a microsecond (0.000000337s). Using the list requires 0.01s.
  • Python Try to make sure that about one third of the table elements are empty , So when this threshold is almost reached , The original hash table will be copied into a larger space .
  • Built in hash() Method can be used for all built-in type objects . If it is a custom object call hash() Words , It's actually running custom __hash__ .
  • If two objects are equal when compared , Then their hash values must be equal . for example , If 1 == 1.0 It's true , that hash(1) == hash(1.0) And it has to be true .
  • from Python 3.3 Start ,strbytes and datetime There is more random in the calculation of hash value of object “ Add salt ” This step . The added salt value is Python A constant in the process , But every time it starts Python The interpreter will generate a different salt value . Random salt values are added to prevent DOS Attack A safety measure .
  • In order to obtain my_dict[search_key] The value behind it ,Python The first call hash(search_key) To calculate search_key Hash value , Take the lowest digits of this value as the offset , Look up table elements in the hash table ( How many people do you want , It depends on the size of the current hash table ). If the table element found is empty , Throw out KeyError different often . If it's not empty , Then there will be a pair of found_key:found_value . Now Python Will test search_key == found_key Is it true , If they are equal , It will return found_value . In case of hash conflict , Then another part will be taken in the hash value , Get another line in the hash table .
  • A hashable object must satisfy the following requirements :
    1. Support hash() function , And through __hash__() The hash value obtained by method is invariant .
    2. Supported by __eq__() Method to detect equality .
    3. if a == b It's true , be hash(a) == hash(b) It's true . All user-defined objects are hashable by default , Because their hash values are determined by id() To get , And they're all unequal .
  • If you implement a class __eq__ Method , And hope it's hashable , Then it must have a proper __hash__ Method , Guaranteed at a == b If it's true hash(a) == hash(b) It must be true . On the other hand , If one contains custom __eq__ Dependent classes are in a mutable state , Then don't implement it in this class __hash__ Method , Because its instances are not hashable .

copyright notice
author[Hazelnut],Please bring the original link to reprint, thank you.

Random recommended