current position:Home>Why can't list be used as dictionary key value in Python

Why can't list be used as dictionary key value in Python

2022-02-01 05:17:04 yizdu

background

To put it simply , When we're in Python Try using a list as a key value , There will be a prompt “ Do not hash (unhashable)” error .

In [1]: la=[1,2]

In [2]: d={}

In [3]: d[la]=3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-79b9de59ffa1> in <module>
----> 1 d[la]=3

TypeError: unhashable type: 'list'
 Copy code 

More broadly , about Python All built-in variable types in ( list 、 Dictionaries 、 aggregate ), We can't use it as a dictionary key . To answer why , Let's start with the general way lists and dictionaries work , Then discuss why it can't be like this .

Variable type and immutable type

We first need to understand the concepts of mutable and immutable types .Python Variable and immutable types in , In fact, it corresponds to... In other languages “ Value type ”、“ Reference type ”.

Python Data types in , It can be divided into variable and immutable . The immutable type is number (intfloat)、 character string (str)、 Boolean (bool)、 Tuples (tuple), Variable types have lists (list)、 Dictionaries (dict)、 aggregate (set), Of course , There are also user-defined classes .

Python Immutable types in , Semantics and implementation are immutable . For example, when we give an integer object a When assigning or modifying values , We will find that ,a Of id Changed , Because numbers are immutable types , When we modify variables a The value of , We don't store values in the original 88888 Modify in memory space . Instead, the original memory 88888 The destruction , Then open up a new memory space to write 9999 And give the object a.

In [40]: a=88888

In [41]: id(a)
Out[41]: 140691892091152

In [42]: a=9999

In [43]: id(a)
Out[43]: 140691892092048
 Copy code 

For variable types , If you modify it , It will be modified in the original memory space . Because the list la An object is a mutable object , That is, the reference object in other languages . Store the object la In memory space of , Not directly la The content of , Instead, it contains a pointer to the real storage la The address of the content .

In [44]: la=[1,2]

In [45]: id(la)
Out[45]: 140691892029824

In [46]: la[1]=3

In [47]: id(la)
Out[47]: 140691892029824

In [48]: la.append(4)

In [49]: id(la)
Out[49]: 140691892029824
 Copy code 

Variable types have an obvious feature , If you pass it directly to other objects , Whether it's a new variable 、 The function is also a member of other containers or classes . The object being passed can have its value ( That's the content ) Make changes , And affect the original variable object . let me put it another way , After passing a mutable object directly to another object , These objects will share the same value , Changes to them will affect each other . The following example shows this .

In [72]: la=[1,2]

In [73]: lb=la

In [74]: lc=[4,5,6,la]

In [75]: def change_la(la):
    ...:     la.append(999)
    ...: 

In [76]: change_la(la)

In [77]: la
Out[77]: [1, 2, 999]

In [78]: lb
Out[78]: [1, 2, 999]

In [79]: lc
Out[79]: [4, 5, 6, [1, 2, 999]]
 Copy code 

How dictionaries work

We'll briefly talk about how hash functions and dictionaries work .

The meaning of hash function

The mathematical definition of hash function , Is to input data of any length , It is transformed into fixed length output by hash algorithm , This output is the hash value . If the same hash function outputs two different hash values , Then their input will be different . But two different inputs , It is possible to produce the same output , This is called hash collision .

Specific to software engineering , Hash function is often used as a method to judge whether two objects are the same , Put some important attributes in an object ( value , Memory address, etc ) Hash and get a fixed length value , This value can be treated as a similar expression of the object “ The fingerprint ”、“ Abstract ” Identity information . Because the hash value data structure and length are fixed , It's easy to compare . If there are two objects with the same hash value , We can think of them as the same object for the time being , But because of hash conflicts , We need to further judge whether the important attributes of the two objects are the same .

The function and Realization of Dictionary

Dictionaries (dict) Hash table 、 Hash table . Is the use of hash functions (hash function) A data structure for features , The fastest time complexity that can be achieved is O(1) Query operation of .

A common implementation is , The hash table will first open up a bucket array , For the inserted key value pair , Hash the key , Then through further operation , For example, the hash value and the length of the bucket array are taken out , Finally decide which bucket to put the key value pair into , Buckets can be containers such as linked lists or arrays . Because of the existence of hash collision , The bucket can usually hold more than one key value pair . When performing a query operation , We perform the same calculation on the key when inserting , The subscript of the bucket storing the target data can be found in multiple buckets at once ( This is also the main reason why hash tables are efficient ), Then look for the value corresponding to the key in the bucket .

Let's refine the query process , Reference resources Python official wiki And translation , The dictionary search process can use the following lookup Function display . Of course , In fact, the design of the dictionary is much more complicated than this , Let's mainly understand the concept here .

def lookup(d, key):
    ''' The search of the dictionary is completed in three steps  1.  Use the hash function to calculate the hash value of the key . 2.  Locate the dictionary bucket array through the hash value (d.data) Somewhere in the world , We should be able to   Get an array called a bucket or conflict list . It contains key value pairs  ((key,value)). 3.  Traverse this array called bucket or conflict list , If a key and an incoming parameter key are found in the bucket  (key) Equal key value pairs , The value will be (value) Return as return value . '''
    h = hash(key)                  #  step  1
    cl = d.data[h]                 #  step  2
    for pair in cl:                #  step  3
        if key == pair[0]:
            return pair[1]
    else:
        raise KeyError, " Key not found  %s ." % key
 Copy code 

You can notice , Query a key , The key point is to get its hash value , And address in the bucket array through the hash value , Then compare whether there is the same key value pair as the incoming key value in the bucket , Finally, return the value of the key value pair .

Let's go back to Python In itself , Reference resources Python Official statement :

To be used as a dictionary key, an object must support the hash function (e.g. through hash), equality comparison (e.g. through eq or cmp), and must satisfy the correctness condition above.

A data type to be used as a dictionary key , Object must also support hashing ( For example, through __hash__ Method ) And judge whether they are equal ( For example, through __eq__ or __cmp__ Method ) These two operations .

In [80]: hash(la)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-80-e4e0d0f14f47> in <module>
----> 1 hash(la)

TypeError: unhashable type: 'list'
 Copy code 

As for the title of the article , In fact, we can already answer here . That is, the list type does not implement __hash__ Method , Therefore, the hash value cannot be taken , Then it cannot be used as a dictionary key . But a little deeper , We want to discuss why we don't realize __hash__ Method .

Problems caused by allowing the list to get hash values

For dictionaries , If we abstract its function again , Can be described as . Insert a key value pair into the dictionary , We hope to pass “ alike ” The key can take out the value . And how to define two keys ( Or object ) identical , It's about hash functions .

That's the problem , Because of the properties of lists and hash functions , It's hard for us to make a list of “ identical ” Make an intuitive 、 A non confusing definition .

You may want to say , The contents of the two lists , That is, the length and element order are the same , It means that the two lists are the same , Isn't this definition easy to give ? actually , Lists do support equality operations ( Realized __eq__ Method ), But for a hash table key , There are still many problems .

Let's be direct , Consider designing such a list , It supports the same operation of judging equality as the native list , And its hash method , It is also used to calculate the content . Two lists with the same content will have the same hash , As we imagine .

class MyList(object):
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return str(self.value)

    def __str__(self):
        return str(self.value)

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
		#  Convert list to string , Then convert each character of the string into a number 
		#  for example :[1,2] -> "[1, 2]" -> 914944325093
        hashValue = ""
        for c in str(self.value):
            hashValue += str(ord(c))
        return int(hashValue)
 Copy code 

The usage is as follows

In [47]: ml1=mylist.MyList([1,2])

In [48]: ml1
Out[48]: [1, 2]

In [49]: d1={}

In [50]: d1[ml1]=999

In [51]: print(d1)
{[1, 2]: 999}

In [52]: ml2=mylist.MyList([1,2])

In [53]: d1[ml2]
Out[53]: 999
 Copy code 

It seems that this is very much in line with our imagination , But if we modify it ml1 Well ?

In [54]: ml1.value.append(3)

In [55]: ml1
Out[55]: [1, 2, 3]

In [56]: ml2
Out[56]: [1, 2]

In [57]: print(d1)
{[1, 2, 3]: 999}

In [58]: d1[ml1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-58-1e9c8efcdfed> in <module>
----> 1 d1[ml1]

KeyError: [1, 2, 3]

In [59]: d1[ml2]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-59-aff57dc87ac2> in <module>
----> 1 d1[ml2]

KeyError: [1, 2]
 Copy code 

We can find out , If we are right ml1 It's been modified , Dictionaries d1 The key values in the also changed , This is because lists are mutable objects , Multiple references share one content , One change affects all references . If this time , We want to pass ml1 Extract value , It will give you an error , The reason is the modified ml1 The content has changed , The hash value has also changed , The dictionary cannot locate the original location of the key value pair through the new hash value . Besides , If we pass the content and hash value and change before ml1 It's all the same ml2 Try to remove d1 Value , It can also be wrong . Because the dictionary query needs to calculate the hash to find the subscript of the bucket , Also, to avoid hash collision, check again whether the query key is really equal to the key in the bucket , At this time, the key in the dictionary is because ml1 Changes with the modification of , therefore ml2 The value of is not equal to the key value in the bucket , Can't get the value .

We can also think about other hashing methods , For example, through the... Of the object id That is, the memory address is used to calculate the hash , But there are more obvious problems . The key to query the dictionary , Can only be a reference based on the initial Insert Object , Objects and literals with the same value will never find the desired result , Because their memory addresses are different . Although in fact , Hashing the memory address of an object of a user-defined class is the default operation , But that's easier to use in object-oriented thinking , For the basic type of built-in list , Using this scheme is counter intuitive .

In [98]: ml1=mylist.MyList([1,2])

In [99]: ml2=mylist.MyList([1,2])

In [100]: ml3=ml1

In [101]: print(id(ml1),id(ml2),id(ml3))
140172863009408 140172873557376 140172863009408

In [102]: ml1==ml2==ml3
Out[102]: True

In [103]: d1={}

In [104]: d1[ml1]=999

In [105]: print(d1)
{[1, 2]: 999}

In [106]: d1[ml1]
Out[106]: 999

In [107]: d1[ml3]
Out[107]: 999

In [108]: d1[ml2]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-108-aff57dc87ac2> in <module>
----> 1 d1[ml2]

KeyError: [1, 2]

In [109]: d1[mylist.MyList([1,2])]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-109-bcd49cc44bab> in <module>
----> 1 d1[mylist.MyList([1,2])]

KeyError: [1, 2]

 Copy code 

The reason is , The hash value used for indexing is generated based on an attribute of the object at a certain time , Is static . And after finding the corresponding bucket , Find the key value pair matching the incoming key from the bucket , Is based on the object itself . For immutable types , Their hash value and key value pairs in the bucket are unchanged . For variable types , Their changes will lead to the dynamic changes of key value pairs in the bucket , But the hash index is still static , This creates a contradiction .

Solution

Use tuples instead of lists , Tuples are immutable types , Its hash function is generated based on the content itself , The way you use it is intuitive .

Reference resources

[1] Why Lists Can't Be Dictionary Keys

[2] Why can't I use a list as a dict key in python?

copyright notice
author[yizdu],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/02/202202010516596627.html

Random recommended