
What are the steps of the SimHash algorithm in Python data mining?

2022-07-24 19:42:41 Alibaba Cloud Q&A

In Python data mining, what are the steps of the SimHash algorithm?




Answer 1:

The SimHash algorithm is divided into 5 steps:

1. Tokenization: Segment the text into words to obtain its effective features. Each feature is assigned a weight, for example one of 5 levels from 1 to 5.

2. Hash: Compute the hash value of each feature. The hash value is an n-bit signature made up of the binary digits 0 and 1.

3. Weighting: Weight the hash of every feature, that is, W = Hash * weight. Bit by bit, a 1 in the hash contributes +weight and a 0 contributes -weight.

4. Merge: Add up the weighted results of all the features position by position, so that they become a single sequence of n numbers.

5. Dimension reduction: For each position in the accumulated result, set the bit to 1 if the value is greater than 0, otherwise set it to 0. This gives the simhash value of the text. Finally, the Hamming distance between the simhash values of different texts can be used to judge their similarity.
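
The five steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not a reference implementation. The answer does not specify several details, so these are assumptions: whitespace splitting stands in for real word segmentation, term frequency stands in for the 1-5 weight levels, and the first 64 bits of an MD5 digest serve as the n-bit hash. The function names simhash and hamming_distance are chosen for this example.

```python
import hashlib
from collections import Counter


def simhash(text, bits=64):
    """Return a `bits`-bit SimHash fingerprint of `text` as an int.

    Assumption: `bits` is a multiple of 8 and at most 128, because the
    per-token hash is taken from the leading bytes of an MD5 digest.
    """
    # Step 1: tokenize and weight (assumed: whitespace split, term frequency).
    weights = Counter(text.lower().split())

    totals = [0] * bits
    for token, weight in weights.items():
        # Step 2: n-bit hash of the feature.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")

        # Steps 3 and 4: a 1 bit adds +weight, a 0 bit adds -weight,
        # and the contributions are summed per bit position.
        for i in range(bits):
            if (h >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight

    # Step 5: positions with a sum greater than 0 become 1, the rest 0.
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

For example, hamming_distance(simhash(doc_a), simhash(doc_b)) gives the number of differing bits between two texts. Texts that share most of their words tend to produce a small distance, and the threshold below which two texts count as similar has to be chosen for the application.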


Copyright notice
Author: [Alibaba Cloud Q&A]. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/205/202207241836464878.html
