Hash + Hamming distance for similarity measurement

Hash + Hamming distance for similarity measurement

Hash algorithm

aHash

  1. Definition

    aHash based on low frequency mean hash: too strict to search thumbnails

  2. Algorithm steps

    • Reduce picture size
    • Turn it into a grayscale image
    • Calculate gray mean
    • According to the comparison with the gray mean, the binary image is obtained
    • Combining binary graphs into hash values (fingerprints) in order
  3. Code implementation (python)

    def get_aHash(image_dir):
     from functools import reduce
     image = Image.open(image_dir).resize((8, 8), Image.ANTIALIAS).convert('L')
     avg = reduce((lambda x,y:x+y),image.getdata())/64
     #aHash_value_binary = reduce(lambda x,y: str(x)+str(y),map(lambda i: 0 if i < avg else 1, image.getdata())) # Binary representation
     aHash_value_decimal = reduce(lambda x, y_z: x | (y_z[1] << y_z[0]),enumerate(map(lambda i: 0 if i < avg else 1, image.getdata())),0)
     return aHash_value_decimal
    

pHash

  1. Definition

    Low frequency perceptual hash algorithm of pHash enhanced version: to avoid the influence of gamma correction or color histogram adjustment

  2. Algorithm steps

    • Reduce picture size
    • Turn it into a grayscale image
    • DCT transformation of computing image
    • Reduce DCT
    • Calculate the mean value of DCT
    • Binary graph obtained by comparison and judgment
    • Combining binary graphs into hash values (fingerprints) in order
  3. Code implementation (python)

    def get_pHash(image_dir):
     import cv2
     from functools import reduce
     image = cv2.imread(image_dir,cv2.IMREAD_GRAYSCALE)
     image = cv2.resize(image,(32,32),interpolation=cv2.INTER_AREA)
     image = image.astype('float')
     image = cv2.dct(image) 
     image = cv2.resize(image,(8,8))
     avg_dct = sum(sum(image))/64
     image = sum(image.tolist(),[])
     #pHash_value_binary = reduce(lambda x,y:str(x)+str(y),map(lambda x: 0 if x<avg_dct else 1 , image[::-1]))  #It doesn't matter the order of the combination, as long as the pictures compared are the same
     pHash_value_decimal = reduce(lambda x, y_z: x | (y_z[1] << y_z[0]),enumerate(map(lambda i: 0 if i < avg_dct else 1, image)),0)
     return pHash_value_decimal
    

dHash

  1. Definition

    dHash differential hash algorithm: faster than pHash and better than aHash

  2. Algorithm steps

    • Reduce image size (9 * 8)
    • Turn it into a grayscale image
    • Calculate the difference value (the comparison size of two adjacent values in each row) to get the binary graph
    • Combining binary graphs into hash values (fingerprints) in order
  3. Code implementation (python)

     @param:Picture path
     @return:Decimal representation of image difference hash
    
     def get_dHash(image_dir):
         import cv2
         image = cv2.imread(image_dir,cv2.IMREAD_GRAYSCALE)
         image = cv2.resize(image,(9,8),interpolation=cv2.INTER_AREA)
         image_list = sum(image.tolist(),[])
         hashString = ''
         for i in range(1,len(image_list)):
             if i%(image.shape[1])!=0:
                 if image_list[i-1]>image_list[i]:
                     hashString += '1'
                 else:
                     hashString += '0'
         dHash_value_decimal = int(hashString,2)
         return dHash_value_decimal 
    

Hamming distance

  1. Definition

    In information theory, the Hamming distance between two equal length strings is the number of different characters corresponding to the positions of two strings. In other words, it is the number of characters that need to be replaced to transform one string into another.

  2. Code implementation (python)

     @param: Two hash values to compare
     @return: Distance (number of mismatches)
     def hamming(h1,h2):
         d = h1^h2
         res = bin(d).count('1')
         return res   
    

supplement

  1. Methods in imagehash library can be called directly in python
Installation: pip install imagehash
//Method:
    phash(image, hash_size=8, highfreq_factor=4)
    average_hash(image, hash_size=8)
    dhash(image, hash_size=8)
    whash(image, hash_size = 8, image_scale = None, mode = 'haar', remove_max_haar_ll = True)
  1. In large-scale image search, the hash feature of image can be stored in es, and then matching search can be carried out
Published 20 original articles, won praise and 10000 visitors+
Private letter follow

Tags: Lambda Python pip

Posted on Thu, 16 Jan 2020 04:40:03 -0500 by bernouli