# Hash + Hamming distance for similarity measurement

## Hash algorithm

### aHash

1. Definition

aHash based on low frequency mean hash: too strict to search thumbnails

2. Algorithm steps

• Reduce picture size
• Turn it into a grayscale image
• Calculate gray mean
• According to the comparison with the gray mean, the binary image is obtained
• Combining binary graphs into hash values (fingerprints) in order
3. Code implementation (python)

```def get_aHash(image_dir):
from functools import reduce
image = Image.open(image_dir).resize((8, 8), Image.ANTIALIAS).convert('L')
avg = reduce((lambda x,y:x+y),image.getdata())/64
#aHash_value_binary = reduce(lambda x,y: str(x)+str(y),map(lambda i: 0 if i < avg else 1, image.getdata())) # Binary representation
aHash_value_decimal = reduce(lambda x, y_z: x | (y_z << y_z),enumerate(map(lambda i: 0 if i < avg else 1, image.getdata())),0)
return aHash_value_decimal
```

### pHash

1. Definition

Low frequency perceptual hash algorithm of pHash enhanced version: to avoid the influence of gamma correction or color histogram adjustment

2. Algorithm steps

• Reduce picture size
• Turn it into a grayscale image
• DCT transformation of computing image
• Reduce DCT
• Calculate the mean value of DCT
• Binary graph obtained by comparison and judgment
• Combining binary graphs into hash values (fingerprints) in order
3. Code implementation (python)

```def get_pHash(image_dir):
import cv2
from functools import reduce
image = cv2.resize(image,(32,32),interpolation=cv2.INTER_AREA)
image = image.astype('float')
image = cv2.dct(image)
image = cv2.resize(image,(8,8))
avg_dct = sum(sum(image))/64
image = sum(image.tolist(),[])
#pHash_value_binary = reduce(lambda x,y:str(x)+str(y),map(lambda x: 0 if x<avg_dct else 1 , image[::-1]))  #It doesn't matter the order of the combination, as long as the pictures compared are the same
pHash_value_decimal = reduce(lambda x, y_z: x | (y_z << y_z),enumerate(map(lambda i: 0 if i < avg_dct else 1, image)),0)
return pHash_value_decimal
```

### dHash

1. Definition

dHash differential hash algorithm: faster than pHash and better than aHash

2. Algorithm steps

• Reduce image size (9 * 8)
• Turn it into a grayscale image
• Calculate the difference value (the comparison size of two adjacent values in each row) to get the binary graph
• Combining binary graphs into hash values (fingerprints) in order
3. Code implementation (python)

``` @param:Picture path
@return:Decimal representation of image difference hash

def get_dHash(image_dir):
import cv2
image = cv2.resize(image,(9,8),interpolation=cv2.INTER_AREA)
image_list = sum(image.tolist(),[])
hashString = ''
for i in range(1,len(image_list)):
if i%(image.shape)!=0:
if image_list[i-1]>image_list[i]:
hashString += '1'
else:
hashString += '0'
dHash_value_decimal = int(hashString,2)
return dHash_value_decimal
```

## Hamming distance

1. Definition

In information theory, the Hamming distance between two equal length strings is the number of different characters corresponding to the positions of two strings. In other words, it is the number of characters that need to be replaced to transform one string into another.

2. Code implementation (python)

``` @param: Two hash values to compare
@return: Distance (number of mismatches)
def hamming(h1,h2):
d = h1^h2
res = bin(d).count('1')
return res
```

## supplement

1. Methods in imagehash library can be called directly in python
```Installation: pip install imagehash
//Method:
phash(image, hash_size=8, highfreq_factor=4)
average_hash(image, hash_size=8)
dhash(image, hash_size=8)
whash(image, hash_size = 8, image_scale = None, mode = 'haar', remove_max_haar_ll = True)
```
1. In large-scale image search, the hash feature of image can be stored in es, and then matching search can be carried out  Published 20 original articles, won praise and 10000 visitors+

Tags: Lambda Python pip

Posted on Thu, 16 Jan 2020 04:40:03 -0500 by bernouli