In image recognition, features can be extracted with algorithms such as SIFT, SURF and ORB, then fed to a downstream algorithm, and finally classified.

# 1. SIFT

## 1.1 Introduction to SIFT features

SIFT (scale-invariant feature transform) is a computer vision feature extraction algorithm used to detect and describe local features in images.

In essence, it finds key points (feature points) in different scale spaces and computes the orientation of each key point. The key points found by SIFT are highly distinctive points that are stable under changes in illumination, affine transformation and noise, such as corner points, edge points, bright spots in dark areas and dark spots in bright areas.

## 1.2 SIFT feature extraction steps

1. Extremum detection in scale space: Scale space is the space formed by convolving a two-dimensional Gaussian function G(x, y, σ) of varying scale σ with the original image I(x, y) (i.e. Gaussian blur). A scale-invariant feature should be a local extremum in both the spatial domain and the scale domain. Extrema are found in the difference of Gaussians (DoG) between neighbouring scales. The points corresponding to these extrema are called key points or feature points.

2. Key point localization: Too many key points may be found across the scale spaces, and some of them are hard to identify reliably or easily disturbed by noise. In this step, each key point is localized using the pixels near it, its scale and its principal curvatures, so that key points lying on edges or easily disturbed by noise are eliminated.

3. Orientation assignment: To make the descriptor rotation invariant, a reference direction is assigned to each key point from the local image features. An orientation histogram is computed over the local neighbourhood of the key point, and the direction of the histogram maximum is taken as the main direction of the key point.

4. Key point descriptor: Once the position, scale and orientation of a key point are known, it is invariant to translation, scaling and rotation. In addition, a descriptor vector is built for the key point so that it stays stable under changes in lighting and viewing angle. The SIFT descriptor represents the statistics of the Gaussian image gradients in the neighbourhood of the key point, as shown in the figure below. The image region around the key point is divided into blocks, a gradient histogram is computed in each block, and the histograms are concatenated into one vector. This vector is an abstraction of the image information in the region and is distinctive. In the original paper, Lowe suggested using gradient information in 8 directions computed in a 4 × 4 grid of windows in the key point's scale space, giving a 4 × 4 × 8 = 128 dimensional vector. (The vector implemented in OpenCV is also 128 dimensional.)
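The scale-space construction in step 1 can be written out as follows. This is the standard formulation from Lowe's paper; the constant factor $k$ between neighbouring scales is notation introduced here and does not appear in the text above.

```latex
\begin{aligned}
G(x, y, \sigma) &= \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \\
L(x, y, \sigma) &= G(x, y, \sigma) * I(x, y) \\
D(x, y, \sigma) &= L(x, y, k\sigma) - L(x, y, \sigma)
\end{aligned}
```

Here $*$ denotes convolution. A sample of $D$ is a key point candidate when it is larger or smaller than all 26 of its neighbours: 8 in its own scale and 9 in each of the two adjacent scales.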

For details, see the blog post "SIFT feature extraction algorithm summary" by Liu Xiaoshen (Blog Park).
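The orientation step (step 3) can be illustrated with a small sketch. It assumes the gradients around a key point are already available as `(dx, dy)` pairs and uses 36 bins of 10 degrees, as in Lowe's paper. The function name `dominant_orientation` is hypothetical, and the Gaussian weighting and peak interpolation of real SIFT are left out.

```python
import math


def dominant_orientation(gradients, num_bins=36):
    """Return the start angle (degrees) of the strongest orientation bin.

    `gradients` is an iterable of (dx, dy) pairs from the neighbourhood
    of a key point. Each pair votes for the bin containing its direction,
    weighted by its gradient magnitude.
    """
    bin_width = 360.0 / num_bins
    histogram = [0.0] * num_bins
    for dx, dy in gradients:
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        histogram[int(angle // bin_width) % num_bins] += magnitude
    best_bin = max(range(num_bins), key=histogram.__getitem__)
    return best_bin * bin_width


# Two gradients point at 45 degrees, one weaker gradient points at 0 degrees
print(dominant_orientation([(1, 1), (2, 2), (1, 0)]))  # 40.0 (the 40-50 degree bin)
```

The returned angle is the lower edge of the winning bin, so the resolution of this sketch is limited to the bin width.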

# 2. SURF

## 2.1 Introduction to SURF features

SURF (Speeded Up Robust Features) is a robust algorithm for detecting and describing image features. It is an efficient variant of SIFT and also extracts scale-invariant features. Its steps are roughly the same as those of SIFT, but each step is carried out differently, which makes it faster than SIFT (as its name suggests). SURF uses the determinant of the Hessian matrix to detect feature points and uses the integral image to speed up the computation; the SURF descriptor is based on 2D Haar wavelet responses and also makes effective use of the integral image.

## 2.2 SURF feature extraction steps

1. Feature point detection: SURF detects feature points with the Hessian matrix, which is the matrix of second derivatives in the x and y directions and measures the local curvature of a function. Its determinant represents the amount of variation around a pixel, and feature points are taken at extrema of the determinant. The Gaussian filter of SIFT is replaced with a box filter, and the integral image (only the values at the four corners of the filter box are needed) greatly improves the speed.

2. Feature point localization: As in SIFT, feature points are localized by interpolating information from neighbouring points.

3. Orientation assignment: The Haar wavelet responses in the x and y directions are computed for the pixels around the feature point. The responses that fall within an angular interval of the xy plane are summed in x and in y to form a vector, and the interval is swept around the point. The direction of the longest vector (i.e. the one with the largest x and y components) is the orientation of the feature point.

4. Feature descriptor: After the orientation of the feature point has been chosen, a descriptor is built from the surrounding pixels relative to this orientation. A 20 × 20 pixel window around the feature point is divided into 16 sub-regions of 5 × 5 pixels each. In every sub-region, the Haar wavelet responses in the x and y directions (x along the feature point orientation, y perpendicular to it) are summed to give Σdx and Σdy, together with the sums of their absolute values Σ|dx| and Σ|dy|. These four quantities per sub-region produce a 16 × 4 = 64 dimensional descriptor.
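The speed-up in step 1 comes from the integral image (summed-area table): once it is built, the sum over any rectangle needs only four lookups, whatever the size of the box filter. The sketch below shows the idea in plain Python on a nested list; the function names are hypothetical and this is not how OpenCV implements it internally.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += img[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 via four lookups."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
table = integral_image(img)
print(box_sum(table, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

Because the cost of `box_sum` does not depend on the rectangle size, larger filters for coarser scales cost the same as small ones.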

For details, see the blog post "Section 13 SURF feature extraction algorithm" by big Altman fighting small Monsters (Blog Park).

# 3. ORB

## 3.1 Introduction to ORB features

ORB (Oriented FAST and Rotated BRIEF) is a feature detection algorithm built on the well-known FAST feature detector and BRIEF feature descriptor. It runs much faster than SIFT and SURF and can be used for real-time feature detection. ORB features are invariant to scale and rotation and are also robust to noise and perspective change. Thanks to this good performance, ORB is used for feature description in a very wide range of applications. ORB feature detection consists of two main steps: (1) oriented FAST feature point detection and (2) BRIEF feature description.

## 3.2 ORB feature extraction algorithm

- FAST feature point detection: FAST feature point detection - ☆ Ronny, blog Park
- BRIEF feature descriptor: BRIEF feature descriptor - ☆ Ronny, blog Park
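To give a feel for why BRIEF-style descriptors are binary, here is a minimal sketch of the underlying intensity test: each bit records whether one pixel in a patch is darker than another. The function name `brief_bits` and the hand-picked point pairs are illustrative assumptions; real BRIEF smooths the patch first and uses a fixed sampling pattern, and ORB additionally rotates that pattern by the key point orientation.

```python
def brief_bits(patch, pairs):
    """Return one bit per point pair: 1 if the first pixel is darker."""
    return [1 if patch[r1][c1] < patch[r2][c2] else 0
            for (r1, c1), (r2, c2) in pairs]


# A tiny 2 x 2 "patch" of gray levels and three illustrative point pairs
patch = [[10, 50],
         [30, 20]]
pairs = [((0, 0), (1, 1)), ((0, 1), (1, 0)), ((1, 0), (1, 1))]
print(brief_bits(patch, pairs))  # [1, 0, 0]
```

With 256 such tests the bits pack into 32 bytes, which matches the ORB descriptor size of 32 reported by OpenCV.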

# 4. Code implementation

The libraries used by the code below are shown in the figure below.

The two libraries marked with red boxes are very important! Use version 3.4.2.16 rather than the latest one; otherwise feature extraction will raise an error.

Error message:

    sift = cv2.xfeatures2d.SIFT_create()
    cv2.error: OpenCV(3.4.3) C:\projects\opencv-python\opencv_contrib\modules\xfeatures2d\src\sift.cpp:1207: error: (-213:The function/feature is not implemented) This algorithm is patented and is excluded in this configuration; Set OPENCV_ENABLE_NONFREE CMake option and rebuild the library in function 'cv::xfeatures2d::SIFT::create'

If you get the above error when calling cv2.xfeatures2d.SIFT_create(), your library version is too new. Roll back to the version given above.
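As an alternative to pinning the version, the SIFT constructor can be looked up wherever the installed build provides it. The sketch below rests on one fact from outside this article: from OpenCV 4.4.0 onward SIFT is available in the main module as `cv2.SIFT_create()`. The helper name `make_sift` is hypothetical, and the module is passed in as an argument so that the helper does not depend on a particular installation.

```python
def make_sift(cv2_module):
    """Return a SIFT extractor from whichever location this build offers."""
    # OpenCV 4.4.0 and later expose SIFT in the main module
    if hasattr(cv2_module, "SIFT_create"):
        return cv2_module.SIFT_create()
    # Older builds with the contrib modules keep it under xfeatures2d
    xfeatures2d = getattr(cv2_module, "xfeatures2d", None)
    if xfeatures2d is not None and hasattr(xfeatures2d, "SIFT_create"):
        return xfeatures2d.SIFT_create()
    raise RuntimeError("this OpenCV build does not provide SIFT")


# Typical use (requires OpenCV to be installed):
#   import cv2
#   sift = make_sift(cv2)
```

On a contrib build that was compiled without the non-free algorithms, the `xfeatures2d.SIFT_create()` call itself still raises the `cv2.error` shown above. SURF is not covered by this helper.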

## 4.1 Feature extraction

    import cv2


    def sift(filename):
        img = cv2.imread(filename)                   # read the file
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
        extractor = cv2.xfeatures2d_SIFT.create()
        # Feature extraction yields the key points and their descriptors (feature vectors)
        keyPoint, descriptor = extractor.detectAndCompute(img, None)
        return img, keyPoint, descriptor


    def surf(filename):
        img = cv2.imread(filename)                   # read the file
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
        extractor = cv2.xfeatures2d_SURF.create()
        keyPoint, descriptor = extractor.detectAndCompute(img, None)
        return img, keyPoint, descriptor


    def orb(filename):
        img = cv2.imread(filename)                   # read the file
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
        extractor = cv2.ORB_create()
        keyPoint, descriptor = extractor.detectAndCompute(img, None)
        return img, keyPoint, descriptor

Why is it necessary to convert the image to grayscale?

- The key factor in object recognition is the gradient (as in SIFT and HOG). Gradients correspond to edges, which carry the most essential information, and gradients are naturally computed on a grayscale image. The gray level can be understood as the intensity of the image.
- Color is easily affected by illumination and rarely provides key information. Converting the image to grayscale also speeds up feature extraction.
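For reference, the grayscale conversion itself is just a weighted sum of the three channels. The sketch below applies the weights documented for OpenCV's RGB-to-gray conversion (0.299, 0.587, 0.114) to a single pixel in BGR order; the function name `bgr_to_gray` is hypothetical and the rounding is a simplification of what the library does.

```python
def bgr_to_gray(pixel):
    """Weighted sum of the channels of one (blue, green, red) pixel."""
    blue, green, red = pixel
    return int(round(0.299 * red + 0.587 * green + 0.114 * blue))


print(bgr_to_gray((255, 255, 255)))  # white keeps full intensity: 255
print(bgr_to_gray((0, 0, 255)))      # pure red maps to 76
```

Green carries the largest weight because the eye is most sensitive to it, so two colors with very different hues can still end up with similar intensities.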

Compare the extracted results:

    def compare(filename):
        imgs, keyPoint, descriptor = [], [], []
        # Run the three extractors in turn and collect their outputs
        for extract in (sift, surf, orb):
            img, keyPoint_temp, descriptor_temp = extract(filename)
            keyPoint.append(keyPoint_temp)
            descriptor.append(descriptor_temp)
            imgs.append(img)
        return imgs, keyPoint, descriptor


    def main():
        method = ['sift', 'surf', 'orb']
        imgs, kp, des = compare('./pic/doraemon1.jpg')
        # Draw the key points of each method in its own window
        for i in range(3):
            img = cv2.drawKeypoints(imgs[i], kp[i], None)
            cv2.imshow(method[i], img)
        cv2.waitKey()
        cv2.destroyAllWindows()
        # Number of descriptors and length of one descriptor for each method
        for i in range(3):
            print("%s len of des: %d, size of des: %d"
                  % (method[i], len(des[i]), len(des[i][0])))

The following figure shows the extracted results. From left to right: the original image, SIFT, SURF and ORB.

    sift len of des: 458, size of des: 128
    surf len of des: 1785, size of des: 64
    orb len of des: 500, size of des: 32

It can be seen that:

- SIFT extracts the fewest feature points, but its results are the best.
- The descriptors extracted by SIFT have 128 dimensions, those of SURF 64 and those of ORB 32.

## 4.2 Feature matching

Brute-force matching and FLANN matching are the two common methods for matching 2D feature points in OpenCV, corresponding to BFMatcher (brute-force matcher) and FlannBasedMatcher respectively.

The difference between the two is that BFMatcher tries every possible match and therefore always finds the best one, which is the original meaning of "brute force". In FlannBasedMatcher, FLANN stands for Fast Library for Approximate Nearest Neighbors. As the name says, it is an approximate method: it is faster, but it finds an approximate nearest neighbour rather than the exact one. FlannBasedMatcher is therefore used when a reasonably good match is enough and the best match is not required. The parameters of FlannBasedMatcher can be tuned to improve either accuracy or speed, but improving one comes at the cost of the other.
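What "trying all possible matches" means can be shown without OpenCV. The following sketch compares every query descriptor with every train descriptor using the Euclidean (L2) distance and keeps the closest one. The function name and the tuple layout are illustrative assumptions, not the BFMatcher implementation.

```python
import math


def brute_force_match(query, train):
    """Return (query_idx, train_idx, distance) of the nearest train
    descriptor for every query descriptor, using exhaustive search."""
    matches = []
    for qi, q in enumerate(query):
        # Euclidean distance from this query descriptor to every train descriptor
        distances = [math.sqrt(sum((a - b) ** 2 for a, b in zip(q, t)))
                     for t in train]
        ti = min(range(len(train)), key=distances.__getitem__)
        matches.append((qi, ti, distances[ti]))
    return matches


query = [(0, 0), (5, 5)]
train = [(4, 4), (1, 0), (6, 5)]
print(brute_force_match(query, train))  # [(0, 1, 1.0), (1, 2, 1.0)]
```

The cost grows with the product of the two descriptor counts, which is the price paid for always finding the exact nearest neighbour.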

This article looks for the best feature point matches, so the brute-force matcher is used.

    def match(filename1, filename2, method):
        # Choose the extractor and the norm type that suits its descriptors
        if method == 'sift':
            extract, norm = sift, cv2.NORM_L2      # SIFT: NORM_L2 or NORM_L1
        elif method == 'surf':
            extract, norm = surf, cv2.NORM_L2      # SURF: NORM_L2 or NORM_L1
        elif method == 'orb':
            extract, norm = orb, cv2.NORM_HAMMING  # ORB: NORM_HAMMING
        else:
            raise ValueError("unknown method: %s" % method)

        img1, kp1, des1 = extract(filename1)
        img2, kp2, des2 = extract(filename2)

        bf = cv2.BFMatcher(norm, crossCheck=True)
        matches = bf.match(des1, des2)
        matches = sorted(matches, key=lambda x: x.distance)
        knnMatches = bf.knnMatch(des1, des2, k=1)  # for use with drawMatchesKnn

        # Filter. Caution: this removes items from the list while iterating
        # over it, so some elements are skipped. It is a rough heuristic,
        # not Lowe's ratio test.
        for m in matches:
            for n in matches:
                if m != n and m.distance >= n.distance * 0.75:
                    matches.remove(m)
                    break

        # Draw at most the 50 best remaining matches
        img = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None, flags=2)
        cv2.imshow("matches", img)
        cv2.waitKey()
        cv2.destroyAllWindows()


    def main():
        method = ['sift', 'surf', 'orb']
        for i in range(3):
            match('./pic/wechat1.jpg', './pic/wechat2.png', method[i])


    if __name__ == '__main__':
        main()

Several key functions are introduced below.

The first is cv2.BFMatcher(normType, crossCheck). It has two parameters.

- The first parameter specifies the distance measure to use. The default value is cv2.NORM_L2, which is well suited to SIFT, SURF and similar descriptors (cv2.NORM_L1 also works). For algorithms with binary descriptors such as ORB, BRIEF and BRISK, use cv2.NORM_HAMMING, which returns the Hamming distance between the two descriptors.
- The second parameter is the boolean crossCheck, which defaults to False. If it is set to True, the matching condition is stricter: a match (i, j) is returned only when the i-th feature point in A is closest to the j-th feature point in B and the j-th feature point in B is also closest to the i-th feature point in A (no other point in A is closer to j). In other words, the two feature points must match each other.
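Both parameters can be illustrated together in plain Python: the Hamming distance counts the bits in which two binary descriptors differ, and cross-checking keeps a pair only when each descriptor is the other's nearest neighbour. The function names below are hypothetical and the one-byte descriptors are toy values; this is a sketch of the idea, not of OpenCV's implementation.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def nearest(desc, candidates):
    """Index of the candidate with the smallest Hamming distance to desc."""
    return min(range(len(candidates)),
               key=lambda i: hamming(desc, candidates[i]))


def cross_check_matches(des_a, des_b):
    """Keep (i, j) only if j is nearest to i and i is nearest to j."""
    matches = []
    for i, d in enumerate(des_a):
        j = nearest(d, des_b)
        if nearest(des_b[j], des_a) == i:
            matches.append((i, j))
    return matches


des_a = [bytes([0b00000000]), bytes([0b11110000]), bytes([0b00000001])]
des_b = [bytes([0b11110001]), bytes([0b00000011])]
print(cross_check_matches(des_a, des_b))  # [(1, 0), (2, 1)]
```

In this example the first descriptor of `des_a` is closest to the second descriptor of `des_b`, but that one is closer still to the third descriptor of `des_a`, so the pair is rejected.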

Next is bf.match(). It also takes two parameters: the first is the set of query descriptors, and the second is the set of descriptors to match against.

The sorted() function sorts the matching results by distance.

knnMatch() is another method of the BFMatcher object. Whereas BFMatcher.match() returns only the best match for each key point, knnMatch() returns the k best matches for each key point (the first k after sorting by increasing distance), where k is set by the user.

(Note: knnMatch() and match() do not return the same kind of result. match() returns a flat list of matches, while knnMatch() returns a list of k-element lists.)

The matching results are then filtered to exclude some bad matches.
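A widely used way to filter matches is Lowe's ratio test, sketched here as an alternative and not as the filter used in this article's code. It assumes the matches come from `bf.knnMatch(des1, des2, k=2)` on a matcher created with `crossCheck=False`, and keeps a match only when its best distance is clearly smaller than the second-best one. The `Match` tuple is a stand-in for `cv2.DMatch` so that the example runs without OpenCV, and the 0.75 threshold mirrors the constant in the code above.

```python
from collections import namedtuple

# Stand-in for cv2.DMatch; only the attributes used here are modelled
Match = namedtuple("Match", ["queryIdx", "trainIdx", "distance"])


def ratio_test(knn_matches, ratio=0.75):
    """Keep the best match of each pair if it beats the runner-up clearly."""
    good = []
    for candidates in knn_matches:
        if len(candidates) < 2:
            continue  # no second-best match to compare against
        best, second = candidates[0], candidates[1]
        if best.distance < ratio * second.distance:
            good.append(best)
    return good


knn = [
    [Match(0, 3, 10.0), Match(0, 7, 40.0)],  # clearly better than runner-up: kept
    [Match(1, 2, 30.0), Match(1, 5, 32.0)],  # ambiguous: dropped
]
print([m.queryIdx for m in ratio_test(knn)])  # [0]
```

Unlike removing items from a list while looping over it, this builds a new list, so every candidate pair is examined exactly once.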

The cv2.drawMatches() function draws the matching points of two images. Its parameters are as follows:

- img1 – source image 1
- keypoints1 – feature points of source image 1
- img2 – source image 2
- keypoints2 – feature points of source image 2
- matches1to2 – the matches from the feature points of source image 1 to those of source image 2
- outImg – the output image; its content is determined by flags
- matchColor – the color of the matches (feature points and connecting lines). If matchColor==Scalar::all(-1), the color is random
- singlePointColor – the color of single points, that is, unmatched feature points. If singlePointColor==Scalar::all(-1), the color is random
- matchesMask – mask that determines which matches are drawn. If it is empty, all matches are drawn
- flags – drawing options, defined by DrawMatchesFlags

Next, let's look at the results of the code above. From top to bottom: the original image, SIFT, SURF and ORB.

    sift size of kp: 59, after filtering: 20
    surf size of kp: 197, after filtering: 35
    orb size of kp: 390, after filtering: 47

Judging from this output, ORB performs best on this pair of images: it keeps the most matches after filtering. If you are interested, try other pictures to see how the methods compare. The pic folder also contains two other pairs of pictures for comparison.

# 5. Summary

Feature-based matching consists of two steps: feature point extraction and matching. This article mainly compares three feature extraction methods, SIFT, SURF and ORB, all of which are implemented in OpenCV. SURF is essentially an all-round upgrade of SIFT, so when SURF is available SIFT is rarely worth considering, while the strength of ORB lies in its computation time. The detailed comparison is as follows:

- Calculation speed: ORB >> SURF >> SIFT (about one order of magnitude apart at each step)
- Rotation robustness: SURF > ORB ≈ SIFT (≈ means roughly equal)
- Blur robustness: SURF > ORB ≈ SIFT
- Scale-change robustness: SURF > SIFT > ORB (ORB has no scale invariance)

The conclusion is therefore: if the real-time requirement is very high, choose ORB, but the images should then be taken more or less head-on; if the real-time requirement is only moderately high, choose SURF; SIFT is basically not used.

Reference: "Comparison of three feature detection algorithms for SURF SIFT ORB", zilanpotou182's blog (CSDN).

However, the comments on that blog post raise different views, so the correctness of this comparison still needs to be verified.

# 6. Appendix

GitHub: Multhree/Computer-Vision, feature-extraction directory (master branch)