[Novice] Learning OpenCV, week 1: from reading images to template matching

No other purpose; these notes mainly record my own learning process.

1. Data reading

  • Image reading
cv2.imread(): function for reading an image

@param_1  : filename Name of file to be loaded. # File path
@param_2  : flags Flag that can take values of cv::ImreadModes # How the file is read

# Three commonly used options
	cv2.IMREAD_COLOR : Load a color image. This is the default. You can also pass 1 directly
	cv2.IMREAD_GRAYSCALE : Load the image in grayscale mode. You can also pass 0 directly
	cv2.IMREAD_UNCHANGED : Load the image as is, including the alpha channel. You can also pass -1 directly
img = cv2.imread('path')  # path: location of the image file. The result is in BGR channel order (None if the file cannot be read)
cv2.imshow('image', img)  # param_1: name of the display window, param_2: image to display

# These two calls are needed at the end; without waitKey the window is not kept on screen
cv2.waitKey(0)            # 0 waits indefinitely for any key press; a positive value is a timeout in milliseconds
cv2.destroyAllWindows()   # Close all windows

# Read an image as a grayscale image
img_gr = cv2.imread('jinnie.jpeg', cv2.IMREAD_GRAYSCALE)
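Because the channels come back in BGR order, a library that expects RGB (matplotlib, for example) shows swapped colors unless the channel axis is reversed. A minimal NumPy sketch, assuming a hand-made 1x2 array as a stand-in for what imread would return (no file is read here):

```python
import numpy as np

# Stand-in for an image returned by cv2.imread: 1 row, 2 pixels, BGR order.
# Pixel 0 is pure blue, pixel 1 is pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps B and R, which gives RGB order.
rgb = bgr[:, :, ::-1]

print(bgr.shape)   # (rows, cols, channels)
print(rgb[0, 0])   # the blue pixel, now in RGB order
print(rgb[0, 1])   # the red pixel, now in RGB order
```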
  • Video reading
vc = cv2.VideoCapture('file_path') # Path of the video file
# Check whether the video was opened correctly
is_open = vc.isOpened() # Returns a Boolean value
while is_open:
    # Read the video frame by frame: ret is a bool, frame is the image of the current frame (a three-dimensional array in BGR order)
    ret, frame = vc.read()
    # Stop when an empty frame is read (end of the video)
    if frame is None:
        break
    # The frame was read correctly
    if ret == True:
        # Convert each frame to grayscale: cv2.cvtColor(p1, p2) is the color space conversion function, p1 is the image to convert and p2 is the conversion code
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        cv2.imshow('result', gray)
        # The delay passed to cv2.waitKey (in milliseconds) sets the playback speed: a smaller value plays faster, a larger value plays slower
        # 27 is the Esc key; pressing it stops playback
        if cv2.waitKey(10) & 0xFF == 27:
            break
# Release the capture and close the window
vc.release()
cv2.destroyAllWindows()
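The BGR-to-gray conversion used in the loop is a weighted sum of the channels, Y = 0.299 R + 0.587 G + 0.114 B. A minimal NumPy sketch of that formula follows; `bgr_to_gray` is a hypothetical helper written for this note, and its rounding may differ by one gray level from what cv2.cvtColor produces:

```python
import numpy as np

def bgr_to_gray(frame):
    """Hypothetical helper: weighted sum of the B, G and R channels."""
    b = frame[:, :, 0].astype(np.float64)
    g = frame[:, :, 1].astype(np.float64)
    r = frame[:, :, 2].astype(np.float64)
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.rint(gray).astype(np.uint8)

# One row of four BGR pixels: blue, green, red, white.
frame = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]],
                 dtype=np.uint8)
gray = bgr_to_gray(frame)
print(gray)   # green contributes most to brightness, blue least
```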
  • Image saving
cv2.imwrite(path, img) # Write an image to path; the file format is chosen from the file extension
  • Image fusion
dst = cv2.addWeighted(img1, a1, img2, a2, b)
a1 : weight of img1
a2 : weight of img2
b  : offset added to the weighted sum
That is, dst = a1*img1 + a2*img2 + b. The two images must have the same size and number of channels.
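The weighted-sum formula can be checked by hand with NumPy. This is a sketch, not the library's implementation: `add_weighted` is a hypothetical stand-in, and it assumes 8-bit images whose results are rounded and clipped to the range 0..255.

```python
import numpy as np

def add_weighted(img1, a1, img2, a2, b):
    """Hypothetical stand-in for the weighted sum, rounded and clipped to 0..255."""
    blended = a1 * img1.astype(np.float64) + a2 * img2.astype(np.float64) + b
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

img1 = np.full((2, 2), 200, dtype=np.uint8)
img2 = np.full((2, 2), 100, dtype=np.uint8)

mixed = add_weighted(img1, 0.4, img2, 0.6, 0)       # 0.4*200 + 0.6*100
saturated = add_weighted(img1, 1.0, img2, 1.0, 0)   # 300 does not fit in 8 bits
print(mixed)
print(saturated)
```

Weights that sum to 1 keep the overall brightness; larger weights push bright areas into saturation.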
  • Image display, window adjustment
Method 1: cv2.namedWindow(name, flag)
# Window size cannot be changed:

# Window size adaptive scale:

# Keep window size proportional:

# The display color becomes dark (seems useless; I don't understand it):

# Relationship with cv2.imshow
cv2.imshow('Window title', image): if cv2.namedWindow was not called beforehand, imshow first creates the window automatically, as if by
cv2.namedWindow('Window title', cv2.WINDOW_AUTOSIZE)

Method 2: cv2.resize(src, dsize, fx, fy, interpolation)
src: input image
dsize: size of the output image as (width, height). If it is None (or zero), the size is computed from the scale factors: dsize = Size(round(fx*src.cols), round(fy*src.rows)), where fx and fy are the scale factors along the width and height directions.
fx: scale factor along the width; if 0, it is computed as dsize.width/src.cols
fy: scale factor along the height; if 0, it is computed as dsize.height/src.rows
interpolation: interpolation method, bilinear by default (five common methods)
	INTER_NEAREST: Nearest-neighbor interpolation
	INTER_LINEAR: Bilinear interpolation (default)
	INTER_AREA: Resampling using pixel area relation
	INTER_CUBIC: Bicubic interpolation
	INTER_LANCZOS4: Lanczos interpolation
cv2.resize(img, None, fx=a, fy=b)
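To make the dsize formula concrete, here is a minimal NumPy sketch of a nearest-neighbor resize driven by fx and fy. `resize_nearest` is a hypothetical helper; the index mapping is an assumption made for illustration and is not guaranteed to match cv2.resize pixel for pixel.

```python
import numpy as np

def resize_nearest(src, fx, fy):
    """Hypothetical nearest-neighbor resize driven by scale factors fx (width) and fy (height)."""
    rows, cols = src.shape[:2]
    out_w, out_h = round(fx * cols), round(fy * rows)   # dsize = (width, height)
    # Map every output index back to the source index it samples from.
    row_idx = np.minimum(np.arange(out_h) * rows // out_h, rows - 1)
    col_idx = np.minimum(np.arange(out_w) * cols // out_w, cols - 1)
    return src[row_idx][:, col_idx]

src = np.array([[1, 2],
                [3, 4]], dtype=np.uint8)
big = resize_nearest(src, fx=2, fy=1.5)
print(big.shape)   # NumPy reports (rows, cols), the reverse of dsize
print(big)
```

Note the order: dsize is (width, height), while the array shape is (height, width).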
  • Threshold
# cv2.threshold(source image, threshold, fill value, threshold type)
# The first value returned by the function is the threshold that was used, and the second is the processed image
ret, dst = cv2.threshold(src, thresh, maxval, type)
param_1 : src     : Input image, generally a single-channel grayscale image
param_2 : thresh  : Threshold
param_3 : maxval  : Value assigned by THRESH_BINARY and THRESH_BINARY_INV
param_4 : type    : Threshold type
	cv2.THRESH_BINARY      Pixels above the threshold are set to maxval, the rest to 0
	cv2.THRESH_BINARY_INV  The inverse of THRESH_BINARY: pixels above the threshold are set to 0, the rest to maxval
	cv2.THRESH_TRUNC       Pixels above the threshold are set to the threshold, the rest are unchanged
	cv2.THRESH_TOZERO      Pixels not above the threshold are set to 0, the rest are unchanged
	cv2.THRESH_TOZERO_INV  Pixels above the threshold are set to 0, the rest are unchanged
return  : dst     : Output image
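The five threshold types are easier to compare side by side on a few sample values. A minimal NumPy sketch; the `threshold` function and its string names are hypothetical and only mirror the rules listed above:

```python
import numpy as np

def threshold(src, thresh, maxval, kind):
    """Hypothetical NumPy version of the five basic threshold types."""
    above = src > thresh
    if kind == "BINARY":
        return np.where(above, maxval, 0).astype(src.dtype)
    if kind == "BINARY_INV":
        return np.where(above, 0, maxval).astype(src.dtype)
    if kind == "TRUNC":
        return np.where(above, thresh, src).astype(src.dtype)
    if kind == "TOZERO":
        return np.where(above, src, 0).astype(src.dtype)
    if kind == "TOZERO_INV":
        return np.where(above, 0, src).astype(src.dtype)
    raise ValueError(kind)

src = np.array([50, 127, 128, 200], dtype=np.uint8)
results = {}
for kind in ["BINARY", "BINARY_INV", "TRUNC", "TOZERO", "TOZERO_INV"]:
    results[kind] = threshold(src, 127, 255, kind)
    print(kind, results[kind])
```

The value 127 itself counts as "not above" the threshold, so it lands on the low side in every mode.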
  • Image processing
    -Mean filtering (simple averaging convolution):
    Each pixel of the output image is the average of the input pixels inside the kernel window (all pixels have equal weight). It is in fact a normalized box filter.
    Mean filtering has an inherent weakness: it does not preserve image detail well. While denoising it also blurs the details of the image, and it still does not remove noise points well, especially salt-and-pepper noise.

    cv2.blur(image, kernel size)
    cv2.blur(img, (3,3))

    -Box filtering:
    The image is filtered with a box kernel. With normalization (normalize=True) the result is the same as mean filtering; without it the values in the window are simply summed, so sums above 255 saturate to 255.

    box = cv2.boxFilter(img2, -1, (3,3), normalize=False)
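The difference between the normalized and the unnormalized box filter shows up on a tiny image. A minimal NumPy sketch, assuming a 3x3 window and interior pixels only (`box_filter_3x3` is a hypothetical helper; border handling is left out for brevity):

```python
import numpy as np

def box_filter_3x3(img, normalize=True):
    """Hypothetical 3x3 box filter on interior pixels only."""
    img = img.astype(np.float64)
    h, w = img.shape
    total = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            total += img[dy:dy + h - 2, dx:dx + w - 2]   # add one shifted copy per kernel cell
    if normalize:
        total /= 9.0                                     # mean filter
    return np.clip(np.rint(total), 0, 255).astype(np.uint8)

img = np.array([[10, 10, 10, 10],
                [10, 200, 10, 10],
                [10, 10, 10, 10]], dtype=np.uint8)
averaged = box_filter_3x3(img, normalize=True)
summed = box_filter_3x3(img, normalize=False)
print(averaged)   # window sum 280 divided by 9
print(summed)     # window sum 280 clipped to 255
```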

    -Gaussian filtering:
    The weights in a Gaussian blur kernel follow a Gaussian distribution, so pixels near the center count more.

    img_gaussian = cv2.GaussianBlur(img, ksize, 1)  # ksize: kernel size such as (5, 5), width and height must be odd; 1 is sigmaX
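What "paying more attention to the middle" means can be seen by building the kernel directly. A minimal NumPy sketch; `gaussian_kernel` is a hypothetical helper that samples the Gaussian formula, and OpenCV's own kernel coefficients may differ slightly:

```python
import numpy as np

def gaussian_kernel(ksize, sigma):
    """Hypothetical helper: normalized 2-D Gaussian kernel of size ksize x ksize."""
    ax = np.arange(ksize) - (ksize - 1) / 2.0      # e.g. [-1, 0, 1] for ksize=3
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()                                   # 1-D weights sum to 1
    return np.outer(g, g)                          # separable 2-D kernel

kernel = gaussian_kernel(3, 1.0)
print(np.round(kernel, 4))   # largest weight in the center, smallest in the corners
```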

    -Median filtering:
    The median filter is a nonlinear filter that is often used to remove salt-and-pepper noise. Unlike linear low-pass filtering, it helps preserve sharp edges, but it washes out texture in uniform regions.

    cv2.medianBlur(img, ksize)  # ksize: aperture size, an odd integer greater than 1
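A single "salt" pixel shows why the median handles this kind of noise better than the mean. A minimal NumPy sketch on interior pixels only (`median_3x3` is a hypothetical helper, not OpenCV's implementation):

```python
import numpy as np

def median_3x3(img):
    """Hypothetical 3x3 median filter on interior pixels only."""
    h, w = img.shape
    windows = [img[dy:dy + h - 2, dx:dx + w - 2] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)

img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255   # one "salt" pixel

# 3x3 mean of the same interior pixels, for comparison.
mean = sum(img[dy:dy + 3, dx:dx + 3].astype(np.float64)
           for dy in range(3) for dx in range(3)) / 9.0

median = median_3x3(img)
print(median)           # the outlier is gone
print(np.round(mean))   # the mean spreads the outlier over its neighbors
```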

    -Morphology: erosion and dilation
    Dilation and erosion can serve several purposes, mainly:
    1. Removing noise
    2. Separating independent image elements and joining adjacent elements
    3. Finding obvious local maximum or minimum regions in the image
    4. Computing the gradient of the image

    Erosion and dilation:

    kernel = np.ones((3, 3), np.uint8)
    erosion = cv2.erode(img, kernel, iterations = 1)
    # iterations: number of times the operation is applied

    dilate = cv2.dilate(erosion, kernel, iterations = 1)
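With a 3x3 kernel of ones, erosion is a local minimum and dilation is a local maximum. A minimal NumPy sketch of both; the helper names are hypothetical, and the padding values are chosen so that the image border never affects the result:

```python
import numpy as np

def windows(img, pad_value):
    """All nine 3x3-shifted views of img, padded so the output keeps the input size."""
    p = np.pad(img, 1, mode="constant", constant_values=pad_value)
    h, w = img.shape
    return np.stack([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])

def erode(img):
    return windows(img, 255).min(axis=0)   # local minimum: bright regions shrink

def dilate(img):
    return windows(img, 0).max(axis=0)     # local maximum: bright regions grow

img = np.zeros((5, 5), dtype=np.uint8)
img[1:4, 1:4] = 255                        # a 3x3 white square

eroded = erode(img)
dilated = dilate(img)
print(eroded)    # only the center pixel survives
print(dilated)   # the square grows to fill the 5x5 image
```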
  • Opening and closing operations:

     # Opening: erode first, then dilate
     kernel = np.ones((3, 3), np.uint8)
     opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
     # The second parameter selects the operation (MORPH_OPEN, MORPH_CLOSE, MORPH_GRADIENT, MORPH_TOPHAT, MORPH_BLACKHAT, ...)
    # Closing: dilate first, then erode
    kernel = np.ones((3, 3), np.uint8)
    closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
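The practical effect of the two orders can be seen on a toy image: opening removes a small white speck, closing fills a small black hole. A minimal NumPy sketch with a 3x3 kernel of ones (all helper names are hypothetical):

```python
import numpy as np

def morph(img, reducer, pad_value):
    """Hypothetical 3x3 morphology helper: reducer is np.min (erode) or np.max (dilate)."""
    p = np.pad(img, 1, mode="constant", constant_values=pad_value)
    h, w = img.shape
    views = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return reducer(np.stack(views), axis=0)

def opening(img):
    return morph(morph(img, np.min, 255), np.max, 0)   # erode, then dilate

def closing(img):
    return morph(morph(img, np.max, 0), np.min, 255)   # dilate, then erode

square = np.zeros((9, 9), dtype=np.uint8)
square[2:7, 2:7] = 255        # clean 5x5 white square

noisy = square.copy()
noisy[0, 0] = 255             # isolated white noise pixel
holed = square.copy()
holed[4, 4] = 0               # one-pixel black hole

print(np.array_equal(opening(noisy), square))   # is the speck removed?
print(np.array_equal(closing(holed), square))   # is the hole filled?
```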
  • Gradient operation:

    # Morphological gradient = dilation - erosion, which leaves the outline of the objects
    kernel = np.ones((3, 3), np.uint8)
    dilated = cv2.dilate(img, kernel, iterations = 5)
    eroded = cv2.erode(img, kernel, iterations = 5)
    gradient = dilated - eroded
    res = np.hstack((dilated, eroded, gradient))
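On a small square the dilation-minus-erosion result is easy to read: only a ring around the object remains. A minimal NumPy sketch with one iteration and a 3x3 kernel of ones (`morph` is a hypothetical helper):

```python
import numpy as np

def morph(img, reducer, pad_value):
    """Hypothetical 3x3 morphology helper: reducer is np.min (erode) or np.max (dilate)."""
    p = np.pad(img, 1, mode="constant", constant_values=pad_value)
    h, w = img.shape
    views = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return reducer(np.stack(views), axis=0)

img = np.zeros((7, 7), dtype=np.uint8)
img[2:5, 2:5] = 255                   # a 3x3 white square

dilated = morph(img, np.max, 0)       # grows to 5x5
eroded = morph(img, np.min, 255)      # shrinks to the single center pixel
gradient = dilated - eroded           # safe for uint8: dilated >= eroded everywhere

print(gradient)                       # a white ring with a black center
```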
  • Top hat and black hat:

    • Top hat = original - opening
    • Black hat = closing - original
        # Top hat
        kernel = np.ones((10,10), np.uint8) # Custom kernel
        tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
        # Black hat
        blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)
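Put differently, the top hat keeps the small bright details that opening removes, and the black hat keeps the small dark details that closing fills. A minimal NumPy sketch with a 3x3 kernel of ones on a toy image (all helper names are hypothetical):

```python
import numpy as np

def morph(img, reducer, pad_value):
    """Hypothetical 3x3 morphology helper: reducer is np.min (erode) or np.max (dilate)."""
    p = np.pad(img, 1, mode="constant", constant_values=pad_value)
    h, w = img.shape
    views = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return reducer(np.stack(views), axis=0)

def opening(img):
    return morph(morph(img, np.min, 255), np.max, 0)

def closing(img):
    return morph(morph(img, np.max, 0), np.min, 255)

img = np.zeros((11, 11), dtype=np.uint8)
img[2:9, 2:9] = 255     # 7x7 white square
img[5, 5] = 0           # one-pixel dark hole inside it
img[0, 0] = 255         # one-pixel bright speck outside it

tophat = img - opening(img)      # original - opening
blackhat = closing(img) - img    # closing - original

print(np.argwhere(tophat > 0))   # position of the bright speck
print(np.argwhere(blackhat > 0)) # position of the dark hole
```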
  • Gradient operation - Sobel operator - edge detection:

The Sobel operator is simple to compute and fast, but because it uses only two directional templates,
it responds mainly to horizontal and vertical edges, so it works less well on images with complex texture.
The algorithm treats every pixel whose new gray value is greater than or equal to the threshold as an edge point.
This criterion is crude and leads to misclassified edge points, because many noise pixels also have large gray values.
sobelx = cv2.Sobel(img2, cv2.CV_64F, 1, 0, ksize = 3)  # dx=1, dy=0: gradient along x
sobely = cv2.Sobel(img2, cv2.CV_64F, 0, 1, ksize = 3)  # dx=0, dy=1: gradient along y
# CV_64FC1: each pixel element is a 64-bit floating-point number, with 1 channel
# CV_64FC3: each pixel is three 64-bit floating-point numbers, with 3 channels
CV_ - just a prefix
64  - double precision (64 bits)
32  - single precision (32 bits)
F   - floating-point
Cx  - number of channels; for example, RGB has three channels
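The reason for asking for a floating-point output depth is that the gradient is signed: a bright-to-dark edge gives negative values, which an 8-bit unsigned image cannot hold. A minimal NumPy sketch with the 3x3 Sobel x kernel on interior pixels (`correlate3x3` is a hypothetical helper):

```python
import numpy as np

def correlate3x3(img, kernel):
    """Hypothetical helper: 3x3 correlation on interior pixels, result kept as float64."""
    img = img.astype(np.float64)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)

# Dark | bright | dark: one rising and one falling vertical edge.
img = np.array([[0, 0, 255, 255, 0, 0]] * 3, dtype=np.uint8)

gx = correlate3x3(img, sobel_x)
print(gx)           # positive at the rising edge, negative at the falling edge
print(np.abs(gx))   # the absolute value keeps both edges
```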
  • Other operators:
  • Scharr operator - enhanced edge detection:
    The only difference between the Sobel and Scharr operators is their convolution kernels; computing time and complexity are the same.
  • Laplacian operator:
lap = cv2.Laplacian(img, cv2.CV_64F)
# Subtracting the Laplacian from the original image enhances contrast (sharpens the image)
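The sharpening effect of subtracting the Laplacian can be worked through on a soft step edge. A minimal NumPy sketch, assuming the 4-neighbor Laplacian kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] and interior pixels only (`laplacian` is a hypothetical helper):

```python
import numpy as np

def laplacian(img):
    """Hypothetical helper: 4-neighbor Laplacian on interior pixels (float64 result)."""
    img = img.astype(np.float64)
    return (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
            - 4.0 * img[1:-1, 1:-1])

# A soft step edge: 100 -> 150 -> 200 across the columns.
img = np.array([[100, 100, 150, 200, 200]] * 3, dtype=np.uint8)

lap = laplacian(img)
sharpened = img[1:-1, 1:-1].astype(np.float64) - lap
print(lap)         # positive on the dark side of the edge, negative on the bright side
print(sharpened)   # the dark side gets darker and the bright side brighter
```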
  • Canny edge detection:
v1 = cv2.Canny(img, 80, 150, L2gradient=True)
# 80 and 150 are the two thresholds:
# pixels whose gradient is below threshold 1 are not edges;
# pixels whose gradient is above threshold 2 are edges;
# pixels between threshold 1 and threshold 2 are edges only if they are connected to a pixel above threshold 2.
# L2gradient: whether to use the more accurate L2 norm for the gradient magnitude. The default is False
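The two-threshold rule is only one stage of Canny, but it can be sketched on its own. The following NumPy sketch assumes a precomputed gradient magnitude and 8-connectivity; `hysteresis` is a hypothetical helper and leaves out the smoothing and non-maximum suppression that the real detector performs first:

```python
import numpy as np

def hysteresis(magnitude, low, high):
    """Hypothetical sketch of double thresholding with 8-connected edge tracking."""
    strong = magnitude > high
    weak = (magnitude >= low) & ~strong
    edges = strong.copy()
    while True:
        # Grow the current edge set by one pixel in all 8 directions.
        p = np.pad(edges, 1, mode="constant", constant_values=False)
        h, w = edges.shape
        grown = np.zeros_like(edges)
        for dy in range(3):
            for dx in range(3):
                grown |= p[dy:dy + h, dx:dx + w]
        new_edges = edges | (grown & weak)   # weak pixels join only if they touch an edge
        if np.array_equal(new_edges, edges):
            return edges
        edges = new_edges

magnitude = np.array([[200, 100, 100, 10, 100],
                      [ 10,  10,  10, 10,  10]], dtype=np.float64)
edges = hysteresis(magnitude, low=80, high=150)
print(edges.astype(int))   # the last weak pixel is cut off from the strong one
```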
  • Image pyramid:

    There are two common types of image pyramids

    • Gaussian pyramid: used for downsampling; the main image pyramid
      # Upsampling
      up = cv2.pyrUp(img)
      # Downsampling
      down = cv2.pyrDown(img)
      Note that pyrUp and pyrDown are not inverses of each other, i.e. pyrUp does not undo the downsampling.
      pyrUp first doubles the image in each dimension, filling the new (even) rows and columns with 0.
      It then convolves the result with the given filter (in effect a filter that is also doubled in each dimension) to estimate the values of the "lost" pixels.
      pyrDown() loses information. To restore the original higher-resolution image we need the information lost by the downsampling step, and that is what the Laplacian pyramid stores.
    • Laplacian pyramid: used to reconstruct the higher-resolution image from a lower level of the pyramid. In digital image processing terms it is the prediction residual; together with the Gaussian pyramid it allows the image to be restored as closely as possible.
    • Code implementation:
      import cv2 as cv
      # Gaussian pyramid
      def pyramid_demo(image):
          level = 3      # Number of pyramid levels
          temp = image.copy()  # Copy the image
          pyramid_images = []  # Create an empty list
          for i in range(level):
              dst = cv.pyrDown(temp)   # Gaussian smoothing first, then downsampling (halves the image in both row and column directions)
              pyramid_images.append(dst)  # Append the new level to the list
              cv.imshow("pyramid"+str(i+1), dst)
              temp = dst.copy()
          return pyramid_images
      # Laplacian pyramid
      def laplacian_demo(image):
          pyramid_images = pyramid_demo(image)    # The Laplacian pyramid is built from the Gaussian pyramid
          level = len(pyramid_images)
          for i in range(level-1, -1, -1):
              if (i-1) < 0:
                  # First level: compare with the original image. dstsize is (width, height)
                  expand = cv.pyrUp(pyramid_images[i], dstsize = image.shape[1::-1])
                  lpls = cv.subtract(image, expand)
              else:
                  expand = cv.pyrUp(pyramid_images[i], dstsize = pyramid_images[i-1].shape[1::-1])
                  lpls = cv.subtract(pyramid_images[i-1], expand)
              cv.imshow("laplacian_down_"+str(i+1), lpls)
      src = cv.imread('F:/test.jpg')
      cv.namedWindow('input_image', cv.WINDOW_NORMAL) # With WINDOW_NORMAL the window can be resized freely
      cv.imshow('input_image', src)
      laplacian_demo(src)
      cv.waitKey(0)
      cv.destroyAllWindows()
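The claim that upsampling does not undo downsampling, and that the Laplacian level holds exactly what was lost, can be checked numerically. This is a NumPy sketch under stated assumptions: a separable [1, 4, 6, 4, 1]/16 kernel, reflected borders and floating-point images. The functions `blur`, `pyr_down` and `pyr_up` are hypothetical and will not match cv2.pyrDown/cv2.pyrUp bit for bit.

```python
import numpy as np

KERNEL_1D = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0

def blur(img):
    """Separable 5x5 Gaussian-like blur with reflected borders."""
    p = np.pad(img, 2, mode="reflect")
    h, w = img.shape
    rows = sum(KERNEL_1D[k] * p[:, k:k + w] for k in range(5))      # horizontal pass
    return sum(KERNEL_1D[k] * rows[k:k + h, :] for k in range(5))   # vertical pass

def pyr_down(img):
    return blur(img)[::2, ::2]          # blur, then drop every other row and column

def pyr_up(img, shape):
    up = np.zeros(shape, dtype=np.float64)
    up[::2, ::2] = img                  # known samples, zeros in between
    return 4.0 * blur(up)               # blur and rescale to fill in the zeros

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8)).astype(np.float64)

small = pyr_down(image)
expanded = pyr_up(small, image.shape)
residual = image - expanded             # one Laplacian pyramid level

print(small.shape)
print(np.allclose(expanded, image))             # does pyr_up undo pyr_down?
print(np.allclose(expanded + residual, image))  # does adding the residual restore the image?
```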
  • Image contours:

cv2.findContours() returns two values (in OpenCV 4.x): the contours themselves and the hierarchy, which describes how each contour relates to the others.
ret, thresh = cv2.threshold(img_gray, 127, 255, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
draw_img = img.copy()
res = cv2.drawContours(draw_img, contours, -1, (142, 95, 43), 2)  # -1 draws all contours
# Take out one contour
cnt = contours[100]
# Compute the area
cv2.contourArea(cnt)
# Compute the perimeter
# True means the contour is closed
cv2.arcLength(cnt, True)
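Area and perimeter of a contour are plain polygon geometry, so they can be checked by hand. A minimal NumPy sketch using the shoelace formula; both helpers are hypothetical, and the points are given as an (N, 2) array of (x, y) pairs rather than the (N, 1, 2) layout that findContours returns:

```python
import numpy as np

def contour_area(points):
    """Hypothetical helper: polygon area by the shoelace formula."""
    x = points[:, 0].astype(np.float64)
    y = points[:, 1].astype(np.float64)
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def arc_length(points, closed):
    """Hypothetical helper: sum of segment lengths, optionally closing the polygon."""
    pts = points.astype(np.float64)
    segments = np.diff(pts, axis=0)
    length = np.sqrt((segments ** 2).sum(axis=1)).sum()
    if closed:
        length += np.sqrt(((pts[0] - pts[-1]) ** 2).sum())   # last point back to the first
    return length

# Corners of a 4x3 rectangle as (x, y) points.
cnt = np.array([[0, 0], [4, 0], [4, 3], [0, 3]])
print(contour_area(cnt))         # 12.0
print(arc_length(cnt, True))     # 14.0
print(arc_length(cnt, False))    # 11.0
```

The closed flag only decides whether the segment from the last point back to the first is counted.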
  • Contour approximation:
For drawContours, the second parameter is the contours themselves, which in Python is a list; so a single contour is passed as [cnt]
# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Threshold the image to get a binary image
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# Get the contours and the hierarchy
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
cnt = contours[0]
draw_img = img.copy()
# Draw the contour; 2 is the line thickness
res = cv2.drawContours(draw_img, [cnt], -1, (0,0,255), 2)
  • Draw a bounding rectangle:
	# The first step is to convert the image to gray and find out the contour
	gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
	ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
	contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
	cnt = contours[4]
	# Step 2: draw a rectangle
	x, y, w, h = cv2.boundingRect(cnt)
	img = cv2.rectangle(img,(x,y), (x+w,y+h), (0,255,0),2)
	area = cv2.contourArea(cnt)
	rect_area = w*h
	# Extent: ratio of the contour area to the bounding rectangle area
	extent  = float(area) / rect_area
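The bounding rectangle and the extent ratio can be illustrated on a small mask. This NumPy sketch makes two assumptions that differ from the code above: `bounding_rect` is a hypothetical helper, and the area is taken as a simple pixel count of the filled shape, which is not the same number cv2.contourArea would report for its outline.

```python
import numpy as np

def bounding_rect(points):
    """Hypothetical helper: upright bounding box (x, y, w, h) of integer pixel coordinates."""
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    # +1 because both the first and the last pixel belong to the box
    return int(x_min), int(y_min), int(x_max - x_min + 1), int(y_max - y_min + 1)

# A filled right triangle as a binary mask.
mask = np.tril(np.ones((4, 4), dtype=np.uint8))
ys, xs = np.nonzero(mask)
points = np.stack([xs, ys], axis=1)      # (x, y) pairs

x, y, w, h = bounding_rect(points)
area = int(mask.sum())                   # pixel count used as the area (assumption)
extent = area / (w * h)
print((x, y, w, h), area, extent)
```

An extent close to 1 means the shape nearly fills its bounding box; a thin diagonal shape gives a small value.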
  • Template matching:

import cv2
import matplotlib.pyplot as plt
# Import the image
img = cv2.imread('./notebook/2.7/lena.jpg', 0)
# Import the template
template = cv2.imread('./notebook/2.7/face.jpg', 0)
h, w = template.shape[:2]

# cv2.matchTemplate() is the template matching function. The parameters are: source image, template image and matching method
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
# Get the minimum value, the maximum value and their locations, which are needed for drawing the rectangle later
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
methods = ['cv2.TM_CCOEFF', 'cv2.TM_CCOEFF_NORMED', 'cv2.TM_CCORR',
           'cv2.TM_CCORR_NORMED', 'cv2.TM_SQDIFF', 'cv2.TM_SQDIFF_NORMED']
for meth in methods:
    img2 = img.copy()
    # Actual value of the matching method:
    # eval() evaluates the string and returns the corresponding cv2 constant
    method = eval(meth)
    res = cv2.matchTemplate(img2, template, method)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
    # For squared difference matching (TM_SQDIFF or TM_SQDIFF_NORMED) the best match is the minimum; otherwise it is the maximum
    if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    # Draw rectangle
    cv2.rectangle(img2, top_left, bottom_right, 255, 2)
    plt.subplot(121), plt.imshow(res, cmap='gray')
    plt.xticks([]), plt.yticks([])  # Hide axes
    plt.subplot(122), plt.imshow(img2, cmap='gray')
    plt.xticks([]), plt.yticks([])  # Hide axes
    plt.suptitle(meth)
    plt.show()
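Why the minimum is the best match for the squared-difference methods becomes clear when the score map is computed by hand. A minimal NumPy sketch of sum-of-squared-differences matching on a random image; `match_sqdiff` is a hypothetical helper and the brute-force loops are for clarity only:

```python
import numpy as np

def match_sqdiff(image, template):
    """Hypothetical helper: sum of squared differences at every template position."""
    ih, iw = image.shape
    th, tw = template.shape
    image = image.astype(np.float64)
    template = template.astype(np.float64)
    result = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(result.shape[0]):
        for x in range(result.shape[1]):
            patch = image[y:y + th, x:x + tw]
            result[y, x] = ((patch - template) ** 2).sum()
    return result

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30)).astype(np.uint8)
template = image[5:9, 12:18].copy()      # cut the template out at row 5, column 12

res = match_sqdiff(image, template)
row, col = np.unravel_index(np.argmin(res), res.shape)
top_left = (int(col), int(row))          # (x, y) order, as used when drawing rectangles
bottom_right = (top_left[0] + template.shape[1], top_left[1] + template.shape[0])
print(res.shape, top_left, bottom_right)
```

The score map is smaller than the image by the template size minus one in each direction, and a perfect match scores 0.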

  • Match multiple objects:
import numpy as np
img_rgb = cv2.imread('notebook/8/mario.jpg')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('notebook/8/mario_coin.jpg', 0)
h, w = template.shape[:2]

res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.8
# Keep the coordinates whose matching score is at least 0.8
loc = np.where(res >= threshold)
# loc is (rows, cols); reversing it yields (x, y) points
for pt in zip(*loc[::-1]):
    bottom_right = (pt[0]+w, pt[1]+h)
    cv2.rectangle(img_rgb, pt, bottom_right, (0,0,255), 2)
cv2.imshow('img', img_rgb)
cv2.waitKey(0)
cv2.destroyAllWindows()
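The `zip(*loc[::-1])` idiom is compact but easy to misread. A minimal NumPy sketch on a made-up 2x3 score map shows what each step produces (the scores are invented for illustration):

```python
import numpy as np

# Made-up matching scores: 2 rows, 3 columns.
res = np.array([[0.1, 0.9, 0.2],
                [0.3, 0.4, 0.85]])
threshold = 0.8

loc = np.where(res >= threshold)   # tuple: (array of row indices, array of column indices)
# Reversing the tuple puts columns first, so each pair is (x, y) = (column, row).
points = [(int(x), int(y)) for x, y in zip(*loc[::-1])]

print(loc)
print(points)
```

On real images several neighboring positions usually pass the threshold, so the same object tends to be boxed more than once.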

Tags: Python OpenCV

Posted on Thu, 07 Oct 2021 20:13:28 -0400 by Burns