Preface
This post uses Python to recognize an image verification code (CAPTCHA) so that login can be automated. Let's get straight to it.
Development tools
Python version: 3.6.4
Related modules:
re;
numpy module;
pytesseract module;
selenium module;
cv2 (the opencv-python package) and PIL (the Pillow package), which the code below imports;
And some Python built-in modules.
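Environment setup
Install Python and add it to the PATH environment variable, then use pip to install the required modules. Note that pytesseract is only a wrapper, so the Tesseract OCR engine itself must also be installed on the system.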
- Grayscale processing turns the color verification code image into a grayscale image.
import cv2

# The flag 0 loads the picture as a grayscale image
image = cv2.imread('1.jpeg', 0)
cv2.imwrite('1.jpg', image)
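To make the grayscale step less of a black box, here is a minimal pure-Python sketch of the usual weighted sum used for RGB-to-gray conversion. The helper names `to_gray` and `gray_image` are made up for illustration, and the exact values that `cv2.imread(path, 0)` produces may differ slightly depending on the image codec.

```python
def to_gray(pixel):
    """Convert one (R, G, B) pixel to a single gray level in 0..255."""
    r, g, b = pixel
    # Common luma weights: green contributes most, blue least.
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def gray_image(rows):
    """Convert a nested list of (R, G, B) tuples to a nested list of gray levels."""
    return [[to_gray(p) for p in row] for row in rows]


# A tiny 1x3 "image": black, pure red, white.
sample = [[(0, 0, 0), (255, 0, 0), (255, 255, 255)]]
gray = gray_image(sample)
```

Each pixel collapses from three channels to one number, which is what makes the later thresholding step a simple comparison.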
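- Binarization turns the picture into a pure black-and-white picture. The result shows that this verification code has no interference lines, so we only need to deal with the interference points.
import cv2

image = cv2.imread('1.jpeg', 0)
# Threshold 100, maximum value 255, type cv2.THRESH_BINARY_INV (the constant 1):
# pixels above the threshold become 0, all others become 255
ret, image = cv2.threshold(image, 100, 255, cv2.THRESH_BINARY_INV)
height, width = image.shape
# Keep only the first 150 columns
new_image = image[0:height, 0:150]
cv2.imwrite('1.jpg', new_image)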
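The threshold type 1 used above is the inverse binary threshold. As a small illustration of what it does to each pixel, here is a pure-Python sketch on a nested list; the function name `threshold_binary_inv` and the sample values are made up for this example.

```python
def threshold_binary_inv(gray, thresh=100, maxval=255):
    """Inverse binary threshold: pixels brighter than thresh become 0, others maxval."""
    return [[0 if value > thresh else maxval for value in row] for row in gray]


# Light background (around 240-250) with a few dark character pixels.
gray = [
    [250, 240, 30],
    [245, 100, 20],
]
binary = threshold_binary_inv(gray)
```

Note that a pixel exactly equal to the threshold is not "greater than" it, so it ends up at the maximum value. Dark character pixels come out white and the light background comes out black, which is why the full code later inverts the colors again.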
- The noise reduction process removes small black spots, that is, isolated black pixels.
Point noise reduction works by examining the 8 pixels adjacent to each black pixel. If all 8 neighbors are white, the black pixel is treated as an isolated noise point and is turned white. For example, point ⑤ at the center of a 3×3 grid (the "Tian Zige" grid) has 8 neighbors. Pixels on an edge or in a corner of the image have only 5 or 3 neighbors, so the code handles them as separate cases.
The coordinates of points ①, ② and ③ are shown in the figure below. The coordinates of points ④ to ⑨ follow in the same way.
Noise reduction code
import cv2
import numpy as np
from PIL import Image


def inverse_color(image, col_range):
    # Read the picture; 0 means it is loaded as a grayscale image
    image = cv2.imread(image, 0)
    # Image binarization: 110 is the threshold, 255 is the maximum value, and 1 is
    # the threshold type (cv2.THRESH_BINARY_INV): a pixel above the threshold is
    # set to 0, otherwise to 255. ret is short for "return value" and holds the
    # threshold that was used
    ret, image = cv2.threshold(image, 110, 255, 1)
    # The height and width of the picture
    height, width = image.shape
    # Invert the colors. Reason: the step above yields white characters on a
    # black background, and we need black characters on a white background
    img2 = image.copy()
    for i in range(height):
        for j in range(width):
            img2[i, j] = (255 - image[i, j])
    img = np.array(img2)
    # Crop the processed image to the requested column range
    height, width = img.shape
    new_image = img[0:height, col_range[0]:col_range[1]]
    cv2.imwrite('handle_one.png', new_image)
    image = Image.open('handle_one.png')
    return image


def clear_noise(img):
    # Image noise reduction
    x, y = img.width, img.height
    for i in range(x):
        for j in range(y):
            if sum_9_region(img, i, j) < 2:
                # Isolated black pixel: change it to white
                img.putpixel((i, j), 255)
    img = np.array(img)
    cv2.imwrite('handle_two.png', img)
    img = Image.open('handle_two.png')
    return img


def sum_9_region(img, x, y):
    """Count the black pixels in the 3x3 region (Tian Zige) around (x, y), including the pixel itself."""
    # Get the color value of the current pixel
    cur_pixel = img.getpixel((x, y))
    width = img.width
    height = img.height
    if cur_pixel == 255:
        # The current pixel is white, so its neighborhood is not counted
        return 10
    if y == 0:
        # First row
        if x == 0:
            # Top-left corner: the pixel itself plus its 3 neighbors
            sum_1 = cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 4 - sum_1 / 255
        elif x == width - 1:
            # Top-right corner
            sum_2 = cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1))
            return 4 - sum_2 / 255
        else:
            # Top edge, not a corner: 6-pixel region
            sum_3 = img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 6 - sum_3 / 255
    elif y == height - 1:
        # Last row
        if x == 0:
            # Bottom-left corner: the pixel itself plus its 3 neighbors
            sum_4 = cur_pixel + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x, y - 1))
            return 4 - sum_4 / 255
        elif x == width - 1:
            # Bottom-right corner
            sum_5 = cur_pixel + img.getpixel((x, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y - 1))
            return 4 - sum_5 / 255
        else:
            # Bottom edge, not a corner: 6-pixel region
            sum_6 = cur_pixel + img.getpixel((x - 1, y)) + img.getpixel((x + 1, y)) + img.getpixel((x, y - 1)) + img.getpixel((x - 1, y - 1)) + img.getpixel((x + 1, y - 1))
            return 6 - sum_6 / 255
    else:
        # y is not on the boundary
        if x == 0:
            # Left edge, not a corner
            sum_7 = img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 6 - sum_7 / 255
        elif x == width - 1:
            # Right edge, not a corner
            sum_8 = img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x - 1, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1))
            return 6 - sum_8 / 255
        else:
            # Interior pixel: full 9-pixel region
            sum_9 = img.getpixel((x - 1, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1)) + img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 9 - sum_9 / 255


def main():
    img = '1.jpeg'
    img = inverse_color(img, (0, 160))
    clear_noise(img)


if __name__ == '__main__':
    main()
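The nine separate boundary cases can also be expressed by clamping the 3×3 window to the image bounds. The following is a minimal pure-Python sketch of that idea on a nested list instead of a PIL image; `count_black`, `remove_isolated` and the sample grid are made up for illustration and are not part of the original code.

```python
WHITE, BLACK = 255, 0


def count_black(grid, x, y):
    """Count black pixels in the 3x3 window centered on (x, y), center included."""
    height, width = len(grid), len(grid[0])
    total = 0
    # Clamping the ranges handles corners and edges without special cases.
    for ny in range(max(0, y - 1), min(height, y + 2)):
        for nx in range(max(0, x - 1), min(width, x + 2)):
            if grid[ny][nx] == BLACK:
                total += 1
    return total


def remove_isolated(grid):
    """Return a copy in which black pixels with no black neighbor are turned white."""
    result = [row[:] for row in grid]
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            # A count below 2 means the only black pixel in the window is the center.
            if value == BLACK and count_black(grid, x, y) < 2:
                result[y][x] = WHITE
    return result


grid = [
    [255, 255, 255, 255],
    [255,   0, 255, 255],   # isolated black pixel at (1, 1)
    [255, 255, 255,   0],   # two vertically adjacent black pixels,
    [255, 255, 255,   0],   # which should be kept
]
cleaned = remove_isolated(grid)
```

The test `count < 2` matches the condition in `clear_noise`: the count includes the pixel itself, so a value of 1 means it has no black neighbors.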
With the biggest problem solved, the next step is the automatic login itself. First, use selenium to click the login button automatically.
Then a screenshot of the page is taken and processed with the steps above, and the verification code is finally recognized successfully.
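The screenshot route could be sketched roughly as follows. This is only an outline under several assumptions: the names `crop_box`, `clean_code`, `read_captcha` and the file `page.png` are hypothetical, the code is assumed to be four digits (like the 8863 example), and the element coordinates are assumed to match screenshot pixels (no page scrolling, display scaling of 1). The grayscale, binarization and noise reduction steps would normally run on the cropped image before OCR.

```python
import re


def crop_box(location, size):
    """Turn a Selenium element's location and size dicts into a PIL crop box."""
    left = int(location["x"])
    top = int(location["y"])
    return (left, top, left + int(size["width"]), top + int(size["height"]))


def clean_code(ocr_text, length=4):
    """Keep only the digits from the OCR output; return None if the count is off."""
    digits = re.sub(r"\D", "", ocr_text)
    return digits if len(digits) == length else None


def read_captcha(driver, element, shot_path="page.png"):
    """Screenshot the page, crop the captcha element, and run OCR on it."""
    # Imported here so the helpers above work without these packages installed.
    from PIL import Image
    import pytesseract

    driver.save_screenshot(shot_path)
    captcha = Image.open(shot_path).crop(crop_box(element.location, element.size))
    return clean_code(pytesseract.image_to_string(captcha))
```

Here `driver` would be a selenium WebDriver and `element` the verification code image element located on the login page. Returning `None` for an unexpected digit count gives the caller a simple signal to refresh the code and try again.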
Why take a screenshot? Because the verification code image changes on every request. For example, if I copy the image link of the verification code showing 8863 and open it in a new tab, the code is no longer 8863 but a different verification code image. So fetching the image through the verification code link on the current page is not feasible.
According to the material I consulted, this problem can be solved by requesting the verification code link with the page's cookies. However, I could not get the related libraries to import, so I gave up on this approach for now. We'll come back to it when we apply machine learning to verification codes.
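For reference, the cookie idea might look something like the sketch below, using only the standard library. It is untested against any real site and is not the author's code: `cookie_header`, `fetch_captcha` and `captcha.png` are hypothetical names, and it assumes the server returns the current session's verification code when the browser's cookies are sent along. The cookie list is assumed to have the shape returned by selenium's `driver.get_cookies()`, a list of dicts with `name` and `value` keys.

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def fetch_captcha(url, cookies, out_path="captcha.png"):
    """Download the captcha image while presenting the browser session's cookies."""
    request = urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

Whether the downloaded image matches the one shown in the browser depends entirely on how the site ties verification codes to sessions, so the screenshot approach remains the safer fallback.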