Python Commemorative Coin Series, Part 2: Captcha Recognition with YoloV3

Reservations for the Wuyishan commemorative coin are about to open, and many readers have been messaging the official account to ask about and exchange scripts. In the spirit of technical exchange, I revisited the earlier code and found that the captcha recognition success rate was not as high as expected. So, keeping things simple and "killing a chicken with an ox cleaver", this article solves the problem with YoloV3.

This article is on the longer side; readers already familiar with the material can skim it.

The previous approach

The previous method was to cut out each character and classify it with a small convolutional neural network (3 convolutional layers plus 2 fully connected layers). When the characters can be cut precisely, the results are decent, but in practice it is hard to find a cutting method that works for every combination in the captchas (honestly, it is mainly my own limitation: I could not come up with a perfectly robust one).

The current cutting method often splits characters like this:

Such cuts are obviously wrong and lead to incorrect final results. Although the page serves a new captcha to recognize whenever one is entered incorrectly, in a rush-purchase scenario like this one, a captcha recognizer that succeeds "in one shot" is essential.

Why YoloV3

The Yolo family of object detection algorithms has been quite popular lately. After YoloV3 was proposed, the series saw no major progress for a long time, but recently YoloV4 and YoloV5 have sprung up one after another. Setting aside the controversy that the original Yolo authors have not endorsed the "YoloV5" name, purely in terms of algorithm performance, YoloV5 is, to some extent, the current SoTA in both speed and accuracy.

New algorithms emerge in an endless stream, which is hard on algorithm engineers. Many of them joke bitterly: we can't keep up!

But grumbling aside, when a new algorithm comes out you still have to try it. I downloaded the YoloV5 code from "the world's largest male dating website" (and of course did not forget to give it a big Star). However, I usually work with Keras, and the YoloV5 code is based on PyTorch. Since I am more familiar with Keras and YoloV3, I will use YoloV3 for now, with the Keras version of the code. Reply "Yolo" to the official account to get the download address.

With Wuyishan coin reservations about to open, this is another good opportunity to debug code and exchange techniques. Let's see how YoloV3 solves the captcha recognition problem.

Dataset construction

Preparing a dataset

In the previous articles we already collected a large number of captcha images. Students who don't have them can get them at the following address.

(students familiar with YoloV3 dataset format can skip to the next section)

Prepare the annotation tool

YoloV3 is an object detection model: training it requires not only the images themselves but also the category and location of each target to detect, commonly called "labels". There are many dataset annotation tools, such as LabelImg and Labelme; the first one is used here. LabelImg is an open-source tool designed specifically for generating object detection labels, letting users easily "draw boxes" around targets. A packaged .exe executable is provided below; Windows 10 users can download it directly from the following link (other Windows versions are untested — try at your own risk):

After downloading, first change the contents of the predefined_classes.txt file in the data folder to the actual captcha classes. Note that each line may hold only one class name; the order of the classes is not critical in itself, but the same order must be reused later when mapping names to IDs. For example:


This will make tagging easier and lays the groundwork for training later.
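For illustration only (the real file must list your actual captcha classes, one per line; the full list is truncated here), predefined_classes.txt might look like:

```
A
B
C
D
...
2
3
```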

Start marking

Open labelImg.exe and the following interface appears:

Click Open Dir in the left menu bar and choose the folder containing the captcha images in the pop-up dialog; the software will then display the first image it finds. Next, create a labels folder alongside the image folder and click Change Save Dir in the left menu bar to select it, so the software stores the generated label files in the labels folder by default.

Now you can officially start labeling. Click Create RectBox in the left menu bar and the cursor turns into a crosshair. Draw the smallest rectangle around a target character on the image; when you release the mouse, a dialog pops up to choose the category for that box. The categories were configured earlier: pick the corresponding one and confirm.

That marks one character. Each captcha image has four characters, so every image should end up with four boxes.

Inspect the label files

Once an image is labeled, a corresponding .xml label file is generated. Its contents look like this (some parts omitted):


The generated file records the path of the labeled image, along with the location and category of each target in it — information we will use below.
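For reference, a labelImg label file (Pascal VOC format) typically looks like the following; all values here are illustrative, not taken from the actual dataset:

```xml
<annotation>
  <folder>images</folder>
  <filename>1.png</filename>
  <path>./capchars/images/1.png</path>
  <size><width>80</width><height>35</height><depth>1</depth></size>
  <object>
    <name>A</name>
    <bndbox>
      <xmin>1</xmin><ymin>9</ymin><xmax>9</xmax><ymax>24</ymax>
    </bndbox>
  </object>
  <!-- three more <object> entries, one per character -->
</annotation>
```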

Generate the Yolo-specific format

Unlike some other object detection models, the Yolo series uses its own training data format. The README of the downloaded YoloV3 source describes the supported format as follows:

One row for one image;

Row format: image_file_path box1 box2 ... boxN;

Box format: x_min,y_min,x_max,y_max,class_id (no space).

This explanation is simple and clear, that is to say:

  1. One image per line
  2. Each line's format: image path, box 1, box 2 ... box N
  3. Each box's format: x_min, y_min, x_max, y_max, class ID

Finally, an example is given

path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3

path/to/img2.jpg 120,300,250,600,2

Since the format is spelled out so clearly, all we have to do is follow the example. The code is as follows:

import os
import xml.etree.ElementTree as ET  # for reading the XML label files
import cv2

# Class order must match predefined_classes.txt.
# (The list was truncated in the original post; complete it with your remaining classes.)
labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q']
dirpath = r'./capchars/labels'  # directory where the xml files are stored

for fp in os.listdir(dirpath):
    root = ET.parse(os.path.join(dirpath, fp)).getroot()
    path = root.find('path').text
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    height, width = img.shape
    boxes = [path, ]

    for child in root.findall('object'):  # iterate over all boxes in the image
        label = child.find('name')
        label_index = labels.index(label.text)  # map the class name to its ID
        sub = child.find('bndbox')  # read the box coordinates
        xmin = sub[0].text
        ymin = sub[1].text
        xmax = sub[2].text
        ymax = sub[3].text
        boxes.append(','.join([xmin, ymin, xmax, ymax, str(label_index)]))
    # append one line per image to data.txt
    with open('./capchars/data.txt', 'a+') as f:
        f.write(' '.join(boxes) + '\n')

The data is written as follows:

F:\...\capchars\images\1.png 1,9,9,24,24 19,14,29,28,29 38,11,50,26,26 58,7,73,22,17
F:\...\images\10.png 1,8,15,23,7 18,11,29,26,26 39,15,49,29,30 58,14,73,27,17
F:\...\images\100.png 1,6,10,19,26 18,11,29,26,25 38,10,59,25,20 60,9,73,24,6
F:\...\images\101.png 1,8,14,24,21 18,8,33,23,1 38,13,55,28,9 57,12,69,26,8
F:\...\images\102.png 1,12,18,28,11 18,9,34,24,19 38,5,49,20,29 59,10,74,29,14
F:\...\images\103.png 1,10,18,25,20 18,5,29,19,30 38,7,54,22,12 59,7,74,25,14
F:\...\images\104.png 1,10,12,24,18 17,8,32,22,5 38,10,54,25,18 58,14,74,28,9
F:\...\images\105.png 1,13,9,27,8 18,9,34,24,22 37,12,50,27,8 58,11,74,29,14
F:\...\images\106.png 1,14,13,29,12 18,8,34,22,18 39,6,55,24,14 59,13,74,27,6
F:\...\images\107.png 1,8,13,22,2 17,9,32,24,4 38,8,53,23,4 58,11,73,26,11

With that, the dataset for YoloV3 training is complete.
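As a quick sanity check on the generated data.txt, a small helper (hypothetical, not part of the original code) can parse each line and verify that every box has five integer fields in the expected order:

```python
def parse_annotation_line(line):
    """Parse one Yolo annotation line: 'path x1,y1,x2,y2,cls ...'."""
    parts = line.strip().split()
    path, boxes = parts[0], []
    for box in parts[1:]:
        fields = box.split(',')
        assert len(fields) == 5, f'bad box: {box}'
        xmin, ymin, xmax, ymax, cls = map(int, fields)
        # a valid box must have positive width and height
        assert xmin < xmax and ymin < ymax, f'degenerate box: {box}'
        boxes.append((xmin, ymin, xmax, ymax, cls))
    return path, boxes
```

Running it over every line of data.txt before training catches malformed boxes early, which is much cheaper than debugging a silent failure mid-training.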

Model training

Modify model parameter configuration

Open train.py; the first few lines of the _main function need to be changed to point at our files. Note that the problem here, captcha detection, is not very complex, so we use the tiny version of Yolo, with a model input size of (416, 416). Here is the changed code:

annotation_path = 'data/capchars/data.txt'
log_dir = 'logs/'
classes_path = 'model_data/cap_classes.txt'  # same contents as predefined_classes.txt above
anchors_path = 'model_data/tiny_yolo_anchors.txt'

YoloV3's input images default to 3 channels; considering the difficulty of our problem, one channel is enough. To simplify the model, change the second line of code in the create_model and create_tiny_model functions as follows:

image_input = Input(shape=(None, None, 1))

Modify how training data is read

Looking at the data-reading part of the source code, the get_random_data function in yolo3/utils.py performs some data augmentation while loading. These operations are not needed for this problem, so we delete that code.

[This article is from the WeChat official account Titus's Cosmos (ID: TitusCosmos). Please credit the source when reposting.]


In addition, the model input size mentioned earlier is (416, 416), while the captcha images are clearly a different size, so some resizing is required. The modified get_random_data is as follows:

from PIL import Image
import numpy as np

def get_random_data(annotation_line, input_shape, max_boxes=4):
    line = annotation_line.split()
    image = Image.open(line[0])
    iw, ih = image.size
    h, w = input_shape  # (416, 416)
    boxes = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])
    image_resize = image.resize((w, h), Image.BICUBIC)
    box_data = np.zeros((max_boxes, 5))
    # horizontal and vertical scale factors from the captcha size to the input size
    x_scale, y_scale = float(w / iw), float(h / ih)
    for index, box in enumerate(boxes):
        box[0] = int(box[0] * x_scale)
        box[1] = int(box[1] * y_scale)
        box[2] = int(box[2] * x_scale)
        box[3] = int(box[3] * y_scale)
        box_data[index, :] = box
    image_data = np.expand_dims(image_resize, axis=-1)  # add the single channel axis
    image_data = np.array(image_data) / 255.
    return image_data, box_data

Train the model

Everything is ready. Run train.py to start training, and the console will output information like the following:


Epoch 2/100

 1/16 [>.............................] - ETA: 6s - loss: 511.1613
 2/16 [==>...........................] - ETA: 6s - loss: 489.3632
 3/16 [====>.........................] - ETA: 5s - loss: 474.8986
 4/16 [======>.......................] - ETA: 5s - loss: 458.0243
 5/16 [========>.....................] - ETA: 4s - loss: 443.5792
 6/16 [==========>...................] - ETA: 4s - loss: 430.4511
 7/16 [============>.................] - ETA: 4s - loss: 416.0158
 8/16 [==============>...............] - ETA: 3s - loss: 402.7111
 9/16 [===============>..............] - ETA: 3s - loss: 390.4001
10/16 [=================>............] - ETA: 2s - loss: 378.4502
11/16 [===================>..........] - ETA: 2s - loss: 368.6907
12/16 [=====================>........] - ETA: 1s - loss: 359.1886
13/16 [=======================>......] - ETA: 1s - loss: 350.0055
14/16 [=========================>....] - ETA: 0s - loss: 342.0475
15/16 [===========================>..] - ETA: 0s - loss: 333.6687
16/16 [==============================] - 8s 482ms/step - loss: 325.5808 - val_loss: 211.4188


By default, the source code first trains 50 epochs on the pre-trained model with all but the last two layers frozen, then unfreezes all layers and trains another 50 epochs. When we observe that loss and val_loss no longer drop, the model is essentially trained; of course, simply waiting for the program to finish is fine too.
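The "watch val_loss until it stops dropping" rule can be expressed as a tiny helper — a purely illustrative sketch, not code from the repository:

```python
def should_stop(val_losses, patience=10):
    """Stop when val_loss hasn't improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    # no epoch in the last `patience` beat the earlier best -> stop
    return min(val_losses[-patience:]) >= best_before
```

Keras provides the same behavior out of the box via its EarlyStopping callback, monitoring 'val_loss'.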

When the run finishes, a trained_weights_final.h5 file is automatically generated in the logs folder; this is the trained model file we need.

Model test

After the model training, the next step is the exciting test.

But as the saying goes, you can't eat hot tofu in a hurry: before we can test our model, some of the test code in the source needs modifying.

Open yolo.py. During testing, this script is called to create a YOLO object, which then performs the prediction, so it must be configured before running — much like the training script earlier. The modified code is as follows:

_defaults = {
    "model_path": 'logs/trained_weights_final.h5',
    "anchors_path": 'model_data/tiny_yolo_anchors.txt',
    "classes_path": 'model_data/cap_classes.txt',
    # ... remaining defaults unchanged
}

At this point, you can run the test script to start predicting. The program asks for the path of the image to predict; we give it a captcha image that was not used in training:

Input image filename:data/capchars/images/416.png

Soon the result came:

The result is LFG8, which matches the actual captcha.

Test several more:


This method does not require the unreliable step of cutting the captcha image, avoiding the errors that cutting introduces, so its success rate is higher than the previous method's. It can still be optimized: for example, removing the interference lines from the captcha images before training, as covered in earlier articles in this series, should raise the accuracy further.
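The interference-line removal mentioned above could be sketched with a simple median filter — a purely illustrative numpy version (the actual method in the earlier articles may differ):

```python
import numpy as np

def remove_thin_lines(img, k=3):
    """Suppress 1-pixel-wide interference lines with a k x k median filter."""
    h, w = img.shape
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            # the median of the neighborhood ignores a thin line crossing it
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out
```

A line only one pixel thick never forms a majority inside a 3x3 window, so the median replaces it with the surrounding background while the thicker character strokes survive.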

That wraps up captcha recognition with YoloV3.

All the source code for this series will be placed in the GitHub repository below; refer to it as needed. If you spot any problems, corrections and discussion are welcome. Thank you!

[Python Commemorative Coin Series] previous posts:

Issue 1: Python Commemorative Coin Series 1: Introduction

Issue 2: Python Commemorative Coin Series 2: Captcha Recognition 01

Issue 3: Python Commemorative Coin Series 2: Captcha Recognition 02

Issue 4: Python Commemorative Coin Series 2: Captcha Recognition 03

Issue 5: Python Commemorative Coin Series 2: Captcha Recognition 04

Issue 6: Python Commemorative Coin Series 3: Automatic Reservation Script 01

Issue 7: Python Commemorative Coin Series 3: Automatic Reservation Script 02

Issue 8: Python Commemorative Coin Series 3: Automatic Reservation Script 03 & Series Summary

My official account is below — feel free to scan and follow.

Tags: Python github xml Windows

Posted on Sun, 21 Jun 2020 01:27:49 -0400 by BhA