How to train YOLOv4 on your own dataset

Step 1: install darknet

Reference: "Windows builds Darknet framework environment - yolov4 GPU", a CSDN blog post by Optimistic lishan

The darknet source code documentation also briefly explains how to train a network on your own dataset

Step 2: create VOC format data set

Collect the dataset you need online, or record relevant videos yourself and extract frames from them as images

Most public datasets on the web already come with xml annotation files

Recommended public datasets in the transportation field (including driverless vehicles, traffic signs and vehicle detection) are listed here: [Intelligent transportation datasets] A collection of datasets in the field of intelligent transportation (I) - PaddlePaddle AI Studio - artificial intelligence learning and training community

Some datasets do not provide annotations in xml format. In that case the annotations have to be converted to xml files by other means

For example, the BITVehicle dataset provides its annotations as a .mat file. You need to write a program that reads this data and generates an xml file for each image
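As a rough illustration of the "generate the xml file" half of such a converter, here is a minimal sketch that builds a VOC-style annotation with the standard library. Reading the .mat file itself is left out, because its internal layout is not described here (scipy.io.loadmat is one common way to load such files). The function name build_voc_xml, the sample file name and the box coordinates are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(filename, width, height, objects):
    """Return a VOC-style annotation as an XML string.

    objects is a list of (class_name, xmin, ymin, xmax, ymax) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: colour images
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(int(value))
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, as it might look once read from the .mat file
xml_text = build_voc_xml("vehicle_0000001.jpg", 1920, 1080,
                         [("Sedan", 700, 400, 1100, 800)])
```

Writing xml_text to Annotations/<image name>.xml for every image would give files that the label conversion script further down can parse, since it only reads size, name and bndbox.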

Datasets without labels, or images you captured yourself, have to be annotated with the labelimg tool, which generates the xml files. Install it with pip:

pip install labelimg

Then launch labelimg from the command line


Create a new VOCdevkit folder and create its subfolders following the directory layout of the VOC dataset (the folders are empty at this point)
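The folder layout can also be created with a few lines of Python. This is only a sketch: the folder names are taken from the paths used by the two scripts in this post (VOCdevkit/VOC2007 with Annotations, JPEGImages and ImageSets/Main), and make_voc_skeleton is a hypothetical helper name.

```python
import os


def make_voc_skeleton(base_dir="."):
    """Create the empty VOC2007-style folders and return their paths."""
    root = os.path.join(base_dir, "VOCdevkit", "VOC2007")
    subdirs = ["Annotations", "JPEGImages", os.path.join("ImageSets", "Main")]
    created = []
    for sub in subdirs:
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

Calling make_voc_skeleton() from the directory where the scripts will be run creates the folders next to them. The labels folder is not created here because the label conversion script creates it on its own.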


  1. Put all the pictures of the dataset into the JPEGImages folder

  2. Put the xml annotation files corresponding to all pictures into Annotations

  3. Generate the split list files (trainval.txt, train.txt, val.txt, test.txt) and put them in the ImageSets/Main folder, using the script below

    This script comes from the CSDN blog post "yolov4 trains its own dataset" by sinat_28371057

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # file:
    # Generate txt files required for training
    import os
    import random
    root_path = './VOCdevkit/VOC2007'
    xmlfilepath = root_path + '/Annotations'
    txtsavepath = root_path + '/ImageSets/Main'
    if not os.path.exists(root_path):
        raise SystemExit("cannot find such directory: " + root_path)
    if not os.path.exists(txtsavepath):
        os.makedirs(txtsavepath)
    trainval_percent = 0.9   # Fraction of all images used for training + validation
    train_percent = 0.8      # Fraction of the train+val images used for training
    # Only annotation files count; ignore anything else in the folder
    total_xml = [f for f in os.listdir(xmlfilepath) if f.endswith('.xml')]
    num = len(total_xml)
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    trainval = random.sample(range(num), tv)
    train = random.sample(trainval, tr)
    print("train and val size:", tv)
    print("train size:", tr)
    ftrainval = open(txtsavepath + '/trainval.txt', 'w')
    ftest = open(txtsavepath + '/test.txt', 'w')
    ftrain = open(txtsavepath + '/train.txt', 'w')
    fval = open(txtsavepath + '/val.txt', 'w')
    for i in range(num):
        name = total_xml[i][:-4] + '\n'   # image id = file name without ".xml"
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)
    ftrainval.close()
    ftrain.close()
    fval.close()
    ftest.close()

  4. Generate the final image list txt files and the labels folder, using the script below

    This script comes from the scripts folder of darknet

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # file:
    # Generate the final txt file and label folder
    import xml.etree.ElementTree as ET
    import pickle
    import os
    from os import listdir, getcwd
    from os.path import join
    import platform
    sets = [('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
    classes = ["Bus", "Microbus", "Minivan", "Sedan", "SUV", "Truck"]
    def convert(size, box):
        dw = 1. / (size[0])
        dh = 1. / (size[1])
        x = (box[0] + box[1]) / 2.0 - 1
        y = (box[2] + box[3]) / 2.0 - 1
        w = box[1] - box[0]
        h = box[3] - box[2]
        x = x * dw
        w = w * dw
        y = y * dh
        h = h * dh
        return x, y, w, h
    def convert_annotation(year, image_id):
        in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
        out_file = open('VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id), 'w')
        tree = ET.parse(in_file)
        root = tree.getroot()
        size = root.find('size')
        w = int(size.find('width').text)
        h = int(size.find('height').text)
        for obj in root.iter('object'):
            # difficult = obj.find('difficult').text
            cls = obj.find('name').text
            # if cls not in classes or int(difficult) == 1:
            #     continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
                 float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    wd = getcwd()
    for year, image_set in sets:
        if not os.path.exists('VOCdevkit/VOC%s/labels/' % year):
            os.makedirs('VOCdevkit/VOC%s/labels/' % year)
        image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
        list_file = open('%s_%s.txt' % (year, image_set), 'w')
        for image_id in image_ids:
            list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n' % (wd, year, image_id))
            print("Processing image:  %s" % image_id)
            convert_annotation(year, image_id)
    # if platform.system().lower() == 'windows':
    #     os.system("type 2007_train.txt 2007_val.txt > train.txt")
    #     os.system("type 2007_train.txt 2007_val.txt 2007_test.txt > train.all.txt")
    # elif platform.system().lower() == 'linux':
    #     os.system("cat 2007_train.txt 2007_val.txt > train.txt")
    #     os.system("cat 2007_train.txt 2007_val.txt 2007_test.txt > train.all.txt")
  5. Copy 2007_test.txt, 2007_train.txt and 2007_val.txt to the data folder. Then find voc.names and the data path file (both described in Step 3) under darknet and copy them to the data folder as well

  6. If you are training yolov4-tiny, also find yolov4-tiny.conv.29 and yolov4-tiny-custom.cfg under darknet and copy them to the data folder

    Note: these locations are not mandatory. You can put the files wherever you like, as long as you know what each file is for and give the correct paths in the configuration steps that follow
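Before moving on, it can be worth checking the label files produced in item 4. Each line written by the conversion script has the form "class_id x y w h", with the four box values normalized by the image width and height. The sketch below checks one such line; check_label_line is a hypothetical helper, not part of darknet.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one 'class_id x y w h' label line."""
    parts = line.split()
    if len(parts) != 5:
        return ["expected 5 fields, got %d" % len(parts)]
    try:
        cls_id = int(parts[0])
        values = [float(p) for p in parts[1:]]
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= cls_id < num_classes:
        problems.append("class id %d out of range" % cls_id)
    # Normalized centre and size should all fall inside the image
    for name, value in zip(("x", "y", "w", "h"), values):
        if not 0.0 <= value <= 1.0:
            problems.append("%s=%s outside [0, 1]" % (name, value))
    return problems


print(check_label_line("3 0.5 0.5 0.2 0.3", 6))   # a well-formed line
print(check_label_line("7 0.5 0.5 0.2 0.3", 6))   # class id too large for 6 classes
```

Running this over every line of every file in the labels folder is a cheap way to catch a class name missing from the classes list or a box that extends past the image border.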

Step 3: configure network structure and training parameters

  1. xxx.names file

    Class names used for training: voc.names

    Replace the contents with the class names of your own dataset, one name per line, with no empty lines

  2. file

    This file tells darknet where the data lives:

    classes: the number of classes

    train: path to the txt file that lists the training images

    valid: path to the txt file that lists the validation images

    test: path to the txt file that lists the test images

    names: path to the file that holds the class names

    backup: folder in which the trained weight files are saved

    For example:

    classes= 6
    train  = D:\DataSet\data/2007_train.txt
    valid  = D:\DataSet\data/2007_val.txt
    test = D:\DataSet\data/2007_test.txt
    names = D:\DataSet\data/voc.names
    backup = D:\DataSet\data/backup
  3. yoloxxx.cfg

    Network and training parameters: yolov4-tiny-custom.cfg

    The main settings to change:


    batch: the batch size. The more GPU memory you have, the higher this can be set; otherwise keep the default or reduce it

    subdivisions: the number of equal parts each batch is split into, processed one part at a time


    classes: the number of classes

    anchors: the anchor (prior) box sizes

    Note: yolov4 has three [yolo] layers, so these must be changed in three places; yolov4-tiny has two [yolo] layers, so they must be changed in two places


    filters: the convolutional layer directly above each [yolo] layer has a filters value that must be set to (classes + 5) × 3. With 6 classes the value is 33

    Note: yolov4 has three [yolo] layers and therefore three such convolutional layers to change; yolov4-tiny has two

  4. yoloxxx.conv.xx

    Pretrained weight file for yolov4: yolov4.conv.137

    Pretrained weight file for yolov4-tiny: yolov4-tiny.conv.29

  5. Create backup folder

    This is the folder given as backup in the data path file, where the trained weights are saved
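The values that depend on the number of classes can be derived with a short script instead of by hand. The sketch below computes the filters value from the (classes + 5) × 3 rule and renders the text of the data path file using the same keys as the example above. Both function names are hypothetical, and data_dir is whatever folder you chose for your files.

```python
def yolo_filters(num_classes, anchors_per_layer=3):
    """filters for the convolutional layer directly above each [yolo] layer."""
    return (num_classes + 5) * anchors_per_layer


def render_data_file(num_classes, data_dir):
    """Build the text of the data path file for a given data folder."""
    lines = [
        "classes = %d" % num_classes,
        "train = %s/2007_train.txt" % data_dir,
        "valid = %s/2007_val.txt" % data_dir,
        "test = %s/2007_test.txt" % data_dir,
        "names = %s/voc.names" % data_dir,
        "backup = %s/backup" % data_dir,
    ]
    return "\n".join(lines) + "\n"


print(yolo_filters(6))                         # the 6-class case from the text: 33
print(render_data_file(6, "D:/DataSet/data"))  # hypothetical data folder
```

The rendered text can be saved over the copied data path file, and the filters value goes into the cfg file in each place listed above.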

Step 4: Training

1. Recompute the anchor boxes

The k-means algorithm is used to compute anchor box sizes that best fit the boxes in your dataset

darknet.exe detector calc_anchors data/ -num_of_clusters 9 -width 416 -height 416

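In the command, data/ is the specified file path, and -num_of_clusters is the number of anchor boxes to compute, not the number of classes: 9 for yolov4 (three [yolo] layers with three anchors each) and 6 for yolov4-tiny (two [yolo] layers). For example: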

D:\darknet\build\darknet\x64\darknet.exe detector calc_anchors data/ -num_of_clusters 6 -width 416 -height 416

After it runs, an anchors.txt file is generated. Copy its contents and use them to replace the anchors value under each [yolo] section of the yoloxxx.cfg file
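To give an idea of what the clustering step does, here is a minimal sketch of plain Euclidean k-means over (width, height) pairs. It is an illustration only and is not darknet's implementation, which may differ in initialization, distance measure and output format. The function name kmeans_anchors and the sample boxes are hypothetical.

```python
import random


def kmeans_anchors(boxes, k, iterations=50, seed=0):
    """Cluster (width, height) pairs into k anchor sizes, smallest area first."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # start from k randomly chosen boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # Assign each box to the closest centre (squared Euclidean distance)
            nearest = min(range(k), key=lambda i: (w - centers[i][0]) ** 2
                                                  + (h - centers[i][1]) ** 2)
            clusters[nearest].append((w, h))
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append((sum(w for w, _ in members) / len(members),
                                    sum(h for _, h in members) / len(members)))
            else:
                new_centers.append(centers[i])  # keep the centre of an empty cluster
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Hypothetical box sizes in pixels: one group of small boxes, one of large boxes
sample_boxes = [(10, 10), (12, 12), (11, 11), (100, 100), (102, 98), (98, 102)]
print(kmeans_anchors(sample_boxes, k=2))
```

With real data, the boxes would be the labelled widths and heights scaled to the network input size (416 × 416 in the commands above), and k would be 9 for yolov4 or 6 for yolov4-tiny.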

2. Start training command

darknet.exe detector train data/ yolo-obj.cfg yolov4.conv.137

In the command, data/ is the specified file path, yolo-obj.cfg is the network configuration file, and yolov4.conv.137 is the pretrained weight file. For example, for yolov4-tiny (the -map flag additionally reports mAP on the validation set during training):

D:\darknet\build\darknet\x64\darknet.exe detector train data/ data/yolov4-tiny-custom.cfg data/yolov4-tiny.conv.29 -map


Posted on Fri, 19 Nov 2021 10:51:42 -0500 by XeroXer