2021SC@SDUSC Application and practice of software engineering in school of software, Shandong University -- yolov5 code analysis general.py-4



This is the fourth article in the yolov5 code analysis series and the last one covering general.py.

non_max_suppression function

def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300):
    """Runs Non-Maximum Suppression (NMS) on inference results

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            l = labels[xi]
            v = torch.zeros((len(l), nc + 5), device=x.device)
            v[:, :4] = l[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(l)), l[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            print(f'WARNING: NMS time limit {time_limit}s exceeded')
            break  # time limit exceeded

    return output

prediction: raw output of the forward pass, with shape [N, M, 5 + nc]

conf_thres: confidence threshold

iou_thres: IoU threshold

classes: list of class indices to keep. With the default value None, all classes are kept

agnostic: whether NMS also suppresses overlapping boxes that belong to different classes

multi_label: whether one box may carry more than one class label

labels: ground-truth labels appended to the predictions before NMS (used for autolabelling)

max_det: maximum number of boxes kept per image after NMS

This function implements non-maximum suppression (NMS).

nc is the number of classes. xc is a boolean mask marking the predictions whose objectness confidence is greater than conf_thres; predictions below the threshold are discarded outright.

Next come some checks and settings. Both thresholds are checked to lie in [0, 1], the minimum and maximum box width and height are set, and at most 30000 boxes are passed to NMS. A time limit of 10 seconds applies: once it is exceeded, a warning is printed and the loop stops. redundant controls whether a merged box must overlap at least one other candidate, and it only takes effect when merge is true. If multi_label is true (and nc > 1), several classes can be kept for each box.

Next, the output is initialized and the computation starts. prediction holds one batch, and the for loop processes one image at a time. The shape of prediction is [N, M, 5 + nc], where N is the number of images and M is the number of predictions per image; xi is the image index and x holds the predictions for that image.

For the current image, keep only the candidate predictions and append the ground-truth labels if any were passed in. If nothing remains, move on to the next image. Otherwise multiply each class score by the objectness confidence to get the per-class confidence, and convert the boxes from xywh to xyxy.
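The two per-row operations in this step can be sketched in plain Python. This is only an illustration on a single made-up prediction row; the helper names xywh_to_xyxy and class_confidences are hypothetical stand-ins for the tensor operations in the listing, not functions from general.py.

```python
def xywh_to_xyxy(box):
    """Convert (center x, center y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def class_confidences(obj_conf, cls_probs):
    """Per-class confidence = objectness confidence * class score."""
    return [obj_conf * p for p in cls_probs]


# One made-up prediction row: [cx, cy, w, h, obj_conf, cls0, cls1, cls2]
row = [50.0, 40.0, 20.0, 10.0, 0.5, 0.5, 0.25, 0.0]
box = xywh_to_xyxy(row[:4])
conf = class_confidences(row[4], row[5:])
print(box)   # (40.0, 35.0, 60.0, 45.0)
print(conf)  # [0.25, 0.125, 0.0]
```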

Next, the predictions are reshaped from (n, nc + 5) to (n, 6). If multi_label is true, every class whose confidence is greater than conf_thres is kept, so one box can produce several rows. If multi_label is false, only the class with the highest confidence is kept: the maximum confidence and its index are found, and rows that do not exceed conf_thres are dropped. In both cases each row is laid out as [box, conf, cls].
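As a rough illustration of the two branches, the following plain-Python sketch builds the (x1, y1, x2, y2, conf, cls) rows for a single box. The function to_detections is a hypothetical helper written for this article; the real code does the same thing for all boxes at once with tensor indexing.

```python
def to_detections(box, confs, conf_thres=0.25, multi_label=False):
    """Turn one box with nc class confidences into rows (x1, y1, x2, y2, conf, cls)."""
    if multi_label:
        # One row per class whose confidence is above the threshold
        return [(*box, c, float(j)) for j, c in enumerate(confs) if c > conf_thres]
    # Best class only: keep the highest-confidence class if it is above the threshold
    j = max(range(len(confs)), key=lambda k: confs[k])
    return [(*box, confs[j], float(j))] if confs[j] > conf_thres else []


box = (40.0, 35.0, 60.0, 45.0)
confs = [0.6, 0.3, 0.1]  # made-up per-class confidences
print(to_detections(box, confs, multi_label=True))   # two rows: class 0 and class 1
print(to_detections(box, confs, multi_label=False))  # one row: class 0
```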

If classes is not None, only predictions of those classes are kept. n is the number of remaining boxes. If n is 0, the image is skipped; if n is greater than max_nms, the boxes are sorted by confidence and only the top max_nms are kept.

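c is a per-box offset derived from the class index (class * max_wh, or 0 when agnostic is true). Adding it to the box coordinates moves boxes of different classes far apart, so a single NMS call never suppresses a box because of a box from another class. After NMS, at most max_det results are kept.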
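The class-offset trick is easier to see on a tiny example. The sketch below uses a simple greedy NMS written in plain Python as a stand-in for torchvision.ops.nms; the helpers iou, greedy_nms and batched_nms and the three detections are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thres):
    """Keep boxes in descending score order, dropping those that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        if all(iou(boxes[k], boxes[m]) <= iou_thres for m in keep):
            keep.append(k)
    return keep


def batched_nms(dets, iou_thres=0.45, agnostic=False, max_wh=4096):
    """dets: rows of (x1, y1, x2, y2, conf, cls). Returns indices of kept rows."""
    shifted = []
    for x1, y1, x2, y2, _, cls in dets:
        c = 0 if agnostic else cls * max_wh  # per-class offset
        shifted.append((x1 + c, y1 + c, x2 + c, y2 + c))
    return greedy_nms(shifted, [d[4] for d in dets], iou_thres)


dets = [
    (10, 10, 50, 50, 0.9, 0),
    (12, 12, 52, 52, 0.8, 0),  # overlaps the first box, same class
    (11, 11, 51, 51, 0.7, 1),  # overlaps the first box, different class
]
print(batched_nms(dets))                 # [0, 2]
print(batched_nms(dets, agnostic=True))  # [0]
```

With the offset, the class-1 box ends up 4096 pixels away from the class-0 boxes and survives; with agnostic=True all three compete and only the highest-scoring box is kept.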

If merge is true, the kept boxes are merged. box_iou computes the IoU matrix of shape [N, M], where N is the number of boxes[i] (the boxes that survived NMS) and M is the number of all boxes; comparing it with iou_thres gives a boolean mask. The weights are that mask multiplied by the scores, and each kept box is replaced by the weighted mean of all boxes that overlap it. If redundant is true, kept boxes that overlap no box other than themselves are dropped. Finally, output is returned.
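A minimal sketch of the weighted mean for one kept box, assuming plain Python lists instead of tensors. merge_box and the two example boxes are hypothetical; the real code computes all kept boxes at once with a matrix product.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_box(kept, boxes, scores, iou_thres=0.45):
    """Replace boxes[kept] by the score-weighted mean of all boxes overlapping it."""
    # weight = score if the box overlaps the kept box enough, else 0
    weights = [s if iou(boxes[kept], b) > iou_thres else 0.0 for b, s in zip(boxes, scores)]
    total = sum(weights)  # never 0: the kept box always overlaps itself
    return tuple(sum(w * b[k] for w, b in zip(weights, boxes)) / total for k in range(4))


boxes = [(0, 0, 10, 10), (2, 0, 12, 10)]  # IoU = 80 / 120, above the threshold
scores = [0.75, 0.25]
print(merge_box(0, boxes, scores))  # (0.5, 0.0, 10.5, 10.0)
```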

strip_optimizer function

def strip_optimizer(f='best.pt', s=''):  # from utils.general import *; strip_optimizer()
    # Strip optimizer from 'f' to finalize training, optionally save as 's'
    x = torch.load(f, map_location=torch.device('cpu'))
    if x.get('ema'):
        x['model'] = x['ema']  # replace model with ema
    for k in 'optimizer', 'training_results', 'wandb_id', 'ema', 'updates':  # keys
        x[k] = None
    x['epoch'] = -1
    x['model'].half()  # to FP16
    for p in x['model'].parameters():
        p.requires_grad = False
    torch.save(x, s or f)
    mb = os.path.getsize(s or f) / 1E6  # filesize
    print(f"Optimizer stripped from {f},{(' saved as %s,' % s) if s else ''} {mb:.1f}MB")

f: the .pt checkpoint saved during training, containing the network weights, optimizer state, training results, etc.

s: optional file name for the stripped checkpoint. If it is empty, f is overwritten

  First, f is loaded into x. If the checkpoint has an 'ema' entry, it replaces 'model'. The optimizer, training_results, wandb_id, ema and updates entries are set to None so that only the model remains, and epoch is set to -1. The model is then converted to FP16 and requires_grad is turned off for every parameter. The result is saved to s if s is not empty, otherwise to f, and the size of the saved file is printed.
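The key handling can be tried out without PyTorch by using a plain dictionary as a stand-in for the loaded checkpoint. This sketch only mirrors the dictionary bookkeeping; strip_checkpoint and the placeholder values are hypothetical, and the FP16 conversion and torch.save steps are left out.

```python
def strip_checkpoint(ckpt):
    """Return a copy of ckpt with only the (EMA) model kept."""
    ckpt = dict(ckpt)  # shallow copy so the caller's dict is untouched
    if ckpt.get('ema'):
        ckpt['model'] = ckpt['ema']  # prefer the EMA weights
    for k in ('optimizer', 'training_results', 'wandb_id', 'ema', 'updates'):
        ckpt[k] = None
    ckpt['epoch'] = -1
    return ckpt


# Placeholder strings stand in for the real model and optimizer objects
ckpt = {'epoch': 299, 'model': 'raw-weights', 'ema': 'ema-weights',
        'optimizer': 'sgd-state', 'updates': 12345}
stripped = strip_checkpoint(ckpt)
print(stripped['model'], stripped['epoch'], stripped['optimizer'])
```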

print_mutation function

def print_mutation(results, hyp, save_dir, bucket):
    evolve_csv, results_csv, evolve_yaml = save_dir / 'evolve.csv', save_dir / 'results.csv', save_dir / 'hyp_evolve.yaml'
    keys = ('metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.5:0.95',
            'val/box_loss', 'val/obj_loss', 'val/cls_loss') + tuple(hyp.keys())  # [results + hyps]
    keys = tuple(x.strip() for x in keys)
    vals = results + tuple(hyp.values())
    n = len(keys)

    # Download (optional)
    if bucket:
        url = f'gs://{bucket}/evolve.csv'
        if gsutil_getsize(url) > (os.path.getsize(evolve_csv) if os.path.exists(evolve_csv) else 0):
            os.system(f'gsutil cp {url} {save_dir}')  # download evolve.csv if larger than local

    # Log to evolve.csv
    s = '' if evolve_csv.exists() else (('%20s,' * n % keys).rstrip(',') + '\n')  # add header
    with open(evolve_csv, 'a') as f:
        f.write(s + ('%20.5g,' * n % vals).rstrip(',') + '\n')

    # Print to screen
    print(colorstr('evolve: ') + ', '.join(f'{x.strip():>20s}' for x in keys))
    print(colorstr('evolve: ') + ', '.join(f'{x:20.5g}' for x in vals), end='\n\n\n')

    # Save yaml
    with open(evolve_yaml, 'w') as f:
        data = pd.read_csv(evolve_csv)
        data = data.rename(columns=lambda x: x.strip())  # strip keys
        i = np.argmax(fitness(data.values[:, :7]))  #
        f.write(f'# YOLOv5 Hyperparameter Evolution Results\n' +
                f'# Best generation: {i}\n' +
                f'# Last generation: {len(data)}\n' +
                f'# ' + ', '.join(f'{x.strip():>20s}' for x in keys[:7]) + '\n' +
                f'# ' + ', '.join(f'{x:>20.5g}' for x in data.values[i, :7]) + '\n\n')
        yaml.safe_dump(hyp, f, sort_keys=False)

    if bucket:
        os.system(f'gsutil cp {evolve_csv} {evolve_yaml} gs://{bucket}')  # upload

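results: tuple of the model's evaluation metrics and validation losses (precision, recall, mAP, box/obj/cls loss)

hyp: the hyperparameters, stored as a dictionary

save_dir: directory in which evolve.csv and hyp_evolve.yaml are written

bucket: Google Cloud Storage bucket name. If it is not empty, the remote evolve.csv is downloaded when it is larger than the local copy

  First, the keys are built from the metric names, the loss names and the hyperparameter names, and the values from results and hyp. A row is then appended to evolve.csv (preceded by a header line if the file is new), and the keys and values are printed. Next, hyp_evolve.yaml is written with a comment header describing the best generation so far, followed by hyp. Finally, both files are uploaded if bucket is not empty.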
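The "header only if the file is new" logic can be tried in isolation. The sketch below reuses the same format strings on a temporary file; log_row, the two keys and the values are made up for illustration.

```python
import tempfile
from pathlib import Path


def log_row(csv_path, keys, vals):
    """Append one row to csv_path, writing the header first if the file is new."""
    n = len(keys)
    header = '' if csv_path.exists() else (('%20s,' * n % keys).rstrip(',') + '\n')
    with open(csv_path, 'a') as f:
        f.write(header + ('%20.5g,' * n % vals).rstrip(',') + '\n')


with tempfile.TemporaryDirectory() as d:
    csv_path = Path(d) / 'evolve.csv'
    keys = ('metrics/precision', 'lr0')   # made-up subset of the real keys
    log_row(csv_path, keys, (0.71, 0.01))   # writes header + first row
    log_row(csv_path, keys, (0.74, 0.012))  # file exists now: row only
    lines = csv_path.read_text().splitlines()

print(len(lines))  # 3: one header line and two data rows
```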

apply_classifier function

def apply_classifier(x, model, img, im0):
    # Apply a second stage classifier to yolo outputs
    im0 = [im0] if isinstance(im0, np.ndarray) else im0
    for i, d in enumerate(x):  # per image
        if d is not None and len(d):
            d = d.clone()

            # Reshape and pad cutouts
            b = xyxy2xywh(d[:, :4])  # boxes
            b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # rectangle to square
            b[:, 2:] = b[:, 2:] * 1.3 + 30  # pad
            d[:, :4] = xywh2xyxy(b).long()

            # Rescale boxes from img_size to im0 size
            scale_coords(img.shape[2:], d[:, :4], im0[i].shape)

            # Classes
            pred_cls1 = d[:, 5].long()
            ims = []
            for j, a in enumerate(d):  # per item
                cutout = im0[i][int(a[1]):int(a[3]), int(a[0]):int(a[2])]
                im = cv2.resize(cutout, (224, 224))  # BGR
                # cv2.imwrite('example%i.jpg' % j, cutout)

                im = im[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x224x224
                im = np.ascontiguousarray(im, dtype=np.float32)  # uint8 to float32
                im /= 255.0  # 0 - 255 to 0.0 - 1.0
                ims.append(im)

            pred_cls2 = model(torch.Tensor(ims).to(d.device)).argmax(1)  # classifier prediction
            x[i] = x[i][pred_cls1 == pred_cls2]  # retain matching class detections

    return x

x: yolov5 predictions (one set of detections per image)

model: the second-stage classification model

img: the network input image tensor

im0: the original image(s)

Wrap im0 in a list if it is a single array. For each image, convert the box coordinates to xywh, take the longer of the width and height as the side of a square, pad it (side * 1.3 + 30), convert back to xyxy, and rescale the coordinates from the img size to the im0 size.
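The box reshaping in this step amounts to a few lines of arithmetic. A minimal sketch for one box, assuming plain tuples instead of tensors (square_and_pad is a hypothetical helper; int() truncates like .long() in the listing):

```python
def square_and_pad(xyxy, gain=1.3, pad=30):
    """Expand an (x1, y1, x2, y2) box to a padded square around the same center."""
    x1, y1, x2, y2 = xyxy
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2    # box center
    side = max(x2 - x1, y2 - y1) * gain + pad  # longer side, scaled and padded
    half = side / 2
    return (int(cx - half), int(cy - half), int(cx + half), int(cy + half))


print(square_and_pad((100, 100, 200, 150)))  # (70, 45, 230, 205)
```

The 100x50 box becomes a 160x160 square centered on the same point, which gives the classifier some context around the object.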

Take the classes predicted by the detector, cut each box out of the original image, resize the crop to 224x224, convert it from BGR to RGB and normalize it to 0-1. Feed the crops to the classifier, keep only the detections on which the classifier and the detector agree, and return x.

save_one_box function

def save_one_box(xyxy, im, file='image.jpg', gain=1.02, pad=10, square=False, BGR=False, save=True):
    # Save image crop as {file} with crop size multiple {gain} and {pad} pixels. Save and/or return crop
    xyxy = torch.tensor(xyxy).view(-1, 4)
    b = xyxy2xywh(xyxy)  # boxes
    if square:
        b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # attempt rectangle to square
    b[:, 2:] = b[:, 2:] * gain + pad  # box wh * gain + pad
    xyxy = xywh2xyxy(b).long()
    clip_coords(xyxy, im.shape)
    crop = im[int(xyxy[0, 1]):int(xyxy[0, 3]), int(xyxy[0, 0]):int(xyxy[0, 2]), ::(1 if BGR else -1)]
    if save:
        cv2.imwrite(str(increment_path(file, mkdir=True).with_suffix('.jpg')), crop)
    return crop

xyxy: coordinates of the top-left and bottom-right corners of the box

im: the original image

gain: multiplier applied to the box width and height

pad: number of pixels added to the box width and height

square: whether to expand the box to a square first

BGR: if true, the crop keeps the channel order of im; otherwise the channel order is reversed

save: whether to write the crop to disk

This function saves a crop of one predicted box from the image.

First convert xyxy to xywh. If square is true, expand the box to a square. Then apply gain and pad, convert back to xyxy, and clip the coordinates to the image bounds. Finally crop the image, save it if save is true, and return the crop.
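The clipping and cropping steps can be illustrated with nested lists in place of a NumPy image. clip_box and crop are hypothetical helpers for this sketch, and the 4x4 "image" is made up; the real function uses clip_coords and array slicing.

```python
def clip_box(xyxy, height, width):
    """Clamp an (x1, y1, x2, y2) box to the image bounds."""
    x1, y1, x2, y2 = xyxy
    return (min(max(x1, 0), width), min(max(y1, 0), height),
            min(max(x2, 0), width), min(max(y2, 0), height))


def crop(im, xyxy, bgr=False):
    """Cut the box out of im (rows of pixels); reverse channels unless bgr is true."""
    x1, y1, x2, y2 = xyxy
    step = 1 if bgr else -1
    return [[pixel[::step] for pixel in row[x1:x2]] for row in im[y1:y2]]


# 4x4 image in which every pixel is [1, 2, 3]
im = [[[1, 2, 3] for _ in range(4)] for _ in range(4)]
box = clip_box((-2, 1, 3, 10), height=4, width=4)
print(box)  # (0, 1, 3, 4)
patch = crop(im, box)
print(len(patch), len(patch[0]), patch[0][0])  # 3 3 [3, 2, 1]
```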

increment_path function

def increment_path(path, exist_ok=False, sep='', mkdir=False):
    # Increment file or directory path, i.e. runs/exp --> runs/exp{sep}2, runs/exp{sep}3, ... etc.
    path = Path(path)  # os-agnostic
    if path.exists() and not exist_ok:
        suffix = path.suffix
        path = path.with_suffix('')
        dirs = glob.glob(f"{path}{sep}*")  # similar paths
        matches = [re.search(rf"%s{sep}(\d+)" % path.stem, d) for d in dirs]
        i = [int(m.groups()[0]) for m in matches if m]  # indices
        n = max(i) + 1 if i else 2  # increment number
        path = Path(f"{path}{sep}{n}{suffix}")  # update path
    dir = path if path.suffix == '' else path.parent  # directory
    if not dir.exists() and mkdir:
        dir.mkdir(parents=True, exist_ok=True)  # make directory
    return path

path: the file or directory path to increment

exist_ok: if true, an existing path is returned as it is and no new name is generated

sep: separator placed between the path and the number

mkdir: whether to create the directory

This function derives a new path or file name from what already exists on disk. For example, if runs/exp already exists, runs/exp2 is returned, then runs/exp3, and so on.

If the path already exists and exist_ok is false, the file suffix is set aside, the highest number among similar existing paths is found and incremented (starting at 2 if there is none), and the new path is built from it. If mkdir is true, the directory is created (for a file path, its parent directory). The path is then returned.
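The numbering behaviour is easy to check in a temporary directory. The sketch below is a trimmed variant of the function for experimentation only: the mkdir handling is omitted, and the regex is matched against each file name rather than the full path (an assumption made here so that the random temporary directory name cannot interfere).

```python
import glob
import re
import tempfile
from pathlib import Path


def increment_path(path, exist_ok=False, sep=''):
    """Return path unchanged if it is free, else path{sep}2, path{sep}3, ..."""
    path = Path(path)
    if path.exists() and not exist_ok:
        suffix = path.suffix
        path = path.with_suffix('')
        dirs = glob.glob(f"{path}{sep}*")  # similar paths
        pattern = rf"{re.escape(path.stem)}{re.escape(sep)}(\d+)"
        matches = [re.search(pattern, Path(d).name) for d in dirs]
        i = [int(m.group(1)) for m in matches if m]  # numbers already in use
        n = max(i) + 1 if i else 2
        path = Path(f"{path}{sep}{n}{suffix}")
    return path


with tempfile.TemporaryDirectory() as root:
    base = Path(root) / 'exp'
    first = increment_path(base)   # nothing exists yet
    first.mkdir()
    second = increment_path(base)  # exp exists
    second.mkdir()
    third = increment_path(base)   # exp and exp2 exist
    names = [first.name, second.name, third.name]

print(names)  # ['exp', 'exp2', 'exp3']
```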

At this point, all the code in general.py has been analyzed. This code mostly provides utilities for the other files. With these miscellaneous pieces out of the way, the next articles analyze the loss function and the evaluation of the model.

Tags: AI Computer Vision Object Detection yolov5

Posted on Mon, 25 Oct 2021 06:13:00 -0400 by Clandestinex337