2021SC@SDUSC
Preface
This is the fourth article in the YOLOv5 code analysis series and the last one covering general.py.
non_max_suppression function
def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300):
    """Runs Non-Maximum Suppression (NMS) on inference results

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            l = labels[xi]
            v = torch.zeros((len(l), nc + 5), device=x.device)
            v[:, :4] = l[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(l)), l[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            print(f'WARNING: NMS time limit {time_limit}s exceeded')
            break  # time limit exceeded

    return output
prediction: output of the forward pass
conf_thres: confidence threshold
iou_thres: IoU threshold
classes: class indices to keep. With the default value None, all classes are kept
agnostic: whether NMS is class-agnostic, i.e. whether it also suppresses overlapping boxes of different classes
multi_label: whether a box may carry multiple labels
labels: ground-truth labels to append to the predictions before NMS (used for autolabelling)
max_det: maximum number of boxes kept per image after NMS
This function performs the NMS computation.
nc is the number of classes. xc is a boolean mask marking the predictions whose objectness confidence exceeds conf_thres; predictions below the threshold are discarded directly.
Next come some checks and settings. Both thresholds are checked to lie between 0 and 1, the minimum and maximum box width and height are set, and at most 30000 boxes are passed to NMS. The time limit is 10 seconds; once it is exceeded, the loop exits. redundant controls whether a merged box must be supported by more than one detection, and it only takes effect when merge is true. multi_label stays true only when nc is greater than 1, in which case multiple classes can be kept for each box.
Next, the computation starts and the output is initialized with an empty (0, 6) tensor per image. prediction holds one batch, and the for loop handles one image at a time. The shape of prediction is [N, M, 5 + nc], where N is the number of images and M is the number of predictions per image; xi is the image index and x holds the predictions for that image.
For the current image, the predictions that pass the confidence mask are selected and the ground-truth labels, if any, are concatenated. The objectness confidence is then multiplied by each class score to obtain the per-class confidence, and the boxes are converted from xywh to xyxy.
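The two per-row operations described above can be tried out without PyTorch. The following is a minimal pure-Python sketch, not the YOLOv5 implementation: the helper names xywh_to_xyxy and class_confidences and the sample row are made up for illustration.

```python
def xywh_to_xyxy(box):
    """Convert (center x, center y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def class_confidences(obj_conf, cls_scores):
    """Per-class confidence = objectness confidence * class score."""
    return [obj_conf * s for s in cls_scores]


# One hypothetical prediction row: [cx, cy, w, h, obj_conf, cls0, cls1]
row = [50.0, 40.0, 20.0, 10.0, 0.5, 0.5, 0.25]
box = xywh_to_xyxy(row[:4])
conf = class_confidences(row[4], row[5:])
print(box)   # (40.0, 35.0, 60.0, 45.0)
print(conf)  # [0.25, 0.125]
```

In the real function both steps run on whole tensors at once, which is why the multiplication is written as a broadcast over x[:, 5:].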
Next, the predictions are reshaped from (n, nc + 5) to (n, 6). If multi_label is true, every class whose confidence exceeds conf_thres is kept, so one box can produce several rows. If multi_label is false, only the class with the highest confidence is kept: the maximum confidence and its index are found, and rows below conf_thres are dropped. In both cases each row is concatenated as [box, conf, cls].
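The difference between the two branches is easiest to see on a single box. This is a small sketch in plain Python under the assumption that the per-class confidences have already been computed; to_detections is a hypothetical helper, not a YOLOv5 function.

```python
def to_detections(box, conf, conf_thres=0.25, multi_label=False):
    """Turn one box with per-class confidences into rows of (box, conf, cls)."""
    if multi_label:
        # One row per class whose confidence exceeds the threshold
        return [(box, c, j) for j, c in enumerate(conf) if c > conf_thres]
    # Best class only, and only if it passes the threshold
    j = max(range(len(conf)), key=lambda k: conf[k])
    return [(box, conf[j], j)] if conf[j] > conf_thres else []


box = (40.0, 35.0, 60.0, 45.0)
conf = [0.6, 0.1, 0.4]  # hypothetical per-class confidences
multi = to_detections(box, conf, multi_label=True)
single = to_detections(box, conf, multi_label=False)
print(len(multi), len(single))  # 2 1
```

With multi_label the same box appears twice, once as class 0 and once as class 2, while the default branch keeps only class 0.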
If classes is not None, only predictions of those classes are kept. n is the number of remaining boxes. If n is 0, the image is skipped; if n is greater than max_nms, the boxes are sorted by confidence and only the top max_nms are kept.
c is a per-box offset derived from the class: the class index multiplied by max_wh, or 0 when agnostic is true. The offset is added to the boxes so that boxes of different classes never overlap, NMS is run once over all of them, and at most max_det results are kept.
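The class-offset trick can be illustrated with a small greedy NMS in plain Python. This is only a sketch: YOLOv5 calls torchvision.ops.nms, while the iou and nms helpers below are simplified stand-ins written for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(dets, iou_thres=0.45, agnostic=False, max_wh=4096, max_det=300):
    """Greedy NMS over (x1, y1, x2, y2, conf, cls) rows; returns kept indices."""
    def shifted(d):
        off = 0 if agnostic else d[5] * max_wh  # per-class offset
        return (d[0] + off, d[1] + off, d[2] + off, d[3] + off)

    # Visit boxes from highest to lowest confidence
    order = sorted(range(len(dets)), key=lambda k: dets[k][4], reverse=True)
    keep = []
    for k in order:
        # Keep a box only if it does not overlap any already kept box too much
        if all(iou(shifted(dets[k]), shifted(dets[m])) <= iou_thres for m in keep):
            keep.append(k)
    return keep[:max_det]


dets = [
    (10, 10, 50, 50, 0.9, 0),
    (12, 12, 52, 52, 0.8, 0),  # overlaps the first box, same class
    (12, 12, 52, 52, 0.7, 1),  # overlaps the first box, different class
]
print(nms(dets))                 # [0, 2]
print(nms(dets, agnostic=True))  # [0]
```

Because the offset moves each class into its own region of the plane, a single NMS call behaves like one NMS per class. With agnostic set to true the offset is 0 and the overlapping box of the other class is suppressed as well.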
If merge is true, the kept boxes are fused with the boxes they overlap. box_iou computes the IoU matrix of shape [N, M], where N is the number of boxes[i] and M is the number of boxes; comparing it with iou_thres gives a boolean matrix marking the overlapping pairs. The weights are this mask multiplied by the scores, and each kept box is replaced by the weighted mean of the boxes it overlaps. If redundant is true, kept boxes that overlap no box other than themselves are dropped. Finally, output is returned.
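To make the weighted mean concrete, here is a sketch for a single kept box in plain Python. It assumes the same masking rule as the code above (weight = score if IoU > iou_thres, else 0); merge_box and the sample boxes are invented for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def merge_box(kept, boxes, scores, iou_thres=0.45):
    """Score-weighted mean of all boxes that overlap `kept` by more than iou_thres."""
    w = [s if iou(kept, b) > iou_thres else 0.0 for b, s in zip(boxes, scores)]
    total = sum(w)
    return tuple(sum(wi * b[c] for wi, b in zip(w, boxes)) / total for c in range(4))


boxes = [
    (10.0, 10.0, 50.0, 50.0),
    (14.0, 14.0, 54.0, 54.0),      # overlaps the first box
    (200.0, 200.0, 240.0, 240.0),  # far away, gets weight 0
]
scores = [0.75, 0.25, 0.5]
merged = merge_box(boxes[0], boxes, scores)
print(merged)  # (11.0, 11.0, 51.0, 51.0)
```

The kept box always overlaps itself, so the total weight is never zero. The distant third box does not contribute, and the result is pulled slightly toward the lower-scoring neighbour.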
strip_optimizer function
def strip_optimizer(f='best.pt', s=''):  # from utils.general import *; strip_optimizer()
    # Strip optimizer from 'f' to finalize training, optionally save as 's'
    x = torch.load(f, map_location=torch.device('cpu'))
    if x.get('ema'):
        x['model'] = x['ema']  # replace model with ema
    for k in 'optimizer', 'training_results', 'wandb_id', 'ema', 'updates':  # keys
        x[k] = None
    x['epoch'] = -1
    x['model'].half()  # to FP16
    for p in x['model'].parameters():
        p.requires_grad = False
    torch.save(x, s or f)
    mb = os.path.getsize(s or f) / 1E6  # filesize
    print(f"Optimizer stripped from {f},{(' saved as %s,' % s) if s else ''} {mb:.1f}MB")
f: the .pt checkpoint saved during training, containing the network parameters, optimizer state, training results, etc.
s: optional file name for the stripped checkpoint; if it is empty, f is overwritten
First, f is loaded onto the CPU and assigned to x. If x has a non-empty 'ema' entry, it replaces 'model'. The optimizer, training_results, wandb_id, ema and updates entries are then set to None and epoch is set to -1, so that only the model is kept. The model is converted to FP16 and gradients are disabled for all of its parameters. The result is saved to s if s is not empty, otherwise back to f, and the size of the saved file is printed.
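The dictionary handling can be sketched without PyTorch. The example below works on a plain dict with placeholder strings instead of real weights, and it leaves out the FP16 conversion, the requires_grad step and the file I/O; strip_checkpoint is a hypothetical helper.

```python
def strip_checkpoint(ckpt):
    """Return a copy of a checkpoint dict with the training-only entries cleared."""
    ckpt = dict(ckpt)  # shallow copy so the caller's dict is left untouched
    if ckpt.get('ema'):
        ckpt['model'] = ckpt['ema']  # prefer the EMA weights when present
    for k in ('optimizer', 'training_results', 'wandb_id', 'ema', 'updates'):
        ckpt[k] = None
    ckpt['epoch'] = -1
    return ckpt


# Hypothetical checkpoint; the strings stand in for model objects
ckpt = {'epoch': 42, 'model': 'raw-weights', 'ema': 'ema-weights',
        'optimizer': {'lr': 0.01}, 'updates': 1000}
stripped = strip_checkpoint(ckpt)
print(stripped['model'], stripped['epoch'])  # ema-weights -1
```

Setting the keys to None rather than deleting them keeps the checkpoint layout intact, so code that looks the keys up still finds them.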
print_mutation function
def print_mutation(results, hyp, save_dir, bucket):
    evolve_csv, results_csv, evolve_yaml = save_dir / 'evolve.csv', save_dir / 'results.csv', save_dir / 'hyp_evolve.yaml'
    keys = ('metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.5:0.95',
            'val/box_loss', 'val/obj_loss', 'val/cls_loss') + tuple(hyp.keys())  # [results + hyps]
    keys = tuple(x.strip() for x in keys)
    vals = results + tuple(hyp.values())
    n = len(keys)

    # Download (optional)
    if bucket:
        url = f'gs://{bucket}/evolve.csv'
        if gsutil_getsize(url) > (os.path.getsize(evolve_csv) if os.path.exists(evolve_csv) else 0):
            os.system(f'gsutil cp {url} {save_dir}')  # download evolve.csv if larger than local

    # Log to evolve.csv
    s = '' if evolve_csv.exists() else (('%20s,' * n % keys).rstrip(',') + '\n')  # add header
    with open(evolve_csv, 'a') as f:
        f.write(s + ('%20.5g,' * n % vals).rstrip(',') + '\n')

    # Print to screen
    print(colorstr('evolve: ') + ', '.join(f'{x.strip():>20s}' for x in keys))
    print(colorstr('evolve: ') + ', '.join(f'{x:20.5g}' for x in vals), end='\n\n\n')

    # Save yaml
    with open(evolve_yaml, 'w') as f:
        data = pd.read_csv(evolve_csv)
        data = data.rename(columns=lambda x: x.strip())  # strip keys
        i = np.argmax(fitness(data.values[:, :7]))  #
        f.write(f'# YOLOv5 Hyperparameter Evolution Results\n' +
                f'# Best generation: {i}\n' +
                f'# Last generation: {len(data)}\n' +
                f'# ' + ', '.join(f'{x.strip():>20s}' for x in keys[:7]) + '\n' +
                f'# ' + ', '.join(f'{x:>20.5g}' for x in data.values[i, :7]) + '\n\n')
        yaml.safe_dump(hyp, f, sort_keys=False)

    if bucket:
        os.system(f'gsutil cp {evolve_csv} {evolve_yaml} gs://{bucket}')  # upload
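results: tuple of the model's evaluation metrics and validation losses
hyp: the hyperparameters, stored as a dictionary
save_dir: directory in which the files are saved
bucket: Google Cloud Storage bucket name. If it is not empty, evolve.csv is downloaded from the bucket when the remote file is larger than the local one
First, the keys are built from the evaluation metrics, the validation losses and the hyperparameter names, and the values from results and the hyperparameter values. If bucket is set, the remote evolve.csv is downloaded when it is larger than the local copy. The values are then appended to evolve.csv as a new row, with a header row written first if the file does not exist yet, and printed to the screen. Finally, hyp_evolve.yaml is written with the best and the last generation noted in comments, and both files are uploaded if bucket is not empty.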
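The header-only-once logging step can be tried in isolation. The sketch below reuses the same format strings as the function but writes to a temporary directory; log_row and the shortened key set are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def log_row(csv_path, keys, vals):
    """Append one row to csv_path, writing the header only if the file is new."""
    n = len(keys)
    header = '' if csv_path.exists() else ('%20s,' * n % tuple(keys)).rstrip(',') + '\n'
    with open(csv_path, 'a') as f:
        f.write(header + ('%20.5g,' * n % tuple(vals)).rstrip(',') + '\n')


with tempfile.TemporaryDirectory() as d:
    evolve_csv = Path(d) / 'evolve.csv'
    keys = ('metrics/mAP_0.5', 'lr0')         # shortened, hypothetical key set
    log_row(evolve_csv, keys, (0.61, 0.01))   # first call writes header + row
    log_row(evolve_csv, keys, (0.63, 0.012))  # later calls append a row only
    lines = evolve_csv.read_text().splitlines()

print(len(lines))  # 3
```

The 20-character padding leaves whitespace around every column name, which is why the function strips the keys again after reading the file back with pandas.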
apply_classifier function
def apply_classifier(x, model, img, im0):
    # Apply a second stage classifier to yolo outputs
    im0 = [im0] if isinstance(im0, np.ndarray) else im0
    for i, d in enumerate(x):  # per image
        if d is not None and len(d):
            d = d.clone()

            # Reshape and pad cutouts
            b = xyxy2xywh(d[:, :4])  # boxes
            b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # rectangle to square
            b[:, 2:] = b[:, 2:] * 1.3 + 30  # pad
            d[:, :4] = xywh2xyxy(b).long()

            # Rescale boxes from img_size to im0 size
            scale_coords(img.shape[2:], d[:, :4], im0[i].shape)

            # Classes
            pred_cls1 = d[:, 5].long()
            ims = []
            for j, a in enumerate(d):  # per item
                cutout = im0[i][int(a[1]):int(a[3]), int(a[0]):int(a[2])]
                im = cv2.resize(cutout, (224, 224))  # BGR
                # cv2.imwrite('example%i.jpg' % j, cutout)

                im = im[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416
                im = np.ascontiguousarray(im, dtype=np.float32)  # uint8 to float32
                im /= 255.0  # 0 - 255 to 0.0 - 1.0
                ims.append(im)

            pred_cls2 = model(torch.Tensor(ims).to(d.device)).argmax(1)  # classifier prediction
            x[i] = x[i][pred_cls1 == pred_cls2]  # retain matching class detections

    return x
x: YOLOv5 predictions
model: the classification model
img: the network input image
im0: the original image
im0 is wrapped in a list if it is a single array. For each image, the box coordinates are converted to xywh, each box is made square by using the longer of its width and height as the side, the side is enlarged (multiplied by 1.3, plus 30 pixels), the boxes are converted back to xyxy, and the coordinates are rescaled from the img size to the im0 size.
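The square-and-pad step can be followed on a single box. This is a plain-Python sketch that works directly on corner coordinates instead of going through xyxy2xywh and xywh2xyxy; square_and_pad is a hypothetical helper, and the constants 1.3 and 30 are the ones used in the function above.

```python
def square_and_pad(xyxy, scale=1.3, pad=30):
    """Make an (x1, y1, x2, y2) box square around its center, then enlarge it."""
    x1, y1, x2, y2 = xyxy
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    side = max(x2 - x1, y2 - y1)  # rectangle to square: use the longer side
    side = side * scale + pad     # enlarge the square
    half = side / 2
    # int() truncates toward zero, like .long() on a tensor
    return (int(cx - half), int(cy - half), int(cx + half), int(cy + half))


print(square_and_pad((100, 100, 200, 140)))  # (70, 40, 230, 200)
```

The 100x40 box becomes a 160x160 square around the same center, which gives the classifier some context around the object.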
The classes predicted by the detector are taken out, and each box is cut from the original image, resized to 224x224, converted from BGR to RGB and normalized to 0-1. The crops are fed to the classifier, only the detections whose classifier class matches the detector class are kept, and the result is returned.
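The final filtering step amounts to keeping the rows on which both models agree. A minimal sketch in plain Python, with made-up detections and classifier outputs and a hypothetical helper name keep_agreeing:

```python
def keep_agreeing(dets, classifier_cls):
    """Keep (x1, y1, x2, y2, conf, cls) rows whose cls equals the classifier's class."""
    return [d for d, c2 in zip(dets, classifier_cls) if int(d[5]) == c2]


dets = [
    (10, 10, 50, 50, 0.9, 0),
    (60, 60, 90, 90, 0.8, 2),
    (20, 30, 40, 70, 0.7, 1),
]
classifier_cls = [0, 1, 1]  # hypothetical second-stage predictions, one per box
kept = keep_agreeing(dets, classifier_cls)
print(kept)  # [(10, 10, 50, 50, 0.9, 0), (20, 30, 40, 70, 0.7, 1)]
```

The second detection is dropped because the detector says class 2 while the classifier says class 1.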
save_one_box function
def save_one_box(xyxy, im, file='image.jpg', gain=1.02, pad=10, square=False, BGR=False, save=True):
    # Save image crop as {file} with crop size multiple {gain} and {pad} pixels. Save and/or return crop
    xyxy = torch.tensor(xyxy).view(-1, 4)
    b = xyxy2xywh(xyxy)  # boxes
    if square:
        b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # attempt rectangle to square
    b[:, 2:] = b[:, 2:] * gain + pad  # box wh * gain + pad
    xyxy = xywh2xyxy(b).long()
    clip_coords(xyxy, im.shape)
    crop = im[int(xyxy[0, 1]):int(xyxy[0, 3]), int(xyxy[0, 0]):int(xyxy[0, 2]), ::(1 if BGR else -1)]
    if save:
        cv2.imwrite(str(increment_path(file, mkdir=True).with_suffix('.jpg')), crop)
    return crop
xyxy: coordinates of the upper-left and lower-right corners of the prediction box
im: the original image
gain: factor by which the box width and height are scaled
pad: number of pixels added to the box width and height
square: whether to make the crop square
BGR: whether to keep the crop in BGR channel order; if false, the channel order is reversed
save: whether to save the crop to a file
This function saves a crop of the image at the prediction box.
First, xyxy is converted to xywh. If square is true, the box is made square. The width and height are then scaled by gain and padded, the box is converted back to xyxy, and the coordinates are clipped to the image bounds. The image is cropped, the crop is saved if save is true, and the crop is returned.
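Clipping and cropping can be demonstrated on a tiny nested-list image instead of a NumPy array. This is only a sketch: clip_box and crop are hypothetical helpers that mimic the clipping and the slicing expression above, including the channel reversal when BGR is false.

```python
def clip_box(xyxy, height, width):
    """Clip (x1, y1, x2, y2) to the image bounds."""
    x1, y1, x2, y2 = xyxy
    return (min(max(x1, 0), width), min(max(y1, 0), height),
            min(max(x2, 0), width), min(max(y2, 0), height))


def crop(im, xyxy, bgr=False):
    """Crop an HxWxC nested-list image; reverse the channel order unless bgr is True."""
    x1, y1, x2, y2 = clip_box(xyxy, len(im), len(im[0]))
    step = 1 if bgr else -1
    return [[px[::step] for px in row[x1:x2]] for row in im[y1:y2]]


# 3 rows x 4 columns; every pixel stores [x, y, 0] so the crop is easy to check
im = [[[x, y, 0] for x in range(4)] for y in range(3)]
patch = crop(im, (2, 1, 10, 10))  # the box sticks out of the image and gets clipped
print(patch)  # [[[0, 1, 2], [0, 1, 3]], [[0, 2, 2], [0, 2, 3]]]
```

Rows are indexed by y and columns by x, which is why the slice takes y1:y2 first and x1:x2 second, just like im[y1:y2, x1:x2] in the original.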
increment_path function
def increment_path(path, exist_ok=False, sep='', mkdir=False):
    # Increment file or directory path, i.e. runs/exp --> runs/exp{sep}2, runs/exp{sep}3, ... etc.
    path = Path(path)  # os-agnostic
    if path.exists() and not exist_ok:
        suffix = path.suffix
        path = path.with_suffix('')
        dirs = glob.glob(f"{path}{sep}*")  # similar paths
        matches = [re.search(rf"%s{sep}(\d+)" % path.stem, d) for d in dirs]
        i = [int(m.groups()[0]) for m in matches if m]  # indices
        n = max(i) + 1 if i else 2  # increment number
        path = Path(f"{path}{sep}{n}{suffix}")  # update path
    dir = path if path.suffix == '' else path.parent  # directory
    if not dir.exists() and mkdir:
        dir.mkdir(parents=True, exist_ok=True)  # make directory
    return path
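path: the file or directory path to increment, e.g. runs/exp
exist_ok: if true, an existing path is reused as it is and no incremented path is generated
sep: separator inserted between the path and the number
mkdir: whether to create the directory if it does not exist
This function derives a new, unused path or file name from the entries that already exist in the folder. For example, if runs/exp exists, runs/exp2 is returned; if runs/exp2 exists as well, runs/exp3 is returned.
If the path already exists and exist_ok is false, the suffix of the current path is taken off, the highest number among the similar paths is found and increased by one (starting from 2), and the new path is assembled with the suffix put back. If mkdir is true and the directory does not exist, it is created: the path itself when it has no suffix, otherwise its parent. The path is then returned.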
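The numbering behaviour can be observed in a temporary directory. The sketch below is a simplified variant written for this example, not the YOLOv5 function: next_free_path is a hypothetical name, it has no exist_ok or mkdir options, and it matches the number against the file stem only.

```python
import glob
import re
import tempfile
from pathlib import Path


def next_free_path(path, sep=''):
    """Return `path` if unused, else the same path with the next free number appended."""
    path = Path(path)
    if not path.exists():
        return path
    suffix = path.suffix
    base = path.with_suffix('')
    similar = glob.glob(f"{glob.escape(str(base))}{sep}*")  # paths starting with base
    pattern = rf"{re.escape(base.name)}{re.escape(sep)}(\d+)"
    matches = [re.fullmatch(pattern, Path(p).stem) for p in similar]
    used = [int(m.group(1)) for m in matches if m]  # numbers already taken
    n = max(used) + 1 if used else 2                # numbering starts at 2
    return Path(f"{base}{sep}{n}{suffix}")


with tempfile.TemporaryDirectory() as d:
    base = Path(d) / 'exp'
    first = next_free_path(base)   # nothing exists yet
    base.mkdir()
    second = next_free_path(base)  # exp exists
    second.mkdir()
    third = next_free_path(base)   # exp and exp2 exist
    names = [first.name, second.name, third.name]

print(names)  # ['exp', 'exp2', 'exp3']
```

Numbering starts at 2 because the unnumbered path counts as the first run.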
At this point, all the code in general.py has been analyzed. This code mainly serves the other files by handling miscellaneous tasks. The following articles will analyze the model's loss function and evaluation.