Learn AI from Li Mu - anchor box code analysis - 3
Predicting bounding boxes with non-maximum suppression
- When there are many anchor boxes, the model may output many similar, heavily overlapping predicted bounding boxes around the same object. To simplify the output, non-maximum suppression (NMS) merges the similar predicted bounding boxes that belong to the same object
- Its working principle is as follows:
- Basic concept: for a predicted bounding box $B$, the object detection model computes a predicted probability for each class. Let $p$ be the largest of these probabilities; the class corresponding to $p$ is the predicted class of $B$, and $p$ is called the confidence of $B$. For the same image, all non-background predicted bounding boxes are sorted by confidence in descending order to produce a list $L$
- Operation process:
- From $L$, select the predicted bounding box $B_1$ with the highest confidence as the benchmark, and remove from $L$ every non-benchmark predicted bounding box whose IoU with $B_1$ exceeds a predetermined threshold $\epsilon$. At this point $L$ keeps $B_1$ as the only box for that region, and the other boxes that are too similar to it have been deleted
- Note that the IoU here is computed between the benchmark $B_1$ and the other predicted bounding boxes, not against the ground-truth bounding box
- From $L$, select the predicted bounding box $B_2$ with the second-highest confidence as the next benchmark, and remove from $L$ every non-benchmark predicted bounding box whose IoU with $B_2$ exceeds $\epsilon$
- Repeat this process until every predicted bounding box remaining in $L$ has been used as a benchmark. At that point, the IoU of any pair of predicted bounding boxes in $L$ is no greater than the threshold $\epsilon$, so no two remaining boxes are too similar
- The code is as follows:
import torch

def nms(boxes, scores, iou_threshold):
    """Sort predicted bounding boxes by confidence and apply NMS.

    args:
        boxes: predicted bounding boxes
            [anchors_num, 4]
        scores: confidences
            [anchors_num]
        iou_threshold: IoU threshold
    Returns the indices of the boxes that are kept.
    box_iou is the pairwise IoU function (assumed available, e.g. d2l.box_iou).
    """
    # Indices that sort scores in descending order,
    # e.g. B --> tensor([0, 3, 1, 2])
    B = torch.argsort(scores, dim=-1, descending=True)
    keep = []  # indices of the predicted bounding boxes to keep
    # B.numel() is the number of elements left in B
    while B.numel() > 0:
        i = B[0]
        keep.append(i)
        if B.numel() == 1:
            break
        # IoU of the benchmark box with every remaining box, as a 1-D tensor,
        # e.g. iou --> tensor([0.00, 0.74, 0.55])
        iou = box_iou(boxes[i, :].reshape(-1, 4),
                      boxes[B[1:], :].reshape(-1, 4)).reshape(-1)
        # Positions whose IoU does not exceed the threshold
        inds = torch.nonzero(iou <= iou_threshold).reshape(-1)
        # iou has B.numel() - 1 entries because the benchmark is excluded,
        # so add 1 to map positions back into B
        B = B[inds + 1]
    return torch.tensor(keep, device=boxes.device)
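To check the procedure by hand, the greedy loop can also be sketched without PyTorch. The snippet below is a minimal, dependency-free illustration rather than the implementation above: `iou` and `nms_plain` are hypothetical helper names, and the four boxes and scores are illustrative values chosen so that they reproduce the intermediate tensors quoted in the comments (sorted indices `[0, 3, 1, 2]` and IoUs of about `0.00, 0.74, 0.55` against the first benchmark).

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms_plain(boxes, scores, iou_threshold):
    """Greedy NMS on plain Python lists; returns the kept indices."""
    # Indices sorted by descending confidence (ties keep their original order)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order[0]  # current benchmark box
        keep.append(best)
        # Drop every remaining box that overlaps the benchmark too much
        order = [j for j in order[1:]
                 if iou(boxes[best], boxes[j]) <= iou_threshold]
    return keep


# Illustrative values (assumed, not taken from the text above)
boxes = [[0.10, 0.08, 0.52, 0.92], [0.08, 0.20, 0.56, 0.95],
         [0.15, 0.30, 0.62, 0.91], [0.55, 0.20, 0.90, 0.88]]
scores = [0.9, 0.8, 0.7, 0.9]

# IoU of the first benchmark (box 0) with boxes 3, 1, 2, in sorted order
first_round = [round(iou(boxes[0], boxes[j]), 2) for j in (3, 1, 2)]
kept = nms_plain(boxes, scores, iou_threshold=0.5)
print(first_round, kept)
```

With a threshold of 0.5, boxes 1 and 2 are suppressed by box 0, while box 3 does not overlap it and survives, so the kept indices are `[0, 3]`.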
Applying non-maximum suppression:
- This part is implemented by a single function; its main steps are as follows:
- a. From the matrix of class confidences for the anchor boxes, obtain each anchor box's maximum confidence and the class that achieves it (the background row is excluded)
- b. Use the conversion function to turn the anchor boxes and their predicted offsets into predicted bounding boxes, then filter these boxes with non-maximum suppression. The indices in keep and non_keep are concatenated with torch.cat, keep first and non_keep after: keep holds the boxes that retain their class, and non_keep holds the boxes treated as background. The concatenated indices are then used to reorder the class indices, the maximum confidences, and the predicted bounding boxes
- c. Any box whose confidence is below the confidence threshold is also set to background, and the value stored as its confidence becomes $1 - p$
- d. Finally, the results are concatenated. The six elements of the innermost dimension describe one predicted bounding box. The first element is the predicted class index, starting from 0 (0 for dog and 1 for cat); a value of -1 means the box is background or was removed by non-maximum suppression. The second element is the confidence of the predicted bounding box. The remaining four elements are the (x, y) coordinates of the upper-left and lower-right corners of the predicted bounding box, each in the range 0 to 1.
- The code is as follows:
def multibox_detection(cls_probs, offset_preds, anchors, nms_threshold=0.5,
                       pos_threshold=0.009999999):
    """Predict bounding boxes using non-maximum suppression.

    args:
        cls_probs: predicted probability of each class for each anchor box
            [batch_size, 1 + class_num, anchors_num]
        offset_preds: predicted offsets of the anchor boxes
            [batch_size, anchors_num * 4]
        anchors: anchor boxes in corner format
            [1, anchors_num, 4]
    """
    device, batch_size = cls_probs.device, cls_probs.shape[0]
    anchors = anchors.squeeze(0)
    num_classes, num_anchors = cls_probs.shape[1], cls_probs.shape[2]
    out = []
    for i in range(batch_size):
        cls_prob, offset_pred = cls_probs[i], offset_preds[i].reshape(-1, 4)
        # Each column of cls_prob holds the class confidences of one anchor box.
        # Row 0 is the background class, so it is skipped here.
        # conf: maximum non-background confidence per anchor box --> [anchors_num]
        # class_id: class achieving that maximum --> [anchors_num]
        conf, class_id = torch.max(cls_prob[1:], 0)
        # Convert anchor boxes plus predicted offsets into predicted bounding boxes
        predicted_bb = offset_inverse(anchors, offset_pred)
        keep = nms(predicted_bb, conf, nms_threshold)
        # Find the indices that NMS did not keep: in keep + all_idx, kept indices
        # appear twice and suppressed indices appear once
        all_idx = torch.arange(num_anchors, dtype=torch.long, device=device)
        combined = torch.cat((keep, all_idx))
        uniques, counts = combined.unique(return_counts=True)
        non_keep = uniques[counts == 1]
        # Index order used below: kept boxes first, suppressed boxes after
        all_id_sorted = torch.cat((keep, non_keep))
        # Suppressed boxes are treated as background (class -1); class_id is
        # then reordered by all_id_sorted
        class_id[non_keep] = -1
        class_id = class_id[all_id_sorted]
        # Reorder the confidences and predicted boxes the same way
        conf, predicted_bb = conf[all_id_sorted], predicted_bb[all_id_sorted]
        # `pos_threshold` is the confidence threshold for non-background
        # predictions: boxes below it are set to class -1 (background)
        below_min_idx = (conf < pos_threshold)
        class_id[below_min_idx] = -1
        # For those boxes, store 1 - conf as the confidence
        conf[below_min_idx] = 1 - conf[below_min_idx]
        # Concatenate class index, confidence and box coordinates column-wise
        # pred_info --> [anchors_num, 6]
        pred_info = torch.cat((class_id.unsqueeze(1),
                               conf.unsqueeze(1),
                               predicted_bb), dim=1)
        out.append(pred_info)
    return torch.stack(out)
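The index bookkeeping in the middle of `multibox_detection` is the least obvious part, so here is a small PyTorch-free sketch of just that step. `assemble_rows` is a hypothetical helper, the inputs are illustrative values, and `keep` is assumed to come from an NMS step; `collections.Counter` stands in for `unique(return_counts=True)`.

```python
from collections import Counter


def assemble_rows(conf, class_id, boxes, keep, pos_threshold=0.009999999):
    """Mimic the post-NMS bookkeeping on plain lists (illustrative only)."""
    n = len(conf)
    # Kept indices appear twice in keep + all indices, suppressed ones once
    counts = Counter(list(keep) + list(range(n)))
    non_keep = [i for i in range(n) if counts[i] == 1]
    order = list(keep) + non_keep  # kept boxes first, suppressed after
    suppressed = set(non_keep)
    rows = []
    for i in order:
        cid = -1 if i in suppressed else class_id[i]
        c = conf[i]
        if c < pos_threshold:  # too weak: mark as background, store 1 - conf
            cid, c = -1, 1 - c
        rows.append([cid, c] + list(boxes[i]))
    return rows


# Illustrative inputs (assumed): 4 boxes, NMS kept indices 0 and 3
conf = [0.9, 0.8, 0.7, 0.9]
class_id = [0, 0, 0, 1]
boxes = [[0.10, 0.08, 0.52, 0.92], [0.08, 0.20, 0.56, 0.95],
         [0.15, 0.30, 0.62, 0.91], [0.55, 0.20, 0.90, 0.88]]
rows = assemble_rows(conf, class_id, boxes, keep=[0, 3])
for row in rows:
    print(row)
```

The first two rows are the kept boxes with their class indices; the last two rows are the suppressed boxes, marked with class -1 but still carrying their original confidence.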
- The conversion function that turns anchor boxes and their predicted offsets into predicted bounding boxes is as follows:
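from d2l import torch as d2l

def offset_inverse(anchors, offset_preds):
    """Compute predicted bounding boxes from anchor boxes and predicted offsets."""
    # Convert anchors from corner (x1, y1, x2, y2) to center (cx, cy, w, h) format
    anc = d2l.box_corner_to_center(anchors)
    # Center: scale the offset by the anchor size / 10, then shift by the anchor center
    pred_bbox_xy = (offset_preds[:, :2] * anc[:, 2:] / 10) + anc[:, :2]
    # Size: divide the offset by 5, exponentiate, then scale by the anchor size
    pred_bbox_wh = torch.exp(offset_preds[:, 2:] / 5) * anc[:, 2:]
    pred_bbox = torch.cat((pred_bbox_xy, pred_bbox_wh), axis=1)
    # Convert back to corner format
    predicted_bbox = d2l.box_center_to_corner(pred_bbox)
    return predicted_bbox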
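As a quick sanity check on the decoding arithmetic, the same computation can be applied to a single box with the standard library only. `decode_box` is a hypothetical name and the numbers are illustrative; the constants 10 and 5 mirror those in `offset_inverse`.

```python
import math


def decode_box(anchor, offset):
    """Decode one anchor (x1, y1, x2, y2) with offsets (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = anchor
    # Corner format -> center format
    cx, cy, w, h = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
    dx, dy, dw, dh = offset
    # Same arithmetic as offset_inverse, for a single box
    px, py = dx * w / 10 + cx, dy * h / 10 + cy
    pw, ph = math.exp(dw / 5) * w, math.exp(dh / 5) * h
    # Center format -> corner format
    return [px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2]


anchor = [0.10, 0.08, 0.52, 0.92]  # illustrative anchor (assumed)
same = decode_box(anchor, [0.0, 0.0, 0.0, 0.0])
wider = decode_box(anchor, [0.0, 0.0, 5 * math.log(2), 0.0])
print(same)
print(wider)
```

Zero offsets return the anchor unchanged (up to floating-point rounding), and a width offset of 5 ln 2 doubles the width while leaving the center in place.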