Target tracking in video sequences is a research hotspot in computer vision, with important applications in security, transportation, the military, and other fields.
This project is built on the PaddlePaddle computer vision development kit (PaddleX), combining a deep learning detector with a traditional visual tracking algorithm to accomplish a multi-target tracking task.
As a basic stepping stone from target detection toward (multi-)target tracking, this project designs its tracking method around the tracking-by-detection idea and walks through the relevant steps and difficulties.
Common target tracking methods generally fall into two categories: generative models and discriminative models. This project adopts the discriminative approach:
- Generative methods: search the image region and minimize reconstruction error through a learned target model, e.g. the Kalman Filter;
- Discriminative methods: treat tracking as a binary classification problem, distinguishing the target from the background, e.g. DSST.
This project takes a manually annotated HeLa cell dataset (http://celltrackingchallenge.net/2d-datasets) as an example: PaddleX is used to train a PP-YOLO Tiny target detector, and DLib's built-in DSST single-target tracking algorithm then realizes multi-target tracking through IoU-based cascade matching between detection boxes and observation boxes.
- Note: the project's outputs have been saved with this version (under work/), so the commented-out code does not need to be run. The detector training part can also be skipped, but the dataset still needs to be unzipped.
Under the tracking-by-detection idea, tracking targets are determined by the detection model. So how are tracking targets generated? When no target is being tracked yet (frame 1), the detector produces prediction boxes; we take these as the first batch of targets to track and assign a separate tracker to observe each box. When the next frame arrives, each single-target tracker searches for the best-matching region in that frame according to its own region-matching algorithm (such as the correlation filters of DSST) and automatically takes the new region as its observation target. This solves the problem of how tracking targets are generated: they come from the detector's prediction boxes.
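To make the control flow concrete, here is a schematic sketch of that loop with stub detector/tracker hooks. Everything in it is a placeholder; the real classes are built in sections 4.2 and 4.3.

def detect(frame):
    # Placeholder for PP-YOLO Tiny inference: returns prediction boxes
    return [(50, 60, 120, 130)]

class StubTracker:
    def __init__(self, frame, bbox):
        self.bbox = bbox
    def update(self, frame):
        # Placeholder for the DSST region search
        return self.bbox

trackers = []
frames = [object(), object()]  # stand-ins for decoded video frames
for i, frame in enumerate(frames):
    if i == 0:
        # First frame: the detector's prediction boxes seed the trackers
        trackers = [StubTracker(frame, b) for b in detect(frame)]
    else:
        # Later frames: each tracker re-locates its own target
        for t in trackers:
            t.update(frame)
print(len(trackers), 'target(s) being tracked')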
3.1 dependency preparation
- Install DLib persistently (into external-libraries, so it survives restarts of the environment).
# !mkdir /home/aistudio/external-libraries
# !pip install dlib -t /home/aistudio/external-libraries
The original plan was to export the detector through the high-performance deployment interface, but the pip release of paddlex.deploy.Predictor still has problems (fixed on the develop branch, but installing from source on AI Studio also runs into issues), so we will instead run inference directly with the trained model, or with Paddle Inference.
# !git clone https://gitee.com/PaddlePaddle/PaddleX.git
# %cd PaddleX
# !git checkout develop
# !python setup.py install
# %cd ../
!pip install paddlex
import sys
sys.path.append('/home/aistudio/external-libraries')

import paddle
import paddlex as pdx
from paddlex import transforms as T
import shutil
import glob
import os
import dlib
import numpy as np
import pandas as pd
import cv2
import imghdr
from PIL import Image
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
3.2 dataset splitting
- Unzip the dataset archive into the same directory.
!unzip -oq data/data107056/DIC-C2DH-HeLa.zip -d data/data107056
- Split: 80% training (68 images), 20% validation (16 images).
!paddlex --split_dataset --format VOC \
    --dataset_dir data/data107056/DIC-C2DH-HeLa \
    --val_value 0.2 \
    --test_value 0
3.3 data augmentation and readers
Define the data preprocessing pipelines for the training and validation sets.
Both resize images to 320x320 and then normalize with the ImageNet statistics (ImageNet pre-trained weights are used later).
train_transforms = T.Compose([
    T.MixupImage(alpha=1.5, beta=1.5, mixup_epoch=int(550 * 25. / 27)),
    T.RandomDistort(
        brightness_range=0.5, brightness_prob=0.5,
        contrast_range=0.5, contrast_prob=0.5,
        saturation_range=0.5, saturation_prob=0.5,
        hue_range=18.0, hue_prob=0.5),
    T.RandomExpand(
        prob=0.5,
        im_padding_value=[float(int(x * 255)) for x in [0.485, 0.456, 0.406]]),
    T.RandomCrop(),
    T.Resize(target_size=320, interp='RANDOM'),
    T.RandomHorizontalFlip(prob=0.5),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

eval_transforms = T.Compose([
    T.Resize(target_size=320, interp='AREA'),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
Point the data readers at the *.txt lists generated in step 3.2.
train_dataset = pdx.datasets.VOCDetection(
    data_dir='data/data107056/DIC-C2DH-HeLa',
    file_list='data/data107056/DIC-C2DH-HeLa/train_list.txt',
    label_list='data/data107056/DIC-C2DH-HeLa/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir='data/data107056/DIC-C2DH-HeLa',
    file_list='data/data107056/DIC-C2DH-HeLa/val_list.txt',
    label_list='data/data107056/DIC-C2DH-HeLa/labels.txt',
    transforms=eval_transforms)
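A quick sanity check of the readers doesn't hurt; the two attributes printed below are the same ones the training configuration relies on later (68/16 are the expected counts from the 80/20 split above).

# Optional sanity check: confirm the split sizes and the label list
print(train_dataset.num_samples, eval_dataset.num_samples)  # expected: 68 16
print(train_dataset.labels)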
3.4 model configuration and training
- Cluster anchors on the training set.
anchors = pdx.tools.YOLOAnchorCluster(
    num_anchors=9,
    dataset=train_dataset,
    image_size=320)()
- Pass in the clustered anchors and construct PP-YOLO Tiny.
model = pdx.det.PPYOLOTiny(
    num_classes=len(train_dataset.labels),
    backbone='MobileNetV3',
    anchors=anchors)
- Custom optimizer (Momentum with L2 weight decay 5e-4) and an empirically chosen learning-rate schedule (warmup + piecewise decay).
learning_rate = 0.001
warmup_steps = 66
warmup_start_lr = 0.0
train_batch_size = 8
step_each_epoch = train_dataset.num_samples // train_batch_size
lr_decay_epochs = [130, 540]

boundaries = [b * step_each_epoch for b in lr_decay_epochs]
values = [learning_rate * (0.1**i) for i in range(len(lr_decay_epochs) + 1)]
lr = paddle.optimizer.lr.PiecewiseDecay(
    boundaries=boundaries,
    values=values)
lr = paddle.optimizer.lr.LinearWarmup(
    learning_rate=lr,
    warmup_steps=warmup_steps,
    start_lr=warmup_start_lr,
    end_lr=learning_rate)

optimizer = paddle.optimizer.Momentum(
    learning_rate=lr,
    momentum=0.9,
    weight_decay=paddle.regularizer.L2Decay(0.0005),
    parameters=model.net.parameters())
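For intuition, these are the concrete numbers the schedule works out to, assuming the 68-sample training split above: step_each_epoch = 68 // 8 = 8, so the LR warms up linearly from 0 to 1e-3 over the first 66 steps, then drops by 10x at steps 1040 and 4320.

# Optional: print the concrete schedule parameters derived above
print(step_each_epoch, boundaries, values)  # expected: 8 [1040, 4320] [0.001, 0.0001, 1e-05]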
- Start training with the ImageNet pre-trained weights and the custom optimizer, evaluating and saving the model every 30 epochs.
model.train(
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    num_epochs=550,
    train_batch_size=train_batch_size,
    optimizer=optimizer,
    save_interval_epochs=30,
    log_interval_steps=step_each_epoch * 5,
    save_dir=r'output/PPYOLOTiny',
    pretrain_weights=r'IMAGENET',
    use_vdl=True)
3.5 model evaluation
- Draw the PR curve.
pdx.det.draw_pr_curve(
    eval_details_file='output/PPYOLOTiny/best_model/eval_details.json',
    save_dir='work/visual')
- Draw the error-analysis charts.
_, evaluate_details = model.evaluate(eval_dataset, return_details=True)
gt, bbox = evaluate_details['gt'], evaluate_details['bbox']
pdx.det.coco_error_analysis(
    gt=gt,
    pred_bbox=bbox,
    save_dir='work/visual')
It is observed that the current best model is the one from epoch 550 (the last saved model), with mAP = 79.77.
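As an optional double-check, the saved weights can be reloaded with pdx.load_model (the same API the tracker uses later) and re-evaluated. This is a sketch, assuming the output layout produced by the training call above.

# Optional: reload the best checkpoint and re-evaluate it
best_model = pdx.load_model('output/PPYOLOTiny/best_model')
print(best_model.evaluate(eval_dataset))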
Copy the model to work/best_model; this completes the target detector stage.
!cp -r output/PPYOLOTiny/best_model work/

4 target tracking
4.1 synthesizing images into a video
This step first defines an images2mp4 function: the original dataset is a sequence of video frames at 10 FPS, so we merge the *.jpg files under a target folder into a 10-FPS .mp4 video file to simulate prediction on a deployed video stream.
def images2mp4(images_dir, output_path):
    """
    Synthesize all .jpg images in the target folder into an .mp4 video file.
    :param images_dir: path to the target folder
    :param output_path: where to save the synthesized video
    :return: None
    """
    # Create an mp4 video stream with 512x512 resolution and FPS of 10
    video = cv2.VideoWriter(
        filename=output_path,
        fourcc=cv2.VideoWriter_fourcc(*'mp4v'),
        fps=10,
        frameSize=(512, 512))
    # Read each target image file and write it to the video stream
    for img_path in sorted(glob.glob(os.path.join(images_dir, '*.jpg'))):
        video.write(cv2.imread(img_path))
    video.release()
!mkdir work/viedio

# DIC-C2DH-HeLa/Test/*.jpg => work/viedio/test.mp4
images2mp4(
    images_dir='data/data107056/DIC-C2DH-HeLa/Test',
    output_path='work/viedio/test.mp4')
# Convert. mp4 to. gif file using ffmpeg !ffmpeg -i work/viedio/test.mp4 -s 320*320 work/viedio/test.gif -y
The following is the video sequence synthesized from the test images (FPS=10):
After the tracker is defined, this file can be used as the test input.
4.2 single target tracker
In target tracking, a single tracker follows one fixed target. We design a scheme that manages multiple single-target trackers uniformly to achieve multi-target tracking. So we first build a single-target tracker that encapsulates the necessary steps; the multi-target tracker then only needs to implement the matching logic and call these interfaces.
Below is the designed DSST single-target tracker SingleTracker, which we can use to track an individual target.
DSST (Accurate Scale Estimation for Robust Visual Tracking) splits tracking into two parts by defining two correlation filters: one estimates translation (position) and the other estimates scale. DLib already has this method built in.
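For reference, this is roughly what DLib's built-in tracker looks like used bare, before we wrap it below (a minimal sketch; the frame paths and seed box are hypothetical):

import dlib
import cv2

tracker = dlib.correlation_tracker()
frame0 = cv2.imread('frame_000.jpg')  # hypothetical first frame
tracker.start_track(frame0, dlib.rectangle(50, 60, 120, 130))  # seed the filters

frame1 = cv2.imread('frame_001.jpg')  # hypothetical next frame
score = tracker.update(frame1)        # tracking-quality score
pos = tracker.get_position()          # dlib.drectangle of the new region
print(score, pos.left(), pos.top(), pos.right(), pos.bottom())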
The brief single-target tracking workflow is as follows; throughout, boxes are uniformly represented as (x_min, y_min, x_max, y_max):
(1) Create a tracker and pass in the first frame to fix the target location: instantiate the SingleTracker class and pass the observation region into the begin() method to start automatic tracking;
(2) Read subsequent frames: passing each image into update_bbox() returns a score. If the score falls below a certain threshold, we consider the target lost in the image and can delete the tracker.
In short, once the tracking box is determined (manually or otherwise), the tracker automatically updates its observed position each time an image is fed in. After each update, SingleTracker.bbox can be read to draw the box.
class SingleTracker(dlib.correlation_tracker):
    def __init__(self, tracker_id, category):
        """
        Initialize the single-target tracker.
        :param tracker_id: ID assigned to the tracker
        :param category: category of the tracked target
        """
        super().__init__()
        self.id = int(tracker_id)
        self.category = str(category)
        self.bbox = None
        self.bbox_color = (
            100 + np.random.randint(0, 155),
            100 + np.random.randint(0, 155),
            100 + np.random.randint(0, 155))

    def begin(self, image, bbox: list or tuple):
        """
        Start observing the region bbox in the input image.
        :param image: input image
        :param bbox: x_min, y_min, x_max, y_max
        :return: None
        """
        self.bbox = bbox
        self.start_track(image, dlib.rectangle(*bbox))

    def update_bbox(self, image):
        """
        Update this tracker's observation region according to the input image.
        :param image: input image
        :return: the tracker's tracking-quality score for the current image
        """
        score = self.update(image)
        curr_pos = self.get_position()
        self.bbox = (int(curr_pos.left()), int(curr_pos.top()),
                     int(curr_pos.right()), int(curr_pos.bottom()))
        return score
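A minimal usage sketch of SingleTracker over two consecutive frames (the frame paths and seed box are hypothetical; 6.5 is the loss threshold that MultiTracker adopts below):

frame0 = cv2.imread('frame_000.jpg')
frame1 = cv2.imread('frame_001.jpg')

t = SingleTracker(tracker_id=0, category='cell')
t.begin(frame0, (50, 60, 120, 130))   # step (1): seed with an observation box
score = t.update_bbox(frame1)         # step (2): track into the next frame
if score < 6.5:
    print('target lost, drop this tracker')
else:
    print('new box:', t.bbox)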
4.3 multi target tracker
How can the single-target tracker be applied to a multi-target tracking task? A simple idea is to use multiple trackers for different targets, but three main problems must be handled: generating tracking targets, matching old and new targets, and handling the disappearance of existing targets.
- The detector section (3) mentioned that targets are generated from PP-YOLO Tiny's prediction boxes;
- The single-target tracking section (4.2) described that a target's disappearance can be judged by its tracker's quality score dropping below a certain threshold;
- Finally, for matching old and new targets, we use IoU-based cascade matching (its quality degrades when objects overlap, though cells may overlap less): suppose there are $N$ ($N \ge 0$) existing tracked targets and $M$ ($M \ge 0$) targets from the detector; pairing them yields an IoU cost matrix $cost\_matrix$ of shape $N \times M$, an assignment algorithm then matches the $N$ tracked targets with the $M$ detections in pairs, and any prediction box left unmatched becomes a new target's tracking region (a toy example of this assignment step follows).
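Here is a toy illustration of the assignment step, with made-up numbers: two existing trackers (rows) versus three detections (columns), entries being 1 - IoU.

import numpy as np
from scipy.optimize import linear_sum_assignment

cost_matrix = np.array([[0.10, 0.90, 0.80],
                        [0.70, 0.20, 0.95]], dtype='float32')
row, col = linear_sum_assignment(cost_matrix)  # minimizes the total cost
print(row, col)        # [0 1] [0 1]: tracker 0 <-> detection 0, tracker 1 <-> detection 1
new_targets = [j for j in range(cost_matrix.shape[1]) if j not in col]
print(new_targets)     # [2]: detection 2 is unmatched and seeds a new tracker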
The code logic of the MultiTracker class below is fairly clear; the entry point is update_trackers(image) at the bottom:
# Import the linear assignment function used in cascade matching:
# linear_sum_assignment(cost_matrix, maximize=False)
from scipy.optimize import linear_sum_assignment
class MultiTracker:
    def __init__(self, model_path, det_threshold=0.35, stride=2):
        """
        :param model_path: path to the model
        :param det_threshold: prediction filtering threshold
        :param stride: interval (in frames) for generating new targets
        """
        self.det_threshold = det_threshold
        self.stride = stride
        try:
            from paddlex import load_model
            self.model = load_model(model_path)
        except Exception as e:
            raise e
        self.frame_num = 0             # frame counter
        self.tracker_num = 0           # running tracker-ID counter
        self.trackers = []             # list of tracker instances
        self.tracking_threshold = 6.5  # tracking-score threshold for tracker instances

    def _update_existed_trackers(self, image, is_update_frame=False):
        """
        Move every existing DSST tracker's observation region to its position in
        the current image. If a tracking score is below self.tracking_threshold,
        the target is considered lost and its tracker is removed (removal is
        applied on update frames).
        """
        del_idx = []
        for i in range(len(self.trackers)):
            if self.trackers[i].update_bbox(image) < self.tracking_threshold:
                del_idx.append(i)
        if is_update_frame:
            self.trackers = [self.trackers[i] for i in range(len(self.trackers))
                             if i not in del_idx]

    def det_image(self, img):
        """
        Model prediction: return the predictions on img as
        ([x_min, y_min, x_max, y_max], category) pairs.
        """
        result = self.model.predict(img.astype('float32'))
        selected_result = []
        for item in result:
            if item['score'] < self.det_threshold:
                continue
            x_min, y_min, w, h = np.int64(item['bbox'])
            selected_result.append((
                [x_min, y_min, x_min + w, y_min + h],
                item['category']))
        return selected_result

    @staticmethod
    def get_IoU(_bbox1, _bbox2):
        """
        Given two boxes as diagonal endpoints (x_min, y_min, x_max, y_max),
        compute their intersection over union (IoU).
        """
        x1min, y1min, x1max, y1max = _bbox1
        x2min, y2min, x2max, y2max = _bbox2
        s1 = (y1max - y1min + 1.) * (x1max - x1min + 1.)
        s2 = (y2max - y2min + 1.) * (x2max - x2min + 1.)
        x_min, y_min = max(x1min, x2min), max(y1min, y2min)
        x_max, y_max = min(x1max, x2max), min(y1max, y2max)
        inter_w, inter_h = max(y_max - y_min + 1., 0.), max(x_max - x_min + 1., 0.)
        intersection = inter_h * inter_w
        union = s1 + s2 - intersection
        return intersection / union

    def _add_new_tracker(self, image, bbox: list or tuple, category: str):
        """
        Create a single-target tracker to observe the region bbox on image.
        """
        tracker = SingleTracker(tracker_id=self.tracker_num, category=category)
        tracker.begin(image=image, bbox=bbox)
        self.trackers.append(tracker)
        self.tracker_num += 1

    def _matching_and_add_trackers(self, image, is_update_frame):
        """
        Cascade-match prediction boxes against tracking boxes by IoU distance.
        Unmatched prediction boxes are treated as new targets, and a tracker is
        created for each of them.
        """
        if not is_update_frame:
            return
        # Obtain model predictions and build the lists of prediction and observation boxes
        predict_result = self.det_image(image)
        predict_bboxes = [bbox for bbox, _ in predict_result]
        tracker_bboxes = [tracker.bbox for tracker in self.trackers]
        # Build the IoU distance matrix
        cost_matrix = np.zeros(shape=(len(tracker_bboxes), len(predict_bboxes)),
                               dtype='float32')
        for i in range(len(tracker_bboxes)):
            for j in range(len(predict_bboxes)):
                cost_matrix[i, j] = 1. - self.get_IoU(tracker_bboxes[i], predict_bboxes[j])
        # Get the index pairs (row_i, col_i) that minimize the total distance after matching
        row, col = linear_sum_assignment(cost_matrix)
        # Treat unmatched prediction boxes as new targets and create trackers to observe them
        unused_idx = [i for i in range(len(predict_result)) if i not in col]
        for idx in unused_idx:
            bbox, category = predict_result[idx]
            self._add_new_tracker(image, bbox, category)

    def _plot_trackers(self, image):
        """
        Draw each tracker's box and label information on image and return it.
        """
        thickness = round(0.002 * (image.shape[0] + image.shape[1]) / 2) + 1  # line thickness
        for tracker in self.trackers:
            # Get the two diagonal vertices of the box
            pt1, pt2 = (tracker.bbox[0], tracker.bbox[1]), (tracker.bbox[2], tracker.bbox[3])
            # Draw the target box
            cv2.rectangle(image, pt1=pt1, pt2=pt2, color=tracker.bbox_color,
                          thickness=thickness, lineType=cv2.LINE_AA)
            # Get the two diagonal vertices of the text box
            w, h = cv2.getTextSize(text=tracker.category, fontFace=0,
                                   fontScale=thickness / 3,
                                   thickness=max(thickness - 1, 1))[0]
            font_pt1, font_pt2 = pt1, (pt1[0] + w, pt1[1] + h)
            # Fill the text-box area with the background color
            cv2.rectangle(image, pt1=font_pt1, pt2=font_pt2, color=tracker.bbox_color,
                          thickness=-1, lineType=cv2.LINE_AA)
            # Write the label text inside the text box
            cv2.putText(image, '{}({})'.format(tracker.category, tracker.id),
                        org=(font_pt1[0], font_pt2[1]), fontFace=0,
                        fontScale=thickness / 3, color=(225, 255, 255),
                        thickness=max(thickness - 1, 1), lineType=cv2.LINE_AA)
        return image

    def update_trackers(self, image):
        self.frame_num = (self.frame_num + 1) % 864000       # prevent overflow
        is_update_frame = self.frame_num % self.stride == 1  # flag frames on which trackers are added/removed
        self._update_existed_trackers(image, is_update_frame)    # update tracker regions and delete lost targets
        self._matching_and_add_trackers(image, is_update_frame)  # cascade-match detections with observations to add new targets
        plotted_image = self._plot_trackers(image)                # draw the existing trackers' info on the image
        return plotted_image
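Frames can also be fed to MultiTracker one at a time; a quick sketch (the frame filename is hypothetical, work/best_model comes from section 3):

mt = MultiTracker(model_path='work/best_model', det_threshold=0.35, stride=2)
frame = cv2.imread('data/data107056/DIC-C2DH-HeLa/Test/t000.jpg')  # hypothetical filename
plotted = mt.update_trackers(frame)
cv2.imwrite('work/tracked_frame.jpg', plotted)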
4.4 prediction demo
Here a predict_stream() function is defined: it takes a video stream path and a model path, and saves each frame's prediction result (.jpg) into save_dir; the images2mp4() function from step 4.1 is then used to restore the frames to a video.
def predict_stream(stream_path, model_path, save_dir):
    """
    Predict every frame of a video stream and save each frame, with its boxes
    drawn, into the target folder.
    :param stream_path: video stream file path
    :param model_path: model path
    :param save_dir: folder where result images are saved
    :return: None
    """
    if not os.path.exists(save_dir):
        os.mkdir(save_dir)
    # Open the target video stream
    video = cv2.VideoCapture(stream_path)
    # Define the multi-target tracker
    multi_tracker = MultiTracker(
        model_path=model_path,
        det_threshold=0.35,
        stride=2)
    while True:
        # Read each frame of the video stream for prediction
        _, frame = video.read()
        if frame is None:
            video.release()
            break
        # Update every tracker in the multi-target tracker and get the plotted image
        plotted_frame = multi_tracker.update_trackers(frame)
        # Save the plotted image into the target folder, named by frame number
        save_path = os.path.join(save_dir, '%03d.jpg' % (multi_tracker.frame_num - 1))
        cv2.imwrite(save_path, plotted_frame)
# Run inference on the video and generate images with tracking annotations
predict_stream(
    stream_path='work/viedio/test.mp4',
    model_path='work/best_model',
    save_dir='work/tracking_result')

# Synthesize the inference result images into a video
images2mp4(
    images_dir='work/tracking_result',
    output_path='work/viedio/test_track_result.mp4')
# Convert the just-synthesized work/viedio/test_track_result.mp4 to a .gif
!ffmpeg -i work/viedio/test_track_result.mp4 -s 320x320 work/viedio/test_track_result.gif -y
It can be seen that tracking quality is poor in places: cells divide and proliferate, and their shapes change far more than targets such as pedestrians, which makes prediction harder.
Guided by the tracking-by-detection idea, this project used the computer vision development kit PaddleX to train a PP-YOLO Tiny detection model for cells, then used DLib's built-in DSST single-target tracking algorithm together with a cascade matching method based on the IoU distance between prediction boxes and observation boxes to implement a multi-target tracking class.
The project could be improved in the following directions:
(1) Detector performance (detection speed affects overall system speed, and accuracy affects how well the trackers lock onto their initial targets);
(2) Single-target tracker performance (replacing the DSST correlation filter with a Kalman Filter would turn this project into the classic SORT algorithm, echoing the remark at the start that "this project paves the way for later learning"; a minimal Kalman sketch follows this list);
(3) The matching algorithm between prediction boxes and tracking boxes (this project uses IoU distance as the cost; DeepSORT uses cosine distance and Mahalanobis distance);
(4) Prediction-box generation, and the tuning of the matching threshold and other parameters (per-scenario tuning can lift metrics in a specific setting, but matters less for general deployment).
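For direction (2), a minimal constant-velocity Kalman filter with OpenCV might look like the sketch below (state: box center and its velocity; measurement: a matched detection's center; all matrices, noise values, and the example measurement are illustrative, not tuned):

import numpy as np
import cv2

kf = cv2.KalmanFilter(4, 2)  # state (cx, cy, vx, vy), measurement (cx, cy)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

predicted = kf.predict()                                   # predicted center before matching
kf.correct(np.array([[260.], [310.]], dtype=np.float32))   # update with a matched detection's center
print(predicted[:2].ravel())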
- If there are any problems or infringements, please contact the author.
- My AI Studio home page