FPGA implementation of a correlation filter tracking algorithm (1)

1 Overview

Many embedded devices today need target tracking, for example missiles (infrared guidance) and UAVs (follow-shooting). Because embedded devices are small, low-power, and have limited computing capability, a suitable algorithm must be chosen for implementation on the embedded platform. This article only discusses the FPGA-based implementation. As long as the algorithm does not involve an uncertain number of iterations, an FPGA generally far outperforms an ARM processor in computation speed, and its ability to receive and transmit data is also very strong.

The calculation process of the correlation filtering algorithm is very well suited to an FPGA: it has an analytic solution and needs no iterative optimization. Since an FPGA implementation must be pipelined, some optimization of the calculation process is of course required beforehand; it cannot be realized without any change. The algorithm follows the paper "Staple: Complementary Learners for Real-Time Tracking", which adopts the correlation filtering framework and combines HOG features with a color histogram, giving strong robustness together with scale estimation. Later papers seem to only add neural networks to extract stronger features while the general framework stays unchanged, so I have not paid much attention to them.
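For reference, the closed-form filter that makes this pipeline-friendly can be written per frequency bin as

$$\hat{h}_c \;=\; \frac{\overline{\hat{y}}\,\hat{x}_c}{\sum_{c'} \overline{\hat{x}}_{c'}\,\hat{x}_{c'} \;+\; \lambda},$$

where $\hat{x}_c$ is the FFT of the c-th channel of the windowed feature map, $\hat{y}$ is the FFT of the desired response, and $\lambda$ is the regularization weight. The numerator and the summed denominator correspond to the MATLAB variables hf_num, hf_den and p.lambda in the code below.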

2 MATLAB implementation process

The MATLAB part mainly includes the response calculation and the template training. The sub-function code involved is not included here; if you need it, you can leave a message in the background. The formula symbols, MATLAB variables, and Verilog variables in this article share the same names; if there are any questions, they can be discussed in detail. Link to the original MATLAB code: https://github.com/bertinetto/staple.

Response calculation process:

%% TESTING step
% extract patch of size bg_area and resize to norm_bg_area
im_patch_cf = getSubwindow(im, pos, p.norm_bg_area, bg_area);
pwp_search_area = round(p.norm_pwp_search_area / area_resize_factor);
% extract patch of size pwp_search_area and resize to norm_pwp_search_area
im_patch_pwp = getSubwindow(im, pos, p.norm_pwp_search_area, pwp_search_area);
% compute feature map
xt = getFeatureMap(im_patch_cf, p.feature_type, p.cf_response_size, p.hog_cell_size);
% apply Hann window
xt_windowed = bsxfun(@times, hann_window, xt);
% compute FFT
xtf = fft2(xt_windowed);
% Correlation between filter and test patch gives the response
% Solve diagonal system per pixel.
if p.den_per_channel
    hf = hf_num ./ (hf_den + p.lambda);
else
    hf = bsxfun(@rdivide, hf_num, sum(hf_den, 3) + p.lambda);
    %hf = bsxfun(@rdivide, hf_num, sum_hf_den + p.lambda);
end
conj_hf_xtf = conj(hf) .* xtf;
iconj_hf_xtf = ifft2(sum(conj_hf_xtf, 3));
response_cf = ensure_real(iconj_hf_xtf);
% Crop square search region (in feature pixels).
response_cf = cropFilterResponse(response_cf, ...
    floor_odd(p.norm_delta_area / p.hog_cell_size));
if p.hog_cell_size > 1
    % Scale up to match center likelihood resolution.
    response_cf = mexResize(response_cf, p.norm_delta_area, 'auto');
end
[likelihood_map] = getColourMap(im_patch_pwp, bg_hist, fg_hist, p.n_bins, p.grayscale_sequence);
% (TODO) in theory it should be at 0.5 (unseen colors should have max entropy)
likelihood_map(isnan(likelihood_map)) = 0;
% each pixel of response_pwp loosely represents the likelihood that
% the target (of size norm_target_sz) is centred on it
response_pwp = getCenterLikelihood(likelihood_map, p.norm_target_sz);

%% ESTIMATION
response = mergeResponses(response_cf, response_pwp, p.merge_factor, p.merge_method);
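For the default 'const_factor' merge method, the final response used for localization reduces to a fixed linear blend of the two maps. A minimal sketch of this step and of picking the peak, with variable names following the code above:

% 'const_factor' merging: fixed linear blend of the correlation-filter
% response and the per-pixel color (PWP) response.
response = (1 - p.merge_factor) * response_cf + p.merge_factor * response_pwp;
% The estimated target position is the location of the response peak.
[row, col] = find(response == max(response(:)), 1);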

Template training process:

%% TRAINING
% extract patch of size bg_area and resize to norm_bg_area
im_patch_bg = getSubwindow(im, pos, p.norm_bg_area, bg_area);
pos_r = pos;
% compute feature map, of cf_response_size
xt = getFeatureMap(im_patch_bg, p.feature_type, p.cf_response_size, p.hog_cell_size);
% apply Hann window
xt = bsxfun(@times, hann_window, xt);
% compute FFT
xtf = fft2(xt);

%% FILTER UPDATE
% Compute expectations over circular shifts,
% therefore divide by number of pixels.
new_hf_num1 = bsxfun(@times, conj(yf), xtf);
new_hf_den1 = (conj(xtf) .* xtf);
new_hf_num = new_hf_num1 / prod(p.cf_response_size);
new_hf_den = new_hf_den1 / prod(p.cf_response_size);
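Note that new_hf_num and new_hf_den above are estimates from the current frame only; in the reference MATLAB code they are then folded into the running template with a learning rate. A sketch of that blending step, assuming learning_rate_cf is the interpolation factor:

% First frame initializes the template; later frames blend the new
% per-frame estimates into the running numerator/denominator.
if frame == 1
    hf_num = new_hf_num;
    hf_den = new_hf_den;
else
    hf_num = (1 - p.learning_rate_cf) * hf_num + p.learning_rate_cf * new_hf_num;
    hf_den = (1 - p.learning_rate_cf) * hf_den + p.learning_rate_cf * new_hf_den;
end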

3 FPGA implementation process

3.1 Overall structure

The FPGA implementation follows the principle of pipelined processing, so part of the algorithm's calculation process must be approximated to achieve high-frame-rate processing while having only a small impact on tracking performance.

In the STAPLE algorithm, the target position is estimated first, the filter template is then updated according to the estimated position, and the scale space is extracted and the target scale computed at the calculated position. In the FPGA, however, the tracking calculation and the image transmission proceed at the same time, so the target position estimation, filter template update, and scale estimation must be synchronized. The filter template update and the scale estimation are therefore approximated: the target information of the previous frame is used to update the template and to generate the scale space (a pseudocode sketch follows the figure). The schematic diagram of the optimized calculation structure is shown in the figure below.

[Figure: schematic diagram of the optimized calculation structure]
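A pseudocode sketch of this reordering, with hypothetical helper names (pos_prev holds the target information of the previous frame):

% Hypothetical sketch only: in the FPGA pipeline, position estimation for the
% current frame, template update, and scale-space generation all run while the
% frame streams in, so the latter two use the PREVIOUS frame's target position.
pos_prev = init_pos;
for k = 2:num_frames
    frame    = read_frame(k);                         % streamed pixel data
    pos_new  = estimate_position(frame, template);    % uses the current frame
    template = update_template(frame, pos_prev);      % approximation: previous position
    scale    = estimate_scale(frame, pos_prev);       % scale space built at pos_prev
    pos_prev = pos_new;
end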

3.2 Module division

The STAPLE algorithm is mainly divided into two parts: target position estimation and target scale estimation. The two parts are independent of each other and are processed in parallel. When calculating the target position, the 1-D grayscale feature and the 32-D HOG feature of the target area are extracted and correlation filtering is performed; combined with the color histogram information, the final target position response map is obtained, and the location of the maximum response is the estimated target position. For scale estimation, seven scale samples are extracted and the response of each scale is calculated separately; the scale with the largest response is the estimated target scale (a sketch of this selection is given below). The block diagram of the algorithm module structure is shown in the figure below. In this experiment, the algorithm modules work in a 150 MHz clock domain.
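For the scale branch, the selection over the seven sampled scales amounts to the following sketch (hypothetical variable names: response_scale holds the peak response of each scale sample, scale_step the ratio between neighbouring samples):

% Seven scale samples centred on the current size; the sample with the
% largest correlation response gives the new target scale.
scale_factors = scale_step .^ (-3:3);
[~, best]     = max(response_scale);
target_sz     = target_sz * scale_factors(best);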

The ISE project structure is as follows:

[Figure: ISE project structure]

3.3 Results

PC configuration: Intel i5-6500 CPU @ 3.2 GHz, 8 GB RAM. The FPGA development tool is ISE 14.7, the simulation tool is ModelSim SE 10.1c, and the MATLAB version is R2017b.

Simulation waveform and calculation time statistics:

| Serial number | Name | Number of clock cycles occupied | FPGA computing time (ms) | PC computing time (ms) | Speedup (PC/FPGA) |
|---|---|---|---|---|---|
| A | Original image frame input | at least 312010 effective cycles | at least 2.08 | 6.7 | - |
| B | Image block extraction | up to 166400 effective cycles | up to 0.111 | 0.372 | - |
| C | Interpolation and HOG feature extraction | about 82790 | about 0.083 | 1.2 | 14.5 |
| D | Position estimation | about 386900 | about 0.387 | 3.3 | 8.5 |
| E | Color histogram extraction and matching | about 35400 | about 0.236 | 1.4 | 5.9 |
| F | Scale estimation | about 2716900 | about 2.717 | 4.2 | 1.5 |
| Total | | | at least 2.9 | 17.172 | - |

The maximum target size supported by this method is 256 × 256 pixels, and the input is a 640 × 480 color image. With the algorithm modules running at a 150 MHz clock, extracting the HOG features takes 83 µs, calculating the tracking position takes 0.387 ms, and calculating the scale takes 2.717 ms; the position and scale calculations are independent of each other. In theory, the processing frame rate can exceed 286 FPS, with LUT utilization of 48% and storage resource utilization of 42%. The method is robust to scale changes and deformation of the target and uses relatively few hardware resources. In subsequent posts, I will introduce the implementation idea and process of each module in turn and attach the Verilog code of some modules. If anything is wrong, please leave a message to point it out.

You are welcome to follow the official account to exchange ideas and learn together.
