Actual combat | achieve binocular stereo matching to obtain depth map in real scene

At present, most stereo matching algorithms use standard image pairs provided by the standard test platform, such as the following two famous ones:


However, for children's shoes who want to try to take binocular pictures for stereo matching, obtain depth map and carry out three-dimensional reconstruction, more work needs to be done than using calibrated standard test image pairs. Therefore, bloggers feel it necessary to start with taking images with binocular cameras and go through the whole process.

It is mainly divided into four parts:

  • Camera calibration (including internal and external parameters)

  • Binocular image correction (including distortion correction and stereo correction)

  • Stereo matching algorithm obtains parallax map and depth map

  • Synthesis of virtual viewpoints using parallax map or depth map

Note: if there is no binocular camera, you can use a single camera to move in parallel, and the external parameters can be calculated through camera self calibration. I shoot with my mobile phone and try to move in parallel.

1, Camera calibration

1. Internal parameter calibration

Camera internal parameters reflect the projection relationship between camera coordinate system and image coordinate system. The calibration of camera internal parameters uses Zhang Zhengyou calibration method, which is simple and easy to operate. Please refer to Zhang Zhengyou's masterpiece a flexible new technology for camera calibration for specific principles. Of course, there will be a lot of information available on the Internet. MATLAB has a special camera calibration toolkit, OpenCV encapsulated camera calibration API, etc.

The internal parameters of the camera include fx, fy, cx, cy, and distortion coefficients [k1,k2,p1,p2,k3], which will not be described in detail. I use my mobile phone to shoot checkerboard images from various angles, as shown in the figure:

Use OpenCV3.4+VS2015 to calibrate the internal parameters of the mobile phone. The calibration results are as follows. The mobile phone lens is not a fish eye lens, so it can be calibrated by using an ordinary camera model:

Image resolution: 3968 x 2976. The order of the above calibration results is fx, fy, cx, cy,   k1, k2, p1, p2, k3, save to file for subsequent use.

2. External parameter calibration

The camera external parameters reflect the Rotation R and Translation t relationship between the camera coordinate system and the world coordinate system. If the internal parameters of the two cameras are known and R1, T1, R2 and T2 between each camera and the world coordinate system are known, the Rotation and Translation between the two cameras can be calculated, and the position conversion relationship from one camera coordinate system to another camera coordinate system can be found. The calibration board can also be used for camera external parameter calibration, but the left and right cameras can shoot the image of the same calibration board at the same time. Once the external parameters are calibrated, the structures of the two cameras must remain fixed, otherwise the external parameters will change and need to be calibrated again.

Then, it is difficult for the mobile phone to ensure that the same calibration board image can be taken and the relative position can be kept unchanged, because the position of the mobile phone will certainly change when it is used to take the actual test image later. Therefore, I use the external parameter self calibration method to perform the external parameter self calibration of the camera when shooting two images of the actual scene, so as to obtain the Rotation and Translation between the two camera positions at that time.

For example, I took such two images, which will be used for experiments of stereo matching and virtual viewpoint synthesis in the future.

① The camera internal parameters are used for distortion correction. The distortion degree of the mobile phone is very small. The two corrected pictures are as follows:

② Take the above two distortion corrected images as input, extract the matching feature point pairs, pts1 and pts2, using the optical flow method in OpenCV, and draw the following in the image:


③ Using the characteristic point pairs pts1 and pts2 and the internal parameter matrix camK, the essential matrix E is solved:

 cv::Mat E = cv::findEssentialMat(tmpPts1, tmpPts2, camK, CV_RANSAC);


④   The Rotation and Translation between two cameras, that is, the external parameters between two cameras, are calculated by using the essential matrix E solution. The following API functions are implemented in OpenCV. Please refer to the API documentation for details:

   cv::Mat R1, R2;    cv::decomposeEssentialMat(E, R1, R2, t);
    R = R1.clone();    t = -t.clone();


2, Binocular image correction

1. Distortion correction

Distortion correction has been introduced earlier. Distortion correction can be carried out by using distortion coefficient. Let's talk about stereo correction.

2. Stereo correction

① After obtaining the Rotation and Translation between the two cameras, the following API is used to correct the stereo epipolar line of the two images. Therefore, it is necessary to calculate the R and t required for the epipolar line correction of the two cameras, represented by R1, T1, R2 and T2, as well as the Perspective projection matrix P1 and P2:

cv::stereoRectify(camK, D, camK, D, imgL.size(), R, -R*t,  R1, R2, P1, P2, Q);

  ② After obtaining the above parameters, you can use the following API to correct the epipolar line and save the correction results locally:

cv::initUndistortRectifyMap(P1(cv::Rect(0, 0, 3, 3)), D, R1, P1(cv::Rect(0, 0, 3, 3)), imgL.size(), CV_32FC1, mapx, mapy);    cv::remap(imgL, recImgL, mapx, mapy, CV_INTER_LINEAR);    cv::imwrite("data/recConyL.png", recImgL);
    cv::initUndistortRectifyMap(P2(cv::Rect(0, 0, 3, 3)), D, R2, P2(cv::Rect(0, 0, 3, 3)), imgL.size(), CV_32FC1, mapx, mapy);    cv::remap(imgR, recImgR, mapx, mapy, CV_INTER_LINEAR);    cv::imwrite("data/recConyR.png", recImgR);

  The epipolar correction results are as follows. Check whether the epipolar correction results are accurate, which can be roughly estimated by observing whether several corresponding points are on the same line:


3, Stereo matching

1. SGBM algorithm obtains parallax map

After the left and right images after stereo correction are obtained, the matching points are on the same line. The BM algorithm or SGBM algorithm in OpenCV can be used to calculate the parallax map. Because the performance of SGBM algorithm is much better than BM algorithm, SGBM algorithm is used to obtain parallax map. The parameter settings in SGBM are as follows:

int numberOfDisparities = ((imgSize.width / 8) + 15) & -16;    cv::Ptr<cv::StereoSGBM> sgbm = cv::StereoSGBM::create(0, 16, 3);    sgbm->setPreFilterCap(32);    int SADWindowSize = 9;    int sgbmWinSize = SADWindowSize > 0 ? SADWindowSize : 3;    sgbm->setBlockSize(sgbmWinSize);    int cn = imgL.channels();    sgbm->setP1(8 * cn*sgbmWinSize*sgbmWinSize);    sgbm->setP2(32 * cn*sgbmWinSize*sgbmWinSize);    sgbm->setMinDisparity(0);    sgbm->setNumDisparities(numberOfDisparities);    sgbm->setUniquenessRatio(10);    sgbm->setSpeckleWindowSize(100);    sgbm->setSpeckleRange(32);    sgbm->setDisp12MaxDiff(1);    int alg = STEREO_SGBM;    if (alg == STEREO_HH)        sgbm->setMode(cv::StereoSGBM::MODE_HH);    else if (alg == STEREO_SGBM)        sgbm->setMode(cv::StereoSGBM::MODE_SGBM);    else if (alg == STEREO_3WAY)        sgbm->setMode(cv::StereoSGBM::MODE_SGBM_3WAY);    sgbm->compute(imgL, imgR, disp);

The left parallax map is calculated by default. If the right parallax map needs to be calculated, replace the three bold statements above with the first three statements below. Since the calculated parallax value is negative and the disp type is 16SC1, you need to take the absolute value and save it:

  sgbm->setMinDisparity(-numberOfDisparities);    sgbm->setNumDisparities(numberOfDisparities);    sgbm->compute(imgR, imgL, disp);
    disp = abs(disp);

The left and right parallax maps obtained by SGBM algorithm are as follows. The data type of the left parallax map is CV_16UC1, the data type of the right parallax map is CV_16SC1 (the unreliable parallax value in the parallax map in the SGBM is set to the minimum parallax (mindisp-1) * 16. Therefore, in this example, the unreliable parallax value in the left parallax map is set to - 16 and the truncation value is 0; the unreliable parallax value in the right parallax map is set to (- numberofdisparties-1) * 16, and the absolute value is (numberofdisparties + 1) * 16, so there will be great differences between the two maps):

Left parallax map (unreliable parallax value is 0)

Right parallax map (unreliable parallax value is (numberofdisparties + 1) * 16)

If the unreliable parallax value of the right parallax map is also set to 0, it is as follows

So far, the left parallax map and the right parallax map echo each other.

2. Parallax hole filling

Most of the unreliable parallax values in the parallax map are caused by occlusion or uneven illumination. Since the bull force, such as SGBM, also feels unreliable, it's better to fill it with nearby reliable parallax values instead of leaving it as a hole.

There are many ways to fill the hole. Here I detect the hole area and fill it with the mean of the nearby reliable parallax value. The filled parallax diagram is as follows:

Filled left parallax map                                                                            

Filled right parallax map


3. Convert parallax map to depth map

The unit of parallax is pixel, and the unit of depth is often millimeter (mm). According to the geometric relationship of parallel binocular vision (no drawing and derivation here, very simple), the following conversion formula between parallax and depth can be obtained:

 depth = ( f * baseline) / disp

In the above formula, depth represents depth map; f represents the normalized focal length, that is, fx in the internal parameter; Baseline is the distance between the optical centers of two cameras, called the baseline distance; disp is the parallax value. The following equations are known, and the depth value can be calculated.

Above, we use SGBM algorithm to obtain disparity map, and then convert it to depth map. The function code is as follows:

/*Function function: parallax map to depth map input: dispMap - parallax map, 8-bit single channel, CV_8UC1 K -- internal parameter matrix, float type output: depthMap -- depth map, 16 bit unsigned single channel, CV_16UC1*/void disp2Depth(cv::Mat dispMap, cv::Mat &depthMap, cv::Mat K){    int type = dispMap.type();
    float fx =<float>(0, 0);    float fy =<float>(1, 1);    float cx =<float>(0, 2);    float cy =<float>(1, 2);    float baseline = 65; //Baseline distance 65mm
    if (type == CV_8U)    {        const float PI = 3.14159265358;        int height = dispMap.rows;        int width = dispMap.cols;
        uchar* dispData = (uchar*);        ushort* depthData = (ushort*);        for (int i = 0; i < height; i++)        {            for (int j = 0; j < width; j++)            {                int id = i*width + j;                if (!dispData[id])  continue;  //Prevent 0 from dividing depthdata [ID] = USHORT ((float) FX * baseline / ((float) dispdata [ID]);}}}} else    {        cout << "please confirm dispImg's type!" << endl;        cv::waitKey(0);    }}

Note: png image format can save 16 bit unsigned precision, that is, the storage range is 0-65535. If it is in mm, it can represent the depth of about 65m at most, which is enough.

In the above code, I set the accuracy of the depth map to CV_16UC1, that is, ushort type. Set baseline to 65mm and save it in png format after conversion. If it is saved in image format such as jpg or bmp, the data will be truncated to 0-255. Therefore, png format is an ideal choice for saving depth map. (if you don't want to obtain an accurate depth map, you can set baseline to 1, so that you can obtain a relative depth map and the depth value is also a relative depth value.)

The converted depth map is as follows:

Left depth map                                                                                        

Right depth map  


The depth map after cavity filling is as follows:

Left depth map (after cavity filling)                                                       


Right depth map (after cavity filling)


Parallax map to depth map is completed.

Note: there are incorrect points in the parallax map and depth map. This paper is intended to introduce the whole process without paying special attention to the optimization of the algorithm. If you have great hope, please give me your advice.

Attachment: hole filling of parallax map and depth map

The steps are as follows:

① Take the parallax map dispImg as an example. Calculate the integral of the image and save the number n of all accumulated pixels at each integral value in the corresponding integral diagram (the pixels at the hole are not included in n because the pixel value at the hole is 0, which has no effect on the integral value, but will smooth the image).

② Multi-level mean filtering is adopted. First, use a large initial window to do mean filtering (there will be no more introduction to the realization of mean filtering by integral graph. Please refer to my previous blog) and assign values to holes in large areas. Then, in the next filtering, reduce the window size to half of the original, filter again using the original integral graph, and assign a value to the smaller hole (covering the original value); And so on, until the window size becomes 3x3, stop filtering and get the final result.

③ The multi-level filtering considers that for the initial large hole area, more neighborhood values need to be referred. If a small filtering window is used, it can not be completely filled. If all large windows are used, the image will be seriously smoothed. Therefore, the filter window is continuously adjusted according to the size of the hole. First assign a value to all holes with a large window, and then use the filter that gradually becomes a small window to cover the original value, which can not only ensure that the holes can be filled, but also ensure that the image will not be excessively smooth.

The function code of cavity filling is as follows for reference only:

void insertDepth32f(cv::Mat& depth){    const int width = depth.cols;    const int height = depth.rows;    float* data = (float*);    cv::Mat integralMap = cv::Mat::zeros(height, width, CV_64F);    cv::Mat ptsMap = cv::Mat::zeros(height, width, CV_32S);    double* integral = (double*);    int* ptsIntegral = (int*);    memset(integral, 0, sizeof(double) * width * height);    memset(ptsIntegral, 0, sizeof(int) * width * height);    for (int i = 0; i < height; ++i)    {        int id1 = i * width;        for (int j = 0; j < width; ++j)        {            int id2 = id1 + j;            if (data[id2] > 1e-3)            {                integral[id2] = data[id2];                ptsIntegral[id2] = 1;            }        }    }    // Integral interval for (int i = 0; I < height; + + I) {int Id1 = I * width; for (int j = 1; J < width; + + J) {int Id2 = Id1 + J; integral [Id2] + = integral [Id2 - 1]; ptsintegral [Id2] + = ptsintegral [Id2 - 1];}} for (int i = 1; I < height; + + I) {        int id1 = i * width;        for (int j = 0; j < width; ++j)        {            int id2 = id1 + j;            integral[id2] += integral[id2 - width];            ptsIntegral[id2] += ptsIntegral[id2 - width];        }    }    int wnd;    double dWnd = 2;    while (dWnd > 1)    {        wnd = int(dWnd);        dWnd /= 2;        for  (int i = 0; i < height; ++i)        {            int id1 = i * width;            for (int j = 0; j < width; ++j)            {                int id2 = id1 + j;                int left = j - wnd - 1;                int right = j + wnd;                int top = i - wnd - 1;                int bot = i + wnd;                left = max(0, left) ;                right = min(right, width - 1);                top = max(0, top);                bot = min(bot, height - 1);                int dx = right - left;                int dy = (bot - top)  * width;                int idLeftTop = top * width + left;                int idRightTop = idLeftTop + dx;                int idLeftBot = idLeftTop + dy;                int idRightBot = idLeftBot + dx;                int ptsCnt = ptsIntegral[idRightBot] + ptsIntegral[idLeftTop] - (ptsIntegral[idLeftBot] + ptsIntegral[idRightTop]) ;                double sumGray = integral[idRightBot] + integral[idLeftTop] - (integral[idLeftBot] + integral[idRightTop]);                if (ptsCnt <= 0)                {                    continue;                }                data[id2] = float(sumGray / ptsCnt);            }        }        int s = wnd / 2 * 2 + 1;        if (s > 201)         {            s = 201;        }        cv::GaussianBlur(depth, depth, cv::Size(s, s), s, s);    }}



Posted on Thu, 23 Sep 2021 01:12:05 -0400 by KC8Alen