Convert RGB-D to point cloud



1. Image data acquisition

2. Pose estimation of the camera

2.1 Method of pose estimation

2.2 RGB-D point cloud stitching



       Over the past two days I have been doing a small experiment that converts RGB-D images to a point cloud and stitches the results. It involves several small steps: image data acquisition, camera pose estimation, RGB-D-to-point-cloud conversion, and point cloud stitching. It was completed with the help of a senior lab-mate, so thanks again to him.

1. Image data acquisition

       For image data acquisition I use a Percipio depth camera, which outputs an undistorted color map, an undistorted depth map, and a color-to-depth registration map.

       When I first collected image data, I used the depth map that the sample routine outputs automatically, but the point cloud stitched from it differed greatly from the actual environment and was unusable. So, with my senior lab-mate's help, I read through the code, found the correct depth map, and output that instead. As can be seen below, the result is much better than before, though defects remain.

       Because the lighting in the laboratory also affects data acquisition, I collect data with the lights off; the depth maps come out better than with the lights on.

2. Pose estimation of the camera

2.1 Method of pose estimation

       There are many methods for camera pose estimation, such as PnP and EPnP; I will not go into them here. I use a 2D-2D pose estimation method. The code below is adapted from Dr. Gao Xiang's "14 Lectures on Visual SLAM".

       First, a brief introduction to 2D-2D estimation: the principle of epipolar geometry.

       From two images we can obtain several pairs of matched feature points. Using the correspondence between matched points on the two 2D images, we can solve for the camera motion, that is, the rotation R and the translation t.

       Let the two camera centers be O1 and O2. Suppose a feature point p1 in the first image I1 corresponds to a feature point p2 in the second image I2, both being projections of the same point P in 3D space. The three points O1, O2 and P define a plane called the epipolar plane. The intersections e1 and e2 of the line O1O2 with the two image planes are called the epipoles, O1O2 is called the baseline, and the intersections l1 and l2 of the epipolar plane with the two image planes are called the epipolar lines. The spatial point P may lie anywhere on the ray O1p1. If the depth of P is unknown, then the possible positions of p2 in I2 all lie on the epipolar line l2, so the pixel position of p2 can be predicted: we can search along l2 in I2 for the pixel that matches the feature point p1.

       Suppose that in the coordinate system of the first frame the spatial coordinates of the point P are

       P = [X, Y, Z]^T

       Then the positions of the two pixels p1 and p2 satisfy

       s1*p1 = K*P,    s2*p2 = K*(R*P + t)

       where K is the camera intrinsic matrix and s1, s2 are the depths. Sometimes we use homogeneous coordinates to represent pixel positions. In homogeneous coordinates a vector equals itself multiplied by any non-zero constant, which is usually used to express a projection relation: for example, s1*p1 and p1 are equal in the sense of homogeneous coordinates. We call this equality up to scale, written as

       s1*p1 ≃ p1

       Therefore the two projection relations above can be written as

       p1 ≃ K*P,    p2 ≃ K*(R*P + t)

       Take x1 = K^(-1)*p1 and x2 = K^(-1)*p2, the coordinates on the normalized image planes, and substitute into the formulas above:

       x2 ≃ R*x1 + t

       Multiplying both sides on the left by t^ (the skew-symmetric matrix of t, so that t^v = t×v) gives

       t^*x2 ≃ t^*R*x1

       Multiplying both sides on the left by x2^T gives

       x2^T*t^*x2 ≃ x2^T*t^*R*x1

       Since t^*x2 is perpendicular to x2, the left side is zero, so

       x2^T*t^*R*x1 = 0

       Substituting p1 and p2 back into this formula gives

       p2^T*K^(-T)*t^*R*K^(-1)*p1 = 0

       This formula is called the epipolar constraint, and its geometric meaning is that O1, O2 and P are coplanar. The middle part E = t^*R is called the essential matrix. Since we know the pixel positions of the matched points and the camera intrinsic matrix, we can solve for the camera motion R and t. The above is the general principle of the method; I will explain the solution in the next article.



#include <iostream>
#include <Eigen/Core>
#include <Eigen/Geometry>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <opencv2/core/eigen.hpp> // needed for cv2eigen
#include <fstream> // needed for ofstream

// #include "extra.h" // if OpenCV2 is used, use this line of code 
using namespace std;
using namespace cv;

void find_feature_matches (
    const Mat& img_1, const Mat& img_2,
    std::vector<KeyPoint>& keypoints_1,
    std::vector<KeyPoint>& keypoints_2,
    std::vector< DMatch >& matches );
//Function declaration: finds feature points in the two images and matches them
//Input: img_1, img_2; output: keypoints_1, keypoints_2, matches

void pose_estimation_2d2d (
    std::vector<KeyPoint> keypoints_1,
    std::vector<KeyPoint> keypoints_2,
    std::vector< DMatch > matches,
    Mat& R, Mat& t );
//Function declaration: estimates the pose change of the camera
//Input: keypoints_1, keypoints_2, matches
//Output: the rotation matrix R and the translation vector t of the camera

// Pixel coordinates to camera normalized coordinates
Point2d pixel2cam ( const Point2d& p, const Mat& K );

int main ( int argc, char** argv )
{
    if ( argc != 3 )
    {
        cout<<"usage: pose_estimation_2d2d img1 img2"<<endl;
        return 1;
    }
    //--Read images
    Mat img_1 = imread ( "/home/joey/joinpcl/color/1.png", CV_LOAD_IMAGE_COLOR ); //The image path must be set correctly, otherwise a core dumped error may occur. I use absolute paths here; you can run pwd in a terminal to see the absolute path
    Mat img_2 = imread ( "/home/joey/joinpcl/color/6.png", CV_LOAD_IMAGE_COLOR );
    ofstream ofs; //Output file stream, used later to write the pose to pose1.txt

    vector<KeyPoint> keypoints_1, keypoints_2; //Keypoints of the two images
    vector<DMatch> matches; //Matches between the two descriptor sets
    find_feature_matches ( img_1, img_2, keypoints_1, keypoints_2, matches ); //Input: img_1, img_2; output: keypoints_1, keypoints_2, matches
    cout<<"Found "<<matches.size()<<" pairs of matched points in total"<<endl;

    //--Estimate the motion between the two images
    Mat R,t; //The OpenCV matrix type is not compatible with the Eigen matrix type
    Eigen::Matrix3d R_E; //Rotation matrix R_E of Eigen type, 3x3 double
    Eigen::Vector3d t_E; //Translation vector t_E of Eigen type, 3x1 double
    pose_estimation_2d2d ( keypoints_1, keypoints_2, matches, R, t ); //Input: keypoints_1, keypoints_2, matches; output: R, t
    cv2eigen(R,R_E); //Convert the matrix type from OpenCV to Eigen
    Eigen::Quaterniond q = Eigen::Quaterniond(R_E); //Convert the rotation matrix to a quaternion, because pose.txt stores quaternions
    cout<<"quaterniond = "<<q.coeffs()<<endl; //q.coeffs() outputs the quaternion in the order qx qy qz qw
    ofs.open("/home/joey/joinpcl/pose1.txt", ios::out); //Open pose1.txt for writing
    ofs<<t<<" "<<q.coeffs()<<endl; //Store the translation vector and quaternion in pose1.txt

    //--Verify that E=t^R*scale
    Mat t_x = ( Mat_<double> ( 3,3 ) <<
                0,                      -t.at<double> ( 2,0 ),     t.at<double> ( 1,0 ),
                t.at<double> ( 2,0 ),      0,                      -t.at<double> ( 0,0 ),
                -t.at<double> ( 1,0 ),     t.at<double> ( 0,0 ),      0 );

    //--Verify the epipolar constraint
    //Since I did not need to verify the epipolar constraint, I commented it out
    /*
    Mat K = ( Mat_<double> ( 3,3 ) << 520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1 );
    for ( DMatch m: matches )
    {
        Point2d pt1 = pixel2cam ( keypoints_1[ m.queryIdx ].pt, K );
        Mat y1 = ( Mat_<double> ( 3,1 ) << pt1.x, pt1.y, 1 );
        Point2d pt2 = pixel2cam ( keypoints_2[ m.trainIdx ].pt, K );
        Mat y2 = ( Mat_<double> ( 3,1 ) << pt2.x, pt2.y, 1 );
        Mat d = y2.t() * t_x * R * y1;
        cout << "epipolar constraint = " << d << endl;
    }
    */
    return 0;
}

void find_feature_matches ( const Mat& img_1, const Mat& img_2,
                            std::vector<KeyPoint>& keypoints_1,
                            std::vector<KeyPoint>& keypoints_2,
                            std::vector< DMatch >& matches )
{
    Mat descriptors_1, descriptors_2; //Descriptor matrices for the two images
    // I use OpenCV 3, so I can use these two lines of code
    Ptr<FeatureDetector> detector = ORB::create();
    Ptr<DescriptorExtractor> descriptor = ORB::create();
    // If you installed OpenCV 2, use the following two lines of code instead
    // Ptr<FeatureDetector> detector = FeatureDetector::create ( "ORB" );
    // Ptr<DescriptorExtractor> descriptor = DescriptorExtractor::create ( "ORB" );
    Ptr<DescriptorMatcher> matcher  = DescriptorMatcher::create ( "BruteForce-Hamming" ); //Create a matcher that judges matches by Hamming distance
    //--Step 1: detect Oriented FAST corner positions
    detector->detect ( img_1,keypoints_1 );
    detector->detect ( img_2,keypoints_2 );

    //--Step 2: compute the BRIEF descriptors from the corner positions
    descriptor->compute ( img_1, keypoints_1, descriptors_1 );
    descriptor->compute ( img_2, keypoints_2, descriptors_2 );

    //--Step 3: match the BRIEF descriptors of the two images, using Hamming distance
    vector<DMatch> match;
    //BFMatcher matcher ( NORM_HAMMING );
    matcher->match ( descriptors_1, descriptors_2, match );

    //--Step 4: filter the matched point pairs
    double min_dist=10000, max_dist=0;

    //Find the minimum and maximum distances among all matches, i.e. the distances of the most similar and the least similar pairs
    for ( int i = 0; i < descriptors_1.rows; i++ )
    {
        double dist = match[i].distance;
        if ( dist < min_dist ) min_dist = dist;
        if ( dist > max_dist ) max_dist = dist;
    }

    printf ( "-- Max dist : %f \n", max_dist );
    printf ( "-- Min dist : %f \n", min_dist );

    //When the distance between descriptors is more than twice the minimum distance, the match is considered wrong. But sometimes the minimum distance is very small, so set an empirical value of 30 as a lower bound
    for ( int i = 0; i < descriptors_1.rows; i++ )
    {
        if ( match[i].distance <= max ( 2*min_dist, 30.0 ) )
        {
            matches.push_back ( match[i] );
        }
    }
}

Point2d pixel2cam ( const Point2d& p, const Mat& K )
{
    return Point2d
           (
               ( p.x - K.at<double> ( 0,2 ) ) / K.at<double> ( 0,0 ),
               ( p.y - K.at<double> ( 1,2 ) ) / K.at<double> ( 1,1 )
           );
}

void pose_estimation_2d2d ( std::vector<KeyPoint> keypoints_1,
                            std::vector<KeyPoint> keypoints_2,
                            std::vector< DMatch > matches,
                            Mat& R, Mat& t )
{
    // Camera intrinsics, TUM Freiburg2
    //Mat K = ( Mat_<double> ( 3,3 ) << 520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1 );
    //Camera intrinsics. My camera differs from the TUM one, so these are the intrinsics I calibrated myself
    Mat K = ( Mat_<double> ( 3,3 ) << 886.2391758072229, 0, 639.496654129007, 0, 884.5569261594919, 338.8017291870644, 0, 0, 1 );

    //--Convert the matched points to vector<Point2f>
    vector<Point2f> points1;
    vector<Point2f> points2;

    for ( int i = 0; i < ( int ) matches.size(); i++ )
    {
        points1.push_back ( keypoints_1[matches[i].queryIdx].pt );
        points2.push_back ( keypoints_2[matches[i].trainIdx].pt );
    }

    //--Compute the fundamental matrix
    Mat fundamental_matrix;
    fundamental_matrix = findFundamentalMat ( points1, points2, CV_FM_8POINT );
    cout<<"fundamental_matrix is "<<endl<< fundamental_matrix<<endl;

    //--Compute the essential matrix
    //Note: the optical center and focal length passed to findEssentialMat must match the camera actually used (the calibrated K above), not the TUM calibration values
    Point2d principal_point ( 639.5, 338.8 );	//Camera optical center, from my own calibration
    double focal_length = 885.4;			//Camera focal length (average of fx and fy), from my own calibration
    Mat essential_matrix;
    essential_matrix = findEssentialMat ( points1, points2, focal_length, principal_point );
    cout<<"essential_matrix is "<<endl<< essential_matrix<<endl;

    //--Compute the homography matrix
    Mat homography_matrix;
    homography_matrix = findHomography ( points1, points2, RANSAC, 3 );
    cout<<"homography_matrix is "<<endl<<homography_matrix<<endl;

    //--Recover the rotation and translation from the essential matrix
    recoverPose ( essential_matrix, points1, points2, R, t, focal_length, principal_point );
    cout<<"R is "<<endl<<R<<endl;
    cout<<"t is "<<endl<<t<<endl;
}


2.2 RGB-D point cloud stitching

The code is as follows (example):

If you can't understand the part of this code that converts the depth map to world coordinates, you can comment below and I'll reply as soon as I can.

#include <iostream>
#include <fstream>
using namespace std;
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <Eigen/Geometry> 
#include <boost/format.hpp>
#include <pcl/point_types.h> 
#include <pcl/io/pcd_io.h> 
#include <pcl/visualization/pcl_visualizer.h>

int main( int argc, char** argv )
{
    vector<cv::Mat> colorImgs, depthImgs;    // Color maps and depth maps
    vector<Eigen::Isometry3d, Eigen::aligned_allocator<Eigen::Isometry3d>> poses;         // Camera poses
    ifstream fin("/home/joey/joinpcl/pose.txt");
    if (!fin)
    {
        cerr<<"Please run this program in the directory that contains pose.txt"<<endl;
        return 1;
    }
    for ( int i=0; i<5; i++ )
    {
        boost::format fmt( "/home/joey/joinpcl/%s/%d.%s" ); //Image file path format
        colorImgs.push_back( cv::imread( (fmt%"color"%(i+1)%"png").str() ));
        depthImgs.push_back( cv::imread( (fmt%"depth"%(i+1)%"pgm").str(), -1 )); // -1 means read the image as-is, without conversion
        double data[7] = {0};
        for ( auto& d:data )
            fin>>d; // Read one pose (tx ty tz qx qy qz qw) from pose.txt
        Eigen::Quaterniond q( data[6], data[3], data[4], data[5] );
        Eigen::Isometry3d T(q);
        T.pretranslate( Eigen::Vector3d( data[0], data[1], data[2] ));
        poses.push_back( T );
    }
    // Compute the point cloud and stitch it
    // Camera intrinsics 
    double cx = 639.5;
    double cy = 338.8;
    double fx = 886.2;
    double fy = 884.6;
    double depthScale = 1000.0;
    cout<<"Converting images to a point cloud..."<<endl;
    // Define the point format used by the point cloud: XYZRGB here
    typedef pcl::PointXYZRGB PointT; 
    typedef pcl::PointCloud<PointT> PointCloud;
    // Create a new point cloud
    PointCloud::Ptr pointCloud( new PointCloud ); 
    for ( int i=0; i<5; i++ )
    {
        cout<<"Converting image: "<<i+1<<endl; 
        cv::Mat color = colorImgs[i]; 
        cv::Mat depth = depthImgs[i];
        Eigen::Isometry3d T = poses[i];
        for ( int v=0; v<color.rows; v++ )
        {
            for ( int u=0; u<color.cols; u++ )
            {
                unsigned int d = depth.ptr<unsigned short> ( v )[u]; // Depth value
                if ( d==0 ) continue; // 0 means no measurement
                Eigen::Vector3d point; 
                point[2] = double(d)/depthScale; 
                point[0] = (u-cx)*point[2]/fx;
                point[1] = (v-cy)*point[2]/fy; 
                Eigen::Vector3d pointWorld = T*point;
                PointT p ;
                p.x = pointWorld[0];
                p.y = pointWorld[1];
                p.z = pointWorld[2];
                p.b = color.data[ v*color.step+u*color.channels() ];
                p.g = color.data[ v*color.step+u*color.channels()+1 ];
                p.r = color.data[ v*color.step+u*color.channels()+2 ];
                pointCloud->points.push_back( p );
            }
        }
    }
    pointCloud->is_dense = false;
    cout<<"The point cloud has "<<pointCloud->size()<<" points in total."<<endl;
    pcl::io::savePCDFileBinary("map.pcd", *pointCloud );
    return 0;
}


cmake_minimum_required( VERSION 2.8 )
project( vo1 )

set( CMAKE_BUILD_TYPE "Release" )
set( CMAKE_CXX_FLAGS "-std=c++11 -O3" )

find_package( OpenCV 3.1 REQUIRED )
# find_package( OpenCV REQUIRED ) # use this if in OpenCV2
find_package( PCL REQUIRED COMPONENTS common io )

include_directories( ${OpenCV_INCLUDE_DIRS} ${PCL_INCLUDE_DIRS} )
add_definitions( ${PCL_DEFINITIONS} )


add_executable( pose_estimation_2d2d pose_estimation_2d2d.cpp )
target_link_libraries( pose_estimation_2d2d ${OpenCV_LIBS} )

add_executable(joinMap joinMap.cpp)
target_link_libraries(joinMap ${OpenCV_LIBS} ${PCL_LIBRARIES})

This article refers to Dr. Gao Xiang's "Fourteen lectures on visual SLAM".


If there are any mistakes, I hope you will point out! If you have any questions, you can also leave a message, discuss with each other and make common progress!


Posted on Mon, 06 Sep 2021 17:13:41 -0400 by Pnop