Become a point cloud mapping practitioner in one hour (fixed-interval keyframe selection method)

Creation time: November 1, 2021

Abstract

  • Nowadays, with the rapid development of science and technology, many fields, from autonomous driving to virtual reality, need to build maps and develop applications on top of them. Maps are mainly divided into sparse maps, dense maps and semantic maps, each with its own advantages, disadvantages and applications. Dense maps occupy a very important position and can be used for navigation, obstacle avoidance, interaction and so on.

  • This report focuses on dense point cloud mapping. RGBD dense mapping is carried out on the TUM dataset under Ubuntu 18.04. The report first briefly explains the principle of dense mapping, then describes the RGBD camera, and then summarizes the TUM dataset, explaining some of its characteristics and how to download and use it. Next, it introduces the open-source libraries needed for mapping, such as OpenCV and Eigen, covering introduction, features, installation and tutorials, and points to websites for learning them. After that, it presents a simple C++ mapping program based on the fixed-interval frame selection method, analyzes parts of the code in detail, explains which of the libraries above are used, and shows the mapping result. Finally, it briefly summarizes the mapping code and gives directions for future exploration.

  • I hope readers will gain something from this report and become map builders themselves. I would also be very glad if readers could point out its shortcomings and offer suggestions.

Mapping principle

  • Dense mapping is mainly divided into monocular dense mapping and RGBD dense mapping. RGBD dense mapping uses an RGBD camera for shooting, which can directly measure the depth of each pixel without computing depth through feature matching, triangulation, epipolar search and block matching as monocular dense mapping must.
  • Real-time dense mapping processes the collected pictures one by one: using the RGB and depth information of each pixel, it converts the pixel from pixel coordinates to camera coordinates, and then from camera coordinates to world coordinates according to the camera pose, thus generating three-dimensional map points (see the sketch after this list).
  • The quality of the generated point cloud is affected by the camera intrinsics, the image quality, the accuracy of the camera pose and so on.
  • Mapping cannot and does not need to make full use of every picture; it usually processes only the keyframes.
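  • To make the coordinate conversion concrete, here is a minimal sketch that back-projects one pixel (u, v) with measured depth d into the world frame using the pinhole model; the intrinsics fx, fy, cx, cy, the depth scale and the pose T_wc are placeholders, and the same computation appears again in the Mapping Practice section.
#include <Eigen/Geometry>

// Minimal sketch: back-project one pixel into the world frame.
Eigen::Vector3d pixelToWorld(int u, int v, unsigned short d,
                             double fx, double fy, double cx, double cy,
                             double depthScale,
                             const Eigen::Isometry3d &T_wc) {
    Eigen::Vector3d p_cam;
    p_cam[2] = double(d) / depthScale;       // depth in meters
    p_cam[0] = (u - cx) * p_cam[2] / fx;     // X in the camera frame
    p_cam[1] = (v - cy) * p_cam[2] / fy;     // Y in the camera frame
    return T_wc * p_cam;                     // convert camera coordinates to world coordinates
}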

RGBD camera

1. Introduction

  • An RGB-D camera, also known as a depth camera, is a kind of camera that emerged around 2010. It measures the distance between an object and the camera through infrared structured light or the ToF principle (this distance is very important for mapping).

  • Commonly used RGB-D cameras include the Kinect/Kinect V2, Xtion Pro Live and RealSense; some mobile phones also use them for face recognition.

2. Features

  • Small amount of computation: an RGB-D camera measures pixel depth by physical means, saving a lot of computing resources.
  • The technology is not yet mature: most RGB-D cameras still have problems such as a narrow measurement range, large noise, a small field of view, susceptibility to sunlight interference, and inability to measure transparent materials. For mapping, RGB-D cameras are therefore mainly used indoors and are difficult to use outdoors.

TUM dataset

1. Introduction

  • The TUM dataset was published by the Computer Vision Group of the Technical University of Munich (TUM).

  • The RGB images and depth images of the TUM dataset were shot with an RGBD camera.

  • The TUM dataset contains data of different structures and textures, moving objects and 3D object reconstruction.

  • The sequences in the TUM dataset differ in frame count and size, the motion may be fast or slow, and some were shot with a handheld camera.

2. Features

  • The dataset is very large and contains many scenes.
  • Accurate camera poses: the TUM dataset provides accurate camera poses recorded by a motion capture system.
  • A set of Python scripts is provided for processing the dataset: they can align files in time, draw trajectory plots, compare test results with the ground-truth trajectory, and so on.

3. Download

4. * tutorial

Open source library

1. OpenCV

1.1. Introduction

  • OpenCV provides a large number of open-source image algorithms and is a widely used image-processing library in computer vision.
  • OpenCV consists of a series of C functions and a small number of C++ classes, and provides interfaces to Python, Ruby, MATLAB and other languages.
  • OpenCV is an open source computer vision and machine learning software library that can run on Linux, Windows, Android and Mac OS operating systems.

1.2. Features

  • Lightweight.
  • Fast execution and processing speed.

1.3. Installation

  • First install the OpenCV dependencies:
sudo apt-get install build-essential libgtk2.0-dev libvtk7-dev libjpeg-dev libtiff5-dev libjasper-dev libopenexr-dev libtbb-dev
  • If this error occurs:
E: Unable to locate package libjasper-dev
  • Solution:
sudo apt-get update
sudo apt-get install software-properties-common
sudo apt install libjasper1 libjasper-dev
  • Enter the OpenCV official website and select the OpenCV for Linux version to download:

    OpenCV Download

  • To build and install OpenCV, run the following in the unpacked source directory:
mkdir build && cd build
cmake ..
make -j4
sudo make install
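  • As a quick sanity check after installation, here is a minimal sketch that reads one TUM color image and one depth image and displays the color image; the file names are placeholders, and the -1 flag keeps the 16-bit depth values:
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>

int main() {
    // Placeholder file names; substitute images from your own TUM sequence
    cv::Mat color = cv::imread("rgb/some_timestamp.png");        // 8-bit, 3-channel color image
    cv::Mat depth = cv::imread("depth/some_timestamp.png", -1);  // -1 keeps the 16-bit depth image unchanged
    if (color.empty() || depth.empty()) {
        std::cerr << "image not found" << std::endl;
        return 1;
    }
    std::cout << "color: " << color.cols << "x" << color.rows
              << "  depth type: " << depth.type() << std::endl;
    cv::imshow("color", color);   // show the color image
    cv::waitKey(0);               // wait for a key press
    return 0;
}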

1.4. * tutorial

2. Eigen

2.1. Introduction

  • Eigen is an open-source library based on C++ templates. It provides fast linear algebra operations on matrices and also supports numerical analysis and related algorithms.
  • Eigen supports multiple compilation environments.

2.2. Features

  • Wide scope of application: it supports all matrix operations, all standard numeric types, a variety of matrix decompositions and the solution of geometric features.
  • Fast: expression templates intelligently remove temporary variables and enable lazy evaluation where appropriate.
  • High reliability: the algorithms are carefully selected and validated by an extensive test suite.
  • Header-only: Eigen is built purely from header files, so to use it you only need to include the Eigen headers.

2.3. Installation

  • Open the terminal and enter:
sudo apt-get install libeigen3-dev
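  • Here is a minimal sketch of how a pose given as a translation plus a quaternion (the layout the TUM groundtruth uses) becomes an Eigen::Isometry3d and transforms a point; the numeric values are made up purely for illustration:
#include <Eigen/Geometry>
#include <iostream>

int main() {
    // Made-up pose: translation (tx, ty, tz) and quaternion (qx, qy, qz, qw)
    Eigen::Quaterniond q(1.0, 0.0, 0.0, 0.0);          // Eigen's constructor order is (w, x, y, z)
    Eigen::Isometry3d T(q);                            // rotation part of the pose
    T.pretranslate(Eigen::Vector3d(0.5, 0.0, 1.0));    // translation part of the pose

    Eigen::Vector3d p_cam(0.1, 0.2, 1.5);              // a point in the camera frame
    Eigen::Vector3d p_world = T * p_cam;               // transform it into the world frame
    std::cout << p_world.transpose() << std::endl;
    return 0;
}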

2.4. * tutorial

3. PCL

3.1. Introduction

  • PCL is a large cross-platform open-source C++ programming library. It implements a large number of general-purpose algorithms and efficient data structures related to point clouds, covering point cloud acquisition, filtering, segmentation, registration, retrieval, feature extraction, recognition, tracking, surface reconstruction, visualization, and so on.
  • PCL supports a variety of operating system platforms and can run on Windows, Linux, Android, Mac OS X and some embedded real-time systems.

3.2. Features

  • High real-time performance: PCL uses high-performance computing techniques such as parallelization to improve the real-time performance of programs.
  • Shared pointers: all modules and algorithms in PCL pass data through Boost shared pointers, avoiding repeatedly copying data that already exists in the system.

3.3. Installation

  • Open the terminal and enter:
sudo apt-get install libpcl-dev pcl-tools
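  • Here is a minimal sketch that builds a tiny point cloud by hand, downsamples it with a voxel grid at the same 0.005 m resolution used later in the mapping code, and saves it to a PCD file that can be opened with pcl_viewer:
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/filters/voxel_grid.h>
#include <pcl/io/pcd_io.h>
#include <iostream>

int main() {
    // Build a tiny cloud of 1000 red points along a line
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
    for (int i = 0; i < 1000; i++) {
        pcl::PointXYZRGB p;
        p.x = 0.001f * i; p.y = 0.0f; p.z = 1.0f;
        p.r = 255; p.g = 0; p.b = 0;
        cloud->push_back(p);
    }
    // Voxel filtering: keep roughly one point per 0.005 m cube
    pcl::VoxelGrid<pcl::PointXYZRGB> voxel_filter;
    voxel_filter.setLeafSize(0.005f, 0.005f, 0.005f);
    voxel_filter.setInputCloud(cloud);
    voxel_filter.filter(*cloud);
    std::cout << "points after filtering: " << cloud->size() << std::endl;
    pcl::io::savePCDFileBinary("test.pcd", *cloud);     // view with: pcl_viewer test.pcd
    return 0;
}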

3.4. * tutorial

Mapping Practice

1. Description

2. Mapping framework

  • The mapping process in this paper is divided into four steps, as shown in the figure below:

3. * framework analysis

3.1. Downloading data sets

  • Download the TUM dataset.
  • Open a terminal and use associate.py to align the RGB images, depth images and camera poses:
python associate.py rgb.txt depth.txt > associate.txt
python associate.py associate.txt groundtruth.txt > associate_with_groundtruth.txt

3.2. Data set preprocessing

  1. Read the information in the associate_with_groundtruth.txt file (iostream); the assumed field layout is sketched at the end of this subsection:
ifstream fin("~/associate_with_groundtruth.txt");   // Open the alignment file (RGB file names, depth file names and camera poses)
for (int i = 0; i < 12; i++) fin >> data[i];        // Read the 12 fields of one alignment line
  2. Read the aligned images and camera poses:
  • Data type of the images (OpenCV) and data type of the camera poses (Eigen):
vector<cv::Mat> colorImgs, depthImgs;         // Color images and depth images
vector<Eigen::Isometry3d> poses;              // Camera poses
colorImgs.push_back(cv::imread("RGB file name"));	 // Read an aligned RGB image
depthImgs.push_back(cv::imread("Depth file name", -1));    // Read an aligned depth image (-1 keeps the 16-bit depth)
poses.push_back(T);							   // Store an aligned camera pose
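  • The indices data[1], data[3] and data[5..11] used above assume the following field layout for one line of associate_with_groundtruth.txt; this is my reading of the output produced by chaining associate.py twice with the TUM groundtruth format, so verify it against your own file:
// One line of associate_with_groundtruth.txt: 12 whitespace-separated fields (assumed layout)
//  data[0]      rgb timestamp
//  data[1]      rgb file name          (e.g. rgb/....png)
//  data[2]      depth timestamp
//  data[3]      depth file name        (e.g. depth/....png)
//  data[4]      groundtruth timestamp
//  data[5..7]   tx ty tz               (translation)
//  data[8..10]  qx qy qz               (quaternion, vector part)
//  data[11]     qw                     (quaternion, scalar part)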

3.3. Calculate and splice point clouds

  1. Define the format of points and point clouds (PCL):
typedef pcl::PointXYZRGB PointT;	          //Point type (world coordinates)
typedef pcl::PointCloud<PointT> PointCloud;	  //Point cloud type
  2. Create new point clouds (PCL):
PointCloud::Ptr pointCloud(new PointCloud);	  // Create a new point cloud
PointCloud::Ptr slide_win(new PointCloud);	  // Create a new point cloud compression window
  3. Select and process keyframes (Eigen):
  • Select pictures evenly as keyframes:
for (int i = 0; i < NImage; i += IStep) {...}	  // Select keyframes uniformly with a fixed step
  • Calculate the camera coordinates of a pixel (Eigen):
Eigen::Vector3d point;	                     // Camera coordinates of pixels
point[2] = double(d) / depthScale;	         // Calculate camera coordinates
point[0] = (u - cx) * point[2] / fx;
point[1] = (v - cy) * point[2] / fy;
  • Calculate the world coordinates of a pixel (Eigen):
Eigen::Vector3d pointWorld = T * point;		 // Convert camera coordinates to world coordinates
  • Insert the pixel into the point cloud of the current keyframe (PCL):
current->points.push_back(p);				 //Insert points into the current point cloud
  • Perform statistical outlier removal (depth filtering) on the point cloud of the current keyframe (PCL):
pcl::StatisticalOutlierRemoval<PointT> statistical_filter;	// depth filter and statistical removal
statistical_filter.setInputCloud(current);	 // Depth filtering and statistical removal of current point cloud
statistical_filter.filter(*tmp);			 // Assign the result to temp point cloud
  • Insert the filtered point cloud into the compression window (PCL):
(*slide_win) += *tmp;						 // Insert the filtered point cloud into the compression window slide_win
  • When the compression window meets the size condition, perform voxel filtering on it and merge it into the global point cloud (PCL):
pcl::VoxelGrid<PointT> voxel_filter;		 // voxel filter (voxel compression)
voxel_filter.setInputCloud(slide_win);		 // Voxel filtering the slide_win point cloud
voxel_filter.filter(*slide_win);			 // Assign the result to slide_win point cloud
(*pointCloud) += *slide_win;				 // Insert the filtered point cloud into the point cloud

3.4. Display video and point cloud

  1. Show the video (OpenCV):
cv::imshow("img",colorImgs[i]);				  // Show video
  2. Show the point cloud (PCL):
pcl::visualization::CloudViewer viewer ("Viewer");	// Window showing point cloud
viewer.showCloud(pointCloud);						// Show the generated point cloud

3.5. Final processing

  1. Perform the final voxel filtering on the point cloud (PCL):
voxel_filter.setInputCloud(pointCloud);     		//Perform the final global compression on the point cloud
voxel_filter.filter(*pointCloud);					//Assign the result to the point cloud
  2. Save the point cloud file (PCL):
pcl::io::savePCDFileBinary("map.pcd", *pointCloud); //Save point cloud file
  3. Wait until the display window is closed, then end the program:
while (!viewer.wasStopped ()){}						//Make the viewer window not exit automatically
  4. To view the point cloud after the program ends, open a terminal and run:
pcl_viewer map.pcd

4. Complete mapping code based on the fixed-interval frame selection method

#include <iostream>
#include <fstream>

using namespace std;

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <Eigen/Geometry>
#include <boost/format.hpp>
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>
#include <pcl/filters/voxel_grid.h>
#include <pcl/visualization/pcl_visualizer.h>
#include <pcl/filters/statistical_outlier_removal.h>
#include <pcl/visualization/cloud_viewer.h>

int NImage = 2486;		// Number of images after the two alignments
double NSlide = 1000000;	// Size threshold of the point cloud compression window
int IStep = 360;		// Step size for selecting keyframes
double resolution = 0.005;		// Point cloud resolution (voxel size in meters)

int main(int argc, char **argv) {
    vector<cv::Mat> colorImgs, depthImgs;    // Color map and depth map
    vector<Eigen::Isometry3d> poses;         // Camera pose

    // Use fin to read associate_with_groundtruth.txt (RGB file name, depth file name and camera pose alignment information)
    ifstream fin("/home/lujunying/rgbd_dataset_freiburg3_long_office_household/associate_with_groundtruth.txt");
    if (!fin) {
        cerr << "cannot find pose file" << endl;
        return 1;
    }

    //Read the information of aligned pictures (RGB map, depth map, camera pose)
    for (int i = 0; i < NImage; i++) {	// Process each image
        string data[12];	// The 12 alignment fields of one image
        for (int j = 0; j < 12; j++) fin >> data[j];
        colorImgs.push_back(cv::imread("/home/lujunying/rgbd_dataset_freiburg3_long_office_household/"+data[1])); // Read an aligned RGB image (path to the dataset)
        depthImgs.push_back(cv::imread("/home/lujunying/rgbd_dataset_freiburg3_long_office_household/"+data[3], -1)); // Read an aligned depth image; -1 (IMREAD_UNCHANGED) keeps the 16-bit depth
        Eigen::Quaterniond q(atof(data[11].c_str()), atof(data[8].c_str()), atof(data[9].c_str()), atof(data[10].c_str()));	// Read the aligned camera rotation as a quaternion; Eigen's constructor order is (w, x, y, z)
        Eigen::Isometry3d T(q);
        T.pretranslate(Eigen::Vector3d(atof(data[5].c_str()), atof(data[6].c_str()), atof(data[7].c_str())));	// Reads the translation vector of the aligned camera
        poses.push_back(T);	// The pose of the aligned camera
    }

    // Calculate point cloud and splice
    // Camera internal parameters (see camera internal parameters of TUM dataset)
    double cx = 320.1;
    double cy = 247.6;
    double fx = 535.4;
    double fy = 539.2;
    double depthScale = 5000.0;

    cout << "Converting image to point cloud..." << endl;

    // Define the format used by the point cloud: XYZRGB is used here
    typedef pcl::PointXYZRGB PointT;  //Point type (world coordinates)
    typedef pcl::PointCloud<PointT> PointCloud;	//Point cloud type

    // Create a new point cloud
    PointCloud::Ptr pointCloud(new PointCloud);
    // Create a new point cloud compression window
    PointCloud::Ptr slide_win(new PointCloud);
    
    // Select and process the key frame, generate and display the point cloud after twice filtering
    pcl::visualization::CloudViewer viewer ("Viewer");	// Open the window showing the point cloud
    for (int i = 0; i < NImage; i+=IStep) {	// Evenly select the picture as the key frame and process it
        PointCloud::Ptr current(new PointCloud);
        cout << "Convert in keyframes: " << i + 1 << endl;
        cv::Mat color = colorImgs[i];
        cv::Mat depth = depthImgs[i];
        Eigen::Isometry3d T = poses[i];
        for (int v = 0; v < color.rows; v++)	// Processing pixels on a picture
            for (int u = 0; u < color.cols; u++) {
                unsigned int d = depth.ptr<unsigned short>(v)[u];	//Get depth value
                if (d == 0) continue; // If the depth of the pixel is not measured, jump to the next pixel
                Eigen::Vector3d point;	// Camera coordinates of pixels
                point[2] = double(d) / depthScale;	// d divided by scale becomes the unit of meters
                point[0] = (u - cx) * point[2] / fx;
                point[1] = (v - cy) * point[2] / fy;
                Eigen::Vector3d pointWorld = T * point;	// Change camera coordinates to world coordinates
				
                PointT p; // Point (XYZRGB format)
                p.x = pointWorld[0];
                p.y = pointWorld[1];
                p.z = pointWorld[2];
                p.b = color.data[v * color.step + u * color.channels()];
                p.g = color.data[v * color.step + u * color.channels() + 1];
                p.r = color.data[v * color.step + u * color.channels() + 2];
                current->points.push_back(p);	//Insert points into the current point cloud
            }

        // depth filter and statistical removal
        cout << "current Point cloud common" << current->size() << "Points." << endl;
        PointCloud::Ptr tmp(new PointCloud);
        pcl::StatisticalOutlierRemoval<PointT> statistical_filter;
        statistical_filter.setMeanK(50);
        statistical_filter.setStddevMulThresh(1.0);
        statistical_filter.setInputCloud(current);	// Depth filtering and statistical removal of current point cloud
        statistical_filter.filter(*tmp);	// Assign the result to temp point cloud
        current.reset();	// Freeing current point cloud memory
        cout << "tmp Point cloud common" << tmp->size() << "Points." << endl;
        (*slide_win) += *tmp;	// Insert the filtered point cloud into the compression window slide_win
        tmp.reset();  // Free temp point cloud memory
        cout << "slide_win Point cloud common" << slide_win->size() << "Points." << endl;

        if(slide_win->size() > NSlide || slide_win->size() == NSlide || i == 0 || (i+IStep)>( NImage-1)){	//If slide_win satisfies the voxel compression condition
             // voxel filter (voxel compression)
             pcl::VoxelGrid<PointT> voxel_filter;
             voxel_filter.setLeafSize(resolution, resolution, resolution);
             voxel_filter.setInputCloud(slide_win);	// Voxel filtering the slide_win point cloud
             voxel_filter.filter(*slide_win);	// Assign the result to slide_win point cloud
             cout << "After compression slide_win Point cloud common" << slide_win->size() << "Points." << endl;
             (*pointCloud) += *slide_win;	// Insert the filtered slide_win point cloud into the point cloud
             slide_win.reset(new PointCloud);	// Free slide_win point cloud memory
        }

        cout << "pointCloud Point cloud common" << pointCloud->size() << "Points." << endl;
        cout << "-------------------------------------------------------------------------------" << endl;
		
        // Show video and generated point cloud
        cv::imshow("img",colorImgs[i]);	// Show video
        cv::waitKey(30);
        viewer.showCloud(pointCloud);	// Show point cloud
    }
    
    // Perform the final global compression on the point cloud
    cout << "After two filtering, the point clouds share" << pointCloud->size() << "Points." << endl;
    pointCloud->is_dense = false;	// This is useless
    pcl::VoxelGrid<PointT> voxel_filter;	// Final global compression
    voxel_filter.setLeafSize(resolution, resolution, resolution);
    voxel_filter.setInputCloud(pointCloud);
    voxel_filter.filter(*pointCloud);
    viewer.showCloud(pointCloud);
    
    //Save the generated point cloud file
    cout << "After the final two filtering, the point cloud has" << pointCloud->size() << "Points." << endl;
    pcl::io::savePCDFileBinary("map.pcd", *pointCloud);
    
    //End program
    while (!viewer.wasStopped ()){}	//Keep the viewer window and do not exit automatically
    return 0; 
}
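  • To compile the code above, a minimal CMakeLists.txt sketch is given below, assuming the source is saved as pointcloud_mapping.cpp and that OpenCV, Eigen and PCL were installed as described in the open source library section; the project and target names are my own choices, so adjust them to your setup:
cmake_minimum_required(VERSION 3.10)
project(pointcloud_mapping)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_BUILD_TYPE Release)

# Locate the libraries installed earlier
find_package(OpenCV REQUIRED)
find_package(PCL REQUIRED)
find_package(Eigen3 REQUIRED)

include_directories(${OpenCV_INCLUDE_DIRS} ${PCL_INCLUDE_DIRS} ${EIGEN3_INCLUDE_DIR})
add_definitions(${PCL_DEFINITIONS})

# pointcloud_mapping.cpp is an assumed file name for the code above
add_executable(pointcloud_mapping pointcloud_mapping.cpp)
target_link_libraries(pointcloud_mapping ${OpenCV_LIBS} ${PCL_LIBRARIES})
  • Build it from a build directory (cmake .. then make) and run the executable; map.pcd is written to the working directory.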

5. Mapping result

  • The final generated point cloud is shown below:

Present and future

  • In this paper, window compression is used when building the point cloud, which greatly reduces unnecessary point cloud memory and increases the scale of the map that can be built.
  • The mapping in this paper is built from the TUM dataset in real time; in the future, knowledge of ROS will be used to build maps from videos taken with a handheld camera.
  • For keyframe selection, this paper adopts the fixed-interval frame selection method, which is easy to understand and simple to program, but is rather arbitrary and of average quality. For example, when the motion is very slow, a large number of similar, redundant keyframes are selected, and when the motion is fast, many important keyframes are lost. In the future, the keyframe extraction and dense reconstruction algorithms of ORB-SLAM2 [5] and GCNv2 [6] will be studied to improve on this.

References

[1] Gao Xiang, Zhang Tao. Fourteen lectures on visual SLAM: from theory to practice. (2nd Edition). Beijing: Electronic Industry Press, 2019

[2] Stephen Prata. C++ Primer Plus. (6th Edition, Chinese version). Beijing: People's Posts and Telecommunications Press, July 2020

[3] T. Barfoot, State Estimation for Robotics: A Matrix Lie Group Approach, 2016.

[4] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM: A versatile and accurate monocular SLAM system," IEEE Trans. Robot., vol. 31, no. 5, pp. 1147–1163, Oct. 2015.

[5] R. Mur-Artal and J. D. Tardós, "ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras," IEEE Trans. Robot., vol. 33, no. 5, pp. 1255–1262, Oct. 2017.

[6] J. Tang, J. Folkesson, and P. Jensfelt, "GCNv2: Efficient correspondence prediction for real-time SLAM," IEEE Robotics and Automation Letters, vol. 4, no. 4, Oct. 2019.

  1. See Section 4 of Mapping Practice for the complete code. ↩︎

  2. Click here to learn more about RGB-D cameras. ↩︎ ↩︎

  3. Click here to learn more about the TUM dataset. ↩︎

  4. Click here to learn more about OpenCV. ↩︎

  5. Click here to learn more about Eigen. ↩︎

  6. Click here to learn more about PCL. ↩︎

  7. Click here to learn more about associate.py. ↩︎

Tags: C C++ slam
