Visual SLAM Fourteen Lectures learning notes -- Lesson 8: Back-end optimization

        The trajectory and map estimated by the front end (visual odometry) drift over time due to accumulated error. We therefore want to build a larger-scale, global optimization problem to obtain the optimal trajectory and map. There are two main approaches:

(1) Kalman filtering based on the Markov assumption: the Markov assumption can be understood simply as "the current state depends only on the state at the previous time". For the (nonlinear) SLAM problem, the extended Kalman filter gives the maximum a posteriori estimate under a single linearization, i.e., the result of one iteration of the optimization process.

(2) Nonlinear optimization: nonlinear optimization considers the states at all times, current and past, and jointly optimizes the camera pose at every time together with the spatial position of every feature point. The optimal solution is obtained by bundle adjustment (BA).

        In general, nonlinear optimization is considered to have clear advantages over filtering, so we focus on the BA solution and graph optimization.

1. Problem analysis

         Firstly, global BA is in essence still a problem of minimizing the reprojection error: given the camera extrinsics (pose) and intrinsics at each time and the three-dimensional coordinates of the observed feature points, compute the projection into the image pixel coordinate system and minimize its error against the actually extracted feature locations. Initial values for the variables come from the front end. In this process, the camera parameters and the coordinates of the landmarks (feature points) are adjusted simultaneously.
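The objective just described can be written compactly; in standard notation (the symbols T_i, p_j, z_ij and h below are labels introduced here for illustration, not taken from the original text):

```latex
\min_{\{T_i\},\,\{p_j\}} \;\frac{1}{2} \sum_{(i,j)\in\mathcal{O}}
    \bigl\| z_{ij} - h(T_i,\, p_j) \bigr\|^2
```

where T_i collects the pose and intrinsics of camera i, p_j is the world coordinate of landmark j, z_ij is the pixel at which landmark j was extracted in image i, and h is the projection function (including the distortion model). The sum runs only over the set O of actual observations, which is also why the resulting H matrix is sparse.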

          After determining the optimization problem, we could in principle apply a common optimization algorithm directly: compute the Jacobian and Hessian matrices and solve the incremental equation. However, since all past states and landmarks are considered, the dimension of the equation is very large, making real-time performance hard to achieve, so the solution process itself needs to be optimized. Exploiting the sparsity of the H matrix, Schur elimination (also known as marginalization) significantly reduces the amount of computation and accelerates the solve. In practice, both the Ceres and g2o libraries can carry out this process for us.

        Finally, in order to weaken the influence of outlier data on the whole optimization process and improve robustness, several commonly used robust kernel functions (such as the Huber kernel) are introduced. They ensure that the error of any single edge cannot grow so large that it drowns out the other edges, making the optimization result more robust.

2. Sample code analysis

        The first step is data reading. Here we create a BALProblem class to manage the data read from the file.

/// Read a BAL dataset from file
class BALProblem {
public:
    /// load file data
    explicit BALProblem(const std::string &filename, bool use_quaternions = false);

    ~BALProblem() {
        delete[] point_index_;
        delete[] camera_index_;
        delete[] observations_;
        delete[] parameters_;
    }

    /// store data to .txt
    void WriteToFile(const std::string &filename) const;

    /// store a point cloud file (.ply)
    void WriteToPLYFile(const std::string &filename) const;

    void Normalize();

    /// add noise to the data
    void Perturb(const double rotation_sigma,
                 const double translation_sigma,
                 const double point_sigma);

    /// number of camera parameters: 10 with quaternions, 9 otherwise
    int camera_block_size() const { return use_quaternions_ ? 10 : 9; }

    /// number of feature point parameters: 3
    int point_block_size() const { return 3; }

    /// number of camera views
    int num_cameras() const { return num_cameras_; }

    /// number of feature points
    int num_points() const { return num_points_; }

    /// number of observations: how many times feature points were observed across views (the same feature point appears multiple times)
    int num_observations() const { return num_observations_; }

    /// number of parameters to be estimated
    int num_parameters() const { return num_parameters_; }

    /// observations: first address of the feature point indices
    const int *point_index() const { return point_index_; }

    /// observations: first address of the camera view indices
    const int *camera_index() const { return camera_index_; }

    /// observations: first address of the observed pixel coordinates
    const double *observations() const { return observations_; }

    /// first address of all parameters to be estimated (camera pose parameters + feature point coordinates)
    const double *parameters() const { return parameters_; }

    /// first address of the camera pose parameters
    const double *cameras() const { return parameters_; }

    /// first address of the feature point parameters
    const double *points() const { return parameters_ + camera_block_size() * num_cameras_; }

    /// mutable accessors
    double *mutable_cameras() { return parameters_; }

    double *mutable_points() { return parameters_ + camera_block_size() * num_cameras_; }

    /// look up the parameter blocks belonging to observation i
    double *mutable_camera_for_observation(int i) {
        return mutable_cameras() + camera_index_[i] * camera_block_size();
    }

    double *mutable_point_for_observation(int i) {
        return mutable_points() + point_index_[i] * point_block_size();
    }

    const double *camera_for_observation(int i) const {
        return cameras() + camera_index_[i] * camera_block_size();
    }

    const double *point_for_observation(int i) const {
        return points() + point_index_[i] * point_block_size();
    }

private:
    void CameraToAngelAxisAndCenter(const double *camera,
                                    double *angle_axis,
                                    double *center) const;

    void AngleAxisAndCenterToCamera(const double *angle_axis,
                                    const double *center,
                                    double *camera) const;

    int num_cameras_;
    int num_points_;
    int num_observations_;
    int num_parameters_;
    bool use_quaternions_;

    int *point_index_;      // point index corresponding to each observation
    int *camera_index_;     // camera index corresponding to each observation
    double *observations_;
    double *parameters_;
};

        An FscanfOrDie helper wrapping fscanf is used when reading the data. Its key property is that the file pointer advances automatically after each read, which explains how the whole process reads all the data in the file sequentially.

template<typename T>
void FscanfOrDie(FILE *fptr, const char *format, T *value) {
    // fscanf: the file pointer advances automatically after each read
    int num_scanned = fscanf(fptr, format, value);
    if (num_scanned != 1)
        std::cerr << "Invalid UW data file. ";
}

        The contents of the problem-16-22106-pre.txt file are as follows:

(1) First come three numbers: the number of camera views, the number of feature points, and the number of observations. That is, there are 16 images, a total of 22106 feature points were extracted, and these feature points were observed 83718 times (the same feature point recurs across multiple views).

(2) Observations: four values per line, giving the image number (camera view index), the feature point number, and the pixel coordinates. For example, "10 2 5.826200e+02 3.637200e+02" indicates that feature point No. 2 is imaged in image No. 10 at coordinates (582.62, 363.72).

(3) Finally come the camera parameters and the landmark world coordinates: each view has 9 camera parameters, comprising a 3-axis rotation vector, a 3-axis translation vector, the focal length and 2 distortion coefficients. If rotation is instead expressed as a quaternion, this corresponds to 10 parameters. The world coordinates of each landmark comprise three parameters.

        The solution method mainly focuses on the application of g2o. In previous examples, most applications were of the form "one node + multiple unary edges", i.e., only one set of parameters is optimized. Global back-end optimization, by contrast, is "two types of nodes + multiple binary edges". The two node types are camera parameter nodes and landmark coordinate nodes.

(Figures: "one node + multiple unary edges" vs. "two types of nodes + multiple binary edges")

        First define the camera parameter structure:

/// Structure holding the pose and intrinsics
struct PoseAndIntrinsics {
    PoseAndIntrinsics() {}

    /// set from given data address
    explicit PoseAndIntrinsics(double *data_addr) {
        rotation = SO3d::exp(Vector3d(data_addr[0], data_addr[1], data_addr[2])); // rotation vector -> SO(3)
        translation = Vector3d(data_addr[3], data_addr[4], data_addr[5]); // translation vector
        focal = data_addr[6]; // focal length
        k1 = data_addr[7];    // distortion coefficients
        k2 = data_addr[8];
    }

    /// write the estimate back to memory
    void set_to(double *data_addr) {
        auto r = rotation.log();
        for (int i = 0; i < 3; ++i) data_addr[i] = r[i];
        for (int i = 0; i < 3; ++i) data_addr[i + 3] = translation[i];
        data_addr[6] = focal;
        data_addr[7] = k1;
        data_addr[8] = k2;
    }

    SO3d rotation;
    Vector3d translation = Vector3d::Zero();
    double focal = 0;
    double k1 = 0, k2 = 0;
};

          Define the camera parameter node, which includes the reprojection process:

/// Vertex of the pose plus camera intrinsics, 9-dimensional: the first three dimensions are so(3), followed by t, f, k1, k2
class VertexPoseAndIntrinsics : public g2o::BaseVertex<9, PoseAndIntrinsics> {
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;

    VertexPoseAndIntrinsics() {}

    virtual void setToOriginImpl() override {
        _estimate = PoseAndIntrinsics();
    }

    // when updating, the rotation uses the exponential map (left multiplication); the other quantities are added directly
    virtual void oplusImpl(const double *update) override {
        _estimate.rotation = SO3d::exp(Vector3d(update[0], update[1], update[2])) * _estimate.rotation;
        _estimate.translation += Vector3d(update[3], update[4], update[5]);
        _estimate.focal += update[6];
        _estimate.k1 += update[7];
        _estimate.k2 += update[8];
    }

    /// project a point based on the current estimate
    Vector2d project(const Vector3d &point) {
        Vector3d pc = _estimate.rotation * point + _estimate.translation;
        pc = -pc / pc[2];                  // BAL convention: the camera looks down the -z axis
        double r2 = pc.squaredNorm() - 1;  // r^2 = x^2 + y^2 (pc[2] is now -1)
        double distortion = 1.0 + r2 * (_estimate.k1 + _estimate.k2 * r2);
        return Vector2d(_estimate.focal * distortion * pc[0],
                        _estimate.focal * distortion * pc[1]);
    }

    virtual bool read(istream &in) { return false; }

    virtual bool write(ostream &out) const { return false; }
};

        Define the landmark coordinate node:

class VertexPoint : public g2o::BaseVertex<3, Vector3d> {
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;

    VertexPoint() {}

    virtual void setToOriginImpl() override {
        _estimate = Vector3d(0, 0, 0);
    }

    virtual void oplusImpl(const double *update) override {
        _estimate += Vector3d(update[0], update[1], update[2]);
    }

    virtual bool read(istream &in) { return false; }

    virtual bool write(ostream &out) const { return false; }
};

          Define the binary edge, which computes the reprojection error:

class EdgeProjection :
    public g2o::BaseBinaryEdge<2, Vector2d, VertexPoseAndIntrinsics, VertexPoint> {
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;

    virtual void computeError() override {
        auto v0 = (VertexPoseAndIntrinsics *) _vertices[0];
        auto v1 = (VertexPoint *) _vertices[1];
        auto proj = v0->project(v1->estimate());
        _error = proj - _measurement;
    }

    // no analytic Jacobian is provided, so g2o falls back to numeric derivatives

    virtual bool read(istream &in) { return false; }

    virtual bool write(ostream &out) const { return false; }
};


        Finally, add the nodes and edges, paying attention to how they are connected: an edge is added between a camera parameter node and a landmark node only when that landmark is actually observed by that camera and therefore contributes a reprojection error.
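Putting the pieces together, the graph construction can be sketched roughly as follows (modeled on the slambook2 bundle_adjustment_g2o example; it assumes the BALProblem class and the vertex/edge types defined above plus the usual g2o and Eigen headers, and the solver choice, Huber kernel and iteration count are typical choices rather than requirements):

```cpp
void SolveBA(BALProblem &bal_problem) {
    const int camera_block_size = bal_problem.camera_block_size();
    double *cameras = bal_problem.mutable_cameras();
    double *points = bal_problem.mutable_points();

    // pose block is 9-dimensional, landmark block is 3-dimensional
    typedef g2o::BlockSolver<g2o::BlockSolverTraits<9, 3>> BlockSolverType;
    typedef g2o::LinearSolverCSparse<BlockSolverType::PoseMatrixType> LinearSolverType;
    auto solver = new g2o::OptimizationAlgorithmLevenberg(
        g2o::make_unique<BlockSolverType>(g2o::make_unique<LinearSolverType>()));
    g2o::SparseOptimizer optimizer;
    optimizer.setAlgorithm(solver);

    // one vertex per camera view and one per landmark
    std::vector<VertexPoseAndIntrinsics *> vertex_pose_intrinsics;
    std::vector<VertexPoint *> vertex_points;
    for (int i = 0; i < bal_problem.num_cameras(); ++i) {
        auto v = new VertexPoseAndIntrinsics();
        v->setId(i);
        v->setEstimate(PoseAndIntrinsics(cameras + camera_block_size * i));
        optimizer.addVertex(v);
        vertex_pose_intrinsics.push_back(v);
    }
    for (int i = 0; i < bal_problem.num_points(); ++i) {
        auto v = new VertexPoint();
        v->setId(i + bal_problem.num_cameras());
        v->setEstimate(Vector3d(points[3 * i], points[3 * i + 1], points[3 * i + 2]));
        v->setMarginalized(true);  // marginalize landmarks via the Schur trick
        optimizer.addVertex(v);
        vertex_points.push_back(v);
    }

    // one binary edge per observation, connecting the observing camera and the observed landmark
    const double *observations = bal_problem.observations();
    for (int i = 0; i < bal_problem.num_observations(); ++i) {
        auto edge = new EdgeProjection;
        edge->setVertex(0, vertex_pose_intrinsics[bal_problem.camera_index()[i]]);
        edge->setVertex(1, vertex_points[bal_problem.point_index()[i]]);
        edge->setMeasurement(Vector2d(observations[2 * i], observations[2 * i + 1]));
        edge->setInformation(Matrix2d::Identity());
        edge->setRobustKernel(new g2o::RobustKernelHuber);
        optimizer.addEdge(edge);
    }

    optimizer.initializeOptimization();
    optimizer.optimize(40);
}
```

After optimization, PoseAndIntrinsics::set_to can write the refined estimates back into the BALProblem buffers, from which WriteToPLYFile exports the point cloud for inspection.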


Posted on Thu, 25 Nov 2021 22:34:11 -0500 by phuggett