Multi-modal visual place recognition based on dynamics-invariant perception

Title: Multi-Modal Visual Place Recognition in Dynamics-Invariant Perception Space

**From:** School of Automation, Southeast University

**Authors:** Lin Wu, Teng Wang and Changyin Sun


Code (to be open-sourced):


Visual place recognition is one of the essential and challenging problems in robotics. In this letter, we explore, for the first time, the multi-modal fusion of semantic and visual information in a dynamics-invariant space to improve place recognition in dynamic environments. First, we design a novel deep learning architecture that generates the static semantic segmentation and recovers the static image directly from the corresponding dynamic image. Then, we use the spatial pyramid matching (SPM) model to encode the static semantic segmentation into feature vectors, and the popular bag-of-words (BoW) model to encode the static image. Based on these multi-modal features, we measure the similarity between the query image and a target landmark through the joint similarity of their semantic and visual codes. Extensive experiments demonstrate the effectiveness and robustness of the proposed method in dynamic environments.
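To make the encoding step more concrete, here is a minimal sketch of how a semantic label map could be turned into a spatial-pyramid histogram and combined with a visual code. It is an illustration under stated assumptions, not the authors' implementation: the function names, the classic SPM level weights, histogram intersection as the per-modality similarity, and the mixing weight `alpha` are all hypothetical choices that the summary above does not specify.

```python
from collections import Counter


def spm_encode(labels, num_classes, levels=3):
    """Encode a 2-D semantic label map as a spatial-pyramid histogram.

    labels: list of rows, each a list of integer class ids in [0, num_classes).
    Level l splits the map into 2**l x 2**l cells and each cell contributes
    one class histogram. The level weights follow the classic SPM scheme,
    which is an assumption here, not something stated in the article.
    """
    height, width = len(labels), len(labels[0])
    top = levels - 1
    feature = []
    for level in range(levels):
        cells = 2 ** level
        # Coarse levels are weighted less than fine ones.
        if level == 0:
            weight = 1.0 / 2 ** top
        else:
            weight = 1.0 / 2 ** (top - level + 1)
        for cy in range(cells):
            for cx in range(cells):
                y0, y1 = cy * height // cells, (cy + 1) * height // cells
                x0, x1 = cx * width // cells, (cx + 1) * width // cells
                counts = Counter(
                    labels[y][x] for y in range(y0, y1) for x in range(x0, x1)
                )
                feature.extend(weight * counts.get(c, 0) for c in range(num_classes))
    total = sum(feature)
    # L1-normalise so that identical maps have intersection similarity 1.
    return [v / total for v in feature] if total else feature


def histogram_intersection(a, b):
    """Similarity of two L1-normalised histograms, in [0, 1]."""
    return sum(min(x, y) for x, y in zip(a, b))


def joint_similarity(sem_q, sem_t, vis_q, vis_t, alpha=0.5):
    """Weighted sum of semantic and visual similarity (alpha is hypothetical)."""
    semantic = histogram_intersection(sem_q, sem_t)
    visual = histogram_intersection(vis_q, vis_t)
    return alpha * semantic + (1.0 - alpha) * visual
```

Here `vis_q` and `vis_t` stand for L1-normalised BoW histograms of the recovered static images. Because the pyramid keeps per-cell histograms, two scenes with the same class proportions but a different spatial layout receive a lower semantic score than two identical layouts.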

Visual place recognition

Visual place recognition (VPR) is a key component of SLAM systems: it helps a robot determine whether it is in a place it has visited before. Existing work usually treats VPR as an image retrieval task that matches the current observation against a set of reference landmarks, and designs various feature descriptors to measure the similarity between landmarks. These methods usually assume that the system operates in a static environment. The real world, however, is complex and dynamic. Dynamic objects make the appearance of a scene inconsistent across time, which increases feature-matching errors.
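The retrieval view of VPR can be sketched in a few lines. The example below is illustrative only: it assumes each image has already been reduced to a fixed-length descriptor, uses cosine similarity as the matching score, and accepts a match only above a hypothetical `threshold`. None of these specifics come from the paper.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognise_place(query, landmarks, threshold=0.8):
    """Return (landmark_index, score) for the best match.

    The index is None when the best score falls below the (hypothetical)
    threshold, i.e. the query is treated as a place not seen before.
    """
    best_index, best_score = None, -1.0
    for index, landmark in enumerate(landmarks):
        score = cosine_similarity(query, landmark)
        if score > best_score:
            best_index, best_score = index, score
    if best_score < threshold:
        return None, best_score
    return best_index, best_score
```

A dynamic object that changes the descriptor of a revisited place lowers its score against the correct landmark, which is exactly the failure mode the paragraph above describes.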

Dynamics-invariant perception

Dynamics-invariant perception refers to removing the dynamic content of a scene (such as pedestrians and vehicles) and converting it into the corresponding static content. Representative work includes "Empty Cities: A Dynamic-Object-Invariant Space for Visual SLAM" (IEEE Transactions on Robotics, 2020). Building on it, we made several improvements and proposed "A coarse-to-fine approach for dynamic-to-static image translation" (Pattern Recognition, 2021). In the IEEE SPL letter, we design a novel deep neural network architecture that directly infers the static semantics (i.e. the static semantic segmentation map) and the static image from an input dynamic scene image. In particular, we also use the static semantics as a prior to improve the quality of static image generation. The static semantic segmentation results and the static image translation results are shown in Figure 2 and Figure 3 (the experimental dataset was created with the CARLA autonomous driving simulator).

Visual place recognition experiment

To compare VPR recall against current mainstream image translation methods, we use Pix2Pix, MGAN, SRMGAN and SSGGNet to restore the static image, and then extract BoW features from it to measure image similarity. The recall of the different models is given in the table. Our method, which uses BoW and SPM encoding together, performs best and substantially improves on the recall of the runner-up, SSGGNet BoW, which highlights the importance of the SPM-based semantic features. In addition, SSGGNet BoW outperforms Pix2Pix BoW, MGAN BoW and SRMGAN BoW, which further verifies the effectiveness of using static semantics to guide static image generation.
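For readers unfamiliar with the metric, the sketch below shows one common way to compute a retrieval recall. It assumes a "recall at N" protocol in which a query counts as a hit when its correct landmark is among the N highest-scoring candidates. The exact protocol used in the paper is not stated in this summary, so the function name and the definition are assumptions.

```python
def recall_at_n(similarity, ground_truth, n=1):
    """Fraction of queries whose true landmark is among the top-n matches.

    similarity[q][t]: score between query q and landmark t (higher is better).
    ground_truth[q]:  index of the correct landmark for query q.
    """
    if not ground_truth:
        raise ValueError("at least one query is required")
    hits = 0
    for scores, truth in zip(similarity, ground_truth):
        # Rank landmark indices from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
        if truth in ranked[:n]:
            hits += 1
    return hits / len(ground_truth)
```

In this setting each row of `similarity` would hold the scores of one query against all landmarks, computed from BoW features alone for the baselines and from the joint BoW and SPM similarity for the proposed method.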

Related papers

T. Wang, L. Wu and C. Sun, "A coarse-to-fine approach for dynamic-to-static image translation," in Pattern Recognition, 2022, doi: 10.1016/j.patcog.2021.108373.

L. Wu, T. Wang and C. Sun, "Multi-Modal Visual Place Recognition in Dynamics-Invariant Perception Space," in IEEE Signal Processing Letters, 2021, doi: 10.1109/LSP.2021.3123907.

B. Bescos, C. Cadena and J. Neira, "Empty Cities: A Dynamic-Object-Invariant Space for Visual SLAM," in IEEE Transactions on Robotics, 2021, doi: 10.1109/TRO.2020.3031267.

P. Isola, J. Zhu, T. Zhou and A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," in CVPR, 2017.




Tags: OpenCV Machine Learning Computer Vision Deep Learning slam

Posted on Wed, 01 Dec 2021 00:04:23 -0500 by allex01