Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction

WACV 2025

Can we really trust every pixel during reconstruction?

The quality of 3D reconstruction is inherently influenced by the quality of the input images, specifically their color and depth information. However, these two types of data contribute differently to the tracking and mapping processes. In this paper, we propose a predictive uncertainty framework that identifies the most valuable pixels for each process. Additionally, we leverage this predictive uncertainty to guide a strategic bundle adjustment, thereby enhancing the overall accuracy and robustness of the reconstruction.
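As a concrete, purely illustrative example of such pixel selection, one could rank pixels by their predicted uncertainty and keep only the most reliable fraction for ray sampling. The function name and keep ratio below are assumptions for this sketch, not the paper's actual selection criterion.

import torch

def select_trusted_pixels(uncertainty, keep_ratio=0.8):
    # Keep the indices of the pixels with the lowest predicted uncertainty.
    k = int(keep_ratio * uncertainty.numel())
    return torch.topk(uncertainty.flatten(), k, largest=False).indices

# Color and depth uncertainty can drive different selections, since the two
# modalities contribute differently to tracking and mapping.
color_unc = torch.rand(240, 320)
depth_unc = torch.rand(240, 320)
tracking_pixels = select_trusted_pixels(color_unc)  # e.g., for pose estimation
mapping_pixels = select_trusted_pixels(depth_unc)   # e.g., for geometry updates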

Teaser figure panels: Color and Depth Uncertainty.


Abstract

Neural implicit fields have recently emerged as a powerful representation for multi-view surface reconstruction due to their simplicity and state-of-the-art performance. However, reconstructing the thin structures of indoor scenes while ensuring real-time performance remains a challenge for dense visual SLAM systems. Previous methods do not account for the varying quality of the input RGB-D data and employ a fixed-frequency mapping process to reconstruct the scene, which can discard valuable information in some frames.

In this paper, we propose Uni-SLAM, a decoupled 3D spatial representation based on hash grids for indoor reconstruction. We introduce a newly defined predictive uncertainty that reweights the loss function, together with a strategic local-to-global bundle adjustment. Experiments on synthetic and real-world datasets demonstrate that our system achieves state-of-the-art tracking and mapping accuracy while maintaining real-time performance. It significantly improves over current methods, with a 25% reduction in depth L1 error and a 66.86% completion rate within 1 cm on the Replica dataset, reflecting a more accurate reconstruction of thin structures.
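As a rough illustration of uncertainty-based loss reweighting, the sketch below uses a Laplace-style negative log-likelihood: pixels with high predicted uncertainty are down-weighted, while the additive log-uncertainty term keeps the network from declaring everything uncertain. The names and the exact weighting form are assumptions for illustration, not necessarily the paper's formulation.

import torch

def uncertainty_weighted_l1(pred, target, log_b):
    # Laplace-style NLL: |r| / b + log b, with b log-parameterized for stability.
    residual = torch.abs(pred - target)   # per-pixel L1 residual
    weight = torch.exp(-log_b)            # 1 / b: down-weights uncertain pixels
    return (weight * residual + log_b).mean()

# Example: rendered vs. observed depth for a batch of sampled rays.
depth_pred = torch.rand(1024)
depth_gt = torch.rand(1024)
log_b = torch.zeros(1024, requires_grad=True)  # predicted by the network in practice
loss = uncertainty_weighted_l1(depth_pred, depth_gt, log_b)
loss.backward()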


Method



Uni-SLAM consists of two threads: tracking and mapping. Tracking is performed on every frame of the RGB-D stream. Mapping runs in two modes: a constant mapping process is executed every n-th frame with global bundle adjustment (BA), and an activated mapping process is additionally triggered, based on uncertainty and a co-visibility check, to capture local scene information using local BA and local loop closure optimization (LLCO). Our proposed pixel-level uncertainty method adaptively filters outlier pixels and reweights the loss function, enabling more precise localization during tracking and more faithful reconstruction of color and geometry during mapping.
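The control flow below sketches this schedule. All helper functions, the mapping interval, and the trigger thresholds are hypothetical placeholders (the system's actual components and criteria may differ); the point is the structure: tracking on every frame, fixed-interval mapping with global BA, and an uncertainty- or co-visibility-triggered local mapping pass.

import random

MAP_EVERY = 5        # constant mapping every n-th frame (system parameter)
UNC_THRESH = 0.5     # uncertainty trigger for activated mapping (illustrative)
COVIS_THRESH = 0.7   # minimum co-visibility with keyframes (illustrative)

# Hypothetical stand-ins for the real tracking/mapping components.
def track_frame(frame):
    return f"pose_{frame}"

def frame_uncertainty(frame):
    return random.random()

def covisibility(frame, keyframes):
    return random.random()

def run_mapping(frame, pose, ba, loop_closure=None):
    extra = f" + {loop_closure} loop closure" if loop_closure else ""
    print(f"map frame {frame} with {ba} BA{extra}")

def process_stream(frames, keyframes=()):
    for i, frame in enumerate(frames):
        pose = track_frame(frame)              # tracking runs on every frame
        if i % MAP_EVERY == 0:                 # constant mapping with global BA
            run_mapping(frame, pose, ba="global")
        elif (frame_uncertainty(frame) > UNC_THRESH
              or covisibility(frame, keyframes) < COVIS_THRESH):
            # activated mapping: local BA + local loop closure (LLCO)
            run_mapping(frame, pose, ba="local", loop_closure="local")

process_stream(range(20))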


Visualization


Qualitative comparison


Each comparison shows a reference image alongside reconstructions from Uni-SLAM (Ours) and baseline methods:

Comparison 1: Reference, Uni-SLAM (Ours), Co-SLAM, BSLAM, PLGSLAM, Loopy-SLAM
Comparison 2: Reference, Uni-SLAM (Ours), Co-SLAM, BSLAM, ESLAM, Loopy-SLAM
Comparison 3: Reference, Uni-SLAM (Ours), Co-SLAM, BSLAM, Point-SLAM, Loopy-SLAM


Mesh viewer

Interactive mesh comparison 1: BSLAM, Point-SLAM, Loopy-SLAM, Ours.

Interactive mesh comparison 2: BSLAM, Loopy-SLAM, Ours, GT (ground truth).

BibTeX

@misc{wang2024unislamuncertaintyawareneuralimplicit,
      title={Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction}, 
      author={Shaoxiang Wang and Yaxu Xie and Chun-Peng Chang and Christen Millerdurai and Alain Pagani and Didier Stricker},
      year={2024},
      eprint={2412.00242},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.00242}, 
}

Acknowledgements

This research has been partially funded by the EU projects CORTEX2 (GA No. 101070192) and FLUENTLY (GA No. 101058680). We sincerely thank Nice-SLAM, ESLAM, Co-SLAM, BSLAM, Point-SLAM, and Loopy-SLAM for their excellent work, and Tianchen Deng for providing the PLGSLAM meshes on Replica.