Incremental Semantics-Aided Meshing from LiDAR-Inertial Odometry and RGB Direct Label Transfer

Published: arXiv:2604.09478v1
Authors

Muhammad Affan, Ville Lehtola, George Vosselman

Abstract

High-fidelity geometric mesh reconstruction from LiDAR-inertial scans remains challenging in large, complex indoor environments -- such as cultural buildings -- where point-cloud sparsity, geometric drift, and fixed fusion parameters produce holes, over-smoothing, and spurious surfaces at structural boundaries. We propose a modular, incremental RGB+LiDAR pipeline that generates incremental semantics-aided high-quality meshes from indoor scans through scan-frame-based direct label transfer. A vision foundation model labels each incoming RGB frame; labels are incrementally projected and fused onto a LiDAR-inertial odometry map; and an incremental semantics-aware Truncated Signed Distance Function (TSDF) fusion step produces the final mesh via marching cubes. This frame-level fusion strategy preserves the geometric fidelity of LiDAR while leveraging rich visual semantics to resolve geometric ambiguities at reconstruction boundaries caused by LiDAR point-cloud sparsity and geometric drift. We demonstrate that semantic guidance improves geometric reconstruction quality; quantitative evaluation is therefore performed using geometric metrics on the Oxford Spires dataset, while results from the NTU VIRAL dataset are analyzed qualitatively. The proposed method outperforms the state-of-the-art geometric baselines ImMesh and Voxblox, demonstrating the benefit of semantics-aided fusion for geometric mesh quality. The resulting semantically labelled meshes are of value when reconstructing Universal Scene Description (USD) assets, offering a path from indoor LiDAR scanning to XR and digital modeling.
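To make the frame-based direct label transfer concrete, here is a minimal sketch of the projection step: per-pixel class IDs from a labelled RGB frame are copied onto LiDAR points that fall inside the camera frustum. The function name, the simple pinhole model, and the absence of occlusion handling are our simplifying assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def transfer_labels(points_world, label_image, T_world_cam, K):
    """Project world-frame LiDAR points into one labelled RGB frame and
    copy the per-pixel semantic label onto each visible point.

    points_world : (N, 3) LiDAR points in the odometry (world) frame
    label_image  : (H, W) integer class IDs from the vision foundation model
    T_world_cam  : (4, 4) camera pose in the world frame
    K            : (3, 3) pinhole camera intrinsics
    """
    H, W = label_image.shape
    # Transform points into the camera frame.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]

    labels = np.full(len(points_world), -1, dtype=np.int32)  # -1 = unlabelled
    in_front = pts_cam[:, 2] > 0.1                           # drop points behind the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)      # pixel coordinates

    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = label_image[uv[valid, 1], uv[valid, 0]]
    return labels
```

Repeating this per incoming frame and accumulating the per-point labels over the odometry map is what makes the transfer incremental; a production version would also need occlusion checks so labels are not copied onto points hidden behind closer geometry.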

Paper Summary

Problem
This paper addresses the challenge of creating high-fidelity 3D mesh reconstructions from LiDAR-inertial scans in large, complex indoor environments such as cultural buildings. These environments are difficult to reconstruct because sparse point clouds, geometric drift, and fixed fusion parameters produce artifacts such as holes, over-smoothing, and spurious surfaces at structural boundaries.
Key Innovation
The key innovation is a modular, incremental RGB+LiDAR pipeline that transfers semantic labels from RGB images onto a LiDAR-inertial odometry map to improve geometric mesh reconstruction. A vision foundation model labels each incoming RGB frame; these labels are then incrementally projected and fused onto the LiDAR-inertial odometry map. A semantics-aware Truncated Signed Distance Function (TSDF) fusion step then produces the final mesh via marching cubes.
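One plausible reading of "semantics-aware TSDF fusion" is that the truncation band and per-voxel label votes depend on the transferred class, rather than using a single fixed fusion parameter. The sketch below illustrates that idea; the class-to-truncation table, voxel layout, and weighting scheme are assumptions for illustration, not the paper's actual parameters.

```python
import numpy as np
from collections import defaultdict

# Assumed class-dependent truncation distances (metres); the paper's actual
# parameterisation is not specified here.
TRUNC_BY_CLASS = defaultdict(lambda: 0.10)
TRUNC_BY_CLASS.update({3: 0.04, 7: 0.04})  # e.g. tighter band near door/window edges

class SemanticTSDFVoxel:
    """One voxel holding a weighted TSDF value plus semantic label votes."""

    def __init__(self, num_classes):
        self.tsdf = 0.0
        self.weight = 0.0
        self.votes = np.zeros(num_classes, dtype=np.float32)

    def integrate(self, sdf, label, obs_weight=1.0):
        """Fold one observation in, truncating the signed distance with a
        class-dependent band instead of a single fixed fusion parameter."""
        trunc = TRUNC_BY_CLASS[label]
        if sdf < -trunc:
            return  # far behind the observed surface: no usable evidence
        d = min(1.0, sdf / trunc)  # normalised truncated signed distance
        total = self.weight + obs_weight
        self.tsdf = (self.tsdf * self.weight + d * obs_weight) / total
        self.weight = total
        self.votes[label] += obs_weight

    @property
    def label(self):
        """Majority-vote semantic class for this voxel."""
        return int(self.votes.argmax())
```

Narrowing the truncation band at semantically thin or boundary classes is one way semantics can suppress the spurious surfaces and over-smoothing that a single fixed truncation produces; marching cubes then runs over the fused TSDF grid as usual.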
Practical Impact
This research has significant practical implications for various applications, including digital twins, architecture/engineering/construction workflows, immersive XR content for cultural heritage preservation, and robotics simulation. The resulting semantically labelled meshes can be exported as Universal Scene Description (USD) assets, offering a path from indoor LiDAR scanning to XR and digital modeling.
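As a rough illustration of the USD export path, the following sketch writes a labelled triangle mesh to a .usda stage using the open-source usd-core Python bindings. The /World/ReconstructedMesh prim path and the custom semanticLabel primvar are hypothetical names chosen for this example; they are not from the paper, and semanticLabel is not a USD standard.

```python
# Requires the open-source USD Python bindings: pip install usd-core
from pxr import Usd, UsdGeom, Sdf, Vt

def export_labelled_mesh(path, vertices, faces, face_labels):
    """Write a labelled triangle mesh to a USD stage.

    vertices    : (N, 3) float array of mesh vertices
    faces       : (M, 3) int array of triangle vertex indices
    face_labels : (M,)   int array of per-face semantic class IDs
    """
    stage = Usd.Stage.CreateNew(path)  # e.g. "scan.usda"
    mesh = UsdGeom.Mesh.Define(stage, "/World/ReconstructedMesh")

    mesh.CreatePointsAttr(Vt.Vec3fArray([tuple(v) for v in vertices.tolist()]))
    mesh.CreateFaceVertexCountsAttr(Vt.IntArray([3] * len(faces)))
    mesh.CreateFaceVertexIndicesAttr(Vt.IntArray(faces.flatten().tolist()))

    # One semantic class ID per face, stored as a uniform primvar; the
    # "semanticLabel" name is our convention, not a USD standard.
    pv = UsdGeom.PrimvarsAPI(mesh.GetPrim()).CreatePrimvar(
        "semanticLabel", Sdf.ValueTypeNames.IntArray, UsdGeom.Tokens.uniform)
    pv.Set(Vt.IntArray(face_labels.tolist()))

    stage.GetRootLayer().Save()
```

Storing labels as a per-face primvar keeps the semantics attached to the geometry inside the USD asset, so downstream XR or digital-twin tools can filter or re-shade the mesh by class without a side-channel file.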
Analogy / Intuitive Explanation
Imagine trying to reconstruct a 3D model of a complex building from only a few scattered points: it is like trying to assemble a picture from a handful of puzzle pieces. This research adds a special tool, a vision foundation model, that identifies the shapes and patterns of the building's features, such as walls, windows, and doors. Transferring these semantic labels from RGB images onto the LiDAR-inertial odometry map improves the accuracy and completeness of the 3D mesh reconstruction.
Paper Information
Categories: cs.CV, cs.RO
Published Date:
arXiv ID: 2604.09478v1