ICRA 2024 papers for SLAM / Spatial AI researchers and engineers

2D/3D Visual Perception

  • Vision-Language Frontier Maps for Zero-Shot Semantic Navigation
  • LPS-Net: Lightweight Parameter-Shared Network for Point Cloud-Based Place Recognition
  • ZS6D: Zero-Shot 6D Object Pose Estimation Using Vision Transformers

Deep Learning for Visual Perception

  • Energy-Based Detection of Adverse Weather Effects in LiDAR Data
  • Fast and Robust Point Cloud Registration with Tree-Based Transformer
  • VOLoc: Visual Place Recognition by Querying Compressed Lidar Map
  • Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development

Visual-Inertial SLAM

  • Field-VIO: Stereo Visual-Inertial Odometry Based on Quantitative Windows in Agricultural Open Fields
  • JacobiGPU: GPU-Accelerated Numerical Differentiation for Loop Closure in Visual SLAM

Localization

  • Salience-Guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments
  • Block-Map-Based Localization in Large-Scale Environment
  • Colmap-PCD: An Open-Source Tool for Fine Image-To-Point Cloud Registration
  • COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry
  • MegaParticles: Range-Based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter
  • Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization
  • SPOT: Point Cloud Based Stereo Visual Place Recognition for Similar and Opposing Viewpoints
  • Adaptive Outlier Thresholding for Bundle Adjustment in Visual SLAM
  • WayIL: Image-Based Indoor Localization with Wayfinding Maps
  • LocNDF: Neural Distance Field Mapping for Robot Localization
  • Looking beneath More: A Sequence-Based Localizing Ground Penetrating Radar Framework
  • SAGE-ICP: Semantic Information-Assisted ICP
  • HR-APR: APR-Agnostic Framework with Uncertainty Estimation and Hierarchical Refinement for Camera Relocalisation
  • Visual Localization in Repetitive and Symmetric Indoor Parking Lots Using 3D Key Text Graph
  • AnyLoc: Towards Universal Visual Place Recognition
  • CLIP-Loc: Multi-Modal Landmark Association for Global Localization in Object-Based Maps
  • LeagTag: An Elongated High-Accuracy Fiducial Marker for Tight Spaces
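Several of the entries above (SAGE-ICP, the scan-matching localizers) build on some variant of ICP. The sketch below is a minimal point-to-point ICP in NumPy with brute-force nearest neighbours and a closed-form Kabsch alignment step, for orientation only; it is not any of these papers' actual methods, which add semantics, robust kernels, and fast neighbour search:

```python
import numpy as np

def icp_point_to_point(src, dst, iters=20):
    """Align src (N,3) to dst (M,3); returns a 4x4 homogeneous transform.

    Brute-force nearest neighbours, so only suitable for small clouds.
    """
    T = np.eye(4)
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest-neighbour correspondences.
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[d.argmin(axis=1)]
        # Closed-form rigid alignment (Kabsch/Umeyama).
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        H = (cur - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:      # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        cur = cur @ R.T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T                  # accumulate incremental transforms
    return T
```

Real systems replace the O(NM) matching with a k-d tree or voxel hashing, and weight correspondences (e.g. by semantic class, as in SAGE-ICP).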

Mapping

  • ERASOR++: Height Coding Plus Egocentric Ratio Based Dynamic Object Removal for Static Point Cloud Mapping
  • Uncertainty-Aware 3D Object-Level Mapping with Deep Shape Priors
  • H2-Mapping: Real-Time Dense Mapping Using Hierarchical Hybrid Representation
  • Fast and Robust Normal Estimation for Sparse LiDAR Scans
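Several mapping pipelines above depend on per-point surface normals (the last entry is devoted to estimating them from sparse scans). The textbook baseline they improve on is local PCA: the normal at a point is the smallest-eigenvalue eigenvector of its neighbourhood covariance. A small illustrative sketch, not the paper's method:

```python
import numpy as np

def estimate_normals(points, k=8):
    """Estimate per-point normals of an (N,3) cloud by local PCA.

    Normal = eigenvector of the k-nearest-neighbour covariance with the
    smallest eigenvalue. Brute-force neighbour search; small clouds only.
    Sign of each normal is ambiguous without a viewpoint to orient it.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]       # k nearest (includes self)
    normals = np.empty_like(points)
    for i, idx in enumerate(nn):
        cov = np.cov(points[idx].T)         # 3x3 neighbourhood covariance
        w, v = np.linalg.eigh(cov)          # eigenvalues in ascending order
        normals[i] = v[:, 0]                # smallest-eigenvalue direction
    return normals
```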

SLAM

  • KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping
  • Campus Map: A Large-Scale Dataset to Support Multi-View VO, SLAM and BEV Estimation
  • CURL-MAP: Continuous Mapping and Positioning with CURL Representation
  • Design and Evaluation of a Generic Visual SLAM Framework for Multi Camera Systems
  • HERO-SLAM: Hybrid Enhanced Robust Optimization of Neural SLAM
  • Accurate Loop Closure with Panoptic Information and Scan Context++ for LiDAR-Based SLAM
  • S-Graphs+: Real-Time Localization and Mapping Leveraging Hierarchical Representations
  • Visual Place Recognition: A Tutorial
  • Semantically Guided Feature Matching for Visual SLAM
  • Effectively Detecting Loop Closures Using Point Cloud Density Maps
  • VOOM: Robust Visual Object Odometry and Mapping Using Hierarchical Landmarks
  • LIO-EKF: High Frequency LiDAR-Inertial Odometry Using Extended Kalman Filters
  • Multi-LIO: A Lightweight Multiple LiDAR-Inertial Odometry System
  • The Importance of Coordinate Frames in Dynamic SLAM
  • VoxelMap++: Mergeable Voxel Mapping Method for Online LiDAR(-Inertial) Odometry
  • Efficient and Consistent Bundle Adjustment on Lidar Point Clouds
  • ImMesh: An Immediate LiDAR Localization and Meshing Framework
  • DORF: A Dynamic Object Removal Framework for Robust Static LiDAR Mapping in Urban Environments
  • 2D-3D Object Shape Alignment for Camera-Object Pose Compensation in Object-Visual SLAM
  • HPF-SLAM: An Efficient Visual SLAM System Leveraging Hybrid Point Features
  • IBoW3D: Place Recognition Based on Incremental and General Bag of Words in 3D Scans
  • Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-Time Visual Scene Understanding
  • Tightly-Coupled LiDAR-Visual-Inertial SLAM and Large-Scale Volumetric-Occupancy Mapping
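LIO-EKF above revisits the extended Kalman filter for LiDAR-inertial odometry. For reference, one generic EKF predict/update cycle looks as follows; the function arguments (`f`, `F`, `h`, `H`) are illustrative placeholders for the motion and measurement models and their Jacobians, not that paper's API:

```python
import numpy as np

def ekf_step(x, P, u, z, f, F, h, H, Q, R):
    """One predict/update cycle of an extended Kalman filter.

    x, P : state mean and covariance
    u, z : control input and measurement
    f, h : (nonlinear) motion and measurement models
    F, H : functions returning their Jacobians at the given state
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate mean through the motion model, covariance
    # through its linearization.
    x_pred = f(x, u)
    Fx = F(x, u)
    P_pred = Fx @ P @ Fx.T + Q
    # Update: fuse the measurement via the Kalman gain.
    y = z - h(x_pred)                        # innovation
    Hx = H(x_pred)
    S = Hx @ P_pred @ Hx.T + R               # innovation covariance
    K = P_pred @ Hx.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ Hx) @ P_pred
    return x_new, P_new
```

In a LiDAR-inertial setting, `f` would integrate IMU readings and `h` would express the scan-registration constraint; the appeal of the EKF route is its constant, high-frequency cost compared to sliding-window optimization.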

Robot Learning

  • Open X-Embodiment: Robotic Learning Datasets and RT-X Models
  • SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
  • Robi Butler: Multimodal Remote Interaction with Household Robotic Assistants
  • OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
  • Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
  • What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments?

Localization and Mapping

  • VBR: A Vision Benchmark in Rome
  • Spatial-Aware Dynamic Lightweight Self-Supervised Monocular Depth Estimation
  • NISB-Map: Scalable Mapping with Neural Implicit Spatial Block
  • On the Study of Data Augmentation for Visual Place Recognition
  • MBFusion: A New Multi-Modal BEV Feature Fusion Method for HD Map Construction
  • OCC-VO: Dense Mapping Via 3D Occupancy-Based Visual Odometry for Autonomous Driving
  • NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping
  • LESS-Map: Lightweight and Evolving Semantic Map in Parking Lots for Long-Term Self-Localization
  • RO-MAP: Real-Time Multi-Object Mapping with Neural Radiance Fields
  • OctoMap-RT: Fast Probabilistic Volumetric Mapping Using Ray-Tracing GPUs

Robot Vision

  • NGEL-SLAM: Neural Implicit Representation-Based Global Consistent Low-Latency SLAM System
  • Ultrafast Square-Root Filter-Based VINS
  • Lifting 2D Pretrained Knowledge to 3D for Object Grounding

Representation Learning

  • NeRF-Loc: Transformer-Based Object Localization within Neural Radiance Fields

Calibration

  • Zero-Training LiDAR-Camera Extrinsic Calibration Method Using Segment Anything Model
  • Online Camera-LiDAR Calibration Monitoring and Rotational Drift Tracking
  • A Novel, Efficient and Accurate Method for Lidar Camera Calibration
  • An Extrinsic Calibration Method between LiDAR and GNSS/INS for Autonomous Driving
  • SGCalib: A Two-Stage Camera-LiDAR Calibration Method Using Semantic Information and Geometric Features
  • LiDAR-Camera Extrinsic Calibration with Hierarchical and Iterative Feature Matching
  • CalibFormer: A Transformer-Based Automatic LiDAR-Camera Calibration Network
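A common ingredient across these calibration papers is projecting LiDAR points into the image with the current extrinsics and scoring the residuals (explicitly so in the calibration-monitoring entry). A minimal pinhole projection sketch, with illustrative names and no lens distortion:

```python
import numpy as np

def project_points(pts_lidar, T_cam_lidar, K):
    """Project LiDAR points into the image plane with a pinhole model.

    pts_lidar   : (N,3) points in the LiDAR frame
    T_cam_lidar : 4x4 extrinsic transform, LiDAR frame -> camera frame
    K           : 3x3 camera intrinsics
    Returns pixel coordinates for points in front of the camera, plus
    the boolean mask of which input points those were.
    """
    pts_h = np.hstack([pts_lidar, np.ones((len(pts_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]   # into the camera frame
    in_front = pts_cam[:, 2] > 0                 # drop points behind camera
    uv = (K @ pts_cam[in_front].T).T
    return uv[:, :2] / uv[:, 2:3], in_front      # perspective division
```

A calibration monitor would compare these projections against image features (edges, intensity gradients) and flag drift when the residuals grow.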

Sensor Fusion

  • Influence of Camera-LiDAR Configuration on 3D Object Detection for Autonomous Driving
  • SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation
  • GICI-LIB: A GNSS/INS/Camera Integrated Navigation Library
  • LiDAR-Camera Calibration Using Intensity Variance Cost

Visual Perception and Learning

  • Bag of Views: An Appearance-Based Approach to Next-Best-View Planning for 3D Reconstruction
  • CopperTag: A Real-Time Occlusion-Resilient Fiducial Marker
  • Marrying NeRF with Feature Matching for One-Step Pose Estimation

Semantic Scene Understanding

  • Mapping High-Level Semantic Regions in Indoor Environments without Object Recognition
  • LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model As an Agent
  • Few-Shot Panoptic Segmentation with Foundation Models
  • Open-Fusion: Real-Time Open-Vocabulary 3D Mapping and Queryable Scene Representation
  • Mask4Former: Mask Transformer for 4D Panoptic Segmentation
  • Mask4D: End-To-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences
  • SG-RoadSeg: End-To-End Collision-Free Space Detection Sharing Encoder Representations Jointly Learned Via Unsupervised Deep Stereo
  • Collaborative Dynamic 3D Scene Graphs for Automated Driving
  • BroadBEV: Collaborative LiDAR-Camera Fusion for Broad-Sighted Bird’s Eye View Map Construction

Object Detection

  • Road Obstacle Detection Based on Unknown Objectness Scores
  • Efficient Semantic Segmentation for Compressed Video
  • Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving
  • Dynablox: Real-Time Detection of Diverse Dynamic Objects in Complex Environments
  • 3D Object Detection with VI-SLAM Point Clouds: The Impact of Object and Environment Characteristics on Model Performance
  • Uplifting Range-View-Based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion

Robotics with Large Language Models

  • Grasp-Anything: Large-Scale Grasp Dataset from Foundation Models

Autonomous Vehicle Navigation

  • Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
  • Monocular Localization with Semantics Map for Autonomous Vehicles
  • Talk2BEV: Language-Enhanced Bird’s-Eye View Maps for Autonomous Driving
  • FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View

Intelligent Transportation Systems

  • PCB-RandNet: Rethinking Random Sampling for LiDAR Semantic Segmentation in Autonomous Driving Scene

Localization and Navigation

  • 3D-BBS: Global Localization for 3D Point Cloud Scan Matching Using Branch-And-Bound Algorithm
  • DynaInsRemover: A Real-Time Dynamic Instance-Aware Static 3D LiDAR Mapping Framework for Dynamic Environment
  • Efficient 3D Instance Mapping and Localization with Neural Fields
  • MonoOcc: Digging into Monocular Semantic Occupancy Prediction

Object Detection and Pose Estimation

  • IFFNeRF: Initialisation Free and Fast 6DoF Pose Estimation from a Single Image and a NeRF Model

  • Toward Accurate Camera-Based 3D Object Detection Via Cascade Depth Estimation and Calibration
  • RGB-Based Category-Level Object Pose Estimation Via Decoupled Metric Scale Recovery

Vision Systems

  • Nvblox: GPU-Accelerated Incremental Signed Distance Field Mapping

Multi-Robot SLAM

  • Swarm-SLAM: Sparse Decentralized Collaborative Simultaneous Localization and Mapping Framework for Multi-Robot Systems
  • Asynchronous Multiple LiDAR-Inertial Odometry Using Point-Wise Inter-LiDAR Uncertainty Propagation
  • AutoMerge: A Framework for Map Assembling and Smoothing in City-Scale Environments

RGB-D Sensing and Perception

  • ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

Visual Learning

  • AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT

Learning in Field Robotics

  • TartanDrive 2.0: More Modalities and Better Infrastructure to Further Self-Supervised Learning Research in Off-Road Driving Tasks

Aerial Systems

  • High-Speed Stereo Visual SLAM for Low-Powered Computing Devices