ICRA 2024 papers for SLAM / Spatial AI researchers and engineers
2D/3D Visual Perception
- Vision-Language Frontier Maps for Zero-Shot Semantic Navigation
- LPS-Net: Lightweight Parameter-Shared Network for Point Cloud-Based Place Recognition
- ZS6D: Zero-Shot 6D Object Pose Estimation Using Vision Transformers
Deep Learning for Visual Perception
- Energy-Based Detection of Adverse Weather Effects in LiDAR Data
- Fast and Robust Point Cloud Registration with Tree-Based Transformer
- VOLoc: Visual Place Recognition by Querying Compressed Lidar Map
- Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development
Visual-Inertial SLAM
- Field-VIO: Stereo Visual-Inertial Odometry Based on Quantitative Windows in Agricultural Open Fields
- JacobiGPU: GPU-Accelerated Numerical Differentiation for Loop Closure in Visual SLAM
Localization
- Salience-Guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments
- Block-Map-Based Localization in Large-Scale Environment
- Colmap-PCD: An Open-Source Tool for Fine Image-To-Point Cloud Registration
- COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry
- MegaParticles: Range-Based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter
- Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization
- SPOT: Point Cloud Based Stereo Visual Place Recognition for Similar and Opposing Viewpoints
- Adaptive Outlier Thresholding for Bundle Adjustment in Visual SLAM
- WayIL: Image-Based Indoor Localization with Wayfinding Maps
- LocNDF: Neural Distance Field Mapping for Robot Localization
- Looking beneath More: A Sequence-Based Localizing Ground Penetrating Radar Framework
- SAGE-ICP: Semantic Information-Assisted ICP
- HR-APR: APR-Agnostic Framework with Uncertainty Estimation and Hierarchical Refinement for Camera Relocalisation
- Visual Localization in Repetitive and Symmetric Indoor Parking Lots Using 3D Key Text Graph
- AnyLoc: Towards Universal Visual Place Recognition
- CLIP-Loc: Multi-Modal Landmark Association for Global Localization in Object-Based Maps
- LeagTag: An Elongated High-Accuracy Fiducial Marker for Tight Spaces
Mapping
- ERASOR++: Height Coding Plus Egocentric Ratio Based Dynamic Object Removal for Static Point Cloud Mapping
- Uncertainty-Aware 3D Object-Level Mapping with Deep Shape Priors
- H2-Mapping: Real-Time Dense Mapping Using Hierarchical Hybrid Representation
- Fast and Robust Normal Estimation for Sparse LiDAR Scans
SLAM
- KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping
- Campus Map: A Large-Scale Dataset to Support Multi-View VO, SLAM and BEV Estimation
- CURL-MAP: Continuous Mapping and Positioning with CURL Representation
- Design and Evaluation of a Generic Visual SLAM Framework for Multi Camera Systems
- HERO-SLAM: Hybrid Enhanced Robust Optimization of Neural SLAM
- Accurate Loop Closure with Panoptic Information and Scan Context++ for LiDAR-Based SLAM
- S-Graphs+: Real-Time Localization and Mapping Leveraging Hierarchical Representations
- Visual Place Recognition: A Tutorial
- Semantically Guided Feature Matching for Visual SLAM
- Effectively Detecting Loop Closures Using Point Cloud Density Maps
- VOOM: Robust Visual Object Odometry and Mapping Using Hierarchical Landmarks
- LIO-EKF: High Frequency LiDAR-Inertial Odometry Using Extended Kalman Filters
- Multi-LIO: A Lightweight Multiple LiDAR-Inertial Odometry System
- The Importance of Coordinate Frames in Dynamic SLAM
- VoxelMap++: Mergeable Voxel Mapping Method for Online LiDAR(-Inertial) Odometry
- Efficient and Consistent Bundle Adjustment on Lidar Point Clouds
- ImMesh: An Immediate LiDAR Localization and Meshing Framework
- DORF: A Dynamic Object Removal Framework for Robust Static LiDAR Mapping in Urban Environments
- 2D-3D Object Shape Alignment for Camera-Object Pose Compensation in Object-Visual SLAM
- HPF-SLAM: An Efficient Visual SLAM System Leveraging Hybrid Point Features
- IBoW3D: Place Recognition Based on Incremental and General Bag of Words in 3D Scans
- Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-Time Visual Scene Understanding
- Tightly-Coupled LiDAR-Visual-Inertial SLAM and Large-Scale Volumetric-Occupancy Mapping
Manipulation
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models
- SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
- Robi Butler: Multimodal Remote Interaction with Household Robotic Assistants
- OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
- Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
- What Do We Learn from a Large-Scale Study of Pre-Trained Visual Representations in Sim and Real Environments?
Localization and Mapping
- VBR: A Vision Benchmark in Rome
- Spatial-Aware Dynamic Lightweight Self-Supervised Monocular Depth Estimation
- NISB-Map: Scalable Mapping with Neural Implicit Spatial Block
- On the Study of Data Augmentation for Visual Place Recognition
- MBFusion: A New Multi-Modal BEV Feature Fusion Method for HD Map Construction
- OCC-VO: Dense Mapping Via 3D Occupancy-Based Visual Odometry for Autonomous Driving
- NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping
- LESS-Map: Lightweight and Evolving Semantic Map in Parking Lots for Long-Term Self-Localization
- RO-MAP: Real-Time Multi-Object Mapping with Neural Radiance Fields
- OctoMap-RT: Fast Probabilistic Volumetric Mapping Using Ray-Tracing GPUs
Robot Vision
- NGEL-SLAM: Neural Implicit Representation-Based Global Consistent Low-Latency SLAM System
- Ultrafast Square-Root Filter-Based VINS
- Lifting 2D Pretrained Knowledge to 3D for Object Grounding
Representation Learning
- NeRF-Loc: Transformer-Based Object Localization within Neural Radiance Fields
Calibration
- Zero-Training LiDAR-Camera Extrinsic Calibration Method Using Segment Anything Model
- Online Camera-LiDAR Calibration Monitoring and Rotational Drift Tracking
- A Novel, Efficient and Accurate Method for Lidar Camera Calibration
- An Extrinsic Calibration Method between LiDAR and GNSS/INS for Autonomous Driving
- SGCalib: A Two-Stage Camera-LiDAR Calibration Method Using Semantic Information and Geometric Features
- LiDAR-Camera Extrinsic Calibration with Hierarchical and Iterative Feature Matching
- CalibFormer: A Transformer-Based Automatic LiDAR-Camera Calibration Network
Sensor Fusion
- Influence of Camera-LiDAR Configuration on 3D Object Detection for Autonomous Driving
- SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation
- GICI-LIB: A GNSS/INS/Camera Integrated Navigation Library
- LiDAR-Camera Calibration Using Intensity Variance Cost
Visual Perception and Learning
- Bag of Views: An Appearance-Based Approach to Next-Best-View Planning for 3D Reconstruction
- CopperTag: A Real-Time Occlusion-Resilient Fiducial Marker
- Marrying NeRF with Feature Matching for One-Step Pose Estimation
Semantic Scene Understanding
- Mapping High-Level Semantic Regions in Indoor Environments without Object Recognition
- LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model As an Agent
- Few-Shot Panoptic Segmentation with Foundation Models
- Open-Fusion: Real-Time Open-Vocabulary 3D Mapping and Queryable Scene Representation
- Mask4Former: Mask Transformer for 4D Panoptic Segmentation
- Mask4D: End-To-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences
- SG-RoadSeg: End-To-End Collision-Free Space Detection Sharing Encoder Representations Jointly Learned Via Unsupervised Deep Stereo
- Collaborative Dynamic 3D Scene Graphs for Automated Driving
- BroadBEV: Collaborative LiDAR-Camera Fusion for Broad-Sighted Bird’s Eye View Map Construction
Object Detection
- Road Obstacle Detection Based on Unknown Objectness Scores
- Efficient Semantic Segmentation for Compressed Video
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving
- Dynablox: Real-Time Detection of Diverse Dynamic Objects in Complex Environments
- 3D Object Detection with VI-SLAM Point Clouds: The Impact of Object and Environment Characteristics on Model Performance
- Uplifting Range-View-Based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion
Robotics with Large Language Models
- Grasp-Anything: Large-Scale Grasp Dataset from Foundation Models
Autonomous Vehicle Navigation
- Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
- Monocular Localization with Semantics Map for Autonomous Vehicles
- Talk2BEV: Language-Enhanced Bird’s-Eye View Maps for Autonomous Driving
- FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View
Intelligent Transportation Systems
- PCB-RandNet: Rethinking Random Sampling for LiDAR Semantic Segmentation in Autonomous Driving Scene
Localization and Navigation
- 3D-BBS: Global Localization for 3D Point Cloud Scan Matching Using Branch-And-Bound Algorithm
- DynaInsRemover: A Real-Time Dynamic Instance-Aware Static 3D LiDAR Mapping Framework for Dynamic Environment
- Efficient 3D Instance Mapping and Localization with Neural Fields
- MonoOcc: Digging into Monocular Semantic Occupancy Prediction
Object Detection and Pose Estimation
- IFFNeRF: Initialisation Free and Fast 6DoF Pose Estimation from a Single Image and a NeRF Model
- Toward Accurate Camera-Based 3D Object Detection Via Cascade Depth Estimation and Calibration
- RGB-Based Category-Level Object Pose Estimation Via Decoupled Metric Scale Recovery
Vision Systems
- Nvblox: GPU-Accelerated Incremental Signed Distance Field Mapping
Multi-Robot SLAM
- Swarm-SLAM: Sparse Decentralized Collaborative Simultaneous Localization and Mapping Framework for Multi-Robot Systems
- Asynchronous Multiple LiDAR-Inertial Odometry Using Point-Wise Inter-LiDAR Uncertainty Propagation
- AutoMerge: A Framework for Map Assembling and Smoothing in City-Scale Environments
RGB-D Sensing and Perception
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Visual Learning
- AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT
Learning in Field Robotics
- TartanDrive 2.0: More Modalities and Better Infrastructure to Further Self-Supervised Learning Research in Off-Road Driving Tasks
Aerial Systems
- High-Speed Stereo Visual SLAM for Low-Powered Computing Devices