FusionPortable: A Multi-Sensor Campus-Scene Dataset for Evaluation of Localization and Mapping Accuracy on Diverse Platforms

Jianhao Jiao, Hexiang Wei, Tianshuai Hu, Xiangcheng Hu, Yilong Zhu, Zhijian He, Jin Wu,
Jingwen Yu, Xupeng Xie, Huaiyang Huang, Ruoyu Geng, Lujia Wang, Ming Liu
Equal contribution. The Hong Kong University of Science and Technology (Guangzhou), Nansha, Guangzhou, 511400, Guangdong, China. The Hong Kong University of Science and Technology, Hong Kong, China. jjiao@connect.ust.hk, eelium@ust.hk. HKUST Shenzhen-Hong Kong Collaborative Innovation Research Institute, Futian, Shenzhen, China. Clear Water Bay Institute of Autonomous Driving, Hong Kong, China. Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China. This work was supported by Zhongshan Science and Technology Bureau Fund, under project 2020AG002, Foshan-HKUST Project no. FSUST20-SHCIRI06C, and the Project of Hetao Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone (HZQB-KCZYB-2020083), awarded to Prof. Ming Liu.
Abstract

Combining multiple sensors enables a robot to maximize its perceptual awareness of the environment and enhances its robustness to external disturbances, which is crucial to robotic navigation. This paper proposes the FusionPortable benchmark, a complete multi-sensor dataset with a diverse set of sequences for mobile robots, and makes three contributions. First, we develop a portable and versatile multi-sensor suite that offers rich sensory measurements: 10Hz LiDAR point clouds, 20Hz stereo frame images, high-rate asynchronous events from stereo event cameras, 200Hz inertial readings from an IMU, and 10Hz GPS signals. The sensors are temporally synchronized in hardware. The device is lightweight, self-contained, and offers plug-and-play support for mobile robots. Second, we construct a dataset of 17 sequences that cover a variety of campus environments, collected with multiple robot platforms. Some sequences are challenging for existing SLAM algorithms. Third, we provide ground truth for the decoupled evaluation of localization and mapping performance. We additionally evaluate state-of-the-art SLAM approaches and identify their limitations. The dataset, consisting of raw sensor measurements, ground truth, calibration data, and evaluated algorithms, will be released.

I Introduction

I-A Motivation

Multi-sensor fusion for robust perception is fundamental to many robotic applications. Different sensors complement each other, so fusing them enhances a system’s perception capability. Over the past decades, research on multi-sensor SLAM has made substantial progress, and high-quality open datasets, which collect multi-sensor data and provide a suite of benchmark tools, have contributed significantly to this advancement. On the one hand, these datasets spare researchers prohibitive budget and workforce requirements, such as system integration, calibration, and field operations. On the other hand, they expose the advantages and limitations of current SLAM solutions through carefully designed practical but challenging sequences [pomerleau2012challenging, wang2020tartanair]. Several of them also introduce novel sensors and point to future research opportunities [mueggler2017event]. Researchers can easily develop, validate, and rank their algorithms against others, thus accelerating breakthroughs. However, most existing datasets were collected with a single platform or a simplified sensor configuration. Researchers who develop algorithms with only a limited set of sensors risk over-fitting to a benchmark. Hence, we consider that a desirable dataset should fulfill the following four requirements.

  1. Various sensors are required, making it possible to explore novel approaches to utilize them jointly.

  2. Algorithms should be evaluated fairly on various mobile robots. These robots perform different motion patterns that may violate the assumptions of several SLAM algorithms.

  3. Sequences have to range from room-scale (meter-level) to large-scale (kilometer-level) environments to evaluate algorithms’ scalability.

  4. Ground-truth trajectories and 3D maps are required to evaluate algorithms’ localization and surface reconstruction accuracy, respectively.

Dataset Platform Environment Sensor GT Pose GT Map
IMU GPS LiDAR Frame Cam. Event Cam.
UZH-Event [mueggler2017event] Handheld In/Outdoors Mocap
ETH-EuRoC [burri2016euroc] MAV Indoors Mocap/LT Nova MS50
TUM VI [schubert2018tum] Handheld In/Outdoors Mocap
MIT DARPA [huang2010high] Car Urban GPS/INS
KITTI [geiger2013vision] Car Urban RTK-GPS/INS
Oxford RobotCar [maddern20171] Car Urban GPS/INS/SLAM
UrbanLoco [wen2020urbanloco] Car Urban GPS/INS
Newer College [ramezani2020newer] Handheld Outdoors 6DoF ICP BLK
NCLT [carlevaris2016university] UGV In/Outdoors RTK-GPS/SLAM
M2DGR [yin2021m2dgr] UGV In/Outdoors RTK-GPS/Mocap/LT
MVSEC [zhu2018multivehicle] Handheld/UAV/Motorcycle/Car In/Outdoors Mocap/SLAM
Ours (FusionPortable) Handheld/Quad. Robot/UGV In/Outdoors Mocap/RTK-GPS/6DoF NDT BLK
Mocap: Motion capture system. LT: Laser tracker.
TABLE I: Comparison with previous datasets on data-acquisition platform, environment, sensor type, and ground-truth method.

I-B Contributions

To our knowledge, no public dataset satisfies all of these requirements, which motivates us to propose a new SLAM benchmark.

This paper proposes the FusionPortable benchmark, a novel multi-sensor dataset with a set of sequences from diverse environments. Our contributions are threefold. First, we build a portable and versatile multi-sensor device. Two RGB frame cameras are mounted on the left and right sides, a high-frequency and high-precision IMU is mounted internally, and an RTK-GPS receiver is installed on top. Moreover, thanks to recent progress in sensor technology, event cameras and high-resolution 3D LiDARs are now available, so we also integrate them into our sensor rig and investigate their performance. All sensors are mounted on the same rigid aluminum-alloy structure, so their spatial relations deviate only slightly over time. The device has its own clock synchronization unit, processor, and battery, making it self-contained. Given its compact size, low weight, and extensibility (see Fig. 1), we expect it to serve as a plug-and-play sensor suite for various mobile robots.

Second, we construct the dataset by installing the sensor rig on various platforms that perform distinct motions: a handheld setup with a gimbal stabilizer, a quadruped robot, and an autonomous vehicle. The dataset covers various structured and semi-structured environments on the campus of The Hong Kong University of Science and Technology (HKUST), including a lab, a garden, a canteen, a corridor, an escalator, and an outdoor road. The collected sequences also exhibit environmental changes caused by external lighting, moving objects, and scene texture, all of which challenge SLAM algorithms.

Third, besides ground-truth poses, we also provide ground-truth maps for most indoor sequences, since we consider measuring mapping accuracy crucial for evaluation. We also benchmark several state-of-the-art (SOTA) SLAM systems, including two vision-based methods and four LiDAR-based approaches. To benefit the community, the dataset will be publicly released at https://ram-lab.com/file/site/multi-sensor-dataset.

II Related Work

There are extensive datasets for robotic perception. Here, we introduce related works with a focus on SLAM.

Several datasets were designed specifically for one type of sensor. Mueggler et al. [mueggler2017event] proposed an event camera dataset to address the illumination and motion-blur issues of frame cameras. Pomerleau et al. [pomerleau2012challenging] proposed a point cloud dataset that covers a large spectrum of environmental structures to challenge registration algorithms. Handa et al. [handa2014benchmark] promoted research on RGB-D cameras by publishing the ICL-NUIM dataset.

By complementing vision sensors with inertial measurements, visual-inertial odometry (VIO) approaches can greatly improve camera tracking accuracy and robustness, and relevant datasets have been reported. Burri et al. [burri2016euroc] presented the EuRoC dataset, collected by a micro aerial vehicle (MAV) in an industrial environment and a room. Schubert et al. [schubert2018tum] put forward the TUM VI benchmark, consisting of handheld sequences with careful photometric calibration.

The DARPA challenge has driven the development of autonomous vehicles. Huang et al. [huang2010high] presented the MIT DARPA dataset with over sequence. Geiger et al. [geiger2013vision] presented the KITTI driving benchmark, in which diverse perception tasks are explored. Other datasets target long-term navigation [barnes2020oxford] and urban challenges [wen2020urbanloco].

Fig. 1: The multi-sensor device and data collection platform: (a) CAD model of the sensor rig, where axis directions are colored red: x, green: y, blue: z. The sensor rig is rigidly mounted on (b) a gimbal stabilizer, (c) a quadruped robot, and (d) an Apollo autonomous vehicle.

Several datasets were collected with handheld devices and other types of ground robots. Ramezani et al. [ramezani2020newer] collected the Newer College Dataset with a handheld device. The NCLT dataset [carlevaris2016university] facilitated long-term SLAM research by collecting sequences on a college campus, over traverse and months. The M2DGR dataset [yin2021m2dgr], collected with a ground robot, covers various challenging scenarios such as entering lifts and indoor-outdoor transitions. Zhu et al. [zhu2018multivehicle] proposed a multi-vehicle dataset for event-based perception.

Table I compares existing datasets with our work. In summary, our dataset is more complete in three aspects: 1) raw and rich sensory measurements; 2) data collected on three different platforms, including a legged robot; and 3) ground-truth trajectories and 3D maps for algorithm evaluation.

III System Overview

This section introduces the sensors used in our dataset and how we perform spatio-temporal calibration between them. Fig. 1 shows the handheld device equipped with multiple sensors and how it is mounted on the three data collection platforms.

III-A Sensor Configuration

Sensors’ characteristics are listed in Table II. We use an Intel NUC running Ubuntu to run the sensor drivers, timestamp sensor messages, and record them into ROS bags. The PC uses an i processor, TB solid-state drive (SSD), and GB DDR4 memory. Below, we describe each sensor in detail.

III-A1 3D LiDARs

We use the OS- LiDAR to provide accurate measurements of the surrounding environment. This LiDAR has two attractive properties. First, an internally synchronized IMU outputs Hz linear accelerations and angular velocities. Second, it additionally outputs depth, signal, and ambient images of the surroundings.

III-A2 Stereo Frame Cameras

Two FLIR BFS-U3-31S4C global-shutter color cameras are mounted on either side of the system, facing forward. They are synchronized by an external trigger and capture high-resolution images at fps. Their exposure times are fixed to minimize the relative latency. Our experiments show that the average timestamp difference between the two cameras’ images is below .

III-A3 Stereo Event Cameras

Two event cameras are also configured. They possess several desirable properties: high temporal resolution, high dynamic range, and low power consumption. The cameras have a resolution and provide high-rate output from an internal IMU. The two event cameras are synchronized by sync pulses that the left camera (master) sends to the right camera (slave) through an external wire. However, the image acquisition cannot be synchronized this way (around - offset). To suppress the LiDAR’s laser light, both cameras are equipped with additional infrared filters. For indoor sequences, we manually set and fix the APS exposures, which helps to minimize the latency between cameras. For outdoor sequences, we use auto-exposure to avoid over- or under-exposure.

Fig. 2: Scene images of places of several sequences: (a) Garden, (b) Building, (c) Campus Road.

III-A4 Inertial Measurement Unit

A tactical-grade STIM IMU rigidly mounted below the LiDAR serves as the main inertial sensor of the system. It features a high update rate (Hz) with low measurement noise and drift. Its bias instability is around .

III-A5 Global Positioning System

We additionally install a ZED-F9P RTK-GPS receiver on top of the LiDAR. In outdoor scenes, the GPS is activated and provides accurate latitude, longitude, and altitude readings. However, these readings may become unstable when buildings occlude the satellite signals.
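Fusing RTK-GPS fixes with LiDAR-inertial estimates typically requires expressing latitude, longitude, and altitude in a local metric frame. The Python sketch below shows the standard WGS84 to ECEF to East-North-Up (ENU) conversion. Choosing a reference fix (for example, the first fix of a sequence) is our assumption, and the dataset’s own tools may use a different local frame.

```python
import math

# WGS84 ellipsoid constants
WGS84_A = 6378137.0                   # semi-major axis [m]
WGS84_F = 1.0 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert WGS84 latitude/longitude/altitude to Earth-centered, Earth-fixed coordinates."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    sin_lat, cos_lat = math.sin(lat), math.cos(lat)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * sin_lat ** 2)  # prime vertical radius
    x = (n + alt_m) * cos_lat * math.cos(lon)
    y = (n + alt_m) * cos_lat * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * sin_lat
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, alt_m, lat0_deg, lon0_deg, alt0_m):
    """Express a GNSS fix in a local East-North-Up frame anchored at a reference fix."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, alt_m)
    x0, y0, z0 = geodetic_to_ecef(lat0_deg, lon0_deg, alt0_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    sl, cl = math.sin(math.radians(lat0_deg)), math.cos(math.radians(lat0_deg))
    so, co = math.sin(math.radians(lon0_deg)), math.cos(math.radians(lon0_deg))
    # Rotate the ECEF offset into the tangent plane at the reference fix.
    east = -so * dx + co * dy
    north = -sl * co * dx - sl * so * dy + cl * dz
    up = cl * co * dx + cl * so * dy + sl * dz
    return east, north, up
```

With the first fix as reference, the resulting positions are in meters and can be compared with a LiDAR-inertial trajectory once the two frames are aligned.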

III-B Sensor Calibration

We carefully calibrate the intrinsics of individual sensors, the extrinsics, and the overall time latency between sensors in advance. We define the coordinate system of the STIM IMU as the body frame. Calibration data and reports are provided on the dataset website.

III-B1 Clock Synchronization

We use an FPGA to generate an external trigger signal that synchronizes the clocks of all sensors, which guarantees data collection across multiple sensors with minimal latency. The FPGA receives a pulse-per-second (PPS) signal from the GPS and outputs Hz trigger signals to the IMU, cameras, and LiDAR. In GPS-denied scenes, the FPGA switches to its internal clock so that time synchronization is maintained.
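The PPS-disciplined triggering described above can be illustrated with a small simulation. The Python sketch below is a hypothetical software model, not the FPGA firmware: each second, triggers are re-anchored to the PPS edge when one arrives and are otherwise propagated with the internal oscillator. The trigger rate is a parameter because its value is not given here, and `internal_period_s` (the true length of one internally counted second) is an illustrative assumption.

```python
from typing import List, Optional


def trigger_timestamps(pps_edges: List[Optional[float]], rate_hz: int,
                       internal_period_s: float = 1.0) -> List[float]:
    """Simulate trigger times of a PPS-disciplined trigger generator.

    pps_edges[k] is the time of the PPS edge for nominal second k, or None if
    no PPS edge arrived in that second (e.g., a GPS-denied scene).
    """
    triggers: List[float] = []
    anchor: Optional[float] = None
    for edge in pps_edges:
        if edge is not None:
            anchor = edge                  # re-align to GPS time
            period = 1.0
        elif anchor is not None:
            anchor += internal_period_s    # free-run on the internal clock
            period = internal_period_s
        else:
            continue                       # no time reference yet: emit nothing
        # Evenly divide this second into rate_hz trigger pulses.
        triggers.extend(anchor + i * period / rate_hz for i in range(rate_hz))
    return triggers
```

Any drift of the internal oscillator accumulates only while PPS is absent and is removed at the next PPS edge.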

Sensor Characteristics
3D LiDAR: OS-, range@Hz; FOV: vert., horiz.; Image: @Hz; IMU: ICM@Hz, -axis MEMS, intrinsic calibrated
Frame Camera: Stereo color cameras: FLIR BFS-U-SC; Resolution: , global shutter@Hz; FOV: vert., horiz.
Event Camera: Stereo color event cameras: DAVIS346; Resolution: ; FOV: vert., horiz.; IMU: MPU@Hz, -axis MEMS, intrinsic calibrated
IMU: STIM@Hz, Bias Instability , Allan Var. @
GPS: ZED-F9P RTK-GPS@Hz, concurrent GNSS, L1/L2/L5 RTK
TABLE II: Sensors and characteristics

III-B2 Stereo Camera Calibration

The intrinsics and extrinsics of our stereo frame and event cameras are estimated using the MATLAB calibration toolbox with the pinhole camera model and the radial-tangential distortion model. We move the sensor suite in front of a checkerboard to collect a sequence of images, evenly sample images as calibration data, and manually remove outliers with high reprojection errors.
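The projection model named above can be written out explicitly. The Python sketch below implements the pinhole model with two radial (k1, k2) and two tangential (p1, p2) coefficients, in the form used by common toolboxes such as MATLAB and OpenCV (MATLAB can optionally add a third radial term, omitted here). The helper that keeps views by mean reprojection error uses a threshold chosen only for illustration. The paper removes outliers manually.

```python
import numpy as np


def project_radtan(points_cam, fx, fy, cx, cy, k1, k2, p1, p2):
    """Project (N, 3) camera-frame points (z > 0) to pixels with radial-tangential distortion."""
    x = points_cam[:, 0] / points_cam[:, 2]   # normalized image coordinates
    y = points_cam[:, 1] / points_cam[:, 2]
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return np.stack([fx * x_d + cx, fy * y_d + cy], axis=1)


def keep_low_error_views(errors_px, max_error_px=0.5):
    """Indices of calibration images whose mean reprojection error is acceptable (hypothetical threshold)."""
    return [i for i, e in enumerate(errors_px) if e <= max_error_px]
```

The reprojection error of a view is the pixel distance between detected checkerboard corners and the corners projected with this model.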

III-B3 Camera-IMU Extrinsic Calibration

The intrinsics of the IMUs are calibrated using the Allan variance toolbox (https://github.com/ori-drs/allan_variance_ros), which estimates the noise density and random walk of gyroscope and accelerometer measurements. After that, the spatial and temporal parameters of each camera w.r.t. an IMU are obtained with Kalibr [furgale2013unified]. Our system contains four IMUs: the STIM, the ICM in the LiDAR, and the two MPUs in the DAVIS346 event cameras. We therefore calibrate the intrinsics of these IMUs and estimate the extrinsics of the following sensor pairs: (STIM, frame cameras), (STIM, event cameras), (left MPU, left DAVIS346), and (right MPU, right DAVIS346).
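The noise parameters mentioned above are commonly read from an Allan deviation plot. For white noise, the curve has slope -1/2 and its value at tau = 1 s equals the noise density. The random-walk terms show up at longer averaging times. The Python sketch below computes the overlapping Allan deviation of a rate signal. It is a generic illustration, not the allan_variance_ros implementation, and the sampling rate and averaging times are free parameters.

```python
import numpy as np


def allan_deviation(rate_samples, fs_hz, taus_s):
    """Overlapping Allan deviation of a rate signal (e.g., gyroscope in rad/s).

    Returns one value per requested averaging time; taus shorter than one
    sample or too long for a complete cluster pair yield NaN.
    """
    rate_samples = np.asarray(rate_samples, dtype=float)
    theta = np.cumsum(rate_samples) / fs_hz      # integrate rate into angle
    n = theta.size
    adev = []
    for tau in taus_s:
        m = int(round(tau * fs_hz))              # cluster size in samples
        if m < 1 or 2 * m >= n:
            adev.append(np.nan)
            continue
        tau_m = m / fs_hz                        # averaging time actually used
        d = theta[2 * m:] - 2.0 * theta[m:n - m] + theta[:n - 2 * m]
        avar = np.sum(d ** 2) / (2.0 * tau_m ** 2 * (n - 2 * m))
        adev.append(np.sqrt(avar))
    return np.array(adev)
```

For a white-noise signal with per-sample standard deviation sigma at rate fs, the deviation at tau = 1 s is close to sigma / sqrt(fs), the noise density.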

III-B4 Camera-LiDAR Extrinsic Calibration

Given initial extrinsics, we further refine the camera-LiDAR extrinsics. A checkerboard serves as the calibration target, providing distinctive corners and boundaries for data association. We extend the work of Zhou et al. [zhou2018automatic] by improving the feature extraction and matching steps: we instead extract the outer corners of the board from both point clouds and images. The extrinsics are then optimized by minimizing the distances between all corresponding corners.
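Suppose the board’s outer corners are available as 3D points in both the LiDAR frame and the camera frame. For the camera, one way to obtain them is to estimate the board pose from the image using the known board geometry, which is our assumption here. Minimizing the sum of squared corner distances over a rigid transform then has a closed-form SVD solution (the Kabsch method, i.e., Umeyama without scale). The Python sketch below shows this. The authors’ refinement may use a different cost or an iterative solver.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Least-squares R, t such that dst ≈ R @ src + t for corresponding (N, 3) points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)         # 3 x 3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, dst_c - r @ src_c


def mean_corner_distance(r, t, src, dst):
    """Mean Euclidean distance between transformed LiDAR corners and camera corners."""
    return float(np.mean(np.linalg.norm(src @ r.T + t - dst, axis=1)))
```

Here `src` holds LiDAR-frame corners and `dst` camera-frame corners, so the returned (R, t) maps LiDAR points into the camera frame.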

IV Dataset Description

Fig. 3: Sample sensor measurements: (a) Canteen, (b) Escalator, (c) Corridor, (d) Road, (e) MCR, (f) Building, (g) Road, (h) Canteen, (i) Road. (a)-(d): images captured by the frame camera. (e)-(g): images augmented by positive events (red) and negative events (blue). (h)-(i): 3D point clouds of the LiDAR. The grid size is .

This section first introduces the overall features of the sequences, which served as our basic criteria for data collection, and then describes the ground-truth estimation method and the dataset format.

IV-A Sequences

The collected sequences should cover various environments, lighting conditions, motion patterns, dynamic objects, etc. We categorize major characteristics of our collected sequences as follows:

  1. Location: Environmental locations are divided into indoors and outdoors. GPS signal is available but sometimes unstable in outdoor environments.

  2. Structure: Structured environments can mainly be described by geometric primitives (e.g., offices or buildings), while semi-structured environments contain both geometric and complex elements such as trees and sundries. Scenarios like narrow corridors are structured but may cause state estimators to degenerate.

  3. Lighting Condition: Frame cameras are sensitive to external lighting conditions. Both weak and strong light may raise challenges to visual processing algorithms.

  4. Appearance: Texture-rich scenes help visual algorithms extract stable features (e.g., points and lines), while textureless scenes may degrade their performance. Texture-rich scenes also trigger many events.

  5. Motion Pattern: Motions may be slow, normal, or fast. Regarding the platforms, the handheld device performs arbitrary 6-DoF and jerky motions, the device mounted on a gimbal stabilizer performs 6-DoF but stable motions, and the quadruped robot mostly performs planar but jerky motions. In contrast, the vehicle performs planar motions at a constant speed.

  6. Object Motion: In dynamic environments, several elements (e.g., pedestrians or cars) move while the data are captured, and the longer the capture, the more these elements alter the scene [pomerleau2012challenging]. In contrast, static environments contain few moving objects.

Platform Sequence T D Location Structure Lighting Texture Motion Object GT Pose GT Map
Handheld canteen_night indoors structured weak rich 6-DoF static 6-DoF NDT Yes
canteen_day indoors structured normal rich 6-DoF static 6-DoF NDT Yes
garden_night indoors structured weak rich 6-DoF static 6-DoF NDT Yes
garden_day indoors structured normal rich 6-DoF static 6-DoF NDT Yes
corridor_day indoors structured weak less 6-DoF static 6-DoF NDT Yes
escalator_day indoors structured strong rich 6-DoF, height changes dynamic 6-DoF NDT Yes
building_day indoors structured normal rich 6-DoF dynamic 6-DoF NDT Yes
MCR_slow indoors semi-structured normal rich 6-DoF, jerky static OptiTrack Yes
MCR_normal indoors semi-structured normal rich 6-DoF, jerky static OptiTrack Yes
MCR_fast indoors semi-structured normal rich 6-DoF, jerky static OptiTrack Yes
Quadruped Robot MCR_slow_ indoors semi-structured normal rich planar, jerky static OptiTrack Yes
MCR_slow_ indoors semi-structured normal rich planar, jerky static OptiTrack Yes
MCR_normal_ indoors semi-structured normal rich planar, jerky static OptiTrack Yes
MCR_normal_ indoors semi-structured normal rich planar, jerky static OptiTrack Yes
MCR_fast_ indoors semi-structured normal rich planar, jerky static OptiTrack Yes
MCR_fast_ indoors semi-structured normal rich planar, jerky static OptiTrack Yes
Apollo campus_road outdoors semi-structured normal rich planar dynamic SLAM No
T: Total time. D: Total distance traveled. MCR: motion capture room. : Mean linear velocity.
TABLE III: Some statistics and features of each sequence
Fig. 4: Ground-truth colored point clouds of the motion capture room, corridor, and building scenarios, recorded by the Leica BLK360 laser scanner: (a) Motion Capture Room, (b) Building. They are used to generate ground-truth trajectories and to evaluate algorithms’ reconstruction accuracy.

Table III summarizes the key features of each sequence, Fig. 2 shows several scene pictures, and Fig. 3 illustrates sample sensor data. The motion capture room is abbreviated as MCR in the following sections.

IV-B Ground-Truth Generation

Most sequences provide ground-truth poses for algorithm evaluation. In several indoor scenes, we also provide ground-truth maps of surrounding environments. The ground truth generation is detailed as follows:

  • Ground-truth maps: In small- and middle-scale environments, we use the Leica BLK360 laser scanner to scan each structure from multiple locations, producing a high-resolution, colorized, dense 3D map with millimeter accuracy. Fig. 4 visualizes three examples.

  • Ground-truth poses: In the motion capture room, we use the OptiTrack system to measure the pose of the center of the reflective markers at Hz with millimeter accuracy. The OptiTrack system is connected directly to the same PC to record poses, which minimizes time latency. The extrinsics from the markers’ center to the body frame of the sensor rig are solved with hand-eye calibration. In middle-scale environments covered by the ground-truth maps, we employ NDT-based 6-DoF localization [koide2019portable] to estimate the LiDAR’s poses in the prior map as the ground-truth trajectory. In outdoor environments, we fuse RTK-GPS signals with LiDAR-inertial measurements using LIO-SAM [shan2020lio] to obtain accurate trajectories.
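The hand-eye step mentioned above solves A_i X = X B_i. Here A_i and B_i are relative motions of the two frames over the same time intervals, and X is the unknown rigid transform between them. The Python sketch below follows the spirit of Park and Martin’s closed-form approach. It solves the rotation as an orthogonal Procrustes problem on rotation vectors and the translation by linear least squares. Which trajectory supplies A_i and which supplies B_i (for example, OptiTrack marker poses versus body-frame poses from another estimator) is our assumption, since the authors’ solver is not specified.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def solve_hand_eye(motions_a, motions_b):
    """Solve A_i X = X B_i for a 4x4 rigid transform X.

    motions_a, motions_b: lists of 4x4 relative motions over the same intervals;
    at least two motions with non-parallel rotation axes are required.
    """
    alphas = np.array([Rotation.from_matrix(a[:3, :3]).as_rotvec() for a in motions_a])
    betas = np.array([Rotation.from_matrix(b[:3, :3]).as_rotvec() for b in motions_b])
    # Rotation: alpha_i = R_X beta_i, solved as an orthogonal Procrustes problem.
    u, _, vt = np.linalg.svd(betas.T @ alphas)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r_x = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Translation: (R_A - I) t_X = R_X t_B - t_A, stacked over all motions.
    lhs = np.vstack([a[:3, :3] - np.eye(3) for a in motions_a])
    rhs = np.concatenate([r_x @ b[:3, 3] - a[:3, 3] for a, b in zip(motions_a, motions_b)])
    t_x, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    x = np.eye(4)
    x[:3, :3], x[:3, 3] = r_x, t_x
    return x
```

Motions with varied rotation axes make both the rotation and the stacked translation system well conditioned.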

IV-C Data Format and Post-Processing

Data were collected in the ROS environment. We provide both ROS bags and individual data files for convenience:

  1. env.bag is the raw rosbag obtained from the data collection process. It can be parsed using ROS tools.

  2. env_ref.bag is the refined rosbag, in which sensor data are post-processed with the steps described below.

  3. data/ stores individual sensor data extracted from env.bag. The timestamp of each data item can be retrieved from timestamps.txt.

  4. data_ref_kitti/ follows the KITTI format [geiger2013vision] to store sensor data from data/.

We post-process the raw data in three steps to generate env_ref.bag: 1) missing measurements caused by imperfect IMUs (such as the MPU) are filled by linear interpolation; 2) poses provided by the motion capture system are transformed into the body frame using the hand-eye calibration results; and 3) event packets are republished at around Hz for several event-based algorithms [zhou2021event].
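Step 1) above can be sketched as follows. The Python function detects gaps longer than `gap_factor` nominal periods and inserts linearly interpolated samples between the neighboring measurements, leaving the original samples untouched. The gap threshold and the sample layout (one row per measurement, e.g., gyroscope and accelerometer axes) are our assumptions rather than the dataset’s documented procedure.

```python
import numpy as np


def fill_imu_gaps(timestamps, samples, nominal_rate_hz, gap_factor=1.5):
    """Insert linearly interpolated IMU samples where measurements are missing.

    timestamps: (N,) increasing times in seconds.
    samples: (N, K) measurements, e.g., K = 6 for gyroscope + accelerometer axes.
    """
    t = np.asarray(timestamps, dtype=float)
    x = np.asarray(samples, dtype=float)
    period = 1.0 / nominal_rate_hz
    out_t, out_x = [t[0]], [x[0]]
    for i in range(1, t.size):
        dt = t[i] - t[i - 1]
        if dt > gap_factor * period:               # assumed gap-detection rule
            n_missing = int(round(dt / period)) - 1
            for k in range(1, n_missing + 1):
                w = k / (n_missing + 1)            # fractional position inside the gap
                out_t.append(t[i - 1] + w * dt)
                out_x.append((1.0 - w) * x[i - 1] + w * x[i])
        out_t.append(t[i])
        out_x.append(x[i])
    return np.array(out_t), np.array(out_x)
```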

Unrectified RGB images are stored. Events are stored with timestamps, pixel locations, and polarities. IMU measurements are stored with timestamps, gyroscope measurements, accelerometer measurements, and covariances. Calibration parameters are stored in YAML files.
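The event layout and the republishing step described above can be illustrated together. The Python sketch below uses a hypothetical `Event` record with a timestamp, pixel location, and polarity, and groups a time-sorted stream into packets of fixed duration 1/rate_hz. The actual republishing rate is elided in the text, so `rate_hz` is a free parameter. The ROS message types used by the dataset are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    t: float        # timestamp [s]
    x: int          # pixel column
    y: int          # pixel row
    polarity: bool  # True for a positive (brightness-increase) event


def pack_events(events: List[Event], rate_hz: float) -> List[List[Event]]:
    """Group a time-sorted event stream into fixed-duration packets."""
    if not events:
        return []
    window = 1.0 / rate_hz
    t0 = events[0].t
    packets: List[List[Event]] = []
    for ev in events:
        idx = int((ev.t - t0) // window)
        while len(packets) <= idx:
            packets.append([])  # keep empty windows so packet i covers [t0 + i*window, t0 + (i+1)*window)
        packets[idx].append(ev)
    return packets
```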

V Experiment

As one application, this dataset can be used to benchmark SOTA SLAM systems. Here, we evaluate several open-source systems with different sensor combinations and methodologies: VINS-Fusion (IMU+stereo frame cameras) [qin2019general], ESVO (stereo event cameras) [zhou2021event], A-LOAM (LiDAR-only) [zhang2014loam], LIO-Mapping (IMU+LiDAR) [ye2019tightly], LIO-SAM (IMU+LiDAR) [shan2020lio], and FAST-LIO2 (IMU+LiDAR) [xu2022fast]. Their data loaders are modified to fit our dataset format and are also released. We compute the mean absolute trajectory error (ATE) of the estimated trajectories w.r.t. the ground truth. For LiDAR-based systems, we also report the mapping accuracy on two sequences by computing the mean point-to-point error of the algorithms’ maps w.r.t. the ground-truth maps.
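The mean ATE mentioned above can be computed as in the Python sketch below. It assumes the estimated and ground-truth positions have already been associated by timestamp and aligns them with a rigid SE(3) transform without scale, which suits metric stereo and LiDAR systems. The exact association and alignment used for Table IV are not stated, so treat this as an illustration. Many tools report the RMSE rather than the mean.

```python
import numpy as np


def align_rigid(est, gt):
    """SE(3) alignment (no scale) of estimated positions onto ground-truth positions."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    u, _, vt = np.linalg.svd((est - mu_e).T @ (gt - mu_g))
    d = np.sign(np.linalg.det(vt.T @ u.T))      # avoid reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, mu_g - r @ mu_e


def mean_ate(est, gt):
    """Mean translational ATE after alignment; rows of est and gt are time-associated positions."""
    r, t = align_rigid(est, gt)
    aligned = est @ r.T + t
    return float(np.mean(np.linalg.norm(aligned - gt, axis=1)))
```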

Fig. 5: Trajectories of the algorithms on four sequences w.r.t. the ground truth: (a) MCR_fast_00, (b) Campus_road_day, (c) Garden_day, and (d) Escalator_day.
Platform Sequence VINS-Fusion (LC) A-LOAM LIO-Mapping LIO-SAM FAST-LIO2
Handheld canteen_night
canteen_day
garden_night
garden_day
corridor_day
escalator_day
building_day
MCR_slow
MCR_normal
MCR_fast
Quad. Robot MCR_slow_
MCR_slow_
MCR_normal_
MCR_normal_
MCR_fast_
MCR_fast_
Apollo campus_road_day
TABLE IV: Localization accuracy.
Fig. 6: Evaluation of (a) A-LOAM’s and (b) LIO-SAM’s mapping accuracy: (a) Corridor_day, (b) Garden_day.

The quantitative localization results are reported in Table IV. “LC” indicates that the loop closure module is used. “” means that the algorithm fails to finish the sequence. ESVO’s results are not shown here since it cannot complete the sequences: it requires events to be triggered continuously to generate reliable time-surface maps for camera tracking, but all these sequences contain textureless scenes or static motion. Its intermediate mapping and tracking results are shown on the dataset website. VINS-Fusion and FAST-LIO2 fail in some cases because they cannot initialize well at the beginning of the sequence. Without the aid of an IMU, A-LOAM cannot handle jerky and rapid motion and thus performs poorly on two MCR sequences and on all quadruped-robot sequences. Although FAST-LIO2 achieves superior real-time performance thanks to its filter-based state estimator and efficient tree structure, it sometimes produces unreliable results. Surprisingly, LIO-SAM performs well on all quadruped-robot sequences, even under large rotations and fast motion. The corridor_day sequence is challenging for all methods, since the scene is both textureless and structureless.
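The ESVO failure mode described above follows from how a time surface is built. The Python sketch below computes a common exponential-decay form, T(x, y) = exp(-(t_ref - t_last(x, y)) / tau), where t_last is the most recent event timestamp at each pixel. The decay constant tau and the single-polarity treatment are our simplifications, and ESVO’s own implementation may differ in details. When the camera is static or the scene is textureless, few new events arrive, so the surface decays toward zero and offers little structure to track.

```python
import numpy as np


def time_surface(events, width, height, t_ref, tau):
    """Exponential-decay time surface at time t_ref.

    events: iterable of (t, x, y) tuples; events later than t_ref are ignored.
    Pixels that have not fired evaluate to 0.
    """
    last_t = np.full((height, width), -np.inf)
    for t, x, y in events:
        if t <= t_ref:
            last_t[y, x] = max(last_t[y, x], t)  # keep the most recent event per pixel
    return np.exp(-(t_ref - last_t) / tau)       # exp(-inf) == 0 for silent pixels
```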

We also evaluate the mapping quality of A-LOAM and LIO-SAM on the corridor_day and garden_day sequences. The distance maps are shown in Fig. 6. The mean distances are and , respectively. In the corridor in particular, A-LOAM’s map exhibits a large drift along the -axis.
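The mean point-to-point error used for these mapping results can be computed with a nearest-neighbor query. The Python sketch below assumes the estimated map has already been registered to the ground-truth map frame and uses SciPy’s `cKDTree`. It returns both the mean distance and the per-point distances that a colored distance map like Fig. 6 would visualize. Outlier truncation or downsampling, which evaluations often apply, is not included.

```python
import numpy as np
from scipy.spatial import cKDTree


def point_to_point_error(estimated_map, reference_map):
    """Distance from each estimated point to its nearest ground-truth point.

    Both inputs are (N, 3) arrays expressed in the same (already aligned) frame.
    Returns the mean distance and the per-point distances.
    """
    tree = cKDTree(np.asarray(reference_map, dtype=float))
    distances, _ = tree.query(np.asarray(estimated_map, dtype=float), k=1)
    return float(np.mean(distances)), distances
```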

VI Conclusion

This paper presented the FusionPortable benchmark, a multi-sensor dataset covering diverse campus scenes on various platforms. We developed a self-contained, plug-and-play multi-sensor rig that significantly enhances the perception capability of mobile robots. With the release of this dataset, we intend to challenge current SLAM approaches and encourage future research. As future work, we plan to extend this dataset beyond campus-scale environments.

References