In this article, the Laserscanner Multi-Fisheye Camera Dataset (LaFiDa) for benchmarking is presented. A head-mounted multi-fisheye camera system combined with a mobile laserscanner was utilized to capture the benchmark datasets. In addition, accurate six degrees of freedom (6 DoF) ground-truth poses were obtained from a motion capture system with a sampling rate of 360 Hz. Multiple sequences were recorded in indoor and outdoor environments, comprising different motion characteristics, lighting conditions, and scene dynamics. The provided sequences consist of images from three fisheye cameras, fully synchronized by hardware trigger, combined with a mobile laserscanner on the same platform. In total, six trajectories are provided. Each trajectory also comprises intrinsic and extrinsic calibration parameters and related measurements for all sensors. Furthermore, we generalize the most common toolbox for extrinsic laserscanner-to-camera calibration to work with arbitrary central cameras, such as omnidirectional or fisheye projections. The benchmark dataset is available online, released under the Creative Commons Attribution License (CC-BY 4.0), and it contains raw sensor data and specifications such as timestamps, calibration, and evaluation scripts. The provided dataset can be used for multi-fisheye camera and/or laserscanner simultaneous localization and mapping (SLAM).
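For readers unfamiliar with the calibration approach being generalized: common laserscanner-to-camera toolboxes build on a plane-constraint formulation, in which laser points hitting a calibration board must lie on the board plane observed by the camera. Below is a toy sketch of that residual (all names are ours; this is an illustration, not the released toolbox). Switching to an arbitrary central camera only changes how the plane is estimated from the image, not this residual.

```python
# Plane-constraint extrinsic calibration sketch: find (R, t) mapping the laser
# frame into the camera frame such that laser hits land on the observed planes.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(x, observations):
    """x = [rotvec (3), translation (3)], laser frame -> camera frame."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t = x[3:]
    res = []
    for n, d, laser_pts in observations:    # plane n . X = d in camera coords
        pts_cam = laser_pts @ R.T + t       # transform laser points to camera
        res.append(pts_cam @ n - d)         # signed distances to the plane
    return np.concatenate(res)

def make_board(n, d, R_true, t_true, rng, num=40):
    """Sample laser hits on the plane n . X = d, expressed in the laser frame."""
    basis = np.linalg.svd(n.reshape(1, 3))[2][1:]        # two in-plane axes
    pts_cam = d * n + rng.uniform(-0.5, 0.5, (num, 2)) @ basis
    return n, d, (pts_cam - t_true) @ R_true.as_matrix() # back to laser frame

# toy problem: three boards in general position, slightly perturbed extrinsics
rng = np.random.default_rng(0)
R_true = Rotation.from_rotvec([0.02, -0.01, 0.03])
t_true = np.array([0.10, -0.05, 0.20])
planes = [(np.array([0.0, 0.0, 1.0]), 2.0),
          (np.array([0.3, 0.0, 1.0]), 2.5),
          (np.array([0.0, 0.3, 1.0]), 1.8)]
obs = [make_board(n / np.linalg.norm(n), d, R_true, t_true, rng)
       for n, d in planes]
sol = least_squares(residuals, np.zeros(6), args=(obs,))
print(sol.x)  # ~ [0.02, -0.01, 0.03, 0.10, -0.05, 0.20]
```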
In this paper, we present a novel benchmark for the evaluation of RGB-D SLAM systems. We recorded a large set of image sequences from a Microsoft Kinect with highly accurate and time-synchronized ground-truth camera poses from a motion capture system. The sequences contain both the color and depth images in full sensor resolution (640 × 480) at video frame rate (30 Hz). The ground-truth trajectory was obtained from a motion-capture system with eight high-speed tracking cameras (100 Hz). The dataset consists of 39 sequences that were recorded in an office environment and an industrial hall. The dataset covers a large variety of scenes and camera motions. We provide sequences for debugging with slow motions as well as longer trajectories with and without loop closures. Most sequences were recorded from a handheld Kinect with unconstrained 6-DOF motions, but we also provide sequences from a Kinect mounted on a Pioneer 3 robot that was manually navigated through a cluttered indoor environment. To stimulate the comparison of different approaches, we provide automatic evaluation tools both for the evaluation of drift of visual odometry systems and the global pose error of SLAM systems. The benchmark website [1] contains all data, detailed descriptions of the scenes, specifications of the data formats, sample code, and evaluation tools.
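The two evaluation tools mentioned correspond to the now-standard metrics: absolute trajectory error (ATE, global pose error after rigid alignment) and relative pose error (RPE, per-step drift). A minimal sketch of both, assuming the two trajectories are already associated by timestamp; this is our condensed version, not the benchmark's own scripts:

```python
import numpy as np

def align_umeyama(gt_xyz, est_xyz):
    """Least-squares rigid alignment (Horn/Umeyama, no scale): est -> gt."""
    mu_g, mu_e = gt_xyz.mean(0), est_xyz.mean(0)
    U, _, Vt = np.linalg.svd((gt_xyz - mu_g).T @ (est_xyz - mu_e))
    S = np.diag([1, 1, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    R = U @ S @ Vt
    return R, mu_g - R @ mu_e

def ate_rmse(gt_xyz, est_xyz):
    """Absolute trajectory error: RMSE of positions after alignment."""
    R, t = align_umeyama(gt_xyz, est_xyz)
    err = gt_xyz - (est_xyz @ R.T + t)
    return np.sqrt((err ** 2).sum(1).mean())

def rpe_rmse(gt_poses, est_poses, delta=1):
    """Relative pose error: drift over a fixed frame offset.
    gt_poses, est_poses: (N, 4, 4) homogeneous pose matrices."""
    errs = []
    for i in range(len(gt_poses) - delta):
        rel_gt = np.linalg.inv(gt_poses[i]) @ gt_poses[i + delta]
        rel_est = np.linalg.inv(est_poses[i]) @ est_poses[i + delta]
        e = np.linalg.inv(rel_gt) @ rel_est
        errs.append(np.linalg.norm(e[:3, 3]))            # translational part
    return np.sqrt(np.mean(np.array(errs) ** 2))
```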
Pedestrian visual-inertial odometry lacks realistic and open benchmark datasets, which makes it hard to pinpoint the differences between published methods. Existing datasets either lack a complete six-degree-of-freedom ground truth or are limited to small spaces covered by optical tracking systems. We take advantage of advances in pure inertial navigation and develop a set of versatile and challenging real-world computer vision benchmark sets for visual-inertial odometry. To this end, we built a test rig equipped with an iPhone, a Google Pixel Android phone, and a Google Tango device. We provide a wide range of raw sensor data that is available on almost any modern smartphone, together with high-quality ground-truth tracks. We also compare the visual-inertial trajectories of Google Tango, ARCore, and Apple ARKit against two recent methods published in academic venues. The datasets cover indoor and outdoor cases, including stairs, escalators, elevators, office environments, a shopping mall, and a metro station.
New vision sensors, such as the Dynamic and Active-pixel Vision sensor (DAVIS), incorporate a conventional global-shutter camera and an event-based sensor in the same pixel array. These sensors have great potential for high-speed robotics and computer vision because they allow us to combine the benefits of conventional cameras with those of event-based sensors: low latency, high temporal resolution, and very high dynamic range. However, new algorithms are required to exploit the sensor characteristics and cope with its unconventional output, which consists of a stream of asynchronous brightness changes (called "events") and synchronous grayscale frames. For this purpose, we present and release a collection of datasets captured with a DAVIS in a variety of synthetic and real environments, which we hope will motivate research on new algorithms for high-speed and high-dynamic-range robotics and computer-vision applications. In addition to global-shutter intensity images and asynchronous events, we provide inertial measurements and ground-truth camera poses from a motion-capture system. The latter allows comparing the pose accuracy of ego-motion estimation algorithms quantitatively. All the data are released both as standard text files and binary files (i.e., rosbag). This paper provides an overview of the available data and describes a simulator that we release open-source to create synthetic event-camera data. Keywords: event-based cameras, visual odometry, SLAM, simulation. All datasets and the simulator can be found on the dataset website.
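As background, the asynchronous "events" follow an idealized generation model: a pixel fires whenever its log intensity has changed by a contrast threshold C since that pixel's last event. Below is a simplified sketch of how a simulator can derive events from rendered frames (refractory periods and sensor noise are ignored; all names are ours):

```python
import numpy as np

def events_from_frames(frames, timestamps, C=0.15, eps=1e-6):
    """frames: (T, H, W) intensity images. Returns a time-sorted list of
    (t, x, y, polarity) tuples. Linear interpolation of log intensity between
    frames approximates the continuous signal; each threshold crossing emits
    one event."""
    logI = np.log(frames.astype(np.float64) + eps)
    ref = logI[0].copy()                      # per-pixel level at last event
    events = []
    for k in range(1, len(frames)):
        diff = logI[k] - ref
        ys, xs = np.nonzero(np.abs(diff) >= C)
        for y, x in zip(ys, xs):
            n = int(abs(diff[y, x]) // C)     # threshold may be crossed n times
            pol = 1 if diff[y, x] > 0 else -1
            for j in range(1, n + 1):
                # approximate each crossing time by linear interpolation
                frac = (j * C) / abs(diff[y, x])
                t = timestamps[k - 1] + frac * (timestamps[k] - timestamps[k - 1])
                events.append((t, x, y, pol))
            ref[y, x] += pol * n * C          # advance the reference level
    events.sort(key=lambda e: e[0])
    return events
```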
We present a new dataset for evaluating monocular, stereo, and plenoptic camera visual odometry algorithms. The dataset comprises a set of synchronized image sequences recorded by a micro-lens-array (MLA) based plenoptic camera and a stereo camera system. To this end, the stereo cameras and the plenoptic camera were assembled on a common handheld platform. All sequences are recorded as very large loops, in which the beginning and the end show the same scene, so the tracking accuracy of a visual odometry algorithm can be measured from the drift between the start and the end of a sequence. For both the plenoptic camera and the stereo system, we provide full intrinsic camera models as well as vignetting data. The dataset consists of 11 sequences recorded in challenging indoor and outdoor scenes. As examples, we present results achieved by state-of-the-art algorithms.
We present a novel real-time direct monocular visual odometry method for omnidirectional cameras. Our method extends Direct Sparse Odometry (DSO) by using the unified omnidirectional model as the projection function, which can be applied to fisheye cameras with a field of view (FoV) well above 180 degrees. This formulation allows using the entire input image even under strong distortion, whereas most existing visual odometry methods can only use a rectified and cropped part of it. Model parameters within an active keyframe window are jointly optimized, including the intrinsic/extrinsic camera parameters, the 3D positions of points, and the affine brightness parameters. Thanks to the wide FoV, the image overlap between frames becomes larger and points are more spatially distributed. Our results demonstrate that our method provides increased accuracy and robustness over state-of-the-art visual odometry algorithms.
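The unified omnidirectional model referred to here projects a 3D point onto the unit sphere, shifts the projection center by a parameter ξ along the optical axis, and then applies a pinhole projection; this is what keeps directions beyond 180° of FoV representable. A sketch under the standard Mei-style parameterization (fx, fy, cx, cy, ξ); this is our illustration, not the paper's code:

```python
import numpy as np

def project_unified(X, fx, fy, cx, cy, xi):
    """X: (N, 3) points in camera coordinates -> (N, 2) pixel coordinates."""
    Xs = X / np.linalg.norm(X, axis=1, keepdims=True)   # onto the unit sphere
    z = Xs[:, 2] + xi                                   # shifted projection centre
    return np.column_stack((fx * Xs[:, 0] / z + cx,
                            fy * Xs[:, 1] / z + cy))

def unproject_unified(uv, fx, fy, cx, cy, xi):
    """Pixel -> unit bearing vector (closed-form model inverse; valid for
    pixels inside the model's field of view)."""
    mx = (uv[:, 0] - cx) / fx
    my = (uv[:, 1] - cy) / fy
    r2 = mx ** 2 + my ** 2
    factor = (xi + np.sqrt(1 + (1 - xi ** 2) * r2)) / (1 + r2)
    bearing = np.column_stack((factor * mx, factor * my, factor - xi))
    return bearing / np.linalg.norm(bearing, axis=1, keepdims=True)
```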
Direct methods for Visual Odometry (VO) have gained popularity due to their capability to exploit information from all intensity gradients in the image. However, low computational speed as well as missing guarantees for optimality and consistency are limiting factors of direct methods, in which established feature-based methods succeed instead. Based on these considerations, we propose a Semi-direct VO (SVO) that uses direct methods to track and triangulate pixels that are characterized by high image gradients but relies on proven feature-based methods for joint optimization of structure and motion. Together with a robust probabilistic depth estimation algorithm, this enables us to efficiently track pixels lying on weak corners and edges in environments with little or high-frequency texture. We further demonstrate that the algorithm can easily be extended to multiple cameras, to track edges, to include motion priors, and to enable the use of very large field of view cameras, such as fisheye and catadioptric ones. Experimental evaluation on benchmark datasets shows that the algorithm is significantly faster than the state of the art while achieving highly competitive accuracy.
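SVO's depth filter models each candidate pixel's inverse depth with a Gaussian×Beta (Vogiatzis-Hernández) distribution; the stand-in below deliberately simplifies this to a pure Gaussian recursion, to show only the core idea: each new frame contributes a triangulated depth measurement that is fused with the running estimate, and the point is activated once the variance is small.

```python
import numpy as np

class DepthFilter:
    """Simplified (pure-Gaussian) inverse-depth filter; a sketch, not SVO's
    actual Gaussian x Beta formulation."""

    def __init__(self, z_init, sigma2_init):
        self.mu = 1.0 / z_init          # inverse-depth estimate
        self.sigma2 = sigma2_init       # its variance

    def update(self, z_meas, tau2):
        """Fuse one triangulated depth measurement z_meas whose uncertainty
        tau2 is expressed in inverse-depth units (product of Gaussians)."""
        x = 1.0 / z_meas
        s2 = self.sigma2 * tau2 / (self.sigma2 + tau2)   # posterior variance
        self.mu = s2 * (self.mu / self.sigma2 + x / tau2)
        self.sigma2 = s2

    def converged(self, thresh=1e-4):
        """Activate the 3D point once the estimate is certain enough."""
        return self.sigma2 < thresh
```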
This paper presents a novel underwater dataset acquired from a visual-inertial-pressure acquisition system, intended for benchmarking visual odometry, visual SLAM, and multi-sensor SLAM solutions. The dataset is publicly available and contains ground-truth trajectories for evaluation.
Today, rail vehicle localization is based on infrastructure-side balises (beacons) combined with on-board odometry to determine whether a track section is occupied. This coarse locking of track sections leads to sub-optimal usage of the railway network. New railway standards propose the use of moving blocks centered on the rail vehicles to increase the network's capacity. However, this approach requires accurate and robust position and velocity estimation for all vehicles. In this work, we investigate the applicability, challenges, and limitations of current visual and visual-inertial motion estimation frameworks for railway applications. Evaluations against RTK-GPS ground truth are performed on multiple datasets recorded in industrial, suburban, and forest environments. Our findings show that stereo visual-inertial odometry has great potential to provide accurate motion estimation, owing to its complementary sensor modalities, and that it shows superior performance in challenging situations compared to the other frameworks.
Neglecting the effects of rolling-shutter cameras in visual odometry (VO) severely degrades accuracy and robustness. In this paper, we propose a novel direct monocular VO method that incorporates a rolling-shutter model. Our approach extends direct sparse odometry, which performs direct bundle adjustment over a set of recent keyframe poses and the depths of a sparse set of image points. We estimate the velocity at each keyframe and impose a constant-velocity prior in the optimization. In this way, we obtain a near real-time, accurate direct VO method. Our approach achieves improved results on challenging rolling-shutter sequences compared to state-of-the-art global-shutter VO.
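Under the constant-velocity model, each keyframe carries, besides its pose T0, a body twist ξ, and the pose at which image row r was exposed is T(r) = T0·exp(r·t_row·ξ^), with t_row the row readout time. A sketch of just this row-pose interpolation (the joint optimization over poses, velocities, and depths is not shown; names are ours):

```python
import numpy as np
from scipy.linalg import expm

def hat(xi):
    """Twist (vx, vy, vz, wx, wy, wz) -> 4x4 se(3) matrix."""
    v, w = xi[:3], xi[3:]
    return np.array([[0.0, -w[2],  w[1], v[0]],
                     [w[2],  0.0, -w[0], v[1]],
                     [-w[1], w[0],  0.0, v[2]],
                     [0.0,   0.0,  0.0,  0.0]])

def pose_at_row(T0, xi, row, t_row):
    """Pose of the camera while image row `row` was being read out,
    assuming constant body velocity xi during the frame."""
    return T0 @ expm(hat(xi) * row * t_row)
```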
A monocular visual-inertial system (VINS), consisting of a camera and a low-cost inertial measurement unit (IMU), forms the minimum sensor suite for metric six degrees-of-freedom (DOF) state estimation. However, the lack of direct distance measurement poses significant challenges in terms of IMU processing, estimator initialization, extrinsic calibration, and nonlinear optimization. In this work, we present VINS-Mono: a robust and versatile monocular visual-inertial state estimator. Our approach starts with a robust procedure for estimator initialization and failure recovery. A tightly-coupled, nonlinear optimization-based method is used to obtain high accuracy visual-inertial odometry by fusing pre-integrated IMU measurements and feature observations. A loop detection module, in combination with our tightly-coupled formulation, enables relocalization with minimum computation overhead. We additionally perform four degrees-of-freedom pose graph optimization to enforce global consistency. We validate the performance of our system on public datasets and real-world experiments and compare against other state-of-the-art algorithms. We also perform onboard closed-loop autonomous flight on the MAV platform and port the algorithm to an iOS-based demonstration. We highlight that the proposed work is a reliable, complete, and versatile system that is applicable for different applications that require high accuracy localization. We open source our implementations for both PCs and iOS mobile devices.
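Pre-integrated IMU measurements summarize all gyroscope/accelerometer samples between two image frames as a relative position/velocity/rotation pseudo-measurement expressed in the first frame's body coordinates, so the integration need not be repeated when the frame poses are re-linearized. A bare-bones Euler-integration sketch; bias/noise/covariance handling and gravity compensation, which the actual estimator of course includes, are omitted here:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def preintegrate(gyro, accel, dt):
    """gyro, accel: (N, 3) IMU samples between two frames; dt: sample period.
    Returns (alpha, beta, q): relative position, velocity, and rotation,
    all expressed in the first frame's body frame."""
    alpha = np.zeros(3)                 # integrated relative position
    beta = np.zeros(3)                  # integrated relative velocity
    q = Rotation.identity()             # integrated relative rotation
    for w, a in zip(gyro, accel):
        a_body0 = q.apply(a)            # rotate accel into first body frame
        alpha += beta * dt + 0.5 * a_body0 * dt ** 2
        beta += a_body0 * dt
        q = q * Rotation.from_rotvec(w * dt)
    return alpha, beta, q
```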
We present a novel direct sparse visual odometry formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry (represented as inverse depth in a reference frame) and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from all image regions that have intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the proposed approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.
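The full photometric calibration enters the error in a specific way: raw pixel values are first corrected by the inverse response function and the vignetting map, and the residual then compares host and target frames up to exposure times and per-frame affine brightness parameters (a, b). A sketch of that residual as described in the DSO paper (variable names are ours):

```python
import numpy as np

def corrected_irradiance(I_raw, inv_response, vignette):
    """Undo camera response and vignetting: B = G^-1(I) / V.
    inv_response: 256-entry lookup table; vignette: per-pixel attenuation."""
    return inv_response[I_raw] / vignette

def photometric_residual(B_host, B_target, t_host, t_target,
                         a_host, b_host, a_target, b_target):
    """Residual for one corresponding pixel pair, with the affine brightness
    model: r = (B_t - b_t) - (t_t e^{a_t}) / (t_h e^{a_h}) (B_h - b_h),
    where t_* are exposure times and (a_*, b_*) per-frame affine parameters."""
    scale = (t_target * np.exp(a_target)) / (t_host * np.exp(a_host))
    return (B_target - b_target) - scale * (B_host - b_host)
```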
We propose a novel Large-Scale Direct SLAM algorithm for stereo cameras (Stereo LSD-SLAM) that runs in real-time at high frame rate on standard CPUs. In contrast to sparse interest-point based methods, our approach aligns images directly based on the photoconsistency of all high-contrast pixels, including corners, edges and high texture areas. It concurrently estimates the depth at these pixels from two types of stereo cues: static stereo through the fixed-baseline stereo camera setup as well as temporal multi-view stereo exploiting the camera motion. By incorporating both disparity sources, our algorithm can even estimate depth of pixels that are under-constrained when only using fixed-baseline stereo. Using a fixed baseline, on the other hand, avoids scale-drift that typically occurs in pure monocular SLAM. We furthermore propose a robust approach to enforce illumination invariance, capable of handling aggressive brightness changes between frames, greatly improving the performance in realistic settings. In experiments, we demonstrate state-of-the-art results on stereo SLAM benchmarks such as KITTI or challenging datasets from the EuRoC Challenge for micro aerial vehicles.
Datasets advance research by posing challenging new problems and providing standardized methods of algorithm comparison. High-quality datasets exist for many important problems in robotics and computer vision, including egomotion estimation and motion/scene segmentation, but not for techniques that estimate every motion in a scene. Metric evaluation of these multimotion estimation techniques requires datasets consisting of multiple, complex motions that also provide ground truth for every moving body. The Oxford Multimotion Dataset provides a number of multimotion estimation problems of varying complexity. It includes both complex problems that challenge existing algorithms and a number of simpler problems to support development. These include observations from both static and dynamic sensors, varying numbers of moving bodies, and a variety of different 3D motions. It also provides a number of experiments designed to isolate specific challenges of the multimotion problem, including rotation about the optical axis and occlusion. In total, the Oxford Multimotion Dataset contains over 110 minutes of multimotion data, including stereo and RGB-D camera images, IMU data, and Vicon ground-truth trajectories. The dataset culminates in a complex toy-car scene representative of many challenging real-world scenarios. This paper describes each experiment with a focus on its relevance to the multimotion estimation problem.
We provide a large dataset containing RGB-D image sequences and the ground-truth camera trajectories with the goal to establish a benchmark for the evaluation of visual SLAM systems. Our dataset contains the color and depth images of a Microsoft Kinect sensor and the ground-truth trajectory of camera poses. The data was recorded at full frame rate (30 Hz) and sensor resolution (640x480). The ground-truth trajectory was obtained from a high-accuracy motion-capture system with eight high-speed tracking cameras (100 Hz). Further, we provide the accelerometer data from the Kinect. Finally, we propose an evaluation criterion for measuring the quality of the estimated camera trajectory of visual SLAM systems.
Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and latency on the order of microseconds. However, because the sensor's output is fundamentally different in structure, new algorithms are required that exploit the high temporal resolution and asynchronous nature of the sensor. Recent work has shown that a continuous-time representation of the event camera pose can deal with the high temporal resolution and asynchronous nature of this sensor in a principled way. In this paper, we leverage this continuous-time representation to perform visual-inertial odometry with an event camera. This representation allows direct integration of the asynchronous events with microsecond accuracy and of the inertial measurements at high frequency. The event camera trajectory is approximated by a smooth curve in the space of rigid-body motions using cubic splines. This formulation significantly reduces the number of variables in the trajectory estimation problem. We evaluate our method on real data from several scenes and compare the results against ground truth from a motion-capture system. We show that our method provides improved accuracy over the results of a state-of-the-art visual odometry method for event cameras. We also show that both the map orientation and scale can be recovered accurately by fusing events and inertial data. To the best of our knowledge, this is the first work on visual-inertial fusion with event cameras using a continuous-time framework.
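The cubic-spline trajectory in such continuous-time formulations is typically the cumulative B-spline on SE(3) of Lovegrove et al., where four control poses determine the pose at any time inside a knot interval. A sketch, assuming uniform knots and 4×4 homogeneous control poses (our own minimal implementation, not the paper's):

```python
import numpy as np
from scipy.linalg import expm, logm

# cumulative basis matrix for a uniform cubic B-spline (Lovegrove et al.)
C = (1.0 / 6.0) * np.array([[6, 0, 0, 0],
                            [5, 3, -3, 1],
                            [1, 3, 3, -2],
                            [0, 0, 0, 1]])

def spline_pose(T_ctrl, u):
    """T_ctrl: four consecutive 4x4 control poses; u in [0, 1) is the
    normalized time within the knot interval. Control-pose increments are
    combined multiplicatively on SE(3) with the cumulative basis weights."""
    B = C @ np.array([1.0, u, u ** 2, u ** 3])   # cumulative basis values
    T = T_ctrl[0].copy()
    for j in range(1, 4):
        # relative twist between consecutive control poses
        delta = np.real(logm(np.linalg.inv(T_ctrl[j - 1]) @ T_ctrl[j]))
        T = T @ expm(B[j] * delta)
    return T
```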
External effects such as shocks and temperature variations affect the calibration of visual-inertial sensor systems, so they cannot fully rely on factory calibrations. Re-calibration performed on short user-collected datasets may yield poor performance, since the observability of certain parameters depends strongly on the motion. Additionally, on resource-constrained systems (e.g., mobile phones), full batch processing of overly long sessions quickly becomes prohibitively expensive. In this paper, we approach the self-calibration problem by introducing an information-theoretic metric that assesses the information content of trajectory segments, which allows selecting the most informative parts of a dataset for calibration purposes. With this approach, we are able to build compact calibration datasets either (a) by selecting segments from a long session with limited excitation, or (b) from multiple short sessions in which a single session does not necessarily excite all modes. Real-world experiments in four different environments show that the proposed method achieves performance comparable to a batch calibration approach, yet at a constant computational complexity that is independent of the session duration.
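One concrete instantiation of such an information-theoretic metric is the log-determinant of the Gauss-Newton information matrix restricted to the calibration parameters, with segments picked greedily until a budget is met. The sketch below shows that selection mechanism only; the paper's actual metric and parameter blocks may differ:

```python
import numpy as np

def select_segments(jacobians, budget):
    """Greedily pick the trajectory segments that most increase the
    accumulated information about the calibration parameters.
    jacobians: list of (m_i, p) calibration-parameter Jacobians, one per
    segment; budget: number of segments to keep. Returns chosen indices."""
    p = jacobians[0].shape[1]
    H = np.zeros((p, p))                       # accumulated information J^T J
    chosen, remaining = [], list(range(len(jacobians)))

    def gain(i):
        # log-det of the information matrix if segment i were added
        # (small regularizer keeps rank-deficient candidates finite but low)
        _, logdet = np.linalg.slogdet(
            H + jacobians[i].T @ jacobians[i] + 1e-9 * np.eye(p))
        return logdet

    for _ in range(min(budget, len(jacobians))):
        best = max(remaining, key=gain)
        chosen.append(best)
        H += jacobians[best].T @ jacobians[best]
        remaining.remove(best)
    return chosen
```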
Accurate state estimation is a fundamental problem for autonomous robots. To achieve locally accurate and globally drift-free state estimation, multiple sensors with complementary properties are usually fused together. Local sensors (camera, IMU, LiDAR, etc.) provide precise pose within a small region, while global sensors (GPS, magnetometer, barometer, etc.) provide noisy but globally drift-free localization in large-scale environments. In this paper, we propose a sensor fusion framework that fuses local states with global sensors, achieving locally accurate and globally drift-free pose estimation. Local estimations, produced by existing VO/VIO approaches, are fused with global sensors in a pose graph optimization. Within the graph optimization, the local estimations are aligned to a global coordinate frame, and the accumulated drift is eliminated. We evaluate the performance of our system on public datasets and in real-world experiments, and compare the results against other state-of-the-art algorithms. We highlight that our system is a general framework that can easily fuse various global sensors in a unified pose graph optimization. Our implementation is open source (https://github.com/HKUST-Aerial-Robotics/VINS-Fusion).
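In its simplest form, this fusion can be pictured as a graph over positions where VO/VIO contributes accurate relative-displacement factors and a global sensor contributes noisy absolute factors; solving the joint least-squares problem aligns the local trajectory into the global frame and suppresses accumulated drift. A toy sketch (the real system optimizes full 6-DoF poses over many sensor types; weights and names are ours):

```python
import numpy as np
from scipy.optimize import least_squares

def fuse(vio_deltas, gps_fixes, w_vio=100.0, w_gps=1.0):
    """vio_deltas: (N-1, 3) relative displacements from VO/VIO;
    gps_fixes: dict {node index: measured xyz position}.
    Returns the (N, 3) fused trajectory."""
    n = len(vio_deltas) + 1

    def residuals(x):
        p = x.reshape(n, 3)
        # relative factors: consecutive positions should match VIO deltas
        r = [w_vio * (p[1:] - p[:-1] - vio_deltas).ravel()]
        # absolute factors: pull the graph onto the (noisy) global fixes
        for i, g in gps_fixes.items():
            r.append(w_gps * (p[i] - g))
        return np.concatenate(r)

    # initialize by dead-reckoning the VIO deltas from the origin
    x0 = np.vstack([np.zeros(3), np.cumsum(vio_deltas, axis=0)]).ravel()
    return least_squares(residuals, x0).x.reshape(n, 3)
```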
In this paper, we present a monocular visual-inertial odometry algorithm which, by directly using pixel intensity errors of image patches, achieves accurate tracking performance while exhibiting a very high level of robustness. After detection, the tracking of the multilevel patch features is closely coupled to the underlying extended Kalman filter (EKF) by directly using the intensity errors as innovation term during the update step. We follow a purely robocentric approach where the locations of 3D landmarks are always estimated with respect to the current camera pose. Furthermore, we decompose landmark positions into a bearing vector and a distance parametrization, whereby we employ a minimal representation of differences on a corresponding σ-algebra in order to achieve better consistency and to improve the computational performance. Due to the robocentric, inverse-distance landmark parametrization, the framework does not require any initialization procedure, leading to a truly power-up-and-go state estimation system. The presented approach is successfully evaluated in a set of highly dynamic hand-held experiments as well as directly employed in the control loop of a multirotor unmanned aerial vehicle (UAV).
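The bearing-plus-distance decomposition keeps each landmark as a unit direction vector and a scalar, with perturbations to the bearing applied in the 2D tangent plane of the unit sphere; this is the minimal representation of differences the abstract refers to. A geometric sketch (the tangent-basis construction and names are our own; the filter machinery is omitted):

```python
import numpy as np

def landmark_point(bearing, inv_dist):
    """3D landmark in the current camera frame from (bearing, inverse distance)."""
    return bearing / inv_dist

def boxplus(bearing, delta):
    """Apply a minimal 2D perturbation to a unit bearing vector by rotating it
    about an axis lying in its tangent plane."""
    # build an orthonormal basis of the tangent plane at `bearing`
    b1 = np.cross(bearing, [1.0, 0.0, 0.0])
    if np.linalg.norm(b1) < 1e-6:                 # bearing ~ parallel to x-axis
        b1 = np.cross(bearing, [0.0, 1.0, 0.0])
    b1 /= np.linalg.norm(b1)
    b2 = np.cross(bearing, b1)
    axis = delta[0] * b1 + delta[1] * b2          # rotation axis, 2 DoF
    angle = np.linalg.norm(axis)
    if angle < 1e-12:
        return bearing
    axis = axis / angle
    # Rodrigues rotation of `bearing` about `axis` by `angle` (stays unit norm)
    return (bearing * np.cos(angle)
            + np.cross(axis, bearing) * np.sin(angle)
            + axis * (axis @ bearing) * (1 - np.cos(angle)))
```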
We propose Stereo Direct Sparse Odometry (Stereo DSO) as a novel method for highly accurate real-time visual odometry estimation of large-scale environments from stereo cameras. It jointly optimizes for all the model parameters within the active window, including the intrinsic/extrinsic camera parameters of all keyframes and the depth values of all selected pixels. In particular, we propose a novel approach to integrate constraints from static stereo into the bundle adjustment pipeline of temporal multi-view stereo. Real-time optimization is realized by sampling pixels uniformly from image regions with sufficient intensity gradient. Fixed-baseline stereo resolves scale drift. It also reduces the sensitivities to large optical flow and to rolling shutter effect which are known shortcomings of direct image alignment methods. Quantitative evaluation demonstrates that the proposed Stereo DSO outperforms existing state-of-the-art visual odometry methods both in terms of tracking accuracy and robustness. Moreover, our method delivers a more precise metric 3D reconstruction than previous dense/semi-dense direct approaches while providing a higher reconstruction density than feature-based methods.
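The integration of static-stereo constraints into temporal multi-view bundle adjustment can be pictured as a single energy in which each selected pixel contributes both kinds of photometric residuals, combined with a coupling weight and a robust loss. A schematic sketch (λ and the Huber threshold are placeholders, not the paper's values):

```python
import numpy as np

def huber(r, k=4.0):
    """Huber loss applied elementwise to photometric residuals."""
    a = np.abs(r)
    return np.where(a <= k, 0.5 * r ** 2, k * (a - 0.5 * k))

def coupled_energy(temporal_res, static_res, lam=1.0):
    """temporal_res: residuals of a pixel against other keyframes (temporal
    multi-view stereo); static_res: residuals against the other image of the
    same stereo pair (fixed-baseline stereo); lam couples the two."""
    return huber(np.asarray(temporal_res)).sum() + \
           lam * huber(np.asarray(static_res)).sum()
```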