Visual place recognition (VPR) is critical not only for the localization and mapping of autonomous vehicles, but also for assistive navigation for people with visual impairment. To enable long-term VPR systems at scale, several challenges need to be addressed. First, different applications may require different image view directions, such as front views for autonomous vehicles versus side views for pedestrians with low vision. Second, VPR in metropolitan scenes often raises privacy concerns, since pedestrian and vehicle identity information is captured in the images, calling for data anonymization before VPR queries and database construction. Both factors can cause VPR performance variations that are not yet well understood. To study their influences, we present the NYU-VPR dataset that contains more than 200,000 images over a 2 km by 2 km area near the New York University campus, taken throughout the year 2016. We present benchmark results on several popular VPR algorithms, showing that side views are significantly more challenging for current VPR methods while the influence of data anonymization is almost negligible, together with our hypothetical explanations and in-depth analysis.
The concept of geo-localization refers to the process of determining the location of some "entity" on the Earth, typically using Global Positioning System (GPS) coordinates. The entity of interest may be an image, an image sequence, a video, a satellite image, or even an object visible within an image. As massive datasets of GPS-tagged media have rapidly become available thanks to smartphones and the internet, and as deep learning has risen to enhance the performance capabilities of machine learning models, the fields of visual and object geo-localization have emerged owing to their significant impact on a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance, and 3D reconstruction. This paper provides a comprehensive survey of geo-localization involving images, covering both determining where an image was captured (image geo-localization) and geo-localizing objects within an image (object geo-localization). We provide in-depth coverage, including summaries of popular algorithms, descriptions of the proposed datasets, and analyses of performance results, to illustrate the current state of each field.
In this paper, we present a novel visual SLAM and long-term localization benchmark for autonomous driving in challenging conditions based on the large-scale 4Seasons dataset. The proposed benchmark provides drastic appearance variations caused by seasonal changes and diverse weather and illumination conditions. While significant progress has been made in advancing visual SLAM on small-scale datasets with similar conditions, there is still a lack of unified benchmarks representative of real-world scenarios for autonomous driving. We introduce a new unified benchmark for jointly evaluating visual odometry, global place recognition, and map-based visual localization performance which is crucial to successfully enable autonomous driving in any condition. The data has been collected for more than one year, resulting in more than 300 km of recordings in nine different environments ranging from a multi-level parking garage to urban (including tunnels) to countryside and highway. We provide globally consistent reference poses with up to centimeter-level accuracy obtained from the fusion of direct stereo-inertial odometry with RTK GNSS. We evaluate the performance of several state-of-the-art visual odometry and visual localization baseline approaches on the benchmark and analyze their properties. The experimental results provide new insights into current approaches and show promising potential for future research. Our benchmark and evaluation protocols will be available at https://www.4seasons-dataset.com/.
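A common way to score an estimated trajectory against such centimeter-accurate reference poses is the absolute trajectory error (ATE) after a rigid alignment. Below is a minimal NumPy sketch of that metric -- a generic illustration, not the benchmark's official evaluation protocol; the function names are ours:

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rigid alignment (Umeyama) of estimated to ground-truth
    positions; est and gt are (N, 3) arrays of translations."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    cov = (gt - mu_g).T @ (est - mu_e) / est.shape[0]
    U, _, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:  # guard against reflections
        S[2, 2] = -1
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """Root-mean-square absolute trajectory error after rigid alignment."""
    R, t = align_rigid(est, gt)
    err = gt - (est @ R.T + t)
    return np.sqrt((err ** 2).sum(axis=1).mean())
```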
This paper tackles the problem of cross-view video-based camera localization (CVL). The task is to localize a query camera by leveraging information from its past observations, i.e., a continuous sequence of images observed at previous time stamps, and matching them against a large overhead-view satellite image. The key challenge of this task is to learn a powerful global feature descriptor for the sequential ground-view images while accounting for its domain alignment with the reference satellite images. To this end, we introduce CVLNet, which first projects the sequential ground-view images into an overhead view by exploring ground-to-overhead geometric correspondences, and then leverages the photo consistency among the projected images to form a global representation. In this way, the cross-view domain discrepancy is bridged. Since the reference satellite images are usually pre-cropped and regularly sampled, there is always a misalignment between the query camera location and the center of its matched satellite image. Motivated by this, we propose estimating the query camera's relative displacement to a satellite image before similarity matching. In this displacement estimation process, we also consider the uncertainty of the camera location; for example, a camera is unlikely to be on top of a tree. To evaluate the performance of the proposed method, we collect satellite images from Google Maps for the KITTI dataset and construct a new cross-view video-based localization benchmark dataset, KITTI-CVL. Extensive experiments demonstrate the effectiveness of video-based localization over single-image-based localization, and the superiority of each proposed module over other alternatives.
Place recognition is a fundamental module that can assist simultaneous localization and mapping (SLAM) with loop-closure detection and re-localization for long-term navigation. Over the past 20 years, the place recognition community has made astonishing progress, which has attracted widespread research interest and applications in multiple fields such as computer vision and robotics. However, few methods have shown promising place recognition performance in complex real-world scenarios, where long-term and large-scale appearance changes usually result in failures. Additionally, there is a lack of an integrated framework among the state-of-the-art methods that can cope with all of the challenges, including appearance changes, viewpoint differences, robustness to unknown areas, and efficiency in real-world applications. In this work, we survey the state-of-the-art methods targeting long-term localization and discuss future directions and opportunities. First, we investigate place recognition for long-term autonomy and the major challenges faced in real-world environments. We then review recent works on different sensor modalities and the current strategies for tackling various place recognition challenges. Finally, we review the existing datasets for long-term localization and introduce our dataset and evaluation API for different approaches. This paper can serve as a tutorial for researchers new to the place recognition community and for those who care about long-term robot autonomy. We also provide our opinion on a frequently asked question in robotics: do robots need accurate localization to achieve long-term autonomy? A summary of this work, along with our dataset and evaluation API, is publicly available to the robotics community at: https://github.com/metaslam/gprs.
We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we develop a training procedure, based on a new weakly supervised ranking loss, to learn parameters of the architecture in an end-to-end manner from images depicting the same places over time downloaded from Google Street View Time Machine. Finally, we show that the proposed architecture significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks, and improves over current state-of-the-art compact image representations on standard image retrieval benchmarks.
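As a rough illustration of the aggregation the abstract describes, here is a minimal PyTorch sketch of a NetVLAD-style layer: soft cluster assignment via a 1x1 convolution, residuals to learnable centroids, then intra- and final L2-normalization. It is a simplified stand-in, not the authors' reference implementation, and omits their initialization and dimensionality-reduction details:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Minimal NetVLAD-style pooling over a CNN feature map."""
    def __init__(self, num_clusters=64, dim=512):
        super().__init__()
        self.conv = nn.Conv2d(dim, num_clusters, kernel_size=1)  # soft assignment
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x):                      # x: (B, D, H, W)
        soft = F.softmax(self.conv(x), dim=1)  # (B, K, H, W)
        soft = soft.flatten(2)                 # (B, K, N) with N = H*W
        feats = x.flatten(2)                   # (B, D, N)
        # weighted residuals of every descriptor to every centroid:
        # sum_n a_k(n) * x_n  -  (sum_n a_k(n)) * c_k
        vlad = torch.einsum('bkn,bdn->bkd', soft, feats) \
             - soft.sum(-1).unsqueeze(-1) * self.centroids
        vlad = F.normalize(vlad, dim=2)        # intra-normalization per cluster
        return F.normalize(vlad.flatten(1), dim=1)  # (B, K*D) global descriptor
```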
Visual cameras are attractive devices for beyond-visual-line-of-sight (B-VLOS) drone operation because of their low size, weight, power, and cost, and they can provide a redundant modality in case of GPS failure. However, state-of-the-art visual localization algorithms are unable to match visual data that differ significantly in appearance due to illumination or viewpoint. This paper presents iSimLoc, a condition/viewpoint-consistent hierarchical global re-localization approach. iSimLoc's place features can be used to search for target images under changing appearances and viewpoints. Additionally, our hierarchical global re-localization module refines the estimate in a coarse-to-fine manner, allowing iSimLoc to perform fast and accurate pose estimation. We evaluate our method on one dataset with appearance variations and one dataset focused on large-scale matching over a long flight in a complex environment. On our two datasets, iSimLoc achieves successful retrieval rates of 88.7% and 83.8% with 1.5 s inference time, compared to 45.8% and 39.7% for the next best method. These results demonstrate robust localization in a range of environments.
Visual place recognition (VPR) is typically the ability to recognize the same place despite significant changes in appearance and viewpoint. VPR is a key component of spatial artificial intelligence, enabling robotic platforms and intelligent augmentation platforms, such as augmented reality devices, to perceive and understand the physical world. In this paper, we observe that there are three "drivers" that impose requirements on spatially intelligent agents and thus on VPR systems: 1) the particular agent, including its sensors and computational resources; 2) the operating environment of this agent; and 3) the specific task the artificial agent performs. In this paper, we characterize and survey key works in the VPR area considering those drivers, including their place representation and place matching choices. We also provide a new definition of VPR based on the visual overlap of places -- akin to spatial view cells in the brain -- which enables us to identify similarities and differences with other research areas in robotics and computer vision. We identify numerous open challenges and suggest areas that require deeper attention in future work.
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics. We also provide careful dataset analysis as well as baselines for lidar and image based detection and tracking. Data, development kit and more information are available online.
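For readers who want to browse the data, the sketch below shows typical usage of the official nuscenes-devkit as we understand it from its tutorial; the dataroot path and version string are assumptions:

```python
# pip install nuscenes-devkit; requires the dataset downloaded locally.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/nuscenes', verbose=True)

scene = nusc.scene[0]                                     # one of the scenes
sample = nusc.get('sample', scene['first_sample_token'])  # first keyframe

# Each sample bundles synchronized data from the full sensor suite.
cam = nusc.get('sample_data', sample['data']['CAM_FRONT'])
lidar = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
print(cam['filename'], lidar['filename'])

# 3D bounding-box annotations attached to this keyframe.
for token in sample['anns'][:3]:
    ann = nusc.get('sample_annotation', token)
    print(ann['category_name'], ann['size'])
```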
Visual place recognition (VPR) is generally concerned with localizing outdoor images. However, localizing indoor scenes that contain part of an outdoor scene can be of great value for a wide range of applications. In this paper, we introduce Inside Out Visual Place Recognition (IOVPR), a task aiming to localize images based on outdoor scenes visible through windows. For this task we present the new large-scale dataset Amsterdam-XXXL, with images taken in Amsterdam, consisting of 6.4 million panoramic street-view images and 1,000 user-generated indoor queries. Additionally, we introduce a new training protocol, Inside Out data augmentation, to adapt visual place recognition methods and demonstrate the potential of inside-out place recognition. We empirically show the benefits of our proposed data augmentation scheme at a smaller scale, while demonstrating the difficulty this large-scale dataset poses for existing methods. With this new task we aim to encourage the development of methods for IOVPR. The dataset and code are available for research purposes at https://github.com/saibr/iovpr
Automated driving systems (ADS) open up a new domain for the automotive industry and offer new possibilities for future transportation with higher efficiency and more comfortable experiences. However, autonomous driving under adverse weather conditions has long been the problem that keeps autonomous vehicles (AVs) from reaching higher levels of autonomy. This paper assesses the influences and challenges that weather brings to ADS sensors in an analytic and statistical way, and surveys solutions against inclement weather conditions. State-of-the-art techniques for perception enhancement with respect to each kind of weather are thoroughly reported. External auxiliary solutions such as V2X technology, as well as the coverage of weather conditions in currently available datasets, simulators, and experimental facilities with weather chambers, are distinctly sorted out. By pointing out the various major weather problems the autonomous driving field is currently facing, and by reviewing the hardware and computer-science solutions of recent years, this survey outlines the obstacles and directions of ADS development with respect to adverse weather driving conditions.
The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state-of-the-art on deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assist with design choices.
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image against an overhead-view satellite map. Existing methods often treat this problem as cross-view image retrieval, using learned deep features to match the ground-level query image to a partition (e.g., a small patch) of the satellite map. With these methods, the localization accuracy is limited by the partitioning density of the satellite map (often on the order of tens of meters). Departing from the conventional wisdom of image retrieval, this paper proposes a novel solution that can achieve highly accurate localization. The key idea is to formulate the task as pose estimation and solve it by neural-network-based optimization. Specifically, we design a two-branch CNN to extract robust features from the ground and satellite images, respectively. To bridge the vast cross-view domain gap, we resort to a geometric projection module that, based on a relative camera pose, projects features from the satellite map to the ground view. Aiming to minimize the differences between the projected features and the observed features, we employ a differentiable Levenberg-Marquardt (LM) module to iteratively search for the optimal camera pose. The entire pipeline is differentiable and runs end-to-end. Extensive experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method. Notably, starting from a coarse estimate of the camera location within a wide region of 40 m x 40 m, our method quickly reduces the lateral location error to within 5 m on the new KITTI cross-view dataset.
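To make the iterative search concrete, here is a minimal, non-differentiable Levenberg-Marquardt loop over a small pose vector with a numeric Jacobian -- a generic stand-in for the paper's learned, differentiable LM module; the residual function and all names are illustrative:

```python
import numpy as np

def levenberg_marquardt(residual, pose0, iters=20, lam=1e-2):
    """Minimize ||residual(pose)||^2 over a small pose vector
    (e.g., [x, y, yaw]); residual returns an (M,) array."""
    pose = pose0.astype(float).copy()
    for _ in range(iters):
        r = residual(pose)
        eps = 1e-6  # numeric Jacobian, one column per pose parameter
        J = np.stack([(residual(pose + eps * np.eye(len(pose))[i]) - r) / eps
                      for i in range(len(pose))], axis=1)
        H = J.T @ J + lam * np.eye(len(pose))   # damped normal equations
        step = np.linalg.solve(H, -J.T @ r)
        if np.linalg.norm(residual(pose + step)) < np.linalg.norm(r):
            pose += step
            lam *= 0.5    # accepted: move toward Gauss-Newton behavior
        else:
            lam *= 10.0   # rejected: move toward gradient-descent behavior
    return pose
```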
Visual Place Recognition is an essential component of systems for camera localization and loop closure detection, and it has attracted widespread interest in multiple domains such as computer vision, robotics and AR/VR. In this work, we propose a faster, lighter and stronger approach that can generate models with fewer parameters and spend less time in the inference stage. We designed RepVGG-lite as the backbone network in our architecture; it is more discriminative than other general networks in the place recognition task. RepVGG-lite has greater speed advantages while achieving higher performance. We extract only single-scale patch-level descriptors from global descriptors in the feature extraction stage. We then design a trainable feature matcher, based on the attention mechanism, to exploit both the spatial relationships of the features and their visual appearance. Comprehensive experiments on challenging benchmark datasets demonstrate that the proposed method outperforms other recent state-of-the-art learned approaches while achieving even higher inference speed. Our system has 14 times fewer parameters than Patch-NetVLAD and 6.8 times lower theoretical FLOPs, and runs 21 and 33 times faster in feature extraction and feature matching, respectively. Moreover, the performance of our approach is 0.5% better than Patch-NetVLAD in Recall@1. We used subsets of the Mapillary Street-Level Sequences dataset to conduct experiments for all other challenging conditions.
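As a loose illustration of attention-style patch matching (not the paper's trainable matcher, which is more elaborate), a dual-softmax agreement score between two sets of patch descriptors might look like this:

```python
import torch
import torch.nn.functional as F

def attention_match_score(desc_q, desc_r, temperature=0.1):
    """Toy attention-style matching between query patch descriptors (N, D)
    and reference patch descriptors (M, D): soft assignments in both
    directions, summed where they agree. Illustrative only."""
    q = F.normalize(desc_q, dim=1)
    r = F.normalize(desc_r, dim=1)
    sim = q @ r.T / temperature        # (N, M) cosine similarities
    p_qr = sim.softmax(dim=1)          # query -> reference attention
    p_rq = sim.softmax(dim=0)          # reference -> query attention
    return (p_qr * p_rq).sum()         # high when mutual assignments agree
```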
The PASCAL Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset have become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three-year history of the challenge, and proposes directions for future improvement and extension.
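The detection metric at the heart of the challenge is average precision over a ranked list of detections. A minimal sketch of an interpolated-precision AP computation in the style used by later VOC editions, assuming detections have already been matched to ground truth as true/false positives:

```python
import numpy as np

def voc_average_precision(scores, is_true_positive, num_gt):
    """AP from ranked detections: confidence scores (N,), TP flags (N,),
    and the number of ground-truth objects."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    recall = cum_tp / num_gt
    precision = cum_tp / np.arange(1, len(tp) + 1)
    # interpolate: make precision monotonically non-increasing
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # integrate precision over recall
    r = np.concatenate(([0.0], recall))
    return float(np.sum((r[1:] - r[:-1]) * precision))
```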
In recent years, the robotics community has extensively examined methods for the place recognition task within the scope of simultaneous localization and mapping applications. This article proposes an appearance-based loop closure detection pipeline named "FILD++" (Fast and Incremental Loop closure Detection). First, the system is fed with consecutive images and, by passing them twice through a single convolutional neural network, extracts global and local deep features. A flexible, hierarchical navigable small-world graph incrementally builds a visual database representing the robot's traversed path based on the computed global features. Finally, a query image, grabbed at each time step, is set to retrieve similar locations on the traversed route. An image-to-image pairing follows, which exploits local features to evaluate the spatial information. Thus, in the proposed article we present a single network for both global and local feature extraction, in contrast to our previous work (FILD), while an exhaustive search is adopted for the verification process over the generated deep local features, avoiding the use of hash codes. Exhaustive experiments on eleven public datasets exhibit the system's high performance (achieving the highest recall score on eight of them) and low execution time (an average of 22.05 ms on New College, whose largest version contains 52,480 images) compared with other state-of-the-art methods.
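The incremental visual database built on a hierarchical navigable small-world (HNSW) graph can be sketched with the off-the-shelf hnswlib package; the dimensions, parameters, and random stand-in features below are illustrative, not the paper's configuration:

```python
# pip install hnswlib
import hnswlib
import numpy as np

dim = 256                                  # global descriptor size (assumed)
index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=100_000, ef_construction=200, M=16)
index.set_ef(64)                           # search-time accuracy/speed knob

rng = np.random.default_rng(0)
for frame_id in range(1000):               # images arrive sequentially
    g = rng.standard_normal(dim).astype(np.float32)  # stand-in global feature
    # Query BEFORE inserting, so the current frame cannot match itself;
    # a real system would also mask out temporally adjacent frames.
    if frame_id > 100:
        labels, dists = index.knn_query(g, k=5)
        # labels[0] are loop-closure candidates, to be verified next
        # with local features and a spatial check.
    index.add_items(g[None, :], np.array([frame_id]))
```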
In this study, we propose a novel visual localization approach to accurately estimate the six degrees of freedom (6-DoF) pose of a robot within a 3D LiDAR map based on visual data from an RGB camera. The 3D map is obtained using an advanced LiDAR-based simultaneous localization and mapping (SLAM) algorithm capable of collecting a precise sparse map. Features extracted from the camera images are compared with the points of the 3D map, and a geometric optimization problem is then solved to achieve precise visual localization. Our approach allows a scout robot equipped with an expensive LiDAR to be used only once -- for mapping the environment -- while multiple operational robots equipped only with RGB cameras perform mission tasks with localization accuracy higher than common camera-based solutions. The method was tested on a custom dataset collected at the Skolkovo Institute of Science and Technology (Skoltech). During the evaluation of localization accuracy, we managed to achieve centimeter-level accuracy; the median translation error was as low as 1.3 cm. Precise localization achieved with only a camera makes it possible to use autonomous mobile robots for solving the most complex tasks that require high localization accuracy.
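The core geometric step -- recovering a 6-DoF camera pose from 2D image features matched against 3D map points -- is a Perspective-n-Point problem. A minimal OpenCV sketch under that reading (the paper's exact optimization may differ; the arrays below are placeholders):

```python
import cv2
import numpy as np

# pts3d: (N, 3) map points matched to pts2d: (N, 2) image keypoints
pts3d = (np.random.rand(50, 3) * 10).astype(np.float32)
pts2d = (np.random.rand(50, 2) * 640).astype(np.float32)

K = np.array([[525.0,   0.0, 320.0],   # pinhole intrinsics (assumed values)
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)                     # assume undistorted images

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts3d, pts2d, K, dist,
    reprojectionError=3.0, iterationsCount=200)

if ok:
    R, _ = cv2.Rodrigues(rvec)         # rotation vector -> 3x3 matrix
    cam_in_map = -R.T @ tvec           # camera center in map coordinates
```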
Event cameras continue to attract interest due to desirable characteristics such as high dynamic range, low latency, virtually no motion blur, and high energy efficiency. One potential application of event camera research is visual place recognition for robot localization, where a query observation must be matched to the corresponding reference place in a database. In this letter, we explore the distinctiveness of event streams from a small subset of pixels (in the tens or hundreds). We demonstrate that the absolute difference in the number of events accumulated into event frames at those pixel locations can be sufficient for the place recognition task, when pixels that display large variations across the reference set are used. Using such sparse (over the image coordinates) but varying (in the number of events per pixel location) data enables frequent and computationally cheap updates of the location estimate. Furthermore, when the event frames contain a constant number of events, our approach fully exploits the event-driven nature of the sensory stream and displays promising robustness to velocity changes. We evaluate the proposed approach on the Brisbane-Event-VPR dataset in an outdoor driving scenario, as well as the newly contributed indoor QCR-Event-VPR dataset, captured with a DAVIS346 camera mounted on a mobile robotic platform. Our results show that our approach achieves competitive performance compared to several baseline methods on these datasets, and is particularly well suited for compute- and energy-constrained platforms such as interplanetary rovers.
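The matching scheme described here reduces to comparing event counts at a fixed set of informative pixel locations. A minimal NumPy sketch under that reading -- pixel selection by count variance across the reference traverse, then sum-of-absolute-differences matching; all shapes and names are illustrative:

```python
import numpy as np

def select_pixels(ref_frames, k=100):
    """ref_frames: (R, H, W) event-count frames for the reference traverse.
    Pick the k pixel locations with the largest count variance."""
    var = ref_frames.reshape(len(ref_frames), -1).var(axis=0)
    return np.argsort(-var)[:k]            # flat pixel indices

def match_place(query_frame, ref_frames, pixels):
    """Return the index of the reference place with the smallest sum of
    absolute event-count differences at the selected pixels."""
    q = query_frame.reshape(-1)[pixels]
    refs = ref_frames.reshape(len(ref_frames), -1)[:, pixels]
    return int(np.abs(refs - q).sum(axis=1).argmin())
```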
As the autonomous driving industry slowly matures, visual map localization is quickly becoming the standard approach for localizing cars as accurately as possible. Thanks to the rich data returned by visual sensors such as cameras and LiDARs, researchers are able to build different types of maps with various levels of detail and use them to achieve high levels of vehicle localization accuracy and stability in urban environments. Contrary to popular SLAM approaches, visual map localization relies on pre-built maps and improves localization accuracy solely by avoiding error accumulation or drift. We define visual map localization as a two-stage process. At the stage of place recognition, the initial position of the vehicle in the map is determined by comparing the visual sensor output with a set of geo-tagged map regions. Subsequently, at the stage of map metric localization, the vehicle is tracked as it moves across the map by continuously aligning the visual sensor output with the current area of the map being traversed. In this paper, we survey, discuss, and compare the latest methods for LiDAR-based, camera-based, and cross-modal visual map localization for both stages, in order to highlight the strengths of each approach.
Figure 1: PoseNet: convolutional neural network monocular camera relocalization. Relocalization results for an input image (top), the predicted camera pose of a visual reconstruction (middle), shown again overlaid in red on the original image (bottom). Our system relocalizes to within approximately 2 m and 6° for large outdoor scenes spanning 50,000 m². For an online demonstration, please see our project webpage: mi.eng.cam.ac.uk/projects/relocalisation/
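PoseNet regresses a 6-DoF pose (3D position plus orientation quaternion) directly from a single image with a CNN. A minimal PyTorch sketch of that idea, using a ResNet-18 stand-in for the paper's GoogLeNet backbone and a simplified form of the paper's weighted position/orientation loss; the weighting beta is scene-dependent in the paper:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PoseRegressor(nn.Module):
    """Minimal PoseNet-style regressor: CNN backbone -> 3-DoF position
    and 4-DoF orientation quaternion."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()        # keep the 512-d pooled feature
        self.backbone = backbone
        self.fc_xyz = nn.Linear(512, 3)
        self.fc_quat = nn.Linear(512, 4)

    def forward(self, img):                # img: (B, 3, H, W)
        f = self.backbone(img)
        return self.fc_xyz(f), self.fc_quat(f)

def pose_loss(xyz_hat, q_hat, xyz, q, beta=500.0):
    """Weighted position + orientation loss; the predicted quaternion is
    unit-normalized before comparison (beta=500 is an assumed value)."""
    q_hat = q_hat / q_hat.norm(dim=1, keepdim=True)
    return (xyz_hat - xyz).norm(dim=1).mean() \
         + beta * (q_hat - q).norm(dim=1).mean()
```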