Focusing on the task of point-to-point navigation for an autonomous vehicle, we propose a novel deep learning model trained in an end-to-end and multi-task manner to perform perception and control tasks simultaneously. The model is used to drive the ego vehicle safely by following a sequence of routes defined by a global planner. The perception part of the model encodes high-dimensional observations provided by an RGBD camera while performing semantic segmentation, semantic depth cloud (SDC) mapping, and prediction of the traffic light state and stop signs. The control part then decodes the encoded features, together with additional information provided by GPS and a speedometer, to predict waypoints along with a latent feature space. Furthermore, two agents are employed to process these outputs and produce a control policy that determines the levels of steering, throttle, and brake as the final action. The model is evaluated in the CARLA simulator with various scenarios composed of normal and adversarial situations and different weather conditions to mimic real-world situations. In addition, we conduct a comparative study with several recent models to justify its performance across multiple aspects of driving, and an ablation study on the SDC mapping and the multi-agent setup to understand their roles and behavior. As a result, our model achieves the highest driving score even with fewer parameters and a lower computational load. To support future research, we share our code at https://github.com/oskarnatan/end-to-end-drive.
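As a concrete illustration of the waypoint-prediction and control stages sketched in this abstract, the snippet below shows one plausible form: a GRU decoder that rolls out waypoints from the encoded perception feature plus the GPS goal, and a simple PID-style agent that turns those waypoints and the current speed into steering, throttle, and brake. All module names, dimensions, and gains are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only; layer sizes, gains, and module names are assumptions.
import numpy as np
import torch
import torch.nn as nn

class WaypointDecoder(nn.Module):
    """Roll out future waypoints from the encoded RGBD feature and the GPS goal."""
    def __init__(self, latent_dim=128, steps=4):
        super().__init__()
        self.steps = steps
        self.gru = nn.GRUCell(input_size=4, hidden_size=latent_dim)  # input: current wp (2) + goal (2)
        self.delta = nn.Linear(latent_dim, 2)                        # offset to the next waypoint

    def forward(self, latent, goal_xy):
        # latent: (B, latent_dim) perception feature; goal_xy: (B, 2) goal in the ego frame
        wp, h, waypoints = torch.zeros_like(goal_xy), latent, []
        for _ in range(self.steps):
            h = self.gru(torch.cat([wp, goal_xy], dim=1), h)
            wp = wp + self.delta(h)
            waypoints.append(wp)
        return torch.stack(waypoints, dim=1)                         # (B, steps, 2)

class PIDAgent:
    """PID-style agent (only the proportional term shown): waypoints + speed -> controls."""
    def __init__(self, kp=1.0, target_speed=6.0):
        self.kp, self.target_speed = kp, target_speed

    def act(self, waypoints, speed):
        aim = waypoints[:2].mean(axis=0)                 # aim point from the first two waypoints
        steer = float(np.clip(np.arctan2(aim[1], aim[0]), -1.0, 1.0))
        accel = self.kp * (self.target_speed - speed)    # crude longitudinal control
        throttle = float(np.clip(accel, 0.0, 0.75))
        brake = float(accel < -1.0)
        return steer, throttle, brake
```

In the abstract's terms, two such agents could each propose an action, with the final control picked or blended between them.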
We present DeepIPC, an end-to-end, multi-task model that handles both perception and control tasks for a mobile robot driving autonomously. The model consists of two main parts: the perception and controller modules. The perception module takes an RGB image and a depth map to perform semantic segmentation and bird's-eye-view (BEV) semantic mapping, and provides their encoded features. Meanwhile, the controller module processes these features, together with GNSS position measurements and angular speed, to estimate waypoints along with a latent feature. Two different agents then translate the waypoints and latent features into a set of navigational controls that drive the robot. The model is evaluated by predicting driving records and by performing automated driving under various conditions in real environments. Based on the experimental results, DeepIPC achieves the best drivability and multi-task performance compared with other models, even with fewer parameters.
Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns. On the one hand, comprehensive scene understanding is indispensable; lacking it leaves the vehicle vulnerable to rare but complex traffic situations, such as the sudden appearance of unknown objects. However, reasoning from the global context requires access to multiple types of sensors and adequate fusion of multi-modal sensor signals, which is difficult to achieve. On the other hand, the lack of interpretability in learning models also hampers safety, since failure causes cannot be verified. In this paper, we propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser), to fully process and fuse information from multi-modal, multi-view sensors for comprehensive scene understanding and adversarial event detection. Moreover, intermediate interpretable features are generated by our framework; they provide richer semantics and are exploited to better constrain actions within the safe set. We conduct extensive experiments on the CARLA benchmarks, where our model outperforms prior methods and ranks first on the public CARLA Leaderboard.
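As a generic illustration of constraining actions within a safe set using interpretable intermediate outputs (not InterFuser's actual safety controller), the sketch below caps the target speed from a predicted object-density map in the ego frame; all thresholds and the map layout are assumptions.

```python
# Hypothetical safety check; thresholds and grid layout are assumptions.
import numpy as np

def constrain_speed(desired_speed, object_density, max_decel=3.0, safety_margin=2.0):
    """object_density: (H, W) grid in the ego frame, row index = forward distance in meters.
    Returns a target speed that allows stopping before the nearest predicted obstacle."""
    occupied_rows = np.where(object_density.max(axis=1) > 0.5)[0]
    if occupied_rows.size == 0:
        return desired_speed                                   # nothing ahead, keep the plan
    stopping_distance = max(float(occupied_rows[0]) - safety_margin, 0.0)
    safe_speed = np.sqrt(2.0 * max_decel * stopping_distance)  # v^2 = 2 * a * d
    return min(desired_speed, safe_speed)
```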
Inspired by the fact that humans use diverse sensory organs to perceive the world, sensors of different modalities are deployed in end-to-end driving to obtain the global context of the 3D scene. In previous works, camera and LiDAR inputs are fused through transformers for better driving performance. These inputs are usually further interpreted as high-level map information to assist navigation tasks. Nevertheless, extracting useful information from complex map inputs is challenging, since redundant information may mislead the agent and negatively affect driving performance. We propose a novel approach to efficiently extract features from vectorized high-definition (HD) maps and exploit them in end-to-end driving tasks. In addition, we design a new expert to further enhance model performance by considering multi-road rules. Experimental results demonstrate that both proposed improvements enable our agent to achieve superior performance compared with other methods.
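One common way to encode a vectorized HD map, shown here only as a plausible sketch rather than this paper's exact design, is a PointNet-style polyline encoder: each map element is a polyline of 2D points, embedded per point and max-pooled into a single feature vector.

```python
# Sketch of a polyline encoder for vectorized map elements; dimensions are placeholders.
import torch
import torch.nn as nn

class PolylineEncoder(nn.Module):
    def __init__(self, point_dim=2, hidden=64, out_dim=128):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(point_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, out_dim))

    def forward(self, polylines):
        # polylines: (B, num_elements, num_points, 2) vectorized lane / boundary segments
        point_feats = self.point_mlp(polylines)          # (B, E, P, out_dim)
        return point_feats.max(dim=2).values             # (B, E, out_dim), one feature per element
```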
In this paper, we present a system to train driving policies not only from the experience collected by the ego vehicle, but also from the experiences of all vehicles it observes. The system uses the behaviors of other agents to create more diverse driving scenarios without collecting additional data. The main difficulty in learning from other vehicles is that there is no sensor information for them. We use a set of supervised tasks to learn an intermediate representation that is invariant to the viewpoint of the controlling vehicle. This not only provides a richer signal at training time, but also allows more complex reasoning during inference. Learning how all vehicles drive helps predict their behavior at test time and avoid collisions. We evaluate this system in closed-loop driving simulation. Our system outperforms all prior methods on the public CARLA Leaderboard by a wide margin, improving the driving score by 25 and the route completion rate by 24 points. Our method won the 2021 CARLA Autonomous Driving Challenge. Code and data are available at https://github.com/dotchen/lav.
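Viewpoint-invariant supervision of this kind hinges on re-expressing labels in each observed vehicle's own frame; a minimal version of that coordinate change (the axis and yaw conventions are assumptions) looks like this:

```python
# SE(2) change of frame; axis and yaw conventions are assumptions.
import numpy as np

def to_vehicle_frame(points_world, vehicle_xy, vehicle_yaw):
    """Re-express world-frame 2D points in a given vehicle's frame, so the same
    waypoint supervision can be applied as if that vehicle were the ego car."""
    c, s = np.cos(-vehicle_yaw), np.sin(-vehicle_yaw)
    rotation = np.array([[c, -s], [s, c]])
    return (np.asarray(points_world) - np.asarray(vehicle_xy)) @ rotation.T
```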
Current end-to-end autonomous driving methods either run a controller based on a planned trajectory or perform control prediction directly, which have spanned two separately studied lines of research. Seeing their potential mutual benefits, this paper takes the initiative to explore the combination of these two well-developed worlds. Specifically, our integrated approach has two branches for trajectory planning and direct control, respectively. The trajectory branch predicts the future trajectory, while the control branch involves a novel multi-step prediction scheme so that the relationship between the current action and future states can be reasoned about. The two branches are connected so that the control branch receives corresponding guidance from the trajectory branch at each time step. The outputs from the two branches are then fused to achieve complementary advantages. Our results are evaluated in closed-loop urban driving with challenging scenarios using the CARLA simulator. Even with only a monocular camera input, the proposed method ranks first on the official CARLA Leaderboard, outperforming other complex candidates with multiple sensors or fusion mechanisms. The source code and data will be made publicly available at https://github.com/openperceptionx/tcp.
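A minimal sketch of the two-branch idea (layer sizes and output parameterization are assumptions, not this paper's exact design): one head regresses a future trajectory, while a recurrent head unrolls control commands over several steps so that each action is predicted in the context of the previous one.

```python
# Illustrative dual-branch head; the real model also feeds per-step guidance
# from the trajectory branch into the control branch.
import torch
import torch.nn as nn

class DualBranchHead(nn.Module):
    def __init__(self, feat_dim=256, horizon=4):
        super().__init__()
        self.horizon = horizon
        self.traj_head = nn.Linear(feat_dim, horizon * 2)        # future (x, y) waypoints
        self.ctrl_gru = nn.GRUCell(input_size=3, hidden_size=feat_dim)
        self.ctrl_head = nn.Linear(feat_dim, 3)                  # steer, throttle, brake per step

    def forward(self, feat):
        # feat: (B, feat_dim) scene feature from the image encoder
        traj = self.traj_head(feat).view(-1, self.horizon, 2)
        h, ctrl, controls = feat, feat.new_zeros(feat.size(0), 3), []
        for _ in range(self.horizon):
            h = self.ctrl_gru(ctrl, h)                           # condition on the previous action
            ctrl = torch.tanh(self.ctrl_head(h))                 # squashed to [-1, 1] for simplicity
            controls.append(ctrl)
        return traj, torch.stack(controls, dim=1)
```

At inference, one plausible fusion is to run a controller on the predicted trajectory and blend its command with the directly predicted control, weighting the two according to the driving situation.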
Many existing autonomous driving paradigms involve a multi-stage, discrete pipeline of tasks. To better predict control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit designs, in this paper we formulate the problem in an interpretable, vision-based setting. In particular, we propose a spatial-temporal feature learning scheme, named ST-P3, to obtain a set of more representative features for perception, prediction, and planning tasks simultaneously. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometric information in 3D space before the bird's-eye-view transformation for perception; a dual-pathway modeling is devised to take past motion variations into account for future prediction; and a temporal-based refinement unit is introduced to compensate for recognizing vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-art methods on the open-loop nuScenes dataset and in closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, models, and protocol details are publicly available at https://github.com/openperceptionx/st-p3.
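Egocentric alignment amounts to warping past BEV feature maps into the current ego frame before aggregating them. The sketch below shows one way to do this with a single affine resampling; the sign and axis conventions depend on the BEV layout and are assumptions here, not the paper's implementation.

```python
# Hypothetical BEV warp for ego-centric accumulation; axis/sign conventions are assumptions.
import torch
import torch.nn.functional as F

def warp_past_bev(past_bev, dx, dy, dyaw, meters_per_cell=0.5):
    """Resample a past BEV feature map (B, C, H, W) into the current ego frame,
    given the ego motion (dx, dy in meters, dyaw in radians) since that frame."""
    B, _, H, W = past_bev.shape
    dx, dy, dyaw = (torch.as_tensor(v, dtype=past_bev.dtype) for v in (dx, dy, dyaw))
    cos, sin = torch.cos(dyaw), torch.sin(dyaw)
    # affine_grid works in normalized coordinates spanning [-1, 1] across the grid.
    tx, ty = 2.0 * dx / (W * meters_per_cell), 2.0 * dy / (H * meters_per_cell)
    theta = torch.stack([torch.stack([cos, -sin, tx]),
                         torch.stack([sin,  cos, ty])]).unsqueeze(0).expand(B, -1, -1)
    grid = F.affine_grid(theta, past_bev.shape, align_corners=False)
    return F.grid_sample(past_bev, grid, align_corners=False)  # cells leaving the grid become zero
```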
Accurate localization is fundamental to autonomous driving. Traditional visual localization frameworks approach the semantic map-matching problem with geometric models, which rely on complex parameter tuning and thus hinder large-scale deployment. In this paper, we propose BEV-Locator: an end-to-end visual semantic localization neural network using multi-view camera images. Specifically, a visual BEV (bird's-eye-view) encoder extracts and flattens the multi-view images into BEV space, while the semantic map features are structurally embedded as a sequence of map queries. A cross-modal transformer then associates the BEV features and semantic map queries, and the localization information of the ego car is recursively queried by cross-attention modules. Finally, the ego pose is inferred by decoding the transformer outputs. We evaluate the proposed method on the large-scale nuScenes and Qcraft datasets. The experimental results show that BEV-Locator can estimate the vehicle pose under versatile scenarios, effectively associating the cross-modal information from multi-view images and global semantic maps. The experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251$^\circ$ in lateral translation, longitudinal translation, and heading angle, respectively.
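The core association step can be sketched as a single cross-attention layer in which the embedded map elements act as queries over the flattened BEV features, followed by a small regression head for the pose offset. The dimensions and pooling choice below are placeholders, not the paper's exact architecture.

```python
# Sketch of map-query cross-attention for pose regression; sizes are placeholders.
import torch
import torch.nn as nn

class PoseQueryDecoder(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.pose_head = nn.Linear(d_model, 3)   # lateral, longitudinal, heading offsets

    def forward(self, map_queries, bev_feats):
        # map_queries: (B, M, d_model) embedded semantic map elements
        # bev_feats:   (B, H*W, d_model) flattened multi-view BEV features
        out, _ = self.attn(map_queries, bev_feats, bev_feats)
        return self.pose_head(out.mean(dim=1))   # (B, 3) ego-pose correction
```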
Modern autonomous driving systems are characterized as modular tasks in sequential order, i.e., perception, prediction, and planning. As sensors and hardware improve, there is a growing trend toward devising a system that can perform a wide diversity of tasks to fulfill higher-level intelligence. Contemporary approaches resort to either deploying standalone models for individual tasks or designing a multi-task paradigm with separate heads; these may suffer from accumulative error or negative transfer effects. Instead, we argue that a favorable algorithm framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car. Oriented toward this goal, we revisit the key components within perception and prediction. We analyze each module and prioritize the tasks hierarchically, such that all of these tasks contribute to planning (the goal). To this end, we introduce Unified Autonomous Driving (UniAD), the first comprehensive, up-to-date framework that incorporates full-stack driving tasks in one network. It is exquisitely devised to leverage the advantages of each module and to provide complementary feature abstractions for agent interaction from a global perspective. Tasks are communicated with a unified query design to facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. With extensive ablations, the effectiveness of this philosophy is shown to surpass the previous state of the art by a large margin in all aspects. The full suite of code and models will be made available to facilitate future research in the community.
We introduce CARLA, an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites and environmental conditions. We use CARLA to study the performance of three approaches to autonomous driving: a classic modular pipeline, an end-to-end model trained via imitation learning, and an end-to-end model trained via reinforcement learning. The approaches are evaluated in controlled scenarios of increasing difficulty, and their performance is examined via metrics provided by CARLA, illustrating the platform's utility for autonomous driving research.
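For reference, a minimal CARLA Python client that spawns a vehicle and applies direct control looks roughly like this (assuming a simulator running locally on the default port and the `carla` package installed; the blueprint, spawn point, and tick count are chosen arbitrarily):

```python
# Minimal CARLA client sketch; server address, blueprint choice, and tick count are arbitrary.
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(blueprint, spawn_point)

try:
    for _ in range(200):                                   # drive straight for a short while
        vehicle.apply_control(carla.VehicleControl(throttle=0.3, steer=0.0))
        world.wait_for_tick()
finally:
    vehicle.destroy()
```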
Automated driving systems (ADS) open up a new domain for the automotive industry and offer new possibilities for future transportation with higher efficiency and more comfortable experiences. However, autonomous driving under adverse weather conditions has long been the problem that keeps autonomous vehicles (AVs) from reaching level 4 or higher autonomy. This paper assesses the influences and challenges that weather brings to ADS sensors in an analytic and statistical way, and surveys solutions for adverse weather conditions. State-of-the-art techniques for perception enhancement with respect to each kind of weather are thoroughly reported. External auxiliary solutions such as V2X technology, as well as the coverage of weather conditions in currently available datasets, simulators, and experimental facilities with weather chambers, are distinctly sorted out. By pointing out the various major weather problems the autonomous driving field is currently facing and reviewing the hardware and computer-science solutions of recent years, this survey outlines the obstacles and directions of research on driving in adverse weather conditions.
Learning powerful representations in bird's-eye view (BEV) for perception tasks is a trend that has drawn extensive attention from both industry and academia. Conventional approaches in most autonomous driving algorithms perform detection, segmentation, tracking, etc., in the front or perspective view. As sensor configurations grow more complex, integrating multi-source information from different sensors and representing features in a unified view become critical. BEV perception inherits several advantages: representing the surrounding scene in BEV is intuitive and fusion-friendly, and representing objects in BEV is most desirable for subsequent modules such as planning and/or control. The core problems of BEV perception lie in (a) how to reconstruct the lost 3D information via the view transformation from perspective view to BEV; (b) how to acquire ground-truth annotations in the BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios. In this survey, we review recent work on BEV perception and provide an in-depth analysis of different solutions. Moreover, several system designs of BEV approaches from industry are described. Furthermore, we offer a full suite of practical guidebooks to improve the performance of BEV perception tasks, covering camera, LiDAR, and fusion inputs. Finally, we point out future research directions in this area. We hope this report can shed light on the community and encourage more research on BEV perception. We maintain an active repository to collect the latest work and provide a toolbox bag of tricks at https://github.com/openperceptionx/bevperception-survey-recipe.
Driving in dynamic, multi-agent, and complex urban environments is a challenging task that requires a sophisticated decision-making policy. Learning such a policy requires a state representation that can encode the entire environment. Mid-level representations that encode the vehicle's environment as images have become a popular choice. Still, they are quite high-dimensional, which limits their use in data-hungry approaches such as reinforcement learning. In this paper, we propose to learn a low-dimensional and rich latent representation of the environment by exploiting knowledge of relevant semantic factors. To this end, we train an encoder-decoder deep neural network to predict multiple application-relevant factors, such as the trajectories of other agents and of the ego vehicle. Furthermore, we propose a hazard signal based on the other vehicles' future trajectories and the planned route, which is used together with the learned latent representation as input to a downstream policy. We demonstrate that using a multi-head encoder-decoder neural network yields a more informative representation than a standard single-head model. In particular, the proposed representation learning and hazard signal help reinforcement learning to learn faster, with improved performance and less data than baseline methods.
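A minimal sketch of the multi-head encoder-decoder idea (the concrete heads, sizes, and hazard definition are assumptions): one encoder compresses the mid-level image into a small latent, and separate heads predict the ego route, other agents' future positions, and a scalar hazard signal; the latent together with the hazard signal is what a downstream RL policy would consume.

```python
# Illustrative multi-head encoder; heads and dimensions are placeholders.
import torch
import torch.nn as nn

class MultiHeadSceneEncoder(nn.Module):
    def __init__(self, latent_dim=64, horizon=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent_dim), nn.ReLU(),
        )
        self.ego_head = nn.Linear(latent_dim, 2 * horizon)      # ego future (x, y) positions
        self.agents_head = nn.Linear(latent_dim, 2 * horizon)   # nearest agent's future positions
        self.hazard_head = nn.Linear(latent_dim, 1)             # scalar hazard signal

    def forward(self, bev_image):
        z = self.encoder(bev_image)
        return z, self.ego_head(z), self.agents_head(z), torch.sigmoid(self.hazard_head(z))
```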
Vehicle-to-everything (V2X) communication technology enables collaboration between a vehicle and many other entities in the surrounding environment, which can fundamentally improve the perception system of autonomous driving. However, the lack of public datasets has greatly limited the research progress of collaborative perception. To fill this gap, we present V2X-Sim, a comprehensive simulated multi-agent perception dataset for V2X-aided autonomous driving. V2X-Sim provides: (1) multi-agent sensor recordings from roadside units (RSUs) and multiple vehicles that enable collaborative perception, (2) multi-modal sensor streams that facilitate multi-modal perception, and (3) diverse ground truths that support various perception tasks. Meanwhile, we build an open-source testbed and provide a benchmark for state-of-the-art collaborative perception algorithms on three tasks, including detection, tracking, and segmentation. V2X-Sim seeks to stimulate collaborative perception research for autonomous driving before realistic datasets become widely available. Our dataset and code are available at https://ai4ce.github.io/v2x-sim/.
Autonomous driving requires efficient reasoning about the location and appearance of the different agents in the scene, which aids downstream tasks such as object detection, object tracking, and path planning. The past few years have witnessed a surge in approaches that combine the different task-based modules of the classic self-driving stack into a single End-to-End (E2E) trainable learning system. These approaches replace perception, prediction, and sensor fusion modules with a single contiguous module with a shared latent space embedding, from which one extracts a human-interpretable representation of the scene. One of the most popular representations is the Birds-eye View (BEV), which expresses the locations of different traffic participants in the ego vehicle frame from a top-down view. However, a BEV does not capture the chromatic appearance information of the participants. To overcome this limitation, we propose a novel representation that captures various traffic participants' appearance and occupancy information from an array of monocular cameras covering a 360 deg field of view (FOV). We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both the appearance and occupancy of the scene, which can aid downstream tasks such as object tracking and executing language-based commands. We test the efficacy of our approach on a synthetic dataset generated from CARLA. The code, dataset, and results can be found at https://rebrand.ly/APP OCC-results.
Multi-modal fusion is a basic task in autonomous driving system perception and has attracted the interest of many scholars in recent years. Current multi-modal fusion methods mainly focus on camera data and LiDAR data, but pay little attention to the kinematic information provided by the vehicle's bottom-level sensors, such as acceleration, vehicle speed, and angle of rotation. This information is not affected by complex external scenes, so it is more robust and reliable. In this paper, we introduce the existing application fields of vehicle bottom information and the research progress of related methods, as well as multi-modal fusion methods based on bottom information. We also describe the relevant vehicle bottom information datasets in detail to facilitate further research. In addition, new ideas for future multi-modal fusion technology in autonomous driving tasks are proposed to promote further utilization of vehicle bottom information.
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework, now capable of learning complex policies in high-dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in the real-world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, and inverse reinforcement learning, which are related to but are not classical RL algorithms. The role of simulators in training agents, and methods to validate, test, and robustify existing solutions in RL, are also discussed.
Vision-centric BEV perception has recently received attention from both industry and academia because of its inherent merits, including providing a natural representation of the world and being fusion-friendly. With the rapid development of deep learning, numerous methods have been proposed to address vision-centric BEV perception. However, there has been no recent survey of this novel and rapidly developing research field. To stimulate future research, this paper presents a comprehensive survey of vision-centric BEV perception and its extensions. It collects and organizes recent knowledge and gives a systematic review and summary of commonly used algorithms. It also provides in-depth analyses and comparative results on several BEV perception tasks, facilitating comparisons for future work and inspiring future research directions. In addition, empirical implementation details are discussed, which can benefit the development of related algorithms.
Owing to resource constraints, efficient computation systems have long been a critical demand for those designing autonomous vehicles. In addition, sensor cost and size restrict the development of self-driving cars. This paper presents an efficient framework for the operation of vision-based automatic vehicles, in which a front-facing camera and a few inexpensive radars are the required sensors for driving-environment perception. The proposed algorithm comprises a multi-task UNet (MTUNet) network for extracting image features and constrained iterative linear quadratic regulator (CILQR) modules for rapid lateral and longitudinal motion planning. The MTUNet is designed to simultaneously solve lane-line segmentation, ego-vehicle heading-angle regression, road-type classification, and traffic-object detection tasks at approximately 40 FPS when an RGB image of size 228 x 228 is fed into it. The CILQR algorithms then take the processed MTUNet outputs and radar data as inputs to produce driving commands for lateral and longitudinal vehicle automation guidance; both optimal-control problems can be solved within 1 ms. The proposed CILQR controllers are more efficient than sequential quadratic programming (SQP) methods and can collaborate with the MTUNet to drive a car autonomously in unseen simulation environments for lane-keeping and car-following maneuvers. Our experiments demonstrate that the proposed autonomous driving system is applicable to modern automobiles.
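A shared-backbone, multi-head arrangement of the kind described (segmentation, heading regression, road-type classification, and object detection from one feature map) could be sketched as follows; channel counts and head structures are placeholders rather than the MTUNet's actual layers.

```python
# Illustrative multi-task heads on a shared backbone feature map; sizes are placeholders.
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, feat_ch=256, n_road_types=4, n_obj_classes=3):
        super().__init__()
        self.seg_head = nn.Conv2d(feat_ch, 2, kernel_size=1)        # lane line vs. background
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.heading_head = nn.Linear(feat_ch, 1)                   # ego heading-angle regression
        self.road_head = nn.Linear(feat_ch, n_road_types)           # road-type classification
        self.det_head = nn.Conv2d(feat_ch, 5 + n_obj_classes, 1)    # per-cell box (4) + objectness + classes

    def forward(self, feat):
        # feat: (B, feat_ch, H, W) shared encoder features from a 228 x 228 RGB input
        pooled = self.pool(feat).flatten(1)
        return (self.seg_head(feat), self.heading_head(pooled),
                self.road_head(pooled), self.det_head(feat))
```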
The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state of the art in deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration, and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources, and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assists with design choices.