变形金刚在NLP和计算机视觉上实现了突破,最近开始在自动驾驶汽车(AV)的轨迹预测中表现出有希望的表现。如何有效地对自我代理与其他道路和动态对象之间的交互关系建模仍然对标准注意模块仍然具有挑战性。在这项工作中,我们提出了一个类似变压器的架构模块MNM网络,该网络配备了新型掩盖的目标调节训练程序,用于AV轨迹预测。最终的模型名为高尔夫球手,取得了最先进的性能,在2022 Waymo Open DataSet Motion Predict挑战中赢得了第二名,并根据Minade排名第一。
translated by 谷歌翻译
预测道路用户的未来行为是自主驾驶中最具挑战性和最重要的问题之一。应用深度学习对此问题需要以丰富的感知信号和地图信息的形式融合异构世界状态,并在可能的期货上推断出高度多模态分布。在本文中,我们呈现MultiPath ++,这是一个未来的预测模型,实现了在流行的基准上实现最先进的性能。 MultiPath ++通过重新访问许多设计选择来改善多径架构。第一关键设计差异是偏离基于图像的基于输入世界状态的偏离,有利于异构场景元素的稀疏编码:多径++消耗紧凑且有效的折线,直接描述道路特征和原始代理状态信息(例如,位置,速度,加速)。我们提出了一种背景感知这些元素的融合,并开发可重用的多上下文选通融合组件。其次,我们重新考虑了预定义,静态锚点的选择,并开发了一种学习模型端到端的潜在锚嵌入的方法。最后,我们在其他ML域中探索合奏和输出聚合技术 - 常见的常见域 - 并为我们的概率多模式输出表示找到有效的变体。我们对这些设计选择进行了广泛的消融,并表明我们所提出的模型在协会运动预测竞争和Waymo开放数据集运动预测挑战上实现了最先进的性能。
translated by 谷歌翻译
自动驾驶的运动预测是一项艰巨的任务,因为复杂的驾驶场景导致静态和动态输入的异质组合。这是一个开放的问题,如何最好地表示和融合有关道路几何,车道连接,时变的交通信号状态以及动态代理的历史及其相互作用的历史。为了模拟这一不同的输入功能集,许多提出的方法旨在设计具有多种模态模块的同样复杂系统。这导致难以按严格的方式进行扩展,扩展或调整的系统以进行质量和效率。在本文中,我们介绍了Wayformer,这是一个基于注意力的运动架构,用于运动预测,简单而均匀。 Wayformer提供了一个紧凑的模型描述,该描述由基于注意力的场景编码器和解码器组成。在场景编码器中,我们研究了输入方式的早期,晚和等级融合的选择。对于每种融合类型,我们通过分解的注意力或潜在的查询关注来探索策略来折衷效率和质量。我们表明,尽管早期融合的结构简单,但不仅是情感不可知论,而且还取得了最先进的结果。
translated by 谷歌翻译
在这项工作中,我们提出了一种新的多模态多代理轨迹预测架构,专注于使用图形表示的地图和交互建模。出于地图建模的目的,我们将丰富的拓扑结构捕获到基于向量的星形图中,使代理能够直接参加用于代表地图的折线上的相关区域。我们表示此架构Starnet,并将其集成在单次代理预测设置中。作为主要结果,我们将此架构扩展到联合场景级预测,同时产生多个代理的预测。联合赛斯网的关键思想在自己的参考框中将一个代理的意识与其他代理人的观点察觉到。我们通过蒙面的自我关注实现这一目标。两个提出的架构都建立在我们以前的工作中介绍的动作空间预测框架之上,这确保了运动学上可行的轨迹预测。我们评估了富含互动的IND和交互数据集的方法,其中STARNET和联合星网实现了最先进的技术。
translated by 谷歌翻译
在本报告中,我们介绍了2022 Waymo Open DataSet挑战中运动预测轨迹的第一名解决方案。我们为多模式运动预测提出了一个新型的运动变压器框架,该框架引入了一组新型运动查询对,用于通过共同执行意图定位和迭代运动改进来产生更好的多模式未来轨迹。采用了一种简单的模型合奏策略,并采用了非最大抑制作用,以进一步提高最终性能。我们的方法在2022 Waymo打开数据集挑战的运动预测排行榜上取得了第一名,优于其他利润率的其他方法。代码将在https://github.com/sshaoshuai/mtr上找到。
translated by 谷歌翻译
Accurately predicting interactive road agents' future trajectories and planning a socially compliant and human-like trajectory accordingly are important for autonomous vehicles. In this paper, we propose a planning-centric prediction neural network, which takes surrounding agents' historical states and map context information as input, and outputs the joint multi-modal prediction trajectories for surrounding agents, as well as a sequence of control commands for the ego vehicle by imitation learning. An agent-agent interaction module along the time axis is proposed in our network architecture to better comprehend the relationship among all the other intelligent agents on the road. To incorporate the map's topological information, a Dynamic Graph Convolutional Neural Network (DGCNN) is employed to process the road network topology. Besides, the whole architecture can serve as a backbone for the Differentiable Integrated motion Prediction with Planning (DIPP) method by providing accurate prediction results and initial planning commands. Experiments are conducted on real-world datasets to demonstrate the improvements made by our proposed method in both planning and prediction accuracy compared to the previous state-of-the-art methods.
translated by 谷歌翻译
预测交通参与者的多模式未来行为对于机器人车辆做出安全决策至关重要。现有作品探索以直接根据潜在特征预测未来的轨迹,或利用密集的目标候选者来识别代理商的目的地,在这种情况下,由于所有运动模式均来自相同的功能,而后者的策略具有效率问题,因此前者策略的收敛缓慢,因为其性能高度依赖关于候选目标的密度。在本文中,我们提出了运动变压器(MTR)框架,该框架将运动预测模拟为全球意图定位和局部运动改进的联合优化。 MTR不使用目标候选者,而是通过采用一系列可学习的运动查询对来结合空间意图。每个运动查询对负责特定运动模式的轨迹预测和完善,这可以稳定训练过程并促进更好的多模式预测。实验表明,MTR在边际和联合运动预测挑战上都达到了最新的性能,在Waymo Open Motion DataSet排行榜上排名第一。代码将在https://github.com/sshaoshuai/mtr上找到。
translated by 谷歌翻译
行为预测在集成自主驾驶软件解决方案中起着重要作用。在行为预测研究中,与单一代理行为预测相比,交互行为预测是一个较小的领域。预测互动剂的运动需要启动新的机制来捕获交互式对的关节行为。在这项工作中,我们将端到端的关节预测问题作为边际学习和车辆行为联合学习的顺序学习过程。我们提出了ProspectNet,这是一个采用加权注意分数的联合学习块,以模拟交互式剂对之间的相互影响。联合学习块首先权衡多模式预测的候选轨迹,然后通过交叉注意更新自我代理的嵌入。此外,我们将每个交互式代理的个人未来预测播放到一个智慧评分模块中,以选择顶部的$ K $预测对。我们表明,ProspectNet优于两个边际预测的笛卡尔产品,并在Waymo交互式运动预测基准上实现了可比的性能。
translated by 谷歌翻译
Modern autonomous driving system is characterized as modular tasks in sequential order, i.e., perception, prediction and planning. As sensors and hardware get improved, there is trending popularity to devise a system that can perform a wide diversity of tasks to fulfill higher-level intelligence. Contemporary approaches resort to either deploying standalone models for individual tasks, or designing a multi-task paradigm with separate heads. These might suffer from accumulative error or negative transfer effect. Instead, we argue that a favorable algorithm framework should be devised and optimized in pursuit of the ultimate goal, i.e. planning of the self-driving-car. Oriented at this goal, we revisit the key components within perception and prediction. We analyze each module and prioritize the tasks hierarchically, such that all these tasks contribute to planning (the goal). To this end, we introduce Unified Autonomous Driving (UniAD), the first comprehensive framework up-to-date that incorporates full-stack driving tasks in one network. It is exquisitely devised to leverage advantages of each module, and provide complementary feature abstractions for agent interaction from a global perspective. Tasks are communicated with unified query design to facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. With extensive ablations, the effectiveness of using such a philosophy is proven to surpass previous state-of-the-arts by a large margin in all aspects. The full suite of codebase and models would be available to facilitate future research in the community.
translated by 谷歌翻译
预测周围动态剂的未来轨迹是自动驾驶中的必要要求。这些轨迹主要取决于周围的静态环境以及这些动态剂的过去运动。此外,代理意图的多模式性质使轨迹预测问题更具挑战性。所有现有模型都同样考虑目标剂以及周围的剂,而无需考虑物理特性的变化。在本文中,我们为自动驾驶中的多模式轨迹预测提供了一个新颖的基于深度学习的框架,该框架考虑了目标及周围车辆的物理特性,例如对象类及其物理尺寸通过加权注意模块,从而改善预测的准确性。我们的模型在Nuscenes轨迹预测基准测试中取得了最高的结果,这些模型是使用栅格图来输入环境信息的模型。此外,我们的模型能够实时运行,达到300 fps的高推理率。
translated by 谷歌翻译
We propose JFP, a Joint Future Prediction model that can learn to generate accurate and consistent multi-agent future trajectories. For this task, many different methods have been proposed to capture social interactions in the encoding part of the model, however, considerably less focus has been placed on representing interactions in the decoder and output stages. As a result, the predicted trajectories are not necessarily consistent with each other, and often result in unrealistic trajectory overlaps. In contrast, we propose an end-to-end trainable model that learns directly the interaction between pairs of agents in a structured, graphical model formulation in order to generate consistent future trajectories. It sets new state-of-the-art results on Waymo Open Motion Dataset (WOMD) for the interactive setting. We also investigate a more complex multi-agent setting for both WOMD and a larger internal dataset, where our approach improves significantly on the trajectory overlap metrics while obtaining on-par or better performance on single-agent trajectory metrics.
translated by 谷歌翻译
准确地预测占用和流量对于在复杂的交通情况下为自动驾驶汽车提供更好的安全性和互动至关重要。这项工作提出了Strajnet:一个多模式的SWIN变压框架,用于有效的场景占用和流动预测。我们采用Swin Transformer编码图像和相互作用感知运动表示形式,并提出一个交叉意识模块,以在不同的时间步长跨不同时间步骤将运动意识注入网格单元。然后通过颞膜金字塔解码器来解码流量和占用预测。所提出的方法在Waymo Open数据集基准中显示了竞争性预测准确性和其他评估指标。
translated by 谷歌翻译
Predicting the future motion of dynamic agents is of paramount importance to ensure safety or assess risks in motion planning for autonomous robots. In this paper, we propose a two-stage motion prediction method, referred to as R-Pred, that effectively utilizes both the scene and interaction context using a cascade of the initial trajectory proposal network and the trajectory refinement network. The initial trajectory proposal network produces M trajectory proposals corresponding to M modes of a future trajectory distribution. The trajectory refinement network enhances each of M proposals using 1) the tube-query scene attention (TQSA) and 2) the proposal-level interaction attention (PIA). TQSA uses tube-queries to aggregate the local scene context features pooled from proximity around the trajectory proposals of interest. PIA further enhances the trajectory proposals by modeling inter-agent interactions using a group of trajectory proposals selected based on their distances from neighboring agents. Our experiments conducted on the Argoverse and nuScenes datasets demonstrate that the proposed refinement network provides significant performance improvements compared to the single-stage baseline and that R-Pred achieves state-of-the-art performance in some categories of the benchmark.
translated by 谷歌翻译
Level 5 Autonomous Driving, a technology that a fully automated vehicle (AV) requires no human intervention, has raised serious concerns on safety and stability before widespread use. The capability of understanding and predicting future motion trajectory of road objects can help AV plan a path that is safe and easy to control. In this paper, we propose a network architecture that parallelizes multiple convolutional neural network backbones and fuses features to make multi-mode trajectory prediction. In the 2020 ICRA Nuscene Prediction challenge, our model ranks 15th on the leaderboard across all teams.
translated by 谷歌翻译
Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves on par or better performance than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.
translated by 谷歌翻译
预测公路参与者的未来运动对于自动驾驶至关重要,但由于令人震惊的运动不确定性,因此极具挑战性。最近,大多数运动预测方法求助于基于目标的策略,即预测运动轨迹的终点,作为回归整个轨迹的条件,以便可以减少解决方案的搜索空间。但是,准确的目标坐标很难预测和评估。此外,目的地的点表示限制了丰富的道路环境的利用,从而导致预测不准确。目标区域,即可能的目的地区域,而不是目标坐标,可以通过涉及更多的容忍度和指导来提供更软的限制,以搜索潜在的轨迹。考虑到这一点,我们提出了一个新的基于目标区域的框架,名为“目标区域网络”(GANET)进行运动预测,该框架对目标区域进行了建模,而不是确切的目标坐标作为轨迹预测的先决条件,更加可靠,更准确地执行。具体而言,我们建议一个goicrop(目标的目标区域)操作员有效地提取目标区域中的语义巷特征,并在目标区域和模型演员的未来互动中提取语义巷,这对未来的轨迹估计很大。 Ganet在所有公共文献(直到论文提交)中排名第一个,将其源代码排在第一位。
translated by 谷歌翻译
变压器一直是自然语言处理(NLP)和计算机视觉(CV)革命的核心。 NLP和CV的显着成功启发了探索变压器在点云处理中的使用。但是,变压器如何应对点云的不规则性和无序性质?变压器对于不同的3D表示(例如,基于点或体素)的合适性如何?各种3D处理任务的变压器有多大的能力?截至目前,仍然没有对这些问题的研究进行系统的调查。我们第一次为3D点云分析提供了越来越受欢迎的变压器的全面概述。我们首先介绍变压器体系结构的理论,并在2D/3D字段中审查其应用程序。然后,我们提出三种不同的分类法(即实现 - 数据表示和基于任务),它们可以从多个角度对当前的基于变压器的方法进行分类。此外,我们介绍了研究3D中自我注意机制的变异和改进的结果。为了证明变压器在点云分析中的优势,我们提供了基于各种变压器的分类,分割和对象检测方法的全面比较。最后,我们建议三个潜在的研究方向,为3D变压器的开发提供福利参考。
translated by 谷歌翻译
交通参与者的运动预测对于安全和强大的自动化驾驶系统至关重要,特别是在杂乱的城市环境中。然而,由于复杂的道路拓扑以及其他代理的不确定意图,这是强大的挑战。在本文中,我们介绍了一种基于图形的轨迹预测网络,其命名为双级预测器(DSP),其以分层方式编码静态和动态驾驶环境。与基于光栅状地图或稀疏车道图的方法不同,我们将驾驶环境视为具有两层的图形,专注于几何和拓扑功能。图形神经网络(GNNS)应用于提取具有不同粒度级别的特征,随后通过基于关注的层间网络聚合,实现更好的本地全局特征融合。在最近的目标驱动的轨迹预测管道之后,提取了目标代理的高可能性的目标候选者,并在这些目标上产生预测的轨迹。由于提出的双尺度上下文融合网络,我们的DSP能够产生准确和人类的多模态轨迹。我们评估了大规模协会运动预测基准测试的提出方法,实现了有希望的结果,优于最近的最先进的方法。
translated by 谷歌翻译
Motion prediction systems aim to capture the future behavior of traffic scenarios enabling autonomous vehicles to perform safe and efficient planning. The evolution of these scenarios is highly uncertain and depends on the interactions of agents with static and dynamic objects in the scene. GNN-based approaches have recently gained attention as they are well suited to naturally model these interactions. However, one of the main challenges that remains unexplored is how to address the complexity and opacity of these models in order to deal with the transparency requirements for autonomous driving systems, which includes aspects such as interpretability and explainability. In this work, we aim to improve the explainability of motion prediction systems by using different approaches. First, we propose a new Explainable Heterogeneous Graph-based Policy (XHGP) model based on an heterograph representation of the traffic scene and lane-graph traversals, which learns interaction behaviors using object-level and type-level attention. This learned attention provides information about the most important agents and interactions in the scene. Second, we explore this same idea with the explanations provided by GNNExplainer. Third, we apply counterfactual reasoning to provide explanations of selected individual scenarios by exploring the sensitivity of the trained model to changes made to the input data, i.e., masking some elements of the scene, modifying trajectories, and adding or removing dynamic agents. The explainability analysis provided in this paper is a first step towards more transparent and reliable motion prediction systems, important from the perspective of the user, developers and regulatory agencies. The code to reproduce this work is publicly available at https://github.com/sancarlim/Explainable-MP/tree/v1.1.
translated by 谷歌翻译
近年来,行为预测模型已经激增,尤其是在自动驾驶的流行现实机器人技术应用中,代表移动代理可能未来的分布对于安全舒适的运动计划至关重要。在这些模型中,选择代表输入和输出的坐标框架的选择具有至关重要的交易折扣,这些折扣通常属于两个类别之一。以代理为中心的模型转换输入并在以代理为中心的坐标中执行推断。这些模型在场景元素之间的翻译和旋转上本质上不变,在公共排行榜上表现最好,但与代理和场景元素的数量相互缩小。以场景为中心的模型使用固定的坐标系来处理所有代理。这为他们提供了在所有代理之间共享表示形式的优势,并提供有效的摊销推理计算,该计算与代理数量线性缩放。但是,这些模型必须学习场景元素之间的翻译和旋转的不变性,并且通常以表现为中心的模型。在这项工作中,我们在概率运动预测模型之间开发知识蒸馏技术,并应用这些技术来缩小以代理为中心和以场景为中心的模型之间的性能差距。这将以场景为中心的模型性能提高了13.2%,在公共Argoverse基准中,Waymo Open Datatet的7.8%,在大型内部数据集中最多可达9.4%。这些以场景为中心的改进的模型在公共排行榜中排名很高,在繁忙场景中以代理商为中心的教师的效率高15倍。
translated by 谷歌翻译