预测交通参与者的多模式未来行为对于机器人车辆做出安全决策至关重要。现有作品探索以直接根据潜在特征预测未来的轨迹,或利用密集的目标候选者来识别代理商的目的地,在这种情况下,由于所有运动模式均来自相同的功能,而后者的策略具有效率问题,因此前者策略的收敛缓慢,因为其性能高度依赖关于候选目标的密度。在本文中,我们提出了运动变压器(MTR)框架,该框架将运动预测模拟为全球意图定位和局部运动改进的联合优化。 MTR不使用目标候选者,而是通过采用一系列可学习的运动查询对来结合空间意图。每个运动查询对负责特定运动模式的轨迹预测和完善,这可以稳定训练过程并促进更好的多模式预测。实验表明,MTR在边际和联合运动预测挑战上都达到了最新的性能,在Waymo Open Motion DataSet排行榜上排名第一。代码将在https://github.com/sshaoshuai/mtr上找到。
translated by 谷歌翻译
在本报告中,我们介绍了2022 Waymo Open DataSet挑战中运动预测轨迹的第一名解决方案。我们为多模式运动预测提出了一个新型的运动变压器框架,该框架引入了一组新型运动查询对,用于通过共同执行意图定位和迭代运动改进来产生更好的多模式未来轨迹。采用了一种简单的模型合奏策略,并采用了非最大抑制作用,以进一步提高最终性能。我们的方法在2022 Waymo打开数据集挑战的运动预测排行榜上取得了第一名,优于其他利润率的其他方法。代码将在https://github.com/sshaoshuai/mtr上找到。
translated by 谷歌翻译
Predicting the future motion of dynamic agents is of paramount importance to ensure safety or assess risks in motion planning for autonomous robots. In this paper, we propose a two-stage motion prediction method, referred to as R-Pred, that effectively utilizes both the scene and interaction context using a cascade of the initial trajectory proposal network and the trajectory refinement network. The initial trajectory proposal network produces M trajectory proposals corresponding to M modes of a future trajectory distribution. The trajectory refinement network enhances each of M proposals using 1) the tube-query scene attention (TQSA) and 2) the proposal-level interaction attention (PIA). TQSA uses tube-queries to aggregate the local scene context features pooled from proximity around the trajectory proposals of interest. PIA further enhances the trajectory proposals by modeling inter-agent interactions using a group of trajectory proposals selected based on their distances from neighboring agents. Our experiments conducted on the Argoverse and nuScenes datasets demonstrate that the proposed refinement network provides significant performance improvements compared to the single-stage baseline and that R-Pred achieves state-of-the-art performance in some categories of the benchmark.
translated by 谷歌翻译
预测公路参与者的未来运动对于自动驾驶至关重要,但由于令人震惊的运动不确定性,因此极具挑战性。最近,大多数运动预测方法求助于基于目标的策略,即预测运动轨迹的终点,作为回归整个轨迹的条件,以便可以减少解决方案的搜索空间。但是,准确的目标坐标很难预测和评估。此外,目的地的点表示限制了丰富的道路环境的利用,从而导致预测不准确。目标区域,即可能的目的地区域,而不是目标坐标,可以通过涉及更多的容忍度和指导来提供更软的限制,以搜索潜在的轨迹。考虑到这一点,我们提出了一个新的基于目标区域的框架,名为“目标区域网络”(GANET)进行运动预测,该框架对目标区域进行了建模,而不是确切的目标坐标作为轨迹预测的先决条件,更加可靠,更准确地执行。具体而言,我们建议一个goicrop(目标的目标区域)操作员有效地提取目标区域中的语义巷特征,并在目标区域和模型演员的未来互动中提取语义巷,这对未来的轨迹估计很大。 Ganet在所有公共文献(直到论文提交)中排名第一个,将其源代码排在第一位。
translated by 谷歌翻译
Motion prediction is highly relevant to the perception of dynamic objects and static map elements in the scenarios of autonomous driving. In this work, we propose PIP, the first end-to-end Transformer-based framework which jointly and interactively performs online mapping, object detection and motion prediction. PIP leverages map queries, agent queries and mode queries to encode the instance-wise information of map elements, agents and motion intentions, respectively. Based on the unified query representation, a differentiable multi-task interaction scheme is proposed to exploit the correlation between perception and prediction. Even without human-annotated HD map or agent's historical tracking trajectory as guidance information, PIP realizes end-to-end multi-agent motion prediction and achieves better performance than tracking-based and HD-map-based methods. PIP provides comprehensive high-level information of the driving scene (vectorized static map and dynamic objects with motion information), and contributes to the downstream planning and control. Code and models will be released for facilitating further research.
translated by 谷歌翻译
Modern autonomous driving system is characterized as modular tasks in sequential order, i.e., perception, prediction and planning. As sensors and hardware get improved, there is trending popularity to devise a system that can perform a wide diversity of tasks to fulfill higher-level intelligence. Contemporary approaches resort to either deploying standalone models for individual tasks, or designing a multi-task paradigm with separate heads. These might suffer from accumulative error or negative transfer effect. Instead, we argue that a favorable algorithm framework should be devised and optimized in pursuit of the ultimate goal, i.e. planning of the self-driving-car. Oriented at this goal, we revisit the key components within perception and prediction. We analyze each module and prioritize the tasks hierarchically, such that all these tasks contribute to planning (the goal). To this end, we introduce Unified Autonomous Driving (UniAD), the first comprehensive framework up-to-date that incorporates full-stack driving tasks in one network. It is exquisitely devised to leverage advantages of each module, and provide complementary feature abstractions for agent interaction from a global perspective. Tasks are communicated with unified query design to facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. With extensive ablations, the effectiveness of using such a philosophy is proven to surpass previous state-of-the-arts by a large margin in all aspects. The full suite of codebase and models would be available to facilitate future research in the community.
translated by 谷歌翻译
自动驾驶的运动预测是一项艰巨的任务,因为复杂的驾驶场景导致静态和动态输入的异质组合。这是一个开放的问题,如何最好地表示和融合有关道路几何,车道连接,时变的交通信号状态以及动态代理的历史及其相互作用的历史。为了模拟这一不同的输入功能集,许多提出的方法旨在设计具有多种模态模块的同样复杂系统。这导致难以按严格的方式进行扩展,扩展或调整的系统以进行质量和效率。在本文中,我们介绍了Wayformer,这是一个基于注意力的运动架构,用于运动预测,简单而均匀。 Wayformer提供了一个紧凑的模型描述,该描述由基于注意力的场景编码器和解码器组成。在场景编码器中,我们研究了输入方式的早期,晚和等级融合的选择。对于每种融合类型,我们通过分解的注意力或潜在的查询关注来探索策略来折衷效率和质量。我们表明,尽管早期融合的结构简单,但不仅是情感不可知论,而且还取得了最先进的结果。
translated by 谷歌翻译
预测道路用户的未来行为是自主驾驶中最具挑战性和最重要的问题之一。应用深度学习对此问题需要以丰富的感知信号和地图信息的形式融合异构世界状态,并在可能的期货上推断出高度多模态分布。在本文中,我们呈现MultiPath ++,这是一个未来的预测模型,实现了在流行的基准上实现最先进的性能。 MultiPath ++通过重新访问许多设计选择来改善多径架构。第一关键设计差异是偏离基于图像的基于输入世界状态的偏离,有利于异构场景元素的稀疏编码:多径++消耗紧凑且有效的折线,直接描述道路特征和原始代理状态信息(例如,位置,速度,加速)。我们提出了一种背景感知这些元素的融合,并开发可重用的多上下文选通融合组件。其次,我们重新考虑了预定义,静态锚点的选择,并开发了一种学习模型端到端的潜在锚嵌入的方法。最后,我们在其他ML域中探索合奏和输出聚合技术 - 常见的常见域 - 并为我们的概率多模式输出表示找到有效的变体。我们对这些设计选择进行了广泛的消融,并表明我们所提出的模型在协会运动预测竞争和Waymo开放数据集运动预测挑战上实现了最先进的性能。
translated by 谷歌翻译
行为预测在集成自主驾驶软件解决方案中起着重要作用。在行为预测研究中,与单一代理行为预测相比,交互行为预测是一个较小的领域。预测互动剂的运动需要启动新的机制来捕获交互式对的关节行为。在这项工作中,我们将端到端的关节预测问题作为边际学习和车辆行为联合学习的顺序学习过程。我们提出了ProspectNet,这是一个采用加权注意分数的联合学习块,以模拟交互式剂对之间的相互影响。联合学习块首先权衡多模式预测的候选轨迹,然后通过交叉注意更新自我代理的嵌入。此外,我们将每个交互式代理的个人未来预测播放到一个智慧评分模块中,以选择顶部的$ K $预测对。我们表明,ProspectNet优于两个边际预测的笛卡尔产品,并在Waymo交互式运动预测基准上实现了可比的性能。
translated by 谷歌翻译
由于人类行为的瞬极性,预测道路代理的未来轨迹是对自动驾驶的挑战。最近,证明基于目标的多轨道预测方法是有效的,在那里他们首先将过度采样的目标候选者进行得分,然后从它们中选择最终集合。然而,这些方法通常涉及基于稀疏预定锚和启发式目标选择算法的目标预测。在这项工作中,我们提出了一种名为Densetnt的无锚和端到端轨迹预测模型,它直接从密集的目标候选者输出一组轨迹。此外,我们介绍了基于离线优化的技术,为我们的最终在线模型提供多重伪标签。实验表明,Densetnt实现了最先进的性能,在协会运动预测基准中排名第一,并成为2021 Waymo开放数据集运动预测挑战的第一名获胜者。
translated by 谷歌翻译
预测场景中代理的未来位置是自动驾驶中的一个重要问题。近年来,在代表现场及其代理商方面取得了重大进展。代理与场景和彼此之间的相互作用通常由图神经网络建模。但是,图形结构主要是静态的,无法表示高度动态场景中的时间变化。在这项工作中,我们提出了一个时间图表示,以更好地捕获流量场景中的动态。我们用两种类型的内存模块补充表示形式。一个专注于感兴趣的代理,另一个专注于整个场景。这使我们能够学习暂时意识的表示,即使对多个未来进行简单回归,也可以取得良好的结果。当与目标条件预测结合使用时,我们会显示出更好的结果,可以在Argoverse基准中达到最先进的性能。
translated by 谷歌翻译
交通参与者的运动预测对于安全和强大的自动化驾驶系统至关重要,特别是在杂乱的城市环境中。然而,由于复杂的道路拓扑以及其他代理的不确定意图,这是强大的挑战。在本文中,我们介绍了一种基于图形的轨迹预测网络,其命名为双级预测器(DSP),其以分层方式编码静态和动态驾驶环境。与基于光栅状地图或稀疏车道图的方法不同,我们将驾驶环境视为具有两层的图形,专注于几何和拓扑功能。图形神经网络(GNNS)应用于提取具有不同粒度级别的特征,随后通过基于关注的层间网络聚合,实现更好的本地全局特征融合。在最近的目标驱动的轨迹预测管道之后,提取了目标代理的高可能性的目标候选者,并在这些目标上产生预测的轨迹。由于提出的双尺度上下文融合网络,我们的DSP能够产生准确和人类的多模态轨迹。我们评估了大规模协会运动预测基准测试的提出方法,实现了有希望的结果,优于最近的最先进的方法。
translated by 谷歌翻译
Making safe and human-like decisions is an essential capability of autonomous driving systems and learning-based behavior planning is a promising pathway toward this objective. Distinguished from existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. Concretely, a behavior generation module first produces a diverse set of candidate behaviors in the form of trajectory proposals. Then the proposed conditional motion prediction network is employed to forecast other agents' future trajectories conditioned on each trajectory proposal. Given the candidate plans and associated prediction results, we learn a scoring module to evaluate the plans using maximum entropy inverse reinforcement learning (IRL). We conduct comprehensive experiments to validate the proposed framework on a large-scale real-world urban driving dataset. The results reveal that the conditional prediction model is able to forecast multiple possible future trajectories given a candidate behavior and the prediction results are reactive to different plans. Moreover, the IRL-based scoring module can properly evaluate the trajectory proposals and select close-to-human ones. The proposed framework outperforms other baseline methods in terms of similarity to human driving trajectories. Moreover, we find that the conditional prediction model can improve both prediction and planning performance compared to the non-conditional model, and learning the scoring module is critical to correctly evaluating the candidate plans to align with human drivers.
translated by 谷歌翻译
在智能系统(例如自动驾驶和机器人导航)中,轨迹预测一直是一个长期存在的问题。最近在大规模基准测试的最新模型一直在迅速推动性能的极限,主要集中于提高预测准确性。但是,这些模型对效率的强调较少,这对于实时应用至关重要。本文提出了一个名为Gatraj的基于注意力的图形模型,其预测速度要高得多。代理的时空动力学,例如行人或车辆,是通过注意机制建模的。代理之间的相互作用是通过图卷积网络建模的。我们还实施了拉普拉斯混合物解码器,以减轻模式崩溃,并为每个代理生成多种模式预测。我们的模型以在多个开放数据集上测试的更高预测速度与最先进的模型相同的性能。
translated by 谷歌翻译
自我监督学习(SSL)是一种新兴技术,已成功地用于培训卷积神经网络(CNNS)和图形神经网络(GNNS),以进行更可转移,可转换,可推广和稳健的代表性学习。然而,很少探索其对自动驾驶的运动预测。在这项研究中,我们报告了将自学纳入运动预测的首次系统探索和评估。我们首先建议研究四项新型的自我监督学习任务,以通过理论原理以及对挑战性的大规模argoverse数据集进行运动预测以及定量和定性比较。其次,我们指出,基于辅助SSL的学习设置不仅胜过预测方法,这些方法在性能准确性方面使用变压器,复杂的融合机制和复杂的在线密集目标候选优化算法,而且具有较低的推理时间和建筑复杂性。最后,我们进行了几项实验,以了解为什么SSL改善运动预测。代码在\ url {https://github.com/autovision-cloud/ssl-lanes}上开源。
translated by 谷歌翻译
本文提出了一个新型的深度学习框架,用于多模式运动预测。该框架由三个部分组成:经常性神经网络,以处理目标代理的运动过程,卷积神经网络处理栅格化环境表示以及一种基于距离的注意机制,以处理不同代理之间的相互作用。我们在大规模的真实驾驶数据集,Waymo Open Motion数据集上验证了所提出的框架,并将其性能与标准测试基准上的其他方法进行比较。定性结果表明,我们的模型给出的预测轨迹是准确,多样的,并且根据道路结构。标准基准测试的定量结果表明,我们的模型在预测准确性和其他评估指标方面优于其他基线方法。拟议的框架是2021 Waymo Open DataSet运动预测挑战的第二名。
translated by 谷歌翻译
Predicting the future motion of road agents is a critical task in an autonomous driving pipeline. In this work, we address the problem of generating a set of scene-level, or joint, future trajectory predictions in multi-agent driving scenarios. To this end, we propose FJMP, a Factorized Joint Motion Prediction framework for multi-agent interactive driving scenarios. FJMP models the future scene interaction dynamics as a sparse directed interaction graph, where edges denote explicit interactions between agents. We then prune the graph into a directed acyclic graph (DAG) and decompose the joint prediction task into a sequence of marginal and conditional predictions according to the partial ordering of the DAG, where joint future trajectories are decoded using a directed acyclic graph neural network (DAGNN). We conduct experiments on the INTERACTION and Argoverse 2 datasets and demonstrate that FJMP produces more accurate and scene-consistent joint trajectory predictions than non-factorized approaches, especially on the most interactive and kinematically interesting agents. FJMP ranks 1st on the multi-agent test leaderboard of the INTERACTION dataset.
translated by 谷歌翻译
We propose JFP, a Joint Future Prediction model that can learn to generate accurate and consistent multi-agent future trajectories. For this task, many different methods have been proposed to capture social interactions in the encoding part of the model, however, considerably less focus has been placed on representing interactions in the decoder and output stages. As a result, the predicted trajectories are not necessarily consistent with each other, and often result in unrealistic trajectory overlaps. In contrast, we propose an end-to-end trainable model that learns directly the interaction between pairs of agents in a structured, graphical model formulation in order to generate consistent future trajectories. It sets new state-of-the-art results on Waymo Open Motion Dataset (WOMD) for the interactive setting. We also investigate a more complex multi-agent setting for both WOMD and a larger internal dataset, where our approach improves significantly on the trajectory overlap metrics while obtaining on-par or better performance on single-agent trajectory metrics.
translated by 谷歌翻译
相应地预测周围交通参与者的未来状态,并计划安全,平稳且符合社会的轨迹对于自动驾驶汽车至关重要。当前的自主驾驶系统有两个主要问题:预测模块通常与计划模块解耦,并且计划的成本功能很难指定和调整。为了解决这些问题,我们提出了一个端到端的可区分框架,该框架集成了预测和计划模块,并能够从数据中学习成本函数。具体而言,我们采用可区分的非线性优化器作为运动计划者,该运动计划将神经网络给出的周围剂的预测轨迹作为输入,并优化了自动驾驶汽车的轨迹,从而使框架中的所有操作都可以在框架中具有可观的成本,包括成本功能权重。提出的框架经过大规模的现实驾驶数据集进行了训练,以模仿整个驾驶场景中的人类驾驶轨迹,并在开环和闭环界面中进行了验证。开环测试结果表明,所提出的方法的表现优于各种指标的基线方法,并提供以计划为中心的预测结果,从而使计划模块能够输出接近人类的轨迹。在闭环测试中,提出的方法表明能够处理复杂的城市驾驶场景和鲁棒性,以抵抗模仿学习方法所遭受的分配转移。重要的是,我们发现计划和预测模块的联合培训比在开环和闭环测试中使用单独的训练有素的预测模块进行计划要比计划更好。此外,消融研究表明,框架中的可学习组件对于确保计划稳定性和性能至关重要。
translated by 谷歌翻译
The task of motion forecasting is critical for self-driving vehicles (SDVs) to be able to plan a safe maneuver. Towards this goal, modern approaches reason about the map, the agents' past trajectories and their interactions in order to produce accurate forecasts. The predominant approach has been to encode the map and other agents in the reference frame of each target agent. However, this approach is computationally expensive for multi-agent prediction as inference needs to be run for each agent. To tackle the scaling challenge, the solution thus far has been to encode all agents and the map in a shared coordinate frame (e.g., the SDV frame). However, this is sample inefficient and vulnerable to domain shift (e.g., when the SDV visits uncommon states). In contrast, in this paper, we propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization. Towards this goal, we leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph. This parameterization allows us to be invariant to scene viewpoint, and save online computation by re-using map embeddings computed offline. Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction. We demonstrate the effectiveness of our approach on the urban Argoverse 2 benchmark as well as a novel highway dataset.
translated by 谷歌翻译