Smart Paper Notes

DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets

Junru Gu , Chen Sun , Hang Zhao

Categories: Computer Vision | Robotics

2021-08-22

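Because human behavior is stochastic, predicting the future trajectories of road agents is a challenge for autonomous driving. Recently, goal-based multi-trajectory prediction methods have proved effective: they first score over-sampled goal candidates and then select a final set from them. However, these methods usually rely on goal prediction from sparse predefined anchors and on heuristic goal-selection algorithms. In this work, we propose an anchor-free, end-to-end trajectory prediction model, named DenseTNT, that directly outputs a set of trajectories from dense goal candidates. In addition, we introduce an offline optimization-based technique that provides multi-future pseudo-labels for our final online model. Experiments show that DenseTNT achieves state-of-the-art performance, ranking first on the Argoverse motion forecasting benchmark and winning first place in the 2021 Waymo Open Dataset Motion Prediction Challenge.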

Translated by Google Translate
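The "score over-sampled goal candidates, then select a final set" step that the DenseTNT abstract attributes to earlier goal-based methods can be pictured as a greedy non-maximum suppression over scored goals. The sketch below is only an illustration of that heuristic baseline, not DenseTNT itself; the function name `select_goals_nms`, the `radius` threshold, and the toy candidates are all assumptions made for this example.

```python
import math
from typing import List, Tuple

Goal = Tuple[float, float]


def select_goals_nms(
    goals: List[Goal], scores: List[float], k: int, radius: float
) -> List[Goal]:
    """Greedy NMS (hypothetical helper): walk the goals from highest to lowest
    score and keep one only if it is farther than `radius` from every goal
    already kept. Stops once `k` goals are selected."""
    order = sorted(range(len(goals)), key=lambda i: scores[i], reverse=True)
    selected: List[Goal] = []
    for i in order:
        if len(selected) == k:
            break
        if all(math.dist(goals[i], kept) > radius for kept in selected):
            selected.append(goals[i])
    return selected


# Toy candidates: two near the origin, two near x = 5, one far away.
candidates = [(0.0, 0.0), (0.5, 0.0), (5.0, 0.0), (5.2, 0.1), (10.0, 0.0)]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]
picked = select_goals_nms(candidates, scores, k=3, radius=1.0)
print(picked)
```

The hand-tuned `radius` and the fixed greedy order are the kind of heuristic the abstract says an end-to-end model is meant to avoid.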

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

Junru Gu , Chenxu Hu , Tianyuan Zhang , Xuanyao Chen , Yilun Wang , Yue Wang , Hang Zhao

Categories: Computer Vision | Robotics

2022-08-02

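Existing autonomous driving pipelines separate the perception module from the prediction module. The two modules communicate through hand-picked features, such as agent boxes and trajectories, that serve as the interface. Because of this separation, the prediction module receives only partial information from the perception module. Worse, errors from the perception module propagate and accumulate, adversely affecting the prediction results. In this work, we propose ViP3D, a visual trajectory prediction pipeline that leverages the rich information in raw video to predict the future trajectories of agents in a scene. ViP3D uses sparse agent queries throughout the pipeline, making it fully differentiable and interpretable. In addition, we propose an evaluation metric for this new end-to-end visual trajectory prediction task. Extensive experiments on the nuScenes dataset show that ViP3D performs strongly against both traditional pipelines and previous end-to-end models.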


GANet: Goal Area Network for Motion Forecasting

Mingkun Wang , Xinge Zhu , Changqian Yu , Wei Li , Yuexin Ma , Ruochun Jin , Xiaoguang Ren , Dongchun Ren , Mingxu Wang , Wenjing Yang

Categories: Computer Vision

2022-09-20

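Predicting the future motion of road participants is crucial for autonomous driving, but it is extremely challenging because of the large uncertainty in motion. Recently, most motion forecasting methods have resorted to a goal-based strategy, i.e., predicting the endpoints of motion trajectories as conditions for regressing the entire trajectories, so that the search space of solutions is reduced. However, accurate goal coordinates are hard to predict and evaluate. In addition, representing the destination as a point limits the use of rich road context, which leads to inaccurate predictions. A goal area, i.e., the region of possible destinations, rather than a goal coordinate, can provide a softer constraint for searching potential trajectories by allowing more tolerance and guidance. With this in mind, we propose a new goal-area-based framework for motion forecasting, named Goal Area Network (GANet), which models goal areas rather than exact goal coordinates as preconditions for trajectory prediction and performs more robustly and accurately. Specifically, we propose a GoICrop (Goal Area of Interest) operator that effectively extracts semantic lane features in goal areas and models actors' future interactions, which benefits future trajectory estimation considerably. GANet ranks first on the Argoverse Challenge leaderboard among all public literature (as of paper submission), and its source code will be released.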
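To make the idea of extracting lane features inside a goal area concrete, here is a minimal sketch that mean-pools the features of lane nodes lying within a radius of a predicted goal. It is not GANet's GoICrop operator; the circular area, the mean pooling, and the names `crop_goal_area`, `lane_nodes`, and `lane_features` are assumptions chosen purely for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def crop_goal_area(
    lane_nodes: Sequence[Point],
    lane_features: Sequence[Sequence[float]],
    goal: Point,
    radius: float,
) -> Optional[List[float]]:
    """Mean-pool the features of lane nodes within `radius` of `goal`.

    Hypothetical simplification: the goal area is a circle and pooling is a
    plain average. Returns None when no lane node falls inside the area.
    """
    inside = [
        feat
        for node, feat in zip(lane_nodes, lane_features)
        if math.dist(node, goal) <= radius
    ]
    if not inside:
        return None
    return [sum(channel) / len(inside) for channel in zip(*inside)]


# Three lane nodes with 2-D features; only the first two lie near the goal.
nodes = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
feats = [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]]
pooled = crop_goal_area(nodes, feats, goal=(0.5, 0.0), radius=1.0)
print(pooled)
```

Pooling over an area rather than reading features at a single point is what lets nearby lane context influence the trajectory, which is the motivation the abstract gives for goal areas.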


Trajectory Prediction with Graph-based Dual-scale Context Fusion

Lu Zhang , Peiliang Li , Jing Chen , Shaojie Shen

Categories: Robotics | Computer Vision

2021-11-02

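Motion prediction for traffic participants is essential for a safe and robust automated driving system, especially in cluttered urban environments. However, it is highly challenging because of complex road topology and the uncertain intentions of other agents. In this paper, we present a graph-based trajectory prediction network named the Dual Scale Predictor (DSP), which encodes the static and dynamic driving context in a hierarchical manner. Unlike methods based on rasterized maps or sparse lane graphs, we treat the driving context as a graph with two layers, focusing on both geometric and topological features. Graph neural networks (GNNs) are applied to extract features at different levels of granularity, and the features are then aggregated by attention-based inter-layer networks, achieving better local-global feature fusion. Following the recent goal-driven trajectory prediction pipeline, high-likelihood goal candidates for the target agent are extracted, and predicted trajectories are generated conditioned on these goals. Thanks to the proposed dual-scale context fusion network, DSP can generate accurate and human-like multi-modal trajectories. We evaluate the proposed method on the large-scale Argoverse motion forecasting benchmark, where it achieves promising results and outperforms recent state-of-the-art methods.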


ProspectNet: Weighted Conditional Attention for Future Interaction Modeling in Behavior Prediction

Yutian Pang , Zehua Guo , Binnan Zhuang

Categories: Artificial Intelligence | Robotics

2022-08-29

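Behavior prediction plays an important role in integrated autonomous driving software solutions. Within behavior prediction research, interactive behavior prediction is a less-explored area than single-agent behavior prediction. Predicting the motion of interacting agents requires new mechanisms that capture the joint behavior of the interacting pair. In this work, we formulate the end-to-end joint prediction problem as a sequential learning process consisting of marginal learning followed by joint learning of vehicle behaviors. We propose ProspectNet, a joint learning block that uses weighted attention scores to model the mutual influence between interacting agent pairs. The joint learning block first weighs the multi-modal predicted candidate trajectories and then updates the ego agent's embedding via cross attention. Furthermore, we broadcast the individual future predictions of each interacting agent into a pair-wise scoring module to select the top $K$ prediction pairs. We show that ProspectNet outperforms the Cartesian product of two marginal predictions and achieves comparable performance on the Waymo Interactive Motion Prediction benchmark.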
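The baseline named in the ProspectNet abstract, the Cartesian product of two marginal predictions, can be sketched in a few lines: every mode of agent A is paired with every mode of agent B, each pair is scored by the product of the two marginal probabilities, and the top K pairs are kept. This illustrates only that baseline, under the assumption that the two agents are treated as independent; it is not ProspectNet's pair-wise scoring module, and the name `top_k_joint_pairs` is made up for this example.

```python
from itertools import product
from typing import List, Tuple


def top_k_joint_pairs(
    probs_a: List[float], probs_b: List[float], k: int
) -> List[Tuple[int, int, float]]:
    """Return the k highest-scoring (mode_a, mode_b, score) triples.

    The score is the product of the marginal mode probabilities, which
    assumes the two agents act independently (the baseline assumption).
    """
    pairs = [
        (i, j, pa * pb)
        for (i, pa), (j, pb) in product(enumerate(probs_a), enumerate(probs_b))
    ]
    pairs.sort(key=lambda triple: triple[2], reverse=True)
    return pairs[:k]


# Agent A has two predicted modes, agent B has three.
pairs = top_k_joint_pairs([0.6, 0.4], [0.7, 0.2, 0.1], k=2)
for mode_a, mode_b, score in pairs:
    print(mode_a, mode_b, round(score, 2))
```

Because the score ignores how the two trajectories interact, a pair that collides can still rank highly, which is the weakness a learned joint scoring step is meant to address.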


Trajectory Forecasting on Temporal Graphs

Görkay Aydemir , Adil Kaan Akan , Fatma Güney

Categories: Computer Vision | Robotics

2022-07-01

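Predicting the future locations of agents in a scene is an important problem in autonomous driving. In recent years, there has been significant progress in representing the scene and the agents in it. The interactions of agents with the scene and with each other are typically modeled with a graph neural network. However, the graph structure is mostly static and fails to represent temporal changes in highly dynamic scenes. In this work, we propose a temporal graph representation to better capture the dynamics of traffic scenes. We complement the representation with two types of memory modules: one focusing on the agent of interest and the other on the entire scene. This allows us to learn temporally aware representations that achieve good results even with simple regression of multiple futures. When combined with goal-conditioned prediction, we show better results that reach state-of-the-art performance on the Argoverse benchmark.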


Motion Transformer with Global Intention Localization and Local Movement Refinement

Shaoshuai Shi , Li Jiang , Dengxin Dai , Bernt Schiele

Categories: Computer Vision

2022-09-27

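Predicting the multimodal future behavior of traffic participants is essential for robotic vehicles to make safe decisions. Existing works either predict future trajectories directly from latent features or use dense goal candidates to identify agents' destinations. The former strategy converges slowly because all motion modes are derived from the same feature, while the latter has an efficiency problem because its performance depends heavily on the density of goal candidates. In this paper, we propose the Motion Transformer (MTR) framework, which models motion prediction as the joint optimization of global intention localization and local movement refinement. Instead of using goal candidates, MTR incorporates spatial intention priors by adopting a set of learnable motion query pairs. Each motion query pair is responsible for trajectory prediction and refinement for a specific motion mode, which stabilizes training and facilitates better multimodal predictions. Experiments show that MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges, ranking first on the Waymo Open Motion Dataset leaderboards. Code will be available at https://github.com/sshaoshuai/mtr.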


Jointly Learning Agent and Lane Information for Multimodal Trajectory Prediction

Jie Wang , Caili Guo , Minan Guo , Jiujiu Chen

Categories: Machine Learning | Computer Vision | Robotics

2021-11-26

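Predicting plausible future trajectories of nearby agents is a core challenge for the safety of autonomous vehicles, and it depends mainly on two external cues: the dynamic neighboring agents and the static scene context. Recent approaches have made great progress in characterizing the two cues separately. However, they ignore the correlation between the two cues, and most of them struggle to achieve map-adaptive prediction. In this paper, we use lanes as scene data and propose a staged network that Jointly learns Agent and Lane information for Multimodal Trajectory Prediction (JAL-MTP). JAL-MTP uses a Social-to-Lane (S2L) module to jointly represent the static lanes and the dynamic motion of neighboring agents as instance-level lanes, a Recurrent Lane Attention (RLA) mechanism that uses the instance-level lanes to predict map-adaptive future trajectories, and two selectors that identify typical and plausible trajectories. Experiments on the public Argoverse dataset show that JAL-MTP significantly outperforms existing models both quantitatively and qualitatively.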


Domain Knowledge Driven Pseudo Labels for Interpretable Goal-Conditioned Interactive Trajectory Prediction

Lingfeng Sun , Chen Tang , Yaru Niu , Enna Sachdeva , Chiho Choi , Teruhisa Misu , Masayoshi Tomizuka , Wei Zhan

Categories: Robotics | Artificial Intelligence

2022-03-28

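Motion forecasting in highly interactive scenarios is a challenging problem in autonomous driving. In such scenarios, we need to accurately predict the joint behavior of interacting agents to ensure safe and efficient navigation of autonomous vehicles. Recently, goal-conditioned methods have attracted attention because of their performance advantage and their ability to capture multimodality in the trajectory distribution. In this work, we study the joint trajectory prediction problem within the goal-conditioned framework. In particular, we introduce a model based on a conditional variational autoencoder (CVAE) to explicitly encode different interaction modes into the latent space. However, we find that the vanilla model suffers from posterior collapse and cannot induce an informative latent space as desired. To address these issues, we propose a novel approach that avoids KL vanishing and induces an interpretable interactive latent space with pseudo labels. The proposed pseudo labels allow us to incorporate domain knowledge about interaction in a flexible manner. We motivate the proposed method with an illustrative toy example. In addition, we validate our framework on the Waymo Open Motion Dataset with both quantitative and qualitative evaluations.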


Perceive, Interact, Predict: Learning Dynamic and Static Clues for End-to-End Motion Prediction

Bo Jiang , Shaoyu Chen , Xinggang Wang , Bencheng Liao , Tianheng Cheng , Jiajie Chen , Helong Zhou , Qian Zhang , Wenyu Liu , Chang Huang

Categories: Computer Vision | Robotics

2022-12-05

Motion prediction is highly relevant to the perception of dynamic objects and static map elements in the scenarios of autonomous driving. In this work, we propose PIP, the first end-to-end Transformer-based framework which jointly and interactively performs online mapping, object detection and motion prediction. PIP leverages map queries, agent queries and mode queries to encode the instance-wise information of map elements, agents and motion intentions, respectively. Based on the unified query representation, a differentiable multi-task interaction scheme is proposed to exploit the correlation between perception and prediction. Even without human-annotated HD map or agent's historical tracking trajectory as guidance information, PIP realizes end-to-end multi-agent motion prediction and achieves better performance than tracking-based and HD-map-based methods. PIP provides comprehensive high-level information of the driving scene (vectorized static map and dynamic objects with motion information), and contributes to the downstream planning and control. Code and models will be released for facilitating further research.


Improving Diversity of Multiple Trajectory Prediction based on Map-adaptive Lane Loss

Sanmin Kim , Hyeongseok Jeon , Junwon Choi , Dongsuk Kum

Categories: Computer Vision

2022-06-17

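Prior work on motion prediction for autonomous driving tends to focus on finding a trajectory that is close to the ground-truth trajectory. However, this problem formulation and the resulting approaches frequently lead to a loss of diversity and to biased trajectory predictions. They are therefore unsuitable for real-world autonomous driving, where diverse and road-dependent multimodal trajectory predictions are critical for safety. To this end, this study proposes a novel loss function, Lane Loss, that ensures map-adaptive diversity and accommodates geometric constraints. A two-stage trajectory prediction architecture with a novel trajectory candidate proposal module, Trajectory Prediction Attention (TPA), is trained with Lane Loss, which encourages multiple trajectories to be diversely distributed and to cover feasible maneuvers in a map-aware manner. Furthermore, because existing trajectory performance metrics focus on evaluating accuracy against the ground-truth future trajectory, a quantitative evaluation metric is also proposed to evaluate the diversity of the predicted trajectories. Experiments on the Argoverse dataset show that the proposed method significantly improves the diversity of the predicted trajectories without sacrificing prediction accuracy.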


SSL-Lanes: Self-Supervised Learning for Motion Forecasting in Autonomous Driving

Prarthana Bhattacharyya , Chengjie Huang , Krzysztof Czarnecki

Categories: Computer Vision | Artificial Intelligence | Robotics

2022-06-28

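Self-supervised learning (SSL) is an emerging technique that has been successfully used to train convolutional neural networks (CNNs) and graph neural networks (GNNs) for more transferable, generalizable, and robust representation learning. However, its potential for motion forecasting in autonomous driving has rarely been explored. In this study, we report the first systematic exploration and assessment of incorporating self-supervision into motion forecasting. First, we propose and investigate four novel self-supervised learning tasks for motion forecasting, with theoretical rationale and with quantitative and qualitative comparisons on the challenging large-scale Argoverse dataset. Second, we point out that our auxiliary SSL-based learning setup not only outperforms, in terms of accuracy, forecasting methods that use transformers, complicated fusion mechanisms, and sophisticated online dense goal candidate optimization algorithms, but also has low inference time and architectural complexity. Finally, we conduct several experiments to understand why SSL improves motion forecasting. Code is open-sourced at https://github.com/autovision-cloud/ssl-lanes.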


Goal-oriented Autonomous Driving

Yihan Hu , Jiazhi Yang , Li Chen , Keyu Li , Chonghao Sima , Xizhou Zhu , Siqi Chai , Senyao Du , Tianwei Lin , Wenhai Wang

Categories: Computer Vision | Robotics

2022-12-20

Modern autonomous driving system is characterized as modular tasks in sequential order, i.e., perception, prediction and planning. As sensors and hardware get improved, there is trending popularity to devise a system that can perform a wide diversity of tasks to fulfill higher-level intelligence. Contemporary approaches resort to either deploying standalone models for individual tasks, or designing a multi-task paradigm with separate heads. These might suffer from accumulative error or negative transfer effect. Instead, we argue that a favorable algorithm framework should be devised and optimized in pursuit of the ultimate goal, i.e. planning of the self-driving-car. Oriented at this goal, we revisit the key components within perception and prediction. We analyze each module and prioritize the tasks hierarchically, such that all these tasks contribute to planning (the goal). To this end, we introduce Unified Autonomous Driving (UniAD), the first comprehensive framework up-to-date that incorporates full-stack driving tasks in one network. It is exquisitely devised to leverage advantages of each module, and provide complementary feature abstractions for agent interaction from a global perspective. Tasks are communicated with unified query design to facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. With extensive ablations, the effectiveness of using such a philosophy is proven to surpass previous state-of-the-arts by a large margin in all aspects. The full suite of codebase and models would be available to facilitate future research in the community.


MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction

Balakrishnan Varadarajan , Ahmed Hefny , Avikalp Srivastava , Khaled S. Refaat , Nigamaa Nayakanti , Andre Cornman , Kan Chen , Bertrand Douillard , Chi Pang Lam , Dragomir Anguelov

Categories: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2021-11-29

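Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state, in the form of rich perception signals and map information, and inferring highly multi-modal distributions over possible futures. In this paper, we present MultiPath++, a future prediction model that achieves state-of-the-art performance on popular benchmarks. MultiPath++ improves the MultiPath architecture by revisiting many design choices. The first key design difference is a departure from dense image-based encoding of the input world state in favor of a sparse encoding of heterogeneous scene elements: MultiPath++ consumes compact and efficient polylines that describe road features, and it consumes raw agent state information (e.g., position, velocity, acceleration) directly. We propose a context-aware fusion of these elements and develop a reusable multi-context gating fusion component. Second, we reconsider the choice of predefined, static anchors and develop a way to learn latent anchor embeddings end-to-end within the model. Finally, we explore ensembling and output aggregation techniques, which are common in other ML domains, and find effective variants for our probabilistic multimodal output representation. We perform extensive ablations on these design choices and show that the proposed model achieves state-of-the-art performance on the Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion Prediction Challenge.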


VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

Jiyang Gao , Chen Sun , Hang Zhao , Yi Shen , Dragomir Anguelov , Congcong Li , Cordelia Schmid

Categories:

2020-05-08

Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves on par or better performance than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.
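As a rough picture of what operating on a vector representation can look like, the sketch below turns a polyline (a lane centerline or an agent track given as ordered points) into one vector per consecutive point pair. The field layout `(x_start, y_start, x_end, y_end, polyline_id)` and the helper name `polyline_to_vectors` are assumptions for illustration only; the abstract does not specify the exact attributes each vector carries.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed layout: (x_start, y_start, x_end, y_end, polyline_id)
Vector = Tuple[float, float, float, float, int]


def polyline_to_vectors(points: List[Point], polyline_id: int) -> List[Vector]:
    """Convert an ordered point list into vectors joining consecutive points.

    Every vector carries the id of the polyline it came from, so vectors of
    the same road component or agent can be grouped again later.
    """
    return [
        (x0, y0, x1, y1, polyline_id)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


# A short lane centerline with three points yields two vectors.
lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
vectors = polyline_to_vectors(lane, polyline_id=7)
print(vectors)
```

Working on such vectors directly keeps the exact geometry, in contrast to rendering the scene into a bird's-eye image, where positions are quantized to pixels.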


Learning Lane Graph Representations for Motion Forecasting

Ming Liang , Bin Yang , Rui Hu , Yun Chen , Renjie Liao , Song Feng , Raquel Urtasun

Categories:

2020-07-27

We propose a motion forecasting model that exploits a novel structured map representation as well as actor-map interactions. Instead of encoding vectorized maps as raster images, we construct a lane graph from raw map data to explicitly preserve the map structure. To capture the complex topology and long range dependencies of the lane graph, we propose LaneGCN which extends graph convolutions with multiple adjacency matrices and along-lane dilation. To capture the complex interactions between actors and maps, we exploit a fusion network consisting of four types of interactions, actor-to-lane, lane-to-lane, lane-to-actor and actor-to-actor. Powered by LaneGCN and actor-map interactions, our model is able to predict accurate and realistic multi-modal trajectories. Our approach significantly outperforms the state-of-the-art on the large scale Argoverse motion forecasting benchmark.
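A graph convolution with multiple adjacency matrices and along-lane dilation can be sketched as follows. The update form assumed here is Y = X W_self + Σ_r A_r X W_r, with one adjacency matrix A_r per relation, and a dilated relation is obtained as the k-th power of the successor adjacency so that a node reaches the node k steps further along the lane. Both the update form and the helper names (`matmul`, `matpow`, `lane_conv`) are assumptions for this toy example, written in plain Python for self-containment rather than taken from the LaneGCN implementation.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matpow(a: Matrix, k: int) -> Matrix:
    """k-th matrix power for k >= 1; on an adjacency matrix this gives k-hop links."""
    out = a
    for _ in range(k - 1):
        out = matmul(out, a)
    return out


def lane_conv(
    x: Matrix, adj: Dict[str, Matrix], w_self: Matrix, w_rel: Dict[str, Matrix]
) -> Matrix:
    """Assumed update: Y = X W_self + sum over relations r of A_r X W_r."""
    y = matmul(x, w_self)
    for name, a in adj.items():
        message = matmul(matmul(a, x), w_rel[name])
        y = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(y, message)]
    return y


# Three lane nodes in a chain 0 -> 1 -> 2; suc[i][j] = 1 if j follows i.
suc = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
suc2 = matpow(suc, 2)  # dilation 2: only node 0 reaches node 2
x = [[1.0], [2.0], [3.0]]  # one scalar feature per node
identity = [[1.0]]  # identity weights keep the arithmetic easy to follow
y = lane_conv(
    x,
    {"suc": suc, "suc2": suc2},
    identity,
    {"suc": identity, "suc2": identity},
)
print(y)
```

With identity weights, node 0 ends up with its own feature plus those of its 1-hop and 2-hop successors, which shows how dilation extends the receptive field along a lane without stacking more layers.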


ReCoAt: A Deep Learning-based Framework for Multi-Modal Motion Prediction in Autonomous Driving Application

Zhiyu Huang , Xiaoyu Mo , Chen Lv

Categories: Robotics

2022-07-02

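This paper proposes a novel deep learning framework for multi-modal motion prediction. The framework consists of three parts: recurrent neural networks that process the target agent's motion history, convolutional neural networks that process the rasterized environment representation, and a distance-based attention mechanism that processes the interactions among different agents. We validate the proposed framework on a large-scale real-world driving dataset, the Waymo Open Motion Dataset, and compare its performance with other methods on the standard test benchmark. The qualitative results show that the trajectories predicted by our model are accurate, diverse, and consistent with the road structure. The quantitative results on the standard benchmark show that our model outperforms other baseline methods in prediction accuracy and other evaluation metrics. The proposed framework is the second-place winner of the 2021 Waymo Open Dataset Motion Prediction Challenge.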


BITS: Bi-level Imitation for Traffic Simulation

Danfei Xu , Yuxiao Chen , Boris Ivanovic , Marco Pavone

Categories: Robotics | Machine Learning

2022-08-26

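Simulation is the key to scaling up validation and verification of robotic systems such as autonomous vehicles. Despite advances in high-fidelity physics and sensor simulation, a critical gap remains in simulating realistic behaviors of road users. This is because, unlike simulating physics and graphics, devising first-principles models of human behavior is generally infeasible. In this work, we take a data-driven approach and propose a method that learns to generate traffic behaviors from real-world driving logs. The method achieves high sample efficiency and behavior diversity by exploiting the bi-level hierarchy of driving behaviors, decoupling the traffic simulation problem into high-level intent inference and low-level driving behavior imitation. The method also incorporates a planning module to obtain stable long-horizon behaviors. We empirically validate our method, named Bi-level Imitation for Traffic Simulation (BITS), with scenarios from two large-scale driving datasets and show that BITS achieves balanced traffic simulation performance in realism, diversity, and long-horizon stability. We also explore ways to evaluate behavior realism and introduce a suite of evaluation metrics for traffic simulation. Finally, as part of our core contributions, we develop and open-source a software tool that unifies data formats across different driving datasets and converts scenes from existing datasets into interactive simulation environments. For additional information and videos, see https://sites.google.com/view/nvr-bits2022/home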



Conditional Predictive Behavior Planning with Inverse Reinforcement Learning for Human-like Autonomous Driving

Zhiyu Huang , Haochen Liu , Jingda Wu , Chen Lv

Categories: Robotics

2022-12-17

Making safe and human-like decisions is an essential capability of autonomous driving systems and learning-based behavior planning is a promising pathway toward this objective. Distinguished from existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. Concretely, a behavior generation module first produces a diverse set of candidate behaviors in the form of trajectory proposals. Then the proposed conditional motion prediction network is employed to forecast other agents' future trajectories conditioned on each trajectory proposal. Given the candidate plans and associated prediction results, we learn a scoring module to evaluate the plans using maximum entropy inverse reinforcement learning (IRL). We conduct comprehensive experiments to validate the proposed framework on a large-scale real-world urban driving dataset. The results reveal that the conditional prediction model is able to forecast multiple possible future trajectories given a candidate behavior and the prediction results are reactive to different plans. Moreover, the IRL-based scoring module can properly evaluate the trajectory proposals and select close-to-human ones. The proposed framework outperforms other baseline methods in terms of similarity to human driving trajectories. Moreover, we find that the conditional prediction model can improve both prediction and planning performance compared to the non-conditional model, and learning the scoring module is critical to correctly evaluating the candidate plans to align with human drivers.


Exploring Map-based Features for Efficient Attention-based Vehicle Motion Prediction

Carlos Gómez-Huélamo , Marcos V. Conde , Miguel Ortiz

Categories: Robotics | Computer Vision

2022-05-25

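Motion prediction (MP) of multiple agents is a crucial task in arbitrarily complex environments, from social robots to self-driving cars. Current approaches tackle this problem with end-to-end networks, where the input data is usually a top view of the scene and the past trajectories of all agents; leveraging this information is a must for optimal performance. In that sense, a reliable autonomous driving (AD) system must produce reasonable predictions on time. However, although many of these approaches use simple ConvNets and LSTMs, the models might not be efficient enough for real-time applications when using both sources of information (map and trajectory history). Moreover, the performance of these models depends heavily on the amount of training data, which can be expensive to obtain (particularly annotated HD maps). In this work, we explore how to achieve competitive performance on the Argoverse 1.0 benchmark using efficient attention-based models that take as input the past trajectories and map-based features derived from minimal map information, to ensure efficient and reliable MP. These features represent interpretable information, namely the drivable area and plausible goal points, in contrast to black-box CNN-based methods for map processing.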
