Autonomous vehicles must often contend with conflicting planning requirements, e.g., safety and comfort could be at odds with each other if avoiding a collision calls for slamming the brakes. To resolve such conflicts, assigning an importance ranking to rules (i.e., imposing a rule hierarchy) has been proposed, which, in turn, induces rankings on trajectories based on the importance of the rules they satisfy. On the one hand, rule hierarchies can enhance interpretability but introduce combinatorial complexity to planning; on the other hand, differentiable reward structures can be leveraged by modern gradient-based optimization tools but are less interpretable and unintuitive to tune. In this paper, we present an approach to equivalently express rule hierarchies as differentiable reward structures amenable to modern gradient-based optimizers, thereby achieving the best of both worlds. We achieve this by formulating rank-preserving reward functions that are monotonic in the rank of the trajectories induced by the rule hierarchy; i.e., higher ranked trajectories receive higher reward. Equipped with a rule hierarchy and its corresponding rank-preserving reward function, we develop a two-stage planner that can efficiently resolve conflicting planning requirements. We demonstrate that our approach can generate motion plans at ~7-10 Hz for various challenging road navigation and intersection negotiation scenarios.
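To make the idea of a rank-preserving reward concrete, here is a minimal sketch. It is an illustrative construction, not the reward function from the paper: it assumes each rule reports a scalar robustness value (positive when the rule is satisfied), that rules are listed from most to least important, and it uses power-of-two weights so that satisfying one rule outweighs all less important rules combined. The function name, the `sharpness` parameter, and the sigmoid surrogate are all assumptions made for this example; the smooth variant is differentiable but preserves ranks only approximately when a robustness value is close to zero.

```python
import math

def rank_preserving_reward(robustness, sharpness=None):
    """Score a trajectory from per-rule robustness values.

    robustness[i] > 0 means rule i is satisfied; index 0 is the most
    important rule. With sharpness=None a hard step is used; otherwise
    a sigmoid gives a smooth (hypothetical) surrogate.
    """
    n = len(robustness)
    reward = 0.0
    for i, rho in enumerate(robustness):
        if sharpness is None:
            satisfied = 1.0 if rho > 0 else 0.0
        else:
            satisfied = 1.0 / (1.0 + math.exp(-sharpness * rho))
        # Weight 2^(n-1-i): one important rule beats all lesser rules together.
        reward += 2.0 ** (n - 1 - i) * satisfied
    return reward

# Rules ordered as [no collision, comfort, progress] (hypothetical hierarchy).
safe_but_harsh = [0.5, -0.2, -0.1]          # satisfies only the top rule
comfortable_but_unsafe = [-0.5, 0.3, 0.4]   # violates the top rule

hard_gap = rank_preserving_reward(safe_but_harsh) - rank_preserving_reward(comfortable_but_unsafe)
smooth_gap = (rank_preserving_reward(safe_but_harsh, sharpness=20.0)
              - rank_preserving_reward(comfortable_but_unsafe, sharpness=20.0))
```

Both gaps are positive here, so the trajectory that satisfies the more important rule receives the higher reward under either variant.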
translated by 谷歌翻译
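In modern autonomy stacks, prediction modules are essential for planning motions in the presence of other moving agents. However, failures in the prediction module can mislead the downstream planner into making unsafe decisions. Indeed, the high uncertainty inherent in the trajectory forecasting task ensures that such mispredictions occur frequently. Motivated by the need to improve the safety of autonomous vehicles without compromising their performance, we develop a probabilistic run-time monitor that detects when a "harmful" prediction failure occurs, i.e., a task-relevant failure detector. We achieve this by propagating trajectory prediction errors to the planning cost in order to reason about their impact on the AV. Furthermore, our detector comes equipped with performance measures for the false-positive and false-negative rates and allows for data-free calibration. In our experiments, we compare our detector with various other detectors and find that ours has the highest area under the receiver operating characteristic curve.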
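Predicting the future states of surrounding traffic participants and planning a safe, smooth, and socially compliant trajectory accordingly is crucial for autonomous vehicles. Current autonomous driving systems have two major issues: the prediction module is often decoupled from the planning module, and the cost function for planning is hard to specify and tune. To tackle these issues, we propose an end-to-end differentiable framework that integrates the prediction and planning modules and is able to learn the cost function from data. Specifically, we employ a differentiable nonlinear optimizer as the motion planner, which takes the predicted trajectories of surrounding agents given by a neural network as input and optimizes the trajectory of the autonomous vehicle, so that all operations in the framework are differentiable, including the cost function weights. The proposed framework is trained on a large-scale real-world driving dataset to imitate human driving trajectories over the entire driving scene and is validated in both open-loop and closed-loop settings. The open-loop results show that the proposed method outperforms the baseline methods across a variety of metrics and delivers planning-centric prediction results, allowing the planning module to output close-to-human trajectories. In closed-loop testing, the proposed method demonstrates the ability to handle complex urban driving scenarios and robustness against the distributional shift that imitation learning methods suffer from. Importantly, we find that joint training of the planning and prediction modules achieves better performance than planning with a separately trained prediction module in both open-loop and closed-loop tests. Moreover, the ablation study indicates that the learnable components of the framework are essential for ensuring planning stability and performance.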
Robots such as autonomous vehicles and assistive manipulators are increasingly operating in dynamic environments and close physical proximity to people. In such scenarios, the robot can leverage a human motion predictor to predict their future states and plan safe and efficient trajectories. However, no model is ever perfect -- when the observed human behavior deviates from the model predictions, the robot might plan unsafe maneuvers. Recent works have explored maintaining a confidence parameter in the human model to overcome this challenge, wherein the predicted human actions are tempered online based on the likelihood of the observed human action under the prediction model. This has opened up a new research challenge, i.e., \textit{how to compute the future human states online as the confidence parameter changes?} In this work, we propose a Hamilton-Jacobi (HJ) reachability-based approach to overcome this challenge. Treating the confidence parameter as a virtual state in the system, we compute a parameter-conditioned forward reachable tube (FRT) that provides the future human states as a function of the confidence parameter. Online, as the confidence parameter changes, we can simply query the corresponding FRT, and use it to update the robot plan. Computing parameter-conditioned FRT corresponds to an (offline) high-dimensional reachability problem, which we solve by leveraging recent advances in data-driven reachability analysis. Overall, our framework enables online maintenance and updates of safety assurances in human-robot interaction scenarios, even when the human prediction model is incorrect. We demonstrate our approach in several safety-critical autonomous driving scenarios, involving a state-of-the-art deep learning-based prediction model.
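The confidence parameter mentioned above can be illustrated with a small Bayesian update. The sketch below is an assumption-laden example, not the method of this paper or of the works it cites: it assumes a Boltzmann (softmax) human model in which the confidence value `beta` scales hypothetical action values `q_values`, and it maintains a discrete belief over a few candidate `beta` values. All names are made up for illustration.

```python
import math

def action_likelihood(q_values, observed, beta):
    """Probability of the observed action under a softmax model with confidence beta."""
    best = max(q_values)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (q - best)) for q in q_values]
    return weights[observed] / sum(weights)

def update_confidence(belief, q_values, observed):
    """One Bayesian update of a discrete belief {beta: probability}."""
    posterior = {
        beta: prob * action_likelihood(q_values, observed, beta)
        for beta, prob in belief.items()
    }
    total = sum(posterior.values())
    return {beta: prob / total for beta, prob in posterior.items()}

# The human takes the action the model rates worse (index 1),
# so belief should shift toward the low-confidence value of beta.
belief = {0.1: 0.5, 5.0: 0.5}
belief = update_confidence(belief, q_values=[1.0, 0.0], observed=1)
```

A low posterior confidence would then select a larger, more conservative set of predicted human states, which is the role the parameter-conditioned reachable tube plays in the abstract.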
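This paper presents a novel planning and control strategy for competing against multiple vehicles in a car racing scenario. The proposed racing strategy switches between two modes. When there are no surrounding vehicles, a learning-based model predictive control (MPC) trajectory planner is used to ensure that the ego vehicle achieves better lap times. When the ego vehicle is competing with surrounding vehicles to overtake, an optimization-based planner generates multiple dynamically feasible trajectories through parallel computation. Each trajectory is optimized under an MPC formulation with a different homotopic Bezier-curve reference path lying laterally between the surrounding vehicles. The time-optimal trajectory among these homotopic trajectories is selected, and a low-level MPC controller with obstacle avoidance constraints is used to guarantee the safety of the system. The proposed algorithm is able to generate collision-free trajectories and track them while improving lap time performance with steady, low computational complexity, outperforming existing approaches in both timing and performance in a car racing environment. To demonstrate the performance of our racing strategy, we simulate multiple randomly generated moving vehicles on the track and test the ego vehicle's overtaking maneuvers.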
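Merging is, in general, a challenging task for both human drivers and autonomous vehicles, especially in dense traffic, because the merging vehicle typically needs to interact with other vehicles to identify or create a gap and merge safely. In this paper, we consider the problem of autonomous vehicle control in forced merge scenarios. We propose a novel game-theoretic controller, called the Leader-Follower Game Controller (LFGC), in which the interactions between the autonomous ego vehicle and other vehicles with a priori uncertain driving intentions are modeled as a partially observable leader-follower game. The LFGC estimates the other vehicles' intentions online based on their observed trajectories, then predicts their future trajectories and plans the ego vehicle's own trajectory using model predictive control (MPC) to simultaneously achieve probabilistically guaranteed safety and the merging objective. To verify the performance of the LFGC, we test it in simulation and on NGSIM data, where it demonstrates a high merging success rate of 97.5%.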
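We address the problem of ego-vehicle navigation in dense simulated traffic environments populated by road agents with varying driver behaviors. Navigation in such environments is challenging because the agents' heterogeneous behaviors make them unpredictable. We present a new simulation technique that enriches existing traffic simulators with behavior-rich trajectories corresponding to varying levels of aggressiveness. We generate these trajectories with the help of a driver behavior modeling algorithm. We then use the enriched simulator to train a deep reinforcement learning (DRL) policy that consists of a set of high-level vehicle control commands, and we use this policy at test time to perform local navigation in dense traffic. Our policy implicitly models the interactions between traffic agents and computes safe trajectories for the ego vehicle that account for aggressive driver maneuvers such as overtaking, speeding, weaving, and sudden lane changes. Our enhanced behavior-rich simulator can be used to generate datasets consisting of trajectories that correspond to diverse driver behaviors and traffic densities, and our behavior-based navigation scheme can be combined with state-of-the-art navigation algorithms.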
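This work examines the hypothesis that partially observable Markov decision process (POMDP) planning over human driver states can significantly improve both the safety and the efficiency of autonomous highway driving. We evaluate this hypothesis in a simulated scenario in which an autonomous vehicle must safely perform three lane changes in rapid succession. Approximate POMDP solutions are obtained with the partially observable Monte Carlo planning with observation widening (POMCPOW) algorithm. This approach outperforms over-confident and conservative MDP baselines and matches or outperforms QMDP. Relative to the MDP baselines, POMCPOW typically cuts the rate of unsafe situations in half or increases the success rate by 50%.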
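Neural network-based driving planners have shown great promise in improving the task performance of autonomous driving. However, ensuring the safety of systems with neural network-based components, especially in dense and highly interactive traffic environments, is critical yet challenging. In this work, we propose a safety-driven interactive planning framework for neural network-based lane changing. To prevent over-conservative planning, we identify the driving behavior of surrounding vehicles and assess their aggressiveness, and then adapt the planned trajectory accordingly in an interactive manner. The ego vehicle can proceed with the lane change if a safe evasion trajectory exists even in the predicted worst case; otherwise, it can stay near its current lateral position or return to the original lane. We quantitatively demonstrate the effectiveness of our planner design and its advantage over baseline methods through extensive simulations with diverse and comprehensive experimental settings, as well as in real-world scenarios collected by an autonomous vehicle company.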
Autonomous vehicle (AV) stacks are typically built in a modular fashion, with explicit components performing detection, tracking, prediction, planning, control, etc. While modularity improves reusability, interpretability, and generalizability, it also suffers from compounding errors, information bottlenecks, and integration challenges. To overcome these challenges, a prominent approach is to convert the AV stack into an end-to-end neural network and train it with data. While such approaches have achieved impressive results, they typically lack interpretability and reusability, and they eschew principled analytical components, such as planning and control, in favor of deep neural networks. To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control. Crucially, our model-based planning and control algorithms leverage recent advancements in differentiable optimization to produce gradients, enabling optimization of upstream components, such as prediction, via backpropagation through planning and control. Our results on the nuScenes dataset indicate that end-to-end training with DiffStack yields substantial improvements in open-loop and closed-loop planning metrics by, e.g., learning to make fewer prediction errors that would affect planning. Beyond these immediate benefits, DiffStack opens up new opportunities for fully data-driven yet modular and interpretable AV architectures. Project website: https://sites.google.com/view/diffstack
Traditional planning and control methods can fail to find a feasible trajectory for an autonomous vehicle to execute in dense road traffic, because the obstacle-free volume in spacetime through which the vehicle could drive is very small in these scenarios. However, that does not mean the task is infeasible, since human drivers are known to be able to drive in dense traffic by leveraging the cooperativeness of other drivers to open a gap. Traditional methods fail to take into account the fact that the actions taken by an agent affect the behaviour of other vehicles on the road. In this work, we rely on the ability of deep reinforcement learning to implicitly model such interactions and learn a continuous control policy over the action space of an autonomous vehicle. The application we consider requires our agent to negotiate and open a gap in the road in order to successfully merge or change lanes. Our policy learns to repeatedly probe into the target road lane while trying to find a safe spot to move into. We compare against two model-predictive control-based algorithms and show that our policy outperforms them in simulation.
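Motion planning for urban environments with many moving agents can be viewed as a combinatorial problem. An autonomous vehicle has multiple options to choose from, since it can pass an obstacle before it, after it, to its left, or to its right. These combinatorial aspects need to be taken into account in the planning framework. We address this problem by proposing a novel planning approach that combines trajectory planning and maneuver reasoning. We define a classification of dynamic obstacles along a reference curve that allows us to extract tactical decision sequences. We separate longitudinal and lateral motion to speed up the optimization-based trajectory planning. To map the set of obtained trajectories to maneuver variants, we define a semantic language to describe them. This allows us to select an optimal trajectory while also ensuring that maneuvers remain consistent over time. We demonstrate the capabilities of our approach in a scenario that is still widely considered challenging.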
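As a core part of autonomous driving systems, motion planning has received extensive attention from academia and industry. However, because of nonholonomic dynamics, and particularly in the presence of unstructured environments and dynamic obstacles, no efficient trajectory planning solution is capable of joint spatial-temporal optimization. To bridge this gap, we propose a versatile, real-time trajectory optimization method that can generate high-quality feasible trajectories using a full vehicle model under arbitrary constraints. By leveraging the differential flatness property of car-like robots, we use the flat outputs to formulate all feasibility constraints analytically, which simplifies the trajectory planning problem. Moreover, obstacle avoidance is achieved with full-dimensional polygons to generate less conservative trajectories with safety guarantees, especially in tightly constrained spaces. We present comprehensive benchmarks against state-of-the-art methods, which demonstrate the significance of the proposed method in terms of efficiency and trajectory quality. Real-world experiments verify the practicality of our algorithm. We will release our code as an open-source package for the reference of the research community.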
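The robustness of signal temporal logic not only assesses whether a signal adheres to a specification but also provides a measure of how strongly a formula is satisfied or violated. The computation of robustness is based on evaluating the robustness of the underlying predicates. However, the robustness of predicates is usually defined in a model-free way, i.e., without including the system dynamics. Moreover, it is often nontrivial to define the robustness of complex predicates precisely. To address these issues, we propose the notion of model predictive robustness, which provides a more systematic way of evaluating robustness than previous approaches by considering model-based predictions. In particular, we use Gaussian process regression to learn the robustness from precomputed predictions so that robustness values can be computed efficiently online. We evaluate our approach on an autonomous driving use case, using predicates from formalized traffic rules on a recorded dataset, which highlights the advantage of our approach over traditional approaches in terms of expressiveness. By incorporating our robustness definition into a trajectory planner, autonomous vehicles obey traffic rules more robustly than the human drivers in the dataset.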
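For readers unfamiliar with signal temporal logic robustness, the following minimal sketch shows the conventional, model-free quantitative semantics that the abstract contrasts with. It is not the model predictive robustness proposed in the paper. It assumes a discrete-time signal and the simple predicate "value exceeds a threshold"; the function names and the headway example are hypothetical.

```python
def always_greater(signal, threshold):
    """Robustness of 'always (x > threshold)': the worst-case margin over time."""
    return min(x - threshold for x in signal)

def eventually_greater(signal, threshold):
    """Robustness of 'eventually (x > threshold)': the best-case margin over time."""
    return max(x - threshold for x in signal)

# Hypothetical headway distances in metres, with a required minimum gap of 10 m.
headway = [12.0, 9.5, 11.0]
always_margin = always_greater(headway, 10.0)          # negative: the rule is violated at some step
eventually_margin = eventually_greater(headway, 10.0)  # positive: the gap is respected at some step
```

The sign of each value indicates satisfaction or violation, and the magnitude indicates by how much, which is the property that makes robustness usable as a cost in a trajectory planner.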
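Interacting safely with other traffic participants is one of the core requirements of autonomous driving, especially at intersections and under occlusion. Most existing approaches are designed for specific scenarios and require significant manual parameter tuning before they can be applied to different situations. To solve this problem, we first propose a learning-based Interaction Point Model (IPM), which describes the interaction between agents in a unified manner using protection time and interaction priority. We then integrate the proposed IPM into a novel planning framework and demonstrate its effectiveness and robustness through comprehensive simulations in highly dynamic environments.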
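Safe and reliable autonomy solutions are a key component of next-generation intelligent transportation systems. Autonomous vehicles in such systems must reason about complex and dynamic driving scenes in real time and anticipate the behavior of nearby drivers. Human driving behavior is highly nuanced and specific to individual traffic participants. For example, when a vehicle is merging, drivers may display cooperative or non-cooperative behavior. These behaviors must be estimated and incorporated into the planning process for safe and efficient driving. In this work, we present a framework for estimating the cooperation level of drivers on a highway and for planning merging maneuvers that take the drivers' latent behavior into account. The latent parameter estimation problem is solved with a particle filter that approximates the probability distribution over the cooperation level. A partially observable Markov decision process (POMDP) that includes the latent state estimate is solved online to extract a policy for the merging vehicle. We evaluate our method in a high-fidelity automotive simulator against approaches that are agnostic to latent states or rely on a priori assumptions.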
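The particle filter step described above can be sketched as follows. This is a toy example under stated assumptions, not the estimator from the paper: it assumes the cooperation level is a scalar in [0, 1], that a fully cooperative driver decelerates by a hypothetical `max_decel` to open a gap, and that the observed deceleration is corrupted by Gaussian noise. All function names, parameters, and numbers are invented for illustration.

```python
import math
import random

def particle_filter_step(particles, observed_decel, rng,
                         max_decel=2.0, noise_std=0.5, jitter=0.02):
    """Reweight, resample, and jitter particles over the cooperation level."""
    # Likelihood of the observation if the driver had cooperation level c
    # (assumed observation model: decel = max_decel * c + Gaussian noise).
    weights = [
        math.exp(-0.5 * ((observed_decel - max_decel * c) / noise_std) ** 2)
        for c in particles
    ]
    # Resample in proportion to the weights.
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    # Small jitter keeps particle diversity; clip to the valid range [0, 1].
    return [min(1.0, max(0.0, c + rng.gauss(0.0, jitter))) for c in resampled]

rng = random.Random(0)
particles = [rng.random() for _ in range(500)]  # uniform prior over cooperation

# Hypothetical observations: the other driver keeps braking noticeably.
for decel in [1.8, 1.7, 1.9, 1.8, 1.8]:
    particles = particle_filter_step(particles, decel, rng)

estimate = sum(particles) / len(particles)  # posterior mean cooperation level
```

In a planner of the kind the abstract describes, the particle set (or a summary such as `estimate`) would form part of the belief passed to the online POMDP solver.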
Making safe and human-like decisions is an essential capability of autonomous driving systems and learning-based behavior planning is a promising pathway toward this objective. Distinguished from existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. Concretely, a behavior generation module first produces a diverse set of candidate behaviors in the form of trajectory proposals. Then the proposed conditional motion prediction network is employed to forecast other agents' future trajectories conditioned on each trajectory proposal. Given the candidate plans and associated prediction results, we learn a scoring module to evaluate the plans using maximum entropy inverse reinforcement learning (IRL). We conduct comprehensive experiments to validate the proposed framework on a large-scale real-world urban driving dataset. The results reveal that the conditional prediction model is able to forecast multiple possible future trajectories given a candidate behavior and the prediction results are reactive to different plans. Moreover, the IRL-based scoring module can properly evaluate the trajectory proposals and select close-to-human ones. The proposed framework outperforms other baseline methods in terms of similarity to human driving trajectories. Moreover, we find that the conditional prediction model can improve both prediction and planning performance compared to the non-conditional model, and learning the scoring module is critical to correctly evaluating the candidate plans to align with human drivers.
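In this work, we propose the world's first closed-loop ML-based planning benchmark for autonomous driving. While there is a growing body of ML-based motion planners, the lack of established datasets and metrics has limited progress in this area. Existing benchmarks for autonomous vehicle motion prediction focus on short-term motion forecasting rather than long-term planning. This has led previous works to use open-loop evaluation with L2-based metrics, which are not suitable for fairly evaluating long-term planning. Our benchmark overcomes these limitations by introducing a large-scale driving dataset, a lightweight closed-loop simulator, and motion-planning-specific metrics. We provide a high-quality dataset with 1500h of human driving data from 4 cities across the US and Asia with widely varying traffic patterns (Boston, Pittsburgh, Las Vegas, and Singapore). We will provide a closed-loop simulation framework with reactive agents, along with a range of both general and scenario-specific planning metrics. We plan to release the dataset at NeurIPS 2021 and to organize benchmark challenges starting in early 2022.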
Although planning for normal scenarios has been researched extensively, path planning in emergencies has not been thoroughly explored, especially when vehicles move at higher speeds and have less space in which to avoid a collision. For emergency collision avoidance, the controller should be able to deal with complicated environments and take collision mitigation into consideration, since the problem may have no feasible solution. We propose a safety controller based on model predictive control and an artificial potential function. A new artificial potential function inspired by line charge is proposed as the cost function for our model predictive controller. The new artificial potential function takes the shape of all objects into consideration. In particular, it has the flexibility to fit the shape of road structures such as intersections, whereas the artificial potential functions in most previous work can only be used in highway scenarios. Moreover, collision mitigation can be realized for a specific part of the vehicle by increasing the quantity of charge at the corresponding place. We tested our method in 192 cases from 8 different scenarios in simulation. The simulation results show that the success rate of the proposed safety controller is 20% higher than that of HJ-reachability with system decomposition. It also reduces collisions at the pre-assigned part of the vehicle by 43%.
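As a rough illustration of a line-charge-inspired potential, the sketch below evaluates the textbook electrostatic potential of a uniformly charged straight segment in 2D, with physical constants dropped. The abstract does not give the paper's actual potential function, so this is only an assumed stand-in; the function name, the `charge_density` parameter, and the `eps` clamp that keeps the value finite on the segment itself are all choices made for this example.

```python
import math

def line_charge_potential(point, seg_start, seg_end, charge_density=1.0, eps=1e-6):
    """Potential at `point` due to a uniformly charged segment (constants omitted).

    Uses V = lambda * ln((r1 + r2 + L) / (r1 + r2 - L)), where r1 and r2 are the
    distances from the point to the segment endpoints and L is the segment length.
    """
    length = math.dist(seg_start, seg_end)
    r1 = math.dist(point, seg_start)
    r2 = math.dist(point, seg_end)
    # r1 + r2 == L exactly on the segment, where the potential diverges; clamp it.
    denominator = max(r1 + r2 - length, eps)
    return charge_density * math.log((r1 + r2 + length) / denominator)

# A 2 m obstacle edge along the x-axis (hypothetical geometry).
edge = ((0.0, 0.0), (2.0, 0.0))
near_cost = line_charge_potential((1.0, 1.0), *edge)
far_cost = line_charge_potential((1.0, 3.0), *edge)
```

Because the potential follows the segment rather than a single point, summing such terms over the edges of an obstacle or road boundary yields a cost that respects the object's shape, and raising `charge_density` on one edge penalizes proximity to that part more heavily.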
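Traversing intersections is a challenging problem for autonomous vehicles, especially when the intersections have no traffic control. Recently, deep reinforcement learning has received considerable attention because of its success in handling autonomous driving tasks. In this work, we address the problem of traversing unsignalized intersections using a novel curriculum for deep reinforcement learning. The proposed curriculum leads to: 1) a faster training process for the reinforcement learning agent, and 2) better performance compared with an agent trained without a curriculum. Our main contribution is two-fold: 1) presenting a unique curriculum for training deep reinforcement learning agents, and 2) showing the application of the proposed curriculum to the unsignalized intersection traversal task. The framework expects processed observations of the surroundings from the perception system of the autonomous vehicle. We test our method in the CommonRoad motion planning simulator on T-intersections and four-way intersections.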