When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.
translated by 谷歌翻译
We consider the sequential decision-making problem of making proactive request assignment and rejection decisions for a profit-maximizing operator of an autonomous mobility-on-demand system. We formalize this problem as a Markov decision process and propose a novel combination of multi-agent Soft Actor-Critic and weighted bipartite matching to obtain an anticipative control policy. Thereby, we factorize the operator's otherwise intractable action space, but still obtain a globally coordinated decision. Experiments based on real-world taxi data show that our method outperforms state-of-the-art benchmarks with respect to performance, stability, and computational tractability.
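The coordination step described above can be illustrated with a small, self-contained sketch of weighted bipartite matching with a rejection option. This is not the authors' implementation: the function name `match_requests`, the score matrix, and the convention that non-positive scores mean "do not serve" are assumptions made for illustration, and the brute-force search stands in for whatever matching solver the paper uses. In the paper's setting the scores would presumably come from the learned per-agent policies; here they are made-up numbers.

```python
from itertools import permutations


def match_requests(weights):
    """Brute-force maximum-weight bipartite matching (illustrative only).

    weights[v][r] is a hypothetical score for assigning request r to
    vehicle v; non-positive scores are treated as "do not serve".
    Returns (best_total, pairs) where pairs is a list of (vehicle, request).
    """
    n_vehicles = len(weights)
    n_requests = len(weights[0]) if weights else 0
    # Pad with idle slots (None) so that any vehicle may stay unassigned.
    slots = list(range(n_requests)) + [None] * n_vehicles
    best_total, best_pairs = 0.0, []
    for perm in set(permutations(slots, n_vehicles)):
        # Each request index appears at most once in perm, so no request
        # is assigned to two vehicles.
        pairs = [(v, r) for v, r in enumerate(perm)
                 if r is not None and weights[v][r] > 0]
        total = sum(weights[v][r] for v, r in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


# Two vehicles, three requests (hypothetical scores).
scores = [[4.0, 1.0, -2.0],
          [3.0, 2.0, -1.0]]
total, pairs = match_requests(scores)
rejected = set(range(3)) - {r for _, r in pairs}
```

In this toy instance, vehicle 0 serves request 0, vehicle 1 serves request 1, and request 2 is rejected because no vehicle gives it a positive score. The brute-force search is only practical for tiny instances; a polynomial-time assignment solver would be the natural replacement at scale.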
Autonomous vehicle (AV) stacks are typically built in a modular fashion, with explicit components performing detection, tracking, prediction, planning, control, etc. While modularity improves reusability, interpretability, and generalizability, it also suffers from compounding errors, information bottlenecks, and integration challenges. To overcome these challenges, a prominent approach is to convert the AV stack into an end-to-end neural network and train it with data. While such approaches have achieved impressive results, they typically lack interpretability and reusability, and they eschew principled analytical components, such as planning and control, in favor of deep neural networks. To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control. Crucially, our model-based planning and control algorithms leverage recent advancements in differentiable optimization to produce gradients, enabling optimization of upstream components, such as prediction, via backpropagation through planning and control. Our results on the nuScenes dataset indicate that end-to-end training with DiffStack yields substantial improvements in open-loop and closed-loop planning metrics by, e.g., learning to make fewer prediction errors that would affect planning. Beyond these immediate benefits, DiffStack opens up new opportunities for fully data-driven yet modular and interpretable AV architectures. Project website: https://sites.google.com/view/diffstack
Autonomous vehicles must often contend with conflicting planning requirements, e.g., safety and comfort could be at odds with each other if avoiding a collision calls for slamming the brakes. To resolve such conflicts, assigning an importance ranking to rules (i.e., imposing a rule hierarchy) has been proposed, which, in turn, induces rankings on trajectories based on the importance of the rules they satisfy. On the one hand, imposing rule hierarchies can enhance interpretability but introduces combinatorial complexity to planning; on the other hand, differentiable reward structures can be leveraged by modern gradient-based optimization tools but are less interpretable and unintuitive to tune. In this paper, we present an approach to equivalently express rule hierarchies as differentiable reward structures amenable to modern gradient-based optimizers, thereby achieving the best of both worlds. We achieve this by formulating rank-preserving reward functions that are monotonic in the rank of the trajectories induced by the rule hierarchy; i.e., higher-ranked trajectories receive higher reward. Equipped with a rule hierarchy and its corresponding rank-preserving reward function, we develop a two-stage planner that can efficiently resolve conflicting planning requirements. We demonstrate that our approach can generate motion plans at ~7-10 Hz for various challenging road navigation and intersection negotiation scenarios.
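To make the idea of a rank-preserving reward concrete, the following sketch uses a deliberately simplified setting. It assumes that rule satisfaction is boolean and that trajectories are ranked lexicographically by which rules they satisfy, most important rule first. The function name `hierarchy_reward` and the power-of-two weighting are assumptions for illustration only; the abstract does not give the paper's differentiable formulation, and this discrete version is not differentiable.

```python
from itertools import product


def hierarchy_reward(satisfied):
    """Map rule satisfaction to a scalar that preserves lexicographic rank.

    satisfied: sequence of bools, most important rule first.
    Rule i (0-indexed) gets weight 2**(n-1-i), so satisfying one rule
    outweighs satisfying all less important rules combined.
    """
    n = len(satisfied)
    return sum(2 ** (n - 1 - i) for i, ok in enumerate(satisfied) if ok)


# Satisfying only the top rule beats satisfying both lower rules.
top_only = hierarchy_reward([True, False, False])    # weight 4
lower_only = hierarchy_reward([False, True, True])   # weights 2 + 1

# Check monotonicity over every satisfaction pattern for three rules.
patterns = list(product([False, True], repeat=3))
by_rank = sorted(patterns)                        # lexicographic ranking
by_reward = sorted(patterns, key=hierarchy_reward)
```

Sorting the patterns by reward reproduces the lexicographic ranking, which is the monotonicity property the abstract describes: a higher-ranked pattern never receives a lower reward.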
We propose a learning-based robust predictive control algorithm that compensates for significant uncertainty in the dynamics for a class of discrete-time systems that are nominally linear with an additive nonlinear component. Such systems commonly model the nonlinear effects of an unknown environment on a nominal system. We optimize over a class of nonlinear feedback policies inspired by certainty equivalent "estimate-and-cancel" control laws pioneered in classical adaptive control to achieve significant performance improvements in the presence of uncertainties of large magnitude, a setting in which existing learning-based predictive control algorithms often struggle to guarantee safety. In contrast to previous work in robust adaptive MPC, our approach allows us to take advantage of structure (i.e., the numerical predictions) in the a priori unknown dynamics learned online through function approximation. Our approach also extends typical nonlinear adaptive control methods to systems with state and input constraints even when we cannot directly cancel the additive uncertain function from the dynamics. We apply contemporary statistical estimation techniques to certify the system's safety through persistent constraint satisfaction with high probability. Moreover, we propose using Bayesian meta-learning algorithms that learn calibrated model priors to help satisfy the assumptions of the control design in challenging settings. Finally, we show in simulation that our method can accommodate more significant unknown dynamics terms than existing methods and that the use of Bayesian meta-learning allows us to adapt to the test environments more rapidly.
Effectively exploring the environment is a key challenge in reinforcement learning (RL). We address this challenge by defining a novel intrinsic reward based on a foundation model, such as contrastive language image pretraining (CLIP), which can encode a wealth of domain-independent semantic visual-language knowledge about the world. Specifically, our intrinsic reward is defined based on pre-trained CLIP embeddings without any fine-tuning or learning on the target RL task. We demonstrate that CLIP-based intrinsic rewards can drive exploration towards semantically meaningful states and outperform state-of-the-art methods in challenging sparse-reward procedurally-generated environments.
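Learning-based behavior prediction methods are increasingly being deployed in real-world autonomous systems, e.g., in fleets of self-driving vehicles that are beginning to operate commercially in major cities around the world. Despite this progress, however, the vast majority of prediction systems are specialized to a set of well-explored geographic regions or operational design domains, which complicates deployment to additional cities, countries, or continents. To this end, we present a novel method for efficiently adapting behavior prediction models to new environments. Our approach leverages recent advances in meta-learning, specifically Bayesian regression, to augment existing behavior prediction models with an adaptive layer that enables efficient domain transfer via offline fine-tuning, online adaptation, or both. Experiments across multiple real-world datasets demonstrate that our method can efficiently adapt to a variety of unseen environments.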
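Motion planning for a multi-limbed climbing robot must consider the robot's posture, joint torques, and how it uses contact forces to interact with its environment. This paper focuses on motion planning for a robot that uses nontraditional locomotion to explore unpredictable environments such as Martian caves. Our robotic concept, Reachbot, uses extendable and retractable booms as limbs to achieve a large reachable workspace while climbing. Each extendable boom is capped by a microspine gripper designed for grasping rocky surfaces. Reachbot leverages its large workspace to navigate around obstacles, over crevasses, and through challenging terrain. Our planning approach must be versatile to accommodate variable terrain features and robust to mitigate risks from the stochastic nature of grasping with spines. In this paper, we introduce a graph traversal algorithm to select a discrete sequence of grasps based on available terrain features suitable for grasping. This discrete plan is complemented by a decoupled motion planner that considers the alternating phases of body movement and end-effector movement, using a combination of sampling-based planning and sequential convex programming to optimize individual phases. We use our motion planner to plan a trajectory across a simulated 2D cave environment with at least 95% probability of success and demonstrate improved robustness over a baseline trajectory. Finally, we verify our motion planning algorithm through experiments on a 2D planar prototype.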
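Trajectory prediction is essential for autonomous vehicles (AVs) to plan correct and safe driving behaviors. While many prior works aim to achieve higher prediction accuracy, few study the adversarial robustness of their methods. To bridge this gap, we propose to study the adversarial robustness of data-driven trajectory prediction systems. We devise an optimization-based adversarial attack framework that leverages a carefully designed differentiable dynamic model to generate realistic adversarial trajectories. Empirically, we benchmark the adversarial robustness of state-of-the-art prediction models and show that our attack increases the prediction error by more than 50% on general metrics and 37% on planning-aware metrics. We also show that our attack can cause an AV to drive off the road or collide with other vehicles in simulation. Finally, we demonstrate how to mitigate the adversarial attacks using an adversarial training scheme.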
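Maintaining the performance of learning-based models becomes challenging as input distributions evolve over a mission lifetime. This paper presents a framework to incrementally retrain a model by selecting a subset of test inputs to label, which allows the model to adapt to changing input distributions. Algorithms within this framework are evaluated based on (1) model performance throughout the mission lifetime and (2) the cumulative costs associated with labeling and model retraining. We provide an open-source benchmark of a satellite pose estimation model trained on images of a satellite in space and deployed in novel scenarios (e.g., different backgrounds or misbehaving pixels), where algorithms are evaluated on their ability to maintain high performance by retraining on a subset of inputs. We also propose a novel algorithm that selects a subset of inputs for labeling by characterizing the information gain from an input using Bayesian uncertainty quantification and choosing a subset that maximizes the collective information gain using concepts from batch active learning. We show that our algorithm outperforms others on the benchmark, e.g., it achieves performance comparable to an algorithm that labels 100% of inputs while labeling only 50% of inputs, resulting in low costs and high performance over the mission lifetime.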