运行时执行是指针对运行时正式规范执行正确行为的理论,技术和工具。在本文中,我们对用于构建AI中执行安全性的混凝土应用程序域的运行时执行器的技术感兴趣。我们讨论了传统上如何在AI领域处理安全性,以及如何通过集成运行时执行器来提供自我学习代理的安全性。我们调查了此类执法者的一系列工作,在该工作中,我们区分了离散和连续动作空间的方法。本文的目的是更好地理解不同执法技术的优势和局限性,重点关注由于AI在AI中的应用而引起的特定挑战。最后,我们为未来的工作提出了一些开放的挑战和途径。
translated by 谷歌翻译
Besides the recent impressive results on reinforcement learning (RL), safety is still one of the major research challenges in RL. RL is a machine-learning approach to determine near-optimal policies in Markov decision processes (MDPs). In this paper, we consider the setting where the safety-relevant fragment of the MDP together with a temporal logic safety specification is given and many safety violations can be avoided by planning ahead a short time into the future. We propose an approach for online safety shielding of RL agents. During runtime, the shield analyses the safety of each available action. For any action, the shield computes the maximal probability to not violate the safety specification within the next $k$ steps when executing this action. Based on this probability and a given threshold, the shield decides whether to block an action from the agent. Existing offline shielding approaches compute exhaustively the safety of all state-action combinations ahead of time, resulting in huge computation times and large memory consumption. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our approach is well suited for high-level planning problems where the time between decisions can be used for safety computations and it is sustainable for the agent to wait until these computations are finished. For our evaluation, we selected a 2-player version of the classical computer game SNAKE. The game represents a high-level planning problem that requires fast decisions and the multiplayer setting induces a large state space, which is computationally expensive to analyse exhaustively.
translated by 谷歌翻译
值得信赖的强化学习算法应有能力解决挑战性的现实问题,包括{Robustly}处理不确定性,满足{安全}的限制以避免灾难性的失败,以及在部署过程中{prencepentiming}以避免灾难性的失败}。这项研究旨在概述这些可信赖的强化学习的主要观点,即考虑其在鲁棒性,安全性和概括性上的内在脆弱性。特别是,我们给出严格的表述,对相应的方法进行分类,并讨论每个观点的基准。此外,我们提供了一个前景部分,以刺激有希望的未来方向,并简要讨论考虑人类反馈的外部漏洞。我们希望这项调查可以在统一的框架中将单独的研究汇合在一起,并促进强化学习的可信度。
translated by 谷歌翻译
背景信息:在过去几年中,机器学习(ML)一直是许多创新的核心。然而,包括在所谓的“安全关键”系统中,例如汽车或航空的系统已经被证明是非常具有挑战性的,因为ML的范式转变为ML带来完全改变传统认证方法。目的:本文旨在阐明与ML为基础的安全关键系统认证有关的挑战,以及文献中提出的解决方案,以解决它们,回答问题的问题如何证明基于机器学习的安全关键系统?'方法:我们开展2015年至2020年至2020年之间发布的研究论文的系统文献综述(SLR),涵盖了与ML系统认证有关的主题。总共确定了217篇论文涵盖了主题,被认为是ML认证的主要支柱:鲁棒性,不确定性,解释性,验证,安全强化学习和直接认证。我们分析了每个子场的主要趋势和问题,并提取了提取的论文的总结。结果:单反结果突出了社区对该主题的热情,以及在数据集和模型类型方面缺乏多样性。它还强调需要进一步发展学术界和行业之间的联系,以加深域名研究。最后,它还说明了必须在上面提到的主要支柱之间建立连接的必要性,这些主要柱主要主要研究。结论:我们强调了目前部署的努力,以实现ML基于ML的软件系统,并讨论了一些未来的研究方向。
translated by 谷歌翻译
过去半年来,从控制和强化学习社区的真实机器人部署的安全学习方法的贡献数量急剧上升。本文提供了一种简洁的但整体审查,对利用机器学习实现的最新进展,以实现在不确定因素下的安全决策,重点是统一控制理论和加固学习研究中使用的语言和框架。我们的评论包括:基于学习的控制方法,通过学习不确定的动态,加强学习方法,鼓励安全或坚固性的加固学习方法,以及可以正式证明学习控制政策安全的方法。随着基于数据和学习的机器人控制方法继续获得牵引力,研究人员必须了解何时以及如何最好地利用它们在安全势在必行的现实情景中,例如在靠近人类的情况下操作时。我们突出了一些开放的挑战,即将在未来几年推动机器人学习领域,并强调需要逼真的物理基准的基准,以便于控制和加固学习方法之间的公平比较。
translated by 谷歌翻译
Learning-enabled control systems have demonstrated impressive empirical performance on challenging control problems in robotics, but this performance comes at the cost of reduced transparency and lack of guarantees on the safety or stability of the learned controllers. In recent years, new techniques have emerged to provide these guarantees by learning certificates alongside control policies -- these certificates provide concise, data-driven proofs that guarantee the safety and stability of the learned control system. These methods not only allow the user to verify the safety of a learned controller but also provide supervision during training, allowing safety and stability requirements to influence the training process itself. In this paper, we provide a comprehensive survey of this rapidly developing field of certificate learning. We hope that this paper will serve as an accessible introduction to the theory and practice of certificate learning, both to those who wish to apply these tools to practical robotics problems and to those who wish to dive more deeply into the theory of learning for control.
translated by 谷歌翻译
Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches of Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on the modification of the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature, as well as suggesting future directions for Safe Reinforcement Learning.
translated by 谷歌翻译
Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach. Initially, the MDP representing the environment is unknown. The agent starts exploring the environment and collects traces. From the collected traces, we passively learn MDPs that abstractly represent the safety-relevant aspects of the environment. Given a learned MDP and a safety specification, we construct a shield. For each state-action pair within a learned MDP, the shield computes exact probabilities on how likely it is that executing the action results in violating the specification from the current state within the next $k$ steps. After the shield is constructed, the shield is used during runtime and blocks any actions that induce a too large risk from the agent. The shielded agent continues to explore the environment and collects new data on the environment. Iteratively, we use the collected data to learn new MDPs with higher accuracy, resulting in turn in shields able to prevent more safety violations. We implemented our approach and present a detailed case study of a Q-learning agent exploring slippery Gridworlds. In our experiments, we show that as the agent explores more and more of the environment during training, the improved learned models lead to shields that are able to prevent many safety violations.
translated by 谷歌翻译
Learning enabled autonomous systems provide increased capabilities compared to traditional systems. However, the complexity of and probabilistic nature in the underlying methods enabling such capabilities present challenges for current systems engineering processes for assurance, and test, evaluation, verification, and validation (TEVV). This paper provides a preliminary attempt to map recently developed technical approaches in the assurance and TEVV of learning enabled autonomous systems (LEAS) literature to a traditional systems engineering v-model. This mapping categorizes such techniques into three main approaches: development, acquisition, and sustainment. We review the latest techniques to develop safe, reliable, and resilient learning enabled autonomous systems, without recommending radical and impractical changes to existing systems engineering processes. By performing this mapping, we seek to assist acquisition professionals by (i) informing comprehensive test and evaluation planning, and (ii) objectively communicating risk to leaders.
translated by 谷歌翻译
强化学习(RL)是一种有希望的方法,对现实世界的应用程序取得有限,因为确保安全探索或促进充分利用是控制具有未知模型和测量不确定性的机器人系统的挑战。这种学习问题对于连续空间(状态空间和动作空间)的复杂任务变得更加棘手。在本文中,我们提出了一种由几个方面组成的基于学习的控制框架:(1)线性时间逻辑(LTL)被利用,以便于可以通过无限视野的复杂任务转换为新颖的自动化结构; (2)我们为RL-Agent提出了一种创新的奖励计划,正式保证,使全球最佳政策最大化满足LTL规范的概率; (3)基于奖励塑造技术,我们开发了利用自动机构结构的好处进行了模块化的政策梯度架构来分解整体任务,并促进学习控制器的性能; (4)通过纳入高斯过程(GPS)来估计不确定的动态系统,我们使用指数控制屏障功能(ECBF)综合基于模型的保障措施来解决高阶相对度的问题。此外,我们利用LTL自动化和ECBF的性质来构建引导过程,以进一步提高勘探效率。最后,我们通过多个机器人环境展示了框架的有效性。我们展示了这种基于ECBF的模块化深RL算法在训练期间实现了近乎完美的成功率和保护安全性,并且在训练期间具有很高的概率信心。
translated by 谷歌翻译
This paper surveys the eld of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the eld and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but di ers considerably in the details and in the use of the word \reinforcement." The paper discusses central issues of reinforcement learning, including trading o exploration and exploitation, establishing the foundations of the eld via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
translated by 谷歌翻译
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, inverse reinforcement learning that are related but are not classical RL algorithms. The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed.
translated by 谷歌翻译
深度强化学习(DRL)和深度多机构的强化学习(MARL)在包括游戏AI,自动驾驶汽车,机器人技术等各种领域取得了巨大的成功。但是,众所周知,DRL和Deep MARL代理的样本效率低下,即使对于相对简单的问题设置,通常也需要数百万个相互作用,从而阻止了在实地场景中的广泛应用和部署。背后的一个瓶颈挑战是众所周知的探索问题,即如何有效地探索环境和收集信息丰富的经验,从而使政策学习受益于最佳研究。在稀疏的奖励,吵闹的干扰,长距离和非平稳的共同学习者的复杂环境中,这个问题变得更加具有挑战性。在本文中,我们对单格和多代理RL的现有勘探方法进行了全面的调查。我们通过确定有效探索的几个关键挑战开始调查。除了上述两个主要分支外,我们还包括其他具有不同思想和技术的著名探索方法。除了算法分析外,我们还对一组常用基准的DRL进行了全面和统一的经验比较。根据我们的算法和实证研究,我们终于总结了DRL和Deep Marl中探索的公开问题,并指出了一些未来的方向。
translated by 谷歌翻译
马尔可夫决策过程通常用于不确定性下的顺序决策。然而,对于许多方面,从受约束或安全规范到任务和奖励结构中的各种时间(非Markovian)依赖性,需要扩展。为此,近年来,兴趣已经发展成为强化学习和时间逻辑的组合,即灵活的行为学习方法的组合,具有稳健的验证和保证。在本文中,我们描述了最近引入的常规决策过程的实验调查,该过程支持非马洛维亚奖励功能以及过渡职能。特别是,我们为常规决策过程,与在线,增量学习有关的算法扩展,对无模型和基于模型的解决方案算法的实证评估,以及以常规但非马尔维亚,网格世界的应用程序的算法扩展。
translated by 谷歌翻译
Reinforcement Learning (RL) can solve complex tasks but does not intrinsically provide any guarantees on system behavior. For real-world systems that fulfill safety-critical tasks, such guarantees on safety specifications are necessary. To bridge this gap, we propose a verifiably safe RL procedure with probabilistic guarantees. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification, while randomizing the controller's inputs within a bounded set. Then, we use RL to improve the performance of this probabilistically verified, i.e. safe, controller and explore in the same bounded set around the controller's input as was randomized over in the verification step. Finally, we calculate probabilistic safety guarantees with respect to temporal logic specifications for the learned agent. Our approach is efficient for continuous action and state spaces and separates safety verification and performance improvement into two independent steps. We evaluate our approach on a safe evasion task where a robot has to evade a dynamic obstacle in a specific manner while trying to reach a goal. The results show that our verifiably safe RL approach leads to efficient learning and performance improvements while maintaining safety specifications.
translated by 谷歌翻译
即将开发我们呼叫所体现的系统的新一代越来越自主和自学习系统。在将这些系统部署到真实上下文中,我们面临各种工程挑战,因为它以有益的方式协调所体现的系统的行为至关重要,确保他们与我们以人为本的社会价值观的兼容性,并且设计可验证安全可靠的人类-Machine互动。我们正在争辩说,引发系统工程将来自嵌入到体现系统的温室,并确保动态联合的可信度,这种情况意识到的情境意识,意图,探索,探险,不断发展,主要是不可预测的,越来越自主的体现系统在不确定,复杂和不可预测的现实世界环境中。我们还识别了许多迫切性的系统挑战,包括可信赖的体现系统,包括强大而人为的AI,认知架构,不确定性量化,值得信赖的自融化以及持续的分析和保证。
translated by 谷歌翻译
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policybased methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
translated by 谷歌翻译
强化学习(RL)通过与环境相互作用的试验过程解决顺序决策问题。尽管RL在玩复杂的视频游戏方面取得了巨大的成功,但在现实世界中,犯错误总是不希望的。为了提高样本效率并从而降低错误,据信基于模型的增强学习(MBRL)是一个有前途的方向,它建立了环境模型,在该模型中可以进行反复试验,而无需实际成本。在这项调查中,我们对MBRL进行了审查,重点是Deep RL的最新进展。对于非壮观环境,学到的环境模型与真实环境之间始终存在概括性错误。因此,非常重要的是分析环境模型中的政策培训与实际环境中的差异,这反过来又指导了更好的模型学习,模型使用和政策培训的算法设计。此外,我们还讨论了其他形式的RL,包括离线RL,目标条件RL,多代理RL和Meta-RL的最新进展。此外,我们讨论了MBRL在现实世界任务中的适用性和优势。最后,我们通过讨论MBRL未来发展的前景来结束这项调查。我们认为,MBRL在被忽略的现实应用程序中具有巨大的潜力和优势,我们希望这项调查能够吸引更多关于MBRL的研究。
translated by 谷歌翻译
Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.
translated by 谷歌翻译
主动同时定位和映射(SLAM)是规划和控制机器人运动以构建周围环境中最准确,最完整的模型的问题。自从三十多年前出现了积极感知的第一项基础工作以来,该领域在不同科学社区中受到了越来越多的关注。这带来了许多不同的方法和表述,并回顾了当前趋势,对于新的和经验丰富的研究人员来说都是非常有价值的。在这项工作中,我们在主动大满贯中调查了最先进的工作,并深入研究了仍然需要注意的公开挑战以满足现代应用程序的需求。为了实现现实世界的部署。在提供了历史观点之后,我们提出了一个统一的问题制定并审查经典解决方案方案,该方案将问题分解为三个阶段,以识别,选择和执行潜在的导航措施。然后,我们分析替代方法,包括基于深入强化学习的信念空间规划和现代技术,以及审查有关多机器人协调的相关工作。该手稿以讨论新的研究方向的讨论,解决可再现的研究,主动的空间感知和实际应用,以及其他主题。
translated by 谷歌翻译