现代分布式系统受到耐故障算法的支持,例如可靠的广播和共识,即使系统的某些节点失败,也可以确保系统的正确操作。但是,分布式算法的开发是一个手动且复杂的过程,导致科学论文通常呈现单一算法或现有算法的变化。为了自动化开发此类算法的过程,这项工作提出了一种使用强化学习来生成正确且有效耐受性分布式分布式算法的智能代理。我们表明,我们的方法能够在仅12,000个学习剧集中生成正确的耐受性可靠的广播算法,而文献中的其他人则具有相同的性能。
translated by 谷歌翻译
在各个领域采用深度学习(DL)的行业和学术界都有日益增长的需求,以解决现实世界的问题。深度加强学习(DRL)是DL在加固学习领域(RL)的应用。与任何软件系统一样,由于其程序中的故障,DRL应用程序可能会失败。在本文中,我们介绍了第一次尝试在DRL程序中分类故障。我们手动分析了使用众所周知的DRL框架(Openai健身房,多巴胺,Keras-RL,TensoRForce)和开发人员/用户报告的错误开发的DRL程序的761个文物(来自Stack Overflow帖子和GitHub问题)。我们通过几轮讨论标记和分类为已识别的故障。使用与19名开发人员/研究人员的在线调查验证了产生的分类法。为了允许在DRL程序中自动检测故障,我们已经确定了DRL程序的元模型,并开发了DRLINTER,一种利用静态分析和图形转换的基于模型的故障检测方法。 DRLINTINT的执行流程在于解析DRL程序,以生成符合我们元模型的模型,并在模型上应用检测规则以识别故障出现。使用15种合成DRLPRAGIONS评估DRLINTER的有效性,其中我们在分析的分析伪影中观察到的故障。结果表明,Drlinter可以在所有合成错误程序中成功检测故障。
translated by 谷歌翻译
本文使用JACAMO框架提供了多代理系统(MAS)的运行时验证(RV)方法。我们的目标是为MAS带来一层安全性。该层能够在系统执行过程中控制事件,而无需在每个代理的行为中进行特定的实现来识别事件。MAS已在混合智能的背景下使用。这种使用需要软件代理与人类之间的通信。在某些情况下,通过自然语言对话进行沟通。但是,这种沟通使我们引起了与控制对话流有关的关注,因此代理可以防止讨论主题的任何变化可能会损害其推理。我们证明了一个监视器的实施,该监视器旨在控制MAS中的对话流,该对话流通过自然语言与用户沟通以帮助医院病床分配的决策。
translated by 谷歌翻译
行为树(BT)是一种在自主代理中(例如机器人或计算机游戏中的虚拟实体)之间在不同任务之间进行切换的方法。 BT是创建模块化和反应性的复杂系统的一种非常有效的方法。这些属性在许多应用中至关重要,这导致BT从计算机游戏编程到AI和机器人技术的许多分支。在本书中,我们将首先对BTS进行介绍,然后我们描述BTS与早期切换结构的关系,并且在许多情况下如何概括。然后,这些想法被用作一套高效且易于使用的设计原理的基础。安全性,鲁棒性和效率等属性对于自主系统很重要,我们描述了一套使用BTS的状态空间描述正式分析这些系统的工具。借助新的分析工具,我们可以对BTS如何推广早期方法的形式形式化。我们还显示了BTS在自动化计划和机器学习中的使用。最后,我们描述了一组扩展的工具,以捕获随机BT的行为,其中动作的结果由概率描述。这些工具可以计算成功概率和完成时间。
translated by 谷歌翻译
在过去的十年中,深入的强化学习(DRL)算法已经越来越多地使用,以解决各种决策问题,例如自动驾驶和机器人技术。但是,这些算法在部署在安全至关重要的环境中时面临着巨大的挑战,因为它们经常表现出错误的行为,可能导致潜在的关键错误。评估DRL代理的安全性的一种方法是测试它们,以检测可能导致执行过程中严重失败的故障。这就提出了一个问题,即我们如何有效测试DRL政策以确保其正确性和遵守安全要求。测试DRL代理的大多数现有作品都使用扰动代理的对抗性攻击。但是,这种攻击通常会导致环境的不切实际状态。他们的主要目标是测试DRL代理的鲁棒性,而不是测试代理商在要求方面的合规性。由于DRL环境的巨大状态空间,测试执行的高成本以及DRL算法的黑盒性质,因此不可能对DRL代理进行详尽的测试。在本文中,我们提出了一种基于搜索的强化学习代理(Starla)的测试方法,以通过有效地在有限的测试预算中寻找无法执行的代理执行,以测试DRL代理的策略。我们使用机器学习模型和专用的遗传算法来缩小搜索错误的搜索。我们将Starla应用于深Q学习剂,该Qualla被广泛用作基准测试,并表明它通过检测到与代理商策略相关的更多故障来大大优于随机测试。我们还研究了如何使用我们的搜索结果提取表征DRL代理的错误事件的规则。这些规则可用于了解代理失败的条件,从而评估其部署风险。
translated by 谷歌翻译
我们考虑单个强化学习与基于事件驱动的代理商金融市场模型相互作用时学习最佳执行代理的学习动力。交易在事件时间内通过匹配引擎进行异步进行。最佳执行代理在不同级别的初始订单尺寸和不同尺寸的状态空间上进行考虑。使用校准方法考虑了对基于代理的模型和市场的影响,该方法探讨了经验性风格化事实和价格影响曲线的变化。收敛,音量轨迹和动作痕迹图用于可视化学习动力学。这表明了最佳执行代理如何在模拟的反应性市场框架内学习最佳交易决策,以及如何通过引入战略订单分类来改变模拟市场的反反应。
translated by 谷歌翻译
背景:机器学习(ML)可以实现有效的自动测试生成。目的:我们表征了新兴研究,检查测试实践,研究人员目标,应用的ML技术,评估和挑战。方法:我们对97个出版物的样本进行系统文献综述。结果:ML生成系统,GUI,单位,性能和组合测试的输入或改善现有生成方法的性能。 ML还用于生成测试判决,基于属性的和预期的输出序列。经常基于神经网络和强化学习的监督学习通常是基于Q学习的 - 很普遍,并且某些出版物还采用了无监督或半监督的学习。使用传统的测试指标和与ML相关的指标(例如准确性)评估(半/非 - )监督方法,而经常使用与奖励功能相关的测试指标来评估强化学习。结论:工作到尽头表现出巨大的希望,但是在培训数据,再探术,可伸缩性,评估复杂性,所采用的ML算法以及如何应用 - 基准和可复制性方面存在公开挑战。我们的发现可以作为该领域研究人员的路线图和灵感。
translated by 谷歌翻译
2048 is a single-player stochastic puzzle game. This intriguing and addictive game has been popular worldwide and has attracted researchers to develop game-playing programs. Due to its simplicity and complexity, 2048 has become an interesting and challenging platform for evaluating the effectiveness of machine learning methods. This dissertation conducts comprehensive research on reinforcement learning and computer game algorithms for 2048. First, this dissertation proposes optimistic temporal difference learning, which significantly improves the quality of learning by employing optimistic initialization to encourage exploration for 2048. Furthermore, based on this approach, a state-of-the-art program for 2048 is developed, which achieves the highest performance among all learning-based programs, namely an average score of 625377 points and a rate of 72% for reaching 32768-tiles. Second, this dissertation investigates several techniques related to 2048, including the n-tuple network ensemble learning, Monte Carlo tree search, and deep reinforcement learning. These techniques are promising for further improving the performance of the current state-of-the-art program. Finally, this dissertation discusses pedagogical applications related to 2048 by proposing course designs and summarizing the teaching experience. The proposed course designs use 2048-like games as materials for beginners to learn reinforcement learning and computer game algorithms. The courses have been successfully applied to graduate-level students and received well by student feedback.
translated by 谷歌翻译
Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches of Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on the modification of the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature, as well as suggesting future directions for Safe Reinforcement Learning.
translated by 谷歌翻译
强化学习(RL)在很大程度上依赖于探索以从环境中学习并最大程度地获得观察到的奖励。因此,必须设计一个奖励功能,以确保从收到的经验中获得最佳学习。以前的工作将自动机和基于逻辑的奖励成型与环境假设相结合,以提供自动机制,以根据任务综合奖励功能。但是,关于如何将基于逻辑的奖励塑造扩大到多代理增强学习(MARL)的工作有限。如果任务需要合作,则环境将需要考虑联合状态,以跟踪其他代理,从而遭受对代理数量的维度的诅咒。该项目探讨了如何针对不同场景和任务设计基于逻辑的奖励成型。我们提出了一种针对半偏心逻辑基于逻辑的MARL奖励成型的新方法,该方法在代理数量中是可扩展的,并在多种情况下对其进行了评估。
translated by 谷歌翻译
我们提出了一种新的方法来自动化定理证明和演绎计划的综合,其中alphazero式的代理人正在自我培训,以完善以非确定计划表示的高级专家策略。一个类似的教师代理人是自我训练,以产生对学习者的适当相关性和难度的任务。这允许利用最少的域知识来解决训练数据无法获得或难以合成的问题。我们说明了关于命令程序不变合成问题的方法,并使用神经网络来完善教师和求解器策略。
translated by 谷歌翻译
The use of reinforcement learning has proven to be very promising for solving complex activities without human supervision during their learning process. However, their successful applications are predominantly focused on fictional and entertainment problems - such as games. Based on the above, this work aims to shed light on the application of reinforcement learning to solve this relevant real-world problem, the genome assembly. By expanding the only approach found in the literature that addresses this problem, we carefully explored the aspects of intelligent agent learning, performed by the Q-learning algorithm, to understand its suitability to be applied in scenarios whose characteristics are more similar to those faced by real genome projects. The improvements proposed here include changing the previously proposed reward system and including state space exploration optimization strategies based on dynamic pruning and mutual collaboration with evolutionary computing. These investigations were tried on 23 new environments with larger inputs than those used previously. All these environments are freely available on the internet for the evolution of this research by the scientific community. The results suggest consistent performance progress using the proposed improvements, however, they also demonstrate the limitations of them, especially related to the high dimensionality of state and action spaces. We also present, later, the paths that can be traced to tackle genome assembly efficiently in real scenarios considering recent, successfully reinforcement learning applications - including deep reinforcement learning - from other domains dealing with high-dimensional inputs.
translated by 谷歌翻译
如今,合作多代理系统用于学习如何在大规模动态环境中实现目标。然而,在这些环境中的学习是具有挑战性的:从搜索空间大小对学习时间的影响,代理商之间的低效合作。此外,增强学习算法可能遭受这种环境的长时间的收敛。本文介绍了通信框架。在拟议的沟通框架中,代理商学会有效地合作,同时通过引入新的状态计算方法,状态空间的大小将大大下降。此外,提出了一种知识传输算法以共享不同代理商之间的获得经验,并制定有效的知识融合机制,以融合利用来自其他团队成员所收到的知识的代理商自己的经验。最后,提供了模拟结果以指示所提出的方法在复杂学习任务中的功效。我们已经评估了我们对牧羊化问题的方法,结果表明,通过利用知识转移机制,学习过程加速了,通过基于状态抽象概念产生类似国家的状态空间的大小均下降。
translated by 谷歌翻译
蒙特卡洛树搜索(MCT)是设计游戏机器人或解决顺序决策问题的强大方法。该方法依赖于平衡探索和开发的智能树搜索。MCT以模拟的形式进行随机抽样,并存储动作的统计数据,以在每个随后的迭代中做出更有教育的选择。然而,该方法已成为组合游戏的最新技术,但是,在更复杂的游戏(例如那些具有较高的分支因素或实时系列的游戏)以及各种实用领域(例如,运输,日程安排或安全性)有效的MCT应用程序通常需要其与问题有关的修改或与其他技术集成。这种特定领域的修改和混合方法是本调查的主要重点。最后一项主要的MCT调查已于2012年发布。自发布以来出现的贡献特别感兴趣。
translated by 谷歌翻译
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
translated by 谷歌翻译
General mathematical reasoning is computationally undecidable, but humans routinely solve new problems. Moreover, discoveries developed over centuries are taught to subsequent generations quickly. What structure enables this, and how might that inform automated mathematical reasoning? We posit that central to both puzzles is the structure of procedural abstractions underlying mathematics. We explore this idea in a case study on 5 sections of beginning algebra on the Khan Academy platform. To define a computational foundation, we introduce Peano, a theorem-proving environment where the set of valid actions at any point is finite. We use Peano to formalize introductory algebra problems and axioms, obtaining well-defined search problems. We observe existing reinforcement learning methods for symbolic reasoning to be insufficient to solve harder problems. Adding the ability to induce reusable abstractions ("tactics") from its own solutions allows an agent to make steady progress, solving all problems. Furthermore, these abstractions induce an order to the problems, seen at random during training. The recovered order has significant agreement with the expert-designed Khan Academy curriculum, and second-generation agents trained on the recovered curriculum learn significantly faster. These results illustrate the synergistic role of abstractions and curricula in the cultural transmission of mathematics.
translated by 谷歌翻译
Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
translated by 谷歌翻译
This paper surveys the eld of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the eld and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but di ers considerably in the details and in the use of the word \reinforcement." The paper discusses central issues of reinforcement learning, including trading o exploration and exploitation, establishing the foundations of the eld via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
translated by 谷歌翻译
防御网络攻击的计算机网络需要及时应对警报和威胁情报。关于如何响应的决定涉及基于妥协指标的多个节点跨多个节点协调动作,同时最大限度地减少对网络操作的中断。目前,PlayBooks用于自动化响应过程的部分,但通常将复杂的决策留给人类分析师。在这项工作中,我们在大型工业控制网络中提出了一种深度增强学习方法,以便在大型工业控制网络中进行自主反应和恢复。我们提出了一种基于关注的神经结构,其在保护下灵活地灵活。要培训和评估自治防御者代理,我们提出了一个适合加强学习的工业控制网络仿真环境。实验表明,学习代理可以有效减轻在执行前几个月几个月的可观察信号的进步。所提出的深度加强学习方法优于模拟中完全自动化的Playbook方法,采取更少的破坏性动作,同时在网络上保留更多节点。学习的政策对攻击者行为的变化也比PlayBook方法更加强大。
translated by 谷歌翻译
Reinforcement learning (RL) is one of the most important branches of AI. Due to its capacity for self-adaption and decision-making in dynamic environments, reinforcement learning has been widely applied in multiple areas, such as healthcare, data markets, autonomous driving, and robotics. However, some of these applications and systems have been shown to be vulnerable to security or privacy attacks, resulting in unreliable or unstable services. A large number of studies have focused on these security and privacy problems in reinforcement learning. However, few surveys have provided a systematic review and comparison of existing problems and state-of-the-art solutions to keep up with the pace of emerging threats. Accordingly, we herein present such a comprehensive review to explain and summarize the challenges associated with security and privacy in reinforcement learning from a new perspective, namely that of the Markov Decision Process (MDP). In this survey, we first introduce the key concepts related to this area. Next, we cover the security and privacy issues linked to the state, action, environment, and reward function of the MDP process, respectively. We further highlight the special characteristics of security and privacy methodologies related to reinforcement learning. Finally, we discuss the possible future research directions within this area.
translated by 谷歌翻译