In recent years, reinforcement learning (RL) has become increasingly successful in its application to science and the process of scientific discovery in general. However, while RL algorithms learn to solve increasingly complex problems, interpreting the solutions they provide becomes ever more challenging. In this work, we gain insights into an RL agent's learned behavior through a post-hoc analysis based on sequence mining and clustering. Specifically, frequent and compact subroutines, used by the agent to solve a given task, are distilled as gadgets and then grouped by various metrics. This process of gadget discovery proceeds in three stages: first, we use an RL agent to generate data; then, we employ a mining algorithm to extract gadgets; finally, the obtained gadgets are grouped by a density-based clustering algorithm. We demonstrate our method by applying it to two quantum-inspired RL environments. First, we consider simulated quantum optics experiments for the design of high-dimensional multipartite entangled states, where the algorithm finds gadgets that correspond to modern interferometer setups. Second, we consider a circuit-based quantum computing environment, where the algorithm discovers various gadgets for quantum information processing, such as quantum teleportation. This approach for analyzing the policy of a learned agent is agent- and environment-agnostic and can yield interesting insights into any agent's policy.
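The mining stage of the three-step pipeline above can be sketched in miniature. The following is a minimal illustration only: it counts frequent contiguous action subsequences as candidate gadgets, with a simple support threshold standing in for the paper's mining algorithm, and the optical-element action tokens ("BS", "PS", "M") are hypothetical.

```python
from collections import Counter

def mine_gadgets(episodes, min_len=2, max_len=4, min_support=2):
    """Count contiguous action subsequences (candidate 'gadgets') across
    episodes and keep those that occur at least min_support times."""
    counts = Counter()
    for actions in episodes:
        for n in range(min_len, max_len + 1):
            for i in range(len(actions) - n + 1):
                counts[tuple(actions[i:i + n])] += 1
    return {g: c for g, c in counts.items() if c >= min_support}

# Hypothetical action sequences from two agent episodes.
episodes = [
    ["BS", "PS", "BS", "M"],
    ["PS", "BS", "PS", "BS"],
]
gadgets = mine_gadgets(episodes)
```

The surviving subsequences would then be handed to a density-based clustering step (e.g. grouping gadgets by similarity metrics) as described in the abstract.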
Quantum computing promises significant improvements in solving computational tasks that are hard for classical computers. However, designing quantum circuits for practical use is a non-trivial objective and requires expert-level knowledge. To aid this endeavor, a machine-learning-based method is proposed to construct quantum circuit architectures. Previous works have demonstrated that classical deep reinforcement learning (DRL) algorithms can successfully construct quantum circuit architectures without encoded physics knowledge. However, these DRL-based works do not generalize to settings with changing device noise, and therefore require substantial training resources to keep the RL models up to date. With this in mind, we incorporate continual learning to enhance the performance of the algorithm. In this paper, we present the Probabilistic Policy Reuse with deep Q-learning (PPR-DQL) framework to tackle this circuit design challenge. Through numerical simulations over various noise patterns, we demonstrate that an RL agent with PPR is able to find the quantum gate sequence that generates the two-qubit Bell state faster than an agent trained from scratch. The proposed framework is general and can be applied to other quantum gate synthesis or control problems, including the automatic calibration of quantum devices.
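The core idea of probabilistic policy reuse can be sketched generically: with some probability the agent follows a previously trained policy, and otherwise acts with the policy currently being trained, with the reuse probability decaying over time. This is an illustrative sketch with made-up toy policies and a hypothetical decay schedule, not the paper's PPR-DQL implementation.

```python
import random

def ppr_action(state, old_policy, new_policy, reuse_prob):
    """With probability reuse_prob, follow a previously trained policy;
    otherwise act with the policy currently being trained."""
    if random.random() < reuse_prob:
        return old_policy(state)
    return new_policy(state)

# Toy policies over a two-action space (hypothetical).
old_policy = lambda s: 0
new_policy = lambda s: 1

random.seed(0)
reuse_prob, decay = 0.9, 0.95
actions = []
for step in range(20):
    actions.append(ppr_action(None, old_policy, new_policy, reuse_prob))
    reuse_prob *= decay   # reuse fades as the new policy improves
```

Early steps mostly reuse the old policy's knowledge; as the reuse probability decays, control shifts to the newly trained policy.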
In recent years, the dramatic progress in machine learning has begun to have a significant impact on many areas of science and technology. In this Perspective, we explore how quantum technologies are benefiting from this revolution. We show in illustrative examples how scientists in the past few years have started to use machine learning and, more broadly, methods of artificial intelligence to analyze quantum measurements, estimate the parameters of quantum devices, discover new quantum experimental setups, protocols, and feedback strategies, and generally improve aspects of quantum computing, quantum communication, and quantum simulation. We highlight open challenges and future possibilities, and conclude with some speculative visions for the next decade.
Quantum computing (QC) promises significant advantages on certain hard computational tasks over classical computers. However, current quantum devices, also known as noisy intermediate-scale quantum (NISQ) computers, are still unable to carry out computations faithfully, mainly because of the lack of quantum error correction (QEC) capability. A significant body of theoretical studies has provided various types of QEC codes; one of the notable topological codes is the surface code, whose features, such as the requirement of only nearest-neighboring two-qubit control gates and a large error threshold, make it a leading candidate for scalable quantum computation. Recently developed machine learning (ML) techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have already made certain progress. Nevertheless, the device noise pattern may change over time, making trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
In the past decade, deep reinforcement learning (RL) has made considerable progress. At the same time, state-of-the-art RL algorithms require a large computational budget in terms of training time to converge. Recent work has started to approach this problem through the lens of quantum computing, which promises theoretical speed-ups for several traditionally hard tasks. In this work, we examine a class of hybrid quantum-classical RL algorithms that we collectively refer to as variational quantum deep Q-networks (VQ-DQN). We show that VQ-DQN approaches are subject to instabilities that cause the learned policy to diverge, study the reproducibility of established results based on classical simulation, and perform systematic experiments to identify potential explanations for the observed instabilities. Furthermore, and in contrast to most existing work on quantum reinforcement learning, we execute RL algorithms on an actual quantum processing unit (an IBM Quantum device) and investigate differences in behavior between simulated and physical quantum systems that arise from implementation deficiencies. Our experiments show that, contrary to claims in the literature, it cannot be conclusively decided whether known quantum approaches, even if they could be executed free of physical deficiencies as in simulation, confer an advantage over classical approaches. Finally, we provide a robust, general, and well-tested implementation of VQ-DQN as a reproducible testbed for future experiments.
Deep Reinforcement Learning is emerging as a promising approach for the continuous control task of robotic arm movement. However, the challenges of learning robust and versatile control capabilities are still far from being resolved for real-world applications, mainly because of two common issues of this learning paradigm: the exploration strategy and the slow learning speed, sometimes known as "the curse of dimensionality". This work aims at exploring and assessing the advantages of applying Quantum Computing to one of the state-of-the-art Reinforcement Learning techniques for continuous control, namely Soft Actor-Critic. Specifically, the performance of a Variational Quantum Soft Actor-Critic on the movement of a virtual robotic arm has been investigated by means of digital simulations of quantum circuits. A quantum advantage over the classical algorithm has been found in terms of a significant decrease in the number of parameters required for satisfactory model training, paving the way for further promising developments.
Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems. The method relies on intelligent tree search that balances exploration and exploitation. MCTS performs random sampling in the form of simulations and stores statistics of actions to make more educated choices in each subsequent iteration. The method has become a state-of-the-art technique for combinatorial games; however, in more complex games (e.g., those with a high branching factor or real-time ones), as well as in various practical domains (e.g., transportation, scheduling, or security), an efficient MCTS application often requires problem-dependent modifications or integration with other techniques. Such domain-specific modifications and hybrid approaches are the main focus of this survey. The last major MCTS survey was published in 2012. Contributions that have appeared since its release are of particular interest.
In this work, we extend the universal reinforcement learning (URL) agent models of artificial general intelligence to quantum environments. The utility function of the classical exploratory stochastic knowledge-seeking agent, KL-KSA, is generalized to distance measures from quantum information theory on density matrices. Quantum process tomography (QPT) algorithms form the tractable subset of programs for modeling environmental dynamics. The optimal QPT policy is selected based on a mutable cost function incorporating algorithmic complexity as well as computational resource complexity. Instead of Turing machines, we estimate the cost metrics on a high-level language to allow realistic experimentation. The entire agent design is encapsulated in a self-replicating quine that mutates the cost function based on the predictive value of the optimal policy selection scheme. Thus, multiple agents with Pareto-optimal QPT policies evolve using genetic programming, mimicking the development of physical theories, each with a different resource trade-off. This formal framework is termed the Quantum Knowledge Seeking Agent (QKSA). Despite their importance, few quantum reinforcement learning models exist, in contrast to the current thrust in quantum machine learning. The QKSA is the first proposal for a framework resembling the classical URL models. Analogous to how AIXI-tl is a resource-bounded active version of Solomonoff universal induction, the QKSA is a resource-bounded participatory-observer framework for the recently proposed algorithmic-information-based reconstructions of quantum mechanics. The QKSA can be applied to simulating and studying aspects of quantum information theory. Specifically, we demonstrate that it can be used to accelerate quantum variational algorithms that include tomographic reconstruction as an integral subroutine.
With the advent of real-world quantum computing, the idea that parametrized quantum computations can be used as hypothesis families in a quantum-classical machine learning system is gaining increasing traction. Such hybrid systems have already shown the potential to tackle real-world tasks in supervised and generative learning, and recent works have established their provable advantages in special artificial tasks. Yet, in the case of reinforcement learning, which is arguably the most challenging setting and where learning boosts would be extremely valuable, no proposal has been successful in solving even standard benchmarking tasks, nor in showing a theoretical learning advantage over classical algorithms. In this work, we achieve both. We propose a hybrid quantum-classical reinforcement learning model using very few qubits, which we demonstrate can be effectively trained to solve several standard benchmarking environments. Moreover, we demonstrate, and formally prove, the ability of parametrized quantum circuits to solve certain learning tasks that are intractable for classical models, including current state-of-the-art deep neural networks, under the widely believed classical hardness of the discrete logarithm problem.
Adequately assigning credit to actions for future outcomes based on their contributions is a long-standing open challenge in Reinforcement Learning. The assumptions of the most commonly used credit assignment method are disadvantageous in tasks where the effects of decisions are not immediately evident. Furthermore, this method can only evaluate actions that have been selected by the agent, making it highly inefficient. Still, no alternative methods have been widely adopted in the field. Hindsight Credit Assignment is a promising, but still unexplored candidate, which aims to solve the problems of both long-term and counterfactual credit assignment. In this thesis, we empirically investigate Hindsight Credit Assignment to identify its main benefits, and key points to improve. Then, we apply it to factored state representations, and in particular to state representations based on the causal structure of the environment. In this setting, we propose a variant of Hindsight Credit Assignment that effectively exploits a given causal structure. We show that our modification greatly decreases the workload of Hindsight Credit Assignment, making it more efficient and enabling it to outperform the baseline credit assignment method on various tasks. This opens the way to other methods based on given or learned causal structures.
This is an introductory machine learning course specifically developed with STEM students in mind. Our goal is to provide the interested reader with the basics to employ machine learning in their own projects and to familiarize themselves with the terminology as a foundation for further reading of the relevant literature. In these lecture notes, we discuss supervised, unsupervised, and reinforcement learning. The notes start with an exposition of machine learning methods without neural networks, such as principal component analysis, t-SNE, and clustering, as well as linear regression and linear classifiers. We continue with an introduction to both basic and advanced neural network structures, such as dense feed-forward and convolutional neural networks, recurrent neural networks, restricted Boltzmann machines, (variational) autoencoders, and generative adversarial networks. Questions of interpretability of latent-space representations are discussed using the examples of dreaming and adversarial attacks. The final section is dedicated to reinforcement learning, where we introduce the basic notions of value functions and policy learning.
Reinforcement learning (RL), and more recently deep reinforcement learning, is a popular method for solving sequential decision-making problems modeled as Markov decision processes (MDPs). The RL modeling of a problem and the selection of algorithms and hyperparameters require careful consideration, since different configurations may yield completely different performance. These considerations are mainly the task of RL experts; however, RL is progressively becoming popular in other fields where the researchers and system designers are not RL experts. Besides, many modeling decisions, such as defining the state and action spaces, the size of batches and the frequency of batch updates, and the number of timesteps, are typically made manually. For these reasons, automating the different components of the RL framework is of great importance, and it has attracted much attention in recent years. Automated RL provides a framework in which the different components of RL, including MDP modeling, algorithm selection, and hyperparameter optimization, are modeled and defined automatically. In this article, we explore the literature and present recent work that can be used in automated RL. Moreover, we discuss the challenges, open questions, and research directions in AutoRL.
Quantum algorithms based on variational approaches are one of the most promising methods to construct quantum solutions and have found countless applications in the last few years. Despite their adaptability and simplicity, their scalability and the choice of suitable ansätze remain key challenges. In this work, we report an algorithmic framework based on nested Monte Carlo tree search (MCTS) combined with a combinatorial multi-armed bandit (CMAB) model for the automated design of quantum circuits. Through numerical experiments, we demonstrate the algorithm applied to various kinds of problems, including the ground-state energy problem in quantum chemistry, quantum optimization on graphs, solving systems of linear equations, and finding encoding circuits for quantum error detection codes. Compared with existing approaches, the results indicate that our circuit-design algorithm can explore larger search spaces and optimize quantum circuits for larger systems, showing both versatility and scalability.
The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques of data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems, which heavily rely on model assumptions, new developments in reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We introduce Markov decision processes, which is the setting for many commonly used RL approaches. Various algorithms are then presented, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
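The core MCTS algorithm surveyed above combines four steps: selection, expansion, simulation, and backpropagation. A minimal UCT-style sketch is given below; the toy task (choose L or R for three steps, with reward only at the leaf "RRR") and all function names are illustrative, not from the survey.

```python
import math
import random

class Node:
    """One search-tree node holding visit statistics."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.value = 0.0

def uct_search(root_state, actions, step, reward, is_terminal,
               iters=500, c=1.4):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while (not is_terminal(node.state)
               and len(node.children) == len(actions(node.state))):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        # 2. Expansion: add one untried child.
        if not is_terminal(node.state):
            a = random.choice([a for a in actions(node.state)
                               if a not in node.children])
            node.children[a] = Node(step(node.state, a), parent=node)
            node = node.children[a]
        # 3. Simulation: random rollout to a terminal state.
        state = node.state
        while not is_terminal(state):
            state = step(state, random.choice(actions(state)))
        r = reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)

# Toy task: pick L/R for three steps; only the leaf "RRR" pays off.
random.seed(2)
best = uct_search("", actions=lambda s: ["L", "R"],
                  step=lambda s, a: s + a,
                  reward=lambda s: 1.0 if s == "RRR" else 0.0,
                  is_terminal=lambda s: len(s) == 3)
```

The UCT term balances exploitation (average value) against exploration (visit counts), which is the "precision of tree search with the generality of random sampling" the abstract refers to.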
In recent years, variational quantum algorithms such as the quantum approximate optimization algorithm (QAOA) have gained popularity as they provide the hope of using NISQ devices to tackle hard combinatorial optimization problems. It is, however, known that at low depth, certain locality constraints of QAOA limit its performance. To go beyond these limitations, a non-local variant of QAOA, namely recursive QAOA (RQAOA), was proposed to improve the quality of approximate solutions. RQAOA has been studied comparatively less than QAOA; for instance, it is less understood for which cases it may fail to provide high-quality solutions. However, as we are tackling $\mathsf{NP}$-hard problems (specifically, the Ising spin model), it is expected that RQAOA does fail, raising the question of designing even better quantum algorithms for combinatorial optimization. In this spirit, we identify and analyze cases where RQAOA fails and, based on this, propose a reinforcement-learning-enhanced RQAOA variant (RL-RQAOA) that improves upon RQAOA. We show that the performance of RL-RQAOA improves over RQAOA: RL-RQAOA is strictly better on the identified instances where RQAOA underperforms, and performs similarly on instances where RQAOA is near-optimal. Our work exemplifies the potentially beneficial synergy between reinforcement learning and quantum (inspired) optimization in the design of new, even better heuristics for hard problems.
In humans, perceptual awareness facilitates the fast recognition and extraction of information from sensory input. This awareness largely depends on how the human agent interacts with the environment. In this work, we propose active neural generative coding, a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments. Specifically, we develop an intelligent agent, drawing inspiration from the cognitive theory of planning, that operates even with sparse rewards. We demonstrate on several simple control problems that our framework is competitive with deep Q-learning. The robust performance of our agent offers promising evidence that a backprop-free approach to neural inference and learning can drive goal-directed behavior.
In this thesis, we consider two simple but typical control problems and apply deep reinforcement learning to them, i.e., to cool and control a particle which is subject to continuous position measurement in a one-dimensional quadratic potential or in a quartic potential. We compare the performance of reinforcement learning control and conventional control strategies on the two problems, and show that reinforcement learning achieves a performance comparable to the optimal control for the quadratic case, and outperforms conventional control strategies for the quartic case, for which the optimal control strategy is unknown. To our knowledge, this is the first time deep reinforcement learning has been applied to quantum control problems in continuous real space. Our research demonstrates that deep reinforcement learning can effectively control a stochastic quantum system in real space as a measurement-feedback closed-loop controller. It also shows the ability of AI to discover new control strategies and properties of quantum systems that are not well understood; by learning from the AI, we can gain insights into these problems, which opens up a new regime for scientific research.
Reinforcement learning has witnessed recent applications to a variety of tasks in quantum programming. The underlying assumption is that those tasks can be modeled as Markov decision processes (MDPs). Here, we investigate the feasibility of this assumption by exploring its consequences for two fundamental tasks in quantum programming: state preparation and gate compilation. By forming discrete MDPs, focusing exclusively on the single-qubit case (both with and without noise), we solve for the optimal policy exactly through policy iteration. We find optimal paths that correspond to the shortest possible sequence of gates to prepare a state or compile a gate, up to some target accuracy. As an example, we find sequences of $H$ and $T$ gates with length as small as $11$ producing $\sim 99\%$ fidelity for states of the form $(HT)^{n}|0\rangle$ with values up to $n = 10^{10}$. In the presence of gate noise, we demonstrate how the optimal policy adapts to the effects of noisy gates in order to achieve a higher state fidelity. Our work shows that one can impose a discrete, stochastic, and Markovian nature on a continuous, deterministic, and non-Markovian quantum evolution, and provides theoretical insight into why reinforcement learning may be successfully used to find optimally short gate sequences in quantum programming.
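The shortest-gate-sequence objective above can be illustrated with a brute-force baseline: a breadth-first enumeration of $H$/$T$ sequences acting on $|0\rangle$, returning the first (hence shortest) sequence reaching a target fidelity. This is a sketch of the search problem the optimal policy solves on short sequences, not the paper's policy-iteration implementation; the target state and tolerance below are illustrative.

```python
import cmath
import itertools
import math

# Single-qubit gates as 2x2 complex matrices.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]
T = [[1, 0],
     [0, cmath.exp(1j * math.pi / 4)]]
GATES = {"H": H, "T": T}

def apply(gate, v):
    """Matrix-vector product for a 2x2 gate and a 2-component state."""
    return [gate[0][0] * v[0] + gate[0][1] * v[1],
            gate[1][0] * v[0] + gate[1][1] * v[1]]

def fidelity(u, v):
    """|<u|v>|^2 for normalized single-qubit states."""
    overlap = u[0].conjugate() * v[0] + u[1].conjugate() * v[1]
    return abs(overlap) ** 2

def shortest_sequence(target, max_len=4, min_fid=0.99):
    """Breadth-first search for the shortest H/T sequence preparing
    `target` from |0> up to the requested fidelity."""
    for n in range(1, max_len + 1):
        for seq in itertools.product("HT", repeat=n):
            state = [1 + 0j, 0j]
            for g in seq:
                state = apply(GATES[g], state)
            if fidelity(state, target) >= min_fid:
                return list(seq)
    return None

target = apply(T, apply(H, [1 + 0j, 0j]))   # the state T·H|0>
seq = shortest_sequence(target)
```

Exhaustive search scales exponentially in sequence length, which is why the paper's exact policy-iteration solution of the discretized MDP is of interest.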
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
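Two of the central issues named above, trading off exploration and exploitation and learning from delayed reinforcement, are visible even in the smallest tabular setting. The sketch below is a generic epsilon-greedy Q-learning agent on a toy chain MDP (rewarded only at the far end, so credit must propagate back over a delay); all parameters are illustrative choices, not from the survey.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a chain: start at state 0; action 0 moves
    left (bounded at 0), action 1 moves right; only reaching the last
    state pays reward 1, so the reward signal is delayed."""
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy: explore occasionally, otherwise exploit.
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = Q[s].index(max(Q[s]))
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # One-step temporal-difference update.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

random.seed(0)
Q = q_learning_chain()
greedy = [row.index(max(row)) for row in Q[:-1]]   # greedy policy per state
```

After training, the greedy policy moves right in every non-terminal state, showing that the delayed terminal reward has been propagated back through the value estimates.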