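Volt-VAR control (VVC) is the problem of operating power distribution systems within healthy regimes by controlling actuators in the power system. Existing works mainly adopt the conventional routine of representing the power system (a graph with tree topology) as vectors to train deep reinforcement learning (RL) policies. We propose a framework that combines RL with graph neural networks and study the benefits and limitations of graph-based policies in the VVC setting. Our results show that, compared with vector representations, graph-based policies converge asymptotically to the same reward. We conduct further analysis on the impact of both observations and actions: on the observation end, we examine the robustness of graph-based policies to two typical data acquisition errors in power systems, namely sensor communication failures and measurement errors. On the action end, we show that actuators have various impacts on the system, so using a graph representation induced by the power system topology may not be the optimal choice. Finally, we conduct a case study to demonstrate that the choice of readout function architecture and graph augmentation can further improve training performance and robustness.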
translated by 谷歌翻译
We consider the problem of multi-agent navigation and collision avoidance when observations are limited to the local neighborhood of each agent. We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. Specifically, InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm. We show that (1) in training, InforMARL has better sample efficiency and performance than baseline approaches, despite using less information, and (2) in testing, it scales well to environments with arbitrary numbers of agents and obstacles.
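The idea of aggregating information from each agent's local neighborhood can be illustrated with a minimal sketch. This is not InforMARL's actual architecture: the sensing radius, the mean aggregator, and the helper names `local_neighbors` and `aggregate` are assumptions chosen for illustration, and a real graph neural network would apply learned transformations before and after aggregation.

```python
import math


def local_neighbors(positions, i, radius):
    """Indices of agents within `radius` of agent i (excluding i itself)."""
    xi, yi = positions[i]
    return [
        j
        for j, (xj, yj) in enumerate(positions)
        if j != i and math.hypot(xj - xi, yj - yi) <= radius
    ]


def aggregate(features, positions, i, radius):
    """Mean of the neighbors' feature vectors; a zero vector if agent i has no neighbors."""
    nbrs = local_neighbors(positions, i, radius)
    dim = len(features[i])
    if not nbrs:
        return [0.0] * dim
    return [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]


# Hypothetical 2-D positions and per-agent feature vectors.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
features = [[1.0, 0.0], [2.0, 2.0], [4.0, 0.0], [9.0, 9.0]]

agg0 = aggregate(features, positions, 0, radius=1.5)
```

Because the aggregator is a mean over whichever agents happen to be in range, the output has a fixed size regardless of how many agents exist, which is one plausible reason such designs can be evaluated with varying numbers of agents.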
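Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommendation systems, and gaming. Similarly, graph neural networks (GNNs) have demonstrated superior performance in supervised learning on graph-structured data. Recently, the fusion of GNNs with DRL for graph-structured environments has attracted a lot of attention. This paper provides a comprehensive review of these hybrid works, which can be classified into two categories: (1) algorithmic enhancement, where DRL and GNN complement each other for better utility; and (2) application-specific enhancement, where DRL and GNN support each other. This fusion effectively addresses various complex problems in engineering and the life sciences. Based on the review, we further analyze the applicability and benefits of fusing these two domains, especially in terms of increasing generalizability and reducing computational complexity. Finally, we highlight the key challenges in integrating DRL and GNN, along with potential future research directions, which will be of interest to the broader machine learning community.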
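Process synthesis is undergoing a disruptive transformation accelerated by digitization and artificial intelligence. We propose a reinforcement learning algorithm for chemical process design based on a state-of-the-art actor-critic logic. Our proposed algorithm represents chemical processes as graphs and uses graph convolutional neural networks to learn from process graphs. In particular, the graph neural networks are implemented within the agent architecture to process the states and make decisions. Moreover, we implement a hierarchical and hybrid decision-making process to generate flowsheets, in which unit operations are placed iteratively as discrete decisions and the corresponding design variables are selected as continuous decisions. We demonstrate the potential of our method to design economically viable flowsheets in an illustrative case study comprising equilibrium reactions, azeotropic separation, and recycles. The results show quick learning in discrete, continuous, and hybrid action spaces. Owing to the flexible architecture of the proposed reinforcement learning agent, the method is predestined to include large action-state spaces and an interface to process simulators in future research.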
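This paper presents how domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well due to the large search/optimization space. Here, we propose an actor-critic-based agent to address the problem's combinatorial nature and train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of a large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure by modifying the environment using network physics for enhanced agent learning. Further, a parallel training approach on multiple scenarios is employed to avoid biasing the agent toward a few scenarios and to make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed in most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems for real-world RL learning. The agent was tested by RTE for the 2019 Learning to Run a Power Network challenge and was awarded 2nd place in accuracy and 1st place in speed. The developed code is open-sourced for public use.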
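This paper presents a problem in power networks that creates an exciting yet challenging real-world scenario for the application of multi-agent reinforcement learning (MARL). The emerging trend of decarbonisation is placing excessive stress on power distribution networks. Active voltage control is seen as a promising solution to relieve power congestion and improve voltage quality without extra hardware investment, taking advantage of the controllable apparatuses in the network, such as rooftop photovoltaics (PVs) and static var compensators (SVCs). These controllable apparatuses appear in vast numbers and are distributed over a wide geographic area, making MARL a natural candidate. This paper formulates the active voltage control problem in the Dec-POMDP framework and establishes an open-source environment. It aims to bridge the gap between the power community and the MARL community and to be a driving force toward real-world applications of MARL algorithms. Finally, we analyse the special characteristics of the active voltage control problem that pose challenges for state-of-the-art MARL approaches, and summarise potential directions.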
Graph mining tasks arise in many application domains, ranging from social networks and transportation to e-commerce, and have received great attention from the theoretical and algorithmic design communities in recent years. There has also been pioneering work employing research-rich Reinforcement Learning (RL) techniques to address graph data mining tasks. However, these graph mining methods and RL models are dispersed across different research areas, which makes it hard to compare them. In this survey, we provide a comprehensive overview of RL and graph mining methods and generalize these methods to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source codes, and benchmark datasets of GRL methods. Furthermore, we propose important directions and challenges to be solved in the future. As far as we know, this is the latest comprehensive survey of GRL; it provides a global view and a learning resource for scholars. In addition, we create an online open-source resource both for interested scholars who want to enter this rapidly developing domain and for experts who would like to compare GRL methods.
Optimal Power Flow (OPF) is a long-established research area within the power systems field that seeks the optimal operating point of electric power plants, and which needs to be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In the last decades, power grids have evolved into a typical dynamic, non-linear and large-scale control system, known as the power system, so searching for better and faster ACOPF solutions is becoming crucial. The appearance of Graph Neural Networks (GNN) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. On the other hand, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and that is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost after having trained our DRL agent on the IEEE 30-bus system and then computing the OPF on that base network with topology changes.
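Recent research has shown that graph neural networks (GNNs) can learn policies for locomotion control that are as effective as a typical multi-layer perceptron (MLP), with superior transfer and multi-task performance (Wang et al., 2018; Huang et al., 2020). So far, results have been limited to training on small agents, because the performance of GNNs deteriorates rapidly as the number of sensors and actuators grows. A key motivation for using GNNs in the supervised learning setting is their applicability to large graphs, but this benefit has not yet been realised for locomotion control. We identify the weakness in a common GNN architecture that causes this poor scaling: overfitting in the MLPs within the network that encode, decode, and propagate messages. To combat this, we introduce Snowflake, a GNN training method for high-dimensional continuous control that freezes parameters in the affected parts of the network. Snowflake significantly boosts the performance of GNNs for locomotion control on large agents, now matching the performance of MLPs while retaining superior transfer performance.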
translated by 谷歌翻译
Neural algorithmic reasoning studies the problem of learning algorithms with neural networks, especially with graph architectures. A recent proposal, XLVIN, reaps the benefits of using a graph neural network that simulates the value iteration algorithm in deep reinforcement learning agents. It allows model-free planning without access to privileged information about the environment, which is usually unavailable. However, XLVIN only supports discrete action spaces, and is hence nontrivially applicable to most tasks of real-world interest. We expand XLVIN to continuous action spaces by discretization, and evaluate several selective expansion policies to deal with the large planning graphs. Our proposal, CNAP, demonstrates how neural algorithmic reasoning can make a measurable impact in higher-dimensional continuous control settings, such as MuJoCo, bringing gains in low-data settings and outperforming model-free baselines.
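Extending a discrete-action planner to continuous control by discretization can be sketched as follows. This is a generic uniform-grid discretization under assumed bounds and bin counts, not CNAP's actual scheme; the helper names `discretize` and `action_grid` are hypothetical.

```python
import itertools


def discretize(low, high, bins):
    """Return `bins` evenly spaced values covering [low, high], endpoints included."""
    if bins < 2:
        raise ValueError("need at least two bins to cover both endpoints")
    step = (high - low) / (bins - 1)
    return [low + k * step for k in range(bins)]


def action_grid(bounds, bins):
    """Cartesian product of the per-dimension discretizations."""
    per_dim = [discretize(lo, hi, bins) for lo, hi in bounds]
    return list(itertools.product(*per_dim))


# Hypothetical 2-D action space with each dimension bounded in [-1, 1].
grid = action_grid([(-1.0, 1.0), (-1.0, 1.0)], bins=3)
```

The grid has `bins ** d` entries for a `d`-dimensional action space, so exhaustive expansion quickly becomes impractical as `d` grows. That growth is one way to read the abstract's interest in selective expansion policies for large planning graphs.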
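The analysis and control of large-population systems is of great interest to diverse areas of research and engineering, ranging from epidemiology and robotic swarms to economics and finance. An increasingly popular and effective approach to realizing sequential decision-making in multi-agent systems is multi-agent reinforcement learning, as it allows for an automatic and model-free analysis of highly complex systems. However, the key issue of scalability complicates the design of control and reinforcement learning algorithms, particularly in systems with large populations of agents. While reinforcement learning has found empirical success in many scenarios, problems with many agents quickly become intractable and require special consideration. In this survey, we shed light on current approaches to tractably understanding and analyzing large-population systems, both through multi-agent reinforcement learning and through adjacent areas of research such as mean-field games, collective intelligence, and complex network theory. These classically independent subject areas offer a variety of approaches to understanding or modeling large-population systems, which may be well suited to the formulation of tractable MARL algorithms in the future. Finally, we survey potential application areas for large-scale control and identify fruitful future applications of learning algorithms in practical systems. We hope that our survey can provide insight and future directions to junior and senior researchers in both theoretical and applied sciences.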
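In artificial multi-agent systems, the ability to learn collaborative policies is predicated upon the agents' communication skills: they must be able to encode the information received from the environment and learn how to share it with other agents as required by the task at hand. We present a deep reinforcement learning approach, Connectivity Driven Communication (CDC), that facilitates the emergence of multi-agent collaborative behaviour through experience alone. The agents are modelled as nodes of a weighted graph whose state-dependent edges encode the pair-wise messages that can be exchanged. We introduce a graph-dependent attention mechanism that controls how the agents' incoming messages are weighted. This mechanism takes into full account the current state of the system as represented by the graph, and builds upon a diffusion process that captures how information flows on the graph. The graph topology is not assumed to be known a priori; it depends dynamically on the agents' observations and is learnt concurrently with the attention mechanism and the policy in an end-to-end fashion. Our empirical results show that CDC is able to learn effective collaborative policies and can outperform competing learning algorithms on cooperative navigation tasks.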
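Reinforcement learning (RL) for continuous control typically employs distributions whose support covers the entire action space. In this work, we investigate the colloquially known phenomenon that trained agents often prefer actions at the boundaries of that space. We draw theoretical connections to the emergence of bang-bang behavior in optimal control and provide extensive empirical evaluation across a variety of recent RL algorithms. We replace the normal Gaussian with a Bernoulli distribution that considers only the extremes along each action dimension, that is, a bang-bang controller. Surprisingly, this achieves state-of-the-art performance on several continuous control benchmarks, in contrast to robotic hardware, where energy and maintenance costs affect controller choices. Since exploration, learning, and the final solution are entangled in RL, we provide additional imitation learning experiments to reduce the impact of exploration on our analysis. Finally, we show that our observations generalize to environments that aim to model real-world challenges, and we evaluate factors that mitigate the emergence of bang-bang solutions. Our findings emphasize challenges for benchmarking continuous control algorithms, particularly in light of potential real-world applications.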
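A Bernoulli policy head restricted to the action bounds can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: the per-dimension probabilities would normally come from a learned network, and the function name `bang_bang_action` and the example bounds are assumptions.

```python
import random


def bang_bang_action(probs, low, high, rng):
    """Sample each action dimension at its upper bound with probability p, else at its lower bound."""
    return [hi if rng.random() < p else lo for p, lo, hi in zip(probs, low, high)]


# Hypothetical 3-D action space bounded in [-1, 1] per dimension.
rng = random.Random(0)
action = bang_bang_action([0.9, 0.1, 0.5], [-1.0] * 3, [1.0] * 3, rng)

# With probabilities of exactly 1 and 0 the controller is deterministic.
extreme = bang_bang_action([1.0, 0.0], [-2.0, -2.0], [2.0, 2.0], rng)
```

Every sampled action lies on a corner of the action box, so the policy cannot express intermediate magnitudes. That restriction is the property the abstract reports as surprisingly competitive on benchmarks.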
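The increased integration of renewable energy poses many technical challenges to the operation of power distribution networks. Among them, voltage fluctuations caused by the instability of renewable energy are receiving increasing attention. Multi-agent reinforcement learning (MARL) with multiple control units in the grid, which can handle rapid changes in power systems, has recently been widely studied for the active voltage control task. However, existing MARL-based approaches ignore the unique nature of the grid and achieve limited performance. In this paper, we introduce the transformer architecture to extract representations adapted to power network problems and propose a Transformer-based Multi-Agent Actor-Critic framework (T-MAAC) to stabilize voltage in power distribution networks. In addition, we adopt a novel auxiliary-task training process tailored to the voltage control task, which improves sample efficiency and facilitates the representation learning of the transformer-based model. We couple T-MAAC with different multi-agent actor-critic algorithms, and the consistent improvements on the active voltage control task demonstrate the effectiveness of the proposed method.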
Proper functioning of connected and automated vehicles (CAVs) is crucial for the safety and efficiency of future intelligent transport systems. Meanwhile, transitioning to fully autonomous driving requires a long period of mixed autonomy traffic, including both CAVs and human-driven vehicles. Thus, collaboration decision-making for CAVs is essential to generate appropriate driving behaviors to enhance the safety and efficiency of mixed autonomy traffic. In recent years, deep reinforcement learning (DRL) has been widely used in solving decision-making problems. However, the existing DRL-based methods have been mainly focused on solving the decision-making of a single CAV. Using the existing DRL-based methods in mixed autonomy traffic cannot accurately represent the mutual effects of vehicles and model dynamic traffic environments. To address these shortcomings, this article proposes a graph reinforcement learning (GRL) approach for multi-agent decision-making of CAVs in mixed autonomy traffic. First, a generic and modular GRL framework is designed. Then, a systematic review of DRL and GRL methods is presented, focusing on the problems addressed in recent research. Moreover, a comparative study on different GRL methods is further proposed based on the designed framework to verify the effectiveness of GRL methods. Results show that the GRL methods can well optimize the performance of multi-agent decision-making for CAVs in mixed autonomy traffic compared to the DRL methods. Finally, challenges and future research directions are summarized. This study can provide a valuable research reference for solving the multi-agent decision-making problems of CAVs in mixed autonomy traffic and can promote the implementation of GRL-based methods into intelligent transportation systems. The source code of our work can be found at https://github.com/Jacklinkk/Graph_CAVs.
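Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring the fact that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input owing to their invariance to permutations and their awareness of input sparsity. This paper presents a conceptual review of recent key advances in this emerging field, aimed at researchers in both optimization and machine learning.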
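Model compression is an essential technique for deploying deep neural networks (DNNs) on power- and memory-constrained resources. However, existing model compression methods often rely on human expertise and focus on the local importance of parameters, ignoring the rich topology information within DNNs. In this paper, we propose a novel multi-stage graph embedding technique based on graph neural networks (GNNs) to identify DNN topologies, and we use reinforcement learning (RL) to find a suitable compression policy. We performed resource-constrained (i.e., FLOPs) channel pruning and compared our approach with state-of-the-art model compression methods. We evaluated our method on a variety of models, from typical to mobile-friendly networks, such as the ResNet family, VGG-16, MobileNet-v1/v2, and ShuffleNet. The results show that our method can achieve higher compression ratios with minimal fine-tuning cost while yielding outstanding and competitive performance.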
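The co-adaptation of robot morphology and behaviour is becoming increasingly important with the advent of fast 3D-manufacturing methods and efficient deep reinforcement learning algorithms. A major challenge for the application of co-adaptation methods to the real world is the simulation-to-reality gap caused by model and simulation inaccuracies. However, prior work focuses primarily on the study of evolutionary adaptation of morphologies using analytical models and (differentiable) simulators with large population sizes, neglecting the existence of the simulation-to-reality gap and the cost of manufacturing cycles in the real world. This paper presents a new approach that combines classic high-frequency deep neural networks with computationally expensive graph neural networks for the data-efficient co-adaptation of agents with varying numbers of degrees of freedom. Evaluations in simulation show that the new method can co-adapt agents within such a limited number of production cycles by efficiently combining design optimization with offline reinforcement learning, which allows for direct application to real-world co-adaptation tasks in future work.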
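Single-agent (SA) reinforcement learning systems have shown outstanding results on non-stationary problems. However, multi-agent reinforcement learning (MARL) can surpass SA systems, both in general and when scaling. Furthermore, MA systems can be super-powered by collaboration, which can happen through observing others or through a communication system used to share information between collaborators. Here, we develop a distributed MA learning mechanism with the ability to communicate, based on decentralised partially observable Markov decision processes (Dec-POMDPs) and graph neural networks (GNNs). Collaborative MA mechanisms can minimise the time and energy consumed in training machine learning models while improving performance. We demonstrate this in a real-world scenario, an offshore wind farm comprising a set of distributed wind turbines, where the objective is to maximise collective efficiency. Compared to an SA system, MA collaboration shows significantly reduced time and higher cumulative rewards in unseen scaled scenarios.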
Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.