6G将移动移动网络以增加复杂程度。为了处理这种复杂性,网络参数的优化是确保高性能和及时适应动态网络环境的关键。天线倾斜的优化提供了一种实用且具有成本效益的方法,以提高网络中的覆盖率和容量。通过学习自适应策略优于传统的倾斜优化方法,基于强化学习(RL)的先前方法对倾斜优化具有很大的通知。但是,大多数现有的RL方法都基于单个小区特征表示,它无法完全表征代理状态,从而导致次优的性能。此外,由于国家行动爆炸和泛化能力,大多数此类方法缺乏可扩展性。在本文中,我们提出了一个关于倾斜优化的Q-Learnal(GaQ)算法的图表。 GaQ依赖于图形注意机制来选择相关的邻居信息,提高代理状态表示,并根据使用深Q-Network(DQN)的观察历史更新倾斜控制策略。我们表明GAQ有效地捕获重要的网络信息,并通过大边距与本地信息优于标准DQN。此外,我们展示了概括到不同尺寸和密度的网络部署的能力。
translated by 谷歌翻译
深度强化学习(DRL)赋予了各种人工智能领域,包括模式识别,机器人技术,推荐系统和游戏。同样,图神经网络(GNN)也证明了它们在图形结构数据的监督学习方面的出色表现。最近,GNN与DRL用于图形结构环境的融合引起了很多关注。本文对这些混合动力作品进行了全面评论。这些作品可以分为两类:(1)算法增强,其中DRL和GNN相互补充以获得更好的实用性; (2)特定于应用程序的增强,其中DRL和GNN相互支持。这种融合有效地解决了工程和生命科学方面的各种复杂问题。基于审查,我们进一步分析了融合这两个领域的适用性和好处,尤其是在提高通用性和降低计算复杂性方面。最后,集成DRL和GNN的关键挑战以及潜在的未来研究方向被突出显示,这将引起更广泛的机器学习社区的关注。
translated by 谷歌翻译
Graph mining tasks arise from many different application domains, ranging from social networks, transportation to E-commerce, etc., which have been receiving great attention from the theoretical and algorithmic design communities in recent years, and there has been some pioneering work employing the research-rich Reinforcement Learning (RL) techniques to address graph data mining tasks. However, these graph mining methods and RL models are dispersed in different research areas, which makes it hard to compare them. In this survey, we provide a comprehensive overview of RL and graph mining methods and generalize these methods to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source codes, and benchmark datasets of GRL methods. Furthermore, we propose important directions and challenges to be solved in the future. As far as we know, this is the latest work on a comprehensive survey of GRL, this work provides a global view and a learning resource for scholars. In addition, we create an online open-source for both interested scholars who want to enter this rapidly developing domain and experts who would like to compare GRL methods.
translated by 谷歌翻译
安全是空中交通时的主要问题。通过成对分离最小值确保无人驾驶飞机(无人机)之间的飞行安全性,利用冲突检测和分辨方法。现有方法主要处理成对冲突,但由于交通密度的预期增加,可能会发生两个以上的无人机的遇到。在本文中,我们将多UAV冲突解决模型作为多功能加强学习问题。我们实现了一种基于图形神经网络的算法,配合代理可以与共同生成分辨率的操作进行通信。该模型在具有3和4个当前代理的情况下进行评估。结果表明,代理商能够通过合作策略成功解决多UV冲突。
translated by 谷歌翻译
测试点插入(TPI)是一种可增强可测试性的技术,特别是对于逻辑内置的自我测试(LBIST),由于其相对较低的故障覆盖率。在本文中,我们提出了一种基于DeepTPI的Deep Greatherions学习(DRL)的新型TPI方法。与以前基于学习的解决方案将TPI任务作为监督学习问题不同,我们训练了一种新颖的DRL代理,即实例化为图神经网络(GNN)和深Q学习网络(DQN)的组合,以最大程度地提高测试覆盖范围改进。具体而言,我们将电路模型为有向图并设计基于图的值网络,以估计插入不同测试点的动作值。 DRL代理的策略定义为选择具有最大值的操作。此外,我们将预先训练模型的一般节点嵌入到增强节点特征,并为值网络提出专用的可验证性注意力机制。与商业DFT工具相比,具有各种尺度的电路的实验结果表明,DEEPTPI显着改善了测试覆盖范围。这项工作的代码可在https://github.com/cure-lab/deeptpi上获得。
translated by 谷歌翻译
动态作业车间调度问题(DJSP)是一类是专门考虑固有的不确定性,如切换顺序要求和现实的智能制造的设置可能机器故障调度任务。因为传统方法不能动态生成环境的扰动面有效调度策略,我们制定DJSP马尔可夫决策过程(MDP)通过强化学习(RL)加以解决。为此,我们提出了一个灵活的混合架构,采用析取图的状态和一组通用的调度规则与之前最小的领域知识的行动空间。注意机制被用作状态的特征提取的图形表示学习(GRL)模块,并且采用双决斗深Q-网络与优先重放和嘈杂的网络(D3QPN)到每个状态映射到最适当的调度规则。此外,我们提出Gymjsp,基于众所周知的或图书馆公共标杆,提供了RL和DJSP研究社区标准化现成的现成工具。各种DJSP实例综合实验证实,我们提出的框架是优于基准算法可在所有情况下,较小的完工时间,并提供了在混合架构的各个组成部分的有效性实证理由。
translated by 谷歌翻译
Proper functioning of connected and automated vehicles (CAVs) is crucial for the safety and efficiency of future intelligent transport systems. Meanwhile, transitioning to fully autonomous driving requires a long period of mixed autonomy traffic, including both CAVs and human-driven vehicles. Thus, collaboration decision-making for CAVs is essential to generate appropriate driving behaviors to enhance the safety and efficiency of mixed autonomy traffic. In recent years, deep reinforcement learning (DRL) has been widely used in solving decision-making problems. However, the existing DRL-based methods have been mainly focused on solving the decision-making of a single CAV. Using the existing DRL-based methods in mixed autonomy traffic cannot accurately represent the mutual effects of vehicles and model dynamic traffic environments. To address these shortcomings, this article proposes a graph reinforcement learning (GRL) approach for multi-agent decision-making of CAVs in mixed autonomy traffic. First, a generic and modular GRL framework is designed. Then, a systematic review of DRL and GRL methods is presented, focusing on the problems addressed in recent research. Moreover, a comparative study on different GRL methods is further proposed based on the designed framework to verify the effectiveness of GRL methods. Results show that the GRL methods can well optimize the performance of multi-agent decision-making for CAVs in mixed autonomy traffic compared to the DRL methods. Finally, challenges and future research directions are summarized. This study can provide a valuable research reference for solving the multi-agent decision-making problems of CAVs in mixed autonomy traffic and can promote the implementation of GRL-based methods into intelligent transportation systems. The source code of our work can be found at https://github.com/Jacklinkk/Graph_CAVs.
translated by 谷歌翻译
5G及以后的移动网络将以前所未有的规模支持异质用例,从而要求自动控制和优化针对单个用户需求的网络功能。当前的蜂窝体系结构不可能对无线电访问网络(RAN)进行这种细粒度控制。为了填补这一空白,开放式运行范式及其规范引入了一个带有抽象的开放体系结构,该架构可以启用闭环控制并提供数据驱动和智能优化RAN在用户级别上。这是通过在网络边缘部署在近实时RAN智能控制器(接近RT RIC)上的自定义RAN控制应用程序(即XAPP)获得的。尽管有这些前提,但截至今天,研究界缺乏用于构建数据驱动XAPP的沙箱,并创建大型数据集以有效的AI培训。在本文中,我们通过引入NS-O-RAN来解决此问题,NS-O-RAN是一个软件框架,该框架将现实世界中的生产级近距离RIC与NS-3上的基于3GPP的模拟环境集成在一起,从而实现了XAPPS和XAPPS的开发自动化的大规模数据收集和深入强化学习驱动的控制策略的测试,以在用户级别的优化中进行优化。此外,我们提出了第一个特定于用户的O-RAN交通转向(TS)智能移交框架。它使用随机的合奏混合物,结合了最先进的卷积神经网络体系结构,以最佳地为网络中的每个用户分配服务基站。我们的TS XAPP接受了NS-O-RAN收集的超过4000万个数据点的培训,该数据点在近距离RIC上运行,并控制其基站。我们在大规模部署中评估了性能,这表明基于XAPP的交换可以使吞吐量和频谱效率平均比传统的移交启发式方法提高50%,而动机性开销较少。
translated by 谷歌翻译
Adaptive mesh refinement (AMR) is necessary for efficient finite element simulations of complex physical phenomenon, as it allocates limited computational budget based on the need for higher or lower resolution, which varies over space and time. We present a novel formulation of AMR as a fully-cooperative Markov game, in which each element is an independent agent who makes refinement and de-refinement choices based on local information. We design a novel deep multi-agent reinforcement learning (MARL) algorithm called Value Decomposition Graph Network (VDGN), which solves the two core challenges that AMR poses for MARL: posthumous credit assignment due to agent creation and deletion, and unstructured observations due to the diversity of mesh geometries. For the first time, we show that MARL enables anticipatory refinement of regions that will encounter complex features at future times, thereby unlocking entirely new regions of the error-cost objective landscape that are inaccessible by traditional methods based on local error estimators. Comprehensive experiments show that VDGN policies significantly outperform error threshold-based policies in global error and cost metrics. We show that learned policies generalize to test problems with physical features, mesh geometries, and longer simulation times that were not seen in training. We also extend VDGN with multi-objective optimization capabilities to find the Pareto front of the tradeoff between cost and error.
translated by 谷歌翻译
未来的互联网涉及几种新兴技术,例如5G和5G网络,车辆网络,无人机(UAV)网络和物联网(IOT)。此外,未来的互联网变得异质并分散了许多相关网络实体。每个实体可能需要做出本地决定,以在动态和不确定的网络环境下改善网络性能。最近使用标准学习算法,例如单药强化学习(RL)或深入强化学习(DRL),以使每个网络实体作为代理人通过与未知环境进行互动来自适应地学习最佳决策策略。但是,这种算法未能对网络实体之间的合作或竞争进行建模,而只是将其他实体视为可能导致非平稳性问题的环境的一部分。多机构增强学习(MARL)允许每个网络实体不仅观察环境,还可以观察其他实体的政策来学习其最佳政策。结果,MAL可以显着提高网络实体的学习效率,并且最近已用于解决新兴网络中的各种问题。在本文中,我们因此回顾了MAL在新兴网络中的应用。特别是,我们提供了MARL的教程,以及对MARL在下一代互联网中的应用进行全面调查。特别是,我们首先介绍单代机Agent RL和MARL。然后,我们回顾了MAL在未来互联网中解决新兴问题的许多应用程序。这些问题包括网络访问,传输电源控制,计算卸载,内容缓存,数据包路由,无人机网络的轨迹设计以及网络安全问题。
translated by 谷歌翻译
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although there exist some approaches to address this problem, they usually require global channel state information, which is hard to obtain in practice, and get the sub-optimal power allocation policy with high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems. By introducing regularization terms in the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, and thus the policy updating speed slows down. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that our proposed PQL can learn the desired power control policy from a dynamic environment where the locations of users change episodically and outperform existing DTE MADRL algorithms.
translated by 谷歌翻译
社交机器人被称为社交网络上的自动帐户,这些帐户试图像人类一样行事。尽管图形神经网络(GNNS)已大量应用于社会机器人检测领域,但大量的领域专业知识和先验知识大量参与了最先进的方法,以设计专门的神经网络体系结构,以设计特定的神经网络体系结构。分类任务。但是,在模型设计中涉及超大的节点和网络层,通常会导致过度平滑的问题和缺乏嵌入歧视。在本文中,我们提出了罗斯加斯(Rosgas),这是一种新颖的加强和自我监督的GNN Architecture搜索框架,以适应性地指出了最合适的多跳跃社区和GNN体系结构中的层数。更具体地说,我们将社交机器人检测问题视为以用户为中心的子图嵌入和分类任务。我们利用异构信息网络来通过利用帐户元数据,关系,行为特征和内容功能来展示用户连接。 Rosgas使用多代理的深钢筋学习(RL)机制来导航最佳邻域和网络层的搜索,以分别学习每个目标用户的子图嵌入。开发了一种用于加速RL训练过程的最接近的邻居机制,Rosgas可以借助自我监督的学习来学习更多的判别子图。 5个Twitter数据集的实验表明,Rosgas在准确性,训练效率和稳定性方面优于最先进的方法,并且在处理看不见的样本时具有更好的概括。
translated by 谷歌翻译
Reinforcement Learning (RL) is currently one of the most commonly used techniques for traffic signal control (TSC), which can adaptively adjusted traffic signal phase and duration according to real-time traffic data. However, a fully centralized RL approach is beset with difficulties in a multi-network scenario because of exponential growth in state-action space with increasing intersections. Multi-agent reinforcement learning (MARL) can overcome the high-dimension problem by employing the global control of each local RL agent, but it also brings new challenges, such as the failure of convergence caused by the non-stationary Markov Decision Process (MDP). In this paper, we introduce an off-policy nash deep Q-Network (OPNDQN) algorithm, which mitigates the weakness of both fully centralized and MARL approaches. The OPNDQN algorithm solves the problem that traditional algorithms cannot be used in large state-action space traffic models by utilizing a fictitious game approach at each iteration to find the nash equilibrium among neighboring intersections, from which no intersection has incentive to unilaterally deviate. One of main advantages of OPNDQN is to mitigate the non-stationarity of multi-agent Markov process because it considers the mutual influence among neighboring intersections by sharing their actions. On the other hand, for training a large traffic network, the convergence rate of OPNDQN is higher than that of existing MARL approaches because it does not incorporate all state information of each agent. We conduct an extensive experiments by using Simulation of Urban MObility simulator (SUMO), and show the dominant superiority of OPNDQN over several existing MARL approaches in terms of average queue length, episode training reward and average waiting time.
translated by 谷歌翻译
深度神经网络的强大学习能力使强化学习者能够直接从连续环境中学习有效的控制政策。从理论上讲,为了实现稳定的性能,神经网络假设I.I.D.不幸的是,在训练数据在时间上相关且非平稳的一般强化学习范式中,输入不存在。这个问题可能导致“灾难性干扰”和性能崩溃的现象。在本文中,我们提出智商,即干涉意识深度Q学习,以减轻单任务深度加固学习中的灾难性干扰。具体来说,我们求助于在线聚类,以实现在线上下文部门,以及一个多头网络和一个知识蒸馏正规化术语,用于保留学习上下文的政策。与现有方法相比,智商基于深Q网络,始终如一地提高稳定性和性能,并通过对经典控制和ATARI任务进行了广泛的实验。该代码可在以下网址公开获取:https://github.com/sweety-dm/interference-aware-ware-deep-q-learning。
translated by 谷歌翻译
Multi-agent settings remain a fundamental challenge in the reinforcement learning (RL) domain due to the partial observability and the lack of accurate real-time interactions across agents. In this paper, we propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge within a large number of agents coexisting. First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations and learn local communication between neighboring agents. To facilitate multi-agent coordination, we explicitly learn the effect of joint actions by taking the policies of neighboring agents as inputs. Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions. To more effectively coordinate behaviors of neighboring agents, we enhance the mean-field approximation by a supervised policy rectification network (PRN) for rectifying real-time agent interactions and by a learnable compensation term for correcting the approximation bias. The proposed method enables efficient coordination as well as outperforms several baseline approaches on the adaptive traffic signal control (ATSC) task and the StarCraft II multi-agent challenge (SMAC).
translated by 谷歌翻译
In this paper, we build on advances introduced by the Deep Q-Networks (DQN) approach to extend the multi-objective tabular Reinforcement Learning (RL) algorithm W-learning to large state spaces. W-learning algorithm can naturally solve the competition between multiple single policies in multi-objective environments. However, the tabular version does not scale well to environments with large state spaces. To address this issue, we replace underlying Q-tables with DQN, and propose an addition of W-Networks, as a replacement for tabular weights (W) representations. We evaluate the resulting Deep W-Networks (DWN) approach in two widely-accepted multi-objective RL benchmarks: deep sea treasure and multi-objective mountain car. We show that DWN solves the competition between multiple policies while outperforming the baseline in the form of a DQN solution. Additionally, we demonstrate that the proposed algorithm can find the Pareto front in both tested environments.
translated by 谷歌翻译
在探索中,由于当前的低效率而引起的强化学习领域,具有较大动作空间的学习控制政策是一个具有挑战性的问题。在这项工作中,我们介绍了深入的强化学习(DRL)算法呼叫多动作网络(MAN)学习,以应对大型离散动作空间的挑战。我们建议将动作空间分为两个组件,从而为每个子行动创建一个值神经网络。然后,人使用时间差异学习来同步训练网络,这比训练直接动作输出的单个网络要简单。为了评估所提出的方法,我们在块堆叠任务上测试了人,然后扩展了人类从Atari Arcade学习环境中使用18个动作空间的12个游戏。我们的结果表明,人的学习速度比深Q学习和双重Q学习更快,这意味着我们的方法比当前可用于大型动作空间的方法更好地执行同步时间差异算法。
translated by 谷歌翻译
在这项工作中,我们将注意力集中在数据分布与基于Q学基于Q学基于函数近似之间的相互作用的研究。我们提供了一个理论和实证分析,以及为什么数据分布的不同性质可以有助于调节算法不稳定性的来源。首先,我们重新审视近似动态编程算法性能的理论界限。其次,我们提供了一种新型的四态MDP,突出了在线和离线设置中具有功能近似的Q学习算法的数据分布的影响。最后,我们通过实验评估数据分布属性在离线深度Q网算法的性能中的影响。我们的结果表明:(i)数据分布需要拥有某些属性,以便在离线设置中鲁棒地学习,即距离MDP的最佳策略和高覆盖范围内的分布在状态 - 动作空间上的低距离; (ii)高熵数据分布可以有助于减轻算法不稳定性的来源。
translated by 谷歌翻译
在人工多智能体系中,学习协作政策的能力是基于代理商的沟通技巧,他们必须能够编码从环境中收到的信息,并学习如何与手头任务所要求的其他代理分享它。我们介绍了一个深度加强学习方法,连接驱动的通信(CDC),促进了多种子体协作行为的出现,仅通过经验。代理被建模为加权图的节点,其状态相关的边缘编码可以交换的对方式。我们介绍了一种依赖于图形的关注机制,可以控制代理的传入消息如何加权。此机制完全核对图表所表示的系统的当前状态,并在捕获信息如何在图中流动的扩散过程中构建。图形拓扑未被假定已知先验,但在代理人的观察中动态依赖于代理人,并以端到端的方式与注意机制和政策同时学习。我们的经验结果表明,CDC能够学习有效的协作政策,并可以在合作导航任务上过度执行竞争学习算法。
translated by 谷歌翻译
加固学习在机器学习中推动了令人印象深刻的进步。同时,量子增强机学习算法使用量子退火的底层划伤。最近,已经提出了一种组合两个范例的多代理强化学习(MARL)架构。这种新的算法利用Q值近似的量子Boltzmann机器(QBMS)在收敛所需的时间步长方面具有优于常规的深度增强学习。但是,该算法仅限于单代理和小型2x2多代理网格域。在这项工作中,我们提出了对原始概念的延伸,以解决更具挑战性问题。类似于Classic DQN,我们添加了重播缓冲区的体验,并使用不同的网络来估计目标和策略值。实验结果表明,学习变得更加稳定,使代理能够在具有更高复杂性的网格域中找到最佳策略。此外,我们还评估参数共享如何影响多代理域中的代理行为。量子采样证明是一种有希望的加强学习任务的方法,但目前受到QPU尺寸的限制,因此通过输入和Boltzmann机器的大小。
translated by 谷歌翻译