Recent studies have shown that introducing communication between agents can significantly improve the overall performance of cooperative multi-agent reinforcement learning (MARL). In many real-world scenarios, however, communication can be expensive and the bandwidth of the multi-agent system is subject to constraints. Redundant messages that occupy communication resources can block the transmission of informative ones and thereby harm performance. In this paper, we aim to learn the minimal sufficient communication messages. First, we initiate communication between agents with a complete graph. We then introduce the graph information bottleneck (GIB) principle on this complete graph and derive an optimization objective over the graph structure. Based on this objective, we propose a novel multi-agent communication module, CommGIB, which effectively compresses both the structural information and the node information in the communication graph to handle bandwidth-constrained settings. Extensive experiments are conducted on Traffic Control and StarCraft II. The results show that, compared with state-of-the-art algorithms, the proposed method achieves better performance in bandwidth-restricted settings, with especially large margins in large-scale multi-agent tasks.
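To make the bottleneck idea above concrete, below is a minimal PyTorch sketch that compresses node (message) information with a KL penalty and structural (edge) information with an expected-edge-count penalty; the module name `MessageIB` and the penalty weights are illustrative assumptions, not the CommGIB implementation.

```python
# Minimal sketch of an information-bottleneck-style message regularizer,
# loosely inspired by the graph-bottleneck idea above (names and weights
# are assumptions, not the paper's code).
import torch
import torch.nn as nn

class MessageIB(nn.Module):
    def __init__(self, obs_dim, msg_dim, n_agents):
        super().__init__()
        self.mu = nn.Linear(obs_dim, msg_dim)       # message mean
        self.logvar = nn.Linear(obs_dim, msg_dim)   # message log-variance
        # one logit per directed edge of the (initially complete) graph
        self.edge_logits = nn.Parameter(torch.zeros(n_agents, n_agents))

    def forward(self, obs):                          # obs: [n_agents, obs_dim]
        mu, logvar = self.mu(obs), self.logvar(obs)
        msg = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterize
        edge_prob = torch.sigmoid(self.edge_logits)                # soft adjacency
        # node-information cost: KL(q(z|o) || N(0, I))
        kl_node = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        # structure cost: expected fraction of kept edges
        kl_edge = edge_prob.mean()
        return msg, edge_prob, kl_node, kl_edge

ib = MessageIB(obs_dim=16, msg_dim=8, n_agents=4)
msg, adj, kl_n, kl_e = ib(torch.randn(4, 16))
bottleneck_loss = 1e-3 * kl_n + 1e-2 * kl_e        # added to the usual RL loss
```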
Value function factorization via centralized training and decentralized execution is a promising approach to solving cooperative multi-agent reinforcement learning tasks. QMIX, one of the methods in this area, has become state-of-the-art, achieving the best performance on the StarCraft II micromanagement benchmark. However, the monotonic mixing of per-agent estimates in QMIX is known to limit the joint action Q-values it can represent, as is the lack of global state information in the individual agents' value function estimates, often resulting in suboptimality. To this end, we present LSF-SAC, a novel framework that features a variational-inference-based information-sharing mechanism to provide extra state information to assist individual agents in value function factorization. We demonstrate that such latent individual state information sharing can significantly expand the power of value function factorization, while fully decentralized execution can still be maintained in LSF-SAC through a soft actor-critic design. We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods on challenging collaborative tasks. We further conduct extensive ablation studies to localize the key factors accounting for its performance improvements. We believe this new insight can lead to new local value estimation methods and variational deep learning algorithms. A demo video and the implementation code can be found at https://sites.google.com/view/sacmm.
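For readers unfamiliar with the monotonic mixing being discussed, here is a compact PyTorch sketch of a QMIX-style mixer: per-agent utilities are combined with state-conditioned non-negative weights, which is exactly the monotonicity constraint the abstract refers to (layer sizes and dimensions are arbitrary, and this is a generic sketch rather than the LSF-SAC code).

```python
# Sketch of QMIX-style monotonic mixing: non-negative hypernetwork weights
# make q_tot monotonic in each agent's utility.
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.w1 = nn.Linear(state_dim, n_agents * embed)
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.embed = n_agents, embed

    def forward(self, agent_qs, state):              # agent_qs: [B, n_agents]
        B = agent_qs.shape[0]
        w1 = self.w1(state).abs().view(B, self.n_agents, self.embed)   # weights >= 0
        h = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + self.b1(state).unsqueeze(1))
        w2 = self.w2(state).abs().view(B, self.embed, 1)                # weights >= 0
        return (torch.bmm(h, w2) + self.b2(state).unsqueeze(1)).squeeze(-1).squeeze(-1)

mixer = MonotonicMixer(n_agents=3, state_dim=10)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 10))   # joint value, shape [4]
```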
Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods. Thanks to monotonicity, the joint action-value function can be optimized by maximizing factorized per-agent utilities. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions can impose concurrent constraints (across different states) on the representable function class, causing significant estimation errors during training. We address this limitation and propose PAC, a new framework that leverages assistive information generated from counterfactual predictions of optimal joint action selection, enabling explicit assistance to value function factorization through a novel counterfactual loss. A variational-inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline. To enable decentralized execution, we also derive factorized per-agent policies inspired by a maximum-entropy MARL framework. We evaluate PAC on multi-agent predator-prey and a set of StarCraft II micromanagement tasks. Empirical results demonstrate improved performance of PAC over state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms on all benchmarks.
Learning sparse coordination graphs that adapt to the coordination dynamics among agents is a long-standing problem in cooperative multi-agent learning. This paper studies this problem and proposes a novel method that uses the variance of payoff functions to construct context-aware sparse coordination topologies. Theoretically, we justify this approach by proving that the smaller the variance of a payoff function is, the less the greedy action selection changes after removing the corresponding edge. Moreover, we propose to learn action representations to effectively reduce the influence of payoff function estimation errors on graph construction. To empirically evaluate our method, we present the Multi-Agent COordination (MACO) benchmark by collecting classic coordination problems from the literature, increasing their difficulty, and classifying them into different types. We carry out case studies and experiments on the MACO and StarCraft II micromanagement benchmarks to demonstrate the dynamics of sparse graph learning, the influence of graph sparseness, and the learning performance of our method. (The MACO benchmark and code are publicly available at https://github.com/tonghanwang/casec-maco-benchmark.)
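A rough NumPy sketch of the variance-based edge selection idea follows, assuming pairwise payoff tables have already been estimated; the top-k rule and `keep_ratio` are illustrative choices rather than the paper's exact criterion.

```python
# Sketch: keep only the edges whose pairwise payoff varies most over actions,
# since low-variance edges barely change the greedy action selection.
import numpy as np

def sparse_edges(pairwise_q, keep_ratio=0.3):
    """pairwise_q: dict mapping (i, j) -> [|A_i|, |A_j|] payoff matrix."""
    scores = {}
    for (i, j), q in pairwise_q.items():
        scores[(i, j)] = float(np.var(q))   # low variance => edge is nearly redundant
    k = max(1, int(len(scores) * keep_ratio))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

rng = np.random.default_rng(0)
q_tables = {(i, j): rng.normal(size=(5, 5)) for i in range(4) for j in range(i + 1, 4)}
print(sparse_edges(q_tables))                # kept edges of the sparse graph
```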
Learning communication via deep reinforcement learning (RL) or imitation learning (IL) has recently been shown to be an effective way to solve Multi-Agent Path Finding (MAPF). However, existing communication-based MAPF solvers focus on broadcast communication, where an agent broadcasts its message to all other or predefined agents. This is not only impractical but also leads to redundant information that can even impair multi-agent cooperation. A succinct communication scheme should learn which information is relevant to and influences the decision-making process of each agent. To address this issue, we consider a request-reply scenario and propose Decision Causal Communication (DCC), a simple yet effective model that enables agents to select neighbors to communicate with during both training and execution. Specifically, a neighbor is determined to be relevant only when the presence of that neighbor causes a decision adjustment for the central agent. This judgment is based solely on the agents' local observations and is therefore suitable for decentralized execution on large-scale problems. Empirical evaluation in obstacle-rich environments demonstrates the high success rate and low communication overhead of our method.
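The "a neighbor is relevant only if its presence changes my decision" test can be illustrated with a small PyTorch sketch; the Q-network and the zero-message masking convention below are assumptions for illustration, not the DCC implementation.

```python
# Sketch: a neighbour is relevant if including its message flips the
# central agent's greedy action.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, msg_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + msg_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs, msg):
        return self.net(torch.cat([obs, msg], dim=-1))

def relevant_neighbors(q_net, obs, neighbor_msgs):
    """Return indices of neighbours whose message changes the greedy action."""
    base_action = q_net(obs, torch.zeros_like(neighbor_msgs[0])).argmax(-1)
    relevant = []
    for k, msg in enumerate(neighbor_msgs):
        if q_net(obs, msg).argmax(-1) != base_action:
            relevant.append(k)               # this neighbour causally affects the decision
    return relevant

q = QNet(obs_dim=8, msg_dim=4, n_actions=5)
print(relevant_neighbors(q, torch.randn(8), [torch.randn(4) for _ in range(3)]))
```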
Communication is supposed to improve multi-agent collaboration and overall performance in cooperative Multi-agent reinforcement learning (MARL). However, such improvements are prevalently limited in practice since most existing communication schemes ignore communication overheads (e.g., communication delays). In this paper, we demonstrate that ignoring communication delays has detrimental effects on collaborations, especially in delay-sensitive tasks such as autonomous driving. To mitigate this impact, we design a delay-aware multi-agent communication model (DACOM) to adapt communication to delays. Specifically, DACOM introduces a component, TimeNet, that is responsible for adjusting the waiting time of an agent to receive messages from other agents such that the uncertainty associated with delay can be addressed. Our experiments reveal that DACOM has a non-negligible performance improvement over other mechanisms by making a better trade-off between the benefits of communication and the costs of waiting for messages.
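As a loose illustration of a learned waiting-time head in the spirit of TimeNet, the sketch below predicts a waiting budget and keeps only the messages whose delays fall within it; all names, dimensions, and the delay model are assumptions rather than DACOM's design.

```python
# Sketch: predict how long to wait for incoming messages, then drop
# messages that would arrive after the waiting budget.
import torch
import torch.nn as nn

class WaitHead(nn.Module):
    def __init__(self, obs_dim, max_wait=1.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, 1))
        self.max_wait = max_wait

    def forward(self, obs):
        return torch.sigmoid(self.net(obs)) * self.max_wait   # waiting time in [0, max_wait]

wait = WaitHead(obs_dim=10)
budget = wait(torch.randn(10))
delays = torch.rand(5)                       # simulated per-message delays
usable = delays <= budget                    # messages that arrive in time
print(budget.item(), usable.tolist())
```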
Communication enables agents to cooperate to achieve their goals. Learning when to communicate, i.e., sparse (in time) communication, and whom to message is particularly important when bandwidth is limited. Recent work in learning sparse individualized communication, however, suffers from high variance during training, where decreasing communication comes at the cost of decreased reward, particularly in cooperative tasks. We use the information bottleneck to reframe sparsity as a representation learning problem, which we show naturally enables lossless sparse communication at lower budgets than prior art. In this paper, we propose a method for true lossless sparsity in communication via Information Maximizing Gated Sparse Multi-Agent Communication (IMGS-MAC). Our model uses two individualized regularization objectives, an information maximization autoencoder and sparse communication loss, to create informative and sparse communication. We evaluate the learned communication `language' through direct causal analysis of messages in non-sparse runs to determine the range of lossless sparse budgets, which allow zero-shot sparsity, and the range of sparse budgets that will incur a reward loss, which is minimized by our learned gating function with few-shot sparsity. To demonstrate the efficacy of our results, we experiment in cooperative multi-agent tasks where communication is essential for success. We evaluate our model with both continuous and discrete messages. We focus our analysis on a variety of ablations to show the effect of message representations, including their properties, and lossless performance of our model.
Communication helps agents obtain information about others so that better coordinated behavior can be learned. Some existing work communicates predicted future trajectories to others, hoping to give clues about what the sender is going to do so that better coordination can be achieved. However, when agents make decisions synchronously, circular dependencies can sometimes arise, making it hard to coordinate decision-making. In this paper, we propose a novel communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (upper-level agents make decisions before lower-level ones) and has two communication phases. In the negotiation phase, agents determine the priority of decision-making by communicating hidden states of their observations and comparing the values of intentions, which are obtained by modeling the environment dynamics. In the launching phase, upper-level agents take the lead in making decisions and communicate them to lower-level agents. Theoretically, we prove that the policies learned by SeqComm are guaranteed to improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods in various multi-agent cooperative tasks.
In multi-agent reinforcement learning, communication is critical to encourage cooperation among agents. Communication in realistic wireless networks can be highly unreliable due to network conditions that vary with the agents' mobility and the stochasticity of transmissions. We propose a framework to learn practical communication strategies by addressing three fundamental questions: (1) When: agents learn when to communicate based not only on message importance but also on wireless channel conditions. (2) What: agents augment message contents with wireless network measurements to better select game and communication actions. (3) How: agents use a novel neural message encoder to preserve all the information from received messages, regardless of the number and order of messages. Simulating standard benchmarks under realistic wireless network settings, we achieve significant improvements in game performance, convergence speed, and communication efficiency compared with the state of the art.
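The order- and count-independence mentioned in question (3) can be obtained with a permutation-invariant set encoder; below is a minimal sketch (mean pooling over per-message embeddings), with an assumed architecture rather than the paper's encoder.

```python
# Sketch: encode a variable-sized, unordered set of received messages into a
# fixed-size vector by pooling per-message embeddings.
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    def __init__(self, msg_dim, hidden=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(msg_dim, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())

    def forward(self, msgs):                 # msgs: [n_msgs, msg_dim], n_msgs may vary
        return self.rho(self.phi(msgs).mean(dim=0))

enc = SetEncoder(msg_dim=6)
out_a = enc(torch.randn(3, 6))               # three received messages
out_b = enc(torch.randn(7, 6))               # seven messages: same output size
print(out_a.shape, out_b.shape)
```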
Multi-agent settings remain a fundamental challenge in the reinforcement learning (RL) domain due to the partial observability and the lack of accurate real-time interactions across agents. In this paper, we propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge within a large number of agents coexisting. First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations and learn local communication between neighboring agents. To facilitate multi-agent coordination, we explicitly learn the effect of joint actions by taking the policies of neighboring agents as inputs. Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions. To more effectively coordinate behaviors of neighboring agents, we enhance the mean-field approximation by a supervised policy rectification network (PRN) for rectifying real-time agent interactions and by a learnable compensation term for correcting the approximation bias. The proposed method enables efficient coordination as well as outperforms several baseline approaches on the adaptive traffic signal control (ATSC) task and the StarCraft II multi-agent challenge (SMAC).
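A small sketch of the depthwise-convolution idea follows, assuming agents are laid out on a grid with one feature channel per dimension; the grid layout, kernel size, and dimensions are assumptions for illustration.

```python
# Sketch: a depthwise (grouped) convolution aggregates information only from
# an agent's spatial neighbourhood, i.e. purely local communication.
import torch
import torch.nn as nn

feat_dim, H, W = 8, 5, 5                          # agents arranged on a 5x5 grid
agent_features = torch.randn(1, feat_dim, H, W)
# groups=feat_dim: each feature channel mixes only within its own channel
# over the 3x3 neighbourhood of agents
local_comm = nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1, groups=feat_dim)
neighbour_info = local_comm(agent_features)       # [1, feat_dim, 5, 5]
print(neighbour_info.shape)
```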
Communication systems to date have primarily aimed at reliably communicating bit sequences. Such an approach provides efficient engineering designs that are agnostic to the meaning of the messages or the goal that the message exchange aims to achieve. Next-generation systems, however, can be enriched by folding message semantics and the goal of communication into their design. Further, these systems can be made aware of the context in which the communication exchange takes place, providing avenues for novel design insights. This tutorial summarizes the efforts to date, starting from its early adaptations, semantic-aware and task-oriented communications, covering the foundations, algorithms, and potential implementations. The focus is on approaches that utilize information theory to provide the foundations, as well as the significant role of learning in semantic and task-aware communications.
In artificial multi-agent systems, the ability to learn collaborative policies is predicated on the agents' communication skills: they must be able to encode the information received from the environment and learn how to share it with the other agents as required by the task at hand. We present a deep reinforcement learning approach, Connectivity Driven Communication (CDC), that facilitates the emergence of multi-agent collaborative behavior through experience alone. The agents are modeled as nodes of a weighted graph whose state-dependent edges encode the pairwise messages that can be exchanged. We introduce a graph-dependent attention mechanism that controls how the agents' incoming messages are weighted. This mechanism takes into full account the current state of the system as represented by the graph, and builds upon a diffusion process that captures how information flows on the graph. The graph topology is not assumed to be known a priori, but depends dynamically on the agents' observations and is learned concurrently with the attention mechanism and the policy in an end-to-end fashion. Our empirical results show that CDC is able to learn effective collaborative policies and can outperform competing learning algorithms on cooperative navigation tasks.
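A toy sketch of attention-weighted incoming messages, in the spirit of the mechanism described above, is shown below; the dot-product scoring and dimensions are assumptions, not CDC's exact formulation.

```python
# Sketch: weight each sender's message by its similarity to the receiver's
# own state, then aggregate.
import torch
import torch.nn.functional as F

def attend(own_state, incoming):                 # own_state: [d], incoming: [k, d]
    scores = incoming @ own_state / own_state.shape[0] ** 0.5
    weights = F.softmax(scores, dim=0)           # how strongly to weight each sender
    return weights @ incoming                    # aggregated message, shape [d]

print(attend(torch.randn(8), torch.randn(4, 8)).shape)
```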
Recently, model-based agents have achieved better performance than model-free ones using the same computational budget and training time in single-agent environments. However, due to the complexity of multi-agent systems, it is tough to learn the model of the environment. The significant compounding error may hinder the learning process when model-based methods are applied to multi-agent tasks. This paper proposes an implicit model-based multi-agent reinforcement learning method based on value decomposition methods. Under this method, agents can interact with the learned virtual environment and evaluate the current state value according to imagined future states in the latent space, making agents have the foresight. Our approach can be applied to any multi-agent value decomposition method. The experimental results show that our method improves the sample efficiency in different partially observable Markov decision process domains.
Coordination graphs are a promising approach to modeling agent collaboration in multi-agent reinforcement learning. They factorize a large multi-agent system into a suite of overlapping groups that represent the underlying coordination dependencies. One critical challenge in this paradigm is the complexity of computing the maximum-value actions of a graph-based value factorization. This refers to the decentralized constraint optimization problem (DCOP), for which constant-ratio approximation is NP-hard. To bypass this fundamental hardness, we propose a novel method, Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the optimality of the induced DCOP while retaining sufficient function expressiveness. We extend the graph topology to be state-dependent, formulate graph selection as an imaginary agent, and finally derive an end-to-end learning paradigm from the unified Bellman optimality equation. In experiments, we show that our approach learns interpretable graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.
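The reason structured (e.g., tree- or chain-shaped) graphs sidestep the DCOP hardness is that the greedy joint action can then be computed exactly by dynamic programming; below is a sketch for a simple chain with random placeholder payoff tables, illustrating the principle rather than SOP-CG's algorithm.

```python
# Sketch: exact greedy joint action on a chain-structured coordination graph
# via Viterbi-style dynamic programming.
import numpy as np

def chain_argmax(utils, pair_q):
    """utils[i]: [|A|] individual utility; pair_q[i]: [|A|, |A|] payoff of edge (i, i+1)."""
    n = len(utils)
    value = utils[0].copy()
    back = []
    for i in range(1, n):
        # score of (previous action, current action) pairs up to agent i
        scores = value[:, None] + pair_q[i - 1] + utils[i][None, :]
        back.append(scores.argmax(axis=0))   # best predecessor for each current action
        value = scores.max(axis=0)
    actions = [int(value.argmax())]
    for bp in reversed(back):                # backtrack the optimal joint action
        actions.append(int(bp[actions[-1]]))
    return list(reversed(actions))

rng = np.random.default_rng(1)
utils = [rng.normal(size=4) for _ in range(5)]
pair_q = [rng.normal(size=(4, 4)) for _ in range(4)]
print(chain_argmax(utils, pair_q))
```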
Deep reinforcement learning (DRL) and deep multi-agent reinforcement learning (MARL) have achieved significant success across a wide range of domains, including game AI, autonomous vehicles, robotics, and more. However, DRL and deep MARL agents are widely known to be sample inefficient; millions of interactions are usually needed even for relatively simple problem settings, preventing wide application and deployment in real-world scenarios. One bottleneck challenge behind this is the well-known exploration problem, i.e., how to efficiently explore the environment and collect informative experiences that benefit policy learning toward the optimum. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey of existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Beyond the two main branches above, we also include other notable exploration methods with different ideas and techniques. In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of exploration methods for DRL on a set of commonly used benchmarks. Based on our algorithmic and empirical investigation, we finally summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
Along with the springing up of semantics-empowered communication (SemCom) researches, it is now witnessing an unprecedentedly growing interest towards a wide range of aspects (e.g., theories, applications, metrics and implementations) in both academia and industry. In this work, we primarily aim to provide a comprehensive survey on both the background and research taxonomy, as well as a detailed technical tutorial. Specifically, we start by reviewing the literature and answering the "what" and "why" questions in semantic transmissions. Afterwards, we present corresponding ecosystems, including theories, metrics, datasets and toolkits, on top of which the taxonomy for research directions is presented. Furthermore, we propose to categorize the critical enabling techniques by explicit and implicit reasoning-based methods, and elaborate on how they evolve and contribute to modern content & channel semantics-empowered communications. Besides reviewing and summarizing the latest efforts in SemCom, we discuss the relations with other communication levels (e.g., reliable and goal-oriented communications) from a holistic and unified viewpoint. Subsequently, in order to facilitate the future developments and industrial applications, we also highlight advanced practical techniques for boosting semantic accuracy, robustness, and large-scale scalability, just to mention a few. Finally, we discuss the technical challenges that shed light on future research opportunities.
Inter-agent communication can significantly increase performance in multi-agent tasks that require coordination to achieve a shared goal. Prior work has shown that agent communication protocols can be learned using multi-agent reinforcement learning and message-passing network architectures. However, these models use an unconstrained broadcast communication model, in which an agent communicates with all other agents at every step, even when the task does not require it. In real-world applications, where communication may be limited by system constraints such as bandwidth, power, and network capacity, it may be necessary to reduce the number of messages sent. In this work, we explore a simple method for minimizing communication while maximizing performance in multi-task learning: simultaneously optimizing a task-specific objective and a communication penalty. We show that the objectives can be optimized using REINFORCE and the Gumbel-Softmax reparameterization. We introduce two techniques to stabilize training: 50% training and message forwarding. Training with the communication penalty on only 50% of the episodes prevents our models from turning off their outgoing messages. Second, repeating previously received messages helps the models retain information and further improves performance. With these techniques, we show that we can reduce communication by 75% with no loss in performance.
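Below is a minimal sketch of a differentiable send/no-send gate optimized with the Gumbel-Softmax reparameterization and a communication penalty, as described above; the gate architecture and penalty weight are illustrative assumptions, not the paper's model.

```python
# Sketch: a per-agent gate that decides whether to send a message, trained
# end-to-end with a straight-through Gumbel-Softmax and a penalty on sending.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommGate(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        self.logits = nn.Linear(obs_dim, 2)          # [no-send, send]

    def forward(self, obs, tau=1.0):
        gate = F.gumbel_softmax(self.logits(obs), tau=tau, hard=True)
        return gate[..., 1:]                         # 1.0 if the agent sends

gate = CommGate(obs_dim=10)
obs = torch.randn(8, 10)                             # batch of 8 agents
send = gate(obs)                                     # [8, 1] differentiable mask
messages = torch.randn(8, 4) * send                  # zero out suppressed messages
comm_penalty = 0.01 * send.mean()                    # added to the task loss
```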
The future Internet involves several emerging technologies such as 5G and beyond networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized with a large number of involved network entities. Each entity may need to make local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to enable each network entity, as an agent, to learn an optimal decision-making policy adaptively by interacting with the unknown environment. However, such algorithms fail to model the cooperation or competition among network entities and simply treat other entities as part of the environment, which may lead to the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities and has recently been used to solve various issues in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial of MARL and a comprehensive survey of MARL applications in the next-generation Internet. We first introduce single-agent RL and MARL, and then review a number of applications of MARL to solve emerging issues in the future Internet. These issues include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.
Allowing agents to share information through communication is crucial for solving complex tasks in multi-agent reinforcement learning. In this work, we consider the question of whether a given communication protocol can express arbitrary policies. By observing that many existing protocols can be viewed as instances of graph neural networks (GNNs), we demonstrate the equivalence of joint action selection to node labeling. With standard GNN approaches provably limited in their expressive capacity, we draw from existing GNN literature and consider augmenting agent observations with (1) unique agent IDs and (2) random noise. We provide a theoretical analysis of how these approaches yield universally expressive communication, and also prove that they enable targeting arbitrary sets of actions for identical agents. Empirically, these augmentations are found to improve performance on tasks where expressive communication is required, whereas it is generally found that the optimal communication protocol is task-dependent.
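The two augmentations discussed above are simple to realize in practice; the sketch below appends a unique one-hot agent ID and a random noise vector to each agent's features before the communication/GNN layer (dimensions are arbitrary and purely illustrative).

```python
# Sketch: break the symmetry between otherwise identical agents by
# concatenating a unique one-hot ID and random noise to their observations.
import torch

def augment(obs, noise_dim=4):
    n_agents = obs.shape[0]
    agent_ids = torch.eye(n_agents)                 # unique one-hot IDs
    noise = torch.randn(n_agents, noise_dim)        # symmetry-breaking noise
    return torch.cat([obs, agent_ids, noise], dim=-1)

obs = torch.randn(5, 12)
print(augment(obs).shape)                           # [5, 12 + 5 + 4]
```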
Graph mining tasks arise from many different application domains, ranging from social networks and transportation to E-commerce, etc., which have been receiving great attention from the theoretical and algorithmic design communities in recent years, and there has been some pioneering work employing the research-rich Reinforcement Learning (RL) techniques to address graph data mining tasks. However, these graph mining methods and RL models are dispersed in different research areas, which makes it hard to compare them. In this survey, we provide a comprehensive overview of RL and graph mining methods and generalize these methods to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source codes, and benchmark datasets of GRL methods. Furthermore, we propose important directions and challenges to be solved in the future. As far as we know, this is the latest comprehensive survey of GRL, and this work provides a global view and a learning resource for scholars. In addition, we create an online open-source repository for both interested scholars who want to enter this rapidly developing domain and experts who would like to compare GRL methods.