Intelligent Paper Notes

Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching

Shengheng Liu , Chong Zheng , Yongming Huang , Tony Q. S. Quek

Categories: Machine Learning | Artificial Intelligence

2021-10-20

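Mobile edge computing (MEC) is a prominent computing paradigm that expands the application fields of wireless communication. Because the capacities of user devices and MEC servers are limited, edge caching (EC) optimization is crucial for making effective use of caching resources in MEC-enabled wireless networks. However, the dynamics and complexity of content popularity over space and time, together with the need to preserve user privacy, pose significant challenges to EC optimization. This paper proposes a privacy-preserving distributed deep deterministic policy gradient (P2D3PG) algorithm to maximize the cache hit rates of devices in MEC networks. Specifically, we account for the fact that content popularity is dynamic, complex, and unobservable, and formulate the maximization of device cache hit rates as distributed problems under privacy-preservation constraints. In particular, we convert the distributed optimizations into distributed model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction. A P2D3PG algorithm based on distributed reinforcement learning is then developed to solve the distributed problems. Simulation results demonstrate that the proposed approach improves the EC hit rate over baseline methods while preserving user privacy.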
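
To make the objective concrete, the cache hit rate that the abstract optimizes can be sketched as the fraction of requests served from the cache. The snippet below is only an illustrative sketch: the names `cache_hit_rate` and `top_k_cache`, the toy request traces, and the frequency-based top-k baseline are assumptions for illustration and are not the P2D3PG algorithm.

```python
from collections import Counter


def cache_hit_rate(requests, cached_items):
    """Return the fraction of requests that are served from the cache."""
    if not requests:
        return 0.0
    cached = set(cached_items)
    hits = sum(1 for item in requests if item in cached)
    return hits / len(requests)


def top_k_cache(history, k):
    """Hypothetical baseline: cache the k most frequently requested items."""
    return [item for item, _ in Counter(history).most_common(k)]


# Toy traces (assumed data): past requests drive the caching decision,
# future requests are used to evaluate it.
history = ["a", "b", "a", "c", "a", "b"]
future = ["a", "b", "d", "a"]

cache = top_k_cache(history, k=2)
rate = cache_hit_rate(future, cache)
print(cache, rate)
```

A learned policy would replace `top_k_cache` with a decision that adapts to popularity that changes over time, which a static frequency count cannot track.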

Translated by Google Translate

Unsupervised Recurrent Federated Learning for Edge Popularity Prediction in Privacy-Preserving Mobile Edge Computing Networks

Chong Zheng , Shengheng Liu , Yongming Huang , Wei Zhang , Luxi Yang

Categories: Artificial Intelligence | Machine Learning

2022-07-02

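Wireless communication is rapidly reshaping entire industry sectors. In particular, mobile edge computing (MEC), an enabling technology for the industrial Internet of Things (IIoT), brings powerful computing and storage infrastructure closer to mobile terminals and thereby significantly lowers response latency. To reap the benefits of proactive caching at the network edge, precise knowledge of the popularity pattern among end devices is essential. However, the complex and dynamic nature of content popularity, together with the data-privacy requirements in many IIoT scenarios, makes this knowledge hard to acquire. In this paper, we propose an unsupervised and privacy-preserving popularity prediction framework for MEC-enabled IIoT. The concepts of local and global popularity are introduced, and the time-varying popularity of each user is modeled as a model-free Markov chain. On this basis, a novel unsupervised recurrent federated learning (URFL) algorithm is proposed to predict the distributed popularity while achieving privacy preservation and unsupervised training. Simulations indicate that the proposed framework can improve prediction accuracy, measured as a reduction in root-mean-squared error, by up to $60.5\%$-$68.7\%$. In addition, manual labeling and violations of user data privacy are both avoided.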

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

Tianxu Li , Kun Zhu , Nguyen Cong Luong , Dusit Niyato , Qihui Wu , Yang Zhang , Bing Chen

Categories: Artificial Intelligence | Machine Learning

2021-10-26

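The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to let each network entity, acting as an agent, adaptively learn an optimal decision-making policy by interacting with the unknown environment. However, such algorithms fail to model cooperation or competition among network entities and simply treat other entities as part of the environment, which may cause the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities and has recently been used to solve various problems in emerging networks. This paper therefore reviews the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. We then review a number of applications of MARL to emerging problems in the future Internet, including network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.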

Mobility-Aware Cooperative Caching in Vehicular Edge Computing Based on Asynchronous Federated and Deep Reinforcement Learning

Qiong Wu , Yu Zhao , Qiang Fan , Pingyi Fan , Jiangzhou Wang , Cui Zhang

Categories: Machine Learning

2022-08-02

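Vehicular edge computing (VEC) can cache content in different roadside units (RSUs) at the network edge to support real-time vehicular applications. Because vehicles are highly mobile, a VEC system needs to cache user data in advance and learn the most popular and interesting content for vehicular users. Since user data usually contains private information, users are reluctant to share it with others. To address this, traditional federated learning (FL) updates the global model synchronously by aggregating the local models of all users, which protects user privacy. However, vehicles may frequently leave the VEC coverage area before finishing local model training and therefore cannot upload their local models as expected, which reduces the accuracy of the global model. In addition, the caching capacity of a local RSU is limited and popular content is diverse, so the size of the predicted popular content usually exceeds the caching capacity of the local RSU. VEC should therefore cache the predicted popular content across different RSUs while taking content transmission delay into account. In this paper, we consider vehicle mobility and propose a cooperative caching scheme in VEC based on asynchronous federated and deep reinforcement learning (CAFR). We first propose an asynchronous FL algorithm that accounts for vehicle mobility to obtain an accurate global model, and then propose an algorithm that predicts popular content based on the global model. We further propose a deep reinforcement learning algorithm, again accounting for vehicle mobility, to find the optimal cooperative caching locations for the predicted popular content so as to optimize content transmission delay. Extensive experimental results show that the CAFR scheme outperforms other baseline caching schemes.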
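
The difference between synchronous and asynchronous aggregation can be sketched in a few lines. The snippet below is a generic illustration, not the CAFR update rule: the function name `async_update`, the mixing rate `alpha`, and the staleness discount `alpha / (1 + staleness)` are all assumptions chosen for the sketch. The idea it shows is that the server merges each local model as soon as it arrives, giving less weight to models trained from an older copy of the global model.

```python
def async_update(global_weights, local_weights, staleness, alpha=0.5):
    """Merge one late-arriving local model into the global model.

    staleness: number of global updates that happened since the client
    downloaded the model it trained on (assumed bookkeeping).
    alpha: base mixing rate (assumed hyperparameter).
    """
    # Older local models get a smaller mixing weight.
    mix = alpha / (1 + staleness)
    return [(1 - mix) * g + mix * w
            for g, w in zip(global_weights, local_weights)]


global_model = [0.0, 0.0]
local_model = [1.0, 2.0]

fresh = async_update(global_model, local_model, staleness=0)
stale = async_update(global_model, local_model, staleness=1)
print(fresh, stale)
```

Because no round has to wait for every participant, a vehicle that leaves coverage early only delays its own contribution rather than the whole global update.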

Latency Optimization for Blockchain-Empowered Federated Learning in Multi-Server Edge Computing

Dinh C. Nguyen , Seyyedali Hosseinalipour , David J. Love , Pubudu N. Pathirana , Christopher G. Brinton

Categories: Machine Learning

2022-03-18

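In this paper, we study a new latency optimization problem for blockchain-based federated learning (BFL) in multi-server edge computing. In this system model, distributed mobile devices (MDs) communicate with a set of edge servers (ESs) to handle machine learning (ML) model training and block mining simultaneously. To assist ML model training for resource-constrained MDs, we develop an offloading strategy that enables MDs to transmit their data to one of the associated ESs. We then propose a new decentralized ML model aggregation solution at the edge layer, based on a consensus mechanism, to build a global ML model via peer-to-peer (P2P)-based blockchain communications. Blockchain builds trust among MDs and ESs to facilitate reliable ML model sharing and cooperative consensus formation, and enables rapid elimination of manipulated models caused by poisoning attacks. We formulate latency-aware BFL as an optimization problem that aims to minimize system latency by jointly considering the data offloading decisions, MDs' transmit power, MDs' data offloading, MDs' computation allocation, and hash power allocation. Given the mixed action space of discrete offloading and continuous allocation variables, we propose a novel deep reinforcement learning scheme with a parameterized advantage actor-critic algorithm. Theoretically, we characterize the convergence properties of BFL in terms of the aggregation delay, mini-batch size, and number of P2P communication rounds. Our numerical evaluation demonstrates the superiority of the proposed scheme over baselines in terms of model training efficiency, convergence rate, system latency, and robustness against model poisoning attacks.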

Online Service Migration in Edge Computing with Incomplete Information: A Deep Recurrent Actor-Critic Method

Jin Wang , Jia Hu , Geyong Min , Qiang Ni , Tarek El-Ghazawi

Categories: Machine Learning

2020-12-16

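Multi-access edge computing (MEC) is an emerging computing paradigm that extends cloud computing to the network edge to support resource-intensive applications on mobile devices. As a key problem in MEC, service migration must decide how to migrate user services so that quality of service is maintained as users roam between MEC servers with limited coverage and capacity. However, finding an optimal migration policy is intractable because of the dynamic MEC environment and user mobility. Many existing studies make centralized migration decisions based on complete system-level information, which is time-consuming and lacks desirable scalability. To address these challenges, we propose a novel learning-driven method that is user-centric and can make effective online migration decisions using incomplete system-level information. Specifically, the service migration problem is modeled as a partially observable Markov decision process (POMDP). To solve the POMDP, we design a new encoder network that combines long short-term memory (LSTM) and an embedding matrix to extract hidden information effectively, and we further propose a tailored off-policy actor-critic algorithm for efficient training. Extensive experimental results based on real-world mobility traces show that the new method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms and achieves near-optimal results in various MEC scenarios.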

Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach

Lingling Zhang , Yanxiang Jiang , Fu-Chun Zheng , Mehdi Bennis , Xiaohu You

Categories: Machine Learning | Artificial Intelligence

2022-06-13

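Fog radio access network (F-RAN) is a promising technology in which user mobile devices (MDs) can offload computation tasks to nearby fog access points (F-APs). Because the resources of F-APs are limited, it is important to design an efficient task offloading scheme. In this paper, by considering time-varying network environments, a dynamic computation offloading and resource allocation problem in F-RANs is formulated to minimize the task execution delay and energy consumption of MDs. To solve the problem, a federated deep reinforcement learning (DRL) based algorithm is proposed, in which a deep deterministic policy gradient (DDPG) algorithm performs computation offloading and resource allocation in each F-AP. Federated learning is used to train the DDPG agents in order to reduce the computational complexity of the training process and protect user privacy. Simulation results show that, compared with other existing strategies, the proposed federated DDPG algorithm more quickly achieves lower task execution delay and energy consumption for MDs.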

Exploiting Deep Reinforcement Learning for Edge Caching in Cell-Free Massive MIMO Systems

Yu Zhang , Shuaifei Chen , Jiayi Zhang

Categories: Artificial Intelligence

2022-08-26

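Cell-free massive multiple-input multiple-output (MIMO) can meet the stringent quality-of-experience (QoE) requirements of railway wireless communications by coordinating many successive access points (APs) to serve onboard users coherently. A key challenge is how to deliver the desired content in time, given the radically changing propagation environment caused by ever-increasing train speeds. In this paper, we propose to proactively cache likely-requested content at the upcoming APs, which perform coherent transmission to reduce end-to-end delay. A long-term QoE-maximization problem is formulated, and two cache placement algorithms are proposed: one based on heuristic convex optimization (HCO) and the other based on deep reinforcement learning (DRL) with soft actor-critic (SAC). Compared with conventional benchmarks, numerical results show the advantage of the proposed algorithms in QoE and hit probability. With its advanced DRL model, SAC outperforms HCO in QoE by predicting user requests accurately.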

Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Yahao Ding , Zhaohui Yang , Quoc-Viet Pham , Zhaoyang Zhang , Mohammad Shikh-Bahaei

Categories: Machine Learning | Artificial Intelligence

2023-01-03

Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surface (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of various DL applications for UAV swarms in a wide range of scenarios.

Federated Learning in Mobile Edge Networks: A Comprehensive Survey

Wei Yang Bryan Lim , Nguyen Cong Luong , Dinh Thai Hoang , Yutao Jiao , Ying-Chang Liang , Qiang Yang , Dusit Niyato , Chunyan Miao

Categories:

2019-09-26

In recent years, mobile devices have been equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications, e.g., for medical purposes and in vehicular networks. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislation and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates rather than raw data to the server for aggregation. FL can serve as an enabling technology in mobile edge networks since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, in a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved. This raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.
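
The aggregation step described above, where devices send model updates and the server combines them, can be sketched as a weighted average. The abstract does not say which aggregation rule is used; the snippet below assumes one common choice, averaging client weights in proportion to local dataset size (often called FedAvg). The function name `federated_average` and the toy numbers are illustrative assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Average client model weights, weighted by local dataset size.

    client_weights: one flat list of parameters per client.
    client_sizes: number of local training samples per client.
    Only these two inputs reach the server; raw data stays on the devices.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two clients with assumed toy parameters; the second holds 3x more data.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]

global_weights = federated_average(clients, sizes)
print(global_weights)
```

In a full training loop the server would send `global_weights` back to the devices and repeat the cycle for many rounds.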

Content Popularity Prediction in Fog-RANs: A Clustered Federated Learning Based Approach

Zhiheng Wang , Yanxiang Jiang , Fu-Chun Zheng , Mehdi Bennis , Xiaohu You

Categories: Machine Learning | Artificial Intelligence

2022-06-13

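This paper studies the content popularity prediction problem in fog radio access networks (F-RANs). Based on clustered federated learning, we propose a novel mobility-aware popularity prediction policy that integrates content popularity in terms of both local users and mobile users. For local users, content popularity is predicted by learning hidden representations of local users and content. Initial features of local users and content are generated by combining neighbor information with self information. A dual-channel neural network (DCNN) model is then introduced to learn the hidden representations by producing deep latent features from the initial features. For mobile users, content popularity is predicted via user preference learning. To distinguish regional variations in content popularity, clustered federated learning (CFL) is employed, which enables fog access points (F-APs) with similar regional types to benefit from one another and provides a more specialized DCNN model for each F-AP. Simulation results show that the proposed policy achieves significant performance improvement over traditional policies.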

Beyond 5G Networks: Integration of Communication, Computing, Caching, and Control

Musbahu Mohammed Adam , Liqiang Zhao , Kezhi Wang , Zhu Han

Categories: Machine Learning

2022-12-26

In recent years, the exponential proliferation of smart devices with their intelligent applications poses severe challenges on conventional cellular networks. Such challenges can be potentially overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends of both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resources integration. Then, we discuss integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond 5G networks, such as 6G.

Federated learning and next generation wireless communications: A survey on bidirectional relationship

Debaditya Shome , Omer Waqar , Wali Ullah Khan

Categories: Machine Learning

2021-10-14

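To meet the extremely heterogeneous requirements of next-generation wireless communication networks, the research community is increasingly relying on machine learning solutions for real-time decision-making and radio resource management. Traditional machine learning uses a fully centralized architecture in which all training data is collected at a single node, such as a cloud server, which significantly increases communication overhead and raises serious privacy concerns. To address this, a distributed machine learning paradigm called federated learning (FL) has recently been proposed. In FL, each participating edge device trains its local model using its own training data. The weights or parameters of the locally trained models are then sent over wireless channels to a central parameter server (PS), which aggregates them and updates the global model. On the one hand, FL plays an important role in optimizing the resources of wireless communication networks; on the other hand, wireless communication is crucial for FL. There is thus a "bidirectional" relationship between FL and wireless communications. Although FL is an emerging concept, many publications on FL and its applications to next-generation wireless networks have already appeared. Nevertheless, we noticed that none of these works highlights the bidirectional relationship between FL and wireless communications. The purpose of this survey paper is therefore to bridge this gap in the literature by providing a timely and comprehensive discussion of the interdependency between FL and wireless communications.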

Accuracy-Guaranteed Collaborative DNN Inference in Industrial IoT via Deep Reinforcement Learning

Wen Wu , Peng Yang , Weiting Zhang , Conghao Zhou , Xuemin Shen

Categories: Artificial Intelligence | Machine Learning

2022-12-31

Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services, which require low delay and high accuracy. Sampling rate adaptation, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is the key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaptation, inference task offloading, and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since a CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.
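
The CMDP-to-MDP step can be illustrated with the standard Lyapunov construction, in which a long-term constraint is tracked by a virtual queue and folded into the per-step reward. The abstract does not give the exact formulation, so everything below is an assumption for illustration: the function names, the queue update, the drift-plus-penalty reward, and the trade-off parameter `v`.

```python
def update_virtual_queue(queue, required_accuracy, achieved_accuracy):
    """Track accumulated accuracy shortfall.

    The queue grows when accuracy falls short of the requirement and
    drains (never below zero) when the requirement is exceeded.
    """
    return max(queue + required_accuracy - achieved_accuracy, 0.0)


def per_step_reward(delay, queue, required_accuracy, achieved_accuracy, v=1.0):
    """Negative drift-plus-penalty term used as an unconstrained reward.

    v (assumed hyperparameter) weights delay against constraint violation.
    """
    violation = required_accuracy - achieved_accuracy
    return -(v * delay + queue * violation)


# Toy values (assumed): one step below the accuracy target, one above it.
q1 = update_virtual_queue(0.0, required_accuracy=0.75, achieved_accuracy=0.5)
reward = per_step_reward(delay=2.0, queue=q1,
                         required_accuracy=0.75, achieved_accuracy=0.5)
q2 = update_virtual_queue(q1, required_accuracy=0.75, achieved_accuracy=1.0)
print(q1, reward, q2)
```

Keeping the virtual queue stable over time corresponds to meeting the long-term accuracy requirement on average, which is why an ordinary RL agent can then optimize the per-step reward without handling the constraint explicitly.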

Deep Reinforcement Learning for Trajectory Path Planning and Distributed Inference in Resource-Constrained UAV Swarms

Marwan Dhuheir , Emna Baccour , Aiman Erbad , Sinan Sabeeh Al-Obaidi , Mounir Hamdi

Categories: Machine Learning | Robotics

2022-12-21

The deployment flexibility and maneuverability of Unmanned Aerial Vehicles (UAVs) have increased their adoption in various applications, such as wildfire tracking and border monitoring. In many critical applications, UAVs capture images and other sensory data and then send the captured data to remote servers for inference and data processing tasks. However, this approach is not always practical in real-time applications due to connection instability, limited bandwidth, and end-to-end latency. One promising solution is to divide the inference requests into multiple parts (layers or segments), with each part being executed in a different UAV based on the available resources. Furthermore, some applications require the UAVs to traverse certain areas and capture incidents; thus, planning their paths becomes particularly critical for reducing the latency of the collaborative inference process. Specifically, planning the UAVs' trajectories can reduce the data transmission latency by communicating with nearby devices while mitigating transmission interference. This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm while respecting the resource constraints due to the computational load and memory usage of the inference requests. The model is formulated as an optimization problem and aims to minimize latency. The formulated problem is NP-hard, so finding the optimal solution is quite complex; thus, this paper introduces a real-time and dynamic solution for online applications using deep reinforcement learning. We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
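
The idea of dividing an inference request into parts that run on different UAVs can be sketched with a simple greedy heuristic. This is not the paper's method, which uses deep reinforcement learning; the function name `partition_layers`, the memory figures, and the first-fit rule over consecutive layers are assumptions made only to show what a feasible placement under memory constraints looks like.

```python
def partition_layers(layer_memory, device_capacity):
    """Greedily place consecutive layers on devices in order.

    layer_memory: memory needed by each layer, in execution order.
    device_capacity: memory available on each device, in visiting order.
    Returns a list of (device_index, [layer_indices]) pairs.
    Raises ValueError if the layers cannot be placed.
    """
    assignment = []
    device = 0
    current, used = [], 0.0
    for idx, mem in enumerate(layer_memory):
        # Move to the next device until this layer fits.
        while device < len(device_capacity) and used + mem > device_capacity[device]:
            if current:
                assignment.append((device, current))
            device += 1
            current, used = [], 0.0
        if device == len(device_capacity):
            raise ValueError("layers do not fit on the available devices")
        current.append(idx)
        used += mem
    if current:
        assignment.append((device, current))
    return assignment


# Assumed toy model: four layers split across two UAVs with 4 units each.
plan = partition_layers([2, 2, 3, 1], [4, 4])
print(plan)
```

A learned policy can improve on this by also weighing transmission latency between UAVs and their planned positions, which the greedy rule ignores.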

Device Selection for the Coexistence of URLLC and Distributed Learning Services

Milad Ganjalizadeh , Hossein Shokri Ghadikolaei , Deniz Gündüz , Marina Petrova

Categories: Machine Learning

2022-12-22

Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, from fault-tolerant factory automation to smart cities. When distributed learning is run over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network impact the performance of both distributed learning and the coexisting service. In this paper, we investigate a mixed service scenario where distributed AI workflow and ultra-reliable low latency communication (URLLC) services run concurrently over a network. Consequently, we propose a risk sensitivity-based formulation for device selection to minimize the AI training delays during its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and address it via a framework based on soft actor-critic algorithm. We evaluate our solution with a realistic and 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to the scenario where URLLC solely consumes all network resources.

Learning-Based Client Selection for Federated Learning Services Over Wireless Networks with Constrained Monetary Budgets

Zhipeng Cheng , Xuwei Fan , Minghui Liwang , Ning Chen , Xianbin Wang

Categories: Machine Learning | Artificial Intelligence

2022-08-08

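We investigate a data-quality-aware dynamic client selection problem for multiple federated learning (FL) services in a wireless network, where each client has dynamic datasets for the simultaneous training of multiple FL services and each FL service has to pay the clients under a constrained monetary budget. The problem is formalized as a non-cooperative Markov game over the training rounds. A multi-agent hybrid reinforcement learning-based algorithm is proposed to optimize the joint client selection and payment actions while avoiding action conflicts. Simulation results show that the proposed algorithm can significantly improve training performance.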

Wireless for Machine Learning

Henrik Hellström , José Mairton B. da Silva Jr , Mohammad Mohammadi Amiri , Mingzhe Chen , Viktoria Fodor , H. Vincent Poor , Carlo Fischione

Categories: Machine Learning

2020-08-31

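As data generation increasingly takes place on devices without a wired connection, machine learning (ML) related traffic will be ubiquitous in wireless networks. Many studies have shown that traditional wireless protocols are highly inefficient or unsustainable for supporting ML, which creates the need for new wireless communication methods. In this survey, we give an exhaustive review of state-of-the-art wireless methods that are specifically designed to support ML services over distributed datasets. Currently, there are two clear themes in the literature: analog over-the-air computation and digital radio resource management optimized for ML. This survey gives a comprehensive introduction to these methods, reviews the most important works, highlights open problems, and discusses application scenarios.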

Decentralized Federated Reinforcement Learning for User-Centric Dynamic TFDD Control

Ziyan Yin , Zhe Wang , Jun Li , Ming Ding , Wen Chen , Shi Jin

Categories: Machine Learning

2022-11-04

The explosive growth of dynamic and heterogeneous data traffic brings great challenges for 5G and beyond mobile networks. To enhance the network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet the asymmetric and heterogeneous traffic demands while alleviating the inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG) algorithm. The BSs decide their local time-frequency configurations through RL algorithms and achieve global training via exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then utilize Wolpertinger policy to reduce the mapping errors from continuous action space back to discrete action space. Simulation results demonstrate the superiority of our proposed algorithm to benchmark algorithms with respect to system sum rate.
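
The Wolpertinger step mentioned above, mapping a continuous actor output back to a discrete action, can be sketched as follows. The actor emits a continuous "proto-action", the k nearest valid discrete actions are looked up, and the one with the highest critic value is chosen. The function name `wolpertinger_select`, the toy action set, and the stand-in `q_value` function are assumptions for illustration; in FWDDPG the actor and critic would be trained networks.

```python
def wolpertinger_select(proto_action, discrete_actions, q_value, k=2):
    """Choose the best-valued action among the k nearest discrete actions.

    proto_action: continuous vector produced by the actor.
    discrete_actions: list of valid discrete action vectors.
    q_value: callable scoring a discrete action (stands in for the critic).
    """
    def sq_dist(action):
        return sum((a - p) ** 2 for a, p in zip(action, proto_action))

    # k-nearest-neighbour lookup in the discrete action set.
    nearest = sorted(discrete_actions, key=sq_dist)[:k]
    # Refinement: let the critic pick among the candidates.
    return max(nearest, key=q_value)


actions = [(0, 0), (0, 1), (1, 0), (1, 1)]
proto = (0.9, 0.2)


def toy_q(action):
    """Hypothetical critic used only for this example."""
    return action[0] + 2 * action[1]


nearest_only = wolpertinger_select(proto, actions, toy_q, k=1)
refined = wolpertinger_select(proto, actions, toy_q, k=2)
print(nearest_only, refined)
```

With `k=1` the policy simply rounds to the closest action; a larger `k` lets the critic correct cases where the closest action is not the most valuable one, which is the mapping error the abstract refers to.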

Optimization for Master-UAV-powered Auxiliary-Aerial-IRS-assisted IoT Networks: An Option-based Multi-agent Hierarchical Deep Reinforcement Learning Approach

Jingren Xu , Xin Kang , Ronghaixiang Zhang , Ying-Chang Liang , Sumei Sun

Categories: Machine Learning

2021-12-20

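This paper investigates a master unmanned aerial vehicle (MUAV)-powered Internet of Things (IoT) network, in which we propose using a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV and to use the MUAV as a recharging power source. Under the proposed model, we study the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Two optimization problems are formulated, depending on whether there is charging between the two UAVs. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed: centralized training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). The results show that CT-MADDPG can greatly reduce the computing capability required of the UAV hardware, and that the proposed MADDPOC supports low-level multi-agent cooperative learning in continuous action domains, which gives it a clear advantage over existing option-based hierarchical DRL that supports only single-agent learning and discrete actions.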
