Smart Paper Notes

A Deep Reinforcement Learning-based Adaptive Charging Policy for Wireless Rechargeable Sensor Networks

Ngoc Bui , Phi Le Nguyen , Viet Anh Nguyen , Phan Thuan Do

Category: Machine Learning

2022-08-16

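Wireless sensor networks consist of randomly distributed sensor nodes that monitor targets or areas of interest. Because each sensor has limited battery capacity, keeping the network running for continuous monitoring is a challenge. Wireless power transfer is emerging as a reliable solution, in which a mobile charger (MC) is deployed to recharge the sensors. However, designing an optimal charging path for the MC is difficult because of uncertainties arising in the network. The energy consumption rate of the sensors may fluctuate significantly due to unpredictable changes in the network topology, such as node failures. These changes also shift the importance of each sensor, which existing works usually assume to be the same for all sensors. In this paper, we address these challenges by proposing a novel adaptive charging scheme based on deep reinforcement learning (DRL). Specifically, we endow the MC with a charging policy that determines the next sensor to charge, conditioned on the current state of the network. We then parameterize this charging policy with a deep neural network, which is trained with reinforcement learning techniques. Our model can adapt to spontaneous changes in the network topology. Empirical results show that the proposed algorithm outperforms existing on-demand algorithms by a significant margin.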

Translated by Google Translate

Joint Cluster Head Selection and Trajectory Planning in UAV-Aided IoT Networks by Reinforcement Learning with Sequential Model

Botao Zhu , Ebrahim Bedeer , Ha H. Nguyen , Robert Barton , Jerome Henry

Category: Machine Learning

2021-12-01

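Employing unmanned aerial vehicles (UAVs) has attracted growing interest and has become the state-of-the-art technology for data collection in Internet of Things (IoT) networks. In this paper, with the objective of minimizing the total energy consumption of the UAV-IoT system, we formulate the problem of jointly designing the UAV's trajectory and selecting cluster heads in the IoT network as a constrained combinatorial optimization problem, which is classified as NP-hard and is challenging to solve. We propose a novel deep reinforcement learning (DRL) approach with a sequential model strategy that can effectively learn, in an unsupervised manner, a policy represented by a sequence-to-sequence neural network for the UAV's trajectory design. Extensive simulations show that, compared with other baseline algorithms, the proposed DRL method finds UAV trajectories that require less energy and achieves near-optimal performance. In addition, the simulation results show that the model trained by the proposed DRL algorithm generalizes well to larger problem sizes without the need to retrain the model.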

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

Tianxu Li , Kun Zhu , Nguyen Cong Luong , Dusit Niyato , Qihui Wu , Yang Zhang , Bing Chen

Category: Artificial Intelligence | Machine Learning

2021-10-26

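The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make local decisions to improve network performance in dynamic and uncertain network environments. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to let each network entity, acting as an agent, adaptively learn an optimal decision-making policy by interacting with the unknown environment. However, such algorithms fail to model the cooperation or competition among network entities and simply treat other entities as part of the environment, which may lead to the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of the network entities, and it has recently been used to solve various problems in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. We then review a number of applications of MARL to emerging problems in the future Internet. These problems include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.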

Optimization for Master-UAV-powered Auxiliary-Aerial-IRS-assisted IoT Networks: An Option-based Multi-agent Hierarchical Deep Reinforcement Learning Approach

Jingren Xu , Xin Kang , Ronghaixiang Zhang , Ying-Chang Liang , Sumei Sun

Category: Machine Learning

2021-12-20

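This paper investigates a master unmanned aerial vehicle (MUAV)-powered Internet of Things (IoT) network, in which we propose using a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV and to use the MUAV as a recharging power source. Under the proposed model, we study the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Depending on whether there is charging between the two UAVs, two optimization problems are formulated. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed: centralized-training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). The results show that CT-MADDPG can greatly reduce the computing capability required of the UAV hardware, and that the proposed MADDPOC supports low-level multi-agent cooperative learning in continuous action domains, a clear advantage over existing option-based hierarchical DRL, which supports only single-agent learning and discrete actions.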

Beyond 5G Networks: Integration of Communication, Computing, Caching, and Control

Musbahu Mohammed Adam , Liqiang Zhao , Kezhi Wang , Zhu Han

Category: Machine Learning

2022-12-26

In recent years, the exponential proliferation of smart devices with their intelligent applications poses severe challenges on conventional cellular networks. Such challenges can be potentially overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends of both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resources integration. Then, we discuss integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond 5G networks, such as 6G.

UAV-Assisted Space-Air-Ground Integrated Networks: A Technical Review of Recent Learning Algorithms

Atefeh H. Arani , Peng Hu , Yeying Zhu

Category: Machine Learning

2022-11-27

Recent technological advancements in space, air and ground components have made possible a new network paradigm called "space-air-ground integrated network" (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, due to UAVs' high dynamics and complexity, the real-world deployment of a SAGIN becomes a major barrier for realizing such SAGINs. Compared to the space and terrestrial components, UAVs are expected to meet performance requirements with high flexibility and dynamics using limited resources. Therefore, employing UAVs in various usage scenarios requires well-designed planning in algorithmic approaches. In this paper, we provide a comprehensive review of recent learning-based algorithmic approaches. We consider possible reward functions and discuss the state-of-the-art algorithms for optimizing the reward functions, including Q-learning, deep Q-learning, multi-armed bandit (MAB), particle swarm optimization (PSO) and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, which can be applicable to various UAV-assisted missions on a SAGIN using these algorithms. We simulate users and environments according to real-world scenarios and compare the learning-based and PSO-based methods in terms of throughput, load, fairness, computation time, etc. We also implement and evaluate the 2-dimensional (2D) and 3-dimensional (3D) variations of these algorithms to reflect different deployment cases. Our simulation suggests that the 3D satisfaction-based learning algorithm outperforms the other approaches for various metrics in most cases. We discuss some open challenges at the end and our findings aim to provide design guidelines for algorithm selections while optimizing the deployment of UAV-assisted SAGINs.

Online Service Migration in Edge Computing with Incomplete Information: A Deep Recurrent Actor-Critic Method

Jin Wang , Jia Hu , Geyong Min , Qiang Ni , Tarek El-Ghazawi

Category: Machine Learning

2020-12-16

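Multi-access edge computing (MEC) is an emerging computing paradigm that extends cloud computing to the network edge to support resource-intensive applications on mobile devices. As a key problem in MEC, service migration must decide how to migrate user services so as to maintain quality of service when users roam between MEC servers with limited coverage and capacity. However, finding an optimal migration policy is intractable because of the dynamic MEC environment and user mobility. Many existing studies make centralized migration decisions based on complete system-level information, which is time-consuming and lacks the desired scalability. To address these challenges, we propose a novel learning-driven method that is user-centric and can make effective online migration decisions using incomplete system-level information. Specifically, the service migration problem is modeled as a partially observable Markov decision process (POMDP). To solve the POMDP, we design a new encoder network that combines a long short-term memory (LSTM) network with an embedding matrix to extract hidden information effectively, and we further propose a tailored off-policy actor-critic algorithm for efficient training. Extensive experimental results based on real-world mobility traces show that the new method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms and achieves near-optimal results in various MEC scenarios.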

Intelligent Resource Allocation in Dense LoRa Networks using Deep Reinforcement Learning

Inaam Ilahi , Muhammad Usama , Muhammad Omer Farooq , Muhammad Umar Janjua , Junaid Qadir

Category: Artificial Intelligence

2020-12-22

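The expected increase in the number of IoT devices in the coming years motivates the development of efficient algorithms that help manage them effectively while keeping power consumption low. In this paper, we propose an intelligent multi-channel resource allocation algorithm for dense LoRa networks, termed LoRaDRL, and provide a detailed performance evaluation. Our results show that the proposed algorithm not only significantly improves the packet delivery ratio (PDR) of LoRaWAN but also supports mobile end devices (EDs) while ensuring lower power consumption, thereby increasing both the lifetime and the capacity of the network. Most previous works focus on proposing different MAC protocols to improve network capacity, e.g., LoRaWAN, delay before transmit, etc. We show that with LoRaDRL we can achieve the same efficiency with ALOHA, compared with LoRaSim and LoRa-MAB, while moving the complexity from the EDs to the gateway, which makes the EDs simpler and cheaper. Furthermore, we test the performance of LoRaDRL under large-scale frequency jamming attacks and show its adaptiveness to changes in the environment. We show that the output of LoRaDRL improves on the performance of state-of-the-art techniques, yielding an improvement of more than 500% in PDR compared with learning-based techniques.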

Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Yahao Ding , Zhaohui Yang , Quoc-Viet Pham , Zhaoyang Zhang , Mohammad Shikh-Bahaei

Category: Machine Learning | Artificial Intelligence

2023-01-03

Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surface (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL-enabled UAV swarms. In summary, this paper provides a comprehensive survey of various DL applications for UAV swarms in extensive scenarios.

Learning based Age of Information Minimization in UAV-relayed IoT Networks

Biplav Choudhury , Prasenjit Karmakar , Vijay K. Shah , Jeffrey H. Reed

Category: Machine Learning

2022-03-08

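Unmanned aerial vehicles (UAVs) are used as aerial base stations to relay time-sensitive packets from IoT devices to a nearby terrestrial base station (TBS). Scheduling packets in such UAV-relayed IoT networks so that the TBS holds fresh (up-to-date) packets from the IoT devices is a challenging problem, because it involves two simultaneous steps: (i) sampling the packets generated at the IoT devices by the UAVs [hop-1] and (ii) updating the sampled packets from the UAVs to the TBS [hop-2]. To address this problem, we propose Age of Information (AoI) scheduling algorithms for two-hop UAV-relayed IoT networks. First, we propose a low-complexity AoI scheduler, termed MAF-MAD, which uses the Maximum AoI First (MAF) policy for sampling IoT devices at the UAV (hop-1) and the Maximum AoI Difference (MAD) policy for updating sampled packets from the UAV to the TBS (hop-2). We prove that MAF-MAD is the optimal AoI scheduler under ideal conditions (lossless wireless channels and generate-at-will traffic at the IoT devices). For general conditions (lossy channels and varying periodic traffic generation at the IoT devices), in contrast, we propose a scheduler based on a deep reinforcement learning algorithm, namely proximal policy optimization (PPO). Simulation results show that the proposed PPO-based scheduler outperforms other schedulers such as MAF-MAD, MAF, and round-robin in all considered general scenarios.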
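
The two selection rules named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the list-based AoI bookkeeping, and in particular the reading of "AoI difference" as AoI at the TBS minus AoI at the UAV are assumptions made for illustration only.

```python
from typing import Sequence


def maf_select(aoi_at_uav: Sequence[float]) -> int:
    """Hop-1 (Maximum AoI First): pick the device whose AoI at the UAV is largest.

    Ties are broken in favor of the lowest index.
    """
    if not aoi_at_uav:
        raise ValueError("need at least one device")
    return max(range(len(aoi_at_uav)), key=lambda i: aoi_at_uav[i])


def mad_select(aoi_at_tbs: Sequence[float], aoi_at_uav: Sequence[float]) -> int:
    """Hop-2 (Maximum AoI Difference): pick the device with the largest gap.

    Assumption: the gap is aoi_at_tbs[i] - aoi_at_uav[i], i.e. how much
    fresher the UAV's copy is than the copy held by the TBS.
    """
    if not aoi_at_tbs or len(aoi_at_tbs) != len(aoi_at_uav):
        raise ValueError("AoI vectors must be non-empty and of equal length")
    return max(
        range(len(aoi_at_tbs)),
        key=lambda i: aoi_at_tbs[i] - aoi_at_uav[i],
    )
```

With three devices, `maf_select([3, 7, 5])` picks device 1, and `mad_select([10, 9, 12], [3, 7, 2])` picks device 2, whose gap of 10 slots is the largest.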

Energy-aware optimization of UAV base stations placement via decentralized multi-agent Q-learning

Babatunji Omoniwa , Boris Galkin , Ivana Dusparic

Category: Machine Learning

2021-06-01

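Unmanned aerial vehicles serving as aerial base stations (UAV-BSs) can be deployed to provide wireless connectivity to ground devices in the event of increased network demand, points of failure in existing infrastructure, or disasters. However, conserving the energy of the UAVs is challenging given their limited on-board battery capacity. Reinforcement learning (RL) approaches have previously been used to improve the energy utilization of multiple UAVs, but they assume a central cloud controller with complete knowledge of the locations of the end devices, i.e., the controller periodically scans and sends updates for UAV decision-making. This assumption is impractical in dynamic network environments where UAVs serve mobile ground devices. To address this problem, we propose a decentralized Q-learning approach in which each UAV-BS is equipped with an autonomous agent that maximizes connectivity to mobile ground devices while improving its energy utilization. Experimental results show that the proposed design significantly outperforms centralized approaches in jointly maximizing the number of connected ground devices and the energy utilization of the UAV-BSs.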
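
As a rough illustration of what one autonomous agent in such a decentralized setup might look like, here is a minimal tabular Q-learning sketch in Python. The class name, the hashable `state` keys, the integer actions, and the hyperparameter defaults are all hypothetical; the abstract does not specify the state, action, or reward design, so this shows only the standard Q-learning update, not the paper's method.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """One independent tabular Q-learning agent (hypothetical interface)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Q-table: state -> list of action values, initialized to zero.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, state, action, reward, next_state):
        """Standard update: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
        best_next = max(self.q[next_state])
        td_target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])
```

In a decentralized deployment, each UAV-BS would hold its own `QLearningAgent` instance and update it from locally observed rewards, with no shared Q-table or central controller.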

Energy-Efficient Wake-Up Signalling for Machine-Type Devices Based on Traffic-Aware Long-Short Term Memory Prediction

David E. Ruíz-Guirola , Carlos A. Rodríguez-López , Samuel Montejo-Sánchez , Richard Demo Souza , Onel L. A. López , Hirley Alves

Category: Machine Learning

2022-06-13

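Reducing energy consumption is a pressing issue in low-power machine-type communication (MTC) networks. In this regard, wake-up signal (WuS) technology, which aims to minimize the energy consumed by the radio interface of machine-type devices (MTDs), is a promising solution. However, state-of-the-art WuS mechanisms use static operational parameters, so they cannot adapt efficiently to the system dynamics. To overcome this, we design a simple but efficient neural network to predict MTC traffic patterns and configure the WuS accordingly. Our proposed forecasting WuS (FWuS) leverages accurate long short-term memory (LSTM)-based traffic prediction, which allows the sleep time of MTDs to be extended by avoiding frequent page monitoring occasions in the idle state. Simulation results show the effectiveness of our approach. The traffic prediction error is shown to be below 4%, with false alarm and missed detection probabilities below 8.8% and 1.3%, respectively. In terms of energy consumption reduction, FWuS can outperform the best benchmark mechanism by up to 32%. Finally, we demonstrate the ability of FWuS to adapt dynamically to changes in traffic density, promoting the scalability of low-power MTC.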

Attention-Based Model and Deep Reinforcement Learning for Distribution of Event Processing Tasks

A. Mazayev , F. Al-Tam , N. Correia

Category: Machine Learning

2021-12-07

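Event processing is the cornerstone of a dynamic and responsive Internet of Things (IoT). Recent approaches in this area are based on representational state transfer (REST) principles, which allow event processing tasks to be placed on any device that follows the same principles. However, the tasks should be properly distributed among edge devices to ensure fair resource utilization and guarantee seamless execution. This article investigates the use of deep learning to distribute the tasks fairly. An attention-based neural network model is proposed to generate efficient load-balancing solutions under different scenarios. The proposed model is based on the Transformer and Pointer Network architectures and is trained with an advantage actor-critic reinforcement learning algorithm. The model is designed to scale with the number of event processing tasks and the number of edge devices, with no need for hyperparameter re-tuning or even retraining. Extensive experimental results show that the proposed model outperforms conventional heuristics on many key performance indicators. The generic design and the obtained results suggest that the proposed model can potentially be applied to several other variations of the load-balancing problem, which makes it an attractive option for real-world scenarios thanks to its scalability and efficiency.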

Fairness Based Energy-Efficient 3D Path Planning of a Portable Access Point: A Deep Reinforcement Learning Approach

Nithin Babu , Igor Donevski , Alvaro Valcarce , Petar Popovski , Jimmy Jessen Nielsen , Constantinos B. Papadias

Category: Artificial Intelligence | Machine Learning | Robotics

2022-08-10

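In this work, we optimize the 3D trajectory of an unmanned aerial vehicle (UAV)-based portable access point (PAP) that provides wireless service to a set of ground nodes (GNs). Moreover, in line with the Peukert effect, we consider a practical non-linear discharge model for the UAV's battery. We thus formulate the problem in a novel way, as the maximization of a fairness-based energy efficiency metric that we name fair energy efficiency (FEE). The FEE metric defines a system that gives importance to both per-user service fairness and the energy efficiency of the PAP. The formulated problem is non-convex with intractable constraints. To obtain a solution, we represent the problem as a Markov decision process (MDP) with continuous state and action spaces. Considering the complexity of the solution space, we use the twin delayed deep deterministic policy gradient (TD3) actor-critic deep reinforcement learning (DRL) framework to learn a policy that maximizes the FEE of the system. We perform two types of RL training to demonstrate the effectiveness of our approach: the first (offline) approach keeps the positions of the GNs the same throughout the training phase; the second approach generalizes the learned policy to any arrangement of GNs by changing the positions of the GNs after each training episode. Numerical evaluations show that neglecting the Peukert effect overestimates the air time of the PAP, and that this can be addressed by optimally selecting the PAP's flying speed. Moreover, the user fairness, the energy efficiency, and hence the FEE value of the system can be improved by efficiently moving the PAP above the GNs. As a result, we observe FEE improvements over baseline scenarios of up to 88.31%, 272.34%, and 318.13% for suburban, urban, and dense urban environments, respectively.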
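
To make the Peukert effect mentioned above concrete, the following Python sketch evaluates the common textbook form of Peukert's law, t = H (C / (I H))^k, where C is the rated capacity in Ah at rated discharge time H in hours, I is the actual discharge current in A, and k is the Peukert constant. The function name and all numeric values are illustrative assumptions; the abstract does not give the battery model parameters used in the paper.

```python
def peukert_runtime_hours(capacity_ah, rated_hours, current_a, k):
    """Discharge time in hours under Peukert's law: t = H * (C / (I * H)) ** k.

    k = 1 reduces to the ideal linear model t = C / I.
    """
    if min(capacity_ah, rated_hours, current_a) <= 0 or k < 1:
        raise ValueError("inputs must be positive and k must be at least 1")
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k


# Illustrative numbers only: a 5 Ah battery rated at a 1-hour discharge,
# drawn at 10 A (twice the rated current).
ideal = peukert_runtime_hours(5.0, 1.0, 10.0, k=1.0)    # linear model
actual = peukert_runtime_hours(5.0, 1.0, 10.0, k=1.3)   # assumed Peukert constant
print(f"ideal: {ideal:.3f} h, with Peukert effect: {actual:.3f} h")
```

When the current draw exceeds the rated current, the linear model predicts a longer runtime than the Peukert model, which is consistent with the abstract's remark that neglecting the effect overestimates air time.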

Deep Reinforcement Learning for Trajectory Path Planning and Distributed Inference in Resource-Constrained UAV Swarms

Marwan Dhuheir , Emna Baccour , Aiman Erbad , Sinan Sabeeh Al-Obaidi , Mounir Hamdi

Category: Machine Learning | Robotics

2022-12-21

The deployment flexibility and maneuverability of Unmanned Aerial Vehicles (UAVs) have increased their adoption in various applications, such as wildfire tracking, border monitoring, etc. In many critical applications, UAVs capture images and other sensory data and then send the captured data to remote servers for inference and data processing tasks. However, this approach is not always practical in real-time applications due to connection instability, limited bandwidth, and end-to-end latency. One promising solution is to divide the inference requests into multiple parts (layers or segments), with each part being executed in a different UAV based on the available resources. Furthermore, some applications require the UAVs to traverse certain areas and capture incidents; thus, planning their paths becomes critical, particularly to reduce the latency of the collaborative inference process. Specifically, planning the UAVs' trajectories can reduce the data transmission latency by communicating with devices in the same proximity while mitigating the transmission interference. This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm while respecting the resource constraints due to the computational load and memory usage of the inference requests. The model is formulated as an optimization problem and aims to minimize latency. The formulated problem is NP-hard, so finding the optimal solution is quite complex; thus, this paper introduces a real-time and dynamic solution for online applications using deep reinforcement learning. We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.

Deep Reinforcement Learning for Cyber Security

Thanh Thi Nguyen , Vijay Janapa Reddi

Category: Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2019-06-13

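The scale of Internet-connected systems has increased considerably, and these systems are more exposed to cyber attacks than ever before. The complexity and dynamics of cyber attacks require protection mechanisms to be responsive, adaptive, and scalable. Machine learning methods, and more specifically deep reinforcement learning (DRL) methods, have been widely proposed to address these issues. By incorporating deep learning into traditional RL, DRL can solve complex, dynamic, and especially high-dimensional cyber defense problems. This paper presents a survey of DRL approaches developed for cyber security. We touch on several important aspects, including DRL-based security methods for cyber-physical systems, autonomous intrusion detection techniques, and multi-agent DRL-based game theory simulations of defense strategies against cyber attacks. Extensive discussion and future research directions for DRL-based cyber security are also given. We expect this comprehensive review to provide a foundation for, and to facilitate, future studies that tackle increasingly complex cyber security problems.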

Multi-Agent Collaborative Inference via DNN Decoupling: Intermediate Feature Compression and Edge Learning

Zhiwei Hao , Guanyu Xu , Yong Luo , Han Hu , Jianping An , Shiwen Mao

Category: Machine Learning

2022-05-24

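Recently, deploying deep neural network (DNN) models via collaborative inference, which splits a pre-trained model into two parts and executes them on the user equipment (UE) and the edge server respectively, has become attractive. However, the large intermediate features of DNNs impede flexible decoupling, and existing approaches either focus on the single-UE scenario or simply define tasks in terms of the required CPU cycles while ignoring the indivisibility of a single DNN layer. In this paper, we study the multi-agent collaborative inference scenario, in which a single edge server coordinates the inference of multiple UEs. Our goal is to achieve fast and energy-efficient inference for all UEs. To this end, we first design a lightweight autoencoder-based method to compress the large intermediate features. We then define tasks according to the inference overhead of the DNNs and formulate the problem as a Markov decision process (MDP). Finally, we propose a multi-agent hybrid proximal policy optimization (MAHPPO) algorithm to solve the optimization problem with a hybrid action space. We conduct extensive experiments with different types of networks, and the results show that our method can reduce inference latency by up to 56% and save up to 72% of energy consumption.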

Multi-hop RIS-Empowered Terahertz Communications: A DRL-based Hybrid Beamforming Design

Chongwen Huang , Zhaohui Yang , George C. Alexandropoulos , Kai Xiong , Li Wei , Chau Yuen , Zhaoyang Zhang , Merouane Debbah

Category: Machine Learning

2021-01-22

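Wireless communication in the terahertz band (0.1-10 THz) is envisioned as one of the key enabling technologies for future sixth-generation (6G) wireless communication systems, scaling beyond massive multiple-input multiple-output (massive MIMO) technology. However, the very high propagation attenuation and molecular absorption at THz frequencies often limit the signal transmission distance and coverage range. Building on the recent breakthrough in reconfigurable intelligent surfaces (RISs) for realizing smart radio propagation environments, we propose a novel hybrid beamforming scheme for multi-hop RIS-assisted communication networks to improve the coverage range at THz-band frequencies. In particular, multiple passive and controllable RISs are deployed to assist the transmissions between the base station (BS) and multiple single-antenna users. We investigate the joint design of the digital beamforming matrix at the BS and the analog beamforming matrices at the RISs, leveraging recent advances in deep reinforcement learning (DRL) to combat the propagation loss. To improve the convergence of the proposed DRL-based algorithm, two algorithms are then designed to initialize the digital beamforming and the analog beamforming matrices using the alternating optimization technique. Simulation results show that the proposed scheme improves the coverage range of THz communications by 50% compared with the benchmarks. Furthermore, it is shown that the proposed DRL-based method is a state-of-the-art approach to the NP-hard beamforming problem, especially when the signals in RIS-assisted THz communication networks experience multiple hops.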

DRL-M4MR: An Intelligent Multicast Routing Approach Based on DQN Deep Reinforcement Learning in SDN

Chenwei Zhao , Miao Ye , Xingsi Xue , Jianhui Lv , Qiuxiang Jiang , Yong Wang

Category: Artificial Intelligence

2022-07-31

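Traditional multicast routing methods have several problems when constructing a multicast tree, such as limited access to network state information, poor adaptability to dynamic and complex changes in the network, and inflexible data forwarding. To address these defects, the optimal multicast routing problem in software-defined networking (SDN) is formulated as a multi-objective optimization problem, and DRL-M4MR, an intelligent multicast routing algorithm based on the deep Q-network (DQN) deep reinforcement learning (DRL) method, is designed to construct a multicast tree in SDN. First, by combining the global view and control of SDN, the multicast tree state matrix, link bandwidth matrix, link delay matrix, and link packet loss rate matrix are designed as the state space of the DRL agent. Second, the action space of the agent consists of all the links in the network, and the action selection strategy is designed to add links to the current multicast tree under four cases. Third, single-step and final reward functions are designed to guide the agent toward decisions that construct the optimal multicast tree. Experimental results show that, compared with existing algorithms, the multicast tree constructed by DRL-M4MR achieves better bandwidth, delay, and packet loss rate after training, and that it can make more intelligent multicast routing decisions in a dynamic network environment.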

Asynchronous Hybrid Reinforcement Learning for Latency and Reliability Optimization in the Metaverse over Wireless Communications

Wenhan Yu , Terence Jie Chua , Jun Zhao

Category: Machine Learning

2022-12-30

Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.
