Smart Paper Notes

OnSlicing: Online End-to-End Network Slicing with Reinforcement Learning

Qiang Liu , Nakjung Choi , Tao Han

Category: Machine Learning

2021-11-02

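Network slicing allows mobile network operators to virtualize infrastructures and provide customized slices that support various use cases with heterogeneous requirements. Online deep reinforcement learning (DRL) has shown promising potential for solving network problems and eliminating the simulation-to-reality gap. Optimizing cross-domain resources with online DRL is challenging, however, because the random exploration of DRL violates the service level agreements (SLAs) of slices and the resource constraints of infrastructures. In this paper, we propose OnSlicing, an online end-to-end network slicing system, to achieve minimal resource usage while satisfying the slices' SLAs. OnSlicing allows individualized learning for each slice and maintains its SLA by using a novel constraint-aware policy update method and a proactive baseline switching mechanism. OnSlicing complies with the resource constraints of infrastructures through action modification in slices and parameter coordination in infrastructures. OnSlicing further mitigates the poor performance of online learning during the early learning stage by imitating a rule-based solution. In addition, we design four new domain managers that enable dynamic resource configuration in the radio access, transport, core, and edge networks, respectively, at a sub-second timescale. We implement OnSlicing on an end-to-end slicing testbed based on OpenAirInterface with both 4G LTE and 5G NR, the OpenDaylight SDN platform, and the OpenAir-CN core network. The experimental results show that OnSlicing achieves a 61.3% usage reduction compared with the rule-based solution and maintains nearly zero violation (0.06%) throughout the online learning phase. Once online learning has converged, OnSlicing reduces usage by 12.5% without any violations compared with the state-of-the-art online DRL solution.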


CLARA: A Constrained Reinforcement Learning Based Resource Allocation Framework for Network Slicing

Yongshuai Liu , Jiaxin Ding , Zhi-Li Zhang , Xin Liu

Category: Machine Learning

2021-11-16

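As mobile networks proliferate, we are experiencing a strong diversification of services, which requires greater flexibility from the existing network. Network slicing has been proposed as a resource utilization solution for 5G and future networks to address this pressing need. In network slicing, dynamic resource orchestration and network slice management are crucial for maximizing resource utilization. Unfortunately, this process is too complex for traditional approaches because of the lack of accurate models and the presence of dynamic hidden structures. We formulate the problem as a constrained Markov decision process (CMDP) without knowledge of the models or hidden structures. Additionally, we propose to solve the problem using CLARA, a constrained reinforcement learning based resource allocation algorithm. In particular, we handle cumulative and instantaneous constraints using adaptive interior-point policy optimization and a projection layer, respectively. Evaluations show that CLARA clearly outperforms baselines in resource allocation with service demand guarantees.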
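The CLARA abstract mentions a projection layer for instantaneous constraints but gives no details. As a rough illustration only, the sketch below assumes the instantaneous constraint is a non-negative allocation whose total must not exceed a capacity, and applies a Euclidean projection onto that set. The function name `project_to_budget`, the constraint set, and the sorting-based simplex projection are assumptions made for this example, not the paper's actual layer.

```python
from typing import List, Sequence


def project_to_budget(allocation: Sequence[float], capacity: float) -> List[float]:
    """Euclidean projection onto {x : x >= 0, sum(x) <= capacity}.

    Hypothetical stand-in for a projection layer: it maps a raw policy
    output to the nearest allocation that respects the capacity.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")

    # Case 1: clipping negatives already yields a feasible allocation.
    clipped = [max(0.0, v) for v in allocation]
    if sum(clipped) <= capacity:
        return clipped

    # Case 2: the budget is active, so project onto the simplex
    # {x >= 0, sum(x) = capacity} with the classic sort-based method.
    ordered = sorted(allocation, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - capacity) / k
        if value - candidate > 0:
            theta = candidate  # keep the threshold of the largest valid k
    return [max(0.0, v - theta) for v in allocation]
```

For instance, a raw request of `[3.0, 1.0]` against a capacity of `2.0` is mapped to `[2.0, 0.0]`, while a request that is already feasible is returned unchanged apart from clipping negative entries.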


Evolutionary Deep Reinforcement Learning for Dynamic Slice Management in O-RAN

Fatemeh Lotfi , Omid Semiari , Fatemeh Afghah

Categories: Artificial Intelligence | Machine Learning | Neural and Evolutionary Computing

2022-08-30

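Next-generation wireless networks are required to satisfy a variety of services and criteria concurrently. To address the upcoming strict criteria, a new open radio access network (O-RAN) has been developed, with distinguishing features such as flexible design, disaggregated virtual and programmable components, and intelligent closed-loop control. O-RAN slicing is being investigated as a critical strategy for ensuring network quality of service (QoS) in the face of changing circumstances. However, distinct network slices must be dynamically controlled to avoid service level agreement (SLA) variation caused by rapid changes in the environment. Therefore, this paper introduces a novel framework able to manage the network slices intelligently through provisioned resources. Because of diverse heterogeneous environments, intelligent machine learning approaches require sufficient exploration to handle the harshest situations in a wireless network and to accelerate convergence. To solve this problem, a new solution based on evolutionary deep reinforcement learning (EDRL) is proposed to accelerate and optimize the slice management learning process in the radio access network (RAN) intelligent controller (RIC) modules. To this end, O-RAN slicing is represented as a Markov decision process (MDP), which is then solved optimally for resource allocation to meet service demand using the EDRL approach. In terms of reaching service demands, simulation results show that the proposed approach outperforms the DRL baseline by 62.2%.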


Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

Tianxu Li , Kun Zhu , Nguyen Cong Luong , Dusit Niyato , Qihui Wu , Yang Zhang , Bing Chen

Categories: Artificial Intelligence | Machine Learning

2021-10-26

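The future Internet involves several emerging technologies, such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make its own local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to enable each network entity, acting as an agent, to learn an optimal decision-making policy adaptively by interacting with the unknown environment. However, such algorithms fail to model the cooperation or competition among network entities and simply treat other entities as part of the environment, which may result in the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of the network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. Then, we review a number of applications of MARL to emerging issues in the future Internet. These issues include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.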


Holistic Network Virtualization and Pervasive Network Intelligence for 6G

Xuemin Shen , Jie Gao , Wen Wu , Mushu Li , Conghao Zhou , Weihua Zhuang

Category: Artificial Intelligence

2023-01-02

In this tutorial paper, we look into the evolution and prospect of network architecture and propose a novel conceptual architecture for the 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). The holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. The pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence for 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussions and developments on the potential architecture of 6G.


Multi-Objective Provisioning of Network Slices using Deep Reinforcement Learning

Chien-Cheng Wu , Vasilis Friderikos , Cedomir Stefanovic

Category: Machine Learning

2022-07-27

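Network slicing (NS) is crucial for efficiently enabling divergent network applications in next-generation networks. Nonetheless, the complex quality of service (QoS) requirements and the diverse heterogeneity of network services entail high computational time for network slice provisioning (NSP) optimization. Legacy optimization methods struggle to meet the low-latency and high-reliability requirements of network applications. To this end, we model real-time NSP as an online network slice provisioning (ONSP) problem. Specifically, we formulate the ONSP problem as an online multi-objective integer programming optimization (MOIPO) problem. We then approximate the solution of the MOIPO problem by applying the proximal policy optimization (PPO) method to traffic demand prediction. Our simulation results show the effectiveness of the proposed method, which achieves a lower SLA violation rate and lower network operation cost than state-of-the-art MOIPO solvers.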


Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning

Pihe Hu , Ling Pan , Yu Chen , Zhixuan Fang , Longbo Huang

Category: Machine Learning

2022-08-30

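Multi-user delay-constrained scheduling is important in many real-world applications, including wireless communication, live streaming, and cloud computing. Yet it poses a critical challenge, since the scheduler needs to make real-time decisions that guarantee the delay and resource constraints without prior information about the system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability, e.g., due to sensing noise or hidden correlation. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($\mathtt{RSD4}$), which is a data-driven method based on a partially observed Markov decision process (POMDP) formulation. $\mathtt{RSD4}$ guarantees resource and delay constraints through the Lagrangian dual and delay-sensitive queues, respectively. It also tackles partial observability efficiently with a memory mechanism enabled by a recurrent neural network (RNN), and introduces user-level decomposition and node-level merging to ensure scalability. Extensive experiments on simulated and real-world datasets demonstrate that $\mathtt{RSD4}$ is robust to system dynamics and partially observable environments, and achieves superior performance over existing DRL and non-DRL-based methods.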


ColO-RAN: Developing Machine Learning-based xApps for Open RAN Closed-loop Control on Programmable Experimental Platforms

Michele Polese , Leonardo Bonati , Salvatore D'Oro , Stefano Basagni , Tommaso Melodia

Category: Machine Learning

2021-12-17

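In spite of the new opportunities brought about by the Open RAN, advances in ML-based network automation have been slow, mainly because of the unavailability of large-scale datasets and experimental testing infrastructure. This slows down the development and widespread adoption of deep reinforcement learning (DRL) agents on real networks, delaying progress in intelligent and autonomous RAN control. In this paper, we address these challenges by proposing practical solutions and software pipelines for the design, training, testing, and experimental evaluation of DRL-based closed-loop control in the Open RAN. We introduce ColO-RAN, the first publicly available large-scale O-RAN testing framework with software-defined radios in the loop. Building on the scale and computational capabilities of the Colosseum wireless network emulator, ColO-RAN enables ML research at scale using O-RAN components, programmable base stations, and a "wireless data factory". Specifically, we design and develop three exemplary xApps for DRL-based control of RAN slicing, scheduling, and online model training, and evaluate their performance on a cellular network with 7 softwarized base stations and 42 users. Finally, we showcase the portability of ColO-RAN to different platforms by deploying it on Arena, an indoor programmable testbed. Extensive results from our first-of-its-kind large-scale evaluation highlight the benefits and challenges of DRL-based adaptive control. They also provide insights into the development of wireless DRL pipelines, from data analysis to the design of DRL agents, and into the trade-offs associated with training on a live RAN. ColO-RAN and the collected large-scale dataset will be made publicly available to the research community.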


The Cost of Learning: Efficiency vs. Efficacy of Learning-Based RRM for 6G

Seyyidahmed Lahmer , Federico Chiariotti , Andrea Zanella

Category: Artificial Intelligence

2022-11-30

In the past few years, Deep Reinforcement Learning (DRL) has become a valuable solution to automatically learn efficient resource management strategies in complex networks. In many scenarios, the learning task is performed in the Cloud, while experience samples are generated directly by edge nodes or users. Therefore, the learning task involves some data exchange which, in turn, subtracts a certain amount of transmission resources from the system. This creates a friction between the need to speed up convergence towards an effective strategy, which requires the allocation of resources to transmit learning samples, and the need to maximize the amount of resources used for data plane communication, maximizing users' Quality of Service (QoS), which requires the learning process to be efficient, i.e., minimize its overhead. In this paper, we investigate this trade-off and propose a dynamic balancing strategy between the learning and data planes, which allows the centralized learning agent to quickly converge to an efficient resource allocation strategy while minimizing the impact on QoS. Simulation results show that the proposed method outperforms static allocation methods, converging to the optimal policy (i.e., maximum efficacy and minimum overhead of the learning plane) in the long run.


Online Service Migration in Edge Computing with Incomplete Information: A Deep Recurrent Actor-Critic Method

Jin Wang , Jia Hu , Geyong Min , Qiang Ni , Tarek El-Ghazawi

Category: Machine Learning

2020-12-16

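Multi-access edge computing (MEC) is an emerging computing paradigm that extends cloud computing to the network edge to support resource-intensive applications on mobile devices. As a crucial problem in MEC, service migration must decide how to migrate user services so as to maintain quality of service when users roam between MEC servers with limited coverage and capacity. However, finding an optimal migration policy is intractable because of the dynamic MEC environment and user mobility. Many existing studies make centralized migration decisions based on complete system-level information, which is time-consuming and lacks desirable scalability. To address these challenges, we propose a novel learning-driven method that is user-centric and can make effective online migration decisions using incomplete system-level information. Specifically, the service migration problem is modeled as a partially observable Markov decision process (POMDP). To solve the POMDP, we design a new encoder network that combines a long short-term memory (LSTM) network and an embedding matrix to extract hidden information effectively, and we further propose a tailored off-policy actor-critic algorithm for efficient training. Extensive experimental results based on real-world mobility traces demonstrate that this new method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms and can achieve near-optimal results in various MEC scenarios.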


Beyond 5G Networks: Integration of Communication, Computing, Caching, and Control

Musbahu Mohammed Adam , Liqiang Zhao , Kezhi Wang , Zhu Han

Category: Machine Learning

2022-12-26

In recent years, the exponential proliferation of smart devices with their intelligent applications poses severe challenges on conventional cellular networks. Such challenges can be potentially overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends of both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resources integration. Then, we discuss integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond 5G networks, such as 6G.


Device Selection for the Coexistence of URLLC and Distributed Learning Services

Milad Ganjalizadeh , Hossein Shokri Ghadikolaei , Deniz Gündüz , Marina Petrova

Category: Machine Learning

2022-12-22

Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, from fault-tolerant factory automation to smart cities. When distributed learning is run over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network impact the performance of both distributed learning and the coexisting service. In this paper, we investigate a mixed service scenario where distributed AI workflow and ultra-reliable low latency communication (URLLC) services run concurrently over a network. Consequently, we propose a risk sensitivity-based formulation for device selection to minimize the AI training delays during its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and address it via a framework based on soft actor-critic algorithm. We evaluate our solution with a realistic and 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to the scenario where URLLC solely consumes all network resources.


Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless Networks

Peyman Tehrani , Francesco Restuccia , Marco Levorato

Category: Machine Learning

2021-12-07

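Next-generation (NextG) networks are expected to support demanding tactile Internet applications such as augmented reality and connected autonomous vehicles. Although recent innovations bring the promise of larger link capacity, their sensitivity to the environment and their erratic performance defy traditional model-based control rationales. Zero-touch, data-driven approaches can improve the ability of the network to adapt to the current operating conditions. Tools such as reinforcement learning (RL) algorithms can build optimal control policies based solely on a history of observations. Specifically, deep RL (DRL), which uses a deep neural network (DNN) as a predictor, has been shown to achieve good performance even in complex environments and with high-dimensional inputs. However, training DRL models requires a large amount of data, which may limit their adaptability to the ever-evolving statistics of the underlying environment. Moreover, wireless networks are inherently distributed systems, where centralized DRL approaches would require excessive data exchange, while fully distributed approaches may result in slower convergence rates and performance degradation. In this paper, to address these challenges, we propose a federated learning (FL) approach to DRL, which we refer to as federated DRL (F-DRL), in which base stations (BSs) collaboratively train the embedded DNN by sharing only the models' weights rather than training data. We evaluate two distinct versions of F-DRL, value-based and policy-based, and show the superior performance they achieve compared with distributed and centralized DRL.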
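The F-DRL abstract states that base stations share only model weights, not training data, but does not say how the weights are combined. A common choice in federated learning is sample-weighted averaging (FedAvg), and the sketch below assumes that rule. The function name `federated_average`, the dictionary-of-lists weight format, and the per-station sample counts are illustrative assumptions, not details from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical weight format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(local_weights: Sequence[Weights],
                      num_samples: Sequence[int]) -> Weights:
    """Combine per-base-station weights by sample-weighted averaging.

    Assumes every base station trains the same architecture, so all
    dictionaries share the same keys and list lengths.
    """
    if not local_weights or len(local_weights) != len(num_samples):
        raise ValueError("need one sample count per set of weights")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in local_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(local_weights, num_samples)) / total
            for i in range(len(reference))
        ]
    return averaged


# Two base stations; the second has seen three times as much experience.
bs_a = {"layer": [1.0, 2.0]}
bs_b = {"layer": [3.0, 6.0]}
global_weights = federated_average([bs_a, bs_b], [1, 3])
```

Only the weight dictionaries cross the network in this scheme; the raw experience collected at each base station never leaves it, which is the property the abstract emphasizes.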


Programmable and Customized Intelligence for Traffic Steering in 5G Networks Using Open RAN Architectures

Andrea Lacava , Michele Polese , Rajarajan Sivaraj , Rahul Soundrarajan , Bhawani Shanker Bhati , Tarunjeet Singh , Tommaso Zugno , Francesca Cuomo , Tommaso Melodia

Category: Artificial Intelligence

2022-09-28

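5G and beyond mobile networks will support heterogeneous use cases at an unprecedented scale, thus demanding automated control and optimization of network functionalities customized to the needs of individual users. Such fine-grained control of the radio access network (RAN) is not possible with the current cellular architecture. To fill this gap, the Open RAN paradigm and its specifications introduce an open architecture with abstractions that enable closed-loop control and provide data-driven, intelligent optimization of the RAN at the user level. This is obtained through custom RAN control applications (i.e., xApps) deployed on the near-real-time RAN intelligent controller (near-RT RIC) at the edge of the network. Despite these premises, as of today the research community lacks a sandbox for building data-driven xApps and creating large-scale datasets for effective AI training. In this paper, we address this by introducing ns-O-RAN, a software framework that integrates a real-world, production-grade near-RT RIC with a 3GPP-based simulated environment on ns-3, enabling the development of xApps, automated large-scale data collection, and the testing of deep reinforcement learning-driven control policies for optimization at the user level. In addition, we propose the first user-specific O-RAN traffic steering (TS) intelligent handover framework. It uses Random Ensemble Mixture, combined with a state-of-the-art convolutional neural network architecture, to optimally assign a serving base station to each user in the network. Our TS xApp, trained with more than 40 million data points collected by ns-O-RAN, runs on the near-RT RIC and controls its base stations. We evaluate its performance in a large-scale deployment, showing that xApp-based handover improves throughput and spectral efficiency by an average of 50% over traditional handover heuristics, with less mobility overhead.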


Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Yahao Ding , Zhaohui Yang , Quoc-Viet Pham , Zhaoyang Zhang , Mohammad Shikh-Bahaei

Categories: Machine Learning | Artificial Intelligence

2023-01-03

Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of various DL applications for UAV swarms in extensive scenarios.


Deep Learning-Driven Edge Video Analytics: A Survey

Renjie Xu , Saiedeh Razavi , Rong Zheng

Categories: Computer Vision | Machine Learning

2022-11-28

Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.


Job Scheduling in Datacenters using Constraint Controlled RL

Vanamala Venkataswamy

Category: Machine Learning

2022-11-10

This paper studies a model for online job scheduling in green datacenters. In green datacenters, resource availability depends on the power supply from the renewables. Intermittent power supply from renewables leads to intermittent resource availability, inducing job delays (and associated costs). Green datacenter operators must intelligently manage their workloads and available power supply to extract maximum benefits. The scheduler's objective is to schedule jobs on a set of resources to maximize the total value (revenue) while minimizing the overall job delay. A trade-off exists between achieving high job value on the one hand and low expected delays on the other. Hence, the aims of achieving high rewards and low costs are in opposition. In addition, datacenter operators often prioritize multiple objectives, including high system utilization and job completion. To accomplish the opposing goals of maximizing total job value and minimizing job delays, we apply the Proportional-Integral-Derivative (PID) Lagrangian methods in Deep Reinforcement Learning to job scheduling problem in the green datacenter environment. Lagrangian methods are widely used algorithms for constrained optimization problems. We adopt a controls perspective to learn the Lagrange multiplier with proportional, integral, and derivative control, achieving favorable learning dynamics. Feedback control defines cost terms for the learning agent, monitors the cost limits during training, and continuously adjusts the learning parameters to achieve stable performance. Our experiments demonstrate improved performance compared to scheduling policies without the PID Lagrangian methods. Experimental results illustrate the effectiveness of the Constraint Controlled Reinforcement Learning (CoCoRL) scheduler that simultaneously satisfies multiple objectives.
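The abstract describes learning the Lagrange multiplier with proportional, integral, and derivative control. A minimal sketch of one such update rule follows, assuming the commonly used form in which the integral term and the derivative term are clamped at zero and the multiplier is kept non-negative. The class name `PIDLagrangian`, the gain values, and the helper `penalized_reward` are hypothetical and are not taken from the CoCoRL scheduler.

```python
from dataclasses import dataclass


@dataclass
class PIDLagrangian:
    """PID controller that outputs a non-negative Lagrange multiplier."""
    kp: float           # proportional gain (assumed value chosen by the user)
    ki: float           # integral gain
    kd: float           # derivative gain
    cost_limit: float   # constraint threshold on the episodic cost
    integral: float = 0.0
    prev_cost: float = 0.0

    def update(self, cost: float) -> float:
        """Return the multiplier for the latest measured cost."""
        error = cost - self.cost_limit
        # Integral term accumulates violations but never goes negative.
        self.integral = max(0.0, self.integral + error)
        # Derivative term reacts only to increasing cost.
        derivative = max(0.0, cost - self.prev_cost)
        self.prev_cost = cost
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, raw)


def penalized_reward(reward: float, cost: float, lam: float) -> float:
    """Lagrangian objective for one step: reward minus weighted cost."""
    return reward - lam * cost


controller = PIDLagrangian(kp=1.0, ki=0.1, kd=0.0, cost_limit=1.0)
lam_first = controller.update(2.0)   # cost above the limit
lam_second = controller.update(2.0)  # violation persists
lam_third = controller.update(0.0)   # cost drops below the limit
```

With a persistent violation the integral term keeps raising the multiplier, and once the cost falls under the limit the proportional term pulls it back to zero, which is the kind of feedback behavior the abstract attributes to the PID Lagrangian approach.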


AI-Native Network Slicing for 6G Networks

Wen Wu , Conghao Zhou , Mushu Li , Huaqing Wu , Haibo Zhou , Ning Zhang , Xuemin Shen , Weihua Zhuang

Category: Machine Learning

2021-05-18

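With the global roll-out of fifth-generation (5G) networks, it is necessary to look beyond 5G and envision 6G networks. 6G networks are expected to feature space-air-ground integrated networking, advanced network virtualization, and ubiquitous intelligence. This article presents an artificial intelligence (AI)-native network slicing architecture for 6G networks that enables the synergy of AI and network slicing, thereby facilitating intelligent network management and supporting emerging AI services. First, AI-based solutions are discussed across the network slicing lifecycle to manage network slices intelligently, i.e., AI for slicing. Then, network slicing solutions are investigated to support emerging AI services by constructing AI instances and performing efficient resource management, i.e., slicing for AI. Finally, a case study is presented, followed by a discussion of open research issues that are essential for AI-native network slicing in 6G networks.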


Asynchronous Hybrid Reinforcement Learning for Latency and Reliability Optimization in the Metaverse over Wireless Communications

Wenhan Yu , Terence Jie Chua , Jun Zhao

Category: Machine Learning

2022-12-30

Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.


RLOps: Development Life-cycle of Reinforcement Learning Aided Open RAN

Peizheng Li , Jonathan Thomas , Xiaoyang Wang , Ahmed Khalil , Abdelrahim Ahmad , Rui Inacio , Shipra Kapoor , Arjun Parekh , Angela Doufexi , Arman Shojaeifard

Category: Machine Learning

2021-11-12

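Radio access network (RAN) technologies continue to witness massive growth, with Open RAN gaining the most recent momentum. In the O-RAN specifications, the RAN intelligent controller (RIC) serves as an automation host. This article introduces principles for machine learning (ML), in particular reinforcement learning (RL), relevant to the O-RAN stack. Furthermore, we review state-of-the-art research in wireless networks and map it onto the RAN framework and the hierarchy of the O-RAN architecture. We provide a taxonomy of the challenges faced by ML/RL models throughout the development life-cycle, from system specification to production deployment (data acquisition, model design, testing and management, etc.). To address these challenges, we integrate a set of existing MLOps principles with the unique characteristics that arise when RL agents are considered. This paper discusses a systematic life-cycle model development, testing, and validation pipeline, termed RLOps. We discuss all the fundamental parts of RLOps, which include model specification, development and distillation, production environment serving, operations monitoring, safety/security, and the data engineering platform. Based on these principles, we propose best practices for achieving an automated and reproducible model development process.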
