Fog radio access network (F-RAN) is a promising technology in which user mobile devices (MDs) can offload computation tasks to nearby fog access points (F-APs). Since the resources of F-APs are limited, it is important to design an efficient task offloading scheme. In this paper, by considering the time-varying network environment, a dynamic computation offloading and resource allocation problem in F-RANs is formulated to minimize the task execution delay and energy consumption of MDs. To solve the problem, a federation-based deep reinforcement learning (DRL) algorithm is proposed, in which a deep deterministic policy gradient (DDPG) agent performs computation offloading and resource allocation at each F-AP. Federated learning is exploited to train the DDPG agents, so as to reduce the computational complexity of the training process and protect user privacy. Simulation results show that, compared with other existing strategies, the proposed federated DDPG algorithm can achieve lower task execution delay and energy consumption for MDs with faster convergence.
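As an illustration of the per-F-AP learner described above, the following is a minimal sketch of one DDPG update step in PyTorch. The state/action dimensions, network sizes, and hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of one DDPG update step, as might run at each F-AP.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 3   # e.g., queue/channel state -> (offload ratio, CPU share, bandwidth share)
GAMMA, TAU = 0.99, 0.005       # assumed discount factor and soft-update rate

def mlp(in_dim, out_dim, out_act=None):
    layers = [nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim)]
    if out_act:
        layers.append(out_act)
    return nn.Sequential(*layers)

actor = mlp(STATE_DIM, ACTION_DIM, nn.Sigmoid())           # actions normalized to [0, 1]
critic = mlp(STATE_DIM + ACTION_DIM, 1)
actor_t = mlp(STATE_DIM, ACTION_DIM, nn.Sigmoid()); actor_t.load_state_dict(actor.state_dict())
critic_t = mlp(STATE_DIM + ACTION_DIM, 1); critic_t.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2):
    """One update from a replay batch: s (B,8), a (B,3), r (B,1), s2 (B,8)."""
    # Critic: minimize TD error against the target networks.
    with torch.no_grad():
        q_target = r + GAMMA * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the critic's Q-value (deterministic policy gradient).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Soft-update the target networks.
    for net, tgt in ((actor, actor_t), (critic, critic_t)):
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(1 - TAU).add_(TAU * p.data)
```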
Next-generation (NextG) networks are expected to support demanding tactile internet applications such as augmented reality and connected autonomous vehicles. While recent innovations bring the promise of larger link capacity, their sensitivity to the environment and erratic performance defy traditional model-based control rationales. Zero-touch data-driven approaches can improve the ability of the network to adapt to the current operating conditions. Tools such as reinforcement learning (RL) algorithms can build optimal control policies solely based on a history of observations. Specifically, deep RL (DRL), which uses a deep neural network (DNN) as a predictor, has been shown to achieve good performance even in complex environments and with high-dimensional inputs. However, training DRL models requires a large amount of data, which may limit their adaptability to the evolving statistics of the underlying environment. Moreover, wireless networks are inherently distributed systems, where centralized DRL approaches would require excessive data exchange, while fully distributed approaches may suffer from slower convergence rates and performance degradation. In this paper, to address these challenges, we propose a federated learning (FL) approach to DRL, which we refer to as federated DRL (F-DRL), where base stations (BSs) collaboratively train the embedded DNN by only sharing the model weights rather than the training data. We evaluate two distinct versions of F-DRL, value-based and policy-based, and show the superior performance they achieve compared to both distributed and centralized DRL.
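The weight-sharing round at the heart of F-DRL can be sketched as a standard federated-averaging step, shown below under the assumption that all BSs train identically structured DNNs; only state_dicts are exchanged, never training data. Function names are illustrative, not from the paper.

```python
# A minimal federated-averaging round over the BSs' local networks.
import copy
import torch

def federated_average(local_models):
    """Average the parameters of the BSs' local networks (FedAvg-style)."""
    global_state = copy.deepcopy(local_models[0].state_dict())
    for key in global_state:
        global_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in local_models]).mean(dim=0)
    return global_state

def fdrl_round(local_models, local_train_fn):
    for model in local_models:        # local DRL training on private data
        local_train_fn(model)
    global_state = federated_average(local_models)
    for model in local_models:        # broadcast only the aggregated weights
        model.load_state_dict(global_state)
```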
The future Internet involves several emerging technologies such as 5G and beyond 5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of involved network entities. Each entity may need to make its local decisions to improve the network performance under dynamic and uncertain network environments. Standard learning algorithms, such as single-agent reinforcement learning (RL) or deep RL (DRL), have recently been used to enable each network entity, as an agent, to adaptively learn an optimal decision-making policy through interaction with the unknown environment. However, such algorithms fail to model the cooperation or competition among network entities, and simply treat other entities as a part of the environment, which may lead to the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment, but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. Specifically, we provide a tutorial of MARL, as well as a comprehensive survey of MARL applications in the next-generation Internet. We first introduce single-agent RL and MARL, and then review numerous applications of MARL to solve emerging issues in the future Internet. These issues include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.
In this paper, we study a new latency optimization problem for blockchain-based federated learning (BFL) in multi-server edge computing. In this system model, distributed mobile devices (MDs) communicate with a set of edge servers (ESs) to handle both machine learning (ML) model training and block mining simultaneously. To assist the ML model training for resource-constrained MDs, we develop an offloading strategy that enables MDs to transmit their data to one of the associated ESs. We then propose a new decentralized ML model aggregation solution at the edge layer based on a consensus mechanism, which builds a global ML model via peer-to-peer (P2P) based blockchain communications. The blockchain builds trust among MDs and ESs to facilitate reliable ML model sharing and cooperative consensus formation, and enables the rapid elimination of manipulated models caused by poisoning attacks. We formulate latency-aware BFL as an optimization problem aiming to minimize the system latency via the joint consideration of data offloading decisions, MDs' transmit power, channel bandwidth allocation for MDs' data offloading, MDs' computational allocation, and hash power allocation. Given the mixed action space of discrete offloading and continuous allocation variables, we propose a novel deep reinforcement learning scheme with a parameterized advantage actor-critic algorithm. Theoretically, we characterize the convergence properties of BFL in terms of the aggregation delay, mini-batch size, and number of P2P communication rounds. Our numerical evaluation demonstrates the superiority of our proposed scheme over baselines in terms of model training efficiency, convergence rate, system latency, and robustness against model poisoning attacks.
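A policy network for the hybrid action space mentioned above might look like the following sketch: a discrete head selects the offloading target while continuous heads emit the allocation variables from shared features. The layer sizes and the exact action structure are our assumptions, not the paper's architecture.

```python
# Sketch of a policy network over a parameterized (discrete + continuous) action space.
import torch
import torch.nn as nn

class ParameterizedPolicy(nn.Module):
    def __init__(self, state_dim=10, num_es=4, cont_dim=3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.discrete_head = nn.Linear(128, num_es + 1)   # offload to ES 1..K, or stay local
        self.cont_head = nn.Linear(128, cont_dim)         # e.g., power / bandwidth / hash-power split

    def forward(self, state):
        h = self.trunk(state)
        offload_logits = self.discrete_head(h)            # discrete offloading decision
        alloc = torch.sigmoid(self.cont_head(h))          # normalized continuous allocations
        return offload_logits, alloc

policy = ParameterizedPolicy()
logits, alloc = policy(torch.randn(1, 10))
offload_choice = torch.distributions.Categorical(logits=logits).sample()
```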
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although there exist some approaches to address this problem, they usually require global channel state information, which is hard to obtain in practice, and only yield sub-optimal power allocation policies at high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems. By introducing regularization terms in the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, and thus the policy updating speed slows down. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that our proposed PQL can learn the desired power control policy from a dynamic environment where the locations of users change episodically, and that it outperforms existing DTE MADRL algorithms.
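One plausible form of the penalty-based loss, reconstructed from the description above, is the usual TD loss plus a regularizer that anchors the Q-network to high-reward actions already taken in revisited states; the paper's exact regularizer may differ, and lambda_pen is an assumed weight.

```python
# Sketch of a penalty-regularized Q-learning loss (one plausible reading of PQL).
import torch
import torch.nn as nn

def pql_loss(q_net, target_net, batch, replayed_best, gamma=0.99, lambda_pen=0.1):
    s, a, r, s2 = batch                       # transitions from the replay buffer
    q_sa = q_net(s).gather(1, a)
    with torch.no_grad():
        td_target = r + gamma * target_net(s2).max(dim=1, keepdim=True).values
    td_loss = nn.functional.mse_loss(q_sa, td_target)

    # Penalty: for states revisited with a known high-reward action a*,
    # discourage the network from ranking other actions above Q(s, a*),
    # which slows down policy changes and eases co-adaptation by peers.
    s_b, a_best = replayed_best
    q_b = q_net(s_b)
    margin = (q_b.max(dim=1).values - q_b.gather(1, a_best).squeeze(1)).clamp(min=0)
    return td_loss + lambda_pen * margin.mean()
```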
Multi-access edge computing (MEC) is an emerging computing paradigm that extends cloud computing to the network edge to support resource-intensive applications on mobile devices. As a crucial problem in MEC, service migration needs to decide how to migrate user services so as to maintain the quality of service when users roam among MEC servers with limited coverage and capacity. However, finding the optimal migration policy is intractable due to the dynamic MEC environment and user mobility. Many existing studies make centralized migration decisions based on complete system-level information, which is time-consuming and lacks desirable scalability. To address these challenges, we propose a novel learning-driven method, which is user-centric and can make effective online migration decisions by utilizing incomplete system-level information. Specifically, the service migration problem is modeled as a partially observable Markov decision process (POMDP). To solve the POMDP, we design a new encoder network that combines a long short-term memory (LSTM) network and an embedding matrix to effectively extract hidden information, and further propose a tailored off-policy actor-critic algorithm for efficient training. Extensive experimental results based on real-world mobility traces demonstrate that this new method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms, and can achieve near-optimal results in various MEC scenarios.
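The encoder idea can be sketched as follows: an embedding matrix turns discrete observations (e.g., the index of the serving MEC server) into dense vectors, and an LSTM summarizes the observation history into a belief feature for the actor-critic heads. All sizes are illustrative assumptions.

```python
# Sketch of an LSTM + embedding-matrix history encoder for the POMDP.
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    def __init__(self, num_servers=16, embed_dim=8, cont_dim=4, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_servers, embed_dim)    # embedding matrix for discrete obs
        self.lstm = nn.LSTM(embed_dim + cont_dim, hidden, batch_first=True)

    def forward(self, server_ids, cont_obs):
        # server_ids: (B, T) discrete tokens; cont_obs: (B, T, cont_dim) continuous signals
        x = torch.cat([self.embed(server_ids), cont_obs], dim=-1)
        _, (h, _) = self.lstm(x)
        return h[-1]          # belief feature fed to the actor and critic heads

enc = HistoryEncoder()
feat = enc(torch.randint(0, 16, (2, 10)), torch.randn(2, 10, 4))
```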
Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services which require low delay and high accuracy. Sampling rate adaption, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since the CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.
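The Lyapunov step can be illustrated with a virtual-queue sketch: each inference service keeps a queue that grows whenever its achieved accuracy falls short of the target, and the per-step RL reward trades delay against queue-weighted violations. The weight V and the exact reward shape are our assumptions, not the paper's formulation.

```python
# Sketch of virtual accuracy-deficit queues and a drift-plus-penalty reward.
import numpy as np

class VirtualQueue:
    def __init__(self, acc_target, num_services):
        self.q = np.zeros(num_services)
        self.acc_target = np.asarray(acc_target)

    def step(self, achieved_acc):
        # Q(t+1) = max(Q(t) + A_min - A(t), 0): accumulated accuracy deficit.
        self.q = np.maximum(self.q + self.acc_target - np.asarray(achieved_acc), 0.0)
        return self.q

def drift_plus_penalty_reward(delay, achieved_acc, queues, V=10.0):
    # Lower delay is better; queue-weighted accuracy deficits are penalized,
    # so the MDP reward enforces the long-term constraints on average.
    violation = queues.q @ np.maximum(queues.acc_target - np.asarray(achieved_acc), 0.0)
    return -(V * delay + violation)
```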
Indoor multi-robot communication faces two key challenges: one is the severe signal strength degradation caused by blockages (e.g., walls), and the other is the dynamic environment caused by robot mobility. To address these issues, we consider reconfigurable intelligent surfaces (RISs) to overcome the signal blockage and assist the trajectory design among multiple robots. Meanwhile, non-orthogonal multiple access (NOMA) is adopted to cope with the scarcity of spectrum and enhance the connectivity of the robots. Considering the limited battery capacity of robots, we aim to maximize the energy efficiency by jointly optimizing the transmit power of the access point (AP), the phase shifts of the RIS, and the trajectories of the robots. A novel federated deep reinforcement learning (F-DRL) approach is developed to solve this challenging problem with one dynamic long-term objective. Since each robot plans its own path and downlink power, the AP only needs to determine the phase shifts of the RIS, which can significantly save the computational overhead due to the reduced training dimension. Simulation results reveal the following findings: i) the proposed F-DRL can reduce at least 86% of the convergence time compared to centralized DRL; ii) the designed algorithm can adapt to an increasing number of robots; and iii) compared with traditional OMA-based benchmarks, the NOMA-enhanced scheme can achieve higher energy efficiency.
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL-enabled UAV swarms. In summary, this article provides a comprehensive review of various DL applications for UAV swarms across extensive scenarios.
We investigate the data-quality-aware dynamic client selection problem for multiple federated learning (FL) services in a wireless network, where each client has dynamic datasets for the simultaneous training of multiple FL services, and each FL service has to pay the clients under a constrained monetary budget. The problem is formalized as a non-cooperative Markov game over the training rounds. A multi-agent hybrid deep reinforcement learning based algorithm is proposed to optimize the joint client selection and payment actions while avoiding action conflicts. Simulation results show that our proposed algorithm can significantly improve the training performance.
The deployment flexibility and maneuverability of Unmanned Aerial Vehicles (UAVs) have increased their adoption in various applications, such as wildfire tracking, border monitoring, etc. In many critical applications, UAVs capture images and other sensory data and then send the captured data to remote servers for inference and data processing tasks. However, this approach is not always practical in real-time applications due to connection instability, limited bandwidth, and end-to-end latency. One promising solution is to divide the inference requests into multiple parts (layers or segments), with each part being executed on a different UAV based on the available resources. Furthermore, some applications require the UAVs to traverse certain areas and capture incidents; thus, planning their paths becomes critical, particularly to reduce the latency of the collaborative inference process. Specifically, planning the UAVs' trajectories can reduce the data transmission latency by communicating with devices in the same proximity while mitigating transmission interference. This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm while respecting the resource constraints due to the computational load and memory usage of the inference requests. The model is formulated as an optimization problem that aims to minimize latency. The formulated problem is NP-hard, so finding the optimal solution is quite complex; thus, this paper introduces a real-time and dynamic solution for online applications using deep reinforcement learning. We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
A hybrid FSO/RF system requires an efficient FSO and RF link switching mechanism to improve the system capacity by realizing the complementary benefits of both links. The dynamics of network conditions, such as fog, dust, and sand storms, compound the link switching problem and control complexity. To address this problem, we initiate the study of deep reinforcement learning (DRL) for link switching of hybrid FSO/RF systems. Specifically, in this work, we focus on an actor-critic method called Actor/Critic-FSO/RF and a deep Q-network (DQN) method called DQN-FSO/RF for FSO/RF link switching under atmospheric turbulence. To formulate the problem, we define the state, action, and reward function of a hybrid FSO/RF system. DQN-FSO/RF frequently updates the deployed policy that interacts with the environment in a hybrid FSO/RF system, resulting in high switching costs. To overcome this, we lift this problem to ensemble consensus-based representation learning for deep reinforcement learning, called DQNEnsemble-FSO/RF. The proposed DQNEnsemble-FSO/RF DRL approach uses consensus-learned feature representations based on an ensemble of asynchronous threads to update the deployed policy. Experimental results corroborate that the proposed DQNEnsemble-FSO/RF's consensus-learned feature switching achieves better performance than Actor/Critic-FSO/RF, DQN-FSO/RF, and MyOpic for FSO/RF link switching while keeping the switching cost significantly low.
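A loose sketch of the consensus mechanism, under our own assumptions about the criterion: the deployed policy is refreshed only when the penultimate-layer feature representations of the asynchronous ensemble members agree, which limits how often the switching policy changes. The cosine-similarity test, the threshold, and the .features() interface are hypothetical, not the paper's exact design.

```python
# Sketch of ensemble feature consensus gating updates to the deployed policy.
import torch
import torch.nn.functional as F

def features_agree(members, probe_states, threshold=0.9):
    """members: nets whose .features(s) returns penultimate-layer activations (assumed API)."""
    feats = [m.features(probe_states).flatten(1) for m in members]
    mean_feat = torch.stack(feats).mean(dim=0)
    sims = [F.cosine_similarity(f, mean_feat, dim=1).mean() for f in feats]
    return min(sims) >= threshold

def maybe_update_deployed(deployed, members, probe_states):
    if features_agree(members, probe_states):
        # Consensus reached: push the (e.g., averaged) ensemble weights out,
        # keeping switching costs low between consensus events.
        avg = {k: torch.stack([m.state_dict()[k].float() for m in members]).mean(0)
               for k in members[0].state_dict()}
        deployed.load_state_dict(avg)
```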
Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the development of the Metaverse. The demand for Metaverse applications, and hence real-time digital twinning of real-world scenes, is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where, in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) is uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC optimizes power allocation for users assigned a channel in the UL transmission stage. Several problems arise therefrom: (i) an interactive multi-process chain, specifically an Asynchronous Markov Decision Process (AMDP), (ii) joint optimization across multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To address these challenges, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that, compared to the proposed baselines, AAHC obtains better solutions with favorable training time.
For orthogonal multiple access (OMA) systems, the number of served user equipments (UEs) is limited by the number of available orthogonal resources. On the other hand, non-orthogonal multiple access (NOMA) schemes allow multiple UEs to use the same orthogonal resource. This extra degree of freedom introduces new challenges for resource allocation. Buffer state information (BSI), such as the size and age of packets waiting for transmission, can be used to improve scheduling in OMA systems. In this paper, we investigate the impact of BSI on the performance of a centralized scheduler in an uplink multi-carrier NOMA scenario with UEs having various data rate and latency requirements. To handle the large combinatorial space of assigning UEs to resources, we propose a novel scheduler based on actor-critic reinforcement learning that incorporates BSI. Training and evaluation are carried out using Nokia's "Wireless Suite". We propose various novel techniques to stabilize and speed up training. The proposed scheduler outperforms benchmark schedulers.
This paper proposes an effective and novel multi-agent deep reinforcement learning (MADRL) based method for solving the joint virtual network function (VNF) placement and routing (P&R) problem, where multiple service requests with differentiated requirements are served at the same time. The differentiated requirements of the service requests are reflected by their latency- and cost-sensitive factors. We first construct a VNF P&R problem to jointly minimize a weighted sum of service delay and resource consumption cost, which is NP-complete. Then, the joint VNF P&R problem is decoupled into two iterative subtasks: a placement subtask and a routing subtask. Each subtask consists of multiple concurrent parallel sequential decision processes. By invoking the deep deterministic policy gradient method and multi-agent techniques, an MADRL-P&R framework is designed to perform the two subtasks. A new joint reward and internal reward mechanism is proposed to match the goals and constraints of the placement and routing subtasks. We also propose a parameter-migration-based model retraining method to deal with changing network topologies. Experiments corroborate that the proposed MADRL-P&R framework is superior to its alternatives in terms of service cost and delay, and offers higher flexibility for personalized service demands. The parameter-migration-based model retraining method can effectively accelerate convergence under moderate network topology changes.
Coordinating multiple successive access points (APs) to cooperatively serve onboard users can meet the stringent quality-of-experience (QoE) requirements of railway wireless communications. A key challenge is how to deliver the required content in time under the drastically changing propagation environment caused by ever-increasing train speeds. In this paper, we propose to proactively cache the possibly requested content at upcoming APs, which perform coherent transmission to reduce the end-to-end delay. A long-term QoE maximization problem is formulated, and two cache placement algorithms are proposed: one based on heuristic convex optimization (HCO), and the other based on deep reinforcement learning (DRL) with the soft actor-critic (SAC) method. Numerical results show the advantage of our proposed algorithms over conventional benchmarks in terms of QoE and hit probability. With the advanced DRL model, SAC outperforms HCO on QoE by accurately predicting user requests.
The explosive growth of dynamic and heterogeneous data traffic brings great challenges for 5G and beyond mobile networks. To enhance the network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet the asymmetric and heterogeneous traffic demands while alleviating the inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG) algorithm. The BSs decide their local time-frequency configurations through RL algorithms and achieve global training via exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then utilize the Wolpertinger policy to reduce the mapping errors from the continuous action space back to the discrete action space. Simulation results demonstrate the superiority of our proposed algorithm over benchmark algorithms with respect to the system sum rate.
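The Wolpertinger step named above is a known technique and can be sketched as follows: the DDPG actor emits a proto-action in a continuous embedding of the discrete configuration space, the k nearest valid configurations are retrieved, and the critic picks the best among them. The action-embedding table and k are illustrative assumptions.

```python
# Sketch of a Wolpertinger action-selection step over a discrete config table.
import torch

def wolpertinger_action(actor, critic, state, action_table, k=10):
    """action_table: (N, A) embeddings of all discrete time-frequency configs."""
    proto = actor(state)                                   # (1, A) continuous proto-action
    dists = torch.cdist(proto, action_table)               # distance to every discrete action
    _, idx = torch.topk(dists, k, largest=False)           # k nearest discrete neighbors
    candidates = action_table[idx.squeeze(0)]              # (k, A)
    s_rep = state.expand(k, -1)
    q_vals = critic(torch.cat([s_rep, candidates], dim=1)) # critic scores each candidate
    best = torch.argmax(q_vals.squeeze(-1))                # pick the highest-Q neighbor
    return idx.squeeze(0)[best].item(), candidates[best]
```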
Collaborative deep reinforcement learning (CDRL) algorithms, in which multiple agents can coordinate over a wireless network, constitute a promising approach to enable future intelligent and autonomous systems that rely on real-time decision-making in complex dynamic environments. Nonetheless, in practical scenarios, CDRL faces many challenges due to the heterogeneity of the agents and their learning tasks, different environments, time constraints on learning, and resource limitations of wireless networks. To address these challenges, in this paper, a novel semantic-aware CDRL method is proposed to enable a group of heterogeneous untrained agents with semantically linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network. To this end, a new heterogeneous federated DRL (HFDRL) algorithm is proposed to select the best subset of semantically relevant DRL agents for collaboration. The proposed approach then jointly optimizes the training loss and wireless bandwidth allocation of the selected cooperating agents, so as to train each agent within the time constraint of its real-time task. Simulation results show the superior performance of the proposed algorithm compared to state-of-the-art baselines.
Next-generation wireless networks are required to satisfy various services and criteria concurrently. To address the upcoming stringent conditions, a novel open radio access network (O-RAN) architecture has been developed, with features such as flexible design, disaggregated virtualized and programmable components, and intelligent closed-loop control. O-RAN slicing is investigated as a critical strategy for ensuring the network quality of service (QoS) in the face of changing circumstances. However, the different network slices must be dynamically controlled to avoid service level agreement (SLA) violations caused by rapid environmental changes. Therefore, this paper introduces a novel framework able to manage network slices through intelligently provisioned resources. Due to the diverse heterogeneous environments, intelligent machine learning approaches require sufficient exploration to handle the harshest situations in wireless networks and to accelerate convergence. To address this problem, a new solution based on evolutionary deep reinforcement learning (EDRL) is proposed to accelerate and optimize the slice management learning process in the radio access network (RAN) intelligent controller (RIC) module. To this end, O-RAN slicing is represented as a Markov decision process (MDP), which is then solved optimally for resource allocation to meet service demands using the EDRL approach. In terms of meeting service demands, simulation results show that the proposed approach outperforms the DRL baseline by 62.2%.
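A minimal sketch of the evolutionary layer implied by EDRL, under our assumptions about the population size, mutation noise, and how the gradient-trained DRL policy is injected: a population of policy networks is scored on the slicing objective, elites are kept, and perturbed copies refill the population.

```python
# Sketch of one evolutionary generation over a population of policy networks.
import copy
import torch

def evolve(population, fitness_fn, elite_frac=0.25, noise_std=0.02, drl_policy=None):
    scored = sorted(population, key=fitness_fn, reverse=True)  # score on slicing objective
    n_elite = max(1, int(len(population) * elite_frac))
    elites = scored[:n_elite]
    if drl_policy is not None:        # optionally inject the gradient-learned policy
        elites[-1] = copy.deepcopy(drl_policy)
    children = []
    while len(elites) + len(children) < len(population):
        child = copy.deepcopy(elites[torch.randint(n_elite, (1,)).item()])
        with torch.no_grad():
            for p in child.parameters():       # Gaussian parameter mutation
                p.add_(noise_std * torch.randn_like(p))
        children.append(child)
    return elites + children                   # next generation, same size
```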
Cooperative perception plays a vital role in extending a vehicle's sensing range beyond its line of sight. However, exchanging raw sensory data under limited communication resources is infeasible. To enable efficient cooperative perception, vehicles need to address the following fundamental questions: what sensory data need to be shared? At what resolution? And with which vehicles? To answer these questions, in this paper, a novel framework is proposed to allow reinforcement learning (RL) based vehicular association, resource block (RB) allocation, and content selection of cooperative perception messages (CPMs) by utilizing a quadtree-based point cloud compression mechanism. Furthermore, a federated RL approach is introduced in order to speed up the training process across vehicles. Simulation results show that the RL agents are able to efficiently learn the vehicular association, RB allocation, and message content selection while maximizing vehicles' satisfaction in terms of the received sensory information. The results also show that federated RL improves the training process, where a better policy can be achieved within the same amount of time compared with the non-federated approach.
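The quadtree-based compression can be sketched as below: a 2D occupancy grid derived from the point cloud is recursively subdivided, uniform regions are encoded as single nodes, and the maximum depth acts as the resolution knob that the RL content selection could tune against message size. Details are illustrative, not the paper's exact encoder.

```python
# Sketch of a quadtree encoding pass over a square 0/1 occupancy grid.
import numpy as np

def quadtree_encode(grid, x=0, y=0, size=None, depth=0, max_depth=6):
    """Return a list of (x, y, size, occupied) leaves for a square 0/1 grid."""
    size = size or grid.shape[0]
    block = grid[y:y + size, x:x + size]
    if depth == max_depth or block.min() == block.max():
        return [(x, y, size, int(block.max()))]    # uniform (or coarsest-allowed) leaf
    h = size // 2
    leaves = []
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        leaves += quadtree_encode(grid, x + dx, y + dy, h, depth + 1, max_depth)
    return leaves

grid = (np.random.rand(64, 64) > 0.97).astype(np.uint8)   # toy occupancy grid
cpm_payload = quadtree_encode(grid, max_depth=4)          # coarser depth -> smaller CPM
```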