We consider the problem of joint channel assignment and power allocation in underlay cellular vehicle-to-everything (C-V2X) systems, where multiple vehicle-to-network (V2N) uplinks share time-frequency resources with multiple vehicle-to-vehicle (V2V) platoons that enable groups of connected and autonomous vehicles to travel closely together. Due to the high user mobility inherent in vehicular environments, traditional centralized optimization approaches that rely on global channel information may become infeasible in C-V2X systems with a large number of users. Leveraging multi-agent reinforcement learning (RL), we propose a distributed resource allocation (RA) algorithm to overcome this challenge. Specifically, we model the RA problem as a multi-agent system. Based solely on local channel information, each platoon leader acts as an agent that interacts jointly with the others and accordingly selects the optimal combination of sub-band and power level to transmit its signal. To this end, we exploit the double Q-learning algorithm to jointly train the agents under the objectives of simultaneously maximizing the sum rate of the V2N links and satisfying the packet delivery probability of each V2V link within the required latency constraint. Simulation results demonstrate that our proposed RL-based algorithm provides close performance compared to the well-known exhaustive search algorithm.
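As a rough illustration of the double Q-learning update described above, the sketch below shows how a platoon-leader agent could form targets that decouple action selection (online network) from action evaluation (target network). The 4-sub-band by 3-power-level action grid, the local-CSI state dimension, and the layer sizes are illustrative assumptions, not values from the paper.

```python
# Minimal double-DQN target computation for a platoon-leader agent that picks a
# (sub-band, power level) pair from local CSI. All sizes below are illustrative only.
import torch
import torch.nn as nn

N_SUBBANDS, N_POWER_LEVELS = 4, 3          # assumed action grid
STATE_DIM = 8                              # assumed local-CSI feature length
N_ACTIONS = N_SUBBANDS * N_POWER_LEVELS

def make_q_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(online_net.state_dict())

def double_dqn_targets(rewards, next_states, dones, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net evaluates it."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

def decode_action(a):
    """Map a flat action index back to a (sub-band, power level) pair."""
    return divmod(a, N_POWER_LEVELS)

# Example: one fictitious batch of transitions.
batch = 5
targets = double_dqn_targets(torch.zeros(batch), torch.randn(batch, STATE_DIM),
                             torch.zeros(batch))
print(targets.shape, decode_action(7))    # torch.Size([5]), (2, 1)
```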
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although there exist some approaches to address this problem, they usually require global channel state information, which is hard to obtain in practice, and obtain only sub-optimal power allocation policies at high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q-learning (PQL) algorithm for MADRL systems. By introducing regularization terms in the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, and thus the policy updating speed slows down. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that our proposed PQL can learn the desired power control policy from a dynamic environment where the locations of users change episodically, and that it outperforms existing DTE MADRL algorithms.
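To make the regularized-loss idea concrete, the sketch below augments a standard TD loss with a penalty that keeps the Q-value of the best action previously experienced in a state from being forgotten, which slows policy switching. This is only one plausible form of such a regularizer, not the exact PQL loss from the paper; the toy Q-network and batch shapes are assumptions.

```python
# Hedged sketch of a penalty-style Q-learning loss: TD loss plus a regularizer that
# anchors the Q-value of the best action seen so far in each state.
import torch
import torch.nn.functional as F

def pql_loss(q_net, states, actions, td_targets,
             best_actions_seen, best_returns_seen, penalty_weight=0.1):
    q_all = q_net(states)                                        # (B, n_actions)
    q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_taken, td_targets)

    # Regularization: pull Q(s, a_best_seen) toward the best return observed there,
    # so a revisited state keeps favouring its experienced high-reward action.
    q_best_seen = q_all.gather(1, best_actions_seen.unsqueeze(1)).squeeze(1)
    penalty = F.mse_loss(q_best_seen, best_returns_seen)

    return td_loss + penalty_weight * penalty

net = torch.nn.Linear(4, 3)                                      # toy Q-network: 4-dim state, 3 actions
loss = pql_loss(net, torch.randn(8, 4), torch.randint(0, 3, (8,)),
                torch.zeros(8), torch.randint(0, 3, (8,)), torch.zeros(8))
print(float(loss))
```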
The future Internet involves several emerging technologies such as 5G and beyond networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of involved network entities. Each entity may need to make local decisions to improve the network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to enable each network entity, acting as an agent, to adaptively learn an optimal decision-making policy by interacting with the unknown environment. However, such algorithms fail to model the cooperation or competition among network entities and simply treat the other entities as part of the environment, which may lead to non-stationarity issues. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL, as well as a comprehensive survey of its applications in the next-generation Internet. Specifically, we first introduce single-agent RL and MARL. We then review numerous applications of MARL to address emerging issues in the future Internet, including network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV networks, and network security issues.
Next-generation (NextG) networks are expected to support demanding Tactile Internet applications such as augmented reality and connected autonomous vehicles. While recent innovations bring the promise of larger connection capacity, their sensitivity to the environment and erratic performance defy traditional model-based control rationales. Zero-touch, data-driven approaches can improve the ability of the network to adapt to the current operating conditions. Tools such as reinforcement learning (RL) algorithms can build optimal control policies solely based on a history of observations. Specifically, deep RL (DRL), which uses a deep neural network (DNN) as a predictor, has been shown to achieve good performance even in complex environments and with high-dimensional inputs. However, training a DRL model requires a large amount of data, which may limit its adaptability to the ever-evolving statistics of the underlying environment. Moreover, wireless networks are inherently distributed systems, in which centralized DRL approaches would require an excessive amount of data exchange, while fully distributed approaches may result in slower convergence rates and performance degradation. In this paper, to address these challenges, we propose a federated learning (FL) approach to DRL, which we refer to as federated DRL (F-DRL), in which base stations (BSs) collaboratively train the embedded DNN by sharing only the model's weights rather than the training data. We evaluate two distinct versions of F-DRL, value-based and policy-based, and show the superior performance they achieve compared to distributed and centralized DRL.
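The federated step described above can be illustrated with a minimal weight-averaging sketch: each base station keeps a local copy of the same DNN, trains it on its own observations, and only the parameters are aggregated. The aggregation rule (plain averaging), model shape, and number of base stations below are assumptions for illustration.

```python
# Minimal sketch of the federated step in F-DRL: per-BS models are trained locally and
# only their weights are averaged into a global model, then broadcast back.
import copy
import torch
import torch.nn as nn

def make_local_model():
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 5))

def federated_average(models):
    """Average the parameters of the per-BS models into one global state dict."""
    global_state = copy.deepcopy(models[0].state_dict())
    for key in global_state:
        global_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in models], dim=0).mean(dim=0)
    return global_state

bs_models = [make_local_model() for _ in range(3)]   # three base stations (illustrative)
# ... each BS would run its local DRL updates on its own observations here ...
global_state = federated_average(bs_models)
for m in bs_models:                                   # broadcast the aggregated weights back
    m.load_state_dict(global_state)
```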
The performance of vehicle-to-vehicle (V2V) communication depends heavily on the scheduling approach in use. While centralized network schedulers offer high V2V communication reliability, their operation is typically limited to areas with full cellular network coverage. In contrast, in out-of-cellular-coverage areas, comparatively inefficient distributed radio resource management is used. To exploit the benefits of the centralized approach for enhancing the reliability of V2V communications on roads lacking cellular coverage, we propose VRLS (Vehicular Reinforcement Learning Scheduler), a centralized scheduler that proactively assigns resources for out-of-coverage V2V communications before vehicles leave the cellular network coverage. By training in simulated vehicular environments, VRLS learns a scheduling policy that adapts to environmental changes, thus eliminating the need for targeted (re-)training in complex real-life environments. We evaluate the performance of VRLS under varying mobility, network load, wireless channel, and resource configurations. VRLS outperforms the state-of-the-art distributed scheduling algorithm in zones without cellular network coverage, reducing the packet error rate by half under highly loaded conditions and achieving near-maximum reliability in low-load scenarios.
Vehicle-to-infrastructure (V2I) communication is critical for enhancing the reliability of autonomous vehicles (AVs). However, uncertainties in road traffic and in the wireless connectivity of AVs can severely impair timely decision-making. It is therefore crucial to simultaneously optimize the AVs' network selection and driving policies so as to minimize road collisions while maximizing the communication data rates. In this paper, we develop a reinforcement learning (RL) framework to characterize efficient network selection and autonomous driving policies in a multi-band vehicular network (VNet) operating over the conventional sub-6GHz spectrum and terahertz (THz) frequencies. The proposed framework is designed to (i) maximize the traffic flow and minimize collisions by controlling the vehicles' motion dynamics (i.e., speed and acceleration) from an autonomous driving perspective, and (ii) maximize the data rates and minimize handoffs by jointly controlling the vehicles' motion dynamics and network selection from a telecommunication perspective. We cast this problem as a Markov decision process (MDP) and develop a deep Q-learning-based solution to optimize actions such as acceleration, deceleration, lane changes, and AV-base station assignment for a given AV state. The AV state is defined based on the AV's speed and the communication channel states. Numerical results reveal interesting insights related to the interdependence of vehicle motion dynamics, handoffs, and communication data rates. The proposed policies enable AVs to adopt safe driving behaviors with improved connectivity.
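A small sketch of how the joint driving/network-selection action described above could be encoded is given below: the agent's action combines a motion command with a base-station choice, and a DQN picks the pair epsilon-greedily from a state built from speed and per-band channel quality. The specific action list, state features, and network size are assumptions, not the paper's exact design.

```python
# Illustrative joint action space (motion command x base-station choice) with
# epsilon-greedy selection by a toy DQN.
import random
import torch
import torch.nn as nn

MOTION_ACTIONS = ["accelerate", "decelerate", "keep", "lane_change"]
BS_CHOICES = ["sub6_bs", "thz_bs_1", "thz_bs_2"]
ACTIONS = [(m, b) for m in MOTION_ACTIONS for b in BS_CHOICES]

STATE_DIM = 1 + len(BS_CHOICES)          # AV speed + one channel-quality value per candidate BS
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, len(ACTIONS)))

def select_action(speed, channel_snrs, epsilon=0.1):
    state = torch.tensor([speed, *channel_snrs], dtype=torch.float32)
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    with torch.no_grad():
        return ACTIONS[int(q_net(state).argmax())]

print(select_action(speed=22.0, channel_snrs=[14.0, 3.5, 9.2]))
```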
Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and the coordination of autonomous vehicles. The emergence of deep reinforcement learning (DRL) provides a promising approach to multi-agent cooperation through the interaction of the agents and the environment. However, traditional DRL solutions suffer from high dimensionality during policy search when multiple agents have continuous action spaces. Moreover, the dynamics of the agents' policies make the training non-stationary. To address these issues, we propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search. In particular, the cooperation of multiple agents can be learned efficiently in a high-level discrete action space, while the low-level individual control is reduced to single-agent reinforcement learning. In addition to hierarchical reinforcement learning, we also propose an opponent modeling network to model the other agents' policies during the learning process. In contrast to end-to-end DRL approaches, our approach reduces the learning complexity by decomposing the overall task into sub-tasks in a hierarchical structure. To evaluate the efficiency of our approach, we conduct a real-world case study in a cooperative lane-change scenario. Both simulation and real-world experiments show the superiority of our approach in terms of collision rate and convergence speed.
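The two-level structure described above can be sketched as a high-level policy that chooses a discrete maneuver and a low-level controller that turns the maneuver into continuous acceleration/steering commands. The policies, gains, and state layout below are placeholders that only illustrate the decomposition, not the paper's learned controllers.

```python
# Schematic two-level controller: discrete high-level maneuver + continuous low-level tracking.
import numpy as np

HIGH_LEVEL_ACTIONS = ["keep_lane", "change_left", "change_right"]

def high_level_policy(q_table, state_id, epsilon=0.1):
    """Discrete maneuver selection; in the paper this level is learned cooperatively."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(HIGH_LEVEL_ACTIONS))
    return int(np.argmax(q_table[state_id]))

def low_level_control(maneuver, lateral_offset, speed, target_speed=25.0):
    """Simple proportional tracking that stands in for the learned low-level controller."""
    accel = 0.5 * (target_speed - speed)
    lane_shift = {"keep_lane": 0.0, "change_left": +3.5, "change_right": -3.5}[maneuver]
    steer = 0.2 * (lane_shift - lateral_offset)
    return accel, steer

q_table = np.zeros((10, len(HIGH_LEVEL_ACTIONS)))        # toy high-level value table
a = high_level_policy(q_table, state_id=3)
print(low_level_control(HIGH_LEVEL_ACTIONS[a], lateral_offset=0.4, speed=20.0))
```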
The explosive growth of dynamic and heterogeneous data traffic brings great challenges for 5G and beyond mobile networks. To enhance the network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet the asymmetric and heterogeneous traffic demands while alleviating the inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG) algorithm. The BSs decide their local time-frequency configurations through RL algorithms and achieve global training via exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then utilize Wolpertinger policy to reduce the mapping errors from continuous action space back to discrete action space. Simulation results demonstrate the superiority of our proposed algorithm to benchmark algorithms with respect to system sum rate.
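The Wolpertinger step referenced above can be sketched as follows: the DDPG actor emits a proto-action in a continuous embedding space, the k nearest valid discrete configurations are retrieved, and the critic picks the best candidate. The embedding table, placeholder critic, k, and dimensions below are illustrative assumptions, not the FWDDPG implementation.

```python
# Sketch of a Wolpertinger action selection: continuous proto-action -> k-nearest
# discrete actions -> critic refinement.
import numpy as np

rng = np.random.default_rng(0)
N_DISCRETE = 256                      # size of the BS's discrete time-frequency action set
ACTION_DIM = 4                        # dimension of the continuous action embedding
action_embeddings = rng.normal(size=(N_DISCRETE, ACTION_DIM))   # assumed embedding table

def critic_q(state, action_embedding):
    """Placeholder critic; in FWDDPG this would be the trained Q-network."""
    return -np.linalg.norm(action_embedding - state[:ACTION_DIM])

def wolpertinger_select(state, proto_action, k=8):
    # 1) retrieve the k discrete actions closest to the actor's continuous proto-action
    dists = np.linalg.norm(action_embeddings - proto_action, axis=1)
    candidates = np.argsort(dists)[:k]
    # 2) let the critic refine the choice among those candidates
    q_values = [critic_q(state, action_embeddings[i]) for i in candidates]
    return int(candidates[int(np.argmax(q_values))])

state = rng.normal(size=8)
proto = rng.normal(size=ACTION_DIM)   # would come from the DDPG actor network
print(wolpertinger_select(state, proto))
```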
Cooperative perception plays a vital role in extending a vehicle's sensing range beyond its line of sight. However, exchanging raw sensory data under limited communication resources is infeasible. To enable efficient cooperative perception, vehicles need to address the following fundamental questions: What sensory data needs to be shared? At which resolution? And with which vehicles? To answer these questions, in this paper, a novel framework is proposed to allow reinforcement learning (RL)-based vehicle association, resource block (RB) allocation, and content selection of cooperative perception messages (CPMs) by utilizing a quadtree-based point cloud compression mechanism. Furthermore, a federated RL approach is introduced in order to speed up the training process across vehicles. Simulation results show that the RL agents are able to efficiently learn the vehicle association, RB allocation, and message content selection while maximizing the vehicles' satisfaction in terms of received sensory information. The results also show that federated RL improves the training process, where better policies can be achieved within the same amount of time compared to non-federated approaches.
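To give a flavor of the quadtree-based compression that the content-selection mechanism above builds on, the toy sketch below subdivides an occupancy grid only where cells disagree, so uniform regions collapse to a single node. The grid size, binary occupancy values, and stopping rule are illustrative and not taken from the paper's point cloud pipeline.

```python
# Toy quadtree compression of a binary occupancy grid: uniform blocks become single leaves.
import numpy as np

def quadtree(grid, x=0, y=0, size=None):
    """Return a nested representation: a leaf value for uniform blocks, else four children."""
    size = grid.shape[0] if size is None else size
    block = grid[y:y + size, x:x + size]
    if block.min() == block.max() or size == 1:
        return int(block[0, 0])                       # uniform block -> single leaf
    h = size // 2
    return [quadtree(grid, x, y, h),     quadtree(grid, x + h, y, h),
            quadtree(grid, x, y + h, h), quadtree(grid, x + h, y + h, h)]

occupancy = np.zeros((8, 8), dtype=int)
occupancy[0:4, 0:4] = 1                               # one occupied quadrant
print(quadtree(occupancy))   # [1, 0, 0, 0]: three empty quadrants collapse to single leaves
```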
Traffic optimization challenges, such as load balancing, flow scheduling, and improving packet delivery time, are difficult online decision-making problems in wide area networks (WANs). Complex heuristics are needed, for instance, to find optimal paths that improve packet delivery time and minimize interruptions that may be caused by link failures or congestion. The recent success of reinforcement learning (RL) algorithms can provide useful solutions to build better robust systems that learn from experience in model-free settings. In this work, we consider a path optimization problem, specifically for packet routing, in large and complex networks. We develop and evaluate a model-free approach, applying multi-agent meta reinforcement learning (MAMRL), that can determine the next hop of each packet so as to deliver it to its destination with minimum time overall. Specifically, we propose to leverage and compare deep policy optimization RL algorithms for enabling distributed model-free control in communication networks, and we present a novel meta-learning-based framework, MAMRL, for enabling quick adaptation to topology changes. To evaluate the proposed framework, we simulate with various WAN topologies. Our extensive packet-level simulation results show that, compared with classical shortest-path and traditional reinforcement learning approaches, MAMRL significantly reduces the average packet delivery time even when the network demand increases; and compared with a non-meta deep policy optimization algorithm, our results show a reduction in packet loss in fewer episodes while achieving comparable average packet delivery times when link failures occur.
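A much-simplified, classic Q-routing-style sketch of the per-node next-hop learning that distributed packet routing builds on is shown below; it is a stand-in for the idea of "each agent determines the next hop of a packet", not the full MAMRL method with its meta-learning outer loop. Node names, delays, and the learning rate are placeholders.

```python
# Simplified per-router next-hop learning: Q[destination][next_hop] estimates delivery time
# and is updated from the chosen neighbor's own best estimate plus the observed hop delay.
from collections import defaultdict

class RouterAgent:
    def __init__(self, neighbors, lr=0.5):
        self.neighbors = neighbors
        self.lr = lr
        self.q = defaultdict(lambda: {n: 0.0 for n in neighbors})

    def next_hop(self, dest):
        return min(self.q[dest], key=self.q[dest].get)

    def update(self, dest, chosen_hop, hop_delay, neighbor_best_estimate):
        target = hop_delay + neighbor_best_estimate
        self.q[dest][chosen_hop] += self.lr * (target - self.q[dest][chosen_hop])

# Toy usage: node A learns that neighbor "B" is a slow hop toward destination "D".
a = RouterAgent(neighbors=["B", "C"])
a.update("D", "B", hop_delay=4.0, neighbor_best_estimate=6.0)
print(a.next_hop("D"))   # "C" (still at its initial estimate, so it is preferred next)
```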
This paper extends the concept of spectrum sharing based on multi-agent reinforcement learning to heterogeneous vehicular networks (HetVNets). Here, multiple vehicle-to-vehicle (V2V) links reuse the spectrum of vehicle-to-infrastructure (V2I) links as well as of other networks. The rapidly changing environment in vehicular networks limits the idea of centralizing the CSI and allocating channels accordingly. Hence, the idea of implementing an ML-based approach is used here, so that it can be deployed in a distributed manner across all vehicles. Each on-board unit (OBU) can sense the signals in the channel and, based on this information, run RL to decide which channel to access autonomously. Here, each V2V link acts as an agent in the MARL formulation. The idea is to train the RL model so that these agents cooperate rather than compete.
Compared with LTE networks, the vision of 5G lies in providing higher data rates, low latency (in order to enable near-real-time applications), significantly increased base station capacity, and near-perfect quality of service (QoS) for users. To provide such services, 5G systems will support various combinations of access technologies such as LTE, NR, NR-U and Wi-Fi. Each radio access technology (RAT) provides a different type of access, and these should be allocated and managed optimally among the users. Beyond resource management, 5G systems will also support dual connectivity services. Hence, the orchestration of the network is a more difficult problem for system managers than with legacy access technologies. In this paper, we propose a federated meta-learning (FML) based RAT allocation algorithm that enables the RAN Intelligent Controller (RIC) to adapt more quickly to dynamically changing environments. We design a simulation environment containing LTE and 5G NR service technologies. In the simulation, our objective is to fulfil the UE demands within the transmission deadline so as to provide higher QoS values. We compare the proposed algorithm with a single RL agent, the Reptile algorithm, and a rule-based heuristic method. Simulation results show that the proposed FML method achieves higher caching rates at the first deployment round, by 21% and 12%, respectively. Moreover, the proposed approach adapts to new tasks and environments fastest among the compared methods.
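As a rough indication of how a meta-learned initialization can be combined with federated aggregation, the sketch below applies a Reptile-style outer update (Reptile appears above only as a comparison baseline) on top of averaged per-agent adaptations. It illustrates the general shape of such an update under stated assumptions; the paper's exact FML procedure is not reproduced, and all dimensions, step sizes, and the toy quadratic tasks are placeholders.

```python
# Hedged sketch: per-agent inner adaptation followed by a Reptile-style, averaged outer step.
import numpy as np

def local_adaptation(theta, task_grad_fn, inner_lr=0.01, steps=5):
    """Each agent adapts the shared parameters to its own environment/task."""
    phi = theta.copy()
    for _ in range(steps):
        phi -= inner_lr * task_grad_fn(phi)
    return phi

def federated_reptile_round(theta, task_grad_fns, meta_lr=0.1):
    """Outer step: move the meta-parameters toward the average of the adapted parameters."""
    adapted = [local_adaptation(theta, g) for g in task_grad_fns]
    return theta + meta_lr * (np.mean(adapted, axis=0) - theta)

# Toy quadratic tasks standing in for per-environment RL objectives.
targets = [np.array([1.0, -2.0]), np.array([3.0, 0.5])]
grads = [lambda w, t=t: 2.0 * (w - t) for t in targets]
theta = np.zeros(2)
for _ in range(50):
    theta = federated_reptile_round(theta, grads)
print(theta)   # drifts toward an initialization that adapts quickly to both tasks
```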
Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.
Recent technological advancements in space, air and ground components have made possible a new network paradigm called "space-air-ground integrated network" (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, due to UAVs' high dynamics and complexity, the real-world deployment of a SAGIN becomes a major barrier for realizing such SAGINs. Compared to the space and terrestrial components, UAVs are expected to meet performance requirements with high flexibility and dynamics using limited resources. Therefore, employing UAVs in various usage scenarios requires well-designed planning in algorithmic approaches. In this paper, we provide a comprehensive review of recent learning-based algorithmic approaches. We consider possible reward functions and discuss the state-of-the-art algorithms for optimizing the reward functions, including Q-learning, deep Q-learning, multi-armed bandit (MAB), particle swarm optimization (PSO) and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, which can be applicable to various UAV-assisted missions on a SAGIN using these algorithms. We simulate users and environments according to real-world scenarios and compare the learning-based and PSO-based methods in terms of throughput, load, fairness, computation time, etc. We also implement and evaluate the 2-dimensional (2D) and 3-dimensional (3D) variations of these algorithms to reflect different deployment cases. Our simulation suggests that the 3D satisfaction-based learning algorithm outperforms the other approaches for various metrics in most cases. We discuss some open challenges at the end and our findings aim to provide design guidelines for algorithm selections while optimizing the deployment of UAV-assisted SAGINs.
Autonomous vehicles (AVs) must operate safely and efficiently in dynamic environments. To this end, AVs equipped with joint radar-communication (JRC) functions can enhance driving safety by using both radar detection and data communication functions. However, optimizing the performance of an AV system with these two different functions under the uncertainty and dynamics of the surrounding environment is very challenging. In this work, we first propose an intelligent optimization framework based on a Markov decision process (MDP) to help the AV make optimal decisions when selecting JRC operation functions under the dynamics and uncertainty of the surrounding environment. We then develop an effective learning algorithm, leveraging recent advances in deep reinforcement learning techniques, to find the optimal policy for the AV without requiring any prior information about the surrounding environment. Furthermore, to make our proposed framework more scalable, we develop a transfer learning (TL) mechanism that enables the AV to leverage valuable experience to accelerate the training process. Extensive simulations show that the proposed transferable deep reinforcement learning framework reduces the obstacle miss-detection probability of the AV by up to 67% compared with other conventional deep reinforcement learning approaches.
Autonomous driving has attracted significant research interest in the past two decades, as it offers many potential benefits, including freeing drivers from the burden of driving and mitigating traffic congestion, among others. Despite promising progress, lane changing remains a great challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision making in AVs with encouraging results. However, most of these studies focus on a single-vehicle setting, and lane changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter-sharing scheme. In particular, a multi-objective reward function is proposed to incorporate fuel efficiency, driving comfort, and the safety of autonomous driving. Comprehensive experimental results, conducted under three different traffic densities and various levels of human driver aggressiveness, show that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety, and driver comfort.
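To illustrate what a multi-objective reward of the kind described above can look like, the sketch below combines efficiency, comfort, and safety terms with tunable weights. The exact terms and weights used in the MA2C paper are not reproduced here; these formulations and numbers are placeholders.

```python
# Illustrative multi-objective lane-change reward: weighted sum of efficiency, comfort,
# and safety terms (all placeholder formulations).
def lane_change_reward(speed, target_speed, jerk, headway, min_safe_headway=10.0,
                       w_eff=1.0, w_comfort=0.2, w_safety=5.0):
    efficiency = -abs(speed - target_speed) / target_speed      # reward driving near target speed
    comfort = -abs(jerk)                                        # penalize abrupt accelerations
    safety = 0.0 if headway >= min_safe_headway else -(min_safe_headway - headway)
    return w_eff * efficiency + w_comfort * comfort + w_safety * safety

print(lane_change_reward(speed=24.0, target_speed=25.0, jerk=0.3, headway=6.0))
```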
Reinforcement learning (RL) is currently one of the most commonly used techniques for traffic signal control (TSC), as it can adaptively adjust traffic signal phase and duration according to real-time traffic data. However, a fully centralized RL approach is beset with difficulties in a multi-network scenario because of the exponential growth of the state-action space with an increasing number of intersections. Multi-agent reinforcement learning (MARL) can overcome the high-dimensionality problem by employing the global control of each local RL agent, but it also brings new challenges, such as failure to converge caused by the non-stationary Markov decision process (MDP). In this paper, we introduce an off-policy Nash deep Q-network (OPNDQN) algorithm, which mitigates the weaknesses of both the fully centralized and MARL approaches. The OPNDQN algorithm addresses the fact that traditional algorithms cannot be used in traffic models with large state-action spaces by utilizing a fictitious game approach at each iteration to find the Nash equilibrium among neighboring intersections, from which no intersection has an incentive to unilaterally deviate. One of the main advantages of OPNDQN is that it mitigates the non-stationarity of the multi-agent Markov process, because it considers the mutual influence among neighboring intersections by sharing their actions. On the other hand, for training a large traffic network, the convergence rate of OPNDQN is higher than that of existing MARL approaches, because it does not incorporate all state information of every agent. We conduct extensive experiments using the Simulation of Urban MObility (SUMO) simulator and show the clear superiority of OPNDQN over several existing MARL approaches in terms of average queue length, episode training reward, and average waiting time.
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and the ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL-enabled UAV swarms. In summary, this paper provides a comprehensive survey of various DL applications for UAV swarms across a wide range of scenarios.
Unmanned aerial vehicles (UAVs) are among the technological breakthroughs that support a variety of services, including communications. UAVs will play a critical role in enhancing the physical layer security of wireless networks. This paper defines the problem of eavesdropping on the link between a ground user and a UAV that serves as an aerial base station (ABS). The reinforcement learning algorithms Q-learning and deep Q-network (DQN) are proposed for optimizing the position and transmission power of the ABS to enhance the data rate of the ground user. This increases the secrecy capacity without the system knowing the location of the eavesdropper. Simulation results show fast convergence and the highest secrecy capacity for the proposed DQN compared with Q-learning and the baseline approaches.
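A minimal tabular Q-learning loop of the kind the paper compares against DQN is sketched below: the state is a discretized ABS position/power index, the action moves the ABS or changes its transmit power, and the reward would be the ground user's achievable (secrecy) rate. The grid size, action set, and reward value are placeholders.

```python
# Minimal tabular Q-learning update for ABS placement and power selection.
import numpy as np

N_POSITIONS, N_POWER = 25, 4                        # assumed discretized grid and power levels
ACTIONS = ["north", "south", "east", "west", "power_up", "power_down"]
Q = np.zeros((N_POSITIONS * N_POWER, len(ACTIONS)))

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.95):
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

def epsilon_greedy(state, epsilon=0.1):
    if np.random.rand() < epsilon:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q[state]))

s = 0
a = epsilon_greedy(s)
q_update(s, a, reward=1.7, next_state=5)            # reward would be the measured user rate
```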
Wireless communication in the terahertz band (0.1–10 THz) is envisioned as one of the key enabling technologies for future sixth-generation (6G) wireless communication systems, beyond massive multiple-input multiple-output (massive MIMO) technology. However, the very high propagation attenuation and molecular absorption at THz frequencies often limit the signal transmission distance and coverage range. Motivated by recent breakthroughs in reconfigurable intelligent surfaces (RIS) for realizing smart radio propagation environments, we propose a novel hybrid beamforming scheme for multi-hop RIS-assisted communication networks to improve coverage at THz-band frequencies. In particular, multiple passive and controllable RISs are deployed to assist transmissions between the base station (BS) and multiple single-antenna users. We investigate the joint design of the digital beamforming matrix at the BS and the analog beamforming matrices at the RISs, by leveraging recent advances in deep reinforcement learning (DRL) to combat the propagation loss. To improve the convergence of the proposed DRL-based algorithm, two algorithms are then designed to initialize the digital beamforming and analog beamforming matrices using an alternating optimization technique. Simulation results show that our proposed scheme is able to improve the coverage range of THz communications by 50% compared with the benchmarks. Moreover, it is also shown that our proposed DRL-based method is a state-of-the-art method to solve the NP-hard beamforming problem, especially when the signals of the RIS-assisted THz communication network experience multiple hops.
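One concrete detail behind the analog beamforming matrices mentioned above is that a passive RIS can only apply unit-modulus phase shifts. The sketch below shows one plausible way a DRL actor's raw outputs could be mapped onto such a constraint; the number of elements and the stand-in for the actor are assumptions, not the paper's network design.

```python
# Mapping unbounded actor outputs onto unit-modulus RIS reflection coefficients.
import numpy as np

N_RIS_ELEMENTS = 64

def actor_output_to_ris_matrix(raw_action):
    """Squash raw outputs to phases in [0, 2*pi) and build the diagonal RIS matrix."""
    phases = 2.0 * np.pi * (1.0 / (1.0 + np.exp(-raw_action)))   # sigmoid -> [0, 2*pi)
    return np.diag(np.exp(1j * phases))                          # unit-modulus coefficients

raw = np.random.default_rng(1).normal(size=N_RIS_ELEMENTS)       # stands in for the actor network
Theta = actor_output_to_ris_matrix(raw)
print(Theta.shape, np.allclose(np.abs(np.diag(Theta)), 1.0))     # (64, 64) True
```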