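Unmanned aerial vehicles (UAVs), acting as aerial base stations, can relay time-sensitive packets from IoT devices to a nearby terrestrial base station (TBS). Scheduling packets in such UAV-relayed IoT networks so that the TBS holds fresh (up-to-date) packets from the IoT devices is a challenging problem, because it involves two simultaneous steps: (i) sampling of the packets generated at the IoT devices by the UAVs [hop-1] and (ii) updating of the sampled packets from the UAVs to the TBS [hop-2]. To address this, we propose Age-of-Information (AoI) scheduling algorithms for two-hop UAV-relayed IoT networks. First, we propose a low-complexity AoI scheduler, termed MAF-MAD, that employs the Maximum AoI First (MAF) policy for sampling at the UAV (hop-1) and the Maximum AoI Difference (MAD) policy for updating from the UAV to the TBS (hop-2). We prove that MAF-MAD is the optimal AoI scheduler under ideal conditions (lossless wireless channels and generate-at-will traffic generation at the IoT devices). For general conditions (lossy channels and varying periodic traffic generation at the IoT devices), we propose a scheduler based on a deep reinforcement learning algorithm, namely Proximal Policy Optimization (PPO). Simulation results show that the proposed PPO-based scheduler outperforms other schedulers such as MAF-MAD, MAF, and round-robin in all considered general scenarios.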
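The two rules behind MAF-MAD can be illustrated with a minimal sketch. It assumes a single UAV, integer AoI counters, and the ideal conditions named in the abstract (lossless channels, generate-at-will traffic). The function names, the reading of the "AoI difference" as TBS age minus UAV age, and the exact update timing are illustrative assumptions, not the paper's definitions.

```python
from typing import List, Tuple


def maf_select(aoi_uav: List[int]) -> int:
    """Hop-1 (MAF): pick the device whose packet at the UAV is stalest."""
    return max(range(len(aoi_uav)), key=lambda i: aoi_uav[i])


def mad_select(aoi_tbs: List[int], aoi_uav: List[int]) -> int:
    """Hop-2 (MAD): pick the device with the largest gap between its age
    at the TBS and its age at the UAV (assumed reading of 'AoI difference')."""
    return max(range(len(aoi_tbs)), key=lambda i: aoi_tbs[i] - aoi_uav[i])


def step(aoi_uav: List[int], aoi_tbs: List[int]) -> Tuple[List[int], List[int], int, int]:
    """Advance one slot under the assumed ideal conditions."""
    sampled = maf_select(aoi_uav)
    updated = mad_select(aoi_tbs, aoi_uav)
    # Every age grows by one slot unless it is refreshed below.
    new_uav = [a + 1 for a in aoi_uav]
    new_tbs = [a + 1 for a in aoi_tbs]
    # The forwarded packet carries the age it had at the UAV, plus one slot.
    new_tbs[updated] = aoi_uav[updated] + 1
    # A freshly sampled packet has age 1 at the end of the slot.
    new_uav[sampled] = 1
    return new_uav, new_tbs, sampled, updated
```

With `aoi_uav = [3, 5, 2]` and `aoi_tbs = [6, 6, 9]`, the sketch samples device 1 on hop-1 and updates device 2 on hop-2. Ties are broken in favour of the lowest index, which is a choice of this sketch only.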
translated by 谷歌翻译
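The future Internet involves several emerging technologies, such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make local decisions to improve network performance in dynamic and uncertain network environments. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to let each network entity, acting as an agent, adaptively learn an optimal decision-making policy by interacting with the unknown environment. However, such algorithms fail to model cooperation or competition among network entities and simply treat the other entities as part of the environment, which may lead to the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of the other entities. As a result, MARL can significantly improve the learning efficiency of the network entities, and it has recently been used to solve various problems in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. We then review a number of applications of MARL to emerging problems in the future Internet. These problems include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.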
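This paper investigates a master unmanned aerial vehicle (MUAV)-powered Internet of Things (IoT) network, in which we propose using a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV, while also leveraging the MUAV as a recharging power source. Under the proposed model, we investigate the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Two optimization problems are formulated, depending on whether there is charging between the two UAVs. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed: centralized training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). The results show that CT-MADDPG can greatly reduce the computing-capability requirements on the UAV hardware, and that the proposed MADDPOC can support low-level multi-agent cooperative learning in continuous action domains, which gives it an advantage over existing option-based hierarchical DRL that supports only single-agent learning and discrete actions.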
The deployment flexibility and maneuverability of Unmanned Aerial Vehicles (UAVs) have increased their adoption in various applications, such as wildfire tracking and border monitoring. In many critical applications, UAVs capture images and other sensory data and then send the captured data to remote servers for inference and data processing tasks. However, this approach is not always practical in real-time applications due to connection instability, limited bandwidth, and end-to-end latency. One promising solution is to divide the inference requests into multiple parts (layers or segments), with each part executed on a different UAV based on the available resources. Furthermore, some applications require the UAVs to traverse certain areas and capture incidents; thus, planning their paths becomes critical, particularly to reduce the latency of the collaborative inference process. Specifically, planning the UAVs' trajectories can reduce the data transmission latency by communicating with devices in the same proximity while mitigating transmission interference. This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm while respecting the resource constraints due to the computational load and memory usage of the inference requests. The model is formulated as an optimization problem that aims to minimize latency. The formulated problem is NP-hard, so finding the optimal solution is quite complex; thus, this paper introduces a real-time and dynamic solution for online applications using deep reinforcement learning. We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
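In this work, we optimize the 3D trajectory of an unmanned aerial vehicle (UAV)-based portable access point (PAP) that provides wireless services to a set of ground nodes (GNs). Moreover, in line with the Peukert effect, we consider a practical non-linear discharge model for the UAV battery. We thus formulate the problem in a novel manner, as the maximization of a fairness-based energy efficiency metric that we name fair energy efficiency (FEE). The FEE metric defines a system that places importance on both per-user service fairness and the energy efficiency of the PAP. The formulated problem is non-convex with intractable constraints. To obtain a solution, we represent the problem as a Markov decision process (MDP) with continuous state and action spaces. Considering the complexity of the solution space, we use the twin delayed deep deterministic policy gradient (TD3) actor-critic deep reinforcement learning (DRL) framework to learn a policy that maximizes the FEE of the system. We perform two types of RL training to demonstrate the effectiveness of our approach: the first (offline) approach keeps the positions of the GNs the same throughout the training phase; the second approach generalizes the learned policy to any arrangement of GNs by changing the positions of the GNs after each training episode. Numerical evaluations show that neglecting the Peukert effect overestimates the air-time of the PAP, which can be addressed by optimally selecting the PAP's flying speed. Moreover, user fairness, energy efficiency, and hence the FEE value of the system can be improved by efficiently moving the PAP above the GNs. As such, we observe FEE improvements over the baseline scenarios of up to 88.31%, 272.34%, and 318.13% for suburban, urban, and dense urban environments, respectively.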
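To illustrate why neglecting the Peukert effect overestimates air-time, the sketch below uses the textbook form of Peukert's law, t = H (C / (I H))^k, where C is the rated capacity, H the rated discharge time, I the actual current, and k the Peukert constant. The battery figures and the constant k = 1.2 are made-up illustrative values, and the paper's own battery model may differ from this textbook form.

```python
def peukert_runtime_hours(capacity_ah: float, rated_hours: float,
                          current_a: float, k: float) -> float:
    """Discharge time from the textbook Peukert law t = H * (C / (I * H)) ** k."""
    if min(capacity_ah, rated_hours, current_a) <= 0 or k < 1:
        raise ValueError("capacity, rated time and current must be > 0, and k >= 1")
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k


def linear_runtime_hours(capacity_ah: float, current_a: float) -> float:
    """Idealized discharge time that ignores the Peukert effect (k = 1)."""
    return capacity_ah / current_a


# Hypothetical battery: 100 Ah rated over 20 h, discharged at 10 A.
ideal = linear_runtime_hours(100.0, 10.0)
realistic = peukert_runtime_hours(100.0, 20.0, 10.0, 1.2)
```

For this hypothetical battery the linear model predicts 10 h, while the Peukert model gives roughly 8.7 h, so a planner using the linear model would schedule more flight time than the battery can deliver.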
Recent technological advancements in space, air and ground components have made possible a new network paradigm called "space-air-ground integrated network" (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, due to UAVs' high dynamics and complexity, real-world deployment is a major barrier to realizing such SAGINs. Compared to the space and terrestrial components, UAVs are expected to meet performance requirements with high flexibility and dynamics using limited resources. Therefore, employing UAVs in various usage scenarios requires well-designed planning in algorithmic approaches. In this paper, we provide a comprehensive review of recent learning-based algorithmic approaches. We consider possible reward functions and discuss the state-of-the-art algorithms for optimizing the reward functions, including Q-learning, deep Q-learning, multi-armed bandit (MAB), particle swarm optimization (PSO) and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, which can be applicable to various UAV-assisted missions on a SAGIN using these algorithms. We simulate users and environments according to real-world scenarios and compare the learning-based and PSO-based methods in terms of throughput, load, fairness, computation time, etc. We also implement and evaluate the 2-dimensional (2D) and 3-dimensional (3D) variations of these algorithms to reflect different deployment cases. Our simulation suggests that the 3D satisfaction-based learning algorithm outperforms the other approaches for various metrics in most cases. We discuss some open challenges at the end, and our findings aim to provide design guidelines for algorithm selection when optimizing the deployment of UAV-assisted SAGINs.
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surface (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of various DL applications for UAV swarms in extensive scenarios.
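Improving the interactivity and interconnectivity between people is one of the highlights of the Metaverse. The Metaverse relies on a core approach, digital twinning, which is a means of replicating physical-world objects, people, actions, and scenes in the virtual world. Being able to access scenes and information associated with the physical world in the Metaverse, in real time and under mobility, is essential for developing a highly accessible, interactive, and interconnected experience for all users. This development allows users in other locations to access high-quality, up-to-date real-world information about events happening elsewhere and to socialize with others hyper-interactively. Nevertheless, receiving continual, smooth updates generated by others from the Metaverse is a challenging task, owing to the large data size of virtual-world graphics and the need for low-latency transmission. With the development of Mobile Augmented Reality (MAR), users can interact via the Metaverse in a highly interactive manner, even under mobility. Hence, in our work, we consider an environment with users in moving Internet of Vehicles (IoV) who download real-time virtual-world updates from Metaverse Service Provider Cell Stations (MSPCSs) via wireless communications. We design an environment with multiple cell stations, in which users' virtual-world graphics download tasks are handed over between cell stations. As transmission latency is the primary concern when receiving virtual-world updates under mobility, our work aims to allocate system resources so as to minimize the total time that users in vehicles take to download their virtual-world scenes from the cell stations. We utilize deep reinforcement learning and evaluate the performance of the algorithms under different environment configurations. Our work provides a use case of the Metaverse over AI-enabled 6G communications.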
The explosive growth of dynamic and heterogeneous data traffic brings great challenges for 5G and beyond mobile networks. To enhance the network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet the asymmetric and heterogeneous traffic demands while alleviating the inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG). The BSs decide their local time-frequency configurations through RL algorithms and achieve global training by exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then utilize the Wolpertinger policy to reduce the mapping errors from the continuous action space back to the discrete action space. Simulation results demonstrate the superiority of our proposed algorithm over benchmark algorithms with respect to system sum rate.
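The continuous-to-discrete mapping mentioned above can be sketched in the usual Wolpertinger style: the actor emits a continuous "proto-action", the k nearest valid discrete actions are looked up, and the critic picks the one with the highest estimated value. This is a minimal sketch of that general idea only; the function name, the Euclidean distance, and the toy `q_value` callable are assumptions and do not come from the FWDDPG paper.

```python
import math
from typing import Callable, Sequence, Tuple

Action = Tuple[float, ...]


def wolpertinger_select(proto_action: Action,
                        discrete_actions: Sequence[Action],
                        q_value: Callable[[Action], float],
                        k: int) -> Action:
    """Map a continuous proto-action onto a valid discrete action."""
    if k < 1 or not discrete_actions:
        raise ValueError("need k >= 1 and at least one discrete action")
    # Step 1: the k discrete actions closest to the actor's continuous output.
    nearest = sorted(discrete_actions,
                     key=lambda a: math.dist(a, proto_action))[:k]
    # Step 2: let the critic choose among those candidates.
    return max(nearest, key=q_value)


# Toy example: four discrete configurations and a stand-in critic.
actions = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
chosen = wolpertinger_select((0.4, 0.2), actions, q_value=sum, k=2)
```

With `k = 1` the rule reduces to plain nearest-neighbour rounding; a larger `k` lets the critic correct cases where the closest discrete action has a poor value, at the cost of more critic evaluations per step.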
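Unmanned aerial vehicles (UAVs) are one of the technological breakthroughs that support a variety of services, including communications. UAVs will play a critical role in enhancing the physical layer security of wireless networks. This paper defines the problem of eavesdropping on the link between a ground user and a UAV that serves as an aerial base station (ABS). The reinforcement learning algorithms Q-learning and deep Q-network (DQN) are proposed for optimizing the position and transmit power of the ABS to enhance the data rate of the ground user. This increases the secrecy capacity without the system knowing the location of the eavesdropper. Simulation results show that the proposed DQN converges quickly and achieves the highest secrecy capacity compared with Q-learning and baseline approaches.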
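For readers unfamiliar with the metric, secrecy capacity is commonly defined as the non-negative gap between the legitimate link's capacity and the eavesdropper's capacity, C_s = max(0, log2(1 + SNR_user) - log2(1 + SNR_eve)). The sketch below computes this standard quantity for linear-scale SNRs; the function name and the example values are illustrative, and the paper's exact channel model is not reproduced here.

```python
import math


def secrecy_capacity(snr_user: float, snr_eve: float) -> float:
    """Secrecy capacity in bits/s/Hz for linear-scale (not dB) SNR values."""
    if snr_user < 0 or snr_eve < 0:
        raise ValueError("SNR values must be non-negative")
    user_rate = math.log2(1.0 + snr_user)  # legitimate ground user
    eve_rate = math.log2(1.0 + snr_eve)    # eavesdropper
    # Secrecy capacity cannot be negative: clamp at zero.
    return max(0.0, user_rate - eve_rate)


# Illustrative values: the user's link is better than the eavesdropper's.
example = secrecy_capacity(snr_user=15.0, snr_eve=3.0)
```

Here the user's rate is log2(16) = 4 and the eavesdropper's is log2(4) = 2, giving a secrecy capacity of 2 bits/s/Hz. If the eavesdropper's SNR exceeds the user's, the value is clamped to zero.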
A hybrid FSO/RF system requires an efficient FSO and RF link switching mechanism to improve the system capacity by realizing the complementary benefits of both links. The dynamics of network conditions, such as fog, dust, and sand storms, compound the link switching problem and control complexity. To address this problem, we initiate the study of deep reinforcement learning (DRL) for link switching of hybrid FSO/RF systems. Specifically, in this work, we focus on an actor-critic method called Actor/Critic-FSO/RF and a deep Q-network (DQN) called DQN-FSO/RF for FSO/RF link switching under atmospheric turbulence. To formulate the problem, we define the state, action, and reward function of a hybrid FSO/RF system. DQN-FSO/RF frequently updates the deployed policy that interacts with the environment in a hybrid FSO/RF system, resulting in high switching costs. To overcome this, we lift this problem to ensemble consensus-based representation learning for deep reinforcement learning, called DQNEnsemble-FSO/RF. The proposed DQNEnsemble-FSO/RF DRL approach uses consensus-learned feature representations based on an ensemble of asynchronous threads to update the deployed policy. Experimental results corroborate that the proposed DQNEnsemble-FSO/RF's consensus-learned feature switching achieves better performance than Actor/Critic-FSO/RF, DQN-FSO/RF, and MyOpic for FSO/RF link switching while keeping the switching cost significantly low.
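Autonomous vehicles (AVs) are required to operate safely and efficiently in dynamic environments. To this end, AVs equipped with Joint Radar-Communications (JRC) functions can enhance driving safety by using both radar detection and data communication functions. However, optimizing the performance of an AV system with two different functions under the uncertainty and dynamics of the surrounding environment is very challenging. In this work, we first propose an intelligent optimization framework based on the Markov decision process (MDP) to help the AV make optimal decisions when selecting JRC operation functions under the dynamics and uncertainty of the surrounding environment. We then develop an effective learning algorithm that leverages recent advances in deep reinforcement learning to find the optimal policy for the AV without requiring any prior information about the surrounding environment. Furthermore, to make our proposed framework more scalable, we develop a Transfer Learning (TL) mechanism that enables the AV to leverage valuable experiences to accelerate the training process. Extensive simulations show that the proposed transferable deep reinforcement learning framework reduces the AV's obstacle miss-detection probability by up to 67% compared with other conventional deep reinforcement learning approaches.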
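Employing unmanned aerial vehicles (UAVs) has attracted growing interest and has emerged as a state-of-the-art technology for data collection in Internet of Things (IoT) networks. In this paper, with the objective of minimizing the total energy consumption of the UAV-IoT system, we formulate the problem of jointly designing the UAV's trajectory and selecting cluster heads in the IoT network as a constrained combinatorial optimization problem, which is classified as NP-hard and challenging to solve. We propose a novel deep reinforcement learning (DRL) approach with a sequential model strategy that can effectively learn, in an unsupervised manner, a policy represented by a sequence-to-sequence neural network for the UAV's trajectory design. Through extensive simulations, the obtained results show that the proposed DRL method can find UAV trajectories that require much less energy consumption than other baseline algorithms and achieve close-to-optimal performance. In addition, the simulation results show that the model trained by our proposed DRL algorithm generalizes well to larger problem sizes without the need to retrain the model.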
The modern dynamic and heterogeneous network presents agents with differential environments, each with its own state transition probability, which leads to the local strategy trap problem of traditional federated reinforcement learning (FRL) based network optimization algorithms. To solve this problem, we propose a novel Differentiated Federated Reinforcement Learning (DFRL), which evolves the global policy model integration and local inference with the global policy model in traditional FRL into a collaborative learning process with parallel global trends learning and differential local policy model learning. In the DFRL, the local policy learning model is adaptively updated with the global trends model and the local environment, and achieves better differentiated adaptation. We evaluate the proposal against the state-of-the-art FRL in a classical CartPole game with heterogeneous environments. Furthermore, we implement the proposal in the heterogeneous Space-air-ground Integrated Network (SAGIN) for the classical traffic offloading problem. The simulation results show that the proposal achieves better global performance and fairness than the baselines in terms of throughput, delay, and packet drop rate.
Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, from fault-tolerant factory automation to smart cities. When distributed learning is run over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network impact the performance of both distributed learning and the coexisting service. In this paper, we investigate a mixed service scenario where distributed AI workflow and ultra-reliable low latency communication (URLLC) services run concurrently over a network. Consequently, we propose a risk sensitivity-based formulation for device selection to minimize the AI training delays during its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and address it via a framework based on soft actor-critic algorithm. We evaluate our solution with a realistic and 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to the scenario where URLLC solely consumes all network resources.
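The fifth and sixth generations of wireless communication networks are enabling tools such as Internet of Things devices, unmanned aerial vehicles (UAVs), and artificial intelligence to improve the agricultural landscape by using a network of devices to monitor farmland automatically. Surveying a large area requires performing many image classification tasks within a specific period of time in order to prevent damage in case of an incident, such as a fire or flood. UAVs have limited energy and computing power, and may not be able to perform all of the intensive image classification tasks locally and within an appropriate amount of time. Hence, it is assumed that the UAVs can partially offload their workload to nearby multi-access edge computing devices. The UAVs need a decision-making algorithm that decides where the tasks will be performed, while also considering the time constraints and energy levels of the other UAVs in the network. In this paper, we introduce a Deep Q-Learning (DQL) approach to solve this multi-objective problem. The proposed method is compared with Q-learning and three heuristic baselines, and the simulation results show that our proposed DQL-based method achieves comparable results in terms of the UAVs' remaining battery levels and the percentage of deadline violations. In addition, our method is 13 times faster than Q-learning.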
Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.
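For orthogonal multiple access (OMA) systems, the number of served user equipments (UEs) is limited to the number of available orthogonal resources. Non-orthogonal multiple access (NOMA) schemes, on the other hand, allow multiple UEs to use the same orthogonal resource. This extra degree of freedom introduces new challenges for resource allocation. Buffer state information (BSI), such as the size and age of packets waiting for transmission, can be used to improve scheduling in OMA systems. In this paper, we investigate the impact of BSI on the performance of a centralized scheduler in an uplink multi-carrier NOMA scenario in which UEs have various data rate and latency requirements. To handle the large combinatorial space of allocating UEs to resources, we propose a novel scheduler based on actor-critic reinforcement learning that incorporates BSI. Training and evaluation are carried out using Nokia's "wireless suite". We propose various novel techniques to both stabilize and speed up training. The proposed scheduler outperforms benchmark schedulers.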
Connectivity-aware path design is crucial for the effective deployment of autonomous Unmanned Aerial Vehicles (UAVs). Recently, Reinforcement Learning (RL) algorithms have become a popular approach to solving this type of complex problem, but RL algorithms suffer from slow convergence. In this paper, we propose a Transfer Learning (TL) approach, where we use a teacher policy previously trained in an old domain to boost the path learning of the agent in the new domain. As the exploration processes and the training continue, the agent refines the path design in the new domain based on its subsequent interactions with the environment. We evaluate our approach considering an old domain at sub-6 GHz and a new domain at millimeter wave (mmWave). The teacher path policy, previously trained at sub-6 GHz, is the solution to a connectivity-aware path problem that we formulate as a constrained Markov Decision Process (CMDP). We employ a Lyapunov-based model-free Deep Q-Network (DQN) to solve the path design at sub-6 GHz that guarantees connectivity constraint satisfaction. We empirically demonstrate the effectiveness of our approach for different urban environment scenarios. The results demonstrate that our proposed approach is capable of reducing the training time considerably at mmWave.
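Multi-access edge computing (MEC) is an emerging computing paradigm that extends cloud computing to the network edge to support resource-intensive applications on mobile devices. As a key problem in MEC, service migration must decide how to migrate user services so as to maintain quality of service as users roam between MEC servers with limited coverage and capacity. However, finding an optimal migration policy is intractable because of the dynamic MEC environment and user mobility. Many existing studies make centralized migration decisions based on complete system-level information, which is time-consuming and lacks desirable scalability. To address these challenges, we propose a novel learning-driven method that is user-centric and can make effective online migration decisions using incomplete system-level information. Specifically, the service migration problem is modeled as a partially observable Markov decision process (POMDP). To solve the POMDP, we design a new encoder network that combines a long short-term memory (LSTM) network and an embedding matrix to extract hidden information effectively, and we further propose a tailored off-policy actor-critic algorithm for efficient training. Extensive experimental results based on real-world mobility traces show that the new method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms and achieves near-optimal results in various MEC scenarios.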