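Aerial base stations (ABSs) allow smart farms to offload the processing of complex tasks from Internet of Things (IoT) devices to ABSs. IoT devices have limited energy and computing resources, so systems that rely on ABS support need an advanced solution. This paper introduces a novel multi-actor-based risk-sensitive reinforcement learning approach for ABS task scheduling in smart agriculture. The problem is defined as task offloading under the strict condition that IoT tasks complete before their deadlines. The algorithm must also account for the limited energy capacity of the ABSs. The results show that our proposed approach outperforms several heuristics and the classic Q-learning approach. Furthermore, we provide a mixed integer linear programming solution to determine a lower bound on performance and to clarify the gap between our risk-sensitive solution and the optimal solution. Our extensive simulation results demonstrate that our method is a promising approach to providing guaranteed task processing for IoT tasks in a smart farm while increasing the hovering time of the ABSs in that farm.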
translated by 谷歌翻译
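Fifth- and sixth-generation wireless communication networks are enabling tools such as Internet of Things devices, unmanned aerial vehicles (UAVs), and artificial intelligence to improve the agricultural landscape by using a network of devices to monitor farmland automatically. Surveying a large area requires performing many image classification tasks within a specific period of time in order to limit damage in the event of an incident such as a fire or flood. UAVs have limited energy and computing power and may not be able to perform all of the intensive image classification tasks locally and within an appropriate amount of time. Hence, it is assumed that the UAVs can partially offload their workload to nearby multi-access edge computing devices. The UAVs need a decision-making algorithm that decides where each task will be performed while also considering the time constraints and the energy levels of the other UAVs in the network. In this paper, we introduce a deep Q-learning (DQL) approach to solve this multi-objective problem. The proposed method is compared with Q-learning and three heuristic baselines, and the simulation results show that our DQL-based method achieves comparable results in terms of the UAVs' remaining battery levels and the percentage of deadline violations. In addition, our method is 13 times faster than Q-learning.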
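The Q-learning baseline mentioned above rests on a single update rule, which the following minimal sketch illustrates. The state labels, the two actions (`local`, `offload`), the reward value, and the learning parameters are hypothetical and chosen only for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical offloading decision: run the task locally or offload it.
ACTIONS = ("local", "offload")
q_table = defaultdict(float)  # unseen state-action pairs start at 0.0
new_value = q_update(
    q_table, "battery_high", "offload",
    reward=1.0, next_state="battery_mid", actions=ACTIONS,
)
```

A deep Q-learning agent replaces the table with a neural network that approximates the same values, which is what allows it to handle larger state spaces.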
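The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make local decisions to improve network performance in dynamic and uncertain network environments. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to let each network entity, acting as an agent, adaptively learn an optimal decision-making policy by interacting with an unknown environment. However, such algorithms fail to model cooperation or competition among network entities and simply treat other entities as part of the environment, which may cause non-stationarity issues. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various problems in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. Then, we review a number of applications of MARL to emerging problems in the future Internet. These problems include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.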
The deployment flexibility and maneuverability of Unmanned Aerial Vehicles (UAVs) have increased their adoption in various applications, such as wildfire tracking and border monitoring. In many critical applications, UAVs capture images and other sensory data and then send the captured data to remote servers for inference and data processing tasks. However, this approach is not always practical in real-time applications because of connection instability, limited bandwidth, and end-to-end latency. One promising solution is to divide the inference requests into multiple parts (layers or segments), with each part executed on a different UAV based on the available resources. Furthermore, some applications require the UAVs to traverse certain areas and capture incidents; thus, planning their paths becomes critical, particularly to reduce the latency of the collaborative inference process. Specifically, planning the UAVs' trajectories can reduce data transmission latency by keeping communicating devices in close proximity while mitigating transmission interference. This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm while respecting the resource constraints imposed by the computational load and memory usage of the inference requests. The model is formulated as an optimization problem that aims to minimize latency. The formulated problem is NP-hard, so finding the optimal solution is quite complex; thus, this paper introduces a real-time and dynamic solution for online applications using deep reinforcement learning. We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks because of their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems in using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of DL applications for UAV swarms across a wide range of scenarios.
Recent technological advancements in space, air, and ground components have made possible a new network paradigm called the "space-air-ground integrated network" (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, because of the high dynamics and complexity of UAVs, real-world deployment is a major barrier to realizing SAGINs. Compared to the space and terrestrial components, UAVs are expected to meet performance requirements with high flexibility and dynamics using limited resources. Therefore, employing UAVs in various usage scenarios requires well-designed planning in algorithmic approaches. In this paper, we provide a comprehensive review of recent learning-based algorithmic approaches. We consider possible reward functions and discuss the state-of-the-art algorithms for optimizing the reward functions, including Q-learning, deep Q-learning, multi-armed bandit (MAB), particle swarm optimization (PSO), and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, which is applicable to various UAV-assisted missions on a SAGIN using these algorithms. We simulate users and environments according to real-world scenarios and compare the learning-based and PSO-based methods in terms of throughput, load, fairness, computation time, etc. We also implement and evaluate the 2-dimensional (2D) and 3-dimensional (3D) variations of these algorithms to reflect different deployment cases. Our simulation suggests that the 3D satisfaction-based learning algorithm outperforms the other approaches on various metrics in most cases. We discuss some open challenges at the end, and our findings aim to provide design guidelines for algorithm selection when optimizing the deployment of UAV-assisted SAGINs.
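This paper investigates master unmanned aerial vehicle (MUAV)-powered Internet of Things (IoT) networks, in which we propose using a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV and to use the MUAV as a recharging power source. Under the proposed model, we study the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Two optimization problems are formulated, depending on whether there is charging between the two UAVs. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed: centralized training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). The results show that CT-MADDPG can greatly reduce the computing capability required of the UAV hardware, and that the proposed MADDPOC can support low-level multi-agent cooperative learning in continuous action domains, which gives it a clear advantage over existing option-based hierarchical DRL that supports only single-agent learning and discrete actions.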
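We consider a network of smart sensors for edge computing applications that sample a signal of interest and send updates to a base station for remote global monitoring. The sensors are equipped with sensing and computing capabilities and can either send raw data or process them on board before transmission. Limited hardware resources at the edge create a fundamental latency-accuracy trade-off: raw measurements are inaccurate but timely, whereas accurate processed updates become available only after a computational delay. Also, if on-board processing entails data compression, the latency caused by wireless communication might be higher for raw measurements. Hence, one needs to decide when sensors should transmit raw measurements or rely on local processing to maximize overall network performance. To tackle this sensing design problem, we model an estimation-theoretic optimization framework that embeds computation and communication delays, and propose a reinforcement learning-based approach to dynamically allocate computational resources at each sensor. The effectiveness of our proposed approach is validated through numerical simulations with case studies motivated by drones and self-driving vehicles.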
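Unmanned aerial vehicles (UAVs) are used as aerial base stations to relay time-sensitive packets from IoT devices to a nearby terrestrial base station (TBS). Scheduling packets in such UAV-relayed IoT networks so that the IoT devices' packets at the TBS stay fresh (up to date) is a challenging problem, because it involves two simultaneous steps: (i) sampling, by the UAVs, of packets generated at the IoT devices [hop-1] and (ii) updating of the sampled packets from the UAVs to the TBS [hop-2]. To address this problem, we propose age-of-information (AoI) scheduling algorithms for two-hop UAV-relayed IoT networks. First, we propose a low-complexity AoI scheduler, termed MAF-MAD, that uses the Maximum AoI First (MAF) policy for sampling at the UAV (hop-1) and the Maximum AoI Difference (MAD) policy for updating from the UAV to the TBS (hop-2). We prove that MAF-MAD is the optimal AoI scheduler under ideal conditions (lossless wireless channels and generate-at-will traffic generation at the IoT devices). In contrast, for general conditions (lossy channels and varying periodic traffic generation at the IoT devices), a deep reinforcement learning scheduler based on Proximal Policy Optimization (PPO) is proposed. Simulation results show that the proposed PPO-based scheduler outperforms other schedulers such as MAF-MAD, MAF, and round-robin in all considered general scenarios.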
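The two selection rules behind MAF-MAD can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' implementation: it assumes a single UAV, one sampled device and one relayed device per slot, lossless channels, and generate-at-will traffic, and the age dynamics in `step` are an assumed simplification.

```python
def maf_select(age_at_uav):
    """Maximum AoI First: index of the device whose packet at the UAV is oldest."""
    return max(range(len(age_at_uav)), key=lambda i: age_at_uav[i])


def mad_select(age_at_uav, age_at_tbs):
    """Maximum AoI Difference: device whose relay would reduce the TBS age the most."""
    return max(range(len(age_at_tbs)), key=lambda i: age_at_tbs[i] - age_at_uav[i])


def step(age_at_uav, age_at_tbs):
    """Advance one slot under the assumed dynamics and return the new ages."""
    sampled = maf_select(age_at_uav)               # hop-1 decision
    relayed = mad_select(age_at_uav, age_at_tbs)   # hop-2 decision
    new_uav = [a + 1 for a in age_at_uav]          # every packet ages by one slot
    new_tbs = [a + 1 for a in age_at_tbs]
    new_uav[sampled] = 1                           # fresh packet, received one slot later
    new_tbs[relayed] = age_at_uav[relayed] + 1     # TBS now holds the UAV's copy
    return new_uav, new_tbs, sampled, relayed


uav_ages, tbs_ages, sampled, relayed = step([3, 1, 2], [4, 6, 3])
```

In this example device 0 is sampled because its copy at the UAV is the oldest, while device 1 is relayed because the gap between its age at the TBS and its age at the UAV is the largest.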
With the rapid growth of information generated by smart devices, improving the quality of human life requires various computational paradigms, including the Internet of Things, fog, and cloud. Among these three paradigms, the cloud computing paradigm, as an emerging technology, adds cloud layer services to the edge of the network so that resource allocation operations occur close to the end user, reducing resource processing time and network traffic overhead. Hence, allocating resources so that providers can offer a suitable platform across these computational paradigms is considered a challenge. In general, resource allocation approaches are divided into two groups: auction-based methods (whose goals are to increase profits for service providers and to increase user satisfaction and usability) and optimization-based methods (which target energy, cost, network exploitation, runtime, and reduction of time delay). In this paper, according to the latest scientific achievements, we provide a comprehensive literature study (CLS) of artificial intelligence methods for resource allocation optimization, excluding auction-based methods, in various computing environments such as cloud computing, vehicular fog computing, wireless networks, IoT, vehicular networks, 5G networks, vehicular cloud architecture, machine-to-machine (M2M) communication, train-to-train (T2T) communication networks, and peer-to-peer (P2P) networks. Since deep learning methods are among the most important artificial intelligence methods for resource allocation problems, this paper also covers resource allocation approaches in the mentioned computational environments based on deep reinforcement learning, the Q-learning technique, reinforcement learning, and online learning, as well as classical learning methods such as Bayesian learning, Cummins clustering, and the Markov decision process.
This paper studies a model for online job scheduling in green datacenters. In green datacenters, resource availability depends on the power supply from renewables. Intermittent power supply from renewables leads to intermittent resource availability, inducing job delays (and associated costs). Green datacenter operators must intelligently manage their workloads and available power supply to extract maximum benefits. The scheduler's objective is to schedule jobs on a set of resources to maximize the total value (revenue) while minimizing the overall job delay. A trade-off exists between achieving high job value on the one hand and low expected delays on the other. Hence, the aims of achieving high rewards and low costs are in opposition. In addition, datacenter operators often prioritize multiple objectives, including high system utilization and job completion. To accomplish the opposing goals of maximizing total job value and minimizing job delays, we apply Proportional-Integral-Derivative (PID) Lagrangian methods in deep reinforcement learning to the job scheduling problem in the green datacenter environment. Lagrangian methods are widely used algorithms for constrained optimization problems. We adopt a controls perspective to learn the Lagrange multiplier with proportional, integral, and derivative control, achieving favorable learning dynamics. Feedback control defines cost terms for the learning agent, monitors the cost limits during training, and continuously adjusts the learning parameters to achieve stable performance. Our experiments demonstrate improved performance compared to scheduling policies without the PID Lagrangian methods. Experimental results illustrate the effectiveness of the Constraint Controlled Reinforcement Learning (CoCoRL) scheduler, which simultaneously satisfies multiple objectives.
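The idea of learning the Lagrange multiplier with proportional, integral, and derivative control can be illustrated with a small sketch. The update below follows one common form of the PID Lagrangian method; the class name, the gains, and the cost limit are hypothetical, and the abstract does not state which variant the CoCoRL scheduler uses.

```python
class PIDLagrangeMultiplier:
    """PID-controlled penalty weight for a constrained RL objective (sketch)."""

    def __init__(self, kp, ki, kd, cost_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0      # accumulated constraint violation
        self.prev_cost = None    # cost seen at the previous update

    def update(self, cost):
        """Return the multiplier after observing the latest measured cost."""
        error = cost - self.cost_limit
        # The integral term is clipped at zero so slack does not accumulate.
        self.integral = max(0.0, self.integral + error)
        # The derivative term reacts only to rising cost.
        if self.prev_cost is None:
            derivative = 0.0
        else:
            derivative = max(0.0, cost - self.prev_cost)
        self.prev_cost = cost
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, raw)     # a Lagrange multiplier is never negative


controller = PIDLagrangeMultiplier(kp=1.0, ki=0.5, kd=2.0, cost_limit=10.0)
multipliers = [controller.update(c) for c in (12.0, 13.0, 4.0)]
```

With a multiplier `lam` in hand, a scheduler could train on the penalized signal `reward - lam * cost`, so that the penalty grows while the delay cost exceeds its limit and falls to zero once the constraint is met.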
Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services that require low delay and high accuracy. Sampling rate adaption, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is the key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading, and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since a CMDP cannot be directly solved by general reinforcement learning (RL) algorithms because of the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with high probability.
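A Lyapunov transformation of this kind typically tracks the long-term constraint with a virtual queue and folds it into a per-slot reward. The sketch below shows the standard drift-plus-penalty construction for a single service; the function names, the trade-off weight `v`, and the numbers are assumptions for illustration, since the abstract does not give the paper's exact formulation.

```python
def update_virtual_queue(queue, accuracy, accuracy_target):
    """Accumulate the accuracy deficit; the backlog never drops below zero."""
    return max(queue + accuracy_target - accuracy, 0.0)


def per_slot_reward(queue, delay, accuracy, accuracy_target, v=1.0):
    """Negative drift-plus-penalty cost for one time slot.

    v weights the delay penalty against the constraint backlog: a larger v
    favors low delay, a smaller v favors meeting the accuracy target sooner.
    """
    return -(v * delay + queue * (accuracy_target - accuracy))


# A slot with accuracy below the target grows the queue.
queue = update_virtual_queue(0.0, accuracy=0.7, accuracy_target=0.8)
# A later slot with accuracy above the target drains it again.
drained = update_virtual_queue(queue, accuracy=0.95, accuracy_target=0.8)
reward = per_slot_reward(2.0, delay=0.5, accuracy=0.7, accuracy_target=0.8, v=10.0)
```

Keeping the virtual queue stable over time corresponds to satisfying the long-term average accuracy requirement, which is why an unconstrained RL agent can be trained on the per-slot reward.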
Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.
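In this work, we optimize the 3D trajectory of an unmanned aerial vehicle (UAV)-based portable access point (PAP) that provides wireless services to a set of ground nodes (GNs). Moreover, in line with the Peukert effect, we consider a practical nonlinear discharge model for the UAV battery. We thus formulate the problem in a novel manner, as the maximization of a fairness-based energy efficiency metric named fair energy efficiency (FEE). The FEE metric defines a system that places importance on both per-user service fairness and the energy efficiency of the PAP. The formulated problem is non-convex with intractable constraints. To obtain a solution, we represent the problem as a Markov decision process (MDP) with continuous state and action spaces. Considering the complexity of the solution space, we use the twin delayed deep deterministic policy gradient (TD3) actor-critic deep reinforcement learning (DRL) framework to learn a policy that maximizes the FEE of the system. We perform two types of RL training to demonstrate the effectiveness of our approach: the first (offline) approach keeps the positions of the GNs fixed throughout the training phase; the second approach generalizes the learned policy to any arrangement of GNs by changing their positions after each training episode. Numerical evaluations show that neglecting the Peukert effect overestimates the air time of the PAP, and that this can be addressed by optimally selecting the PAP's flying speed. Moreover, user fairness, energy efficiency, and hence the FEE value of the system can be improved by efficiently moving the PAP above the GNs. As a result, we observe FEE improvements over baseline scenarios of up to 88.31%, 272.34%, and 318.13% for suburban, urban, and dense urban environments, respectively.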
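The claim that ignoring the Peukert effect overestimates air time can be checked with Peukert's law, t = H (C / (I H))^k, where C is the capacity rated at an H-hour discharge, I is the actual current, and k is the Peukert constant. The sketch below compares an ideal battery (k = 1) with a non-ideal one; the battery figures and the value of k are hypothetical and are not taken from the paper.

```python
def peukert_discharge_time(capacity_ah, rated_hours, current_a, k):
    """Discharge time in hours under Peukert's law.

    capacity_ah: rated capacity in ampere-hours at the rated discharge time
    rated_hours: discharge time at which the capacity was rated
    current_a:   actual discharge current in amperes
    k:           Peukert constant (1.0 for an ideal battery, larger otherwise)
    """
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k


# Hypothetical UAV battery: 10 Ah rated at a 1-hour discharge, drawing 20 A.
ideal_hours = peukert_discharge_time(10.0, 1.0, 20.0, k=1.0)
peukert_hours = peukert_discharge_time(10.0, 1.0, 20.0, k=1.2)
overestimate_hours = ideal_hours - peukert_hours
```

Because the exponent penalizes currents above the rated value, the ideal model always predicts a longer flight in this regime, and the gap widens as the current draw increases, which is consistent with the role of flying speed noted in the abstract.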
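The use of unmanned aerial vehicles (UAVs) has attracted growing interest and has emerged as a state-of-the-art technology for data collection in Internet of Things (IoT) networks. In this paper, with the objective of minimizing the total energy consumption of the UAV-IoT system, we formulate the joint design of the UAV's trajectory and the selection of cluster heads in the IoT network as a constrained combinatorial optimization problem, which is classified as NP-hard and is challenging to solve. We propose a novel deep reinforcement learning (DRL) approach with a sequential model strategy that can effectively learn, in an unsupervised manner, a policy represented by a sequence-to-sequence neural network for the UAV's trajectory design. Extensive simulations show that, compared with other baseline algorithms, the proposed DRL method finds UAV trajectories that require much less energy and achieves near-optimal performance. In addition, simulation results show that the model trained by our proposed DRL algorithm generalizes very well to larger problem sizes without the need to retrain the model.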
In recent years, the exponential proliferation of smart devices and their intelligent applications has posed severe challenges to conventional cellular networks. Such challenges can potentially be overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends in both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resource integration. Then, we discuss the integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond-5G networks, such as 6G.
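Although deep neural networks (DNNs) have become the backbone technology of many ubiquitous applications, deploying them on resource-constrained machines such as Internet of Things (IoT) devices remains challenging. To meet the resource requirements of this paradigm, collaborative deep inference across IoT devices was introduced. However, distributing a DNN exposes it to severe data leakage. Various threats have been presented, including black-box attacks, in which malicious participants can recover arbitrary inputs fed into their devices. Although many countermeasures aim to achieve privacy-preserving DNNs, most of them incur additional computation and lower accuracy. In this paper, we present an approach that targets the security of collaborative deep inference by rethinking the distribution strategy, without sacrificing model performance. In particular, we examine the different DNN partitions that make the model susceptible to black-box threats and derive the amount of data that should be allocated to each device to hide the properties of the original input. We formulate this approach as an optimization problem in which we establish a trade-off between the latency of co-inference and the privacy level of the data. Next, to relax the optimal solution, we shape our approach as a reinforcement learning (RL) design that supports heterogeneous devices as well as multiple DNNs and datasets.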
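The fog radio access network (F-RAN) is a promising technology in which user mobile devices (MDs) can offload computation tasks to nearby fog access points (F-APs). Because the resources of F-APs are limited, it is important to design an efficient task offloading scheme. In this paper, by considering a time-varying network environment, a dynamic computation offloading and resource allocation problem in F-RANs is formulated to minimize the task execution delay and energy consumption of MDs. To solve the problem, a federated deep reinforcement learning (DRL)-based algorithm is proposed, in which the deep deterministic policy gradient (DDPG) algorithm performs computation offloading and resource allocation in each F-AP. Federated learning is used to train the DDPG agents in order to reduce the computational complexity of the training process and protect user privacy. Simulation results show that, compared with other existing strategies, the proposed federated DDPG algorithm achieves lower task execution delay and energy consumption for MDs more quickly.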
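Network slicing allows mobile network operators to virtualize their infrastructure and provide customized slices that support various use cases with heterogeneous requirements. Online deep reinforcement learning (DRL) has shown promising potential for solving network problems and eliminating the simulation-to-reality discrepancy. However, optimizing cross-domain resources with online DRL is challenging, because the random exploration of DRL violates the service level agreements (SLAs) of slices and the resource constraints of the infrastructure. In this paper, we propose OnSlicing, an online end-to-end network slicing system that achieves minimal resource usage while satisfying the slices' SLAs. OnSlicing allows individualized learning for each slice and maintains its SLA by using a novel constraint-aware policy update method and a proactive baseline switching mechanism. OnSlicing complies with the resource constraints of the infrastructure through mechanisms in the slices and parameter coordination in the infrastructure. OnSlicing further mitigates the poor performance of online learning during the early learning stage by imitating a rule-based solution. In addition, we design four new domain managers that enable dynamic resource configuration in the radio access, transport, core, and edge networks, respectively, at a subsecond timescale. We implement OnSlicing on an end-to-end slicing testbed based on OpenAirInterface with both 4G LTE and 5G NR, the OpenDaylight SDN platform, and the OpenAir-CN core network. The experimental results show that OnSlicing achieves a 61.3% reduction in usage compared with the rule-based solution and maintains nearly zero violations (0.06%) throughout the online learning phase. Once online learning has converged, OnSlicing reduces usage by 12.5% without any violations compared with the state-of-the-art online DRL solution.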
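Event handling is the cornerstone of a dynamic and responsive Internet of Things (IoT). Recent approaches in this area are based on representational state transfer (REST) principles, which allow event handling tasks to be placed on any device that follows the same principles. However, the tasks should be properly distributed among edge devices to ensure fair resource utilization and guarantee seamless execution. This paper investigates the use of deep learning to distribute the tasks fairly. An attention-based neural network model is proposed to generate efficient load balancing solutions under different scenarios. The proposed model is based on the Transformer and Pointer Network architectures and is trained with an advantage actor-critic reinforcement learning algorithm. The model is designed to scale with the number of event handling tasks and the number of edge devices, with no need for hyperparameter re-tuning or even retraining. Extensive experimental results show that the proposed model outperforms conventional heuristics on many key performance indicators. The generic design and the obtained results show that the proposed model can potentially be applied to several other load balancing problem variations, which makes the proposal an attractive option for real-world scenarios because of its scalability and efficiency.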