Unmanned aerial vehicles (UAVs) promise to become an intrinsic part of next-generation communications, as they can be deployed to provide wireless connectivity to ground users and thereby supplement existing terrestrial networks. Most existing research into cellular coverage from UAV access points considers rotary-wing UAV designs (i.e., quadcopters). However, we expect fixed-wing UAVs to be better suited to connectivity purposes in scenarios that require long flight times (such as rural coverage), owing to their energy advantage over rotary-wing designs. Because fixed-wing UAVs typically cannot hover in place, their deployment optimization involves optimizing their individual flight trajectories in a manner that allows them to deliver high-quality service to ground users in an energy-efficient way. In this paper, we propose a multi-agent deep reinforcement learning approach to optimize the energy efficiency of fixed-wing UAV cellular access points while allowing them to deliver high-quality service to ground users. In our decentralized approach, each UAV is equipped with a Dueling Deep Q-Network (DDQN) agent that adjusts the UAV's 3D trajectory over a series of time steps. By coordinating with their neighbors, the UAVs adjust their individual flight trajectories in a manner that optimizes the total system energy efficiency. We benchmark our approach against a series of heuristic trajectory planning strategies and demonstrate that our method can improve the system energy efficiency by up to 70%.
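For readers unfamiliar with the dueling architecture mentioned above, the following is a minimal sketch of a dueling Q-network in PyTorch; the state encoding, action set, and layer sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Illustrative dueling Q-network: a shared encoder splits into
    state-value and action-advantage streams, recombined as
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.encoder(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

# Hypothetical per-UAV usage: the state could encode the UAV's position,
# battery level, and neighbor reports; actions could be discrete 3D moves.
net = DuelingDQN(state_dim=10, n_actions=7)
q_values = net(torch.randn(1, 10))  # shape: (1, 7)
```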
Unmanned aerial vehicles acting as aerial base stations (UAV-BSs) can be deployed to provide wireless connectivity to ground devices in events of increased network demand, points of failure in existing infrastructure, or disasters. However, it is challenging to conserve the energy of UAVs given their limited on-board battery capacity. Reinforcement learning (RL) approaches have previously been used to improve the energy utilization of multiple UAVs; however, they assume a central cloud controller with full knowledge of the end devices' locations, i.e., the controller periodically scans and sends updates for UAV decision-making. This assumption is impractical in dynamic network environments where the UAVs serve mobile ground devices. To address this problem, we propose a decentralized Q-learning approach, in which each UAV-BS is equipped with an autonomous agent that maximizes the connectivity of mobile ground devices while improving its energy utilization. Experimental results show that the proposed design significantly outperforms the centralized approach in jointly maximizing the number of connected ground devices and the energy utilization of the UAV-BSs.
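A minimal sketch of the per-agent tabular Q-learning update such a decentralized design might use; the action set, reward shaping, and hyperparameter values are illustrative assumptions.

```python
import random
from collections import defaultdict

# One independent Q-table per UAV-BS agent; states and actions are
# illustrative (e.g., a discretized grid cell and discrete movement choices).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["north", "south", "east", "west", "hover"]

q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state):
    """Epsilon-greedy action selection over this agent's own table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)

def update(state, action, reward, next_state):
    """Standard Q-learning bootstrap; the reward would combine the number
    of connected devices and energy use, as the abstract describes."""
    best_next = max(q_table[next_state].values())
    td_target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])
```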
Providing reliable connectivity to cellular-connected UAVs can be very challenging, as their performance highly depends on the nature of the surrounding environment, such as the density and heights of the ground base stations (BSs). On the other hand, tall buildings may block undesired interference signals from ground BSs, thereby improving the connectivity between the UAVs and their serving BSs. To address the connectivity of UAVs in such environments, this paper proposes an RL algorithm to dynamically optimize the height of a UAV as it moves through the environment, with the goal of increasing the throughput it experiences. The proposed solution is evaluated using measurements obtained from experiments at two different locations in the city center of Dublin, Ireland. In the first scenario the UAV connects to macro cells, whereas in the second scenario the UAV associates to different small cells in a two-tier mobile network. Results show that the proposed solution achieves throughput gains of between 6% and 41% compared with baseline approaches.
In this work, we optimize the 3D trajectory of an unmanned aerial vehicle (UAV)-based portable access point (PAP) that provides wireless services to a set of ground nodes (GNs). Moreover, we consider the practical non-linear discharge of the UAV battery according to the Peukert effect. We thus formulate the problem in a novel manner that represents the maximization of a fairness-based energy efficiency metric, termed fair energy efficiency (FEE). The FEE metric defines a system that gives high importance to both per-user service fairness and the energy efficiency of the PAP. The formulated problem takes the form of a non-convex problem with intractable constraints. To obtain a solution, we represent the problem as a Markov decision process (MDP) with continuous state and action spaces. Considering the complexity of the solution space, we use the twin delayed deep deterministic policy gradient (TD3) actor-critic deep reinforcement learning (DRL) framework to learn a policy that maximizes the FEE of the system. We perform two types of RL training to show the effectiveness of our approach: the first (offline) approach keeps the positions of the GNs the same throughout the training phase; the second approach generalizes the learned policy to any arrangement of GNs by changing their positions after each training episode. Numerical evaluations show that neglecting the Peukert effect overestimates the air time of the PAP, which can be addressed by optimally selecting the PAP's flying speed. Moreover, the user fairness, the energy efficiency, and hence the FEE value of the system can be improved by efficiently moving the PAP above the GNs. Accordingly, we observe gains of up to 88.31%, 272.34%, and 318.13% over the baseline scenarios for suburban, urban, and dense urban environments, respectively.
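The Peukert effect referenced here can be illustrated with the standard Peukert relation, t = H (C / (I·H))^k, where C is the rated capacity over H hours, I is the discharge current, and k > 1 is the Peukert exponent. A short sketch (with assumed parameter values) of why a linear battery model overestimates flight time:

```python
def peukert_discharge_time(capacity_ah: float, current_a: float,
                           k: float = 1.3, rated_hours: float = 1.0) -> float:
    """Standard Peukert relation: t = H * (C / (I * H))**k.
    For k > 1, usable flight time shrinks super-linearly as the draw
    current rises, which is why a linear model overestimates the PAP's
    air time. All parameter values here are illustrative."""
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k

# Example: a 22 Ah pack drained at 30 A vs. the naive linear estimate.
print(peukert_discharge_time(22.0, 30.0))  # ~0.67 h, Peukert-aware
print(22.0 / 30.0)                         # ~0.73 h, linear (optimistic)
```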
Recent technological advancements in space, air and ground components have made possible a new network paradigm called "space-air-ground integrated network" (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, due to UAVs' high dynamics and complexity, real-world deployment becomes a major barrier to realizing such SAGINs. Compared to the space and terrestrial components, UAVs are expected to meet performance requirements with high flexibility and dynamics using limited resources. Therefore, employing UAVs in various usage scenarios requires well-designed planning in algorithmic approaches. In this paper, we provide a comprehensive review of recent learning-based algorithmic approaches. We consider possible reward functions and discuss the state-of-the-art algorithms for optimizing them, including Q-learning, deep Q-learning, multi-armed bandit (MAB), particle swarm optimization (PSO) and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, which is applicable to various UAV-assisted missions on a SAGIN using these algorithms. We simulate users and environments according to real-world scenarios and compare the learning-based and PSO-based methods in terms of throughput, load, fairness, computation time, etc. We also implement and evaluate the 2-dimensional (2D) and 3-dimensional (3D) variations of these algorithms to reflect different deployment cases. Our simulations suggest that the 3D satisfaction-based learning algorithm outperforms the other approaches for various metrics in most cases. We discuss some open challenges at the end, and our findings aim to provide design guidelines for algorithm selection while optimizing the deployment of UAV-assisted SAGINs.
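As one concrete representative of the MAB family covered by this survey, a minimal UCB1 implementation might look as follows; mapping the arms to UAV decisions (e.g., candidate placements) is an assumption for illustration.

```python
import math

class UCB1:
    """Minimal UCB1 bandit, one representative of the MAB algorithms the
    survey covers; arms could map to, e.g., candidate UAV placements."""

    def __init__(self, n_arms: int):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select(self) -> int:
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm  # play every arm once first
        total = sum(self.counts)
        ucb = [v + math.sqrt(2 * math.log(total) / c)
               for v, c in zip(self.values, self.counts)]
        return ucb.index(max(ucb))

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n  # running mean
```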
This paper investigates a master UAV (MUAV)-powered Internet of Things (IoT) network, in which we propose using a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV and to leverage the MUAV as a recharging power source. Under the proposed model, we investigate the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Two optimization problems are formulated, depending on whether there is charging between the two UAVs. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed, namely centralized training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). It is shown that CT-MADDPG can substantially reduce the requirement on the computing capability of the UAV hardware, and the proposed MADDPOC is able to support low-level multi-agent cooperative learning in continuous action domains, which has advantages over the existing option-based hierarchical DRL that supports only single-agent learning and discrete actions.
Employing unmanned aerial vehicles (UAVs) has attracted growing interest and has emerged as the state-of-the-art technology for data collection in Internet of Things (IoT) networks. In this paper, with the objective of minimizing the total energy consumption of a UAV-IoT system, we formulate the problem of jointly designing the UAV's trajectory and selecting cluster heads in the IoT network as a constrained combinatorial optimization problem, which is classified as NP-hard and challenging to solve. We propose a novel deep reinforcement learning (DRL) approach with a sequential model strategy that can effectively learn the policy, represented by a sequence-to-sequence neural network, for the UAV's trajectory design in an unsupervised manner. Through extensive simulations, the obtained results show that the proposed DRL method can find the UAV's trajectory that requires much less energy consumption than other baseline algorithms and achieves close-to-optimal performance. In addition, simulation results show that the trained model of our proposed DRL algorithm has an excellent generalization ability to larger problem sizes without the need to retrain the model.
Unmanned aerial vehicle (UAV) swarms are considered as a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and the ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of various DL applications for UAV swarms across extensive scenarios.
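As a concrete instance of the FL algorithms this survey covers, a minimal FedAvg aggregation step might look like the following; representing models as flat lists of floats and the chosen client sizes are simplifications for illustration.

```python
def fedavg(client_weights, client_sizes):
    """Minimal FedAvg: average each client's model parameters weighted by
    its local data size -- one aggregation round of the FL scheme surveyed.
    Models are flat float lists here purely for illustration."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

# Three UAVs with different amounts of local data contribute one round.
w_new = fedavg([[0.1, 0.5], [0.3, 0.1], [0.2, 0.4]], [100, 50, 50])
print(w_new)  # [0.175, 0.375]
```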
Fast and reliable connectivity is essential to enhance situational awareness and operational efficiency for public safety mission-critical (MC) users. In emergency or disaster circumstances, where existing cellular network coverage and capacity may not be available to meet MC communication demands, deployable-network-based solutions such as cells-on-wheels/wings can be utilized swiftly to ensure reliable connectivity for MC users. In this paper, we consider a scenario where a macro base station (BS) is destroyed due to a natural disaster and an unmanned aerial vehicle carrying a BS (UAV-BS) is set up to provide temporary coverage for users in the disaster area. The UAV-BS is integrated into the mobile network using the 5G integrated access and backhaul (IAB) technology. We propose a framework and signaling procedure for applying machine learning to this use case. A deep reinforcement learning algorithm is designed to jointly optimize the access and backhaul antenna tilt as well as the three-dimensional location of the UAV-BS, in order to best serve the on-ground MC users while maintaining a good backhaul connection. Our results show that the proposed algorithm can autonomously navigate and configure the UAV-BS to improve the throughput and reduce the drop rate of MC users.
The connectivity-aware path design is crucial in the effective deployment of autonomous Unmanned Aerial Vehicles (UAVs). Recently, Reinforcement Learning (RL) algorithms have become the popular approach to solving this type of complex problem, but RL algorithms suffer from slow convergence. In this paper, we propose a Transfer Learning (TL) approach, where we use a teacher policy previously trained in an old domain to boost the path learning of the agent in the new domain. As the exploration process and the training continue, the agent refines the path design in the new domain based on the subsequent interactions with the environment. We evaluate our approach considering an old domain at sub-6 GHz and a new domain at millimeter Wave (mmWave). The teacher path policy, previously trained at sub-6 GHz, is the solution to a connectivity-aware path problem that we formulate as a constrained Markov Decision Process (CMDP). We employ a Lyapunov-based model-free Deep Q-Network (DQN) to solve the path design at sub-6 GHz that guarantees connectivity constraint satisfaction. We empirically demonstrate the effectiveness of our approach for different urban environment scenarios. The results demonstrate that our proposed approach is capable of reducing the training time considerably at mmWave.
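A minimal sketch of the transfer step described here, assuming identical network architectures in both domains: the teacher's sub-6 GHz weights warm-start the mmWave student, which is then fine-tuned through further interaction. The dimensions and the epsilon-greedy wrapper are illustrative assumptions, not the paper's exact design.

```python
import copy
import torch
import torch.nn as nn

def build_qnet(state_dim: int, n_actions: int) -> nn.Module:
    # Same architecture in both domains so the weights transfer directly.
    return nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                         nn.Linear(128, n_actions))

# Teacher trained at sub-6 GHz (training loop omitted); the student starts
# from the teacher's weights instead of a random initialization and is
# fine-tuned in the mmWave domain.
teacher = build_qnet(state_dim=8, n_actions=5)
student = copy.deepcopy(teacher)  # warm start

def act(qnet: nn.Module, state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy; early exploration refines the transferred policy."""
    if torch.rand(()) < epsilon:
        return torch.randint(0, 5, ()).item()
    return qnet(state).argmax().item()
```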
We describe the orchestration of a decentralized swarm of rotary-wing UAV relays that augments the coverage and service capabilities of a terrestrial base station. Our goal is to minimize the time-averaged latency involved in servicing transmission requests from ground users under Poisson arrivals, subject to an average UAV power constraint. Equipped with rate adaptation to efficiently exploit air-to-ground channel stochastics, we first formulate the optimal control policy for a single relay via a semi-Markov decision process, with competitive swarm optimization for the UAV trajectory design. Accordingly, we detail a multiscale decomposition of this construction: outer decisions on radial wait velocities and end positions optimize the expected long-term delay-power trade-off; consequently, inner decisions on angular wait velocities, service schedules, and UAV trajectories greedily minimize the instantaneous delay-power cost. Next, generalizing to UAV swarms via replication and consensus-driven command-and-control, this policy is embedded with spread maximization and conflict resolution heuristics. We demonstrate that our framework offers superior performance with respect to average service latencies and average per-UAV power consumption: data payload delivery is 11x faster relative to static UAV deployments and 2x faster than a deep-Q network solution; notably, a single relay under our scheme outperforms three relays under a joint successive convex approximation policy by 62%.
Fifth- and sixth-generation wireless communication networks are enabling tools such as Internet of Things devices, unmanned aerial vehicles (UAVs), and artificial intelligence to improve the agricultural landscape, using a network of devices to automatically monitor farmlands. Surveying a large area requires performing numerous image classification tasks within a specific period of time in order to prevent damage in the event of an incident, such as a fire or flood. UAVs have limited energy and computing power, and may not be able to perform all of the intensive image classification tasks locally and within an appropriate amount of time. Hence, it is assumed that the UAVs are able to partially offload their workload to nearby multi-access edge computing devices. The UAVs need a decision-making algorithm that determines where tasks will be performed, while also considering the time constraints and energy levels of the other UAVs in the network. In this paper, we introduce a deep Q-learning (DQL) approach to solve this multi-objective problem. The proposed method is compared with Q-learning and three heuristic baselines, and the simulation results show that our proposed DQL-based method achieves comparable results in terms of the UAVs' remaining battery levels and the percentage of violated deadlines. In addition, our method is able to reach convergence 13 times faster than Q-learning.
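The trade-off the DQL agent learns can be illustrated with a toy cost model: estimated completion time and energy for local execution versus offloading. The actual decision in the paper is learned rather than rule-based, and all parameters below are assumptions for illustration.

```python
def offload_decision(task_cycles: float, cpu_hz: float, data_bits: float,
                     rate_bps: float, edge_hz: float, deadline_s: float,
                     battery_j: float, e_per_cycle: float,
                     tx_power_w: float) -> str:
    """Toy rule in the spirit of the abstract: estimate local vs. offloaded
    completion time and energy, then prefer the option that meets the
    deadline with the least energy. Illustrative only; the paper's DQL
    agent learns this trade-off instead of applying a fixed rule."""
    t_local = task_cycles / cpu_hz
    e_local = task_cycles * e_per_cycle
    t_off = data_bits / rate_bps + task_cycles / edge_hz  # upload + compute
    e_off = tx_power_w * (data_bits / rate_bps)           # radio energy only
    feasible = [(e, t, name) for e, t, name in
                [(e_local, t_local, "local"), (e_off, t_off, "offload")]
                if t <= deadline_s and e <= battery_j]
    return min(feasible)[2] if feasible else "drop"

print(offload_decision(2e9, 1e9, 8e6, 1e7, 1e10, 1.5, 50.0, 1e-8, 0.5))
```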
The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of involved network entities. Each entity may need to make its local decisions to improve the network performance under dynamic and uncertain network environments. Standard learning algorithms, such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL), have recently been used to enable each network entity, as an agent, to adaptively learn an optimal decision-making policy through interacting with the unknown environment. However, such algorithms fail to model the cooperation or competition among network entities and simply treat other entities as a part of the environment, which may result in the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we thus review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL, as well as a comprehensive survey of its applications in the next-generation Internet. Specifically, we first introduce single-agent RL and MARL. Then, we review a number of applications of MARL for solving emerging issues in the future Internet, including network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.
The deployment flexibility and maneuverability of Unmanned Aerial Vehicles (UAVs) increased their adoption in various applications, such as wildfire tracking, border monitoring, etc. In many critical applications, UAVs capture images and other sensory data and then send the captured data to remote servers for inference and data processing tasks. However, this approach is not always practical in real-time applications due to the connection instability, limited bandwidth, and end-to-end latency. One promising solution is to divide the inference requests into multiple parts (layers or segments), with each part being executed in a different UAV based on the available resources. Furthermore, some applications require the UAVs to traverse certain areas and capture incidents; thus, planning their paths becomes critical, particularly to reduce the latency of the collaborative inference process. Specifically, planning the UAVs' trajectories can reduce the data transmission latency by communicating with devices in the same proximity while mitigating the transmission interference. This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm while respecting the resource constraints due to the computational load and memory usage of the inference requests. The model is formulated as an optimization problem and aims to minimize latency. The formulated problem is NP-hard, so finding the optimal solution is quite complex; thus, this paper introduces a real-time and dynamic solution for online applications using deep reinforcement learning. We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
Unmanned aerial vehicles (UAVs) are used as aerial base stations to relay time-sensitive packets from IoT devices to a nearby terrestrial base station (TBS). Scheduling packets in such UAV-relayed IoT networks so as to ensure fresh (or up-to-date) packets from the IoT devices at the TBS is a challenging problem, as it involves two simultaneous steps: (i) sampling of the packets generated at the IoT devices by the UAVs [hop-1] and (ii) updating of the sampled packets from the UAVs to the TBS [hop-2]. To address this, we propose age-of-information (AoI) scheduling algorithms for two-hop UAV-relayed IoT networks. First, we propose a low-complexity AoI scheduler, termed MAF-MAD, that employs the maximum AoI first (MAF) policy for sampling at the UAVs (hop-1) and the maximum AoI difference (MAD) policy for updating from the UAVs to the TBS (hop-2). We prove that MAF-MAD is the optimal AoI scheduler under ideal conditions (lossless wireless channels and generate-at-will traffic at the IoT devices). In contrast, for general conditions (lossy channels and varying periodic traffic generation at the IoT devices), a deep reinforcement learning algorithm, namely a proximal policy optimization (PPO)-based scheduler, is proposed. Simulation results show that the proposed PPO-based scheduler outperforms other schedulers such as MAF-MAD, MAF, and round-robin in all the considered general scenarios.
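A minimal sketch of the two MAF-MAD scheduling rules as described: hop-1 samples the device with the maximum AoI at the UAV, and hop-2 forwards the packet with the maximum AoI difference between the TBS-side and UAV-side ages. The dictionary-based state and the precise MAD formulation are illustrative readings of the abstract, not the paper's exact pseudocode.

```python
def maf_sample(aoi_at_uav: dict) -> str:
    """Hop-1 (MAF): the UAV samples the IoT device whose packet age
    at the UAV is currently largest."""
    return max(aoi_at_uav, key=aoi_at_uav.get)

def mad_update(aoi_at_tbs: dict, aoi_at_uav: dict) -> str:
    """Hop-2 (MAD): forward the packet whose delivery would reduce the
    TBS-side age the most, i.e., with the maximum AoI difference."""
    return max(aoi_at_tbs, key=lambda d: aoi_at_tbs[d] - aoi_at_uav[d])

# Illustrative ages (in slots) for three devices.
aoi_uav = {"dev1": 2, "dev2": 7, "dev3": 4}   # freshness at the UAV
aoi_tbs = {"dev1": 9, "dev2": 8, "dev3": 12}  # freshness at the TBS
print(maf_sample(aoi_uav))           # -> dev2 (stalest at the UAV)
print(mad_update(aoi_tbs, aoi_uav))  # -> dev3 (largest age gap, 8 slots)
```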
Unmanned aerial vehicles (UAVs) are one of the technological breakthroughs supporting a variety of services, including communications. UAVs will play a critical role in enhancing the physical-layer security of wireless networks. This paper considers the problem of eavesdropping on the link between a ground user and a UAV that serves as an aerial base station (ABS). The reinforcement learning algorithms Q-learning and deep Q-network (DQN) are proposed for optimizing the position and the transmission power of the ABS to enhance the data rate of the ground user. This increases the secrecy capacity without the system knowing the location of the eavesdropper. Simulation results show the fast convergence of the proposed DQN and the highest secrecy capacity compared with Q-learning and baseline approaches.
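The secrecy capacity being maximized here is conventionally the positive gap between the legitimate and eavesdropper link rates; a hedged sketch of such a reward signal follows (the paper's exact reward may differ).

```python
import math

def secrecy_rate(snr_user: float, snr_eve: float) -> float:
    """Secrecy capacity in bits/s/Hz: the positive part of the gap between
    the legitimate link rate and the eavesdropper link rate,
    C_s = [log2(1 + SNR_u) - log2(1 + SNR_e)]^+ ."""
    return max(0.0, math.log2(1 + snr_user) - math.log2(1 + snr_eve))

# A DQN reward along the abstract's lines could simply be this rate,
# recomputed after each candidate ABS position / transmit-power action.
print(secrecy_rate(snr_user=20.0, snr_eve=3.0))  # ~2.39 b/s/Hz
```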
In this work, we propose a framework for the deployment of an unmanned aerial vehicle (UAV)-based portable access point (PAP) to serve a set of ground nodes (GNs). In addition to the PAP and the GNs, the system consists of a set of intelligent reflecting surfaces (IRSs) mounted on man-made structures to increase the number of bits transmitted per joule of energy consumed, measured as the global energy efficiency (GEE). The GEE trajectory of the PAP is designed by considering the UAV propulsion energy consumption and the Peukert effect of the PAP battery, which represents an accurate battery discharge profile as a non-linear function of the UAV power consumption profile. The GEE trajectory design problem is solved in two phases: in the first phase, the path of the PAP and the feasible positions are found using a multi-layer circle-packing method, and the required IRS phase shift values are calculated using an alternating optimization method that considers the interdependence between the amplitude and phase responses of the IRS elements; in the second phase, the PAP flying velocity and the user scheduling are computed using a novel multi-lap trajectory design algorithm. Numerical evaluations show that: neglecting the Peukert effect overestimates the available flight time of the PAP; beyond a certain threshold, increasing the battery size decreases the available flight time of the PAP; the presence of IRS modules improves the GEE of the system compared with other baseline scenarios; and the multi-lap trajectory saves more energy than a single-lap trajectory developed using a combination of successive convex programming and the Dinkelbach algorithm.
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although there exist some approaches to address this problem, they usually require global channel state information, which is hard to obtain in practice, and get the sub-optimal power allocation policy with high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q-learning (PQL) algorithm for MADRL systems. By introducing regularization terms in the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, and thus the policy updating speed slows down. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that our proposed PQL can learn the desired power control policy from a dynamic environment where the locations of users change episodically and outperform existing DTE MADRL algorithms.
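A hedged sketch of what such a penalty-based loss could look like: the usual TD error plus a regularization term that keeps new Q estimates close to an earlier snapshot, slowing policy updates so that other agents can adapt, as described above. The exact regularizer in the paper may well differ from this form.

```python
import torch
import torch.nn.functional as F

def pql_loss(q_net, target_net, prev_net, batch, gamma=0.99, lam=0.5):
    """Sketch of a penalty-based Q-learning loss: standard TD error plus a
    penalty (weight lam) pulling new Q estimates toward a frozen earlier
    snapshot, which slows policy drift. Assumed form, not the paper's.
    batch = (states, actions, rewards, next_states) with shapes
    (B, state_dim), (B, 1) long, (B, 1), (B, state_dim)."""
    s, a, r, s2 = batch
    q_sa = q_net(s).gather(1, a)
    with torch.no_grad():
        td_target = r + gamma * target_net(s2).max(dim=1, keepdim=True).values
        q_prev = prev_net(s).gather(1, a)  # frozen earlier policy's estimate
    td_loss = F.mse_loss(q_sa, td_target)
    penalty = F.mse_loss(q_sa, q_prev)     # discourage fast policy drift
    return td_loss + lam * penalty
```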
Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To meet these reliability and latency requirements, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that, compared to the proposed baselines, AAHC obtains better solutions with preferable training time.
The importance of ground mobile robots (MRs) and unmanned aerial vehicles (UAVs) within the research community, industry, and society is growing rapidly. Nowadays, many of these agents are equipped with communication systems that are, in some cases, essential to successfully achieve certain tasks. In this context, we have begun to witness the development of a new interdisciplinary research field at the intersection of robotics and communications. The intent of this research field is the integration of UAVs into 5G and 6G communication networks. This research will undoubtedly lead to many important applications in the near future. Nevertheless, one of the main obstacles to the development of this research area is that most researchers address these problems by over-simplifying either the robotics or the communications aspects. This impedes the ability to reach the full potential of this new interdisciplinary research area. In this tutorial, we present some of the modeling tools necessary to address problems involving both robotics and communications from an interdisciplinary perspective. As an illustrative example of such problems, we focus in this tutorial on the problem of communication-aware trajectory planning.