Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the development of the Metaverse. The demand for Metaverse applications, and hence real-time digital twinning of real-world scenes, is increasing. Nevertheless, the replication of 2D physical-world images into 3D virtual-world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in the uplink (UL) and downlink (DL). We consider an asynchronous joint UL-DL scenario where, in the UL stage, the smaller-size physical-world scenes captured by multiple extended reality users (XUs) are uploaded to the Metaverse Console (MC) to be constructed and rendered; in the DL stage, the larger-size 3D virtual-world scenes are transmitted back to the XUs. The computation offloading and channel assignment decisions are optimized in the UL stage, and the MC optimizes power allocation for users assigned a channel in the UL transmission stage. Several problems arise therefrom: (i) an interactive multi-process chain, specifically an Asynchronous Markov Decision Process (AMDP), (ii) joint optimization across multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that, compared to the proposed baselines, AAHC obtains better solutions with favorable training time.
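For intuition about the hybrid-critic idea, the sketch below (Python/PyTorch) shows a critic with separate value heads for the UL and DL reward components of the asynchronous process. The architecture, layer sizes, and the way the heads are combined are our own illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class HybridCritic(nn.Module):
    """Critic with separate value heads for the UL and DL reward components.

    A shared trunk encodes the joint state; each head estimates the value of
    one stage of the asynchronous UL-DL process. All sizes are illustrative.
    """

    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.ul_head = nn.Linear(hidden, 1)   # value of the uplink stage
        self.dl_head = nn.Linear(hidden, 1)   # value of the downlink stage

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        v_ul, v_dl = self.ul_head(h), self.dl_head(h)
        return v_ul, v_dl, v_ul + v_dl  # per-stage values and combined value
```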
Improving the interactivity and interconnectivity between humans is one of the highlights of the Metaverse. The Metaverse relies on a core approach, digital twinning, which is a means of replicating physical-world objects, people, actions, and scenes into the virtual world. Being able to access scenes and information associated with the physical world in real time and under mobility is essential for developing a highly accessible, interactive, and interconnected experience for all users. This development allows users from other locations to access high-quality real-world and up-to-date information about events happening at another location, and to socialize with others in a hyper-interactive manner. Nevertheless, receiving continuous, smooth updates generated by others in the Metaverse is a challenging task due to the data size of the virtual-world graphics and the need for low-latency transmission. With the development of mobile augmented reality (MAR), users can also interact via the Metaverse in a highly interactive manner, even under mobility. Hence, in our work, we consider an environment with users in the moving Internet of Vehicles (IoV) who download real-time virtual-world updates from Metaverse Service Provider Cell Stations (MSPCSs) via wireless communications. We design an environment with multiple cell stations, where users' virtual-world graphics download tasks are handed over between cell stations. As transmission latency is the key concern in receiving virtual-world updates under mobility, our work aims to allocate system resources so as to minimize the total time users spend in vehicles downloading their virtual-world scenes from the cell stations. We utilize deep reinforcement learning and evaluate the performance of the algorithms under different environment configurations. Our work provides a use case of the Metaverse over AI-enabled 6G communications.
The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of involved network entities. Each entity may need to make its own local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms, such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL), have recently been used to enable each network entity, as an agent, to adaptively learn an optimal decision-making policy through interaction with an unknown environment. However, such algorithms fail to model the cooperation or competition among network entities and simply treat other entities as part of the environment, which may lead to the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also other entities' policies. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL, as well as a comprehensive survey of its applications in the next-generation Internet. Specifically, we first introduce single-agent RL and MARL. Then, we review a number of applications of MARL to solve emerging issues in the future Internet, including network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV networks, and network security.
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL-enabled UAV swarms. In summary, this paper provides a comprehensive survey of various DL applications for UAV swarms across a wide range of scenarios.
Recently, deploying deep neural network (DNN) models via collaborative inference, which splits a pre-trained model into two parts and executes them on the user equipment (UE) and the edge server respectively, has become attractive. However, the large intermediate features of DNNs impede flexible decoupling, and existing approaches either focus on the single-UE scenario, or simply define tasks by the required CPU cycles while ignoring the indivisibility of individual DNN layers. In this paper, we study the multi-agent collaborative inference scenario, where a single edge server coordinates the inference of multiple UEs. Our goal is to achieve fast and energy-efficient inference for all UEs. To achieve this goal, we first design a lightweight autoencoder-based method to compress the large intermediate features. We then define tasks based on the inference overhead of DNNs and formulate the problem as a Markov decision process (MDP). Finally, we propose a multi-agent hybrid proximal policy optimization (MAHPPO) algorithm to solve the optimization problem with a hybrid action space. We conduct extensive experiments with different types of networks, and the results show that our method can reduce inference latency by up to 56% and save up to 72% of energy consumption.
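For intuition, a hybrid-action PPO policy can pair a categorical head (here, the DNN split point) with a Gaussian head (here, a continuous resource knob such as transmit power); the joint log-probability used in the clipped PPO objective is then the sum of the two heads' log-probabilities. The sketch below is an assumed layout with placeholder dimensions, not MAHPPO itself.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridPolicy(nn.Module):
    """PPO policy over a hybrid action space: a categorical head picks the
    DNN split point; a Gaussian head sets a continuous resource knob."""

    def __init__(self, obs_dim: int, n_split_points: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.split_logits = nn.Linear(hidden, n_split_points)
        self.power_mu = nn.Linear(hidden, 1)
        self.power_log_std = nn.Parameter(torch.zeros(1))

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        d_split = Categorical(logits=self.split_logits(h))
        d_power = Normal(self.power_mu(h), self.power_log_std.exp())
        return d_split, d_power

# Joint log-probability for the clipped objective: the two sub-actions are
# sampled independently, so their log-probs simply add.
policy = HybridPolicy(obs_dim=10, n_split_points=8)
d_split, d_power = policy(torch.randn(1, 10))
a_split, a_power = d_split.sample(), d_power.sample()
logp = d_split.log_prob(a_split) + d_power.log_prob(a_power).squeeze(-1)
```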
Wireless communication in the terahertz band (0.1–10 THz) is envisioned as one of the key enabling technologies for future sixth-generation (6G) wireless communication systems, beyond massive multiple-input multiple-output (massive MIMO) technology. However, the very high propagation attenuation and molecular absorption at THz frequencies often limit the signal transmission distance and coverage range. Motivated by recent breakthroughs in reconfigurable intelligent surfaces (RIS) for realizing smart radio propagation environments, we propose a novel hybrid beamforming scheme for multi-hop RIS-assisted communication networks to improve coverage at THz-band frequencies. In particular, multiple passive and controllable RISs are deployed to assist transmissions between the base station (BS) and multiple single-antenna users. We investigate the joint design of the digital beamforming matrix at the BS and the analog beamforming matrices at the RISs, leveraging recent advances in deep reinforcement learning (DRL) to combat the propagation loss. To improve the convergence of the proposed DRL-based algorithm, two algorithms are then designed to initialize the digital beamforming and analog beamforming matrices using the alternating optimization technique. Simulation results show that our proposed scheme is able to improve the coverage range of THz communications by 50% compared with the benchmarks. Furthermore, it is also shown that our proposed DRL-based method is a state-of-the-art method to solve the NP-hard beamforming problem, especially when the signals in the RIS-assisted THz communication network experience multiple hops.
The Metaverse can be considered the extension of the present-day web, which integrates the physical and virtual worlds, delivering hyper-realistic user experiences. The inception of the Metaverse brings forth many ecosystem services such as content creation, social entertainment, in-world value transfer, intelligent traffic, and healthcare. These services are compute-intensive and require computation offloading onto a Metaverse edge computing server (MECS). Existing Metaverse edge computing approaches do not efficiently and effectively handle resource allocation to ensure the fluid, seamless and hyper-realistic Metaverse experience required for Metaverse ecosystem services. Therefore, we introduce a new Metaverse-compatible, Unified, User and Task (UUT) centered artificial intelligence (AI)-based mobile edge computing (MEC) paradigm, which serves as a concept upon which future AI control algorithms could be built to develop a more user- and task-focused MEC.
We consider the problem of demand-side energy management, where each household is equipped with a smart meter that can schedule home appliances online. The goal is to minimize the overall cost under a real-time pricing scheme. While previous works have introduced centralized approaches in which the scheduling algorithm has full observability, we propose a formulation of the smart grid environment as a Markov game. Each household is a decentralized agent with partial observability, which allows scalability and privacy preservation in a realistic setting. The price signal produced by the grid operator varies with the energy demand. We propose an extension of an actor-critic algorithm that addresses partial observability and the local view of the environment from the agents' perspective. The algorithm learns a centralized critic that coordinates the training of the decentralized agents. Our approach thus uses centralized learning but decentralized execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of the total energy consumed and the electricity cost of all households, based purely on instantaneous observations and the price signal.
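A minimal sketch of the centralized-learning / decentralized-execution split described here: each household's actor sees only its local observation, while a centralized critic, used only during training, scores the joint observations and actions of all agents. All sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2  # illustrative sizes

class Actor(nn.Module):
    """Decentralized actor: acts on its own household's observation only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint observation-action of all agents
    during training; it is not needed at execution time."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs, all_act], dim=-1))
```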
In this paper, we study a new latency-optimization problem for blockchain-based federated learning (BFL) in multi-server edge computing. In this system model, distributed mobile devices (MDs) communicate with a set of edge servers (ESs) to handle both machine learning (ML) model training and block mining simultaneously. To assist the ML model training of resource-constrained MDs, we develop an offloading strategy that enables MDs to transmit their data to one of the associated ESs. We then propose a new decentralized ML model aggregation solution at the edge layer based on a consensus mechanism to build a global ML model via peer-to-peer (P2P)-based blockchain communications. The blockchain builds trust among MDs and ESs to facilitate reliable ML model sharing and cooperative consensus formation, and enables the rapid elimination of manipulated models caused by poisoning attacks. We formulate latency-aware BFL as an optimization problem aiming to minimize the system latency by jointly considering the data offloading decisions, the MDs' transmit power, channel bandwidth allocation for the MDs' data offloading, the MDs' computation allocation, and hash power allocation. Given the mixed action space of discrete offloading and continuous allocation variables, we propose a novel deep reinforcement learning scheme with a parameterized advantage actor-critic algorithm. We theoretically characterize the convergence properties of BFL in terms of the aggregation delay, mini-batch size, and number of P2P communication rounds. Our numerical evaluation demonstrates the superiority of our proposed scheme over baselines in terms of model training efficiency, convergence rate, system latency, and robustness against model poisoning attacks.
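One common way to structure an actor over such a mixed action space is a discrete head that scores the offloading targets plus a continuous head that emits the allocation parameters attached to each choice. The sketch below is an assumed layout of this kind, not the paper's exact network.

```python
import torch
import torch.nn as nn

class ParameterizedActor(nn.Module):
    """Parameterized-action actor (sketch): a discrete head scores the
    offloading targets (ESs); a continuous head emits the allocation
    parameters (e.g., transmit power, computation share, hash power)
    attached to every discrete choice. Sizes are illustrative."""

    def __init__(self, obs_dim: int, n_es: int, n_params: int, hidden: int = 128):
        super().__init__()
        self.n_es, self.n_params = n_es, n_params
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.choice = nn.Linear(hidden, n_es)             # which ES to offload to
        self.params = nn.Linear(hidden, n_es * n_params)  # params per ES choice

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        logits = self.choice(h)                                   # (B, n_es)
        params = torch.sigmoid(self.params(h))                    # scaled to [0, 1]
        return logits, params.view(-1, self.n_es, self.n_params)
```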
Multi-access edge computing (MEC) is an emerging computing paradigm that extends cloud computing to the network edge to support resource-intensive applications on mobile devices. As a crucial problem in MEC, service migration must decide how to migrate user services so as to maintain quality of service when users roam among MEC servers with limited coverage and capacity. However, finding the optimal migration policy is intractable due to the dynamic MEC environment and user mobility. Many existing studies make centralized migration decisions based on complete system-level information, which is time-consuming and lacks desirable scalability. To address these challenges, we propose a novel learning-driven method that is user-centric and can make effective online migration decisions using incomplete system-level information. Specifically, the service migration problem is modeled as a partially observable Markov decision process (POMDP). To solve the POMDP, we design a new encoder network that combines a long short-term memory (LSTM) and an embedding matrix to effectively extract hidden information, and we further propose a tailored off-policy actor-critic algorithm for efficient training. Extensive experimental results based on real-world mobility traces demonstrate that this new method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms and can achieve near-optimal results in various MEC scenarios.
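A minimal sketch of an LSTM-plus-embedding encoder of the kind described: an embedding matrix turns a discrete server index into a dense vector, and an LSTM summarizes the observation history into a belief vector for the actor-critic heads. The input layout and sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MigrationEncoder(nn.Module):
    """Encoder for the partially observable migration state: embeds the
    discrete current-server index and runs an LSTM over the history."""

    def __init__(self, n_servers: int, feat_dim: int, emb: int = 16, hid: int = 64):
        super().__init__()
        self.server_emb = nn.Embedding(n_servers, emb)
        self.lstm = nn.LSTM(feat_dim + emb, hid, batch_first=True)

    def forward(self, server_ids, features, state=None):
        # server_ids: (B, T) long tensor; features: (B, T, feat_dim)
        x = torch.cat([self.server_emb(server_ids), features], dim=-1)
        out, state = self.lstm(x, state)
        return out[:, -1], state  # belief vector for the actor-critic heads
```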
The deployment flexibility and maneuverability of Unmanned Aerial Vehicles (UAVs) have increased their adoption in various applications, such as wildfire tracking and border monitoring. In many critical applications, UAVs capture images and other sensory data and then send the captured data to remote servers for inference and data processing tasks. However, this approach is not always practical in real-time applications due to connection instability, limited bandwidth, and end-to-end latency. One promising solution is to divide the inference requests into multiple parts (layers or segments), with each part being executed on a different UAV based on the available resources. Furthermore, some applications require the UAVs to traverse certain areas and capture incidents; thus, planning their paths becomes critical, particularly to reduce the latency of the collaborative inference process. Specifically, planning the UAVs' trajectories can reduce the data transmission latency by communicating with devices in the same proximity while mitigating transmission interference. This work aims to design a model for distributed collaborative inference requests and path planning in a UAV swarm while respecting the resource constraints due to the computational load and memory usage of the inference requests. The model is formulated as an optimization problem that aims to minimize latency. The formulated problem is NP-hard, so finding the optimal solution is quite complex; thus, this paper introduces a real-time and dynamic solution for online applications using deep reinforcement learning. We conduct extensive simulations and compare our results to state-of-the-art studies, demonstrating that our model outperforms the competing models.
This paper investigates a master UAV (MUAV)-powered Internet of Things (IoT) network, in which we propose using a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV, while also leveraging the MUAV as a recharging power source. Under the proposed model, we investigate the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Depending on whether there is charging between the two UAVs, two optimization problems are formulated. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed, namely centralized training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). It is shown that CT-MADDPG can greatly reduce the requirement on the computing capability of the UAV hardware, and that the proposed MADDPOC is able to support low-level multi-agent cooperative learning in continuous action domains, which gives it clear advantages over existing option-based hierarchical DRL that supports only single-agent learning and discrete actions.
Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, from fault-tolerant factory automation to smart cities. When distributed learning is run over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network impact the performance of both distributed learning and the coexisting service. In this paper, we investigate a mixed service scenario where a distributed AI workflow and ultra-reliable low latency communication (URLLC) services run concurrently over a network. Consequently, we propose a risk-sensitivity-based formulation for device selection to minimize the AI training delays during its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and address it via a framework based on the soft actor-critic algorithm. We evaluate our solution with a realistic and 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to the scenario where URLLC solely consumes all network resources.
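The abstract does not spell out the risk measure behind the risk-sensitivity-based formulation; a common choice in such settings is the entropic risk, sketched below on synthetic delay samples. Both the measure and the parameter values here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def entropic_risk(delays: np.ndarray, alpha: float = 0.5) -> float:
    """Entropic risk of training-delay samples:
    rho_alpha(X) = (1/alpha) * log E[exp(alpha * X)].
    For alpha > 0 it penalizes the delay tail more than the mean does,
    and it recovers the plain expectation as alpha -> 0."""
    return float(np.log(np.mean(np.exp(alpha * delays))) / alpha)

rng = np.random.default_rng(0)
delays = rng.exponential(scale=2.0, size=10_000)   # synthetic delay samples
print(delays.mean(), entropic_risk(delays, 0.5))   # risk > mean for alpha > 0
```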
Fog radio access network (F-RAN) is a promising technology in which user mobile devices (MDs) can offload computation tasks to nearby fog access points (F-APs). Due to the limited resources of F-APs, it is important to design an efficient task offloading scheme. In this paper, by considering the time-varying network environment, a dynamic computation offloading and resource allocation problem in F-RANs is formulated to minimize the task execution delay and energy consumption of MDs. To solve the problem, a federated deep reinforcement learning (DRL)-based algorithm is proposed, where the deep deterministic policy gradient (DDPG) algorithm performs computation offloading and resource allocation in each F-AP. Federated learning is exploited to train the DDPG agents, in order to decrease the computational complexity of the training process and protect user privacy. Simulation results show that the proposed federated DDPG algorithm can achieve lower task execution delay and energy consumption for the MDs, and more quickly, than other existing strategies.
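Federated training of the per-F-AP DDPG agents presumably aggregates local model weights; below is a minimal FedAvg-style sketch with uniform weighting assumed. The paper's aggregation schedule and weighting may differ.

```python
import torch

def federated_average(agent_state_dicts):
    """FedAvg step for per-F-AP DDPG networks: element-wise average of the
    local weights, to be broadcast back to every agent. Minimal sketch."""
    avg = {}
    for key in agent_state_dicts[0]:
        avg[key] = torch.stack(
            [sd[key].float() for sd in agent_state_dicts]).mean(dim=0)
    return avg

# usage sketch: each F-AP trains its DDPG actor locally, then syncs
# global_weights = federated_average([ap.actor.state_dict() for ap in aps])
# for ap in aps: ap.actor.load_state_dict(global_weights)
```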
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although there exist some approaches to address this problem, they usually require global channel state information, which is hard to obtain in practice, and they only yield sub-optimal power allocation policies with high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems. By introducing regularization terms in the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, and thus the policy updating speed slows down. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that our proposed PQL can learn the desired power control policy from a dynamic environment where the locations of users change episodically, and that it outperforms existing DTE MADRL algorithms.
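As one plausible instantiation of the penalty idea, the sketch below adds a regularizer to a DQN-style TD loss that keeps the new Q-values close to the previous iterate's on visited state-action pairs, which slows policy updates as described; the exact regularization terms used by PQL may differ.

```python
import torch
import torch.nn.functional as F

def pql_loss(q_net, q_prev, batch, gamma=0.99, lam=0.1):
    """Penalty-based Q-learning loss (sketch): the usual TD error plus a
    regularizer pulling the new Q-values toward the previous iterate's,
    so each agent's policy changes slowly and stays predictable to the
    other agents. `lam` trades learning speed against policy stability."""
    s, a, r, s2, done = batch  # tensors: states, actions, rewards, ...
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s2).max(dim=1).values
        q_old = q_prev(s).gather(1, a.unsqueeze(1)).squeeze(1)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return F.mse_loss(q, target) + lam * F.mse_loss(q, q_old)
```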
The high emissions and low energy efficiency of internal combustion engines (ICEs) have become unacceptable under environmental regulations and the energy crisis. As a promising alternative solution, multi-power source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for the development of EMS. RL has received continuous attention and research, but there is still a lack of systematic analysis of the design elements of RL-based EMS. To this end, this paper presents an in-depth analysis of the current research on RL-based EMS (RL-EMS) and summarizes its design elements. This paper first summarizes the previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. The contribution of advanced algorithms to the training effect is shown, the perception and control schemes in the literature are analyzed in detail, different reward function settings are classified, and innovative training methods and their roles are elaborated. Finally, by comparing the development routes of RL and RL-EMS, this paper identifies the gap between advanced RL solutions and existing RL-EMS, and suggests potential directions for implementing advanced artificial intelligence (AI) solutions in EMS.
Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) are a promising passive device that contributes to full-space coverage by simultaneously transmitting and reflecting incident signals. As a new paradigm in wireless communications, how to analyze the coverage and capacity performance of STAR-RISs becomes essential but challenging. To solve the coverage and capacity optimization (CCO) problem in STAR-RIS-assisted networks, a multi-objective proximal policy optimization (MO-PPO) algorithm is proposed to handle long-term benefits, in contrast to conventional optimization algorithms. To strike a balance between the two objectives, the MO-PPO algorithm provides a set of optimal solutions that form a Pareto front (PF), where any solution on the PF is regarded as an optimal result. Moreover, to improve the performance of the MO-PPO algorithm, two update strategies are investigated: an action-value-based update strategy (AVUS) and a loss-function-based update strategy (LFUS). For the AVUS, the improvement is to integrate the action values of both coverage and capacity and then update the loss function. For the LFUS, the improvement is to assign dynamic weights to the loss functions of coverage and capacity, where the weights are calculated by a min-norm solver at each update. Numerical results demonstrate that the investigated update strategies outperform fixed-weight multi-objective optimization algorithms in different cases, including different numbers of sample grids, numbers of STAR-RISs, numbers of elements per STAR-RIS, and STAR-RIS sizes. Moreover, STAR-RIS-assisted networks achieve better performance than conventional wireless networks without STAR-RISs. In addition, with the same bandwidth, millimeter waves are able to provide higher capacity than sub-6 GHz, but with a smaller coverage range.
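For the two-objective case, the min-norm weight computation has a well-known closed form (as in MGDA): minimize the norm of the convex combination of the two loss gradients. The sketch below shows this standard construction; whether it matches the paper's exact solver is an assumption.

```python
import torch

def min_norm_weights(g_cov: torch.Tensor, g_cap: torch.Tensor):
    """Dynamic weights for the coverage and capacity losses (LFUS-style):
    solve min_w ||w * g_cov + (1 - w) * g_cap||^2 over w in [0, 1] in
    closed form. Inputs are the two loss gradients, flattened to 1-D."""
    diff = g_cov - g_cap
    w = torch.dot(g_cap - g_cov, g_cap) / diff.dot(diff).clamp_min(1e-12)
    w = w.clamp(0.0, 1.0)
    return w, 1.0 - w  # weights for the coverage and capacity losses
```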
Research on extending deep reinforcement learning (DRL) to the multi-agent domain has solved many complex problems and achieved significant results. However, almost all of these studies focus only on discrete or continuous action spaces, and few works have used multi-agent deep reinforcement learning for real-world environment problems, which mostly feature hybrid action spaces. Therefore, in this paper, we propose two algorithms, multi-agent hybrid soft actor-critic (MAHSAC) and multi-agent hybrid deep deterministic policy gradient (MAHDDPG), to fill this gap. Both algorithms follow the centralized training and decentralized execution (CTDE) paradigm and can solve hybrid action space problems. Our experiments run on the multi-agent particle environment, a simple multi-agent particle world with some basic simulated physics. The experimental results show that these algorithms achieve good performance.
The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques of data processing and data analysis, and have brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems, which heavily rely on model assumptions, new developments in reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decision making in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We introduce Markov decision processes, the setting of many commonly used RL approaches. Various algorithms are then introduced, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the applications of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and the coordination of autonomous vehicles. The emergence of deep reinforcement learning (DRL) provides a promising approach to multi-agent cooperation through the interaction of agents and environments. However, traditional DRL solutions suffer from high dimensionality during policy search when multiple agents have continuous action spaces. Besides, the dynamics of the agents' policies make the training non-stationary. To tackle these issues, we propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search. In particular, the cooperation of multiple agents can be learned efficiently in the high-level discrete action space. At the same time, the low-level individual control can be reduced to single-agent reinforcement learning. In addition to hierarchical reinforcement learning, we also propose an opponent modeling network that models the other agents' policies during the learning process. In contrast to end-to-end DRL approaches, our approach reduces the learning complexity by decomposing the overall task into sub-tasks in a hierarchical way. To evaluate the efficiency of our approach, we conduct a real-world case study on a cooperative lane change scenario. Both simulation and real-world experiments show the superiority of our approach in collision rate and convergence speed.
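A minimal sketch of an opponent-modeling network of the kind described: a small classifier predicting another agent's next high-level action from the shared observation, so the ego policy can condition on the prediction. The action set and sizes are hypothetical, chosen here to match a lane-change setting.

```python
import torch
import torch.nn as nn

class OpponentModel(nn.Module):
    """Predicts another agent's next high-level action (assumed here:
    keep lane / change left / change right) from the observation; the ego
    policy can condition on this prediction during training."""

    def __init__(self, obs_dim: int, n_actions: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs: torch.Tensor):
        # Log-probabilities; trained with NLL against observed actions.
        return torch.log_softmax(self.net(obs), dim=-1)
```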