Smart Paper Notes

A Bayesian Framework for Digital Twin-Based Control, Monitoring, and Data Collection in Wireless Systems

Clement Ruah, Osvaldo Simeone, Bashir Al-Hashimi

Category: Machine Learning

2022-12-02

Commonly adopted in the manufacturing and aerospace sectors, digital twin (DT) platforms are increasingly seen as a promising paradigm to control, monitor, and analyze software-based, "open", communication systems. Notably, DT platforms provide a sandbox in which to test artificial intelligence (AI) solutions for communication systems, potentially reducing the need to collect data and test algorithms in the field, i.e., on the physical twin (PT). A key challenge in the deployment of DT systems is to ensure that virtual control optimization, monitoring, and analysis at the DT are safe and reliable, avoiding incorrect decisions caused by "model exploitation". To address this challenge, this paper presents a general Bayesian framework with the aim of quantifying and accounting for model uncertainty at the DT that is caused by limitations in the amount and quality of data available at the DT from the PT. In the proposed framework, the DT builds a Bayesian model of the communication system, which is leveraged to enable core DT functionalities such as control via multi-agent reinforcement learning (MARL), monitoring of the PT for anomaly detection, prediction, data-collection optimization, and counterfactual analysis. To exemplify the application of the proposed framework, we specifically investigate a case-study system encompassing multiple sensing devices that report to a common receiver. Experimental results validate the effectiveness of the proposed Bayesian framework as compared to standard frequentist model-based solutions.

Translated by Google Translate

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

Tianxu Li, Kun Zhu, Nguyen Cong Luong, Dusit Niyato, Qihui Wu, Yang Zhang, Bing Chen

Category: Artificial Intelligence | Machine Learning

2021-10-26

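The future Internet involves several emerging technologies, such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make local decisions to improve network performance in a dynamic and uncertain network environment. Standard learning algorithms, such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL), have recently been used to let each network entity act as an agent that adaptively learns an optimal decision-making policy by interacting with an unknown environment. However, such algorithms fail to model cooperation or competition among network entities and simply treat other entities as part of the environment, which may lead to non-stationarity issues. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various problems in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. We then review a number of applications of MARL to emerging problems in the future Internet. These problems include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.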

Scalable and Decentralized Algorithms for Anomaly Detection via Learning-Based Controlled Sensing

Geethu Joseph, Chen Zhong, M. Cenk Gursoy, Senem Velipasalar, Pramod K. Varshney

Category: Machine Learning | Machine Learning (Statistics)

2021-12-08

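We address the problem of selecting and observing processes from a given set in order to find the anomalies among them. At any given time instant, the decision-maker observes a subset of the processes and obtains a noisy binary indicator of whether each corresponding process is anomalous. In this setting, we develop an anomaly detection algorithm that chooses the processes to be observed at a given time instant, decides when to stop taking observations, and declares a decision on the anomalous processes. The objective of the detection algorithm is to identify the anomalies with an accuracy exceeding a desired value while minimizing the decision-making delay. We devise a centralized algorithm, in which the processes are jointly selected by a common agent, and a decentralized algorithm, in which the decision of whether to select a process is made independently for each process. Our algorithms rely on a Markov decision process defined using the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithms using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic, which has exponential complexity in the number of processes, our algorithms have computational and memory requirements that are polynomial in the number of processes. We demonstrate the efficacy of these algorithms through numerical experiments that compare them with state-of-the-art methods.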

Learning Emergent Random Access Protocol for LEO Satellite Networks

Ju-Hyung Lee, Hyowoon Seo, Jihong Park, Mehdi Bennis, Young-Chai Ko

Category: Machine Learning

2021-12-03

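A mega-constellation of low-altitude Earth orbit (LEO) satellites (SATs) is envisaged to provide a global-coverage SAT network in beyond-fifth-generation (5G) cellular systems. LEO SAT networks exhibit extremely long link distances for many users under a time-varying SAT network topology. This makes existing multiple access protocols, such as the random access channel (RACH) based cellular protocol designed for fixed terrestrial network topologies, ill-suited. To overcome this issue, in this paper we propose a novel grant-free random access solution for LEO SAT networks, dubbed the emergent random access channel protocol (ERACH). In stark contrast to existing model-based and standardized protocols, ERACH is a model-free approach that emerges through interaction with the non-stationary network environment, using multi-agent deep reinforcement learning (MADRL). Furthermore, by exploiting known SAT orbiting patterns, ERACH requires no central coordination or additional communication across users, while training convergence is stabilized by the regular orbiting patterns. Compared to RACH, we show through various simulations that the proposed ERACH yields 54.6% higher average network throughput with about two times lower average access delay, while achieving a Jain's fairness index of 0.989.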
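For readers unfamiliar with the fairness metric quoted above, the following is a minimal sketch of Jain's fairness index using its standard definition, (sum of x)^2 / (n * sum of x^2). The function name and the per-user throughput list are illustrative assumptions and are not taken from the paper.

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all users get the same throughput and 1/n when a
    single user gets everything. The input list is hypothetical
    per-user throughput, not data from the paper.
    """
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero throughput")
    return total * total / (n * squares)


if __name__ == "__main__":
    print(jain_index([2.0, 2.0, 2.0, 2.0]))  # perfectly fair allocation
    print(jain_index([1.0, 0.0, 0.0, 0.0]))  # one user takes everything
```

An index of 0.989 therefore indicates that throughput is spread almost evenly across users.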

Device Selection for the Coexistence of URLLC and Distributed Learning Services

Milad Ganjalizadeh, Hossein Shokri Ghadikolaei, Deniz Gündüz, Marina Petrova

Category: Machine Learning

2022-12-22

Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, from fault-tolerant factory automation to smart cities. When distributed learning is run over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network impact the performance of both distributed learning and the coexisting service. In this paper, we investigate a mixed service scenario where a distributed AI workflow and ultra-reliable low latency communication (URLLC) services run concurrently over a network. Consequently, we propose a risk sensitivity-based formulation for device selection to minimize the AI training delays during its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and address it via a framework based on the soft actor-critic algorithm. We evaluate our solution with a realistic and 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to the scenario where URLLC solely consumes all network resources.

Active Sensing for Search and Tracking: A Review

Luca Varotto, Angelo Cenedese, Andrea Cavallaro

Category: Robotics

2021-12-04

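Active position estimation (APE) is the task of localizing one or more targets using one or more sensing platforms. APE is a key task for search and rescue missions, wildlife monitoring, source term estimation, and collaborative mobile robotics. Success in APE depends on the level of cooperation among the sensing platforms, their number, their degrees of freedom, and the quality of the information gathered. APE control laws enable active sensing by satisfying either purely exploitative or purely explorative criteria. The former minimize the uncertainty of the position estimate, whereas the latter drive the platform closer to completing its task. In this paper, we define the main elements of APE in order to systematically classify and critically discuss the state of the art in this domain. We also propose a reference framework as a formalism for classifying APE-related solutions. Overall, this survey explores the principal challenges and envisages the main research directions in the field of autonomous perception systems for localization tasks. It is also intended to promote the development of robust active sensing methods for search and tracking applications.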

Wireless for Machine Learning

Henrik Hellström, José Mairton B. da Silva Jr, Mohammad Mohammadi Amiri, Mingzhe Chen, Viktoria Fodor, H. Vincent Poor, Carlo Fischione

Category: Machine Learning

2020-08-31

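As data generation increasingly takes place on devices without a wired connection, machine learning (ML) related traffic will become ubiquitous in wireless networks. Many studies have shown that traditional wireless protocols are highly inefficient or unsustainable for supporting ML, which creates the need for new wireless communication methods. In this survey, we give an exhaustive review of the state-of-the-art wireless methods that are specifically designed to support ML services over distributed datasets. Currently, there are two clear themes in the literature: analog over-the-air computation and digital radio resource management optimized for ML. This survey gives a comprehensive introduction to these methods, reviews the most important works, highlights open problems, and discusses application scenarios.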

Exploration in Deep Reinforcement Learning: A Comprehensive Survey

Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Jianye Hao, Zhaopeng Meng, Peng Liu, Zhen Wang

Category: Artificial Intelligence | Machine Learning

2021-09-14

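Deep reinforcement learning (DRL) and deep multi-agent reinforcement learning (MARL) have achieved significant success across a wide range of domains, including game AI, autonomous vehicles, and robotics. However, DRL and deep MARL agents are widely known to be sample inefficient: millions of interactions are usually needed even for relatively simple problem settings, which prevents wide application and deployment in real-world scenarios. One bottleneck behind this is the well-known exploration problem, that is, how to efficiently explore the environment and collect informative experiences that benefit policy learning toward the optimal policy. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey of existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Beyond the above two main branches, we also include other notable exploration methods with different ideas and techniques. In addition to the algorithmic analysis, we provide a comprehensive and unified empirical comparison of exploration methods for DRL on a set of commonly used benchmarks. Based on our algorithmic and empirical investigation, we finally summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.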

Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Yahao Ding, Zhaohui Yang, Quoc-Viet Pham, Zhaoyang Zhang, Mohammad Shikh-Bahaei

Category: Machine Learning | Artificial Intelligence

2023-01-03

Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of various DL applications for UAV swarms in extensive scenarios.

Decentralized Federated Reinforcement Learning for User-Centric Dynamic TFDD Control

Ziyan Yin, Zhe Wang, Jun Li, Ming Ding, Wen Chen, Shi Jin

Category: Machine Learning

2022-11-04

The explosive growth of dynamic and heterogeneous data traffic poses great challenges for 5G and beyond mobile networks. To enhance the network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet the asymmetric and heterogeneous traffic demands while alleviating the inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named the federated Wolpertinger deep deterministic policy gradient (FWDDPG) algorithm. The BSs decide their local time-frequency configurations through RL algorithms and achieve global training by exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then utilize the Wolpertinger policy to reduce the mapping errors from the continuous action space back to the discrete action space. Simulation results demonstrate the superiority of our proposed algorithm over benchmark algorithms with respect to system sum rate.
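The continuous-to-discrete mapping mentioned above can be illustrated with a minimal sketch of Wolpertinger-style action selection as it is commonly described in the literature: the actor outputs a continuous proto-action, the k nearest valid discrete actions are retrieved, and the critic picks the best of them. This is not the authors' FWDDPG code. The function name, the toy action set, and the stand-in critic `q_value` are all hypothetical.

```python
def wolpertinger_select(proto_action, discrete_actions, q_value, k=3):
    """Map a continuous proto-action to a discrete action.

    1. Find the k discrete actions closest (Euclidean) to the proto-action.
    2. Return the candidate with the highest critic value.
    `q_value` is a hypothetical stand-in for a trained critic.
    """
    def sq_dist(action):
        return sum((a - p) ** 2 for a, p in zip(action, proto_action))

    candidates = sorted(discrete_actions, key=sq_dist)[:k]
    return max(candidates, key=q_value)


if __name__ == "__main__":
    # Toy action set: each tuple is an assumed (uplink, downlink) on/off choice.
    actions = [(0, 0), (0, 1), (1, 0), (1, 1)]
    critic = lambda a: a[0] + 2 * a[1]  # made-up scores for illustration
    print(wolpertinger_select((0.2, 0.1), actions, critic, k=2))
```

With k=1 this reduces to plain nearest-neighbor rounding. Larger k lets the critic correct cases where the nearest discrete action has a poor value, which is the mapping error the abstract refers to.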

Bayesian Active Meta-Learning for Few Pilot Demodulation and Equalization

Kfir M. Cohen, Sangwoo Park, Osvaldo Simeone, Shlomo Shamai

Category: Machine Learning

2021-08-02

Two of the main principles underlying the life cycle of an artificial intelligence (AI) module in communication networks are adaptation and monitoring. Adaptation refers to the need to adjust the operation of an AI module depending on the current conditions, while monitoring requires measures of the reliability of an AI module's decisions. Classical frequentist learning methods for the design of AI modules fall short on both counts of adaptation and monitoring, catering to one-off training and providing overconfident decisions. This paper proposes a solution to address both challenges by integrating meta-learning with Bayesian learning. As a specific use case, the problems of demodulation and equalization over a fading channel based on the availability of few pilots are studied. Meta-learning processes pilot information from multiple frames in order to extract useful shared properties of effective demodulators across frames. The resulting trained demodulators are demonstrated, via experiments, to offer better calibrated soft decisions, at the computational cost of running an ensemble of networks at run time. The capacity to quantify uncertainty in the model parameter space is further leveraged by extending Bayesian meta-learning to an active setting, in which the designer can sequentially select the channel conditions under which to generate data for meta-learning from a channel simulator. Bayesian active meta-learning is seen in experiments to significantly reduce the number of frames required to obtain an efficient adaptation procedure for new frames.

Recent Advances in Reinforcement Learning in Finance

Ben Hambly, Renyuan Xu, Huining Yang

Category: Machine Learning

2021-12-08

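The rapid changes in the finance industry driven by the increasing amount of data have revolutionized the techniques for data processing and data analysis and have brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches to financial decision-making problems, which rely heavily on model assumptions, new developments in reinforcement learning (RL) can make full use of large amounts of financial data with fewer model assumptions and improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which are the setting for many commonly used RL approaches. Various algorithms are then introduced, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms to a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.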

Asynchronous Hybrid Reinforcement Learning for Latency and Reliability Optimization in the Metaverse over Wireless Communications

Wenhan Yu, Terence Jie Chua, Jun Zhao

Category: Machine Learning

2022-12-30

Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the development of the Metaverse. The demand for Metaverse applications, and hence for real-time digital twinning of real-world scenes, is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where, in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned a channel in the UL transmission stage. Some problems arise therefrom: (i) an interactive multi-process chain, specifically an Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that, compared to proposed baselines, AAHC obtains better solutions with preferable training time.

Robust Bayesian Learning for Reliable Wireless AI: Framework and Applications

Matteo Zecchin, Sangwoo Park, Osvaldo Simeone, Marios Kountouris, David Gesbert

Category: Machine Learning | Artificial Intelligence

2022-07-01

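This work takes a critical look at the application of conventional machine learning methods to wireless communication problems through the lens of reliability and robustness. Deep learning techniques adopt a frequentist framework and are known to provide poorly calibrated decisions that do not reproduce the true uncertainty caused by limitations in the size of the training data. Bayesian learning, while in principle capable of addressing this shortcoming, is in practice impaired by model misspecification and by the presence of outliers. Both problems are pervasive in wireless communication settings, in which the capacity of machine learning models is subject to resource constraints and the training data is affected by noise and interference. In this context, we explore the application of the robust Bayesian learning framework. After a tutorial-style introduction to Bayesian learning, we showcase the merits of robust Bayesian learning on several important wireless communication problems in terms of accuracy, calibration, and robustness to outliers and misspecification.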

Learning based Age of Information Minimization in UAV-relayed IoT Networks

Biplav Choudhury, Prasenjit Karmakar, Vijay K. Shah, Jeffrey H. Reed

Category: Machine Learning

2022-03-08

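Unmanned aerial vehicles (UAVs) are used as aerial base stations to relay time-sensitive packets from IoT devices to a nearby terrestrial base station (TBS). Scheduling packets in such UAV-relayed IoT networks so that fresh (or up-to-date) packets from the IoT devices are available at the TBS is a challenging problem, because it involves two simultaneous steps: (i) sampling of packets generated at the IoT devices by the UAVs [hop-1] and (ii) updating of the sampled packets from the UAVs to the TBS [hop-2]. To address this, we propose age-of-information (AoI) scheduling algorithms for two-hop UAV-relayed IoT networks. First, we propose a low-complexity AoI scheduler, termed MAF-MAD, which uses the maximum AoI first (MAF) policy for sampling IoT devices at the UAV (hop-1) and the maximum AoI difference (MAD) policy for updating sampled packets from the UAV to the TBS (hop-2). We prove that MAF-MAD is the optimal AoI scheduler under ideal conditions (lossless wireless channels and generate-at-will traffic generation at the IoT devices). For general conditions (lossy channels and varying periodic traffic generation at the IoT devices), by contrast, we propose a deep reinforcement learning algorithm, namely a proximal policy optimization (PPO) based scheduler. Simulation results show that the proposed PPO-based scheduler outperforms other schedulers such as MAF-MAD, MAF, and round-robin in all considered general scenarios.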
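The two selection rules named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: one UAV, one device sampled and one packet updated per slot, lossless channels, and a sampled packet that is one slot old at the end of the slot. All function names and the slot-timing convention are hypothetical.

```python
def maf_select(uav_ages):
    """Hop-1 (MAF): pick the device whose copy at the UAV has the largest AoI."""
    return max(range(len(uav_ages)), key=lambda i: uav_ages[i])


def mad_select(tbs_ages, uav_ages):
    """Hop-2 (MAD): pick the device with the largest AoI gap (TBS minus UAV)."""
    return max(range(len(tbs_ages)), key=lambda i: tbs_ages[i] - uav_ages[i])


def step(uav_ages, tbs_ages):
    """Advance one slot under the assumed ideal conditions."""
    sampled = maf_select(uav_ages)
    updated = mad_select(tbs_ages, uav_ages)
    new_uav = [age + 1 for age in uav_ages]   # every copy ages by one slot
    new_tbs = [age + 1 for age in tbs_ages]
    new_uav[sampled] = 1                      # fresh packet, one slot old
    new_tbs[updated] = uav_ages[updated] + 1  # TBS receives the UAV's copy
    return new_uav, new_tbs, sampled, updated


if __name__ == "__main__":
    print(step([3, 5, 2], [7, 6, 9]))
```

The MAD rule favors the device for which forwarding the UAV's copy reduces the age at the TBS the most, which is why it is paired with MAF on the first hop.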

Scheduling Out-of-Coverage Vehicular Communications Using Reinforcement Learning

Taylan Şahin, Ramin Khalili, Mate Boban, Adam Wolisz

Category: Artificial Intelligence

2022-07-13

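The performance of vehicle-to-vehicle (V2V) communications depends heavily on the scheduling approach used. While centralized network schedulers offer high V2V communication reliability, their operation is conventionally restricted to areas with full cellular network coverage. In contrast, in areas outside cellular coverage, comparatively inefficient distributed radio resource management is used. To exploit the benefits of the centralized approach for enhancing the reliability of V2V communications on roads lacking cellular coverage, we propose VRLS (Vehicular Reinforcement Learning Scheduler), a centralized scheduler that proactively assigns resources for out-of-coverage V2V communications before vehicles leave the cellular network coverage. By training in simulated vehicular environments, VRLS can learn a scheduling policy that adapts to environmental changes, thus eliminating the need for targeted (re-)training in complex real-life environments. We evaluate the performance of VRLS under varying mobility, network load, wireless channel, and resource configurations. VRLS outperforms the state-of-the-art distributed scheduling algorithm in zones without cellular network coverage, halving the packet error rate under high load and achieving near-maximum reliability in low-load scenarios.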

When Machine Learning Meets Spectrum Sharing Security: Methodologies and Challenges

Qun Wang, Haijian Sun, Rose Qingyang Hu, Arupjyoti Bhuyan

Category: Machine Learning

2022-01-12

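The exponential growth of Internet-connected systems has generated numerous challenges, such as spectrum shortage, which require efficient spectrum sharing (SS) solutions. Complicated and dynamic SS systems can be exposed to various potential security and privacy issues, requiring protection mechanisms that are adaptive, reliable, and scalable. Machine learning (ML) based methods have frequently been proposed to address these issues. In this paper, we provide a comprehensive survey of recent ML-based SS methods, the most critical security issues, and the corresponding defense mechanisms. In particular, we elaborate on the state-of-the-art methods for improving the performance of SS communication systems, including ML-based database-assisted SS networks, ML-based LTE-U networks, ML-based ambient backscatter networks, and other ML-based SS solutions. We also present security issues at the physical layer and the corresponding defense strategies based on ML algorithms, covering primary user emulation (PUE) attacks, spectrum sensing data falsification (SSDF) attacks, jamming attacks, eavesdropping attacks, and privacy issues. Finally, an extensive discussion of open challenges for ML-based SS is given. This comprehensive review is intended to provide a foundation for, and to facilitate, future studies that explore the potential of emerging ML for coping with increasingly complex SS and its security problems.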

RLOps: Development Life-cycle of Reinforcement Learning Aided Open RAN

Peizheng Li, Jonathan Thomas, Xiaoyang Wang, Ahmed Khalil, Abdelrahim Ahmad, Rui Inacio, Shipra Kapoor, Arjun Parekh, Angela Doufexi, Arman Shojaeifard

Category: Machine Learning

2021-11-12

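Radio access network (RAN) technologies continue to witness massive growth, with Open RAN gaining the most recent momentum. In the O-RAN specifications, the RAN intelligent controller (RIC) serves as an automation host. This paper introduces principles for machine learning (ML), and in particular reinforcement learning (RL), relevant to the O-RAN stack. Furthermore, we review state-of-the-art research in wireless networks and cast it onto the RAN framework and the hierarchy of the O-RAN architecture. We provide a taxonomy of the challenges faced by ML/RL models throughout the development life-cycle, from system specification to production deployment (data acquisition, model design, testing and management, etc.). To address these challenges, we integrate a set of existing MLOps principles with the unique characteristics that arise when RL agents are considered. This paper discusses a systematic life-cycle pipeline for model development, testing, and validation, termed RLOps. We discuss all fundamental parts of RLOps, including model specification, development and distillation, production environment serving, operations monitoring, safety/security, and the data engineering platform. Based on these principles, we propose best practices for achieving an automated and reproducible model development process.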

Deep Reinforcement Learning for Cyber Security

Thanh Thi Nguyen, Vijay Janapa Reddi

Category: Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2019-06-13

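The scale of Internet-connected systems has increased considerably, and these systems are more exposed to cyber attacks than ever before. The complexity and dynamics of cyber attacks require protection mechanisms that are responsive, adaptive, and scalable. Machine learning methods, or more specifically deep reinforcement learning (DRL) methods, have been widely proposed to address these issues. By incorporating deep learning into traditional RL, DRL is capable of solving complex, dynamic, and especially high-dimensional cyber defense problems. This paper presents a survey of DRL approaches developed for cyber security. We touch on different vital aspects, including DRL-based security methods for cyber-physical systems, autonomous intrusion detection techniques, and multi-agent DRL-based game-theoretic simulations of defense strategies against cyber attacks. Extensive discussion and future research directions for DRL-based cyber security are also given. We expect that this comprehensive review provides a foundation for, and facilitates, future studies that explore the potential of DRL for coping with increasingly complex cyber security problems.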

A Survey on Large-Population Systems and Scalable Multi-Agent Reinforcement Learning

Kai Cui, Anam Tahir, Gizem Ekinci, Ahmed Elshamanhory, Yannick Eich, Mengguang Li, Heinz Koeppl

Category: Artificial Intelligence | Machine Learning

2022-09-08

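The analysis and control of large-population systems is of great interest to diverse areas of research and engineering, ranging from epidemiology and robotic swarms to economics and finance. An increasingly popular and effective approach to sequential decision-making in multi-agent systems is multi-agent reinforcement learning, as it allows for automatic and model-free analysis of highly complex systems. However, the key issue of scalability complicates the design of control and reinforcement learning algorithms, particularly in systems with large populations of agents. While reinforcement learning has found empirical success in many scenarios, problems with many agents quickly become intractable and require special consideration. In this survey, we shed light on current approaches to tractably understanding and analyzing large-population systems, both through multi-agent reinforcement learning and through adjacent areas of research such as mean-field games, collective intelligence, and complex network theory. These classically independent subject areas offer a variety of approaches to understanding or modeling large-population systems, which may be well suited to the formulation of tractable MARL algorithms in the future. Finally, we survey potential areas of application for large-scale control and identify fruitful future applications of learning algorithms in practical systems. We hope that our survey can provide insight and future directions to junior and senior researchers in both theoretical and applied sciences.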
