Intelligent Paper Notes

Digital Twin-Assisted Efficient Reinforcement Learning for Edge Task Scheduling

Xiucheng Wang , Longfei Ma , Haocheng Li , Zhisheng Yin , Tom. Luan , Nan Cheng

Category: Machine Learning | Artificial Intelligence

2022-08-02

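When a user offloads multiple different tasks to an edge server, task scheduling is a key problem. If the user has multiple tasks to offload, only one task can be transmitted to the server at a time, and the server processes tasks in the order in which they were transmitted, then the problem is NP-hard. Traditional optimization methods have difficulty obtaining the optimal solution quickly. Reinforcement learning (RL)-based methods, for their part, face an excessively large action space and slow convergence. In this paper, we propose a Digital Twin (DT)-assisted RL-based task scheduling method to improve the performance and convergence of RL. We use the DT to simulate the results of the different decisions an agent could make. In this way one agent can try multiple actions at a time or, equivalently, multiple agents can interact with the environment in parallel inside the DT. The DT therefore significantly improves the exploration efficiency of RL, so RL converges faster and is less likely to get stuck in a local optimum. In particular, two algorithms are designed to make task scheduling decisions: DT-assisted asynchronous Q-learning (DTAQL) and DT-assisted exploring Q-learning (DTEQL). Simulation results show that both algorithms significantly speed up the convergence of Q-learning by improving exploration efficiency.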

translated by Google Translate
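
The abstract's central idea is that a twin lets one agent try several actions per step because the twin can predict their outcomes. Tabular Q-learning can illustrate this. The sketch below is a minimal illustration under stated assumptions and is not the authors' DTEQL. It replaces the paper's transmit-then-process model with a simplified single-server sequencing problem that minimizes the sum of task completion times. `twin_step`, `durations` and all hyperparameters are hypothetical. At every real step, the twin is queried for every pending task and all of the corresponding Q-values are updated. Only then is a single epsilon-greedy action actually taken.

```python
import random
from typing import Dict, List, Tuple


def valid_actions(remaining: int, n: int) -> List[int]:
    """Pending tasks: bit i of `remaining` is set while task i is not yet scheduled."""
    return [i for i in range(n) if (remaining >> i) & 1]


def twin_step(durations: List[float], remaining: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical digital twin: predicts (next_state, reward, done) without touching
    the real system. Reward is minus the completion time of the scheduled task."""
    elapsed = sum(d for i, d in enumerate(durations) if not ((remaining >> i) & 1))
    next_remaining = remaining & ~(1 << action)
    return next_remaining, -(elapsed + durations[action]), next_remaining == 0


def dt_exploring_q_learning(durations: List[float], episodes: int = 200, alpha: float = 0.5,
                            gamma: float = 1.0, eps: float = 0.2,
                            seed: int = 0) -> Dict[Tuple[int, int], float]:
    rng = random.Random(seed)
    n = len(durations)
    q: Dict[Tuple[int, int], float] = {}  # (state, action) -> value, default 0.0

    def best_value(state: int) -> float:
        acts = valid_actions(state, n)
        return max(q.get((state, a), 0.0) for a in acts) if acts else 0.0

    for _ in range(episodes):
        state = (1 << n) - 1  # all tasks pending
        while state:
            # DT-assisted exploration: evaluate every action in the twin, update every Q-value.
            for a in valid_actions(state, n):
                nxt, r, done = twin_step(durations, state, a)
                target = r + (0.0 if done else gamma * best_value(nxt))
                old = q.get((state, a), 0.0)
                q[(state, a)] = old + alpha * (target - old)
            # Then take a single epsilon-greedy action in the "real" system.
            acts = valid_actions(state, n)
            if rng.random() < eps:
                action = rng.choice(acts)
            else:
                action = max(acts, key=lambda a: q.get((state, a), 0.0))
            state, _, _ = twin_step(durations, state, action)  # real env == twin in this toy
    return q


def greedy_order(q: Dict[Tuple[int, int], float], n: int) -> List[int]:
    """Read the learned schedule off the Q-table."""
    state, order = (1 << n) - 1, []
    while state:
        a = max(valid_actions(state, n), key=lambda a: q.get((state, a), 0.0))
        order.append(a)
        state &= ~(1 << a)
    return order


q_table = dt_exploring_q_learning([4.0, 1.0, 3.0])
print(greedy_order(q_table, 3))
```

In this toy problem the learned greedy policy recovers the shortest-processing-time order. Because the twin is queried for every action, each visited state refreshes the values of all its sibling actions, not just the one actually taken.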

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

Tianxu Li , Kun Zhu , Nguyen Cong Luong , Dusit Niyato , Qihui Wu , Yang Zhang , Bing Chen

Category: Artificial Intelligence | Machine Learning

2021-10-26

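The future Internet involves several emerging technologies, such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make local decisions to improve network performance under dynamic and uncertain network conditions. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to let each network entity act as an agent. Each agent adaptively learns an optimal decision-making policy by interacting with the unknown environment. However, such algorithms fail to model cooperation or competition among network entities. They simply treat the other entities as part of the environment, which can cause a non-stationarity problem. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various problems in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. Specifically, we provide a tutorial on MARL together with a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. Then, we review a number of applications of MARL to emerging problems in the future Internet. These problems include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.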

Progress and summary of reinforcement learning on energy management of MPS-EV

Jincheng Hu , Yang Lin , Liang Chu , Zhuoran Hou , Jihan Li , Jingjing Jiang , Yuanjian Zhang

Category: Machine Learning

2022-11-08

The high emissions and low energy efficiency of internal combustion engines (ICE) have become unacceptable under current environmental regulations and the energy crisis. As a promising alternative, multi-power-source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for developing EMS. RL has received continuous attention and research, but there is still a lack of systematic analysis of the design elements of RL-based EMS. To this end, this paper presents an in-depth analysis of current research on RL-based EMS (RL-EMS) and summarizes its design elements. The paper first summarizes previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. It shows how advanced algorithms contribute to the training effect, analyzes the perception and control schemes in the literature in detail, classifies different reward function settings, and elaborates on innovative training methods and their roles. Then, by comparing the development routes of RL and RL-EMS, the paper identifies the gap between advanced RL solutions and existing RL-EMS. Finally, it suggests potential development directions for implementing advanced artificial intelligence (AI) solutions in EMS.
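
The abstract treats the reward function as one of five design elements. As a hedged illustration of what such a setting can look like, the sketch below shows one commonly used shape: a negative weighted sum of instantaneous fuel consumption and the squared deviation of battery state of charge (SOC) from a reference value. It is not taken from this survey. The function name, the reference SOC of 0.6 and the weights are all hypothetical and would need tuning for any real powertrain.

```python
def ems_reward(fuel_rate: float, soc: float, soc_ref: float = 0.6,
               w_fuel: float = 1.0, w_soc: float = 350.0) -> float:
    """Hypothetical RL-EMS reward: penalize fuel use and SOC drift from a reference.

    fuel_rate: instantaneous fuel consumption (unit chosen by the modeller)
    soc:       current battery state of charge in [0, 1]
    """
    fuel_term = w_fuel * fuel_rate                 # economy objective
    soc_term = w_soc * (soc - soc_ref) ** 2        # charge-sustaining objective
    return -(fuel_term + soc_term)


print(ems_reward(fuel_rate=1.0, soc=0.6))  # only the fuel term contributes
print(ems_reward(fuel_rate=0.0, soc=0.5))  # only the SOC term contributes
```

The ratio of `w_fuel` to `w_soc` encodes the trade-off the agent will learn. A large SOC weight keeps the battery near its reference at the expense of fuel economy.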

Asynchronous Hybrid Reinforcement Learning for Latency and Reliability Optimization in the Metaverse over Wireless Communications

Wenhan Yu , Terence Jie Chua , Jun Zhao

Category: Machine Learning

2022-12-30

Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.

Online Service Migration in Edge Computing with Incomplete Information: A Deep Recurrent Actor-Critic Method

Jin Wang , Jia Hu , Geyong Min , Qiang Ni , Tarek El-Ghazawi

Category: Machine Learning

2020-12-16

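Multi-access edge computing (MEC) is an emerging computing paradigm that extends cloud computing to the network edge to support resource-intensive applications on mobile devices. Service migration is a crucial problem in MEC. It must decide how to migrate user services so as to maintain quality of service as users roam between MEC servers with limited coverage and capacity. However, the dynamic MEC environment and user mobility make finding an optimal migration policy intractable. Many existing studies make centralized migration decisions based on complete system-level information, which is time-consuming and scales poorly. To address these challenges, we propose a novel learning-driven, user-centric method that can make effective online migration decisions using incomplete system-level information. Specifically, the service migration problem is modeled as a partially observable Markov decision process (POMDP). To solve the POMDP, we design a new encoder network that combines a long short-term memory (LSTM) network and an embedding matrix to extract hidden information effectively. We further propose a tailored off-policy actor-critic algorithm for efficient training. Extensive experiments based on real-world mobility traces show that this new method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms and achieves near-optimal results in various MEC scenarios.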

Deep Reinforcement Learning for Task Offloading in UAV-Aided Smart Farm Networks

Anne Catherine Nguyen , Turgay Pamuklu , Aisha Syed , W. Sean Kennedy , Melike Erol-Kantarci

Category: Artificial Intelligence

2022-09-15

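Fifth- and sixth-generation wireless communication networks are enabling tools such as Internet of Things devices, unmanned aerial vehicles (UAVs), and artificial intelligence to improve the agricultural landscape by using a network of devices to monitor farmland automatically. Surveying a large area requires performing many image classification tasks within a specific period of time, so that damage can be prevented when an incident such as a fire or flood occurs. UAVs have limited energy and computing power and may not be able to perform all of these intensive image classification tasks locally within an appropriate amount of time. It is therefore assumed that the UAVs can partially offload their workload to nearby multi-access edge computing devices. The UAVs need a decision-making algorithm that decides where each task will be executed, taking into account the time constraints and energy levels of the other UAVs in the network. In this paper, we introduce a Deep Q-Learning (DQL) approach to solve this multi-objective problem. The proposed method is compared with Q-Learning and three heuristic baselines. Simulation results show that our DQL-based method achieves comparable results in terms of the UAVs' remaining battery levels and the percentage of deadline violations. In addition, our method converges 13 times faster than Q-Learning.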

Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless Networks

Peyman Tehrani , Francesco Restuccia , Marco Levorato

Category: Machine Learning

2021-12-07

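Next-generation (NextG) networks are expected to support demanding tactile Internet applications such as augmented reality and connected autonomous vehicles. Recent innovations promise larger link capacity, but their sensitivity to the environment and their erratic performance defy traditional model-based control rationales. Zero-touch, data-driven approaches can improve the network's ability to adapt to current operating conditions. Tools such as reinforcement learning (RL) algorithms can build optimal control policies based solely on a history of observations. In particular, deep RL (DRL), which uses a deep neural network (DNN) as a predictor, has been shown to achieve good performance even in complex environments with high-dimensional inputs. However, training DRL models requires a large amount of data, which may limit their adaptability to the ever-evolving statistics of the underlying environment. Moreover, wireless networks are inherently distributed systems. Centralized DRL approaches would require excessive data exchange, while fully distributed approaches may converge more slowly and perform worse. To address these challenges, this paper proposes a federated learning (FL) approach to DRL, which we call federated DRL (F-DRL). In F-DRL, base stations (BSs) collaboratively train the embedded DNN by sharing only model weights rather than training data. We evaluate two distinct versions of F-DRL, value-based and policy-based, and show that they outperform both distributed and centralized DRL.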
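
The key property of F-DRL is that base stations exchange model weights, never training data. The sketch below illustrates this with a FedAvg-style round in which each parameter is averaged and weighted by the number of local samples. This aggregation rule is an assumption: the abstract does not state which one the authors use. Weights are simplified to plain per-layer lists, and the local training functions are hypothetical stand-ins for each BS's own DRL updates.

```python
import copy
from typing import Callable, List

Weights = List[List[float]]  # one flat list of parameters per layer (simplified)


def federated_average(local_weights: List[Weights], num_samples: List[int]) -> Weights:
    """FedAvg-style aggregation: average each parameter, weighted by local sample counts."""
    total = sum(num_samples)
    averaged: Weights = []
    for layer in range(len(local_weights[0])):
        acc = [0.0] * len(local_weights[0][layer])
        for weights, n in zip(local_weights, num_samples):
            for i, value in enumerate(weights[layer]):
                acc[i] += value * n / total
        averaged.append(acc)
    return averaged


def federated_round(global_weights: Weights,
                    local_train_fns: List[Callable[[Weights], Weights]],
                    num_samples: List[int]) -> Weights:
    """One round: each BS trains a private copy on its own data; only weights are shared."""
    local = [train(copy.deepcopy(global_weights)) for train in local_train_fns]
    return federated_average(local, num_samples)


# Two hypothetical BSs whose "training" just shifts the weights in opposite directions.
bs_a = lambda w: [[x + 1.0 for x in w[0]]]
bs_b = lambda w: [[x - 1.0 for x in w[0]]]
print(federated_round([[0.5, 1.5]], [bs_a, bs_b], [1, 1]))
```

Weighting by sample count gives base stations that saw more traffic more influence. Replacing `num_samples` with equal counts reduces this to a plain mean.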

The state-of-the-art review on resource allocation problem using artificial intelligence methods on various computing paradigms

Javad Hassannataj Joloudari , Sanaz Mojrian , Hamid Saadatfar , Issa Nodehi , Fatemeh Fazl , Sahar Khanjani shirkharkolaie , Roohallah Alizadehsani , H M Dipu Kabir , Ru-San Tan , U Rajendra Acharya

Category: Artificial Intelligence

2022-03-23

With the growing volume of information generated by smart devices, improving the quality of human life requires various computing paradigms, including the Internet of Things, fog, and cloud. Among these three paradigms, the cloud computing paradigm, as an emerging technology, adds cloud-layer services to the edge of the network so that resource allocation happens close to the end user, reducing resource processing time and network traffic overhead. Hence, for providers, the resource allocation problem of offering a suitable platform through these computing paradigms is considered a challenge. In general, resource allocation approaches fall into two categories. Auction-based methods aim to increase service providers' profits and increase user satisfaction and usability. Optimization-based methods target energy, cost, network exploitation, runtime, and reduction of time delay. Drawing on the latest scientific achievements, this paper provides a comprehensive literature study (CLS) of artificial intelligence methods for resource allocation optimization, excluding auction-based methods. The study covers various computing environments, such as cloud computing, vehicular fog computing, wireless networks, IoT, vehicular networks, 5G networks, vehicular cloud architecture, machine-to-machine (M2M) communication, Train-to-Train (T2T) communication networks, and Peer-to-Peer (P2P) networks. AI-based deep learning methods are among the most important methods for resource allocation problems. The paper therefore also covers deep-learning-based resource allocation approaches in these environments, such as deep reinforcement learning, Q-learning, reinforcement learning, and online learning. It also covers classical learning methods such as Bayesian learning, Cummins clustering, and Markov decision processes.

Reinforcement Learning in Computing and Network Convergence Orchestration

Aidong Yang , Mohan Wu , Boquan Cheng , Xiaozhou Ye , Ye Ouyang

Category: Artificial Intelligence

2022-09-22

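As computing power has become the core productive force of the digital economy, the concept of Computing and Network Convergence (CNC) has attracted wide attention. Under CNC, network and computing resources can be dynamically scheduled and allocated according to users' needs. Based on the properties of each task, the network orchestration plane must flexibly deploy tasks to appropriate computing nodes and arrange paths to those nodes. This is an orchestration problem involving both resource scheduling and path arrangement. Because CNC is relatively new, this paper first reviews existing research and applications on CNC. We then design a CNC orchestration method using reinforcement learning (RL), which is the first attempt to flexibly allocate and schedule computing and network resources with RL, aiming at high profit and low latency. We use multiple factors to define the optimization objective, so that the orchestration strategy is optimized in our experiments according to overall performance across different aspects, such as cost, profit, latency, and system overload. Experiments show that the proposed RL-based method achieves higher profit and lower latency than a greedy method, random selection, and a balanced-resource method. We show that RL is suitable for CNC orchestration. This paper initiates the application of RL to CNC orchestration.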

Reinforcement Learning for Cognitive Delay/Disruption Tolerant Network Node Management in an LEO-based Satellite Constellation

Xue Sun , Changhao Li , Lei Yan , Suzhi Cao

Category: Artificial Intelligence | Machine Learning

2022-09-27

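In recent years, with the large-scale deployment of spacecraft and growing satellite onboard capabilities, delay/disruption tolerant networking (DTN) has emerged as a more robust communication protocol than TCP/IP under excessive network dynamics. DTN node buffer management is still an active research area. Current implementations of the DTN core protocol still assume that every network node always has enough memory to store and forward bundles. In addition, classical queuing theory does not apply to the dynamic management of DTN node buffers. This paper therefore proposes a centralized approach to automatically manage cognitive DTN nodes in low Earth orbit (LEO) satellite constellations, based on the advanced reinforcement learning (RL) strategy advantage actor-critic (A2C). The method explores training an intelligent agent in geosynchronous Earth orbit to manage all DTN nodes in a LEO satellite constellation. The A2C agent aims to maximize the delivery success rate and minimize network resource consumption cost while taking node memory utilization into account. The agent can dynamically adjust the radio data rate and perform drop operations based on bundle priority. To measure how effective A2C is for DTN node management in LEO satellite constellation scenarios, the paper compares the trained agent's policy with two non-RL policies: a random policy and a standard policy. Experiments show that the A2C policy balances delivery success rate and cost, and achieves the highest reward and the lowest node memory utilization.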
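
The abstract names A2C but does not detail it. The sketch below illustrates the generic A2C targets: bootstrapped discounted returns, advantages $A_t = G_t - V(s_t)$ for one rollout segment, and the resulting actor and critic loss values. It is not the authors' DTN agent, and its states, data-rate actions and drop actions are abstracted away. The function names and the rollout format are assumptions. In a real implementation the losses would be built in an autodiff framework, and the advantage would be treated as a constant in the actor loss.

```python
from typing import List, Sequence, Tuple


def a2c_targets(rewards: Sequence[float], values: Sequence[float],
                bootstrap_value: float, dones: Sequence[bool],
                gamma: float = 0.99) -> Tuple[List[float], List[float]]:
    """Discounted returns bootstrapped from the critic, and advantages A_t = G_t - V(s_t)."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value  # critic's estimate V(s_T) for the state after the segment
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0      # episode ended after step t: nothing to bootstrap from
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = [g - v for g, v in zip(returns, values)]
    return returns, advantages


def a2c_losses(log_probs: Sequence[float], advantages: Sequence[float],
               returns: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Policy-gradient (actor) loss and squared-error (critic) loss, averaged over the segment."""
    n = len(advantages)
    actor = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    critic = sum((g - v) ** 2 for g, v in zip(returns, values)) / n
    return actor, critic


rets, advs = a2c_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, [False, False, True], gamma=0.5)
print(rets, advs)
```

Actions with a positive advantage did better than the critic expected, so the actor loss pushes their log-probabilities up. The critic loss meanwhile pulls $V(s_t)$ toward the observed returns.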

Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Yahao Ding , Zhaohui Yang , Quoc-Viet Pham , Zhaoyang Zhang , Mohammad Shikh-Bahaei

Category: Machine Learning | Artificial Intelligence

2023-01-03

Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning. We then present a comprehensive overview of their applications to UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Next, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems in using DL for UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this paper provides a comprehensive survey of DL applications for UAV swarms across a wide range of scenarios.

UAV-Assisted Space-Air-Ground Integrated Networks: A Technical Review of Recent Learning Algorithms

Atefeh H. Arani , Peng Hu , Yeying Zhu

Category: Machine Learning

2022-11-27

Recent technological advancements in space, air and ground components have made possible a new network paradigm called the "space-air-ground integrated network" (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, because UAVs are highly dynamic and complex, their real-world deployment is a major barrier to realizing SAGINs. Compared to the space and terrestrial components, UAVs are expected to meet performance requirements with high flexibility and dynamics while using limited resources. Employing UAVs in various usage scenarios therefore requires well-designed algorithmic planning. In this paper, we provide a comprehensive review of recent learning-based algorithmic approaches. We consider possible reward functions and discuss state-of-the-art algorithms for optimizing them, including Q-learning, deep Q-learning, multi-armed bandit (MAB), particle swarm optimization (PSO), and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, which can be applied with these algorithms to various UAV-assisted missions on a SAGIN. We simulate users and environments according to real-world scenarios and compare the learning-based and PSO-based methods in terms of throughput, load, fairness, computation time, and other metrics. We also implement and evaluate two-dimensional (2D) and three-dimensional (3D) variants of these algorithms to reflect different deployment cases. Our simulation suggests that the 3D satisfaction-based learning algorithm outperforms the other approaches on various metrics in most cases. We discuss some open challenges at the end, and our findings aim to provide design guidelines for algorithm selection when optimizing the deployment of UAV-assisted SAGINs.
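
Multi-armed bandits are one of the algorithm families the review compares. UCB1 is a standard bandit algorithm: it plays the arm with the highest empirical mean plus an exploration bonus $\sqrt{2\ln t / n_i}$. The survey may well cover other MAB variants. In the hypothetical framing below, the arms are candidate UAV hovering positions and the reward is a Bernoulli "user satisfied" signal. The arm probabilities are made up for illustration.

```python
import math
import random
from typing import Callable, List


def ucb1(pull: Callable[[int], float], n_arms: int, horizon: int) -> List[int]:
    """Run UCB1 for `horizon` rounds and return how often each arm was played."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # empirical mean plus a bonus that shrinks as an arm is played more often
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts


# Hypothetical example: three candidate UAV positions with different satisfaction probabilities.
probs = [0.2, 0.5, 0.8]
rng = random.Random(1)
print(ucb1(lambda i: 1.0 if rng.random() < probs[i] else 0.0, n_arms=3, horizon=2000))
```

Suboptimal arms keep being sampled occasionally, because their bonus grows with $\ln t$. That occasional sampling is what lets UCB1 recover if the environment's early samples were misleading.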

Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach

Lingling Zhang , Yanxiang Jiang , Fu-Chun Zheng , Mehdi Bennis , Xiaohu You

Category: Machine Learning | Artificial Intelligence

2022-06-13

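The fog radio access network (F-RAN) is a promising technology in which user mobile devices (MDs) can offload computation tasks to nearby fog access points (F-APs). Because F-APs have limited resources, it is important to design an efficient task offloading scheme. In this paper, taking a time-varying network environment into account, we formulate a dynamic computation offloading and resource allocation problem in F-RANs to minimize the task execution delay and energy consumption of MDs. To solve the problem, we propose a federated deep reinforcement learning (DRL) based algorithm, in which the deep deterministic policy gradient (DDPG) algorithm performs computation offloading and resource allocation at each F-AP. Federated learning is used to train the DDPG agents, reducing the computational complexity of training and protecting user privacy. Simulation results show that, compared with other existing strategies, the proposed federated DDPG algorithm reaches lower task execution delay and energy consumption for MDs more quickly.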

FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations

Marie Siew , Shikhar Sharma , Kun Guo , Chao Xu , Tony Q. S. Quek , Carlee Joe-Wong

Category: Machine Learning

2022-09-28

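In edge computing, users' service profiles must be migrated as users move, and reinforcement learning (RL) frameworks have been proposed for this purpose. However, these frameworks do not consider occasional server failures. Such failures are rare, but they can disrupt the smooth and safe operation of edge users' latency-sensitive applications, such as autonomous driving and real-time obstacle detection, because users' computing jobs can no longer be completed. Because these failures occur with low probability, it is inherently difficult for data-driven RL algorithms to learn an optimal service migration solution for both typical and rare-event scenarios. We therefore introduce FIRE, a rare-event adaptive resilience framework that integrates importance sampling into reinforcement learning to place backup services. We sample rare events at a rate proportional to their contribution to the value function in order to learn an optimal policy. Our framework balances the service migration trade-off between delay and migration costs against the costs of failure and the costs of backup placement and migration. We propose an importance-sampling-based Q-learning algorithm and prove its boundedness and convergence to optimality. We then propose novel eligibility-trace, linear function approximation, and deep Q-learning versions of our algorithm so that it scales to real-world scenarios. We extend the framework to accommodate users with different levels of risk tolerance toward failure. Finally, trace-driven experiments show that our algorithm reduces costs when failures occur.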
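
The core trick is to sample the rare failure far more often than it really happens, then correct each update by the likelihood ratio $w = p/q$. This keeps the expected update, and therefore the learned value, unbiased. The sketch below shows this on a deliberately tiny one-step problem and is not FIRE itself. The problem has a single state and two actions: keep no backup, or pay for a backup service. The probabilities and costs are hypothetical, and because the problem is one-step, the usual $\gamma \max_{a'} Q(s', a')$ term is omitted from the target.

```python
import random
from typing import Dict

P_FAIL = 0.01        # hypothetical true server-failure probability
Q_FAIL = 0.5         # hypothetical proposal probability used when sampling
FAIL_COST = 500.0    # hypothetical cost when a failure hits a service without backup
BACKUP_COST = 2.0    # hypothetical cost of keeping a backup service


def cost(action: str, failed: bool) -> float:
    if action == "backup":
        return BACKUP_COST
    return FAIL_COST if failed else 0.0


def is_q_learning(steps: int = 20000, alpha: float = 0.002, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    q = {"no_backup": 0.0, "backup": 0.0}
    for _ in range(steps):
        failed = rng.random() < Q_FAIL  # failures are drawn from the proposal distribution
        # likelihood ratio p(outcome) / q(outcome) corrects for the biased sampling
        w = P_FAIL / Q_FAIL if failed else (1 - P_FAIL) / (1 - Q_FAIL)
        for action in q:
            reward = -cost(action, failed)
            # one-step (terminal) problem: the TD target is just the reward
            q[action] += alpha * w * (reward - q[action])
    return q


print(is_q_learning())
```

Without reweighting, a 1% failure rate would show up only about once per hundred samples, so the cost of going without a backup would be poorly estimated. With the proposal, failures appear in about half of the samples, and the weights scale their influence back down. The estimate of `no_backup` therefore settles near its true expected cost of 5, which is higher than the backup cost of 2.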

Unified, User and Task (UUT) Centered Artificial Intelligence for Metaverse Edge Computing

Terence Jie Chua , Wenhan Yu , Jun Zhao

Category: Artificial Intelligence | Machine Learning

2022-12-19

The Metaverse can be considered an extension of the present-day web that integrates the physical and virtual worlds and delivers hyper-realistic user experiences. The Metaverse brings with it many ecosystem services, such as content creation, social entertainment, in-world value transfer, intelligent traffic, and healthcare. These services are compute-intensive and require computation offloading to a Metaverse edge computing server (MECS). Existing Metaverse edge computing approaches do not handle resource allocation efficiently and effectively enough to ensure the fluid, seamless and hyper-realistic experience that Metaverse ecosystem services require. We therefore introduce a new Metaverse-compatible, Unified, User and Task (UUT) centered, artificial intelligence (AI)-based mobile edge computing (MEC) paradigm. It serves as a concept on which future AI control algorithms could be built to develop a more user- and task-focused MEC.

A Broad-persistent Advising Approach for Deep Interactive Reinforcement Learning in Robotic Environments

Hung Son Nguyen , Francisco Cruz , Richard Dazeley

Category: Robotics | Artificial Intelligence

2021-10-15

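Deep reinforcement learning (DeepRL) methods have been widely used in robotics to learn about the environment and acquire behaviours autonomously. Deep interactive reinforcement learning (DeepIRL) adds interactive feedback from an external trainer or expert, who gives advice that helps the learner choose actions and so speeds up learning. However, current research has been limited to interactions that offer actionable advice only for the agent's current state. Moreover, the agent discards this information after a single use, so the same process must be repeated when the same state is revisited. In this paper, we present Broad-Persistent Advising (BPA), a broad-persistent advising approach that retains and reuses the processed information. It not only helps trainers give more general advice that applies to similar states rather than only the current state, but also lets the agent speed up learning. We test the proposed approach in two continuous robotic scenarios: a cart-pole balancing task and a simulated robot navigation task. The results show that the agent using BPA performs better while keeping the number of interactions required from the trainer unchanged compared with the DeepIRL approach.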
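
The two ingredients of BPA described above are persistence (advice is kept instead of discarded) and breadth (advice also applies to similar states). The sketch below illustrates both under stated assumptions. It is not the authors' method. "Similar" is approximated by a nearest stored state within a Euclidean `radius`, which is a stand-in chosen for simplicity. `ask_trainer`, `policy_action` and `advice_prob` are hypothetical hooks.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


class PersistentAdviceStore:
    """Keeps every piece of trainer advice and reuses it for nearby states."""

    def __init__(self, radius: float):
        self.radius = radius  # hypothetical similarity threshold (Euclidean distance)
        self.entries: List[Tuple[Tuple[float, ...], int]] = []

    def add(self, state: Sequence[float], action: int) -> None:
        self.entries.append((tuple(state), action))

    def lookup(self, state: Sequence[float]) -> Optional[int]:
        """Return the advised action of the closest stored state within `radius`, if any."""
        best_action, best_dist = None, self.radius
        for stored_state, action in self.entries:
            dist = math.dist(stored_state, state)
            if dist <= best_dist:
                best_action, best_dist = action, dist
        return best_action


def choose_action(state: Sequence[float], store: PersistentAdviceStore,
                  ask_trainer: Callable[[Sequence[float]], int],
                  policy_action: Callable[[Sequence[float]], int],
                  advice_prob: float, rng: random.Random) -> int:
    advised = store.lookup(state)
    if advised is not None:
        return advised                  # reuse persisted advice: no new trainer interaction
    if rng.random() < advice_prob:
        action = ask_trainer(state)     # ask the trainer once and persist the answer
        store.add(state, action)
        return action
    return policy_action(state)         # otherwise follow the agent's own policy
```

Once the trainer has advised on a state, revisiting that state or any state within `radius` of it costs no further trainer interaction. This is the saving the abstract attributes to persistence.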

Federated Meta-Learning for Traffic Steering in O-RAN

Hakan Erdol , Xiaoyang Wang , Peizheng Li , Jonathan D. Thomas , Robert Piechocki , George Oikonomou , Rui Inacio , Abdelrahim Ahmad , Keith Briggs , Shipra Kapoor

Category: Machine Learning

2022-09-13

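Compared with LTE networks, the vision of 5G is to provide higher data rates, low latency (to enable near-real-time applications), greatly increased base station capacity, and near-perfect quality of service (QoS) for users. To provide such services, 5G systems will support various combinations of access technologies, such as LTE, NR, NR-U and Wi-Fi. Each radio access technology (RAT) provides a different type of access, and these should be optimally allocated and managed among users. Besides resource management, 5G systems will also support dual-connectivity services. Network orchestration is therefore a harder problem for system managers than it was with legacy access technologies. In this paper, we propose a RAT steering algorithm based on federated meta-learning (FML), which enables the RAN Intelligent Controller (RIC) to adapt more quickly to dynamically changing environments. We design a simulation environment that contains LTE and 5G NR service technologies. In the simulation, our objective is to fulfil UE demands within the transmission deadline to provide higher QoS values. We compare the proposed algorithm with a single RL agent, the Reptile algorithm, and a rule-based heuristic method. Simulation results show that the proposed FML method achieves caching rates 21% and 12% higher, respectively, in the first deployment round. Moreover, among the compared methods, the proposed approach adapts fastest to new tasks and environments.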
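
The abstract does not spell out the FML update rule. One common way to combine federated learning with meta-learning is a Reptile-style update. Each client adapts the shared meta-parameters with a few local gradient steps, reports only the change $\phi_i - \theta$, and the server moves $\theta$ toward the average of these changes. The sketch below illustrates that update under stated assumptions. It should not be read as the authors' exact algorithm. The clients' 1-D quadratic losses and every hyperparameter are hypothetical.

```python
from typing import Callable, List, Sequence

Grad = Callable[[List[float]], List[float]]


def inner_adapt(theta: Sequence[float], grad: Grad, lr: float, steps: int) -> List[float]:
    """A few plain gradient steps on one client's task, starting from the meta-parameters."""
    phi = list(theta)
    for _ in range(steps):
        g = grad(phi)
        phi = [p - lr * gi for p, gi in zip(phi, g)]
    return phi


def federated_reptile_round(theta: Sequence[float], client_grads: Sequence[Grad],
                            inner_lr: float = 0.1, inner_steps: int = 5,
                            meta_lr: float = 0.5) -> List[float]:
    """Each client adapts locally and reports only phi - theta; the server moves theta toward the mean."""
    deltas = []
    for grad in client_grads:
        phi = inner_adapt(theta, grad, inner_lr, inner_steps)
        deltas.append([p - t for p, t in zip(phi, theta)])
    mean_delta = [sum(d[i] for d in deltas) / len(deltas) for i in range(len(theta))]
    return [t + meta_lr * d for t, d in zip(theta, mean_delta)]


# Hypothetical clients: 1-D quadratic losses (x - c)^2 with different optima c.
client_grads = [lambda p, c=c: [2.0 * (p[0] - c)] for c in (1.0, 3.0)]
theta = [0.0]
for _ in range(50):
    theta = federated_reptile_round(theta, client_grads)
print(theta)
```

With two clients whose optima are 1 and 3, the meta-parameters settle between them at 2. From that starting point, a few local steps adapt quickly to either client. This fast adaptation is the property the abstract highlights.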

DRL-M4MR: An Intelligent Multicast Routing Approach Based on DQN Deep Reinforcement Learning in SDN

Chenwei Zhao , Miao Ye , Xingsi Xue , Jianhui Lv , Qiuxiang Jiang , Yong Wang

Category: Artificial Intelligence

2022-07-31

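Traditional multicast routing methods have several problems when constructing multicast trees: limited access to network state information, poor adaptability to dynamic and complex network changes, and inflexible data forwarding. To address these shortcomings, the optimal multicast routing problem in software-defined networking (SDN) is formulated as a multi-objective optimization problem. An intelligent multicast routing algorithm, DRL-M4MR, based on the deep Q-network (DQN) deep reinforcement learning (DRL) method, is designed to construct multicast trees in SDN. First, exploiting SDN's global view and control, the multicast tree state matrix, link bandwidth matrix, link delay matrix, and link packet loss rate matrix are designed as the state space of the DRL agent. Second, the agent's action space consists of all links in the network, and the action selection strategy is designed to add links to the current multicast tree in four cases. Third, single-step and final reward functions are designed to guide the agent in constructing the optimal multicast tree. Experimental results show that, after training, the multicast trees constructed by DRL-M4MR achieve better bandwidth, delay, and packet loss rate than existing algorithms. DRL-M4MR can also make more intelligent multicast routing decisions in a dynamic network environment.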
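
The action space is "all links in the network", but only some links can sensibly extend the current tree. The sketch below illustrates one generic way to restrict a DQN's choice, under stated assumptions. A link is treated as valid when exactly one of its endpoints is already in the tree, which grows the tree without creating a cycle. The agent then picks the valid link with the highest Q-value. This masking rule is an assumption and does not reproduce the paper's four-case strategy. The topology and Q-values are made up.

```python
from typing import List, Sequence, Set, Tuple

Link = Tuple[int, int]


def valid_link_actions(links: Sequence[Link], tree_nodes: Set[int]) -> List[int]:
    """Indices of links with exactly one endpoint in the tree: adding one grows the tree without a cycle."""
    return [i for i, (u, v) in enumerate(links) if (u in tree_nodes) != (v in tree_nodes)]


def masked_greedy(q_values: Sequence[float], valid: Sequence[int]) -> int:
    """Pick the valid link with the highest Q-value; invalid links are never chosen."""
    return max(valid, key=lambda i: q_values[i])


def apply_action(links: Sequence[Link], tree_nodes: Set[int], action: int) -> Set[int]:
    """Return the node set of the tree after adding link `action`."""
    u, v = links[action]
    return tree_nodes | {u, v}


# Hypothetical 4-node topology; the multicast source is node 0.
links = [(0, 1), (1, 2), (0, 2), (2, 3)]
tree = {0}
q = [0.1, 0.9, 0.5, 0.8]  # made-up Q-values from a DQN forward pass
action = masked_greedy(q, valid_link_actions(links, tree))
tree = apply_action(links, tree, action)
print(action, sorted(tree))
```

Link 1 has the highest Q-value, but it does not touch the tree yet, so the mask rules it out and link 2 is chosen. Masking keeps every intermediate structure a valid tree, regardless of what the untrained network outputs.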

Deep Reinforcement Learning Microgrid Optimization Strategy Considering Priority Flexible Demand Side

Jinsong Sang , Hongbin Sun , Lei Kou

Category: Machine Learning | Artificial Intelligence

2022-11-11

A microgrid is an efficient way to integrate multiple distributed energy resources (DERs) with the user side. It mainly faces the small-scale volatility, uncertainty and intermittency of DERs, as well as demand-side uncertainty. The traditional microgrid has a single form and cannot provide flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, we propose an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads and the main grid. Second, centralized control of microgrid operation makes it easier to control the reactive power and voltage of distributed power supplies and to adjust the grid frequency. However, flexible loads tend to aggregate and create peaks during electricity-price valleys. Existing research takes the power constraints of the microgrid into account but fails to ensure a sufficient supply of electric energy to each individual flexible load. Building on the overall environment operation of the microgrid, this paper considers the response priority of each unit component of the TCLs and ESSs. This ensures power supply to the microgrid's flexible loads and saves as much power input cost as possible. Finally, the simulation optimization of the environment is expressed as a Markov decision process. Training combines an offline stage and an online stage. Running multiple threads without historical data to learn from leads to low learning efficiency. We therefore add an asynchronous advantage actor-critic with an experience replay pool memory library, which addresses the data correlation and non-stationary distribution problems during training.

Dyna-T: Dyna-Q and Upper Confidence Bounds Applied to Trees

Tarek Faycal , Claudio Zito

Category: Machine Learning | Artificial Intelligence

2022-01-12

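In this work, we present a preliminary investigation of a novel algorithm called Dyna-T. In reinforcement learning (RL), a planning agent has its own representation of the environment in the form of a model. To discover an optimal policy for interacting with the environment, the agent collects experience by trial and error. This experience can be used to learn a better model or to improve the value function and policy directly. These two uses are typically kept separate. Dyna-Q is a hybrid approach that combines them. At each iteration it uses real experience to update both the model and the value function, while planning its actions with simulated data from the model. However, the planning process is computationally expensive and depends strongly on the dimensionality of the state-action space. We propose building an Upper Confidence Tree (UCT) on the simulated experience and searching it for the best action to select during online learning. We demonstrate the effectiveness of the proposed method in a set of preliminary tests on three testbed environments from OpenAI. Unlike Dyna-Q, Dyna-T outperforms state-of-the-art RL agents in stochastic environments by using a more robust action selection strategy.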
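
Dyna-Q's loop, as described above, has three parts: direct RL from real experience, model learning, and planning from simulated experience. The sketch below shows that loop with one simplification. Action selection uses a UCB1 rule over the current Q-values, which is a single-level stand-in for the idea of upper-confidence action selection. This is a plain substitution, not Dyna-T: no tree is built. The 5-state chain environment and all hyperparameters are hypothetical.

```python
import math
import random

N_STATES, GOAL = 5, 4  # hypothetical chain: states 0..4, reaching 4 ends the episode


def env_step(s, a):
    """Real environment: a=1 moves right, a=0 moves left (clamped at 0); reward 1 at the goal."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == GOAL else 0.0), s2, s2 == GOAL


def ucb_action(q, counts, s, c=1.0):
    """UCB1 over the Q-values of state s (a one-level stand-in for UCT)."""
    for a in (0, 1):
        if counts[s][a] == 0:
            return a  # try every action once first
    n_s = sum(counts[s])
    return max((0, 1), key=lambda a: q[s][a] + c * math.sqrt(math.log(n_s) / counts[s][a]))


def dyna_q_ucb(episodes=50, planning_steps=20, alpha=0.5, gamma=0.95, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    counts = [[0, 0] for _ in range(N_STATES)]
    model = {}  # (s, a) -> (r, s2, done), learned from real experience

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            a = ucb_action(q, counts, s)
            counts[s][a] += 1
            r, s2, done = env_step(s, a)
            update(s, a, r, s2, done)            # direct RL from real experience
            model[(s, a)] = (r, s2, done)        # model learning
            for _ in range(planning_steps):      # planning from simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
            if done:
                break
    return q


print([max((0, 1), key=lambda a: qs[a]) for qs in dyna_q_ucb()[:GOAL]])
```

The planning loop replays stored transitions, which propagates the goal reward back along the chain far faster than real steps alone. Dyna-T, by contrast, searches a UCT built on this simulated experience instead of using a flat UCB rule.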
