Smart Paper Notes

Deep Reinforcement Learning for Autonomous Driving: A Survey

B Ravi Kiran , Ibrahim Sobh , Victor Talpaert , Patrick Mannion , Ahmad A. Al Sallab , Senthil Yogamani , Patrick Pérez

2020-02-02

With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, inverse reinforcement learning that are related but are not classical RL algorithms. The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed.

Deep reinforcement learning: A brief survey

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.

A Survey of Deep Learning Techniques for Autonomous Driving

Sorin Grigorescu , Bogdan Trasnea , Tiberiu Cocias , Gigel Macesanu

2019-10-17

The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state-of-the-art on deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assist with design choices.

Autonomous Unmanned Aerial Vehicle Navigation using Reinforcement Learning: A Systematic Review

Fadi AlMahamid , Katarina Grolinger

Categories: Robotics | Artificial Intelligence

2022-08-25

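There is an increasing demand for using unmanned aerial vehicles (UAVs), known as drones, in applications such as package delivery, traffic monitoring, search and rescue operations, and military combat engagements. In all of these applications, the UAV navigates the environment autonomously, without human interaction, performing specific tasks and avoiding obstacles. Autonomous UAV navigation is commonly accomplished using reinforcement learning (RL), where agents act as experts in a domain to navigate the environment while avoiding obstacles. Understanding the navigation environment and the algorithmic limitations plays an essential role in choosing the appropriate RL algorithm to solve the navigation problem effectively. Consequently, this study first identifies the UAV navigation tasks and discusses navigation frameworks and simulation software. Next, RL algorithms are classified and discussed based on the environment, algorithm characteristics, abilities, and applications in different UAV navigation problems, which will help practitioners and researchers select the appropriate RL algorithms for their UAV navigation use cases. Moreover, the identified gaps and opportunities will drive UAV navigation research.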

Exploration in Deep Reinforcement Learning: A Comprehensive Survey

Tianpei Yang , Hongyao Tang , Chenjia Bai , Jinyi Liu , Jianye Hao , Zhaopeng Meng , Peng Liu , Zhen Wang

Categories: Artificial Intelligence | Machine Learning

2021-09-14

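Deep reinforcement learning (DRL) and deep multi-agent reinforcement learning (MARL) have achieved significant success across a wide range of domains, including game AI, autonomous vehicles, and robotics. However, DRL and deep MARL agents are widely known to be sample inefficient: millions of interactions are usually needed even for relatively simple problem settings, which prevents wide application and deployment in real-world scenarios. One bottleneck behind this is the well-known exploration problem, i.e., how to explore the environment efficiently and collect informative experiences that benefit policy learning towards the optimal policy. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey of existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Beyond the above two main branches, we also include other notable exploration methods with different ideas and techniques. In addition to the algorithmic analysis, we provide a comprehensive and unified empirical comparison of exploration methods for DRL on a set of commonly used benchmarks. Based on our algorithmic and empirical investigation, we finally summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.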

Graph Reinforcement Learning Application to Co-operative Decision-Making in Mixed Autonomy Traffic: Framework, Survey, and Challenges

Qi Liu , Xueyuan Li , Zirui Li , Jingda Wu , Guodong Du , Xin Gao , Fan Yang , Shihua Yuan

Categories: Robotics

2022-11-06

Proper functioning of connected and automated vehicles (CAVs) is crucial for the safety and efficiency of future intelligent transport systems. Meanwhile, transitioning to fully autonomous driving requires a long period of mixed autonomy traffic, including both CAVs and human-driven vehicles. Thus, collaboration decision-making for CAVs is essential to generate appropriate driving behaviors to enhance the safety and efficiency of mixed autonomy traffic. In recent years, deep reinforcement learning (DRL) has been widely used in solving decision-making problems. However, the existing DRL-based methods have been mainly focused on solving the decision-making of a single CAV. Using the existing DRL-based methods in mixed autonomy traffic cannot accurately represent the mutual effects of vehicles and model dynamic traffic environments. To address these shortcomings, this article proposes a graph reinforcement learning (GRL) approach for multi-agent decision-making of CAVs in mixed autonomy traffic. First, a generic and modular GRL framework is designed. Then, a systematic review of DRL and GRL methods is presented, focusing on the problems addressed in recent research. Moreover, a comparative study on different GRL methods is further proposed based on the designed framework to verify the effectiveness of GRL methods. Results show that the GRL methods can well optimize the performance of multi-agent decision-making for CAVs in mixed autonomy traffic compared to the DRL methods. Finally, challenges and future research directions are summarized. This study can provide a valuable research reference for solving the multi-agent decision-making problems of CAVs in mixed autonomy traffic and can promote the implementation of GRL-based methods into intelligent transportation systems. The source code of our work can be found at https://github.com/Jacklinkk/Graph_CAVs.

DDPG car-following model with real-world human driving experience in CARLA

Dianzhao Li , Ostap Okhrin

Categories: Robotics | Machine Learning

2021-12-29

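In the autonomous driving field, the fusion of human knowledge into deep reinforcement learning (DRL) is often based on human demonstrations recorded in a simulated environment. This limits the generalization and feasibility of application in real-world traffic. We propose a two-stage DRL method that learns from real-world human driving and achieves performance superior to that of a pure DRL agent. The DRL agent is trained within the CARLA framework with the Robot Operating System (ROS). For evaluation, we designed different real-world driving scenarios in which the proposed two-stage DRL agent can be compared with the pure DRL agent. After extracting "good" behavior from the human driver, such as anticipation at signalized intersections, the agent becomes more efficient and drives more safely, which makes this autonomous agent better suited to Human-Robot Interaction (HRI) traffic.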

A Survey on Reinforcement Learning in Aviation Applications

Pouria Razzaghi , Amin Tabrizian , Wei Guo , Shulu Chen , Abenezer Taye , Ellis Thompson , Alexis Bregeon , Ali Baheri , Peng Wei

Categories: Machine Learning

2022-11-03

Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.

Imitation learning: A survey of learning methods

Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field has been gaining attention recently due to advances in computing and sensing as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the tasks. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or designing reward functions specific to the task. Modern sensors are able to collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door for many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction and computer games, to name a few. However, specialized algorithms are needed to effectively and robustly learn models, as learning by imitation poses its own set of challenges. In this paper, we survey imitation learning methods and present design options in different steps of the learning process. We introduce a background and motivation for the field as well as highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games as these domains are the most popular in the literature and provide a wide array of problems and methodologies. We extensively discuss combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications and highlight current and future research directions.

Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation

Zhixuan Liang , Jiannong Cao , Shan Jiang , Divya Saxena , Huafeng Xu

Categories: Artificial Intelligence | Robotics

2022-06-25

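Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and the coordination of autonomous vehicles. The emergence of deep reinforcement learning (DRL) provides a promising approach to multi-agent cooperation through the interaction of agents and environments. However, traditional DRL solutions suffer from the high dimensionality of multiple agents with continuous action spaces during policy search. In addition, the dynamics of the agents' policies make training non-stationary. To tackle these issues, we propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search. In particular, the cooperation of multiple agents can be learned efficiently in a high-level discrete action space. At the same time, the low-level individual control can be reduced to single-agent reinforcement learning. In addition to hierarchical reinforcement learning, we propose an opponent modeling network to model other agents' policies during the learning process. In contrast to end-to-end DRL approaches, our approach reduces the learning complexity by decomposing the overall task into sub-tasks in a hierarchical way. To evaluate the efficiency of our approach, we conduct a real-world case study in a cooperative lane-change scenario. Both simulation and real-world experiments show the superiority of our approach in collision rate and convergence speed.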

A Survey on Model-based Reinforcement Learning

Fan-Ming Luo , Tian Xu , Hang Lai , Xiong-Hui Chen , Weinan Zhang , Yang Yu

Categories: Machine Learning

2022-06-19

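Reinforcement learning (RL) solves sequential decision-making problems via a trial-and-error process of interacting with the environment. While RL has achieved outstanding success in playing complex video games, making errors is always undesirable in the real world. To improve sample efficiency and thus reduce errors, model-based reinforcement learning (MBRL) is believed to be a promising direction: it builds environment models in which trial and error can take place without real costs. In this survey, we review MBRL with a focus on recent progress in deep RL. For non-tabular environments, there is always a generalization error between the learned environment model and the real environment. It is therefore of great importance to analyze the discrepancy between policy training in the environment model and policy training in the real environment, which in turn guides the algorithm design for better model learning, model usage, and policy training. In addition, we discuss recent advances in other forms of RL, including offline RL, goal-conditioned RL, multi-agent RL, and meta-RL. Moreover, we discuss the applicability and advantages of MBRL in real-world tasks. Finally, we end this survey by discussing the prospects for the future development of MBRL. We think that MBRL has great potential and advantages in real-world applications that have been overlooked, and we hope this survey will attract more research on MBRL.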

Recent Advances in Reinforcement Learning in Finance

Ben Hambly , Renyuan Xu , Huining Yang

Categories: Machine Learning

2021-12-08

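The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques of data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches to financial decision-making problems, which rely heavily on model assumptions, new developments in reinforcement learning (RL) are able to make full use of large amounts of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which are the setting for many of the commonly used RL approaches. Various algorithms are then introduced, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms to a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.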
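
As a reader's illustration of a value-based method that needs no model assumptions, the sketch below runs tabular Q-learning on a toy five-state chain MDP. It is not taken from the surveyed paper: the environment, the hyperparameters, and the names `step` and `q_learning` are all hypothetical, chosen only to show that the update uses sampled transitions rather than known transition probabilities.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.5  # assumed discount factor and learning rate

def step(state, action):
    """Deterministic toy dynamics: reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a uniform random behaviour policy suffices here.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Bootstrapped target built from the sampled transition only (model-free).
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
# Greedy policy for the non-terminal states, read off the learned Q-table.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The learner never inspects `step` internals; it only observes `(state, action, reward, next state)` samples, which is the sense in which such methods avoid model assumptions.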

Partially Observable Markov Decision Processes in Robotics: A Survey

Mikko Lauri , David Hsu , Joni Pajarinen

Categories: Robotics | Artificial Intelligence

2022-09-21

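Noisy sensing, imperfect control, and environment changes are defining characteristics of many real-world robot tasks. The partially observable Markov decision process (POMDP) provides a principled mathematical framework for modeling and solving robot decision and control tasks under uncertainty. Over the last decade, it has seen many successful applications, spanning localization and navigation, search and tracking, autonomous driving, multi-robot systems, manipulation, and human-robot interaction. This survey aims to bridge the gap between the development of POMDP models and algorithms at one end and applications to diverse robot decision tasks at the other. It analyzes the characteristics of these tasks and connects them with the mathematical and algorithmic properties of the POMDP framework for effective modeling and solution. For practitioners, the survey provides some of the key task characteristics for deciding when and how to apply POMDPs to robot tasks successfully. For POMDP algorithm designers, the survey provides new insights into the unique challenges of applying POMDPs to robot systems and points to promising new directions for further research.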

Acme: A Research Framework for Distributed Reinforcement Learning

Matthew W. Hoffman , Bobak Shahriari , John Aslanides , Gabriel Barth-Maron , Nikola Momchev , Danila Sinopalnikov , Piotr Stańczyk , Sabela Ramos , Anton Raichuk , Damien Vincent

Categories: Machine Learning | Artificial Intelligence

2020-06-01

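Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained and increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns, this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents built from simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions and as an important contribution to reproducibility in RL research. In this work, we describe the major design decisions made within Acme and give further details on how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms and show how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that run at massive scale while still maintaining the inherent readability of the implementation. This work presents a second version of the paper, which coincides with an increase in modularity, additional emphasis on offline, imitation, and learning-from-demonstrations algorithms, as well as various new agents implemented as part of Acme.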

Automated Reinforcement Learning: An Overview

Reza Refaei Afshar , Yingqian Zhang , Joaquin Vanschoren , Uzay Kaymak

Categories: Machine Learning | Artificial Intelligence

2022-01-13

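Reinforcement learning and, more recently, deep reinforcement learning are popular methods for solving sequential decision-making problems modeled as Markov decision processes. Modeling a problem for RL and selecting algorithms and hyperparameters require careful consideration, as different configurations may entail completely different performance. These considerations are mainly the task of RL experts; however, RL is progressively becoming popular in other fields where the researchers and system designers are not RL experts. Besides, many modeling decisions, such as defining the state and action spaces, the batch size and frequency of batch updates, and the number of timesteps, are typically made manually. For these reasons, automating the different components of the RL framework is of great importance, and it has attracted much attention in recent years. Automated RL provides a framework in which different components of RL, including MDP modeling, algorithm selection, and hyperparameter optimization, are modeled and defined automatically. In this article, we explore the literature and present recent work that can be used in automated RL. Moreover, we discuss the challenges, open questions, and research directions in AutoRL.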

On Transforming Reinforcement Learning by Transformer: The Development Trajectory

Shengchao Hu , Li Shen , Ya Zhang , Yixin Chen , Dacheng Tao

Categories: Machine Learning | Artificial Intelligence

2022-12-29

Transformer, originally devised for natural language processing, has also attested significant success in computer vision. Thanks to its super expressive power, researchers are investigating ways to deploy transformers to reinforcement learning (RL) and the transformer-based models have manifested their potential in representative RL benchmarks. In this paper, we collect and dissect recent advances on transforming RL by transformer (transformer-based RL or TRL), in order to explore its development trajectory and future trend. We group existing developments in two categories: architecture enhancement and trajectory optimization, and examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving. For architecture enhancement, these methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework, which model agents and environments much more precisely than deep RL methods, but they are still limited by the inherent defects of traditional RL algorithms, such as bootstrapping and "deadly triad". For trajectory optimization, these methods treat RL problems as sequence modeling and train a joint state-action model over entire trajectories under the behavior cloning framework, which are able to extract policies from static datasets and fully use the long-sequence modeling capability of the transformer. Given these advancements, extensions and challenges in TRL are reviewed and proposals about future direction are discussed. We hope that this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field.

Closing the Planning-Learning Loop with Application to Autonomous Driving

Panpan Cai , David Hsu

Categories: Robotics

2021-01-11

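Real-time planning under uncertainty is critical for robots operating in complex dynamic environments. Consider, for example, an autonomous robot vehicle driving in unregulated urban traffic of cars, motorcycles, buses, and so on. The robot vehicle has to plan over both the short and the long term in order to interact with many traffic participants of uncertain intentions and drive effectively. Planning explicitly over a long time horizon, however, incurs prohibitive computational costs and is impractical under real-time constraints. To achieve real-time performance for large-scale planning, this work introduces a new algorithm, Learning from Tree Search for Driving (LeTS-Drive), which integrates planning and learning in a closed loop, and applies it to autonomous driving in crowded urban traffic in simulation. Specifically, LeTS-Drive learns a policy and its value function from data provided by an online planner, which searches a sparsely sampled belief tree; the online planner in turn uses the learned policy and value function as heuristics to scale up its run-time performance for real-time robot control. These two steps are repeated to form a closed loop so that the planner and the learner inform each other and improve in synchrony. The algorithm learns on its own in a self-supervised manner, without human effort on explicit data labeling. Experimental results show that LeTS-Drive outperforms either planning or learning alone, as well as open-loop integration of planning and learning.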

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Jack Parker-Holder , Raghu Rajan , Xingyou Song , André Biedenkapp , Yingjie Miao , Theresa Eimer , Baohe Zhang , Vu Nguyen , Roberto Calandra , Aleksandra Faust

Categories: Machine Learning

2022-01-11

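The combination of reinforcement learning (RL) with deep learning has led to a series of impressive feats, with many believing that (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and it has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL, which naturally give rise to a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, showing promise in a variety of applications from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL: we provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers.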

DQ-GAT: Towards Safe and Efficient Autonomous Driving with Deep Q-Learning and Graph Attention Networks

Peide Cai , Hengli Wang , Yuxiang Sun , Ming Liu

Categories: Robotics | Artificial Intelligence | Machine Learning

2021-08-11

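Autonomous driving in multi-agent dynamic traffic scenarios is challenging: the behaviors of road users are uncertain and hard to model explicitly, and the ego-vehicle must apply complicated negotiation skills with them, such as yielding, merging, and taking turns, to achieve both safe and efficient driving in various settings. In these complex dynamic scenarios, traditional planning methods are largely rule-based and often lead to reactive or even overly conservative behaviors. They therefore require tedious human effort to maintain workability. Recently, deep learning-based methods have shown promising results, with better generalization capability and less hand-engineering effort. However, they are either implemented with supervised imitation learning (IL), which suffers from dataset bias and distribution mismatch issues, or trained with deep reinforcement learning (DRL) but focused on one specific traffic scenario. In this work, we propose DQ-GAT to achieve scalable and proactive autonomous driving, where graph attention-based networks are used to implicitly model interactions and deep Q-learning is employed to train the network end-to-end in an unsupervised manner. Extensive experiments in a high-fidelity driving simulator show that our method achieves higher success rates than previous learning-based methods and a traditional rule-based method, and better trades off safety and efficiency in both seen and unseen scenarios. Moreover, qualitative results on a trajectory dataset indicate that our learned policy can be transferred to the real world at real-time speeds. Demonstration videos are available at https://caipeide.github.io/dq-gat/.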

Visual processing in context of reinforcement learning

Hlynur Davíð Hlynsson

Categories: Machine Learning

2022-08-26

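Although deep reinforcement learning (RL) has recently enjoyed many successes, its methods are still data inefficient, which makes solving many problems prohibitively expensive in terms of data. We aim to remedy this by taking advantage of the rich supervisory signal in unlabeled data for learning state representations. This thesis introduces three different representation learning algorithms that have access to different subsets of the data sources that traditional RL algorithms use: (i) GrICA is inspired by independent component analysis (ICA) and trains a deep neural network to output statistically independent features of the input. GrICA does so by minimizing the mutual information between each feature and the other features. Additionally, GrICA requires only an unsorted collection of environment states. (ii) Latent Representation Prediction (LARP) requires more context: in addition to a state as input, it also needs the previous state and the action that connects them. This method learns state representations by predicting the environment's next state given the current state and action. The predictor is used with a graph search algorithm. (iii) RewPred learns a state representation by training a deep neural network to learn a smoothed version of the reward function. The representation is used for preprocessing inputs to deep RL, while the reward predictor is used for reward shaping. This method needs only state-reward pairs from the environment to learn the representation. We find that each method has its strengths and weaknesses, and conclude from our experiments that including unsupervised representation learning in RL problem-solving pipelines can speed up learning.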
