Smart Paper Notes

Multi-agent Deep Reinforcement Learning for Charge-sustaining Control of Multi-mode Hybrid Vehicles

Min Hua , Quan Zhou , Cetengfei Zhang , Hongming Xu , Wei Liu

Category: Machine Learning

2022-09-06

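Transportation electrification requires an increasing number of electric components (e.g., electric motors and electric energy storage systems) on vehicles, and control of the electric powertrain usually involves multiple inputs and multiple outputs (MIMO). This paper focuses on online optimization of the energy management strategy of a multi-mode hybrid electric vehicle using multi-agent reinforcement learning (MARL) algorithms. These algorithms aim to solve MIMO control optimization, whereas most existing methods only handle single-output control. The authors first analyze the energy efficiency of a multi-mode hybrid electric vehicle (HEV) optimized by a MARL algorithm built on the deep deterministic policy gradient (DDPG). On that basis, they propose a new collaborative cyber-physical learning scheme with multiple agents. A learning driving cycle is then constructed by a novel random method to speed up training. Finally, network design, learning rate, and policy noise are included in a sensitivity analysis that fixes the parameters of the DDPG-based algorithm. The learning performance under different relationships among the agents is also studied, and a not-completely-independent relationship with a ratio of 0.2 performs best. A comparative study of single-agent and multi-agent schemes shows that the multi-agent scheme achieves a 4% improvement in total energy over the single-agent scheme. Therefore, multi-objective control with MARL can achieve good optimization results and application efficiency.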
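A minimal sketch of two DDPG ingredients the sensitivity analysis touches on: policy (exploration) noise and the soft target-network update. It assumes each agent outputs normalized actions in [-1, 1]. The noise scale, tau, and the idea of one agent per torque command are illustrative assumptions, not the authors' settings. The "ratio 0.2" coupling between agents is not modeled.

```python
import random

def noisy_actions(mu, sigma, low, high, rng):
    """DDPG-style exploration: deterministic actor outputs plus Gaussian policy noise, clipped to bounds."""
    return [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in mu]

def soft_update(target, online, tau):
    """Polyak averaging of target parameters: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]
```

For example, a hypothetical engine-torque agent would call `noisy_actions([0.4], 0.1, -1.0, 1.0, rng)` during training. Larger `sigma` explores more but slows convergence, which is the trade-off a policy-noise sensitivity study probes.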

Progress and summary of reinforcement learning on energy management of MPS-EV

Jincheng Hu , Yang Lin , Liang Chu , Zhuoran Hou , Jihan Li , Jingjing Jiang , Yuanjian Zhang

Category: Machine Learning

2022-11-08

The high emissions and low energy efficiency of internal combustion engines (ICEs) have become unacceptable under environmental regulations and the energy crisis. As a promising alternative, multi-power source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for developing EMSs. RL has received continuous attention and research, but a systematic analysis of the design elements of RL-based EMS is still lacking. To this end, this paper presents an in-depth analysis of current research on RL-based EMS (RL-EMS) and summarizes its design elements. The paper first reviews previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. It shows the contribution of advanced algorithms to the training effect and analyzes the perception and control schemes in the literature in detail. It also classifies different reward function settings and describes innovative training methods and their roles. Next, by comparing the development routes of RL and RL-EMS, this paper identifies the gap between advanced RL solutions and existing RL-EMS. Finally, this paper suggests potential development directions for implementing advanced artificial intelligence (AI) solutions in EMS.
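To make the reward-function design element concrete, here is a minimal sketch of one common pattern for a charge-sustaining EMS: penalize instantaneous fuel use plus a quadratic state-of-charge (SOC) deviation term. The SOC reference (0.6) and the weight `alpha` are hypothetical values chosen for illustration.

```python
def ems_reward(fuel_rate_g_s, soc, soc_ref=0.6, alpha=350.0):
    """Negative cost: fuel rate (g/s) plus a quadratic penalty for drifting from the SOC reference.

    soc_ref and alpha are illustrative assumptions; a larger alpha enforces charge sustaining more strictly.
    """
    return -(fuel_rate_g_s + alpha * (soc - soc_ref) ** 2)
```

With these defaults, running the engine at 1.2 g/s with SOC on reference scores -1.2. Leaving the engine off while the SOC has dropped 10 points scores -3.5, so the agent is pushed to trade fuel against battery depletion.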

Driver Assistance Eco-driving and Transmission Control with Deep Reinforcement Learning

Lindsey Kerbel , Beshah Ayalew , Andrej Ivanco , Keith Loiselle

Category: Artificial Intelligence | Machine Learning

2022-12-15

With the growing need to reduce energy consumption and greenhouse gas emissions, Eco-driving strategies offer a significant opportunity for additional fuel savings on top of other technological solutions being pursued in the transportation sector. This paper proposes a model-free deep reinforcement learning (RL) control agent for active Eco-driving assistance. The agent trades off fuel consumption against other driver-accommodation objectives and learns optimal traction torque and transmission shifting policies from experience. Its training scheme uses an off-policy actor-critic architecture that iterates between two steps. Policy evaluation uses a multi-step return, and policy improvement uses the maximum a posteriori policy optimization (MPO) algorithm for hybrid action spaces. The proposed Eco-driving RL agent is implemented on a commercial vehicle in car-following traffic. It minimizes fuel consumption better than a baseline controller that has full knowledge of fuel-efficiency tables.
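The multi-step return used for policy evaluation can be sketched as a plain n-step bootstrapped return. This is a generic illustration that assumes a scalar reward per step and a critic estimate for the state n steps ahead. The authors' exact estimator and the MPO policy-improvement step are not reproduced.

```python
def n_step_return(rewards, bootstrap_value, gamma, done=False):
    """G_t = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * V(s_{t+n}).

    If the episode terminated within the n steps (done=True), the bootstrap term is dropped.
    """
    g = 0.0 if done else bootstrap_value
    for r in reversed(rewards):  # fold rewards back-to-front: g <- r + gamma * g
        g = r + gamma * g
    return g
```

For three rewards of 1, a bootstrap value of 10, and gamma = 0.5, this gives 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0. The same target applies whether the logged action was a continuous traction torque, a discrete gear shift, or both.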

Data-Driven Transferred Energy Management Strategy for Hybrid Electric Vehicles via Deep Reinforcement Learning

Hao Chen , Gang Guo , Bangbei Tang , Guo Hu , Xiaolin Tang , Teng Liu

Category: Artificial Intelligence

2020-09-07

Real-time application of energy management strategies (EMSs) in hybrid electric vehicles (HEVs) is one of the most demanding requirements for researchers and engineers. Inspired by the strong problem-solving capabilities of deep reinforcement learning (DRL), this paper proposes a real-time EMS that combines DRL with transfer learning (TL). The EMSs are derived from and evaluated on a real-world driving cycle dataset from the Transportation Secure Data Center (TSDC). The DRL algorithm used is proximal policy optimization (PPO), a policy gradient (PG) technique. Specifically, multiple source driving cycles are used to train the parameters of a PPO-based deep network. The learned parameters are then transferred to the target driving cycles under the TL framework. The EMSs for the target driving cycles are evaluated and compared under different training conditions. Simulation results indicate that the proposed transfer DRL-based EMS effectively reduces time consumption while guaranteeing control performance.
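The core of PPO is its clipped surrogate objective. Below is a minimal sketch that evaluates it over a batch of log-probabilities and advantages. The clip range `eps = 0.2` is a common default, not necessarily the paper's setting. In the transfer step, the target-cycle policy would start from the source-trained parameters and keep optimizing this same objective.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A), to be maximized."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)          # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)   # pessimistic (lower) bound
    return total / len(advantages)
```

Consider a sample whose probability rose by 50%. If its advantage is positive, its contribution is capped at 1.2 * A. If its advantage is negative, it keeps the full 1.5 * A penalty. This asymmetry keeps each update close to the previous policy.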

Empirical Analysis of AI-based Energy Management in Electric Vehicles: A Case Study on Reinforcement Learning

Jincheng Hu , Yang Lin , Jihao Li , Zhuoran Hou , Dezong Zhao , Quan Zhou , Jingjing Jiang , Yuanjian Zhang

Category: Artificial Intelligence | Machine Learning

2022-12-18

Reinforcement learning-based (RL-based) energy management strategies (EMSs) are considered a promising solution for the energy management of electric vehicles with multiple power sources. They have been shown to outperform conventional methods in energy management problems in terms of energy saving and real-time performance. However, previous studies have not systematically examined the essential elements of RL-based EMS. This paper presents an empirical analysis of RL-based EMS in a plug-in hybrid electric vehicle (PHEV) and a fuel cell electric vehicle (FCEV). The analysis covers four aspects: algorithm, perception and decision granularity, hyperparameters, and reward function. Over the complete driving cycle, the off-policy algorithm develops a more fuel-efficient solution than the other algorithms. Increasing the perception and decision granularity does not produce a more desirable energy-saving solution, but it better balances battery power and fuel consumption. The equivalent energy optimization objective based on the instantaneous state of charge (SOC) variation is parameter sensitive and can help RL-EMSs achieve more efficient energy-cost strategies.
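One reading of an "equivalent energy" objective built on instantaneous SOC variation is sketched below. It adds the fuel's chemical energy to the battery energy implied by the SOC change, weighted by an equivalence factor `s`. The lower heating value, pack capacity, and `s` are hypothetical, and the paper's exact formulation may differ. In this sketch, the parameter sensitivity the abstract reports shows up mainly as sensitivity to the choice of `s`.

```python
def equivalent_energy_j(fuel_g, delta_soc, lhv_j_per_g=42_500.0,
                        batt_capacity_wh=10_000.0, s=2.5):
    """Equivalent energy (J) for one control step.

    delta_soc < 0 means the battery discharged (energy drawn); > 0 means it charged (energy credited).
    All parameter defaults are illustrative assumptions.
    """
    fuel_j = fuel_g * lhv_j_per_g
    battery_j = -delta_soc * batt_capacity_wh * 3600.0  # Wh -> J
    return fuel_j + s * battery_j

def reward(fuel_g, delta_soc, **params):
    """RL reward as the negative equivalent energy of the step."""
    return -equivalent_energy_j(fuel_g, delta_soc, **params)
```

Under these assumptions, drawing 0.1% SOC from a 10 kWh pack (36 kJ) counts as 90 kJ of equivalent energy. That is roughly as costly as burning 2.1 g of fuel.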

Residual Policy Learning for Powertrain Control

Lindsey Kerbel , Beshah Ayalew , Andrej Ivanco , Keith Loiselle

Category: Artificial Intelligence | Machine Learning

2022-12-15

Eco-driving strategies have been shown to provide significant reductions in fuel consumption. This paper outlines an active driver assistance approach built on a residual policy learning (RPL) agent. The agent is trained to provide residual actions to default powertrain controllers while balancing fuel consumption against other driver-accommodation objectives. Using previous experiences, our RPL agent learns improved traction torque and gear shifting residual policies that adapt powertrain operation to variations and uncertainties in the environment. For comparison, we consider a traditional reinforcement learning (RL) agent trained from scratch. Both agents employ the off-policy Maximum A Posteriori Policy Optimization algorithm with an actor-critic architecture. We implement both agents on a simulated commercial vehicle in various car-following scenarios. The RPL agent quickly learns policies that are significantly better than a baseline source policy. On some measures, however, its policies are not as good as those the from-scratch RL agent eventually reaches.
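A minimal sketch of how a residual policy composes with a default controller: the learned residual is added to the base controller's command, and the sum is clipped to actuator limits. The proportional speed controller, its gain, the torque limits, and the 12-speed gearbox are hypothetical stand-ins for the default powertrain controllers.

```python
def base_torque(v_ref, v, kp=400.0):
    """Hypothetical default controller: proportional speed tracking, output in N*m."""
    return kp * (v_ref - v)

def rpl_torque(v_ref, v, residual, t_min=-2000.0, t_max=2000.0):
    """Residual policy learning: final command = base command + learned residual, clipped."""
    return min(t_max, max(t_min, base_torque(v_ref, v) + residual))

def rpl_gear(base_gear, gear_residual, n_gears=12):
    """Discrete residual on the default gear choice, kept within the valid gear range."""
    return min(n_gears, max(1, base_gear + gear_residual))
```

With a zero residual the agent reproduces the default controller exactly. This is why residual learning starts from a reasonable policy instead of from scratch and tends to improve quickly.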

Graph Reinforcement Learning Application to Co-operative Decision-Making in Mixed Autonomy Traffic: Framework, Survey, and Challenges

Qi Liu , Xueyuan Li , Zirui Li , Jingda Wu , Guodong Du , Xin Gao , Fan Yang , Shihua Yuan

Category: Robotics

2022-11-06

Proper functioning of connected and automated vehicles (CAVs) is crucial for the safety and efficiency of future intelligent transport systems. Meanwhile, the transition to fully autonomous driving will require a long period of mixed autonomy traffic that includes both CAVs and human-driven vehicles. Thus, collaborative decision-making for CAVs is essential to generate appropriate driving behaviors that enhance the safety and efficiency of mixed autonomy traffic. In recent years, deep reinforcement learning (DRL) has been widely used to solve decision-making problems. However, existing DRL-based methods have mainly focused on the decision-making of a single CAV. Applied to mixed autonomy traffic, they cannot accurately represent the mutual effects of vehicles or model dynamic traffic environments. To address these shortcomings, this article proposes a graph reinforcement learning (GRL) approach for multi-agent decision-making of CAVs in mixed autonomy traffic. First, a generic and modular GRL framework is designed. Then, a systematic review of DRL and GRL methods is presented, focusing on the problems addressed in recent research. Moreover, a comparative study of different GRL methods is conducted with the designed framework to verify the effectiveness of GRL methods. Results show that, compared with DRL methods, GRL methods better optimize the performance of multi-agent decision-making for CAVs in mixed autonomy traffic. Finally, challenges and future research directions are summarized. This study can serve as a research reference for solving multi-agent decision-making problems of CAVs in mixed autonomy traffic. It can also promote the adoption of GRL-based methods in intelligent transportation systems. The source code of our work can be found at https://github.com/Jacklinkk/Graph_CAVs.
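To show how a graph representation captures mutual effects between vehicles, here is a weight-free, GCN-style aggregation step. Vehicles within a hypothetical sensing radius are connected, and each vehicle's feature vector is replaced by the mean over itself and its neighbors. Real GRL methods add learned weight matrices, nonlinearities, and often attention. This sketch only illustrates the message-passing idea.

```python
def build_adjacency(positions_m, radius_m):
    """Connect two vehicles if their longitudinal distance is within radius_m (hypothetical rule)."""
    n = len(positions_m)
    return [[i != j and abs(positions_m[i] - positions_m[j]) <= radius_m for j in range(n)]
            for i in range(n)]

def mean_aggregate(features, adjacency):
    """One message-passing step: average each node's features with its neighbours' (self-loop included)."""
    n, dim = len(features), len(features[0])
    out = []
    for i in range(n):
        group = [j for j in range(n) if j == i or adjacency[i][j]]
        out.append([sum(features[j][d] for j in group) / len(group) for d in range(dim)])
    return out
```

Take three vehicles with features [position, speed] at 0 m, 10 m, and 100 m and a 50 m radius. The first two vehicles share information, while the distant third keeps its own features.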

Energy-Efficient Autonomous Driving Using Cognitive Driver Behavioral Models and Reinforcement Learning

Huayi Li , Nan Li , Ilya Kolmanovsky , Anouck Girard

Category: Robotics

2021-11-27

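Autonomous driving technologies are expected to improve mobility and road safety and also to bring energy-efficiency benefits. In the foreseeable future, autonomous vehicles (AVs) will operate on roads shared with human-driven vehicles. To maintain safety and liveness while minimizing energy consumption, AV planning and decision-making should account for the interactions between the autonomous ego vehicle and surrounding human-driven vehicles. In this chapter, we describe a framework for developing energy-efficient autonomous driving policies on shared roads. The framework models human driver behavior using cognitive hierarchy theory and reinforcement learning.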
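Cognitive hierarchy models assume that drivers reason at different depths. Below is a minimal sketch of the simpler level-k variant. A level-0 driver follows a fixed, non-strategic rule, and a level-k driver best-responds to a level-(k-1) driver. The two-action merge game and its payoffs are hypothetical. Full cognitive hierarchy theory instead best-responds to a distribution over all lower levels.

```python
# Hypothetical payoffs for the row driver: (my action, other driver's action) -> value.
PAYOFF = {("go", "go"): -10.0, ("go", "yield"): 2.0,
          ("yield", "go"): 0.0, ("yield", "yield"): -1.0}
ACTIONS = ("go", "yield")

def level_k_policies(k_max, level0="go"):
    """Return the action chosen at each reasoning level 0..k_max (level-k best-responds to level k-1)."""
    policies = [level0]
    for _ in range(k_max):
        opponent = policies[-1]
        policies.append(max(ACTIONS, key=lambda a: PAYOFF[(a, opponent)]))
    return policies
```

Level-k policies alternate between yielding and going. An AV that predicts which level a nearby human driver is at can plan a response that avoids both conflict and needless braking.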

Learning the policy for mixed electric platoon control of automated and human-driven vehicles at signalized intersection: a random search approach

Xia Jiang , Jian Zhang , Xiaoyu Shi , Jian Cheng

Category: Robotics

2022-06-24

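Over the past decades, the upgrading and renewal of vehicles have accelerated. Driven by the demand for environmental friendliness and intelligence, electric vehicles (EVs) and connected and automated vehicles (CAVs) have become new components of the transportation system. This paper develops a reinforcement learning framework for adaptive control of an electric platoon of CAVs and human-driven vehicles (HDVs) at a signalized intersection. First, a Markov decision process (MDP) model is proposed to describe the decision process of the mixed platoon. A novel state representation and reward function are designed for the model so that it accounts for the behavior of the whole platoon. Second, to handle the delayed reward, an augmented random search (ARS) algorithm is proposed. The control policy learned by the agent guides the longitudinal motion of the CAV that leads the platoon. Finally, a series of simulations is carried out in the simulation suite SUMO. The proposed method obtains higher rewards than several state-of-the-art (SOTA) reinforcement learning approaches. The simulations also show that the delayed reward is effective and outperforms a distributed reward mechanism. A sensitivity analysis shows that, compared with normal car-following behavior, energy savings range from 39.27% to 82.51% depending on the relative importance assigned to the optimization objectives. Without sacrificing travel delay, the proposed control method can save up to 53.64% of electric energy.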
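Augmented random search estimates an update direction from the rewards of symmetric parameter perturbations. It needs no gradients or value function, which makes it tolerant of delayed, episode-level rewards. Below is a minimal sketch of the basic ARS update, without the state normalization and top-direction selection of the full method. A toy quadratic reward stands in for a platoon episode return, and all hyperparameters are illustrative.

```python
import random
import statistics

def ars(reward_fn, theta, n_dirs=8, step=0.02, noise=0.05, iters=300, seed=0):
    """Basic random search: theta += step / (n_dirs * sigma_R) * sum_k (r_k+ - r_k-) * delta_k."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        deltas = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(n_dirs)]
        r_pos = [reward_fn([t + noise * d for t, d in zip(theta, delta)]) for delta in deltas]
        r_neg = [reward_fn([t - noise * d for t, d in zip(theta, delta)]) for delta in deltas]
        sigma_r = statistics.pstdev(r_pos + r_neg) or 1.0  # reward-scale normalization
        for i in range(len(theta)):
            grad_i = sum((rp - rn) * delta[i] for rp, rn, delta in zip(r_pos, r_neg, deltas))
            theta[i] += step / (n_dirs * sigma_r) * grad_i
    return theta

def toy_return(theta):
    """Hypothetical stand-in for an episode return; maximized at theta = (1, -2)."""
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)
```

Each evaluation only needs the total return of a perturbed policy. A reward that arrives only after the platoon clears the intersection is therefore handled the same way as a per-step reward.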

Battery and Hydrogen Energy Storage Control in a Smart Energy Network with Flexible Energy Demand using Deep Reinforcement Learning

Cephas Samende , Zhong Fan , Jun Cao

Category: Artificial Intelligence | Machine Learning

2022-08-26

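Smart energy networks provide an effective means of accommodating high penetration of variable renewable energy sources such as solar and wind, which is key to the deep decarbonization of energy production. However, because both renewable generation and energy demand vary, effective control and energy storage schemes are needed to manage variable generation and meet the desired economic and environmental goals. In this paper, we introduce a hybrid energy storage system, consisting of battery and hydrogen energy storage, to handle uncertainties in electricity prices, renewable energy production, and consumption. We aim to improve renewable energy utilization and minimize energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach. It is a deep reinforcement learning-based control strategy that optimizes the scheduling of the hybrid energy storage system and energy demand in real time. The approach is model-free and does not require explicit knowledge or a rigorous mathematical model of the smart energy network environment. Simulation results based on real-world data show two findings. (i) Integrated, optimized operation of the hybrid energy storage system and energy demand reduces carbon emissions by 78.69%, improves cost savings by 23.5%, and raises renewable energy utilization by over 13.2% compared with other baseline models. (ii) The proposed algorithm outperforms state-of-the-art self-learning algorithms such as the Deep Q-Network.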
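The centralized-training, decentralized-execution structure of MADDPG can be sketched in a few lines. Each agent's critic is trained on all agents' observations and actions, with next actions taken from every agent's target actor. Each actor, however, acts on its own observation only. The agent split (for example, one agent per storage device or flexible load) and the toy actors and critic in the tests are hypothetical.

```python
def critic_input(observations, actions):
    """Concatenate every agent's observation and action: the centralized critic's input."""
    x = []
    for obs in observations:
        x.extend(obs)
    for act in actions:
        x.extend(act)
    return x

def critic_target(reward_i, next_obs, target_actors, target_critic_i, gamma=0.99, done=False):
    """TD target for agent i: y_i = r_i + gamma * Q'_i(o'_1..o'_N, mu'_1(o'_1)..mu'_N(o'_N))."""
    next_actions = [mu(o) for mu, o in zip(target_actors, next_obs)]  # each actor sees only its own obs
    q_next = target_critic_i(critic_input(next_obs, next_actions))
    return reward_i + (0.0 if done else gamma * q_next)
```

Because the critic sees every agent's action, the learning target stays stationary even while the other agents' policies change. At run time only the actors are needed.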

Eco-driving for Electric Connected Vehicles at Signalized Intersections: A Parameterized Reinforcement Learning approach

Xia Jiang , Jian Zhang , Dan Li

Category: Robotics | Artificial Intelligence

2022-06-24

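This paper proposes a reinforcement learning (RL)-based eco-driving framework for electric connected vehicles (CVs) to improve vehicle energy efficiency at signalized intersections. Safe operation of the vehicle agent is ensured by integrating a model-based car-following policy, a lane-changing policy, and the RL policy. A Markov decision process (MDP) is then formulated that lets the vehicle perform both longitudinal control and lateral decisions. This jointly optimizes the car-following and lane-changing behaviors of CVs near intersections. Next, the hybrid action space is parameterized as a hierarchical structure, so that the agent can be trained with two-dimensional motion patterns in a dynamic traffic environment. Finally, the proposed method is evaluated in SUMO from both a single-vehicle perspective and a traffic-flow perspective. The results show that our strategy significantly reduces energy consumption by learning appropriate action schemes, without disrupting other human-driven vehicles (HDVs).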
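A hierarchical, parameterized action space pairs each discrete choice (such as keeping or changing lanes) with its own continuous parameter (such as acceleration). A minimal P-DQN-style selection rule is sketched below. The parameter networks and Q-function are hypothetical and are passed in as plain callables. The paper's exact parameterization and training are not reproduced.

```python
def select_parameterized_action(state, q_fn, param_fns):
    """P-DQN-style choice: compute each discrete action's continuous parameter, then take the best pair.

    q_fn(state, k, x) -> float and param_fns[k](state) -> float are hypothetical stand-ins for learned networks.
    """
    best_k, best_x, best_q = None, None, float("-inf")
    for k, param_fn in param_fns.items():
        x_k = param_fn(state)          # e.g. acceleration (m/s^2) to use with discrete action k
        q = q_fn(state, k, x_k)
        if q > best_q:
            best_k, best_x, best_q = k, x_k, q
    return best_k, best_x
```

The upper level picks the lane decision and the lower level supplies its acceleration. This lets one agent cover the two-dimensional (lateral plus longitudinal) motion without discretizing acceleration.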

Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic

Wei Zhou , Dong Chen , Jun Yan , Zhaojian Li , Huilin Yin , Wanchen Ge

Category: Machine Learning

2021-11-11

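Autonomous driving has attracted significant research interest over the past two decades because it offers many potential benefits, such as freeing drivers from driving and mitigating traffic congestion. Despite promising progress, lane changing remains a great challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision-making in AVs with encouraging results. However, most of these studies focus on a single-vehicle setting. Lane changing when multiple AVs coexist with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision-making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem. Each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter-sharing scheme. In particular, a multi-objective reward function is proposed that incorporates fuel efficiency, driving comfort, and safety of autonomous driving. Comprehensive experiments cover three traffic densities and various levels of human-driver aggressiveness. Across these settings, the proposed MARL framework consistently outperforms several state-of-the-art benchmarks in efficiency, safety, and driver comfort.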
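Below is a minimal sketch of a multi-objective per-vehicle reward that combines fuel efficiency, comfort, and safety as a weighted sum. The terms, weights, headway threshold, and collision penalty are hypothetical. The paper's local reward design and its parameter sharing (one policy network applied to every AV's local observation) are not reproduced here.

```python
def multi_objective_reward(fuel_rate_g_s, jerk_m_s3, headway_s, collided,
                           w_fuel=1.0, w_comfort=0.05, w_safety=1.0, min_headway_s=1.2):
    """Weighted sum of fuel, comfort, and safety terms; all weights and thresholds are assumptions."""
    if collided:
        return -100.0                                    # hypothetical terminal penalty
    r_fuel = -fuel_rate_g_s                              # less fuel is better
    r_comfort = -abs(jerk_m_s3)                          # smoother is better
    r_safety = -max(0.0, min_headway_s - headway_s)      # penalize only headways below the threshold
    return w_fuel * r_fuel + w_comfort * r_comfort + w_safety * r_safety
```

Shifting the weights moves the learned lane-changing behavior along the efficiency, comfort, and safety trade-off. A large collision penalty keeps safety dominant regardless of the other weights.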

Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation

Zhixuan Liang , Jiannong Cao , Shan Jiang , Divya Saxena , Huafeng Xu

Category: Artificial Intelligence | Robotics

2022-06-25

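Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and the coordination of autonomous vehicles. Deep reinforcement learning (DRL) offers a promising approach to multi-agent cooperation through interaction between agents and their environment. However, traditional DRL solutions suffer from high dimensionality during policy search when multiple agents have continuous action spaces. Moreover, the changing policies of the agents make training non-stationary. To tackle these issues, we propose a hierarchical reinforcement learning approach that combines high-level decision-making with low-level individual control for efficient policy search. In particular, cooperation among multiple agents can be learned efficiently in the high-level discrete action space. Meanwhile, low-level individual control can be reduced to single-agent reinforcement learning. In addition to hierarchical reinforcement learning, we propose an opponent modeling network that models other agents' policies during learning. In contrast to end-to-end DRL approaches, our approach reduces learning complexity by hierarchically decomposing the overall task into sub-tasks. To evaluate our approach, we conduct a real-world case study in a cooperative lane-change scenario. Both simulation and real-world experiments show that our approach is superior in collision rate and convergence speed.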
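The paper uses an opponent-modeling network. As a much simpler stand-in, the sketch below estimates another agent's high-level action distribution from observed counts, as in fictitious play. It then picks the ego agent's best high-level action against that estimate. The action names and payoffs are hypothetical.

```python
from collections import Counter

class FrequencyOpponentModel:
    """Count-based estimate of an opponent's high-level action probabilities (Laplace-smoothed)."""

    def __init__(self, actions, prior=1.0):
        self.counts = Counter({a: prior for a in actions})

    def observe(self, action):
        self.counts[action] += 1

    def probs(self):
        total = sum(self.counts.values())
        return {a: c / total for a, c in self.counts.items()}

def best_response(my_actions, model, payoff):
    """Choose the high-level action with the highest expected payoff under the opponent model."""
    p = model.probs()
    return max(my_actions, key=lambda a: sum(p[o] * payoff[(a, o)] for o in p))

# Hypothetical payoffs for the ego vehicle: (ego action, other vehicle's action) -> value.
PAYOFF = {("change_lane", "yield"): 5.0, ("change_lane", "accelerate"): -10.0,
          ("keep_lane", "yield"): 1.0, ("keep_lane", "accelerate"): 1.0}
```

With no evidence, the ego vehicle keeps its lane. After seeing the neighbor yield repeatedly, the estimated risk drops and changing lanes becomes the best response.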

A novel learning-based robust model predictive control energy management strategy for fuel cell electric vehicles

Shibo Li , Zhuoran Hou , Liang Chu , Jingjing Jiang , Yuanjian Zhang

Category: Machine Learning

2022-09-12

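Multi-source electromechanical coupling makes the energy management of fuel cell electric vehicles (FCEVs) relatively nonlinear and complex, especially for four-wheel-drive (4WD) FCEVs. Accurate state observation of this complex nonlinear system is the foundation of effective energy management in FCEVs. To unlock the energy-saving potential of FCEVs, this paper proposes a novel learning-based robust model predictive control (LRMPC) strategy for a 4WD FCEV that distributes power appropriately among multiple energy sources. The strategy uses machine learning (ML) to turn knowledge of the nonlinear system into an explicit control scheme with strong robustness. First, ML methods with high regression accuracy and good generalization ability are trained offline to build an accurate state observer for the state of charge (SOC). Then, an explicit SOC data table generated by the state observer is used to capture accurate state changes. Its input features include the vehicle state and the states of vehicle components. In addition, a vehicle speed estimator that provides future speed references is built with a deep forest. Next, the explicit data table and the vehicle speed estimator are combined with model predictive control (MPC) to fully exploit the energy-saving capability of the FCEV's multi-source system. The resulting strategy is named LRMPC. Finally, detailed simulation tests are carried out to verify the performance of LRMPC. The results highlight its optimal control effect and strong real-time applicability.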
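A minimal receding-horizon sketch of the MPC layer works in four steps. It enumerates fuel-cell power plans over a short horizon and simulates SOC with a simple energy balance. It then scores each plan by hydrogen use plus terminal SOC deviation and applies only the first action. Everything here is a hypothetical simplification. The demand forecast is passed in, standing in for the deep-forest speed predictor. The SOC update is a plain energy balance, standing in for the ML-based SOC observer and its data table. Brute-force enumeration replaces a real optimizer.

```python
import itertools

def mpc_first_fc_power(soc, demand_forecast_kw, fc_levels_kw=(0.0, 10.0, 20.0, 30.0),
                       batt_kwh=2.0, dt_h=1.0 / 60.0, soc_ref=0.6,
                       soc_min=0.1, soc_max=0.95, w_h2=1.0, w_soc=1e4):
    """Return the first fuel-cell power (kW) of the cheapest feasible plan, or None if none is feasible."""
    best_cost, best_first = float("inf"), None
    for plan in itertools.product(fc_levels_kw, repeat=len(demand_forecast_kw)):
        s, cost, feasible = soc, 0.0, True
        for p_fc, p_dem in zip(plan, demand_forecast_kw):
            p_batt = p_dem - p_fc              # battery covers the remainder (> 0: discharge)
            s -= p_batt * dt_h / batt_kwh      # simple energy-balance SOC update
            if not soc_min <= s <= soc_max:
                feasible = False
                break
            cost += w_h2 * p_fc * dt_h         # fuel-cell energy (kWh) as a hydrogen-use proxy
        if not feasible:
            continue
        cost += w_soc * (s - soc_ref) ** 2     # terminal charge-sustaining penalty
        if cost < best_cost:
            best_cost, best_first = cost, plan[0]
    return best_first
```

At each control step the controller is re-run with a fresh SOC estimate and forecast. This feedback is what gives MPC its robustness to forecast errors.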

A Survey on Reinforcement Learning in Aviation Applications

Pouria Razzaghi , Amin Tabrizian , Wei Guo , Shulu Chen , Abenezer Taye , Ellis Thompson , Alexis Bregeon , Ali Baheri , Peng Wei

Category: Machine Learning

2022-11-03

Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.

Distributed Energy Management and Demand Response in Smart Grids: A Multi-Agent Deep Reinforcement Learning Framework

Amin Shojaeighadikolaei , Arman Ghasemi , Kailani Jones , Yousif Dafalla , Alexandru G. Bardas , Reza Ahmadi , Morteza Haashemi

Category: Machine Learning

2022-11-29

This paper presents a multi-agent Deep Reinforcement Learning (DRL) framework for autonomous control and integration of renewable energy resources into smart power grid systems. In particular, the proposed framework jointly considers demand response (DR) and distributed energy management (DEM) for residential end users. DR is widely recognized as a way to improve power grid stability and reliability while reducing end users' energy bills. However, conventional DR techniques have several shortcomings, such as the inability to handle operational uncertainties while imposing disutility on end users, which prevents widespread adoption in real-world applications. The proposed framework addresses these shortcomings by implementing DR and DEM based on a real-time pricing strategy learned with deep reinforcement learning. Furthermore, the framework lets the power grid service provider use distributed energy resources (i.e., rooftop PV panels and battery storage) as dispatchable assets that support the smart grid during peak hours. Simulation results based on the Deep Q-Network (DQN) show significant improvements in the 24-hour accumulative profit for both prosumers and the power grid service provider. They also show major reductions in the use of the power grid's reserve generators.
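The Deep Q-Network behind these simulations learns from a Bellman target. A minimal sketch of that target computation follows. It assumes the target network is exposed as a function that returns one Q-value per discrete action, for instance hypothetical discrete price or dispatch levels. Replay sampling and the network update itself are omitted.

```python
def dqn_targets(batch, target_q, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a'), with no bootstrap at terminal transitions.

    batch: iterable of (state, action, reward, next_state, done) tuples.
    target_q: callable mapping a state to a list of Q-values, one per discrete action (an assumption).
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

The online network is then regressed toward these targets for the actions actually taken. The target network is held fixed between periodic syncs to keep training stable.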

Renewable energy integration and microgrid energy trading using multi-agent deep reinforcement learning

Daniel J. B. Harrold , Jun Cao , Zhong Fan

Category: Artificial Intelligence | Machine Learning

2021-11-21

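In this paper, multi-agent reinforcement learning is used to control a hybrid energy storage system so as to reduce the energy costs of a microgrid by maximizing the value of renewable energy and trading. The agents must learn to control three different types of energy storage systems under fluctuating demand, dynamic wholesale energy prices, and unpredictable renewable generation. Two case studies are considered. The first examines how the energy storage systems can better integrate renewable generation under dynamic pricing. The second examines how the same agents can be used alongside an aggregator agent to sell energy to self-interested external microgrids looking to reduce their own energy bills. The study compares multi-agent control against control from a single global agent. Centralized learning with decentralized execution, using the multi-agent deep deterministic policy gradient and its state-of-the-art variants, performs significantly better. Using separate reward functions in the multi-agent approach also performs much better than using a single control agent. Finally, being able to trade with other microgrids, rather than just selling back to the utility grid, greatly increases the grid's savings.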
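A single-interval cost function illustrates why trading with other microgrids beats selling back to the utility. Surplus energy sold to a peer at a price above the grid feed-in tariff earns more than exporting all of it to the grid. The prices and quantities are hypothetical, and the agents' storage decisions that produce the net load are not modeled.

```python
def microgrid_step_cost(net_load_kwh, buy_price, grid_sell_price,
                        p2p_sell_price=None, p2p_demand_kwh=0.0):
    """Energy cost of one interval (negative = revenue).

    net_load_kwh > 0: shortfall imported from the utility at buy_price.
    net_load_kwh < 0: surplus, sold first to peer microgrids (if offered) and the rest to the grid.
    Prices are hypothetical; p2p_sell_price is assumed to exceed grid_sell_price.
    """
    if net_load_kwh >= 0:
        return net_load_kwh * buy_price
    surplus = -net_load_kwh
    revenue = 0.0
    if p2p_sell_price is not None:
        sold_to_peers = min(surplus, p2p_demand_kwh)
        revenue += sold_to_peers * p2p_sell_price
        surplus -= sold_to_peers
    revenue += surplus * grid_sell_price
    return -revenue
```

Take a 10 kWh surplus. Exporting it all to the grid at 0.05 per kWh earns 0.50. Selling 6 kWh to a peer at 0.15 and the remaining 4 kWh to the grid earns 1.10.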

Bilateral Deep Reinforcement Learning Approach for Better-than-human Car Following Model

Tianyu Shi , Yifei Ai , Omar ElSamadisy , Baher Abdulhai

Category: Robotics | Machine Learning

2022-03-03

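In the coming years and decades, autonomous vehicles (AVs) will become increasingly prevalent. They offer new opportunities for safer and more convenient travel and potentially for smarter traffic control methods that exploit automation and connectivity. Car following is a prime function in autonomous driving. Reinforcement learning-based car following has received attention in recent years, with the goal of learning and reaching performance levels comparable to humans. However, most existing RL methods model car following as a unilateral problem that senses only the vehicle ahead. Recent literature (Wang and Horn [16]) has shown that bilateral car following, which considers both the vehicle ahead and the vehicle behind, yields better system stability. In this paper, we hypothesize that bilateral car following can be learned with RL alongside other goals, such as efficiency maximization, jerk minimization, and safety rewards. The result would be a learned model that outperforms human driving. We propose a deep reinforcement learning (DRL) framework for car-following control that integrates bilateral information into both the state and the reward function, based on the bilateral control model (BCM). Furthermore, we use a decentralized multi-agent reinforcement learning framework to generate the corresponding control action for each agent. Our simulation results show that the learned policy outperforms human driving in (a) inter-vehicle headways, (b) average speed, (c) jerk, (d) time to collision (TTC), and (e) string stability.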
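The bilateral control model weighs the gap and speed difference to the vehicle ahead against those to the vehicle behind. A minimal sketch of that control law, in its commonly cited form, is below. The gains `k_d` and `k_v` are hypothetical. In the paper, these bilateral quantities enter the RL state and reward rather than being applied directly as the control.

```python
def bcm_acceleration(x_front, x_ego, x_rear, v_front, v_ego, v_rear, k_d=0.2, k_v=0.8):
    """Bilateral control: balance the front and rear gaps and relative speeds.

    a = k_d * [(x_front - x_ego) - (x_ego - x_rear)] + k_v * [(v_front - v_ego) - (v_ego - v_rear)]
    Gains are illustrative assumptions.
    """
    gap_term = (x_front - x_ego) - (x_ego - x_rear)
    speed_term = (v_front - v_ego) - (v_ego - v_rear)
    return k_d * gap_term + k_v * speed_term
```

When the ego vehicle sits midway between its neighbors at equal speeds, the command is zero. A larger front gap produces acceleration, which damps disturbances in both directions along the string.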

An Intelligent Self-driving Truck System For Highway Transportation

Dawei Wang , Lingping Gao , Ziquan Lan , Wei Li , Jiaping Ren , Jiahui Zhang , Peng Zhang , Pei Zhou , Shengao Wang , Jia Pan

Category: Robotics | Artificial Intelligence

2021-12-31

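Recently, the autonomous driving community has made many advances, attracting considerable attention from academia and industry. However, existing work mainly focuses on cars, and self-driving truck algorithms and models still require further development. In this paper, we present an intelligent self-driving truck system. The system consists of three main components: 1) a realistic traffic simulation module that generates realistic traffic flow in testing scenarios; 2) a high-fidelity truck model, designed and evaluated to mimic real truck responses in real-world deployment; and 3) an intelligent planning module with a learning-based decision-making algorithm and a multi-mode trajectory planner that accounts for the truck's constraints, road slope changes, and surrounding traffic flow. We provide quantitative evaluations of each component individually to demonstrate its fidelity and performance. We also deploy the proposed system on a real truck and conduct real-world experiments, which show that the system can mitigate the sim-to-real gap. Our code is available at https://github.com/inceptioresearch/iits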
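A basic longitudinal force balance shows why road slope matters to a truck planner. On a grade, gravity, rolling resistance, and aerodynamic drag eat into the available traction, which shrinks the feasible acceleration. The sketch below computes that bound. The mass, rolling-resistance coefficient, drag area, and traction limit are hypothetical values, not the paper's truck model.

```python
import math

def max_accel(mass_kg, speed_mps, grade_rad, max_traction_n,
              c_rr=0.006, rho_air=1.2, cd_area_m2=6.0, g=9.81):
    """Upper bound on longitudinal acceleration (m/s^2) from a basic force balance.

    Default coefficients are illustrative values for a heavy truck, not measured data.
    """
    f_grade = mass_kg * g * math.sin(grade_rad)              # climbing resistance
    f_roll = c_rr * mass_kg * g * math.cos(grade_rad)        # rolling resistance
    f_aero = 0.5 * rho_air * cd_area_m2 * speed_mps ** 2     # aerodynamic drag
    return (max_traction_n - f_grade - f_roll - f_aero) / mass_kg
```

For a hypothetical 40 t truck, a 3% grade costs roughly 11.8 kN of traction. A planner that ignores slope would therefore propose accelerations the truck cannot deliver.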

Cooperative Reinforcement Learning on Traffic Signal Control

Chi-Chun Chao , Jun-Wei Hsieh , Bor-Shiun Wang

Category: Artificial Intelligence

2022-05-23

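Traffic signal control is a challenging real-world problem that aims to minimize overall travel time by coordinating vehicle movements at road intersections. Traffic signal control systems in use today still rely heavily on oversimplified information and rule-based methods. Specifically, the periodicity of green/red light alternation can be treated as a prior that helps each agent plan better during policy optimization. To learn such adaptive and predictive priors, traditional RL-based methods can only return a fixed length from a predefined action pool, and only with local agents. Without cooperation among these agents, some agents often conflict with others, reducing overall throughput. This paper proposes a cooperative, multi-objective architecture with age-decaying weights to better estimate the multiple reward terms in traffic signal control optimization. The architecture is termed COoperative Multi-objective Multi-Agent Deep Deterministic Policy Gradient (COMMA-DDPG). Two types of agents work to maximize rewards for different goals. One type optimizes local traffic at each intersection, and the other optimizes global traffic waiting time. The global agent guides the local agents to help them learn faster but is not used in the inference phase. We also analyze the existence of a solution and provide a convergence proof for the proposed RL optimization. The evaluation uses real-world traffic data collected from traffic cameras in an Asian country. Our method reduces the total delay time by 60%, and the results demonstrate its superiority over SOTA methods.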
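The abstract says the global agent guides the local agents only during training and mentions age-decaying weights. One hypothetical way to realize this is sketched below. Each local agent's reward is blended with the global waiting-time reward, using a weight that decays with training age. Guidance therefore fades, and nothing global is needed at inference. The blending form and the decay constants are assumptions, not the paper's formulation.

```python
def guided_local_reward(r_local, r_global, age, w0=1.0, decay=0.99):
    """Training-time reward for a local agent: blend in global guidance with a weight that decays with age.

    The convex blend and the exponential decay are assumptions for illustration.
    """
    w = w0 * decay ** age
    return (1.0 - w) * r_local + w * r_global
```

Early in training, the local agent is driven almost entirely by the network-wide waiting-time signal. By a training age of a few hundred, its reward is essentially the local intersection objective.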
