The energy sector is facing rapid changes in the transition towards clean renewable sources. However, the growing share of volatile, fluctuating renewable generation such as wind or solar energy has already led to an increase in power grid congestion and network security concerns. Grid operators mitigate these by modifying either generation or demand (redispatching, curtailment, flexible loads). Unfortunately, redispatching of fossil generators leads to excessive grid operation costs and higher emissions, which is in direct opposition to the decarbonization of the energy sector. In this paper, we propose an AlphaZero-based grid topology optimization agent as a non-costly, carbon-free congestion management alternative. Our experimental evaluation confirms the potential of topology optimization for power grid operation, achieves a reduction of the average amount of required redispatching by 60%, and shows the interoperability with traditional congestion management methods. Our approach also ranked 1st in the WCCI 2022 Learning to Run a Power Network (L2RPN) competition. Based on our findings, we identify and discuss open research problems as well as technical challenges for a productive system on a real power grid.
translated by 谷歌翻译
本文介绍了电力系统运营商的域知识如何集成到强化学习(RL)框架中,以有效学习控制电网拓扑以防止热级联的代理。由于大搜索/优化空间,典型的基于RL的拓扑控制器无法表现良好。在这里,我们提出了一个基于演员 - 评论家的代理,以解决问题的组合性质,并使用由RTE,法国TSO开发的RL环境训练代理。为了解决大型优化空间的挑战,通过使用网络物理修改环境以增强代理学习来纳入训练过程中的基于奖励调整的基于课程的方法。此外,采用多种方案的并行训练方法来避免将代理偏置到几种情况,并使其稳健地对网格操作中的自然变异性。如果没有对培训过程进行这些修改,则RL代理失败了大多数测试场景,说明了正确整合物理系统的域知识以获得真实世界的RL学习的重要性。该代理通过RTE测试2019年学习,以运行电力网络挑战,并以精确度和第1位的速度授予第2位。开发的代码是公共使用开放的。
translated by 谷歌翻译
Power grids, across the world, play an important societal and economical role by providing uninterrupted, reliable and transient-free power to several industries, businesses and household consumers. With the advent of renewable power resources and EVs resulting into uncertain generation and highly dynamic load demands, it has become ever so important to ensure robust operation of power networks through suitable management of transient stability issues and localize the events of blackouts. In the light of ever increasing stress on the modern grid infrastructure and the grid operators, this paper presents a reinforcement learning (RL) framework, PowRL, to mitigate the effects of unexpected network events, as well as reliably maintain electricity everywhere on the network at all times. The PowRL leverages a novel heuristic for overload management, along with the RL-guided decision making on optimal topology selection to ensure that the grid is operated safely and reliably (with no overloads). PowRL is benchmarked on a variety of competition datasets hosted by the L2RPN (Learning to Run a Power Network). Even with its reduced action space, PowRL tops the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) at an aggregate level, while also being the top performing agent in the L2RPN WCCI 2020 challenge. Moreover, detailed analysis depicts state-of-the-art performances by the PowRL agent in some of the test scenarios.
translated by 谷歌翻译
当前气候的快速变化增加了改变能源生产和消费管理的紧迫性,以减少碳和其他绿色房屋的生产。在这种情况下,法国电力网络管理公司RTE(r {\'e} seau de Transport d'{\'e} lectricit {\'e})最近发布了一项广泛的研究结果,概述了明天法国法语的各种情况能源管理。我们提出一个挑战,将测试这种情况的可行性。目的是控制电力网络中的电力运输,同时追求多种目标:平衡生产和消费,最大程度地减少能量损失,并确保人员和设备安全,尤其是避免灾难性的失败。虽然应用程序的重要性本身提供了一个目标,但该挑战也旨在推动人工智能分支(AI)(AI)的最先进,称为强化学习(RL),该研究提供了解决控制问题的新可能性。特别是,在该应用领域中,深度学习和RL的组合组合的各个方面仍然需要利用。该挑战属于2019年开始的系列赛,名称为“学习运行电力网络”(L2RPN)。在这个新版本中,我们介绍了RTE提出的新的更现实的场景,以便到2050年到达碳中立性,将化石燃料电力产生,增加了可再生和核能的比例,并引入了电池。此外,我们使用最先进的加强学习算法提供基线来刺激未来的参与者。
translated by 谷歌翻译
蒙特卡洛树搜索(MCT)是设计游戏机器人或解决顺序决策问题的强大方法。该方法依赖于平衡探索和开发的智能树搜索。MCT以模拟的形式进行随机抽样,并存储动作的统计数据,以在每个随后的迭代中做出更有教育的选择。然而,该方法已成为组合游戏的最新技术,但是,在更复杂的游戏(例如那些具有较高的分支因素或实时系列的游戏)以及各种实用领域(例如,运输,日程安排或安全性)有效的MCT应用程序通常需要其与问题有关的修改或与其他技术集成。这种特定领域的修改和混合方法是本调查的主要重点。最后一项主要的MCT调查已于2012年发布。自发布以来出现的贡献特别感兴趣。
translated by 谷歌翻译
Traditional power grid systems have become obsolete under more frequent and extreme natural disasters. Reinforcement learning (RL) has been a promising solution for resilience given its successful history of power grid control. However, most power grid simulators and RL interfaces do not support simulation of power grid under large-scale blackouts or when the network is divided into sub-networks. In this study, we proposed an updated power grid simulator built on Grid2Op, an existing simulator and RL interface, and experimented on limiting the action and observation spaces of Grid2Op. By testing with DDQN and SliceRDQN algorithms, we found that reduced action spaces significantly improve training performance and efficiency. In addition, we investigated a low-rank neural network regularization method for deep Q-learning, one of the most widely used RL algorithms, in this power grid control scenario. As a result, the experiment demonstrated that in the power grid simulation environment, adopting this method will significantly increase the performance of RL agents.
translated by 谷歌翻译
Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
translated by 谷歌翻译
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
translated by 谷歌翻译
Ongoing risks from climate change have impacted the livelihood of global nomadic communities, and are likely to lead to increased migratory movements in coming years. As a result, mobility considerations are becoming increasingly important in energy systems planning, particularly to achieve energy access in developing countries. Advanced Plug and Play control strategies have been recently developed with such a decentralized framework in mind, more easily allowing for the interconnection of nomadic communities, both to each other and to the main grid. In light of the above, the design and planning strategy of a mobile multi-energy supply system for a nomadic community is investigated in this work. Motivated by the scale and dimensionality of the associated uncertainties, impacting all major design and decision variables over the 30-year planning horizon, Deep Reinforcement Learning (DRL) is implemented for the design and planning problem tackled. DRL based solutions are benchmarked against several rigid baseline design options to compare expected performance under uncertainty. The results on a case study for ger communities in Mongolia suggest that mobile nomadic energy systems can be both technically and economically feasible, particularly when considering flexibility, although the degree of spatial dispersion among households is an important limiting factor. Key economic, sustainability and resilience indicators such as Cost, Equivalent Emissions and Total Unmet Load are measured, suggesting potential improvements compared to available baselines of up to 25%, 67% and 76%, respectively. Finally, the decomposition of values of flexibility and plug and play operation is presented using a variation of real options theory, with important implications for both nomadic communities and policymakers focused on enabling their energy access.
translated by 谷歌翻译
本文介绍了电力网络的问题,可以为应用多功能增强学习(Marl)创造一个令人兴奋和挑战的现实情景。脱碳的新出现趋势在配电网络上放置过大的压力。主动电压控制被视为有希望的解决方案,以减轻电力拥塞和改善电压质量,无需额外的硬件投资,利用网络中的可控装置,例如屋顶光伏(PVS)和静态VAR补偿器(SVC)。这些可控设备出现在大量广大数字中,并分布在宽的地理区域中,使Marl成为自然候选者。本文在DEC-POMDP框架中制定了主动电压控制问题,并建立了开源环境。它旨在弥合电力社区与马尔社区之间的差距,并成为马尔算法实际应用的驱动力。最后,我们分析了主动电压控制问题的特殊特征,导致最先进的Marl方法挑战,并总结了潜在的方向。
translated by 谷歌翻译
单位承诺(UC)是日期电力市场中的一个基本问题,有效解决UC问题至关重要。 UC问题通常采用数学优化技术,例如动态编程,拉格朗日放松和混合二次二次编程(MIQP)。但是,这些方法的计算时间随着发电机和能源资源的数量而增加,这仍然是行业中的主要瓶颈。人工智能的最新进展证明了加强学习(RL)解决UC问题的能力。不幸的是,当UC问题的大小增长时,现有关于解决RL的UC问题的研究受到维数的诅咒。为了解决这些问题,我们提出了一个优化方法辅助的集合深钢筋学习算法,其中UC问题是作为Markov决策过程(MDP)提出的,并通过集合框架中的多步进深度学习解决。所提出的算法通过解决量身定制的优化问题来确保相对较高的性能和操作约束的满意度来建立候选动作。关于IEEE 118和300总线系统的数值研究表明,我们的算法优于基线RL算法和MIQP。此外,所提出的算法在无法预见的操作条件下显示出强大的概括能力。
translated by 谷歌翻译
Optimal Power Flow (OPF) is a very traditional research area within the power systems field that seeks for the optimal operation point of electric power plants, and which needs to be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In the last decades, power grids have evolved into a typical dynamic, non-linear and large-scale control system, known as the power system, so searching for better and faster ACOPF solutions is becoming crucial. Appearance of Graph Neural Networks (GNN) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. On the other hand, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and that is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost after having trained our DRL agent on IEEE 30 bus system and then computing the OPF on that base network with topology changes
translated by 谷歌翻译
The high emission and low energy efficiency caused by internal combustion engines (ICE) have become unacceptable under environmental regulations and the energy crisis. As a promising alternative solution, multi-power source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for the development of EMS. RL has received continuous attention and research, but there is still a lack of systematic analysis of the design elements of RL-based EMS. To this end, this paper presents an in-depth analysis of the current research on RL-based EMS (RL-EMS) and summarizes the design elements of RL-based EMS. This paper first summarizes the previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. The contribution of advanced algorithms to the training effect is shown, the perception and control schemes in the literature are analyzed in detail, different reward function settings are classified, and innovative training methods with their roles are elaborated. Finally, by comparing the development routes of RL and RL-EMS, this paper identifies the gap between advanced RL solutions and existing RL-EMS. Finally, this paper suggests potential development directions for implementing advanced artificial intelligence (AI) solutions in EMS.
translated by 谷歌翻译
在本文中,我们介绍了有关典型乘车共享系统中决策优化问题的强化学习方法的全面,深入的调查。涵盖了有关乘车匹配,车辆重新定位,乘车,路由和动态定价主题的论文。在过去的几年中,大多数文献都出现了,并且要继续解决一些核心挑战:模型复杂性,代理协调和多个杠杆的联合优化。因此,我们还引入了流行的数据集和开放式仿真环境,以促进进一步的研发。随后,我们讨论了有关该重要领域的强化学习研究的许多挑战和机会。
translated by 谷歌翻译
增强学习(RL)是多能管理系统的有前途的最佳控制技术。它不需要先验模型 - 降低了前期和正在进行的项目特定工程工作,并且能够学习基础系统动力学的更好表示。但是,香草RL不能提供约束满意度的保证 - 导致其在安全至关重要的环境中产生各种不安全的互动。在本文中,我们介绍了两种新颖的安全RL方法,即SafeFallback和Afvafe,其中安全约束配方与RL配方脱钩,并且提供了硬构成满意度,可以保证在培训(探索)和开发过程中(近距离) )最佳政策。在模拟的多能系统案例研究中,我们已经表明,这两种方法均与香草RL基准相比(94,6%和82,8%,而35.5%)和香草RL基准相比明显更高的效用(即有用的政策)开始。提出的SafeFallback方法甚至可以胜过香草RL基准(102,9%至100%)。我们得出的结论是,这两种方法都是超越RL的安全限制处理技术,正如随机代理所证明的,同时仍提供坚硬的保证。最后,我们向I.A.提出了基本的未来工作。随着更多数据可用,改善约束功能本身。
translated by 谷歌翻译
在本文中,多种子体增强学习用于控制混合能量存储系统,通过最大化可再生能源和交易的价值来降低微电网的能量成本。该代理商必须学习在波动需求,动态批发能源价格和不可预测的可再生能源中,控制三种不同类型的能量存储系统。考虑了两种案例研究:首先看能量存储系统如何在动态定价下更好地整合可再生能源发电,第二种与这些同一代理商如何与聚合剂一起使用,以向自私外部微电网销售能量的能量减少自己的能源票据。这项工作发现,具有分散执行的多代理深度确定性政策梯度的集中学习及其最先进的变体允许多种代理方法显着地比来自单个全局代理的控制更好。还发现,在多种子体方法中使用单独的奖励功能比使用单个控制剂更好。还发现能够与其他微电网交易,而不是卖回实用电网,也发现大大增加了网格的储蓄。
translated by 谷歌翻译
我们考虑创建助手的问题,这些助手可以帮助代理人(通常是人类)解决新颖的顺序决策问题,假设代理人无法将奖励功能明确指定给助手。我们没有像目前的方法那样旨在自动化并代替代理人,而是赋予助手一个咨询角色,并将代理商作为主要决策者。困难是,我们必须考虑由代理商的限制或限制引起的潜在偏见,这可能导致其看似非理性地拒绝建议。为此,我们介绍了一种新颖的援助形式化,以模拟这些偏见,从而使助手推断和适应它们。然后,我们引入了一种计划助手建议的新方法,该方法可以扩展到大型决策问题。最后,我们通过实验表明我们的方法适应了这些代理偏见,并且比基于自动化的替代方案给代理带来了更高的累积奖励。
translated by 谷歌翻译
数字化和远程连接扩大了攻击面,使网络系统更脆弱。由于攻击者变得越来越复杂和资源丰富,仅仅依赖传统网络保护,如入侵检测,防火墙和加密,不足以保护网络系统。网络弹性提供了一种新的安全范式,可以使用弹性机制来补充保护不足。一种网络弹性机制(CRM)适应了已知的或零日威胁和实际威胁和不确定性,并对他们进行战略性地响应,以便在成功攻击时保持网络系统的关键功能。反馈架构在启用CRM的在线感应,推理和致动过程中发挥关键作用。强化学习(RL)是一个重要的工具,对网络弹性的反馈架构构成。它允许CRM提供有限或没有事先知识和攻击者的有限攻击的顺序响应。在这项工作中,我们审查了Cyber​​恢复力的RL的文献,并讨论了对三种主要类型的漏洞,即姿势有关,与信息相关的脆弱性的网络恢复力。我们介绍了三个CRM的应用领域:移动目标防御,防守网络欺骗和辅助人类安全技术。 RL算法也有漏洞。我们解释了RL的三个漏洞和目前的攻击模型,其中攻击者针对环境与代理商之间交换的信息:奖励,国家观察和行动命令。我们展示攻击者可以通过最低攻击努力来欺骗RL代理商学习邪恶的政策。最后,我们讨论了RL为基于RL的CRM的网络安全和恢复力和新兴应用的未来挑战。
translated by 谷歌翻译
Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.
translated by 谷歌翻译
由于数据量增加,金融业的快速变化已经彻底改变了数据处理和数据分析的技术,并带来了新的理论和计算挑战。与古典随机控制理论和解决财务决策问题的其他分析方法相比,解决模型假设的财务决策问题,强化学习(RL)的新发展能够充分利用具有更少模型假设的大量财务数据并改善复杂的金融环境中的决策。该调查纸目的旨在审查最近的资金途径的发展和使用RL方法。我们介绍了马尔可夫决策过程,这是许多常用的RL方法的设置。然后引入各种算法,重点介绍不需要任何模型假设的基于价值和基于策略的方法。连接是用神经网络进行的,以扩展框架以包含深的RL算法。我们的调查通过讨论了这些RL算法在金融中各种决策问题中的应用,包括最佳执行,投资组合优化,期权定价和对冲,市场制作,智能订单路由和Robo-Awaring。
translated by 谷歌翻译