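The current rapid changes in climate increase the urgency of changing how energy production and consumption are managed, in order to reduce the production of carbon and other greenhouse gases. In this context, the French electricity network management company RTE (Réseau de Transport d'Électricité) has recently published the results of an extensive study outlining various scenarios for tomorrow's French power management. We propose a challenge that will test the viability of such a scenario. The goal is to control electricity transportation in power networks while pursuing multiple objectives: balancing production and consumption, minimizing energy losses, and keeping people and equipment safe, in particular by avoiding catastrophic failures. While the importance of the application provides a goal in itself, this challenge also aims to push the state of the art in a branch of artificial intelligence (AI) called reinforcement learning (RL), which offers new possibilities for tackling control problems. In particular, various aspects of combining deep learning with RL remain to be harnessed in this application domain. This challenge belongs to a series started in 2019 under the name "Learning to Run a Power Network" (L2RPN). In this new edition, we introduce new, more realistic scenarios proposed by RTE to reach carbon neutrality by 2050: retiring fossil-fuel electricity production, increasing the proportions of renewable and nuclear energy, and introducing batteries. Furthermore, we provide a baseline using a state-of-the-art reinforcement learning algorithm to stimulate future participants.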
Translated by Google Translate
Power grids across the world play an important societal and economic role by providing uninterrupted, reliable and transient-free power to industries, businesses and household consumers. With the advent of renewable power resources and EVs resulting in uncertain generation and highly dynamic load demands, it has become ever more important to ensure robust operation of power networks through suitable management of transient stability issues and to localize blackout events. In light of the ever-increasing stress on modern grid infrastructure and grid operators, this paper presents a reinforcement learning (RL) framework, PowRL, to mitigate the effects of unexpected network events and to reliably maintain electricity everywhere on the network at all times. PowRL leverages a novel heuristic for overload management, along with RL-guided decision making on optimal topology selection, to ensure that the grid is operated safely and reliably (with no overloads). PowRL is benchmarked on a variety of competition datasets hosted by L2RPN (Learning to Run a Power Network). Even with its reduced action space, PowRL tops the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) at an aggregate level, while also being the top-performing agent in the L2RPN WCCI 2020 challenge. Moreover, detailed analysis shows state-of-the-art performance by the PowRL agent in some of the test scenarios.
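This paper presents how the domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well because of the large search/optimization space. Here, we propose an actor-critic-based agent to address the combinatorial nature of the problem, and we train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure by modifying the environment using network physics to enhance agent learning. In addition, a parallel training approach over multiple scenarios is employed to avoid biasing the agent towards a few scenarios and to make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed in most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems for real-world RL. The agent was tested by RTE for the 2019 Learning to Run a Power Network challenge and was awarded 2nd place in accuracy and 1st place in speed. The developed code is open-sourced for public use.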
The energy sector is facing rapid changes in the transition towards clean renewable sources. However, the growing share of volatile, fluctuating renewable generation such as wind or solar energy has already led to an increase in power grid congestion and network security concerns. Grid operators mitigate these by modifying either generation or demand (redispatching, curtailment, flexible loads). Unfortunately, redispatching of fossil generators leads to excessive grid operation costs and higher emissions, which is in direct opposition to the decarbonization of the energy sector. In this paper, we propose an AlphaZero-based grid topology optimization agent as a non-costly, carbon-free congestion management alternative. Our experimental evaluation confirms the potential of topology optimization for power grid operation, achieves a reduction of the average amount of required redispatching by 60%, and shows the interoperability with traditional congestion management methods. Our approach also ranked 1st in the WCCI 2022 Learning to Run a Power Network (L2RPN) competition. Based on our findings, we identify and discuss open research problems as well as technical challenges for a productive system on a real power grid.
Driven by the global decarbonization effort, the rapid integration of renewable energy into the conventional electricity grid presents new challenges and opportunities for the battery energy storage system (BESS) participating in the energy market. Energy arbitrage can be a significant source of revenue for the BESS due to the increasing price volatility in the spot market caused by the mismatch between renewable generation and electricity demand. In addition, the Frequency Control Ancillary Services (FCAS) markets established to stabilize the grid can offer higher returns for the BESS due to their capability to respond within milliseconds. Therefore, it is crucial for the BESS to carefully decide how much capacity to assign to each market to maximize the total profit under uncertain market conditions. This paper formulates the bidding problem of the BESS as a Markov Decision Process, which enables the BESS to participate in both the spot market and the FCAS market to maximize profit. Then, Proximal Policy Optimization, a model-free deep reinforcement learning algorithm, is employed to learn the optimal bidding strategy from the dynamic environment of the energy market under a continuous bidding scale. The proposed model is trained and validated using real-world historical data of the Australian National Electricity Market. The results demonstrate that our developed joint bidding strategy in both markets is significantly profitable compared to individual markets.
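To make the idea of an MDP formulation concrete, here is a minimal sketch of a single state transition for a battery trading in a spot market only. It is not the formulation used in the paper: the state fields, the `step` function, the capacity and power limits, and the lossless battery are all illustrative assumptions, and FCAS bidding is omitted.

```python
from dataclasses import dataclass


@dataclass
class BatteryState:
    soc_mwh: float  # energy currently stored (MWh)
    price: float    # current spot price ($/MWh)


def step(state, action_mw, next_price,
         capacity_mwh=10.0, max_power_mw=5.0, dt_h=1.0):
    """One hypothetical MDP transition.

    action_mw > 0 discharges (sells energy), action_mw < 0 charges (buys).
    Assumes a lossless battery and a price that evolves independently
    of the action.
    """
    # Clip the requested power to the inverter limit.
    power = max(-max_power_mw, min(max_power_mw, action_mw))
    # Cannot discharge more energy than is stored.
    power = min(power, state.soc_mwh / dt_h)
    # Cannot charge beyond the remaining free capacity.
    power = max(power, -(capacity_mwh - state.soc_mwh) / dt_h)
    # Reward is the spot-market revenue (negative when buying).
    reward = power * dt_h * state.price
    next_state = BatteryState(state.soc_mwh - power * dt_h, next_price)
    return next_state, reward
```

A policy-gradient method such as PPO would then learn a mapping from `BatteryState` to `action_mw` by repeatedly calling a transition function of this kind on historical price traces.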
As an efficient way to integrate multiple distributed energy resources (DERs) and the user side, a microgrid mainly faces the problems of small-scale volatility, uncertainty and intermittency of DERs, together with demand-side uncertainty. The traditional microgrid has a single form and cannot support flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads and the main grid is proposed. Secondly, centralized control of microgrid operation is convenient for controlling the reactive power and voltage of the distributed power supply and for adjusting the grid frequency. However, flexible loads tend to aggregate and generate peaks during electricity price valleys. Existing research takes the power constraints of the microgrid into account but fails to ensure a sufficient supply of electric energy for a single flexible load. This paper considers the response priority of each unit component of the TCLs and ESSs on the basis of the overall operating environment of the microgrid, so as to ensure the power supply of the microgrid's flexible loads and to minimize the cost of power input. Finally, the simulation optimization of the environment can be expressed as a Markov decision process. It combines two stages of offline and online operation in the training process. Adding multiple threads without learning from historical data leads to low learning efficiency. The asynchronous advantage actor-critic algorithm is therefore augmented with an experience replay memory to address data correlation and non-stationary distribution problems during training.
Optimal Power Flow (OPF) is a long-established research area within the power systems field that seeks the optimal operating point of electric power plants, and which needs to be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In recent decades, power grids have evolved into a typical dynamic, non-linear and large-scale control system, known as the power system, so the search for better and faster ACOPF solutions is becoming crucial. The emergence of Graph Neural Networks (GNN) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. On the other hand, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and that is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost after having trained our DRL agent on the IEEE 30 bus system and then computing the OPF on that base network with topology changes.
The high emissions and low energy efficiency of internal combustion engines (ICE) have become unacceptable under environmental regulations and the energy crisis. As a promising alternative, multi-power source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for the development of EMS. RL has received continuous attention and research, but there is still a lack of systematic analysis of the design elements of RL-based EMS. To this end, this paper presents an in-depth analysis of the current research on RL-based EMS (RL-EMS) and summarizes its design elements. The paper first summarizes previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. The contribution of advanced algorithms to the training effect is shown, the perception and control schemes in the literature are analyzed in detail, different reward function settings are classified, and innovative training methods and their roles are elaborated. Then, by comparing the development routes of RL and RL-EMS, the paper identifies the gap between advanced RL solutions and existing RL-EMS. Finally, it suggests potential development directions for implementing advanced artificial intelligence (AI) solutions in EMS.
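Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, which reduces the upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, resulting in various unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and Afvafe, in which the safety constraint formulation is decoupled from the RL formulation and which provide hard-constraint satisfaction guarantees both during training (exploration) and during exploitation of the (close-to-)optimal policy. In a simulated multi-energy system case study, we show that both methods start with a significantly higher utility (i.e. a useful policy) than a vanilla RL benchmark (94.6% and 82.8%, compared to 35.5%), and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% to 100%). We conclude that both methods are viable safety constraint handling techniques beyond RL, as demonstrated with random agents, while still providing hard guarantees. Finally, we propose fundamental future work to, among other things, improve the constraint functions themselves as more data becomes available.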
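The general idea of decoupling safety constraints from the learning agent can be sketched as a thin filter between the policy and the environment. The snippet below is only an illustration of that pattern, not the authors' SafeFallback implementation: the function name, the convention that a constraint `g` is satisfied when `g(action) <= 0`, and the assumption that the fallback action is itself known to be safe are all hypothetical.

```python
from typing import Callable, Sequence


def safe_fallback_action(
    proposed: float,
    constraints: Sequence[Callable[[float], float]],
    fallback: float,
) -> float:
    """Return the proposed action if every constraint holds, else the fallback.

    Each constraint g is treated as satisfied when g(action) <= 0.
    The fallback is assumed (not checked) to be a known-safe action.
    """
    if all(g(proposed) <= 0.0 for g in constraints):
        return proposed
    return fallback


# Example constraints: keep a power set-point within [0, 5].
power_limits = [lambda a: a - 5.0, lambda a: -a]
```

Because the filter only inspects the action, the same check can wrap any agent, including a random one, during both exploration and exploitation.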
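We address the problem of production planning and distribution in multi-echelon supply chains. We consider uncertain demands and lead times, which makes the problem stochastic and non-linear. A Markov decision process formulation and a non-linear programming model are presented. As this is a sequential decision-making problem, deep reinforcement learning (RL) is a possible solution approach. This type of technique has gained a lot of attention from the artificial intelligence and optimization communities in recent years. Given the good results obtained with deep RL approaches in different areas, there is growing interest in applying them to problems from the operations research field. We use a deep RL technique, namely Proximal Policy Optimization (PPO2), to solve the problem considering uncertain, regular and seasonal demands and constant or stochastic lead times. Experiments are carried out in different scenarios to better assess the suitability of the algorithm. An agent based on a linearized model is used as a baseline. Experimental results indicate that PPO2 is a competitive and adequate tool for this type of problem. The PPO2 agent is better than the baseline in all scenarios with stochastic lead times (7.3-11.2%), regardless of whether demands are seasonal or not. In scenarios with constant lead times, the PPO2 agent is better when uncertain demands are non-seasonal (2.2-4.7%). The results show that the greater the uncertainty of the scenario, the greater the viability of this type of approach.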
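Asset allocation (or portfolio management) is the task of determining how to optimally allocate the funds of a finite budget across a range of financial instruments/assets such as stocks. This study investigates the performance of reinforcement learning (RL) applied to portfolio management using model-free deep RL agents. We trained several RL agents on real-world stock prices to learn how to perform asset allocation. We compared the performance of these RL agents against some baseline agents. We also compared the RL agents among themselves to understand which classes of agents performed better. From our analysis, RL agents can perform the task of portfolio management, since they significantly outperformed the baseline agents (random allocation and uniform allocation). Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT, overall. This shows the ability of RL agents to uncover more profitable trading strategies. Furthermore, there were no significant performance differences between value-based and policy-based RL agents. Actor-critic agents performed better than other types of agents. Likewise, on-policy agents performed better, because they are better at policy evaluation and sample efficiency is not a significant problem in portfolio management. This study shows that RL agents can substantially improve asset allocation, since they outperform strong baselines. Based on our analysis, on-policy, actor-critic RL agents showed the most promise.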
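Smart energy networks provide an effective means of accommodating high penetrations of variable renewable energy sources such as solar and wind, which are key to the deep decarbonization of energy production. However, given the variability of renewables as well as of energy demand, it is imperative to develop effective control and energy storage schemes to manage variable energy generation and achieve the desired system economics and environmental goals. In this paper, we introduce a hybrid energy storage system composed of battery and hydrogen energy storage to handle the uncertainties related to electricity prices, renewable energy production and consumption. We aim to improve renewable energy utilization and minimize energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach, a deep reinforcement learning-based control strategy that optimizes the scheduling of the hybrid energy storage system and energy demand in real time. The proposed approach is model-free and does not require explicit knowledge or rigorous mathematical models of the smart energy network environment. Simulation results based on real-world data show that (i) integration and optimized operation of the hybrid energy storage system and energy demand reduces carbon emissions by 78.69%, improves cost savings by 23.5% and improves renewable energy utilization by over 13.2% compared to other baseline models, and (ii) the proposed algorithm outperforms state-of-the-art self-learning algorithms such as the deep Q-network.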
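In replacing fossil fuels with renewable energy resources, the unbalanced production of intermittent wind and photovoltaic (PV) power is a critical issue for peer-to-peer (P2P) power trading. To resolve this problem, this paper introduces a reinforcement learning (RL) technique. For RL, a graph convolutional network (GCN) and a bi-directional long short-term memory (Bi-LSTM) network are jointly applied to P2P power trading between nanogrid clusters based on cooperative game theory. The flexible and reliable DC nanogrid is well suited to integrating renewable energy into the distribution system. Each local nanogrid cluster takes the position of a prosumer, focusing on power production and consumption simultaneously. For the power management of nanogrid clusters, multi-objective optimization is applied to each local nanogrid cluster using Internet of Things (IoT) technology. Charging/discharging of electric vehicles (EVs) is performed taking into account the intermittent characteristics of wind and PV power production. RL algorithms such as the deep Q-learning network (DQN), deep recurrent Q-learning network (DRQN), Bi-DRQN, proximal policy optimization (PPO), GCN-DQN, GCN-DRQN, GCN-Bi-DRQN, and GCN-PPO are used for simulations. Consequently, the cooperative P2P power trading system maximizes profit using the time-of-use (ToU) tariff-based electricity cost and the system marginal price (SMP), and minimizes the amount of grid power consumption. Power management of nanogrid clusters with P2P power trading is simulated in real time on a distribution test feeder, and the proposed GCN-PPO technique reduces the electricity cost of the nanogrid clusters by 36.7%.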
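We consider the problem of demand-side energy management, where each household is equipped with a smart meter that is able to schedule home appliances online. The goal is to minimize the overall cost under a real-time pricing scheme. While previous works have introduced centralized approaches in which the scheduling algorithm has full observability, we propose formulating the smart grid environment as a Markov game. Each household is a decentralized agent with partial observability, which allows scalability and privacy preservation in a realistic setting. The grid operator produces a price signal that varies with the energy demand. We propose an extension of the algorithm that addresses the partial observability of the environment from each agent's point of view. The algorithm learns a centralized critic that coordinates the training of the decentralized agents. Our approach thus uses centralized learning but decentralized execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of the total energy consumed and the cost of electricity for all households, based purely on instantaneous observations and the price signal.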
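The peak-to-average ratio mentioned above is the maximum of an aggregate load profile divided by its mean; a perfectly flat profile has a ratio of 1. A minimal sketch of the metric follows, with a hypothetical function name and an arbitrary load profile chosen for illustration.

```python
def peak_to_average_ratio(load_kw):
    """Peak-to-average ratio of a load profile (one value per time step)."""
    if not load_kw:
        raise ValueError("load profile must not be empty")
    mean = sum(load_kw) / len(load_kw)
    if mean == 0:
        raise ValueError("average load is zero; ratio is undefined")
    return max(load_kw) / mean


# A peaky profile versus a flat one carrying the same total energy.
peaky = [1.0, 1.0, 4.0, 2.0]
flat = [2.0, 2.0, 2.0, 2.0]
```

Shifting appliance usage away from the peak step moves `peaky` towards `flat` and lowers the ratio without changing the total energy consumed.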
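In this paper, multi-agent reinforcement learning is used to control a hybrid energy storage system that reduces the energy costs of a microgrid by maximizing the value of renewable energy and trading. The agents must learn to control three different types of energy storage system under fluctuating demand, dynamic wholesale energy prices, and unpredictable renewable energy generation. Two case studies are considered: the first looks at how the energy storage systems can better integrate renewable energy generation under dynamic pricing, and the second at how those same agents can be used alongside an aggregator agent to sell energy to self-interested external microgrids looking to reduce their own energy bills. This work found that centralized learning with decentralized execution of the multi-agent deep deterministic policy gradient and its state-of-the-art variants allowed the multi-agent methods to perform significantly better than control by a single global agent. It was also found that using separate reward functions in the multi-agent approach performed better than using a single control agent. Being able to trade with other microgrids, rather than just selling back to the utility grid, was also found to greatly increase the grid's savings.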
Energy consumption in buildings, both residential and commercial, accounts for approximately 40% of all energy usage in the U.S., and similar numbers are reported from countries around the world. This significant amount of energy is used to maintain a comfortable, secure, and productive environment for the occupants. It is therefore crucial to optimize energy consumption in buildings while maintaining satisfactory levels of occupant comfort, health, and safety. Recently, machine learning has proven to be an invaluable tool for deriving important insights from data and optimizing various systems. In this work, we review the ways in which machine learning has been leveraged to make buildings smart and energy-efficient. For the convenience of readers, we provide a brief introduction to several machine learning paradigms and to the components and functioning of each smart building system we cover. Finally, we discuss the challenges faced when implementing machine learning algorithms in smart buildings and suggest future avenues for research at the intersection of smart buildings and machine learning.
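This paper leverages recent developments in reinforcement learning and deep learning to solve the supply chain inventory management (SCIM) problem, a complex sequential decision-making problem that consists of determining the optimal quantity of products to produce and ship to different warehouses over a given time horizon. A mathematical formulation of the stochastic two-echelon supply chain environment is given, which allows an arbitrary number of warehouses and product types to be managed. In addition, an open-source library that interfaces with deep reinforcement learning (DRL) algorithms is developed and made publicly available for solving the SCIM problem. The performance achieved by state-of-the-art DRL algorithms is compared through a rich set of numerical experiments on synthetically generated data. The experimental plan is designed and carried out to cover different structures, topologies, demands, capacities, and costs of the supply chain. Results show that the PPO algorithm adapts very well to different characteristics of the environment. The VPG algorithm almost always converges to a local maximum, even though it typically achieves an acceptable performance level. Finally, A3C is the fastest algorithm but, just like VPG, it never achieves the best performance when compared to PPO. In conclusion, the numerical experiments show that DRL performs consistently better than standard reorder policies such as the static (s, Q)-policy. It can therefore be considered a practical and effective option for solving real-world instances of the stochastic two-echelon SCIM problem.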
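For readers unfamiliar with the baseline, a static (s, Q)-policy orders a fixed quantity Q whenever the inventory position is at or below the reorder point s. The sketch below illustrates that rule for a single warehouse and product; the function names, the zero lead time, and the backordering of unmet demand are simplifying assumptions and do not reproduce the paper's two-echelon environment.

```python
def sq_policy_order(inventory_position, s, q):
    """Static (s, Q) rule: order q units when inventory is at or below s."""
    return q if inventory_position <= s else 0


def simulate(demands, initial_inventory, s, q):
    """Apply the (s, Q) rule period by period.

    Assumes orders arrive immediately (zero lead time) and that unmet
    demand is backordered, so inventory may become negative.
    """
    inventory = initial_inventory
    orders = []
    for demand in demands:
        order = sq_policy_order(inventory, s, q)
        orders.append(order)
        inventory = inventory + order - demand
    return orders, inventory
```

With `demands=[3, 3, 3, 3]`, `initial_inventory=10`, `s=4` and `q=8`, the rule stays idle for two periods, orders once when inventory reaches 4, and then stays idle again. A DRL agent replaces the fixed `s` and `q` with order quantities that depend on the observed state.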
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
Global power systems are increasingly reliant on wind energy as a mitigation strategy for climate change. However, the variability of wind energy erodes system reliability, resulting in wind being curtailed and, ultimately, in substantial economic losses for wind farm owners. Wind curtailment can be reduced using battery energy storage systems (BESS) that serve as onsite backup sources. Yet this auxiliary role may significantly hamper the BESS's capacity to generate revenue from the electricity market, particularly from energy arbitrage in the Spot market and from providing frequency control ancillary services (FCAS) in the FCAS markets. Ideal BESS scheduling should effectively balance the BESS's role in absorbing onsite wind curtailment with trading in the electricity market, but this is difficult in practice because of the underlying coordination complexity and the stochastic nature of energy prices and wind generation. In this study, we investigate the bidding strategy of a co-located wind-battery system participating simultaneously in both the Spot and Regulation FCAS markets. We propose a deep reinforcement learning (DRL)-based approach that decouples the market participation of the wind-battery system into two related Markov decision processes, one for each facility, enabling the BESS to absorb onsite wind curtailment while simultaneously bidding in the wholesale Spot and FCAS markets to maximize overall operational revenue. Using realistic wind farm data, we validate the coordinated bidding strategy for the wind-battery system and find that our strategy generates significantly higher revenue and responds better to wind curtailment than an optimization-based benchmark. Our results show that joint-market bidding can significantly improve the financial performance of wind-battery systems compared to individual market participation.
Reformulating the history matching problem from a least-square mathematical optimization problem into a Markov Decision Process introduces a method in which reinforcement learning can be utilized to solve the problem. This method provides a mechanism where an artificial deep neural network agent can interact with the reservoir simulator and find multiple different solutions to the problem. Such formulation allows for solving the problem in parallel by launching multiple concurrent environments enabling the agent to learn simultaneously from all the environments at once, achieving significant speed up.