Optimal Power Flow (OPF) is a long-established research area in power systems that seeks the optimal operating point of electric power plants and must be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In recent decades, power grids have evolved into dynamic, non-linear, large-scale control systems, so the search for better and faster ACOPF solutions is becoming crucial. The emergence of Graph Neural Networks (GNNs) has allowed Machine Learning (ML) algorithms to be applied naturally to graph data such as power networks. Meanwhile, Deep Reinforcement Learning (DRL) is known for its ability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and, at the same time, generalizes to unseen scenarios. We train our DRL agent on the IEEE 30-bus system, then compute the OPF on that base network with topology changes and compare our solution with the DCOPF in terms of cost.
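As a rough illustration of the Proximal Policy Optimization component named in the abstract above, the sketch below computes the standard clipped surrogate objective for a batch of samples. It is a minimal sketch, not the authors' implementation: the function name `ppo_clipped_objective`, the plain-list inputs, and the default `clip_eps=0.2` are assumptions made for illustration only.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective used by PPO (to be maximized).

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    clip_eps:   clipping range (hypothetical default, chosen for illustration)
    """
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

Taking the minimum of the two terms removes any incentive to move the policy ratio far from 1 in a single update, which is the property that makes PPO comparatively stable to train.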
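This paper presents how domain knowledge of power system operators can be integrated into a reinforcement learning (RL) framework to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well because of the large search/optimization space. Here, we propose an actor-critic-based agent to address the combinatorial nature of the problem and train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure by modifying the environment using network physics to enhance agent learning. Further, a parallel training approach on multiple scenarios is employed to avoid biasing the agent toward a few scenarios and to make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed in most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems for real-world RL learning. The agent was tested by RTE for the 2019 Learning to Run a Power Network challenge and was awarded 2nd place in accuracy and 1st place in speed. The developed code is open for public use.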
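Finding optimal bidding strategies in electricity markets leads to higher profit. However, it is a challenging problem because of system uncertainty, which stems from the strategies of other generation units. Distributed optimization, in which each entity or agent decides on its bid individually, has become the state of the art. However, it cannot overcome the challenge of system uncertainty. Deep reinforcement learning is a promising approach for learning optimal strategies in uncertain environments. Nevertheless, it cannot integrate information about the spatial system topology into the learning process. This paper proposes a distributed learning algorithm based on deep reinforcement learning (DRL) combined with a graph convolutional neural network (GCN). The proposed framework helps agents update their decisions by getting feedback from the environment, so that it can overcome the challenge of uncertainty. In the proposed algorithm, the node states and the connections between nodes are the inputs to the GCN, which makes agents aware of the structure of the system. This information about the system topology helps agents improve their bidding strategies and increase profit. We evaluate the proposed algorithm on the IEEE 30-bus system under different scenarios. Furthermore, to investigate the generalization ability of the proposed approach, we test the trained model on the IEEE 39-bus system. The results show that the proposed algorithm generalizes better than DRL alone and can yield higher profit when the system topology changes.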
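Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommendation systems, and gaming. Similarly, graph neural networks (GNNs) have demonstrated superior performance in supervised learning on graph-structured data. Recently, the fusion of GNNs with DRL for graph-structured environments has attracted a lot of attention. This paper provides a comprehensive review of these hybrid works. They can be classified into two categories: (1) algorithmic enhancement, where DRL and GNNs complement each other for better utility; and (2) application-specific enhancement, where DRL and GNNs support each other. This fusion effectively addresses a variety of complex problems in engineering and the life sciences. Based on the review, we further analyze the applicability and benefits of fusing these two domains, especially in terms of increasing generalizability and reducing computational complexity. Finally, the key challenges in integrating DRL and GNNs, along with potential future research directions, are highlighted, which will be of interest to the broader machine learning community.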
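The rapid changes in the current climate increase the urgency of changing how energy production and consumption are managed, in order to reduce the production of carbon and other greenhouse gases. In this context, the French electricity network management company RTE (Réseau de Transport d'Électricité) has recently published the results of an extensive study outlining various scenarios for tomorrow's French power management. We propose a challenge that will test the viability of such scenarios. The goal is to control electricity transportation in power networks while pursuing multiple objectives: balancing production and consumption, minimizing energy losses, and keeping people and equipment safe, particularly by avoiding catastrophic failures. While the importance of the application provides a goal in itself, this challenge also aims to push the state of the art in a branch of artificial intelligence (AI) called reinforcement learning (RL), which offers new possibilities for tackling control problems. In particular, various aspects of the combination of deep learning and RL remain to be harnessed in this application domain. This challenge belongs to a series started in 2019 under the name "Learning to Run a Power Network" (L2RPN). In this new edition, we introduce new, more realistic scenarios proposed by RTE to reach carbon neutrality by 2050: moving away from fossil-fuel electricity production, increasing the proportions of renewable and nuclear energy, and introducing batteries. Furthermore, we provide a baseline using a state-of-the-art reinforcement learning algorithm to stimulate future participants.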
Power grids across the world play an important societal and economic role by providing uninterrupted, reliable and transient-free power to industries, businesses and household consumers. With the advent of renewable power resources and EVs, which result in uncertain generation and highly dynamic load demands, it has become ever more important to ensure robust operation of power networks through suitable management of transient stability issues and to localize blackout events. In light of the ever-increasing stress on modern grid infrastructure and grid operators, this paper presents a reinforcement learning (RL) framework, PowRL, to mitigate the effects of unexpected network events and to reliably maintain electricity everywhere on the network at all times. PowRL leverages a novel heuristic for overload management, along with RL-guided decision making on optimal topology selection, to ensure that the grid is operated safely and reliably (with no overloads). PowRL is benchmarked on a variety of competition datasets hosted by L2RPN (Learning to Run a Power Network). Even with its reduced action space, PowRL tops the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) at an aggregate level, while also being the top-performing agent in the L2RPN WCCI 2020 challenge. Moreover, detailed analysis shows state-of-the-art performance by the PowRL agent in some of the test scenarios.
As an efficient way to integrate multiple distributed energy resources (DERs) and the user side, a microgrid mainly faces the problems of small-scale volatility, uncertainty and intermittency of DERs, together with demand-side uncertainty. The traditional microgrid has a single form and cannot support flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads and the main grid is proposed. Secondly, centralized control of microgrid operation is convenient for controlling the reactive power and voltage of the distributed power supply and for adjusting the grid frequency. However, flexible loads tend to aggregate and generate peaks during electricity price valleys. Existing research takes the power constraints of the microgrid into account but fails to ensure a sufficient supply of electric energy for a single flexible load. This paper considers the response priority of each unit component of the TCLs and ESSs on the basis of the overall environment operation of the microgrid, so as to ensure the power supply of the microgrid's flexible loads and save power input cost to the greatest extent. Finally, the simulation optimization of the environment can be expressed as a Markov decision process. It combines offline and online operation stages in the training process. Adding multiple threads without learning from historical data leads to low learning efficiency. An asynchronous advantage actor-critic with an experience replay memory is therefore added to address the data correlation and non-stationary distribution problems during training.
The energy sector is facing rapid changes in the transition towards clean renewable sources. However, the growing share of volatile, fluctuating renewable generation such as wind or solar energy has already led to an increase in power grid congestion and network security concerns. Grid operators mitigate these by modifying either generation or demand (redispatching, curtailment, flexible loads). Unfortunately, redispatching of fossil generators leads to excessive grid operation costs and higher emissions, which is in direct opposition to the decarbonization of the energy sector. In this paper, we propose an AlphaZero-based grid topology optimization agent as a non-costly, carbon-free congestion management alternative. Our experimental evaluation confirms the potential of topology optimization for power grid operation, achieves a reduction of the average amount of required redispatching by 60%, and shows the interoperability with traditional congestion management methods. Our approach also ranked 1st in the WCCI 2022 Learning to Run a Power Network (L2RPN) competition. Based on our findings, we identify and discuss open research problems as well as technical challenges for a productive system on a real power grid.
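We propose an end-to-end framework based on a graph neural network (GNN) to balance the power flows in a generic grid. The optimization is framed as a supervised vertex regression task, in which the GNN is trained to predict the current and power injections at each grid branch that yield a power flow balance. By representing the power grid as a line graph with branches as vertices, we can train a GNN that is more accurate and more robust to changes in the underlying topology. In addition, by using specialized GNN layers, we are able to build a very deep architecture that accounts for large neighborhoods on the graph while implementing only localized operations. We perform three different experiments to evaluate: i) the benefits and trends of using localized rather than global operations in deep GNN models; ii) the resilience to perturbations in the graph topology; and iii) the capability to train the model simultaneously on multiple grid topologies and the resulting improvement in generalization to new, unseen grids. The proposed framework is efficient and, compared with other deep-learning-based solvers, is robust to perturbations not only in the physical quantities on the grid components but also in the topology.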
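The line-graph representation mentioned in the abstract above (branches become vertices) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `line_graph`, the input format (a list of `(bus_u, bus_v)` pairs), and the assumption that no branch connects a bus to itself are all hypothetical.

```python
from itertools import combinations


def line_graph(branches):
    """Build the line graph of a grid given as a list of (bus_u, bus_v) branches.

    Each branch becomes a vertex (identified by its index in `branches`);
    two vertices are adjacent when the underlying branches share a bus.
    Assumes no branch connects a bus to itself.
    """
    # Map each bus to the indices of the branches incident to it.
    incident = {}
    for idx, (u, v) in enumerate(branches):
        incident.setdefault(u, []).append(idx)
        incident.setdefault(v, []).append(idx)

    # Connect every pair of branches that meet at the same bus.
    adjacency = {idx: set() for idx in range(len(branches))}
    for branch_ids in incident.values():
        for i, j in combinations(branch_ids, 2):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency
```

For a four-branch grid such as `[(1, 2), (2, 3), (3, 1), (3, 4)]`, branch 3 (the spur to bus 4) ends up adjacent only to the two branches that touch bus 3. Per-branch quantities can then be attached as vertex features for a vertex regression task.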
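We address the problem of production planning and distribution in multi-echelon supply chains. We consider uncertain demands and lead times, which makes the problem stochastic and non-linear. A Markov decision process formulation and a non-linear programming model are presented. Because this is a sequential decision-making problem, deep reinforcement learning (RL) is a possible solution approach. This type of technique has gained a lot of attention from the artificial intelligence and optimization communities in recent years. Given the good results obtained with deep RL approaches in different areas, there is growing interest in applying them to problems from the operations research field. We use a deep RL technique, namely Proximal Policy Optimization (PPO2), to solve the problem considering uncertain, regular and seasonal demands and constant or stochastic lead times. Experiments are carried out in different scenarios to better assess the suitability of the algorithm. An agent based on a linearized model is used as a baseline. Experimental results indicate that PPO2 is a competitive and adequate tool for this type of problem. The PPO2 agent outperforms the baseline in all scenarios with stochastic lead times (7.3-11.2%), regardless of whether demands are seasonal. In scenarios with constant lead times, the PPO2 agent is better when uncertain demands are non-seasonal (2.2-4.7%). The results show that the greater the uncertainty of the scenario, the greater the viability of this type of approach.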
Driven by the global decarbonization effort, the rapid integration of renewable energy into the conventional electricity grid presents new challenges and opportunities for battery energy storage systems (BESS) participating in the energy market. Energy arbitrage can be a significant source of revenue for the BESS due to the increasing price volatility in the spot market caused by the mismatch between renewable generation and electricity demand. In addition, the Frequency Control Ancillary Services (FCAS) markets established to stabilize the grid can offer higher returns for the BESS due to its capability to respond within milliseconds. Therefore, it is crucial for the BESS to carefully decide how much capacity to assign to each market to maximize the total profit under uncertain market conditions. This paper formulates the bidding problem of the BESS as a Markov Decision Process, which enables the BESS to participate in both the spot market and the FCAS market to maximize profit. Then, Proximal Policy Optimization, a model-free deep reinforcement learning algorithm, is employed to learn the optimal bidding strategy from the dynamic environment of the energy market under a continuous bidding scale. The proposed model is trained and validated using real-world historical data from the Australian National Electricity Market. The results demonstrate that our joint bidding strategy across both markets is significantly more profitable than bidding in either market individually.
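Smart energy networks provide an effective means of accommodating high penetrations of variable renewable energy sources such as solar and wind, which are key to the deep decarbonization of energy production. However, given the variability of renewables as well as of energy demand, effective control and energy storage schemes must be developed to manage the variable energy generation and achieve the desired system economics and environmental goals. In this paper, we introduce a hybrid energy storage system composed of battery and hydrogen energy storage to handle the uncertainties related to electricity prices, renewable energy production, and consumption. We aim to improve renewable energy utilization and minimize energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach, a reinforcement-learning-based control strategy that optimizes the scheduling of the hybrid energy storage system and energy demand in real time. The proposed approach is model-free and requires neither explicit knowledge nor a rigorous mathematical model of the smart energy network environment. Simulation results based on real-world data show that (i) the integrated and optimized operation of the hybrid energy storage system and energy demand reduces carbon emissions by 78.69%, improves cost savings by 23.5%, and improves renewable energy utilization by over 13.2% compared with other baseline models, and (ii) the proposed algorithm outperforms state-of-the-art self-learning algorithms such as the deep Q-network.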
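Volt-VAR control (VVC) is the problem of operating power distribution systems within healthy regimes by controlling actuators in the power system. Existing works have mostly adopted the conventional routine of representing the power system (a graph with tree topology) as vectors to train deep reinforcement learning (RL) policies. We propose a framework that combines RL with graph neural networks and study the benefits and limitations of graph-based policies in the VVC setting. Our results show that graph-based policies converge asymptotically to the same rewards as vector representations. We conduct further analysis on the impact of both observations and actions. On the observation end, we examine the robustness of graph-based policies to two typical data acquisition errors in power systems, namely sensor communication failures and measurement errors. On the action end, we show that actuators have various impacts on the system, so using the graph representation induced by the power system topology may not be the optimal choice. Finally, we conduct a case study to demonstrate that the choice of readout function architecture and graph augmentation can further improve training performance and robustness.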
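Unit commitment (UC) is a fundamental problem in the day-ahead electricity market, and it is critical to solve UC problems efficiently. Mathematical optimization techniques such as dynamic programming, Lagrangian relaxation, and mixed-integer quadratic programming (MIQP) are commonly adopted for UC problems. However, the computation time of these methods increases with the number of generators and energy resources, which remains a major bottleneck in industry. Recent advances in artificial intelligence have demonstrated the capability of reinforcement learning (RL) to solve UC problems. Unfortunately, existing research on solving UC problems with RL suffers from the curse of dimensionality as the size of the UC problem grows. To address these problems, we propose an optimization-method-assisted ensemble deep reinforcement learning algorithm, in which the UC problem is formulated as a Markov decision process (MDP) and solved by multi-step deep learning in an ensemble framework. The proposed algorithm establishes a set of candidate actions by solving tailored optimization problems, ensuring relatively high performance and the satisfaction of operational constraints. Numerical studies on the IEEE 118- and 300-bus systems show that our algorithm outperforms the baseline RL algorithm and MIQP. Furthermore, the proposed algorithm shows strong generalization capacity under unforeseen operational conditions.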
Ongoing risks from climate change have impacted the livelihood of global nomadic communities, and are likely to lead to increased migratory movements in coming years. As a result, mobility considerations are becoming increasingly important in energy systems planning, particularly to achieve energy access in developing countries. Advanced Plug and Play control strategies have been recently developed with such a decentralized framework in mind, more easily allowing for the interconnection of nomadic communities, both to each other and to the main grid. In light of the above, the design and planning strategy of a mobile multi-energy supply system for a nomadic community is investigated in this work. Motivated by the scale and dimensionality of the associated uncertainties, impacting all major design and decision variables over the 30-year planning horizon, Deep Reinforcement Learning (DRL) is implemented for the design and planning problem tackled. DRL based solutions are benchmarked against several rigid baseline design options to compare expected performance under uncertainty. The results on a case study for ger communities in Mongolia suggest that mobile nomadic energy systems can be both technically and economically feasible, particularly when considering flexibility, although the degree of spatial dispersion among households is an important limiting factor. Key economic, sustainability and resilience indicators such as Cost, Equivalent Emissions and Total Unmet Load are measured, suggesting potential improvements compared to available baselines of up to 25%, 67% and 76%, respectively. Finally, the decomposition of values of flexibility and plug and play operation is presented using a variation of real options theory, with important implications for both nomadic communities and policymakers focused on enabling their energy access.
This article proposes a model-based deep reinforcement learning (DRL) method to design emergency control strategies for short-term voltage stability problems in power systems. Recent advances show promising results for model-free DRL-based methods in power systems, but model-free methods suffer from poor sample efficiency and long training times, both of which are critical for making state-of-the-art DRL algorithms practically applicable. A DRL agent learns an optimal policy by trial and error while interacting with the real-world environment, and it is desirable to minimize the direct interaction of the DRL agent with the real-world power grid because of its safety-critical nature. Additionally, state-of-the-art DRL-based policies are mostly trained using a physics-based grid simulator in which dynamic simulation is computationally intensive, lowering the training efficiency. We propose a novel model-based DRL framework in which a deep neural network (DNN)-based dynamic surrogate model, instead of a real-world power grid or physics-based simulation, is used within the policy learning framework, making the process faster and more sample efficient. However, stabilizing model-based DRL is challenging because of the complex system dynamics of large-scale power systems. We address these issues by incorporating imitation learning to give policy learning a warm start, reward shaping, and a multi-step surrogate loss. Finally, we achieved 97.5% sample efficiency and 87.7% training efficiency in an application to the IEEE 300-bus test system.
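Taking advantage of their data-driven and model-free features, deep reinforcement learning (DRL) algorithms have the potential to deal with the increasing level of uncertainty caused by the introduction of renewable-based generation. To deal simultaneously with the operational cost and the technical constraints of the energy system (e.g., the generation-demand power balance), DRL algorithms must consider a trade-off when designing the reward function. This trade-off introduces extra hyperparameters that affect the performance of the DRL algorithms and their capability to provide feasible solutions. In this paper, a performance comparison of different DRL algorithms, including DDPG, TD3, SAC, and PPO, is presented. We aim to provide a fair comparison of these DRL algorithms for energy system optimal scheduling problems. Results show that DRL algorithms are capable of providing good-quality solutions in real time, even in unseen operational scenarios, when compared with a mathematical programming model of the energy system optimal scheduling problem. Nevertheless, in the case of large peak consumption, these algorithms failed to provide feasible solutions, which may impede their practical implementation.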
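The reward trade-off described in the abstract above can be made concrete with a small sketch. It assumes, purely for illustration, linear generation costs and a single penalty weight on the power imbalance; the function name `dispatch_reward`, its parameters, and the default `penalty_weight=10.0` are hypothetical and do not come from the paper.

```python
def dispatch_reward(gen_outputs, cost_coeffs, demand, penalty_weight=10.0):
    """Reward for one dispatch step: negative cost minus a balance penalty.

    gen_outputs:    power produced by each generator
    cost_coeffs:    assumed linear cost per unit of power for each generator
    demand:         total demand to be met in this step
    penalty_weight: hyperparameter trading off cost against constraint violation
    """
    # Operating cost under the assumed linear cost model.
    cost = sum(c * p for c, p in zip(cost_coeffs, gen_outputs))
    # Violation of the generation-demand balance constraint.
    imbalance = abs(sum(gen_outputs) - demand)
    return -(cost + penalty_weight * imbalance)
```

With a small `penalty_weight` an agent can lower its cost by under-supplying demand, while a very large value can swamp the cost signal. This is the kind of extra hyperparameter the abstract says affects both performance and feasibility.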
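Asset allocation (or portfolio management) is the task of determining how to optimally allocate the funds of a finite budget across a range of financial instruments/assets such as stocks. This study investigates the performance of reinforcement learning (RL) applied to portfolio management using model-free deep RL agents. We trained several RL agents on real-world stock prices to learn how to perform asset allocation. We compared the performance of these RL agents against some baseline agents. We also compared the RL agents among themselves to understand which classes of agents perform better. From our analysis, RL agents can perform the task of portfolio management, since they significantly outperformed the baseline agents (random allocation and uniform allocation). Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT, overall. This shows the ability of RL agents to uncover more profitable trading strategies. Furthermore, there were no significant performance differences between value-based and policy-based RL agents. Actor-critic agents performed better than other types of agents. Likewise, on-policy agents performed better, because they are better at policy evaluation and sample efficiency is not a significant problem in portfolio management. This study shows that RL agents can substantially improve asset allocation, since they outperform strong baselines. Based on our analysis, on-policy, actor-critic RL agents showed the most promise.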
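Renewable energy resources (RERs) have been increasingly integrated into modern power systems, especially in large-scale distribution networks (DNs). In this paper, we propose a deep reinforcement learning (DRL)-based approach to dynamically search for the optimal operation point, i.e., the optimal power flow (OPF), in DNs with a high uptake of RERs. Considering the uncertainties and voltage fluctuation issues caused by RERs, we formulate OPF as a multi-objective optimization (MOO) problem. To solve the MOO problem, we develop a novel DRL algorithm that leverages the graphical information of the distribution network. Specifically, we employ a state-of-the-art DRL algorithm, deep deterministic policy gradient (DDPG), to learn an optimal strategy for OPF. Because power flow reallocation in the DN is a consecutive process in which nodes are self-correlated and interrelated in both temporal and spatial views, we make full use of the DNs' graphical information by developing a multi-grained attention-based spatial-temporal graph convolution network (MG-ASTGCN) for spatial-temporal graph information extraction, in preparation for the subsequent DDPG. We validate the proposed DRL-based approach on modified IEEE 33-, 69-, and 118-bus radial distribution systems (RDSs) and show that it outperforms other benchmark algorithms. Our experimental results also reveal that MG-ASTGCN can significantly accelerate the DDPG training process and improve DDPG's capability in reallocating power flow for OPF. The proposed DRL-based approach also promotes the stability of DNs in the presence of node faults, especially for large-scale DNs.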
Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.