Modern power systems are undergoing various challenges driven by renewable energy, which call for the development of novel dispatch methods such as reinforcement learning (RL). The evaluation of these methods, as well as of the RL agents themselves, remains largely under-explored. In this paper, we propose an evaluation approach to analyze the performance of RL agents in the economic dispatch scheme under consideration. The approach is conducted by scanning multiple operational scenarios. In particular, a scenario generation method is developed to produce network scenarios and demand scenarios for evaluation, and network structures are aggregated according to the change rates of power flow. Several metrics are then defined to evaluate the agents' performance from economic and security perspectives. In the case study, we use a modified IEEE 30-bus system to illustrate the effectiveness of the proposed evaluation approach, and the simulation results reveal good and rapid adaptation to different scenarios. The comparison between different RL agents is also informative, providing advice for a better design of learning strategies.
translated by 谷歌翻译
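The scenario-sweep evaluation described above can be illustrated with a minimal sketch. All names, metric definitions, and numbers below are our own illustrative assumptions, not the paper's implementation: an agent is run over a list of generated scenarios and simple economy (mean cost) and security (constraint-violation rate) metrics are aggregated.

```python
# Hypothetical sketch: sweep an agent over generated scenarios and aggregate
# economy/security metrics. All names and numbers are illustrative.

def evaluate_agent(agent, scenarios):
    """Run the agent on each scenario and aggregate simple metrics."""
    costs, violations = [], 0
    for scenario in scenarios:
        dispatch = agent(scenario)  # agent maps a scenario to a dispatch vector
        cost = sum(c * p for c, p in zip(scenario["prices"], dispatch))
        costs.append(cost)
        # security checks: dispatch must meet demand and respect capacity limits
        if abs(sum(dispatch) - scenario["demand"]) > 1e-6:
            violations += 1
        if any(p > pmax for p, pmax in zip(dispatch, scenario["pmax"])):
            violations += 1
    return {
        "mean_cost": sum(costs) / len(costs),           # economic metric
        "violation_rate": violations / len(scenarios),  # security metric
    }

# A trivial proportional-dispatch "agent", purely for illustration.
def naive_agent(scenario):
    total_cap = sum(scenario["pmax"])
    return [scenario["demand"] * pmax / total_cap for pmax in scenario["pmax"]]

scenarios = [
    {"demand": 90.0, "prices": [10.0, 30.0], "pmax": [60.0, 60.0]},
    {"demand": 60.0, "prices": [12.0, 25.0], "pmax": [60.0, 60.0]},
]
metrics = evaluate_agent(naive_agent, scenarios)
print(metrics)
```

In the paper's setting the scenarios would come from the proposed generation method and the metrics would follow its economy/security definitions; the sketch only shows the sweep-and-aggregate structure.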
This paper presents how domain knowledge of power system operators can be integrated into a reinforcement learning (RL) framework to effectively learn agents that control grid topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well due to the large search/optimization space. Here, we propose an actor-critic-based agent to address the combinatorial nature of the problem and train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure by modifying the environment using network physics for enhanced agent learning. Furthermore, a parallel training approach on multiple scenarios is employed to avoid biasing the agent towards a few scenarios and to make it robust to the natural variability of grid operations. Without these modifications to the training procedure, the RL agent failed on most test scenarios, illustrating the importance of properly integrating domain knowledge of the physical system for real-world RL learning. The agent was tested by RTE on the 2019 Learning to Run a Power Network challenge and was awarded 2nd place in accuracy and 1st place in speed. The developed code is open for public use.
Unit commitment (UC) is a fundamental problem in the day-ahead electricity market, and it is critical to solve UC problems efficiently. Mathematical optimization techniques such as dynamic programming, Lagrangian relaxation, and mixed-integer quadratic programming (MIQP) are typically adopted for UC problems. However, the computation time of these methods increases with the number of generators and energy resources, which remains a major bottleneck in industry. Recent advances in artificial intelligence have demonstrated the capability of reinforcement learning (RL) to solve UC problems. Unfortunately, existing research on solving UC problems with RL suffers from the curse of dimensionality as the size of the UC problem grows. To address these issues, we propose an optimization-method-assisted ensemble deep reinforcement learning algorithm, in which the UC problem is formulated as a Markov decision process (MDP) and solved by multi-step deep reinforcement learning within an ensemble framework. The proposed algorithm establishes candidate actions by solving tailored optimization problems, ensuring relatively high performance and the satisfaction of operational constraints. Numerical studies on the IEEE 118- and 300-bus systems show that our algorithm outperforms the baseline RL algorithm and MIQP. Furthermore, the proposed algorithm exhibits strong generalization capability under unforeseen operational conditions.
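The candidate-action idea above can be sketched in a few lines. This is not the paper's algorithm: the "tailored optimization" is replaced by a brute-force feasibility filter, the dispatch is a greedy merit-order rule, and the "ensemble" is two identical cost-based critics; all capacities, costs, and names are our own assumptions.

```python
# Illustrative sketch: unit commitment as an MDP where candidate on/off
# actions come from a feasibility check and an ensemble of value estimates
# selects the action. All numbers and names are assumptions.
import itertools

PMAX = [100.0, 50.0, 30.0]   # generator capacities (MW)
COST = [5.0, 10.0, 20.0]     # linear costs per MW, cheapest first

def candidate_actions(demand):
    """Keep only on/off patterns whose committed capacity can serve demand."""
    return [u for u in itertools.product([0, 1], repeat=len(PMAX))
            if sum(p for p, on in zip(PMAX, u) if on) >= demand]

def dispatch_cost(u, demand):
    """Greedy merit-order dispatch among committed units."""
    remaining, cost = demand, 0.0
    for p, c, on in sorted(zip(PMAX, COST, u), key=lambda t: t[1]):
        if on and remaining > 0:
            g = min(p, remaining)
            cost += c * g
            remaining -= g
    return cost

def ensemble_value(q_estimates, u, demand):
    """Average the value estimates of the ensemble members."""
    return sum(q(u, demand) for q in q_estimates) / len(q_estimates)

# A toy 'ensemble' of two identical cost-based critics, for illustration only.
critics = [lambda u, d: -dispatch_cost(u, d)] * 2
demand = 120.0
best = max(candidate_actions(demand), key=lambda u: ensemble_value(critics, u, demand))
print(best, dispatch_cost(best, demand))
```

In the paper, the critics would be trained deep networks and the candidate set would come from solving small optimization problems, which is what keeps the action space tractable as the system grows.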
Ongoing risks from climate change have impacted the livelihood of global nomadic communities, and are likely to lead to increased migratory movements in coming years. As a result, mobility considerations are becoming increasingly important in energy systems planning, particularly to achieve energy access in developing countries. Advanced Plug and Play control strategies have been recently developed with such a decentralized framework in mind, more easily allowing for the interconnection of nomadic communities, both to each other and to the main grid. In light of the above, the design and planning strategy of a mobile multi-energy supply system for a nomadic community is investigated in this work. Motivated by the scale and dimensionality of the associated uncertainties, impacting all major design and decision variables over the 30-year planning horizon, Deep Reinforcement Learning (DRL) is implemented for the design and planning problem tackled. DRL based solutions are benchmarked against several rigid baseline design options to compare expected performance under uncertainty. The results on a case study for ger communities in Mongolia suggest that mobile nomadic energy systems can be both technically and economically feasible, particularly when considering flexibility, although the degree of spatial dispersion among households is an important limiting factor. Key economic, sustainability and resilience indicators such as Cost, Equivalent Emissions and Total Unmet Load are measured, suggesting potential improvements compared to available baselines of up to 25%, 67% and 76%, respectively. Finally, the decomposition of values of flexibility and plug and play operation is presented using a variation of real options theory, with important implications for both nomadic communities and policymakers focused on enabling their energy access.
Multi-uncertainties from power sources and loads have brought significant challenges to the stable demand supply of various resources at islands. To address these challenges, a comprehensive scheduling framework is proposed by introducing a model-free deep reinforcement learning (DRL) approach based on modeling an island integrated energy system (IES). In response to the shortage of freshwater on islands, in addition to the introduction of seawater desalination systems, a transmission structure of "hydrothermal simultaneous transmission" (HST) is proposed. The essence of the IES scheduling problem is the optimal combination of each unit's output, which is a typical timing control problem and conforms to the Markov decision-making solution framework of deep reinforcement learning. Deep reinforcement learning adapts to various changes and timely adjusts strategies through the interaction of agents and the environment, avoiding complicated modeling and prediction of multi-uncertainties. The simulation results show that the proposed scheduling framework properly handles multi-uncertainties from power sources and loads, achieves a stable demand supply for various resources, and has better performance than other real-time scheduling methods, especially in terms of computational efficiency. In addition, the HST model constitutes an active exploration to improve the utilization efficiency of island freshwater.
The energy sector is facing rapid changes in the transition towards clean renewable sources. However, the growing share of volatile, fluctuating renewable generation such as wind or solar energy has already led to an increase in power grid congestion and network security concerns. Grid operators mitigate these by modifying either generation or demand (redispatching, curtailment, flexible loads). Unfortunately, redispatching of fossil generators leads to excessive grid operation costs and higher emissions, which is in direct opposition to the decarbonization of the energy sector. In this paper, we propose an AlphaZero-based grid topology optimization agent as a non-costly, carbon-free congestion management alternative. Our experimental evaluation confirms the potential of topology optimization for power grid operation, achieves a reduction of the average amount of required redispatching by 60%, and shows the interoperability with traditional congestion management methods. Our approach also ranked 1st in the WCCI 2022 Learning to Run a Power Network (L2RPN) competition. Based on our findings, we identify and discuss open research problems as well as technical challenges for a productive system on a real power grid.
The rapid change of the current climate increases the urgency of transforming the management of energy production and consumption in order to reduce carbon and other greenhouse gas emissions. In this context, the French electricity network management company RTE (Réseau de Transport d'Électricité) has recently published the results of an extensive study outlining various scenarios for tomorrow's French energy management. We propose a challenge that will test the feasibility of such scenarios. The goal is to control electricity transportation in power networks while pursuing multiple objectives: balancing production and consumption, minimizing energy losses, and keeping people and equipment safe, in particular by avoiding catastrophic failures. While the importance of the application provides a goal in itself, the challenge also aims to push the state of the art in a branch of artificial intelligence (AI) called reinforcement learning (RL), which offers new possibilities for tackling control problems. In particular, various aspects of the combination of deep learning and RL in this application domain remain to be harnessed. The challenge belongs to a series started in 2019 under the name "Learning to Run a Power Network" (L2RPN). In this new edition, we introduce new, more realistic scenarios proposed by RTE to reach carbon neutrality by 2050, phasing out fossil-fuel electricity generation, increasing the proportion of renewable and nuclear energy, and introducing batteries. Furthermore, we provide baselines using state-of-the-art reinforcement learning algorithms to stimulate future participants.
Finding optimal bidding strategies in electricity markets leads to higher profits. However, this is a challenging problem due to system uncertainty, which stems from the strategies of other generation units. Distributed optimization, in which each entity or agent decides on its bid individually, has become the state of the art, but it cannot overcome the challenge of system uncertainty. Deep reinforcement learning is a promising approach for learning optimal strategies in uncertain environments; nevertheless, it is unable to integrate information about the spatial system topology into the learning process. This paper proposes a distributed learning algorithm based on deep reinforcement learning (DRL) combined with a graph convolutional neural network (GCN). The proposed framework helps agents update their decisions by receiving feedback from the environment, so that the challenge of uncertainty can be overcome. In the proposed algorithm, the states of the nodes and the connectivity between them are the inputs of the GCN, which makes agents aware of the structure of the system. This information about the system topology helps agents improve their bidding strategies and increase their profits. We evaluate the proposed algorithm on the IEEE 30-bus system under different scenarios. Moreover, to investigate the generalization ability of the proposed approach, we test the trained model on the IEEE 39-bus system. The results show that the proposed algorithm has greater generalization capability than plain DRL and can obtain higher profits when the system topology changes.
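The core GCN operation referenced above, each node aggregating its neighbors' features before a linear map so that an agent "sees" the topology, can be sketched without any deep learning library. The 3-bus grid, the single feature per bus, and the identity weight below are illustrative assumptions, not the paper's network.

```python
# Minimal sketch of one graph convolutional layer: H' = ReLU(A_hat H W),
# where A_hat is the row-normalized adjacency with self-loops. Shapes and
# numbers are illustrative only.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(M):
    return [[max(0.0, x) for x in row] for row in M]

def gcn_layer(adj, H, W):
    """One propagation step: mix each node's features with its neighbors'."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a_hat = [[v / sum(row) for v in row] for row in a_hat]  # row-normalize
    return relu(matmul(matmul(a_hat, H), W))

# Toy 3-bus grid in a line: bus 0 - bus 1 - bus 2, one feature per bus.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
H = [[1.0], [2.0], [3.0]]   # e.g. nodal load
W = [[1.0]]                 # identity weight, for readability
print(gcn_layer(adj, H, W))
```

After one layer, each bus's feature is a neighborhood average, which is how topology information reaches the bidding agent's policy input.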
Power grids, across the world, play an important societal and economical role by providing uninterrupted, reliable and transient-free power to several industries, businesses and household consumers. With the advent of renewable power resources and EVs resulting in uncertain generation and highly dynamic load demands, it has become ever so important to ensure robust operation of power networks through suitable management of transient stability issues and to localize the events of blackouts. In the light of ever increasing stress on the modern grid infrastructure and the grid operators, this paper presents a reinforcement learning (RL) framework, PowRL, to mitigate the effects of unexpected network events, as well as reliably maintain electricity everywhere on the network at all times. PowRL leverages a novel heuristic for overload management, along with RL-guided decision making on optimal topology selection, to ensure that the grid is operated safely and reliably (with no overloads). PowRL is benchmarked on a variety of competition datasets hosted by the L2RPN (Learning to Run a Power Network) challenges. Even with its reduced action space, PowRL tops the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) at an aggregate level, while also being the top performing agent in the L2RPN WCCI 2020 challenge. Moreover, detailed analysis depicts state-of-the-art performances by the PowRL agent in some of the test scenarios.
This paper presents a multi-agent Deep Reinforcement Learning (DRL) framework for autonomous control and integration of renewable energy resources into smart power grid systems. In particular, the proposed framework jointly considers demand response (DR) and distributed energy management (DEM) for residential end-users. DR has a widely recognized potential for improving power grid stability and reliability, while at the same time reducing end-users energy bills. However, the conventional DR techniques come with several shortcomings, such as the inability to handle operational uncertainties while incurring end-user disutility, which prevents widespread adoption in real-world applications. The proposed framework addresses these shortcomings by implementing DR and DEM based on real-time pricing strategy that is achieved using deep reinforcement learning. Furthermore, this framework enables the power grid service provider to leverage distributed energy resources (i.e., PV rooftop panels and battery storage) as dispatchable assets to support the smart grid during peak hours, thus achieving management of distributed energy resources. Simulation results based on the Deep Q-Network (DQN) demonstrate significant improvements of the 24-hour accumulative profit for both prosumers and the power grid service provider, as well as major reductions in the utilization of the power grid reserve generators.
Smart energy networks provide an effective means to accommodate high penetrations of variable renewable energy sources such as solar and wind, which are key to the deep decarbonization of energy production. However, given the variability of renewables as well as of energy demand, effective control and energy storage schemes must be developed to manage variable energy generation and achieve the desired system economics and environmental objectives. In this paper, we introduce a hybrid energy storage system composed of battery and hydrogen storage to handle the uncertainties related to electricity prices, renewable energy production, and consumption. We aim to improve renewable energy utilization and minimize energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach, a reinforcement-learning-based control strategy, to optimize the scheduling of the hybrid energy storage system and energy demand in real time. The proposed approach is model-free and does not require explicit knowledge of, or a rigorous mathematical model for, the smart energy network environment. Simulation results based on real-world data demonstrate that: (i) the integrated and optimized operation of the hybrid energy storage system and energy demand reduces carbon emissions by 78.69%, improves cost savings by 23.5%, and raises renewable energy utilization by over 13.2% compared with other baseline models; and (ii) the proposed algorithm outperforms state-of-the-art self-learning algorithms such as Deep Q-Network.
Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, reducing upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, resulting in various unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, SafeFallback and GiveSafe, in which the safety constraint formulation is decoupled from the RL formulation and which provide hard constraint satisfaction guarantees both during training (exploration) and during exploitation of the (near-)optimal policy. In a simulated multi-energy system case study, we show that both methods start with a significantly higher utility (i.e., a useful policy) compared to a vanilla RL benchmark (94.6% and 82.8%, versus 35.5%). The proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% versus 100%). We conclude that both methods are viable safety constraint handling techniques applicable even beyond RL, as demonstrated with random agents, while still providing hard guarantees. Finally, we propose fundamental future work to, among other things, improve the constraint functions themselves as more data becomes available.
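The decoupling idea above, a constraint checker that sits outside the RL formulation and substitutes a known-safe action when the policy proposes an unsafe one, can be sketched as a policy wrapper. The battery example, constraint, and all names below are our own illustrative assumptions, not the paper's SafeFallback implementation.

```python
# Hedged sketch of decoupled hard-constraint handling: any policy (even a
# random agent) is wrapped so that unsafe proposed actions are replaced by a
# known-safe fallback before reaching the environment. Names are illustrative.

def make_safe(policy, is_safe, fallback):
    """Wrap a policy with a hard safety check and a fallback action."""
    def safe_policy(state):
        action = policy(state)
        return action if is_safe(state, action) else fallback(state)
    return safe_policy

# Toy multi-energy example: action = battery power, constraint = capacity band.
def is_safe(state, action):
    return 0.0 <= state["soc"] + action <= 1.0  # state of charge stays in [0, 1]

def fallback(state):
    return 0.0  # doing nothing is always safe in this toy setting

reckless = lambda state: 0.8  # a policy that always charges hard
policy = make_safe(reckless, is_safe, fallback)
print(policy({"soc": 0.1}), policy({"soc": 0.5}))
```

Because the check runs on every step, the guarantee holds during exploration as well as exploitation, which is the property the abstract emphasizes.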
Optimal Power Flow (OPF) is a very traditional research area within the power systems field that seeks the optimal operating point of electric power plants, and which needs to be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In the last decades, power grids have evolved into a typical dynamic, non-linear and large-scale control system, known as the power system, so searching for better and faster ACOPF solutions is becoming crucial. The appearance of Graph Neural Networks (GNN) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. On the other hand, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and that is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost after having trained our DRL agent on the IEEE 30 bus system and then computing the OPF on that base network with topology changes.
The solution of multistage stochastic linear problems (MSLP) represents a challenge for many applications. Long-term hydrothermal dispatch planning (LHDP) materializes this challenge in a real-world problem that affects electricity markets, economies, and natural resources worldwide. No closed-form solution is available for MSLP, and the definition of high-quality non-anticipative policies is crucial. Linear decision rules (LDR) provide an interesting simulation-based framework for finding high-quality policies for MSLP through two-stage stochastic models. In practical applications, however, the number of parameters to be estimated when using an LDR may be close to, or higher than, the number of scenarios of the sample average approximation problem, thereby generating overfitting and poor out-of-sample performance in simulation. In this paper, we propose a novel regularized LDR to solve MSLP based on AdaLASSO (adaptive least absolute shrinkage and selection operator). The goal is to use the parsimony principle, widely studied in high-dimensional linear regression models, to obtain better out-of-sample performance for an LDR applied to MSLP. Computational experiments show that the overfitting threat is non-negligible when using classical non-regularized LDR to solve the LHDP, one of the most studied MSLP with relevant applications in industry. Our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark: 1) a significant reduction in the number of nonzero coefficients (model parsimony); 2) substantial cost reductions in out-of-sample evaluations; and 3) improved spot-price profiles.
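The AdaLASSO mechanism behind the "model parsimony" benefit above can be illustrated with its core operation: soft-thresholding with per-coefficient weights proportional to 1/|initial estimate|, so weakly supported coefficients are shrunk harder, often exactly to zero. The coefficients, penalty level, and single shrinkage pass below are illustrative assumptions, not the paper's estimation procedure.

```python
# Illustrative sketch of adaptive-lasso shrinkage applied to LDR coefficients:
# soft-thresholding with weights lam / |beta_init|. Values are made up.

def soft_threshold(x, t):
    """Proximal operator of t*|.|: shrink x toward zero by t, clip at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def adalasso_shrink(beta_init, lam, eps=1e-8):
    """One adaptive-lasso pass: bigger initial coefficients get smaller penalties."""
    return [soft_threshold(b, lam / (abs(b) + eps)) for b in beta_init]

# Initial (e.g. least-squares) LDR coefficients: two strong, two weak.
beta_init = [2.0, -1.5, 0.1, -0.05]
beta = adalasso_shrink(beta_init, lam=0.2)
nonzero = sum(1 for b in beta if b != 0.0)
print(beta, nonzero)
```

The two weak coefficients are driven exactly to zero while the strong ones survive nearly intact, which is the parsimony effect the experiments quantify.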
We present the PowerGridworld software package, which provides users with a lightweight, modular, and customizable framework for creating power-systems-focused multi-agent Gym environments that readily integrate with existing training frameworks for reinforcement learning (RL). Although many frameworks exist for training multi-agent RL (MARL) policies, none can rapidly prototype and develop environments, especially in the context of heterogeneous (composite, multi-device) power systems where power flow solutions are required to define grid-level variables and costs. PowerGridworld is an open-source software package that helps fill this gap. To highlight PowerGridworld's key features, we present two case studies and demonstrate learning MARL policies using both OpenAI's multi-agent deep deterministic policy gradient (MADDPG) and RLlib's proximal policy optimization (PPO) algorithms. In both cases, at least some subset of agents incorporates elements of the power flow solution at each time step as part of their reward (negative cost) structures.
The rapid adoption of electric vehicles (EVs) calls for the widespread installation of EV charging stations. To maximize the profitability of charging stations, intelligent controllers that provide both charging and grid services are in great demand. However, it is challenging to determine the optimal charging schedule due to the uncertain arrival times and charging demands of EVs. In this paper, we propose a novel centralized allocation and decentralized execution (CADE) reinforcement learning (RL) framework to maximize the charging station's profit. In the centralized allocation process, EVs are allocated to either waiting or charging spots. In the decentralized execution process, each charger makes its own charging/discharging decisions while learning an action-value function from a shared replay memory. This CADE framework significantly improves the scalability and sample efficiency of the RL algorithm. Numerical results show that the proposed CADE framework is both computationally efficient and scalable, and significantly outperforms the baseline model predictive control (MPC). We also provide an in-depth analysis of the learned action values to explain the inner workings of the reinforcement learning agent.
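The shared-replay structure described above can be sketched briefly: each charger acts on its own state, but all transitions land in one memory that a common action-value learner samples from. The class, the two toy chargers, and the transition format below are illustrative assumptions, not the paper's code.

```python
# Sketch of decentralized execution with a shared replay memory: chargers
# decide independently but feed one buffer. Structure is illustrative only.
import random
from collections import deque

class SharedReplay:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        """Store a (state, action, reward, next_state) tuple from any charger."""
        self.buffer.append(transition)

    def sample(self, batch_size):
        """Draw a minibatch for the common action-value learner."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

memory = SharedReplay(capacity=100)

# Two chargers decide independently but share the same memory.
for charger_id in range(2):
    for step in range(5):
        state = (charger_id, step)
        action = "charge" if step % 2 == 0 else "idle"
        memory.push((state, action, 1.0, (charger_id, step + 1)))

batch = memory.sample(4)
print(len(memory.buffer), len(batch))
```

Pooling experience this way is what improves sample efficiency: every charger's interaction contributes to the single value function that all chargers execute.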
Expanding existing transmission lines is a useful tool for combating transmission congestion and guaranteeing transmission security as demand increases and renewable energy penetration grows. This study concerns the selection of the lines whose capacity should be expanded, from the perspective of the independent system operator (ISO), by considering transmission line constraints and generation-demand balance conditions, and by incorporating ramp-up and start-up ramp rates, shut-down ramp rates, ramp-down rate limits, and minimum up/down times. To this end, we develop an ISO unit commitment and economic dispatch model and formulate it as a right-hand-side-uncertainty multiparametric analysis of a mixed-integer linear programming (MILP) problem. We first relax the binary variables to continuous variables and employ the Lagrangian method and the Karush-Kuhn-Tucker conditions to obtain the optimal solutions (optimal decision variables and objective function) as well as the critical regions associated with active and inactive constraints. Furthermore, we extend the traditional branch-and-bound method to solve the large-scale MILP problem by determining the upper bound of the problem at each node, comparing the difference between the upper and lower bounds, and reaching an approximately optimal solution within the decision maker's tolerable error range. In addition, the first derivative of the objective function with respect to each line's parameter is used to inform the selection of lines for expansion, so as to relieve congestion and maximize social welfare. Finally, the amount of the capacity upgrade is chosen by balancing the cost rate of the objective function against the line upgrade cost. Our findings are supported by numerical simulations and provide decision guidance for transmission line planning.
Traditional power grid systems have become obsolete under more frequent and extreme natural disasters. Reinforcement learning (RL) has been a promising solution for resilience given its successful history of power grid control. However, most power grid simulators and RL interfaces do not support simulation of power grid under large-scale blackouts or when the network is divided into sub-networks. In this study, we proposed an updated power grid simulator built on Grid2Op, an existing simulator and RL interface, and experimented on limiting the action and observation spaces of Grid2Op. By testing with DDQN and SliceRDQN algorithms, we found that reduced action spaces significantly improve training performance and efficiency. In addition, we investigated a low-rank neural network regularization method for deep Q-learning, one of the most widely used RL algorithms, in this power grid control scenario. As a result, the experiment demonstrated that in the power grid simulation environment, adopting this method will significantly increase the performance of RL agents.
In a grid with a significant share of renewable generation, operators will need additional tools to assess operational risk due to the increased volatility in load and generation. The computational requirements of the forward uncertainty propagation problem, which must solve numerous security-constrained economic dispatch (SCED) optimizations, are a major barrier to such real-time risk assessment. This paper proposes a Just-In-Time Risk Assessment Learning Framework (JITRALF) as an alternative. JITRALF trains risk surrogates, one for each hour of the day, using machine learning (ML) to predict the quantities needed to estimate risk without explicitly solving the SCED problems. This significantly mitigates the computational burden of forward uncertainty propagation and allows for fast, real-time risk estimation. The paper also proposes a novel asymmetric loss function and shows that models trained with the asymmetric loss outperform those trained with symmetric loss functions. JITRALF is evaluated on the French transmission system for assessing the risk of insufficient operating reserves, the risk of load shedding, and the expected operating cost.
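The asymmetry idea above can be sketched with a pinball-style weighted absolute error in which under-prediction is penalized more heavily than over-prediction, natural when under-estimating risk is the costlier mistake. The paper's actual loss may differ; the function, weights, and numbers below are illustrative assumptions only.

```python
# Sketch of an asymmetric training loss: under- and over-prediction are
# penalized at different rates. Weights and values are illustrative.

def asymmetric_loss(y_true, y_pred, under_weight=0.9, over_weight=0.1):
    """Mean weighted absolute error; heavier penalty for under-prediction."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = t - p  # positive err means we under-predicted the true quantity
        total += under_weight * err if err >= 0 else over_weight * (-err)
    return total / len(y_true)

y_true = [10.0, 10.0]
print(asymmetric_loss(y_true, [8.0, 12.0]))  # under by 2 and over by 2
```

Training a surrogate against such a loss biases it toward conservative (slightly high) risk estimates, which is the behavior one wants from a real-time screening tool.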
Security-constrained economic dispatch (SCED) is a fundamental optimization model used by transmission system operators (TSOs) to clear real-time energy markets while ensuring the reliable operation of power grids. In a context of growing operational uncertainty, due to the increased penetration of renewable generators and distributed energy resources, operators must monitor risk in real time, i.e., they must quickly assess the system's behavior under various changes in load and renewable production. Unfortunately, given the tight constraints of real-time operations, systematically solving an optimization problem for each such scenario is not practical. To overcome this limitation, this paper proposes learning an optimization proxy for SCED, i.e., a machine learning (ML) model that can predict an optimal solution for SCED in milliseconds. The paper presents a principled analysis of the MISO market-clearing optimizations and proposes a novel ML pipeline that addresses the main challenges of learning SCED solutions, namely the variability in load, renewable output, and production costs, as well as the combinatorial structure of commitment decisions. A novel classification-then-regression architecture is also proposed to further capture the behavior of SCED solutions. Numerical experiments are reported on the French transmission system and demonstrate the approach's ability to produce, within a time frame compatible with real-time operations, accurate optimization proxies whose relative errors are below 0.6%.
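The classification-then-regression pattern mentioned above can be sketched in two stages: first predict a discrete regime (standing in for the combinatorial commitment structure), then apply a regime-specific regressor for the continuous quantities. Both stages below are hand-made stand-ins for trained ML models, and every threshold and coefficient is an illustrative assumption.

```python
# Sketch of classification-then-regression: a discrete regime is predicted
# first, then a regime-specific model produces the continuous output.
# Everything here is illustrative, not the paper's trained pipeline.

def classify_regime(load):
    """Stage 1: map the input to a discrete regime (stand-in for a classifier)."""
    return "peak" if load > 100.0 else "off_peak"

REGRESSORS = {
    # Stage 2: one simple linear model per regime (stand-ins for regressors).
    "off_peak": lambda load: 0.5 * load,
    "peak":     lambda load: 0.5 * 100.0 + 0.9 * (load - 100.0),
}

def predict_dispatch(load):
    regime = classify_regime(load)
    return REGRESSORS[regime](load)

print(predict_dispatch(80.0), predict_dispatch(120.0))
```

Splitting the prediction this way lets each regressor model a piecewise-smooth region of the solution map, rather than forcing one model to straddle the discontinuities introduced by commitment decisions.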