With the expanding share of renewable energy, the intraday electricity market has seen growing popularity among traders and electric utilities as a means of coping with the induced volatility of energy supply. Through its short trading horizon and continuous nature, the intraday market offers the ability to adjust trading decisions from the day-ahead market or to reduce trading risk on short notice. Producers of renewable energy use the intraday market to lower their forecast risk by adapting their offers to current forecasts. However, since the grid must be kept stable and electricity is only partially storable, market dynamics are complex. Robust and intelligent trading strategies for operating in the intraday market are therefore required. In this work, we propose a novel autonomous trading approach based on a deep reinforcement learning (DRL) algorithm as a possible solution. For this purpose, we model intraday trading as a Markov decision process (MDP) and employ the proximal policy optimization (PPO) algorithm as our DRL approach. A simulation framework is introduced that provides continuous intraday prices at a one-minute resolution. We test our framework in a case study from the perspective of a wind park operator, including price and wind forecasts alongside general trading information. In a test scenario on German intraday trading data from 2018, we are able to outperform multiple baselines with an improvement of at least 45.24%, demonstrating the advantages of the DRL algorithm. We also discuss limitations of the DRL agent and enhancements to improve its performance in future work.
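To make the MDP framing above concrete, here is a minimal, self-contained sketch of what such a one-minute intraday trading environment could look like, written against the gymnasium and stable-baselines3 APIs. The environment, its observation layout, and the synthetic price series are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: intraday trading as an MDP with one-minute steps,
# trained with PPO. Names (IntradayEnv, prices, forecast) are illustrative.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class IntradayEnv(gym.Env):
    """Toy continuous intraday market: the agent buys/sells up to 1 MWh per minute."""

    def __init__(self, prices: np.ndarray, forecast: np.ndarray):
        super().__init__()
        self.prices, self.forecast = prices, forecast
        # Action: signed trade volume in MWh for the current minute.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,))
        # Observation: current price, a forecast signal, open position.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(3,))

    def _obs(self):
        return np.array([self.prices[self.t], self.forecast[self.t], self.position],
                        dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.position = 0, 0.0
        return self._obs(), {}

    def step(self, action):
        volume = float(action[0])
        # Selling (volume < 0) earns the current price; buying pays it.
        reward = -volume * self.prices[self.t]
        self.position += volume
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}

rng = np.random.default_rng(0)
env = IntradayEnv(prices=50 + rng.normal(0, 5, 1440).cumsum() * 0.1,  # one day of minutes
                  forecast=rng.normal(0, 1, 1440))
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)
```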
Optimal execution is a sequential decision-making problem for cost saving in algorithmic trading. Studies have found that reinforcement learning (RL) can help decide the sizes of split orders. However, one problem remains unsolved: how to place limit orders at appropriate limit prices? The key challenge lies in the "continuous-discrete duality" of the action space. On the one hand, a continuous action space expressed as a percentage change in price is good for generalization. On the other hand, traders eventually have to choose limit prices discretely, due to the existence of the tick size, which requires specialization for each individual instrument with its own characteristics (e.g., liquidity and price range). Hence, continuous control is needed for generalization and discrete control for specialization. To this end, we propose a hybrid RL method that combines the advantages of both. We first use a continuous-control agent to scope an action subset, then deploy a fine-grained agent to choose a specific limit price. Extensive experiments show that our method achieves higher sample efficiency and better training stability than existing RL algorithms, and significantly outperforms previous learning-based methods for order execution.
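The two-stage action selection can be illustrated with a toy sketch. The helper names (`scope_action_subset`, `choose_limit_price`) and the random per-tick scores are hypothetical stand-ins for the trained continuous and discrete policies.

```python
# Illustrative sketch of the "continuous scoping + discrete choice" idea.
import numpy as np

TICK_SIZE = 0.01  # instrument-specific

def scope_action_subset(mid_price: float, pct_range: tuple[float, float]) -> np.ndarray:
    """Coarse continuous agent: map a percentage band around the mid price
    to the grid of valid tick-aligned limit prices inside it."""
    lo = mid_price * (1 + pct_range[0])
    hi = mid_price * (1 + pct_range[1])
    lo_tick = np.ceil(lo / TICK_SIZE) * TICK_SIZE
    return np.round(np.arange(lo_tick, hi, TICK_SIZE), 2)

def choose_limit_price(candidates: np.ndarray, scores: np.ndarray) -> float:
    """Fine-grained discrete agent: pick one tick from the scoped subset
    (here via argmax over hypothetical per-tick scores)."""
    return float(candidates[np.argmax(scores)])

mid = 100.00
candidates = scope_action_subset(mid, (-0.001, 0.001))  # e.g. a +/- 0.1% band
price = choose_limit_price(candidates, np.random.rand(len(candidates)))
print(f"{len(candidates)} candidate ticks, chosen limit price: {price:.2f}")
```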
Driven by the global decarbonization effort, the rapid integration of renewable energy into the conventional electricity grid presents new challenges and opportunities for the battery energy storage system (BESS) participating in the energy market. Energy arbitrage can be a significant source of revenue for the BESS due to the increasing price volatility in the spot market caused by the mismatch between renewable generation and electricity demand. In addition, the Frequency Control Ancillary Services (FCAS) markets established to stabilize the grid can offer higher returns for the BESS due to their capability to respond within milliseconds. Therefore, it is crucial for the BESS to carefully decide how much capacity to assign to each market to maximize the total profit under uncertain market conditions. This paper formulates the bidding problem of the BESS as a Markov Decision Process, which enables the BESS to participate in both the spot market and the FCAS market to maximize profit. Then, Proximal Policy Optimization, a model-free deep reinforcement learning algorithm, is employed to learn the optimal bidding strategy from the dynamic environment of the energy market under a continuous bidding scale. The proposed model is trained and validated using real-world historical data of the Australian National Electricity Market. The results demonstrate that our developed joint bidding strategy in both markets is significantly profitable compared to individual markets.
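One plausible way to realize a continuous bidding scale over multiple markets is to decode a raw continuous action into capacity fractions. The sketch below is an assumption about such a decoding step (the rated power and the softmax allocation are illustrative), not the paper's code.

```python
# Assumed decoding of a continuous PPO action into capacity bids for the
# spot market and the raise/lower FCAS services.
import numpy as np

BESS_POWER_MW = 10.0  # assumed rated power of the battery

def decode_action(action: np.ndarray) -> dict:
    """Map a raw action in R^3 to non-negative capacity fractions that sum
    to 1, then scale by the rated power of the BESS."""
    fractions = np.exp(action) / np.exp(action).sum()  # softmax allocation
    spot, raise_fcas, lower_fcas = fractions * BESS_POWER_MW
    return {"spot_mw": spot, "fcas_raise_mw": raise_fcas, "fcas_lower_mw": lower_fcas}

bids = decode_action(np.array([0.2, -0.5, 1.0]))
print(bids)  # capacity assigned to each market for the next trading interval
```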
Global power systems are increasingly reliant on wind energy as a mitigation strategy for climate change. However, the variability of wind energy causes system reliability to erode, resulting in the wind being curtailed and, ultimately, leading to substantial economic losses for wind farm owners. Wind curtailment can be reduced using battery energy storage systems (BESS) that serve as onsite backup sources. Yet, this auxiliary role may significantly hamper the BESS's capacity to generate revenues from the electricity market, particularly in conducting energy arbitrage in the Spot market and providing frequency control ancillary services (FCAS) in the FCAS markets. Ideal BESS scheduling should effectively balance the BESS's role in absorbing onsite wind curtailment and trading in the electricity market, but it is difficult in practice because of the underlying coordination complexity and the stochastic nature of energy prices and wind generation. In this study, we investigate the bidding strategy of a wind-battery system co-located and participating simultaneously in both the Spot and Regulation FCAS markets. We propose a deep reinforcement learning (DRL)-based approach that decouples the market participation of the wind-battery system into two related Markov decision processes for each facility, enabling the BESS to absorb onsite wind curtailment while simultaneously bidding in the wholesale Spot and FCAS markets to maximize overall operational revenues. Using realistic wind farm data, we validate the coordinated bidding strategy for the wind-battery system and find that our strategy generates significantly higher revenue and responds better to wind curtailment compared to an optimization-based benchmark. Our results show that joint-market bidding can significantly improve the financial performance of wind-battery systems compared to individual market participation.
This paper leverages recent developments in reinforcement learning and deep learning to solve the supply chain inventory management (SCIM) problem, a complex sequential decision-making problem that consists of determining the optimal quantities of products to produce and ship to different warehouses over a given time horizon. A mathematical formulation of a stochastic two-echelon supply chain environment is given, which can handle an arbitrary number of warehouses and product types. In addition, an open-source library that interfaces with deep reinforcement learning (DRL) algorithms is developed and made publicly available for solving the SCIM problem. The performance achieved by state-of-the-art DRL algorithms is compared through rich numerical experiments on synthetically generated data. The experimental plan is designed and executed to cover different supply chain structures, topologies, demands, capacities, and costs. The results show that the PPO algorithm adapts very well to the different characteristics of the environment. The VPG algorithm almost always converges to a local maximum, even though it typically reaches an acceptable level of performance. Finally, A3C is the fastest algorithm, but, like VPG, it never achieves the best performance when compared to PPO. In conclusion, the numerical experiments show that DRL performs consistently better than standard reorder policies, such as the static (s, Q)-policy. Thus, it can be considered a practical and effective option for solving real-world instances of the stochastic two-echelon problem.
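For reference, the static (s, Q)-policy used as a baseline is simple to state; a sketch follows, with illustrative parameter values.

```python
# Static (s, Q) reorder baseline: when the inventory position falls to or
# below the reorder point s, order a fixed quantity Q; otherwise do nothing.
def s_q_policy(inventory_position: float, s: float, q: float) -> float:
    """Return the order quantity for this period (0 if above the reorder point)."""
    return q if inventory_position <= s else 0.0

# Example: reorder point s=20 units, fixed batch Q=50 units.
for stock in (35, 20, 5):
    print(stock, "->", s_q_policy(stock, s=20, q=50))
```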
As an efficient way to integrate multiple distributed energy resources (DERs) with the user side, a microgrid mainly faces the problems of the small-scale volatility, uncertainty, and intermittency of DERs, along with demand-side uncertainty. The traditional microgrid has a single form and cannot accommodate flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, an overall environment of wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads, and the main grid is proposed. Centralized control of microgrid operation is convenient for controlling the reactive power and voltage of the distributed power supply and for adjusting the grid frequency. However, flexible loads tend to aggregate and create peaks during electricity-price valleys. Existing research takes the power constraints of the microgrid into account but fails to ensure a sufficient supply of electric energy to each individual flexible load. This paper considers the response priority of each unit component of the TCLs and ESSs on the basis of the overall operating environment of the microgrid, so as to guarantee the power supply of the microgrid's flexible loads and to minimize the power input cost as far as possible. Finally, the simulation optimization of the environment is expressed as a Markov decision process, and the training procedure combines offline and online operation stages. Because learning from scarce historical data is inefficient, multiple threads are added: an asynchronous advantage actor-critic (A3C) with an experience replay memory is employed to address the data correlation and non-stationary distribution problems during training.
The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques of data processing and data analytics and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems, which heavily rely on model assumptions, new developments in reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decision making in complex financial environments. This survey paper aims to review the recent developments in and the use of RL approaches in finance. We introduce Markov decision processes, the setting of many commonly used RL approaches. Various algorithms are then introduced, with a focus on value-based and policy-based methods that do not require any model assumptions. Connections are made with neural networks in order to extend the framework to incorporate deep RL algorithms. Our survey concludes by discussing the applications of these RL algorithms to a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
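As background for the survey's setting, the standard MDP definitions can be stated as follows (textbook formulas, not quoted from the paper):

```latex
% An MDP is a tuple (S, A, P, r, \gamma); a policy \pi maps states to
% action distributions. The state-value function of \pi and the optimal
% value function satisfy the Bellman equations:
\begin{align}
V^{\pi}(s) &= \mathbb{E}_{a \sim \pi(\cdot \mid s)}
  \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \Big], \\
V^{*}(s) &= \max_{a \in A} \Big[ r(s,a)
  + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Big].
\end{align}
```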
Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing amount of curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen, should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning how to balance different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a sophisticated deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize real-time optimal ESS planning under the uncertainty of curtailed renewable energy. A quantitative performance comparison shows that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with a wide action and observation space. Owing to the uncertainty-rejection capability of the DRL agent, we could confirm robust performance under large uncertainty in the curtailed renewable energy, with maximized net profit and a stable system. Action mapping was performed to visually assess the actions taken by the DRL agent according to the state. The corresponding results confirm that the DRL agent learns to act much as a human expert would, suggesting that the proposed methodology can be applied reliably.
In this article, we develop a modular framework for applying reinforcement learning to the optimal trade execution problem. The framework is designed with flexibility in mind, in order to simplify the implementation of different simulation setups. Rather than focusing on agents and optimization methods, we focus on the environment and break down the requirements necessary to simulate optimal trade execution under a reinforcement learning framework, such as data preprocessing, construction of observations, action processing, child order execution, and simulation of benchmarks. We give examples of each component, explore the difficulties their individual implementations and the interactions between them entail, and discuss the different phenomena that each component induces in the simulation, highlighting the divergences between the simulation and the behavior of a real market. We showcase our modular implementation through a setup that, following a time-weighted average price (TWAP) submission schedule, allows the agent to exclusively place limit orders, simulates their execution by iterating over snapshots of the limit order book (LOB), and computes rewards as the price improvement (in dollars) over the prices achieved by a TWAP benchmark algorithm following the same schedule. We also develop evaluation procedures that incorporate iterative retraining and evaluation of a given agent over intervals of the training horizon, mimicking how the agent would behave when continuously retrained as new market data becomes available, and emulating the monitoring practices that algorithm providers are bound to perform under current regulatory frameworks.
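The reward construction described above (price improvement over a TWAP benchmark following the same schedule) can be sketched as follows; the function names and the sell-side sign convention are assumptions for illustration.

```python
# Sketch of a price-improvement reward relative to a TWAP benchmark.
import numpy as np

def twap_benchmark_price(snapshot_prices: np.ndarray) -> float:
    """TWAP benchmark: average execution price across equal-sized child
    orders placed at each schedule interval."""
    return float(np.mean(snapshot_prices))

def price_improvement_reward(agent_fill_prices: np.ndarray,
                             snapshot_prices: np.ndarray,
                             side: str = "sell") -> float:
    """Signed difference between the agent's average fill price and the
    TWAP benchmark (positive = better than TWAP)."""
    diff = np.mean(agent_fill_prices) - twap_benchmark_price(snapshot_prices)
    return float(diff if side == "sell" else -diff)

reward = price_improvement_reward(np.array([100.10, 100.00]),
                                  np.array([100.00, 100.00]))
print(f"{reward:.2f}")  # 0.05: the seller beat the TWAP benchmark by 5 cents
```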
This paper proposes a novel portfolio optimization model that uses an improved deep reinforcement learning algorithm. The objective function of the optimization model is the weighted sum of the expectation and the value-at-risk of the portfolio's cumulative return. The proposed algorithm is based on an actor-critic architecture, in which the main task of the critic network is to learn the distribution of the portfolio's cumulative return using quantile regression, while the actor network outputs the optimal portfolio weights by maximizing the objective function above. Meanwhile, we exploit a linear transformation function to enable the short selling of assets. Finally, a multi-process method called Ape-X is used to accelerate deep reinforcement learning training. To validate the proposed approach, we backtest on two representative portfolios and observe that the model proposed in this work outperforms the benchmark strategies.
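A minimal sketch of the stated objective, assuming a critic that outputs N equally spaced quantile estimates of the cumulative return; the weighting, quantile count, and indexing convention are illustrative, not the authors' code.

```python
# Form the weighted sum of expectation and value-at-risk from a
# quantile-regression critic's output.
import torch

def objective_from_quantiles(quantiles: torch.Tensor, alpha: float = 0.05,
                             w: float = 0.5) -> torch.Tensor:
    """quantiles: (N,) estimated quantiles at levels (i+0.5)/N, ascending.
    Expectation ~ mean of quantiles; VaR_alpha ~ the alpha-level quantile."""
    expectation = quantiles.mean()
    var_index = int(alpha * quantiles.shape[0])       # e.g. 5% of N
    value_at_risk = quantiles[var_index]              # lower-tail return level
    return w * expectation + (1 - w) * value_at_risk  # maximized by the actor

q = torch.linspace(-0.10, 0.20, 32)  # toy ascending quantile estimates
print(objective_from_quantiles(q))
```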
A novel framework is proposed for solving optimal execution and placement problems using reinforcement learning (RL) with imitation. The RL agents trained with the proposed framework consistently outperform the industry-benchmark time-weighted average price (TWAP) strategy in execution cost and show great generalization across out-of-sample trading dates and tickers. The impressive performance is achieved through three aspects. First, our RL network architecture, called Dual-window Denoise PPO, enables efficient learning in a noisy market environment. Second, a reward scheme with imitation learning is designed, and a comprehensive set of market features is studied. Third, our flexible action formulation allows the RL agent to tackle optimal execution and placement jointly, yielding better performance than solving the two problems separately. The performance of the RL agents is evaluated in our multi-agent realistic historical limit order book simulator, in which price impact is accurately assessed. In addition, ablation studies are conducted, confirming the superiority of our framework.
Asset allocation (or portfolio management) is the task of determining how to optimally allocate a finite budget of funds across a range of financial instruments/assets, such as stocks. This study investigates the performance of reinforcement learning (RL) applied to portfolio management using model-free deep RL agents. We train several RL agents on real-world stock prices to learn how to perform asset allocation. We compare the performance of these RL agents against some baseline agents, and we also compare the RL agents with one another to understand which classes of agents perform better. From our analysis, RL agents can perform the task of portfolio management, since they significantly outperformed the baseline agents (random allocation and uniform allocation). Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT, overall. This shows the ability of RL agents to uncover more profitable trading strategies. Furthermore, there were no significant performance differences between value-based and policy-based RL agents. Actor-critic agents performed better than the other types of agents. On-policy agents also performed better than off-policy agents, because they are better at policy evaluation and sample efficiency is not a significant problem in portfolio management. This study shows that RL agents can substantially improve asset allocation, since they outperform strong baselines. Based on our analysis, on-policy, actor-critic RL agents show the most promise.
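A comparison of this kind can be set up along the following lines with stable-baselines3; the sketch uses Pendulum-v1 as a runnable stand-in for a portfolio environment (TRPO, which the study also covers, lives in the separate sb3-contrib package).

```python
# Train several model-free agents on the same environment and compare
# mean episode reward; the environment here is a stand-in, not the study's.
import gymnasium as gym
from stable_baselines3 import A2C, PPO, SAC
from stable_baselines3.common.evaluation import evaluate_policy

def compare_agents(env_fn, total_timesteps=50_000):
    results = {}
    for name, algo in {"A2C": A2C, "PPO": PPO, "SAC": SAC}.items():
        model = algo("MlpPolicy", env_fn(), verbose=0)
        model.learn(total_timesteps=total_timesteps)
        mean_reward, _ = evaluate_policy(model, env_fn(), n_eval_episodes=10)
        results[name] = mean_reward
    return results

env_fn = lambda: gym.make("Pendulum-v1")  # stand-in for a portfolio env
print(compare_agents(env_fn, total_timesteps=5_000))
```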
We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event-driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is considered at different levels of initial order size and on differently sized state spaces. The impact of the agent on the agent-based model and the market is considered using a calibration approach that explores the changes in the empirical stylized facts and price impact curves. Convergence, volume-trajectory, and action-trace plots are used to visualize the learning dynamics. The results show how an optimal execution agent learns optimal trading decisions inside a simulated reactive market framework, and how the simulated market's reactions are in turn changed by the introduction of strategic order splitting.
In this paper, multi-agent reinforcement learning is used to control a hybrid energy storage system, reducing the energy costs of a microgrid by maximizing the value of renewable energy and trading. The agents must learn to control three different types of energy storage system under fluctuating demand, dynamic wholesale energy prices, and unpredictable renewable energy generation. Two case studies are considered: the first looks at how the energy storage systems can better integrate renewable energy generation under dynamic pricing; the second looks at how these same agents can work with an aggregator to sell energy to self-interested external microgrids, reducing their own energy bills. This work finds that centralized learning with decentralized execution, using multi-agent deep deterministic policy gradients (MADDPG) and its state-of-the-art variants, allows the multi-agent approach to perform significantly better than control by a single global agent. It is also found that using separate reward functions in the multi-agent approach performs better than using a single one, and that being able to trade with other microgrids, rather than only selling back to the utility grid, greatly increases the microgrid's savings.
We address the problem of production planning and distribution in a multi-echelon supply chain. We consider uncertain demands and lead times, which makes the problem stochastic and non-linear. A Markov decision process formulation and a non-linear programming model are presented. As a sequential decision-making problem, deep reinforcement learning (RL) is a possible solution approach. This type of technique has received much attention from the artificial intelligence and optimization communities in recent years. Considering the good results obtained with deep RL approaches in different areas, a growing interest has arisen in applying them to problems from the operations research field. We use a deep RL technique, namely proximal policy optimization (PPO2), to solve the problem considering uncertain, regular, and seasonal demands and constant or stochastic lead times. Experiments are carried out in different scenarios to better assess the suitability of the algorithm. An agent based on a linearized model is used as a baseline. The experimental results show that PPO2 is a competitive and suitable tool for this type of problem. The PPO2 agent outperforms the baseline in all scenarios with stochastic lead times (by 7.3-11.2%), regardless of whether the demand is seasonal. In scenarios with constant lead times, the PPO2 agent performs better when the uncertain demand is non-seasonal (by 2.2-4.7%). The results indicate that the greater the uncertainty of the scenario, the greater the viability of this type of approach.
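The stochastic ingredients named above (seasonal demand, stochastic lead times) could be simulated along these lines; the distributions and parameters below are illustrative assumptions, not those of the paper.

```python
# Toy generators for seasonal demand and stochastic lead times.
import numpy as np

rng = np.random.default_rng(42)

def sample_demand(t: int, base=100, amplitude=30, noise_sd=10) -> int:
    """Seasonal demand: a sinusoidal cycle over a 52-period year plus noise."""
    seasonal = amplitude * np.sin(2 * np.pi * t / 52)
    return max(0, int(base + seasonal + rng.normal(0, noise_sd)))

def sample_lead_time(mean=2) -> int:
    """Stochastic lead time in periods (Poisson-distributed, at least 1)."""
    return max(1, rng.poisson(mean))

print([sample_demand(t) for t in range(4)], sample_lead_time())
```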
Ongoing risks from climate change have impacted the livelihood of global nomadic communities, and are likely to lead to increased migratory movements in coming years. As a result, mobility considerations are becoming increasingly important in energy systems planning, particularly to achieve energy access in developing countries. Advanced Plug and Play control strategies have been recently developed with such a decentralized framework in mind, more easily allowing for the interconnection of nomadic communities, both to each other and to the main grid. In light of the above, the design and planning strategy of a mobile multi-energy supply system for a nomadic community is investigated in this work. Motivated by the scale and dimensionality of the associated uncertainties, impacting all major design and decision variables over the 30-year planning horizon, Deep Reinforcement Learning (DRL) is implemented for the design and planning problem tackled. DRL based solutions are benchmarked against several rigid baseline design options to compare expected performance under uncertainty. The results on a case study for ger communities in Mongolia suggest that mobile nomadic energy systems can be both technically and economically feasible, particularly when considering flexibility, although the degree of spatial dispersion among households is an important limiting factor. Key economic, sustainability and resilience indicators such as Cost, Equivalent Emissions and Total Unmet Load are measured, suggesting potential improvements compared to available baselines of up to 25%, 67% and 76%, respectively. Finally, the decomposition of values of flexibility and plug and play operation is presented using a variation of real options theory, with important implications for both nomadic communities and policymakers focused on enabling their energy access.
Heating in private households is a major contributor to the emissions generated today. Heat pumps are a promising alternative for heat generation and a key technology for achieving the goals of the German energy transition and becoming less dependent on fossil fuels. Today, the majority of heat pumps in the field are controlled by a simple heating curve, a naive mapping of the current outdoor temperature to a control action. A more advanced control approach is model predictive control (MPC), which has been applied to heat pump control in multiple research works. However, MPC depends heavily on the building model, which has several disadvantages. Motivated by this, and by recent breakthroughs in the field, this work applies deep reinforcement learning (DRL) to heat pump control in a simulated environment. Through a comparison with MPC, it is shown that DRL can be applied in a model-free manner to achieve MPC-like performance. This work extends other works that have already applied DRL to building heating operation by performing an in-depth analysis of the learned control strategies and by giving a detailed comparison of the two state-of-the-art control methods.
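The "simple heating curve" baseline mentioned above is easy to make concrete: a naive monotone map from outdoor temperature to a supply-temperature set point. The slope, offset, and clipping band below are illustrative values, not taken from the paper.

```python
# Naive heating-curve controller: colder outside -> higher supply temperature.
def heating_curve(outdoor_temp_c: float, slope: float = -0.8,
                  offset: float = 35.0) -> float:
    """Return the supply-temperature set point, clipped to a safe band."""
    setpoint = offset + slope * outdoor_temp_c
    return min(max(setpoint, 25.0), 55.0)

for t_out in (-10, 0, 10, 20):
    print(f"outdoor {t_out:+d} C -> supply set point {heating_curve(t_out):.1f} C")
```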
More and more stock trading strategies are constructed using deep reinforcement learning (DRL) algorithms, but DRL methods, originally widely used in the gaming community, are not directly adaptable to financial data with low signal-to-noise ratios and unevenness, and thus suffer from performance shortcomings. In this paper, to capture the hidden information, we propose a DRL-based stock trading system using cascaded LSTMs, which first uses an LSTM to extract time-series features from daily stock data and then feeds the extracted features to the agent for training, while the policy functions in reinforcement learning also use another LSTM for training. Experiments on the DJI in the US market and the SSE50 in the Chinese stock market show that our model outperforms previous baseline models in terms of cumulative returns and Sharpe ratio, and this advantage is more significant in the Chinese stock market, an emerging market. This indicates that our proposed method is a promising way to build an automated stock trading system.
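The cascaded-LSTM idea (one LSTM distilling daily time-series windows into features that are then fed to the agent, whose policy uses its own LSTM internally) might look roughly like this in PyTorch; the layer sizes and the 30-day, 5-feature window are assumptions, not the paper's configuration.

```python
# First stage of the cascade: an LSTM feature extractor over daily windows.
import torch
import torch.nn as nn

class LSTMFeatureExtractor(nn.Module):
    def __init__(self, n_inputs: int = 5, hidden: int = 64, n_features: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, n_features)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, days, n_inputs) of daily stock data
        out, _ = self.lstm(window)
        return self.proj(out[:, -1])  # features from the last time step

features = LSTMFeatureExtractor()(torch.randn(8, 30, 5))
print(features.shape)  # torch.Size([8, 32]) -> fed to the trading agent
```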
Reformulating the history matching problem from a least-squares mathematical optimization problem into a Markov decision process introduces a method in which reinforcement learning can be utilized to solve the problem. This method provides a mechanism whereby an artificial deep neural network agent can interact with the reservoir simulator and find multiple different solutions to the problem. Such a formulation allows the problem to be solved in parallel by launching multiple concurrent environments, enabling the agent to learn from all the environments simultaneously and achieving a significant speedup.
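The parallel-environment mechanism described above corresponds to standard vectorized RL environments. Below is a runnable sketch using gymnasium's AsyncVectorEnv, with Pendulum-v1 standing in for the reservoir-simulator environment, which is not publicly specified.

```python
# Launch 8 concurrent environments; the agent steps all of them at once.
import gymnasium as gym
from gymnasium.vector import AsyncVectorEnv

envs = AsyncVectorEnv([lambda: gym.make("Pendulum-v1") for _ in range(8)])
obs, info = envs.reset(seed=0)
actions = envs.action_space.sample()  # one action per environment
obs, rewards, terms, truncs, infos = envs.step(actions)
print(obs.shape)  # (8, 3): one observation per concurrent environment
envs.close()
```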
Market makers play a key role in financial markets by providing liquidity. They usually fill order books with buy and sell limit orders in order to provide traders with alternative price levels to operate on. This paper focuses precisely on the study of these market-making strategies from an agent-based perspective. In particular, we propose the application of reinforcement learning (RL) for the creation of intelligent market makers in simulated stock markets. This research analyzes how RL market-maker agents behave in non-competitive (only one RL market maker learning at a time) and competitive scenarios (multiple RL market makers learning at the same time), and how they adapt their strategies in a Sim2Real scope, with interesting results. Furthermore, it covers the application of policy transfer between different experiments, describing the impact of competing environments on the performance of the RL agents. RL and deep RL techniques prove to be profitable market-making approaches, leading to a better understanding of their behavior in stock markets.
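A market maker's action can be pictured as quoting a bid and an ask around a reference price. The toy sketch below uses hypothetical names; in the setting above, the learned policy would control quantities like the half-spread and the skew.

```python
# Toy market-maker quoting action: a bid/ask pair around a reference price.
def quote(ref_price: float, half_spread: float, skew: float) -> tuple[float, float]:
    """Return (bid, ask). skew > 0 shifts quotes up, e.g. to unload inventory."""
    bid = ref_price - half_spread + skew
    ask = ref_price + half_spread + skew
    return round(bid, 2), round(ask, 2)

print(quote(ref_price=100.0, half_spread=0.05, skew=-0.01))  # (99.94, 100.04)
```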