最近的波能转化器(WEC)配备了多个腿和发电机,以最大程度地发电。传统控制器显示出捕获复杂波形模式的局限性,并且控制器必须有效地最大化能量捕获。本文介绍了多项式增强学习控制器(MARL),该控制器的表现优于传统使用的弹簧减震器控制器。我们的最初研究表明,问题的复杂性质使训练很难融合。因此,我们提出了一种新颖的跳过训练方法,使MARL训练能够克服性能饱和,并与默认的MARL训练相比,融合到最佳控制器,从而增强发电。我们还提出了另一种新型的混合训练初始化(STHTI)方法,其中最初可以单独针对基线弹簧减震器(SD)控制器对MARL控制器的个别代理进行训练,然后在将来一次或将来培训一个代理商或全部培训加速收敛。我们使用异步参与者-Critic(A3C)算法在基线弹簧减震器控制器上实现了基线弹簧减震器控制器的能源效率的两位数提高。
translated by 谷歌翻译
从意外的外部扰动中恢复的能力是双模型运动的基本机动技能。有效的答复包括不仅可以恢复平衡并保持稳定性的能力,而且在平衡恢复物质不可行时,也可以保证安全的方式。对于与双式运动有关的机器人,例如人形机器人和辅助机器人设备,可帮助人类行走,设计能够提供这种稳定性和安全性的控制器可以防止机器人损坏或防止伤害相关的医疗费用。这是一个具有挑战性的任务,因为它涉及用触点产生高维,非线性和致动系统的高动态运动。尽管使用基于模型和优化方法的前进方面,但诸如广泛领域知识的要求,诸如较大的计算时间和有限的动态变化的鲁棒性仍然会使这个打开问题。在本文中,为了解决这些问题,我们开发基于学习的算法,能够为两种不同的机器人合成推送恢复控制政策:人形机器人和有助于双模型运动的辅助机器人设备。我们的工作可以分为两个密切相关的指示:1)学习人形机器人的安全下降和预防策略,2)使用机器人辅助装置学习人类的预防策略。为实现这一目标,我们介绍了一套深度加强学习(DRL)算法,以学习使用这些机器人时提高安全性的控制策略。
translated by 谷歌翻译
Legged systems have many advantages when compared to their wheeled counterparts. For example, they can more easily navigate extreme, uneven terrain. However, there are disadvantages as well, particularly the difficulty seen in modeling the nonlinearities of the system. Research has shown that using flexible components within legged locomotive systems improves performance measures such as efficiency and running velocity. Because of the difficulties encountered in modeling flexible systems, control methods such as reinforcement learning can be used to define control strategies. Furthermore, reinforcement learning can be tasked with learning mechanical parameters of a system to match a control input. It is shown in this work that when deploying reinforcement learning to find design parameters for a pogo-stick jumping system, the designs the agents learn are optimal within the design space provided to the agents.
translated by 谷歌翻译
本文介绍了电力系统运营商的域知识如何集成到强化学习(RL)框架中,以有效学习控制电网拓扑以防止热级联的代理。由于大搜索/优化空间,典型的基于RL的拓扑控制器无法表现良好。在这里,我们提出了一个基于演员 - 评论家的代理,以解决问题的组合性质,并使用由RTE,法国TSO开发的RL环境训练代理。为了解决大型优化空间的挑战,通过使用网络物理修改环境以增强代理学习来纳入训练过程中的基于奖励调整的基于课程的方法。此外,采用多种方案的并行训练方法来避免将代理偏置到几种情况,并使其稳健地对网格操作中的自然变异性。如果没有对培训过程进行这些修改,则RL代理失败了大多数测试场景,说明了正确整合物理系统的域知识以获得真实世界的RL学习的重要性。该代理通过RTE测试2019年学习,以运行电力网络挑战,并以精确度和第1位的速度授予第2位。开发的代码是公共使用开放的。
translated by 谷歌翻译
当前气候的快速变化增加了改变能源生产和消费管理的紧迫性,以减少碳和其他绿色房屋的生产。在这种情况下,法国电力网络管理公司RTE(r {\'e} seau de Transport d'{\'e} lectricit {\'e})最近发布了一项广泛的研究结果,概述了明天法国法语的各种情况能源管理。我们提出一个挑战,将测试这种情况的可行性。目的是控制电力网络中的电力运输,同时追求多种目标:平衡生产和消费,最大程度地减少能量损失,并确保人员和设备安全,尤其是避免灾难性的失败。虽然应用程序的重要性本身提供了一个目标,但该挑战也旨在推动人工智能分支(AI)(AI)的最先进,称为强化学习(RL),该研究提供了解决控制问题的新可能性。特别是,在该应用领域中,深度学习和RL的组合组合的各个方面仍然需要利用。该挑战属于2019年开始的系列赛,名称为“学习运行电力网络”(L2RPN)。在这个新版本中,我们介绍了RTE提出的新的更现实的场景,以便到2050年到达碳中立性,将化石燃料电力产生,增加了可再生和核能的比例,并引入了电池。此外,我们使用最先进的加强学习算法提供基线来刺激未来的参与者。
translated by 谷歌翻译
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, inverse reinforcement learning that are related but are not classical RL algorithms. The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed.
translated by 谷歌翻译
在本文中,多种子体增强学习用于控制混合能量存储系统,通过最大化可再生能源和交易的价值来降低微电网的能量成本。该代理商必须学习在波动需求,动态批发能源价格和不可预测的可再生能源中,控制三种不同类型的能量存储系统。考虑了两种案例研究:首先看能量存储系统如何在动态定价下更好地整合可再生能源发电,第二种与这些同一代理商如何与聚合剂一起使用,以向自私外部微电网销售能量的能量减少自己的能源票据。这项工作发现,具有分散执行的多代理深度确定性政策梯度的集中学习及其最先进的变体允许多种代理方法显着地比来自单个全局代理的控制更好。还发现,在多种子体方法中使用单独的奖励功能比使用单个控制剂更好。还发现能够与其他微电网交易,而不是卖回实用电网,也发现大大增加了网格的储蓄。
translated by 谷歌翻译
Collaborative autonomous multi-agent systems covering a specified area have many potential applications, such as UAV search and rescue, forest fire fighting, and real-time high-resolution monitoring. Traditional approaches for such coverage problems involve designing a model-based control policy based on sensor data. However, designing model-based controllers is challenging, and the state-of-the-art classical control policy still exhibits a large degree of suboptimality. In this paper, we present a reinforcement learning (RL) approach for the multi-agent coverage problem involving agents with second-order dynamics. Our approach is based on the Multi-Agent Proximal Policy Optimization Algorithm (MAPPO). To improve the stability of the learning-based policy and efficiency of exploration, we utilize an imitation loss based on the state-of-the-art classical control policy. Our trained policy significantly outperforms the state-of-the-art. Our proposed network architecture includes incorporation of self attention, which allows a single-shot domain transfer of the trained policy to a large variety of domain shapes and number of agents. We demonstrate our proposed method in a variety of simulated experiments.
translated by 谷歌翻译
In order to avoid conventional controlling methods which created obstacles due to the complexity of systems and intense demand on data density, developing modern and more efficient control methods are required. In this way, reinforcement learning off-policy and model-free algorithms help to avoid working with complex models. In terms of speed and accuracy, they become prominent methods because the algorithms use their past experience to learn the optimal policies. In this study, three reinforcement learning algorithms; DDPG, TD3 and SAC have been used to train Fetch robotic manipulator for four different tasks in MuJoCo simulation environment. All of these algorithms are off-policy and able to achieve their desired target by optimizing both policy and value functions. In the current study, the efficiency and the speed of these three algorithms are analyzed in a controlled environment.
translated by 谷歌翻译
Eco-driving strategies have been shown to provide significant reductions in fuel consumption. This paper outlines an active driver assistance approach that uses a residual policy learning (RPL) agent trained to provide residual actions to default power train controllers while balancing fuel consumption against other driver-accommodation objectives. Using previous experiences, our RPL agent learns improved traction torque and gear shifting residual policies to adapt the operation of the powertrain to variations and uncertainties in the environment. For comparison, we consider a traditional reinforcement learning (RL) agent trained from scratch. Both agents employ the off-policy Maximum A Posteriori Policy Optimization algorithm with an actor-critic architecture. By implementing on a simulated commercial vehicle in various car-following scenarios, we find that the RPL agent quickly learns significantly improved policies compared to a baseline source policy but in some measures not as good as those eventually possible with the RL agent trained from scratch.
translated by 谷歌翻译
The high emission and low energy efficiency caused by internal combustion engines (ICE) have become unacceptable under environmental regulations and the energy crisis. As a promising alternative solution, multi-power source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for the development of EMS. RL has received continuous attention and research, but there is still a lack of systematic analysis of the design elements of RL-based EMS. To this end, this paper presents an in-depth analysis of the current research on RL-based EMS (RL-EMS) and summarizes the design elements of RL-based EMS. This paper first summarizes the previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. The contribution of advanced algorithms to the training effect is shown, the perception and control schemes in the literature are analyzed in detail, different reward function settings are classified, and innovative training methods with their roles are elaborated. Finally, by comparing the development routes of RL and RL-EMS, this paper identifies the gap between advanced RL solutions and existing RL-EMS. Finally, this paper suggests potential development directions for implementing advanced artificial intelligence (AI) solutions in EMS.
translated by 谷歌翻译
Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.
translated by 谷歌翻译
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policybased methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
translated by 谷歌翻译
在许多机器人和工业应用中,传统的线性控制策略已经广泛研究和使用,但它们不应响应系统的总动态,以避免对非线性控制等非线性控制方案的繁琐计算,加强学习的预测控制应用可以提供替代解决方案本文介绍了在移动自拍的深度确定性政策梯度和近端策略优化的情况下实现了RL控制的实现,在移动自拍伸直倒立摆片EWIP系统这样的RL模型使得找到满意控制方案的任务更容易,并在自我调整时有效地响应动态。在本文中提供更好控制的参数,两个RL基础控制器被针对MPC控制器捕获,以基于EWIP系统的状态变量进行评估,同时遵循特定的所需轨迹
translated by 谷歌翻译
With the growing need to reduce energy consumption and greenhouse gas emissions, Eco-driving strategies provide a significant opportunity for additional fuel savings on top of other technological solutions being pursued in the transportation sector. In this paper, a model-free deep reinforcement learning (RL) control agent is proposed for active Eco-driving assistance that trades-off fuel consumption against other driver-accommodation objectives, and learns optimal traction torque and transmission shifting policies from experience. The training scheme for the proposed RL agent uses an off-policy actor-critic architecture that iteratively does policy evaluation with a multi-step return and policy improvement with the maximum posteriori policy optimization algorithm for hybrid action spaces. The proposed Eco-driving RL agent is implemented on a commercial vehicle in car following traffic. It shows superior performance in minimizing fuel consumption compared to a baseline controller that has full knowledge of fuel-efficiency tables.
translated by 谷歌翻译
机器人和与世界相互作用或互动的机器人和智能系统越来越多地被用来自动化各种任务。这些系统完成这些任务的能力取决于构成机器人物理及其传感器物体的机械和电气部件,例如,感知算法感知环境,并计划和控制算法以生产和控制算法来生产和控制算法有意义的行动。因此,通常有必要在设计具体系统时考虑这些组件之间的相互作用。本文探讨了以端到端方式对机器人系统进行任务驱动的合作的工作,同时使用推理或控制算法直接优化了系统的物理组件以进行任务性能。我们首先考虑直接优化基于信标的本地化系统以达到本地化准确性的问题。设计这样的系统涉及将信标放置在整个环境中,并通过传感器读数推断位置。在我们的工作中,我们开发了一种深度学习方法,以直接优化信标的放置和位置推断以达到本地化精度。然后,我们将注意力转移到了由任务驱动的机器人及其控制器优化的相关问题上。在我们的工作中,我们首先提出基于多任务增强学习的数据有效算法。我们的方法通过利用能够在物理设计的空间上概括设计条件的控制器,有效地直接优化了物理设计和控制参数,以直接优化任务性能。然后,我们对此进行跟进,以允许对离散形态参数(例如四肢的数字和配置)进行优化。最后,我们通过探索优化的软机器人的制造和部署来得出结论。
translated by 谷歌翻译
由于涉及的复杂动态和多标准优化,控制非静态双模型机器人具有挑战性。最近的作品已经证明了深度加强学习(DRL)的仿真和物理机器人的有效性。在这些方法中,通常总共总共汇总来自不同标准的奖励以学习单个值函数。但是,这可能导致混合奖励之间的依赖信息丢失并导致次优策略。在这项工作中,我们提出了一种新颖的奖励自适应加强学习,用于Biped运动,允许控制策略通过使用动态机制通过多标准同时优化。该方法应用多重批评,为每个奖励组件学习单独的值函数。这导致混合政策梯度。我们进一步提出了动态权重,允许每个组件以不同的优先级优化策略。这种混合动态和动态策略梯度(HDPG)设计使代理商更有效地学习。我们表明所提出的方法优于总结奖励方法,能够转移到物理机器人。 SIM-to-Real和Mujoco结果进一步证明了HDPG的有效性和泛化。
translated by 谷歌翻译
Machine learning frameworks such as Genetic Programming (GP) and Reinforcement Learning (RL) are gaining popularity in flow control. This work presents a comparative analysis of the two, bench-marking some of their most representative algorithms against global optimization techniques such as Bayesian Optimization (BO) and Lipschitz global optimization (LIPO). First, we review the general framework of the model-free control problem, bringing together all methods as black-box optimization problems. Then, we test the control algorithms on three test cases. These are (1) the stabilization of a nonlinear dynamical system featuring frequency cross-talk, (2) the wave cancellation from a Burgers' flow and (3) the drag reduction in a cylinder wake flow. We present a comprehensive comparison to illustrate their differences in exploration versus exploitation and their balance between `model capacity' in the control law definition versus `required complexity'. We believe that such a comparison paves the way toward the hybridization of the various methods, and we offer some perspective on their future development in the literature on flow control problems.
translated by 谷歌翻译
深度加强学习(RL)最近在机器人连续控制任务中表现出很大的承诺。尽管如此,在该静脉中心围绕集中式学习设置的研究,这在很大程度上依赖于机器人的所有组件之间的通信可用性。然而,现实世界中的代理商经常以分散的方式运作,由于潜伏期要求,有限的电力预算和安全问题。通过将机器人组件作为分​​散剂的系统配制,这项工作提出了一种用于连续控制的分散的多效增强学习框架。为此,我们首先开发一个合作的多眼PPO框架,允许在执行期间训练和分散的操作期间集中优化。但是,该系统仅接收全局奖励信号,该信号不会归因于每个代理。为了解决这一挑战,我们进一步提出了一个通用的游戏理论信用分配框架,它计算特定于代理的奖励信号。最后但并非最不重要的是,我们还将基于模型的RL模块纳入了我们的信用分配框架,这导致采样效率的显着提高。我们展示了我们对Mujoco机器人控制任务的实验结果框架的有效性。对于演示视频,请访问:https://youtu.be/gfyvpm4svey。
translated by 谷歌翻译
This work considers the problem of learning cooperative policies in complex, partially observable domains without explicit communication. We extend three classes of single-agent deep reinforcement learning algorithms based on policy gradient, temporal-difference error, and actor-critic methods to cooperative multi-agent systems. We introduce a set of cooperative control tasks that includes tasks with discrete and continuous actions, as well as tasks that involve hundreds of agents. The three approaches are evaluated against each other using different neural architectures, training procedures, and reward structures. Using deep reinforcement learning with a curriculum learning scheme, our approach can solve problems that were previously considered intractable by most multi-agent reinforcement learning algorithms. We show that policy gradient methods tend to outperform both temporal-difference and actor-critic methods when using feed-forward neural architectures. We also show that recurrent policies, while more difficult to train, outperform feed-forward policies on our evaluation tasks.
translated by 谷歌翻译