An integrated framework of computational fluid-structure dynamics (CFD-CSD) and deep reinforcement learning (deep RL) is developed to control a fly-scale flexible-winged flyer in complex flows. The dynamics of a flyer in complex flow are highly unsteady and nonlinear, which makes modeling those dynamics challenging. Conventional control methods, which rely on a model of the dynamics, are therefore insufficient for regulating such complex behavior. In this study, an integrated framework that solves the full governing equations of both the fluid and the structure is proposed to generate control policies for the flyer. To learn a control policy successfully, accurate and abundant data on the dynamics are needed. However, satisfying both the quality and the quantity requirements for data on complex dynamics is very difficult, since more accurate data are generally more expensive to obtain. In this study, two strategies are proposed to handle this dilemma. To obtain accurate data, CFD-CSD is employed to predict the dynamics precisely. To obtain abundant data, a novel data-reproduction method is devised, in which the acquired data are replicated over various situations while the dynamics are preserved. Using these data, the framework learns control policies for various flow conditions and demonstrates remarkable performance in controlling the flyer in complex flow fields.
Microswimmers can acquire information on the surrounding fluid by sensing mechanical cues. They can then navigate in response to these signals. We analyse this navigation by combining deep reinforcement learning with direct numerical simulations to resolve the hydrodynamics. We study how local and non-local information can be used to train a swimmer to achieve particular swimming tasks in a non-uniform flow field, in particular a zig-zag shear flow. The swimming tasks are (1) learning how to swim in the vorticity direction, (2) in the shear-gradient direction, and (3) in the shear-flow direction. We find that access to lab-frame information on the swimmer's instantaneous orientation is all that is required to reach the optimal policy for tasks (1) and (2). However, information on both the translational and rotational velocities seems to be required to achieve task (3). Inspired by biological microorganisms, we also consider the case where the swimmers sense local information, i.e. surface hydrodynamic forces, together with a signal direction. This might correspond to gravity or, for microorganisms with light sensors, a light source. In this case, we show that the swimmer can reach a comparable level of performance to a swimmer with access to lab-frame variables. We also analyse the role of different swimming modes, i.e. pusher, puller, and neutral swimmers.
Learning to play table tennis is a challenging task for robots, owing to the large variety of strokes required. Recent advances have shown that deep reinforcement learning (RL) is able to successfully learn optimal actions in a simulated environment. However, the applicability of RL in real-world scenarios remains limited due to the high exploration effort required. In this work, we propose a realistic simulation environment in which multiple models are built for the ball's dynamics and the robot's kinematics. Instead of training an end-to-end RL model, a novel policy-gradient approach with a TD3 backbone is proposed to learn the racket strokes based on the predicted state of the ball at hitting time. In experiments, we show that the proposed method significantly outperforms existing RL methods in simulation. Furthermore, to cross the domain from simulation to reality, we adopt an efficient retraining method and test it in three real-world scenarios. The resulting success rate is 98% and the distance error is around 24.9 cm. The total training time is about 1.5 hours.
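The abstract does not include code; the following is a minimal, hypothetical sketch of the stroke-prediction idea it describes: a TD3-style deterministic actor maps the predicted ball state at hitting time to racket-stroke parameters. All names, dimensions, and the stroke parameterization are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: TD3-style deterministic actor mapping the predicted ball state
# at hitting time to normalized stroke parameters.
import torch
import torch.nn as nn

class StrokeActor(nn.Module):
    def __init__(self, ball_state_dim=6, stroke_dim=3, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ball_state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, stroke_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, predicted_ball_state):
        # Output normalized stroke parameters (e.g. racket velocity/orientation at impact).
        return self.max_action * self.net(predicted_ball_state)

# Example: one predicted ball state (position + velocity at the estimated hitting time).
actor = StrokeActor()
ball_state = torch.randn(1, 6)
stroke = actor(ball_state)  # refined by TD3's twin critics during training
```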
Machine learning frameworks such as Genetic Programming (GP) and Reinforcement Learning (RL) are gaining popularity in flow control. This work presents a comparative analysis of the two, benchmarking some of their most representative algorithms against global optimization techniques such as Bayesian Optimization (BO) and Lipschitz global optimization (LIPO). First, we review the general framework of the model-free control problem, bringing together all methods as black-box optimization problems. Then, we test the control algorithms on three test cases. These are (1) the stabilization of a nonlinear dynamical system featuring frequency cross-talk, (2) the wave cancellation from a Burgers' flow, and (3) the drag reduction in a cylinder wake flow. We present a comprehensive comparison to illustrate their differences in exploration versus exploitation and their balance between `model capacity' in the control law definition versus `required complexity'. We believe that such a comparison paves the way toward the hybridization of the various methods, and we offer some perspective on their future development in the literature on flow control problems.
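As an illustration of the shared black-box framing discussed above, the sketch below shows how any of the compared methods interacts with the control problem only through parameter-cost pairs. The toy plant and the control-law parameterization are assumptions, not one of the paper's three test cases.

```python
# Illustrative black-box framing: a parameterized control law is evaluated through the
# plant and returns a scalar cost; GP, RL, BO, and LIPO differ only in how they propose
# the next parameters.
import numpy as np

def rollout_cost(theta, horizon=200, dt=0.01):
    """Run one closed-loop episode with control u = theta[0]*x + theta[1]*x**3."""
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        u = theta[0] * x + theta[1] * x ** 3
        x = x + dt * (x - x ** 3 + u)   # toy nonlinear plant (assumption)
        cost += dt * (x ** 2 + 0.1 * u ** 2)
    return cost

# Any black-box optimizer only sees (theta, rollout_cost(theta)) pairs.
rng = np.random.default_rng(0)
candidates = rng.uniform(-2.0, 0.0, size=(32, 2))
best = min(candidates, key=rollout_cost)  # random search as the simplest baseline
print(best, rollout_cost(best))
```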
The ability to recover from unexpected external perturbations is a fundamental motor skill in bipedal locomotion. An effective response includes not only the ability to regain balance and maintain stability, but also the ability to fall in a safe manner when balance recovery is physically infeasible. For robots associated with bipedal locomotion, such as humanoid robots and assistive robotic devices that aid humans in walking, designing controllers that can provide this stability and safety can prevent damage to the robot or prevent injury-related medical costs. This is a challenging task, because it involves generating highly dynamic motion for high-dimensional, nonlinear, and underactuated systems with contacts. Despite advances using model-based and optimization approaches, requirements such as extensive domain knowledge, large computation times, and limited robustness to variations in the dynamics still leave this an open problem. In this work, to address these issues, we develop learning-based algorithms capable of synthesizing push-recovery control policies for two different kinds of robots: humanoid robots and assistive robotic devices that aid bipedal locomotion. Our work can be divided into two closely related directions: 1) learning safe falling and fall-prevention strategies for humanoid robots, and 2) learning fall-prevention strategies for humans using a robotic assistive device. To achieve this, we introduce a set of deep reinforcement learning (DRL) algorithms to learn control policies that improve safety when using these robots.
Attitude control of fixed-wing unmanned aerial vehicles (UAVs) is a difficult control problem, in part due to nonlinear dynamics, actuator constraints, and coupled longitudinal and lateral motion. Current state-of-the-art autopilots are based on linear control and are therefore limited in their effectiveness and performance. Deep reinforcement learning (DRL) is a machine learning method that automatically discovers optimal control laws through interaction with the controlled system and can handle complex nonlinear dynamics. In this paper, we show that DRL can successfully learn attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics, requiring as little as three minutes of flight data. We initially train our model in a simulation environment and then deploy the learned controller on the UAV in flight tests, demonstrating performance comparable to the state-of-the-art ArduPlane proportional-integral-derivative (PID) attitude controller without further online learning. To better understand the operation of the learned controller, we present an analysis of its behaviour, including a comparison with the existing well-tuned PID controller.
A ball-balancing robot (ballbot) is a good platform for testing the effectiveness of balance controllers. For balance control, model-based feedback control methods have been widely used. However, contacts and collisions are difficult to model and often cause balance control to fail, especially when the ballbot is tilted at a large angle. To explore the maximum initial tilt angle of the ballbot, balance control is interpreted as a recovery task and solved using reinforcement learning (RL). RL is a powerful technique for systems that are difficult to model, because it allows an agent to learn a policy by interacting with the environment. In this paper, a compound controller is proposed that combines a conventional feedback controller with an RL method. We show the effectiveness of the compound controller by training an agent to successfully perform recovery tasks involving contacts and collisions. Simulation results show that, compared with a conventional model-based controller, the compound controller can keep the ballbot balanced from a much larger initial tilt angle.
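A minimal sketch of the compound-controller idea described in this abstract is given below, assuming a simple additive blending of a model-based feedback term and a learned RL correction; the gains, policy interface, and blending rule are illustrative assumptions rather than the paper's design.

```python
# Minimal sketch: a conventional feedback term is always active, and a learned RL
# correction handles the regimes (large tilt, contact, collision) where the
# model-based controller alone fails.
import numpy as np

def lqr_like_feedback(state, K):
    return -K @ state                       # conventional model-based term

def compound_control(state, K, rl_policy, tilt_threshold=0.2):
    u_fb = lqr_like_feedback(state, K)
    tilt = abs(state[0])                    # assume state[0] is the tilt angle
    u_rl = rl_policy(state) if tilt > tilt_threshold else 0.0
    return u_fb + u_rl

K = np.array([5.0, 1.0])                                   # placeholder gains
dummy_policy = lambda s: np.clip(-2.0 * s[0], -1.0, 1.0)   # stand-in for the trained agent
print(compound_control(np.array([0.35, 0.0]), K, dummy_policy))
```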
With the development of experimental quantum technology, quantum control has attracted increasing attention due to the realization of controllable artificial quantum systems. However, because quantum-mechanical systems are often too difficult to deal with analytically, heuristic strategies and numerical algorithms which search for proper control protocols are adopted, and deep learning, especially deep reinforcement learning (RL), is a promising generic candidate solution for these control problems. Although there have been a few successful applications of deep RL to quantum control problems, most of the existing RL algorithms suffer from instabilities and unsatisfactory reproducibility, and require a large amount of fine-tuning and a large computational budget, both of which limit their applicability. To resolve the issue of instabilities, in this dissertation, we investigate the non-convergence issue of Q-learning. Then, we investigate the weaknesses of existing convergent approaches that have been proposed, and we develop a new convergent Q-learning algorithm, which we call the convergent deep Q network (C-DQN) algorithm, as an alternative to the conventional deep Q network (DQN) algorithm. We prove the convergence of C-DQN and apply it to the Atari 2600 benchmark. We show that when DQN fails, C-DQN still learns successfully. Then, we apply the algorithm to the measurement-feedback cooling problems of a quantum quartic oscillator and a trapped quantum rigid body. We establish the physical models and analyse their properties, and we show that although both C-DQN and DQN can learn to cool the systems, C-DQN tends to behave more stably, and when DQN suffers from instabilities, C-DQN can achieve a better performance. As the performance of DQN can have a large variance and lack consistency, C-DQN can be a better choice for research on complicated control problems.
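For background on the convergence issue discussed above, the sketch below contrasts the standard semi-gradient DQN loss (bootstrap target detached) with the residual-gradient loss (fully differentiated, hence a true gradient of an objective and convergent). It is not the C-DQN loss itself, whose exact combination rule is not given in this abstract.

```python
# Background sketch: semi-gradient DQN loss vs. convergent residual-gradient loss.
import torch

def dqn_loss(q_net, q_target, batch, gamma=0.99):
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # bootstrap target treated as a constant
        target = r + gamma * (1 - done) * q_target(s_next).max(dim=1).values
    return (q_sa - target).pow(2)

def residual_gradient_loss(q_net, batch, gamma=0.99):
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values  # not detached
    return (q_sa - target).pow(2)

# C-DQN, as described in the abstract, combines the stability of the second with the
# learning efficiency of the first; see the original work for the exact combination rule.
```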
We apply a meta reinforcement learning framework to optimize an integrated and adaptive guidance and flight control system for an air-launched missile, implementing the system as a deep recurrent neural network (the policy). The policy maps observations directly to commanded rates of change of the missile's control-surface deflections. The observations consist of a minimally processed, computationally stabilized line-of-sight unit vector measured by a strapdown radar seeker, estimated rotational velocity from rate-gyro measurements, and the control-surface deflection angles. The system induces intercept trajectories against maneuvering targets that satisfy control constraints on fin deflection angles as well as path constraints on look angle and load. We test the optimized system in a six-degree-of-freedom simulator that includes a nonlinear radome model and a strapdown seeker model. Through extensive simulations, we demonstrate that the system can adapt to a large flight envelope and to off-nominal flight conditions, including perturbations of the aerodynamic coefficient parameters and the center of pressure. Moreover, we find that the system is robust to parasitic attitude loops induced by radome refraction, imperfect seeker stabilization, and sensor scale-factor errors. Importantly, we compare the performance of our system to a longitudinal model of proportional navigation coupled with a three-loop autopilot, and find that our system outperforms this benchmark. Additional experiments investigate the effect of removing the recurrent layer from the policy and value-function networks, as well as performance with an infrared seeker.
Unmanned combat air vehicle (UCAV) combat is a challenging scenario with a continuous action space. In this paper, we propose a general hierarchical framework to resolve the within-vision-range (WVR) air-to-air combat problem under six-degree-of-freedom (6-DOF) dynamics. The core idea is to divide the whole decision process into two loops and use reinforcement learning (RL) to solve them separately. The outer loop takes into account the current combat situation and decides the expected macro behavior of the aircraft according to a combat strategy. Then the inner loop tracks the macro behavior with a flight controller by calculating the actual input signals for the aircraft. We design the Markov decision process for both the outer-loop strategy and the inner-loop controller, and train them with the proximal policy optimization (PPO) algorithm. For the inner-loop controller, we design an effective reward function to accurately track various macro behaviors. For the outer-loop strategy, we further adopt a fictitious self-play mechanism to improve combat performance by constantly competing against historical strategies. Experimental results show that the inner-loop controller achieves better tracking performance than a fine-tuned PID controller, and the outer-loop strategy can perform complex maneuvers to achieve an increasingly high winning rate as the generations evolve.
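The following schematic sketch illustrates the two-loop decomposition described above; the macro-behavior set, loop rates, and interfaces are assumptions for illustration, and both policies would be trained separately with PPO as stated in the abstract.

```python
# Schematic two-loop hierarchy: a low-rate strategy picks a macro behavior, and a
# high-rate controller computes the actual aircraft inputs that track it.
import numpy as np

MACRO_BEHAVIORS = ["level_flight", "climb", "left_turn", "right_turn", "dive"]  # assumed set

def outer_loop_policy(combat_situation):
    """Low-rate strategy: pick the expected macro behavior from the combat situation."""
    scores = np.random.rand(len(MACRO_BEHAVIORS))        # stand-in for the PPO strategy net
    return MACRO_BEHAVIORS[int(np.argmax(scores))]

def inner_loop_controller(aircraft_state, macro_behavior):
    """High-rate controller: compute the actual control inputs that track the macro behavior."""
    return np.zeros(4)                                    # stand-in for the PPO tracking policy

# One decision step of the hierarchy: the outer loop runs once, the inner loop runs N times.
situation, state = {}, np.zeros(12)
behavior = outer_loop_policy(situation)
for _ in range(10):
    controls = inner_loop_controller(state, behavior)     # [elevator, aileron, rudder, throttle] assumed
```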
Inverted landing in a rapid and robust manner is a challenging feat for aerial robots, especially when relying entirely on onboard sensing and computation. Nevertheless, this feat is routinely performed by biological fliers such as bats, flies, and bees. Our previous work has identified a direct causal connection between a series of onboard visual cues and kinematic actions that allows this challenging aerobatic maneuver to be performed reliably on small aerial robots. In this work, we first utilize deep reinforcement learning and a physics-based simulation to obtain a general, optimal control policy for robust inverted landing starting from any arbitrary approach condition. This optimized control policy provides a computationally efficient mapping from the system's observation space to its motor-command action space, including the triggering and control of the rotational maneuver. This was achieved by training the system over a large range of approach flight velocities that varied in both magnitude and direction. Next, we performed a sim-to-real transfer and experimental validation of the learned policy via domain randomization, by varying the robot's inertial parameters in simulation. Through experimental trials, we identified several dominant factors that greatly improved landing robustness, as well as the key mechanisms that determine inverted-landing success. We expect the learning framework developed in this study can be generalized to solve more challenging tasks, such as utilizing noisy onboard sensory data, landing on surfaces of various orientations, or landing on dynamically moving surfaces.
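A minimal sketch of the domain-randomization step mentioned above is given below: the robot's inertial parameters are resampled in simulation at the start of each training episode. The parameter names and ranges are illustrative assumptions, not values from the paper.

```python
# Minimal domain-randomization sketch: resample inertial parameters per episode so that
# the learned landing policy transfers to hardware.
import numpy as np

def sample_inertial_params(rng):
    return {
        "mass": rng.uniform(0.9, 1.1) * 0.035,                  # kg, nominal +/- 10% (assumed)
        "inertia_xyz": rng.uniform(0.8, 1.2, size=3) * 1e-5,    # kg*m^2 (assumed)
        "com_offset": rng.uniform(-0.002, 0.002, size=3),       # m (assumed)
    }

rng = np.random.default_rng(42)
for episode in range(3):
    params = sample_inertial_params(rng)
    # sim.reset(**params)  -> run the RL episode with the perturbed dynamics
    print(episode, params["mass"])
```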
In this work, we present a learning-based goal-tracking control approach for a soft robotic snake. Inspired by biological snakes, our controller consists of two key modules: a reinforcement learning (RL) module for learning goal-tracking behaviors, given the stochastic dynamics of the soft snake robot, and a central pattern generator (CPG) system with Matsuoka oscillators for generating stable and diverse locomotion patterns. Based on the proposed framework, we comprehensively discuss the maneuverability of the soft snake robot, including steering and speed control during its serpentine locomotion. This maneuverability can be mapped to the control of the oscillation patterns of the CPG system. Through theoretical analysis of the oscillation properties of the Matsuoka CPG system, this work shows that the key to achieving free mobility of our soft snake robot is to properly constrain and control certain coefficient ratios of the Matsuoka CPG system. Based on this analysis, we systematically formulate the controllable coefficients of the CPG system for the RL agent to operate on. With experimental validation, we show that the control policy learned in a simulated environment can be directly applied to control our real snake robot to perform goal-tracking tasks, regardless of the physical environment gap between simulation and the real world. The experimental results also show that our approach achieves significantly improved adaptability and robustness in the sim-to-real transition compared with our previous approach and a baseline RL method (PPO).
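For reference, a standard two-neuron Matsuoka oscillator takes the form below (an assumed textbook form, since the abstract does not state the paper's exact equations). The coefficient ratios referred to above would then be ratios among a, b, tau_r, and tau_a, which govern whether the pair settles into stable oscillation and at what frequency and amplitude.

```latex
% Standard two-neuron Matsuoka oscillator (assumed form; the paper may use a variant).
% x_i: neuron state, v_i: adaptation state, y_i: output, u: tonic input,
% a: mutual-inhibition weight, b: adaptation weight, tau_r, tau_a: time constants.
\[
\begin{aligned}
\tau_r \dot{x}_i &= -x_i - a\, y_j - b\, v_i + u, \qquad i, j \in \{1, 2\},\ j \neq i,\\
\tau_a \dot{v}_i &= -v_i + y_i,\\
y_i &= \max(x_i,\, 0).
\end{aligned}
\]
```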
State-of-the-art reinforcement learning is now able to learn versatile locomotion, balancing, and push-recovery capabilities for bipedal robots in simulation. However, the reality gap is mostly overlooked, and simulated results hardly transfer to real hardware. In practice, transfer is unsuccessful because the physics is over-simplified, hardware limitations are ignored, or regularity is not guaranteed and unexpected hazardous motions can occur. This paper presents a reinforcement learning framework capable of learning robust standing push recovery with smooth, out-of-the-box transfer to reality, requiring only instantaneous proprioceptive observations. By combining original termination conditions and policy smoothness regularization, we achieve stable learning, sim-to-real transfer, and safety using a policy without memory or observation history. Reward shaping is then used to provide insight into how to keep balance. We demonstrate its real-world performance on the lower-limb medical exoskeleton Atalante.
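The exact regularizer is not given in the abstract; a hedged sketch of one common form of policy-smoothness regularization, penalizing temporal action changes and sensitivity to small observation perturbations, is shown below. The weights and noise scale are illustrative assumptions.

```python
# Hedged sketch of a policy-smoothness regularizer of the kind mentioned above.
import torch

def smoothness_penalty(policy, obs, next_obs, noise_std=0.01,
                       w_temporal=1.0, w_spatial=1.0):
    a = policy(obs)
    a_next = policy(next_obs)
    a_perturbed = policy(obs + noise_std * torch.randn_like(obs))
    temporal = (a - a_next).pow(2).mean()        # consecutive actions should be close
    spatial = (a - a_perturbed).pow(2).mean()    # similar observations should give similar actions
    return w_temporal * temporal + w_spatial * spatial

# Added to the RL surrogate loss:
# total_loss = rl_loss + lambda_s * smoothness_penalty(policy, obs, next_obs)
```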
Deep reinforcement learning (RL) makes it possible to solve complex robotics problems using neural networks as function approximators. However, policies trained in a nominal environment suffer in terms of generalization when transferred from one environment to another. In this work, we use robust Markov decision processes (RMDPs) to train drone control policies, combining ideas from robust control and RL. The approach opts for pessimistic optimization to handle the potential gap in policy transfer from one environment to another. The trained control policy is tested on the task of quadrotor position control. The RL agent is trained in the MuJoCo simulator. During testing, different environment parameters (unseen during training) are used to validate the robustness of the trained policy to transfer from one environment to another. The robust policy outperforms standard agents in these environments, suggesting that the added robustness increases generality and allows adaptation to non-stationary environments. Code: https://github.com/adipandas/gym_multirotor
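An illustrative sketch of the pessimistic-optimization idea behind robust-MDP training is shown below: the policy is updated against its worst-case return over a set of perturbed environment parameters. The surrogate return, parameter set, and hill-climbing update are assumptions, not the linked repository's implementation.

```python
# Illustrative pessimistic (worst-case) optimization over perturbed environments.
import numpy as np

def evaluate_return(policy_params, env_params):
    """Stand-in for rolling out the quadrotor position-control task in MuJoCo."""
    return -np.sum((policy_params - env_params) ** 2)   # toy surrogate return

def pessimistic_objective(policy_params, env_param_set):
    return min(evaluate_return(policy_params, e) for e in env_param_set)

rng = np.random.default_rng(0)
env_param_set = [rng.normal(0.0, 0.3, size=4) for _ in range(8)]  # perturbed masses, gains, etc.
policy = np.zeros(4)
for _ in range(200):                                     # crude hill-climbing on the worst case
    candidate = policy + rng.normal(0.0, 0.05, size=4)
    if pessimistic_objective(candidate, env_param_set) > pessimistic_objective(policy, env_param_set):
        policy = candidate
```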
In the field of autonomous robots, reinforcement learning (RL) is an increasingly used method to solve the task of dynamic obstacle avoidance for mobile robots, autonomous ships, and drones. A common practice to train those agents is to use a training environment with random initialization of agent and obstacles. Such approaches might suffer from a low coverage of high-risk scenarios in training, leading to impaired final performance of obstacle avoidance. This paper proposes a general training environment where we gain control over the difficulty of the obstacle avoidance task by using short training episodes and assessing the difficulty by two metrics: The number of obstacles and a collision risk metric. We found that shifting the training towards a greater task difficulty can massively increase the final performance. A baseline agent, using a traditional training environment based on random initialization of agent and obstacles and longer training episodes, leads to a significantly weaker performance. To prove the generalizability of the proposed approach, we designed two realistic use cases: A mobile robot and a maritime ship under the threat of approaching obstacles. In both applications, the previous results can be confirmed, which emphasizes the general usability of the proposed approach, detached from a specific application context and independent of the agent's dynamics. We further added Gaussian noise to the sensor signals, resulting in only a marginal degradation of performance and thus indicating solid robustness of the trained agent.
A reduced order model of a generic submarine is presented. Computational fluid dynamics (CFD) results are used to create and validate a model that includes depth dependence and the effect of waves on the craft. The model and the procedure to obtain its coefficients are discussed, and examples of the data used to obtain the model coefficients are presented. An example of operation following a complex path is presented and results from the reduced order model are compared to those from an equivalent CFD calculation. The controller implemented to complete these maneuvers is also presented.
Modern high-performance combat aircraft exceed the conventional flight envelope by using thrust vectoring for maneuverability, thereby achieving supermaneuverability. With the continuing development of biomimetic unmanned aerial vehicles (UAVs), supermaneuverability through biomimetic mechanisms may become apparent. To date, this potential has not been well studied: biomimetic UAVs have not been shown to be capable of any of the classical supermaneuvers available to thrust-vectored aircraft. Here, we demonstrate this capability by showing how a biomimetic morphing-wing UAV can perform complex multiaxis nose-pointing-and-shooting (NPAS) maneuvers at low morphing complexity. Nonlinear flight-dynamic analysis is used to characterize the extent and stability of the multidimensional space of aircraft trim states that arises from biomimetic morphing. Navigating this trim space provides a model-based guidance strategy for generating open-loop NPAS maneuvers in simulation. Our results demonstrate the capability of biomimetic aircraft for air-combat-relevant supermaneuverability, and provide strategies for the exploration, characterization, and guidance of further forms of classical and non-classical supermaneuverability in such aircraft.
This paper presents a novel control law for accurate tracking of agile trajectories with a tailsitter flying-wing unmanned aerial vehicle (UAV) that transitions between vertical takeoff and landing (VTOL) and forward flight. The global control formulation enables operation throughout the entire flight envelope, including uncoordinated flight with sideslip. Differential flatness of the nonlinear tailsitter dynamics with a simplified aerodynamics model is shown. Using the flatness transform, the proposed controller incorporates tracking of the position reference together with its derivatives (velocity, acceleration, and jerk), as well as the yaw reference and yaw rate. Inclusion of the jerk and yaw-rate references through an angular-velocity feedforward term improves tracking of trajectories with rapidly changing accelerations. The controller does not depend on extensive aerodynamic modeling; instead, it uses incremental nonlinear dynamic inversion (INDI) to compute control updates based only on local input-output relations, resulting in robustness against discrepancies in the simplified aerodynamic equations. Exact inversion of the nonlinear input-output relation is achieved through the derived flatness transform. The resulting control algorithm is extensively evaluated in flight tests, where it demonstrates accurate trajectory tracking and challenging agile maneuvers, such as sideways flight and aggressive transitions while turning.
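For context, a generic incremental nonlinear dynamic inversion (INDI) update takes the standard form below; the paper's specific formulation, obtained through its flatness transform, may differ.

```latex
% Generic INDI update (standard form; the paper's exact formulation may differ).
% u_0 and \dot{y}_0 are the most recent input and the measured/estimated output derivative,
% G is the local control effectiveness, and \nu is the virtual control from the
% flatness-based reference.
\[
u = u_0 + G(x_0)^{-1}\left(\nu - \dot{y}_0\right)
\]
```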
In this thesis, we consider two simple but typical control problems and apply deep reinforcement learning to them, i.e., to cool and control a particle which is subject to continuous position measurement in a one-dimensional quadratic potential or in a quartic potential. We compare the performance of reinforcement learning control and conventional control strategies on the two problems, and show that reinforcement learning achieves a performance comparable to the optimal control for the quadratic case, and outperforms conventional control strategies for the quartic case, for which the optimal control strategy is unknown. To our knowledge, this is the first time deep reinforcement learning has been applied to quantum control problems in continuous real space. Our research demonstrates that deep reinforcement learning can effectively control a stochastic quantum system in real space as a measurement-feedback closed-loop controller. It also shows the ability of AI to discover new control strategies and properties of quantum systems that are not well understood, and that we can gain insights into these problems by learning from the AI, which opens up a new regime for scientific research.
As the autonomous driving industry develops, so does the potential for interaction among groups of autonomous vehicles. Combined with advances in artificial intelligence and simulation, such groups can be simulated, and safety-critical models controlling the cars within them can be learned. This research applies reinforcement learning to the problem of a multi-agent car park, where cars aim to park efficiently while remaining safe and rational. Utilizing robust tools and machine learning frameworks, we design and implement a flexible car-park environment in the form of a Markov decision process with independent learners, exploiting multi-agent communication. We implement a suite of tools for running experiments at scale, obtaining models that park up to 7 cars with a success rate of over 98.1%, surpassing existing single-agent models. We also obtain several results concerning the competitive and collaborative behaviours exhibited by the cars in our environment, under varying densities and levels of communication. Notably, we discover a form of collaboration without competition, and a form of 'leaky' collaboration whereby agents collaborate in the absence of sufficient state information. Such work has many potential applications in the autonomous driving and fleet management industries, and provides several useful techniques and benchmarks for applying reinforcement learning to multi-agent car parking.