We aim to build complex humanoid agents that integrate perception, motor control, and memory. In this work, we partly factor this problem into low-level motor control from proprioception and high-level coordination of the low-level skills informed by vision. We develop an architecture capable of surprisingly flexible, task-directed motor control of a relatively high-DoF humanoid body, achieved by combining pre-training of low-level motor controllers with a high-level, task-focused controller that switches among the low-level sub-policies. The resulting system is able to control a physically simulated humanoid body to solve tasks that require coupling visual perception from an unstable egocentric RGB camera with locomotion through the environment. For a supplementary video link, see https://youtu.be/7GISvfbykLE.
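A minimal sketch of the high-level/low-level split this abstract describes: a high-level controller (here a stub standing in for a vision-driven network) selects one of several pretrained low-level sub-policies at each control step. All networks, dimensions, and the random featurization are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_subpolicy(seed):
    """Stand-in for a pretrained proprioception-driven motor sub-policy."""
    W = np.random.default_rng(seed).normal(size=(8, 24))  # 24-D proprio -> 8-D torques
    return lambda proprio: np.tanh(W @ proprio)

subpolicies = [make_subpolicy(s) for s in range(4)]  # e.g. stand, walk, turn, reach

def high_level_controller(rgb_features):
    """Stub task-level controller: maps visual features to a sub-policy index."""
    scores = rgb_features @ rng.normal(size=(16, len(subpolicies)))
    return int(np.argmax(scores))

proprio = rng.normal(size=24)        # proprioceptive observation
rgb_features = rng.normal(size=16)   # features from the egocentric camera
k = high_level_controller(rgb_features)
torques = subpolicies[k](proprio)    # the chosen sub-policy drives the body
print(f"sub-policy {k} -> torques {torques.round(2)}")
```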
Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data, even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching tasks, and then smoothly interpolated, yielding a correspondingly smooth interpolation of reaching behavior. Leveraging these policy representations, we develop a new version of GAIL that (1) is much more robust than the purely-supervised controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped and a 62 DoF 3D humanoid in the MuJoCo physics environment.
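A minimal PyTorch sketch of the core idea, under assumed shapes: a VAE over demonstration trajectories whose latent z serves as a policy embedding, with a z-conditioned decoder acting as the policy; interpolating two embeddings then interpolates the decoded behavior. The networks are untrained stand-ins, not the paper's model.

```python
import torch
import torch.nn as nn

class TrajectoryVAE(nn.Module):
    def __init__(self, obs_dim=9, act_dim=9, horizon=50, z_dim=8):
        super().__init__()
        flat = horizon * obs_dim
        self.enc = nn.Sequential(nn.Linear(flat, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))
        # The decoder plays the role of a z-conditioned policy: (obs, z) -> action.
        self.dec = nn.Sequential(nn.Linear(obs_dim + z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))

    def encode(self, traj):                  # traj: (B, horizon, obs_dim)
        mu, logvar = self.enc(traj.flatten(1)).chunk(2, dim=-1)
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparameterization
        return z, mu, logvar

    def act(self, obs, z):                   # policy conditioned on the embedding
        return self.dec(torch.cat([obs, z], dim=-1))

vae = TrajectoryVAE()
demo_a, demo_b = torch.randn(1, 50, 9), torch.randn(1, 50, 9)
z_a, *_ = vae.encode(demo_a)
z_b, *_ = vae.encode(demo_b)
z_mid = 0.5 * (z_a + z_b)                    # smooth interpolation in skill space
action = vae.act(torch.randn(1, 9), z_mid)
```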
A longstanding goal in character animation is to combine the data-driven specification of behavior with a system that can execute similar behavior in a physical simulation, thus enabling realistic responses to perturbations and environmental variation. We show that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals. Our method handles keyframed motions, highly dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining an imitation objective with a task objective, we can train characters that react intelligently in interactive settings, e.g., by walking in a desired direction or throwing a ball at a user-specified target. This approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance with the flexibility and generality afforded by RL methods and physics-based animation. We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills. We demonstrate results using multiple characters (human, Atlas robot, bipedal dinosaur, dragon) and a large variety of skills, including locomotion, acrobatics, and martial arts.
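A minimal sketch of the combined imitation + task reward this abstract describes: the agent is rewarded for tracking a reference clip's pose while also pursuing a task goal. The specific error terms, scales, and weights below are illustrative assumptions, not the paper's exact reward.

```python
import numpy as np

def imitation_reward(q_sim, q_ref, com_sim, com_ref):
    pose_err = np.sum((q_sim - q_ref) ** 2)      # joint-angle tracking error
    com_err = np.sum((com_sim - com_ref) ** 2)   # centre-of-mass tracking error
    return 0.8 * np.exp(-2.0 * pose_err) + 0.2 * np.exp(-10.0 * com_err)

def task_reward(heading, target_dir):
    return max(0.0, float(heading @ target_dir))  # e.g. walk in a desired direction

def total_reward(q_sim, q_ref, com_sim, com_ref, heading, target_dir,
                 w_imit=0.7, w_task=0.3):
    return (w_imit * imitation_reward(q_sim, q_ref, com_sim, com_ref)
            + w_task * task_reward(heading, target_dir))

r = total_reward(np.zeros(30), np.zeros(30) + 0.05, np.zeros(3), np.zeros(3),
                 np.array([1.0, 0.0]), np.array([1.0, 0.0]))
print(round(r, 3))
```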
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.
The difficulty of developing control strategies has been a primary bottleneck in the adoption of physics-based simulations of human motion. We present a method for learning robust feedback strategies around given motion capture clips as well as the transition paths between clips. The output is a control graph that supports real-time physics-based simulation of multiple characters, each capable of a diverse range of robust movement skills, such as walking, running, sharp turns, cartwheels, spin-kicks, and flips. The control fragments that compose the control graph are developed using guided learning. This leverages the results of open-loop sampling-based reconstruction in order to produce state-action pairs that are then transformed into a linear feedback policy for each control fragment using linear regression. Our synthesis framework allows for the development of robust controllers with a minimal amount of prior knowledge.
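A sketch of the distillation step the abstract describes: state-action pairs recovered by open-loop sampling are fit to an affine feedback policy a = K s + k for one control fragment via least squares. The data below is synthetic stand-in data.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.normal(size=(200, 12))                 # states visited by the reconstruction
K_true = rng.normal(size=(12, 6))
A = S @ K_true + 0.3 + 0.01 * rng.normal(size=(200, 6))  # corresponding actions

# Augment states with a bias column so the fit recovers both gain K and offset k.
S1 = np.hstack([S, np.ones((len(S), 1))])
W, *_ = np.linalg.lstsq(S1, A, rcond=None)
K, k = W[:-1], W[-1]

def control_fragment(state):
    return state @ K + k                       # linear feedback policy for this fragment

print(np.allclose(control_fragment(S[0]), A[0], atol=0.1))
```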
Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose DIAYN ('Diversity is All You Need'), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information-theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective leads to the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. We show that pretrained skills can provide a good parameter initialization for downstream tasks, and can be composed hierarchically to solve complex, sparse-reward tasks. Our results suggest that unsupervised skill discovery can serve as an effective pretraining mechanism for overcoming the challenges of exploration and data efficiency in reinforcement learning.
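A minimal sketch of the DIAYN objective: a discriminator q(z|s) tries to infer the active skill from the visited state, and the skill-conditioned policy is trained with the pseudo-reward log q(z|s) - log p(z) (the policy-entropy bonus is handled by the underlying max-entropy RL algorithm). The discriminator below is an untrained stand-in with a placeholder featurization.

```python
import numpy as np

n_skills = 8
p_z = np.full(n_skills, 1.0 / n_skills)   # fixed uniform prior over skills

def discriminator(state):
    """Stand-in for a learned classifier q(z|s); returns skill probabilities."""
    logits = state[:n_skills]              # placeholder featurization
    e = np.exp(logits - logits.max())
    return e / e.sum()

def diayn_reward(state, z):
    q = discriminator(state)
    return np.log(q[z] + 1e-8) - np.log(p_z[z])

rng = np.random.default_rng(0)
z = rng.integers(n_skills)                 # sample a skill, hold it fixed all episode
print(diayn_reward(rng.normal(size=32), z))
```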
We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state to action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a structure in which to categorize LfD research. Specifically, we analyze and categorize the multiple ways in which examples are gathered, ranging from teleoperation to imitation, as well as the various techniques for policy derivation, including matching functions, dynamics models and plans. To conclude we discuss LfD limitations and related promising areas for future research.
We present a method for reinforcement learning of closely related skills that are parameterized via a skill embedding space. We learn such skills by taking advantage of latent variables and exploiting a connection between reinforcement learning and variational inference. The main contribution of our work is an entropy-regularized policy gradient formulation for hierarchical policies, and an associated, data-efficient and robust off-policy gradient algorithm based on stochastic value gradients. We demonstrate the effectiveness of our method on several simulated robotic manipulation tasks. We find that our method allows for discovery of multiple solutions and is capable of learning the minimum number of distinct skills that are necessary to solve a given set of tasks. In addition, our results indicate that the hereby proposed technique can interpolate and/or sequence previously learned skills in order to accomplish more complex tasks, even in the presence of sparse rewards.
The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, built on top of Generative Adversarial Imitation Learning, can not only imitate complex behaviors, but also learn interpretable and meaningful representations of complex behavioral data, including visual demonstrations. In the driving domain, we show that a model learned from human demonstrations is able to both accurately reproduce a variety of behaviors and accurately anticipate human actions using raw visual inputs. Compared with various baselines, our method can better capture the latent structure underlying expert demonstrations, often recovering semantically meaningful factors of variation in the data.
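A sketch of the latent-code idea in InfoGAIL-style objectives, under assumed conventions: alongside the usual GAIL discriminator reward, the policy receives a bonus for making its latent code c recoverable from behavior via a posterior network Q(c|s,a). The surrogate reward form, the weight lam, and both network outputs below are illustrative stand-ins.

```python
import numpy as np

def gail_reward(d_prob):
    """Surrogate reward from a discriminator output D(s,a) in (0,1)."""
    return -np.log(1.0 - d_prob + 1e-8)

def info_bonus(q_probs, c):
    """Mutual-information bonus: log Q(c|s,a) for the active latent code c."""
    return np.log(q_probs[c] + 1e-8)

d_prob = 0.6                          # stand-in discriminator output
q_probs = np.array([0.1, 0.7, 0.2])   # stand-in posterior over 3 latent codes
c = 1                                 # latent code sampled at episode start
lam = 0.1                             # assumed weight on the information term
r = gail_reward(d_prob) + lam * info_bonus(q_probs, c)
print(round(r, 3))
```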
Robots exhibit flexible behavior largely in proportion to their degree of knowledge about the world. Such knowledge is often meticulously hand-coded for a narrow class of tasks, limiting the scope of possible robot competencies. Thus, the primary limiting factor of robot capabilities is often not the physical attributes of the robot, but the limited time and skill of expert programmers. One way to deal with the vast number of situations and environments that robots face outside the laboratory is to provide users with simple methods for programming robots that do not require the skill of an expert. For this reason, learning from demonstration (LfD) has become a popular alternative to traditional robot programming methods, aiming to provide a natural mechanism for quickly teaching robots. By simply showing a robot how to perform a task, users can easily demonstrate new tasks as needed, without any special knowledge about the robot. Unfortunately, LfD often yields little knowledge about the world, and thus lacks robust generalization capabilities, especially for complex, multi-step tasks. We present a series of algorithms that draw from recent advances in Bayesian non-parametric statistics and control theory to automatically detect and leverage repeated structure at multiple levels of abstraction in demonstration data. The discovery of repeated structure provides critical insights into task invariants, features of importance, high-level task structure, and appropriate skills for the task. This culminates in the discovery of a finite-state representation of the task, composed of grounded skills that are flexible and reusable, providing robust generalization and transfer in complex, multi-step robotic tasks. These algorithms are tested and evaluated using a PR2 mobile manipulator, showing success on several complex real-world tasks, such as furniture assembly.
Adversarial learning methods have been proposed for a wide range of applications, but the training of adversarial models can be notoriously unstable. Effectively balancing the performance of the generator and discriminator is critical, since a discriminator that achieves very high accuracy will produce relatively uninformative gradients. In this work, we propose a simple and general technique to constrain information flow in the discriminator by means of an information bottleneck. By enforcing a constraint on the mutual information between the observations and the discriminator's internal representation, we can effectively modulate the discriminator's accuracy and maintain useful and informative gradients. We demonstrate that our proposed variational discriminator bottleneck (VDB) leads to significant improvements across three distinct application areas of adversarial learning algorithms. Our primary evaluation studies the applicability of the VDB to imitation learning of dynamic continuous control skills, such as running. We show that our method can learn such skills directly from raw video demonstrations, substantially outperforming prior adversarial imitation learning methods. The VDB can also be combined with adversarial inverse reinforcement learning to learn parsimonious reward functions that can be transferred and re-optimized in new settings. Finally, we demonstrate that the VDB can train GANs more effectively for image generation, improving upon a number of prior stabilization methods.
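A minimal PyTorch sketch of the variational discriminator bottleneck: the discriminator sees a stochastic encoding of its input, a KL term bounds the mutual information between input and encoding, and the Lagrange multiplier beta is adapted by dual gradient ascent. Sizes, constants, and the single-layer networks are illustrative assumptions.

```python
import torch
import torch.nn as nn

obs_dim, z_dim, Ic, beta, beta_lr = 32, 8, 0.5, 0.1, 1e-4
encoder = nn.Linear(obs_dim, 2 * z_dim)   # outputs (mu, logvar) of E(z|x)
head = nn.Linear(z_dim, 1)                # discriminator logit on the encoding

def vdb_loss(x_expert, x_agent):
    global beta
    x = torch.cat([x_expert, x_agent])
    labels = torch.cat([torch.ones(len(x_expert), 1), torch.zeros(len(x_agent), 1)])
    mu, logvar = encoder(x).chunk(2, dim=-1)
    z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)   # stochastic encoding
    bce = nn.functional.binary_cross_entropy_with_logits(head(z), labels)
    # KL(E(z|x) || N(0, I)), averaged over the batch, bounds the mutual information.
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
    beta = max(0.0, beta + beta_lr * (kl.item() - Ic))     # dual ascent on constraint
    return bce + beta * kl

loss = vdb_loss(torch.randn(16, obs_dim), torch.randn(16, obs_dim))
loss.backward()
```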
A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities.
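A sketch of the core computation this abstract describes, planning by gradient descent in a learned latent space: encode the current and goal observations, roll an action sequence through a latent forward model, and optimize the actions to bring the predicted latent close to the goal latent. The networks below are untrained stand-ins; in UPN they are trained end-to-end through this inner loop.

```python
import torch
import torch.nn as nn

z_dim, act_dim, horizon = 16, 4, 10
encoder = nn.Linear(64, z_dim)                 # image features -> latent state
dynamics = nn.Linear(z_dim + act_dim, z_dim)   # latent forward model

z0 = encoder(torch.randn(64)).detach()
z_goal = encoder(torch.randn(64)).detach()

actions = torch.zeros(horizon, act_dim, requires_grad=True)
opt = torch.optim.Adam([actions], lr=0.1)

for _ in range(100):                           # inner planning loop
    z = z0
    for t in range(horizon):
        z = dynamics(torch.cat([z, actions[t]]))
    loss = (z - z_goal).pow(2).sum()           # latent distance doubles as a goal metric
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))
```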
We explore learning-based approaches for feedback control of a dexterous five-finger hand performing non-prehensile manipulation. First, we learn local controllers that are able to perform the task starting at a predefined initial state. These controllers are constructed using trajectory optimization with respect to locally-linear time-varying models learned directly from sensor data. In some cases, we initialize the optimizer with human demonstrations collected via teleoperation in a virtual environment. We demonstrate that such controllers can perform the task robustly, both in simulation and on the physical platform, for a limited range of initial conditions around the trained starting state. We then consider two interpolation methods for generalizing to a wider range of initial conditions: deep learning, and nearest neighbors. We find that nearest neighbors achieve higher performance. Nevertheless, the neural network has its advantages: it uses only tactile and proprioceptive feedback but no visual feedback about the object (i.e. it performs the task blind) and learns a time-invariant policy. In contrast, the nearest neighbors method switches between time-varying local controllers based on the proximity of initial object states sensed via motion capture. While both generalization methods leave room for improvement, our work shows that (i) local trajectory-based controllers for complex non-prehensile manipulation tasks can be constructed from surprisingly small amounts of training data, and (ii) collections of such controllers can be interpolated to form more global controllers. Results are summarized in the supplementary video: https://youtu.be/E0wmO6deqjo
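A sketch of the nearest-neighbors generalization scheme described above: each local controller is tagged with the initial object state it was trained from, and at run time the controller whose training state is closest to the sensed initial state is selected. The controllers below are random linear stand-ins (the real ones are time-varying).

```python
import numpy as np

rng = np.random.default_rng(2)
n_controllers, state_dim, act_dim = 5, 6, 20

train_states = rng.normal(size=(n_controllers, state_dim))   # mocap-sensed object states
gains = rng.normal(size=(n_controllers, act_dim, state_dim)) # per-controller feedback gains

def select_controller(initial_state):
    dists = np.linalg.norm(train_states - initial_state, axis=1)
    return int(np.argmin(dists))

def act(initial_state, current_state):
    k = select_controller(initial_state)
    return gains[k] @ current_state              # run the chosen local controller

print(select_controller(train_states[3] + 0.05 * rng.normal(size=state_dim)))  # expected: 3
```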
Data-driven character animation based on motion capture can produce highly natural behaviors and, when combined with physics simulation, can provide natural procedural responses to physical perturbations, environmental changes, and morphological differences. Motion capture remains the most popular source of motion data, but collecting mocap data typically requires heavily instrumented environments and actors. In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV). Our approach, based on deep pose estimation and deep reinforcement learning, allows data-driven animation to leverage the abundance of publicly available video clips from the web, such as those from YouTube. This has the potential to enable fast and easy design of character controllers simply by querying for video recordings of the desired behavior. The resulting controllers are robust to perturbations, can be adapted to new settings, can perform basic object interactions, and can be retargeted to new morphologies via reinforcement learning. We further demonstrate that our approach can predict potential human motions from still images, by forward simulation of learned controllers initialized from the observed pose. Our framework is able to learn a broad range of dynamic skills, including locomotion, acrobatics, and martial arts.
Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible on a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt within a handful of trials (a dozen or so) and a few minutes? By analogy with the term 'big data', we refer to this challenge as 'micro-data reinforcement learning'. We show that a first strategy is to leverage prior knowledge about the policy structure (e.g., dynamic movement primitives), the policy parameters (e.g., from demonstrations), or the dynamics (e.g., from simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamics (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and the prior knowledge used. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing computation time.
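A minimal sketch of the second strategy above: a Gaussian-process surrogate of expected reward lets the optimizer query the model instead of the robot, spending only a handful of real trials. The "robot" here is a toy 1-D reward function standing in for an episode on hardware, and the policy has a single parameter.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def episode_reward(theta):                 # stand-in for one real robot trial
    return float(-(theta - 0.3) ** 2)

X = [[-1.0], [0.0], [1.0]]                 # a few seed trials
y = [episode_reward(x[0]) for x in X]
candidates = np.linspace(-1, 1, 201).reshape(-1, 1)

for _ in range(10):                        # about a dozen real trials in total
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    imp = mu - max(y)                      # expected-improvement acquisition
    zz = np.divide(imp, sigma, out=np.zeros_like(imp), where=sigma > 0)
    ei = imp * norm.cdf(zz) + sigma * norm.pdf(zz)
    theta = candidates[int(np.argmax(ei))]
    X.append(list(theta)); y.append(episode_reward(theta[0]))

print(X[int(np.argmax(y))], max(y))        # best policy parameter found
```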
Composable Controllers for Physics-Based Character Animation. An ambitious goal in the area of physics-based computer animation is the creation of virtual actors that autonomously synthesize realistic human motions and possess a broad repertoire of lifelike motor skills. To this end, the control of dynamic, anthropomorphic figures subject to gravity and contact forces remains a difficult open problem. We propose a framework for composing controllers in order to enhance the motor abilities of such figures. A key contribution of our composition framework is an explicit model of the "pre-conditions" under which motor controllers are expected to function properly. We demonstrate controller composition with preconditions determined not only manually, but also automatically based on Support Vector Machine (SVM) learning theory. We evaluate our composition framework using a family of controllers capable of synthesizing basic actions such as balance, protective stepping when balance is disturbed, protective arm reactions when falling, and multiple ways of standing up after a fall. We furthermore demonstrate these basic controllers working in conjunction with more dynamic motor skills within a two-dimensional and a three-dimensional prototype virtual stuntperson. Our composition framework promises to enable the community of physics-based animation practitioners to more easily exchange motor controllers and integrate them into dynamic characters.
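A sketch of learned "pre-conditions" as the abstract describes them: label past activations of a controller as success or failure according to the state it started from, then train an SVM that predicts whether the controller will function properly from a new state. The data and the toy ground-truth rule below are synthetic stand-ins.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
states = rng.normal(size=(300, 10))                    # starting states of past trials
succeeded = (states[:, 0] + 0.5 * states[:, 1] > 0)   # toy ground-truth pre-condition

precondition = SVC(kernel="rbf").fit(states, succeeded)

def can_activate(controller_svm, state):
    """Gate used when composing controllers: only activate if predicted to succeed."""
    return bool(controller_svm.predict(state.reshape(1, -1))[0])

print(can_activate(precondition, rng.normal(size=10)))
```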
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks, for which engineering a scripted controller would be laborious. In experiments, our reinforcement and imitation agent achieves significantly better performances than agents trained with reinforcement learning or imitation learning alone. We also illustrate that these policies, trained with large visual and dynamics variations, can achieve preliminary successes in zero-shot sim2real transfer. A brief visual description of this work can be viewed in this video.
Humans are experts at high-fidelity imitation, often reproducing a demonstration remarkably closely in a single attempt. Humans use this ability to quickly solve a task instance and to bootstrap the learning of new tasks. Achieving these capabilities in autonomous agents remains an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently. MetaMimic relies on the principle of storing all experience in a memory and replaying it to learn large-scale deep neural network policies via off-policy RL. To the best of our knowledge, this paper introduces the largest neural networks used in deep RL, and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on challenging manipulation tasks. The results also show that both types of policy can be learned from vision, despite the task rewards being sparse and without access to demonstrator actions.
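A sketch of the kind of high-fidelity imitation signal this setup implies: with no access to demonstrator actions, the agent is rewarded each step for keeping its observation close to the demonstration's observation at the same time index, and every transition goes into a replay memory for off-policy RL. The distance form and scale are illustrative assumptions, not the paper's exact reward.

```python
import numpy as np

def imitation_reward(obs, demo_obs, scale=1.0):
    return float(np.exp(-scale * np.sum((obs - demo_obs) ** 2)))

replay_memory = []                          # stores *all* experience, as in the text

def step(t, obs, action, next_obs, demo):
    r = imitation_reward(next_obs, demo[t + 1])
    replay_memory.append((obs, action, r, next_obs))
    return r

rng = np.random.default_rng(4)
demo = rng.normal(size=(100, 16))           # one demonstration: observations only
print(step(0, demo[0], rng.normal(size=6), demo[1] + 0.1, demo))
```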