Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures, termed RepDIB, that utilize variational and discrete information bottlenecks to learn structured factorized representations. Exploiting the expressiveness brought by factorized representations, we introduce a simple yet effective bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
translated by Google Translate
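To make the idea of a discrete, factorized bottleneck in the RepDIB abstract above more concrete, here is a minimal sketch of group-wise vector quantization in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `quantize_factorized`, the hand-written codebooks, and the choice of squared Euclidean distance are all hypothetical, and everything needed for training (learned codebooks, gradient estimators, any variational term) is omitted.

```python
def quantize_factorized(z, codebooks):
    """Quantize a continuous vector z group by group.

    z is split into len(codebooks) equal-sized chunks (the "factors"), and
    each chunk is replaced by the nearest entry of that group's codebook.
    Hypothetical helper for illustration only; not taken from RepDIB.
    """
    n_groups = len(codebooks)
    if len(z) % n_groups != 0:
        raise ValueError("len(z) must be divisible by the number of groups")
    chunk = len(z) // n_groups

    indices, quantized = [], []
    for g, codebook in enumerate(codebooks):
        part = z[g * chunk:(g + 1) * chunk]
        # Squared Euclidean distance from this chunk to every code in the group.
        dists = [sum((p - c) ** 2 for p, c in zip(part, code)) for code in codebook]
        # Index of the nearest code; ties resolve to the lowest index.
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        quantized.extend(codebook[k])
    return indices, quantized


if __name__ == "__main__":
    # Two groups, each with two 2-dimensional codes (made-up values).
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[-1.0, 0.0], [0.0, 2.0]],
    ]
    clean = [0.9, 0.8, 0.1, 1.7]
    perturbed = [1.1, 0.95, -0.2, 2.1]  # same content plus small nuisance noise
    print(quantize_factorized(clean, codebooks))
    print(quantize_factorized(perturbed, codebooks))
```

In this toy setup both inputs map to the same pair of code indices, which is the intuition behind a discrete bottleneck: small variations that do not change the nearest code are discarded. Factorizing into groups lets a few small codebooks express many combinations.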
The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations. For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert behavior can lead to poor results when given sub-optimal experts. We compare several common approaches for skill transfer on multiple domains including changes in task and system dynamics. We identify how existing methods can fail and introduce an alternative approach to mitigate these problems. Our approach learns to sequence existing temporally-extended skills for exploration but learns the final policy directly from the raw experience. This conceptual split enables rapid adaptation and thus efficient data collection, but without constraining the final solution. It significantly outperforms many classical methods across a suite of evaluation tasks, and we use a broad set of ablations to highlight the importance of different components of our method.
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
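Reinforcement learning (RL) can in principle let robots automatically adapt to new tasks, but current RL methods require a large number of trials to accomplish this. In this paper, we tackle rapid adaptation to new tasks through the framework of meta-learning, which uses past tasks to learn to adapt, with a specific focus on industrial insertion tasks. Fast adaptation is crucial because a large number of on-robot trials may damage hardware. Additionally, effective adaptation is feasible because experience from different insertion applications can be largely leveraged across them. In this setting, we address two specific challenges when applying meta-learning. First, conventional meta-RL algorithms require lengthy online meta-training. We show that this can be replaced with appropriately chosen offline data, resulting in an offline meta-RL method that only requires demonstrations and trials from each of the prior tasks, without the need to run costly meta-RL procedures online. Second, meta-RL methods can fail to generalize to new tasks that are too different from those seen at meta-training time, which poses a particular challenge in industrial applications, where high success rates are critical. We address this by combining contextual meta-learning with direct online fine-tuning: if the new task is similar to those seen in the prior data, the contextual meta-learner adapts immediately, and if it is too different, it adapts gradually through fine-tuning. We show that our approach is able to quickly adapt to a variety of different insertion tasks, with a success rate of 100%, using only a fraction of the samples needed to learn the tasks from scratch. Experiment videos and details are available online.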
We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network that attends to scene elements and processes them one at a time. Crucially, the model itself learns to choose the appropriate number of inference steps. We use this scheme to learn to perform inference in partially specified 2D models (variable-sized variational auto-encoders) and fully specified 3D models (probabilistic renderers). We show that such models learn to identify multiple objects (counting, locating and classifying the elements of a scene) without any supervision, e.g., decomposing 3D images with various numbers of objects in a single forward pass of a neural network at unprecedented speed. We further show that the networks produce accurate inferences when compared to supervised counterparts, and that their structure leads to improved generalization.