By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io
translated by 谷歌翻译
Identifying statistical regularities in solutions to some tasks in multi-task reinforcement learning can accelerate the learning of new tasks. Skill learning offers one way of identifying these regularities by decomposing pre-collected experiences into a sequence of skills. A popular approach to skill learning is maximizing the likelihood of the pre-collected experience with latent variable models, where the latent variables represent the skills. However, there are often many solutions that maximize the likelihood equally well, including degenerate solutions. To address this underspecification, we propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills. This penalty incentivizes the skills to maximally extract common structures from the experiences. Empirically, our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood. Further, while most prior works in the offline multi-task setting focus on tasks with low-dimensional observations, our objective can scale to challenging tasks with high-dimensional image observations.
translated by 谷歌翻译
A growing ecosystem of large, open-source foundation models has reduced the labeled data and technical expertise necessary to apply machine learning to many new problems. Yet foundation models pose a clear dual-use risk, indiscriminately reducing the costs of building both harmful and beneficial machine learning systems. To mitigate this risk, we propose the task blocking paradigm, in which foundation models are trained with an additional mechanism to impede adaptation to harmful tasks while retaining good performance on desired tasks. We call the resulting models self-destructing models, inspired by mechanisms that prevent adversaries from using tools for harmful purposes. We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning, showing that it can largely prevent a BERT-based model from learning to perform gender identification without harming the model's ability to perform profession classification. We conclude with a discussion of future directions.
translated by 谷歌翻译
Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.
translated by 谷歌翻译
利用许多离线机器人数据来源需要努力处理此类数据的异质性。在本文中,我们关注异质性的一个特定方面:从不同控制频率收集的离线数据学习。在整个实验室中,控制器的离散化,传感器的采样率以及对目标任务的需求可能会有所不同,从而导致聚合数据集中的频率混合在一起。我们研究离线增强学习(RL)算法如何在训练过程中使用频率混合的数据。我们观察到,$ Q $价值以不同的离散率以不同的速度传播,从而导致了离线RL的许多学习挑战。我们提出了一个简单而有效的解决方案,该解决方案可以在$ Q $值更新的速率上执行一致性,以稳定学习。通过缩放$ n $ n $ n $步骤的$ n $的价值,并具有离散化的大小,我们有效地平衡了$ q $ - 价值传播,从而导致更稳定的收敛性。在三个模拟的机器人控制问题上,我们从经验上发现,这种简单的方法的平均混合量超过50%。
translated by 谷歌翻译
即使是最大的神经网络也会出错,随着世界的变化,曾经纠正的预测可能变得无效。模型编辑器对基础模型(预训练)模型的行为进行本地更新,以注入更新的知识或纠正不良行为。现有的模型编辑已经显示出希望,但也没有足够的表现力:他们难以准确地对编辑的预期范围进行建模(受编辑影响的示例),从而导致与编辑相关的测试输入的预测不准确,并且经常在之后完全失败。许多编辑。作为一个较高容量的替代方案,我们建议使用检索型反面模型(SERAC)提出半参数编辑,该模型(SERAC)存储在明确的内存中,并学会对它们进行推理以根据需要调节基本模型的预测。为了实现对模型编辑器的更严格评估,我们介绍了三个具有挑战性的语言模型编辑问题,基于问题回答,事实检查和对话生成。我们发现,只有SERAC才能在所有三个问题上实现高性能,从而超过了现有的方法,可以通过大量利润进行模型编辑。代码,数据和其他项目信息将在https://sites.google.com/view/serac-editing上提供。
translated by 谷歌翻译
大型语言模型可以编码有关世界的大量语义知识。这种知识对于旨在采取自然语言表达的高级,时间扩展的指示的机器人可能非常有用。但是,语言模型的一个重大弱点是,它们缺乏现实世界的经验,这使得很难利用它们在给定的体现中进行决策。例如,要求语言模型描述如何清洁溢出物可能会导致合理的叙述,但是它可能不适用于需要在特定环境中执行此任务的特定代理商(例如机器人)。我们建议通过预处理的技能来提供现实世界的基础,这些技能用于限制模型以提出可行且在上下文上适当的自然语言动作。机器人可以充当语​​言模型的“手和眼睛”,而语言模型可以提供有关任务的高级语义知识。我们展示了如何将低级技能与大语言模型结合在一起,以便语言模型提供有关执行复杂和时间扩展说明的过程的高级知识,而与这些技能相关的价值功能则提供了连接必要的基础了解特定的物理环境。我们在许多现实世界的机器人任务上评估了我们的方法,我们表明了对现实世界接地的需求,并且这种方法能够在移动操纵器上完成长远,抽象的自然语言指令。该项目的网站和视频可以在https://say-can.github.io/上找到。
translated by 谷歌翻译
许多数据集被指定:给定任务存在多个同样可行的解决方案。对于学习单个假设的方法,指定的指定可能是有问题的,因为实现低训练损失的不同功能可以集中在不同的预测特征上,从而在分布数据的数据上产生明显变化的预测。我们提出了Divdis,这是一个简单的两阶段框架,首先通过利用测试分布中的未标记数据来学习多种假设,以实现任务。然后,我们通过使用其他标签的形式或检查功能可视化的形式选择最小的其他监督来选择一个发现的假设之一来消除歧义。我们证明了Divdis找到在图像分类中使用强大特征的假设和自然语言处理问题的能力。
translated by 谷歌翻译
离线增强学习(RL)可以从静态数据集中学习控制策略,但是像标准RL方法一样,它需要每个过渡的奖励注释。在许多情况下,将大型数据集标记为奖励可能会很高,尤其是如果人类标签必须提供这些奖励,同时收集多样的未标记数据可能相对便宜。我们如何在离线RL中最好地利用这种未标记的数据?一种自然解决方案是从标记的数据中学习奖励函数,并使用它标记未标记的数据。在本文中,我们发现,也许令人惊讶的是,一种简单得多的方法,它简单地将零奖励应用于未标记的数据可以导致理论和实践中的有效数据共享,而无需学习任何奖励模型。虽然这种方法起初可能看起来很奇怪(并且不正确),但我们提供了广泛的理论和经验分析,说明了它如何摆脱奖励偏见,样本复杂性和分配变化,通常会导致良好的结果。我们表征了这种简单策略有效的条件,并进一步表明,使用简单的重新加权方法扩展它可以进一步缓解通过使用不正确的奖励标签引入的偏见。我们的经验评估证实了模拟机器人运动,导航和操纵设置中的这些发现。
translated by 谷歌翻译
机器学习算法通常假设培训和测试示例是从相同的分布中汲取的。然而,分发转移是现实世界应用中的常见问题,并且可以在测试时间造成模型急剧执行。在本文中,我们特别考虑域移位和亚泊素班次的问题(例如,不平衡数据)。虽然先前的作品通常会寻求明确地将模型的内部表示和预测器进行明确,以成为域不变的,但我们旨在规范整个功能而不限制模型的内部表示。这导致了一种简单的基于混合技术,它通过名为LISA的选择性增强来学习不变函数。 Lisa选择性地用相同的标签而单独地插值样本,但不同的域或具有相同的域但不同的标签。我们分析了线性设置,从理论上展示了LISA如何导致较小的最差组错误。凭经验,我们研究了LISA对从亚本化转变到域移位的九个基准的有效性,我们发现LISA一直以其他最先进的方法表达。
translated by 谷歌翻译