Knowledge built up culturally across generations allows humans to learn far more than an individual could glean from a lifetime of their own experience. Cultural knowledge in turn relies on language: language is the richest record of what previous generations believed, valued, and practiced, and of how these evolved over time. Yet the power and mechanisms of language as a means of cultural learning are not well understood, and as a result, current AI systems do not leverage language as a medium for transmitting cultural knowledge. Here, we take a first step towards reverse-engineering cultural learning through language. We developed a suite of complex tasks in the form of minimalist-style video games, which we deployed in an iterated learning paradigm. Human participants were limited to only two attempts (two lives) to beat each game, and were allowed to write a message to a future participant, who read the message before playing. Knowledge accumulated gradually across generations, allowing later generations to advance further in the games and perform more efficient actions. Multigenerational learning followed a trajectory comparable to that of individuals learning alone with an unlimited number of lives. Successive generations of learners succeeded by expressing distinct types of knowledge in natural language: the dynamics of the environment, valuable goals, dangerous risks, and strategies for success. The video game paradigm we introduce here is a rich test bed for developing AI systems capable of acquiring and transmitting cultural knowledge.
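The generational protocol is simple enough to simulate. Below is a minimal Python sketch of the message-passing loop described above; play_game and the message representation are hypothetical stand-ins of ours, not the study's materials.

    import random

    def play_game(strategy_hints, lives=2):
        """Hypothetical stand-in: returns (score, lessons_learned).
        The more accumulated hints a player starts with, the further
        they tend to get before spending both lives."""
        skill = len(strategy_hints)
        score = skill + random.randint(0, 3)
        lessons = strategy_hints + [f"lesson-{score}"]
        return score, lessons

    def iterated_learning(num_generations=10):
        message = []  # natural-language hints, passed down the chain
        scores = []
        for gen in range(num_generations):
            # Each participant reads the predecessor's message before playing...
            score, lessons = play_game(message, lives=2)
            scores.append(score)
            # ...then writes a message for the next participant.
            message = lessons
        return scores

    print(iterated_learning())  # scores trend upward across generations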
Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
Languages are powerful solutions to coordination problems: they provide stable, shared expectations about how the words we say correspond to the beliefs and intentions in our heads. Yet language use in a variable and non-stationary social environment requires linguistic representations to be flexible: old words acquire new ad hoc or partner-specific meanings on the fly. In this paper, we introduce CHAI (Continual Hierarchical Adaptation through Inference), a hierarchical Bayesian theory of coordination and convention formation that aims to reconcile the long-standing tension between these two basic observations. We argue that the central computational problem of communication is not simply transmission, as in classical formulations, but continual learning and adaptation over multiple timescales. Partner-specific common ground quickly emerges from social inference within dyadic interactions, while community-wide social conventions are stable priors that have been abstracted away from interactions with multiple partners. We present new empirical data showing that our model provides a computational foundation for several phenomena that have challenged previous accounts: (1) the convergence to more efficient referring expressions across repeated interactions with the same partner, (2) the transfer of partner-specific common ground to strangers, and (3) the influence of communicative context on which conventions eventually form.
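The two timescales admit a compact hierarchical sketch (our own notation, not necessarily the paper's): a community-level convention $\Theta$ serves as a stable prior over partner-specific lexicons $\theta_k$, which are in turn updated by the utterances $u_{k,t}$ observed with each partner.

    \Theta \sim P(\Theta)                                   \text{(community-wide convention, a stable prior)}
    \theta_k \mid \Theta \sim P(\theta_k \mid \Theta)       \text{(lexicon adapted to partner } k\text{)}
    u_{k,t} \mid \theta_k \sim P(u_{k,t} \mid \theta_k)     \text{(utterance } t \text{ observed with partner } k\text{)}
    P(\Theta, \theta_{1:K} \mid u) \propto P(\Theta) \prod_{k=1}^{K} P(\theta_k \mid \Theta) \prod_{t} P(u_{k,t} \mid \theta_k)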
Interaction and cooperation with humans are overarching aspirations of artificial intelligence (AI) research. Recent studies demonstrate that AI agents trained with deep reinforcement learning are capable of collaborating with humans. These studies primarily evaluate human compatibility through "objective" metrics such as task performance, obscuring potential variation in the levels of trust and subjective preference that different agents garner. To better understand the factors shaping subjective preferences in human-agent cooperation, we train deep reinforcement learning agents in Coins, a two-player social dilemma. We recruit participants for a human-agent cooperation study and measure their impressions of the agents they encounter. Participants' perceptions of warmth and competence predict their stated preferences for different agents, above and beyond objective performance metrics. Drawing inspiration from social science and biology research, we subsequently implement a new "partner choice" framework to elicit revealed preferences: after playing an episode with an agent, participants are asked whether they would like to play the next round with the same agent or to play alone. As with stated preferences, social perception better predicts participants' revealed preferences than does objective performance. Given these results, we recommend human-agent interaction researchers routinely incorporate the measurement of social perception and subjective preferences into their studies.
Recent neural generation systems have demonstrated the potential to procedurally generate game content, images, stories, and more. However, most neural generation algorithms are "uncontrolled", in that the user has little say in creative decisions beyond the initial prompt specification. Co-creative, mixed-initiative systems require user-centric means of influencing the algorithm, especially when users are unlikely to have machine learning expertise. The key to co-creative systems is the ability to communicate ideas and intent from the user to the agent, and from the agent to the user. Key questions in co-creative AI include: How can users express their creative intentions? How can creative AI systems communicate their beliefs, explain their moves, or instruct users to act on their behalf? When should a creative AI system take the initiative? The answers to these questions, and more, will enable us to develop better co-creative systems that make humans more capable of expressing their creative intent. We introduce Creative-Wand, a customizable framework for investigating co-creative mixed-initiative generation. Creative-Wand enables plug-and-play injection of generative models and human-agent communication channels into a chat-based interface. It provides a number of dimensions along which AI generators and humans can communicate during the co-creative process. We illustrate the framework by using it to study one dimension of co-creative communication, global versus local specification of creative intent, in the context of storytelling.
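The plug-and-play design can be pictured as a small interface. The sketch below is hypothetical (class and attribute names are ours, not Creative-Wand's actual API) but captures the idea of injectable human-agent communication channels.

    from abc import ABC, abstractmethod

    class CommunicationChannel(ABC):
        """One dimension along which a human and an AI generator communicate,
        e.g. a global theme selector or a local 'rewrite this part' control."""

        @abstractmethod
        def describe(self) -> str:
            """Explain to the user what this channel controls."""

        @abstractmethod
        def apply(self, generator, user_input: str):
            """Translate the user's input into an adjustment of the generator."""

    class GlobalThemeChannel(CommunicationChannel):
        def describe(self) -> str:
            return "Set an overall theme for the whole story."

        def apply(self, generator, user_input: str):
            generator.prompt_prefix = f"Theme: {user_input}. "

    class ChatFrontend:
        """Chat-based interface into which channels and a generator are injected."""

        def __init__(self, generator, channels):
            self.generator = generator
            self.channels = {type(c).__name__: c for c in channels}

        def handle(self, channel_name: str, user_input: str):
            self.channels[channel_name].apply(self.generator, user_input)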
Progress in deep reinforcement learning (RL) is driven by the availability of challenging benchmarks for training agents. However, the benchmarks widely adopted by the community are not explicitly designed to evaluate specific capabilities of RL methods. While environments exist for assessing particular open problems in RL (such as exploration, transfer learning, unsupervised environment design, or even language-assisted RL), it is generally difficult to extend these to richer, more complex environments once research goes beyond proof-of-concept results. We present MiniHack, a powerful sandbox framework for easily designing novel RL environments. MiniHack is a one-stop shop for RL experiments, with environments ranging from small rooms to complex, procedurally generated worlds. By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack enables the design of custom RL testbeds that are fast and convenient to build. With this sandbox framework, novel environments can be designed easily, using either a human-readable description language or a simple Python interface. In addition to a variety of RL tasks and baselines, MiniHack can wrap existing RL benchmarks and provides ways to seamlessly add extra complexity.
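Once registered, MiniHack tasks behave like ordinary Gym environments. A minimal usage sketch (the environment id is from the standard registry; the step API shown is the original Gym interface used at MiniHack's release and may differ under newer gymnasium-based versions):

    import gym
    import minihack  # importing registers the MiniHack-* environments with Gym

    env = gym.make("MiniHack-River-v0")
    obs = env.reset()
    done = False
    while not done:
        # Random policy, just to exercise the environment loop.
        obs, reward, done, info = env.step(env.action_space.sample())
    env.close()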
Despite recent advances of AI research in many application-specific domains, we do not know how to build a human-level artificial intelligence (HLAI). We conjecture that learning from others' experience with the language is the essential characteristic that distinguishes human intelligence from the rest. Humans can update the action-value function with the verbal description as if they experience states, actions, and corresponding rewards sequences firsthand. In this paper, we present a classification of intelligence according to how individual agents learn and propose a definition and a test for HLAI. The main idea is that language acquisition without explicit rewards can be a sufficient test for HLAI.
Effective human-machine teaming requires the ability to communicate the goals of the team and the constraints under which an agent needs to operate. Providing the ability to specify a team's shared intent or operating criteria enables an AI agent to perform its primary function while still satisfying the specific wishes of the current team. While significant work has been done on instructing agents to perform tasks through language or demonstration, prior work lacks a focus on building agents that can operate within the parameters specified by a team. Worse yet, there is a dearth of research on enabling humans to provide their specifications through unstructured, naturalistic language. In this paper, we propose the use of goals and constraints as scaffolding for conditioning and evaluating autonomous agents. We contribute to this area by introducing a novel dataset, and an associated data-collection protocol, that maps language descriptions to goals and constraints corresponding to specific strategies developed by human participants for the board game Risk. Leveraging state-of-the-art language models and augmentation procedures, we develop a machine learning framework that can be used to identify goals and constraints from unstructured strategy descriptions. To validate our approach, we conduct a human-subjects study to establish a human baseline for our dataset. Our results show that our machine learning architecture is better able to interpret unstructured language descriptions into strategy specifications than human evaluators performing the same machine translation task (F(1, 272.53) = 17.025, p < 0.001).
Language models such as OpenAI's Generative Pre-trained Transformers (GPT-2/3) capture the long-range correlations needed to generate text in a variety of domains (such as language translation) and, more recently, in gameplay (chess, Go, and checkers). This study applies both the larger (GPT-3) and smaller (GPT-2) language models to explore the complex strategies of the game Othello (or Reversi). Given the game's rules of rapid reversals of fortune, the language models not only serve as candidate predictors of the next move based on previous game moves, but also avoid the sparse rewards of gameplay. The language models automatically capture or emulate championship-level strategies. Fine-tuned GPT-2 models generate Othello games ranging from 13% to 71% completion, while the larger GPT-3 model reaches 41% of a complete game. Like the earlier work on chess and Go, these language models offer a novel way to generate plausible game archives, particularly for comparing opening moves across larger samples than humanly possible. A main contribution of these models is a two-fold magnification of the previous record of archived games (spanning the 45 years from 1977 to 2022), thereby supplying the research community with more diverse and original strategies to sample with other reinforcement learning techniques.
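In outline, the setup is easy to reproduce with the Hugging Face transformers API. The sketch below is a hedged approximation in which games are serialized as space-separated moves; the exact move encoding and training configuration are our illustrative choices, not necessarily the paper's.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Othello games serialized as space-separated moves in algebraic notation.
    games = ["d3 c5 f6 f5 e6 e3", "c4 e3 f6 e6 f5 c5"]  # toy examples

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    model.train()
    for game in games:
        ids = tokenizer(game, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss  # next-token prediction over moves
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Generation: prompt with an opening and sample continuation moves.
    model.eval()
    prompt = tokenizer("d3 c5", return_tensors="pt").input_ids
    out = model.generate(prompt, max_new_tokens=20, do_sample=True)
    print(tokenizer.decode(out[0]))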
This paper presents a fully automated method of mechanic illumination for general video game level generation. Using the Constrained MAP-Elites algorithm and the GVG-AI framework, the system generates the simplest tile-based levels that contain specific sets of game mechanics and satisfy playability constraints. We apply this method to the mechanic space of four different GVG-AI games: Zelda, Solarfox, Plants, and RealPortals.
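The constrained MAP-Elites loop at the core of such a system looks roughly as follows (a generic sketch of the algorithm rather than the paper's GVG-AI code; mutate, mechanics_of, playable, and simplicity are caller-supplied placeholders):

    import random

    def constrained_map_elites(seed_levels, mutate, mechanics_of, playable,
                               simplicity, iterations=10_000):
        """Archive maps each mechanic combination to its simplest playable level."""
        archive = {}  # key: frozenset of mechanics -> (level, simplicity score)

        def try_insert(level):
            if not playable(level):  # feasibility constraint
                return
            cell = frozenset(mechanics_of(level))
            score = simplicity(level)  # higher = simpler
            if cell not in archive or score > archive[cell][1]:
                archive[cell] = (level, score)

        for level in seed_levels:
            try_insert(level)
        if not archive:
            raise ValueError("need at least one playable seed level")
        for _ in range(iterations):
            parent, _ = random.choice(list(archive.values()))
            try_insert(mutate(parent))
        return archive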
As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models. We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time. Furthermore, we conduct a detailed comparison study and analyze how a variety of variables (model size, decoding strategy, fine-tuning, prompt genre, etc.) affect human detection performance. Finally, we collect error annotations from our participants and use them to show that certain textual genres influence models to make different types of errors and that certain sentence-level features correlate highly with annotator selection. We release the RoFT dataset: a collection of over 21,000 human annotations paired with error classifications to encourage future work in human detection and evaluation of generated text.
Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.
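The gap-sentence objective can be approximated in a few lines. In the sketch below, sentence "importance" is scored by word overlap with the rest of the document, a crude stand-in of ours for the ROUGE-based selection the paper describes:

    def gap_sentence_mask(sentences, mask_ratio=0.3, mask_token="<MASK>"):
        """Select the most 'important' sentences, mask them in the input,
        and return (masked_document, target_summary)."""
        def importance(i):
            rest = set()
            for j, s in enumerate(sentences):
                if j != i:
                    rest.update(s.lower().split())
            words = sentences[i].lower().split()
            return sum(w in rest for w in words) / max(len(words), 1)

        k = max(1, int(len(sentences) * mask_ratio))
        picked = sorted(range(len(sentences)), key=importance, reverse=True)[:k]
        masked = [mask_token if i in picked else s for i, s in enumerate(sentences)]
        target = [sentences[i] for i in sorted(picked)]
        return " ".join(masked), " ".join(target)

    doc = ["Pegasus masks whole sentences.", "The model must regenerate them.",
           "This resembles an extractive summary.", "Cats are nice."]
    print(gap_sentence_mask(doc))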
Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly-studied extreme cases of perfectly-competitive (zero-sum) motivations and perfectly-cooperative (shared-reward) motivations, but does not stop with them. As in real-life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative and thus demand successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.
Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
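The core loop (selection by an upper-confidence rule, expansion, random rollout, and value backup) fits in a short sketch. The UCT variant below is generic and single-agent, with no opponent sign-flipping; game states are assumed to expose moves(), play(move), and a terminal result():

    import math
    import random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = [], 0, 0.0
            self.untried = list(state.moves())

    def uct_select(node, c=1.4):
        # Pick the child maximizing UCB1: exploitation plus exploration bonus.
        return max(node.children, key=lambda ch: ch.value / ch.visits
                   + c * math.sqrt(math.log(node.visits) / ch.visits))

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend while fully expanded.
            while not node.untried and node.children:
                node = uct_select(node)
            # 2. Expansion: add one unexplored child.
            if node.untried:
                move = node.untried.pop()
                node.children.append(Node(node.state.play(move), parent=node))
                node = node.children[-1]
            # 3. Simulation: random rollout to a terminal state.
            state = node.state
            while state.moves():
                state = state.play(random.choice(state.moves()))
            reward = state.result()
            # 4. Backpropagation: update statistics up to the root.
            while node:
                node.visits += 1
                node.value += reward
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).state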
DeepMind's Game Theory and Multi-Agent team researches several aspects of multi-agent learning, ranging from computing approximations to fundamental concepts in game theory, to simulating social dilemmas in rich spatial environments, and training 3-D humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments, and to use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.
To assist game developers in crafting game NPCs, we present EvolvingBehavior, a novel tool for genetic programming to evolve behavior trees in Unreal Engine 4. In an initial evaluation, we compare evolved behavior to hand-crafted trees designed by our researchers and to randomly grown trees in a 3D survival game. We find that, in this context, EvolvingBehavior is capable of producing behavior that approaches the designer's goals. Finally, we discuss implications and future avenues of exploration for co-creative game AI design tools, as well as the challenges and difficulties of behavior tree evolution.
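In outline, genetic programming over behavior trees amounts to growing, mutating, and reselecting trees. The sketch below is a generic illustration of that loop; the leaf and composite names and the fitness callback are hypothetical, and this is not EvolvingBehavior's implementation:

    import copy
    import random

    LEAVES = ["Wander", "Flee", "AttackPlayer", "EatFood"]
    COMPOSITES = ["Sequence", "Selector"]

    def random_tree(depth=3):
        if depth == 0 or random.random() < 0.3:
            return random.choice(LEAVES)
        return [random.choice(COMPOSITES),
                random_tree(depth - 1), random_tree(depth - 1)]

    def mutate(tree):
        """Replace a random subtree with a freshly grown one."""
        tree = copy.deepcopy(tree)
        if isinstance(tree, str) or random.random() < 0.2:
            return random_tree(2)
        idx = random.randrange(1, len(tree))
        tree[idx] = mutate(tree[idx])
        return tree

    def evolve(fitness, population_size=50, generations=100):
        pop = [random_tree() for _ in range(population_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[: population_size // 2]
            pop = survivors + [mutate(random.choice(survivors))
                               for _ in range(population_size - len(survivors))]
        return max(pop, key=fitness)  # best-of-run behavior tree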
Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research. However, evaluating these interactions is challenging: collecting online human-agent interactions is slow and expensive, yet faster proxy metrics often correlate poorly with interactive evaluation. In this paper, we assess the merits of these existing evaluation metrics and present a novel approach to evaluation called the Standardised Test Suite (STS). The STS uses behavioural scenarios mined from real human interaction data. Agents see a replayed scenario context, receive an instruction, and are then given control to complete the interaction offline. These agent continuations are recorded and sent to human annotators to mark as successful or failed, and agents are ranked according to the proportion of continuations in which they succeed. The resulting STS is fast, controlled, interpretable, and representative of naturalistic interactions. Altogether, the STS consolidates much of what is desirable across many of our standard evaluation metrics, allowing us to accelerate research progress towards producing agents that can interact naturally with humans. A video can be found at https://youtu.be/yr1tnggorgq.
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games, and computer use, publicly available data does not contain the labels required to train behavioral priors in the same way. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning, wherein agents learn to act by watching online unlabeled videos. Specifically, we show that with a small amount of labeled data we can train an inverse dynamics model accurate enough to label a huge unlabeled source of online data (here, online videos of people playing Minecraft), from which we can then train a general behavioral prior. Despite using the native human interface (mouse and keyboard at 20Hz), we show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans over 20 minutes (24,000 environment actions) of gameplay to accomplish.
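The pipeline trains an inverse dynamics model (IDM) on a small labeled set, pseudo-labels web-scale video with it, and then behavior-clones on the result. A schematic PyTorch sketch, with toy shapes and stand-in data of ours (the real VPT models are vastly larger):

    import torch
    import torch.nn as nn

    FRAME_DIM = 64 * 64   # toy flattened frame; real frames are images
    N_ACTIONS = 10        # toy action space; VPT's real space is mouse+keyboard

    # Inverse dynamics model: predicts the action between consecutive frames.
    idm = nn.Sequential(nn.Linear(2 * FRAME_DIM, 256), nn.ReLU(),
                        nn.Linear(256, N_ACTIONS))

    # Behavioral prior: predicts the next action from the current frame alone.
    policy = nn.Sequential(nn.Linear(FRAME_DIM, 256), nn.ReLU(),
                           nn.Linear(256, N_ACTIONS))

    def train_step(model, inputs, actions, opt):
        loss = nn.functional.cross_entropy(model(inputs), actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # 1. Train the IDM on the small labeled set (frame pairs -> actions).
    labeled_pairs = torch.randn(32, 2 * FRAME_DIM)        # stand-in data
    labeled_actions = torch.randint(0, N_ACTIONS, (32,))
    idm_opt = torch.optim.Adam(idm.parameters())
    train_step(idm, labeled_pairs, labeled_actions, idm_opt)

    # 2. Pseudo-label a large unlabeled video with the IDM.
    video = torch.randn(100, FRAME_DIM)                   # stand-in web video
    pairs = torch.cat([video[:-1], video[1:]], dim=1)
    with torch.no_grad():
        pseudo_actions = idm(pairs).argmax(dim=-1)

    # 3. Behavior-clone the policy on the pseudo-labeled frames.
    policy_opt = torch.optim.Adam(policy.parameters())
    train_step(policy, video[:-1], pseudo_actions, policy_opt)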
Workers spend a significant amount of time learning how to make good decisions. Evaluating the efficacy of a given decision, however, can be complicated: decision outcomes are often long-term, and relate to the original decision in complex ways. Surprisingly, even though learning good decision-making strategies is difficult, those strategies can often be expressed in simple and concise forms. Focusing on sequential decision-making, we design a novel machine learning algorithm that is capable of extracting "best practices" from trace data and conveying its insights to humans in the form of interpretable "tips". Our algorithm selects the tip that best bridges the gap between the actions taken by human workers and those taken by the optimal policy, in a way that accounts for which actions matter most for achieving higher performance. We evaluate our approach through a series of randomized controlled experiments in which participants manage a virtual kitchen. Our experiments show that the tips produced by our algorithm can significantly improve human performance relative to intuitive baselines. In addition, we discuss a number of empirical insights that can help inform the design of algorithms intended for human-facing interfaces. For instance, we find evidence that participants do not simply follow our tips blindly; instead, they combine them with their own experience to discover additional strategies for improving performance.
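One way to picture the tip-selection rule (a simplified rendering of ours, not the paper's algorithm): score each candidate tip by the total value workers lose whenever they deviate from the tip's recommendation, and emit the highest-scoring tip.

    def select_tip(candidate_tips, traces, q_star):
        """candidate_tips: dict tip_text -> (state predicate, recommended action)
        traces: list of (state, worker_action) pairs mined from worker logs
        q_star: dict (state, action) -> optimal long-run value (assumed given)"""
        def score(tip):
            predicate, recommended = candidate_tips[tip]
            gap = 0.0
            for state, worker_action in traces:
                if predicate(state) and worker_action != recommended:
                    # Value lost each time a worker deviates from the tip.
                    gap += q_star[(state, recommended)] - q_star[(state, worker_action)]
            return gap
        return max(candidate_tips, key=score)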