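In this paper we apply our understanding of the radical enactivist agenda to a classic AI-hard problem. Natural language understanding is a subfield of AI research that looked easy to the pioneers. Thus the Turing Test, in its original form, assumed that the computer could use language, and the challenge was to fake human intelligence. It turned out that playing chess and doing formal logic were easy compared with the necessary language skills. The techniques of good old-fashioned AI (GOFAI) assume that symbolic representation is the core of reasoning and that human communication consists of transferring representations from one mind to another. By this model, however, one finds that representations appear in another person's mind without appearing in the intermediary language. People, it seems, communicate by mind reading. Systems with speech interfaces, such as Alexa and Siri, are of course common, but they are limited. Rather than adding mind-reading skills, we introduced a "cheat" that enabled our systems to fake it. The cheat is simple, only slightly interesting to computer scientists, and not at all interesting to philosophers. However, on reading about the enactivist idea that we "directly perceive" the intentions of others, our cheat took on a new light, and in this paper we look again at how natural language understanding actually works between humans.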
translated by 谷歌翻译
There has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to make their algorithms more understandable. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that looking at how humans explain to each other can serve as a useful starting point for explanation in artificial intelligence. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers' intuition of what constitutes a 'good' explanation. There exist vast and valuable bodies of research in philosophy, psychology, and cognitive science on how people define, generate, select, evaluate, and present explanations, which argue that people employ certain cognitive biases and social expectations in the explanation process. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology that study these topics. It draws out some important findings and discusses ways that these can be infused with work on explainable artificial intelligence.
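Large language models (LLMs) have been transformative. They are pre-trained foundation models that can be adapted through fine-tuning to many different natural language tasks, each of which previously would have required a separate network model. This is one step closer to the extraordinary versatility of human language. GPT-3 and, more recently, LaMDA can carry on dialogs with humans on many topics after minimal priming with a few examples. However, there has been a wide range of reactions on whether these LLMs understand what they are saying or exhibit signs of intelligence. This high variance is exhibited in three interviews with LLMs that reach wildly different conclusions. A new possibility was uncovered that could explain this divergence. What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a reverse Turing test. If so, then by studying interviews we may be learning more about the intelligence and beliefs of the interviewer than about the intelligence of the LLMs.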
The success of the large neural language models on many NLP tasks is exciting. However, we find that these successes sometimes lead to hype in which these models are being described as "understanding" language or capturing "meaning". In this position paper, we argue that a system trained only on form has a priori no way to learn meaning. In keeping with the ACL 2020 theme of "Taking Stock of Where We've Been and Where We're Going", we argue that a clear understanding of the distinction between form and meaning will help guide the field towards better science around natural language understanding.
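Recent hype surrounding the increasing sophistication of language processing models has renewed optimism about machines achieving a human-like command of natural language. The area of natural language understanding in artificial intelligence claims to have been making great strides; however, the lack of conceptual clarity in how "understanding" is used in this and other disciplines has made it difficult to discern how close we actually are. A comprehensive, interdisciplinary overview of current approaches and remaining challenges is yet to be carried out. Beyond linguistic knowledge, this requires considering our species-specific capabilities to categorize, memorize, label, and communicate our (sufficiently similar) embodied and situated experiences. Moreover, gauging the practical constraints requires critically analyzing the technical capabilities of current models, as well as deeper philosophical reflection on theoretical possibilities and limitations. In this paper, I unite all of these perspectives (the philosophical, the cognitive-linguistic, and the technical) to unpack the challenges involved in reaching true (human-like) language understanding. By unpacking the theoretical assumptions inherent in current approaches, I hope to illustrate how far we actually are from achieving this goal, if indeed it is the goal.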
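The deployment of socially intelligent agents (SIAs) in learning environments has proven to have several advantages in different areas of application. Social agent authoring tools allow scenario designers to create tailored experiences with a high degree of control over SIA behaviour; on the flip side, this comes at a cost, as the complexity of the scenarios and of their authoring can become overbearing. In this paper we introduce the concept of explainable social agent authoring tools, with the goal of analysing whether authoring tools for social agents are understandable and interpretable. To this end, we examine whether an authoring tool, Fatima-Toolkit, is understandable and whether its authoring steps are interpretable from the author's point of view. We conducted two user studies to quantitatively assess the interpretability, comprehensibility, and transparency of Fatima-Toolkit from the perspective of a scenario designer. One of the key findings is that Fatima-Toolkit's conceptual model is, in general, understandable, but the emotion-based concepts were not as easily understood and used. Although there are some positive aspects regarding the explainability of Fatima-Toolkit, there is still progress to be made to achieve a fully explainable social agent authoring tool. We provide a set of key concepts and possible solutions that can guide developers in building such tools.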
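Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.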
Thanks to rapid progress in artificial intelligence, we have entered an era when technology and philosophy intersect in interesting ways. Sitting squarely at the centre of this intersection are large language models (LLMs). The more adept LLMs become at mimicking human language, the more vulnerable we become to anthropomorphism, to seeing the systems in which they are embedded as more human-like than they really are. This trend is amplified by the natural tendency to use philosophically loaded terms, such as "knows", "believes", and "thinks", when describing these systems. To mitigate this trend, this paper advocates the practice of repeatedly stepping back to remind ourselves of how LLMs, and the systems of which they form a part, actually work. The hope is that increased scientific precision will encourage more philosophical nuance in the discourse around artificial intelligence, both within the field and in the public sphere.
Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on how to turn it into one that can be productively studied empirically. We first present an experimental design centered on choosing tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance. These results are an encouraging sign that scalable oversight will be tractable to study with present models and bolster recent findings that large language models can productively assist humans with difficult tasks.
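In AI research, the attention paid so far to the characterization and representation of functionality and affordances has been sporadic and sparse, even though this aspect features prominently in the functioning of an intelligent system. In the sporadic and sparse efforts devoted so far to characterizing and understanding functionality and affordances, there has also been no general framework that could unify all the different use domains and situations related to the representation and application of functional concepts. This paper develops just such a general framework, with an approach that emphasizes that the representations involved must be explicitly cognitive and conceptual, must contain causal characterizations of the events and processes involved, and must employ conceptual constructs that are grounded in the referents to which they refer, in order to achieve maximal generality. The basic general framework is described, along with a set of basic guiding principles for the representation of functionality. To properly and adequately characterize and represent functionality, a descriptive representation language is needed. This language is defined and developed, and many examples of its use are described. The general framework is developed as an extension of the general language-meaning representational framework called conceptual dependency. To support the general characterization and representation of functionality, the basic conceptual dependency framework is enhanced with representational devices called structure anchor and conceptual dependency elaboration, together with the definition of a set of ground-level concepts. These novel representational constructs are defined, developed, and described. A general framework for dealing with functionality would represent a major step toward achieving artificial intelligence.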
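We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared with prompted language model baselines. We use reinforcement learning from human feedback to train our models, with additions that help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed. Finally, we conduct extensive analyses showing that, although our model learns to follow our rules, it can exhibit distributional biases.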
Recent developments in natural language generation (NLG) using neural language models have brought us closer than ever to the goal of building AI-powered creative writing tools. However, most prior work on human-AI collaboration in the creative writing domain has evaluated new systems with amateur writers, typically in contrived user studies of limited scope. In this work, we commissioned 13 professional, published writers from a diverse set of creative writing backgrounds to craft stories using Wordcraft, a text editor with built-in AI-powered writing assistance tools. Using interviews and participant journals, we discuss the potential of NLG to have significant impact in the creative writing domain--especially with respect to brainstorming, generation of story details, world-building, and research assistance. Experienced writers, more so than amateurs, typically have well-developed systems and methodologies for writing, as well as distinctive voices and target audiences. Our work highlights the challenges in building for these writers; NLG technologies struggle to preserve style and authorial voice, and they lack deep understanding of story contents. In order for AI-powered writing assistants to realize their full potential, it is essential that they take into account the diverse goals and expertise of human writers.
Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
We propose the Detailed Outline Control (DOC) framework for improving long-range plot coherence when automatically generating several-thousand-word-long stories. DOC consists of two complementary components: a detailed outliner and a detailed controller. The detailed outliner creates a more detailed, hierarchically structured outline, shifting creative burden from the main drafting procedure to the planning stage. The detailed controller ensures the more detailed outline is still respected during generation by controlling story passages to align with outline details. In human evaluations of automatically generated stories, DOC substantially outperforms a strong Re3 baseline (Yang et al., 2022) on plot coherence (22.5% absolute gain), outline relevance (28.2%), and interestingness (20.7%). Humans also judged DOC to be much more controllable in an interactive generation setting.
Curiosity for machine agents has been a focus of lively research activity. The study of human and animal curiosity, particularly specific curiosity, has unearthed several properties that would offer important benefits for machine learners, but that have not yet been well-explored in machine intelligence. In this work, we conduct a comprehensive, multidisciplinary survey of the field of animal and machine curiosity. As a principal contribution of this work, we use this survey as a foundation to introduce and define what we consider to be five of the most important properties of specific curiosity: 1) directedness towards inostensible referents, 2) cessation when satisfied, 3) voluntary exposure, 4) transience, and 5) coherent long-term learning. As a second main contribution of this work, we show how these properties may be implemented together in a proof-of-concept reinforcement learning agent: we demonstrate how the properties manifest in the behaviour of this agent in a simple non-episodic grid-world environment that includes curiosity-inducing locations and induced targets of curiosity. As we would hope, our example of a computational specific curiosity agent exhibits short-term directed behaviour while updating long-term preferences to adaptively seek out curiosity-inducing situations. This work, therefore, presents a landmark synthesis and translation of specific curiosity to the domain of machine learning and reinforcement learning and provides a novel view into how specific curiosity operates and in the future might be integrated into the behaviour of goal-seeking, decision-making computational agents in complex environments.
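In the context of digital therapy interventions, such as internet-delivered cognitive behavioral therapy (iCBT) for the treatment of depression and anxiety, extensive research has shown how the involvement of a human supporter or coach, who assists the person undergoing treatment, improves user engagement in therapy and leads to more effective health outcomes than unsupported interventions. Seeking to maximize the effects and outcomes of this human support, this research investigates how new opportunities provided by recent advances in AI and machine learning (ML) can help to effectively support the work practices of iCBT supporters. This paper reports detailed findings of an interview study with 15 iCBT supporters that deepens understanding of their existing work practices and information needs, with the aim of meaningfully informing the development of useful, implementable ML applications in the context of treatment for depression and anxiety. The analysis contributes (1) a set of six themes that summarize the strategies and challenges of iCBT supporters in providing effective, personalized feedback to their mental health clients; and, in response to these learnings, (2) concrete opportunities, for each theme, for how ML methods could help support and address the identified challenges and information needs. It closes with reflections on the potential social, emotional, and pragmatic implications of introducing new machine-generated data insights into supporter-led client review practices.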
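We recently started a project to develop more effective and efficient ways of marshalling inferences from background knowledge to facilitate deep natural language understanding. The meaning of a word is taken to be the entities, predications, presuppositions, and potential inferences that it adds to an ongoing situation. As words compose, the minimal model of the situation evolves to limit and direct inference. At this point we have developed our computational architecture and implemented it on real text. Our focus has been on proving the feasibility of our design.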
Intelligent agents have great potential as facilitators of group conversation among older adults. However, little is known about how to design agents for this purpose and user group, especially in terms of agent embodiment. To this end, we conducted a mixed methods study of older adults' reactions to voice and body in a group conversation facilitation agent. Two agent forms with the same underlying artificial intelligence (AI) and voice system were compared: a humanoid robot and a voice assistant. One preliminary study (total n=24) and one experimental study comparing voice and body morphologies (n=36) were conducted with older adults and an experienced human facilitator. Findings revealed that the artificiality of the agent, regardless of its form, was beneficial for the socially uncomfortable task of conversation facilitation. Even so, talkative personality types had a poorer experience with the "bodied" robot version. Design implications and supplementary reactions, especially to agent voice, are also discussed.
This volume contains revised versions of the papers selected for the third volume of the Online Handbook of Argumentation for AI (OHAAI). Previously, formal theories of argument and argument interaction have been proposed and studied, and this has led to the more recent study of computational models of argument. Argumentation, as a field within artificial intelligence (AI), is highly relevant for researchers interested in symbolic representations of knowledge and defeasible reasoning. The purpose of this handbook is to provide an open access and curated anthology for the argumentation research community. OHAAI is designed to serve as a research hub to keep track of the latest and upcoming PhD-driven research on the theory and application of argumentation in all areas related to AI.
Despite recent advances of AI research in many application-specific domains, we do not know how to build a human-level artificial intelligence (HLAI). We conjecture that learning from others' experience with the language is the essential characteristic that distinguishes human intelligence from the rest. Humans can update the action-value function with the verbal description as if they experience states, actions, and corresponding rewards sequences firsthand. In this paper, we present a classification of intelligence according to how individual agents learn and propose a definition and a test for HLAI. The main idea is that language acquisition without explicit rewards can be a sufficient test for HLAI.