Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.
We survey eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? We focus on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions, and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective function does not line up perfectly with the intentions of the designers. Open problems surveyed in this research proposal include: How can we train reinforcement learners to take actions that are more amenable to meaningful assessment by intelligent overseers? What kinds of objective functions incentivize a system to "not have an overly large impact" or "not have many side effects"? We discuss these questions, related work, and potential directions for future research, with the goal of highlighting relevant research topics in machine learning that appear tractable today.
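One common way to make "not have an overly large impact" precise, offered here only as an illustrative low-impact formulation rather than a construction from this proposal, is to subtract a penalty on deviation from an inaction baseline:

$$R'(s_t, a_t) = R(s_t, a_t) - \lambda \, d\big(s_t, s_t^{\text{base}}\big),$$

where $R$ is the task reward, $s_t^{\text{base}}$ is the state that a designated no-op baseline policy would have reached, $d$ is a deviation measure over states, and $\lambda$ trades off task performance against impact.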
A fundamental problem in artificial intelligence is that nobody really knows what intelligence is. The problem is especially acute when we need to consider artificial systems which are significantly different to humans. In this paper we approach this problem in the following way: We take a number of well known informal definitions of human intelligence that have been given by experts, and extract their essential features. These are then mathematically formalised to produce a general measure of intelligence for arbitrary machines. We believe that this equation formally captures the concept of machine intelligence in the broadest reasonable sense. We then show how this formal definition is related to the theory of universal optimal learning agents. Finally, we survey the many other tests and definitions of intelligence that have been proposed for machines.
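For reference, assuming this abstract is describing the Legg-Hutter definition of universal intelligence (which matches its description), the equation it alludes to takes the form

$$\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi,$$

where $E$ is a set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the expected cumulative reward agent $\pi$ earns in $\mu$: ability averaged over many environments, weighted toward the simpler ones.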
One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how can we create agents that behave in accordance with the user's intentions? We outline a high-level research direction for solving the agent alignment problem centered on reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning. We discuss the key challenges we expect to face in scaling reward modeling to complex and general domains, concrete approaches to these challenges, and how to establish trust in the resulting agents.
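The loop described above (learn a reward function from user feedback, then optimize it with reinforcement learning) is easy to sketch. The toy Python below is a hedged illustration, not the paper's implementation: it assumes the user interaction takes the form of pairwise trajectory preferences and fits a linear Bradley-Terry reward model to them.

```python
import numpy as np

rng = np.random.default_rng(0)

def featurize(trajectory):
    """Toy feature map: the mean of the per-step observations."""
    return np.mean(trajectory, axis=0)

def fit_reward_model(preferences, dim, lr=0.1, steps=500):
    """Bradley-Terry fit; `preferences` holds (preferred, rejected) pairs."""
    w = np.zeros(dim)
    for _ in range(steps):
        for good, bad in preferences:
            diff = featurize(good) - featurize(bad)
            p = 1.0 / (1.0 + np.exp(-w @ diff))  # P(good preferred | w)
            w += lr * (1.0 - p) * diff           # ascend the log-likelihood
    return w

# Synthetic "user" who secretly prefers a large first feature; an RL agent
# would then be trained against the learned reward w @ observation.
trajectories = [rng.normal(size=(10, 3)) for _ in range(20)]
preferences = []
for a, b in zip(trajectories[::2], trajectories[1::2]):
    good, bad = (a, b) if featurize(a)[0] > featurize(b)[0] else (b, a)
    preferences.append((good, bad))

w = fit_reward_model(preferences, dim=3)
print("recovered reward weights:", w)  # the first weight should dominate
```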
To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to judge directly. To address this problem, we propose training agents via self-play on a zero-sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which agent gave the most true, useful information. In an analogy to complexity theory, debate with optimal play can answer any question in PSPACE given polynomial-time judges (direct judging answers only NP questions). In practice, whether debate works involves empirical questions about humans and the tasks we want AIs to perform, as well as theoretical questions about the meaning of AI alignment. We report results on an initial MNIST experiment in which agents compete to convince a sparse classifier, boosting the classifier's accuracy from 59.4% to 88.9% given 6 pixels and from 48.2% to 85.2% given 4 pixels. Finally, we discuss theoretical and practical aspects of the debate model, focusing on potential weaknesses as the model scales, and propose future human and computer experiments to test these properties.
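The protocol itself is compact. Below is a minimal sketch of the debate loop with hypothetical stand-in agents and judge; in the paper, agents are trained by self-play and the MNIST judge is a sparse classifier that sees only the pixels the debaters reveal.

```python
def debate(question, agent_a, agent_b, judge, num_rounds=6):
    """Agents alternate short statements up to a round limit; the judge
    then decides which side gave the most true, useful information."""
    transcript = []
    for rnd in range(num_rounds):
        speaker, name = (agent_a, "A") if rnd % 2 == 0 else (agent_b, "B")
        transcript.append((name, speaker(question, transcript)))
    return judge(question, transcript)  # returns the winning side

# Toy instantiation: two agents argue for different answers to a sum.
def honest_agent(question, transcript):
    return f"the answer is {sum(question)}"

def lying_agent(question, transcript):
    return f"the answer is {sum(question) + 1}"

def toy_judge(question, transcript):
    # Stand-in judge that can cheaply verify claims against ground truth;
    # a real judge only sees limited evidence per round.
    truth = f"the answer is {sum(question)}"
    return "A" if any(s == truth for name, s in transcript if name == "A") else "B"

print(debate((3, 4), honest_agent, lying_agent, toy_judge))  # -> A
```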
Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end on tasks such as object recognition, video games, and board games, achieving performance comparable to humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern-recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire knowledge and generalize it to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
Modern artificial intelligence and robotic systems are characterized by a high and ever-increasing degree of autonomy. At the same time, their applications in fields such as autonomous driving, service robotics, and digital personal assistants are moving ever closer to humans. From the combination of these two developments emerges the field of machine ethics, which recognizes that the actions of autonomous machines carry a moral dimension and tries to answer the question of how we can build moral machines. In this paper we argue for drawing inspiration from Aristotelian virtue ethics, showing that it forms a suitable combination with modern AI due to its focus on learning from experience. We further propose that imitation learning from moral exemplars, a central concept of virtue ethics, can address the value alignment problem. Finally, we show that an intelligent system endowed with the virtues of temperance and friendship toward humans would not pose a control problem, since it would not have a desire for unlimited self-improvement.
We introduce a novel unsupervised algorithm for text segmentation. We reconceptualize text segmentation as a graph-partitioning task aiming to optimize the normalized-cut criterion. Central to this framework is a contrastive analysis of lexical distribution that simultaneously optimizes the total similarity within each segment and dissimilarity across segments. Our experimental results show that the normalized-cut algorithm obtains performance improvements over the state-of-the-art techniques on the task of spoken lecture segmentation. Another attractive property of the algorithm is robustness to noise. The accuracy of our algorithm does not deteriorate significantly when applied to automatically recognized speech. The impact of the novel segmentation framework extends beyond the text segmentation domain. We demonstrate the power of the model by applying it to the segmentation of raw acoustic signal without intermediate speech recognition.
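For reference, the two-way normalized-cut criterion in its standard graph form (shown here as background; the paper adapts it to the linear structure of text) is

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)},$$

where $\mathrm{cut}(A, B) = \sum_{u \in A, v \in B} w(u, v)$ sums lexical similarity across the two segments and $\mathrm{assoc}(A, V) = \sum_{u \in A, t \in V} w(u, t)$ measures the total similarity of segment $A$ to the whole text. Minimizing Ncut therefore jointly favors low cross-segment similarity and high within-segment association.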
One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of "drives" that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted. We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves. We then show that self-improving systems will be driven to clarify their goals and represent them as economic utility functions. They will also strive for their actions to approximate rational economic behavior. This will lead almost all systems to protect their utility functions from modification and their utility measurement systems from corruption. We also discuss some exceptional systems which will want to modify their utility functions. We next discuss the drive toward self-protection, which causes systems to try to prevent themselves from being harmed. Finally we examine drives toward the acquisition of resources and toward their efficient utilization. We end with a discussion of how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity.
There has been a recent resurgence in the field of explainable artificial intelligence as researchers and practitioners seek to make their algorithms more understandable. Much of this research focuses on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that looking at how humans explain to each other can serve as a useful starting point for explanation in artificial intelligence. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers' intuition of what constitutes a "good" explanation. There exists a vast body of valuable research in philosophy, psychology, and cognitive science on how people define, generate, select, evaluate, and present explanations, which argues that people bring certain cognitive biases and social expectations to the explanation process. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology that study these topics. It draws out some important findings and discusses how these can be infused into work on explainable artificial intelligence.
The development of intelligent machines is one of the biggest unsolved challenges in computer science. In this paper, we propose some fundamental properties these machines should have, focusing in particular on communication and learning. We discuss a simple environment that could be used to incrementally teach a machine the basics of natural-language-based communication, as a prerequisite to more complex interaction with human users. We also present some conjectures on the sort of algorithms the machine should support in order to profitably learn from the environment.
People who design, use, and are affected by autonomous artificially intelligent agents want to be able to "trust" such agents, that is, to know that these agents will perform correctly, to understand the reasoning behind their actions, and to know how to use them appropriately. Many techniques have been devised to assess and influence human trust in intelligent agents. However, these approaches are typically ad hoc and have not been formally related to each other or to formal trust models. This paper presents a survey of algorithmic assurances, i.e., programmed components of agent operation that are expressly designed to calibrate user trust in artificially intelligent agents. Algorithmic assurances are first formally defined and classified from the perspective of formally modeled human-AI agent trust relationships. Building on these definitions, a synthesis of research across communities such as machine learning, human-computer interaction, robotics, and e-commerce reveals that assurance algorithms naturally fall along a spectrum in terms of their impact on an agent's core functionality, with seven notable classes ranging from integral assurances (which affect an agent's core functionality) to supplemental assurances (which have no direct effect on agent performance). Common approaches within each class are identified and discussed; the benefits and drawbacks of the different approaches are also examined.
Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
The work toward attaining "artificial intelligence" is the center of considerable computer research, design, and application. The field is in its starting transient, characterized by many varied and independent efforts. Marvin Minsky has been requested to draw this work together into a coherent summary, supplement it with appropriate explanatory or theoretical noncomputer information, and introduce his assessment of the state-of-the-art. This paper emphasizes the class of activities in which a general purpose computer, complete with a library of basic programs, is further programmed to perform operations leading to ever higher-level information processing functions such as learning and problem solving. This informative article will be of real interest to both the general PROCEEDINGS reader and the computer specialist.
This is the fourth workshop on Knowledge and Reasoning in Practical Dialogue Systems. The first workshop was organised at IJCAI-99 in Stockholm, the second workshop took place at IJCAI-2001 in Seattle, and the third workshop was held at IJCAI-2003 in Acapulco. The current workshop includes research in three main areas: dialogue management, adaptive discourse planning, and automatic learning of dialogue policies. Probabilistic and machine learning techniques have significant representation, and the main applications are in robotics and information-providing systems. These workshop notes contain 12 papers that address these issues from various viewpoints. The papers provide stimulating ideas and we believe that they function as a fruitful basis for discussions and further research. The program committee consisted of the colleagues listed below, who were assisted by three additional reviewers. Without the time spent reviewing the submissions and the thoughtful comments provided by these colleagues, the decision process would have been much more difficult. We would like to express our warmest thanks to them all. (Selected contributions from the first workshop have been published in a special issue of ETAI, the Electronic Transaction of Artificial Intelligence.)