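We consider the problem of combining machine learning models to perform higher-level cognitive tasks with clear specifications. We propose the novel problem of Visual Discrimination Puzzles (VDP), which requires finding interpretable discriminators that classify images according to a logical specification. Humans can solve these puzzles with ease, and they give robust, verifiable, and interpretable discriminators as answers. We propose a compositional neurosymbolic framework that combines a neural network that detects objects and relationships with a symbolic learner that finds interpretable discriminators. We create large classes of VDP datasets involving natural images and show that our neurosymbolic framework performs favorably compared to several purely neural approaches.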
translated by 谷歌翻译
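Inductive logic programming (ILP) is a form of machine learning. The goal of ILP is to induce a hypothesis (a set of logical rules) that generalises training examples. As ILP turns 30, we provide a new introduction to the field. We introduce the necessary logical notation and the main learning settings; describe the building blocks of an ILP system; compare several systems on several dimensions; describe four systems (Aleph, TILDE, ASPAL, and Metagol); highlight key application areas; and, finally, summarise current limitations and directions for future research.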
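To make the ILP learning setting in the abstract above concrete, here is a minimal Python sketch of the test an ILP system applies to a candidate hypothesis: it should cover the positive examples and none of the negative ones, given background knowledge. The family facts, the `grandparent` rule, and the helper `consistent` are hypothetical illustrations, not taken from Aleph, TILDE, ASPAL, or Metagol, and the sketch only checks a hand-written rule rather than searching for one.

```python
# Background knowledge: parent(X, Y) facts (hypothetical toy data).
parent = {("ann", "bob"), ("bob", "cat"), ("bob", "dan")}

# Training examples for the target relation grandparent(X, Z).
positives = {("ann", "cat"), ("ann", "dan")}
negatives = {("bob", "ann"), ("ann", "bob")}


def grandparent(x, z):
    """Candidate rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z)."""
    people = {person for pair in parent for person in pair}
    return any((x, y) in parent and (y, z) in parent for y in people)


def consistent(hypothesis, pos, neg):
    """True if the hypothesis covers every positive and no negative example."""
    covers_all_pos = all(hypothesis(*example) for example in pos)
    covers_no_neg = not any(hypothesis(*example) for example in neg)
    return covers_all_pos and covers_no_neg


if __name__ == "__main__":
    print(consistent(grandparent, positives, negatives))
```

A real ILP system would enumerate or construct many such candidate rules and keep those that pass this kind of check; the search strategy is what distinguishes the systems the abstract lists.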
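Despite recent advances in modern machine learning algorithms, the opaqueness of their underlying mechanisms continues to be an obstacle to adoption. To instill confidence and trust in artificial intelligence systems, explainable artificial intelligence has emerged as a response to improve the explainability of modern machine learning algorithms. Inductive logic programming (ILP), a subfield of symbolic artificial intelligence, plays a promising role in generating interpretable explanations because of its intuitive, logic-driven framework. ILP effectively leverages abductive reasoning to generate explainable first-order clausal theories from examples and background knowledge. However, several challenges in developing ILP-inspired methods need to be addressed before they can be applied successfully in practice. For example, existing ILP systems often have a vast solution space, and the induced solutions are very sensitive to noise and disturbances. This survey summarizes recent advances in ILP and discusses statistical relational learning and neural-symbolic algorithms, which offer synergistic views to ILP. Following a critical review of the recent advances, we delineate the observed challenges and highlight potential avenues for further ILP-motivated research toward developing self-explanatory artificial intelligence systems.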
Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years. As NeSy shows promise of reconciling the advantages of reasoning and interpretability of symbolic representation and robust learning in neural networks, it may serve as a catalyst for the next generation of AI. In the present paper, we provide a systematic overview of the important and recent developments of research on NeSy AI. First, we introduce the history of this area, covering early work and foundations. We further discuss background concepts and identify key driving factors behind the development of NeSy. Afterward, we categorize recent landmark approaches along several main characteristics that underlie this research paradigm, including neural-symbolic integration, knowledge representation, knowledge embedding, and functionality. Then, we briefly discuss the successful application of modern NeSy approaches in several domains. Finally, we identify the open problems together with potential future research directions. This survey is expected to help new researchers enter this rapidly developing field and accelerate progress towards data- and knowledge-driven AI.
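Advocates for Neuro-Symbolic Artificial Intelligence (NeSy) assert that combining deep learning with symbolic reasoning will lead to stronger AI than either paradigm on its own. As successful as deep learning has been, it is generally accepted that even our best deep learning systems are not very good at abstract reasoning. And since reasoning is inextricably linked to language, it makes intuitive sense that Natural Language Processing (NLP) would be a particularly well-suited candidate for NeSy. We conduct a structured review of studies implementing NeSy for NLP, with the aim of answering the question of whether NeSy is indeed meeting its promises: reasoning, out-of-distribution generalization, interpretability, learning and reasoning from small data, and transferability to new domains. We examine the impact of knowledge representation, such as rules and semantic networks, language structure and relational structure, and whether implicit or explicit reasoning contributes to higher promise scores. We find that systems in which logic is compiled into the neural network satisfy the most NeSy goals, while other factors, such as knowledge representation or type of neural architecture, do not exhibit a clear correlation with goals being met. We find many discrepancies in how reasoning is defined, specifically in relation to human-level reasoning, which affect decisions about model architectures and drive conclusions that are not always consistent across studies. Hence we advocate for a more methodical approach to the application of theories of human reasoning, as well as the development of appropriate benchmarks, which we hope can lead to a better understanding of progress in the field. We make our data and code available on GitHub for further analysis.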
Visual question answering is fundamentally compositional in nature: a question like "where is the dog?" shares substructure with questions like "what color is the dog?" and "where is the cat?" This paper seeks to simultaneously exploit the representational capacity of deep networks and the compositional linguistic structure of questions. We describe a procedure for constructing and learning neural module networks, which compose collections of jointly-trained neural "modules" into deep networks for question answering. Our approach decomposes questions into their linguistic substructures, and uses these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.). The resulting compound networks are jointly trained. We evaluate our approach on two challenging datasets for visual question answering, achieving state-of-the-art results on both the VQA natural image dataset and a new dataset of complex questions about abstract shapes.
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent years. This achievement can be ascribed in part to advances in AI subfields including Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). Deep learning, a subfield of machine learning that employs artificial neural network concepts, has enabled the most rapid growth in these domains. As a result, the integration of vision and language has attracted a lot of attention. The tasks have been created in such a way that they properly exemplify the concepts of deep learning. In this review paper, we provide a thorough and extensive review of state-of-the-art approaches and key model design principles, and discuss existing datasets, methods, problem formulations, and evaluation measures for VQA and visual reasoning tasks in order to understand vision and language representation learning. We also present some potential future paths in this field of research, with the hope that our study may generate new ideas and novel approaches to handle existing difficulties and develop new applications.
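In recent years, neural networks have been expanding rapidly with novel strategies and applications. However, a number of challenges in neural network technologies remain unsolved, even though they will inevitably have to be addressed for critical applications. Attempts have been made to overcome the challenges in neural network computing by representing and embedding domain knowledge in terms of symbolic representations. Thus, the notion of neuro-symbolic learning (NeSyL) emerged, which incorporates aspects of symbolic representation and brings common sense into neural networks. In domains where interpretability, reasoning, and explainability are crucial, such as video and image captioning, question answering and reasoning, health informatics, and genomics, NeSyL has shown promising outcomes. This review presents a comprehensive survey of state-of-the-art NeSyL approaches, their principles, advances in machine and deep learning algorithms, applications such as ophthalmology, and, most importantly, future perspectives of this emerging field.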
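Artificial Intelligence agents are required to learn from their surroundings and to reason about the knowledge that has been learned in order to make decisions. While state-of-the-art learning from data typically uses sub-symbolic distributed representations, reasoning is normally useful at a higher level of abstraction, with the use of a first-order logic language for knowledge representation. As a result, attempts at combining symbolic AI and neural computation into neural-symbolic systems have been on the increase. In this paper, we present Logic Tensor Networks (LTN), a neurosymbolic formalism and computational model that supports learning and reasoning through the introduction of a many-valued, end-to-end differentiable first-order logic called Real Logic as a representation language for deep learning. We show that LTN provides a uniform language for the specification and the computation of several AI tasks such as data clustering, multi-label classification, relational learning, query answering, semi-supervised learning, regression, and embedding learning. We implement and illustrate each of the above tasks with a number of simple explanatory examples using TensorFlow 2. Keywords: Neurosymbolic AI, Deep Learning and Reasoning, Many-valued Logic.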
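As a rough illustration of what a many-valued, differentiable first-order logic can look like, the following plain-Python sketch evaluates a quantified formula over truth values in [0, 1]. It is not the LTN library's API: the choice of the product t-norm, the Reichenbach implication, a mean aggregator for "forall", and the toy predicates `smokes` and `cancer` are all assumptions made for illustration.

```python
# Fuzzy connectives on truth values in [0, 1] (assumed operator choices).
def f_and(a, b):
    return a * b                  # product t-norm


def f_or(a, b):
    return a + b - a * b          # probabilistic sum (dual t-conorm)


def f_not(a):
    return 1.0 - a                # standard negation


def f_implies(a, b):
    return 1.0 - a + a * b        # Reichenbach implication


def forall(truth_values):
    """Aggregate a universal quantifier as the mean truth value (assumption)."""
    values = list(truth_values)
    return sum(values) / len(values)


def exists(truth_values):
    """Aggregate an existential quantifier as the maximum truth value."""
    return max(truth_values)


# Hypothetical groundings: degrees to which each individual satisfies a predicate.
smokes = {"a": 0.9, "b": 0.2}
cancer = {"a": 0.8, "b": 0.1}

# Truth degree of: forall x. smokes(x) -> cancer(x)
rule_truth = forall(f_implies(smokes[x], cancer[x]) for x in smokes)

if __name__ == "__main__":
    print(rule_truth)
```

Because every operator is a smooth arithmetic expression, the truth degree of a formula can in principle be maximised by gradient-based training when the predicate values come from neural networks, which is the general idea the abstract describes.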
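Visual question answering (VQA) has been gaining a lot of traction in the machine learning community in recent years, owing to the challenges posed in understanding information coming from multiple modalities (i.e., images and language). In VQA, a series of questions is posed based on a set of images, and the task at hand is to arrive at the answer. To achieve this, we take a symbolic-reasoning-based approach using the framework of formal logic. The image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework in which (i) images are converted to logical background facts with the help of scene graphs, (ii) the questions are translated to first-order predicate logic clauses using a transformer-based deep learning model, and (iii) satisfiability checks are performed, using the background knowledge and the grounding of predicate clauses, to obtain the answer. Our proposed method is highly interpretable, and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and GQA datasets. We achieve near-perfect accuracy of 99.6% on the CLEVR dataset, comparable to state-of-the-art models, showing that formal logic is a viable tool for tackling visual question answering. Our model is also data efficient, achieving 99.1% accuracy on the CLEVR dataset when trained on just 10% of the training data.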
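The three-step pipeline in the abstract above can be sketched in miniature as follows. This is only an illustrative toy, not the authors' implementation: the scene-graph facts are hand-written rather than produced by a detector, the question is hand-translated into a logical query rather than by a transformer, and a brute-force search over groundings stands in for the satisfiability check. All object identifiers and predicate names are hypothetical.

```python
# Step (i): background facts, as they might be read off a scene graph.
facts = {
    ("object", "o1"), ("shape", "o1", "cube"), ("color", "o1", "red"),
    ("object", "o2"), ("shape", "o2", "sphere"), ("color", "o2", "blue"),
    ("object", "o3"), ("shape", "o3", "cube"), ("color", "o3", "green"),
    ("left_of", "o1", "o2"),
    ("left_of", "o2", "o3"),
}

objects = sorted(fact[1] for fact in facts if fact[0] == "object")


def holds(*atom):
    """True if the ground atom is among the background facts."""
    return atom in facts


# Step (ii): the question "What color is the cube left of the sphere?"
# hand-translated to: exists X, Y. cube(X) & sphere(Y) & left_of(X, Y) & color(X, C)
def color_of_cube_left_of_sphere():
    # Step (iii): try every grounding of X and Y against the facts.
    for x in objects:
        for y in objects:
            if (holds("shape", x, "cube")
                    and holds("shape", y, "sphere")
                    and holds("left_of", x, y)):
                for fact in facts:
                    if fact[0] == "color" and fact[1] == x:
                        return fact[2]
    return None  # the query has no satisfying grounding


if __name__ == "__main__":
    print(color_of_cube_left_of_sphere())
```

Each intermediate artifact here (the fact set, the query, the satisfying grounding) can be inspected directly, which is the kind of step-by-step interpretability the abstract claims for the approach.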
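We consider the problem of learning the semantics of composite algebraic expressions from examples. The outcome is a versatile framework for studying learning tasks that can be put into the following abstract form: the input is a partial algebra $\alg$ and a finite set of examples $(\varphi_1, o_1), (\varphi_2, o_2), \ldots$, each consisting of an algebraic term $\varphi_i$ and a set of objects~$o_i$. The objective is to simultaneously fill in the missing algebraic operations in $\alg$ and ground the variables of every $\varphi_i$ in $o_i$, so that the combined value of the terms is optimised. We demonstrate the applicability of this framework through case studies in grammatical inference, picture-language learning, and the grounding of logical scene descriptions.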
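Discrete and continuous representations of content (e.g., of language or images) have interesting properties to be explored for the understanding of, or reasoning with, this content by machines. This position paper puts forward our opinion on the role of discrete and continuous representations and their processing in the deep learning field. Current neural network models compute continuous-valued data. Information is compressed into dense, distributed embeddings. In stark contrast, humans use discrete symbols in their communication with language. Such symbols represent a compressed version of the world that derives its meaning from shared contextual information. Additionally, human reasoning involves symbol manipulation at a cognitive level, which facilitates abstract reasoning, the composition of knowledge and understanding, generalization, and efficient learning. Motivated by these insights, in this paper we argue that combining discrete and continuous representations and their processing will be essential to build systems that exhibit a general form of intelligence. We suggest and discuss several avenues that could improve current neural networks through the inclusion of discrete elements, combining the advantages of both types of representations.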
Knowledge about space and time is necessary to solve problems in the physical world: An AI agent situated in the physical world and interacting with objects often needs to reason about positions of and relations between objects; and as soon as the agent plans its actions to solve a task, it needs to consider the temporal aspect (e.g., what actions to perform over time). Spatio-temporal knowledge, however, is required beyond interacting with the physical world, and is also often transferred to the abstract world of concepts through analogies and metaphors (e.g., "a threat that is hanging over our heads"). As spatial and temporal reasoning is ubiquitous, different attempts have been made to integrate this into AI systems. In the area of knowledge representation, spatial and temporal reasoning has been largely limited to modeling objects and relations and developing reasoning methods to verify statements about objects and relations. On the other hand, neural network researchers have tried to teach models to learn spatial relations from data with limited reasoning capabilities. Bridging the gap between these two approaches in a mutually beneficial way could allow us to tackle many complex real-world problems, such as natural language processing, visual question answering, and semantic image segmentation. In this chapter, we view this integration problem from the perspective of Neuro-Symbolic AI. Specifically, we propose a synergy between logical reasoning and machine learning that will be grounded on spatial and temporal knowledge. Describing some successful applications, remaining challenges, and evaluation datasets pertaining to this direction is the main topic of this contribution.
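Virtually all verification and synthesis techniques assume that formal specifications are readily available, functionally correct, and fully match the engineer's understanding of the given system. However, this assumption is often unrealistic in practice: formalizing system requirements is notoriously difficult, error-prone, and requires substantial training. To alleviate this severe hurdle, we propose a fundamentally novel approach to writing formal specifications, named specification sketching for Linear Temporal Logic (LTL). The key idea is that an engineer can provide a partial LTL formula, called an LTL sketch, in which the parts that are hard to formalize are left out. Given a set of examples describing system behaviors that the specification should or should not allow, the task of a so-called sketching algorithm is then to complete the given sketch such that the resulting LTL formula is consistent with the examples. We show that deciding whether a sketch can be completed falls into the complexity class NP and present two SAT-based sketching algorithms. We also demonstrate, using a prototype implementation, that sketching is a practical approach to writing formal specifications.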
Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in
We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages Visual Genome scene graph structures to create 22M diverse reasoning questions, which all come with functional programs that represent their semantics. We use the programs to gain tight control over the answer distribution and present a new tunable smoothing technique to mitigate question biases. Accompanying the dataset is a suite of new metrics that evaluate essential qualities such as consistency, grounding and plausibility. A careful analysis is performed for baselines as well as state-of-the-art models, providing fine-grained results for different question types and topologies. Whereas a blind LSTM obtains a mere 42.1%, and strong VQA models achieve 54.1%, human performance tops at 89.3%, offering ample opportunity for new research to explore. We hope GQA will provide an enabling resource for the next generation of models with enhanced robustness, improved consistency, and deeper semantic understanding of vision and language.
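Recently, several techniques have aimed to improve the performance of deep learning models for Scene Graph Generation (SGG) by incorporating background knowledge. State-of-the-art techniques can be divided into two families: one in which the background knowledge is incorporated into the model in a latent form, and another in which the background knowledge is maintained in symbolic form. Despite promising results, both families of techniques face several shortcomings: the first requires ad hoc, more complex neural architectures that increase the training or inference cost; the second suffers from limited scalability w.r.t. the size of the background knowledge. Our work introduces a regularization technique for injecting symbolic background knowledge into neural SGG models that overcomes the limitations of prior art. Our technique is model-agnostic, does not incur any cost at inference time, and scales to previously unmanageable background knowledge sizes. We demonstrate that our technique can improve the accuracy of state-of-the-art SGG models by up to 33%.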
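Symbolic reasoning, that is, rule-based symbol manipulation, is a hallmark of human intelligence. However, rule-based systems have had limited success competing with learning-based systems outside formalized domains such as automated theorem proving. We hypothesize that this is due to the manual construction of rules in past attempts. In this work, we ask how we can build a rule-based system that can reason with natural language input but without the manual construction of rules. We propose MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences, and MetaInduce, a learning algorithm that induces MetaQNL rules from training data consisting of questions and answers, with or without intermediate reasoning steps. Our approach achieves state-of-the-art accuracy on multiple reasoning benchmarks; it learns compact models with much less data and produces not only answers but also checkable proofs. Further, experiments on a real-world morphological analysis benchmark show that our method can handle noise and ambiguity. Code will be released at https://github.com/princeton-vl/metaqnl.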
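One of the ultimate goals of Artificial Intelligence is to learn generalised and human-interpretable knowledge from raw data. Neuro-symbolic reasoning approaches partly tackle this problem by improving the training of a neural network using a manually engineered symbolic knowledge base. In the cases where symbolic knowledge is learned from raw data, this knowledge lacks the expressivity required to solve complex problems. In this paper, we introduce the Neuro-Symbolic Inductive Learner (NSIL), an approach that trains a neural network to extract latent concepts from raw data while learning symbolic knowledge, defined in terms of these latent concepts, that solves complex problems. The novelty of our approach is a method for biasing a symbolic learner to learn improved knowledge, based on the in-training performance of both the neural and symbolic components. We evaluate NSIL on two problem domains that require learning knowledge with different levels of complexity, and demonstrate that NSIL learns knowledge that is not possible to learn with other neuro-symbolic systems, while outperforming baseline models in terms of accuracy and data efficiency.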
Scene graph generation from images is a task of great interest to applications such as robotics, because graphs are the main way to represent knowledge about the world and regulate human-robot interactions in tasks such as Visual Question Answering (VQA). Unfortunately, its corresponding area of machine learning is still relatively in its infancy, and the solutions currently offered do not specialize well in concrete usage scenarios. Specifically, they do not take existing "expert" knowledge about the domain world into account; and that might indeed be necessary in order to provide the level of reliability demanded by the use case scenarios. In this paper, we propose an initial approximation to a framework called Ontology-Guided Scene Graph Generation (OG-SGG), that can improve the performance of an existing machine learning based scene graph generator using prior knowledge supplied in the form of an ontology (specifically, using the axioms defined within); and we present results evaluated on a specific scenario founded in telepresence robotics. These results show quantitative and qualitative improvements in the generated scene graphs.