Knowledge representation learning (KRL) aims to represent entities and relations of knowledge graphs in a low-dimensional semantic space, and has been widely used in large-scale knowledge-driven tasks. In this article, we introduce the reader to the motivations for KRL and give an overview of existing KRL approaches. Then we conduct an extensive quantitative comparison and analysis of several typical KRL methods on three evaluation tasks of knowledge acquisition: knowledge graph completion, triple classification, and relation extraction. We also review the real-world applications of KRL, such as language modeling, question answering, information retrieval, and recommender systems. Finally, we discuss the remaining challenges and outline future research directions for KRL. The code and datasets used in the experiments can be found at https://github.com/thunlp/OpenKE.
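Many of the surveyed KRL methods reduce to a simple scoring function over entity and relation vectors. Below is a minimal numpy sketch of TransE-style scoring and its margin-based ranking loss, with toy random embeddings; all sizes and names are illustrative and not taken from OpenKE.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 100, 20, 50
E = rng.normal(size=(n_entities, dim))   # entity embeddings
R = rng.normal(size=(n_relations, dim))  # relation embeddings

def transe_score(h, r, t):
    """Higher is better: negative L2 distance of (h + r) from t."""
    return -np.linalg.norm(E[h] + R[r] - E[t])

def margin_loss(pos, neg, margin=1.0):
    """Margin-based ranking loss over a positive and a corrupted triple."""
    return max(0.0, margin - transe_score(*pos) + transe_score(*neg))

# knowledge graph completion: rank all entities as tails for (h=3, r=5, ?)
scores = [transe_score(3, 5, t) for t in range(n_entities)]
print(int(np.argmax(scores)))
print(margin_loss((3, 5, 7), (3, 5, 8)))  # training signal for one triple pair
```

In a trained model the same ranking loop drives the completion and triple-classification evaluations described above.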
Real-world factoid or list questions often have a simple structure, yet are hard to match to facts in a given knowledge base due to high representational and linguistic variability. For example, answering "who is the ceo of apple" on Freebase requires a match to an abstract "leadership" entity with three relations "role", "organization" and "person", and two other entities "apple inc" and "managing director". Recent years have seen a surge of research activity on learning-based solutions for this problem. We further advance the state of the art by adopting learning-to-rank methodology and by fully addressing the inherent entity recognition problem, which was neglected in recent works. We evaluate our system, called Aqqu, on two standard benchmarks, Free917 and WebQuestions, improving the previous best result for each benchmark considerably. These two benchmarks exhibit quite different challenges, and many of the existing approaches were evaluated (and work well) only for one of them. We also consider efficiency aspects and take care that all questions can be answered interactively (that is, within a second). Materials for full reproducibility are available on our website: http://ad.informatik.uni-freiburg.de/publications.
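The learning-to-rank step can be pictured as pairwise preference learning over candidate query interpretations. Below is a hedged sketch using scikit-learn's LogisticRegression on pairwise feature differences; the features and data are invented for illustration, and Aqqu's actual feature set and ranker differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# each candidate interpretation of one question is a feature vector,
# e.g. (entity-match score, relation-match score, result-size prior)
rng = np.random.default_rng(1)
candidates = rng.random((6, 3))
correct = 2  # index of the gold candidate for this question

# pairwise transform: gold minus wrong -> label 1, reversed -> label 0
X, y = [], []
for j in range(len(candidates)):
    if j != correct:
        X.append(candidates[correct] - candidates[j]); y.append(1)
        X.append(candidates[j] - candidates[correct]); y.append(0)

ranker = LogisticRegression().fit(np.array(X), y)

# at answer time, score candidates by the learned weight vector
scores = candidates @ ranker.coef_.ravel()
print(int(np.argmax(scores)))  # best-ranked interpretation
```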
Preface The performance of an Artificial Intelligence system often depends on the amount of world knowledge available to it. During the last decade, the AI community has witnessed the emergence of a number of highly structured knowledge repositories whose collaborative nature has led to a dramatic increase in the amount of world knowledge that can now be exploited in AI applications. Arguably, the best-known repository of user-contributed knowledge is Wikipedia. Since its inception less than eight years ago, it has become one of the largest and fastest growing online sources of encyclopedic knowledge. One of the reasons why Wikipedia is appealing to contributors and users alike is the richness of its embedded structural information: articles are hyperlinked to each other and connected to categories from an ever-expanding taxonomy; pervasive language phenomena such as synonymy and polysemy are addressed through redirection and disambiguation pages; entities of the same type are described in a consistent format using infoboxes; related articles are grouped together in series templates. Many more repositories of user-contributed knowledge exist besides Wikipedia. Collaborative tagging in Delicious and community-driven question answering in Yahoo! Answers and Wiki Answers are only a few examples of knowledge sources that, like Wikipedia, can become a valuable asset for AI researchers. Furthermore, AI methods have the potential to improve these resources, as demonstrated recently by research on personalized tag recommendations, or on matching user questions with previously answered questions. The goal of this workshop was to foster the research and dissemination of ideas on the mutually beneficial interaction between AI and repositories of user-contributed knowledge. This volume contains the papers accepted for presentation at the workshop. We issued calls for regular papers, short late-breaking papers, and demos. After careful review by the program committee of the 20 submissions received (13 regular papers, 6 short papers, and 1 demo), 5 regular papers and 3 short papers were accepted for presentation. Consistent with the original aim of the workshop, the accepted papers address a diverse set of problems and resources, although Wikipedia-based systems are still dominant. The accepted papers explore leveraging knowledge induced and patterns learned from Wikipedia, applying them to the web or to untagged text collections, and using such knowledge for tasks such as information extraction, entity disambiguation, terminology extraction, and analysing the structure of social networks. We also learn of useful methods that integrate Wikipedia with structured resources, in particular relational databases. The members of the program committee provided high-quality reviews in a timely fashion, and all submissions have benefited from this expert feedback. For a successful event, having high-quality invited speakers is crucial. We were lucky to have two excellent speakers for
Knowledge graphs (KGs) are key components of various natural language processing applications. To further expand the coverage of KGs, previous studies on knowledge graph completion usually require a large number of training instances for each relation. However, we observe that long-tail relations are actually more common in KGs, and those newly added relations often do not have many known training triples. In this work, we aim at predicting new facts under a challenging setting where only one training instance is available. We propose a one-shot relational learning framework, which utilizes the knowledge extracted by embedding models and learns a matching metric by considering both the learned embeddings and one-hop graph structures. Empirically, our model yields considerable performance improvements over existing embedding models and also eliminates the need to re-train the embedding models when dealing with newly added relations.
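The matching framework can be sketched as: encode each entity from its embedding plus its one-hop neighborhood, then compare a candidate entity pair against the single reference pair with a similarity metric. A toy numpy version follows; the real model learns both the encoder and the metric, while here both are fixed and the graph is invented.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
E = rng.normal(size=(50, dim))  # entity embeddings
R = rng.normal(size=(10, dim))  # relation embeddings
# one-hop structure: entity -> list of (relation, neighbor) edges
neighbors = {0: [(1, 4), (2, 7)], 3: [(1, 9)], 5: [(2, 7)]}

def encode(e):
    """Entity representation: its embedding plus averaged one-hop context."""
    ctx = [np.concatenate([R[r], E[n]]) for r, n in neighbors.get(e, [])]
    hop = np.mean(ctx, axis=0) if ctx else np.zeros(2 * dim)
    return np.concatenate([E[e], hop])

def match(pair_a, pair_b):
    """Cosine similarity between encoded (head, tail) pairs."""
    a = np.concatenate([encode(pair_a[0]), encode(pair_a[1])])
    b = np.concatenate([encode(pair_b[0]), encode(pair_b[1])])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

reference = (0, 3)               # the single training pair for a new relation
print(match(reference, (5, 9)))  # score a candidate pair for that relation
```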
We investigate the problem of understanding the message (gist) conveyed by images and their captions, as found, for instance, on websites or in news articles. To this end, we propose a method to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge, which has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where much work on image understanding focuses on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view gist understanding as the task of representing an image-caption pair over a wide-coverage vocabulary of concepts, such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result, with a Mean Average Precision (MAP) of 0.69, shows that by combining both dimensions we are able to understand the meaning of image-caption pairs better than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., automatically generated image tags or captions, and demonstrate the feasibility of an end-to-end automated process.
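Casting gist detection as concept ranking makes the fusion step easy to picture: each candidate concept receives a textual and a visual relevance score, and the final ranking interpolates the two. A hedged toy sketch, where random scores stand in for real caption- and image-based features:

```python
import numpy as np

rng = np.random.default_rng(2)
concepts = ["Rescue", "Flood", "Boat", "Tourism", "Regatta"]
text_score = rng.random(len(concepts))   # caption-based relevance (stand-in)
image_score = rng.random(len(concepts))  # tag/vision-based relevance (stand-in)

def rank_gists(alpha=0.5):
    """Interpolate modalities and rank concepts for one image-caption query."""
    fused = alpha * text_score + (1 - alpha) * image_score
    order = np.argsort(-fused)
    return [(concepts[i], float(fused[i])) for i in order]

for concept, score in rank_gists():
    print(f"{concept}: {score:.3f}")
```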
Structured queries expressed in languages such as SQL, SPARQL, or XQuery offer users a convenient and explicit way to express their information needs for a number of tasks. In this work, we present an approach for answering such queries directly over text data, without storing results in a database. We specifically focus on the case of knowledge bases, where queries are about entities and the relations between them. Our approach combines distributed query answering (e.g., Triple Pattern Fragments) with models built for extractive question answering. Importantly, by applying distributed query answering we are able to simplify the model learning problem. We train models for a large portion (572) of the relations in Wikidata and achieve an average 0.70 F1 measure across all models. We also present a systematic method to construct the necessary training data for this task from knowledge graphs, and describe a prototype implementation.
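The decomposition can be sketched as: a triple pattern with one unbound variable is verbalized into a question, and an extractive reader scans each passage about the bound entity. A minimal sketch follows; `extractive_qa`, the passage index, and the template are hypothetical stand-ins, not the paper's actual components.

```python
# passages indexed by entity, as a distributed text source might serve them
passages = {
    "Ada_Lovelace": ["Ada Lovelace was born in London in 1815."],
}
templates = {"birthPlace": "Where was {} born?"}  # one question per relation

def extractive_qa(question, passage):
    """Hypothetical trained reader: returns (answer_span, confidence)."""
    return ("London", 0.9)  # stand-in output

def answer_pattern(subject, relation):
    """Answer the triple pattern (subject, relation, ?x) directly over text."""
    question = templates[relation].format(subject.replace("_", " "))
    spans = [extractive_qa(question, p) for p in passages[subject]]
    return max(spans, key=lambda s: s[1])  # keep highest-confidence span

print(answer_pattern("Ada_Lovelace", "birthPlace"))
```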
Training large-scale question answering systems is complicated because training sources usually cover a small portion of the range of possible questions. This paper studies the impact of multitask and transfer learning for simple question answering; a setting for which the reasoning required to answer is quite easy, as long as one can retrieve the correct evidence given a question, which can be difficult in large-scale conditions. To this end, we introduce a new dataset of 100k questions that we use in conjunction with existing benchmarks. We conduct our study within the framework of Memory Networks (Weston et al., 2015) because this perspective allows us to eventually scale up to more complex reasoning, and show that Memory Networks can be successfully trained to achieve excellent performance.
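For simple questions the core computation is a single retrieval step: embed the question and every stored fact in a shared space and return the answer entity of the best-scoring fact. A toy numpy sketch with untrained bag-of-words embeddings; the vocabulary and facts are invented, and a trained system learns embeddings so that supporting facts score highest.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = {w: i for i, w in enumerate(
    ["who", "wrote", "authored", "dune", "herbert", "asimov", "foundation"])}
W = rng.normal(size=(len(vocab), 8))  # shared word/symbol embeddings

def embed(words):
    """Bag-of-words embedding into the shared question/fact space."""
    return sum(W[vocab[w]] for w in words)

# facts verbalized into known vocabulary, paired with their answer entity
facts = [(("herbert", "authored", "dune"), "herbert"),
         (("asimov", "authored", "foundation"), "asimov")]

q = embed(["who", "wrote", "dune"])
scores = [q @ embed(fact) for fact, _ in facts]
print(facts[int(np.argmax(scores))][1])  # answer entity of the best fact
```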
A visual-relational knowledge graph (KG) is a multi-relational graph whose entities are associated with images. We explore novel machine learning approaches for answering visual-relational queries in web-extracted knowledge graphs. To this end, we have created ImageGraph, a KG with 1,330 relation types, 14,870 entities, and 829,931 images crawled from the web. With visual-relational KGs such as ImageGraph one can introduce novel probabilistic query types in which images are treated as first-class citizens. Both the prediction of relations between unseen images as well as multi-relational image retrieval can be expressed with specific families of visual-relational queries. We introduce novel combinations of convolutional networks and knowledge graph embedding methods to answer such queries. We also explore a zero-shot learning scenario where an image of an entirely new entity is linked with multiple relations to entities of an existing KG. The resulting multi-relational grounding of unseen entity images into a knowledge graph serves as a semantic entity representation. We conduct experiments to demonstrate that the proposed methods can answer these visual-relational queries efficiently and accurately.
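One way to answer such queries is to map image features into the entity embedding space and reuse a standard KG scoring function. A hedged numpy sketch with DistMult scoring, where a fixed linear projection stands in for the paper's convolutional encoder and all data is random:

```python
import numpy as np

rng = np.random.default_rng(4)
dim, n_rel = 32, 12
R = rng.normal(size=(n_rel, dim))        # relation embeddings (DistMult)
E = rng.normal(size=(200, dim))          # entity embeddings of the existing KG
P = rng.normal(size=(dim, 2048)) * 0.01  # projection from image features

def embed_image(cnn_features):
    """Ground an unseen image into entity space (CNN encoder not shown)."""
    return P @ cnn_features

def distmult(h_vec, r, t_vec):
    """DistMult score: elementwise product summed over dimensions."""
    return float(np.sum(h_vec * R[r] * t_vec))

img = embed_image(rng.random(2048))  # features of a new entity's image
# zero-shot grounding: best (relation, entity) link for the unseen image
pairs = [(r, t) for r in range(n_rel) for t in range(len(E))]
best = max(pairs, key=lambda p: distmult(img, p[0], E[p[1]]))
print(best)
```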
Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention's local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic identification of correct type labels (type-paths) for training examples, given the set of candidate type labels obtained by distant supervision with a given type hierarchy. The unknown type labels for individual entity mentions and the semantic similarity between entity types pose unique challenges for solving the LNR task. We propose a general framework, called PLE, to jointly embed entity mentions, text features and entity types into the same low-dimensional space, in which objects whose types are semantically close have similar representations. Then we estimate the type-path for each training example in a top-down manner using the learned embeddings. We formulate a global objective for learning the embeddings from text corpora and knowledge bases, which adopts a novel margin-based loss that is robust to noisy labels and faithfully models type correlation derived from knowledge bases. Our experiments on three public typing datasets demonstrate the effectiveness and robustness of PLE, with an average of 25% improvement in accuracy compared to the next best method.
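The heart of the approach is the joint space itself: mentions and types are vectors, and a margin-based loss pushes a mention closer to its best candidate type than to any negative type. A toy numpy sketch of such a ranking margin; the dimensions, type names, and the exact scoring choice are illustrative, not PLE's actual objective.

```python
import numpy as np

rng = np.random.default_rng(5)
dim = 24
mention = rng.normal(size=dim)  # embedded entity mention with its context
types = {t: rng.normal(size=dim) for t in
         ["person", "person/artist", "organization", "location"]}

def score(m, t):
    return float(m @ types[t])

def margin_loss(m, candidates, negatives, margin=1.0):
    """Rank the best candidate type above every negative type.
    Taking the max over noisy candidates keeps wrong distant-supervision
    labels from dominating the training signal."""
    pos = max(score(m, t) for t in candidates)
    return sum(max(0.0, margin - pos + score(m, t)) for t in negatives)

print(margin_loss(mention, ["person", "person/artist"],
                  ["organization", "location"]))
```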
Most reading comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently no resources exist to train and test this capability. We propose a novel task that encourages the development of models for text understanding across multiple documents, and investigate the limits of existing methods. In our task, a model learns to seek and combine evidence, effectively performing multi-hop (alias multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise circumvention strategies. We evaluate two previously proposed competitive models and find that one can integrate information across documents. However, both models struggle to select relevant information, since providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 42.9% compared to human performance at 74.0%, leaving ample room for improvement.
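The dataset construction step can be pictured as a graph search: starting from a document about the query subject, follow topic links until a document containing the answer is reached, keeping the traversed documents as the multi-hop evidence set. A toy BFS sketch, with an invented document graph standing in for the thematically linked corpus:

```python
from collections import deque

links = {"doc_q": ["doc_a"], "doc_a": ["doc_b"], "doc_b": []}
contains_answer = {"doc_q": False, "doc_a": False, "doc_b": True}

def evidence_chain(start):
    """BFS over thematically linked documents; returns one document chain
    from the query's document to a document stating the answer."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if contains_answer[path[-1]]:
            return path
        for nxt in links[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(evidence_chain("doc_q"))  # ['doc_q', 'doc_a', 'doc_b']
```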
Models that learn to represent textual and knowledge base relations in the same continuous latent space are able to perform joint inferences among the two kinds of relations and obtain high accuracy on knowledge base completion (Riedel et al., 2013). In this paper we propose a model that captures the compositional structure of textual relations, and jointly optimizes entity, knowledge base, and textual relation representations. The proposed model significantly improves performance over a model that does not share parameters among textual relations with common sub-structure.
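Parameter sharing comes from composing a textual relation's vector out of its tokens, so that patterns like "born in" and "was born in" share most of their parameters. A toy numpy sketch with additive composition and a translation-style score; the paper itself uses a learned convolutional composition, so this is only the shape of the idea.

```python
import numpy as np

rng = np.random.default_rng(6)
dim = 16
words = {w: rng.normal(size=dim) for w in ["was", "born", "in", "lives"]}
E = {"e1": rng.normal(size=dim), "e2": rng.normal(size=dim)}

def compose(textual_relation):
    """Textual relation vector built from its tokens, so relations with
    common sub-structure share parameters."""
    return np.mean([words[w] for w in textual_relation.split()], axis=0)

def score(head, textual_relation, tail):
    """Translation-style score in the shared text/KB space."""
    return -np.linalg.norm(E[head] + compose(textual_relation) - E[tail])

print(score("e1", "was born in", "e2"))
print(score("e1", "born in", "e2"))  # shares token parameters with the above
```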
In this paper, we conduct an empirical investigation of neural query graph ranking approaches for the task of complex question answering over knowledge graphs. We experiment with six different ranking models and propose a novel self-attention based slot matching model which exploits the inherent structure of query graphs, our logical form of choice. Our proposed model generally outperforms the other models on two QA datasets over the DBpedia knowledge graph, evaluated in different settings. We also show that transfer learning from the larger of the QA datasets to the smaller one yields substantial improvements, effectively offsetting the general lack of training data.
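The slot-matching idea can be sketched as: pool the question with self-attention, encode each query-graph slot (e.g., a relation chain) the same way, and rank candidate graphs by the summed slot-question similarities. A toy numpy version with single-head dot-product self-attention and random token vectors; the actual model's architecture and scoring differ.

```python
import numpy as np

rng = np.random.default_rng(7)
dim = 16

def self_attend(X):
    """Single-head dot-product self-attention, then mean-pool to a vector."""
    A = X @ X.T / np.sqrt(X.shape[1])
    A = np.exp(A - A.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return (A @ X).mean(axis=0)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

question = rng.normal(size=(6, dim))  # embedded question tokens
candidate_graphs = [
    [rng.normal(size=(3, dim))],                              # one slot
    [rng.normal(size=(3, dim)), rng.normal(size=(2, dim))],   # two slots
]

q = self_attend(question)
scores = [sum(cos(q, self_attend(slot)) for slot in g)
          for g in candidate_graphs]
print(int(np.argmax(scores)))  # index of the best-ranked query graph
```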
In the task of question answering, Memory Networks have recently been shown to be quite effective for complex reasoning as well as scalability, in spite of the limited range of topics covered in training data. In this paper, we introduce the Factual Memory Network, which learns to answer questions by extracting and reasoning over relevant facts from a Knowledge Base. Our system generates distributed representations of questions and the KB in the same word vector space, extracts a subset of initial candidate facts, then tries to find a path to the answer entity using multi-hop reasoning and refinement. Additionally, we also improve the run-time efficiency of our model using various computational heuristics.
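The multi-hop step can be sketched as a greedy walk over KB facts: start from an entity mentioned in the question, repeatedly follow the best-scoring outgoing fact, and stop at the answer candidate. A toy sketch with a stand-in relevance function; the facts, vectors, and scoring are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
dim = 12
vec = {x: rng.normal(size=dim) for x in
       ["q", "obama", "michelle", "malia", "spouse", "child"]}
facts = [("obama", "spouse", "michelle"), ("obama", "child", "malia"),
         ("michelle", "child", "malia")]

def fact_score(question_vec, fact):
    """Stand-in relevance: question vector against a summed fact embedding."""
    return float(question_vec @ sum(vec[x] for x in fact))

def multi_hop(question_vec, start, hops=2):
    """Greedy walk: at each hop follow the best-scoring outgoing fact."""
    entity, path = start, []
    for _ in range(hops):
        out = [f for f in facts if f[0] == entity]
        if not out:
            break
        best = max(out, key=lambda f: fact_score(question_vec, f))
        path.append(best)
        entity = best[2]
    return entity, path

print(multi_hop(vec["q"], "obama"))
```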
Large-scale knowledge graphs (KGs) such as Freebase are generally incomplete. Reasoning over multi-hop (mh) KG paths is thus an important capability needed for question answering and other NLP tasks that require knowledge about the world. mh-KG reasoning covers diverse scenarios, e.g., given a head entity and a relation path, predict the tail entity; or, given two entities connected by some relation paths, predict the unknown relation between them. We present ROPs, recurrent one-hop predictors, that predict entities at each step of mh-KG paths using recurrent neural networks and vector representations of entities and relations, with two benefits: (i) modeling mh-paths of arbitrary length while updating the entity and relation representations with the training signal at each step; (ii) handling different types of mh-KG reasoning in a unified framework. Our models show state-of-the-art results on two important multi-hop KG reasoning tasks: knowledge base completion and path query answering.
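A recurrent one-hop predictor can be sketched as an RNN whose state is updated by each relation on the path, and whose state is decoded into the nearest entity at every step. A toy numpy version with an untrained vanilla RNN cell; sizes, relation names, and the cell itself are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(9)
dim = 16
E = rng.normal(size=(30, dim))             # entity embeddings
R = {"born_in": rng.normal(size=dim), "capital_of": rng.normal(size=dim)}
W = rng.normal(size=(dim, 2 * dim)) * 0.1  # RNN cell weights

def step(h, rel):
    """One-hop update: fold the next relation into the path state."""
    return np.tanh(W @ np.concatenate([h, R[rel]]))

def predict_path(head, path):
    """Predict an entity at every step of a multi-hop path."""
    h = E[head]
    out = []
    for rel in path:
        h = step(h, rel)
        out.append(int(np.argmax(E @ h)))  # nearest entity decodes the state
    return out

print(predict_path(0, ["born_in", "capital_of"]))
```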
A recent "third wave" of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research. (Kezban Dilek Onal and Ye Zhang contributed equally. Maarten de Rijke and Matthew Lease contributed equally.)
This article introduces TableMiner+, a Semantic Table Interpretation method that annotates Web tables in an effective and efficient way. Built on our previous work TableMiner, the extended version advances the state of the art in several ways. First, it improves annotation accuracy by making innovative use of various types of contextual information both inside and outside tables as features for inference. Second, it reduces computational overheads by adopting an incremental, bootstrapping approach that starts by creating preliminary and partial annotations of a table using 'sample' data in the table, then using the outcome as 'seed' to guide interpretation of the remaining contents. This is then followed by a message-passing process that iteratively refines results on the entire table to create the final optimal annotations. Third, it is able to handle all annotation tasks of Semantic Table Interpretation (e.g., annotating a column, or entity cells), whereas state-of-the-art methods are limited in different ways. We also compile the largest dataset known to date and extensively evaluate TableMiner+ against four baselines and two re-implemented (near-identical, as adaptations are needed due to the use of different knowledge bases) state-of-the-art methods. TableMiner+ consistently outperforms all models under all experimental settings. On the two most diverse datasets covering multiple domains and various table schemata, it achieves improvement in F1 of between 1 and 42 percentage points depending on the specific annotation task. It also significantly reduces computational overheads in terms of wall-clock time when compared against classic methods that 'exhaustively' process the entire table content to build features for inference. As a concrete example, compared against a method based on joint inference implemented with parallel computation, the non-parallel implementation of TableMiner+ achieves significant improvement in learning accuracy and almost orders-of-magnitude savings in wall-clock time.
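The incremental, bootstrapping idea can be sketched as: annotate a column from a growing sample of its cells and stop as soon as the winning label stabilizes, rather than processing the whole column up front. A toy sketch, with a plain dictionary standing in for knowledge-base lookup and majority voting standing in for TableMiner+'s inference:

```python
from collections import Counter

kb_type = {"Paris": "City", "Berlin": "City", "Seine": "River",
           "Madrid": "City", "Rome": "City", "Danube": "River"}
column = ["Paris", "Berlin", "Seine", "Madrid", "Rome", "Danube"]

def annotate_column(cells, batch=2):
    """Bootstrapping column annotation: classify from 'sample' cells and
    stop once the top label is unchanged after growing the sample."""
    votes, previous = Counter(), None
    for i in range(0, len(cells), batch):
        votes.update(kb_type.get(c, "Unknown") for c in cells[i:i + batch])
        label = votes.most_common(1)[0][0]
        if label == previous:        # converged: skip the remaining cells
            return label, i + batch  # label and number of cells inspected
        previous = label
    return previous, len(cells)

print(annotate_column(column))  # ('City', 4): decided before reading all cells
```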
In this work, we introduce the task of open-type relation argument extraction (ORAE): given a corpus, a query entity Q, and a knowledge base relation (e.g., "Q authored notable work with title X"), the model has to extract from the corpus arguments of non-standard entity types (entities that cannot be extracted by a standard named entity tagger, e.g., X: the title of a book or an artwork). A supervised dataset based on WikiData relations is obtained and released to address the task. We develop and compare a wide range of neural models for this task, yielding large improvements over a strong baseline obtained with a neural question answering system. The impact of different sentence encoding architectures and extraction methods is compared systematically. An encoder based on gated recurrent units combined with a conditional random field tagger gives the best results.
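The extraction step of the best model (GRU encoder plus CRF tagger) comes down to Viterbi decoding over per-token emission scores and tag-transition scores. A minimal numpy Viterbi sketch, with random scores standing in for the trained GRU and CRF parameters (BIO tags for the argument span X):

```python
import numpy as np

tags = ["O", "B-ARG", "I-ARG"]
rng = np.random.default_rng(10)
sentence = ["He", "wrote", "The", "Old", "Man"]
emissions = rng.normal(size=(len(sentence), len(tags)))  # GRU output stand-in
transitions = rng.normal(size=(len(tags), len(tags)))    # CRF parameters

def viterbi(emissions, transitions):
    """Best tag sequence under emission plus transition scores."""
    n, k = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        total = score[:, None] + transitions + emissions[t]
        back[t] = total.argmax(axis=0)   # best previous tag per current tag
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):        # follow backpointers
        path.append(int(back[t][path[-1]]))
    return [tags[i] for i in reversed(path)]

print(list(zip(sentence, viterbi(emissions, transitions))))
```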
With the rapid growth of knowledge bases (KBs), question answering (QA) over KBs has become a hot research topic. In this paper, we propose two frameworks (i.e., a pipeline framework and an end-to-end framework) that focus on single-relation factoid questions. In both frameworks, we study the effect of contextual information on QA quality, such as the types of entity mentions and their out-degrees. In the end-to-end framework, we combine char-level encoding with a self-attention mechanism, using weight sharing and multi-task strategies to improve QA accuracy. Experimental results show that contextual information yields better simple-QA results in both the pipeline and end-to-end frameworks. Moreover, we find that the end-to-end framework achieves accuracy competitive with state-of-the-art methods while taking far less time.
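Char-level encoding is what keeps the question representation robust to out-of-vocabulary entity names. A toy numpy sketch: hash character trigrams into embedding rows, then pool the word vectors with a simple self-attention weighting. The hashing scheme and pooling choice are illustrative, not the paper's architecture.

```python
import numpy as np
from zlib import crc32

rng = np.random.default_rng(11)
dim, buckets = 16, 1000
C = rng.normal(size=(buckets, dim))  # char-trigram embedding table

def word_vec(word):
    """Char-level word encoding: sum of hashed trigram embeddings."""
    padded = f"#{word}#"
    grams = [padded[i:i + 3] for i in range(len(padded) - 2)]
    return sum(C[crc32(g.encode()) % buckets] for g in grams)

def encode_question(words):
    """Self-attention pooling of char-level word vectors."""
    X = np.stack([word_vec(w) for w in words])
    w = X @ X.mean(axis=0)                 # attention logits
    a = np.exp(w - w.max()); a /= a.sum()  # softmax weights
    return a @ X

# even an unseen entity name gets a usable vector
print(encode_question(["who", "founded", "quixotify"]).shape)
```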
Representation learning of knowledge bases aims to embed both entities and relations into a low-dimensional space. Most existing methods only consider direct relations in representation learning. We argue that multiple-step relation paths also contain rich inference patterns between entities, and propose a path-based representation learning model. This model considers relation paths as translations between entities for representation learning, and addresses two key challenges: (1) Since not all relation paths are reliable, we design a path-constraint resource allocation algorithm to measure the reliability of relation paths. (2) We represent relation paths via semantic composition of relation embeddings. Experimental results on real-world datasets show that, as compared with baselines, our model achieves significant and consistent improvements on knowledge base completion and relation extraction from text. The source code of this paper can be obtained from https://github.com/mrlyk423/relation_extraction.
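The path-constraint resource allocation idea can be sketched as flowing one unit of resource from the head entity along the path, splitting it evenly among outgoing edges at each step; the fraction arriving at the tail measures the path's reliability. A toy sketch over an invented edge list:

```python
from collections import defaultdict

# knowledge graph edges: (head, relation) -> list of tails
edges = {
    ("a", "r1"): ["b", "c"],
    ("b", "r2"): ["d"],
    ("c", "r2"): ["d", "e"],
}

def pcra(head, path, tail):
    """Resource flowing from head to tail along the relation path."""
    resource = {head: 1.0}
    for rel in path:
        nxt = defaultdict(float)
        for node, amount in resource.items():
            targets = edges.get((node, rel), [])
            for t in targets:
                nxt[t] += amount / len(targets)  # split evenly among tails
        resource = dict(nxt)
    return resource.get(tail, 0.0)

# reliability of path r1 -> r2 from a to d: 0.5*1.0 + 0.5*0.5 = 0.75
print(pcra("a", ["r1", "r2"], "d"))
```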
Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft's Satori, and Google's Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods.
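The fusion step can be sketched as supervised learning over per-fact features from the different extractors plus a prior from existing repositories. A hedged scikit-learn sketch; the features and training data are invented, and Knowledge Vault uses richer features and other learners, so this only shows the shape of the computation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# per-candidate-fact features:
# [text extractor conf., table extractor conf., prior prob., #sources]
X_train = np.array([
    [0.9, 0.0, 0.6, 3],
    [0.2, 0.8, 0.5, 2],
    [0.1, 0.0, 0.1, 1],
    [0.7, 0.6, 0.7, 4],
])
y_train = [1, 1, 0, 1]  # human-verified correctness labels

fuser = LogisticRegression().fit(X_train, y_train)

# probability that a newly extracted fact is true, given its feature vector
candidate = np.array([[0.5, 0.4, 0.3, 2]])
print(float(fuser.predict_proba(candidate)[0, 1]))
```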