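Most knowledge graph embedding techniques treat entities and predicates as separate embedding matrices and use aggregation functions to build a representation of an input triple. However, these aggregations are lossy: they do not capture the semantics of the original triple, such as the information contained in the predicate. To address these shortcomings, current methods learn triple embeddings from scratch without exploiting the entity and predicate embeddings of pretrained models. In this paper, we design a novel fine-tuning approach for learning triple embeddings by creating weak supervision signals from pretrained knowledge graph embeddings. We develop a method for automatically sampling triples from a knowledge graph and estimating their pairwise similarities from pretrained embedding models. These pairwise similarity scores are then fed to a Siamese-like neural architecture to fine-tune the triple representations. We evaluate the proposed method on two widely studied knowledge graphs and show consistent improvements over other state-of-the-art triple embedding methods on triple classification and triple clustering tasks.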
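The weak supervision step described above can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say how the pairwise similarity is estimated, so concatenating the pretrained head, predicate, and tail vectors and taking cosine similarity is an assumption, and all embedding values and names (`entity_emb`, `predicate_emb`, `weak_labels`) are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical pretrained embeddings (toy values, not from any real model).
entity_emb = {
    "Berlin": [0.9, 0.1],
    "Germany": [0.8, 0.3],
    "Paris": [0.85, 0.15],
    "France": [0.75, 0.35],
}
predicate_emb = {"capitalOf": [0.2, 0.7]}


def triple_vector(head, predicate, tail):
    """Assumed aggregation: concatenate head, predicate, and tail embeddings."""
    return entity_emb[head] + predicate_emb[predicate] + entity_emb[tail]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weak_labels(triples):
    """Return {(triple_a, triple_b): similarity} for every pair of sampled triples."""
    vectors = {t: triple_vector(*t) for t in triples}
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(triples, 2)
    }


sampled = [("Berlin", "capitalOf", "Germany"), ("Paris", "capitalOf", "France")]
labels = weak_labels(sampled)
```

In a full pipeline, scores like these would serve as regression targets for the Siamese-like network, which is not shown here.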
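Scholarly knowledge graphs (KGs) provide a rich source of structured information representing the knowledge encoded in scientific publications. Given the sheer volume of published scientific literature, which involves a plethora of inhomogeneous entities and relations describing scientific concepts, these KGs are inherently incomplete. We present exBERT, a method that leverages pre-trained transformer language models to perform scholarly knowledge graph completion. We model the triples of a knowledge graph as text and perform triple classification (i.e., deciding whether a triple belongs to the KG or not). The evaluation shows that exBERT outperforms other baselines on three scholarly KG completion datasets in the tasks of triple classification, link prediction, and relation prediction. In addition, we present two scholarly datasets as resources for the research community, collected from public KGs and online resources.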
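Information extraction methods have proven effective at extracting triples from structured or unstructured data. Organizing such triples in the form (head entity, relation, tail entity) yields what are called knowledge graphs (KGs). Most current knowledge graphs are incomplete. To use KGs in downstream tasks, it is desirable to predict the missing links in them. Recently, different methods for representing KGs have been proposed that embed entities and relations into a low-dimensional vector space, with the aim of predicting triples based on previously visited triples. According to whether triples are treated independently or dependently, we divide the task of knowledge graph completion into conventional and graph neural network (GNN) representation learning, and discuss both in more detail. In conventional approaches, each triple is processed independently; in GNN-based approaches, triples also take their local neighborhood into account.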
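The heterogeneity in the implementation, training, and evaluation of recently published knowledge graph embedding models has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 interaction models in the PyKEEN software package. Here, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, and we provide insight into why this might be the case. We then performed a large-scale benchmarking on four datasets, with several thousand experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, the best configuration for each model, and where improvements could be made over previously published best configurations. Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model's performance, which is not determined by the model architecture alone. We provide evidence that several architectures can obtain results competitive with the state of the art when configured carefully. We have made all code, experimental configurations, results, and analyses that lead to our interpretations available at https://github.com/pykeen/pykeen and https://github.com/pykeen/benchmarking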
Knowledge graph (KG) embedding is to embed components of a KG including entities and relations into continuous vector spaces, so as to simplify the manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, including not only the state-of-the-arts but also those with latest trends. Particularly, we make the review based on the type of information used in the embedding task. Techniques that conduct embedding using only facts observed in the KG are first introduced. We describe the overall framework, specific model design, typical training procedures, as well as pros and cons of such techniques. After that, we discuss techniques that further incorporate additional information besides facts. We focus specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to and benefit a wide variety of downstream tasks such as KG completion, relation extraction, question answering, and so forth.
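Knowledge graphs, such as Wikidata, comprise both structural and textual knowledge in order to represent knowledge. For each of the two modalities, dedicated approaches (graph embeddings and language models, respectively) learn patterns that allow novel structural knowledge to be predicted. Few approaches integrate learning and inference across both modalities, and those that exist can only partially exploit the interaction between structural and textual knowledge. In our approach, we build on existing strong representations of single modalities and use hypercomplex algebra to represent both (i) single-modality embeddings and (ii) the interaction between different modalities and their complementary means of knowledge representation. More specifically, we propose Dihedron and Quaternion representations of 4D hypercomplex numbers to integrate four modalities, namely structural knowledge graph embeddings, word-level representations (e.g., word2vec, fastText), sentence-level representations (Sentence Transformer), and document-level representations (Sentence Transformer, doc2vec). Our unified vector representation scores the plausibility of labelled edges via Hamilton and Dihedron products, thereby modeling pairwise interactions between the different modalities. Extensive experimental evaluation on standard benchmark datasets shows the superiority of our two new models, which go beyond sparse structural knowledge to improve performance in link prediction tasks.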
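Knowledge graphs (KGs) now play a key role in AI-related applications. Despite their large size, existing KGs are far from complete and comprehensive. To continuously enrich KGs, automatic knowledge construction and update mechanisms are usually employed, which inevitably introduce plenty of noise. However, most existing knowledge graph embedding (KGE) methods assume that all triple facts in a KG are correct, and project entities and relations into a low-dimensional space without considering noise and knowledge conflicts. This leads to low-quality and unreliable representations of KGs. To this end, this paper proposes a general multi-task reinforcement learning framework that can greatly alleviate the noisy data problem. In our framework, we use reinforcement learning to select high-quality knowledge triples while filtering out noisy ones. In addition, to take full advantage of the correlations among semantically similar relations, the triple selection processes of similar relations are trained collectively with multi-task learning. Furthermore, we extend popular KGE models, including TransE, DistMult, and RotatE, with the proposed framework. Finally, experimental validation shows that our approach is able to enhance existing KGE models and provide more robust representations of KGs in noisy scenarios.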
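For reference, the base models named above score a triple (h, r, t) with simple closed-form functions. The sketch below uses their commonly cited definitions on plain Python lists; it does not show the reinforcement learning framework itself. The use of the L2 norm in the TransE and RotatE scores is an assumption made here for simplicity (other norms are also used in practice), and the function names are illustrative.

```python
import cmath
import math


def transe_score(h, r, t):
    """TransE: negative L2 distance between the translated head h + r and the tail t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i over real-valued vectors."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rotate_score(h, phases, t):
    """RotatE: rotate each complex head coordinate by a unit-modulus relation
    exp(i * phase), then take the negative distance to the complex tail."""
    rotated = [hi * cmath.exp(1j * p) for hi, p in zip(h, phases)]
    return -math.sqrt(sum(abs(x - ti) ** 2 for x, ti in zip(rotated, t)))
```

Higher scores mean a more plausible triple in all three cases; a noise-aware framework such as the one described would decide which training triples are passed to these scoring functions.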
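Few-shot knowledge graph completion (FKGC) has attracted growing interest in recent years. It aims to infer unseen query triples for a relation given only a few reference triples of that relation. The primary focus of existing FKGC methods is learning relation representations that reflect the common information shared by the query and reference triples. To this end, these methods learn entity-pair representations from the direct neighbors of the head and tail entities, and then aggregate the representations of the reference entity pairs. However, entity-pair representations learned only from direct neighbors may have low expressiveness when the entities involved have sparse direct neighbors or share a common local neighborhood with other entities. Moreover, merely modeling the semantic information of head and tail entities is insufficient to accurately infer their relational information, especially when they have multiple relations. To address these issues, we propose a Relation-Specific Context Learning (RSCL) framework, which exploits the graph contexts of triples to learn global and local relation-specific representations for few-shot relations. Specifically, we first extract the graph context of each triple, which can provide long-term entity-relation dependencies. To encode the extracted graph contexts, we propose a hierarchical attention network that captures the contextual information of triples and highlights valuable local neighborhood information of entities. Finally, we design a hybrid attention aggregator to evaluate the likelihood of query triples at the global and local levels. Experimental results on two public datasets show that RSCL outperforms state-of-the-art FKGC methods.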
Knowledge graph embedding (KGE) is an increasingly popular technique that aims to represent entities and relations of knowledge graphs into low-dimensional semantic spaces for a wide spectrum of applications such as link prediction, knowledge reasoning and knowledge completion. In this paper, we provide a systematic review of existing KGE techniques based on representation spaces. Particularly, we build a fine-grained classification to categorise the models based on three mathematical perspectives of the representation spaces: (1) Algebraic perspective, (2) Geometric perspective, and (3) Analytical perspective. We introduce the rigorous definitions of fundamental mathematical spaces before diving into KGE models and their mathematical properties. We further discuss different KGE methods over the three categories, as well as summarise how spatial advantages work over different embedding needs. By collating the experimental results from downstream tasks, we also explore the advantages of mathematical space in different scenarios and the reasons behind them. We further state some promising research directions from a representation space perspective, with which we hope to inspire researchers to design their KGE models as well as their related applications with more consideration of their mathematical space properties.
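Knowledge graph (KG) completion is an important task that greatly benefits knowledge discovery in many fields (e.g., biomedical research). In recent years, learning KG embeddings to perform this task has received considerable attention. Despite the success of KG embedding methods, they predominantly use negative sampling, which increases computational complexity and leads to biased predictions due to the closed-world assumption. To overcome these limitations, we propose KG-NSF, a negative-sampling-free framework for learning KG embeddings based on the cross-correlation matrices of embedding vectors. We show that the proposed method achieves link prediction performance comparable to that of negative-sampling-based methods while converging faster.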
Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be "trained" on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two fundamentally different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on latent feature models such as tensor factorization and multiway neural networks. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. To this end, we also discuss Google's Knowledge Vault project as an example of such combination.
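The human proteome contains a vast network of interacting kinases and substrates. Even though some kinases have proven to be immensely useful as therapeutic targets, the majority remain understudied. In this work, we present a novel knowledge graph representation learning approach to predict new interaction partners for understudied kinases. Our approach uses a phosphoproteomic knowledge graph constructed by integrating data from iPTMnet, Protein Ontology, Gene Ontology, and BioKG. Representations of the kinases and substrates in this knowledge graph are learned by performing directed random walks over triples, coupled with a modified SkipGram or CBOW model. These representations are then used as input to a supervised classification model to predict novel interactions for understudied kinases. We also present a post-predictive analysis of the predicted interactions and an ablation study of the phosphoproteomic knowledge graph to gain insight into the biology of the understudied kinases.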
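The random-walk step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not give the walk length, the sampling strategy, or how the SkipGram/CBOW model was modified, so uniform sampling of outgoing edges and the names `build_adjacency` and `directed_walk` are hypothetical, as are the toy triples.

```python
import random


def build_adjacency(triples):
    """Map each head node to its outgoing (relation, tail) edges."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def directed_walk(adjacency, start, max_hops, rng):
    """Follow outgoing edges only, emitting node, relation, node, ... tokens.

    The walk stops early when the current node has no outgoing edge.
    """
    walk = [start]
    node = start
    for _ in range(max_hops):
        if node not in adjacency:
            break
        relation, node = rng.choice(adjacency[node])
        walk.extend([relation, node])
    return walk


# Hypothetical toy triples, not taken from the actual knowledge graph.
toy_triples = [
    ("kinaseA", "phosphorylates", "substrateB"),
    ("substrateB", "participates_in", "pathwayC"),
]
adjacency = build_adjacency(toy_triples)
walk = directed_walk(adjacency, "kinaseA", max_hops=2, rng=random.Random(0))
```

Each walk is a token sequence that a word-embedding model can treat like a sentence, which is how SkipGram or CBOW would be applied to the graph.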
The development of deep neural networks has improved representation learning in various domains, including textual, graph structural, and relational triple representations. This development opened the door to new relation extraction beyond the traditional text-oriented relation extraction. However, research on the effectiveness of considering multiple heterogeneous domain information simultaneously is still under exploration, and if a model can take advantage of integrating heterogeneous information, it is expected to make a significant contribution to many real-world problems. This thesis works on Drug-Drug Interactions (DDIs) from the literature as a case study and realizes relation extraction utilizing heterogeneous domain information. First, a deep neural relation extraction model is prepared and its attention mechanism is analyzed. Next, a method to combine the drug molecular structure information and drug description information with the input sentence information is proposed, and the effectiveness of utilizing drug molecular structures and drug descriptions for the relation extraction task is shown. Then, in order to further exploit the heterogeneous information, drug-related items, such as protein entries, medical terms and pathways, are collected from multiple existing databases and a new data set in the form of a knowledge graph (KG) is constructed. A link prediction task on the constructed data set is conducted to obtain embedding representations of drugs that contain the heterogeneous domain information. Finally, a method that integrates the input sentence information and the heterogeneous KG information is proposed. The proposed model is trained and evaluated on a widely used data set, and as a result, it is shown that utilizing heterogeneous domain information significantly improves the performance of relation extraction from the literature.
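Learning knowledge graph embeddings is vital to artificial intelligence and can benefit various downstream applications, such as recommendation and question answering. In recent years, many research efforts have addressed knowledge graph embedding. However, most previous knowledge graph embedding methods ignore the semantic similarity between related entities and entity-relation couples in different triples, because they optimize each triple separately with a scoring function. To address this problem, we propose a simple yet effective contrastive learning framework for knowledge graph embeddings, which shortens the semantic distance between related entities and entity-relation couples in different triples and thus improves the expressiveness of knowledge graph embeddings. We evaluate the proposed method on three standard knowledge graph benchmarks. Notably, our method yields some new state-of-the-art results, achieving 51.2% MRR and 46.8% Hits@1 on the WN18RR dataset, and 59.1% MRR and 51.8% Hits@1 on the YAGO3-10 dataset.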
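The two metrics reported above, MRR and Hits@k, are both computed from the rank that the model assigns to the correct entity for each test triple. A minimal sketch, using made-up ranks rather than any result from the paper:

```python
def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all test queries (rank 1 is the best)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test queries whose correct entity is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
example_ranks = [1, 2, 4, 1]
mrr = mean_reciprocal_rank(example_ranks)   # (1 + 1/2 + 1/4 + 1) / 4 = 0.6875
hits1 = hits_at_k(example_ranks, 1)         # 2 of 4 queries ranked first = 0.5
```

Reported percentages such as "51.2% MRR" are these values multiplied by 100.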
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
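Real-world knowledge graphs (KGs) are mostly incomplete. The problem of recovering missing relations, called KG completion, has recently become an active research area. Knowledge graph (KG) embedding, a low-dimensional representation of entities and relations, is a key technique for KG completion. Convolutional neural networks in models such as ConvE, SACN, InteractE, and RGCN have achieved recent successes. This paper takes a different architectural view and proposes ComDensE, which combines relation-aware and common features using dense neural networks. In relation-aware feature extraction, we attempt to create a relational inductive bias by applying an encoding function specific to each relation. In common feature extraction, we apply a common encoding function to all input embeddings. These encoding functions are implemented using dense layers. Compared with previous baseline approaches, ComDensE achieves state-of-the-art link prediction performance in terms of MRR and Hits@1 on FB15k-237, and Hits@1 on WN18RR. We conduct an extensive ablation study to examine the effects of the relation-aware layer and the common layer of ComDensE. Experimental results show that the combined dense architecture implemented in ComDensE achieves the best performance.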
Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow, fast models which can scale to large knowledge graphs. However, these models learn less expressive features than deep, multi-layer models, which potentially limits performance. In this work we introduce ConvE, a multi-layer convolutional network model for link prediction, and report state-of-the-art results for several established datasets. We also show that the model is highly parameter efficient, yielding the same performance as DistMult and R-GCN with 8x and 17x fewer parameters. Analysis of our model suggests that it is particularly effective at modelling nodes with high indegree, which are common in highly connected, complex knowledge graphs such as Freebase and YAGO3. In addition, it has been noted that the WN18 and FB15k datasets suffer from test set leakage, due to inverse relations from the training set being present in the test set; however, the extent of this issue has so far not been quantified. We find this problem to be severe: a simple rule-based model can achieve state-of-the-art results on both WN18 and FB15k. To ensure that models are evaluated on datasets where simply exploiting inverse relations cannot yield competitive results, we investigate and validate several commonly used datasets, deriving robust variants where necessary. We then perform experiments on these robust datasets for our own and several previously proposed models, and find that ConvE achieves state-of-the-art Mean Reciprocal Rank across most datasets.
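The inverse-relation leakage described above can be illustrated with a small check. This is a sketch of the general idea, not the rule-based model from the paper: the overlap threshold of 0.8, the function names, and the toy triples are all assumptions made for illustration.

```python
def find_inverse_pairs(train, threshold=0.8):
    """Return (r1, r2) pairs where at least `threshold` of the (h, r1, t)
    training triples have a matching (t, r2, h) training triple."""
    pairs_by_relation = {}
    for head, relation, tail in train:
        pairs_by_relation.setdefault(relation, set()).add((head, tail))
    inverse_pairs = []
    for r1, pairs1 in pairs_by_relation.items():
        flipped = {(tail, head) for head, tail in pairs1}
        for r2, pairs2 in pairs_by_relation.items():
            if r1 == r2:
                continue
            if len(flipped & pairs2) / len(pairs1) >= threshold:
                inverse_pairs.append((r1, r2))
    return inverse_pairs


def leaked_test_triples(train, test, inverse_pairs):
    """Test triples that can be answered by flipping a known training triple."""
    train_set = set(train)
    inverses = {}
    for r1, r2 in inverse_pairs:
        inverses.setdefault(r1, []).append(r2)
    return [
        (head, relation, tail)
        for head, relation, tail in test
        if any((tail, r2, head) in train_set for r2 in inverses.get(relation, []))
    ]


# Hypothetical toy data in the style of WordNet relations.
train = [
    ("dog", "hypernym", "animal"), ("animal", "hyponym", "dog"),
    ("cat", "hypernym", "animal"), ("animal", "hyponym", "cat"),
    ("animal", "hyponym", "bird"),
]
test = [("bird", "hypernym", "animal")]
inverse_pairs = find_inverse_pairs(train)
leaked = leaked_test_triples(train, test, inverse_pairs)
```

Here the single test triple is flagged as leaked because its inverse already appears in the training set, which is the kind of shortcut that robust dataset variants are meant to remove.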
Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: Link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to deal with the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved by enriching them with an encoder model to accumulate evidence over multiple inference steps in the relational graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.
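Knowledge graph embedding research has mainly focused on the two smallest normed division algebras, $\mathbb{R}$ and $\mathbb{C}$. Recent results suggest that trilinear products of quaternion-valued embeddings can be a more effective means of tackling link prediction. In addition, models based on convolutions over real-valued embeddings often yield state-of-the-art link prediction results. In this paper, we investigate a composition of convolution operations with hypercomplex multiplications. We propose four approaches, QMult, OMult, ConvQ, and ConvO, to tackle the link prediction problem. QMult and OMult can be considered quaternion and octonion extensions of previous state-of-the-art approaches, including DistMult and ComplEx. ConvQ and ConvO build upon QMult and OMult by including convolution operations in a way inspired by the residual learning framework. We evaluated our approaches on seven link prediction datasets, including WN18RR, FB15K-237, and YAGO3-10. Experimental results suggest that the benefits of learning hypercomplex-valued vector representations become more apparent as the size and complexity of the knowledge graph grow. ConvO outperforms state-of-the-art approaches on FB15K-237 in MRR, Hits@1, and Hits@3, while QMult, OMult, ConvQ, and ConvO outperform state-of-the-art approaches on YAGO3-10 in all metrics. The results also suggest that link prediction performance can be further improved via prediction averaging. To foster reproducible research, we provide an open-source implementation of our approaches, including training and evaluation scripts as well as pretrained models.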
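The quaternion multiplication underlying these models is the Hamilton product. The sketch below implements it on 4-tuples and combines it into a trilinear score in the style commonly used by quaternion embedding models: multiply head and relation coordinate-wise, then take the inner product with the tail. The exact scoring function of QMult (for example any normalization of the relation) is not given in the abstract, so `quaternion_score` is an assumption for illustration only.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_score(head, relation, tail):
    """Assumed trilinear score: sum over embedding coordinates of the inner
    product between (head_i * relation_i) and tail_i, viewed as 4D vectors."""
    score = 0.0
    for h, r, t in zip(head, relation, tail):
        rotated = hamilton_product(h, r)
        score += sum(x * y for x, y in zip(rotated, t))
    return score


I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)
```

Because the Hamilton product is not commutative (i times j is k, but j times i is minus k), such a score can distinguish (h, r, t) from (t, r, h), which a real-valued DistMult score cannot.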
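In this paper, we introduce a novel GNN-based knowledge graph embedding model, named WGE, that captures both entity-focused and relation-focused graph structure. In particular, given a knowledge graph, WGE builds a single undirected entity-focused graph that views entities as nodes. In addition, WGE constructs another single undirected graph from relation-focused constraints, which views both entities and relations as nodes. WGE then proposes a new architecture that applies two vanilla GNNs directly to these two single graphs to better update the vector representations of entities and relations, followed by a weighted score function that returns the triple scores. Experimental results show that WGE obtains state-of-the-art performance on the three new and challenging CoDEx benchmark datasets for knowledge graph completion.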
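The two graph views described above can be sketched as follows. The abstract does not spell out the edge construction, so this is an assumed reading: the entity-focused graph links the head and tail of every triple, and the relation-focused graph links each relation node to the head and tail it appears with. The function names and the `rel:` prefix used to keep relation nodes distinct from entity nodes are hypothetical.

```python
def entity_focused_graph(triples):
    """Undirected edges between the head and tail entities of each triple."""
    return {frozenset((head, tail)) for head, _, tail in triples if head != tail}


def relation_focused_graph(triples):
    """Undirected edges head-relation and relation-tail, with entities and
    relations both treated as nodes (assumed construction)."""
    edges = set()
    for head, relation, tail in triples:
        relation_node = "rel:" + relation
        edges.add(frozenset((head, relation_node)))
        edges.add(frozenset((relation_node, tail)))
    return edges


# Hypothetical toy triples.
toy_triples = [("a", "r1", "b"), ("b", "r1", "c")]
entity_edges = entity_focused_graph(toy_triples)
relation_edges = relation_focused_graph(toy_triples)
```

Each edge set could then be handed to a separate GNN, with the two resulting sets of node representations combined by the weighted score function mentioned in the abstract.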