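Knowledge graphs, such as Wikidata, comprise both structural and textual knowledge in order to represent knowledge. For each of the two modalities, dedicated approaches (graph embeddings and language models, respectively) learn patterns that allow for predicting novel structural knowledge. Few approaches have integrated learning and inference with both modalities, and these existing ones could only partially exploit the interaction of structural and textual knowledge. In our approach, we build on existing strong representations of single modalities and use hypercomplex algebra to represent both (i) single-modality embeddings and (ii) the interaction between different modalities and their complementary means of knowledge representation. More specifically, we suggest Dihedron and Quaternion representations of 4D hypercomplex numbers to integrate four modalities, namely structural knowledge graph embeddings, word-level representations (e.g., word2vec, fastText), sentence-level representations (sentence transformer), and document-level representations (sentence transformer, doc2vec). Our unified vector representation scores the plausibility of labelled edges via Hamilton and Dihedron products, thus modeling pairwise interactions between different modalities. Extensive experimental evaluation on standard benchmark datasets shows the superiority of our two new models, which use textual information in addition to sparse structural knowledge to improve performance on link prediction tasks.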
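The Hamilton-product scoring can be illustrated with a minimal NumPy sketch. It assumes, hypothetically, that each entity and relation is a `[k, 4]` array whose four quaternion components come from the four modalities (structural, word-, sentence-, and document-level vectors, each of dimension `k`). It then scores a triple QuatE-style: the head is rotated by the unit-normalized relation, and the result is compared with the tail through an inner product. The helper names are illustrative, the Dihedron product is omitted, and this is not the authors' implementation.

```python
import numpy as np


def hamilton(p, q):
    """Hamilton product of quaternion arrays shaped [..., 4], laid out as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = np.moveaxis(p, -1, 0)
    a2, b2, c2, d2 = np.moveaxis(q, -1, 0)
    return np.stack(
        [
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        ],
        axis=-1,
    )


def stack_modalities(structural, word, sentence, document):
    """Hypothetical fusion: one k-dim modality vector per quaternion component -> shape [k, 4]."""
    return np.stack([structural, word, sentence, document], axis=-1)


def score(head, relation, tail):
    """QuatE-style plausibility: <head (x) relation/|relation|, tail>, summed over the k quaternions."""
    unit_relation = relation / np.linalg.norm(relation, axis=-1, keepdims=True)
    return float(np.sum(hamilton(head, unit_relation) * tail))
```

The Hamilton product is non-commutative (i times j gives k, but j times i gives -k), so the order of head and relation matters. This non-commutativity is what lets the product mix every component, and hence every modality, with every other one.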
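Knowledge embedding (KE) represents a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs, but they cannot adequately represent the abundant long-tail entities in real-world KGs, which have limited structural information. Description-based methods leverage textual information and language models. Prior approaches in this direction barely outperform structure-based ones and suffer from problems such as expensive negative sampling and restrictive description requirements. In this paper, we propose LMKE, which adopts language models to derive knowledge embeddings, aiming both to enrich the representations of long-tail entities and to solve the problems of prior description-based methods. We formulate description-based KE learning within a contrastive learning framework to improve efficiency in training and evaluation. Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks for link prediction and triple classification, especially for long-tail entities.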
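A contrastive formulation of description-based KE typically scores each (head, relation) query against every tail in the batch. The other tails then act as negatives, at no extra encoding cost. The NumPy sketch below computes a generic InfoNCE loss under that assumption. The embeddings would come from a language-model encoder (omitted here), and the temperature is an arbitrary illustrative value, not LMKE's setting.

```python
import numpy as np


def info_nce_loss(queries, targets, temperature=0.05):
    """In-batch contrastive loss: row i of `queries` should match row i of `targets`.

    queries: [batch, dim] embeddings of (head, relation) descriptions
    targets: [batch, dim] embeddings of the corresponding tail descriptions
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = q @ t.T / temperature  # cosine similarity of every query with every tail
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # positives sit on the diagonal
```

A batch of `B` pairs yields `B - 1` negatives per query from a single forward pass. That is the efficiency argument behind in-batch contrastive training.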
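Scholarly knowledge graphs (KGs) provide a rich source of structured information representing the knowledge encoded in scientific publications. Given the sheer volume of published scientific literature, which comprises a plethora of inhomogeneous entities and relations describing scientific concepts, these KGs are inherently incomplete. We present exBERT, a method that leverages pre-trained transformer language models to perform scholarly knowledge graph completion. We model the triples of a knowledge graph as text and perform triple classification (i.e., whether a triple belongs to the KG or not). The evaluation shows that exBERT outperforms other baselines on three scholarly KG completion datasets in the tasks of triple classification, link prediction, and relation prediction. Furthermore, we present two scholarly datasets, collected from public KGs and online resources, as resources for the research community.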
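Treating triples as text and classifying them requires labelled positives and negatives. The sketch below shows one way to prepare the classifier input. It assumes a KG-BERT-style serialization with `[CLS]`/`[SEP]` markers and creates negatives by corrupting the head or tail; both are assumptions, not necessarily exBERT's exact recipe. The transformer classifier itself is omitted.

```python
import random


def serialize(triple):
    """Turn (head, relation, tail) into a single text sequence for a transformer classifier."""
    head, relation, tail = triple
    return f"[CLS] {head} [SEP] {relation} [SEP] {tail} [SEP]"


def build_classification_data(triples, entities, seed=0, max_tries=100):
    """Return (text, label) pairs: label 1 for KG triples, 0 for corrupted triples not in the KG."""
    rng = random.Random(seed)
    known = set(triples)
    data = []
    for head, relation, tail in triples:
        data.append((serialize((head, relation, tail)), 1))
        for _ in range(max_tries):
            entity = rng.choice(entities)
            if rng.random() < 0.5:
                negative = (entity, relation, tail)  # corrupt the head
            else:
                negative = (head, relation, entity)  # corrupt the tail
            if negative not in known:
                data.append((serialize(negative), 0))
                break
    return data
```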
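Information extraction methods have proven effective at extracting triples from structured or unstructured data. A collection of such triples, organized in the form (head entity, relation, tail entity), is called a knowledge graph (KG). Most current knowledge graphs are incomplete. To use KGs in downstream tasks, it is desirable to predict the missing links in them. Recently, various approaches to KG representation have been proposed that embed entities and relations into a low-dimensional vector space, aiming to predict triples based on previously seen triples. Depending on whether triples are processed independently or dependently, we divide knowledge graph completion into traditional and graph neural network (GNN) representation learning and discuss both in more detail. In traditional approaches, each triple is processed independently; in GNN-based approaches, triples also take their local neighborhood into account.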
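The distinction between the two families can be made concrete with a toy NumPy sketch. A traditional model scores each triple from its own embeddings alone; TransE is used here only as a familiar example. A GNN-style encoder instead first mixes an entity's embedding with those of its neighbors. The mean aggregation below is a deliberately simplified, hypothetical message-passing step, not any specific GNN covered by the survey.

```python
import numpy as np


def transe_score(h, r, t):
    """Traditional, triple-independent scoring: higher (closer to 0) means more plausible."""
    return -float(np.linalg.norm(h + r - t))


def mean_neighbor_encoding(embeddings, triples, entity):
    """One simplified message-passing step: average an entity's embedding with its graph neighbors'."""
    outgoing = {t for h, _, t in triples if h == entity}
    incoming = {h for h, _, t in triples if t == entity}
    vectors = [embeddings[entity]] + [embeddings[n] for n in sorted(outgoing | incoming)]
    return np.mean(vectors, axis=0)
```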
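Due to the incompleteness of knowledge graphs (KGs), zero-shot link prediction (ZSLP), which aims to predict unobserved relations in KGs, has recently attracted the interest of researchers. A common solution is to use textual features of relations (e.g., surface names or textual descriptions) as auxiliary information to bridge the gap between seen and unseen relations. Current approaches learn an embedding for each word token in the text. These methods lack robustness because they suffer from the out-of-vocabulary (OOV) problem. Meanwhile, models built on character n-grams can generate expressive representations for OOV words. Therefore, in this paper, we propose a Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which considers the dependencies among the n-grams of relations for ZSLP. Our approach first constructs a hierarchical n-gram graph on the surface name to model the organizational structure of the n-grams that make up the surface name. A Transformer-based GramTransformer is then presented to model the hierarchical n-gram graph and construct relation embeddings for ZSLP. Experimental results show that the proposed HNZSLP achieves state-of-the-art performance on two ZSLP datasets.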
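Character n-grams sidestep the OOV problem because any surface name, even an unseen one, decomposes into known n-grams. The sketch below uses one plausible reading of the hierarchical n-gram graph, assumed here for illustration rather than taken from the paper. Each n-gram is linked to the two (n-1)-grams it is built from (its prefix and suffix), level by level up to the full surface name. Encoding the graph with a GramTransformer is not sketched.

```python
def char_ngrams(text, n):
    """All contiguous character n-grams of `text`, left to right."""
    return [text[i : i + n] for i in range(len(text) - n + 1)]


def hierarchical_ngram_graph(name):
    """Nodes: every n-gram of `name` for n = 1..len(name).

    Edges: (sub-gram, gram) for the prefix and suffix (n-1)-grams that compose each n-gram.
    """
    nodes, edges = set(), set()
    for n in range(1, len(name) + 1):
        for gram in char_ngrams(name, n):
            nodes.add(gram)
            if n > 1:
                edges.add((gram[:-1], gram))  # prefix -> gram
                edges.add((gram[1:], gram))  # suffix -> gram
    return nodes, edges
```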
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing work in terms of the type of KG and the method. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Knowledge graph embedding (KGE) is an increasingly popular technique that aims to represent the entities and relations of knowledge graphs in low-dimensional semantic spaces for a wide spectrum of applications such as link prediction, knowledge reasoning and knowledge completion. In this paper, we provide a systematic review of existing KGE techniques based on representation spaces. In particular, we build a fine-grained classification that categorises the models according to three mathematical perspectives on the representation spaces: (1) the algebraic perspective, (2) the geometric perspective, and (3) the analytical perspective. We introduce rigorous definitions of the fundamental mathematical spaces before diving into KGE models and their mathematical properties. We further discuss different KGE methods across the three categories and summarise how the advantages of each space serve different embedding needs. By collating experimental results from downstream tasks, we also explore the advantages of each mathematical space in different scenarios and the reasons behind them. Finally, we outline some promising research directions from a representation-space perspective, with which we hope to inspire researchers to design their KGE models and related applications with more consideration of the properties of their mathematical spaces.
Knowledge graph (KG) embedding aims to embed the components of a KG, including entities and relations, into continuous vector spaces, so as to simplify manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, covering not only the state of the art but also the latest trends. In particular, we organize the review by the type of information used in the embedding task. Techniques that conduct embedding using only the facts observed in the KG are introduced first. We describe the overall framework, specific model design, typical training procedures, and the pros and cons of such techniques. After that, we discuss techniques that incorporate additional information besides facts, focusing specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to, and benefit, a wide variety of downstream tasks such as KG completion, relation extraction, and question answering.
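Most knowledge graph embedding techniques treat entities and predicates as separate embedding matrices and use aggregation functions to build a representation of the input triple. However, these aggregations are lossy, i.e., they do not capture the semantics of the original triples, such as the information contained in the predicates. To combat these shortcomings, current methods learn triple embeddings from scratch, without utilizing the entity and predicate embeddings of pre-trained models. In this paper, we design a novel fine-tuning approach that learns triple embeddings by creating weak supervision signals from pre-trained knowledge graph embeddings. We develop a method for automatically sampling triples from a knowledge graph and estimating their pairwise similarities from pre-trained embedding models. These pairwise similarity scores are then fed to a Siamese-like neural architecture to fine-tune the triple representations. We evaluate the proposed method on two widely studied knowledge graphs and show consistent improvements over other state-of-the-art triple embedding methods on triple classification and triple clustering tasks.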
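The weak-supervision step can be sketched in NumPy. Everything here is a hypothetical assumption. A pre-trained triple is represented by concatenating its head, relation, and tail vectors, and only for computing the target similarity. The Siamese encoder is a single shared linear map `W`, applied to both triples of a pair and trained so that the cosine similarity of its outputs matches that target. The paper's actual architecture and sampling strategy are not reproduced.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def triple_vector(ent, rel, triple):
    """Hypothetical pre-trained triple representation: concatenated head, relation, tail vectors."""
    h, r, t = triple
    return np.concatenate([ent[h], rel[r], ent[t]])


def weak_similarity_targets(pairs, ent, rel):
    """Weak labels: pairwise cosine similarity under the pre-trained embeddings."""
    return [cosine(triple_vector(ent, rel, a), triple_vector(ent, rel, b)) for a, b in pairs]


def siamese_loss(W, pairs, ent, rel, targets):
    """Mean squared error between encoded-pair similarity and the weak target.

    The same W encodes both triples of a pair (shared Siamese weights).
    """
    losses = []
    for (a, b), y in zip(pairs, targets):
        za = W @ triple_vector(ent, rel, a)
        zb = W @ triple_vector(ent, rel, b)
        losses.append((cosine(za, zb) - y) ** 2)
    return float(np.mean(losses))
```

With `W` set to the identity, the encoder reproduces the targets exactly and the loss is zero. Training would move `W` away from that starting point only as far as the task-specific objective requires.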
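In recent years, there has been increasing interest in few-shot knowledge graph completion (FKGC), which aims to infer unseen query triples for a few-shot relation using a few reference triples of that relation. The primary focus of existing FKGC methods lies in learning relation representations that reflect the common information shared by the query and reference triples. To this end, these methods learn entity-pair representations from the direct neighbors of the head and tail entities, and then aggregate the representations of the reference entity pairs. However, entity-pair representations learned only from direct neighbors may have low expressiveness when the involved entities have sparse direct neighbors or share a common local neighborhood with other entities. Moreover, merely modeling the semantic information of the head and tail entities is insufficient to accurately infer their relational information, especially when they participate in multiple relations. To address these issues, we propose a Relation-Specific Context Learning (RSCL) framework, which exploits the graph contexts of triples to learn global and local relation-specific representations for few-shot relations. Specifically, we first extract the graph context of each triple, which can provide long-term entity-relation dependencies. To encode the extracted graph contexts, we present a hierarchical attention network that captures the contextual information of triples and highlights valuable local neighborhood information of entities. Finally, we design a hybrid attention aggregator to evaluate the likelihood of query triples at both the global and local levels. Experimental results on two public datasets demonstrate that RSCL outperforms state-of-the-art FKGC methods.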
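Answering natural language questions over knowledge graphs (KGQA) remains a great challenge, since understanding complex questions requires multi-hop reasoning. Previous efforts usually exploit entity-related text corpora or knowledge graph (KG) embeddings as auxiliary information to facilitate answer selection. However, the rich semantics implicit in the connections between entities remain far from well explored. This paper proposes to improve multi-hop KGQA by exploiting the hybrid semantics of relation paths. Specifically, we integrate the explicit textual information and the implicit KG structural features of relation paths based on a novel rotate-and-scale entity link prediction framework. Extensive experiments on three KGQA datasets demonstrate the superiority of our method, especially in multi-hop scenarios. Further investigation confirms our method's systematic coordination between questions and relation paths in identifying answer entities.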
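As a hypothesis, a "rotate-and-scale" relation can be read as a RotatE-style rotation in complex space combined with a scaling factor. The NumPy sketch below shows why this form suits relation paths: composing the relations along a path reduces to adding their angles and multiplying their scales. This is an illustrative assumption, not the paper's exact scoring function.

```python
import numpy as np


def rotate_and_scale(h, theta, scale):
    """Rotate each complex coordinate of h by angle theta and multiply its modulus by scale."""
    return h * scale * np.exp(1j * theta)


def score(h, theta, scale, t):
    """Distance-based plausibility of (h, r, t); 0 is a perfect fit."""
    return -float(np.linalg.norm(rotate_and_scale(h, theta, scale) - t))


def compose_path(relations):
    """Compose a relation path [(theta_1, scale_1), ...]: angles add and scales multiply."""
    theta = sum(th for th, _ in relations)
    scale = float(np.prod([s for _, s in relations]))
    return theta, scale
```

Because the operations commute coordinate-wise, applying each hop in turn gives the same result as applying the single composed relation. A multi-hop path can therefore be scored in one step.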
We address the challenge of building domain-specific knowledge models for industrial use cases, where labelled data and taxonomic information are initially scarce. Our focus is on inductive link prediction models as a basis for practical tools that support knowledge engineers with exploring text collections and discovering and linking new (so-called open-world) entities to the knowledge graph. We argue that, although neural approaches to text mining have yielded impressive results in the past years, current benchmarks do not properly reflect the typical challenges encountered in the industrial wild. Therefore, our first contribution is an open benchmark coined IRT2 (inductive reasoning with text) that (1) covers knowledge graphs of varying sizes (including very small ones), (2) comes with incidental, low-quality text mentions, and (3) includes not only triple completion but also ranking, which is relevant for supporting experts with discovery tasks. We investigate two neural models for inductive link prediction, one based on end-to-end learning and one that learns from the knowledge graph and text data in separate steps. These models compete with a strong bag-of-words baseline. The results show a significant advance in performance for the neural approaches as soon as the graph data available for linking decreases. For ranking, the results are promising, and the neural approaches outperform the sparse retriever by a wide margin.
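Knowledge graphs (KGs) model world knowledge as structured triples and are inevitably incomplete. Multimodal knowledge graphs (MMKGs) suffer from the same problem. Knowledge graph completion (KGC) is therefore crucial for predicting the missing triples in existing KGs. Among existing KGC methods, embedding-based methods rely on manual design to leverage multimodal information, while fine-tuning-based methods do not outperform embedding-based methods in link prediction. To address these problems, we propose a VisualBERT-enhanced Knowledge Graph Completion model (VBKGC for short). VBKGC captures deeply fused multimodal information for entities and integrates it into the KGC model. Moreover, we achieve the co-design of the KGC model and negative sampling by designing a new negative sampling strategy called twins negative sampling. Twins negative sampling is suited to multimodal scenarios and can align the different embeddings of entities. We conduct extensive experiments showing the outstanding performance of VBKGC on link prediction tasks and explore VBKGC further.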
The development of deep neural networks has improved representation learning in various domains, including textual, graph structural, and relational triple representations. This development opened the door to new forms of relation extraction beyond traditional text-oriented relation extraction. However, the effectiveness of considering multiple heterogeneous domain information simultaneously is still under exploration, and a model that can take advantage of integrating heterogeneous information is expected to contribute significantly to many real-world problems. This thesis takes Drug-Drug Interactions (DDIs) from the literature as a case study and realizes relation extraction that utilizes heterogeneous domain information. First, a deep neural relation extraction model is prepared and its attention mechanism is analyzed. Next, a method that combines drug molecular structure information and drug description information with the input sentence information is proposed, and the effectiveness of utilizing drug molecular structures and drug descriptions for the relation extraction task is shown. Then, to further exploit the heterogeneous information, drug-related items such as protein entries, medical terms and pathways are collected from multiple existing databases, and a new data set in the form of a knowledge graph (KG) is constructed. A link prediction task is conducted on the constructed data set to obtain embedding representations of drugs that contain the heterogeneous domain information. Finally, a method that integrates the input sentence information and the heterogeneous KG information is proposed. The proposed model is trained and evaluated on a widely used data set, and the results show that utilizing heterogeneous domain information significantly improves the performance of relation extraction from the literature.
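The heterogeneity in the implementation, training, and evaluation of recently published knowledge graph embedding models has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 interaction models in the PyKEEN software package. Here, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, and we provide insight into why this might be the case. We then performed a large-scale benchmarking on four datasets with several thousand experiments and 24,804 GPU hours of computation time. We present insights gained into best practices, the best configurations for each model, and where improvements could be made over previously published best configurations. Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model's performance, which is not determined by the model architecture alone. We provide evidence that several architectures can obtain results competitive with the state of the art when configured carefully. We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/pykeen and https://github.com/pykeen/benchmarking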
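Explicitly modeling inverse relations usually means augmenting the training set with a reversed copy of each triple under a new relation label. Head prediction can then be answered as tail prediction. The plain-Python sketch below illustrates the idea. The `_inverse` suffix is a hypothetical naming convention, and PyKEEN handles this through its own configuration, which is not reproduced here.

```python
def add_inverse_triples(triples, suffix="_inverse"):
    """Return the original triples plus (tail, relation + suffix, head) for each one."""
    inverse = [(tail, relation + suffix, head) for head, relation, tail in triples]
    return list(triples) + inverse


def head_query_as_tail_query(relation, tail, suffix="_inverse"):
    """A head-prediction query (?, relation, tail) becomes the tail query (tail, relation + suffix, ?)."""
    return tail, relation + suffix
```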
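To alleviate the challenge of building a knowledge graph (KG) from scratch, a more general task is to enrich a KG with triples from an open corpus, where the obtained triples contain noisy entities and relations. It is challenging to enrich a KG with newly harvested triples while maintaining the quality of the knowledge representation. This paper proposes a system that refines a KG using information harvested from an additional corpus. To this end, we formulate the task as two coupled sub-tasks, namely joint event extraction (JEE) and knowledge graph fusion (KGF). We then propose a Collaborative Knowledge Graph Fusion framework that allows the sub-tasks to assist each other in an alternating manner. More concretely, the explorer carries out JEE, supervised both by ground-truth annotations and by an existing KG provided by the supervisor. The supervisor then evaluates the triples extracted by the explorer and enriches the KG with the highly ranked ones. To implement this evaluation, we further propose a Translated Relation Alignment Scoring Mechanism that aligns and translates the extracted triples to the prior KG. Experiments verify that this collaboration improves the performance of both JEE and KGF.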
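Knowledge graph (KG) embeddings have shown great power in learning representations of entities and relations for link prediction tasks. Previous work usually embeds KGs into a single geometric space, such as Euclidean space (zero curvature), hyperbolic space (negative curvature) or hyperspherical space (positive curvature), to maintain their specific geometric structures (e.g., chain, hierarchy and ring structures). However, the topological structure of KGs appears to be complicated, since a KG may contain multiple types of geometric structures simultaneously. Therefore, embedding KGs in a single space, whether Euclidean, hyperbolic or hyperspherical, cannot accurately capture their complex structures. To overcome this challenge, we propose Geometry Interaction knowledge graph Embeddings (GIE), which learns spatial structures interactively between the Euclidean, hyperbolic and hyperspherical spaces. Theoretically, our proposed GIE can capture a richer set of relational information, model key inference patterns, and enable expressive semantic matching across entities. Experimental results on three well-established knowledge graph completion benchmarks show that GIE achieves state-of-the-art performance with fewer parameters.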
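The three single-space choices differ mainly in their distance function. The NumPy sketch below writes out three standard distances: Euclidean, hyperbolic in the Poincaré ball with curvature -1, and the geodesic angle on the unit hypersphere. It then combines them with softmax weights. That fixed mixture is a simplified assumption for illustration only. It is not GIE's interaction mechanism, which learns the spatial structures interactively.

```python
import numpy as np


def euclidean_distance(x, y):
    """Flat space (zero curvature)."""
    return float(np.linalg.norm(x - y))


def poincare_distance(x, y):
    """Hyperbolic space (curvature -1) in the Poincare ball; requires ||x||, ||y|| < 1."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x**2)) * (1.0 - np.sum(y**2))
    return float(np.arccosh(1.0 + 2.0 * sq / denom))


def spherical_distance(x, y):
    """Hypersphere (curvature +1): geodesic angle after projecting both vectors onto the unit sphere."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def mixed_distance(x, y, logits):
    """Hypothetical mixture: softmax(logits) weights the three geometries' distances."""
    weights = np.exp(logits - np.max(logits))
    weights = weights / weights.sum()
    distances = np.array([euclidean_distance(x, y), poincare_distance(x, y), spherical_distance(x, y)])
    return float(weights @ distances)
```

As a sanity check, the Poincaré distance from the origin to a point of norm r equals 2 artanh(r). Distances therefore grow without bound near the ball's boundary, which is what gives hyperbolic space room for tree-like hierarchies.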
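Link prediction plays an important role in knowledge graphs, which are an important resource for many artificial intelligence tasks but are often limited by incompleteness. In this paper, we propose a knowledge graph BERT for link prediction, named LP-BERT, which has two training stages: multi-task pre-training and knowledge graph fine-tuning. The pre-training strategy not only uses the Masked Language Model (MLM) to learn knowledge from the context corpus, but also introduces the Masked Entity Model (MEM) and the Masked Relation Model (MRM), which learn the relational information of triples by predicting semantics based on entity and relation elements. In this way, structured triple relation information is transformed into unstructured semantic information, which can be integrated into the training model together with the context corpus information. In the fine-tuning stage, inspired by contrastive learning, we perform triple-style negative sampling within a sample batch, which greatly increases the proportion of negative samples while keeping the training time almost unchanged. Furthermore, we propose a data augmentation method based on the inverse relations of triples to further increase sample diversity. We achieve state-of-the-art results on the WN18RR and UMLS datasets; in particular, the Hits@10 metric on WN18RR improves by 5% over the previous state-of-the-art result.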
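Hits@10 is the fraction of test queries whose correct entity is ranked within the top 10 candidates. The NumPy sketch below implements this rank-based evaluation, including the common filtered setting and mean reciprocal rank (MRR). Tie handling varies between papers; the optimistic convention used here is an assumption.

```python
import numpy as np


def rank_of_true(scores, true_idx, filter_idx=()):
    """1-based rank of the true candidate, where a higher score is better.

    filter_idx: other known-true candidates removed before ranking ("filtered" setting).
    Ties are resolved optimistically: only strictly higher scores push the rank down.
    """
    scores = np.asarray(scores, dtype=float).copy()
    others = [i for i in filter_idx if i != true_idx]
    if others:
        scores[others] = -np.inf
    return int(1 + np.sum(scores > scores[true_idx]))


def hits_at_k(ranks, k=10):
    """Fraction of queries whose true entity is ranked at position k or better."""
    return float(np.mean(np.asarray(ranks) <= k))


def mean_reciprocal_rank(ranks):
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))
```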
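Knowledge graph (KG) embedding seeks to learn vector representations of entities and relations. Conventional models reason over graph structures, but they suffer from graph incompleteness and long-tail entities. Recent studies have used pre-trained language models to learn embeddings from the textual information of entities and relations, but they cannot take advantage of graph structures. In this paper, we show empirically that these two kinds of features are complementary for KG embedding. To this end, we propose CoLE, a co-distillation learning method for KG embedding that exploits the complementarity of graph structures and text information. Its graph embedding model uses a Transformer to reconstruct the representation of an entity from its neighborhood subgraph. Its text embedding model uses a pre-trained language model to generate entity representations from soft prompts built from their names, descriptions, and relational neighbors. To let the two models promote each other, we propose co-distillation learning, which allows them to distill selective knowledge from each other's prediction logits. In our co-distillation learning, each model serves as both teacher and student. Experiments on benchmark datasets demonstrate that both models outperform their related baselines, and that the ensemble method CoLE, with co-distillation learning, advances the state of the art in KG embedding.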
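Co-distillation can be sketched as two models, each adding a distillation term that pulls its predictions toward the other's. The NumPy version below uses the standard temperature-softened KL divergence over candidate-entity logits. The `temperature` and `alpha` values are arbitrary. CoLE's selective distillation, which decides which knowledge to transfer, is not modeled.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Mean over the batch of KL(p || q)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def cross_entropy(logits, labels):
    probs = softmax(logits)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))


def co_distillation_losses(logits_a, logits_b, labels, temperature=2.0, alpha=0.5):
    """Each model learns from the labels and from the other model's softened predictions.

    In an autodiff framework the teacher distribution in each KL term would be detached,
    so model A is only pulled toward B and vice versa.
    """
    soft_a = softmax(logits_a, temperature)
    soft_b = softmax(logits_b, temperature)
    scale = alpha * temperature**2  # conventional T^2 factor from knowledge distillation
    loss_a = cross_entropy(logits_a, labels) + scale * kl_divergence(soft_b, soft_a)
    loss_b = cross_entropy(logits_b, labels) + scale * kl_divergence(soft_a, soft_b)
    return loss_a, loss_b
```

When the two models already agree, the distillation terms vanish and each loss reduces to ordinary cross-entropy. Disagreement adds a penalty that both models share, which is the "each model is both teacher and student" idea.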
Sparsity of formal knowledge and roughness of non-ontological construction make the sparsity problem particularly prominent in Open Knowledge Graphs (OpenKGs). Due to sparse links, learning effective representations for few-shot entities becomes difficult. We hypothesize that, by introducing negative samples, a contrastive learning (CL) formulation could be beneficial in such scenarios. However, existing CL methods model KG triplets as binary objects of entities, ignoring relation-guided ternary propagation patterns, and they are too generic, i.e., they ignore the zero-shot, few-shot and synonymity problems that appear in OpenKGs. To address this, we propose TernaryCL, a CL framework based on ternary propagation patterns among head, relation and tail. TernaryCL designs Contrastive Entity and Contrastive Relation to mine ternary discriminative features with both negative entities and negative relations, introduces Contrastive Self to help zero- and few-shot entities learn discriminative features, Contrastive Synonym to model synonymous entities, and Contrastive Fusion to aggregate graph features from multiple paths. Extensive experiments on benchmarks demonstrate the superiority of TernaryCL over state-of-the-art models.