Graph is an important data representation which appears in a wide diversity of real-world scenarios. Effective graph analytics provides users a deeper understanding of what is behind the data, and thus can benefit a lot of useful applications such as node classification, node recommendation, link prediction, etc. However, most graph analytics methods suffer the high computation and space cost. Graph embedding is an effective yet efficient way to solve the graph analytics problem. It converts the graph data into a low dimensional space in which the graph structural information and graph properties are maximumly preserved. In this survey, we conduct a comprehensive review of the literature in graph embedding. We first introduce the formal definition of graph embedding as well as the related concepts. After that, we propose two taxonomies of graph embedding which correspond to what challenges exist in different graph embedding problem settings and how the existing work address these challenges in their solutions. Finally, we summarize the applications that graph embedding enables and suggest four promising future research directions in terms of computation efficiency, problem settings, techniques and application scenarios.
translated by 谷歌翻译
图表是一个宇宙数据结构,广泛用于组织现实世界中的数据。像交通网络,社交和学术网络这样的各种实际网络网络可以由图表代表。近年来,目睹了在网络中代表顶点的快速发展,进入低维矢量空间,称为网络表示学习。表示学习可以促进图形数据上的新算法的设计。在本调查中,我们对网络代表学习的当前文献进行了全面审查。现有算法可以分为三组:浅埋模型,异构网络嵌入模型,图形神经网络的模型。我们为每个类别审查最先进的算法,并讨论这些算法之间的基本差异。调查的一个优点是,我们系统地研究了不同类别的算法底层的理论基础,这提供了深入的见解,以更好地了解网络表示学习领域的发展。
translated by 谷歌翻译
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
translated by 谷歌翻译
Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.
translated by 谷歌翻译
Pre-publication draft of a book to be published byMorgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
translated by 谷歌翻译
Deep learning has been shown to be successful in a number of domains, ranging from acoustics, images, to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, substantial research efforts have been devoted to applying deep learning methods to graphs, resulting in beneficial advances in graph analysis techniques. In this survey, we comprehensively review the different types of deep learning methods on graphs. We divide the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods. We then provide a comprehensive overview of these methods in a systematic manner mainly by following their development history. We also analyze the differences and compositions of different methods. Finally, we briefly outline the applications in which they have been used and discuss potential future research directions.
translated by 谷歌翻译
Knowledge graph (KG) embedding is to embed components of a KG including entities and relations into continuous vector spaces, so as to simplify the manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, including not only the state-of-the-arts but also those with latest trends. Particularly, we make the review based on the type of information used in the embedding task. Techniques that conduct embedding using only facts observed in the KG are first introduced. We describe the overall framework, specific model design, typical training procedures, as well as pros and cons of such techniques. After that, we discuss techniques that further incorporate additional information besides facts. We focus specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to and benefit a wide variety of downstream tasks such as KG completion, relation extraction, question answering, and so forth.
translated by 谷歌翻译
人类每天产生的exabytes数据,导致越来越需要对大数据带来的多标签学习的大挑战的新努力。例如,极端多标签分类是一个有效且快速增长的研究区域,可以处理具有极大数量的类或标签的分类任务;利用具有有限监督的大规模数据构建一个多标签分类模型对实际应用变得有价值。除此之外,如何收获深度学习的强大学习能力,有巨大努力,以更好地捕获多标签的标签依赖性学习,这是深入学习解决现实世界分类任务的关键。然而,有人指出,缺乏缺乏系统性研究,明确关注分析大数据时代的多标签学习的新兴趋势和新挑战。呼吁综合调查旨在满足这项任务和描绘未来的研究方向和新应用。
translated by 谷歌翻译
Graphs are ubiquitous in nature and can therefore serve as models for many practical but also theoretical problems. For this purpose, they can be defined as many different types which suitably reflect the individual contexts of the represented problem. To address cutting-edge problems based on graph data, the research field of Graph Neural Networks (GNNs) has emerged. Despite the field's youth and the speed at which new models are developed, many recent surveys have been published to keep track of them. Nevertheless, it has not yet been gathered which GNN can process what kind of graph types. In this survey, we give a detailed overview of already existing GNNs and, unlike previous surveys, categorize them according to their ability to handle different graph types and properties. We consider GNNs operating on static and dynamic graphs of different structural constitutions, with or without node or edge attributes. Moreover, we distinguish between GNN models for discrete-time or continuous-time dynamic graphs and group the models according to their architecture. We find that there are still graph types that are not or only rarely covered by existing GNN models. We point out where models are missing and give potential reasons for their absence.
translated by 谷歌翻译
Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this paper, we propose a search efficient binary network embedding algorithm called BinaryNE to learn a binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations through a stochastic gradient descent based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support faster node similarity search than using Euclidean distance or other distance measures. Extensive experiments and comparisons demonstrate that BinaryNE not only delivers more than 25 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods. The binary codes learned by BinaryNE also render competitive performance on node classification and node clustering tasks. The source code of this paper is available at https://github.com/daokunzhang/BinaryNE.
translated by 谷歌翻译
在过去十年中,图形内核引起了很多关注,并在结构化数据上发展成为一种快速发展的学习分支。在过去的20年中,该领域发生的相当大的研究活动导致开发数十个图形内核,每个图形内核都对焦于图形的特定结构性质。图形内核已成功地成功地在广泛的域中,从社交网络到生物信息学。本调查的目标是提供图形内核的文献的统一视图。特别是,我们概述了各种图形内核。此外,我们对公共数据集的几个内核进行了实验评估,并提供了比较研究。最后,我们讨论图形内核的关键应用,并概述了一些仍有待解决的挑战。
translated by 谷歌翻译
时间图代表实体之间的动态关系,并发生在许多现实生活中的应用中,例如社交网络,电子商务,通信,道路网络,生物系统等。他们需要根据其生成建模和表示学习的研究超出与静态图有关的研究。在这项调查中,我们全面回顾了近期针对处理时间图提出的神经时间依赖图表的学习和生成建模方法。最后,我们确定了现有方法的弱点,并讨论了我们最近发表的论文提格的研究建议[24]。
translated by 谷歌翻译
事实证明,信息提取方法可有效从结构化或非结构化数据中提取三重。以(头部实体,关系,尾部实体)形式组织这样的三元组的组织称为知识图(kgs)。当前的大多数知识图都是不完整的。为了在下游任务中使用kgs,希望预测kgs中缺少链接。最近,通过将实体和关系嵌入到低维的矢量空间中,旨在根据先前访问的三元组来预测三元组,从而对KGS表示不同的方法。根据如何独立或依赖对三元组进行处理,我们将知识图完成的任务分为传统和图形神经网络表示学习,并更详细地讨论它们。在传统的方法中,每个三重三倍将独立处理,并在基于GNN的方法中进行处理,三倍也考虑了他们的当地社区。查看全文
translated by 谷歌翻译
图表表示学习是一种快速增长的领域,其中一个主要目标是在低维空间中产生有意义的图形表示。已经成功地应用了学习的嵌入式来执行各种预测任务,例如链路预测,节点分类,群集和可视化。图表社区的集体努力提供了数百种方法,但在所有评估指标下没有单一方法擅长,例如预测准确性,运行时间,可扩展性等。该调查旨在通过考虑算法来评估嵌入方法的所有主要类别的图表变体,参数选择,可伸缩性,硬件和软件平台,下游ML任务和多样化数据集。我们使用包含手动特征工程,矩阵分解,浅神经网络和深图卷积网络的分类法组织了图形嵌入技术。我们使用广泛使用的基准图表评估了节点分类,链路预测,群集和可视化任务的这些类别算法。我们在Pytorch几何和DGL库上设计了我们的实验,并在不同的多核CPU和GPU平台上运行实验。我们严格地审查了各种性能指标下嵌入方法的性能,并总结了结果。因此,本文可以作为比较指南,以帮助用户选择最适合其任务的方法。
translated by 谷歌翻译
关于图表的深度学习最近吸引了重要的兴趣。然而,大多数作品都侧重于(半)监督学习,导致缺点包括重标签依赖,普遍性差和弱势稳健性。为了解决这些问题,通过良好设计的借口任务在不依赖于手动标签的情况下提取信息知识的自我监督学习(SSL)已成为图形数据的有希望和趋势的学习范例。与计算机视觉和自然语言处理等其他域的SSL不同,图表上的SSL具有独家背景,设计理念和分类。在图表的伞下自我监督学习,我们对采用图表数据采用SSL技术的现有方法及时及全面的审查。我们构建一个统一的框架,数学上正式地规范图表SSL的范例。根据借口任务的目标,我们将这些方法分为四类:基于生成的,基于辅助性的,基于对比的和混合方法。我们进一步描述了曲线图SSL在各种研究领域的应用,并总结了绘图SSL的常用数据集,评估基准,性能比较和开源代码。最后,我们讨论了该研究领域的剩余挑战和潜在的未来方向。
translated by 谷歌翻译
受到计算机愿景和语言理解的深度学习的巨大成功的影响,建议的研究已经转移到发明基于神经网络的新推荐模型。近年来,我们在开发神经推荐模型方面目睹了显着进展,这概括和超越了传统的推荐模型,由于神经网络的强烈代表性。在本调查论文中,我们从建议建模与准确性目标的角度进行了系统审查,旨在总结该领域,促进研究人员和从业者在推荐系统上工作的研究人员和从业者。具体而具体基于推荐建模期间的数据使用,我们将工作划分为协作过滤和信息丰富的建议:1)协作滤波,其利用用户项目交互数据的关键来源; 2)内容丰富的建议,其另外利用与用户和项目相关的侧面信息,如用户配置文件和项目知识图; 3)时间/顺序推荐,其考虑与交互相关的上下文信息,例如时间,位置和过去的交互。在为每种类型审查代表性工作后,我们终于讨论了这一领域的一些有希望的方向。
translated by 谷歌翻译
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from other nodes in the graph, while link prediction is to predict whether two nodes in a network are likely to have a link. The definition of both naturally determines that clustering must play a positive role in obtaining accurate link prediction tasks. Yet researchers have long ignored or used inappropriate ways to undermine this positive relationship. In this article, We construct a simple but efficient clustering-driven link prediction framework(ClusterLP), with the goal of directly exploiting the cluster structures to obtain connections between nodes as accurately as possible in both undirected graphs and directed graphs. Specifically, we propose that it is easier to establish links between nodes with similar representation vectors and cluster tendencies in undirected graphs, while nodes in a directed graphs can more easily point to nodes similar to their representation vectors and have greater influence in their own cluster. We customized the implementation of ClusterLP for undirected and directed graphs, respectively, and the experimental results using multiple real-world networks on the link prediction task showed that our models is highly competitive with existing baseline models. The code implementation of ClusterLP and baselines we use are available at https://github.com/ZINUX1998/ClusterLP.
translated by 谷歌翻译
最近有一项激烈的活动在嵌入非常高维和非线性数据结构的嵌入中,其中大部分在数据科学和机器学习文献中。我们分四部分调查这项活动。在第一部分中,我们涵盖了非线性方法,例如主曲线,多维缩放,局部线性方法,ISOMAP,基于图形的方法和扩散映射,基于内核的方法和随机投影。第二部分与拓扑嵌入方法有关,特别是将拓扑特性映射到持久图和映射器算法中。具有巨大增长的另一种类型的数据集是非常高维网络数据。第三部分中考虑的任务是如何将此类数据嵌入中等维度的向量空间中,以使数据适合传统技术,例如群集和分类技术。可以说,这是算法机器学习方法与统计建模(所谓的随机块建模)之间的对比度。在论文中,我们讨论了两种方法的利弊。调查的最后一部分涉及嵌入$ \ mathbb {r}^ 2 $,即可视化中。提出了三种方法:基于第一部分,第二和第三部分中的方法,$ t $ -sne,UMAP和大节。在两个模拟数据集上进行了说明和比较。一个由嘈杂的ranunculoid曲线组成的三胞胎,另一个由随机块模型和两种类型的节点产生的复杂性的网络组成。
translated by 谷歌翻译
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
translated by 谷歌翻译
图形神经网络(GNN)在解决图形结构数据(即网络)方面的各种分析任务方面已广受欢迎。典型的gnns及其变体遵循一种消息的方式,该方式通过网络拓扑沿网络拓扑的特征传播过程获得网络表示,然而,它们忽略了许多现实世界网络中存在的丰富文本语义(例如,局部单词序列)。现有的文本丰富网络方法通过主要利用内部信息(例如主题或短语/单词)来整合文本语义,这些信息通常无法全面地挖掘文本语义,从而限制了网络结构和文本语义之间的相互指导。为了解决这些问题,我们提出了一个具有外部知识(TEKO)的新型文本富裕的图形神经网络,以充分利用文本丰富的网络中的结构和文本信息。具体而言,我们首先提出一个灵活的异质语义网络,该网络结合了文档和实体之间的高质量实体和互动。然后,我们介绍两种类型的外部知识,即结构化的三胞胎和非结构化实体描述,以更深入地了解文本语义。我们进一步为构建的异质语义网络设计了互惠卷积机制,使网络结构和文本语义能够相互协作并学习高级网络表示。在四个公共文本丰富的网络以及一个大规模的电子商务搜索数据集上进行了广泛的实验结果,这说明了Teko优于最先进的基线。
translated by 谷歌翻译