Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from other nodes in the graph, while link prediction is to predict whether two nodes in a network are likely to have a link. The definition of both naturally determines that clustering must play a positive role in obtaining accurate link prediction tasks. Yet researchers have long ignored or used inappropriate ways to undermine this positive relationship. In this article, We construct a simple but efficient clustering-driven link prediction framework(ClusterLP), with the goal of directly exploiting the cluster structures to obtain connections between nodes as accurately as possible in both undirected graphs and directed graphs. Specifically, we propose that it is easier to establish links between nodes with similar representation vectors and cluster tendencies in undirected graphs, while nodes in a directed graphs can more easily point to nodes similar to their representation vectors and have greater influence in their own cluster. We customized the implementation of ClusterLP for undirected and directed graphs, respectively, and the experimental results using multiple real-world networks on the link prediction task showed that our models is highly competitive with existing baseline models. The code implementation of ClusterLP and baselines we use are available at https://github.com/ZINUX1998/ClusterLP.
translated by 谷歌翻译
Temporal networks are an important type of network whose topological structure changes over time. Compared with methods on static networks, temporal network embedding (TNE) methods are facing three challenges: 1) it cannot describe the temporal dependence across network snapshots; 2) the node embedding in the latent space fails to indicate changes in the network topology; and 3) it cannot avoid a lot of redundant computation via parameter inheritance on a series of snapshots. To this end, we propose a novel temporal network embedding method named Dynamic Cluster Structure Constraint model (DyCSC), whose core idea is to capture the evolution of temporal networks by imposing a temporal constraint on the tendency of the nodes in the network to a given number of clusters. It not only generates low-dimensional embedding vectors for nodes but also preserves the dynamic nonlinear features of temporal networks. Experimental results on multiple realworld datasets have demonstrated the superiority of DyCSC for temporal graph embedding, as it consistently outperforms competing methods by significant margins in multiple temporal link prediction tasks. Moreover, the ablation study further validates the effectiveness of the proposed temporal constraint.
translated by 谷歌翻译
复杂的网络是代表现实生活系统的图形,这些系统表现出独特的特征,这些特征在纯粹的常规或完全随机的图中未发现。由于基础过程的复杂性,对此类系统的研究至关重要,但具有挑战性。然而,由于大量网络数据的可用性,近几十年来,这项任务变得更加容易。复杂网络中的链接预测旨在估计网络中缺少两个节点之间的链接的可能性。由于数据收集的不完美或仅仅是因为它们尚未出现,因此可能会缺少链接。发现网络数据中实体之间的新关系吸引了研究人员在社会学,计算机科学,物理学和生物学等各个领域的关注。大多数现有研究的重点是无向复杂网络中的链接预测。但是,并非所有现实生活中的系统都可以忠实地表示为无向网络。当使用链接预测算法时,通常会做出这种简化的假设,但不可避免地会导致有关节点之间关系和预测性能中降解的信息的丢失。本文介绍了针对有向网络的明确设计的链接预测方法。它基于相似性范式,该范式最近已证明在无向网络中成功。提出的算法通过在相似性和受欢迎程度上将其建模为不对称性来处理节点关系中的不对称性。鉴于观察到的网络拓扑结构,该算法将隐藏的相似性近似为最短路径距离,并使用边缘权重捕获并取消链接的不对称性和节点的受欢迎程度。在现实生活中评估了所提出的方法,实验结果证明了其在预测各种网络数据类型和大小的丢失链接方面的有效性。
translated by 谷歌翻译
链接预测是一项重要的任务,在各个域中具有广泛的应用程序。但是,大多数现有的链接预测方法都假定给定的图遵循同质的假设,并设计基于相似性的启发式方法或表示学习方法来预测链接。但是,许多现实世界图是异性图,同义假设不存在,这挑战了现有的链接预测方法。通常,在异性图中,有许多引起链接形成的潜在因素,并且两个链接的节点在一个或两个因素中往往相似,但在其他因素中可能是不同的,导致总体相似性较低。因此,一种方法是学习每个节点的分离表示形式,每个矢量捕获一个因子上的节点的潜在表示,这铺平了一种方法来模拟异性图中的链接形成,从而导致更好的节点表示学习和链接预测性能。但是,对此的工作非常有限。因此,在本文中,我们研究了一个新的问题,该问题是在异性图上进行链接预测的分离表示学习。我们提出了一种新颖的框架分解,可以通过建模链接形成并执行感知因素的消息来学习以促进链接预测来学习解开的表示形式。在13个现实世界数据集上进行的广泛实验证明了Disenlink对异性恋和血友病图的链接预测的有效性。我们的代码可从https://github.com/sjz5202/disenlink获得
translated by 谷歌翻译
Graph AutoCododers(GAE)和变分图自动编码器(VGAE)作为链接预测的强大方法出现。他们的表现对社区探测问题的印象不那么令人印象深刻,根据最近和同意的实验评估,它们的表现通常超过了诸如louvain方法之类的简单替代方案。目前尚不清楚可以通过GAE和VGAE改善社区检测的程度,尤其是在没有节点功能的情况下。此外,不确定是否可以在链接预测上同时保留良好的性能。在本文中,我们表明,可以高精度地共同解决这两个任务。为此,我们介绍和理论上研究了一个社区保留的消息传递方案,通过在计算嵌入空间时考虑初始图形结构和基于模块化的先验社区来掺杂我们的GAE和VGAE编码器。我们还提出了新颖的培训和优化策略,包括引入一个模块化的正规器,以补充联合链路预测和社区检测的现有重建损失。我们通过对各种现实世界图的深入实验验证,证明了方法的经验有效性,称为模块化感知的GAE和VGAE。
translated by 谷歌翻译
时间网络链接预测是网络科学领域的重要任务,并且在实际情况下具有广泛的应用。揭示网络的进化机制对于链接预测至关重要,如何有效利用历史信息来实现时间链接并有效提取网络结构的高阶模式仍然是一个至关重要的挑战。为了解决这些问题,在本文中,我们提出了一个具有调整后的Sigmoid函数和2-Simplex结构(TLPSS)的新型时间链接预测模型。调整后的Sigmoid衰减模式考虑了活跃,衰减和稳定的边缘状态,这适当适合信息的生命周期。此外,引入了由单纯形高阶结构组成的潜在矩阵序列,以增强链接预测方法的性能,因为它在稀疏网络中非常可行。结合信息的生命周期和单纯级结构,通过满足动态网络中时间和结构信息的一致性来实现TLPS的整体性能。六个现实世界数据集的实验结果证明了TLPS的有效性,与其他基线方法相比,我们提出的模型平均提高了链接预测的性能15%。
translated by 谷歌翻译
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
translated by 谷歌翻译
图表表示学习是一种快速增长的领域,其中一个主要目标是在低维空间中产生有意义的图形表示。已经成功地应用了学习的嵌入式来执行各种预测任务,例如链路预测,节点分类,群集和可视化。图表社区的集体努力提供了数百种方法,但在所有评估指标下没有单一方法擅长,例如预测准确性,运行时间,可扩展性等。该调查旨在通过考虑算法来评估嵌入方法的所有主要类别的图表变体,参数选择,可伸缩性,硬件和软件平台,下游ML任务和多样化数据集。我们使用包含手动特征工程,矩阵分解,浅神经网络和深图卷积网络的分类法组织了图形嵌入技术。我们使用广泛使用的基准图表评估了节点分类,链路预测,群集和可视化任务的这些类别算法。我们在Pytorch几何和DGL库上设计了我们的实验,并在不同的多核CPU和GPU平台上运行实验。我们严格地审查了各种性能指标下嵌入方法的性能,并总结了结果。因此,本文可以作为比较指南,以帮助用户选择最适合其任务的方法。
translated by 谷歌翻译
网络完成是一个比链接预测更难的问题,因为它不仅尝试推断丢失的链接,还要推断节点。已经提出了不同的方法来解决此问题,但是很少有人使用结构信息 - 局部连接模式的相似性。在本文中,我们提出了一个名为C-GIN的模型,以根据图形自动编码器框架从网络的观察到的部分捕获局部结构模式,该框架配备了图形同构网络模型,并将这些模式推广到完成整个图形。对来自不同领域的合成和现实世界网络的实验和分析表明,C-Gin可以实现竞争性能,而所需的信息较少,并且在大多数情况下,与基线预测模型相比,可以获得更高的准确性。我们进一步提出了一个基于网络结构的“可达聚类系数(CC)”。实验表明,我们的模型在具有较高可及的CC的网络上表现更好。
translated by 谷歌翻译
一组广泛建立的无监督节点嵌入方法可以解释为由两个独特的步骤组成:i)基于兴趣图的相似性矩阵的定义,然后是II)ii)该矩阵的明确或隐式因素化。受这个观点的启发,我们提出了框架的两个步骤的改进。一方面,我们建议根据自由能距离编码节点相似性,该自由能距离在最短路径和通勤时间距离之间进行了插值,从而提供了额外的灵活性。另一方面,我们根据损耗函数提出了一种基质分解方法,该方法将Skip-Gram模型的损失函数推广到任意相似性矩阵。与基于广泛使用的$ \ ell_2 $损失的因素化相比,该方法可以更好地保留与较高相似性分数相关的节点对。此外,它可以使用高级自动分化工具包轻松实现,并通过利用GPU资源进行有效计算。在现实世界数据集上的节点聚类,节点分类和链接预测实验证明了与最先进的替代方案相比,合并基于自由能的相似性以及所提出的矩阵分解的有效性。
translated by 谷歌翻译
Graph embedding algorithms embed a graph into a vector space where the structure and the inherent properties of the graph are preserved. The existing graph embedding methods cannot preserve the asymmetric transitivity well, which is a critical property of directed graphs. Asymmetric transitivity depicts the correlation among directed edges, that is, if there is a directed path from u to v, then there is likely a directed edge from u to v. Asymmetric transitivity can help in capturing structures of graphs and recovering from partially observed graphs. To tackle this challenge, we propose the idea of preserving asymmetric transitivity by approximating high-order proximity which are based on asymmetric transitivity. In particular, we develop a novel graph embedding algorithm, High-Order Proximity preserved Embedding (HOPE for short), which is scalable to preserve high-order proximities of large scale graphs and capable of capturing the asymmetric transitivity. More specifically, we first derive a general formulation that cover multiple popular highorder proximity measurements, then propose a scalable embedding algorithm to approximate the high-order proximity measurements based on their general formulation. Moreover, we provide a theoretical upper bound on the RMSE (Root Mean Squared Error) of the approximation. Our empirical experiments on a synthetic dataset and three realworld datasets demonstrate that HOPE can approximate the high-order proximities significantly better than the state-ofart algorithms and outperform the state-of-art algorithms in tasks of reconstruction, link prediction and vertex recommendation.
translated by 谷歌翻译
Variational Graph Autoencoders (VGAEs) are powerful models for unsupervised learning of node representations from graph data. In this work, we systematically analyze modeling node attributes in VGAEs and show that attribute decoding is important for node representation learning. We further propose a new learning model, interpretable NOde Representation with Attribute Decoding (NORAD). The model encodes node representations in an interpretable approach: node representations capture community structures in the graph and the relationship between communities and node attributes. We further propose a rectifying procedure to refine node representations of isolated notes, improving the quality of these nodes' representations. Our empirical results demonstrate the advantage of the proposed model when learning graph data in an interpretable approach.
translated by 谷歌翻译
在本文中,我们提出了一个通用框架,以缩放图形自动编码器(AE)和图形自动编码器(VAE)。该框架利用图形退化概念仅从一个密集的节点子集训练模型,而不是使用整个图。加上一种简单而有效的传播机制,我们的方法可显着提高可扩展性和训练速度,同时保持性能。我们在现有图AE和VAE的几种变体上评估和讨论我们的方法,并将这些模型的首次应用于具有多达数百万个节点和边缘的大图。我们取得了经验竞争的结果W.R.T.几种流行的可扩展节点嵌入方法,这些方法强调了对更可扩展图AE和VAE进行进一步研究的相关性。
translated by 谷歌翻译
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available online. 1
translated by 谷歌翻译
Pre-publication draft of a book to be published byMorgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
translated by 谷歌翻译
Graph is an important data representation which appears in a wide diversity of real-world scenarios. Effective graph analytics provides users a deeper understanding of what is behind the data, and thus can benefit a lot of useful applications such as node classification, node recommendation, link prediction, etc. However, most graph analytics methods suffer the high computation and space cost. Graph embedding is an effective yet efficient way to solve the graph analytics problem. It converts the graph data into a low dimensional space in which the graph structural information and graph properties are maximumly preserved. In this survey, we conduct a comprehensive review of the literature in graph embedding. We first introduce the formal definition of graph embedding as well as the related concepts. After that, we propose two taxonomies of graph embedding which correspond to what challenges exist in different graph embedding problem settings and how the existing work address these challenges in their solutions. Finally, we summarize the applications that graph embedding enables and suggest four promising future research directions in terms of computation efficiency, problem settings, techniques and application scenarios.
translated by 谷歌翻译
图形神经网络(GNN)已被广泛应用于各种领域,以通过图形结构数据学习。在各种任务(例如节点分类和图形分类)中,他们对传统启发式方法显示了显着改进。但是,由于GNN严重依赖于平滑的节点特征而不是图形结构,因此在链接预测中,它们通常比简单的启发式方法表现出差的性能,例如,结构信息(例如,重叠的社区,学位和最短路径)至关重要。为了解决这一限制,我们建议邻里重叠感知的图形神经网络(NEO-GNNS),这些神经网络(NEO-GNNS)从邻接矩阵中学习有用的结构特征,并估算了重叠的邻域以进行链接预测。我们的Neo-Gnns概括了基于社区重叠的启发式方法,并处理重叠的多跳社区。我们在开放图基准数据集(OGB)上进行的广泛实验表明,NEO-GNNS始终在链接预测中实现最新性能。我们的代码可在https://github.com/seongjunyun/neo_gnns上公开获取。
translated by 谷歌翻译
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
translated by 谷歌翻译
Network embedding is an important method to learn low-dimensional representations of vertexes in networks, aiming to capture and preserve the network structure. Almost all the existing network embedding methods adopt shallow models. However, since the underlying network structure is complex, shallow models cannot capture the highly non-linear network structure, resulting in sub-optimal network representations. Therefore, how to find a method that is able to effectively capture the highly non-linear network structure and preserve the global and local structure is an open yet important problem. To solve this problem, in this paper we propose a Structural Deep Network Embedding method, namely SDNE. More specifically, we first propose a semi-supervised deep model, which has multiple layers of non-linear functions, thereby being able to capture the highly non-linear network structure. Then we propose to exploit the first-order and second-order proximity jointly to preserve the network structure. The second-order proximity is used by the unsupervised component to capture the global network structure. While the first-order proximity is used as the supervised information in the supervised component to preserve the local network structure. By jointly optimizing them in the semi-supervised deep model, our method can preserve both the local and global network structure and is robust to sparse networks. Empirically, we conduct the experiments on five real-world networks, including a language network, a citation network and three social networks. The results show that compared to the baselines, our method can reconstruct the original network significantly better and achieves substantial gains in three applications, i.e. multi-label classification, link prediction and visualization.
translated by 谷歌翻译
图嵌入方法旨在通过将节点映射到低维矢量空间来查找有用的图表。这是一项具有重要下游应用程序的任务,例如链接预测,图形重建,数据可视化,节点分类和语言建模。近年来,图形嵌入领域见证了从线性代数方法转向基于局部的优化方法,结合了随机步行和深神经网络,以解决嵌入大图的问题。但是,尽管优化工具有所改进,但图形嵌入方法仍然是一般设计的,以忽略现实生活网络的特殊性的方式。确实,近年来,理解和建模复杂的现实生活网络取得了重大进展。但是,获得的结果对嵌入算法的发展产生了很小的影响。本文旨在通过设计一种图形嵌入方法来解决此问题,该方法利用网络科学领域的最新有价值的见解。更确切地说,我们基于普及性相似性和局部吸引力范例提出了一种新颖的图形嵌入方法。我们在大量现实生活网络上评估了在链接预测任务上提出的方法的性能。我们使用广泛的实验分析表明,所提出的方法优于嵌入算法的最先进的图。我们还证明了它对数据稀缺性和嵌入维度的选择的稳健性。
translated by 谷歌翻译