社区检测是网络科学的基本和重要问题,但只有几个基于图形神经网络的社区检测算法,其中无监督的算法几乎是空白的。通过融合具有网络功能的高阶模块化信息,本文首次提出了基于变分AualiCoder重建的社区检测VGGAer,并给出了其非概率版本。他们不需要任何先前的信息。我们精心设计了基于社区检测任务的相应输入功能,解码器和下游任务,这些设计简洁,自然,表现良好(我们的设计下的NMI值得到59.1%-565.9%)。基于一系列具有广泛数据集和先​​进方法的一系列实验,VGAER实现了卓越的性能,并具有更简单的设计竞争力和潜力。最后,我们报告了算法收敛性分析和T-SNE可视化的结果,清楚地描绘了VGAER的稳定性能和强大的网络模块化能力。我们的代码可在https://github.com/qcydm/vgaer提供。
translated by 谷歌翻译
Graph AutoCododers(GAE)和变分图自动编码器(VGAE)作为链接预测的强大方法出现。他们的表现对社区探测问题的印象不那么令人印象深刻,根据最近和同意的实验评估,它们的表现通常超过了诸如louvain方法之类的简单替代方案。目前尚不清楚可以通过GAE和VGAE改善社区检测的程度,尤其是在没有节点功能的情况下。此外,不确定是否可以在链接预测上同时保留良好的性能。在本文中,我们表明,可以高精度地共同解决这两个任务。为此,我们介绍和理论上研究了一个社区保留的消息传递方案,通过在计算嵌入空间时考虑初始图形结构和基于模块化的先验社区来掺杂我们的GAE和VGAE编码器。我们还提出了新颖的培训和优化策略,包括引入一个模块化的正规器,以补充联合链路预测和社区检测的现有重建损失。我们通过对各种现实世界图的深入实验验证,证明了方法的经验有效性,称为模块化感知的GAE和VGAE。
translated by 谷歌翻译
Pre-publication draft of a book to be published byMorgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
translated by 谷歌翻译
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from other nodes in the graph, while link prediction is to predict whether two nodes in a network are likely to have a link. The definition of both naturally determines that clustering must play a positive role in obtaining accurate link prediction tasks. Yet researchers have long ignored or used inappropriate ways to undermine this positive relationship. In this article, We construct a simple but efficient clustering-driven link prediction framework(ClusterLP), with the goal of directly exploiting the cluster structures to obtain connections between nodes as accurately as possible in both undirected graphs and directed graphs. Specifically, we propose that it is easier to establish links between nodes with similar representation vectors and cluster tendencies in undirected graphs, while nodes in a directed graphs can more easily point to nodes similar to their representation vectors and have greater influence in their own cluster. We customized the implementation of ClusterLP for undirected and directed graphs, respectively, and the experimental results using multiple real-world networks on the link prediction task showed that our models is highly competitive with existing baseline models. The code implementation of ClusterLP and baselines we use are available at https://github.com/ZINUX1998/ClusterLP.
translated by 谷歌翻译
在本文中,我们研究了在非全粒图上进行节点表示学习的自我监督学习的问题。现有的自我监督学习方法通​​常假定该图是同质的,其中链接的节点通常属于同一类或具有相似的特征。但是,这种同质性的假设在现实图表中并不总是正确的。我们通过为图神经网络开发脱钩的自我监督学习(DSSL)框架来解决这个问题。 DSSL模仿了节点的生成过程和语义结构的潜在变量建模的链接,该过程将不同邻域之间的不同基础语义解散到自我监督的节点学习过程中。我们的DSSL框架对编码器不可知,不需要预制的增强,因此对不同的图表灵活。为了通过潜在变量有效地优化框架,我们得出了自我监督目标的较低范围的证据,并开发了具有变异推理的可扩展培训算法。我们提供理论分析,以证明DSSL享有更好的下游性能。与竞争性的自我监督学习基线相比,对各种类图基准的广泛实验表明,我们提出的框架可以显着取得更好的性能。
translated by 谷歌翻译
Deep learning has been shown to be successful in a number of domains, ranging from acoustics, images, to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, substantial research efforts have been devoted to applying deep learning methods to graphs, resulting in beneficial advances in graph analysis techniques. In this survey, we comprehensively review the different types of deep learning methods on graphs. We divide the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods. We then provide a comprehensive overview of these methods in a systematic manner mainly by following their development history. We also analyze the differences and compositions of different methods. Finally, we briefly outline the applications in which they have been used and discuss potential future research directions.
translated by 谷歌翻译
Variational Graph Autoencoders (VGAEs) are powerful models for unsupervised learning of node representations from graph data. In this work, we systematically analyze modeling node attributes in VGAEs and show that attribute decoding is important for node representation learning. We further propose a new learning model, interpretable NOde Representation with Attribute Decoding (NORAD). The model encodes node representations in an interpretable approach: node representations capture community structures in the graph and the relationship between communities and node attributes. We further propose a rectifying procedure to refine node representations of isolated notes, improving the quality of these nodes' representations. Our empirical results demonstrate the advantage of the proposed model when learning graph data in an interpretable approach.
translated by 谷歌翻译
给定实体及其在Web数据中的交互,可能在不同的时间发生,我们如何找到实体社区并跟踪其演变?在本文中,我们从图形群集的角度处理这项重要任务。最近,通过深层聚类方法,已经实现了各个领域的最新聚类性能。特别是,深图聚类(DGC)方法通过学习节点表示和群集分配在关节优化框架中成功扩展到图形结构的数据。尽管建模选择有所不同(例如,编码器架构),但现有的DGC方法主要基于自动编码器,并使用相同的群集目标和相对较小的适应性。同样,尽管许多现实世界图都是动态的,但以前的DGC方法仅被视为静态图。在这项工作中,我们开发了CGC,这是一个新颖的端到端图形聚类框架,其与现有方法的根本不同。 CGC在对比度图学习框架中学习节点嵌入和群集分配,在多级别方案中仔细选择了正面和负样本,以反映层次结构的社区结构和网络同质。此外,我们将CGC扩展到时间不断发展的数据,其中时间图以增量学习方式执行,并具有检测更改点的能力。对现实世界图的广泛评估表明,所提出的CGC始终优于现有方法。
translated by 谷歌翻译
图表表示学习是一种快速增长的领域,其中一个主要目标是在低维空间中产生有意义的图形表示。已经成功地应用了学习的嵌入式来执行各种预测任务,例如链路预测,节点分类,群集和可视化。图表社区的集体努力提供了数百种方法,但在所有评估指标下没有单一方法擅长,例如预测准确性,运行时间,可扩展性等。该调查旨在通过考虑算法来评估嵌入方法的所有主要类别的图表变体,参数选择,可伸缩性,硬件和软件平台,下游ML任务和多样化数据集。我们使用包含手动特征工程,矩阵分解,浅神经网络和深图卷积网络的分类法组织了图形嵌入技术。我们使用广泛使用的基准图表评估了节点分类,链路预测,群集和可视化任务的这些类别算法。我们在Pytorch几何和DGL库上设计了我们的实验,并在不同的多核CPU和GPU平台上运行实验。我们严格地审查了各种性能指标下嵌入方法的性能,并总结了结果。因此,本文可以作为比较指南,以帮助用户选择最适合其任务的方法。
translated by 谷歌翻译
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
translated by 谷歌翻译
最近,最大化的互信息是一种强大的无监测图表表示学习的方法。现有方法通常有效地从拓扑视图中捕获信息但忽略特征视图。为了规避这个问题,我们通过利用功能和拓扑视图利用互信息最大化提出了一种新的方法。具体地,我们首先利用多视图表示学习模块来更好地捕获跨图形上的特征和拓扑视图的本地和全局信息内容。为了模拟由特征和拓扑空间共享的信息,我们使用相互信息最大化和重建损耗最小化开发公共表示学习模块。要明确鼓励图形表示之间的多样性在相同的视图中,我们还引入了一个分歧正则化,以扩大同一视图之间的表示之间的距离。合成和实际数据集的实验证明了集成功能和拓扑视图的有效性。特别是,与先前的监督方法相比,我们所提出的方法可以在无监督的代表和线性评估协议下实现可比或甚至更好的性能。
translated by 谷歌翻译
尽管图表学习(GRL)取得了重大进展,但要以足够的方式提取和嵌入丰富的拓扑结构和特征信息仍然是一个挑战。大多数现有方法都集中在本地结构上,并且无法完全融合全球拓扑结构。为此,我们提出了一种新颖的结构保留图表学习(SPGRL)方法,以完全捕获图的结构信息。具体而言,为了减少原始图的不确定性和错误信息,我们通过k-nearest邻居方法构建了特征图作为互补视图。该特征图可用于对比节点级别以捕获本地关系。此外,我们通过最大化整个图形和特征嵌入的相互信息(MI)来保留全局拓扑结构信息,从理论上讲,该信息可以简化为交换功能的特征嵌入和原始图以重建本身。广泛的实验表明,我们的方法在半监督节点分类任务上具有相当出色的性能,并且在图形结构或节点特征上噪声扰动下的鲁棒性出色。
translated by 谷歌翻译
网络完成是一个比链接预测更难的问题,因为它不仅尝试推断丢失的链接,还要推断节点。已经提出了不同的方法来解决此问题,但是很少有人使用结构信息 - 局部连接模式的相似性。在本文中,我们提出了一个名为C-GIN的模型,以根据图形自动编码器框架从网络的观察到的部分捕获局部结构模式,该框架配备了图形同构网络模型,并将这些模式推广到完成整个图形。对来自不同领域的合成和现实世界网络的实验和分析表明,C-Gin可以实现竞争性能,而所需的信息较少,并且在大多数情况下,与基线预测模型相比,可以获得更高的准确性。我们进一步提出了一个基于网络结构的“可达聚类系数(CC)”。实验表明,我们的模型在具有较高可及的CC的网络上表现更好。
translated by 谷歌翻译
图表是一个宇宙数据结构,广泛用于组织现实世界中的数据。像交通网络,社交和学术网络这样的各种实际网络网络可以由图表代表。近年来,目睹了在网络中代表顶点的快速发展,进入低维矢量空间,称为网络表示学习。表示学习可以促进图形数据上的新算法的设计。在本调查中,我们对网络代表学习的当前文献进行了全面审查。现有算法可以分为三组:浅埋模型,异构网络嵌入模型,图形神经网络的模型。我们为每个类别审查最先进的算法,并讨论这些算法之间的基本差异。调查的一个优点是,我们系统地研究了不同类别的算法底层的理论基础,这提供了深入的见解,以更好地了解网络表示学习领域的发展。
translated by 谷歌翻译
网络嵌入作为网络分析的有希望的研究领域出现。最近,通过将冗余还原原理应用于对应于图像样本的两个扭曲版本的嵌入向量,提出了一种名为Barlow双胞胎的方法。通过此激励,我们提出了Barlow Graph自动编码器,这是一个简单而有效的学习网络嵌入的架构。它旨在最大限度地提高节点的立即和较大邻域的嵌入向量之间的相似性,同时最小化这些投影的组件之间的冗余。此外,我们还介绍了名为Barlow变形图自动编码器的变型对应物。我们的方法产生了对归纳链路预测的有希望的结果,并且还涉及用于聚类和下游节点分类的领域,如广泛的三个基准引用数据集上的多种已知技术的广泛比较所证明的。
translated by 谷歌翻译
基于Web的交互可以经常由归因图表示,并且在这些图中的节点聚类最近受到了很多关注。多次努力已成功应用图形卷积网络(GCN),但由于GCNS已被显示出遭受过平滑问题的GCNS的精度一些限制。虽然其他方法(特别是基于拉普拉斯平滑的方法)已经报告了更好的准确性,但所有工作的基本限制都是缺乏可扩展性。本文通过将LAPLACIAN平滑与广义的PageRank相同,并将随机步行基于算法应用为可伸缩图滤波器来解决这一打开问题。这构成了我们可扩展的深度聚类算法RWSL的基础,其中通过自我监督的迷你批量培训机制,我们同时优化了一个深度神经网络,用于采样集群分配分配和AutoEncoder,用于群集导向的嵌入。使用6个现实世界数据集和6个聚类指标,我们表明RWSL实现了几个最近基线的结果。最值得注意的是,我们显示与所有其他深度聚类框架不同的RWSL可以继续以超过一百万个节点的图形扩展,即句柄。我们还演示了RWSL如何在仅使用单个GPU的18亿边缘的图表上执行节点聚类。
translated by 谷歌翻译
Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.
translated by 谷歌翻译
Anomaly analytics is a popular and vital task in various research contexts, which has been studied for several decades. At the same time, deep learning has shown its capacity in solving many graph-based tasks like, node classification, link prediction, and graph classification. Recently, many studies are extending graph learning models for solving anomaly analytics problems, resulting in beneficial advances in graph-based anomaly analytics techniques. In this survey, we provide a comprehensive overview of graph learning methods for anomaly analytics tasks. We classify them into four categories based on their model architectures, namely graph convolutional network (GCN), graph attention network (GAT), graph autoencoder (GAE), and other graph learning models. The differences between these methods are also compared in a systematic manner. Furthermore, we outline several graph-based anomaly analytics applications across various domains in the real world. Finally, we discuss five potential future research directions in this rapidly growing field.
translated by 谷歌翻译
最近关于图表卷积网络(GCN)的研究表明,初始节点表示(即,第一次图卷积前的节点表示)很大程度上影响最终的模型性能。但是,在学习节点的初始表示时,大多数现有工作线性地组合了节点特征的嵌入,而不考虑特征之间的交互(或特征嵌入)。我们认为,当节点特征是分类时,例如,在许多实际应用程序中,如用户分析和推荐系统,功能交互通常会对预测分析进行重要信号。忽略它们将导致次优初始节点表示,从而削弱后续图表卷积的有效性。在本文中,我们提出了一个名为CatGCN的新GCN模型,当节点功能是分类时,为图表学习量身定制。具体地,我们将显式交互建模的两种方式集成到初始节点表示的学习中,即在每对节点特征上的本地交互建模和人工特征图上的全局交互建模。然后,我们通过基于邻域聚合的图形卷积来优化增强的初始节点表示。我们以端到端的方式训练CatGCN,并在半监督节点分类上展示它。来自腾讯和阿里巴巴数据集的三个用户分析的三个任务(预测用户年龄,城市和购买级别)的大量实验验证了CatGCN的有效性,尤其是在图表卷积之前执行特征交互建模的积极效果。
translated by 谷歌翻译
A large number of studies on Graph Outlier Detection (GOD) have emerged in recent years due to its wide applications, in which Unsupervised Node Outlier Detection (UNOD) on attributed networks is an important area. UNOD focuses on detecting two kinds of typical outliers in graphs: the structural outlier and the contextual outlier. Most existing works conduct experiments based on datasets with injected outliers. However, we find that the most widely-used outlier injection approach has a serious data leakage issue. By only utilizing such data leakage, a simple approach can achieve state-of-the-art performance in detecting outliers. In addition, we observe that most existing algorithms have a performance drop with varied injection settings. The other major issue is on balanced detection performance between the two types of outliers, which has not been considered by existing studies. In this paper, we analyze the cause of the data leakage issue in depth since the injection approach is a building block to advance UNOD. Moreover, we devise a novel variance-based model to detect structural outliers, which outperforms existing algorithms significantly at different injection settings. On top of this, we propose a new framework, Variance-based Graph Outlier Detection (VGOD), which combines our variance-based model and attribute reconstruction model to detect outliers in a balanced way. Finally, we conduct extensive experiments to demonstrate the effectiveness and efficiency of VGOD. The results on 5 real-world datasets validate that VGOD achieves not only the best performance in detecting outliers but also a balanced detection performance between structural and contextual outliers. Our code is available at https://github.com/goldenNormal/vgod-github.
translated by 谷歌翻译