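In this paper, we present a general framework to scale graph autoencoders (AE) and graph variational autoencoders (VAE). The framework leverages graph degeneracy concepts to train models on only a dense subset of nodes instead of the entire graph. Combined with a simple yet effective propagation mechanism, our approach significantly improves scalability and training speed while preserving performance. We evaluate and discuss our method on several variants of existing graph AE and VAE, providing the first application of these models to large graphs with up to millions of nodes and edges. We achieve empirically competitive results w.r.t. several popular scalable node embedding methods, which emphasizes the relevance of pursuing further research toward more scalable graph AE and VAE.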
translated by Google Translate
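As a rough illustration of the degeneracy idea in the abstract above, the sketch below extracts a k-core, which is one standard way to obtain a dense subset of nodes. That the framework's dense subset is a k-core is an assumption here, since the abstract does not say so; the function name `k_core` and the edge-list input format are hypothetical, and the propagation step that would embed the remaining nodes is not shown.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the nodes of the k-core of an undirected graph.

    The k-core is the largest subgraph in which every node has at
    least k neighbours inside the subgraph. It is found by repeatedly
    peeling off nodes whose remaining degree is below k.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    degree = {n: len(adj[n]) for n in alive}
    stack = [n for n in alive if degree[n] < k]

    while stack:
        n = stack.pop()
        if n not in alive:
            continue  # already peeled via an earlier stack entry
        alive.remove(n)
        for m in adj[n]:
            if m in alive:
                degree[m] -= 1
                if degree[m] < k:
                    stack.append(m)
    return alive


# A triangle (0, 1, 2) with a two-node tail (3, 4) attached to node 2.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
dense_nodes = k_core(example_edges, 2)  # only the triangle survives
```

Under this assumption, a model would be trained on the subgraph induced by `dense_nodes`, and larger values of `k` would trade coverage for a smaller, denser training graph.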
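Graph autoencoders (GAE) and variational graph autoencoders (VGAE) emerged as powerful methods for link prediction. Their performance is less impressive on community detection problems where, according to recent and concurring experimental evaluations, they are often outperformed by simpler alternatives such as the Louvain method. It is still unclear to what extent community detection can be improved with GAE and VGAE, especially in the absence of node features. It is moreover uncertain whether this can be done while simultaneously preserving good performance on link prediction. In this paper, we show that jointly addressing these two tasks with high accuracy is possible. For this purpose, we introduce and theoretically study a community-preserving message passing scheme, doping our GAE and VGAE encoders by considering both the initial graph structure and modularity-based prior communities when computing embedding spaces. We also propose novel training and optimization strategies, including the introduction of a modularity-inspired regularizer complementing the existing reconstruction losses for joint link prediction and community detection. We demonstrate the empirical effectiveness of our approach, referred to as Modularity-Aware GAE and VGAE, through in-depth experimental validation on various real-world graphs.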
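For readers unfamiliar with modularity, the quantity behind both the Louvain method and the modularity-based priors mentioned above, the following sketch computes Newman's modularity of a fixed partition. It is not the regularizer from the paper, whose exact form the abstract does not give; the function name and input format are hypothetical, and the graph is assumed to be simple and undirected, with each edge listed once.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected simple graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges with both
    endpoints in c, and d_c the sum of degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity(two_triangles, by_triangle)  # equals 5/14
```

Placing every node in one community gives a modularity of zero, so a positive value indicates more intra-community edges than expected under a degree-preserving random baseline.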
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from other nodes in the graph, while link prediction is to predict whether two nodes in a network are likely to have a link. The definition of both naturally determines that clustering must play a positive role in obtaining accurate link prediction. Yet researchers have long ignored this positive relationship or undermined it by using inappropriate approaches. In this article, we construct a simple but efficient clustering-driven link prediction framework (ClusterLP), with the goal of directly exploiting the cluster structures to obtain connections between nodes as accurately as possible in both undirected graphs and directed graphs. Specifically, we propose that it is easier to establish links between nodes with similar representation vectors and cluster tendencies in undirected graphs, while nodes in directed graphs can more easily point to nodes that are similar to their representation vectors and have greater influence in their own cluster. We customized the implementation of ClusterLP for undirected and directed graphs, respectively, and the experimental results using multiple real-world networks on the link prediction task showed that our models are highly competitive with existing baseline models. The code implementation of ClusterLP and the baselines we use are available at https://github.com/ZINUX1998/ClusterLP.
Pre-publication draft of a book to be published by Morgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
Variational Graph Autoencoders (VGAEs) are powerful models for unsupervised learning of node representations from graph data. In this work, we systematically analyze modeling node attributes in VGAEs and show that attribute decoding is important for node representation learning. We further propose a new learning model, interpretable NOde Representation with Attribute Decoding (NORAD). The model encodes node representations in an interpretable approach: node representations capture community structures in the graph and the relationship between communities and node attributes. We further propose a rectifying procedure to refine node representations of isolated nodes, improving the quality of these nodes' representations. Our empirical results demonstrate the advantage of the proposed model when learning graph data in an interpretable approach.
Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.
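Graph representation learning is a fast-growing field in which one of the main objectives is to generate meaningful representations of graphs in lower-dimensional spaces. The learned embeddings have been successfully applied to perform various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics, such as prediction accuracy, running time, and scalability. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selections, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organize graph embedding techniques using a taxonomy that includes methods from manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluate these classes of algorithms on node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We design our experiments on top of the PyTorch Geometric and DGL libraries and run them on different multicore CPU and GPU platforms. We rigorously scrutinize the performance of embedding methods under various performance metrics and summarize the results. Thus, this paper may serve as a comparative guide to help users select the methods that are most suitable for their tasks.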
Deep learning has been shown to be successful in a number of domains, ranging from acoustics, images, to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, substantial research efforts have been devoted to applying deep learning methods to graphs, resulting in beneficial advances in graph analysis techniques. In this survey, we comprehensively review the different types of deep learning methods on graphs. We divide the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods. We then provide a comprehensive overview of these methods in a systematic manner mainly by following their development history. We also analyze the differences and compositions of different methods. Finally, we briefly outline the applications in which they have been used and discuss potential future research directions.
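Graph autoencoders are effective at embedding graph-based datasets. Most graph autoencoder architectures have shallow depths, which limits their ability to capture meaningful relations between nodes separated by multiple hops. In this paper, we propose the Residual Variational Graph Autoencoder, ResVGAE, a deep variational graph autoencoder model with multiple residual modules. We show that our residual modules, each a convolutional layer with a residual connection, improve the average precision of graph autoencoders. Experimental results suggest that our proposed model with residual modules outperforms models without residual modules and achieves results similar to those of other state-of-the-art methods.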
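Recently, graph neural networks (GNNs) have shown prominent performance in graph representation learning by leveraging knowledge from both graph structure and node features. However, most of them have two major limitations. First, GNNs can learn higher-order structural information by stacking more layers, but they cannot deal with large depth due to the over-smoothing issue. Second, it is not easy to apply these methods to large graphs due to their expensive computation cost and high memory usage. In this paper, we present node-adaptive feature smoothing (NAFS), a simple non-parametric method that constructs node representations without parameter learning. NAFS first extracts the features of each node and of its neighbors at different hops by feature smoothing, and then adaptively combines the smoothed features. In addition, the constructed node representations can be further enhanced by an ensemble of smoothed features extracted via different smoothing strategies. We conduct experiments on four benchmark datasets in two different application scenarios: node clustering and link prediction. Remarkably, NAFS with feature ensemble outperforms state-of-the-art GNNs on these tasks and mitigates the aforementioned two limitations of most learning-based GNN counterparts.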
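To make the notion of parameter-free feature smoothing concrete, here is a minimal sketch that repeatedly averages each node's features with those of its neighbors and then combines the per-hop results. It is a simplification rather than NAFS itself: the mean-with-self-loop smoothing operator and the uniform combination across hops are assumptions standing in for the node-adaptive weights the abstract refers to, and all function names are hypothetical.

```python
def smooth_features(adj, features, hops):
    """Return a list of feature dicts, one per hop (hop 0 is the input).

    adj:      dict mapping node -> list of neighbours
    features: dict mapping node -> list of floats
    Each smoothing step replaces a node's features with the mean over
    the node itself and its neighbours (mean aggregation with self-loop).
    """
    current = {n: list(x) for n, x in features.items()}
    per_hop = [current]
    for _ in range(hops):
        nxt = {}
        for n in adj:
            group = [n] + list(adj[n])
            dim = len(current[n])
            nxt[n] = [
                sum(current[m][d] for m in group) / len(group)
                for d in range(dim)
            ]
        current = nxt
        per_hop.append(current)
    return per_hop


def combine_uniform(per_hop):
    """Average the per-hop features with equal weights (an assumption)."""
    combined = {}
    for n in per_hop[0]:
        dim = len(per_hop[0][n])
        combined[n] = [
            sum(hop[n][d] for hop in per_hop) / len(per_hop)
            for d in range(dim)
        ]
    return combined


# Path graph 0 - 1 - 2 with a one-dimensional feature on node 0 only.
path_adj = {0: [1], 1: [0, 2], 2: [1]}
path_x = {0: [1.0], 1: [0.0], 2: [0.0]}
hops = smooth_features(path_adj, path_x, hops=1)
embedding = combine_uniform(hops)
```

No weights are learned anywhere in this sketch, which is the property that makes this family of methods cheap to apply to large graphs.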
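Given entities and their interactions in web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, state-of-the-art clustering performance in various domains has been achieved by deep clustering methods. In particular, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences in modeling choices (e.g., encoder architectures), existing DGC methods are mainly based on autoencoders and use the same clustering objective with relatively minor adaptations. Also, while many real-world graphs are dynamic, previous DGC methods considered only static graphs. In this work, we develop CGC, a novel end-to-end framework for graph clustering that differs fundamentally from existing methods. CGC learns node embeddings and cluster assignments in a contrastive graph learning framework, where positive and negative samples are carefully selected in a multi-level scheme such that they reflect hierarchical community structures and network homophily. Furthermore, we extend CGC to time-evolving data, where temporal graph clustering is performed in an incremental learning fashion, with the ability to detect change points. Extensive evaluation on real-world graphs demonstrates that the proposed CGC consistently outperforms existing methods.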
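Graph neural networks have been used for a variety of learning tasks, such as link prediction, node classification, and node clustering. Among them, link prediction is a relatively under-studied graph learning task, with current state-of-the-art models based on shallow graph autoencoder (GAE) architectures of one or two layers. In this paper, we focus on addressing a limitation of current methods for link prediction, which can only use shallow GAEs and variational GAEs, and on creating effective methods to deepen (variational) GAE architectures to achieve stable and competitive performance. Our proposed methods innovatively incorporate standard autoencoders (AEs) into the architectures of GAEs: standard AEs are leveraged to learn essential, low-dimensional representations by seamlessly integrating the adjacency information and node features, while GAEs further build multi-scale low-dimensional representations via residual connections to learn a compact overall embedding for link prediction. Empirically, extensive experiments on various benchmark datasets verify the effectiveness of our methods and demonstrate the competitive performance of our deepened graph models for link prediction. Theoretically, we prove that our deep extensions inclusively express multiple polynomial filters with different orders.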
Link prediction is a key problem for network-structured data. Link prediction heuristics use some score functions, such as common neighbors and Katz index, to measure the likelihood of links. They have obtained wide practical uses due to their simplicity, interpretability, and for some of them, scalability. However, every heuristic has a strong assumption on when two nodes are likely to link, which limits their effectiveness on networks where these assumptions fail. In this regard, a more reasonable way should be learning a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a "heuristic" that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel γ-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs reserve rich information related to link existence. Second, based on the γ-decaying theory, we propose a new method to learn heuristics from local subgraphs using a graph neural network (GNN). Its experimental results show unprecedented performance, working consistently well on a wide range of problems.
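The two heuristics named in the abstract above can be sketched in a few lines. The common-neighbors score counts shared neighbors, and the Katz index sums walks between two nodes, damping a walk of length l by a factor of beta to the power l. The truncation at `max_len`, the default `beta`, and the function names are illustrative assumptions, not choices taken from the paper.

```python
def common_neighbors(adj, x, y):
    """Number of neighbours shared by x and y (adj maps node -> set)."""
    return len(adj[x] & adj[y])


def katz_index(adj, x, y, beta=0.1, max_len=5):
    """Truncated Katz index: sum over l = 1..max_len of beta^l * (#walks of length l from x to y)."""
    walks = {x: 1}  # number of walks of the current length from x to each node
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, count in walks.items():
            for neighbour in adj[node]:
                nxt[neighbour] = nxt.get(neighbour, 0) + count
        walks = nxt
        score += beta ** length * walks.get(y, 0)
    return score


# Four-cycle 0 - 1 - 3 - 2 - 0: nodes 0 and 3 are not linked
# but share both of their neighbours.
square = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
cn_score = common_neighbors(square, 0, 3)
katz_score = katz_index(square, 0, 3, beta=0.5, max_len=2)
```

Both scores depend only on short walks around the candidate pair, which fits the abstract's claim that such heuristics can be well approximated from local subgraphs.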
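Web-based interactions can frequently be represented by an attributed graph, and node clustering in such graphs has received much attention lately. Multiple efforts have successfully applied graph convolutional networks (GCNs), though with some limits on accuracy, as GCNs have been shown to suffer from over-smoothing issues. Although other methods (particularly those based on Laplacian smoothing) have reported better accuracy, a fundamental limitation of all this work is a lack of scalability. This paper addresses this open problem by relating Laplacian smoothing to generalized PageRank and applying a random-walk-based algorithm as a scalable graph filter. This forms the basis for our scalable deep clustering algorithm, RWSL, where, through a self-supervised mini-batch training mechanism, we simultaneously optimize a deep neural network for the sample-cluster assignment distribution and an autoencoder for a clustering-oriented embedding. Using 6 real-world datasets and 6 clustering metrics, we show that RWSL achieves improved results over several recent baselines. Most notably, we show that RWSL, unlike all other deep clustering frameworks, can continue to scale to, and handle, graphs with more than one million nodes. We also demonstrate how RWSL can perform node clustering on a graph with 1.8 billion edges using only a single GPU.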
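We present a novel masked graph autoencoder (MGAE) framework to perform effective learning on graph-structured data. Taking insights from self-supervised learning, we randomly mask a large proportion of edges and try to reconstruct these missing edges during training. MGAE has two core designs. First, we find that masking a high ratio of the input graph structure, e.g., 70%, yields a nontrivial and meaningful self-supervisory task that benefits downstream applications. Second, we employ a graph neural network (GNN) as an encoder to perform message propagation on the partially masked graph. To reconstruct the large number of masked edges, a tailored cross-correlation decoder is proposed. It can capture the cross-correlation between the head and tail nodes of an anchor edge at multiple granularities. Coupling these two designs enables MGAE to be trained efficiently and effectively. Extensive experiments on multiple open datasets (Planetoid and OGB benchmarks) demonstrate that MGAE generally performs better than state-of-the-art unsupervised learning competitors on link prediction and node classification.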
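The masking step described above can be illustrated with a short sketch that splits an edge list into a visible part, which an encoder would see, and a masked part, which would serve as reconstruction targets. Uniform sampling without replacement, the rounding rule, and the name `mask_edges` are assumptions made for illustration; the encoder and the cross-correlation decoder are not shown.

```python
import random


def mask_edges(edges, mask_ratio=0.7, seed=0):
    """Randomly split edges into (visible, masked) lists.

    mask_ratio is the fraction of edges hidden from the encoder;
    seed makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_masked = round(mask_ratio * len(shuffled))
    masked = shuffled[:n_masked]
    visible = shuffled[n_masked:]
    return visible, masked


# Ten edges of a path graph; 70% of them are hidden.
path_edges = [(i, i + 1) for i in range(10)]
visible_edges, masked_edges = mask_edges(path_edges, mask_ratio=0.7, seed=0)
```

In a training loop, a fresh split would typically be drawn per step or per epoch so that the model does not always see the same visible subgraph, although the abstract does not state how often the mask is resampled.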
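In this paper, we propose Multiresolution Equivariant Graph Variational Autoencoders (MGVAE), the first hierarchical generative model to learn and generate graphs in a multiresolution and equivariant manner. At each resolution level, MGVAE employs higher-order message passing to encode the graph while learning to partition it into mutually exclusive clusters and coarsening it into a lower resolution, which eventually creates a hierarchy of latent distributions. MGVAE then constructs a hierarchical generative model to variationally decode into a hierarchy of coarsened graphs. Importantly, our proposed framework is end-to-end permutation equivariant with respect to node ordering. MGVAE achieves competitive results on several generative tasks, including general graph generation, molecular generation, unsupervised molecular representation learning to predict molecular properties, link prediction on citation graphs, and graph-based image generation.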
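Network embedding has emerged as a promising research field for network analysis. Recently, an approach named Barlow Twins has been proposed that applies the redundancy-reduction principle to the embedding vectors corresponding to two distorted versions of the image samples. Motivated by this, we propose the Barlow Graph Autoencoder, a simple yet effective architecture for learning network embeddings. It aims to maximize the similarity between the embedding vectors of the immediate and larger neighborhoods of a node, while minimizing the redundancy between the components of these projections. In addition, we present its variational counterpart, named the Barlow Variational Graph Autoencoder. Our approach yields promising results for inductive link prediction and is also on par with the state of the art for clustering and downstream node classification, as demonstrated by extensive comparisons with several well-known techniques on three benchmark citation datasets.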
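Graph kernels have attracted a lot of attention during the last decade and have evolved into a rapidly developing branch of learning on structured data. During the past 20 years, the considerable research activity in the field has resulted in the development of dozens of graph kernels, each focusing on specific structural properties of graphs. Graph kernels have proven successful in a wide range of domains, from social networks to bioinformatics. The goal of this survey is to provide a unifying view of the literature on graph kernels. In particular, we present an overview of a wide range of graph kernels. Furthermore, we perform an experimental evaluation of several of these kernels on publicly available datasets and provide a comparative study. Finally, we discuss key applications of graph kernels and outline some challenges that remain to be addressed.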
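We present Topology Transformation Equivariant Representation learning, a general paradigm of self-supervised learning for node representations of graph data, to enable the wide applicability of graph convolutional neural networks (GCNNs). We formalize the proposed model from an information-theoretic perspective, by maximizing the mutual information between topology transformations and node representations before and after the transformations. We derive that maximizing such mutual information can be relaxed to minimizing the cross entropy between the applied topology transformation and its estimation from node representations. In particular, we seek to sample a subset of node pairs from the original graph and flip the edge connectivity between each pair to transform the graph topology. Then, we self-train a representation encoder to learn node representations by reconstructing the topology transformations from the feature representations of the original and transformed graphs. In experiments, we apply the proposed model to downstream node classification, graph classification, and link prediction tasks, and the results show that the proposed method outperforms state-of-the-art unsupervised approaches.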