图形神经网络(GNN)已在许多图分析任务(例如节点分类和链接预测)上实现了最新结果。然而,事实证明,图形群集等图形上的重要无监督问题对GNN的进步具有更大的抵抗力。图群集的总体目标与GNN中的节点合并相同 - 这是否意味着GNN池方法在聚类图上做得很好?令人惊讶的是,答案是没有的 - 当前的GNN合并方法通常无法恢复群集结构,而在简单的基线(例如应用于学习的表示形式上的K-均值)良好工作的情况下。我们通过仔细设计一组实验来进一步研究,以研究图形结构和属性数据中的不同信噪比情景。为了解决这些方法在聚类中的性能不佳,我们引入了深层模块化网络(DMON),这是一种受群集质量模块化量度启发的无监督池方法,并显示了它如何解决现实世界图的挑战性聚类结构的恢复。同样,在现实世界中,我们表明DMON产生的高质量簇与地面真相标签密切相关,从而实现了最先进的结果,比不同指标的其他合并方法提高了40%以上。
translated by 谷歌翻译
Pre-publication draft of a book to be published byMorgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
translated by 谷歌翻译
Graph AutoCododers(GAE)和变分图自动编码器(VGAE)作为链接预测的强大方法出现。他们的表现对社区探测问题的印象不那么令人印象深刻,根据最近和同意的实验评估,它们的表现通常超过了诸如louvain方法之类的简单替代方案。目前尚不清楚可以通过GAE和VGAE改善社区检测的程度,尤其是在没有节点功能的情况下。此外,不确定是否可以在链接预测上同时保留良好的性能。在本文中,我们表明,可以高精度地共同解决这两个任务。为此,我们介绍和理论上研究了一个社区保留的消息传递方案,通过在计算嵌入空间时考虑初始图形结构和基于模块化的先验社区来掺杂我们的GAE和VGAE编码器。我们还提出了新颖的培训和优化策略,包括引入一个模块化的正规器,以补充联合链路预测和社区检测的现有重建损失。我们通过对各种现实世界图的深入实验验证,证明了方法的经验有效性,称为模块化感知的GAE和VGAE。
translated by 谷歌翻译
光谱群集中使用的目标函数通常由两个术语组成:i)一个术语最小化群集分配的局部二次变化,并且;ii)一个平衡聚类分区并有助于避免退化解决方案的术语。本文表明,配备合适消息传递层的图形神经网络可以通过仅优化平衡项来生成良好的集群分配。归因图数据集的结果显示了拟议方法在聚类性能和计算时间方面的有效性。
translated by 谷歌翻译
Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs-a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DIFFPOOL, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DIFFPOOL learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DIFFPOOL yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
translated by 谷歌翻译
Graph Neural Networks (GNNs) are deep learning models designed to process attributed graphs. GNNs can compute cluster assignments accounting both for the vertex features and for the graph topology. Existing GNNs for clustering are trained by optimizing an unsupervised minimum cut objective, which is approximated by a Spectral Clustering (SC) relaxation. SC offers a closed-form solution that, however, is not particularly useful for a GNN trained with gradient descent. Additionally, the SC relaxation is loose and yields overly smooth cluster assignments, which do not separate well the samples. We propose a GNN model that optimizes a tighter relaxation of the minimum cut based on graph total variation (GTV). Our model has two core components: i) a message-passing layer that minimizes the $\ell_1$ distance in the features of adjacent vertices, which is key to achieving sharp cluster transitions; ii) a loss function that minimizes the GTV in the cluster assignments while ensuring balanced partitions. By optimizing the proposed loss, our model can be self-trained to perform clustering. In addition, our clustering procedure can be used to implement graph pooling in deep GNN architectures for graph classification. Experiments show that our model outperforms other GNN-based approaches for clustering and graph pooling.
translated by 谷歌翻译
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises of a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest of exploring more powerful PE for Transformers and GNNs in a robust experimental setting.
translated by 谷歌翻译
The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and provides generally a fertile ground to study the statistical and computational tradeoffs that arise in network and data sciences.This note surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a., detection). The main results discussed are the phase transitions for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters and the gap between information-theoretic and computational thresholds.The note also covers some of the algorithms developed in the quest of achieving the limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, classical and nonbacktracking spectral methods. A few open problems are also discussed.
translated by 谷歌翻译
图神经网络(GNN)是非欧盟数据的强大深度学习方法。流行的GNN是通信算法(MPNNS),它们在本地图中汇总并结合了信号。但是,浅的mpnns倾向于错过远程信号,并且在某些异质图上表现不佳,而深度mpnns可能会遇到过度平滑或过度阵型等问题。为了减轻此类问题,现有的工作通常会从欧几里得数据上训练神经网络或修改图形结构中借用归一化技术。然而,这些方法在理论上并不是很好地理解,并且可能会提高整体计算复杂性。在这项工作中,我们从光谱图嵌入中汲取灵感,并提出$ \ texttt {powerembed} $ - 一种简单的层归一化技术来增强mpnns。我们显示$ \ texttt {powerembed} $可以证明图形运算符的顶部 - $ k $引导特征向量,该算子可以防止过度光滑,并且对图形拓扑是不可知的;同时,它产生了从本地功能到全球信号的表示列表,避免了过度阵列。我们将$ \ texttt {powerembed} $应用于广泛的模拟和真实图表,并展示其竞争性能,尤其是对于异性图。
translated by 谷歌翻译
图形神经网络(GNN)已被证明可以实现竞争结果,以解决与图形相关的任务,例如节点和图形分类,链接预测和节点以及各种域中的图形群集。大多数GNN使用消息传递框架,因此称为MPNN。尽管有很有希望的结果,但据报道,MPNN会遭受过度平滑,过度阵型和不足的影响。文献中已经提出了图形重新布线和图形池作为解决这些局限性的解决方案。但是,大多数最先进的图形重新布线方法无法保留该图的全局拓扑,因此没有可区分(电感),并且需要调整超参数。在本文中,我们提出了Diffwire,这是一个在MPNN中进行图形重新布线的新型框架,它通过利用LOV \'ASZ绑定来原理,完全可区分且无参数。我们的方法通过提出两个新的,mpnns中的新的互补层来提供统一的图形重新布线:首先,ctlayer,一个学习通勤时间并将其用作边缘重新加权的相关函数;其次,Gaplayer是优化光谱差距的图层,具体取决于网络的性质和手头的任务。我们从经验上验证了我们提出的方法的价值,并使用基准数据集分别验证了这些层的每个层以进行图形分类。 Diffwire将通勤时间的可学习性汇集到相关的曲率定义,为发展更具表现力的MPNN的发展打开了大门。
translated by 谷歌翻译
随机块模型(SBM)是一个随机图模型,其连接不同的顶点组不同。它被广泛用作研究聚类和社区检测的规范模型,并提供了肥沃的基础来研究组合统计和更普遍的数据科学中出现的信息理论和计算权衡。该专着调查了最近在SBM中建立社区检测的基本限制的最新发展,无论是在信息理论和计算方案方面,以及各种恢复要求,例如精确,部分和弱恢复。讨论的主要结果是在Chernoff-Hellinger阈值中进行精确恢复的相转换,Kesten-Stigum阈值弱恢复的相变,最佳的SNR - 单位信息折衷的部分恢复以及信息理论和信息理论之间的差距计算阈值。该专着给出了在寻求限制时开发的主要算法的原则推导,特别是通过绘制绘制,半定义编程,(线性化)信念传播,经典/非背带频谱和图形供电。还讨论了其他块模型的扩展,例如几何模型和一些开放问题。
translated by 谷歌翻译
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
translated by 谷歌翻译
由于大型数据集中的深度学习模型需要大量时间和资源,因此希望构建一个小型合成数据集,我们可以通过该数据集充分训练深度学习模型。最近有一些作品通过复杂的BI级优化探索了有关凝结图像数据集的解决方案。例如,数据集冷凝(DC)匹配网络梯度W.R.T.大型数据和小合成数据,在每个外迭代处,网络权重优化了多个步骤。但是,现有方法具有其固有的局限性:(1)它们不直接适用于数据离散的图表; (2)由于所涉及的嵌套优化,冷凝过程在计算上昂贵。为了弥合差距,我们研究了针对图形数据集量身定制的有效数据集冷凝,在该数据集中我们将离散图结构模拟为概率模型。我们进一步提出了一个单步梯度匹配方案,该方案仅执行一个步骤,而无需训练网络权重。我们的理论分析表明,该策略可以生成合成图,从而导致实际图上的分类损失降低。各种图数据集的广泛实验证明了该方法的有效性和效率。特别是,我们能够将数据集大小降低90%,同时大约98%的原始性能,并且我们的方法明显快于多步梯度匹配(例如,CIFAR10中的15倍用于合成500个图)。
translated by 谷歌翻译
鉴于在现实世界应用中大规模图的流行率,训练神经模型的存储和时间引起了人们的关注。为了减轻关注点,我们提出和研究图形神经网络(GNNS)的图形凝结问题。具体而言,我们旨在将大型原始图凝结成一个小的,合成的和高度信息的图,以便在小图和大图上训练的GNN具有可比性的性能。我们通过优化梯度匹配损失并设计一种凝结节点期货和结构信息的策略来模仿原始图上的GNN训练轨迹,以解决凝结问题。广泛的实验证明了所提出的框架在将不同的图形数据集凝结成信息较小的较小图中的有效性。特别是,我们能够在REDDIT上近似于95.3%的原始测试准确性,Flickr的99.8%和CiteSeer的99.0%,同时将其图形尺寸降低了99.9%以上,并且可以使用冷凝图来训练各种GNN架构Code在https://github.com/chandlerbang/gcond上发布。
translated by 谷歌翻译
在本文中,我们提出了一种新方法来检测具有归因顶点的无向图中的簇。目的是将不仅在结构连接性方面,而且在属性值方面相似的顶点分组。我们通过创建[6,38]中提出的其他顶点和边缘,将顶点之间的结构和属性相似。然后将增强图嵌入到与其拉普拉斯式相关的欧几里得空间中,在该空间中,应用了修改的K-均值算法以识别簇。修改后的k均值依赖于矢量距离度量,根据每个原始顶点,我们分配了合适的矢量值坐标集,这取决于结构连接性和属性相似性,因此每个原始图顶点都被认为是$ M+1的代表增强图的$顶点,如果$ m $是顶点属性的数量。为了定义坐标矢量,我们基于自适应AMG(代数多机)方法采用了我们最近提出的算法,该方法识别了嵌入欧几里得空间中的坐标方向,以代数平滑的矢量相对于我们的增强图Laplacian,从而扩展了laplacian,从而扩展了坐标。没有属性的图形的先前结果。我们通过与一些知名方法进行比较,分析了我们提出的聚类方法的有效性,这些方法可以免费获得软件实现,并与文献中报告的结果相比,在两种不同类型的广泛使用的合成图上以及在某些现实世界中的图形上。
translated by 谷歌翻译
网络研究中最根本的问题之一是社区检测。随机块模型(SBM)是一种流行的模型,具有不同的估计方法,其社区检测一致性结果揭晓。但是,SBM受到强烈假设的限制:同一社区中的所有节点在随机上都是等效的,这可能不适合实际应用。我们引入了成对协变量调整后的随机块模型(PCABM),这是SBM的概括,该模型包含成对协变量信息。我们研究协变量和社区分配系数的最大似然估计。结果表明,在适当的稀疏条件下,协变量和社区分配的系数估计均一致。引入了带有调节的光谱聚类(SCWA),以有效地求解PCABM。在某些条件下,我们得出了SCWA下社区检测的错误限制,并表明它是社区检测一致的。此外,研究了模型的选择,并研究了成对协变量的特征选择,并提出了两种相应的算法。当可访问协变量信息时,PCABM与SBM或学位校正的随机块模型(DCBM)进行比较。
translated by 谷歌翻译
尽管图形神经网络(GNNS)领域的进步,但目前仅使用少量数据集来评估新模型。这种持续依赖少数数据集提供了对模型之间的性能差异的最小见解,对于可能具有与用作学术基准的数据集有很大不同的工业从业人员而言,尤其具有挑战性。在Google在GNN基础架构和开源软件方面的工作过程中,我们试图开发改进的基准,这些基准可健壮,可调,可扩展且可推广。在这项工作中,我们介绍了GraphWorld,这是一种新的方法和系统,用于对任何可疑的GNN任务进行任意大量的合成图种群进行基准测试GNN模型。 GraphWorld允许用户有效地生成具有数百万个统计上不同数据集的世界。它可访问,可扩展且易于使用。 GraphWorld可以在没有专门硬件的情况下在一台计算机上运行,​​也可以轻松地扩展到在任意群集或云框架上运行。使用GraphWorld,用户对Graph Generator参数具有细粒度的控制,并且可以使用内置的超参数调整基准测试任意GNN模型。我们从GraphWorld实验中介绍了有关数以百亿个基准数据集中数以万计的GNN模型的性能特征的见解。我们进一步表明,GraphWorld有效地探索了标准基准测试的基准数据集空间区域,从而揭示了在历史上无法获得的模型之间的比较。使用GraphWorld,我们还能够研究图形属性与任务性能指标之间的关系,这对于经典的现实基准集合而言,这几乎是不可能的。
translated by 谷歌翻译
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from other nodes in the graph, while link prediction is to predict whether two nodes in a network are likely to have a link. The definition of both naturally determines that clustering must play a positive role in obtaining accurate link prediction tasks. Yet researchers have long ignored or used inappropriate ways to undermine this positive relationship. In this article, We construct a simple but efficient clustering-driven link prediction framework(ClusterLP), with the goal of directly exploiting the cluster structures to obtain connections between nodes as accurately as possible in both undirected graphs and directed graphs. Specifically, we propose that it is easier to establish links between nodes with similar representation vectors and cluster tendencies in undirected graphs, while nodes in a directed graphs can more easily point to nodes similar to their representation vectors and have greater influence in their own cluster. We customized the implementation of ClusterLP for undirected and directed graphs, respectively, and the experimental results using multiple real-world networks on the link prediction task showed that our models is highly competitive with existing baseline models. The code implementation of ClusterLP and baselines we use are available at https://github.com/ZINUX1998/ClusterLP.
translated by 谷歌翻译
我们从光谱的角度解决图形生成问题,首先生成图形laplacian光谱的主要部分,然后构建与这些特征值和特征向量相匹配的图。光谱调节允许直接建模全局和局部图结构,并有助于克服单发图生成器的表达性和模式崩溃问题。我们的新颖的甘(Spectre)称为Spectre,可以使用一声模型来产生比以前可能更大的图。Spectre的表现优于最先进的深度自动回归发电机在建模忠诚方面,同时还避免了昂贵的顺序产生和对节点排序的依赖。一个很好的例子,在相当大的合成和现实图形中,Specter的幽灵比最佳竞争对手的最佳竞争对手的改进是4到170倍,该竞争对手不合适,比自回旋发电机快23至30倍。
translated by 谷歌翻译
近年来,基于Weisfeiler-Leman算法的算法和神经架构,是一个众所周知的Graph同构问题的启发式问题,它成为具有图形和关系数据的机器学习的强大工具。在这里,我们全面概述了机器学习设置中的算法的使用,专注于监督的制度。我们讨论了理论背景,展示了如何将其用于监督的图形和节点表示学习,讨论最近的扩展,并概述算法的连接(置换 - )方面的神经结构。此外,我们概述了当前的应用和未来方向,以刺激进一步的研究。
translated by 谷歌翻译