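We propose a decentralised "local2global" approach to graph representation learning that can, a priori, be used to scale any embedding technique. Our local2global approach first divides the input graph into overlapping subgraphs (or "patches") and trains a local representation for each patch independently. In a second step, we combine the local representations into a globally consistent representation by estimating the set of rigid motions that best aligns them, using information from the patch overlaps. A key distinguishing feature of local2global relative to existing work is that patches are trained independently, without the often costly parameter synchronisation required during distributed training. This allows local2global to scale to large industrial applications, where the input graph may not even fit into memory and may be stored in a distributed manner. We apply local2global to data sets of different sizes and show that our approach achieves a good trade-off between scale and accuracy on edge reconstruction and semi-supervised classification. We also consider the downstream task of anomaly detection and show how local2global can be used to highlight anomalies in cybersecurity networks.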
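The alignment step described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes 2-D embeddings, only two patches, and rigid motions restricted to a rotation plus a translation, and the names `fit_rigid_2d`, `apply_rigid_2d`, and `align_patch` are hypothetical. The rotation is fitted in closed form from the nodes the two patches share.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst (2-D)."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    # Accumulate dot and cross products of the centred point pairs.
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that sends the rotated source centroid to the target centroid.
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))

def apply_rigid_2d(points, theta, t):
    """Rotate each point by theta, then shift it by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def align_patch(reference, patch):
    """Map `patch` (node -> (x, y)) into the frame of `reference` via their shared nodes."""
    shared = sorted(set(reference) & set(patch))  # assumed: at least two shared nodes
    theta, t = fit_rigid_2d([patch[n] for n in shared], [reference[n] for n in shared])
    moved = apply_rigid_2d([patch[n] for n in patch], theta, t)
    return dict(zip(patch, moved))

# Patch B stores nodes 1, 2, 3 in its own rotated and shifted coordinate frame.
patch_a = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0)}
patch_b = {1: (5.0, -1.0), 2: (4.0, -1.0), 3: (4.0, 0.0)}
aligned_b = align_patch(patch_a, patch_b)
```

After alignment, the overlap nodes of patch B coincide with their positions in patch A, and node 3 is placed in the same frame. With many patches, the pairwise motions would additionally have to be made mutually consistent, which this sketch does not attempt.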
Pre-publication draft of a book to be published by Morgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M data set with 2 million nodes and 61 million edges, which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and uses much less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm can finish in around 36 minutes, while all the existing GCN training algorithms fail to train due to the out-of-memory issue. Finally, Cluster-GCN allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at https://github.com/google-research/google-research/tree/master/cluster_gcn.
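The batching idea in this abstract, restricting each step to the subgraph induced by one cluster, can be sketched as follows. This is only an illustration under stated assumptions: the cluster assignment `cluster_of` is taken as given (the abstract obtains it from a graph clustering algorithm, which is not reproduced here), and the helper names `induced_subgraph` and `cluster_batches` are hypothetical.

```python
def induced_subgraph(adj, nodes):
    """Keep only the edges whose endpoints both lie in `nodes`."""
    keep = set(nodes)
    return {u: [v for v in adj[u] if v in keep] for u in nodes}

def cluster_batches(adj, cluster_of):
    """Yield (cluster id, induced subgraph) pairs, one per cluster."""
    clusters = {}
    for node, cid in cluster_of.items():
        clusters.setdefault(cid, []).append(node)
    for cid in sorted(clusters):
        yield cid, induced_subgraph(adj, sorted(clusters[cid]))

# Two triangles joined by the single edge 2-3, with one cluster per triangle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cluster_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
batches = dict(cluster_batches(adj, cluster_of))
```

Each batch only touches the nodes and edges of its own cluster, so the between-cluster edge 2-3 is dropped; that is the price paid for bounded memory per step.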
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through its wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics and an additional medium-sized molecular dataset, AQSOL, which is similar to the popular ZINC but has a real-world measured chemical target, and we discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest in exploring more powerful PE for Transformers and GNNs in a robust experimental setting.
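In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, have emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research.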
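For readers unfamiliar with the heuristic, the following is a minimal sketch of 1-dimensional Weisfeiler-Leman colour refinement. It assumes unlabelled graphs given as adjacency lists and a fixed number of rounds; the function name `wl_histogram` is hypothetical, and colours are kept as nested tuples rather than compressed labels, which is only practical for tiny graphs.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Histogram of node colours after `rounds` of 1-WL refinement."""
    colors = {v: 0 for v in adj}  # every node starts with the same colour
    for _ in range(rounds):
        # New colour = own colour plus the sorted multiset of neighbour colours.
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
    return Counter(colors.values())

path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

distinguishes = wl_histogram(path) != wl_histogram(triangle)
fails = wl_histogram(two_triangles) == wl_histogram(hexagon)
```

Different histograms prove that two graphs are non-isomorphic, while equal histograms are inconclusive: the two disjoint triangles and the hexagon are both 2-regular on six nodes, so the test cannot tell them apart.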
Using graph neural networks for large graphs is challenging since there is no clear way of constructing mini-batches. To solve this, previous methods have relied on sampling or graph clustering. While these approaches often lead to good training convergence, they introduce significant overhead due to expensive random data accesses and perform poorly during inference. In this work we instead focus on model behavior during inference. We theoretically model batch construction via maximizing the influence score of nodes on the outputs. This formulation leads to optimal approximation of the output when we do not have knowledge of the trained model. We call the resulting method influence-based mini-batching (IBMB). IBMB accelerates inference by up to 130x compared to previous methods that reach similar accuracy. Remarkably, with adaptive optimization and the right training schedule IBMB can also substantially accelerate training, thanks to precomputed batches and consecutive memory accesses. This results in up to 18x faster training per epoch and up to 17x faster convergence per runtime compared to previous methods.
Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs, a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DIFFPOOL, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DIFFPOOL learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DIFFPOOL yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
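The coarsening step can be sketched as follows, assuming the usual formulation in which an assignment matrix S (nodes by clusters) pools features as S^T Z and the adjacency as S^T A S. The assignment matrix is taken as given here, whereas the abstract describes it as learned by a GNN; the helper names `matmul`, `transpose`, and `coarsen` are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def coarsen(adj, feats, assign):
    """assign[i][c] is the membership weight of node i in cluster c (rows sum to 1)."""
    s_t = transpose(assign)
    pooled_feats = matmul(s_t, feats)              # S^T Z: cluster features
    pooled_adj = matmul(matmul(s_t, adj), assign)  # S^T A S: cluster connectivity
    return pooled_adj, pooled_feats

# Path graph 0-1-2-3 with nodes {0, 1} in cluster 0 and {2, 3} in cluster 1.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0], [2.0], [3.0], [4.0]]
assign = [[1, 0], [1, 0], [0, 1], [0, 1]]
pooled_adj, pooled_feats = coarsen(adj, feats, assign)
```

A hard assignment is used here only to keep the arithmetic easy to follow; with soft assignments the same two products apply unchanged.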
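Modeling heterogeneity by extracting and exploiting high-order information from heterogeneous information networks (HINs) has attracted immense research attention in recent years. Such heterogeneous network embedding (HNE) methods effectively harness the heterogeneity of small-scale HINs. In the real world, however, the size of HINs grows exponentially with the continuous introduction of new nodes and different types of links, turning them into billion-scale networks. Learning node embeddings on such HINs creates a performance bottleneck for existing HNE methods, which are commonly centralized, i.e., the complete data and the model both reside on a single machine. To address large-scale HNE tasks with strong efficiency and effectiveness guarantees, we present the Decentralized Embedding Framework for Heterogeneous Information Network (DeHIN). In DeHIN, we build a distributed parallel pipeline that uses hypergraphs to infuse parallelization into the HNE task. DeHIN presents a context-preserving partition mechanism that innovatively formulates a large HIN as a hypergraph whose hyperedges connect semantically similar nodes. Our framework then adopts a decentralized strategy to partition HINs efficiently through a tree-like pipeline. Each resulting subnetwork is then assigned to a distributed worker, which employs the deep information maximization theorem to learn node embeddings locally from the partition it receives. We further devise a novel embedding alignment scheme that projects the independently learned node embeddings from all subnetworks onto a common vector space, thus allowing for downstream tasks such as link prediction and node classification.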
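Despite the recent success of graph neural networks (GNNs), common architectures often exhibit significant limitations, including sensitivity to oversmoothing, long-range dependencies, and spurious edges, e.g., as a result of graph heterophily or adversarial attacks. To at least partially address these issues within a simple, transparent framework, we consider a new family of GNN layers designed to mimic and integrate the update rules of two classical iterative algorithms, namely proximal gradient descent and iterative reweighted least squares (IRLS). The former defines an extensible base GNN architecture that is immune to oversmoothing while still being able to capture long-range dependencies by allowing arbitrary propagation steps. The latter, in contrast, produces a novel attention mechanism that is explicitly anchored to an underlying end-to-end energy function and provides stability with respect to edge uncertainty. When combined, we obtain an extremely simple yet robust model that we evaluate across disparate scenarios, including standardized benchmarks, adversarially perturbed graphs, graphs with heterophily, and graphs involving long-range dependencies. In doing so, we compare against SOTA GNN approaches that have been explicitly designed for the respective tasks, achieving competitive or superior node classification accuracy. Our code is available at https://github.com/fftyyy/twirls.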
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
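Graph autoencoders (GAE) and variational graph autoencoders (VGAE) have emerged as powerful methods for link prediction. Their performance is less impressive on community detection problems where, according to recent and concurring experimental evaluations, they are often outperformed by simpler alternatives such as the Louvain method. It is still unclear to what extent community detection can be improved with GAE and VGAE, especially in the absence of node features. It is moreover uncertain whether this can be done while simultaneously preserving good performance on link prediction. In this paper, we show that jointly addressing these two tasks with high accuracy is possible. For this purpose, we introduce and theoretically study a community-preserving message passing scheme, doping our GAE and VGAE encoders by considering both the initial graph structure and modularity-based prior communities when computing embedding spaces. We also propose novel training and optimization strategies, including the introduction of a modularity-inspired regularizer that complements the existing reconstruction losses for joint link prediction and community detection. We demonstrate the empirical effectiveness of our approach, referred to as Modularity-Aware GAE and VGAE, through in-depth experimental validation on various real-world graphs.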
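Since both the prior communities and the regularizer in this abstract build on modularity, a short sketch of the standard (Newman) modularity score of a partition may help. It assumes an undirected, unweighted graph given as an edge list; the function name `modularity` is hypothetical, and the paper's differentiable regularizer is not reproduced here.

```python
def modularity(edges, community):
    """Q = sum over communities of (intra-edge fraction - squared degree fraction)."""
    m = len(edges)
    intra = {}   # number of edges with both endpoints in the community
    degree = {}  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
merged = {n: 0 for n in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Putting each triangle in its own community scores higher than merging everything into one community, which always scores zero.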
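Graph neural networks (GNNs) are powerful deep learning methods for non-Euclidean data. Popular GNNs are message-passing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss long-range signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues such as over-smoothing or over-squashing. To mitigate such issues, existing works typically borrow normalization techniques from training neural networks on Euclidean data or modify the graph structure. Yet these approaches are not well understood theoretically and may increase the overall computational complexity. In this work, we draw inspiration from spectral graph embedding and propose PowerEmbed, a simple layer-wise normalization technique to boost MPNNs. We show that PowerEmbed can provably express the top-$k$ leading eigenvectors of the graph operator, which prevents over-smoothing and is agnostic to the graph topology; meanwhile, it produces a list of representations ranging from local features to global signals, which avoids over-squashing. We apply PowerEmbed to a wide range of simulated and real graphs and demonstrate its competitive performance, particularly for heterophilous graphs.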
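The link between repeated propagation with normalization and leading eigenvectors can be illustrated with classical power iteration. The sketch below is not PowerEmbed itself: it assumes a small symmetric operator, tracks only a single vector (the $k = 1$ case), and uses a hypothetical function name, `normalized_power_iteration`.

```python
import math

def normalized_power_iteration(matrix, start, steps=50):
    """Repeatedly apply `matrix` and rescale to unit length after every step."""
    vec = list(start)
    for _ in range(steps):
        vec = [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]  # normalization keeps the signal from blowing up or vanishing
    return vec

# Symmetric operator with eigenvalues 3 and 1; the leading eigenvector is (1, 1) / sqrt(2).
operator = [[2.0, 1.0], [1.0, 2.0]]
leading = normalized_power_iteration(operator, [1.0, 0.0])
```

Extracting several eigenvectors at once would require orthonormalizing a block of vectors at each step instead of rescaling a single one.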
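Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootstrapped Graph Latents (BGRL), a graph representation learning method that learns by predicting alternative augmentations of the input. BGRL uses only simple augmentations and removes the need to contrast with negative examples, and is thus scalable by design. BGRL outperforms or matches prior methods on several established benchmarks while reducing memory costs by a factor of 2-10. Furthermore, we show that BGRL can be scaled up to extremely large graphs with hundreds of millions of nodes in the semi-supervised regime, achieving state-of-the-art performance and improving over supervised baselines in which representations are shaped only through label information. In particular, our BGRL-centered solution was one of the winning entries to the Open Graph Benchmark Large-Scale Challenge at KDD Cup 2021, on a graph orders of magnitude larger than all previously available benchmarks, demonstrating the scalability and effectiveness of our approach.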
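Graph neural networks (GNNs) have shown great advantages in many graph-based learning tasks but often fail to predict accurately for tasks based on sets of nodes, such as link or motif prediction. Many recent works address this problem by using random node features or node distance features. However, they suffer from slow convergence, inaccurate prediction, or high complexity. In this work, we revisit GNNs that allow the use of positional features of nodes given by positional encoding (PE) techniques such as Laplacian eigenmaps and DeepWalk. Here, we study these issues in a principled way and propose a provable solution: a class of GNN layers, termed PEG, backed by rigorous mathematical analysis. PEG uses separate channels to update the original node features and the positional features. PEG imposes permutation equivariance w.r.t. the original node features and simultaneously imposes $O(p)$ (orthogonal group) equivariance w.r.t. the positional features, where $p$ is the dimension of the positional features used. Extensive link prediction experiments on 8 real-world networks demonstrate the advantages of PEG in generalization and scalability.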
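In this paper, we present a general framework to scale graph autoencoders (AE) and graph variational autoencoders (VAE). This framework leverages graph degeneracy concepts to train models only from a dense subset of nodes instead of using the entire graph. Together with a simple yet effective propagation mechanism, our approach significantly improves scalability and training speed while preserving performance. We evaluate and discuss our method on several variants of existing graph AE and VAE, providing the first application of these models to large graphs with up to millions of nodes and edges. We achieve empirically competitive results w.r.t. several popular scalable node embedding methods, which emphasizes the relevance of pursuing further research towards more scalable graph AE and VAE.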
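A standard degeneracy concept for picking a dense subset of nodes is the k-core, the maximal subgraph in which every node has at least k neighbours. The sketch below computes it by iterative peeling; it is an assumption that this is the relevant notion of a "dense subset", and the function name `k_core` is hypothetical.

```python
def k_core(adj, k):
    """Return the node set of the k-core of an undirected graph given as adjacency lists."""
    degree = {v: len(neigh) for v, neigh in adj.items()}
    alive = set(adj)
    queue = [v for v in alive if degree[v] < k]
    while queue:
        v = queue.pop()
        if v not in alive:
            continue  # already peeled via an earlier queue entry
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return alive

# A triangle (0, 1, 2) with a tail 2-3-4 hanging off it.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
dense_nodes = k_core(adj, 2)
```

Here the 2-core keeps only the triangle; a model would then be trained on that subgraph, and embeddings for the peeled nodes would have to be obtained by some propagation step, which is not shown.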
Graph Convolutional Networks (GCNs) are powerful models for learning representations of attributed graphs. To scale GCNs to large graphs, state-of-the-art methods use various layer sampling techniques to alleviate the "neighbor explosion" problem during minibatch training. We propose GraphSAINT, a graph sampling based inductive learning method that improves training efficiency and accuracy in a fundamentally different way. By changing perspective, GraphSAINT constructs minibatches by sampling the training graph, rather than the nodes or edges across GCN layers. In each iteration, a complete GCN is built from the properly sampled subgraph. Thus, we ensure a fixed number of well-connected nodes in all layers. We further propose a normalization technique to eliminate bias, and sampling algorithms for variance reduction. Importantly, we can decouple the sampling from the forward and backward propagation, and extend GraphSAINT with many architecture variants (e.g., graph attention, jumping connection). GraphSAINT demonstrates superior performance in both accuracy and training time on five large graphs, and achieves new state-of-the-art F1 scores for PPI (0.995) and Reddit (0.970).
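Graph neural networks (GNNs) work well when the graph structure is provided. However, this structure may not always be available in real-world applications. One solution to this problem is to infer a task-specific latent structure and then apply a GNN to the inferred graph. Unfortunately, the space of possible graph structures grows super-exponentially with the number of nodes, so the task-specific supervision may be insufficient for learning both the structure and the GNN parameters. In this work, we propose Simultaneous Learning of Adjacency and GNN Parameters with Self-supervision, or SLAPS, a method that provides more supervision for inferring a graph structure through self-supervision. A comprehensive experimental study demonstrates that SLAPS scales to large graphs with hundreds of thousands of nodes and outperforms several models that have been proposed to learn a task-specific graph structure on established benchmarks.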
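Message-passing neural networks (MPNNs) are the leading architecture for deep learning on graph-structured data, in large part due to their simplicity and scalability. Unfortunately, it has been shown that these architectures are limited in their expressive power. This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN) to address this issue. Our main observation is that while two graphs may not be distinguishable by an MPNN, they often contain distinguishable subgraphs. Thus, we propose to represent each graph as a set of subgraphs derived by some predefined policy, and to process it using a suitable equivariant architecture. We develop novel variants of the 1-dimensional Weisfeiler-Leman (1-WL) test for graph isomorphism, and prove lower bounds on the expressiveness of ESAN in terms of these new WL variants. We further prove that our approach increases the expressive power of both MPNNs and more expressive architectures. Moreover, we provide theoretical results that describe how design choices such as the subgraph selection policy and the equivariant neural architecture affect our architecture's expressive power. To deal with the increased computational cost, we propose a subgraph sampling scheme, which can be viewed as a stochastic version of our framework. A comprehensive set of experiments on real and synthetic datasets demonstrates that our framework improves the expressive power and overall performance of popular GNN architectures.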
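To make "a set of subgraphs derived by some predefined policy" concrete, the sketch below builds such a set with one simple policy, deleting one node at a time. Treating node deletion as the policy is an assumption made for illustration, the function name `node_deleted_subgraphs` is hypothetical, and the equivariant network that would process the resulting bag is not shown.

```python
def node_deleted_subgraphs(adj):
    """Return one subgraph per node, each obtained by deleting that node and its edges."""
    bag = []
    for removed in sorted(adj):
        bag.append({u: [v for v in adj[u] if v != removed] for u in adj if u != removed})
    return bag

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
bag = node_deleted_subgraphs(triangle)
```

For a graph with n nodes this policy yields n subgraphs, which is the source of the extra computational cost that the abstract's sampling scheme is meant to reduce.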
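Classification tasks on labeled graph-structured data have many important applications, ranging from social recommendation to financial modeling. Deep neural networks are increasingly being used for node classification on graphs, where nodes with similar features have to be given the same label. Graph convolutional networks (GCNs) are one such widely studied neural network architecture that performs well on this task. However, powerful link-stealing attacks on GCNs have recently shown that, even with black-box access to the trained model, inferring which links (or edges) are present in the training graph is practical. In this paper, we present a new neural network architecture called LPGNet for training on graphs with privacy-sensitive edges. LPGNet provides differential privacy (DP) guarantees for edges through a novel design for how the graph edge structure is used during training. We show empirically that LPGNet models often lie in the sweet spot between providing privacy and utility: they can offer better utility than "trivially" private architectures that use no edge information (e.g., vanilla MLPs) and better resilience against existing link-stealing attacks than vanilla GCNs, which use the full edge structure. LPGNet also offers consistently better privacy-utility trade-offs than DPGCN, the state-of-the-art mechanism for retrofitting differential privacy into conventional GCNs, on most of our evaluated datasets.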
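Graphs are a universal data structure widely used to organize real-world data. Various real-world networks, such as transportation, social, and academic networks, can be represented by graphs. Recent years have witnessed rapid development in representing the vertices of a network in a low-dimensional vector space, referred to as network representation learning. Representation learning can facilitate the design of new algorithms on graph data. In this survey, we conduct a comprehensive review of the current literature on network representation learning. Existing algorithms can be categorized into three groups: shallow embedding models, heterogeneous network embedding models, and graph neural network based models. We review state-of-the-art algorithms for each category and discuss the essential differences between these algorithms. One advantage of the survey is that we systematically study the theoretical foundations underlying the different categories of algorithms, which offers deep insights for better understanding the development of the network representation learning field.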