Graph Neural Networks (GNNs) are a family of neural models tailored to graph-structured data and have shown superior performance in learning representations of such data. However, training GNNs on large graphs remains challenging. A promising direction is distributed GNN training, which partitions the input graph and distributes the workload across multiple machines. The key bottleneck of existing distributed GNN training frameworks is the cross-machine communication induced by the dependencies in the graph data and by the aggregation operator of GNNs. In this paper, we study the communication complexity of distributed GNN training and propose a simple lossless communication reduction method, termed the Aggregation before Communication (ABC) method. The ABC method exploits the permutation-invariant property of the GNN layer and leads to a paradigm in which vertex-cut is proved to admit superior communication performance to the currently popular paradigm (edge-cut). In addition, we show that the new partition paradigm is particularly well suited to dynamic graphs, where it is infeasible to control edge placement because the stochastic process governing graph changes is unknown.
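The idea of aggregating before communicating can be illustrated with a small sketch. It assumes a sum aggregator (one permutation-invariant choice) and uses hypothetical function names and toy data; it is not the paper's implementation, only a demonstration that per-machine partial sums give the same result as shipping every neighbor feature while sending fewer vectors.

```python
# Toy sketch: neighbors of one target vertex are spread over several machines.
# All names here are hypothetical and chosen for illustration only.

def full_aggregate(neighbors_by_machine):
    """Baseline: ship every neighbor feature, then sum at the target."""
    all_feats = [f for feats in neighbors_by_machine.values() for f in feats]
    total = [sum(col) for col in zip(*all_feats)]
    return total, len(all_feats)  # (result, number of vectors sent)

def aggregate_before_communication(neighbors_by_machine):
    """Each machine sums its local neighbors first, then sends one vector."""
    partials = [
        [sum(col) for col in zip(*feats)]
        for feats in neighbors_by_machine.values()
        if feats
    ]
    # Summation is permutation-invariant and associative, so combining
    # partial sums yields the same result as summing all features at once.
    total = [sum(col) for col in zip(*partials)]
    return total, len(partials)

neighbors = {
    "machine_a": [[1, 2], [3, 4]],
    "machine_b": [[5, 6]],
    "machine_c": [[1, 1], [2, 2], [3, 3]],
}
baseline, sent_baseline = full_aggregate(neighbors)
reduced, sent_reduced = aggregate_before_communication(neighbors)
print(baseline, sent_baseline)  # [15, 18] 6
print(reduced, sent_reduced)    # [15, 18] 3
```

The reduction is lossless only for aggregators that decompose over partitions in this way (sum, and by extension mean or max with suitable bookkeeping); that restriction is an assumption of the sketch.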
Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model across a broad range of application fields because of their effectiveness in learning over graphs. To scale GNN training to large and ever-growing graphs, the most promising solution is distributed training, which distributes the training workload across multiple computing nodes. However, the workflows, computational patterns, communication patterns, and optimization techniques of distributed GNN training are still only preliminarily understood. In this paper, we provide a comprehensive survey of distributed GNN training by investigating the various optimization techniques it uses. First, distributed GNN training is classified into several categories according to workflow, and the computational patterns, communication patterns, and optimization techniques proposed by recent work are introduced for each. Second, the software frameworks and hardware platforms for distributed GNN training are introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing what is unique to distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
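We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and by methods based on non-learnable message passing. SAR, on the other hand, is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme, which sequentially reconstructs and then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior, where the memory consumption per worker decreases linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to date and demonstrate large memory savings as the number of workers increases. We also present a general technique, based on kernel fusion and attention-matrix rematerialization, to optimize both the runtime and the memory efficiency of attention-based models. We show that, coupled with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs.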
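Graph neural networks (GNNs) have been demonstrated to be a powerful tool for analyzing non-Euclidean graph data. However, the lack of efficient distributed graph learning (GL) systems severely hinders applications of GNNs, especially when graphs are large and GNNs are relatively deep. In this paper, we present GraphTheta, a novel distributed and scalable GL system implemented in a vertex-centric graph programming model. GraphTheta is the first GL system built upon distributed graph processing with neural network operators implemented as user-defined functions. The system supports multiple training strategies and enables highly scalable learning on large graphs across distributed (virtual) machines. To facilitate graph convolution implementations, GraphTheta puts forward a new GL abstraction named NN-TGAR to bridge the gap between graph processing and graph deep learning. A distributed graph engine is proposed to conduct stochastic gradient descent optimization with hybrid-parallel execution. Moreover, we add support for a new cluster-batched training strategy in addition to global-batch and mini-batch training. We evaluate GraphTheta on a number of datasets with network sizes ranging from small and modest to large scale. Experimental results show that GraphTheta scales well to 1,024 workers when training an in-house GNN on an industry-scale Alipay dataset of 1.4 billion nodes and 4.1 billion attributed edges, using a cluster of CPU virtual machines (dockers) with small memory each (5$\sim$12 GB). Moreover, GraphTheta obtains comparable or better prediction results than state-of-the-art GNN implementations, demonstrating that it learns GNNs as well as existing frameworks, and can outperform them by up to $2.02\times$ with better scalability. To the best of our knowledge, this work presents the largest edge-attributed GNN learning task in the literature.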
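Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on various tasks such as node classification and graph property prediction. Nonetheless, existing systems are inefficient at training on large graphs with billions of nodes and edges using GPUs. The main bottleneck is the process of preparing data for the GPUs: subgraph sampling and feature retrieval. This paper proposes BGL, a distributed GNN training system designed to address these bottlenecks with a few key ideas. First, we propose a dynamic cache engine to minimize feature retrieval traffic. By co-designing the caching policy and the sampling order, we find a sweet spot of low overhead and high cache hit ratio. Second, we improve the graph partition algorithm to reduce cross-partition communication during subgraph sampling. Finally, careful resource isolation reduces contention between the different data preprocessing stages. Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems, by 20.68x on average.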
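Developing scalable solutions for training Graph Neural Networks (GNNs) on link prediction tasks is challenging because of the high data dependencies, which entail high computational cost and a huge memory footprint. We propose a new method for scaling the training of knowledge graph embedding models to address these challenges. To this end, we propose the following algorithmic strategies: self-sufficient partitions, constraint-based negative sampling, and edge mini-batch training. Both the partitioning strategy and the constraint-based negative sampling avoid cross-partition data transfer during training. In our experimental evaluation, we show that our scaling solution for GNN-based knowledge graph embedding models achieves a 16x speedup on benchmark datasets while maintaining model performance comparable to non-distributed methods on standard metrics.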
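Graph Neural Networks (GNNs) are recently proposed neural network structures for processing graph-structured data. Because of the neighbor aggregation strategy they employ, existing GNNs focus on capturing node-level information and neglect higher-level information. Existing GNNs therefore suffer from representational limitations caused by the Local Permutation Invariance (LPI) problem. To overcome these limitations and enrich the features captured by GNNs, we propose a novel GNN framework, referred to as the Two-level GNN (TL-GNN), which merges subgraph-level information with node-level information. Moreover, we provide a mathematical analysis of the LPI problem, which demonstrates that subgraph-level information is beneficial for overcoming the problems associated with LPI. A subgraph counting method based on a dynamic programming algorithm is also proposed; its time complexity is $O(n^3)$, where $n$ is the number of nodes in the graph. Experiments show that TL-GNN outperforms existing GNNs and achieves state-of-the-art performance.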
Training Graph Neural Networks at scale on graphs containing billions of vertices and edges using minibatch sampling poses a key challenge: strong-scaling graphs and training examples results in lower compute and higher communication volume per node, and potential performance loss. DistGNN-MB employs a novel Historical Embedding Cache combined with compute-communication overlap to address this challenge. On a 32-node (64-socket) cluster of $3^{rd}$ generation Intel Xeon Scalable Processors with 36 cores per socket, DistGNN-MB trains 3-layer GraphSAGE and GAT models on OGBN-Papers100M to convergence with epoch times of 2 seconds and 4.9 seconds, respectively. At this scale, DistGNN-MB trains GraphSAGE 5.2x faster than the widely used DistDGL. DistGNN-MB trains GraphSAGE and GAT 10x and 17.2x faster, respectively, as compute nodes scale from 2 to 32.
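Numerous subgraph-enhanced graph neural networks (GNNs) have emerged recently, provably boosting the expressive power of standard (message-passing) GNNs. However, there is limited understanding of how these approaches relate to each other and to the Weisfeiler-Leman hierarchy. Moreover, current approaches either use all subgraphs of a given size, sample them uniformly at random, or use hand-crafted heuristics instead of learning to select subgraphs in a data-driven manner. Here, we offer a unified way to study such architectures by introducing a theoretical framework and extending the known expressivity results for subgraph-enhanced GNNs. Specifically, we show that increasing subgraph size always increases the expressive power, and we develop a better understanding of their limitations by relating them to the established $k\text{-}\mathsf{WL}$ hierarchy. In addition, we explore different approaches for learning to sample subgraphs using recent methods for backpropagating through complex discrete probability distributions. Empirically, we study the predictive performance of different subgraph-enhanced GNNs, showing that our data-driven architectures increase prediction accuracy on standard benchmark datasets compared to non-data-driven subgraph-enhanced graph neural networks while reducing computation time.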
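Graph neural networks (GNNs) are deep learning architectures for machine learning problems on graphs. It has recently been shown that the expressiveness of GNNs can be characterized precisely by the combinatorial Weisfeiler-Leman algorithms and by finite-variable counting logics. This correspondence has even led to new, higher-order GNNs corresponding to the higher-dimensional WL algorithm. The purpose of this paper is to explain these descriptive characterizations of GNNs.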
Graph Neural Networks (GNNs) have been demonstrated to be inherently susceptible to the problems of over-smoothing and over-squashing. These issues limit the ability of GNNs to model complex graph interactions by reducing their effectiveness at taking distant information into account. Our study reveals the key connection between local graph geometry and the occurrence of both issues, thereby providing a unified framework for studying them at a local scale using Ollivier's Ricci curvature. Based on our theory, a number of principled methods are proposed to alleviate the over-smoothing and over-squashing issues.
Deep learning-based approaches have been developed to solve challenging problems in wireless communications, leading to promising results. Early attempts adopted neural network architectures inherited from applications such as computer vision. They often yield poor performance in large-scale networks (i.e., poor scalability) and in unseen network settings (i.e., poor generalization). To resolve these issues, graph neural networks (GNNs) have recently been adopted, as they can effectively exploit domain knowledge, i.e., the graph topology in wireless communications problems. GNN-based methods can achieve near-optimal performance in large-scale networks and generalize well under different system settings, but the theoretical underpinnings and design guidelines remain elusive, which may hinder their practical implementation. This paper endeavors to fill both the theoretical and practical gaps. For theoretical guarantees, we prove that GNNs achieve near-optimal performance in wireless networks with far fewer training samples than traditional neural architectures. Specifically, to solve an optimization problem on an $n$-node graph (where the nodes may represent users, base stations, or antennas), the generalization error and required number of training samples of GNNs are $\mathcal{O}(n)$ and $\mathcal{O}(n^2)$ times lower, respectively, than those of unstructured multi-layer perceptrons. For design guidelines, we propose a unified framework that is applicable to general design problems in wireless networks and that includes graph modeling, neural architecture design, and theory-guided performance enhancement. Extensive simulations, which cover a variety of important problems and network settings, verify our theory and the effectiveness of the proposed design framework.
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs. The large data sizes of graphs and their vertex features make scalable training algorithms and distributed memory systems necessary. Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges. We propose a highly parallel training algorithm that scales to large processor counts. In our solution, the large adjacency and vertex-feature matrices are partitioned among processors. We exploit the vertex-partitioning of the graph to use non-blocking point-to-point communication operations between processors for better scalability. To further minimize the parallelization overheads, we introduce a sparse matrix partitioning scheme based on a hypergraph partitioning model for full-batch training. We also propose a novel stochastic hypergraph model to encode the expected communication volume in mini-batch training. We show the merits of the hypergraph model, previously unexplored for GCN training, over the standard graph partitioning model which does not accurately encode the communication costs. Experiments performed on real-world graph datasets demonstrate that the proposed algorithms achieve considerable speedups over alternative solutions. The optimizations achieved on communication costs become even more pronounced at high scalability with many processors. The performance benefits are preserved in deeper GCNs having more layers as well as on billion-scale graphs.
Learning node embeddings that capture a node's position within the broader graph structure is crucial for many prediction tasks on graphs. However, existing Graph Neural Network (GNN) architectures have limited power in capturing the position/location of a given node with respect to all other nodes of the graph. Here we propose Position-aware Graph Neural Networks (P-GNNs), a new class of GNNs for computing position-aware node embeddings. P-GNN first samples sets of anchor nodes, computes the distance of a given target node to each anchor-set, and then learns a non-linear distance-weighted aggregation scheme over the anchor-sets. This way P-GNNs can capture positions/locations of nodes with respect to the anchor nodes. P-GNNs have several advantages: they are inductive, scalable, and can incorporate node feature information. We apply P-GNNs to multiple prediction tasks including link prediction and community detection. We show that P-GNNs consistently outperform state-of-the-art GNNs, with up to 66% improvement in terms of the ROC AUC score.
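The distance-to-anchor-set step described above can be sketched as follows. This is a minimal illustration, not the P-GNN implementation: it assumes unweighted shortest-path distance, takes the distance to a set to be the minimum over its members, uses fixed rather than sampled anchor sets, and omits the learned distance-weighted aggregation entirely. The function names are hypothetical.

```python
from collections import deque

def bfs_from_set(adj, sources):
    """Multi-source BFS: hop distance from the nearest node in `sources`."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def position_features(adj, anchor_sets):
    """For every node, one distance per anchor set (inf if unreachable)."""
    per_set = [bfs_from_set(adj, anchors) for anchors in anchor_sets]
    return {v: [d.get(v, float("inf")) for d in per_set] for v in adj}

# Path graph 0 - 1 - 2 - 3 - 4 with two hand-picked anchor sets.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
features = position_features(path, [[0], [2, 4]])
print(features[3])  # [3, 1]
```

Nodes 1 and 3 have identical local neighborhoods in this path graph, yet their distance vectors differ ([1, 1] versus [3, 1]), which is the kind of positional signal the abstract refers to.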
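In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, have emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research.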
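In recent years, graph neural networks (GNNs) have emerged as a powerful neural architecture for learning vector representations of nodes and graphs in a supervised, end-to-end fashion. Up to now, GNNs have only been evaluated empirically, showing promising results. The following work investigates GNNs from a theoretical point of view and relates them to the $1$-dimensional Weisfeiler-Leman graph isomorphism heuristic ($1$-WL). We show that GNNs have the same expressiveness as the $1$-WL in terms of distinguishing non-isomorphic (sub-)graphs. Hence, both algorithms also have the same shortcomings. Based on this, we propose a generalization of GNNs, so-called $k$-dimensional GNNs ($k$-GNNs), which can take higher-order graph structures at multiple scales into account. These higher-order structures play an essential role in the characterization of social networks and molecule graphs. Our experimental evaluation confirms our theoretical findings and also confirms that higher-order information is useful in graph classification and regression tasks.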
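For readers unfamiliar with the 1-WL heuristic referenced here, the following is a small sketch of color refinement on unlabeled, undirected graphs. The function names are hypothetical, and the code illustrates the classical heuristic only, not the k-GNN architecture. The two graphs are refined jointly so that color identifiers are comparable across them.

```python
from collections import Counter

def refine(adj, rounds):
    """1-WL color refinement: recolor by (own color, multiset of neighbor colors)."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        # Compress signatures back to small integers.
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        colors = {v: palette[sig[v]] for v in adj}
    return colors

def wl_indistinguishable(adj_a, adj_b):
    """True if 1-WL assigns both graphs the same color histogram."""
    # Refine on the disjoint union so both graphs share one palette.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj_b.items()})
    colors = refine(union, rounds=len(union))  # enough rounds to stabilize
    hist_a = Counter(c for (g, _), c in colors.items() if g == "a")
    hist_b = Counter(c for (g, _), c in colors.items() if g == "b")
    return hist_a == hist_b

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path3 = {0: [1], 1: [0, 2], 2: [1]}

print(wl_indistinguishable(two_triangles, hexagon))  # True
print(wl_indistinguishable(triangle, path3))         # False
```

Two disjoint triangles and a 6-cycle are not isomorphic, yet every node in both is 2-regular, so 1-WL cannot separate them. This is a standard example of the shared shortcoming the abstract mentions and of why higher-order structure can help.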
Deploying graph neural networks (GNNs) on whole-graph classification or regression tasks is known to be challenging: it often requires computing node features that are mindful of both local interactions in their neighbourhood and the global context of the graph structure. GNN architectures that navigate this space need to avoid pathological behaviours, such as bottlenecks and oversquashing, while ideally having linear time and space complexity requirements. In this work, we propose an elegant approach based on propagating information over expander graphs. We leverage an efficient method for constructing expander graphs of a given size, and use this insight to propose the EGP model. We show that EGP is able to address all of the above concerns, while requiring minimal effort to set up, and provide evidence of its empirical utility on relevant graph classification datasets and baselines in the Open Graph Benchmark. Importantly, using expander graphs as a template for message passing necessarily gives rise to negative curvature. While this appears to be counterintuitive in light of recent related work on oversquashing, we theoretically demonstrate that negatively curved edges are likely to be required to obtain scalable message passing without bottlenecks. To the best of our knowledge, this is a previously unstudied result in the context of graph representation learning, and we believe our analysis paves the way to a novel class of scalable methods to counter oversquashing in GNNs.
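In this paper, we provide a theory of using graph neural networks (GNNs) for multi-node representation learning (where we are interested in learning a representation for a set of more than one node). GNNs are designed to learn single-node representations. When we want to learn a node-set representation involving multiple nodes, a common practice in previous work is to directly aggregate the single-node representations learned by a GNN into a joint representation of the node set. In this paper, we show a fundamental limitation of this approach, namely its inability to capture the dependence between nodes in the node set, and argue that directly aggregating individual node representations does not lead to an effective joint representation of multiple nodes. We then observe that several successful previous works on multi-node representation learning, including SEAL, Distance Encoding, and ID-GNN, all use node labeling. These methods first label the nodes in the graph according to their relationships with the target node set before applying a GNN. The node representations obtained in the labeled graph are then aggregated into a node-set representation. By investigating their inner mechanisms, we unify these node labeling techniques into a single and most basic form, namely the labeling trick. We prove that with the labeling trick, a sufficiently expressive GNN learns the most expressive node-set representations and can thus in principle solve any joint learning task over node sets. Experiments on one important two-node representation learning task, link prediction, verify our theory. Our work establishes a theoretical foundation for using GNNs for joint prediction tasks over node sets.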
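With the development of graph kernels and graph representation learning, many superior methods have been proposed to handle the scalability and oversmoothing issues in graph structure learning. However, most of these strategies are designed on the basis of practical experience rather than theoretical analysis. In this paper, we use a particular dummy node connected to all existing vertices without affecting the original vertex and edge properties. We further prove that such a dummy node can help build an efficient monomorphic edge-to-vertex transform and an epimorphic inverse to recover the original graph. This also indicates that adding dummy nodes can preserve local and global structures for better graph representation learning. We extend graph kernels and graph neural networks with dummy nodes and conduct experiments on graph classification and subgraph isomorphism matching tasks. Empirical results demonstrate that taking graphs with dummy nodes as input significantly boosts graph structure learning, and that using their edge-to-vertex graphs can achieve similar results. We also discuss the gain in expressive power that the dummy node brings to neural networks.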
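Message-passing neural networks (MPNNs) are the leading architecture for deep learning on graph-structured data, in large part because of their simplicity and scalability. Unfortunately, it has been shown that these architectures are limited in their expressive power. This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN) to address this issue. Our main observation is that while two graphs may not be distinguishable by an MPNN, they often contain distinguishable subgraphs. Thus, we propose to represent each graph as a set of subgraphs derived by some predefined policy, and to process it using a suitable equivariant architecture. We develop novel variants of the 1-dimensional Weisfeiler-Leman (1-WL) test for graph isomorphism and prove lower bounds on the expressiveness of ESAN in terms of these new WL variants. We further prove that our approach increases the expressive power of both MPNNs and more expressive architectures. Moreover, we provide theoretical results that describe how design choices, such as the subgraph selection policy and the equivariant neural architecture, affect our architecture's expressive power. To deal with the increased computational cost, we propose a subgraph sampling scheme, which can be viewed as a stochastic version of our framework. A comprehensive set of experiments on real and synthetic datasets demonstrates that our framework improves the expressive power and overall performance of popular GNN architectures.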