The graph convolutional networks (GCN) recently proposed by Kipf and Welling are an effective graph model for semi-supervised learning. This model, however, was originally designed to be learned with the presence of both training and test data. Moreover, the recursive neighborhood expansion across layers poses time and memory challenges for training with large, dense graphs. To relax the requirement of simultaneous availability of test data, we interpret graph convolutions as integral transforms of embedding functions under probability measures. Such an interpretation allows for the use of Monte Carlo approaches to consistently estimate the integrals, which in turn leads to the batched training scheme we propose in this work: FastGCN. Enhanced with importance sampling, FastGCN not only is efficient for training but also generalizes well for inference. We show a comprehensive set of experiments to demonstrate its effectiveness compared with GCN and related models. In particular, training is orders of magnitude more efficient while predictions remain comparably accurate.
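To make the integral-transform view concrete, here is a minimal sketch of one importance-sampled layer in the spirit of FastGCN. All names (`fastgcn_layer`, `num_samples`, and so on) are ours for illustration, not the authors' code, and a dense normalized adjacency `A_hat` is assumed for readability.

```python
import numpy as np

def fastgcn_layer(A_hat, H, W, num_samples, rng):
    """Monte Carlo estimate of relu(A_hat @ H @ W) via layer-wise sampling (sketch).

    Importance distribution q(u) proportional to ||A_hat[:, u]||^2, the choice
    the paper motivates; rescaling by 1/(t * q(u)) keeps the estimator unbiased:
    E[(1/t) sum_u A[:, u] H[u] / q(u)] = A @ H.
    """
    n = A_hat.shape[0]
    col_sq = np.square(A_hat).sum(axis=0)
    q = col_sq / col_sq.sum()                     # importance distribution
    idx = rng.choice(n, size=num_samples, p=q)    # t_l nodes, sampled with replacement
    A_s = A_hat[:, idx] / (num_samples * q[idx])  # rescaled sampled columns
    return np.maximum(A_s @ H[idx] @ W, 0.0)      # ReLU
```

Stacking such layers yields a batched training loop whose per-layer cost depends on `num_samples` rather than on the recursively expanded neighborhood.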
Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and uses much less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm can finish in around 36 minutes, while all existing GCN training algorithms fail due to out-of-memory issues. Cluster-GCN also allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at https://github.com/google-research/google-research/tree/master/cluster_gcn.
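A hedged sketch of the batching strategy this abstract describes; `parts` stands in for a precomputed METIS-style clustering, and all identifiers are illustrative rather than taken from the released code.

```python
import numpy as np

def clustergcn_batches(adj, parts, clusters_per_batch, rng):
    """Yield Cluster-GCN-style minibatches (sketch): a few clusters are
    combined, and message passing is restricted to the induced subgraph,
    so memory no longer grows with the number of layers.
    """
    cluster_ids = np.unique(parts)                # parts[v] = cluster id of node v
    rng.shuffle(cluster_ids)
    for i in range(0, len(cluster_ids), clusters_per_batch):
        chosen = cluster_ids[i:i + clusters_per_batch]
        nodes = np.flatnonzero(np.isin(parts, chosen))
        yield nodes, adj[np.ix_(nodes, nodes)]    # run a full GCN on this block
```

Combining several clusters per batch, as above, also reflects the paper's observation that batches built from a single dense cluster can bias the label distribution.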
Graph Convolutional Networks (GCNs) are powerful models for learning representations of attributed graphs. To scale GCNs to large graphs, state-of-the-art methods use various layer sampling techniques to alleviate the "neighbor explosion" problem during minibatch training. We propose GraphSAINT, a graph-sampling-based inductive learning method that improves training efficiency and accuracy in a fundamentally different way. By a change of perspective, GraphSAINT constructs minibatches by sampling the training graph, rather than the nodes or edges across GCN layers. In each iteration, a complete GCN is built from the properly sampled subgraph. Thus, we ensure a fixed number of well-connected nodes in all layers. We further propose a normalization technique to eliminate bias, and sampling algorithms for variance reduction. Importantly, we can decouple the sampling from the forward and backward propagation, and extend GraphSAINT with many architecture variants (e.g., graph attention, jumping connections). GraphSAINT demonstrates superior performance in both accuracy and training time on five large graphs, and achieves new state-of-the-art F1 scores for PPI (0.995) and Reddit (0.970).
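As an illustration of "sampling the training graph" rather than the layers, here is a rough sketch of a GraphSAINT-style node sampler with the debiasing weights the abstract alludes to; the degree-squared distribution is a readable stand-in for the paper's node-sampler distribution, and every name is hypothetical.

```python
import numpy as np

def saint_minibatches(adj, sample_size, num_iters, rng):
    """Each iteration samples a subgraph once; a complete GCN is then built
    on it. `alpha` plays the role of a per-node loss normalization
    1 / (m * p_v) that removes the sampling bias (sketch, not the paper's code).
    """
    deg = adj.sum(axis=1).astype(float)
    p = deg**2 / (deg**2).sum()                   # stand-in sampling distribution
    for _ in range(num_iters):
        nodes = np.unique(rng.choice(len(p), size=sample_size, p=p))
        alpha = 1.0 / (sample_size * p[nodes])    # debiasing/normalization weights
        yield nodes, adj[np.ix_(nodes, nodes)], alpha
```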
Graph convolutional networks (GCNs) have proven to be a powerful concept that has been successfully applied to a large variety of tasks across many domains over the past years. In this work we study the theory that paved the way to the definition of GCNs, including related parts of classical graph theory. We also discuss and experimentally demonstrate key properties and limitations of GCNs, such as those caused by the statistical dependency of samples, introduced by the edges of the graph, which leads to biased estimates of the full gradient. Another limitation we discuss is the negative impact of minibatch sampling on model performance; as a result, the gradient is computed over the entire dataset during parameter updates, undermining scalability to large graphs. To account for this, we research alternative methods that allow safely learning good parameters while sampling only a subset of the data per iteration. We reproduce the results reported in the work of Kipf et al. and propose an implementation inspired by SIGN, a sampling-free minibatch method. Eventually, we compare the two implementations on a benchmark dataset, showing that they are comparable in terms of prediction accuracy on the semi-supervised node classification task.
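Since this abstract leans on SIGN as its sampling-free minibatch method, a minimal sketch of the SIGN-style precomputation may help; the function below is our illustration, assuming a dense normalized adjacency, and is not the authors' implementation.

```python
import numpy as np

def sign_features(A_hat, X, r):
    """Precompute [X, A X, A^2 X, ..., A^r X] once, offline (sketch).
    An ordinary MLP is then trained on the concatenation, so minibatches
    are plain row slices with no neighborhood expansion and no sampling bias.
    """
    feats, cur = [X], X
    for _ in range(r):
        cur = A_hat @ cur                         # one more hop of diffusion
        feats.append(cur)
    return np.concatenate(feats, axis=1)
```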
Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
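A minimal sketch of the sample-and-aggregate step with mean aggregation; the weights, fanout, and function names are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def sage_layer(features, neighbors, nodes, W_self, W_neigh, fanout, rng):
    """One GraphSAGE-style layer (sketch): sample a fixed fanout of
    neighbors, average their features, and combine with the node's own
    representation. Because only this function is learned, not per-node
    embeddings, it applies unchanged to nodes unseen during training.
    """
    out = []
    for v in nodes:
        nbrs = np.asarray(neighbors[v])           # adjacency list of v
        if nbrs.size:
            picked = rng.choice(nbrs, size=min(fanout, nbrs.size), replace=False)
            h_nbr = features[picked].mean(axis=0)
        else:
            h_nbr = np.zeros(features.shape[1])
        out.append(np.maximum(features[v] @ W_self + h_nbr @ W_neigh, 0.0))
    H = np.stack(out)
    return H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)  # l2-normalize
```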
Graph representation learning is a fast-growing field in which one of the main objectives is to generate meaningful representations of graphs in lower-dimensional spaces. The learned embeddings have been successfully applied to perform various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics, such as prediction accuracy, running time, scalability, and so on. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selection, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organize graph embedding techniques using a taxonomy that covers manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluate these classes of algorithms on node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We designed our experiments on top of the PyTorch Geometric and DGL libraries and ran them on different multicore CPU and GPU platforms. We rigorously scrutinized the performance of the embedding methods under various performance metrics and summarized the results. Thus, this paper can serve as a comparative guide to help users select the methods most suitable for their tasks.
Graph convolutional networks (GCNs) have achieved impressive empirical advancement across a wide variety of semi-supervised node classification tasks. Despite this great success, training GCNs on large graphs suffers from computational and memory issues. A potential path to circumvent these obstacles is sampling-based methods, where a subset of nodes is sampled at each layer. Although recent studies have empirically demonstrated the effectiveness of sampling-based methods, these works lack theoretical convergence guarantees under realistic settings and cannot fully leverage the information of the evolving parameters during optimization. In this paper, we describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under a memory budget. The motivating impetus for the proposed schema is a careful analysis of the variance of sampling methods, which shows that the induced variance can be decomposed into node embedding approximation variance (zeroth-order variance) during forward propagation and gradient variance (first-order variance) during backward propagation. We theoretically analyze the convergence of the proposed schema and show that it enjoys an $\mathcal{O}(1/T)$ convergence rate. We complement our theoretical results by integrating the proposed schema into different sampling methods and applying them to different large real-world graphs.
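A hedged restatement, in our own generic notation (not necessarily the paper's), of the decomposition this abstract points to: write $g$ for the exact full-batch gradient, $g(\hat h)$ for the gradient evaluated at sampled node-embedding approximations $\hat h$, and $\hat g$ for the stochastic gradient actually used. A standard bound then separates the two variance sources:

```latex
\mathbb{E}\,\lVert \hat g - g \rVert^2
  \;\le\;
  \underbrace{2\,\mathbb{E}\,\lVert \hat g - g(\hat h) \rVert^2}_{\text{first-order: backward-pass gradient variance}}
  \;+\;
  \underbrace{2\,\mathbb{E}\,\lVert g(\hat h) - g \rVert^2}_{\text{zeroth-order: forward-pass embedding approximation}}
```

Controlling both terms at once is, presumably, what "doubly" refers to in the schema's name.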
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
Pre-publication draft of a book to be published by Morgan & Claypool Publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
Training graph neural networks (GNNs) for large-scale node classification is challenging. A key difficulty lies in obtaining accurate hidden node representations while avoiding the neighborhood explosion problem. Here, we propose a new technique, named feature momentum (FM), that uses a momentum step to incorporate historical embeddings when updating feature representations. We develop two specific algorithms, known as GraphFM-IB and GraphFM-OB, that consider in-batch and out-of-batch data, respectively. GraphFM-IB applies FM to in-batch sampled data, while GraphFM-OB applies FM to out-of-batch data that are one-hop neighbors of the in-batch data. We provide a rigorous convergence analysis for GraphFM-IB and theoretical insight into the feature-embedding estimation error of GraphFM-OB. Empirically, we observe that GraphFM-IB can effectively alleviate the neighborhood explosion problem of existing methods. In addition, GraphFM-OB achieves promising performance on multiple large-scale graph datasets.
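The momentum step itself is simple enough to state in a few lines; this is our reading of the abstract, with an illustrative `beta` and no claim about the released code.

```python
import numpy as np

def feature_momentum_update(hist, nodes, fresh, beta):
    """Instead of overwriting the historical embeddings of `nodes` with the
    freshly computed ones, blend them with a momentum step, which damps the
    estimation error introduced by stale entries (illustrative sketch).
    """
    hist[nodes] = (1.0 - beta) * hist[nodes] + beta * fresh
    return hist
```

In GraphFM-IB the update would target the in-batch sampled nodes; in GraphFM-OB, their one-hop out-of-batch neighbors.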
Even though machine learning algorithms already play a significant role in data science, many current methods pose unrealistic assumptions on the input data. Applying such methods is difficult due to incompatible data formats, or heterogeneous, hierarchical, or entirely missing data fragments in a dataset. As a solution, we propose a versatile, unified framework for sample representation, model definition, and training, called HMill. We review in depth the multiple instance learning paradigm that the framework builds on and extends. To theoretically justify the design of the key components of HMill, we show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. The text also contains a detailed discussion of technical and performance improvements in our implementation, which will be released for download under the MIT license. The main asset of the framework is its flexibility, which makes modeling of different real-world data sources possible with the same tools. In addition to the standard setting in which a set of attributes is observed for each object individually, we explain how message-passing inference in graphs that represent whole systems of objects can be implemented in the framework. To support our claims, we solve three different problems from the cybersecurity domain using the framework. The first use case concerns IoT device identification from raw network observations. In the second problem, we study how malicious binary files can be classified using a snapshot of the operating system represented as a directed graph. The last example provided is the task of domain blacklist extension through modeling interactions between entities in the network. In all three problems, the solution based on the proposed framework achieves performance comparable to specialized approaches.
Recent advances in data processing have stimulated the demand for learning graphs at a very large scale. Graph neural networks (GNNs), a well-known and powerful family of approaches for graph learning tasks, are notoriously hard to scale up. Most scalable models apply node-based techniques to simplify the otherwise expensive message-passing propagation procedure of GNNs. However, we find that such acceleration is insufficient when applied to million- or even billion-scale graphs. In this work, we propose SCARA, a scalable GNN with feature-oriented optimization for graph computation. SCARA efficiently computes graph embeddings from node features and further selects and reuses feature computation results to reduce overhead. Theoretical analysis indicates that our model achieves sub-linear time complexity with guaranteed precision in propagation as well as in GNN training and inference. We conduct extensive experiments on various datasets to evaluate the efficacy and efficiency of SCARA. Performance comparisons with baselines show that SCARA reaches up to 100x graph propagation acceleration over current state-of-the-art methods, with fast convergence and comparable accuracy. Most notably, it is efficient enough to process the precomputation on the largest available billion-scale GNN dataset, Papers100M (111M nodes, 1.6B edges), in 100 seconds.
Message-passing neural networks (MPNNs) are the leading architecture for deep learning on graph-structured data, in large part due to their simplicity and scalability. Unfortunately, it was shown that these architectures are limited in their expressive power. This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN) to address this issue. Our main observation is that while two graphs may not be distinguishable by an MPNN, they often contain distinguishable subgraphs. Thus, we propose to represent each graph as a set of subgraphs derived by some predefined policy, and to process it using a suitable equivariant architecture. We develop novel variants of the 1-dimensional Weisfeiler-Leman (1-WL) test for graph isomorphism, and prove lower bounds on the expressiveness of ESAN in terms of these new WL variants. We further prove that our approach increases the expressive power of both MPNNs and more expressive architectures. Moreover, we provide theoretical results that describe how design choices such as the subgraph selection policy and the equivariant neural architecture affect our architecture's expressive power. To deal with the increased computational cost, we propose a subgraph sampling scheme, which can be viewed as a stochastic version of our framework. A comprehensive set of experiments on real and synthetic datasets demonstrates that our framework improves the expressive power and overall performance of popular GNN architectures.
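A rough sketch of the bag-of-subgraphs idea under the node-deletion policy (one of the predefined policies such a framework admits); `encode` stands for any shared graph encoder, and the mean over the bag is one permutation-invariant aggregation choice. All of this is illustrative, not the paper's architecture verbatim.

```python
import numpy as np

def esan_forward(adj, X, encode, num_subgraphs, rng):
    """Represent a graph as a sampled set of node-deleted subgraphs, encode
    each with a shared encoder, and aggregate invariantly (sketch).
    Subsampling the bag mirrors the stochastic version of the framework.
    """
    n = adj.shape[0]
    deleted = rng.choice(n, size=min(num_subgraphs, n), replace=False)
    reps = []
    for v in deleted:
        keep = np.setdiff1d(np.arange(n), [v])    # delete one node
        reps.append(encode(adj[np.ix_(keep, keep)], X[keep]))
    return np.mean(reps, axis=0)                  # invariant aggregation over the bag
```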
The goal of graph summarization is to represent large graphs in a structured and compact way. A graph summary based on equivalence classes preserves pre-defined features of a graph's vertices within a $k$-hop neighborhood, such as the vertex labels and edge labels. Based on these neighborhood characteristics, each vertex is assigned to an equivalence class. The computation of the assigned equivalence class must be a permutation-invariant operation on the pre-defined features. This is achieved by sorting on the feature values, e.g., the edge labels, which is computationally expensive, and subsequently hashing the result. Graph Neural Networks (GNNs) fulfill the permutation-invariance requirement. We formulate the problem of graph summarization as a subgraph classification task on the root vertex of the $k$-hop neighborhood. We adapt different GNN architectures, both based on the popular message-passing protocol and on alternative approaches, to perform the structural graph summarization task. We compare the different GNNs with a standard multi-layer perceptron (MLP) and a Bloom filter as non-neural baselines. For our experiments, we consider four popular graph summary models on a large web graph. This amounts to challenging multi-class vertex classification tasks with numbers of classes ranging from $576$ to multiple hundreds of thousands. Our results show that the GNNs perform close to one another. In three out of four experiments, the non-message-passing GraphMLP model outperforms the other GNNs. The performance of the standard MLP is extraordinarily good, especially in the presence of many classes. Finally, the Bloom filter outperforms all neural architectures by a large margin, except on the dataset with the fewest ($576$) classes.
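For contrast with the neural approaches, the classical computation this abstract describes (sort, then hash) fits in a few lines; this 1-hop version with our own naming is only meant to make the permutation-invariance requirement concrete.

```python
import hashlib

def equivalence_class(vertex_label, edge_labels):
    """Assign a vertex to an equivalence class from its 1-hop features
    (sketch). Sorting the multiset of edge labels makes the operation
    permutation invariant; hashing the canonical string yields the class id.
    """
    canonical = vertex_label + "|" + ",".join(sorted(edge_labels))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```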
Graph Convolutional Networks (GCNs) and their variants have experienced significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, and as a result, may inherit unnecessary complexity and redundant computation. In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. We theoretically analyze the resulting linear model and show that it corresponds to a fixed low-pass filter followed by a linear classifier. Notably, our experimental evaluation demonstrates that these simplifications do not negatively impact accuracy in many downstream applications. Moreover, the resulting model scales to larger datasets, is naturally interpretable, and yields up to two orders of magnitude speedup over FastGCN.
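The collapsed linear model admits a compact sketch: all that remains of a $K$-layer GCN is a fixed filter $S^K$ applied to the features, with a linear classifier on top. The code below is a minimal illustration with our own naming, assuming a dense adjacency.

```python
import numpy as np

def sgc_precompute(adj, X, k):
    """Apply the fixed low-pass filter S^k, where
    S = D^{-1/2} (A + I) D^{-1/2}. A plain logistic regression trained on
    the returned features is the whole simplified model (sketch).
    """
    n = adj.shape[0]
    A = adj + np.eye(n)                           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    for _ in range(k):
        X = S @ X                                 # k hops of fixed smoothing
    return X
```

Because the filtering is feature preprocessing rather than training, it runs once and scales to datasets where end-to-end GCN training is impractical.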
We study the problem of large-scale network embedding, which aims to learn low-dimensional latent representations for network mining applications. Recent research in the field of network embedding has led to large improvements, with methods such as DeepWalk, LINE, NetMF, and NetSMF. However, the huge size of many real-world networks makes it computationally expensive to learn network embeddings from the entire network. In this work, we present a novel network embedding method called "NES", which learns network embeddings from a small representative subgraph. NES leverages the theory of graph sampling to efficiently construct a representative subgraph of smaller size, which can be used to make inferences about the full network, enabling a significant improvement in the efficiency of embedding learning. NES then efficiently computes the network embedding from this representative subgraph. Compared with well-known methods, extensive experiments on networks of various scales and types demonstrate that NES achieves comparable performance with a significant efficiency advantage.
Using graph neural networks for large graphs is challenging since there is no clear way of constructing mini-batches. To solve this, previous methods have relied on sampling or graph clustering. While these approaches often lead to good training convergence, they introduce significant overhead due to expensive random data accesses and perform poorly during inference. In this work we instead focus on model behavior during inference. We theoretically model batch construction via maximizing the influence score of nodes on the outputs. This formulation leads to optimal approximation of the output when we do not have knowledge of the trained model. We call the resulting method influence-based mini-batching (IBMB). IBMB accelerates inference by up to 130x compared to previous methods that reach similar accuracy. Remarkably, with adaptive optimization and the right training schedule IBMB can also substantially accelerate training, thanks to precomputed batches and consecutive memory accesses. This results in up to 18x faster training per epoch and up to 17x faster convergence per runtime compared to previous methods.
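A hedged sketch of influence-based batch construction: personalized PageRank is one natural influence proxy (and, to our understanding, the kind of score this line of work builds on), approximated below by power iteration. The names and the dense-matrix setting are ours for illustration.

```python
import numpy as np

def ibmb_batch(adj, out_nodes, budget, alpha=0.15, iters=50):
    """Score every node by its (approximate) influence on the batch's output
    nodes and keep the `budget` most influential ones as auxiliary nodes
    (sketch). The batch is then fixed and can be precomputed, giving
    consecutive memory accesses at inference time.
    """
    deg = adj.sum(axis=1)
    P = adj / np.maximum(deg, 1)[:, None]         # row-stochastic transition matrix
    r = np.zeros(adj.shape[0])
    r[out_nodes] = 1.0 / len(out_nodes)           # restart on the output nodes
    x = r.copy()
    for _ in range(iters):                        # power iteration for PPR
        x = (1.0 - alpha) * (P.T @ x) + alpha * r
    keep = np.argsort(-x)[:budget]
    return keep, adj[np.ix_(keep, keep)]
```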
Despite the vast empirical success of neural networks, theoretical understanding of the training procedures remains limited, especially in providing guarantees on test performance, due to the non-convex nature of the underlying optimization problem. The current paper investigates an alternative approach to neural network training by reducing it to another problem with a convex structure, namely solving a monotone variational inequality (MVI), inspired by the recent work of (Juditsky & Nemirovsky, 2019). The solution to an MVI can be found by computationally efficient procedures and, importantly, this leads to performance guarantees in the form of $\ell_2$ and $\ell_{\infty}$ bounds on model recovery and prediction accuracy for linear neural networks. In addition, we study the use of MVI for training multi-layer neural networks and propose a practical algorithm called stochastic variational inequality (SVI), demonstrating its applicability to training fully-connected neural networks and graph neural networks (GNNs); SVI is entirely general and can be used to train other types of neural networks. We demonstrate the competitive or better performance of SVI compared with the widely used stochastic gradient descent method on both synthetic and real network data prediction tasks with respect to various performance metrics, especially its improved efficiency in the early stages of training.
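For reference, the object the training problem is reduced to has a standard definition (textbook material, not specific to this paper): find $\theta^\ast \in \Theta$ with

```latex
\langle F(\theta^\ast),\, \theta - \theta^\ast \rangle \;\ge\; 0
  \quad \text{for all } \theta \in \Theta,
\qquad \text{where } F \text{ is monotone: }
\langle F(\theta) - F(\theta'),\, \theta - \theta' \rangle \;\ge\; 0
  \quad \text{for all } \theta, \theta' \in \Theta.
```

Monotonicity plays the role that convexity plays in ordinary minimization (for $F = \nabla f$ with convex $f$, the two coincide), which is what gives the reduction its computationally tractable structure.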
Graph classification is an important area in both modern research and industry. Multiple applications, especially in chemistry and novel drug discovery, encourage rapid development of machine learning models in this area. To keep up with the pace of new research, proper experimental design, fair evaluation, and independent benchmarks are essential. The design of strong baselines is an indispensable element of such works. In this thesis, we explore multiple approaches to graph classification. We focus on Graph Neural Networks (GNNs), which emerged as the de facto standard deep learning technique for graph representation learning. Classical approaches, such as graph descriptors and molecular fingerprints, are also addressed. We design a fair experimental evaluation protocol and choose a proper collection of datasets. This allows us to perform numerous experiments and rigorously analyze modern approaches. We arrive at many conclusions that shed new light on the performance and quality of novel algorithms. We investigate the application of the Jumping Knowledge GNN architecture to graph classification, which proves to be an efficient tool for improving base graph neural network architectures. Multiple improvements to baseline models are also proposed and experimentally verified, which constitutes an important contribution to the field of fair model comparison.
Large-scale graph data in the real world are often dynamic rather than static. The data emerge with new nodes, edges, and even classes over time, as in citation networks and R&D collaboration networks. Graph neural networks (GNNs) have become the standard method for numerous tasks on graph-structured data. In this work, we employ a two-step procedure to explore how GNNs can be incrementally adapted to new unseen graph data. First, we analyze the margin between transductive and inductive learning on standard benchmark datasets: after inductive prediction, we add the unlabeled data to the graph and show that the models remain stable. Then, we explore the case of continually adding more and more labeled data, while considering cases in which not all past instances are annotated with class labels. Furthermore, we introduce new classes while the graph evolves and explore methods to automatically detect instances from previously unseen classes. To deal with evolving graphs in a principled way, we propose a lifelong learning framework for graph data along with an evaluation protocol. Within this framework, we evaluate representative GNN architectures. We observe that implicit knowledge within model parameters becomes more important when explicit knowledge, i.e., data from past tasks, is limited. We find that in open-world node classification, surprisingly few data from past tasks are sufficient to reach the performance attained by remembering data from all past tasks. In the challenging task of unseen class detection, we find that using a weighted cross-entropy loss is important for stability.