Graph Neural Networks (GNNs) have come to play a role in many areas, thanks to their expressive power on graph-structured data. Meanwhile, Mobile Ad-Hoc Networks (MANETs) are gaining attention as network technologies advance to the 5G level. However, no study has yet evaluated the efficiency of GNNs on MANETs. In this study, we aim to fill this gap by implementing a MANET dataset in a popular GNN framework, PyTorch Geometric, and we show how GNNs can be used to analyze MANET traffic. We perform an edge prediction task on the dataset with the GraphSAGE (SAG) model, in which the model tries to predict whether a link exists between two nodes. We use several evaluation metrics to measure the performance and efficiency of GNNs on MANETs. The SAG model achieved an average accuracy of 82.1 in the experiments.
translated by 谷歌翻译
As interest in Graph Neural Networks (GNNs) grows, benchmarking and performance characterization studies of GNNs are becoming more important. Many studies so far have investigated the performance and computational efficiency of GNNs. However, this work has been carried out using a few high-level GNN frameworks. Although these frameworks are easy to use, they depend on too many other libraries. The layers of implementation detail and the dependencies complicate the performance analysis of GNN models built on top of these frameworks, especially when using architectural simulators. Furthermore, prior characterization studies generally overlook the different approaches to GNN computation and evaluate only one of the common computational models. Based on these shortcomings and needs, we developed a benchmark suite that is framework independent, supports multiple computational models, is easily configurable, and can be used with architectural simulators without additional effort. Our benchmark suite, which we call gSuite, uses only the hardware vendor's libraries and is therefore independent of any other framework. gSuite enables detailed performance characterization studies of GNN inference using both contemporary GPU profilers and architectural GPU simulators. To illustrate the benefits of the suite, we perform a detailed characterization study with a set of well-known GNN models and various datasets, running gSuite on both a real GPU card and a timing-detailed GPU simulator. We also examine the effect of computational models on performance. We use several evaluation metrics to rigorously measure the performance of GNN computation.
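Graph representation learning is a fast-growing field in which a primary goal is to generate meaningful representations of graphs in lower-dimensional spaces. The learned embeddings have been successfully applied to various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation criteria such as prediction accuracy, running time, and scalability. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selections, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organize graph embedding techniques using a taxonomy that includes methods from manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluate these classes of algorithms on node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We design our experiments on top of the PyTorch Geometric and DGL libraries and run them on different multicore CPU and GPU platforms. We rigorously scrutinize the performance of embedding methods under various performance metrics and summarize the results. Thus, this paper may serve as a comparative guide to help users select the methods most suitable for their tasks.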
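Developing scalable solutions for training Graph Neural Networks (GNNs) on link prediction tasks is challenging because of high data dependencies, which entail high computational cost and a huge memory footprint. We propose a new method for scaling the training of knowledge graph embedding models to address these challenges. To this end, we propose the following algorithmic strategies: self-sufficient partitions, constraint-based negative sampling, and edge mini-batch training. Both the partitioning strategy and constraint-based negative sampling avoid cross-partition data transfer during training. In our experimental evaluation, we show that our scaling solution for GNN-based knowledge graph embedding models achieves a 16x speedup on benchmark datasets while maintaining model performance comparable to non-distributed methods on standard metrics.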
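Graph Neural Networks (GNNs) show great promise in problems dealing with graph-structured data. One of the unique points of GNNs is their flexibility to adapt to multiple problems, which not only leads to wide applicability but also poses important challenges when finding the best model or acceleration technique for a particular problem. An example of such challenges is that the accuracy or effectiveness of a GNN model or acceleration technique generally depends on the structure of the underlying graph. In this paper, to address the problem of graph-dependent acceleration, we propose ProGNNosis, a data-driven model that can predict the training time of a given GNN model running over a graph of arbitrary characteristics by inspecting metrics of the input graph. The prediction is based on a regression previously trained offline using a diverse synthetic graph dataset. In practice, our method allows making informed decisions about which design to use for a specific problem. In the paper, the methodology for building ProGNNosis is defined and applied to a specific use case, where it helps decide which graph representation is better. Our results show that ProGNNosis achieves an average speedup of 1.22x over randomly selecting a graph representation across multiple widely used GNN models such as GCN, GIN, GAT, and GraphSAGE.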
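Graph Neural Networks (GNNs) have been widely applied in various fields for learning over graph-structured data. They have shown significant improvements over traditional heuristic methods in tasks such as node classification and graph classification. However, since GNNs rely heavily on smoothed node features rather than graph structure, they often perform worse than simple heuristic methods in link prediction, where structural information (e.g., overlapped neighborhoods, degrees, and shortest paths) is crucial. To address this limitation, we propose Neighborhood Overlap-aware Graph Neural Networks (Neo-GNNs), which learn useful structural features from an adjacency matrix and estimate overlapped neighborhoods for link prediction. Our Neo-GNNs generalize neighborhood overlap-based heuristic methods and handle overlapped multi-hop neighborhoods. Our extensive experiments on Open Graph Benchmark (OGB) datasets demonstrate that Neo-GNNs consistently achieve state-of-the-art performance in link prediction. Our code is publicly available at https://github.com/seongjunyun/neo_gnns.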
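The neighborhood-overlap heuristics that the abstract above refers to can be illustrated with a minimal sketch. This is not the Neo-GNN model; it only shows three classic overlap scores (common neighbors, Jaccard, Adamic-Adar) on a toy undirected graph. The function names and the edge list are hypothetical and chosen for illustration.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency as a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1/log(degree); low-degree neighbors count more."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1  # guard against log(1) == 0
    )


# Toy graph (hypothetical): score two candidate links that are not yet edges.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
for pair in [(0, 3), (0, 4)]:
    print(pair, common_neighbors(adj, *pair), jaccard(adj, *pair), adamic_adar(adj, *pair))
```

A higher score suggests a more likely link; in this toy graph the pair (0, 3) shares two neighbors while (0, 4) shares none.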
Graph neural networks (GNNs) have proven to be a powerful algorithmic model in broad application fields because of their effectiveness in learning over graphs. To scale GNN training to large and ever-growing graphs, the most promising solution is distributed training, which spreads the training workload across multiple computing nodes. However, the workflows, computational patterns, communication patterns, and optimization techniques of distributed GNN training are still only preliminarily understood. In this paper, we provide a comprehensive survey of distributed GNN training by investigating the optimization techniques it uses. First, distributed GNN training is classified into several categories according to workflow, and the computational patterns, communication patterns, and optimization techniques proposed by recent work are introduced for each. Second, the software frameworks and hardware platforms of distributed GNN training are introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing what is unique to distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
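Despite advances in the field of Graph Neural Networks (GNNs), only a small number of datasets are currently used to evaluate new models. This continued reliance on a handful of datasets provides minimal insight into the performance differences between models, and is especially challenging for industrial practitioners, whose datasets are likely to look very different from those used as academic benchmarks. In the course of our work on GNN infrastructure and open-source software at Google, we have sought to develop improved benchmarks that are robust, tunable, scalable, and generalizable. In this work we introduce GraphWorld, a novel methodology and system for benchmarking GNN models on an arbitrarily large population of synthetic graphs for any conceivable GNN task. GraphWorld allows a user to efficiently generate a world with millions of statistically diverse datasets. It is accessible, scalable, and easy to use. GraphWorld can run on a single machine without specialized hardware, or it can easily be scaled up to run on arbitrary clusters or cloud frameworks. Using GraphWorld, a user has fine-grained control over graph generator parameters and can benchmark arbitrary GNN models with built-in hyperparameter tuning. We present insights from GraphWorld experiments on the performance characteristics of tens of thousands of GNN models over millions of benchmark datasets. We further show that GraphWorld efficiently explores regions of benchmark dataset space not covered by standard benchmarks, revealing comparisons between models that were historically unobtainable. Using GraphWorld, we are also able to study in detail the relationship between graph properties and task performance metrics, which is nearly impossible with the classic collection of real-world benchmarks.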
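Graph neural network (GNN) based methods have recently become a popular tool for graph data because of their ability to incorporate structural information. The main hurdle for GNN performance is the lack of labeled data. Data augmentation techniques for image and text data cannot be used for graph data because of its complex, non-Euclidean structure. This gap has pushed researchers to focus on developing data augmentation techniques for graph data. Most of the proposed graph data augmentation (GDA) techniques are task-specific. In this paper, we survey existing GDA techniques according to the graph task they address. This survey not only provides a reference for the GDA research community but also gives researchers in other domains the information they need.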
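As real-world graphs grow in size, larger GNN models with billions of parameters are deployed. The high parameter count of such models makes training and inference on graphs expensive and challenging. To reduce the computational and memory costs of GNNs, optimization methods such as pruning redundant nodes and edges in input graphs have been commonly adopted. However, model compression, which directly targets the sparsification of model layers, has been mostly limited to traditional Deep Neural Networks (DNNs) used for tasks such as image classification and object detection. In this paper, we apply two state-of-the-art model compression methods, (1) train and prune and (2) sparse training, to sparsify the weight layers in GNNs. We evaluate and compare the efficiency of both methods in terms of accuracy, training sparsity, and training FLOPs on real-world graphs. Our experimental results show that on the ia-email, wiki-talk, and stackoverflow datasets for link prediction, sparse training reaches accuracy comparable to the train and prune method with lower training FLOPs. On the brain dataset for node classification, sparse training uses fewer FLOPs (less than 1/7 of those of the train and prune method) and preserves better accuracy under extreme model sparsity.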
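Graph databases (GDBs) enable processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the significance of GDBs in both academia and industry, little effort has been made to integrate them with the predictive power of graph neural networks (GNNs). In this work, we show how to seamlessly combine nearly any GNN model with the computational capabilities of GDBs. To do so, we observe that the majority of these systems are based on, or support, a graph data model called the Labeled Property Graph (LPG), in which vertices and edges can carry arbitrarily complex sets of labels and properties. We then develop LPG2vec, an encoder that transforms an arbitrary LPG dataset into a representation that can be used directly with a broad class of GNNs, including convolutional, attentional, message-passing, and even higher-order or spectral models. In our evaluation, we show that LPG2vec properly preserves the rich information represented as LPG labels and properties, and that it increases prediction accuracy by up to 34% compared to graphs with no LPG labels or properties, regardless of the targeted learning task or the GNN model used. In general, LPG2vec makes it possible to combine the predictive power of the most powerful GNNs with the full scope of information encoded in the LPG model, paving the way for neural graph databases, a class of systems in which the vast complexity of the maintained data will benefit from modern and future graph machine learning methods.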
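Graph neural networks (GNNs) have been shown to be a powerful tool for analyzing non-Euclidean graph data. However, the lack of efficient distributed graph learning (GL) systems severely hinders applications of GNNs, especially when graphs are large and GNNs are relatively deep. Here we present GraphTheta, a novel distributed and scalable GL system implemented in a vertex-centric graph programming model. GraphTheta is the first GL system built on distributed graph processing, with neural network operators implemented as user-defined functions. The system supports multiple training strategies and enables highly scalable learning on large graphs across distributed (virtual) machines. To facilitate graph convolution implementations, GraphTheta puts forward a new GL abstraction named NN-TGAR to bridge the gap between graph processing and graph deep learning. A distributed graph engine is proposed to perform stochastic gradient descent optimization with hybrid-parallel execution. Moreover, in addition to global-batch and mini-batch training, we add support for a new cluster-batch training strategy. We evaluate GraphTheta on a number of datasets with network sizes ranging from small and moderate to large scale. Experimental results show that GraphTheta scales well to 1,024 workers when training an in-house developed GNN on an industry-scale Alipay dataset of 1.4 billion nodes and 4.1 billion attributed edges, using a cluster of CPU virtual machines (dockers) with small memory each ($5 \sim 12$ GB). Moreover, GraphTheta obtains comparable or better prediction results than state-of-the-art GNN implementations, demonstrating that it can learn GNNs as well as existing frameworks, and it can outperform them by up to $2.02\times$ with better scalability. To the best of our knowledge, this work presents the largest edge-attributed GNN learning task in the literature.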
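Graph representation learning has become a ubiquitous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applications, ensuring the fairness of node (or graph) representations with respect to some protected attributes is crucial for their correct deployment. Yet fairness in graph deep learning remains under-explored, with few solutions available. In particular, the tendency of similar nodes to cluster in many real-world graphs (i.e., homophily) can dramatically worsen the fairness of these procedures. In this paper, we propose a novel biased edge dropout algorithm (FairDrop) to counteract homophily and improve fairness in graph representation learning. FairDrop can be plugged into many existing algorithms easily, is efficient and adaptable, and can be combined with other fairness-inducing solutions. After describing the general algorithm, we demonstrate its application on two benchmark tasks: a random walk model for producing node embeddings, and a graph convolutional network for link prediction. We show that the proposed algorithm improves the fairness of all models at the cost of a small or negligible drop in accuracy, and that it compares favourably with existing state-of-the-art solutions. In an ablation study, we demonstrate that our algorithm can flexibly interpolate between biasing towards fairness and unbiased edge dropout. Furthermore, to better evaluate the gains, we propose a new dyadic group definition for measuring the bias of a link prediction task when paired with group-based fairness metrics. In particular, we extend the metric used to measure bias in node embeddings to take the graph structure into account.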
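The idea of a biased edge dropout that interpolates between unbiased dropout and a fairness-oriented one can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the keep-probability parameterization (`0.5 + delta` for edges between different groups, `0.5 - delta` for edges within a group) and all names are assumptions made for this example.

```python
import random


def biased_edge_dropout(edges, sensitive, delta, rng=None):
    """Drop edges with a bias against links inside the same sensitive group.

    Assumed parameterization (hypothetical, for illustration):
      delta = 0.0 -> every edge is kept with probability 1/2 (unbiased dropout)
      delta = 0.5 -> inter-group edges are always kept, intra-group edges never
    """
    if not 0.0 <= delta <= 0.5:
        raise ValueError("delta must lie in [0, 0.5]")
    if rng is None:
        rng = random.Random()
    kept = []
    for u, v in edges:
        inter_group = sensitive[u] != sensitive[v]
        keep_prob = 0.5 + delta if inter_group else 0.5 - delta
        # rng.random() is in [0, 1), so keep_prob 1.0 always keeps and 0.0 never does.
        if rng.random() < keep_prob:
            kept.append((u, v))
    return kept


# Toy example (hypothetical data): nodes 0 and 1 are in group "a", nodes 2 and 3 in "b".
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
sensitive = {0: "a", 1: "a", 2: "b", 3: "b"}
print(biased_edge_dropout(edges, sensitive, delta=0.5, rng=random.Random(0)))
```

In a training loop, such a mask would typically be resampled every epoch so that the model sees a different, less homophilous view of the graph each time.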
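Data augmentation has been widely used for image and language data but remains under-explored for graph neural networks (GNNs). Existing methods focus on augmenting graph data from a global perspective and largely fall into two genres: structural manipulation and adversarial training with feature noise injection. However, recent graph data augmentation methods ignore the importance of local information for the message-passing mechanism of GNNs. In this work, we introduce local augmentation, which enhances the locality of node representations through their subgraph structures. Specifically, we model data augmentation as a feature generation process. Given a node's features, our local augmentation approach learns the conditional distribution of its neighbors' features and generates additional neighbor features to boost the performance of downstream tasks. Based on local augmentation, we further design a novel framework, LA-GNN, which can be applied to any GNN model in a plug-and-play manner. Extensive experiments and analyses show that local augmentation consistently yields performance improvements for various GNN architectures across a diverse set of benchmarks.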
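Link prediction is an important task with wide applications in various domains. However, the majority of existing link prediction approaches assume that the given graph follows the homophily assumption, and they design similarity-based heuristics or representation learning approaches to predict links. Many real-world graphs, however, are heterophilic graphs in which the homophily assumption does not hold, which challenges existing link prediction methods. Generally, in heterophilic graphs many latent factors cause link formation, and two linked nodes tend to be similar in one or two factors but may be dissimilar in others, leading to low overall similarity. One approach is therefore to learn a disentangled representation for each node, with each vector capturing the node's latent representation on one factor. This paves the way to modeling link formation in heterophilic graphs, resulting in better node representation learning and link prediction performance. However, work on this is rather limited. Therefore, in this paper we study the novel problem of disentangled representation learning for link prediction on heterophilic graphs. We propose a novel framework, DisenLink, which learns disentangled representations by modeling link formation and performs factor-aware message passing to facilitate link prediction. Extensive experiments on 13 real-world datasets demonstrate the effectiveness of DisenLink for link prediction on both heterophilic and homophilic graphs. Our code is available at https://github.com/sjz5202/disenlink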
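Subgraph-based graph representation learning (SGRL) has recently been proposed to deal with some fundamental challenges encountered by canonical graph neural networks (GNNs), and it has demonstrated advantages in many important data science applications such as link, relation, and motif prediction. However, current SGRL approaches suffer from scalability issues because they require extracting a subgraph for each training or test query. Recent solutions that scale up canonical GNNs may not apply to SGRL. Here, we propose SUREL, a novel framework for scalable SGRL that co-designs the learning algorithm and its system support. SUREL adopts a walk-based decomposition of subgraphs and reuses the walks to form subgraphs, which substantially reduces the redundancy of subgraph extraction and supports parallel computation. Experiments on six homogeneous, heterogeneous, and higher-order graphs with millions of nodes and edges demonstrate the effectiveness and scalability of SUREL. In particular, compared to SGRL baselines, SUREL achieves a $10\times$ speed-up with comparable or even better prediction performance; compared to canonical GNNs, SUREL improves prediction accuracy by 50%.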
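Recently, graph neural networks (GNNs) have shown prominent performance in graph representation learning by leveraging knowledge from both graph structure and node features. However, most of them have two major limitations. First, GNNs can learn higher-order structural information by stacking more layers, but they cannot handle large depth because of the over-smoothing issue. Second, these methods are not easy to apply to large graphs because of their expensive computation cost and high memory usage. In this paper, we present node-adaptive feature smoothing (NAFS), a simple non-parametric method that constructs node representations without parameter learning. NAFS first extracts the features of each node together with its neighbors at different hops by feature smoothing, and then adaptively combines the smoothed features. In addition, the constructed node representations can be further enhanced by an ensemble of smoothed features extracted with different smoothing strategies. We conduct experiments on four benchmark datasets in two application scenarios: node clustering and link prediction. Remarkably, NAFS with feature ensemble outperforms state-of-the-art GNNs on these tasks and mitigates the two aforementioned limitations of most learning-based GNN counterparts.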
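The two steps described above, multi-hop feature smoothing followed by a per-node combination of the smoothed features, can be sketched without any learned parameters. This is a rough illustration under stated assumptions, not the NAFS algorithm itself: the mean-over-neighborhood smoothing operator and the softmax weighting by distance to the most-smoothed features are hypothetical stand-ins chosen for this example.

```python
import math


def smooth(adj, feats):
    """One smoothing step: each node averages its own and its neighbors' features."""
    out = {}
    for u, nbrs in adj.items():
        group = [u, *nbrs]
        dim = len(feats[u])
        out[u] = [sum(feats[w][d] for w in group) / len(group) for d in range(dim)]
    return out


def multi_hop_features(adj, feats, hops):
    """Return the raw features followed by 1..hops rounds of smoothing."""
    levels = [feats]
    for _ in range(hops):
        levels.append(smooth(adj, levels[-1]))
    return levels


def adaptive_combine(levels, node):
    """Combine one node's hop-wise features with per-node softmax weights.

    Assumption for illustration: a hop gets more weight the further its
    features are from the most-smoothed (deepest) level for this node.
    """
    deepest = levels[-1][node]
    dists = [math.dist(level[node], deepest) for level in levels]
    peak = max(dists)
    exps = [math.exp(d - peak) for d in dists]  # shifted for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(deepest)
    return [sum(w * level[node][d] for w, level in zip(weights, levels)) for d in range(dim)]


# Toy path graph 0 - 1 - 2 with one-dimensional features (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [0.0], 1: [3.0], 2: [0.0]}
levels = multi_hop_features(adj, feats, hops=2)
print({node: adaptive_combine(levels, node) for node in adj})
```

Because the weights form a convex combination, each node's representation stays within the range spanned by its hop-wise features, and no parameters are trained at any point.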
The goal of graph summarization is to represent large graphs in a structured and compact way. A graph summary based on equivalence classes preserves pre-defined features of a graph's vertex within a $k$-hop neighborhood, such as the vertex labels and edge labels. Based on these neighborhood characteristics, the vertex is assigned to an equivalence class. The calculation of the assigned equivalence class must be a permutation-invariant operation on the pre-defined features. This is achieved by sorting on the feature values, e.g., the edge labels, which is computationally expensive, and subsequently hashing the result. Graph Neural Networks (GNNs) fulfill the permutation invariance requirement. We formulate the problem of graph summarization as a subgraph classification task on the root vertex of the $k$-hop neighborhood. We adapt different GNN architectures, both those based on the popular message-passing protocol and alternative approaches, to perform the structural graph summarization task. We compare different GNNs with a standard multi-layer perceptron (MLP) and with a Bloom filter as a non-neural method. For our experiments, we consider four popular graph summary models on a large web graph. This amounts to challenging multi-class vertex classification tasks with the number of classes ranging from $576$ to multiple hundreds of thousands. Our results show that the GNNs perform close to one another. In three out of four experiments, the non-message-passing GraphMLP model outperforms the other GNNs. The performance of the standard MLP is extraordinarily good, especially in the presence of many classes. Finally, the Bloom filter outperforms all neural architectures by a large margin, except on the dataset with the fewest classes ($576$).
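The sort-then-hash step mentioned above can be illustrated with a small sketch. It only shows why sorting makes the class assignment permutation invariant; it is not one of the four summary models evaluated in the paper. The function name, the use of label sets rather than multisets, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import json


def equivalence_class(vertex_labels, edge_labels):
    """Permutation-invariant class id for a vertex from its neighborhood features.

    Assumption for illustration: features are treated as sets of string labels.
    Sorting puts them in a canonical order, so any input ordering hashes alike.
    """
    canonical = json.dumps([sorted(set(vertex_labels)), sorted(set(edge_labels))])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same features in a different order map to the same equivalence class.
a = equivalence_class(["Person", "Author"], ["wrote", "knows"])
b = equivalence_class(["Author", "Person"], ["knows", "wrote"])
c = equivalence_class(["Person"], ["wrote", "knows"])
print(a == b, a == c)
```

Serializing the sorted lists with `json.dumps` avoids ambiguity when a label happens to contain a separator character; the sort is the step the abstract identifies as computationally expensive.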
Learning fair graph representations for downstream applications is becoming increasingly important, but existing work has mostly focused on improving fairness at the global level, by modifying either the graph structure or the objective function, without taking into account the local neighborhood of a node. In this work, we formally introduce the notion of neighborhood fairness and develop a computational framework for learning such locally fair embeddings. We argue that neighborhood fairness is the more appropriate notion because GNN-based models operate at the local neighborhood level of a node. Our neighborhood fairness framework has two main components that are flexible for learning fair graph representations from arbitrary data: the first constructs fair neighborhoods for any node in a graph, and the second adapts these fair neighborhoods to better capture application- or data-dependent constraints, such as allowing neighborhoods to be more biased towards certain attributes or neighbors in the graph. Furthermore, while link prediction has been extensively studied, we are the first to investigate the graph representation learning task of fair link classification. We demonstrate the effectiveness of the proposed neighborhood fairness framework on a variety of graph machine learning tasks, including fair link prediction, link classification, and learning fair graph embeddings. Notably, our approach not only achieves better fairness but also increases accuracy in the majority of cases across a wide variety of graphs, problem settings, and metrics.
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from other nodes in the graph, while link prediction aims to predict whether two nodes in a network are likely to be linked. By definition, clustering should play a positive role in obtaining accurate link predictions. Yet researchers have long ignored this positive relationship, or have exploited it in inappropriate ways that undermine it. In this article, we construct a simple but efficient clustering-driven link prediction framework (ClusterLP), with the goal of directly exploiting cluster structures to predict connections between nodes as accurately as possible in both undirected and directed graphs. Specifically, we propose that in undirected graphs links form more easily between nodes with similar representation vectors and cluster tendencies, while in directed graphs nodes more readily point to nodes that have similar representation vectors and greater influence within their own cluster. We customized the implementation of ClusterLP for undirected and directed graphs, respectively, and experimental results on the link prediction task using multiple real-world networks show that our models are highly competitive with existing baseline models. The code for ClusterLP and the baselines we use is available at https://github.com/ZINUX1998/ClusterLP.