In recent years, graph representation learning has achieved remarkable success while suffering from low-quality data problems. As a mature technology for improving data quality in computer vision, data augmentation has also attracted increasing attention in the graph domain. To promote the development of this emerging research direction, this survey comprehensively reviews and summarizes existing graph data augmentation (GDAug) techniques. Specifically, we first summarize a variety of feasible taxonomies and then classify existing GDAug studies based on fine-grained graph elements. Furthermore, for each type of GDAug technique, we formalize the general definition, discuss the technical details, and give a schematic illustration. In addition, we summarize common performance metrics and specific design metrics for constructing a GDAug evaluation system. Finally, we summarize the applications of GDAug at both the data and model levels, as well as future directions.
Translated by Google Translate
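Graph neural networks, a powerful deep learning tool for modeling graph-structured data, have demonstrated remarkable performance on numerous graph learning tasks. To address data noise and data scarcity in deep graph learning, research on graph data augmentation has intensified recently. However, conventional data augmentation methods can hardly handle graph-structured data, which is defined in non-Euclidean space and is multi-modal. In this survey, we formally formulate the problem of graph data augmentation and review the representative techniques and their applications in different deep graph learning problems. Specifically, we first propose a taxonomy for graph data augmentation techniques and then provide a structured review by categorizing the related work according to the augmented information modalities. Moreover, we summarize the applications of graph data augmentation in two representative problems in data-centric deep graph learning: (1) reliable graph learning, which focuses on enhancing the utility of the input graph as well as the model capacity via graph data augmentation; and (2) low-resource graph learning, which aims to enlarge the scale of labeled training data through graph data augmentation. For each problem, we also provide a hierarchical problem taxonomy and review the existing literature related to graph data augmentation. Finally, we point out promising research directions and the challenges for future research.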
Graph machine learning has been extensively studied in both academia and industry. Although booming with a vast number of emerging methods and techniques, most of the literature is built on the in-distribution hypothesis, i.e., testing and training graph data are identically distributed. However, this in-distribution hypothesis can hardly be satisfied in many real-world graph scenarios where the model performance substantially degrades when there exist distribution shifts between testing and training graph data. To solve this critical problem, out-of-distribution (OOD) generalization on graphs, which goes beyond the in-distribution hypothesis, has made great progress and attracted ever-increasing attention from the research community. In this paper, we comprehensively survey OOD generalization on graphs and present a detailed review of recent advances in this area. First, we provide a formal problem definition of OOD generalization on graphs. Second, we categorize existing methods into three classes from conceptually different perspectives, i.e., data, model, and learning strategy, based on their positions in the graph machine learning pipeline, followed by detailed discussions for each category. We also review the theories related to OOD generalization on graphs and introduce the commonly used graph datasets for thorough evaluations. Finally, we share our insights on future research directions. This paper is the first systematic and comprehensive review of OOD generalization on graphs, to the best of our knowledge.
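In recent years, graph neural networks (GNNs) have achieved state-of-the-art performance in node classification. However, most existing GNNs suffer from the graph imbalance problem. In many real-world scenarios, node classes are imbalanced, with a few majority classes making up most of the graph. The message propagation mechanism in GNNs further amplifies the dominance of these majority classes, resulting in sub-optimal classification performance. In this work, we seek to address this problem by generating instances of minority classes to balance the training data, extending previous over-sampling-based techniques. This task is non-trivial, as those techniques are designed under the assumption that instances are independent. Neglecting relation information would complicate the over-sampling process. Furthermore, node classification typically takes place in a semi-supervised setting with only a few labeled nodes, which provides insufficient supervision for generating minority instances. Low-quality generated nodes would harm the trained classifier. In this work, we address these difficulties by synthesizing new nodes in a constructed embedding space that encodes both node attributes and topology information. In addition, an edge generator is trained simultaneously to model the graph structure and provide relations for the new samples. To further improve data efficiency, we also explore synthesizing mixed "in-between" nodes to utilize nodes from the majority classes in this over-sampling process. Experiments on real-world datasets validate the effectiveness of our proposed framework.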
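The over-sampling step described in this abstract can be illustrated with a SMOTE-style interpolation in an embedding space. The following is only a minimal sketch under stated assumptions: the function name `interpolate_minority`, the list-of-lists embedding format, and the nearest-neighbour rule are hypothetical choices for illustration, and the edge generator that the abstract trains alongside is not modelled here.

```python
import random


def interpolate_minority(embeddings, labels, minority_label, n_new, seed=0):
    """Create n_new synthetic embeddings for one class (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority node and its nearest neighbour of the same class.
    """
    rng = random.Random(seed)
    minority = [e for e, y in zip(embeddings, labels) if y == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority nodes to interpolate")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        anchor = minority[i]
        # Nearest neighbour of the anchor among the other minority nodes.
        neighbour = min(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: sq_dist(anchor, m),
        )
        delta = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + delta * (b - a) for a, b in zip(anchor, neighbour)])
    return synthetic


if __name__ == "__main__":
    emb = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    lab = [0, 0, 1, 1]
    print(interpolate_minority(emb, lab, minority_label=0, n_new=3))
```

Interpolating in a learned embedding space rather than in raw feature space is what lets the synthetic nodes reflect topology as well as attributes, since the embedding is said to encode both.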
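Graph neural networks (GNNs) have achieved unprecedented success in learning graph representations to identify the categorical labels of graphs. However, most existing graph classification problems with GNNs follow a balanced data splitting protocol, which is at odds with many real-world scenarios in which some classes have far fewer labels than others. Directly training GNNs in this imbalanced setting may lead to uninformative representations of graphs in minority classes and compromise the overall performance of downstream classification, which underlines the importance of developing effective GNNs for imbalanced graph classification. Existing methods are either tailored to non-graph-structured data or designed specifically for imbalanced node classification, while few focus on imbalanced graph classification. To this end, we introduce a novel framework, Graph-of-Graph Neural Networks (G$^2$GNN), which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from the graphs themselves. Globally, we construct a graph of graphs (GoG) based on kernel similarity and perform GoG propagation to aggregate neighboring graph representations, which are initially obtained by node-level propagation with pooling via a GNN encoder. Locally, we employ topological augmentation by masking nodes or dropping edges to improve the model's generalizability in discerning the topology of unseen test graphs. Extensive graph classification experiments on seven benchmark datasets demonstrate that our proposed G$^2$GNN outperforms numerous baselines by roughly 5\% in both F1-macro and F1-micro scores. The implementation of G$^2$GNN is available at \href{}{}.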
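The local topological augmentation mentioned in this abstract (masking nodes or dropping edges) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the edge-list and feature-list formats, and the choice to mask a node by zeroing its feature vector are all assumptions made here.

```python
import random


def drop_edges(edges, drop_prob, rng):
    """Keep each edge independently with probability 1 - drop_prob."""
    return [e for e in edges if rng.random() >= drop_prob]


def mask_nodes(features, mask_prob, rng):
    """Zero out each node's feature vector with probability mask_prob.

    Assumption: masking is modelled as replacing features with zeros;
    other definitions (e.g. removing the node) are equally plausible.
    """
    return [
        [0.0] * len(row) if rng.random() < mask_prob else list(row)
        for row in features
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(drop_edges(edges, drop_prob=0.25, rng=rng))
    print(mask_nodes(features, mask_prob=0.25, rng=rng))
```

Passing the random generator explicitly keeps each augmented view reproducible, which is convenient when several views of the same graph have to be compared.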
In recent years, semi-supervised graph learning with data augmentation (DA) has become the most commonly used and best-performing method for enhancing model robustness in sparse scenarios with few labeled samples. Compared with DA on homogeneous graphs, DA on heterogeneous graphs poses greater challenges: the heterogeneity of information requires DA strategies to handle heterogeneous relations effectively, taking into account the information contributed to the target nodes by different types of neighbors and edges. Furthermore, over-squashing of information is caused by the negative curvature formed by the non-uniform distribution and strong clustering in complex graphs. To address these challenges, this paper presents a novel method named Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation (HG-MDA). For the problem of information heterogeneity in DA, node and topology augmentation strategies are proposed that suit the characteristics of heterogeneous graphs, and meta-relation-based attention is applied as one of the indexes for selecting augmented nodes and edges. For the problem of information over-squashing, triangle-based edge adding and removing are designed to alleviate the negative curvature and bring topological gain. Finally, the loss function consists of the cross-entropy loss for labeled data and the consistency regularization for unlabeled data. Sharpening is used to fuse the prediction results of the various DA strategies effectively. Experiments on the public datasets ACM, DBLP, and OGB and on the industry dataset MB show that HG-MDA outperforms current SOTA models. Additionally, HG-MDA is applied to user identification in internet finance scenarios, helping the business to add 30% key users, and increase loans and balances by 3.6%, 11.1%, and 9.8%.
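The fusion of predictions from several DA strategies by sharpening, together with a consistency term for unlabeled data, can be sketched as below. This is a minimal sketch under assumptions: it uses the temperature-based sharpening familiar from semi-supervised learning (raise each probability to the power 1/T and renormalize) and a mean squared error consistency term; the abstract does not state which exact forms HG-MDA uses, and all function names are hypothetical.

```python
def sharpen(probs, temperature=0.5):
    """Sharpen a probability vector: p_c ** (1 / T), then renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def fuse_predictions(per_view_probs, temperature=0.5):
    """Average the class probabilities over augmented views, then sharpen."""
    n_views = len(per_view_probs)
    n_classes = len(per_view_probs[0])
    mean = [sum(view[c] for view in per_view_probs) / n_views for c in range(n_classes)]
    return sharpen(mean, temperature)


def consistency_loss(per_view_probs, target):
    """Mean over views of the squared error between a view and the target."""
    n_views = len(per_view_probs)
    return sum(
        sum((p - t) ** 2 for p, t in zip(view, target)) for view in per_view_probs
    ) / n_views


if __name__ == "__main__":
    views = [[0.6, 0.4], [0.8, 0.2]]  # predictions for one unlabeled node
    target = fuse_predictions(views)
    print(target, consistency_loss(views, target))
```

A temperature below 1 pushes the fused distribution toward its most likely class, so the consistency term pulls every augmented view toward a more confident shared target.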
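Self-supervised learning methods on heterogeneous graphs, especially contrastive learning, can effectively remove the dependence on supervised data. Meanwhile, most existing representation learning methods embed heterogeneous graphs into a single geometric space, either Euclidean or hyperbolic. Such a single geometric view is usually insufficient to capture the complete picture of a heterogeneous graph, owing to its rich semantics and complex structure. Motivated by these observations, this paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL), to better represent heterogeneous graphs when supervised data is unavailable. GCL views a heterogeneous graph from the Euclidean and hyperbolic perspectives simultaneously, aiming to combine the ability to model rich semantics with the ability to model complex structure, which is expected to bring more benefits to downstream tasks. GCL maximizes the mutual information between the two geometric views by contrasting representations at both the local-local and local-global semantic levels. Extensive experiments on four benchmark datasets show that the proposed approach outperforms strong baselines, including both unsupervised and supervised methods, on three tasks: node classification, node clustering, and similarity search.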
Pre-publication draft of a book to be published by Morgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
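Graph classification has applications in bioinformatics, the social sciences, automated fake news detection, web document classification, and more. In many practical scenarios, including web-scale applications, where labels are scarce or hard to obtain, unsupervised learning is a natural paradigm, but it trades off performance. Recently, contrastive learning (CL) has enabled unsupervised computer vision models to compete well against supervised ones. Theoretical and empirical work analyzing visual CL frameworks finds that leveraging large datasets and domain-aware augmentations is essential to their success. Interestingly, graph CL frameworks often report high performance while using orders of magnitude less data and employing domain-agnostic augmentations (e.g., node or edge dropping, feature perturbation) that can corrupt the underlying properties of the graphs. Motivated by these discrepancies, we seek to determine: (i) why existing graph CL frameworks perform well despite weak augmentations and limited data; and (ii) whether adhering to visual CL principles can improve performance on graph classification tasks. Through extensive analysis, we identify flawed practices in graph data augmentation and evaluation protocols that are commonly used in the graph CL literature, and we propose improved practices and sanity checks for future research and applications. We show that on small benchmark datasets, the inductive bias of graph neural networks can significantly compensate for the limitations of existing frameworks. In case studies with relatively larger graph classification tasks, we find that commonly used domain-agnostic augmentations perform poorly, while adhering to the principles of visual CL can significantly improve performance. For example, in graph-based document classification, which can be used for better web search, we show that task-relevant augmentations improve accuracy by 20%.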
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
Generalizable, transferable, and robust representation learning on graph-structured data remains a challenge for current graph neural networks (GNNs). Unlike what has been developed for convolutional neural networks (CNNs) on image data, self-supervised learning and pre-training are less explored for GNNs. In this paper, we propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data. We first design four types of graph augmentations to incorporate various priors. We then systematically study the impact of various combinations of graph augmentations on multiple datasets, in four different settings: semi-supervised, unsupervised, and transfer learning, as well as adversarial attacks. The results show that, even without tuning augmentation extents or using sophisticated GNN architectures, our GraphCL framework can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods. We also investigate the impact of parameterized graph augmentation extents and patterns, and observe further performance gains in preliminary experiments. Our codes are available at:
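The contrastive objective behind a framework of this kind can be illustrated with a generic InfoNCE-style loss over two augmented views of a batch of graphs. This is only a sketch under assumptions: the function names, the cosine similarity, the temperature value, and the choice to keep the positive pair in the denominator are illustrative and may differ from the exact loss used by GraphCL.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(view_a, view_b, temperature=0.5):
    """Mean over graphs i of -log(exp(s_ii / t) / sum_j exp(s_ij / t)).

    view_a[i] and view_b[i] are embeddings of two augmentations of graph i;
    every other graph j in the batch acts as a negative for graph i.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        log_denominator = math.log(sum(math.exp(value) for value in logits))
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    b_aligned = [[1.0, 0.0], [0.0, 1.0]]
    b_shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_loss(a, b_aligned), contrastive_loss(a, b_shuffled))
```

The loss is small when the two views of the same graph are close and views of different graphs are far apart, which is the behaviour the usage example contrasts.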
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct the contrastive samples by adding perturbations to the graph structure or node attributes. Although they achieve impressive results, these methods are rather blind to the wealth of prior information they assume: as the degree of perturbation applied to the original graph increases, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both kinds of prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes is maintained and is less altered by perturbations of different degrees. Experimental results on various benchmark datasets verify the effectiveness of our algorithm compared with supervised and unsupervised models.