In recent years, using a self-supervised learning framework to learn the general characteristics of graphs has been considered a promising paradigm for graph representation learning. The core of self-supervised learning strategies for graph neural networks lies in constructing suitable positive sample selection strategies. However, existing GNNs typically aggregate information from neighboring nodes to update node representations, leading to an over-reliance on neighboring positive samples, i.e., homophilous samples; while ignoring long-range positive samples, i.e., positive samples that are far apart on the graph but structurally equivalent samples, a problem we call "neighbor bias." This neighbor bias can reduce the generalization performance of GNNs. In this paper, we argue that the generalization properties of GNNs should be determined by combining homogeneous samples and structurally equivalent samples, which we call the "GC combination hypothesis." Therefore, we propose a topological signal-driven self-supervised method. It uses a topological information-guided structural equivalence sampling strategy. First, we extract multiscale topological features using persistent homology. Then we compute the structural equivalence of node pairs based on their topological features. In particular, we design a topological loss function to pull in non-neighboring node pairs with high structural equivalence in the representation space to alleviate neighbor bias. Finally, we use the joint training mechanism to adjust the effect of structural equivalence on the model to fit datasets with different characteristics. We conducted experiments on the node classification task across seven graph datasets. The results show that the model performance can be effectively improved using a strategy of topological signal enhancement.
translated by 谷歌翻译
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct the contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results are achieved, it is rather blind to the wealth of prior information assumed: with the increase of the perturbation degree applied on the original graph, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both such prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes can be maintained and also be less altered to the perturbations of different degrees. Experiment results on various benchmark datasets verify the effectiveness of our algorithm compared with the supervised and unsupervised models.
translated by 谷歌翻译
尽管图表学习(GRL)取得了重大进展,但要以足够的方式提取和嵌入丰富的拓扑结构和特征信息仍然是一个挑战。大多数现有方法都集中在本地结构上,并且无法完全融合全球拓扑结构。为此,我们提出了一种新颖的结构保留图表学习(SPGRL)方法,以完全捕获图的结构信息。具体而言,为了减少原始图的不确定性和错误信息,我们通过k-nearest邻居方法构建了特征图作为互补视图。该特征图可用于对比节点级别以捕获本地关系。此外,我们通过最大化整个图形和特征嵌入的相互信息(MI)来保留全局拓扑结构信息,从理论上讲,该信息可以简化为交换功能的特征嵌入和原始图以重建本身。广泛的实验表明,我们的方法在半监督节点分类任务上具有相当出色的性能,并且在图形结构或节点特征上噪声扰动下的鲁棒性出色。
translated by 谷歌翻译
关于图表的深度学习最近吸引了重要的兴趣。然而,大多数作品都侧重于(半)监督学习,导致缺点包括重标签依赖,普遍性差和弱势稳健性。为了解决这些问题,通过良好设计的借口任务在不依赖于手动标签的情况下提取信息知识的自我监督学习(SSL)已成为图形数据的有希望和趋势的学习范例。与计算机视觉和自然语言处理等其他域的SSL不同,图表上的SSL具有独家背景,设计理念和分类。在图表的伞下自我监督学习,我们对采用图表数据采用SSL技术的现有方法及时及全面的审查。我们构建一个统一的框架,数学上正式地规范图表SSL的范例。根据借口任务的目标,我们将这些方法分为四类:基于生成的,基于辅助性的,基于对比的和混合方法。我们进一步描述了曲线图SSL在各种研究领域的应用,并总结了绘图SSL的常用数据集,评估基准,性能比较和开源代码。最后,我们讨论了该研究领域的剩余挑战和潜在的未来方向。
translated by 谷歌翻译
图表表示学习(GRL)对于图形结构数据分析至关重要。然而,大多数现有的图形神经网络(GNNS)严重依赖于标签信息,这通常是在现实世界中获得的昂贵。现有无监督的GRL方法遭受某些限制,例如对单调对比和可扩展性有限的沉重依赖。为了克服上述问题,鉴于最近的图表对比学习的进步,我们通过曲线图介绍了一种新颖的自我监控图形表示学习算法,即通过利用所提出的调整变焦方案来学习节点表示来学习节点表示。具体地,该机制使G-Zoom能够从多个尺度的图表中探索和提取自我监督信号:MICRO(即,节点级别),MESO(即,邻域级)和宏(即,子图级) 。首先,我们通过两个不同的图形增强生成输入图的两个增强视图。然后,我们逐渐地从节点,邻近逐渐为上述三个尺度建立三种不同的对比度,在那里我们最大限度地提高了横跨尺度的图形表示之间的协议。虽然我们可以从微距和宏观视角上从给定图中提取有价值的线索,但是邻域级对比度基于我们的调整后的缩放方案提供了可自定义选项的能力,以便手动选择位于微观和介于微观之间的最佳视点宏观透视更好地理解图数据。此外,为了使我们的模型可扩展到大图,我们采用了并行图形扩散方法来从图形尺寸下解耦模型训练。我们对现实世界数据集进行了广泛的实验,结果表明,我们所提出的模型始终始终优于最先进的方法。
translated by 谷歌翻译
对比学习在图表学习领域表现出了巨大的希望。通过手动构建正/负样本,大多数图对比度学习方法依赖于基于矢量内部产品的相似性度量标准来区分图形表示样品。但是,手工制作的样品构建(例如,图表的节点或边缘的扰动)可能无法有效捕获图形的固有局部结构。同样,基于矢量内部产品的相似性度量标准无法完全利用图形的局部结构来表征图差。为此,在本文中,我们提出了一种基于自适应子图生成的新型对比度学习框架,以实现有效且强大的自我监督图表示学习,并且最佳传输距离被用作子绘图之间的相似性度量。它的目的是通过捕获图的固有结构来生成对比样品,并根据子图的特征和结构同时区分样品。具体而言,对于每个中心节点,通过自适应学习关系权重与相应邻域的节点,我们首先开发一个网络来生成插值子图。然后,我们分别构建来自相同和不同节点的子图的正和负对。最后,我们采用两种类型的最佳运输距离(即Wasserstein距离和Gromov-Wasserstein距离)来构建结构化的对比损失。基准数据集上的广泛节点分类实验验证了我们的图形对比学习方法的有效性。
translated by 谷歌翻译
In recent years, semi-supervised graph learning with data augmentation (DA) is currently the most commonly used and best-performing method to enhance model robustness in sparse scenarios with few labeled samples. Differing from homogeneous graph, DA in heterogeneous graph has greater challenges: heterogeneity of information requires DA strategies to effectively handle heterogeneous relations, which considers the information contribution of different types of neighbors and edges to the target nodes. Furthermore, over-squashing of information is caused by the negative curvature that formed by the non-uniformity distribution and strong clustering in complex graph. To address these challenges, this paper presents a novel method named Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation (HG-MDA). For the problem of heterogeneity of information in DA, node and topology augmentation strategies are proposed for the characteristics of heterogeneous graph. And meta-relation-based attention is applied as one of the indexes for selecting augmented nodes and edges. For the problem of over-squashing of information, triangle based edge adding and removing are designed to alleviate the negative curvature and bring the gain of topology. Finally, the loss function consists of the cross-entropy loss for labeled data and the consistency regularization for unlabeled data. In order to effectively fuse the prediction results of various DA strategies, the sharpening is used. Existing experiments on public datasets, i.e., ACM, DBLP, OGB, and industry dataset MB show that HG-MDA outperforms current SOTA models. Additionly, HG-MDA is applied to user identification in internet finance scenarios, helping the business to add 30% key users, and increase loans and balances by 3.6%, 11.1%, and 9.8%.
translated by 谷歌翻译
图对比度学习(GCL)一直是图形自学学习的新兴解决方案。 GCL的核心原理是在正视图中降低样品之间的距离,但在负视图中增加样品之间的距离。在实现有希望的性能的同时,当前的GCL方法仍然受到两个局限性:(1)增强的不可控制的有效性,该图扰动可能会产生针对语义和图形数据的特征流程的无效视图; (2)不可靠的二进制对比理由,对于非欧几里得图数据而言,难以确定构造观点的积极性和负面性。为了应对上述局限性,我们提出了一个新的对比度学习范式,即图形软对比度学习(GSCL),该范例通过排名的社区无需任何增强和二进制对比符合性,在较细性的范围内进行对比度学习。 GSCL建立在图接近的基本假设上,即连接的邻居比遥远的节点更相似。具体而言,我们在配对和列表的封闭式排名中,以保留附近的相对排名关系。此外,随着邻里规模的指数增长,考虑了更多的啤酒花,我们提出了提高学习效率的邻里抽样策略。广泛的实验结果表明,我们提出的GSCL可以始终如一地在各种公共数据集上实现与GCL相当复杂的各种公共数据集的最新性能。
translated by 谷歌翻译
在异质图上的自我监督学习(尤其是对比度学习)方法可以有效地摆脱对监督数据的依赖。同时,大多数现有的表示学习方法将异质图嵌入到欧几里得或双曲线的单个几何空间中。这种单个几何视图通常不足以观察由于其丰富的语义和复杂结构而观察到异质图的完整图片。在这些观察结果下,本文提出了一种新型的自我监督学习方法,称为几何对比度学习(GCL),以更好地表示监督数据是不可用时的异质图。 GCL同时观察了从欧几里得和双曲线观点的异质图,旨在强烈合并建模丰富的语义和复杂结构的能力,这有望为下游任务带来更多好处。 GCL通过在局部局部和局部全球语义水平上对比表示两种几何视图之间的相互信息。在四个基准数据集上进行的广泛实验表明,在三个任务上,所提出的方法在包括节点分类,节点群集和相似性搜索在内的三个任务上都超过了强基础,包括无监督的方法和监督方法。
translated by 谷歌翻译
无监督的图形表示学习是图形数据的非琐碎主题。在结构化数据的无监督代表学习中对比学习和自我监督学习的成功激发了图表上的类似尝试。使用对比损耗的当前无监督的图形表示学习和预培训主要基于手工增强图数据之间的对比度。但是,由于不可预测的不变性,图数据增强仍然没有很好地探索。在本文中,我们提出了一种新颖的协作图形神经网络对比学习框架(CGCL),它使用多个图形编码器来观察图形。不同视图观察的特征充当了图形编码器之间对比学习的图表增强,避免了任何扰动以保证不变性。 CGCL能够处理图形级和节点级表示学习。广泛的实验表明CGCL在无监督的图表表示学习中的优势以及图形表示学习的手工数据增强组合的非必要性。
translated by 谷歌翻译
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes-a crucial component in CL-remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation. CCS CONCEPTS• Computing methodologies → Unsupervised learning; Neural networks; Learning latent representations.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have attracted increasing attention in recent years and have achieved excellent performance in semi-supervised node classification tasks. The success of most GNNs relies on one fundamental assumption, i.e., the original graph structure data is available. However, recent studies have shown that GNNs are vulnerable to the complex underlying structure of the graph, making it necessary to learn comprehensive and robust graph structures for downstream tasks, rather than relying only on the raw graph structure. In light of this, we seek to learn optimal graph structures for downstream tasks and propose a novel framework for semi-supervised classification. Specifically, based on the structural context information of graph and node representations, we encode the complex interactions in semantics and generate semantic graphs to preserve the global structure. Moreover, we develop a novel multi-measure attention layer to optimize the similarity rather than prescribing it a priori, so that the similarity can be adaptively evaluated by integrating measures. These graphs are fused and optimized together with GNN towards semi-supervised classification objective. Extensive experiments and ablation studies on six real-world datasets clearly demonstrate the effectiveness of our proposed model and the contribution of each component.
translated by 谷歌翻译
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
translated by 谷歌翻译
图表学习目的旨在将节点内容与图形结构集成以学习节点/图表示。然而,发现许多现有的图形学习方法在具有高异性级别的数据上不能很好地工作,这是不同类标签之间很大比例的边缘。解决这个问题的最新努力集中在改善消息传递机制上。但是,尚不清楚异质性是否确实会损害图神经网络(GNNS)的性能。关键是要展现一个节点与其直接邻居之间的关系,例如它们是异性还是同质性?从这个角度来看,我们在这里研究了杂质表示在披露连接节点之间的关系之前/之后的杂音表示的作用。特别是,我们提出了一个端到端框架,该框架既学习边缘的类型(即异性/同质性),并利用边缘类型的信息来提高图形神经网络的表现力。我们以两种不同的方式实施此框架。具体而言,为了避免通过异质边缘传递的消息,我们可以通过删除边缘分类器鉴定的异性边缘来优化图形结构。另外,可以利用有关异性邻居的存在的信息进行特征学习,因此,设计了一种混合消息传递方法来汇总同质性邻居,并根据边缘分类使异性邻居多样化。广泛的实验表明,在整个同质级别的多个数据集上,通过在多个数据集上提出的框架对GNN的绩效提高了显着提高。
translated by 谷歌翻译
随着对比学习的兴起,无人监督的图形表示学习最近一直蓬勃发展,甚至超过了一些机器学习任务中的监督对应物。图表表示的大多数对比模型学习侧重于最大化本地和全局嵌入之间的互信息,或主要取决于节点级别的对比嵌入。然而,它们仍然不足以全面探索网络拓扑的本地和全球视图。虽然前者认为本地全球关系,但其粗略的全球信息导致本地和全球观点之间的思考。后者注重节点级别对齐,以便全局视图的作用出现不起眼。为避免落入这两个极端情况,我们通过对比群集分配来提出一种新颖的无监督图形表示模型,称为GCCA。通过组合聚类算法和对比学习,它有动力综合利用本地和全球信息。这不仅促进了对比效果,而且还提供了更高质量的图形信息。同时,GCCA进一步挖掘群集级信息,这使得它能够了解除了图形拓扑之外的节点之间的难以捉摸的关联。具体地,我们首先使用不同的图形增强策略生成两个增强的图形,然后使用聚类算法分别获取其群集分配和原型。所提出的GCCA进一步强制不同增强图中的相同节点来通过最小化交叉熵损失来互相识别它们的群集分配。为了展示其有效性,我们将在三个不同的下游任务中与最先进的模型进行比较。实验结果表明,GCCA在大多数任务中具有强大的竞争力。
translated by 谷歌翻译
在本文中,我们研究了在非全粒图上进行节点表示学习的自我监督学习的问题。现有的自我监督学习方法通​​常假定该图是同质的,其中链接的节点通常属于同一类或具有相似的特征。但是,这种同质性的假设在现实图表中并不总是正确的。我们通过为图神经网络开发脱钩的自我监督学习(DSSL)框架来解决这个问题。 DSSL模仿了节点的生成过程和语义结构的潜在变量建模的链接,该过程将不同邻域之间的不同基础语义解散到自我监督的节点学习过程中。我们的DSSL框架对编码器不可知,不需要预制的增强,因此对不同的图表灵活。为了通过潜在变量有效地优化框架,我们得出了自我监督目标的较低范围的证据,并开发了具有变异推理的可扩展培训算法。我们提供理论分析,以证明DSSL享有更好的下游性能。与竞争性的自我监督学习基线相比,对各种类图基准的广泛实验表明,我们提出的框架可以显着取得更好的性能。
translated by 谷歌翻译
Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on Heterogeneous Information Networks (HINs) is still in the infant stage. For example, it is unclear how to augment the HINs without substantially altering the underlying semantics, and how to design the contrastive objective to fully capture the rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques are empirically shown to be inadequate for GCL. How to mitigate the sampling bias for heterogeneous GCL is another important problem. To address the aforementioned challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as the augmentation to generate multiple subgraphs as multi-views, and propose a contrastive objective to maximize the mutual information between any pairs of metapath-induced views. To alleviate the sampling bias, we further propose a positive sampling strategy to explicitly select positives for each node via jointly considering semantic and structural information preserved on each metapath view. Extensive experiments demonstrate HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.
translated by 谷歌翻译
在过去的几年中,图表学习(GRL)是分析图形结构数据的有力策略。最近,GRL方法通过采用用于图像的学习表示形式而开发的自我监督学习方法来显示出令人鼓舞的结果。尽管它们成功了,但现有的GRL方法倾向于忽略图像和图形之间的固有区别,即,假定图像是独立和相同分布的,而图表在数据实例之间显示了关系信息,即节点。为了完全受益于图形结构数据中固有的关系信息,我们提出了一种名为RGRL的新颖GRL方法,该方法从图形本身生成的关系信息中学习。 RGRL学习节点表示形式,使节点之间的关系是增强的不变性,即增强不变的关系,只要保留节点之间的关系,就可以改变节点表示。通过在全球和本地观点中考虑节点之间的关系,RGRL克服了对对比和非对抗性方法的局限性,并实现了两者中最好的。在各种下游任务上对十四个基准数据集进行了广泛的实验,证明了RGRL优于最先进的基线。 RGRL的源代码可在https://github.com/namkyeong/rgrl上获得。
translated by 谷歌翻译
In recent years, graph representation learning has achieved remarkable success while suffering from low-quality data problems. As a mature technology to improve data quality in computer vision, data augmentation has also attracted increasing attention in graph domain. For promoting the development of this emerging research direction, in this survey, we comprehensively review and summarize the existing graph data augmentation (GDAug) techniques. Specifically, we first summarize a variety of feasible taxonomies, and then classify existing GDAug studies based on fine-grained graph elements. Furthermore, for each type of GDAug technique, we formalize the general definition, discuss the technical details, and give schematic illustration. In addition, we also summarize common performance metrics and specific design metrics for constructing a GDAug evaluation system. Finally, we summarize the applications of GDAug from both data and model levels, as well as future directions.
translated by 谷歌翻译
消息传递已作为设计图形神经网络(GNN)的有效工具的发展。但是,消息传递的大多数现有方法简单地简单或平均所有相邻的功能更新节点表示。它们受到两个问题的限制,即(i)缺乏可解释性来识别对GNN的预测重要的节点特征,以及(ii)特征过度混合,导致捕获长期依赖和无能为力的过度平滑问题在异质或低同质的下方处理图。在本文中,我们提出了一个节点级胶囊图神经网络(NCGNN),以通过改进的消息传递方案来解决这些问题。具体而言,NCGNN表示节点为节点级胶囊组,其中每个胶囊都提取其相应节点的独特特征。对于每个节点级胶囊,开发了一个新颖的动态路由过程,以适应适当的胶囊,以从设计的图形滤波器确定的子图中聚集。 NCGNN聚集仅有利的胶囊并限制无关的消息,以避免交互节点的过度混合特征。因此,它可以缓解过度平滑的问题,并通过同粒或异质的图表学习有效的节点表示。此外,我们提出的消息传递方案本质上是可解释的,并免于复杂的事后解释,因为图形过滤器和动态路由过程确定了节点特征的子集,这对于从提取的子分类中的模型预测最为重要。关于合成和现实图形的广泛实验表明,NCGNN可以很好地解决过度光滑的问题,并为半监视的节点分类产生更好的节点表示。它的表现优于同质和异质的艺术状态。
translated by 谷歌翻译