Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, prior arts on graph representation learning focus on domain-specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data. Inspired by the recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC), a self-supervised graph neural network pre-training framework, to capture the universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn the intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve competitive or better performance than its task-specific, trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning.
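To make the pre-training task above concrete: subgraph instance discrimination treats each node's sampled subgraph as its own class and applies an InfoNCE-style contrastive loss. The sketch below is a minimal illustration under that reading, not GCC's released implementation; the encoder outputs, the negative queue, and the temperature are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def subgraph_instance_discrimination_loss(q, k, queue, tau=0.07):
    """InfoNCE-style loss for subgraph instance discrimination (sketch).

    q:     (B, d) encodings of one augmented r-ego subgraph per node
    k:     (B, d) encodings of a second augmentation of the same subgraphs
    queue: (K, d) encodings of subgraphs from other nodes (negatives)
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    queue = F.normalize(queue, dim=1)
    l_pos = (q * k).sum(dim=1, keepdim=True)   # (B, 1) positive logits
    l_neg = q @ queue.t()                      # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```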
Graph contrastive learning has been shown to be an effective task for pre-training graph neural networks (GNNs). However, a key issue may severely impede the representation power of existing works: the positive instances created by current methods often miss crucial information of the graph, or even yield illegal instances (such as non-chemically-aware graphs in molecule generation). To remedy this issue, we propose to select positive graph instances directly from existing graphs in the training set, which ultimately maintains legality and similarity to the target graph. Our selection is based on certain domain-specific pairwise similarity measurements, together with sampling from a hierarchical graph that encodes the similarity relations among graphs. In addition, we develop an adaptive node-level pre-training method that dynamically masks nodes so that they are distributed evenly across the graph. We conduct extensive experiments on 13 graph classification and node classification benchmark datasets from various domains. The results show that GNN models pre-trained by our strategies can outperform their trained-from-scratch counterparts as well as the variants obtained with existing methods.
Unsupervised graph representation learning is a non-trivial topic for graph data. The success of contrastive learning and self-supervised learning in the unsupervised representation learning of structured data inspires similar attempts on graphs. Current unsupervised graph representation learning and pre-training methods using contrastive losses are mainly based on the contrast between hand-crafted augmentations of graph data. However, graph data augmentation remains under-explored due to its unpredictable invariance. In this paper, we propose a novel collaborative graph neural network contrastive learning framework (CGCL), which uses multiple graph encoders to observe the graph. The features observed from different views act as graph augmentations for contrastive learning between the graph encoders, avoiding any perturbation that would otherwise be needed to guarantee invariance. CGCL can handle both graph-level and node-level representation learning. Extensive experiments demonstrate the advantages of CGCL in unsupervised graph representation learning and the non-necessity of hand-crafted data augmentation combinations for graph representation learning.
Many applications of machine learning require a model to make accurate predictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training. An effective approach to this challenge is to pre-train a model on related tasks where data is abundant, and then fine-tune it on a downstream task of interest. While pre-training has been effective in many language and vision domains, it remains an open question how to effectively use pre-training on graph datasets. In this paper, we develop a new strategy and self-supervised methods for pre-training Graph Neural Networks (GNNs). The key to the success of our strategy is to pre-train an expressive GNN at the level of individual nodes as well as entire graphs so that the GNN can learn useful local and global representations simultaneously. We systematically study pre-training on multiple graph classification datasets. We find that naïve strategies, which pre-train GNNs at the level of either entire graphs or individual nodes, give limited improvement and can even lead to negative transfer on many downstream tasks. In contrast, our strategy avoids negative transfer and improves generalization significantly across downstream tasks, leading up to 9.4% absolute improvements in ROC-AUC over non-pre-trained models and achieving state-of-the-art performance for molecular property prediction and protein function prediction.
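A rough sketch of the core idea above: combine a node-level self-supervised objective with a graph-level objective so the GNN learns local and global signals together. The attribute-masking choice, the PyG-style (x, edge_index, batch) arguments, and both prediction heads are illustrative assumptions, not the authors' exact strategies.

```python
import torch
import torch.nn.functional as F

def combined_pretrain_loss(gnn, node_head, graph_head, x, edge_index, batch,
                           mask_rate=0.15, graph_labels=None):
    """Sketch of joint node-level and graph-level pre-training (assumed setup)."""
    # Node level: mask a fraction of node attributes and predict them back.
    mask = torch.rand(x.size(0), device=x.device) < mask_rate
    x_masked = x.clone()
    x_masked[mask] = 0.0
    h = gnn(x_masked, edge_index)                     # (N, d) node embeddings
    # MSE for continuous attributes; discrete attributes would use cross-entropy.
    node_loss = F.mse_loss(node_head(h[mask]), x[mask])

    # Graph level: pool nodes per graph and predict multi-task binary labels.
    graph_loss = 0.0
    if graph_labels is not None:
        num_graphs = int(batch.max()) + 1
        hg = torch.zeros(num_graphs, h.size(1), device=h.device)
        hg.index_add_(0, batch, h)                    # sum pooling per graph
        graph_loss = F.binary_cross_entropy_with_logits(graph_head(hg),
                                                        graph_labels.float())
    return node_loss + graph_loss
```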
Self-supervised learning (SSL) has been extensively explored in recent years. In particular, generative SSL has seen emerging success in natural language processing and other AI fields, such as the wide adoption of BERT and GPT. Despite this, contrastive learning, which heavily relies on structural data augmentation and complicated training strategies, has been the dominant approach in graph SSL, while the progress of generative SSL on graphs, especially graph autoencoders (GAEs), has so far not reached the potential promised in other fields. In this paper, we identify and examine the issues that negatively impact the development of GAEs, including their reconstruction objective, training robustness, and error metric. We present a masked graph autoencoder, GraphMAE, that mitigates these issues for generative self-supervised graph pre-training. Instead of reconstructing graph structures, we propose to focus on feature reconstruction with both a masking strategy and a scaled cosine error, which benefit the robust training of GraphMAE. We conduct extensive experiments on 21 public datasets covering three different graph learning tasks. The results show that GraphMAE, a simple graph autoencoder with careful designs, can consistently outperform both contrastive and generative state-of-the-art baselines. This study provides an understanding of graph autoencoders and demonstrates the potential of generative self-supervised pre-training on graphs.
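The scaled cosine error mentioned above has a simple closed form: for each masked node, compute (1 - cos(x, z))^gamma between the original feature x and its reconstruction z, with gamma >= 1 down-weighting easy, already well-reconstructed nodes. A minimal sketch, with gamma as a placeholder hyperparameter:

```python
import torch
import torch.nn.functional as F

def scaled_cosine_error(x, z, gamma=2.0):
    """Scaled cosine error over masked nodes: mean of (1 - cos(x_i, z_i)) ** gamma."""
    x = F.normalize(x, dim=-1)
    z = F.normalize(z, dim=-1)
    cos = (x * z).sum(dim=-1)            # cosine similarity per node
    return ((1.0 - cos) ** gamma).mean()
```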
Deep learning on graphs has recently attracted significant interest. However, most of the work has focused on (semi-)supervised learning, resulting in shortcomings including heavy label reliance, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL), which extracts informative knowledge through well-designed pretext tasks without relying on manual labels, has become a promising and trending learning paradigm for graph data. Different from SSL in other domains like computer vision and natural language processing, SSL on graphs has an exclusive background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of existing approaches that employ SSL techniques for graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of their pretext tasks, we divide these approaches into four categories: generation-based, auxiliary-property-based, contrast-based, and hybrid approaches. We further describe the applications of graph SSL across various research fields and summarize the commonly used datasets, evaluation benchmarks, performance comparisons, and open-source code of graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.
Graphs are a universal data structure widely used to organize real-world data. Various practical networks, such as transportation networks and social and academic networks, can be represented by graphs. Recent years have witnessed the rapid development of representing the vertices of a network in a low-dimensional vector space, known as network representation learning. Representation learning can facilitate the design of new algorithms on graph data. In this survey, we conduct a comprehensive review of the current literature on network representation learning. Existing algorithms can be divided into three groups: shallow embedding models, heterogeneous network embedding models, and graph neural network based models. We review state-of-the-art algorithms in each category and discuss the essential differences between them. One advantage of this survey is that we systematically study the theoretical foundations underlying the different categories of algorithms, which offers deep insights for better understanding the development of the network representation learning field.
Generalizable, transferrable, and robust representation learning on graph-structured data remains a challenge for current graph neural networks (GNNs). Unlike what has been developed for convolutional neural networks (CNNs) for image data, self-supervised learning and pre-training are less explored for GNNs. In this paper, we propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data. We first design four types of graph augmentations to incorporate various priors. We then systematically study the impact of various combinations of graph augmentations on multiple datasets, in four different settings: semi-supervised, unsupervised, and transfer learning as well as adversarial attacks. The results show that, even without tuning augmentation extents nor using sophisticated GNN architectures, our GraphCL framework can produce graph representations of similar or better generalizability, transferrability, and robustness compared to state-of-the-art methods. We also investigate the impact of parameterized graph augmentation extents and patterns, and observe further performance gains in preliminary experiments. Our codes are available at: https://github.com/Shen-Lab/GraphCL.
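For reference, two of the four GraphCL augmentation types (node dropping and attribute masking) are easy to sketch; the rates and the edge bookkeeping below are simplified assumptions rather than the released GraphCL code:

```python
import torch

def drop_nodes(x, edge_index, drop_rate=0.2):
    """Node dropping: remove a random subset of nodes and their incident edges.
    x: (N, F) node features; edge_index: (2, E) COO edge list."""
    num_nodes = x.size(0)
    keep = torch.rand(num_nodes) >= drop_rate
    idx_map = -torch.ones(num_nodes, dtype=torch.long)   # old index -> new index
    idx_map[keep] = torch.arange(int(keep.sum()))
    src, dst = edge_index
    edge_keep = keep[src] & keep[dst]                    # keep edges between kept nodes
    new_edge_index = idx_map[edge_index[:, edge_keep]]
    return x[keep], new_edge_index

def mask_attributes(x, mask_rate=0.2):
    """Attribute masking: zero out the features of random nodes."""
    mask = torch.rand(x.size(0)) < mask_rate
    x = x.clone()
    x[mask] = 0.0
    return x
```

Two such augmented views of the same graph are then encoded and pulled together by a contrastive (NT-Xent-style) loss, analogous to the subgraph discrimination sketch earlier.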
Graphs are present in many real-world applications, such as financial fraud detection, commercial recommendation, and social network analysis. However, given the high cost of graph annotation or labeling, we face a severe graph label-scarcity problem: a graph may have only a few labeled nodes. One instance of this problem is so-called few-shot node classification. Dominant approaches to this problem all rely on episodic meta-learning. In this work, we challenge the status quo by asking a fundamental question: is meta-learning a must for few-shot node classification tasks? We propose a new and simple framework under the standard few-shot node classification setting as an alternative to meta-learning for learning an effective graph encoder. The framework consists of supervised graph contrastive learning with novel data augmentation, subgraph encoding, and multi-scale contrast on graphs. Extensive experiments on three benchmark datasets (CoraFull, Reddit, OGBN) show that the new framework significantly outperforms state-of-the-art meta-learning based methods.
Graph representation learning is a fast-growing field in which one of the main objectives is to produce meaningful graph representations in a low-dimensional space. The learned embeddings have been successfully applied to various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics, such as prediction accuracy, running time, and scalability. This survey aims to evaluate all major categories of graph embedding methods by considering algorithmic variations, parameter selection, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organize graph embedding techniques using a taxonomy comprising manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluate these categories of algorithms on node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We designed our experiments on top of the PyTorch Geometric and DGL libraries and ran them on different multicore CPU and GPU platforms. We rigorously scrutinize the performance of the embedding methods under various performance metrics and summarize the results. This paper can thus serve as a comparative guide to help users choose the methods most suitable for their tasks.
Graph kernels have attracted a lot of attention during the last decade and have evolved into a rapidly developing branch of learning on structured data. The considerable research activity that has taken place in the field over the past 20 years has led to the development of dozens of graph kernels, each focusing on specific structural properties of graphs. Graph kernels have been applied successfully to a wide range of domains, from social networks to bioinformatics. The goal of this survey is to provide a unified view of the literature on graph kernels. In particular, we give an overview of a wide range of graph kernels. Furthermore, we perform an experimental evaluation of several of these kernels on publicly available datasets and provide a comparative study. Finally, we discuss key applications of graph kernels and outline some challenges that remain to be addressed.
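As one concrete example of the family this survey covers, a compact Weisfeiler-Lehman subtree kernel can be sketched in a few lines: iteratively relabel each node by hashing its own label together with the sorted labels of its neighbors, then take the dot product of the two graphs' label histograms. This is a simplified sketch (the adjacency-list format and the use of Python's built-in hash are conveniences, not a reference implementation):

```python
from collections import Counter

def wl_subtree_kernel(adj1, labels1, adj2, labels2, iterations=3):
    """WL subtree kernel sketch: graphs are adjacency lists plus node labels."""
    def wl_histogram(adj, labels):
        labels = list(labels)
        hist = Counter(labels)                       # iteration-0 labels
        for _ in range(iterations):
            labels = [hash((labels[v], tuple(sorted(labels[u] for u in adj[v]))))
                      for v in range(len(adj))]      # compressed relabeling
            hist.update(labels)
        return hist
    h1 = wl_histogram(adj1, labels1)
    h2 = wl_histogram(adj2, labels2)
    return sum(h1[l] * h2[l] for l in h1)            # dot product of label counts

# Example: a 3-node path vs. a 2-node path with node labels 'A'/'B'.
# k = wl_subtree_kernel([[1], [0, 2], [1]], ['A', 'B', 'A'], [[1], [0]], ['A', 'B'])
```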
Graph neural networks (GNNs) have been widely applied to recommendation tasks and have obtained very appealing performance. However, most GNN-based recommendation methods suffer from the data-sparsity problem in practice. Meanwhile, pre-training techniques have achieved great success in alleviating data sparsity in various domains such as natural language processing (NLP) and computer vision (CV). Thus, graph pre-training has great potential to alleviate data sparsity in GNN-based recommendation. However, pre-training GNNs for recommendation faces unique challenges. For example, the user-item interaction graphs of different recommendation tasks have distinct sets of users and items, and they often exhibit different properties. Therefore, the mechanisms commonly used in NLP and CV to transfer knowledge from pre-training tasks to downstream tasks, such as sharing learned embeddings or feature extractors, are not directly applicable to existing GNN-based recommendation models. To address these challenges, we delicately design an adaptive graph pre-training framework for localized collaborative filtering (ADAPT). It does not require transferring user/item embeddings, and it is able to capture both the common knowledge across different graphs and the uniqueness of each graph. Extensive experimental results demonstrate the effectiveness and superiority of ADAPT.
Recommender systems predict users' potential interest in items, and their core is learning user/item embeddings. However, they suffer from the data-sparsity problem, which cross-domain recommendation can alleviate. Most prior work either jointly trains the source-domain and target-domain models or requires side features; yet joint training and side features can hurt prediction on the target domain, because the learned embeddings are dominated by the source domain, which contains biased information. Inspired by contemporary advances in pre-training for graph representation learning, we propose a pre-training and fine-tuning paradigm for cross-domain recommendation. We design a novel Pre-training Graph Neural Network for Cross-Domain Recommendation (PCRec), which adopts contrastive self-supervised pre-training of a graph encoder. We then transfer the pre-trained graph encoder to initialize the node embeddings on the target domain, which benefits the fine-tuning of the single-domain recommender system on the target domain. Experimental results demonstrate the superiority of PCRec. Detailed analyses verify the advantage of PCRec in transferring information while avoiding biases from the source domain.
Self-supervised learning of graph neural networks (GNNs) is in great demand because of the widespread label scarcity in real-world graph/network data. Graph contrastive learning (GCL), by training GNNs to maximize the correspondence between the representations of the same graph in its different augmented forms, may yield robust and transferable GNNs even without using labels. However, GNNs trained by traditional GCL often risk capturing redundant graph features and thus may be brittle and provide sub-par performance in downstream tasks. Here, we propose a novel principle, termed adversarial GCL (AD-GCL), which enables GNNs to avoid capturing redundant information during training by optimizing adversarial graph augmentation strategies used in GCL. We pair AD-GCL with theoretical explanations and design a practical instantiation based on trainable edge-dropping graph augmentation. We experimentally validate AD-GCL against state-of-the-art GCL methods and achieve performance gains of up to 14% in unsupervised, 6% in transfer, and 3% in semi-supervised learning settings overall, across 18 different benchmark datasets for molecular property regression and classification and social network classification.
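The trainable edge-dropping instantiation can be sketched as a Gumbel-sigmoid (concrete) relaxation of per-edge Bernoulli keep probabilities, so the augmenter stays differentiable for the adversary; the temperature and the training-loop comments below are assumptions, not AD-GCL's exact recipe:

```python
import torch

def sample_edge_weights(edge_logits, temperature=1.0):
    """Differentiable soft keep-mask in (0, 1) for each edge, via logistic
    (Gumbel-sigmoid) noise added to learnable per-edge logits."""
    u = torch.rand_like(edge_logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log(1 - u)
    return torch.sigmoid((edge_logits + noise) / temperature)

# Assumed adversarial min-max loop (sketch only):
# 1. Augmenter step: maximize the contrastive loss w.r.t. edge_logits,
#    plus a regularizer limiting how many edges get dropped.
# 2. Encoder step: minimize the same contrastive loss w.r.t. GNN weights,
#    using the sampled edge weights to reweight message passing.
```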
Contrastive learning has shown great promise in the field of graph representation learning. By manually constructing positive/negative samples, most graph contrastive learning methods rely on a vector-inner-product-based similarity metric to distinguish graph representation samples. However, hand-crafted sample construction (e.g., perturbing the nodes or edges of a graph) may fail to effectively capture the intrinsic local structures of the graph. Likewise, a vector-inner-product-based similarity metric cannot fully exploit the local structures of the graph to characterize graph differences. To this end, in this paper we propose a novel adaptive-subgraph-generation-based contrastive learning framework for efficient and robust self-supervised graph representation learning, in which the optimal transport distance serves as the similarity metric between subgraphs. It aims to generate contrastive samples by capturing the intrinsic structures of the graph, and to distinguish the samples based on the features and structures of subgraphs simultaneously. Specifically, for each center node, we first develop a network that generates an interpolated subgraph by adaptively learning relation weights to the nodes of the corresponding neighborhood. We then construct positive and negative pairs of subgraphs from the same and different nodes, respectively. Finally, we employ two types of optimal transport distances (the Wasserstein distance and the Gromov-Wasserstein distance) to construct a structured contrastive loss. Extensive node classification experiments on benchmark datasets verify the effectiveness of our graph contrastive learning method.
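Of the two optimal transport distances used, the entropy-regularized Wasserstein distance is the easier to sketch via Sinkhorn iterations; the cost construction and the uniform marginals below are illustrative assumptions:

```python
import torch

def sinkhorn_wasserstein(cost, a, b, eps=0.1, n_iter=50):
    """Entropy-regularized Wasserstein distance via Sinkhorn iterations.
    cost: (n, m) pairwise cost between node features of two subgraphs;
    a, b: marginal weights over the two node sets (e.g., uniform)."""
    K = torch.exp(-cost / eps)                 # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0) # transport plan diag(u) K diag(v)
    return (plan * cost).sum()

# Hypothetical usage with subgraph node features feats_i (n, d), feats_j (m, d):
# cost = torch.cdist(feats_i, feats_j)
# n, m = cost.shape
# d = sinkhorn_wasserstein(cost, torch.full((n,), 1.0 / n), torch.full((m,), 1.0 / m))
```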
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct the contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results are achieved, it is rather blind to the wealth of prior information assumed: with the increase of the perturbation degree applied on the original graph, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both such prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes can be maintained and also be less altered to the perturbations of different degrees. Experiment results on various benchmark datasets verify the effectiveness of our algorithm compared with the supervised and unsupervised models.
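A minimal sketch of the ranking idea above, namely that views ordered by perturbation degree should also be ranked by similarity to the anchor, using a pairwise margin loss; the margin value and the cosine similarity are placeholder choices, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def perturbation_ranking_loss(anchor, views, margin=0.1):
    """Learning-to-rank signal over positive views (sketch).
    anchor: (B, d) embeddings of the original graph's nodes;
    views:  list of (B, d) embeddings ordered by increasing perturbation;
    a lightly perturbed view must stay closer to the anchor than a more
    heavily perturbed one, by at least `margin`."""
    anchor = F.normalize(anchor, dim=1)
    sims = [(anchor * F.normalize(v, dim=1)).sum(dim=1) for v in views]
    loss = 0.0
    for lighter, heavier in zip(sims[:-1], sims[1:]):
        loss = loss + F.relu(margin - (lighter - heavier)).mean()
    return loss / (len(views) - 1)
```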
Graph structure learning (GSL), which aims to learn the adjacency matrix for graph neural networks (GNNs), has shown great potential in boosting the performance of GNNs. Most existing GSL works apply a joint learning framework where the estimated adjacency matrix and GNN parameters are optimized for downstream tasks. However, GSL is essentially a link prediction task, whose goal may differ largely from that of the downstream task. The inconsistency of these two goals limits the ability of GSL methods to learn the potentially optimal graph structure. Moreover, the joint learning framework suffers from scalability issues in terms of time and space during the estimation and optimization of the adjacency matrix. To mitigate these issues, we propose a graph structure refinement (GSR) framework with a pretrain-finetune pipeline. Specifically, the pre-training phase aims to comprehensively estimate the underlying graph structure through a multi-view contrastive learning framework with both intra- and inter-view link prediction tasks. Then, the graph structure is refined by adding and removing edges according to the edge probabilities estimated by the pre-trained model. Finally, the fine-tuning GNN is initialized by the pre-trained model and optimized toward downstream tasks. With the refined graph structure remaining static during fine-tuning, GSR avoids estimating and optimizing the graph structure in the fine-tuning phase, which enjoys great scalability and efficiency. Moreover, the fine-tuning GNN is boosted both by the migrated knowledge and by the refined graph. Extensive experiments are conducted to evaluate the effectiveness (best performance on six benchmark datasets), efficiency, and scalability (13.8x faster using 32.8% GPU memory compared to the best GSL baseline on Cora) of the proposed model.
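The refinement step between pre-training and fine-tuning reduces to simple thresholding on the pre-trained link predictor's edge probabilities. A sketch under assumed thresholds and a precomputed candidate set of non-edges:

```python
import torch

def refine_graph(edge_index, edge_probs, cand_index, cand_probs,
                 remove_thresh=0.1, add_thresh=0.9):
    """Graph refinement sketch: drop existing edges the pre-trained link
    predictor scores below remove_thresh, and add candidate non-edges it
    scores above add_thresh. Thresholds and the candidate set are
    placeholder assumptions.

    edge_index: (2, E) existing edges;  edge_probs: (E,) their scores
    cand_index: (2, C) candidate edges; cand_probs: (C,) their scores
    """
    kept = edge_index[:, edge_probs >= remove_thresh]
    added = cand_index[:, cand_probs > add_thresh]
    return torch.cat([kept, added], dim=1)  # stays static during fine-tuning
```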
Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.
Graph kernels are historically the most widely used technique for graph classification tasks. However, these methods suffer from limited performance because of the hand-crafted combinatorial features of graphs. In recent years, graph neural networks (GNNs) have become the state-of-the-art method for downstream graph-related tasks due to their superior performance. Most GNNs are based on the message passing neural network (MPNN) framework. However, recent studies show that MPNNs cannot exceed the power of the Weisfeiler-Lehman (WL) algorithm on graph isomorphism tests. To address the limitations of existing graph kernel and GNN methods, in this paper we propose a novel GNN framework, termed Kernel Graph Neural Networks (KerGNNs), which integrates graph kernels into the message passing process of GNNs. Inspired by the convolution filters in convolutional neural networks (CNNs), KerGNNs adopt trainable hidden graphs as graph filters, which are combined with subgraphs to update node embeddings using graph kernels. In addition, we show that MPNNs can be viewed as special cases of KerGNNs. We apply KerGNNs to multiple graph-related tasks and use cross-validation to make fair comparisons with benchmarks. We show that our method achieves performance competitive with existing state-of-the-art methods, demonstrating the potential for increasing the representation ability of GNNs. We also show that the trained graph filters in KerGNNs can reveal the local graph structures of the dataset, which significantly improves model interpretability compared with conventional GNN models.
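A rough sketch of the graph-filter idea: score a node's ego subgraph against a trainable hidden graph with a P-step random-walk kernel on their direct product, using feature similarity as start/stop weights. The formulation below is one common random-walk-kernel variant chosen for illustration, not necessarily the paper's exact kernel:

```python
import torch

def random_walk_kernel(adj_sub, x_sub, adj_hidden, x_hidden, steps=2):
    """P-step random-walk kernel between an ego subgraph (adj_sub, x_sub)
    and a trainable hidden graph (adj_hidden, x_hidden): a feature-weighted
    count of common walks on the direct-product graph. All adjacency
    matrices are assumed dense float tensors."""
    s = (x_sub @ x_hidden.t()).flatten()  # node-pair similarities, vectorized
    w = torch.kron(adj_sub, adj_hidden)   # product-graph adjacency
    out = s.clone()
    for _ in range(steps):
        out = w @ out                     # propagate along common walks
    return s @ out                        # scalar kernel value

# One output channel per hidden graph; stacking several trainable hidden
# graphs plays the role of multiple filters in a CNN layer.
```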
Self-supervised learning (especially contrastive learning) methods on heterogeneous graphs can effectively remove the dependence on supervised data. Meanwhile, most existing representation learning methods embed heterogeneous graphs into a single geometric space, either Euclidean or hyperbolic. Such a single geometric view is usually insufficient to observe the complete picture of heterogeneous graphs due to their rich semantics and complex structures. Motivated by these observations, this paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL), to better represent heterogeneous graphs when supervised data are unavailable. GCL views a heterogeneous graph from the Euclidean and hyperbolic perspectives simultaneously, aiming to combine the ability to model rich semantics with the ability to model complex structures, which is expected to bring more benefits to downstream tasks. GCL maximizes the mutual information between the two geometric views by contrasting representations at both the local-local and local-global semantic levels. Extensive experiments on four benchmark datasets show that the proposed approach outperforms strong baselines, including both unsupervised and supervised methods, on three tasks: node classification, node clustering, and similarity search.
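For the hyperbolic view, distances are typically computed in the Poincaré ball. A sketch of that standard distance, which could score positive/negative pairs alongside cosine similarity in the Euclidean view (the pairing scheme is an assumption, not the paper's exact loss):

```python
import torch

def poincare_distance(u, v, eps=1e-6):
    """Distance in the Poincaré ball model of hyperbolic space.
    u, v: (..., d) points with Euclidean norm strictly less than 1."""
    sq = ((u - v) ** 2).sum(dim=-1)
    nu = 1.0 - (u ** 2).sum(dim=-1)         # conformal factors
    nv = 1.0 - (v ** 2).sum(dim=-1)
    x = 1.0 + 2.0 * sq / (nu * nv).clamp_min(eps)
    return torch.acosh(x.clamp_min(1.0 + eps))
```

Negative hyperbolic distance can then play the same role as a similarity score in a contrastive objective, so the same InfoNCE-style machinery applies to both geometric views.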