Existing graph contrastive learning methods rely on augmentation techniques based on random perturbations (e.g., randomly adding or dropping edges and nodes). Nevertheless, altering certain edges or nodes can unexpectedly change the graph characteristics, and choosing the optimal perturbing ratio for each dataset requires onerous manual tuning. In this paper, we introduce Implicit Graph Contrastive Learning (iGCL), which utilizes augmentations in the latent space learned from a Variational Graph Auto-Encoder by reconstructing graph topological structure. Importantly, instead of explicitly sampling augmentations from latent distributions, we further propose an upper bound for the expected contrastive loss to improve the efficiency of our learning algorithm. Thus, graph semantics can be preserved within the augmentations in an intelligent way without arbitrary manual design or prior human knowledge. Experimental results on both graph-level and node-level tasks show that the proposed method achieves state-of-the-art performance compared to other benchmarks, where ablation studies in the end demonstrate the effectiveness of modules in iGCL.
translated by 谷歌翻译
Graph Contrastive Learning (GCL) has recently drawn much research interest for learning generalizable node representations in a self-supervised manner. In general, the contrastive learning process in GCL is performed on top of the representations learned by a graph neural network (GNN) backbone, which transforms and propagates the node contextual information based on its local neighborhoods. However, nodes sharing similar characteristics may not always be geographically close, which poses a great challenge for unsupervised GCL efforts due to their inherent limitations in capturing such global graph knowledge. In this work, we address their inherent limitations by proposing a simple yet effective framework -- Simple Neural Networks with Structural and Semantic Contrastive Learning} (S^3-CL). Notably, by virtue of the proposed structural and semantic contrastive learning algorithms, even a simple neural network can learn expressive node representations that preserve valuable global structural and semantic patterns. Our experiments demonstrate that the node representations learned by S^3-CL achieve superior performance on different downstream tasks compared with the state-of-the-art unsupervised GCL methods. Implementation and more experimental details are publicly available at \url{https://github.com/kaize0409/S-3-CL.}
translated by 谷歌翻译
图级表示在各种现实世界中至关重要,例如预测分子的特性。但是实际上,精确的图表注释通常非常昂贵且耗时。为了解决这个问题,图形对比学习构造实例歧视任务,将正面对(同一图的增强对)汇总在一起,并将负面对(不同图的增强对)推开,以进行无监督的表示。但是,由于为了查询,其负面因素是从所有图中均匀抽样的,因此现有方法遭受关键采样偏置问题的损失,即,否定物可能与查询具有相同的语义结构,从而导致性能降解。为了减轻这种采样偏见问题,在本文中,我们提出了一种典型的图形对比度学习(PGCL)方法。具体而言,PGCL通过将语义相似的图形群群归为同一组的群集数据的基础语义结构,并同时鼓励聚类的一致性,以实现同一图的不同增强。然后给出查询,它通过从与查询群集不同的群集中绘制图形进行负采样,从而确保查询及其阴性样本之间的语义差异。此外,对于查询,PGCL根据其原型(集群质心)和查询原型之间的距离进一步重新重新重新重新重新享受其负样本,从而使那些具有中等原型距离的负面因素具有相对较大的重量。事实证明,这种重新加权策略比统一抽样更有效。各种图基准的实验结果证明了我们的PGCL比最新方法的优势。代码可在https://github.com/ha-lins/pgcl上公开获取。
translated by 谷歌翻译
无监督的图形表示学习是图形数据的非琐碎主题。在结构化数据的无监督代表学习中对比学习和自我监督学习的成功激发了图表上的类似尝试。使用对比损耗的当前无监督的图形表示学习和预培训主要基于手工增强图数据之间的对比度。但是,由于不可预测的不变性,图数据增强仍然没有很好地探索。在本文中,我们提出了一种新颖的协作图形神经网络对比学习框架(CGCL),它使用多个图形编码器来观察图形。不同视图观察的特征充当了图形编码器之间对比学习的图表增强,避免了任何扰动以保证不变性。 CGCL能够处理图形级和节点级表示学习。广泛的实验表明CGCL在无监督的图表表示学习中的优势以及图形表示学习的手工数据增强组合的非必要性。
translated by 谷歌翻译
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct the contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results are achieved, it is rather blind to the wealth of prior information assumed: with the increase of the perturbation degree applied on the original graph, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both such prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes can be maintained and also be less altered to the perturbations of different degrees. Experiment results on various benchmark datasets verify the effectiveness of our algorithm compared with the supervised and unsupervised models.
translated by 谷歌翻译
Generalizable, transferrable, and robust representation learning on graph-structured data remains a challenge for current graph neural networks (GNNs). Unlike what has been developed for convolutional neural networks (CNNs) for image data, self-supervised learning and pre-training are less explored for GNNs. In this paper, we propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data. We first design four types of graph augmentations to incorporate various priors. We then systematically study the impact of various combinations of graph augmentations on multiple datasets, in four different settings: semi-supervised, unsupervised, and transfer learning as well as adversarial attacks. The results show that, even without tuning augmentation extents nor using sophisticated GNN architectures, our GraphCL framework can produce graph representations of similar or better generalizability, transferrability, and robustness compared to state-of-the-art methods. We also investigate the impact of parameterized graph augmentation extents and patterns, and observe further performance gains in preliminary experiments. Our codes are available at: https://github.com/Shen-Lab/GraphCL.
translated by 谷歌翻译
尽管图表学习(GRL)取得了重大进展,但要以足够的方式提取和嵌入丰富的拓扑结构和特征信息仍然是一个挑战。大多数现有方法都集中在本地结构上,并且无法完全融合全球拓扑结构。为此,我们提出了一种新颖的结构保留图表学习(SPGRL)方法,以完全捕获图的结构信息。具体而言,为了减少原始图的不确定性和错误信息,我们通过k-nearest邻居方法构建了特征图作为互补视图。该特征图可用于对比节点级别以捕获本地关系。此外,我们通过最大化整个图形和特征嵌入的相互信息(MI)来保留全局拓扑结构信息,从理论上讲,该信息可以简化为交换功能的特征嵌入和原始图以重建本身。广泛的实验表明,我们的方法在半监督节点分类任务上具有相当出色的性能,并且在图形结构或节点特征上噪声扰动下的鲁棒性出色。
translated by 谷歌翻译
对比度学习是图表学习中的有效无监督方法,对比度学习的关键组成部分在于构建正和负样本。以前的方法通常利用图中节点的接近度作为原理。最近,基于数据增强的对比度学习方法已进步以显示视觉域中的强大力量,一些作品将此方法从图像扩展到图形。但是,与图像上的数据扩展不同,图上的数据扩展远不那么直观,而且很难提供高质量的对比样品,这为改进留出了很大的空间。在这项工作中,通过引入一个对抗性图视图以进行数据增强,我们提出了一种简单但有效的方法,对抗图对比度学习(ARIEL),以在合理的约束中提取信息性的对比样本。我们开发了一种称为稳定训练的信息正则化的新技术,并使用子图抽样以进行可伸缩。我们通过将每个图形实例视为超级节点,从节点级对比度学习到图级。 Ariel始终优于在现实世界数据集上的节点级别和图形级分类任务的当前图对比度学习方法。我们进一步证明,面对对抗性攻击,Ariel更加强大。
translated by 谷歌翻译
对比学习已被广​​泛应用于图形表示学习,其中观测发生器在产生有效的对比样本方面发挥着重要作用。大多数现有的对比学习方法采用预定义的视图生成方法,例如节点滴或边缘扰动,这通常不能适应输入数据或保持原始语义结构。为了解决这个问题,我们提出了一份名为自动化图形对比学习(AutoGCL)的小说框架。具体而言,AutoGCL采用一组由自动增强策略协调的一组学习图形视图生成器,其中每个图形视图生成器都会学习输入调节的图形的概率分布。虽然AutoGCL中的图形视图发生器在生成每个对比样本中保留原始图的最代表性结构,但自动增强学会在整个对比学习程序中介绍适当的增强差异的政策。此外,AutoGCL采用联合培训策略,以培训学习的视图发生器,图形编码器和分类器以端到端的方式,导致拓扑异质性,在产生对比样本时的语义相似性。关于半监督学习,无监督学习和转移学习的广泛实验展示了我们在图形对比学习中的最先进的自动支持者框架的优越性。此外,可视化结果进一步证实,与现有的视图生成方法相比,可学习的视图发生器可以提供更紧凑和语义有意义的对比样本。
translated by 谷歌翻译
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes-a crucial component in CL-remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation. CCS CONCEPTS• Computing methodologies → Unsupervised learning; Neural networks; Learning latent representations.
translated by 谷歌翻译
图对比度学习(GCL)改善了图表的学习,从而导致SOTA在各种下游任务上。图扩大步骤是GCL的重要但几乎没有研究的步骤。在本文中,我们表明,通过图表增强获得的节点嵌入是高度偏差的,在某种程度上限制了从学习下游任务的学习区分特征的对比模型。隐藏功能(功能增强)。受到所谓矩阵草图的启发,我们提出了Costa,这是GCL的一种新颖的协变功能空间增强框架,该框架通过维护原始功能的``好草图''来生成增强功能。为了强调Costa的特征增强功能的优势,我们研究了一个保存记忆和计算的单视图设置(除了多视图ONE)。我们表明,与基于图的模型相比,带有Costa的功能增强功能可比较/更好。
translated by 谷歌翻译
对比度学习是图表学习中有效的无监督方法。最近,基于数据增强的对比度学习方法已从图像扩展到图形。但是,大多数先前的作品都直接根据为图像设计的模型进行了调整。与图像上的数据增强不同,图表上的数据扩展远不那么直观,而且很难提供高质量的对比样本,这是对比度学习模型的性能的关键。这为改进现有图形对比学习框架留出了很多空间。在这项工作中,通过引入对抗图视图和信息正常化程序,我们提出了一种简单但有效的方法,即对逆向对比度学习(ARIEL),以在合理的约束中提取信息性的对比样本。它始终优于各种现实世界数据集的节点分类任务中当前的图形对比度学习方法,并进一步提高了图对比度学习的鲁棒性。
translated by 谷歌翻译
图表表示学习(GRL)对于图形结构数据分析至关重要。然而,大多数现有的图形神经网络(GNNS)严重依赖于标签信息,这通常是在现实世界中获得的昂贵。现有无监督的GRL方法遭受某些限制,例如对单调对比和可扩展性有限的沉重依赖。为了克服上述问题,鉴于最近的图表对比学习的进步,我们通过曲线图介绍了一种新颖的自我监控图形表示学习算法,即通过利用所提出的调整变焦方案来学习节点表示来学习节点表示。具体地,该机制使G-Zoom能够从多个尺度的图表中探索和提取自我监督信号:MICRO(即,节点级别),MESO(即,邻域级)和宏(即,子图级) 。首先,我们通过两个不同的图形增强生成输入图的两个增强视图。然后,我们逐渐地从节点,邻近逐渐为上述三个尺度建立三种不同的对比度,在那里我们最大限度地提高了横跨尺度的图形表示之间的协议。虽然我们可以从微距和宏观视角上从给定图中提取有价值的线索,但是邻域级对比度基于我们的调整后的缩放方案提供了可自定义选项的能力,以便手动选择位于微观和介于微观之间的最佳视点宏观透视更好地理解图数据。此外,为了使我们的模型可扩展到大图,我们采用了并行图形扩散方法来从图形尺寸下解耦模型训练。我们对现实世界数据集进行了广泛的实验,结果表明,我们所提出的模型始终始终优于最先进的方法。
translated by 谷歌翻译
在本文中,我们研究了在非全粒图上进行节点表示学习的自我监督学习的问题。现有的自我监督学习方法通​​常假定该图是同质的,其中链接的节点通常属于同一类或具有相似的特征。但是,这种同质性的假设在现实图表中并不总是正确的。我们通过为图神经网络开发脱钩的自我监督学习(DSSL)框架来解决这个问题。 DSSL模仿了节点的生成过程和语义结构的潜在变量建模的链接,该过程将不同邻域之间的不同基础语义解散到自我监督的节点学习过程中。我们的DSSL框架对编码器不可知,不需要预制的增强,因此对不同的图表灵活。为了通过潜在变量有效地优化框架,我们得出了自我监督目标的较低范围的证据,并开发了具有变异推理的可扩展培训算法。我们提供理论分析,以证明DSSL享有更好的下游性能。与竞争性的自我监督学习基线相比,对各种类图基准的广泛实验表明,我们提出的框架可以显着取得更好的性能。
translated by 谷歌翻译
Graph contrastive learning is an important method for deep graph clustering. The existing methods first generate the graph views with stochastic augmentations and then train the network with a cross-view consistency principle. Although good performance has been achieved, we observe that the existing augmentation methods are usually random and rely on pre-defined augmentations, which is insufficient and lacks negotiation between the final clustering task. To solve the problem, we propose a novel Graph Contrastive Clustering method with the Learnable graph Data Augmentation (GCC-LDA), which is optimized completely by the neural networks. An adversarial learning mechanism is designed to keep cross-view consistency in the latent space while ensuring the diversity of augmented views. In our framework, a structure augmentor and an attribute augmentor are constructed for augmentation learning in both structure level and attribute level. To improve the reliability of the learned affinity matrix, clustering is introduced to the learning procedure and the learned affinity matrix is refined with both the high-confidence pseudo-label matrix and the cross-view sample similarity matrix. During the training procedure, to provide persistent optimization for the learned view, we design a two-stage training strategy to obtain more reliable clustering information. Extensive experimental results demonstrate the effectiveness of GCC-LDA on six benchmark datasets.
translated by 谷歌翻译
图形对比学习(GCL)已成为学习图形无监督表示的有效工具。关键思想是通过数据扩展最大化每个图的两个增强视图之间的一致性。现有的GCL模型主要集中在给定情况下的所有图表上应用\ textit {相同的增强策略}。但是,实际图通常不是单态,而是各种本质的抽象。即使在相同的情况下(例如,大分子和在线社区),不同的图形可能需要各种增强来执行有效的GCL。因此,盲目地增强所有图表而不考虑其个人特征可能会破坏GCL艺术的表现。 {a} u Mentigation(GPA),通过允许每个图选择自己的合适的增强操作来推进常规GCL。本质上,GPA根据其拓扑属性和节点属性通过可学习的增强选择器为每个图定制了量身定制的增强策略,该策略是插件模块,可以通过端到端的下游GCL型号有效地训练。来自不同类型和域的11个基准图的广泛实验证明了GPA与最先进的竞争对手的优势。此外,通过可视化不同类型的数据集中学习的增强分布,我们表明GPA可以有效地识别最合适的数据集每个图的增强基于其特征。
translated by 谷歌翻译
图是对物体之间关系的强大表示,吸引了很多关注。图形学习的一个基本挑战是如何在没有标签的情况下训练有效的图形神经网络(GNN)编码器,这些标签既昂贵又耗时。对比学习(CL)是应对这一挑战的最受欢迎的范式之一,该挑战通过区分正和负节点对来训练GNN。尽管最近的CL方法取得了成功,但仍然存在两个爆炸案。首先,如何减少基于随机拓扑的数据增强引入的语义错误。传统CL通过节点级拓扑接近定义正和负节点对,该节点拓扑接近度仅基于图形拓扑,而不论节点属性的语义信息如何,因此某些语义上相似的节点可能被错误地视为负对。其次,如何有效地对现实图形的多重性进行建模,其中节点通过各种关系连接,并且每个关系都可以形成均匀的图层。为了解决这些问题,我们提出了一种新型的多重异质图原型对比度倾斜(X-GAL)框架来提取节点嵌入。 X-GOAL由两个组成部分组成:目标框架,该目标框架学习每个均匀图层的节点嵌入,以及一个对齐正则化,通过对齐层特定的节点嵌入来共同对不同的层进行模拟不同的层。具体而言,目标框架通过简洁的图形转换技术捕获节点级信息,并通过将节点拉到嵌入空间中的同一语义簇中,从而捕获群集级信息。对齐正则化在节点和群集级别的层上对齐嵌入。我们在各种现实世界数据集和下游任务上评估X-GAL,以证明其有效性。
translated by 谷歌翻译
Contrastive learning (CL), which can extract the information shared between different contrastive views, has become a popular paradigm for vision representation learning. Inspired by the success in computer vision, recent work introduces CL into graph modeling, dubbed as graph contrastive learning (GCL). However, generating contrastive views in graphs is more challenging than that in images, since we have little prior knowledge on how to significantly augment a graph without changing its labels. We argue that typical data augmentation techniques (e.g., edge dropping) in GCL cannot generate diverse enough contrastive views to filter out noises. Moreover, previous GCL methods employ two view encoders with exactly the same neural architecture and tied parameters, which further harms the diversity of augmented views. To address this limitation, we propose a novel paradigm named model augmented GCL (MA-GCL), which will focus on manipulating the architectures of view encoders instead of perturbing graph inputs. Specifically, we present three easy-to-implement model augmentation tricks for GCL, namely asymmetric, random and shuffling, which can respectively help alleviate high- frequency noises, enrich training instances and bring safer augmentations. All three tricks are compatible with typical data augmentations. Experimental results show that MA-GCL can achieve state-of-the-art performance on node classification benchmarks by applying the three tricks on a simple base model. Extensive studies also validate our motivation and the effectiveness of each trick. (Code, data and appendix are available at https://github.com/GXM1141/MA-GCL. )
translated by 谷歌翻译
近年来,自我监督学习(SSL)已广泛探索。特别是,生成的SSL在自然语言处理和其他AI领域(例如BERT和GPT的广泛采用)中获得了新的成功。尽管如此,对比度学习 - 严重依赖结构数据的增强和复杂的培训策略,这是图SSL的主要方法,而迄今为止,生成SSL在图形上的进度(尤其是GAES)尚未达到潜在的潜力。正如其他领域所承诺的。在本文中,我们确定并检查对GAE的发展产生负面影响的问题,包括其重建目标,训练鲁棒性和错误指标。我们提出了一个蒙版的图形自动编码器Graphmae,该图可以减轻这些问题,以预处理生成性自我监督图。我们建议没有重建图形结构,而是提议通过掩盖策略和缩放余弦误差将重点放在特征重建上,从而使GraphMae的强大训练受益。我们在21个公共数据集上进行了大量实验,以实现三个不同的图形学习任务。结果表明,Graphmae-A简单的图形自动编码器具有仔细的设计-CAN始终在对比度和生成性最新基准相比,始终产生优于性的表现。这项研究提供了对图自动编码器的理解,并证明了在图上的生成自我监督预训练的潜力。
translated by 谷歌翻译
图神经网络(GNN)在学习图表表示方面取得了巨大成功,从而促进了各种与图形相关的任务。但是,大多数GNN方法都采用监督的学习设置,由于难以获得标记的数据,因此在现实世界中并不总是可行的。因此,图表自学学习一直在吸引越来越多的关注。图对比度学习(GCL)是自我监督学习的代表性框架。通常,GCL通过将语义上相似的节点(阳性样品)和不同的节点(阴性样品)与锚节点进行对比来学习节点表示。没有访问标签,通常通过数据增强产生阳性样品,而负样品是从整个图中均匀采样的,这导致了亚最佳目标。具体而言,数据增强自然限制了该过程中涉及的正样本的数量(通常只采用一个阳性样本)。另一方面,随机采样过程不可避免地选择假阴性样品(样品与锚共享相同的语义)。这些问题限制了GCL的学习能力。在这项工作中,我们提出了一个增强的目标,以解决上述问题。我们首先引入了一个不可能实现的理想目标,该目标包含所有正样本,没有假阴性样本。然后,基于对阳性和负样品进行采样的分布,将这个理想的目标转化为概率形式。然后,我们以节点相似性对这些分布进行建模,并得出增强的目标。各种数据集上的全面实验证明了在不同设置下提出的增强目标的有效性。
translated by 谷歌翻译