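With the rise of contrastive learning, unsupervised graph representation learning has been booming recently, even surpassing its supervised counterparts in some machine learning tasks. Most existing contrastive models for graph representation learning either focus on maximizing the mutual information between local and global embeddings, or depend primarily on contrasting embeddings at the node level. However, they are still insufficient to comprehensively explore the local and global views of network topology. Although the former considers the local-global relationship, its coarse global information leads to poor cooperation between the local and global views. The latter focuses on node-level alignment, so the role of the global view appears inconspicuous. To avoid falling into these two extremes, we propose a novel unsupervised graph representation model based on contrasting cluster assignments, called GCCA. By combining clustering algorithms with contrastive learning, it is designed to make comprehensive use of both local and global information. This not only enhances the contrastive effect but also provides higher-quality graph information. Meanwhile, GCCA further mines cluster-level information, which enables it to capture elusive associations between nodes beyond the graph topology. Specifically, we first generate two augmented graphs using different graph augmentation strategies, and then use clustering algorithms to obtain their cluster assignments and prototypes separately. GCCA then forces the same node in different augmented graphs to recognize each other's cluster assignments by minimizing a cross-entropy loss. To demonstrate its effectiveness, we compare it with state-of-the-art models on three different downstream tasks. The experimental results show that GCCA is highly competitive on most tasks.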
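The swapped cluster-assignment objective can be illustrated with a minimal sketch. It assumes that soft assignments come from a temperature-scaled softmax over dot products with one shared set of prototypes. This is a simplification, since GCCA runs a clustering algorithm on each augmented graph separately. The helper names (`soft_assign`, `swapped_loss`) and the temperature are hypothetical.

```python
import math


def softmax(scores, temp=1.0):
    # Numerically stable softmax with a temperature
    scaled = [s / temp for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_assign(z, prototypes, temp=0.1):
    # Soft cluster assignment of embedding z: softmax over prototype dot products
    scores = [sum(a * b for a, b in zip(z, c)) for c in prototypes]
    return softmax(scores, temp)


def cross_entropy(target, pred, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def swapped_loss(z1, z2, prototypes, temp=0.1):
    # The same node seen in two augmented graphs: each view's assignment
    # serves as the target for the other view's prediction.
    q1 = soft_assign(z1, prototypes, temp)
    q2 = soft_assign(z2, prototypes, temp)
    return 0.5 * (cross_entropy(q2, q1) + cross_entropy(q1, q2))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
agree = swapped_loss([1.0, 0.0], [0.9, 0.1], prototypes)     # same cluster
disagree = swapped_loss([1.0, 0.0], [0.0, 1.0], prototypes)  # different clusters
print(agree, disagree)
```

The loss stays small when both views of a node fall into the same cluster. It grows sharply when the views disagree.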
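Deep learning on graphs has recently attracted significant interest. However, most works focus on (semi-)supervised learning, which leads to shortcomings such as heavy label reliance, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL) has become a promising and trending learning paradigm for graph data. SSL extracts informative knowledge through well-designed pretext tasks without relying on manual labels. Unlike SSL in other domains such as computer vision and natural language processing, SSL on graphs has an exclusive background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of existing approaches that apply SSL techniques to graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of their pretext tasks, we divide these approaches into four categories: generation-based, auxiliary property-based, contrast-based, and hybrid approaches. We further describe the applications of graph SSL across various research fields and summarize the commonly used datasets, evaluation benchmarks, performance comparisons, and open-source code for graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.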
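Graph representation learning (GRL) is critical for graph-structured data analysis. However, most existing graph neural networks (GNNs) rely heavily on label information, which is usually expensive to obtain in the real world. Existing unsupervised GRL methods suffer from certain limitations, such as heavy reliance on monotone contrastiveness and limited scalability. To overcome these problems, and in light of recent advances in graph contrastive learning, we introduce a novel self-supervised graph representation learning algorithm, G-Zoom, which learns node representations by leveraging the proposed adjusted zooming scheme. Specifically, this mechanism enables G-Zoom to explore and extract self-supervised signals from a graph at multiple scales: micro (i.e., node level), meso (i.e., neighbourhood level), and macro (i.e., subgraph level). First, we generate two augmented views of the input graph via two different graph augmentations. Then, we progressively establish three different contrasts for these three scales, from node to neighbourhood to subgraph, and maximize the agreement between graph representations across scales. We can extract valuable clues from a given graph at the micro and macro perspectives. In addition, the neighbourhood-level contrast, based on our adjusted zooming scheme, offers a customizable option to manually choose an optimal viewpoint between the micro and macro perspectives to better understand the graph data. Additionally, to make our model scalable to large graphs, we employ a parallel graph diffusion approach that decouples model training from the graph size. We have conducted extensive experiments on real-world datasets, and the results demonstrate that our proposed model consistently outperforms state-of-the-art methods.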
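The abstract does not specify which graph diffusion G-Zoom uses. As an illustration only, the sketch below assumes personalized PageRank (PPR) diffusion, computed for one seed node by power iteration. Each seed is processed independently, so many seeds can be handled in parallel. Keeping only the top-k scores yields a fixed-size neighbourhood per node, which is one way to make per-node training cost independent of the graph size. `ppr_vector`, `top_k_neighbourhood`, and `alpha` are hypothetical names.

```python
def ppr_vector(adj, seed, alpha=0.15, iters=100):
    """Personalized PageRank scores of all nodes with respect to `seed`.

    adj: adjacency list, adj[u] = list of neighbours of u.
    alpha: restart probability (jump back to the seed).
    """
    n = len(adj)
    p = [0.0] * n
    p[seed] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        nxt[seed] += alpha  # restart mass (total mass stays 1)
        for u in range(n):
            if not adj[u]:
                # Dangling node: send its walk mass back to the seed
                nxt[seed] += (1 - alpha) * p[u]
                continue
            share = (1 - alpha) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


def top_k_neighbourhood(adj, seed, k, alpha=0.15):
    # Keep the k highest-scoring nodes as a fixed-size subgraph around the seed
    scores = ppr_vector(adj, seed, alpha)
    return sorted(range(len(adj)), key=lambda v: -scores[v])[:k]


adj = [[1], [0, 2], [1, 3], [2]]  # path graph 0-1-2-3
scores = ppr_vector(adj, seed=0)
```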
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct contrastive samples by adding perturbations to the graph structure or node attributes. Although they achieve impressive results, these methods are rather blind to the wealth of prior information implied by the perturbations. As the degree of perturbation applied to the original graph increases, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both kinds of prior information can be incorporated (in different ways) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes is maintained and is less affected by perturbations of different degrees. Experimental results on various benchmark datasets verify the effectiveness of our algorithm compared with supervised and unsupervised models.
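One way to read "leveraging the ranking order among positive augmented views" is as a margin ranking loss. Under that reading, an anchor node should be more similar to a weakly perturbed view than to a strongly perturbed one. The sketch below is our own illustration under that assumption, not the paper's loss, and `ranking_loss` and `margin` are hypothetical.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def ranking_loss(anchor, views, margin=0.1):
    """Hinge loss enforcing sim(anchor, views[i]) >= sim(anchor, views[i+1]) + margin.

    `views` must be ordered from the weakest to the strongest perturbation.
    """
    sims = [cosine(anchor, v) for v in views]
    return sum(max(0.0, margin - (sims[i] - sims[i + 1]))
               for i in range(len(sims) - 1))


anchor = [1.0, 0.0]
ordered = [[1.0, 0.1], [1.0, 1.0], [0.0, 1.0]]  # weak -> strong perturbation
```

Correctly ordered views with enough similarity gap incur no loss. Reversing the order is penalized.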
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and then maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes, a crucial component in CL, remains rarely explored. We argue that data augmentation schemes should preserve the intrinsic structures and attributes of graphs. This forces the model to learn representations that are insensitive to perturbations on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, such as uniformly dropping edges and uniformly shuffling features, which leads to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for the topological and semantic aspects of the graph. Specifically, at the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. At the node attribute level, we corrupt node features by adding more noise to unimportant node features, which forces the model to recognize the underlying semantic information. We perform extensive node classification experiments on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
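The following is a sketch of centrality-based adaptive edge dropping at the topology level. It makes three assumptions:

- centrality is measured by node degree;
- an edge's importance is the log of the mean degree of its two endpoints;
- an edge's drop probability grows as its importance falls, capped at `p_tau`.

This follows our reading of a degree-centrality variant. The authors' exact normalization may differ, and `p_e` and `p_tau` are illustrative parameter names.

```python
import math
import random


def edge_drop_probs(edges, p_e=0.3, p_tau=0.7):
    """Per-edge drop probabilities derived from degree centrality.

    Edges between high-degree nodes are treated as important and dropped
    less often; probabilities are capped at p_tau.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Log-scaled edge importance: mean degree centrality of the two endpoints
    s = [math.log((deg[u] + deg[v]) / 2) for u, v in edges]
    s_max = max(s)
    s_mean = sum(s) / len(s)
    if s_max == s_mean:  # all edges equally important
        return [p_e] * len(edges)
    return [min((s_max - si) / (s_max - s_mean) * p_e, p_tau) for si in s]


def drop_edges(edges, probs, rng):
    # Keep each edge independently with probability 1 - p
    return [e for e, p in zip(edges, probs) if rng.random() >= p]


edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
probs = edge_drop_probs(edges)
view = drop_edges(edges, probs, random.Random(0))
```

In this toy graph, the edge between the two highest-degree nodes, (0, 3), is never dropped. The pendant edge (3, 4) is the most likely to be removed.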
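Although graph representation learning (GRL) has made significant progress, it remains challenging to extract and embed the rich topological structure and feature information adequately. Most existing methods focus on local structure and fail to fully incorporate the global topological structure. To this end, we propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method that fully captures the structural information of graphs. Specifically, to reduce the uncertainty and misinformation of the original graph, we construct a feature graph as a complementary view via the k-nearest-neighbor method. The feature graph is used for node-level contrast to capture local relations. Moreover, we preserve the global topological structure information by maximizing the mutual information (MI) between the whole graph and the feature embeddings. We show theoretically that this can be reduced to exchanging the feature embeddings of the feature graph and the original graph so that each reconstructs itself. Extensive experiments show that our method achieves clearly superior performance on the semi-supervised node classification task, as well as excellent robustness under noise perturbations of the graph structure or node features.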
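The feature graph can be built by linking each node to its k most similar nodes in feature space. Below is a minimal sketch. It assumes cosine similarity and a symmetrized (undirected) kNN graph, neither of which the abstract specifies, and `knn_graph` is a hypothetical helper.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def knn_graph(features, k):
    """Undirected kNN feature graph as a sorted list of (i, j) edges with i < j."""
    n = len(features)
    edges = set()
    for i in range(n):
        sims = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))  # symmetrize
    return sorted(edges)


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(knn_graph(features, k=1))
```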
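Self-supervised learning methods on heterogeneous graphs, especially contrastive learning, can effectively remove the dependence on supervised data. Meanwhile, most existing representation learning methods embed heterogeneous graphs into a single geometric space, either Euclidean or hyperbolic. Because heterogeneous graphs have rich semantics and complex structures, such a single geometric view is usually insufficient to observe their complete picture. Based on these observations, this paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL), to better represent heterogeneous graphs when supervised data is unavailable. GCL views a heterogeneous graph from the Euclidean and hyperbolic perspectives simultaneously. It aims to strongly merge the ability to model rich semantics with the ability to model complex structures, which is expected to bring more benefits to downstream tasks. GCL maximizes the mutual information between the two geometric views by contrasting representations at both the local-local and local-global semantic levels. Extensive experiments on four benchmark datasets show that the proposed method outperforms strong baselines, including both unsupervised and supervised methods, on three tasks: node classification, node clustering, and similarity search.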
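Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. In practice, however, precise graph annotations are generally very expensive and time-consuming to obtain. To address this issue, graph contrastive learning constructs an instance discrimination task for unsupervised representation learning. The task pulls together positive pairs (augmentations of the same graph) and pushes apart negative pairs (augmentations of different graphs). However, because a query's negatives are sampled uniformly from all graphs, existing methods suffer from a critical sampling bias issue. That is, the negatives may share the same semantic structure as the query, which degrades performance. To mitigate this sampling bias, in this paper we propose a Prototypical Graph Contrastive Learning (PGCL) approach. Specifically, PGCL models the underlying semantic structure of the graph data by clustering semantically similar graphs into the same group. At the same time, it encourages clustering consistency across different augmentations of the same graph. Then, given a query, PGCL performs negative sampling by drawing graphs from clusters different from the query's cluster, which ensures the semantic difference between the query and its negative samples. Moreover, for a query, PGCL further reweights its negative samples based on the distance between their prototypes (cluster centroids) and the query's prototype, so that negatives at a moderate prototype distance receive relatively large weights. This reweighting strategy is shown to be more effective than uniform sampling. Experimental results on various graph benchmarks demonstrate the advantages of PGCL over state-of-the-art methods. Code is publicly available at https://github.com/ha-lins/pgcl.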
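The negative-sampling step can be sketched as follows. Candidates in the query's own cluster are excluded, and the remaining negatives are weighted so that those whose prototypes lie at a moderate distance from the query prototype get the largest weights. The Gaussian bump around the mean prototype distance is a hypothetical choice made for illustration. The abstract does not give PGCL's exact weighting function.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_negatives(query_cluster, clusters, prototypes, sigma=1.0):
    """Return (candidate index, weight) pairs for a query's negatives.

    clusters[i]: cluster id of candidate graph i.
    prototypes[c]: centroid of cluster c.
    """
    q_proto = prototypes[query_cluster]
    # Only graphs from other clusters are eligible negatives
    cands = [i for i, c in enumerate(clusters) if c != query_cluster]
    if not cands:
        return []
    dists = [euclidean(prototypes[clusters[i]], q_proto) for i in cands]
    mid = sum(dists) / len(dists)
    # Hypothetical weighting: peaks at the mean ("moderate") prototype distance
    raw = [math.exp(-((d - mid) ** 2) / (2 * sigma ** 2)) for d in dists]
    total = sum(raw)
    return [(i, w / total) for i, w in zip(cands, raw)]


prototypes = [[0.0], [1.0], [2.0], [10.0]]
clusters = [0, 1, 2, 3, 0]  # graphs 0 and 4 share the query's cluster
negatives = weighted_negatives(0, clusters, prototypes)
```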
Graph Contrastive Learning (GCL) has recently drawn much research interest for learning generalizable node representations in a self-supervised manner. In general, the contrastive learning process in GCL is performed on top of the representations learned by a graph neural network (GNN) backbone, which transforms and propagates node contextual information based on local neighborhoods. However, nodes sharing similar characteristics may not always be close to each other in the graph. This poses a great challenge for unsupervised GCL efforts, which have inherent limitations in capturing such global graph knowledge. In this work, we address these limitations by proposing a simple yet effective framework, Simple Neural Networks with Structural and Semantic Contrastive Learning (S^3-CL). Notably, with the proposed structural and semantic contrastive learning algorithms, even a simple neural network can learn expressive node representations that preserve valuable global structural and semantic patterns. Our experiments demonstrate that the node representations learned by S^3-CL achieve superior performance on different downstream tasks compared with state-of-the-art unsupervised GCL methods. The implementation and more experimental details are publicly available at https://github.com/kaize0409/S-3-CL.
Most existing deep learning models are trained under the closed-world assumption, where the test data is assumed to be drawn i.i.d. from the same distribution as the training data, known as in-distribution (ID). However, when models are deployed in an open-world scenario, test samples can be out-of-distribution (OOD) and should therefore be handled with caution. To detect such OOD samples drawn from unknown distributions, OOD detection has received increasing attention lately. However, current efforts mostly focus on grid-structured data, and OOD detection for graph-structured data remains under-explored. Since labeling graph data is commonly time-consuming and labor-intensive, in this work we study the problem of unsupervised graph OOD detection, which aims to detect OOD graphs based solely on unlabeled ID data. To achieve this goal, we develop a new graph contrastive learning framework, GOOD-D, for detecting OOD graphs without using any ground-truth labels. GOOD-D performs hierarchical contrastive learning on augmented graphs generated by our perturbation-free graph data augmentation method. In this way, it captures the latent ID patterns and accurately detects OOD graphs based on semantic inconsistency at different granularities (i.e., node level, graph level, and group level). As a pioneering work in unsupervised graph-level OOD detection, we build a comprehensive benchmark to compare our proposed approach with different state-of-the-art methods. The experimental results demonstrate the superiority of our approach over different methods on various datasets.
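Over the past few years, graph representation learning (GRL) has been a powerful strategy for analyzing graph-structured data. Recently, GRL methods have shown promising results by adopting self-supervised learning methods originally developed for learning image representations. Despite their success, existing GRL methods tend to overlook an inherent distinction between images and graphs. Images are assumed to be independently and identically distributed, whereas graphs exhibit relational information among data instances, i.e., nodes. To fully benefit from the relational information inherent in graph-structured data, we propose a novel GRL method called RGRL, which learns from the relational information generated from the graph itself. RGRL learns node representations such that the relationship among nodes is invariant to augmentations, i.e., an augmentation-invariant relationship. This allows the node representations to vary as long as the relationship among the nodes is preserved. By considering the relationship among nodes from both global and local perspectives, RGRL overcomes the limitations of previous contrastive and non-contrastive methods and achieves the best of both. Extensive experiments on fourteen benchmark datasets over various downstream tasks demonstrate the superiority of RGRL over state-of-the-art baselines. The source code of RGRL is available at https://github.com/namkyeong/rgrl.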
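An augmentation-invariant relationship can be illustrated by comparing each node's similarity distribution over a set of anchor nodes across two views, for example with a KL divergence. The sketch below is an illustration, not RGRL's exact objective, and the anchor choice and temperature are assumptions. Rotating every embedding leaves all cosine similarities unchanged, so the loss is zero even though the vectors themselves move.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def relation_dist(z, anchors, temp=0.5):
    # Softmax over cosine similarities between a node and the anchor nodes
    sims = [cosine(z, a) / temp for a in anchors]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def relational_loss(z1, anchors1, z2, anchors2, temp=0.5):
    # Penalize changes in a node's relationship to the anchors across views
    return kl(relation_dist(z1, anchors1, temp), relation_dist(z2, anchors2, temp))


def rotate90(v):
    return [-v[1], v[0]]


z1, anchors1 = [1.0, 0.0], [[1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
z2, anchors2 = rotate90(z1), [rotate90(a) for a in anchors1]
```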
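Multilayer graphs have attracted a great deal of research attention in many fields because of their high utility in modeling interdependent systems. However, clustering of multilayer graphs, which aims to divide graph nodes into categories or communities, is still at a nascent stage. Existing methods are often limited to exploiting either multiview attributes or multiple networks and ignore more complex and richer network frameworks. To this end, we propose a generic and effective autoencoder framework for multilayer graph clustering, named Multilayer Graph Contrastive Clustering Network (MGCCN). MGCCN consists of three modules: (1) an attention mechanism that better captures the relevance between nodes and their neighbors to obtain better node embeddings; (2) a contrastive fusion strategy that better explores the consistent information across different networks; (3) a self-supervised component that iteratively strengthens the node embeddings and the clustering. Extensive experiments on different types of real-world graph data show that our proposed method outperforms state-of-the-art techniques.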
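Contrastive learning is an effective unsupervised method in graph representation learning, and its key component lies in the construction of positive and negative samples. Previous methods usually use the proximity of nodes in the graph as the guiding principle. Recently, data-augmentation-based contrastive learning methods have shown great power in the visual domain, and some works have extended this approach from images to graphs. However, unlike data augmentation on images, data augmentation on graphs is far less intuitive, and it is much harder to obtain high-quality contrastive samples, which leaves much room for improvement. In this work, we introduce an adversarial graph view for data augmentation and propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL), which extracts informative contrastive samples within reasonable constraints. We develop a new technique called information regularization for stable training and use subgraph sampling for scalability. We further generalize our method from node-level to graph-level contrastive learning by treating each graph instance as a super-node. ARIEL consistently outperforms current graph contrastive learning methods on both node-level and graph-level classification tasks on real-world datasets. We further demonstrate that ARIEL is more robust in the face of adversarial attacks.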
Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on Heterogeneous Information Networks (HINs) is still in its infancy. For example, it is unclear how to augment HINs without substantially altering the underlying semantics, and how to design a contrastive objective that fully captures the rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques have been empirically shown to be inadequate for GCL. How to mitigate sampling bias in heterogeneous GCL is another important problem. To address these challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as the augmentation to generate multiple subgraphs as multiple views, and we propose a contrastive objective that maximizes the mutual information between any pair of metapath-induced views. To alleviate the sampling bias, we further propose a positive sampling strategy that explicitly selects positives for each node by jointly considering the semantic and structural information preserved in each metapath view. Extensive experiments demonstrate that HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from the other nodes in the graph. Link prediction, in turn, aims to predict whether two nodes in a network are likely to be linked. By these definitions, clustering should naturally play a positive role in accurate link prediction. Yet researchers have long ignored this positive relationship or exploited it in inappropriate ways that undermine it. In this article, we construct a simple but efficient clustering-driven link prediction framework (ClusterLP). Its goal is to directly exploit cluster structures to predict connections between nodes as accurately as possible in both undirected and directed graphs. Specifically, we propose that in undirected graphs, links are more easily established between nodes with similar representation vectors and cluster tendencies. In directed graphs, a node more easily points to nodes that are similar to its representation vector and that have greater influence in their own clusters. We customized the implementation of ClusterLP for undirected and directed graphs, respectively. Experimental results on the link prediction task using multiple real-world networks show that our models are highly competitive with existing baseline models. The code for ClusterLP and the baselines we use is available at https://github.com/ZINUX1998/ClusterLP.
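Recently, mutual information maximization has emerged as a powerful approach to unsupervised graph representation learning. Existing methods are typically effective at capturing information from the topology view but ignore the feature view. To circumvent this issue, we propose a novel approach that exploits mutual information maximization across both the feature and topology views. Specifically, we first use a multi-view representation learning module to better capture both local and global information content across the feature and topology views on graphs. To model the information shared by the feature and topology spaces, we then develop a common representation learning module that uses mutual information maximization and reconstruction loss minimization. To explicitly encourage diversity among graph representations within the same view, we also introduce a disagreement regularization that enlarges the distance between representations from the same view. Experiments on synthetic and real-world datasets demonstrate the effectiveness of integrating the feature and topology views. In particular, our proposed method achieves performance comparable to or even better than previous supervised methods under the unsupervised representation learning and linear evaluation protocol.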
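Network embedding has emerged as a promising research field for network analysis. Recently, an approach named Barlow Twins was proposed, which applies the redundancy-reduction principle to the embedding vectors of two distorted versions of the image samples. Motivated by this, we propose the Barlow Graph Auto-Encoder, a simple yet effective architecture for learning network embeddings. It aims to maximize the similarity between the embedding vectors of a node's immediate and larger neighborhoods, while minimizing the redundancy between the components of these projections. In addition, we present its variational counterpart, named the Barlow Variational Graph Auto-Encoder. Our approach yields promising results for inductive link prediction and is also on par with the state of the art for clustering and downstream node classification. This is demonstrated by extensive comparisons with several well-known techniques on three benchmark citation datasets.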
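The redundancy-reduction objective borrowed from Barlow Twins can be written as L = sum_i (1 - C_ii)^2 + λ · sum_{i≠j} C_ij^2. Here C is the cross-correlation matrix between the batch-standardized embeddings of the two inputs, which in this setting are a node's immediate and larger neighbourhoods. The sketch below is in pure Python. λ = 0.005 is an assumed default, and the sketch does not show how the Barlow Graph Auto-Encoder produces the two embeddings.

```python
import math


def standardize(z, eps=1e-8):
    """Zero-mean, unit-variance columns (population std) over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mu = sum(col) / n
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / n)
        for i in range(n):
            out[i][j] = (z[i][j] - mu) / (sd + eps)
    return out


def barlow_twins_loss(za, zb, lam=0.005):
    a, b = standardize(za), standardize(zb)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view a and dimension j of view b
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # invariance term
            else:
                loss += lam * c_ij ** 2  # redundancy-reduction term
    return loss


za = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]         # decorrelated dims
redundant = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]  # duplicated dim
```

Identical, decorrelated embeddings give a loss near zero. An embedding whose second dimension merely copies the first is penalized.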
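Unsupervised graph representation learning is a non-trivial topic for graph data. The success of contrastive learning and self-supervised learning in unsupervised representation learning of structured data has inspired similar attempts on graphs. Current unsupervised graph representation learning and pre-training with contrastive losses are mainly based on contrasting handcrafted augmentations of graph data. However, graph data augmentation is still not well explored because its invariance is unpredictable. In this paper, we propose a novel Collaborative Graph Neural Networks for Contrastive Learning (CGCL) framework, which uses multiple graph encoders to observe the graph. Features observed from the different views act as graph augmentations for contrastive learning between the graph encoders, which avoids any perturbation and thus guarantees invariance. CGCL can handle both graph-level and node-level representation learning. Extensive experiments demonstrate the advantages of CGCL in unsupervised graph representation learning. They also show that combining handcrafted data augmentations is unnecessary for graph representation learning.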
In recent years, graph representation learning has achieved remarkable success while suffering from low-quality data problems. As a mature technology for improving data quality in computer vision, data augmentation has also attracted increasing attention in the graph domain. To promote the development of this emerging research direction, in this survey we comprehensively review and summarize existing graph data augmentation (GDAug) techniques. Specifically, we first summarize a variety of feasible taxonomies and then classify existing GDAug studies based on fine-grained graph elements. Furthermore, for each type of GDAug technique, we formalize the general definition, discuss the technical details, and give schematic illustrations. In addition, we summarize common performance metrics and specific design metrics for constructing a GDAug evaluation system. Finally, we summarize the applications of GDAug at both the data and model levels, as well as future directions.
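Graph-based anomaly detection has been widely used to detect malicious activities in real-world applications. Existing attempts to address this problem have so far focused on structural feature engineering or on learning in the binary classification regime. In this work, we propose to leverage graph contrastive coding and present the supervised GCCAD model. GCCAD contrasts abnormal nodes with normal ones in terms of their distances to the global context (e.g., the average of all nodes). To handle scenarios with scarce labels, we further turn GCCAD into a self-supervised framework by designing a graph corruption strategy that generates synthetic node labels. To achieve the contrastive objective, we design a graph neural network encoder that can infer and remove suspicious links during message passing, as well as learn the global context of the input graph. We conduct extensive experiments on four public datasets, demonstrating that 1) GCCAD significantly and consistently outperforms various advanced baselines, and 2) its self-supervised version, without fine-tuning, achieves performance comparable to its fully supervised version.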
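The core contrastive signal is a node's distance from the global context, and it can be sketched as follows. The sketch assumes the global context is the plain mean of all node embeddings, which is the example the abstract gives. It uses cosine distance as the anomaly score. GCCAD's learned encoder and link-removal step are not modelled, and `anomaly_scores` is a hypothetical helper.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def anomaly_scores(embeddings):
    """Cosine distance of each node embedding from the global context (mean)."""
    n, d = len(embeddings), len(embeddings[0])
    context = [sum(e[j] for e in embeddings) / n for j in range(d)]
    return [1.0 - cosine(e, context) for e in embeddings]


emb = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [-1.0, 0.0]]  # last node is the outlier
scores = anomaly_scores(emb)
```

Nodes that agree with the global context score near 0. A node pointing away from it scores above 1.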