Most existing deep learning models are trained based on the closed-world assumption, where the test data is assumed to be drawn i.i.d. from the same distribution as the training data, known as in-distribution (ID). However, when models are deployed in an open-world scenario, test samples can be out-of-distribution (OOD) and therefore should be handled with caution. To detect such OOD samples drawn from unknown distribution, OOD detection has received increasing attention lately. However, current endeavors mostly focus on grid-structured data and its application for graph-structured data remains under-explored. Considering the fact that data labeling on graphs is commonly time-expensive and labor-intensive, in this work we study the problem of unsupervised graph OOD detection, aiming at detecting OOD graphs solely based on unlabeled ID data. To achieve this goal, we develop a new graph contrastive learning framework GOOD-D for detecting OOD graphs without using any ground-truth labels. By performing hierarchical contrastive learning on the augmented graphs generated by our perturbation-free graph data augmentation method, GOOD-D is able to capture the latent ID patterns and accurately detect OOD graphs based on the semantic inconsistency in different granularities (i.e., node-level, graph-level, and group-level). As a pioneering work in unsupervised graph-level OOD detection, we build a comprehensive benchmark to compare our proposed approach with different state-of-the-art methods. The experiment results demonstrate the superiority of our approach over different methods on various datasets.
translated by 谷歌翻译
图表表示学习(GRL)对于图形结构数据分析至关重要。然而,大多数现有的图形神经网络(GNNS)严重依赖于标签信息,这通常是在现实世界中获得的昂贵。现有无监督的GRL方法遭受某些限制,例如对单调对比和可扩展性有限的沉重依赖。为了克服上述问题,鉴于最近的图表对比学习的进步,我们通过曲线图介绍了一种新颖的自我监控图形表示学习算法,即通过利用所提出的调整变焦方案来学习节点表示来学习节点表示。具体地,该机制使G-Zoom能够从多个尺度的图表中探索和提取自我监督信号:MICRO(即,节点级别),MESO(即,邻域级)和宏(即,子图级) 。首先,我们通过两个不同的图形增强生成输入图的两个增强视图。然后,我们逐渐地从节点,邻近逐渐为上述三个尺度建立三种不同的对比度,在那里我们最大限度地提高了横跨尺度的图形表示之间的协议。虽然我们可以从微距和宏观视角上从给定图中提取有价值的线索,但是邻域级对比度基于我们的调整后的缩放方案提供了可自定义选项的能力,以便手动选择位于微观和介于微观之间的最佳视点宏观透视更好地理解图数据。此外,为了使我们的模型可扩展到大图,我们采用了并行图形扩散方法来从图形尺寸下解耦模型训练。我们对现实世界数据集进行了广泛的实验,结果表明,我们所提出的模型始终始终优于最先进的方法。
translated by 谷歌翻译
图级表示在各种现实世界中至关重要,例如预测分子的特性。但是实际上,精确的图表注释通常非常昂贵且耗时。为了解决这个问题,图形对比学习构造实例歧视任务,将正面对(同一图的增强对)汇总在一起,并将负面对(不同图的增强对)推开,以进行无监督的表示。但是,由于为了查询,其负面因素是从所有图中均匀抽样的,因此现有方法遭受关键采样偏置问题的损失,即,否定物可能与查询具有相同的语义结构,从而导致性能降解。为了减轻这种采样偏见问题,在本文中,我们提出了一种典型的图形对比度学习(PGCL)方法。具体而言,PGCL通过将语义相似的图形群群归为同一组的群集数据的基础语义结构,并同时鼓励聚类的一致性,以实现同一图的不同增强。然后给出查询,它通过从与查询群集不同的群集中绘制图形进行负采样,从而确保查询及其阴性样本之间的语义差异。此外,对于查询,PGCL根据其原型(集群质心)和查询原型之间的距离进一步重新重新重新重新重新享受其负样本,从而使那些具有中等原型距离的负面因素具有相对较大的重量。事实证明,这种重新加权策略比统一抽样更有效。各种图基准的实验结果证明了我们的PGCL比最新方法的优势。代码可在https://github.com/ha-lins/pgcl上公开获取。
translated by 谷歌翻译
关于图表的深度学习最近吸引了重要的兴趣。然而,大多数作品都侧重于(半)监督学习,导致缺点包括重标签依赖,普遍性差和弱势稳健性。为了解决这些问题,通过良好设计的借口任务在不依赖于手动标签的情况下提取信息知识的自我监督学习(SSL)已成为图形数据的有希望和趋势的学习范例。与计算机视觉和自然语言处理等其他域的SSL不同,图表上的SSL具有独家背景,设计理念和分类。在图表的伞下自我监督学习,我们对采用图表数据采用SSL技术的现有方法及时及全面的审查。我们构建一个统一的框架,数学上正式地规范图表SSL的范例。根据借口任务的目标,我们将这些方法分为四类:基于生成的,基于辅助性的,基于对比的和混合方法。我们进一步描述了曲线图SSL在各种研究领域的应用,并总结了绘图SSL的常用数据集,评估基准,性能比较和开源代码。最后,我们讨论了该研究领域的剩余挑战和潜在的未来方向。
translated by 谷歌翻译
在异质图上的自我监督学习(尤其是对比度学习)方法可以有效地摆脱对监督数据的依赖。同时,大多数现有的表示学习方法将异质图嵌入到欧几里得或双曲线的单个几何空间中。这种单个几何视图通常不足以观察由于其丰富的语义和复杂结构而观察到异质图的完整图片。在这些观察结果下,本文提出了一种新型的自我监督学习方法,称为几何对比度学习(GCL),以更好地表示监督数据是不可用时的异质图。 GCL同时观察了从欧几里得和双曲线观点的异质图,旨在强烈合并建模丰富的语义和复杂结构的能力,这有望为下游任务带来更多好处。 GCL通过在局部局部和局部全球语义水平上对比表示两种几何视图之间的相互信息。在四个基准数据集上进行的广泛实验表明,在三个任务上,所提出的方法在包括节点分类,节点群集和相似性搜索在内的三个任务上都超过了强基础,包括无监督的方法和监督方法。
translated by 谷歌翻译
Graph Contrastive Learning (GCL) has recently drawn much research interest for learning generalizable node representations in a self-supervised manner. In general, the contrastive learning process in GCL is performed on top of the representations learned by a graph neural network (GNN) backbone, which transforms and propagates the node contextual information based on its local neighborhoods. However, nodes sharing similar characteristics may not always be geographically close, which poses a great challenge for unsupervised GCL efforts due to their inherent limitations in capturing such global graph knowledge. In this work, we address their inherent limitations by proposing a simple yet effective framework -- Simple Neural Networks with Structural and Semantic Contrastive Learning} (S^3-CL). Notably, by virtue of the proposed structural and semantic contrastive learning algorithms, even a simple neural network can learn expressive node representations that preserve valuable global structural and semantic patterns. Our experiments demonstrate that the node representations learned by S^3-CL achieve superior performance on different downstream tasks compared with the state-of-the-art unsupervised GCL methods. Implementation and more experimental details are publicly available at \url{https://github.com/kaize0409/S-3-CL.}
translated by 谷歌翻译
随着对比学习的兴起,无人监督的图形表示学习最近一直蓬勃发展,甚至超过了一些机器学习任务中的监督对应物。图表表示的大多数对比模型学习侧重于最大化本地和全局嵌入之间的互信息,或主要取决于节点级别的对比嵌入。然而,它们仍然不足以全面探索网络拓扑的本地和全球视图。虽然前者认为本地全球关系,但其粗略的全球信息导致本地和全球观点之间的思考。后者注重节点级别对齐,以便全局视图的作用出现不起眼。为避免落入这两个极端情况,我们通过对比群集分配来提出一种新颖的无监督图形表示模型,称为GCCA。通过组合聚类算法和对比学习,它有动力综合利用本地和全球信息。这不仅促进了对比效果,而且还提供了更高质量的图形信息。同时,GCCA进一步挖掘群集级信息,这使得它能够了解除了图形拓扑之外的节点之间的难以捉摸的关联。具体地,我们首先使用不同的图形增强策略生成两个增强的图形,然后使用聚类算法分别获取其群集分配和原型。所提出的GCCA进一步强制不同增强图中的相同节点来通过最小化交叉熵损失来互相识别它们的群集分配。为了展示其有效性,我们将在三个不同的下游任务中与最先进的模型进行比较。实验结果表明,GCCA在大多数任务中具有强大的竞争力。
translated by 谷歌翻译
Graph machine learning has been extensively studied in both academia and industry. Although booming with a vast number of emerging methods and techniques, most of the literature is built on the in-distribution hypothesis, i.e., testing and training graph data are identically distributed. However, this in-distribution hypothesis can hardly be satisfied in many real-world graph scenarios where the model performance substantially degrades when there exist distribution shifts between testing and training graph data. To solve this critical problem, out-of-distribution (OOD) generalization on graphs, which goes beyond the in-distribution hypothesis, has made great progress and attracted ever-increasing attention from the research community. In this paper, we comprehensively survey OOD generalization on graphs and present a detailed review of recent advances in this area. First, we provide a formal problem definition of OOD generalization on graphs. Second, we categorize existing methods into three classes from conceptually different perspectives, i.e., data, model, and learning strategy, based on their positions in the graph machine learning pipeline, followed by detailed discussions for each category. We also review the theories related to OOD generalization on graphs and introduce the commonly used graph datasets for thorough evaluations. Finally, we share our insights on future research directions. This paper is the first systematic and comprehensive review of OOD generalization on graphs, to the best of our knowledge.
translated by 谷歌翻译
无监督的图形表示学习是图形数据的非琐碎主题。在结构化数据的无监督代表学习中对比学习和自我监督学习的成功激发了图表上的类似尝试。使用对比损耗的当前无监督的图形表示学习和预培训主要基于手工增强图数据之间的对比度。但是,由于不可预测的不变性,图数据增强仍然没有很好地探索。在本文中,我们提出了一种新颖的协作图形神经网络对比学习框架(CGCL),它使用多个图形编码器来观察图形。不同视图观察的特征充当了图形编码器之间对比学习的图表增强,避免了任何扰动以保证不变性。 CGCL能够处理图形级和节点级表示学习。广泛的实验表明CGCL在无监督的图表表示学习中的优势以及图形表示学习的手工数据增强组合的非必要性。
translated by 谷歌翻译
Graph anomaly detection (GAD) is a vital task in graph-based machine learning and has been widely applied in many real-world applications. The primary goal of GAD is to capture anomalous nodes from graph datasets, which evidently deviate from the majority of nodes. Recent methods have paid attention to various scales of contrastive strategies for GAD, i.e., node-subgraph and node-node contrasts. However, they neglect the subgraph-subgraph comparison information which the normal and abnormal subgraph pairs behave differently in terms of embeddings and structures in GAD, resulting in sub-optimal task performance. In this paper, we fulfill the above idea in the proposed multi-view multi-scale contrastive learning framework with subgraph-subgraph contrast for the first practice. To be specific, we regard the original input graph as the first view and generate the second view by graph augmentation with edge modifications. With the guidance of maximizing the similarity of the subgraph pairs, the proposed subgraph-subgraph contrast contributes to more robust subgraph embeddings despite of the structure variation. Moreover, the introduced subgraph-subgraph contrast cooperates well with the widely-adopted node-subgraph and node-node contrastive counterparts for mutual GAD performance promotions. Besides, we also conduct sufficient experiments to investigate the impact of different graph augmentation approaches on detection performance. The comprehensive experimental results well demonstrate the superiority of our method compared with the state-of-the-art approaches and the effectiveness of the multi-view subgraph pair contrastive strategy for the GAD task.
translated by 谷歌翻译
基于图的异常检测已被广泛用于检测现实世界应用中的恶意活动。迄今为止,现有的解决此问题的尝试集中在二进制分类制度中的结构特征工程或学习上。在这项工作中,我们建议利用图形对比编码,并提出监督的GCCAD模型,以将异常节点与正常节点的距离与全球环境(例如所有节点的平均值)相比。为了使用稀缺标签处理场景,我们通过设计用于生成合成节点标签的图形损坏策略,进一步使GCCAD成为一个自制的框架。为了实现对比目标,我们设计了一个图形神经网络编码器,该编码器可以在消息传递过程中推断并进一步删除可疑链接,并了解输入图的全局上下文。我们在四个公共数据集上进行了广泛的实验,表明1)GCCAD显着且始终如一地超过各种高级基线,2)其自我监督版本没有微调可以通过其完全监督的版本来实现可比性的性能。
translated by 谷歌翻译
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct the contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results are achieved, it is rather blind to the wealth of prior information assumed: with the increase of the perturbation degree applied on the original graph, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both such prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes can be maintained and also be less altered to the perturbations of different degrees. Experiment results on various benchmark datasets verify the effectiveness of our algorithm compared with the supervised and unsupervised models.
translated by 谷歌翻译
在本文中,我们研究了在非全粒图上进行节点表示学习的自我监督学习的问题。现有的自我监督学习方法通​​常假定该图是同质的,其中链接的节点通常属于同一类或具有相似的特征。但是,这种同质性的假设在现实图表中并不总是正确的。我们通过为图神经网络开发脱钩的自我监督学习(DSSL)框架来解决这个问题。 DSSL模仿了节点的生成过程和语义结构的潜在变量建模的链接,该过程将不同邻域之间的不同基础语义解散到自我监督的节点学习过程中。我们的DSSL框架对编码器不可知,不需要预制的增强,因此对不同的图表灵活。为了通过潜在变量有效地优化框架,我们得出了自我监督目标的较低范围的证据,并开发了具有变异推理的可扩展培训算法。我们提供理论分析,以证明DSSL享有更好的下游性能。与竞争性的自我监督学习基线相比,对各种类图基准的广泛实验表明,我们提出的框架可以显着取得更好的性能。
translated by 谷歌翻译
Recently, graph anomaly detection has attracted increasing attention in data mining and machine learning communities. Apart from existing attribute anomalies, graph anomaly detection also captures suspicious topological-abnormal nodes that differ from the major counterparts. Although massive graph-based detection approaches have been proposed, most of them focus on node-level comparison while pay insufficient attention on the surrounding topology structures. Nodes with more dissimilar neighborhood substructures have more suspicious to be abnormal. To enhance the local substructure detection ability, we propose a novel Graph Anomaly Detection framework via Multi-scale Substructure Learning (GADMSL for abbreviation). Unlike previous algorithms, we manage to capture anomalous substructures where the inner similarities are relatively low in dense-connected regions. Specifically, we adopt a region proposal module to find high-density substructures in the network as suspicious regions. Their inner-node embedding similarities indicate the anomaly degree of the detected substructures. Generally, a lower degree of embedding similarities means a higher probability that the substructure contains topology anomalies. To distill better embeddings of node attributes, we further introduce a graph contrastive learning scheme, which observes attribute anomalies in the meantime. In this way, GADMSL can detect both topology and attribute anomalies. Ultimately, extensive experiments on benchmark datasets show that GADMSL greatly improves detection performance (up to 7.30% AUC and 17.46% AUPRC gains) compared to state-of-the-art attributed networks anomaly detection algorithms.
translated by 谷歌翻译
与其他图表相比,图形级异常检测(GAD)描述了检测其结构和/或其节点特征的图表的问题。GAD中的一个挑战是制定图表表示,该图表示能够检测本地和全局 - 异常图,即它们的细粒度(节点级)或整体(图级)属性异常的图形,分别。为了解决这一挑战,我们介绍了一种新的深度异常检测方法,用于通过图表和节点表示的联合随机蒸馏学习丰富的全球和局部正常模式信息。通过训练一个GNN来实现随机初始化网络权重的另一GNN来实现随机蒸馏。来自各种域的16个真实图形数据集的广泛实验表明,我们的模型显着优于七种最先进的模型。代码和数据集可以在https://git.io/llocalkd中获得。
translated by 谷歌翻译
Generalizable, transferrable, and robust representation learning on graph-structured data remains a challenge for current graph neural networks (GNNs). Unlike what has been developed for convolutional neural networks (CNNs) for image data, self-supervised learning and pre-training are less explored for GNNs. In this paper, we propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data. We first design four types of graph augmentations to incorporate various priors. We then systematically study the impact of various combinations of graph augmentations on multiple datasets, in four different settings: semi-supervised, unsupervised, and transfer learning as well as adversarial attacks. The results show that, even without tuning augmentation extents nor using sophisticated GNN architectures, our GraphCL framework can produce graph representations of similar or better generalizability, transferrability, and robustness compared to state-of-the-art methods. We also investigate the impact of parameterized graph augmentation extents and patterns, and observe further performance gains in preliminary experiments. Our codes are available at: https://github.com/Shen-Lab/GraphCL.
translated by 谷歌翻译
图形对比学习(GCL)已成为学习图形无监督表示的有效工具。关键思想是通过数据扩展最大化每个图的两个增强视图之间的一致性。现有的GCL模型主要集中在给定情况下的所有图表上应用\ textit {相同的增强策略}。但是,实际图通常不是单态,而是各种本质的抽象。即使在相同的情况下(例如,大分子和在线社区),不同的图形可能需要各种增强来执行有效的GCL。因此,盲目地增强所有图表而不考虑其个人特征可能会破坏GCL艺术的表现。 {a} u Mentigation(GPA),通过允许每个图选择自己的合适的增强操作来推进常规GCL。本质上,GPA根据其拓扑属性和节点属性通过可学习的增强选择器为每个图定制了量身定制的增强策略,该策略是插件模块,可以通过端到端的下游GCL型号有效地训练。来自不同类型和域的11个基准图的广泛实验证明了GPA与最先进的竞争对手的优势。此外,通过可视化不同类型的数据集中学习的增强分布,我们表明GPA可以有效地识别最合适的数据集每个图的增强基于其特征。
translated by 谷歌翻译
尽管图表学习(GRL)取得了重大进展,但要以足够的方式提取和嵌入丰富的拓扑结构和特征信息仍然是一个挑战。大多数现有方法都集中在本地结构上,并且无法完全融合全球拓扑结构。为此,我们提出了一种新颖的结构保留图表学习(SPGRL)方法,以完全捕获图的结构信息。具体而言,为了减少原始图的不确定性和错误信息,我们通过k-nearest邻居方法构建了特征图作为互补视图。该特征图可用于对比节点级别以捕获本地关系。此外,我们通过最大化整个图形和特征嵌入的相互信息(MI)来保留全局拓扑结构信息,从理论上讲,该信息可以简化为交换功能的特征嵌入和原始图以重建本身。广泛的实验表明,我们的方法在半监督节点分类任务上具有相当出色的性能,并且在图形结构或节点特征上噪声扰动下的鲁棒性出色。
translated by 谷歌翻译
图形相似性学习是指计算两个图之间的相似性得分,这在许多现实的应用程序(例如视觉跟踪,图形分类和协作过滤)中需要。由于大多数现有的图形神经网络产生了单个图的有效图表,因此几乎没有努力共同学习两个图表并计算其相似性得分。此外,现有的无监督图相似性学习方法主要基于聚类,它忽略了图对中体现的有价值的信息。为此,我们提出了一个对比度图匹配网络(CGMN),以进行自我监督的图形相似性学习,以计算任何两个输入图对象之间的相似性。具体而言,我们分别在一对中为每个图生成两个增强视图。然后,我们采用两种策略,即跨视图相互作用和跨刻画相互作用,以实现有效的节点表示学习。前者求助于两种观点中节点表示的一致性。后者用于识别不同图之间的节点差异。最后,我们通过汇总操作进行图形相似性计算将节点表示形式转换为图形表示。我们已经在八个现实世界数据集上评估了CGMN,实验结果表明,所提出的新方法优于图形相似性学习下游任务的最新方法。
translated by 谷歌翻译
Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on Heterogeneous Information Networks (HINs) is still in the infant stage. For example, it is unclear how to augment the HINs without substantially altering the underlying semantics, and how to design the contrastive objective to fully capture the rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques are empirically shown to be inadequate for GCL. How to mitigate the sampling bias for heterogeneous GCL is another important problem. To address the aforementioned challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as the augmentation to generate multiple subgraphs as multi-views, and propose a contrastive objective to maximize the mutual information between any pairs of metapath-induced views. To alleviate the sampling bias, we further propose a positive sampling strategy to explicitly select positives for each node via jointly considering semantic and structural information preserved on each metapath view. Extensive experiments demonstrate HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.
translated by 谷歌翻译