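Deep graph clustering, which aims to reveal the underlying graph structure and divide the nodes into different groups, has attracted intensive attention in recent years. However, we observe that, in the process of node encoding, existing methods suffer from representation collapse, which tends to map all data into the same representation. Consequently, the discriminative capability of the node representation is limited, leading to unsatisfactory clustering performance. To address this issue, we propose a novel self-supervised deep graph clustering method termed Dual Correlation Reduction Network (DCRN), which reduces information correlation in a dual manner. Specifically, in our method, we first design a siamese network to encode samples. Then, by forcing the cross-view sample correlation matrix and the cross-view feature correlation matrix to approximate two identity matrices, respectively, we reduce the information correlation at both levels, thus improving the discriminative capability of the resulting features. Moreover, to alleviate the representation collapse caused by over-smoothing in GCN, we introduce a propagation regularization term that enables the network to gain long-distance information with a shallow network structure. Extensive experimental results on six benchmark datasets demonstrate the effectiveness of the proposed DCRN against the existing state-of-the-art methods.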
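The dual correlation reduction idea in the DCRN abstract above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the use of cosine similarity as the correlation measure, and the mean squared error against the identity matrix are all assumptions made for illustration.

```python
import numpy as np

def _l2_normalize(x, axis):
    # Scale vectors along `axis` to unit length; the floor avoids division by zero.
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, 1e-12)

def correlation_reduction_loss(z1, z2):
    """Hypothetical loss; z1, z2 are (n_samples, n_features) embeddings
    of the same nodes produced by the two views."""
    n, d = z1.shape
    # Sample level: cosine similarity between every node of view 1 and view 2.
    s_n = _l2_normalize(z1, axis=1) @ _l2_normalize(z2, axis=1).T  # (n, n)
    # Feature level: cosine similarity between every pair of feature columns.
    s_f = _l2_normalize(z1, axis=0).T @ _l2_normalize(z2, axis=0)  # (d, d)
    # Push both cross-view correlation matrices towards the identity.
    loss_n = np.mean((s_n - np.eye(n)) ** 2)
    loss_f = np.mean((s_f - np.eye(d)) ** 2)
    return loss_n + loss_f
```

When both views produce orthonormal embeddings, both correlation matrices already equal the identity and the loss vanishes; correlated samples or redundant feature dimensions increase it.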
Benefiting from its capability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together the samples from the same cluster while pushing away those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
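The positive and negative pair construction described in the CCGC abstract above can be sketched roughly as follows. This is an illustrative simplification, not the published objective: the function name, the `confident` mask, and the exact form `(1 - pos) + neg` are assumptions, and the sketch expects at least two high-confidence clusters.

```python
import numpy as np

def cluster_guided_contrast(z1, z2, labels, confident):
    """Hypothetical loss. z1, z2: (n, d) embeddings from two views;
    labels: (n,) cluster ids; confident: (n,) boolean mask."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    idx = np.flatnonzero(confident)
    lab = labels[idx]
    # Positive pairs: confident samples of the same cluster, across views.
    sim = z1[idx] @ z2[idx].T
    same = lab[:, None] == lab[None, :]
    pos = sim[same].mean()
    # Negative pairs: centers of different high-confidence clusters.
    ks = np.unique(lab)
    c1 = np.stack([z1[idx][lab == k].mean(axis=0) for k in ks])
    c2 = np.stack([z2[idx][lab == k].mean(axis=0) for k in ks])
    c1 = c1 / np.linalg.norm(c1, axis=1, keepdims=True)
    c2 = c2 / np.linalg.norm(c2, axis=1, keepdims=True)
    off_diag = ~np.eye(len(ks), dtype=bool)
    neg = (c1 @ c2.T)[off_diag].mean()
    # Maximize positive similarity, minimize negative similarity.
    return (1.0 - pos) + neg
```

Using cluster centers rather than individual nodes as negatives keeps the number of negative pairs small and avoids treating same-cluster nodes as negatives, which is the reliability argument made in the abstract.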
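Contrastive learning has recently attracted attention in deep clustering for its promising performance. However, complicated data augmentations and time-consuming graph convolution operations undermine the efficiency of these methods. To solve this problem, we propose a Simple Contrastive Graph Clustering (SCGC) algorithm that improves the existing methods from the perspectives of network architecture, data augmentation, and objective function. As to the architecture, our network includes two main parts, i.e., pre-processing and the network backbone. A simple low-pass denoising operation conducts neighbor information aggregation as an independent pre-processing step, and only two multilayer perceptrons (MLPs) are included as the backbone. For data augmentation, instead of introducing complex operations over graphs, we construct two augmented views of the same vertex by designing parameter-unshared Siamese encoders and directly corrupting the node embeddings. Finally, as to the objective function, to further improve the clustering performance, a novel cross-view structural consistency objective function is designed to enhance the discriminative capability of the learned network. Extensive experimental results on seven benchmark datasets validate the effectiveness and superiority of our proposed algorithm. Notably, our algorithm outperforms the recent contrastive deep clustering competitors with at least a sevenfold speedup on average.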
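The low-pass denoising pre-processing mentioned in the SCGC abstract above can be illustrated with a common graph filter. The sketch below assumes a symmetrically normalized adjacency matrix with self-loops (equivalently, I minus the normalized Laplacian) applied `t` times; the abstract does not specify the filter, so the function name and parameters are hypothetical.

```python
import numpy as np

def low_pass_filter(adj, x, t=2):
    """Hypothetical pre-processing: smooth node features x (n, d)
    over a dense adjacency matrix adj (n, n), t times."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                 # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are >= 1 after self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # D^{-1/2} (A + I) D^{-1/2}, which equals I - L_sym for the graph with self-loops.
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(t):
        x = a_norm @ x                      # one step of neighbor aggregation
    return x
```

Because the filter has no trainable parameters, it can be computed once before training, so the backbone only needs to process the smoothed features.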
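In recent years, graph neural networks (GNNs) have achieved promising performance in semi-supervised node classification. However, the problem of insufficient supervision, together with representation collapse, largely limits the performance of GNNs in this field. To alleviate the collapse of node representations in the semi-supervised scenario, we propose a novel graph contrastive learning method termed Mixed Graph Contrastive Network (MGCN). In our method, we improve the discriminative capability of the latent features by enlarging the margin of the decision boundaries and improving the cross-view consistency of the latent representation. Specifically, we first adopt an interpolation-based strategy to conduct data augmentation in the latent space and then force the prediction model to change linearly between samples. Second, we enable the learned network to tell apart samples across two interpolation-perturbed views by forcing the cross-view correlation matrix to approximate an identity matrix. By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning. Extensive experimental results on six datasets demonstrate the effectiveness and generality of MGCN compared with the existing state-of-the-art methods.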
Contrastive deep graph clustering, which aims to divide nodes into disjoint groups via contrastive mechanisms, is a challenging research topic. Among recent works, hard sample mining-based algorithms have attracted great attention for their promising performance. However, we find that the existing hard sample mining methods have the following two problems. 1) In the hardness measurement, important structural information is overlooked in the similarity calculation, degrading the representativeness of the selected hard negative samples. 2) Previous works focus merely on the hard negative sample pairs while neglecting the hard positive sample pairs. Nevertheless, samples within the same cluster but with low similarity should also be carefully learned. To solve these problems, we propose a novel contrastive deep graph clustering method dubbed Hard Sample Aware Network (HSAN) by introducing a comprehensive similarity measure criterion and a general dynamic sample weighting strategy. Concretely, in our algorithm, the similarities between samples are calculated by considering both the attribute embeddings and the structure embeddings, which better reveals sample relationships and assists hardness measurement. Moreover, under the guidance of the carefully collected high-confidence clustering information, our proposed weight modulating function first recognizes the positive and negative samples and then dynamically up-weights the hard sample pairs while down-weighting the easy ones. In this way, our method can mine not only the hard negative samples but also the hard positive samples, thus further improving the discriminative capability of the samples. Extensive experiments and analyses demonstrate the superiority and effectiveness of our proposed method.
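Graph representation learning (GRL) on attribute-missing graphs, which is a common yet challenging problem, has recently attracted considerable attention. We observe that the existing literature: 1) isolates the learning of attribute and structure embeddings and thus fails to take full advantage of the two types of information; 2) imposes an overly strict distribution assumption on the latent space variables, leading to less discriminative feature representations. In this paper, based on the idea of introducing intimate information interaction between the two information sources, we propose our Siamese Attribute-missing Graph Auto-encoder (SAGA). Specifically, three strategies are adopted. First, we entangle the attribute embedding and the structure embedding by introducing a siamese network structure to share the parameters learned by both processes, which allows the network training to benefit from more abundant and diverse information. Second, we introduce a K-nearest neighbor (KNN) and structural constraint enhanced learning mechanism to improve the quality of the latent features of the missing attributes by filtering unreliable connections. Third, we manually mask the connections on multiple adjacency matrices and force the structure embedding sub-network to recover the true adjacency matrix, thus enforcing the resulting network to selectively exploit more high-order discriminative features for data completion. Extensive experiments on six benchmark datasets demonstrate the superiority of our SAGA against the state-of-the-art methods.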
Graph contrastive learning is an important method for deep graph clustering. The existing methods first generate the graph views with stochastic augmentations and then train the network with a cross-view consistency principle. Although good performance has been achieved, we observe that the existing augmentation methods are usually random and rely on pre-defined augmentations, which is insufficient and lacks coordination with the final clustering task. To solve the problem, we propose a novel Graph Contrastive Clustering method with Learnable graph Data Augmentation (GCC-LDA), in which the augmentation is optimized entirely by neural networks. An adversarial learning mechanism is designed to keep cross-view consistency in the latent space while ensuring the diversity of the augmented views. In our framework, a structure augmentor and an attribute augmentor are constructed for augmentation learning at both the structure level and the attribute level. To improve the reliability of the learned affinity matrix, clustering is introduced into the learning procedure, and the learned affinity matrix is refined with both the high-confidence pseudo-label matrix and the cross-view sample similarity matrix. During training, to provide persistent optimization for the learned view, we design a two-stage training strategy to obtain more reliable clustering information. Extensive experimental results demonstrate the effectiveness of GCC-LDA on six benchmark datasets.
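Attributed graph clustering is one of the most important tasks in the field of graph analysis; its goal is to group nodes with similar representations into the same cluster without manual guidance. Recent studies based on graph contrastive learning have achieved impressive results in processing graph-structured data. However, existing graph-learning-based methods 1) do not directly address the clustering task, since the representation learning and clustering processes are separated; 2) depend too much on graph data augmentation, which greatly limits the capability of contrastive learning; 3) ignore the contrastive message for subspace clustering. To address the above issues, we propose a generic framework called Dual Contrastive Attributed Graph Clustering Network (DCAGC). In DCAGC, a neighborhood contrast module maximizes the similarity of neighboring nodes and thereby improves the quality of the node representation. Meanwhile, a contrastive self-expression module is built by minimizing the discrepancy between the node representations before and after the reconstruction of the self-expression layer, so as to obtain a discriminative self-expression matrix for spectral clustering. All modules of DCAGC are trained and optimized in a unified framework, so the learned node representation contains clustering-oriented messages. Extensive experimental results on four attributed graph datasets show the superiority of DCAGC compared with 16 state-of-the-art clustering methods. The code of this paper is available at https://github.com/wangtong627/dual-contrastive-attributed-graph-cluster-clustering-network.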
Multi-view attributed graph clustering is an important approach to partitioning multi-view data based on the attribute features and adjacency matrices from different views. Some attempts have been made to utilize Graph Neural Networks (GNNs), which have achieved promising clustering performance. Despite this, few of them pay attention to the inherent specific information embedded in multiple views. Meanwhile, they are incapable of recovering the latent high-level representation from the low-level ones, greatly limiting the downstream clustering performance. To fill these gaps, a novel Dual Information enhanced multi-view Attributed Graph Clustering (DIAGC) method is proposed in this paper. Specifically, the proposed method introduces the Specific Information Reconstruction (SIR) module to disentangle the exploration of the consensus and specific information from multiple views, which enables GCN to capture more essential low-level representations. Besides, the Mutual Information Maximization (MIM) module maximizes the agreement between the latent high-level representation and the low-level ones, and enables the high-level representation to satisfy the desired clustering structure with the help of the Self-supervised Clustering (SC) module. Extensive experiments on several real-world benchmarks demonstrate the effectiveness of the proposed DIAGC method compared with the state-of-the-art baselines.
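Existing deep embedding clustering works consider only the deepest layer when learning a feature embedding and thus fail to make good use of the discriminative information available from cluster assignments, which limits their performance. To this end, we propose a novel method, namely deep attention-guided graph clustering with dual self-supervision (DAGC). Specifically, DAGC first utilizes a heterogeneity-wise fusion module to adaptively integrate the features of an auto-encoder and a graph convolutional network in each layer, and then uses a scale-wise fusion module to dynamically concatenate the multi-scale features from different layers. Such modules are capable of learning discriminative features via an attention-based mechanism. In addition, we design a distribution-wise fusion module that leverages cluster assignments to acquire clustering results directly. To better explore the discriminative information in the cluster assignments, we develop a dual self-supervision solution consisting of a soft self-supervision strategy with a triplet Kullback-Leibler divergence loss and a hard self-supervision strategy with a pseudo-supervision loss. Extensive experiments validate that our method consistently outperforms state-of-the-art methods on six benchmark datasets. In particular, our method improves the ARI by more than 18.14% over the best baseline.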
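Graph-based multi-view clustering, which aims to obtain a partition of data across multiple views, has received considerable attention in recent years. Although great efforts have been made for graph-based multi-view clustering, it remains a challenge to fuse the characteristics of various views to learn a common representation for clustering. In this paper, we propose a novel Consistent Multiple Graph Embedding Clustering framework (CMGEC). Specifically, a multiple graph auto-encoder (M-GAE) is designed to flexibly encode the complementary information of multi-view data using a multi-graph attention fusion encoder. To guide the learned common representation to maintain the similarity of neighboring characteristics in each view, a multi-view mutual information maximization module (MMIM) is introduced. Furthermore, a graph fusion network (GFN) is devised to explore the relationships among the graphs from different views and provide the common consensus graph needed by M-GAE. By jointly training these models, a common latent representation can be obtained that encodes more complementary information from multiple views and depicts the data more comprehensively. Experiments on three types of multi-view datasets demonstrate that CMGEC outperforms the state-of-the-art clustering methods.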
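Deep clustering has recently attracted significant attention. Despite the remarkable progress, most previous deep clustering works still have two limitations. First, many of them focus on some distribution-based clustering loss and lack the ability to exploit sample-wise (or augmentation-wise) relationships via contrastive learning. Second, they often neglect the indirect sample-wise structure information, overlooking the rich possibilities of multi-scale neighborhood structure learning. In view of this, this paper presents a new deep clustering approach termed Image clustering with contrastive learning and multi-scale Graph Convolutional Networks (IcicleGCN), which bridges the gap between the convolutional neural network (CNN) and the graph convolutional network (GCN), as well as the gap between contrastive learning and multi-scale neighborhood structure learning, for the image clustering task. The proposed IcicleGCN framework consists of four main modules, namely, the CNN-based backbone, the Instance Similarity Module (ISM), the Joint Cluster Structure Learning and Instance reconstruction Module (JC-SLIM), and the Multi-scale GCN module (M-GCN). Specifically, two random augmentations are performed on each image, and the backbone network with two weight-sharing views is used to learn the representations of the augmented samples, which are then fed to ISM and JC-SLIM for instance-level and cluster-level contrastive learning, respectively. Further, to enforce multi-scale neighborhood structure learning, two streams of GCNs and an auto-encoder are trained simultaneously via (i) layer-wise interaction with representation fusion and (ii) joint self-adaptive learning that ensures their last-layer output distributions are consistent. Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.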
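Multilayer graphs have attracted substantial research attention in many areas because of their high utility in modeling interdependent systems. However, the clustering of multilayer graphs, which aims to divide the graph nodes into categories or communities, is still at a nascent stage. Existing methods are often limited to exploiting multiview attributes or multiple networks and ignore more complex and richer network frameworks. To this end, we propose a generic and effective autoencoder framework for multilayer graph clustering named Multilayer Graph Contrastive Clustering Network (MGCCN). MGCCN consists of three modules: (1) an attention mechanism is applied to better capture the relevance between nodes and their neighbors and obtain better node embeddings; (2) to better explore the consistent information in different networks, a contrastive fusion strategy is introduced; (3) MGCCN employs a self-supervised component that iteratively strengthens the node embedding and the clustering. Extensive experiments on different types of real-world graph data show that our proposed method outperforms state-of-the-art techniques.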
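This work presents an unsupervised deep discriminant analysis for clustering. The method is based on deep neural networks and aims to minimize the intra-cluster discrepancy and maximize the inter-cluster discrepancy in an unsupervised manner. The method is able to project the data into a nonlinear low-dimensional latent space with compact and distinct distribution patterns, such that the data clusters can be effectively identified. We further provide an extension of the method so that available graph information can be effectively exploited to improve the clustering performance. Extensive numerical results on image and non-image data, with or without graph information, demonstrate the effectiveness of the proposed method.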
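Graph similarity learning refers to calculating the similarity score between two graphs, which is required in many realistic applications, such as visual tracking, graph classification, and collaborative filtering. As most of the existing graph neural networks yield effective representations of a single graph, little effort has been made to jointly learn two graph representations and calculate their similarity score. In addition, existing unsupervised graph similarity learning methods are mainly clustering-based, which ignores the valuable information embodied in graph pairs. To this end, we propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning in order to calculate the similarity between any two input graph objects. Specifically, we generate two augmented views for each graph in a pair. Then, we employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning. The former strengthens the consistency of node representations across the two views. The latter identifies node differences between different graphs. Finally, we transform node representations into graph-level representations via pooling operations for graph similarity computation. We have evaluated CGMN on eight real-world datasets, and the experimental results show that the proposed approach is superior to the state-of-the-art methods in graph similarity learning downstream tasks.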
Network embedding (NE) approaches have emerged as a predominant technique to represent complex networks and have benefited numerous tasks. However, most NE approaches rely on a homophily assumption to learn embeddings with the guidance of supervisory signals, leaving the unsupervised heterophilous scenario relatively unexplored. This problem becomes especially relevant in fields where labels are scarce. Here, we formulate the unsupervised NE task as an r-ego network discrimination problem and develop the SELENE framework for learning on networks with homophily and heterophily. Specifically, we design a dual-channel feature embedding pipeline to discriminate r-ego networks using node attributes and structural information separately. We employ heterophily-adapted self-supervised learning objective functions to optimise the framework to learn intrinsic node embeddings. We show that SELENE's components improve the quality of node embeddings, facilitating the discrimination of connected heterophilous nodes. Comprehensive empirical evaluations on both synthetic and real-world datasets with varying homophily ratios validate the effectiveness of SELENE in homophilous and heterophilous settings, showing a clustering accuracy gain of up to 12.52%.
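Web-based interactions can often be represented by an attributed graph, and node clustering in such graphs has received much attention lately. Multiple efforts have successfully applied Graph Convolutional Networks (GCN), though with some limits on accuracy, as GCNs have been shown to suffer from over-smoothing issues. Although other methods (particularly those based on Laplacian smoothing) have reported better accuracy, a fundamental limitation of all of this work is a lack of scalability. This paper addresses this open problem by relating Laplacian smoothing to generalized PageRank and applying a random-walk-based algorithm as a scalable graph filter. This forms the basis of our scalable deep clustering algorithm, RWSL, in which, through a self-supervised mini-batch training mechanism, we simultaneously optimize a deep neural network for the sample-cluster assignment distribution and an autoencoder for a clustering-oriented embedding. Using 6 real-world datasets and 6 clustering metrics, we show that RWSL achieves better results than several recent baselines. Most notably, we show that RWSL, unlike all other deep clustering frameworks, can continue to scale to graphs with more than one million nodes. We also demonstrate how RWSL can perform node clustering on a graph with 1.8 billion edges using only a single GPU.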
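As a rough illustration of using generalized PageRank as a graph filter, as the RWSL abstract above describes, the sketch below applies a truncated power series of the normalized adjacency matrix to the node features. The personalized-PageRank weights `alpha * (1 - alpha) ** k`, the dense matrix form, and the function name are assumptions for illustration; the abstract's scalable variant relies on a random-walk-based algorithm rather than dense matrix products.

```python
import numpy as np

def gpr_filter(adj, x, alpha=0.15, num_steps=10):
    """Hypothetical filter: sum_k alpha * (1 - alpha)^k * A_norm^k @ x,
    truncated after num_steps propagation steps."""
    deg = adj.sum(axis=1)
    # Inverse square-root degrees; isolated nodes get a zero scale.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    a_norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = np.zeros(x.shape, dtype=float)
    h = x.astype(float)
    for k in range(num_steps + 1):
        out += alpha * (1.0 - alpha) ** k * h   # weight of the k-hop term
        h = a_norm @ h                          # propagate one more hop
    return out
```

Larger `alpha` keeps the result closer to the raw features, while smaller `alpha` mixes in information from more distant neighbors.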
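Although graph representation learning (GRL) has made significant progress, it is still a challenge to adequately extract and embed the rich topological structure and feature information. Most existing methods focus on local structure and fail to fully incorporate the global topological structure. To this end, we propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method to fully capture the structure information of graphs. Specifically, to reduce the uncertainty and misinformation of the original graph, we construct a feature graph as a complementary view via the k-nearest neighbor method. The feature graph can be used for contrast at the node level to capture local relations. Besides, we retain the global topological structure information by maximizing the mutual information (MI) between the whole graph and the feature embeddings, which theoretically reduces to exchanging the feature embeddings of the feature graph and the original graph to reconstruct themselves. Extensive experiments show that our method achieves superior performance on the semi-supervised node classification task and excellent robustness under noise perturbation of the graph structure or node features.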
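The k-nearest neighbor feature graph mentioned in the SPGRL abstract above could be built along the following lines. The cosine similarity measure, the symmetrization step, and the function name are assumptions for illustration; the abstract does not state which similarity is used.

```python
import numpy as np

def knn_feature_graph(x, k):
    """Hypothetical helper: link each node to its k most similar nodes
    by cosine similarity of the feature rows of x (n, d)."""
    xn = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    sim = xn @ xn.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-matches
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # indices of the top-k neighbors
    n = x.shape[0]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # make the graph undirected
```

The resulting adjacency matrix depends only on node features, so it can serve as a second view alongside the original topology.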
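Recently, maximizing mutual information has emerged as a powerful method for unsupervised graph representation learning. Existing methods are typically effective at capturing information from the topology view but ignore the feature view. To circumvent this issue, we propose a novel approach that exploits mutual information maximization across the feature and topology views. Specifically, we first utilize a multi-view representation learning module to better capture both the local and the global information content across the feature and topology views of a graph. To model the information shared by the feature and topology spaces, we then develop a common representation learning module using mutual information maximization and reconstruction loss minimization. To explicitly encourage diversity among graph representations from the same view, we also introduce a disagreement regularization that enlarges the distance between representations from the same view. Experiments on synthetic and real-world datasets demonstrate the effectiveness of integrating the feature and topology views. In particular, compared with previous supervised methods, our proposed method achieves comparable or even better performance under the unsupervised representation and linear evaluation protocol.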
Graph Contrastive Learning (GCL) has recently drawn much research interest for learning generalizable node representations in a self-supervised manner. In general, the contrastive learning process in GCL is performed on top of the representations learned by a graph neural network (GNN) backbone, which transforms and propagates the node contextual information based on its local neighborhoods. However, nodes sharing similar characteristics may not always be geographically close, which poses a great challenge for unsupervised GCL efforts due to their inherent limitations in capturing such global graph knowledge. In this work, we address these limitations by proposing a simple yet effective framework: Simple Neural Networks with Structural and Semantic Contrastive Learning (S^3-CL). Notably, by virtue of the proposed structural and semantic contrastive learning algorithms, even a simple neural network can learn expressive node representations that preserve valuable global structural and semantic patterns. Our experiments demonstrate that the node representations learned by S^3-CL achieve superior performance on different downstream tasks compared with the state-of-the-art unsupervised GCL methods. Implementation and more experimental details are publicly available at https://github.com/kaize0409/S-3-CL.