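Network alignment, which aims to identify corresponding nodes across different networks, is of great significance for many downstream applications. Because they require no labeled anchor links, unsupervised alignment methods have attracted growing attention. However, the topological consistency assumptions defined by existing methods are generally low-order and less accurate, because only edge-level topological patterns are considered, which is especially risky in an unsupervised setting. To shift the focus of the alignment process from low-order to higher-order topological consistency, in this paper we propose a fully unsupervised network alignment framework named HTC. The proposed higher-order topological consistency is formulated based on edge orbits and is merged into the information aggregation process of a graph convolutional network, so that alignment consistency is transformed into the similarity of node embeddings. Furthermore, the encoder is trained to be multi-orbit-aware and is then refined to identify more trusted anchor links. Node correspondence is evaluated comprehensively by integrating all the different orders of consistency. In addition to sound theoretical analysis, the superiority of the proposed method is demonstrated empirically through extensive experimental evaluation. On three pairs of real-world datasets and two pairs of synthetic datasets, HTC consistently outperforms a wide variety of unsupervised and supervised methods with the least or comparable time consumption. It also exhibits robustness to structural noise as a result of our multi-orbit-aware training mechanism.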
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from other nodes in the graph, while link prediction is the task of predicting whether two nodes in a network are likely to have a link. By definition, clustering should play a positive role in obtaining accurate link predictions. Yet researchers have long ignored this positive relationship or undermined it through inappropriate use. In this article, we construct a simple but efficient clustering-driven link prediction framework (ClusterLP), with the goal of directly exploiting cluster structures to obtain connections between nodes as accurately as possible in both undirected and directed graphs. Specifically, we propose that in undirected graphs it is easier to establish links between nodes with similar representation vectors and cluster tendencies, while nodes in a directed graph can more easily point to nodes that are similar to their representation vectors and have greater influence in their own cluster. We customized the implementation of ClusterLP for undirected and directed graphs, respectively, and experimental results on the link prediction task using multiple real-world networks show that our models are highly competitive with existing baseline models. The code implementation of ClusterLP and the baselines we use are available at https://github.com/ZINUX1998/ClusterLP.
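Graphs can model complicated interactions between entities, and they arise naturally in many important applications. These applications can often be cast into standard graph learning tasks, in which a crucial step is to learn low-dimensional graph representations. Graph neural networks (GNNs) are currently the most popular model among embedding approaches. However, standard GNNs in the neighborhood aggregation paradigm have limited discriminative power in distinguishing high-order graph structures, as opposed to low-order structures. To capture high-order structures, researchers have resorted to motifs and developed motif-based GNNs. However, existing motif-based GNNs still often suffer from weak discriminative power on high-order structures. To overcome these limitations, we propose MGNN, a novel framework to better capture high-order structures, hinging on our proposed motif redundancy minimization operator and injective motif combination. First, MGNN produces a set of node representations w.r.t. each motif. The next phase is our proposed redundancy minimization among motifs, which compares the motifs with each other and distills the features unique to each motif. Finally, MGNN updates node representations by combining multiple representations from different motifs. In particular, to enhance the discriminative power, MGNN uses an injective function to combine the representations w.r.t. different motifs. We further show, with a theoretical analysis, that our proposed architecture increases the expressive power of GNNs. We demonstrate that MGNN outperforms state-of-the-art methods on seven public benchmarks on both node classification and graph classification tasks.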
We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Many popular GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons). Motivated by this limitation, we identify a set of key designs (ego- and neighbor-embedding separation, higher-order neighborhoods, and combination of intermediate representations) that boost learning from the graph structure under heterophily. We combine them into a graph neural network, H2GCN, which we use as the base method to empirically evaluate the effectiveness of the identified designs. Going beyond the traditional benchmarks with strong homophily, our empirical analysis shows that the identified designs increase the accuracy of GNNs by up to 40% and 27% over models without them on synthetic and real networks with heterophily, respectively, and yield competitive performance under homophily.
Aligning users across networks using graph representation learning has been found effective where the alignment is accomplished in a low-dimensional embedding space. Yet, achieving highly precise alignment is still challenging, especially when nodes with long-range connectivity to the labeled anchors are encountered. To alleviate this limitation, we designed WL-Align, which adopts a regularized representation learning framework to learn distinctive node representations. It extends the Weisfeiler-Lehman Isomorphism Test and learns the alignment in alternating phases of "across-network Weisfeiler-Lehman relabeling" and "proximity-preserving representation learning". The across-network Weisfeiler-Lehman relabeling is achieved by iterating anchor-based label propagation and similarity-based hashing to exploit the known anchors' connectivity to different nodes in an efficient and robust manner. The representation learning module preserves the second-order proximity within individual networks and is regularized by the across-network Weisfeiler-Lehman hash labels. Extensive experiments on real-world and synthetic datasets demonstrate that WL-Align outperforms state-of-the-art methods, achieving significant performance improvements in the "exact matching" scenario. Data and code of WL-Align are available at https://github.com/ChenPengGang/WLAlignCode.
Graph Neural Networks (GNNs) have attracted increasing attention in recent years and have achieved excellent performance in semi-supervised node classification tasks. The success of most GNNs relies on one fundamental assumption, i.e., the original graph structure data is available. However, recent studies have shown that GNNs are vulnerable to the complex underlying structure of the graph, making it necessary to learn comprehensive and robust graph structures for downstream tasks, rather than relying only on the raw graph structure. In light of this, we seek to learn optimal graph structures for downstream tasks and propose a novel framework for semi-supervised classification. Specifically, based on the structural context information of graph and node representations, we encode the complex interactions in semantics and generate semantic graphs to preserve the global structure. Moreover, we develop a novel multi-measure attention layer to optimize the similarity rather than prescribing it a priori, so that the similarity can be adaptively evaluated by integrating measures. These graphs are fused and optimized together with GNN towards semi-supervised classification objective. Extensive experiments and ablation studies on six real-world datasets clearly demonstrate the effectiveness of our proposed model and the contribution of each component.
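Although graph representation learning (GRL) has made significant progress, extracting and embedding rich topological structure and feature information in an adequate way remains a challenge. Most existing methods focus on local structure and fail to fully incorporate the global topological structure. To this end, we propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method to fully capture the structural information of graphs. Specifically, to reduce the uncertainty and misinformation of the original graph, we construct a feature graph as a complementary view via the k-nearest neighbor method. The feature graph can be used to contrast at the node level to capture local relations. Furthermore, we preserve the global topological structure information by maximizing the mutual information (MI) between the whole graph and the feature embeddings, which can theoretically be reduced to exchanging the feature embeddings and the original graph to reconstruct themselves. Extensive experiments show that our method achieves quite superior performance on the semi-supervised node classification task and excellent robustness under noise perturbation on graph structure or node features.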
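The feature-graph step in the SPGRL abstract above (a k-nearest-neighbor graph built from node features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_feature_graph` is hypothetical, and the choice of cosine similarity and of symmetrizing the result into undirected edges are assumptions, since the abstract does not specify either.

```python
import math


def knn_feature_graph(features, k=2):
    """Build an undirected k-NN graph over nodes from their feature vectors.

    `features` maps node id -> list of floats. Cosine similarity is an
    assumed choice of similarity measure.
    """
    def cosine(x, y):
        dot = sum(a * b for a, b in zip(x, y))
        norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
        return dot / norm if norm else 0.0

    edges = set()
    for v, x_v in features.items():
        # Rank every other node by similarity to v, most similar first.
        ranked = sorted(
            ((cosine(x_v, x_u), u) for u, x_u in features.items() if u != v),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Connect v to its k most similar nodes; store edges in sorted order
        # so that (u, v) and (v, u) collapse into one undirected edge.
        for _, u in ranked[:k]:
            edges.add(tuple(sorted((v, u))))
    return edges


toy_features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9]}
print(knn_feature_graph(toy_features, k=1))
```

In the toy input, nodes 0 and 1 point in nearly the same direction, as do nodes 2 and 3, so with `k=1` the feature graph links each pair regardless of how the original graph is wired.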
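Message passing has evolved as an effective tool for designing graph neural networks (GNNs). However, most existing message passing methods simply sum or average all neighboring features to update node representations. They are restricted by two problems: (i) a lack of interpretability to identify the node features significant to the GNN's prediction, and (ii) feature over-mixing, which leads to the over-smoothing issue in capturing long-range dependencies and an inability to handle graphs under heterophily or low homophily. In this paper, we propose a Node-level Capsule Graph Neural Network (NCGNN) to address these problems with an improved message passing scheme. Specifically, NCGNN represents nodes as groups of node-level capsules, in which each capsule extracts distinctive features of its corresponding node. For each node-level capsule, a novel dynamic routing procedure is developed to adaptively select appropriate capsules for aggregation from a subgraph identified by the designed graph filter. NCGNN aggregates only the advantageous capsules and restrains irrelevant messages to avoid over-mixing the features of interacting nodes. It can therefore relieve the over-smoothing issue and learn effective node representations on graphs with either homophily or heterophily. Furthermore, our proposed message passing scheme is inherently interpretable and exempt from complex post-hoc explanations, as the graph filter and the dynamic routing procedure identify the subset of node features that are most significant to the model prediction from the extracted subgraph. Extensive experiments on synthetic and real-world graphs demonstrate that NCGNN can address the over-smoothing issue well and produce better node representations for semi-supervised node classification. It outperforms the state of the art under both homophily and heterophily.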
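Graph kernels have attracted a lot of attention during the last decade, and have evolved into a rapidly developing branch of learning on structured data. During the past 20 years, the considerable research activity in the field has resulted in the development of dozens of graph kernels, each focusing on specific structural properties of graphs. Graph kernels have proven successful in a wide range of domains, from social networks to bioinformatics. The goal of this survey is to provide a unifying view of the literature on graph kernels. In particular, we present an overview of a wide range of graph kernels. Furthermore, we perform an experimental evaluation of several of these kernels on public datasets and provide a comparative study. Finally, we discuss key applications of graph kernels and outline some challenges that remain to be addressed.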
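In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, have emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research.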
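For readers unfamiliar with the Weisfeiler-Leman heuristic mentioned above, the following is a minimal sketch of 1-dimensional color refinement (1-WL). The function name `wl_histogram` and the adjacency-dict input format are illustrative assumptions, not taken from the survey. To keep colors comparable across graphs, the sketch keeps the raw nested signatures as colors instead of compressing them to integers, which is only practical for a few rounds on small graphs.

```python
from collections import Counter


def wl_histogram(adj, rounds=3):
    """Return the histogram of node colors after `rounds` of 1-WL refinement.

    `adj` maps each node to the list of its neighbors (undirected graph).
    """
    colors = {v: 0 for v in adj}  # every node starts with the same color
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors).
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


# A 6-cycle and two disjoint triangles are both 2-regular,
# so 1-WL assigns them identical color histograms.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_histogram(hexagon) == wl_histogram(two_triangles))
```

Different histograms prove that two graphs are not isomorphic; equal histograms are inconclusive, as the hexagon and two-triangle pair shows. This is the sense in which the algorithm is a heuristic for graph isomorphism.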
Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions. (Footnotes: * The two first authors made equal contributions. 1. While it is common to refer to these data structures as social or biological networks, we use the term graph to avoid ambiguity with neural network terminology.)
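The "sampling and aggregating features from a node's local neighborhood" idea in the GraphSAGE abstract can be illustrated with a small sketch. This is a simplified, assumption-laden version: the function name `sage_mean_layer` is hypothetical, only a mean aggregator is shown, and the learned weight matrix and nonlinearity that a trained layer would apply after concatenation are omitted.

```python
import random


def sage_mean_layer(features, adj, num_samples=2, rng=None):
    """One untrained sample-and-aggregate step with a mean aggregator.

    `features` maps node -> list of floats; `adj` maps node -> list of
    neighbors. Each output vector is the node's own features concatenated
    with the mean of a uniformly sampled subset of neighbor features.
    """
    rng = rng or random.Random(0)
    out = {}
    for v, x_v in features.items():
        neighbors = adj.get(v, [])
        # Sample at most `num_samples` neighbors without replacement.
        sampled = rng.sample(neighbors, min(num_samples, len(neighbors)))
        if sampled:
            agg = [
                sum(features[u][d] for u in sampled) / len(sampled)
                for d in range(len(x_v))
            ]
        else:
            agg = [0.0] * len(x_v)  # isolated node: no neighbor signal
        out[v] = list(x_v) + agg  # concatenate self and neighborhood summary
    return out


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(sage_mean_layer(features, adj, num_samples=5)["a"])
```

Because the layer is a function of features and neighborhoods and does not rely on a per-node lookup table, it can be applied to a node added after training as long as that node has features and edges, which is the inductive property the abstract emphasizes.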
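Graph representation learning is a fast-growing field in which one of the main objectives is to generate meaningful representations of graphs in low-dimensional spaces. The learned embeddings have been successfully applied to various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics, such as prediction accuracy, running time, and scalability. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selections, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organize graph embedding techniques using a taxonomy that includes manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluate these classes of algorithms on node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We designed our experiments on top of the PyTorch Geometric and DGL libraries and ran them on different multicore CPU and GPU platforms. We rigorously scrutinize the performance of embedding methods under various performance metrics and summarize the results. This paper can thus serve as a comparative guide to help users select the methods most suitable for their tasks.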
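Graph representation learning (GRL) is critical for graph-structured data analysis. However, most existing graph neural networks (GNNs) rely heavily on label information, which is usually expensive to obtain in the real world. Existing unsupervised GRL methods suffer from certain limitations, such as a heavy reliance on monotone contrastiveness and limited scalability. To overcome these problems, and in light of recent advances in graph contrastive learning, we introduce a novel self-supervised graph representation learning algorithm, G-Zoom, which learns node representations by leveraging the proposed adjusted zooming scheme. Specifically, this mechanism enables G-Zoom to explore and extract self-supervision signals from a graph at multiple scales: micro (i.e., node level), meso (i.e., neighborhood level), and macro (i.e., subgraph level). First, we generate two augmented views of the input graph via two different graph augmentations. Then, we progressively establish three different contrastive objectives over the three scales above, from the node level through the neighborhood level to the subgraph level, where we maximize the agreement between graph representations across scales. While valuable clues can be extracted from a given graph at the micro and macro perspectives, the neighborhood-level contrast, based on our adjusted zooming scheme, offers a customizable option to manually choose an optimal viewpoint that lies between the micro and macro perspectives to better understand the graph data. In addition, to make our model scalable to large graphs, we employ a parallel graph diffusion approach to decouple model training from the graph size. We conducted extensive experiments on real-world datasets, and the results show that our proposed model consistently outperforms state-of-the-art methods.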
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
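Message-passing neural networks (MPNNs) are the leading architecture for deep learning on graph-structured data, in large part due to their simplicity and scalability. Unfortunately, it has been argued that these architectures are limited in their expressive power. This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN) to address this issue. Our main observation is that while two graphs may not be distinguishable by an MPNN, they often contain distinguishable subgraphs. Thus, we propose to represent each graph as a set of subgraphs derived by some predefined policy, and to process it using a suitable equivariant architecture. We develop novel variants of the 1-dimensional Weisfeiler-Leman (1-WL) test for graph isomorphism, and prove lower bounds on the expressiveness of ESAN in terms of these new WL variants. We further prove that our approach increases the expressive power of both MPNNs and more expressive architectures. Moreover, we provide theoretical results that describe how design choices such as the subgraph selection policy and the equivariant neural architecture affect our architecture's expressive power. To deal with the increased computational cost, we propose a subgraph sampling scheme, which can be viewed as a stochastic version of our framework. A comprehensive set of experiments on real and synthetic datasets demonstrates that our framework improves the expressive power and overall performance of popular GNN architectures.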
Pre-publication draft of a book to be published by Morgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
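Social network alignment aims to align person identities across social networks. Embedding-based models have been shown to be effective for alignment, where a structural-proximity-preserving objective is typically adopted for model training. Observing that "overly close" user embeddings are unavoidable for such models and cause alignment inaccuracy, we propose a novel learning framework that tries to force the resulting embeddings to be more widely separated among users by introducing carefully implanted pseudo anchors. We further propose a meta-learning algorithm to guide the updating of the pseudo anchor embeddings during the learning process. The proposed intervention via pseudo anchors and meta-learning allows the learning framework to be applied to a wide spectrum of network alignment methods. We have incorporated the proposed learning framework into several state-of-the-art models. Our experimental results demonstrate its efficacy: methods with pseudo anchors implanted can outperform their counterparts without pseudo anchors by a fairly large margin, especially when only very few labeled anchors exist.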
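Graph autoencoders (GAE) and variational graph autoencoders (VGAE) have emerged as powerful methods for link prediction. Their performance is less impressive on community detection problems where, according to recent and concurring experimental evaluations, they are often outperformed by simpler alternatives such as the Louvain method. It is still unclear to what extent community detection can be improved with GAE and VGAE, especially in the absence of node features. It is moreover uncertain whether good performance on link prediction can be preserved at the same time. In this paper, we show that the two tasks can be addressed jointly with high accuracy. To this end, we introduce and theoretically study a community-preserving message passing scheme, doping our GAE and VGAE encoders by considering both the initial graph structure and modularity-based prior communities when computing embedding spaces. We also propose novel training and optimization strategies, including a modularity-inspired regularizer that complements the existing reconstruction losses for joint link prediction and community detection. We demonstrate the empirical effectiveness of our approach, referred to as Modularity-Aware GAE and VGAE, through in-depth experimental validation on various real-world graphs.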
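Since the abstract above relies on modularity (as the basis of the Louvain method, of the prior communities, and of the regularizer), a short sketch of the standard Newman modularity score for a fixed partition may help. It computes Q as the sum over communities c of L_c / m - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. The function name `modularity` and the edge-list input format are illustrative assumptions, and this is the textbook quantity for unweighted undirected graphs, not the paper's regularizer.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an unweighted, undirected graph.

    `edges` is a list of (u, v) pairs, each undirected edge listed once;
    `community` maps node -> community label.
    """
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))
```

Splitting the toy graph into its two triangles gives a clearly positive score, while placing all nodes in one community gives zero, which is why modularity is used as a signal for community structure.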
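Temporal graphs represent dynamic relationships among entities and occur in many real-life applications such as social networks, e-commerce, communication, road networks, and biological systems. They call for research on generative modeling and representation learning that goes beyond the work on static graphs. In this survey, we comprehensively review recently proposed neural approaches to time-dependent graph representation learning and generative modeling for temporal graphs. Finally, we identify weaknesses of existing approaches and discuss the research proposal of our recently published paper TIGGER [24].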
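Recent studies on graph convolutional networks (GCNs) reveal that the initial node representations (i.e., the node representations before the first graph convolution) largely affect the final model performance. However, when learning the initial representation of a node, most existing work linearly combines the embeddings of node features without considering the interactions among the features (or feature embeddings). We argue that when the node features are categorical, as in many real-world applications such as user profiling and recommender systems, feature interactions usually carry important signals for predictive analytics. Ignoring them results in suboptimal initial node representations and thus weakens the effectiveness of the subsequent graph convolution. In this paper, we propose a new GCN model named CatGCN, which is tailored for graph learning when the node features are categorical. Specifically, we integrate two ways of explicit interaction modeling into the learning of initial node representations: local interaction modeling on each pair of node features and global interaction modeling on an artificial feature graph. We then refine the enhanced initial node representations with neighborhood-aggregation-based graph convolution. We train CatGCN in an end-to-end fashion and demonstrate it on semi-supervised node classification. Extensive experiments on three user profiling tasks (predicting user age, city, and purchase level) from Tencent and Alibaba datasets validate the effectiveness of CatGCN, especially the positive effect of performing feature interaction modeling before graph convolution.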