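We present Topology Transformation Equivariant Representation learning, a general paradigm of self-supervised learning for node representations of graph data that enables the wide applicability of graph convolutional neural networks (GCNNs). We formalize the proposed model from an information-theoretic perspective by maximizing the mutual information between topology transformations and the node representations before and after the transformations. We derive that maximizing this mutual information can be relaxed to minimizing the cross entropy between the applied topology transformation and its estimate from the node representations. In particular, we sample a subset of node pairs from the original graph and flip the edge connectivity between each pair to transform the graph topology. We then self-train a representation encoder to learn node representations by reconstructing the topology transformations from the feature representations of the original and transformed graphs. In experiments, we apply the proposed model to downstream node classification, graph classification, and link prediction tasks, and the results show that the proposed method outperforms existing unsupervised approaches.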
translated by Google Translate
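The edge-flipping transformation and the cross-entropy objective described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `flip_edges` and `binary_cross_entropy`, the `flip_rate` parameter, and the binary added/removed labelling are assumptions made for the example, and the encoder that would produce the predicted probabilities is left out.

```python
import numpy as np

def flip_edges(adj, flip_rate, rng):
    """Flip the edge connectivity of a random subset of node pairs.

    adj is a symmetric 0/1 adjacency matrix with a zero diagonal.
    Returns the transformed adjacency, the sampled pairs, and one label per
    pair (1 = edge was added, 0 = edge was removed).
    """
    n = adj.shape[0]
    rows, cols = np.triu_indices(n, k=1)            # all unordered node pairs
    num_flip = min(rows.size, max(1, int(round(flip_rate * rows.size))))
    chosen = rng.choice(rows.size, size=num_flip, replace=False)
    i, j = rows[chosen], cols[chosen]

    new_adj = adj.copy()
    new_adj[i, j] = 1 - adj[i, j]                   # flip connectivity
    new_adj[j, i] = new_adj[i, j]                   # keep the graph undirected
    labels = new_adj[i, j]                          # the applied transformation
    return new_adj, np.stack([i, j], axis=1), labels

def binary_cross_entropy(pred_prob, labels, eps=1e-12):
    """Cross entropy between the applied flips and their estimated probabilities."""
    return float(-np.mean(labels * np.log(pred_prob + eps)
                          + (1 - labels) * np.log(1 - pred_prob + eps)))

rng = np.random.default_rng(0)
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
transformed, pairs, labels = flip_edges(path, flip_rate=0.5, rng=rng)
```

In a full pipeline, `pred_prob` would come from a classifier applied to the node representations of the original and transformed graphs; here it is only a placeholder argument.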
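Although graph representation learning (GRL) has made significant progress, extracting and embedding rich topological structure and feature information adequately remains a challenge. Most existing methods focus on local structure and fail to fully incorporate the global topological structure. To this end, we propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method that fully captures the structural information of graphs. Specifically, to reduce the uncertainty and misinformation of the original graph, we construct a feature graph as a complementary view via the k-nearest-neighbor method. The feature graph can be used for node-level contrast to capture local relations. In addition, we retain global topological structure information by maximizing the mutual information (MI) between the whole graph and the feature embeddings, which theoretically reduces to exchanging the feature embeddings of the feature graph and the original graph so that each reconstructs itself. Extensive experiments show that our method achieves superior performance on the semi-supervised node classification task and excellent robustness under noise perturbation of the graph structure or node features.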
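The k-nearest-neighbor feature graph mentioned above can be sketched as follows. The abstract does not say which similarity measure is used, so cosine similarity, the symmetrization step, and the function name `knn_feature_graph` are assumptions for illustration only.

```python
import numpy as np

def knn_feature_graph(features, k):
    """Connect each node to its k most cosine-similar nodes, then symmetrize."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)      # row-normalize the features
    sim = unit @ unit.T                             # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                  # a node is not its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]     # top-k indices per node

    n = features.shape[0]
    adj = np.zeros((n, n), dtype=int)
    adj[np.repeat(np.arange(n), k), neighbors.ravel()] = 1
    return np.maximum(adj, adj.T)                   # make the graph undirected

x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
feature_adj = knn_feature_graph(x, k=1)
```

The resulting adjacency depends only on node features, which is what lets it act as a view complementary to the observed topology.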
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
This paper studies learning the representations of whole graphs in both unsupervised and semi-supervised scenarios. Graph-level representations are critical in a variety of real-world applications such as predicting the properties of molecules and community analysis in social networks. Traditional graph kernel based methods are simple, yet effective for obtaining fixed-length representations for graphs, but they suffer from poor generalization due to hand-crafted designs. There are also some recent methods based on language models (e.g. graph2vec), but they tend to only consider certain substructures (e.g. subtrees) as graph representatives. Inspired by recent progress in unsupervised representation learning, in this paper we propose a novel method called InfoGraph for learning graph-level representations. We maximize the mutual information between the graph-level representation and the representations of substructures of different scales (e.g., nodes, edges, triangles). By doing so, the graph-level representations encode aspects of the data that are shared across different scales of substructures. Furthermore, we propose InfoGraph*, an extension of InfoGraph for semi-supervised scenarios. InfoGraph* maximizes the mutual information between unsupervised graph representations learned by InfoGraph and the representations learned by existing supervised methods. As a result, the supervised encoder learns from unlabeled data while preserving the latent semantic space favored by the current supervised task. Experimental results on the tasks of graph classification and molecular property prediction show that InfoGraph is superior to state-of-the-art baselines and InfoGraph* can achieve performance competitive with state-of-the-art semi-supervised models.
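Deep learning on graphs has recently attracted significant interest. However, most works have focused on (semi-)supervised learning, which leads to shortcomings including heavy label reliance, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL), which extracts informative knowledge through well-designed pretext tasks without relying on manual labels, has become a promising and trending learning paradigm for graph data. Unlike SSL in other domains such as computer vision and natural language processing, SSL on graphs has its own background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of existing approaches that apply SSL techniques to graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of the pretext tasks, we divide these approaches into four categories: generation-based, auxiliary property-based, contrast-based, and hybrid approaches. We further describe the applications of graph SSL across various research fields and summarize the commonly used datasets, evaluation benchmarks, performance comparisons, and open-source code of graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.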
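Recently, maximizing mutual information has emerged as a powerful method for unsupervised graph representation learning. Existing methods are typically effective at capturing information from the topology view but ignore the feature view. To circumvent this issue, we propose a novel approach that exploits mutual information maximization across the feature and topology views. Specifically, we first use a multi-view representation learning module to better capture both local and global information content across the feature and topology views of graphs. To model the information shared by the feature and topology spaces, we then develop a common representation learning module using mutual information maximization and reconstruction loss minimization. To explicitly encourage diversity among graph representations from the same view, we also introduce a disagreement regularization that enlarges the distance between representations from the same view. Experiments on synthetic and real-world datasets demonstrate the effectiveness of integrating the feature and topology views. In particular, compared with previous supervised methods, our proposed method achieves comparable or even better performance under the unsupervised representation and linear evaluation protocol.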
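Network embedding has emerged as a promising research field for network analysis. Recently, a method named Barlow Twins was proposed that applies the redundancy-reduction principle to the embedding vectors corresponding to two distorted versions of image samples. Motivated by this, we propose the Barlow Graph Auto-Encoder, a simple yet effective architecture for learning network embeddings. It aims to maximize the similarity between the embedding vectors of the immediate and larger neighborhoods of a node while minimizing the redundancy between the components of these projections. In addition, we present its variational counterpart, named the Barlow Variational Graph Auto-Encoder. Our approach yields promising results for inductive link prediction and is also on par with the state of the art for clustering and downstream node classification, as demonstrated by extensive comparisons with several well-known techniques on three benchmark citation datasets.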
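The redundancy-reduction principle referred to above can be illustrated with a Barlow Twins style loss on two sets of embeddings. This is a sketch under stated assumptions: `z_a` and `z_b` stand in for the embeddings of a node's immediate and larger neighborhoods, the weight `lam` is an illustrative value, and the graph auto-encoder that would produce the embeddings is not shown.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Redundancy-reduction loss between two (n_samples, dim) embedding matrices."""
    n = z_a.shape[0]
    # Standardize each embedding component across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + 1e-12)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + 1e-12)

    c = z_a.T @ z_b / n                                    # cross-correlation matrix (dim x dim)
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)              # pull matching components together
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)    # decorrelate different components
    return float(on_diag + lam * off_diag)

z = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
loss_identical_views = barlow_twins_loss(z, z)   # close to zero: aligned and decorrelated
```

The on-diagonal term rewards agreement between the two views, while the off-diagonal term penalizes components that carry the same information.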
Pre-publication draft of a book to be published by Morgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
We introduce a self-supervised approach for learning node and graph level representations by contrasting structural views of graphs. We show that unlike visual representation learning, increasing the number of views to more than two or contrasting multi-scale encodings does not improve performance, and the best performance is achieved by contrasting encodings from first-order neighbors and a graph diffusion. We achieve new state-of-the-art results in self-supervised learning on 8 out of 8 node and graph classification benchmarks under the linear evaluation protocol. For example, on Cora (node) and Reddit-Binary (graph) classification benchmarks, we achieve 86.8% and 84.5% accuracy, which are 5.5% and 2.4% relative improvements over previous state-of-the-art. When compared to supervised baselines, our approach outperforms them in 4 out of 8 benchmarks.
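The two structural views named above, first-order neighbors and a graph diffusion, can be sketched as follows. The abstract does not specify the diffusion, so this example assumes the closed-form personalized PageRank diffusion on a symmetrically normalized adjacency with self-loops; the teleport probability `alpha` and the helper names are illustrative.

```python
import numpy as np

def sym_norm(adj):
    """Symmetrically normalized adjacency D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + np.eye(adj.shape[0])                  # self-loops guarantee nonzero degrees
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def ppr_diffusion(adj, alpha=0.2):
    """Closed-form personalized PageRank diffusion: alpha * (I - (1 - alpha) T)^{-1}."""
    n = adj.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * sym_norm(adj))

path = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
first_order_view = sym_norm(path)      # local view: direct neighbors only
diffusion_view = ppr_diffusion(path)   # global view: dense, multi-hop weights
```

The diffusion matrix is dense even for a sparse input graph, which is why it exposes global structure that the first-order view cannot.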
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs, both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to unsupervised learning with GCNs, DGI does not rely on random walk objectives, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning.
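The objective described above, scoring patch representations against a graph-level summary, can be sketched with NumPy. Several pieces are assumptions for illustration: the one-layer ReLU encoder `encode`, the row-shuffling corruption, the mean readout followed by a sigmoid, and the bilinear discriminator weight `w`. No gradient step is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(a_norm, x, theta):
    """Stand-in one-layer graph convolution: ReLU(A_norm X Theta)."""
    return np.maximum(a_norm @ x @ theta, 0.0)

def dgi_loss(a_norm, x, theta, w, rng, eps=1e-12):
    """Binary cross entropy separating real patches from corrupted ones."""
    h_pos = encode(a_norm, x, theta)                    # patch representations
    x_corrupt = x[rng.permutation(x.shape[0])]          # shuffle features, keep topology
    h_neg = encode(a_norm, x_corrupt, theta)            # negative patch representations

    summary = sigmoid(h_pos.mean(axis=0))               # graph-level summary vector
    p_pos = sigmoid(h_pos @ w @ summary)                # bilinear score per real patch
    p_neg = sigmoid(h_neg @ w @ summary)                # bilinear score per corrupted patch
    return float(-(np.log(p_pos + eps).mean() + np.log(1.0 - p_neg + eps).mean()) / 2.0)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))
weights = rng.normal(size=(3, 2))
loss = dgi_loss(np.eye(4), features, weights, np.eye(2), rng)
```

Minimizing this loss pushes the discriminator to assign high scores to (patch, summary) pairs from the real graph and low scores to pairs built from corrupted features, which serves as a proxy for the mutual information objective.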
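Contrastive learning has shown great promise in the field of graph representation learning. By manually constructing positive and negative samples, most graph contrastive learning methods rely on a similarity metric based on the vector inner product to distinguish samples for graph representation. However, handcrafted sample construction (e.g., perturbing the nodes or edges of the graph) may not effectively capture the intrinsic local structures of the graph. Likewise, a similarity metric based on the vector inner product cannot fully exploit the local structures of the graph to characterize graph differences. To this end, in this paper we propose a novel contrastive learning framework based on adaptive subgraph generation for efficient and robust self-supervised graph representation learning, in which the optimal transport distance is used as the similarity metric between subgraphs. It aims to generate contrastive samples by capturing the intrinsic structures of the graph and to distinguish the samples based on the features and structures of subgraphs simultaneously. Specifically, for each center node, we first develop a network that generates an interpolated subgraph by adaptively learning relation weights to the nodes of the corresponding neighborhood. We then construct positive and negative pairs of subgraphs from the same and different nodes, respectively. Finally, we employ two types of optimal transport distances (i.e., the Wasserstein distance and the Gromov-Wasserstein distance) to construct the structured contrastive loss. Extensive node classification experiments on benchmark datasets verify the effectiveness of our graph contrastive learning method.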
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes, a crucial component in CL, remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
CCS Concepts: Computing methodologies → Unsupervised learning; Neural networks; Learning latent representations.
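The topology-level idea above, dropping unimportant edges more often than important ones, can be sketched with degree centrality. The scoring rule (mean log-degree of the two endpoints), the rescaling to a target mean probability `p_mean`, and the cap `p_max` are assumptions chosen for illustration and are not taken from the abstract.

```python
import numpy as np

def edge_drop_probs(edges, num_nodes, p_mean=0.3, p_max=0.7):
    """Assign higher drop probability to edges between low-degree nodes.

    edges is an (m, 2) integer array of undirected edges.
    """
    deg = np.bincount(edges.ravel(), minlength=num_nodes)
    score = np.log(deg[edges]).mean(axis=1)         # edge importance from endpoint degrees
    spread = score.max() - score.mean()
    if spread == 0:                                 # all edges equally important
        return np.full(len(edges), p_mean)
    probs = (score.max() - score) / spread * p_mean # most important edge gets probability 0
    return np.minimum(probs, p_max)                 # cap so no edge is dropped too often

def drop_edges(edges, probs, rng):
    """Sample an augmented edge list by removing each edge with its own probability."""
    keep = rng.random(len(edges)) >= probs
    return edges[keep]

edges = np.array([[0, 1], [0, 2], [0, 3], [3, 4]])
probs = edge_drop_probs(edges, num_nodes=5)
augmented = drop_edges(edges, probs, np.random.default_rng(0))
```

Calling `drop_edges` twice with different random states would yield the two graph views that a contrastive objective then compares.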
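Contrastive learning is an effective unsupervised method in graph representation learning, and its key component lies in the construction of positive and negative samples. Previous methods usually use the proximity of nodes in the graph as the guiding principle. Recently, contrastive learning based on data augmentation has shown great power in the visual domain, and some works have extended this method from images to graphs. However, unlike data augmentation on images, data augmentation on graphs is far less intuitive and makes it much harder to provide high-quality contrastive samples, which leaves much room for improvement. In this work, by introducing an adversarial graph view for data augmentation, we propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL), to extract informative contrastive samples within reasonable constraints. We develop a new technique called information regularization for stable training and use subgraph sampling for scalability. We generalize our method from node-level contrastive learning to the graph level by treating each graph instance as a supernode. ARIEL consistently outperforms current graph contrastive learning methods on both node-level and graph-level classification tasks on real-world datasets. We further demonstrate that ARIEL is more robust in the face of adversarial attacks.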
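Recently, there has been great success in applying deep neural networks to graph-structured data. Most work, however, focuses on either node-level or graph-level supervised learning, such as node, link, or graph classification, or on node-level unsupervised learning (e.g., node clustering). Despite its wide range of possible applications, graph-level unsupervised learning has not yet received much attention. This may be mainly attributed to the high representation complexity of graphs, which can be represented by n! equivalent adjacency matrices, where n is the number of nodes. In this work, we address this issue by proposing a permutation-invariant variational autoencoder for graph-structured data. Our proposed model indirectly learns to match the node ordering of the input and output graphs, without imposing a particular node ordering or performing expensive graph matching. We demonstrate the effectiveness of our proposed model on various graph reconstruction and generation tasks and evaluate the expressive power of the extracted representations for downstream graph-level classification and regression.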
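Dynamic graph representation learning is an important task with widespread applications. Previous methods for dynamic graph learning are usually sensitive to noisy graph information such as missing or spurious connections, which can degrade performance and generalization. To overcome this challenge, we propose a Transformer-based dynamic graph learning method named Dynamic Graph Transformer (DGT), with spatial-temporal encoding to effectively learn graph topology and capture implicit links. To improve generalization, we introduce two complementary self-supervised pre-training tasks and show, via an information-theoretic analysis, that jointly optimizing the two pre-training tasks results in a smaller Bayesian error rate. We also propose a temporal-union graph structure and a target-context node sampling strategy for efficient and scalable training. Extensive experiments on real-world datasets show that DGT achieves superior performance compared with several state-of-the-art baselines.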
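Heterogeneous graph convolutional networks have gained great popularity in tackling various network analysis tasks on heterogeneous network data, ranging from link prediction to node classification. However, most existing works ignore the relation heterogeneity of multiplex networks between multi-typed nodes and the differing importance of relations in meta-paths for node embedding, and therefore can hardly capture the heterogeneous structural signals across different relations. To tackle this challenge, this work proposes a Multiplex Heterogeneous Graph Convolutional Network (MHGCN) for heterogeneous network embedding. MHGCN can automatically learn useful heterogeneous meta-path interactions of different lengths in multiplex heterogeneous networks through multi-layer convolution aggregation. In addition, we effectively integrate both multi-relation structural signals and attribute semantics into the learned node embeddings under both unsupervised and semi-supervised learning paradigms. Extensive experiments on five real-world datasets with various network analysis tasks demonstrate the superiority of MHGCN over state-of-the-art embedding baselines on all evaluation metrics.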
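Self-supervised learning (especially contrastive learning) methods on heterogeneous graphs can effectively remove the dependence on supervisory data. Meanwhile, most existing representation learning methods embed heterogeneous graphs into a single geometric space, either Euclidean or hyperbolic. Such a single geometric view is usually not enough to observe the complete picture of heterogeneous graphs because of their rich semantics and complex structures. Based on these observations, this paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL), to better represent heterogeneous graphs when supervisory data is unavailable. GCL views a heterogeneous graph from the Euclidean and hyperbolic perspectives simultaneously, aiming to combine the abilities to model rich semantics and complex structures, which is expected to bring more benefits to downstream tasks. GCL maximizes the mutual information between the two geometric views by contrasting representations at both the local-local and local-global semantic levels. Extensive experiments on four benchmark datasets show that the proposed approach outperforms strong baselines, including both unsupervised and supervised methods, on three tasks: node classification, node clustering, and similarity search.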
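Contrastive learning is an effective unsupervised method in graph representation learning. Recently, contrastive learning based on data augmentation has been extended from images to graphs. However, most prior works are directly adapted from models designed for images. Unlike data augmentation on images, data augmentation on graphs is far less intuitive and makes it much harder to provide high-quality contrastive samples, which are the key to the performance of contrastive learning models. This leaves much room for improvement over existing graph contrastive learning frameworks. In this work, by introducing an adversarial graph view and an information regularizer, we propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL), to extract informative contrastive samples within reasonable constraints. It consistently outperforms current graph contrastive learning methods on node classification tasks over various real-world datasets and further improves the robustness of graph contrastive learning.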
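Graph neural networks have been used for a variety of learning tasks, such as link prediction, node classification, and node clustering. Among them, link prediction is a relatively under-studied graph learning task, and its current state-of-the-art models are based on shallow graph auto-encoder (GAE) architectures with one or two layers. In this paper, we focus on addressing a limitation of current methods for link prediction, which can only use shallow GAEs and variational GAEs, and on creating effective methods to deepen (variational) GAE architectures to achieve stable and competitive performance. Our proposed methods innovatively incorporate standard auto-encoders (AEs) into the architectures of GAEs: the standard AEs are leveraged to learn essential, low-dimensional representations by seamlessly integrating the adjacency information and node features, while the GAEs further build multi-scale low-dimensional representations via residual connections to learn a compact overall embedding for link prediction. Empirically, extensive experiments on various benchmark datasets verify the effectiveness of our methods and demonstrate the competitive performance of our deepened graph models for link prediction. Theoretically, we prove that our deep extensions include multiple polynomial filters of different orders.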