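Semi-supervised classification on graph-structured data has received increasing attention, where labels are available for only a small fraction of the data, as in social networks and citation networks. The problem is challenging because of the irregularity of graphs. Graph convolutional neural networks (GCNs) have recently been proposed to address such problems; they feed the graph topology into the network to guide operations such as graph convolution. However, in most cases where no graph is given, the graph is constructed manually, which tends to be suboptimal. We therefore propose Graph Learning Neural Networks (GLNN), which optimize the graph (specifically, the adjacency matrix) and integrate this optimization into a GCN for semi-supervised node classification. Drawing on spectral graph theory, GLNN combines graph learning and graph convolution in a unified framework. Specifically, we represent the features of social/citation networks as graph signals, and pose the graph learning objective via maximum a posteriori estimation from a graph signal prior, a sparsity constraint, and the properties of a valid adjacency matrix. The optimization objective is then integrated into the GCN loss function, leading to joint learning of the adjacency matrix and high-level features. Experimental results show that the proposed GLNN outperforms state-of-the-art methods on widely adopted social network and citation network datasets.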
translated by 谷歌翻译
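Given the intractability of large-scale heterogeneous information networks (HINs), network embedding, which learns low-dimensional proximity-preserving representations of nodes in a new space, is a natural way to analyze HINs. However, two challenges arise in HIN embedding. (1) Different HIN structures carry different semantic meanings and play different roles in capturing the relationships among nodes in an HIN; how can we learn personalized preferences over different meta-paths for each individual node in an HIN? (2) With the sharp increase in the number of large-scale HINs across web services, how can the embedding of a new node be updated efficiently? To address these challenges, we propose a Hierarchical Attentive Heterogeneous information network Embedding (HAHE) model, which learns personalized meta-path preferences for each node and efficiently updates the embedding of each new node using only its neighbor-node information. The proposed HAHE model extracts the semantic relationships among nodes in the semantic space based on different meta-paths, and adopts a neighborhood attention layer to perform weighted aggregation of neighborhood structure features for each node, so that the embedding of each new node can be updated efficiently. In addition, a meta-path attention layer is used to learn personalized meta-path preferences for each individual node. Extensive experiments on several real-world datasets show that the proposed HAHE model significantly outperforms state-of-the-art methods on various evaluation metrics.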
We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions. Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. In a number of experiments on citation networks and on a knowledge graph dataset we demonstrate that our approach outperforms related methods by a significant margin.
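The layer-wise propagation behind the "localized first-order approximation" mentioned in the abstract above can be sketched in a few lines. This is a minimal, dependency-free illustration that assumes the commonly cited form ReLU(D^{-1/2} (A + I) D^{-1/2} H W); the function names, the toy graph, and the weight values are hypothetical and are not taken from the abstract.

```python
import math

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    a_hat = normalized_adjacency(adj)
    out = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Toy path graph 0 - 1 - 2 with 2-dimensional node features (illustrative values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 0.5]]
hidden = gcn_layer(adj, features, weights)
```

Dense lists are used here only for brevity. The linear scaling in the number of edges that the abstract reports would require a sparse representation of the adjacency matrix.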
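Recently, graph embedding has emerged as an effective approach for graph analysis tasks such as node classification and link prediction. The goal of network embedding is to find low-dimensional representations of graph nodes that preserve the graph structure. Since signal features may exist on the nodes, recent methods such as graph convolutional networks (GCNs) try to incorporate node signals in addition to node relationships. Meanwhile, multi-layer graph analysis has received much attention. However, these recent node embedding methods have not yet been explored for multi-layer networks. In this paper, we study the problem of node embedding in multi-layer graphs and propose a deep method that embeds nodes using both relations (connections between the layers of the graph) and node signals. We evaluate our method on node classification tasks. Experimental results demonstrate the superiority of the method over other multi-layer and single-layer competitors, and show the effect of using cross-layer edges.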
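Complex networks are used as an abstraction for modeling systems in physics, biology, sociology, and other fields. We propose an algorithm, based on fast personalized node ranking and recent advances in deep learning, for learning supervised network embeddings as well as for classifying network nodes directly. Learning from both homogeneous and heterogeneous networks, our algorithm performs strongly against baselines on nine node-classification benchmarks from the domains of molecular biology, finance, social media, and language processing, one of the largest node-classification collections to date. The results are comparable to or better than the current state of the art in terms of speed and predictive accuracy. The embeddings obtained by the proposed algorithm are also a viable option for network visualization.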
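The abstract above does not say how its "personalized node ranking" is computed. As one standard example of such a ranking, the sketch below runs personalized PageRank by power iteration. It is an assumed stand-in, not necessarily the authors' algorithm, and the function name, the damping value, and the toy graph are hypothetical.

```python
def personalized_pagerank(neighbors, source, damping=0.85, iterations=50):
    """Power iteration for PageRank with restarts to a single source node.

    neighbors maps each node to the list of nodes it links to; every node
    is assumed to have at least one outgoing link.
    """
    rank = {node: 0.0 for node in neighbors}
    rank[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in neighbors}
        for node, outs in neighbors.items():
            # Spread the damped rank of this node evenly over its out-links.
            share = damping * rank[node] / len(outs)
            for target in outs:
                nxt[target] += share
        nxt[source] += 1.0 - damping  # restart mass returns to the source
        rank = nxt
    return rank

# Small undirected toy graph written as symmetric adjacency lists.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
scores = personalized_pagerank(graph, source="a")
```

The resulting score vector for a node could then serve as a structural feature for a downstream classifier, which is one plausible reading of how ranking and deep learning are combined in the abstract.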
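Graph convolutional networks (GCNs) are an emerging neural network approach. A GCN obtains a new representation of a node by aggregating the feature vectors of all its neighbors, without considering whether the neighbors or the features are useful. Recent methods improve on this by sampling a fixed-size set of neighbors or by assigning different weights to different neighbors during aggregation, but the features within a feature vector are still treated equally during aggregation. In this paper, we introduce a new convolution operation on regular-size feature maps, constructed from the features of a fixed node bandwidth obtained via sampling, to get first-level node representations, which are then passed to a standard GCN to learn second-level node representations. Experiments show that our method outperforms competing methods on semi-supervised node classification tasks. Furthermore, our method opens new doors for exploring new GCN architectures, particularly deeper GCN models.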
Embedding network data into a low-dimensional vector space has shown promising performance for many real-world applications, such as node classification and entity retrieval. However, most existing methods focus only on leveraging network structure. For social networks, besides the network structure, there also exists rich information about social actors, such as user profiles in friendship networks and textual content in citation networks. This rich attribute information about social actors reveals the homophily effect, which strongly influences the formation of social networks. In this paper, we explore the rich evidence source of attributes in social networks to improve network embedding. We propose a generic Social Network Embedding framework (SNE), which learns representations for social actors (i.e., nodes) by preserving both the structural proximity and the attribute proximity. While the structural proximity captures the global network structure, the attribute proximity accounts for the homophily effect. To justify our proposal, we conduct extensive experiments on four real-world social networks. Compared to state-of-the-art network embedding approaches, SNE can learn more informative representations, achieving substantial gains on the tasks of link prediction and node classification. Specifically, SNE significantly outperforms node2vec with an 8.2% relative improvement on the link prediction task, and a 12.7% gain on the node classification task.
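Social network analysis is an important problem in data mining. A fundamental step in analyzing social networks is to encode network data into low-dimensional representations, i.e., network embeddings, so that the network topology and other attribute information can be effectively preserved. Network representation learning facilitates further applications such as classification, link prediction, anomaly detection, and clustering. In addition, techniques based on deep neural networks have attracted great interest in the past few years. In this survey, we conduct a comprehensive review of the current literature on network representation learning using neural network models. First, we introduce the basic models for learning node representations in homogeneous networks. We also introduce some extensions of the basic models that address more complex scenarios, such as analyzing attributed networks, heterogeneous networks, and dynamic networks. Then, we introduce techniques for embedding subgraphs. After that, we present applications of network representation learning. Finally, we discuss some promising research directions for future work.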
We propose to study the problem of few-shot learning through the prism of inference on a partially observed graphical model, constructed from a collection of input images whose labels can be either observed or not. By assimilating generic message-passing inference algorithms with their neural-network counterparts, we define a graph neural network architecture that generalizes several of the recently proposed few-shot learning models. Besides providing improved numerical performance, our framework is easily extended to variants of few-shot learning, such as semi-supervised or active learning, demonstrating the ability of graph-based models to operate well on 'relational' tasks.
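Convolutional neural networks (CNNs) have achieved great success on grid-like data such as images, but face tremendous challenges in learning from more generic data such as graphs. In CNNs, trainable local filters enable the automatic extraction of high-level features. Computation with filters requires a fixed number of ordered units in the receptive field. However, in generic graphs the number of neighboring units is not fixed, nor are the units ordered, which hinders the application of convolution operations. Here, we address these challenges by proposing the learnable graph convolutional layer (LGCL). LGCL automatically selects a fixed number of neighboring nodes for each feature based on value ranking, in order to transform graph data into grid-like structures in 1-D format, thereby enabling the use of regular convolution operations on generic graphs. To enable model training on large-scale graphs, we propose a sub-graph training method that reduces the excessive memory and computational resource requirements suffered by prior graph convolution methods. Our experimental results on node classification tasks in both transductive and inductive learning settings show that our methods achieve consistently better performance on the Cora, Citeseer, and Pubmed citation networks and on the protein-protein interaction network dataset. Our results also indicate that the proposed methods using the sub-graph training strategy are more efficient than prior approaches.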
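The per-feature neighbor selection that this abstract describes can be illustrated with a short sketch. It assumes that ranking is done independently for each feature column, that the node's own features form the first row of the grid, and that missing neighbors are zero-padded. These details, the function name `k_largest_grid`, and the toy data are assumptions made for illustration.

```python
def k_largest_grid(features, neighbors, node, k):
    """Build a (k + 1) x d grid for one node.

    Row 0 holds the node's own features; rows 1..k hold, for every feature
    column independently, the k largest values among the node's neighbors
    (zero-padded when the node has fewer than k neighbors).
    """
    dim = len(features[node])
    columns = []
    for j in range(dim):
        # Rank the neighbors' values for feature j and keep the top k.
        ranked = sorted((features[n][j] for n in neighbors[node]), reverse=True)[:k]
        ranked += [0.0] * (k - len(ranked))
        columns.append(ranked)
    grid = [list(features[node])]
    grid += [[columns[j][r] for j in range(dim)] for r in range(k)]
    return grid

features = {0: [0.1, 0.9], 1: [0.5, 0.2], 2: [0.7, 0.4], 3: [0.3, 0.8]}
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
grid = k_largest_grid(features, neighbors, node=0, k=2)
# grid == [[0.1, 0.9], [0.7, 0.8], [0.5, 0.4]]
```

Because every node yields a grid of the same shape, a regular 1-D convolution can slide over the rows regardless of how many neighbors the node actually has.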
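Given a graph in which each node has certain attributes associated with it and some nodes have labels associated with them, collective classification (CC) is the task of assigning labels to every unlabeled node using information from the node as well as its neighbors. It is often the case that a node is influenced not only by its immediate neighbors but also by higher-order neighbors, multiple hops away. Recent state-of-the-art CC models learn end-to-end differentiable variations of Weisfeiler-Lehman (WL) kernels to aggregate multi-hop neighborhood information. In this work, we propose a Higher Order Propagation Framework, HOPF, which provides an iterative inference mechanism for these powerful differentiable kernels. This combination of classical inference mechanisms with recent differentiable kernels allows the framework to learn graph convolutional filters that simultaneously exploit the attribute and label information available in the neighborhood. Furthermore, these iterative differentiable kernels can scale to larger hops beyond the memory limitations of existing differentiable kernels. We also show that existing WL-kernel-based models suffer from the problem of node information morphing, in which the information of a node is morphed or overwhelmed by the information of its neighbors when multiple hops are considered. To address this, we propose an instantiation of HOPF, called the NIP models, which preserves node information at every propagation step. The iterative formulation of the NIP models further helps in incorporating distant-hop information concisely as summaries of the inferred labels. We conduct an extensive evaluation across 11 datasets from different domains. We show that existing CC models do not provide consistent performance across datasets, while the proposed NIP model with iterative inference is more robust.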
From a global perspective, the landscape of online social networks is highly fragmented. A large number of online social networks have appeared, providing users with various types of services. The information available in these online social networks falls into diverse categories and can be represented formally as heterogeneous social networks (HSNs). Meanwhile, in the age of online social media, users usually participate in multiple online social networks simultaneously, and these users can act as anchors that align different social networks. Multiple HSNs therefore not only represent the information in each social network, but also fuse information from multiple networks. Formally, online social networks that share common users are called aligned social networks, and the shared users are called anchor users. The heterogeneous information generated by users' social activities in multiple aligned social networks gives social network practitioners and researchers the opportunity to study an individual user's social behavior across multiple social platforms simultaneously. This paper presents a comprehensive survey of the latest research on multiple aligned HSNs based on the broad learning setting, covering five major research tasks: network alignment, link prediction, community detection, information diffusion, and network embedding.
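Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph be present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.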
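The "sample and aggregate" step named in this abstract can be sketched as a plain function of a node's features and its neighborhood. The sketch assumes a mean aggregator followed by concatenation with the node's own features, and it omits the learned weight matrix and nonlinearity. The function name, the sampling-with-replacement fallback, and the toy data are hypothetical choices for illustration.

```python
import random

def sample_and_aggregate(features, neighbors, node, num_samples, rng):
    """Concatenate a node's features with the mean of sampled neighbor features.

    Every node is assumed to have at least one neighbor.
    """
    pool = neighbors[node]
    if len(pool) >= num_samples:
        sampled = rng.sample(pool, num_samples)
    else:
        # Sample with replacement when the neighborhood is too small.
        sampled = [rng.choice(pool) for _ in range(num_samples)]
    dim = len(features[node])
    mean = [sum(features[n][j] for n in sampled) / num_samples for j in range(dim)]
    return list(features[node]) + mean

features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 2.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
rng = random.Random(0)
embedding = sample_and_aggregate(features, neighbors, node=0, num_samples=2, rng=rng)
```

Since the function depends only on features and local neighborhoods, it can be applied to a node that was not present during training, which is the inductive property the abstract emphasizes.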
Opinion spam has become a widespread problem in the online review world, where paid or biased reviewers write fake reviews to elevate or relegate a product (or business) to mislead the consumers for profit or fame. In recent years, opinion spam detection has attracted a lot of attention from both the business and research communities. However, the problem still remains challenging as human labeling is expensive and hence labeled data is scarce, which is needed for supervised learning and evaluation. There exist recent works (e.g., FraudEagle [2], SpEagle [19]) which address the spam detection problem as an unsupervised network inference task on the review network. These methods are also able to incorporate labels (if available), and have been shown to achieve improved performance under the semi-supervised inference setting, in which the labels of a random sample of nodes are consumed. In this work, we address the problem of active inference for opinion spam detection. Active inference is the process of carefully selecting a subset of instances (nodes) whose labels are obtained from an oracle to be used during the (network) inference. Our goal is to employ a label acquisition strategy that selects a given number of nodes (a.k.a. the budget) wisely, as opposed to randomly, so as to improve detection performance significantly over the random selection. Our key insight is to select nodes that (i) exhibit high uncertainty, (ii) reside in a dense region, and (iii) are close-by to other uncertain nodes in the network. Based on this insight, we design a utility measure, called Expected UnCertainty Reach (EUCR), and pick the node with the highest EUCR score at every step iteratively. Experiments on two large real-world datasets from Yelp.com show that our method significantly outperforms random sampling as well as other state-of-the-art active inference approaches.
Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, significant progress has been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development of network embedding methods, and point out future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards, and primarily, we provide a comprehensive and systematic overview of a large number of network embedding methods, covering structure- and property-preserving network embedding methods, network embedding methods with side information, and advanced information preserving network embedding methods. Moreover, several evaluation approaches for network embedding and some useful online resources, including network data sets and software, are reviewed. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.
The proliferation of networked data in various disciplines has motivated a surge of research interest in network or graph mining. Within this area, node classification is a typical learning task that focuses on exploiting node interactions to infer the missing labels of unlabeled nodes in the network. The vast majority of existing node classification algorithms focus on static networks and assume that the whole network structure is readily available before learning begins. However, this is not the case in many real-world scenarios, where new nodes and new links are continuously being added to the network. Considering the streaming nature of networks, we study how to perform online node classification on such streaming networks (a.k.a. online learning on streaming networks). As the existence of noisy links may negatively affect node classification performance, we first present an online network embedding algorithm that alleviates this problem by obtaining the embedding representation of new nodes on the fly. We then feed the learned embedding representation into a novel online soft margin kernel learning algorithm to predict the node labels in a sequential manner. Theoretical analysis is presented to show the superiority of the proposed framework of online learning on streaming networks (OLSN).
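Logistic regression is by far the most widely used classifier in real-world applications. In this paper, we benchmark state-of-the-art active learning methods for logistic regression and discuss and illustrate their underlying characteristics. Experiments are carried out on three synthetic datasets and 44 real-world datasets, providing insight into the behavior of these active learning methods with respect to the area under the learning curve (which plots classification accuracy as a function of the number of queried examples) and their computational cost. Surprisingly, one of the earliest and simplest active learning methods, uncertainty sampling, performs exceptionally well overall. Another notable finding is that, in many cases, random sampling is a rudimentary baseline that individual active learning techniques fail to beat.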
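Uncertainty sampling, the method singled out in this abstract, is simple enough to sketch directly. The example assumes a binary logistic regression model with already-fitted parameters and queries the pool example whose predicted probability is closest to 0.5. The function names, weights, and pool values are hypothetical.

```python
import math

def predict_proba(weights, bias, x):
    """Probability of the positive class under a binary logistic regression model."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def most_uncertain(weights, bias, pool):
    """Return the index of the pool example whose predicted probability is closest to 0.5."""
    return min(range(len(pool)), key=lambda i: abs(predict_proba(weights, bias, pool[i]) - 0.5))

# Illustrative model parameters and unlabeled pool.
weights, bias = [2.0, -1.0], 0.0
pool = [[3.0, 0.0], [0.5, 1.0], [-2.0, 1.0]]
query_index = most_uncertain(weights, bias, pool)
```

In an active learning loop, the selected example would be labeled by an oracle, added to the training set, and the model refitted before the next query.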
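Recently, graph convolutional neural networks (graph CNNs) have been widely used for graph data representation and semi-supervised learning tasks. However, existing graph CNNs generally use a fixed graph, which may not be optimal for semi-supervised learning tasks. In this paper, we propose a novel Graph Learning-Convolutional Network (GLCN) for graph data representation and semi-supervised learning. The aim of GLCN is to learn an optimal graph structure that best serves graph CNNs for semi-supervised learning, by integrating graph learning and graph convolution in a unified network architecture. The main advantage is that in GLCN both given labels and estimated labels are incorporated, which can provide useful "weakly" supervised information to refine (or learn) the graph structure and also to facilitate the graph convolution operation for estimating unknown labels. Experimental results on seven benchmarks show that GLCN significantly outperforms state-of-the-art traditional fixed-structure graph CNNs.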
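Graph similarity search is among the most important graph-based applications, e.g., finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications, but is very costly to compute in practice. Inspired by the recent success of neural network approaches in several graph applications, such as node or graph classification, we propose a novel neural-network-based approach to this classic yet challenging graph problem, aiming to alleviate the computational burden while preserving good performance. The proposed approach, called SimGNN, combines two strategies. First, we design a learnable embedding function that maps every graph into an embedding vector, which provides a global summary of the graph. A novel attention mechanism is proposed to emphasize the important nodes with respect to a specific similarity metric. Second, we design a pairwise node comparison method to supplement the graph-level embeddings with fine-grained node-level information. Our model can be trained in an end-to-end fashion, achieves better generalization on unseen graphs, and in the worst case runs in time quadratic in the number of nodes in the two graphs. Taking GED computation as an example, experimental results on three real graph datasets demonstrate the effectiveness and efficiency of our approach. Specifically, our model achieves a smaller error rate and a large reduction in running time compared with a series of baselines, including several approximation algorithms for GED computation and many existing graph-neural-network-based models. Our study suggests that SimGNN provides a new direction for future research on graph similarity computation and graph similarity search.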
Graph Convolutional Networks (GCNs) have shown significant improvements in semi-supervised learning on graph-structured data. Concurrently, unsupervised learning of graph embeddings has benefited from the information contained in random walks. In this paper, we propose a model: Network of GCNs (N-GCN), which marries these two lines of work. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs which optimizes the classification objective. Our experiments show that our proposed N-GCN model improves state-of-the-art baselines on all of the challenging node classification tasks we consider: Cora, Citeseer, Pubmed, and PPI. In addition, our proposed method has other desirable properties, including generalization to recently proposed semi-supervised learning methods such as GraphSAGE, allowing us to propose N-SAGE, and resilience to adversarial input perturbations.