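Over the past two decades, the link structure of the World Wide Web has been modeled as a directed graph. In this paper, we model the link structure of the Web as a directed hypergraph and develop a PageRank algorithm for this directed hypergraph. Because no directed-hypergraph dataset of the Web is available, we apply the PageRank algorithm to a metabolic network, which is itself a directed hypergraph. Experiments show that our novel PageRank algorithm can be successfully applied to this metabolic network.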
translated by 谷歌翻译
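The abstract does not spell out the random-walk model behind the directed hypergraph PageRank. The sketch below assumes one common choice: each directed hyperedge is a pair (tail set, head set); a surfer at vertex u picks uniformly among the hyperedges whose tail contains u, then jumps to a uniformly chosen vertex of that hyperedge's head, and teleports uniformly with probability 1 - alpha. The function name, the walk model and the toy reaction list are illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def directed_hypergraph_pagerank(n, hyperedges, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a directed hypergraph (illustrative walk model).

    hyperedges: list of (tail, head) pairs, each a list of vertex ids in [0, n).
    Assumed walk: from u, choose a hyperedge with u in its tail uniformly,
    then move to a uniformly chosen vertex of that hyperedge's head.
    """
    out_count = np.zeros(n)
    for tail, _ in hyperedges:
        for u in tail:
            out_count[u] += 1
    P = np.zeros((n, n))
    for tail, head in hyperedges:
        for u in tail:
            for v in head:
                P[u, v] += 1.0 / (out_count[u] * len(head))
    # Vertices with no outgoing hyperedge jump uniformly, so every row sums to 1.
    P[out_count == 0, :] = 1.0 / n
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = alpha * (r @ P) + (1.0 - alpha) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


# Toy metabolic-style hypergraph: each reaction maps substrates (tail) to products (head).
reactions = [([0, 1], [2]), ([2], [3, 4]), ([3], [0]), ([4], [1])]
scores = directed_hypergraph_pagerank(5, reactions)
print(np.round(scores, 4))
```

Building a vertex-level transition matrix keeps the sketch short; for large metabolic networks one would keep the tail/head incidence matrices sparse and never form P densely.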
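To handle irregular data structures, many data scientists have developed graph convolutional neural networks. However, they have focused mainly on deep neural network methods for undirected graphs. In this paper, we introduce novel neural network methods for directed hypergraphs. In other words, we develop not only novel directed hypergraph neural network methods but also novel semi-supervised learning methods based on directed hypergraphs. These methods are used to solve the node classification task on two datasets, Cora and Citeseer. Among the classical graph-based semi-supervised learning methods, the novel directed-hypergraph-based semi-supervised learning methods, and the novel directed hypergraph neural network methods applied to this task, the novel directed hypergraph neural network achieves the highest accuracy.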
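This paper presents a novel version of the hypergraph neural network method, applied to the noisy-label learning problem. First, we apply PCA dimensionality reduction to the feature matrix of the image dataset to reduce the "noise" and redundant features in that matrix. Then, five methods are applied to the noisy-label learning problem: the classical graph-based semi-supervised learning method, the classical hypergraph-based semi-supervised learning method, the graph neural network, the hypergraph neural network, and our proposed hypergraph neural network. The accuracies of these five methods are evaluated and compared. Experimental results show that the hypergraph neural network methods achieve the best performance as the noise level increases. Moreover, the hypergraph neural network methods are at least as good as the graph neural network.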
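A minimal NumPy sketch of the pipeline described above: PCA on the feature matrix, then a hypergraph and one hypergraph-convolution layer. The abstract does not say how the hypergraph is built or which propagation rule the proposed variant uses; as assumptions, each hyperedge here joins a vertex with its k nearest neighbours in PCA space, and the layer uses the standard HGNN operator $D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$. All function names are hypothetical and the features are random stand-ins.

```python
import numpy as np


def pca_reduce(X, d):
    """Project centred features onto the top-d principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def knn_hypergraph(Z, k):
    """Incidence matrix H (vertices x hyperedges): hyperedge j = vertex j plus its k nearest neighbours."""
    n = Z.shape[0]
    dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    H = np.zeros((n, n))
    for j in range(n):
        order = np.argsort(dist[j])
        H[order[order != j][:k], j] = 1.0
        H[j, j] = 1.0
    return H


def hgnn_operator(H, w=None):
    """Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (HGNN propagation matrix)."""
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    dv = H @ w                 # weighted vertex degrees
    de = H.sum(axis=0)         # hyperedge sizes
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    return (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])


def hgnn_layer(X, G, Theta):
    return np.maximum(G @ X @ Theta, 0.0)   # propagate, transform, ReLU


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))          # stand-in for image features
Z = pca_reduce(X, d=5)
G = hgnn_operator(knn_hypergraph(Z, k=3))
out = hgnn_layer(Z, G, rng.normal(size=(5, 4)))
print(out.shape)
```

The operator G is symmetric with largest eigenvalue 1, so repeated layers do not blow up the feature scale.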
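We introduce a novel harmonic analysis for functions defined on the vertices of directed graphs, in which the random walk operator is the cornerstone. As a first step, we take the set of eigenvectors of the random walk operator as a non-orthogonal Fourier-type basis for functions over directed graphs. We obtain a frequency interpretation by linking the variation of these eigenvectors, measured by their Dirichlet energy, to the real part of their associated eigenvalues. From this Fourier basis, we go further and build a multiscale analysis on directed graphs. By extending the approach of Coifman and Maggioni to directed graphs, we propose a redundant wavelet transform and a decimated wavelet transform. Finally, our development of harmonic analysis on directed graphs leads us to consider semi-supervised learning and signal modeling problems on directed graphs, which highlight the efficiency of our framework.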
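The frequency interpretation can be checked numerically. For a random walk matrix $P$ with stationary distribution $\pi$ and an eigenvector $P\phi = \lambda\phi$, stationarity of $\pi$ gives $\mathcal{E}(\phi) = \tfrac12\sum_{i,j}\pi_i P_{ij}|\phi_i-\phi_j|^2 = (1-\operatorname{Re}\lambda)\sum_i \pi_i|\phi_i|^2$. The sketch below verifies this identity on a small strongly connected digraph chosen for illustration; the energy definition used here is the standard one and is assumed rather than quoted from the paper.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()


def dirichlet_energy(phi, P, pi):
    """(1/2) * sum_ij pi_i P_ij |phi_i - phi_j|^2 (phi may be complex)."""
    diff2 = np.abs(phi[:, None] - phi[None, :]) ** 2
    return 0.5 * np.sum(pi[:, None] * P * diff2)


# Strongly connected digraph: 0->1, 0->2, 1->2, 2->3, 3->0.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)   # random walk operator
pi = stationary_distribution(P)
lam, Phi = np.linalg.eig(P)            # non-orthogonal "Fourier" basis

checks = []
for k in range(len(lam)):
    phi = Phi[:, k]
    norm2 = np.sum(pi * np.abs(phi) ** 2)
    checks.append((dirichlet_energy(phi, P, pi), (1.0 - lam[k].real) * norm2))
    print(f"lambda={lam[k]:.3f}  energy={checks[-1][0]:.4f}  (1-Re lambda)*norm2={checks[-1][1]:.4f}")
```

Eigenvalues with real part close to 1 therefore correspond to smooth, low-frequency eigenvectors, even though the eigenvectors themselves are complex and not orthogonal.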
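We study p-Laplacians and spectral clustering for a recently proposed hypergraph model that incorporates edge-dependent vertex weights (EDVWs). These weights can reflect the different importance of vertices within a hyperedge, giving the hypergraph model higher expressivity and flexibility. By constructing submodular EDVW-based splitting functions, we convert hypergraphs with EDVWs into submodular hypergraphs, for which the spectral theory is better developed. In this way, existing concepts and theorems proposed for the submodular hypergraph setting, such as p-Laplacians and Cheeger inequalities, can be directly extended to hypergraphs with EDVWs. For submodular hypergraphs with EDVW-based splitting functions, we propose an efficient algorithm to compute the eigenvector associated with the second smallest eigenvalue of the hypergraph 1-Laplacian. We then use this eigenvector to cluster the vertices, achieving higher clustering accuracy than traditional spectral clustering based on the 2-Laplacian. More broadly, the proposed algorithm works for all submodular hypergraphs that are graph reducible. Numerical experiments using real-world data demonstrate the effectiveness of combining spectral clustering based on the 1-Laplacian with EDVWs.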
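EDVWs give each vertex a weight $\gamma_e(v)$ that depends on the hyperedge $e$. One standard way to see what these weights encode is the EDVW random walk of Chitra and Raphael (2019); treating it as the model meant here is an assumption. From vertex $v$, choose a hyperedge $e \ni v$ with probability $\omega(e)/d(v)$, then choose $u \in e$ with probability $\gamma_e(u)/\delta(e)$. The sketch builds that transition matrix only; it does not implement the paper's splitting functions or its 1-Laplacian algorithm, and the toy weights are made up.

```python
import numpy as np


def edvw_transition_matrix(n, hyperedges, omega, gamma):
    """Random walk with edge-dependent vertex weights.

    hyperedges[e]: list of vertices in hyperedge e
    omega[e]: weight of hyperedge e
    gamma[e][u]: weight of vertex u inside hyperedge e (edge-dependent)
    """
    d = np.zeros(n)                               # d(v) = sum of omega(e) over e containing v
    for e, verts in enumerate(hyperedges):
        for v in verts:
            d[v] += omega[e]
    P = np.zeros((n, n))
    for e, verts in enumerate(hyperedges):
        delta = sum(gamma[e][u] for u in verts)   # delta(e) = sum of gamma_e(u) over u in e
        for v in verts:
            for u in verts:
                P[v, u] += (omega[e] / d[v]) * (gamma[e][u] / delta)
    return P


hyperedges = [[0, 1, 2], [2, 3], [3, 4, 0]]
omega = [1.0, 2.0, 1.0]
gamma = [{0: 1.0, 1: 2.0, 2: 1.0}, {2: 1.0, 3: 3.0}, {3: 1.0, 4: 1.0, 0: 2.0}]
P = edvw_transition_matrix(5, hyperedges, omega, gamma)
print(np.round(P, 3))
```

If every $\gamma_e(u)$ inside a hyperedge is equal, the walk reduces to the usual edge-independent (EIVW) hypergraph random walk.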
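As powerful tools for modeling complex relationships, hypergraphs are gaining popularity in the graph learning community. However, commonly used frameworks in deep hypergraph learning focus on hypergraphs with edge-independent vertex weights (EIVWs), without considering hypergraphs with edge-dependent vertex weights (EDVWs), which have more modeling power. To remedy this, we propose General Hypergraph Spectral Convolution (GHSC), a general learning framework that not only handles both EDVW and EIVW hypergraphs but, more importantly, enables theoretically explicit use of existing powerful graph convolutional neural networks (GCNNs), which greatly eases the design of hypergraph neural networks. In this framework, the graph Laplacian of a given undirected GCNN is replaced with a unified hypergraph Laplacian, which incorporates vertex weight information from a random walk perspective by equating our defined generalized hypergraphs with simple undirected graphs. Extensive experiments in various domains, including social network analysis, visual object classification, and protein learning, demonstrate the state-of-the-art performance of the proposed framework.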
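GHSC's exact unified Laplacian is not given in the abstract. As an illustrative assumption, the sketch below turns any irreducible random walk $P$ (for example an EDVW hypergraph walk) into a symmetric Laplacian using Chung's directed-graph Laplacian, $L = I - \tfrac12(\Pi^{1/2} P \Pi^{-1/2} + \Pi^{-1/2} P^\top \Pi^{1/2})$, and uses it in a GCN-style layer in place of the usual normalized Laplacian. For a walk on an undirected graph this reduces to $I - D^{-1/2} A D^{-1/2}$, which the code checks; function names are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()


def chung_laplacian(P):
    """Symmetric Laplacian of an irreducible random walk (Chung, 2005)."""
    s = np.sqrt(stationary_distribution(P))
    M = s[:, None] * P / s[None, :]      # Pi^{1/2} P Pi^{-1/2}
    return np.eye(P.shape[0]) - 0.5 * (M + M.T)


def gcn_layer(X, L, Theta):
    """GCN-style layer propagating with (I - L) in place of the normalized adjacency."""
    return np.maximum((np.eye(L.shape[0]) - L) @ X @ Theta, 0.0)


# Sanity check on an undirected graph: the result equals the normalized Laplacian.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L_walk = chung_laplacian(A / d[:, None])
L_norm = np.eye(4) - A / np.sqrt(np.outer(d, d))

# A non-reversible walk (e.g. from a hypergraph) still yields a symmetric PSD Laplacian.
rng = np.random.default_rng(0)
P_dir = rng.random((4, 4))
P_dir /= P_dir.sum(axis=1, keepdims=True)
L_dir = chung_laplacian(P_dir)
H1 = gcn_layer(rng.normal(size=(4, 3)), L_dir, rng.normal(size=(3, 2)))
print(np.round(np.linalg.eigvalsh(L_dir), 4))
```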
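Graph neural networks (GNNs) have gained traction for representation learning in various machine learning tasks. However, most existing GNNs that apply neighborhood aggregation usually perform poorly on heterophilic graphs, where adjacent nodes belong to different classes. In this paper, we show that in typical heterophilic graphs the edges may be directed, and whether the edges are treated as directed or simply made undirected greatly affects the performance of GNN models. Moreover, because of the limitations of heterophily, it is highly beneficial for nodes to receive messages from similar nodes beyond their local neighborhood. These observations motivate us to develop a model that adaptively learns the directionality of the graph and exploits the underlying long-distance correlations between nodes. We first generalize the graph Laplacian to digraphs based on the proposed Feature-Aware PageRank algorithm, which simultaneously considers the graph directionality and long-distance feature similarity between nodes. The digraph Laplacian then defines a graph propagation matrix, leading to a model named DiglacianGCN. Based on this, we further leverage the node proximity measured by commute times between nodes in order to preserve the nodes' long-distance correlations at the topology level. Extensive experiments on ten datasets with different levels of homophily demonstrate the effectiveness of our method over existing solutions in the node classification task.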
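Commute times are straightforward to compute on small graphs. For an irreducible random walk $P$, the hitting times to a target $j$ solve $h_j = 0$ and $h_i = 1 + \sum_{k} P_{ik} h_k$, and the commute time is $C(i,j) = H(i,j) + H(j,i)$. DiglacianGCN measures them under its own feature-aware PageRank walk, which this sketch does not reproduce; it accepts any irreducible transition matrix, and the plain PageRank-style matrix in the usage example (hypothetical helper `pagerank_transition`) is only a stand-in.

```python
import numpy as np


def hitting_times_to(P, j):
    """Expected number of steps for the walk P to first reach vertex j from each vertex."""
    n = P.shape[0]
    rest = [i for i in range(n) if i != j]
    Q = P[np.ix_(rest, rest)]            # transitions that avoid j
    h = np.zeros(n)
    h[rest] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return h


def commute_times(P):
    """C[i, j] = H[i, j] + H[j, i]; symmetric even when P is not."""
    H = np.column_stack([hitting_times_to(P, j) for j in range(P.shape[0])])
    return H + H.T


def pagerank_transition(A, alpha=0.85):
    """Plain (not feature-aware) PageRank walk: follow an out-edge or teleport."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    P = np.where(out > 0, A / np.where(out > 0, out, 1.0), 1.0 / n)
    return alpha * P + (1.0 - alpha) / n


# Directed 3-cycle: H(0,1) = 1 and H(1,0) = 2, so C(0,1) = 3.
cycle = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
C_cycle = commute_times(cycle)

A = np.array([[0, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)
C_pr = commute_times(pagerank_transition(A))
print(np.round(C_pr, 2))
```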
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this paper, we propose a novel network embedding method called "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient: it can learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of LINE is available online.
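A compact NumPy sketch of LINE's first-order objective trained with edge sampling and negative sampling. Edges are drawn with probability proportional to their weight, so each SGD update uses a unit-weight gradient (the point of the edge-sampling trick), and negatives come from the $d_v^{3/4}$ noise distribution. Simplifications are assumptions of this sketch: `Generator.choice` stands in for an alias table, only first-order proximity is modelled (second-order LINE keeps separate context vectors), and the hyperparameters are arbitrary.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def line_first_order(edges, weights, n, dim=8, neg=5, steps=20000, lr=0.025, seed=0):
    rng = np.random.default_rng(seed)
    edges = np.asarray(edges)
    w = np.asarray(weights, dtype=float)
    # Noise distribution P_n(v) proportional to d_v^{3/4}.
    deg = np.zeros(n)
    np.add.at(deg, edges[:, 0], w)
    np.add.at(deg, edges[:, 1], w)
    noise = deg ** 0.75
    noise /= noise.sum()
    # Edge sampling: draw all edge indices and negatives up front.
    edge_idx = rng.choice(len(edges), size=steps, p=w / w.sum())
    negatives = rng.choice(n, size=(steps, neg), p=noise)
    U = (rng.random((n, dim)) - 0.5) / dim
    for t in range(steps):
        i, j = edges[edge_idx[t]]
        step = lr * max(1.0 - t / steps, 1e-4)       # linearly decayed learning rate
        grad_i = np.zeros(dim)
        pairs = [(j, 1.0)] + [(u, 0.0) for u in negatives[t]]
        for v, label in pairs:
            g = step * (label - sigmoid(U[i] @ U[v]))   # gradient of the log-sigmoid loss
            grad_i += g * U[v]
            U[v] += g * U[i]
        U[i] += grad_i
    return U


# Two disjoint 5-cliques with unit weights.
edges = [(i, j) for block in (range(5), range(5, 10)) for i in block for j in block if i < j]
U = line_first_order(edges, [1.0] * len(edges), n=10)
print(np.round(U @ U.T, 2))
```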
Graph clustering is a fundamental problem in unsupervised learning, with numerous applications in computer science and in analysing real-world data. In many real-world applications, we find that the clusters have a significant high-level structure. This is often overlooked in the design and analysis of graph clustering algorithms, which make strong simplifying assumptions about the structure of the graph. This thesis addresses the natural question of whether the structure of clusters can be learned efficiently and describes four new algorithmic results for learning such structure in graphs and hypergraphs. All of the presented theoretical results are extensively evaluated on both synthetic and real-world datasets from different domains, including image classification and segmentation, migration networks, co-authorship networks, and natural language processing. These experimental results demonstrate that the newly developed algorithms are practical, effective, and immediately applicable for learning the structure of clusters in real-world data.
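Large graphs commonly appear in social networks, knowledge graphs, recommender systems, the life sciences, and decision-making problems. Summarizing large graphs by their high-level properties helps solve problems in these settings. In spectral clustering, we aim to identify clusters of nodes such that most edges fall within clusters and only a few edges fall between clusters. This task is important for many downstream applications and for exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph's Laplacian matrix (or, equivalently, a singular value decomposition, SVD). The convergence of iterative singular value decomposition methods depends on the characteristics of the spectrum of the given matrix, namely the differences between consecutive eigenvalues. For the Laplacian of a clustered graph, the eigenvalues are non-negative but small (less than $1$), which slows convergence. This paper introduces a practical approach for dilating the spectrum to accelerate SVD solvers and, in turn, spectral clustering. This is achieved via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments show that this approach significantly accelerates convergence, and we explain how it can be parallelized and stochastically approximated to scale with the available compute.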
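The key property behind spectrum dilation is that any polynomial $p(M)$ has the same eigenvectors as $M$, with eigenvalues $p(\mu)$. The sketch below illustrates it with the hypothetical choice $M = I - L_{\mathrm{sym}}/2$ (eigenvalues in $[0,1]$, so the cluster-revealing small Laplacian eigenvalues become the largest) and $p(t) = t^8$, applied by Horner's rule using only matrix-vector products. The paper's actual polynomials and its stochastic approximation are not reproduced.

```python
import numpy as np


def normalized_laplacian(A):
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def poly_matvec(M, coeffs, x):
    """Compute p(M) @ x with p(t) = sum_k coeffs[k] * t**k via Horner's rule."""
    y = coeffs[-1] * x
    for c in reversed(coeffs[:-1]):
        y = M @ y + c * x
    return y


# Two 5-cliques joined by a single edge.
A = np.zeros((10, 10))
for block in (range(5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = 1.0

M = np.eye(10) - normalized_laplacian(A) / 2.0
mu, V = np.linalg.eigh(M)                 # ascending eigenvalues in [0, 1]
coeffs = [0.0] * 8 + [1.0]                # p(t) = t^8
gap_before = mu[-3] / mu[-2]              # ratio that governs subspace-iteration convergence
gap_after = gap_before ** 8
print(f"ratio before: {gap_before:.3f}, after p(t)=t^8: {gap_after:.5f}")
```

Because only products `M @ y` are needed, the same routine works with a sparse matrix and plugs directly into an iterative eigensolver or randomized SVD.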
Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency: given $n$ input points, most kernel-based algorithms need to materialize the full $n \times n$ kernel matrix before performing any subsequent computation, thus incurring $\Omega(n^2)$ runtime. Breaking this quadratic barrier for various problems has therefore been a subject of extensive research efforts. We break the quadratic barrier and obtain $\textit{subquadratic}$ time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from $\textit{weighted vertex}$ and $\textit{weighted edge sampling}$ on kernel graphs, $\textit{simulating random walks}$ on kernel graphs, and $\textit{importance sampling}$ on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in $\textit{sublinear}$ (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a $\textbf{9x}$ decrease in the number of kernel evaluations over baselines for LRA and a $\textbf{41x}$ reduction in the graph size for spectral sparsification.
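The reductions rely on estimating the kernel graph's weighted degrees (row sums of the kernel matrix) without forming the matrix. As a deliberately naive stand-in for the paper's KDE data structures, the sketch below estimates every row sum from $m$ uniformly sampled columns ($O(nm)$ kernel evaluations instead of $n^2$), samples vertices proportionally to the estimates, and takes one exact $O(n)$ random-walk step. The Gaussian kernel, bandwidth, sample sizes and function names are assumptions.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def estimate_degrees(X, m, rng, bandwidth=1.0):
    """Unbiased estimate of each row sum of K from m uniformly sampled columns."""
    n = X.shape[0]
    cols = rng.integers(0, n, size=m)
    return gaussian_kernel(X, X[cols], bandwidth).sum(axis=1) * (n / m)


def sample_vertices(degrees, size, rng):
    """Weighted vertex sampling: P(i) proportional to the (estimated) degree of i."""
    return rng.choice(len(degrees), size=size, p=degrees / degrees.sum())


def random_walk_step(X, i, rng, bandwidth=1.0):
    """One step of the kernel-graph random walk, computed exactly in O(n) here."""
    w = gaussian_kernel(X[i:i + 1], X, bandwidth)[0]
    return rng.choice(len(w), p=w / w.sum())


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
deg_hat = estimate_degrees(X, m=600, rng=rng)
deg_exact = gaussian_kernel(X, X).sum(axis=1)      # O(n^2), only to measure the error
rel_err = np.abs(deg_hat - deg_exact) / deg_exact
picked = sample_vertices(deg_hat, size=5, rng=rng)
nxt = random_walk_step(X, int(picked[0]), rng)
print(f"median relative error: {np.median(rel_err):.3f}")
```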
A current goal in the graph neural network literature is to enable transformers to operate on graph-structured data, given their success on language and vision tasks. Since the transformer's original sinusoidal positional encodings (PEs) are not applicable to graphs, recent work has focused on developing graph PEs, rooted in spectral graph theory or various spatial features of a graph. In this work, we introduce a new graph PE, Graph Automaton PE (GAPE), based on weighted graph-walking automata (a novel extension of graph-walking automata). We compare the performance of GAPE with other PE schemes on both machine translation and graph-structured tasks, and we show that it generalizes several other PEs. An additional contribution of this study is a theoretical and controlled experimental comparison of many recent PEs in graph transformers, independent of the use of edge features.
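Markov chains are a class of probabilistic models that have been widely applied in the quantitative sciences. This is partly due to their versatility, and is compounded by the ease with which they can be probed analytically. This tutorial provides an in-depth introduction to Markov chains and explores their connections to graphs and random walks. We use tools from linear algebra and graph theory to describe the transition matrices of different types of Markov chains, with a particular focus on the properties of the eigenvalues and eigenvectors of these matrices. The results presented are relevant to a number of methods in machine learning and data mining, which we describe at various stages. Rather than being a novel academic study in its own right, this paper presents a collection of known results together with some new concepts. Moreover, the tutorial focuses on offering the reader intuition rather than formal understanding, and assumes only basic exposure to concepts from linear algebra and probability theory. It is therefore accessible to students and researchers from a wide variety of disciplines.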
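A small worked example of facts such a tutorial typically covers: for the random walk $P = D^{-1}A$ on a connected undirected graph, $\pi_i = d_i/\mathrm{vol}(G)$ is stationary, the chain satisfies detailed balance, and $P$ is similar to the symmetric matrix $D^{-1/2} A D^{-1/2}$, so its eigenvalues are real and lie in $[-1, 1]$. The graph is arbitrary.

```python
import numpy as np

# Weighted undirected graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
A = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 3],
              [0, 0, 3, 0]], dtype=float)
d = A.sum(axis=1)
P = A / d[:, None]                   # transition matrix D^{-1} A
pi = d / d.sum()                     # stationary distribution d_i / vol(G)

flow = pi[:, None] * P               # flow[i, j] = pi_i P_ij
S = np.sqrt(d)[:, None] * P / np.sqrt(d)[None, :]   # D^{1/2} P D^{-1/2} = D^{-1/2} A D^{-1/2}
eig_sym = np.linalg.eigvalsh(S)      # real, ascending
eig_P = np.sort(np.linalg.eigvals(P).real)
print("stationary:", np.allclose(pi @ P, pi))
print("detailed balance:", np.allclose(flow, flow.T))
print("eigenvalues of P:", np.round(eig_sym, 4))
```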
Graph embedding algorithms embed a graph into a vector space where the structure and the inherent properties of the graph are preserved. Existing graph embedding methods cannot preserve asymmetric transitivity well, which is a critical property of directed graphs. Asymmetric transitivity depicts the correlation among directed edges, that is, if there is a directed path from u to v, then there is likely a directed edge from u to v. Asymmetric transitivity can help in capturing structures of graphs and recovering from partially observed graphs. To tackle this challenge, we propose the idea of preserving asymmetric transitivity by approximating high-order proximities, which are based on asymmetric transitivity. In particular, we develop a novel graph embedding algorithm, High-Order Proximity preserved Embedding (HOPE for short), which scales to preserving high-order proximities of large-scale graphs and is capable of capturing asymmetric transitivity. More specifically, we first derive a general formulation that covers multiple popular high-order proximity measurements, then propose a scalable embedding algorithm to approximate the high-order proximity measurements based on their general formulation. Moreover, we provide a theoretical upper bound on the RMSE (Root Mean Squared Error) of the approximation. Our empirical experiments on a synthetic dataset and three real-world datasets demonstrate that HOPE approximates the high-order proximities significantly better than state-of-the-art algorithms and outperforms them in graph reconstruction, link prediction, and vertex recommendation tasks.
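HOPE factorizes a high-order proximity matrix $S$ into source and target embeddings with $U^s (U^t)^\top \approx S$, which keeps the asymmetry of directed graphs. The sketch below uses the Katz index $S = (I - \beta A)^{-1}\beta A$, one of the proximities HOPE covers, and a dense SVD; HOPE itself avoids forming $S$ via a generalized SVD, so this dense version is only practical for small graphs. By the Eckart-Young theorem, the Frobenius error of the rank-$d$ approximation equals the norm of the discarded singular values, which the test checks.

```python
import numpy as np


def katz_proximity(A, beta):
    """S = (I - beta*A)^{-1} (beta*A); requires beta < 1 / spectral_radius(A)."""
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if beta * rho >= 1:
        raise ValueError("Katz series diverges: choose beta < 1 / spectral_radius(A)")
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - beta * A, beta * A)


def hope_embed(S, d):
    """Source/target embeddings from a rank-d SVD of S."""
    U, sig, Vt = np.linalg.svd(S)
    root = np.sqrt(sig[:d])
    return U[:, :d] * root, Vt[:d].T * root, sig


# Directed graph: cycle 0->1->2->0 plus a tail 2->3->4.
A = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]:
    A[u, v] = 1.0

S = katz_proximity(A, beta=0.5)
Us, Ut, sig = hope_embed(S, d=2)
err = np.linalg.norm(S - Us @ Ut.T)          # Frobenius norm of the residual
print(f"rank-2 error {err:.4f}, asymmetric S: {not np.allclose(S, S.T)}")
```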
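Understanding how users in a network update their opinions based on their neighbors' opinions has attracted great interest in network science, and a growing body of literature recognizes the importance of this problem. In this research paper, we propose a new dynamic model of opinion formation in directed networks. In this model, each node's opinion is updated as the weighted average of its neighbors' opinions, where the weights represent social influence. We define a new centrality measure as a social influence metric based on both influence and conformity. We evaluate this new approach using two opinion formation models: (i) the DeGroot model and (ii) our own proposed model. Previously published studies do not consider conformity and take only the influence of nodes into account when computing social influence. In our definition, nodes with low in-degree and high out-degree that are connected to nodes with high out-degree and low in-degree have higher centrality. As the main contribution of this study, we propose an algorithm for finding a small subset of nodes in a social network that can have a significant impact on the opinions of the other nodes. Experiments on real-world data show that the proposed algorithm significantly outperforms previously published state-of-the-art methods.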
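The paper's own opinion model and centrality are not specified in enough detail to reproduce here, but the DeGroot baseline is simple: with a row-stochastic influence matrix $W$, opinions update as $x(t+1) = W x(t)$. If $W$ is irreducible and aperiodic, all opinions converge to the consensus $\pi^\top x(0)$, where $\pi$ is the left eigenvector of $W$ for eigenvalue 1, so $\pi_i$ acts as node $i$'s social influence. The weights and initial opinions below are made up.

```python
import numpy as np


def degroot(W, x0, steps=1000):
    """Iterate x(t+1) = W x(t); W[i, j] is the weight node i puts on node j's opinion."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = W @ x
    return x


def influence_vector(W):
    """Left eigenvector of W for eigenvalue 1, normalised to sum to 1."""
    vals, vecs = np.linalg.eig(W.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()


# Directed, strongly connected influence network with self-weights (hence aperiodic).
W = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.3, 0.7, 0.0],
              [0.2, 0.0, 0.4, 0.4],
              [0.6, 0.0, 0.0, 0.4]])
x0 = np.array([1.0, 0.0, 0.5, -1.0])
x_final = degroot(W, x0)
pi = influence_vector(W)
print("consensus:", np.round(x_final, 4), "predicted:", round(float(pi @ x0), 4))
```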
Since the invention of word2vec [28,29], the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk [31] empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE [37], in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) as an extension of LINE, PTE [36] can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec [16] is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of the graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
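For a small context window $T$, NetMF's closed form can be computed directly: $M = \frac{\mathrm{vol}(G)}{bT}\left(\sum_{r=1}^{T} P^r\right) D^{-1}$ with $P = D^{-1}A$ and $b$ negative samples, followed by an SVD of the truncated logarithm $\log(\max(M, 1))$; the embedding is $U_d \Sigma_d^{1/2}$. The sketch below follows that recipe with dense matrices, which is only sensible for small graphs; the parameter values and the toy graph are arbitrary.

```python
import numpy as np


def netmf_small_window(A, dim=2, T=3, b=1.0):
    """NetMF embedding for a small window T using dense matrices."""
    d = A.sum(axis=1)
    vol = d.sum()
    P = A / d[:, None]                    # D^{-1} A
    S = np.zeros_like(P)
    P_power = np.eye(A.shape[0])
    for _ in range(T):
        P_power = P_power @ P
        S += P_power                      # sum_{r=1}^{T} P^r
    M = (vol / (b * T)) * S / d[None, :]  # right-multiply by D^{-1}
    log_M = np.log(np.maximum(M, 1.0))    # truncated logarithm
    U, sig, _ = np.linalg.svd(log_M)
    return U[:, :dim] * np.sqrt(sig[:dim]), M


# Two triangles joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
emb, M = netmf_small_window(A, dim=2, T=3)
print(np.round(emb, 3))
```

For an undirected graph $M$ is symmetric, since each term $(D^{-1}A)^r D^{-1}$ equals its own transpose.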
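Complex networks are graphs representing real-life systems that exhibit unique characteristics not found in purely regular or completely random graphs. Studying such systems is vital but challenging because of the complexity of the underlying processes. Nevertheless, the task has become easier in recent decades thanks to the availability of large amounts of network data. Link prediction in complex networks aims to estimate the likelihood that a link between two nodes is missing from the network. Links may be missing because of imperfect data collection or simply because they have not yet appeared. Discovering new relationships between entities in network data has attracted the attention of researchers in fields such as sociology, computer science, physics, and biology. Most existing research focuses on link prediction in undirected complex networks. However, not all real-life systems can be faithfully represented as undirected networks. This simplifying assumption is often made when link prediction algorithms are applied, but it inevitably leads to a loss of information about the relationships between nodes and to degraded prediction performance. This paper presents link prediction methods designed explicitly for directed networks. They are based on the similarity-popularity paradigm, which has recently proven successful in undirected networks. The proposed algorithms handle the asymmetry in node relationships by modeling it as asymmetry in similarity and popularity. Given the observed network topology, the algorithms approximate hidden similarities as shortest-path distances, using edge weights that capture and factor out the asymmetry of links and the popularity of nodes. The proposed approach is evaluated on real-life networks, and the experimental results demonstrate its effectiveness in predicting missing links across network data of various types and sizes.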
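Social networks are often modeled using signed graphs, where vertices correspond to users and edges carry a sign indicating the interaction between users. The signed graphs that arise often contain a clear community structure, in the sense that the graph can be partitioned into a small number of polarized communities, each defining a sparse cut and indivisible into smaller polarized sub-communities. We provide a local clustering oracle for signed graphs with such a clear community structure that answers membership queries by reading only a small part of the graph. Formally, when the graph has bounded maximum degree and the number of communities is at most $O(\log n)$, then with $\tilde{O}(\sqrt{n}\operatorname{poly}(1/\varepsilon))$ preprocessing time, our oracle can answer each membership query in $\tilde{O}(\sqrt{n}\operatorname{poly}(1/\varepsilon))$ time, and it correctly classifies a $(1-\varepsilon)$-fraction of the vertices with respect to a set of hidden planted ground-truth communities. Our oracle is desirable in applications where clustering information is needed for only a small number of vertices. Previously, such local clustering oracles were known only for unsigned graphs. Our generalization to signed graphs requires a number of new ideas and a novel spectral analysis of the behavior of random walks. We evaluate our algorithm for constructing such an oracle and answering membership queries on both synthetic and real-world datasets, validating its performance in practice.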
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
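A minimal NumPy version of one of the algorithms such a tutorial covers, normalized spectral clustering in the style of Ng, Jordan and Weiss: take the $k$ eigenvectors of $L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2}$ with the smallest eigenvalues, normalize each row to unit length, and run k-means on the rows. The farthest-point k-means initialization, the function name and the toy graph are choices of this sketch.

```python
import numpy as np


def njw_spectral_clustering(A, k, n_iter=50):
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L_sym = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                # ascending eigenvalues
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row-normalize
    # k-means with deterministic farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(U), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Two 4-cliques (vertices 0-3 and 4-7) joined by the edge 3-4.
A = np.zeros((8, 8))
for block in (range(4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0
labels = njw_spectral_clustering(A, k=2)
print(labels)
```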
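Graph representation learning is a rapidly growing field where one of the main goals is to generate meaningful representations of graphs in low-dimensional spaces. The learned embeddings have been successfully applied to various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics, such as prediction accuracy, running time, and scalability. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selection, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organize graph embedding techniques using a taxonomy that includes manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluate these classes of algorithms on node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We designed our experiments on top of the PyTorch Geometric and DGL libraries and ran them on different multicore CPU and GPU platforms. We rigorously scrutinize the performance of the embedding methods under various performance metrics and summarize the results. This paper can therefore serve as a comparative guide to help users select the methods most suitable for their tasks.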