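Dynamic graphs are graphs whose structure changes over time. Despite the benefits of learning vertex representations (i.e., embeddings) for dynamic graphs, existing works merely view a dynamic graph as a sequence of changes in the vertex connections, neglecting the crucial asynchronous nature of such dynamics, in which the evolution of each local structure starts at a different time and lasts for a different duration. To maintain asynchronous structural evolution within the graph, we formulate dynamic graphs as temporal edge sequences associated with the joining time of vertices (ToV) and the timespan of edges (ToE). A time-aware Transformer is then proposed to embed the vertices' dynamic connections and ToEs into the learned vertex representations. Meanwhile, we treat each edge sequence as a whole and embed the ToV of its first vertex to further encode time-sensitive information. Extensive evaluations on several datasets show that our approach outperforms the state of the art in a wide range of graph mining tasks. It is also efficient and scalable for embedding large-scale dynamic graphs.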
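Graph representation learning is a fast-growing field in which one of the primary goals is to generate meaningful representations of graphs in low-dimensional spaces. The learned embeddings have been successfully applied to various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics, such as prediction accuracy, running time, and scalability. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selection, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organize graph embedding techniques using a taxonomy that covers manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluate these classes of algorithms on node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We design our experiments on top of the PyTorch Geometric and DGL libraries and run them on different multicore CPU and GPU platforms. We rigorously scrutinize the performance of the embedding methods under various performance metrics and summarize the results. This paper can thus serve as a comparative guide to help users select the methods best suited to their tasks.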
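Dynamic graph representation learning is an important task with widespread applications. Previous methods for dynamic graph learning are usually sensitive to noisy graph information, such as missing or spurious connections, which can degrade performance and generalization. To overcome this challenge, we propose a Transformer-based dynamic graph learning method named Dynamic Graph Transformer (DGT), which uses spatial-temporal encoding to effectively learn graph topology and capture implicit links. To improve generalization, we introduce two complementary self-supervised pre-training tasks and show, via an information-theoretic analysis, that jointly optimizing the two pre-training tasks results in a smaller Bayesian error rate. We also propose a temporal-union graph structure and a target-context node sampling strategy for efficient and scalable training. Extensive experiments on real-world datasets show that DGT achieves superior performance compared with several state-of-the-art baselines.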
Clustering is a fundamental problem in network analysis that finds closely connected groups of nodes and separates them from other nodes in the graph, while link prediction aims to predict whether two nodes in a network are likely to have a link. These definitions naturally imply that clustering should play a positive role in obtaining accurate link predictions. Yet researchers have long ignored this positive relationship or undermined it through inappropriate use. In this article, we construct a simple but efficient clustering-driven link prediction framework (ClusterLP), with the goal of directly exploiting cluster structures to predict connections between nodes as accurately as possible in both undirected and directed graphs. Specifically, we propose that in undirected graphs links are more easily established between nodes with similar representation vectors and cluster tendencies, while in directed graphs a node more easily points to nodes that are similar to its own representation vector and have greater influence within their own cluster. We customize the implementation of ClusterLP for undirected and directed graphs, respectively, and experimental results on the link prediction task using multiple real-world networks show that our models are highly competitive with existing baseline models. The code for ClusterLP and the baselines we use is available at https://github.com/ZINUX1998/ClusterLP.
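Temporal graphs represent dynamic relationships among entities and occur in many real-life applications, such as social networks, e-commerce, communication, road networks, and biological systems. In terms of generative modeling and representation learning, they call for research beyond the work on static graphs. In this survey, we comprehensively review recently proposed neural approaches to time-dependent graph representation learning and generative modeling for temporal graphs. Finally, we identify the weaknesses of existing approaches and discuss the research proposal of our recently published paper TIGGER [24].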
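Great research efforts have been devoted to exploiting deep neural networks for stock prediction. However, long-range dependencies and the chaotic property remain two major issues that lower the performance of state-of-the-art deep learning models in forecasting future price trends. In this study, we propose a novel framework to address both issues. Specifically, building on the idea of transforming time series into complex networks, we convert market price series into graphs. Structural information, namely the associations among temporal points and the node weights, is then extracted from the mapped graphs to resolve the problems of long-range dependencies and the chaotic property. We use graph embeddings to represent the associations among temporal points as inputs to the prediction model. Node weights are used as a priori knowledge to enhance the learning of temporal attention. The effectiveness of the proposed framework is validated on real-world stock data, where our approach obtains the best performance among several state-of-the-art benchmarks. Moreover, in the trading simulations we conducted, our framework also obtains the highest cumulative profits. Our results supplement the existing applications of complex network methods in the financial realm and provide insightful implications for investment applications concerned with decision support in financial markets.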
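The abstract does not say which time-series-to-graph mapping is used. As a minimal sketch of the general idea, the code below builds a natural visibility graph, one common mapping assumed here purely for illustration: each time point becomes a node, and two points are linked when the straight line between them passes above every intermediate point. The function name and the toy price series are hypothetical.

```python
def natural_visibility_graph(series):
    """Map a time series to a set of undirected edges (i, j) with i < j.

    Points i and j are linked when every intermediate point k lies strictly
    below the straight line joining (i, series[i]) and (j, series[j]).
    This is the natural visibility criterion, an assumed choice of mapping.
    """
    n = len(series)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:  # adjacent points are always linked (no intermediates)
                edges.add((i, j))
    return edges


# Hypothetical toy price series; real inputs would be market prices.
toy_prices = [1.0, 3.0, 2.0, 4.0]
toy_edges = natural_visibility_graph(toy_prices)
```

In this toy series the peak at index 1 blocks the view from index 0 to the later points, so index 0 is linked only to its immediate neighbor. Graph embeddings and node weights, as described in the abstract, would then be derived from such a graph.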
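Learning low-dimensional topological representations of a network in dynamic environments is attracting much attention because many real-world networks evolve over time. The main and common objective of Dynamic Network Embedding (DNE) is to efficiently update node embeddings while preserving network topology at each time step. The idea of most existing DNE methods is to capture the topological changes at or around the most affected nodes (instead of all nodes) and update the node embeddings accordingly. Unfortunately, although this kind of approximation can improve efficiency, it cannot effectively preserve the global topology of a dynamic network at each time step, because it does not consider the inactive sub-networks that receive accumulated topological changes propagated via high-order proximity. To tackle this challenge, we propose a novel node selection strategy that diversely selects representative nodes over a network, coordinated with a new incremental learning paradigm for Skip-Gram based embedding. Extensive experiments show that GloDyNE, with only a small fraction of nodes selected, can achieve superior or comparable performance w.r.t. state-of-the-art DNE methods in three typical downstream tasks. In particular, GloDyNE significantly outperforms other methods in the graph reconstruction task, which demonstrates its ability to preserve global topology. The source code is available at https://github.com/houchengbin/glodyne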
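Exposure to crime and violence can harm individuals' quality of life and the economic growth of communities. Given the rapid development of machine learning, there is a growing need to explore automated solutions for crime prevention. With the increasing availability of fine-grained urban and public service data, there has been a recent surge in fusing such cross-domain information to facilitate crime prediction. By capturing information about social structure, environment, and crime trends, existing machine learning predictive models have explored dynamic crime patterns from different views. However, these approaches mostly convert such multi-source knowledge into implicit, latent representations (e.g., learned embeddings of districts), so it remains a challenge to investigate how explicit factors affect the occurrence of crimes behind the scenes. In this paper, we present a Spatial-Temporal Metapath guided Explainable Crime prediction (STMEC) framework that captures dynamic patterns of crime behavior and explicitly characterizes how environmental and social factors interact to produce the forecasts. Extensive experiments show the superiority of STMEC over other advanced spatiotemporal models, especially in predicting felonies (e.g., robberies and assaults with dangerous weapons).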
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this paper, we propose a novel network embedding method called "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient: it is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of LINE is available online.
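The abstract does not spell out the edge-sampling algorithm. One plausible reading, assumed here, is that edges are drawn with probability proportional to their weights, so that each sampled edge can be treated as an unweighted edge in a stochastic gradient step instead of scaling the gradient by a possibly very large weight. The sketch below illustrates only that sampling step, using the standard library; `sample_edges` and the toy edge list are hypothetical names and data, not part of the LINE code base.

```python
import random


def sample_edges(weighted_edges, num_samples, seed=0):
    """Draw edges with probability proportional to their weights.

    weighted_edges: list of (source, target, weight) tuples.
    Returns a list of (source, target) pairs; each sampled edge can then
    be handled as if it had unit weight.
    """
    rng = random.Random(seed)
    edges = [(u, v) for u, v, _ in weighted_edges]
    weights = [w for _, _, w in weighted_edges]
    return rng.choices(edges, weights=weights, k=num_samples)


# Hypothetical toy network with one very heavy edge.
toy_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 98.0)]
samples = sample_edges(toy_edges, num_samples=10_000)
heavy_share = samples.count(("c", "d")) / len(samples)
```

Here `heavy_share` should come out close to 0.98, reflecting the edge's share of the total weight. A production implementation would likely use a constant-time sampler such as the alias method rather than `random.choices`.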
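Graph-structured data are often dynamic in nature in many real-world applications, e.g., through the addition of links and nodes. Recent years have witnessed increasing attention to dynamic graph neural networks for modeling such graph data. Almost all existing approaches assume that when a new link is built, the embeddings of the neighbor nodes should be updated by learning the temporal dynamics to propagate the new information. However, such approaches suffer from a limitation: if the node introduced by a new connection contains noisy information, propagating its knowledge to other nodes is unreliable and can even lead to the collapse of the model. In this paper, we propose AdaNet, a robust knowledge adaptation framework via reinforcement learning for dynamic graph neural networks. In contrast to previous approaches, which immediately update the embeddings of the neighbor nodes once a new link is added, AdaNet attempts to adaptively determine which nodes should be updated because of the new link. Considering that the decision whether to update the embedding of one neighbor node has a great impact on other neighbor nodes, we formulate the selection of nodes to update as a sequential decision problem and address it via reinforcement learning. In this way, we can adaptively propagate knowledge to other nodes to learn robust node embeddings. To the best of our knowledge, our approach constitutes the first attempt to explore robust knowledge adaptation via reinforcement learning for dynamic graph neural networks. Extensive experiments on three benchmark datasets demonstrate that AdaNet achieves state-of-the-art performance. In addition, we perform experiments that add different degrees of noise to the datasets, quantitatively and qualitatively illustrating the robustness of AdaNet.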
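Learning representations of nodes in a low-dimensional space is a crucial task with many interesting applications in network analysis, including link prediction, node classification, and visualization. Two popular approaches to this problem are matrix factorization and random walk-based models. In this paper, we aim to bring together the best of both worlds for learning node representations. In particular, we propose a weighted matrix factorization model that encodes random walk-based information about the nodes of the network. The benefit of this novel formulation is that it enables us to use kernel functions without realizing the exact proximity matrix, which enhances the expressiveness of existing matrix factorization methods and alleviates their computational complexity. We extend the approach with a multiple kernel learning formulation that provides the flexibility of learning the kernel as a linear combination of a dictionary of kernels in a data-driven fashion. We perform an empirical evaluation on real-world networks, showing that the proposed model outperforms baseline node embedding algorithms in downstream machine learning tasks.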
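The graph is a universal data structure that is widely used to organize real-world data. Various real-world networks, such as transportation networks and social and academic networks, can be represented as graphs. Recent years have witnessed rapid development in representing the vertices of a network in a low-dimensional vector space, referred to as network representation learning. Representation learning can facilitate the design of new algorithms on graph data. In this survey, we conduct a comprehensive review of the current literature on network representation learning. Existing algorithms can be categorized into three groups: shallow embedding models, heterogeneous network embedding models, and graph neural network based models. We review state-of-the-art algorithms for each category and discuss the essential differences between them. One advantage of the survey is that we systematically study the theoretical foundations underlying the different categories of algorithms, which offers deep insights for better understanding the development of the network representation learning field.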
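Many real-world relational systems, such as social networks and biological systems, contain dynamic interactions. When learning dynamic graph representations, it is essential to employ both sequential temporal information and geometric structure. Mainstream work achieves topological embedding via message passing networks (e.g., GCN, GAT). Temporal evolution, on the other hand, is conventionally expressed via memory units (e.g., LSTM or GRU) that provide convenient information filtering through a gate mechanism. However, because of its overly complicated encoding, such a design prevents large-scale input sequences. This work learns from the philosophy of self-attention and proposes an efficient spectral-based neural unit that exploits informative long-range temporal interactions. The resulting spectral window unit (SWINIT) model predicts scalable dynamic graphs with assured efficiency. The architecture is assembled from a few simple, effective computational blocks: randomized SVD, MLP, and graph framelet convolution. The SVD plus MLP module encodes the long- and short-term feature evolution of dynamic graph events. A fast framelet graph transform in the framelet convolution embeds the structural dynamics. Both strategies enhance the model's capacity for scalable analysis. In particular, the iterative SVD approximation shrinks the computational complexity of attention for a dynamic graph with N edges and d edge features, and the multiscale transform of the framelet convolution allows sufficient scalability in network training. Our SWINIT achieves state-of-the-art performance on a variety of online continuous-time dynamic graph learning tasks, while the number of its learnable parameters is reduced by up to seven times compared with baseline methods.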
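We study the large-scale network embedding problem, which aims to learn low-dimensional latent representations for network mining applications. Recent research in network embedding has led to significant progress, such as DeepWalk, LINE, NetMF, and NetSMF. However, the huge size of many real-world networks makes it computationally expensive to learn network embeddings from the entire network. In this work, we present a novel network embedding method called "NES", which learns network embeddings from a small representative subgraph. NES leverages theory from graph sampling to efficiently construct a smaller representative subgraph that can be used to make inferences about the full network, significantly improving the efficiency of embedding learning. NES then efficiently computes the network embedding from this representative subgraph. Extensive experiments on networks of various scales and types demonstrate that, compared with well-known methods, NES achieves comparable performance with a significant efficiency advantage.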
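Network representation learning (NRL) methods have received significant attention over the past few years thanks to their success in several graph analysis problems, including node classification, link prediction, and clustering. Such methods aim to map each vertex of the network into a low-dimensional space in a way that preserves the structural information of the network. Of particular interest are methods based on random walks; these methods transform the network into a collection of node sequences and learn node representations by predicting the context of each node within a sequence. In this paper, we introduce a general framework that uses topic-based information to enhance the node embeddings acquired by random walk-based approaches. Similar to the concept of topical word embeddings in natural language processing, the proposed model first assigns each node to a latent community with the help of various statistical graph models and community detection methods, and then learns enhanced topic-aware representations. We evaluate our methodology on two downstream tasks: node classification and link prediction. The experimental results demonstrate that by incorporating node and community embeddings, we are able to outperform widely known baseline NRL models.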
Network embedding is an important method to learn low-dimensional representations of vertices in networks, aiming to capture and preserve the network structure. Almost all existing network embedding methods adopt shallow models. However, since the underlying network structure is complex, shallow models cannot capture the highly non-linear network structure, resulting in sub-optimal network representations. Therefore, how to find a method that can effectively capture the highly non-linear network structure while preserving the global and local structure is an open yet important problem. To solve this problem, in this paper we propose a Structural Deep Network Embedding method, namely SDNE. More specifically, we first propose a semi-supervised deep model, which has multiple layers of non-linear functions and is thereby able to capture the highly non-linear network structure. Then we propose to exploit the first-order and second-order proximity jointly to preserve the network structure. The second-order proximity is used by the unsupervised component to capture the global network structure, while the first-order proximity is used as the supervised information in the supervised component to preserve the local network structure. By jointly optimizing them in the semi-supervised deep model, our method can preserve both the local and global network structure and is robust to sparse networks. Empirically, we conduct experiments on five real-world networks, including a language network, a citation network, and three social networks. The results show that, compared to the baselines, our method can reconstruct the original network significantly better and achieves substantial gains in three applications, i.e., multi-label classification, link prediction, and visualization.
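To make the two proximity notions concrete, here is a minimal sketch under commonly used definitions, which the abstract itself does not state: first-order proximity is taken to be the edge weight between two vertices, and second-order proximity the similarity of their neighborhood vectors. Cosine similarity is an illustrative choice made here, and the function names and toy adjacency matrix are hypothetical; this is not SDNE's loss function.

```python
import math


def first_order_proximity(adj, i, j):
    """Direct connection strength: the edge weight between vertices i and j."""
    return adj[i][j]


def second_order_proximity(adj, i, j):
    """Similarity of neighborhoods: cosine similarity of adjacency rows i and j.

    Cosine similarity is an assumed measure chosen for illustration.
    """
    dot = sum(a * b for a, b in zip(adj[i], adj[j]))
    norm_i = math.sqrt(sum(a * a for a in adj[i]))
    norm_j = math.sqrt(sum(b * b for b in adj[j]))
    if norm_i == 0 or norm_j == 0:  # an isolated vertex has no neighborhood
        return 0.0
    return dot / (norm_i * norm_j)


# Hypothetical toy graph: vertices 0 and 1 are not linked to each other,
# but both are linked to vertices 2 and 3.
toy_adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
```

In this toy graph, vertices 0 and 1 have zero first-order proximity but share identical neighborhoods, so their second-order proximity is maximal. This is the kind of pair that a model relying on first-order proximity alone would place far apart.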
We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection.
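The walk-generation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, parameters, and toy graph are hypothetical, and the language-model training that would consume the walks (for example a Skip-Gram model) is omitted.

```python
import random


def truncated_random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate fixed-length random walks; each walk plays the role of a sentence.

    adj: dict mapping each node to a list of its neighbors.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # start nodes are visited in a new order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: the walk is cut short
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy undirected graph stored as adjacency lists.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = truncated_random_walks(toy_graph, walk_length=5, walks_per_node=2)
```

Each element of `walks` is a list of node identifiers that a word-embedding model can consume exactly as it would a sentence of words, which is the analogy the abstract draws.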
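The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet it has not achieved competitive performance on popular leaderboards for graph-level prediction compared with mainstream GNN variants. It therefore remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture and attains excellent results on a broad range of graph representation learning tasks, especially on the recent OGB Large-Scale Challenge. Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding the structural information of a graph into the model. To this end, we propose several simple yet effective structural encoding methods to help Graphormer better model graph-structured data. In addition, we mathematically characterize the expressive power of Graphormer and show that, with our ways of encoding the structural information of graphs, many popular GNN variants can be covered as special cases of Graphormer.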
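The abstract does not name the structural encodings. As a hedged sketch of what structural information about a graph might look like before it is fed to a Transformer, the code below computes two simple signals, assumed here for illustration: all-pairs shortest-path distances (a pairwise signal that could bias attention between two nodes) and node degrees (a per-node signal). The function names and toy graph are hypothetical.

```python
from collections import deque


def shortest_path_distances(adj):
    """All-pairs hop distances via breadth-first search from every node.

    adj: dict mapping each node to a list of its neighbors.
    Returns dist[source][target]; unreachable targets are simply absent.
    """
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen
    return dist


def node_degrees(adj):
    """Per-node degree, a simple measure of how central a node is."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


# Hypothetical toy path graph 0 - 1 - 2 - 3.
toy_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
spd = shortest_path_distances(toy_graph)
degrees = node_degrees(toy_graph)
```

In a model, such distances and degrees would typically index learnable embeddings rather than be used as raw numbers; that step is omitted here.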
GPS trajectories are the essential foundation for many trajectory-based applications, such as travel time estimation, traffic prediction, and trajectory similarity measurement. Most applications require a large number of high-sample-rate trajectories to achieve good performance. However, many real-life trajectories are collected at a low sample rate due to energy concerns or other constraints. In this paper, we study the task of trajectory recovery as a means of increasing the sample rate of low-sample-rate trajectories. Currently, most existing works on trajectory recovery follow a sequence-to-sequence paradigm, with an encoder to encode a trajectory and a decoder to recover real GPS points in the trajectory. However, these works ignore the topology of the road network and only use grid information or raw GPS points as input. Therefore, the encoder model is not able to capture rich spatial information about the GPS points along the trajectory, making the predictions less accurate and spatially inconsistent. In this paper, we propose a road network enhanced transformer-based framework, namely RNTrajRec, for trajectory recovery. RNTrajRec first uses a graph model, namely GridGNN, to learn the embedding features of each road segment. It then develops a spatial-temporal transformer model, namely GPSFormer, to learn rich spatial and temporal features, along with a Sub-Graph Generation module to capture the spatial features of each GPS point in the trajectory. Finally, it forwards the outputs of the encoder model into a multi-task decoder model to recover the missing GPS points. Extensive experiments on three large-scale real-life trajectory datasets confirm the effectiveness of our approach.
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.