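Road network and trajectory representation learning are essential for traffic systems, since the learned representations can be used directly in various downstream tasks (e.g., traffic speed inference and travel time estimation). However, most existing methods only contrast within the same scale, i.e., they treat road networks and trajectories separately, and thereby ignore valuable inter-relations. In this paper, we aim to propose a unified framework that jointly learns road network and trajectory representations end-to-end. We design domain-specific augmentations for road-road contrast and trajectory-trajectory contrast respectively: a road segment is paired with its contextual neighbors, and a trajectory with its replaced and dropped alternatives. On top of this, we further introduce a road-trajectory cross-scale contrast that bridges the two scales by maximizing the total mutual information. Unlike existing cross-scale contrastive learning methods on graphs, which only contrast a graph with the nodes belonging to it, the contrast between road segments and trajectories is carefully tailored through novel positive sampling and adaptive weighting strategies. We conduct careful experiments on two real-world datasets with four downstream tasks, which demonstrate improved performance and effectiveness. The code is available at https://github.com/mzy94/jclrnt.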
Translated by Google Translate
GPS trajectories are the essential foundation for many trajectory-based applications, such as travel time estimation, traffic prediction, and trajectory similarity measurement. Most applications require a large number of high-sample-rate trajectories to achieve good performance. However, many real-life trajectories are collected at a low sample rate due to energy concerns or other constraints. In this paper, we study the task of trajectory recovery as a means of increasing the sample rate of low-sample-rate trajectories. Currently, most existing works on trajectory recovery follow a sequence-to-sequence paradigm, with an encoder to encode a trajectory and a decoder to recover the real GPS points in the trajectory. However, these works ignore the topology of the road network and only use grid information or raw GPS points as input. Therefore, the encoder is not able to capture rich spatial information of the GPS points along the trajectory, making the predictions less accurate and less spatially consistent. In this paper, we propose a road-network-enhanced transformer-based framework, namely RNTrajRec, for trajectory recovery. RNTrajRec first uses a graph model, namely GridGNN, to learn the embedding features of each road segment. It next develops a spatial-temporal transformer model, namely GPSFormer, to learn rich spatial and temporal features, along with a Sub-Graph Generation module to capture the spatial features of each GPS point in the trajectory. It finally forwards the outputs of the encoder into a multi-task decoder to recover the missing GPS points. Extensive experiments based on three large-scale real-life trajectory datasets confirm the effectiveness of our approach.
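Self-supervised learning (especially contrastive learning) methods on heterogeneous graphs can effectively remove the dependence on supervised data. Meanwhile, most existing representation learning methods embed heterogeneous graphs into a single geometric space, either Euclidean or hyperbolic. Such a single geometric view is usually insufficient to observe the complete picture of a heterogeneous graph, owing to its rich semantics and complex structure. Based on these observations, this paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL), to better represent heterogeneous graphs when supervised data is unavailable. GCL views a heterogeneous graph from the Euclidean and hyperbolic perspectives simultaneously, aiming to combine the abilities to model rich semantics and complex structure, which is expected to bring more benefits to downstream tasks. GCL maximizes the mutual information between the two geometric views by contrasting representations at the local-local and local-global semantic levels. Extensive experiments on four benchmark datasets show that the proposed method outperforms strong baselines, including both unsupervised and supervised methods, on three tasks: node classification, node clustering, and similarity search.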
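Pre-training trajectory embeddings is a fundamental and critical procedure in spatial-temporal trajectory mining and benefits a wide range of downstream tasks. The key to generating effective trajectory embeddings is to extract high-level travel semantics from trajectories, including movement patterns and travel purposes, while taking the trajectories' long-term spatial-temporal correlations into account. Despite existing efforts, significant challenges remain in pre-training trajectory embeddings. First, commonly used generative pretext tasks are not suitable for extracting high-level semantics from trajectories. Second, existing data augmentation methods are ill-suited to trajectory datasets. Third, current encoder designs fail to fully incorporate the long-term spatial-temporal correlations hidden in trajectories. To address these challenges, we propose a novel Contrastive Spatial-Temporal Trajectory Embedding (CSTTE) model for learning comprehensive trajectory embeddings. CSTTE adopts the contrastive learning framework so that its pretext task is robust to noise. A specially designed data augmentation method for trajectories is coupled with the contrastive pretext task to preserve the high-level travel semantics. We also build an efficient spatial-temporal trajectory encoder to efficiently and comprehensively model the long-term spatial-temporal correlations in trajectories. Extensive experiments on two downstream tasks and three real-world datasets demonstrate the superiority of our model compared with existing trajectory embedding methods.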
We introduce a self-supervised approach for learning node- and graph-level representations by contrasting structural views of graphs. We show that, unlike visual representation learning, increasing the number of views to more than two or contrasting multi-scale encodings does not improve performance, and that the best performance is achieved by contrasting encodings from first-order neighbors and a graph diffusion. We achieve new state-of-the-art results in self-supervised learning on 8 out of 8 node and graph classification benchmarks under the linear evaluation protocol. For example, on the Cora (node) and Reddit-Binary (graph) classification benchmarks, we achieve 86.8% and 84.5% accuracy, which are 5.5% and 2.4% relative improvements over the previous state of the art. When compared to supervised baselines, our approach outperforms them on 4 out of 8 benchmarks.
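The graph diffusion view mentioned above can be illustrated with Personalized PageRank (PPR). The following is a minimal sketch, not the authors' implementation: the abstract does not name the diffusion kernel, so the choice of PPR, the column-normalized transition matrix, the teleport probability `alpha`, and the function name `ppr_diffusion` are all assumptions made for illustration.

```python
def ppr_diffusion(adj, alpha=0.15, iterations=50):
    """Approximate the PPR diffusion matrix S = alpha * (I - (1 - alpha) * T)^-1.

    adj is a dense adjacency matrix (list of lists) with no isolated nodes.
    T = A D^-1 is the column-stochastic random-walk matrix (an assumption;
    other normalizations are possible).
    """
    n = len(adj)
    # Column sums of the adjacency matrix (node degrees for undirected graphs).
    degree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / degree[j] for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    diffusion = identity
    for _ in range(iterations):
        # Fixed-point iteration: S <- alpha * I + (1 - alpha) * T S.
        product = [
            [sum(trans[i][k] * diffusion[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        diffusion = [
            [alpha * identity[i][j] + (1 - alpha) * product[i][j] for j in range(n)]
            for i in range(n)
        ]
    return diffusion


# Path graph 0 - 1 - 2 - 3: nodes 0 and 3 are not adjacent,
# yet the diffusion matrix assigns them a positive weight.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
diffused = ppr_diffusion(path)
```

The diffused matrix is dense, so it exposes multi-hop structure that the first-order adjacency view lacks; contrasting the two gives the encoder a local view and a global view of the same graph.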
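Deep learning on graphs has attracted significant interest recently. However, most works have focused on (semi-)supervised learning, resulting in shortcomings including heavy label reliance, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL), which extracts informative knowledge through well-designed pretext tasks without relying on manual labels, has become a promising and trending learning paradigm for graph data. Unlike SSL in other domains such as computer vision and natural language processing, SSL on graphs has its own background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of existing approaches that apply SSL techniques to graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of the pretext tasks, we divide these approaches into four categories: generation-based, auxiliary-property-based, contrast-based, and hybrid approaches. We further describe the applications of graph SSL across various research fields and summarize the commonly used datasets, evaluation benchmarks, performance comparisons, and open-source code of graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.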
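Although graph representation learning (GRL) has made significant progress, extracting and embedding the rich topological structure and feature information in an adequate way remains a challenge. Most existing methods focus on local structure and fail to fully incorporate the global topological structure. To this end, we propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method to fully capture the structural information of graphs. Specifically, to reduce the uncertainty and misinformation of the original graph, we construct a feature graph as a complementary view via the k-nearest neighbor method. The feature graph can be used to contrast at the node level to capture local relations. In addition, we retain the global topological structure information by maximizing the mutual information (MI) between the whole graph and the feature embeddings, which can theoretically be reduced to exchanging the feature embeddings of the feature graph and the original graph to reconstruct themselves. Extensive experiments show that our method achieves quite superior performance on the semi-supervised node classification task and excellent robustness under noise perturbation of the graph structure or node features.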
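The k-nearest-neighbor feature graph described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not specify the similarity measure, so cosine similarity is assumed here, and `knn_feature_graph` is a hypothetical helper name rather than anything from the SPGRL code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_feature_graph(features, k):
    """Return the undirected edge set linking each node to its k most similar nodes."""
    n = len(features)
    edges = set()
    for i in range(n):
        # Rank all other nodes by similarity to node i, most similar first.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(features[i], features[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges


# Two pairs of nearly parallel feature vectors.
node_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
feature_edges = knn_feature_graph(node_features, k=1)
```

Because the edges depend only on node features, the resulting graph is unaffected by noisy or missing links in the original topology, which is what makes it useful as a complementary view.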
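Graph representation learning (GRL) is critical for analyzing graph-structured data. However, most existing graph neural networks (GNNs) rely heavily on labeling information, which is usually expensive to obtain in the real world. Existing unsupervised GRL methods suffer from certain limitations, such as a heavy reliance on monotone contrastiveness and limited scalability. To overcome these problems, in light of recent advances in graph contrastive learning, we introduce a novel self-supervised graph representation learning algorithm, namely G-Zoom, which learns node representations by leveraging the proposed adjusted zooming scheme. Specifically, this mechanism enables G-Zoom to explore and extract self-supervision signals from a graph at multiple scales: micro (i.e., node level), meso (i.e., neighborhood level), and macro (i.e., subgraph level). First, we generate two augmented views of the input graph via two different graph augmentations. Then, we progressively establish three different contrasts at the above three scales, from the node level through the neighborhood level to the subgraph level, where we maximize the agreement between graph representations across scales. While valuable clues can be extracted from a given graph at the micro and macro perspectives, the neighborhood-level contrast, based on our adjusted zooming scheme, offers a customizable option for manually choosing an optimal viewpoint between the micro and macro perspectives to better understand the graph data. Additionally, to make our model scalable to large graphs, we employ a parallel graph diffusion approach to decouple model training from the graph size. We conduct extensive experiments on real-world datasets, and the results show that our proposed model consistently outperforms state-of-the-art methods.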
Graph Contrastive Learning (GCL) has recently drawn much research interest for learning generalizable node representations in a self-supervised manner. In general, the contrastive learning process in GCL is performed on top of the representations learned by a graph neural network (GNN) backbone, which transforms and propagates the node contextual information based on its local neighborhoods. However, nodes sharing similar characteristics may not always be geographically close, which poses a great challenge for unsupervised GCL efforts due to their inherent limitations in capturing such global graph knowledge. In this work, we address their inherent limitations by proposing a simple yet effective framework: Simple Neural Networks with Structural and Semantic Contrastive Learning (S^3-CL). Notably, by virtue of the proposed structural and semantic contrastive learning algorithms, even a simple neural network can learn expressive node representations that preserve valuable global structural and semantic patterns. Our experiments demonstrate that the node representations learned by S^3-CL achieve superior performance on different downstream tasks compared with the state-of-the-art unsupervised GCL methods. Implementation and more experimental details are publicly available at https://github.com/kaize0409/S-3-CL.
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct contrastive samples by adding perturbations to the graph structure or node attributes. Although they achieve impressive results, they are rather blind to the wealth of prior information available: as the degree of perturbation applied to the original graph increases, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both kinds of prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes is maintained and is less sensitive to perturbations of different degrees. Experimental results on various benchmark datasets verify the effectiveness of our algorithm compared with supervised and unsupervised models.
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and then maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes, a crucial component in CL, remains rarely explored. We argue that data augmentation schemes should preserve the intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for the topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to force the model to recognize underlying semantic information. We perform extensive node classification experiments on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
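To make the idea of centrality-aware topology augmentation concrete, the sketch below assigns each edge a drop probability that shrinks as the edge becomes more central. It is an illustration under assumptions rather than the paper's exact procedure: degree is used as the centrality measure, an edge's score is the log of the mean degree of its endpoints, and the names `edge_drop_probabilities`, `p_e` (overall drop rate), and `p_tau` (probability cap) are hypothetical.

```python
import math


def edge_drop_probabilities(edges, p_e=0.3, p_tau=0.7):
    """Map each undirected edge (u, v) to a drop probability.

    Edges between high-degree nodes get low probabilities (kept more often);
    edges between low-degree nodes get high probabilities, capped at p_tau.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Edge centrality score: log of the mean endpoint degree (assumed form).
    score = {(u, v): math.log((degree[u] + degree[v]) / 2) for u, v in edges}
    s_max = max(score.values())
    s_mean = sum(score.values()) / len(score)

    if s_max == s_mean:
        # All edges are equally central: fall back to uniform dropping.
        return {edge: min(p_e, p_tau) for edge in score}

    # Normalize so the most central edge has probability 0.
    return {
        edge: min((s_max - s) / (s_max - s_mean) * p_e, p_tau)
        for edge, s in score.items()
    }


# Node 0 is a hub; edge (3, 4) hangs off the periphery.
graph_edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
drop_prob = edge_drop_probabilities(graph_edges)
```

Sampling each edge independently with these probabilities yields augmented views that tend to keep the structurally important connections, in contrast to uniform edge dropping, which treats all edges alike.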
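Estimated time of arrival (ETA) prediction, also known as travel time estimation, is a fundamental task for a wide range of intelligent transportation applications, such as navigation, route planning, and ride-hailing services. To accurately predict the travel time of a route, it is essential to take into account both contextual and predictive factors, such as spatial-temporal interaction, driving behavior, and traffic congestion propagation inference. The ETA prediction models previously deployed at Baidu Maps have addressed the factors of spatial-temporal interaction (ConSTGAT) and driving behavior (SSML). In this work, we focus on modeling traffic congestion propagation patterns to improve ETA performance. Modeling traffic congestion propagation patterns is challenging: it requires accounting for the regions of influence as they change over time, and for the cumulative effect of delay variations over time caused by traffic events on the road network. In this paper, we present a practical industrial-grade ETA prediction framework named DuETA. Specifically, we construct a congestion-sensitive graph based on the correlations of traffic patterns, and we develop a route-aware graph transformer to directly learn the long-distance correlations of road segments. This design enables DuETA to capture the interactions between pairs of road segments that are spatially distant but highly correlated in their traffic conditions. Extensive experiments are conducted on large-scale real-world datasets collected from Baidu Maps. The experimental results show that ETA prediction can benefit significantly from the learned traffic congestion propagation patterns. In addition, DuETA has already been deployed in production at Baidu Maps, serving billions of requests every day. This demonstrates that DuETA is an industrial-grade and robust solution for large-scale ETA prediction services.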
Existing graph contrastive learning methods rely on augmentation techniques based on random perturbations (e.g., randomly adding or dropping edges and nodes). Nevertheless, altering certain edges or nodes can unexpectedly change the graph characteristics, and choosing the optimal perturbing ratio for each dataset requires onerous manual tuning. In this paper, we introduce Implicit Graph Contrastive Learning (iGCL), which utilizes augmentations in the latent space learned from a Variational Graph Auto-Encoder by reconstructing graph topological structure. Importantly, instead of explicitly sampling augmentations from latent distributions, we further propose an upper bound for the expected contrastive loss to improve the efficiency of our learning algorithm. Thus, graph semantics can be preserved within the augmentations in an intelligent way without arbitrary manual design or prior human knowledge. Experimental results on both graph-level and node-level tasks show that the proposed method achieves state-of-the-art performance compared to other benchmarks, where ablation studies in the end demonstrate the effectiveness of modules in iGCL.
Contrastive learning methods based on the InfoNCE loss are popular in node representation learning tasks on graph-structured data. However, their reliance on data augmentation and their quadratic computational complexity might lead to inconsistency and inefficiency problems. To mitigate these limitations, in this paper, we introduce a simple yet effective contrastive model named Localized Graph Contrastive Learning (Local-GCL for short). Local-GCL consists of two key designs: 1) We fabricate the positive examples for each node directly using its first-order neighbors, which frees our method from the reliance on carefully designed graph augmentations; 2) To improve the efficiency of contrastive learning on graphs, we devise a kernelized contrastive loss, which can be approximately computed in linear time and space complexity with respect to the graph size. We provide theoretical analysis to justify the effectiveness and rationality of the proposed methods. Experiments on datasets with different scales and properties demonstrate that, in spite of its simplicity, Local-GCL achieves quite competitive performance in self-supervised node representation learning tasks on graphs.
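The first design above, using first-order neighbors as positives, can be sketched with a plain InfoNCE loss. Note what this sketch is not: it computes the exact loss in quadratic time and does not implement the kernelized linear-time approximation the abstract describes. The function name `neighbor_infonce`, the cosine similarity, and the `temperature` value are assumptions for illustration.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in vector))
    return [a / norm for a in vector]


def neighbor_infonce(embeddings, neighbors, temperature=0.5):
    """Average InfoNCE loss where each node's positives are its graph neighbors.

    embeddings: list of vectors, one per node.
    neighbors: dict mapping a node index to a list of neighbor indices.
    Every other node acts as a candidate in the softmax denominator.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity to all other nodes (O(n) per node).
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        for j in neighbors.get(i, []):
            total += log_denominator - logits[j]  # -log softmax of the positive
            count += 1
    if count == 0:
        raise ValueError("at least one node must have a neighbor")
    return total / count


# Embeddings that agree with the graph (0-1 and 2-3 linked) give a low loss.
node_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
loss = neighbor_infonce(node_embeddings, {0: [1], 1: [0], 2: [3], 3: [2]})
```

Since positives come straight from the adjacency structure, no augmented views need to be generated; the remaining cost is the denominator over all nodes, which is the part a linear-time approximation would target.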
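People and vehicle trajectories embody important information about transportation infrastructure, and trajectory similarity computation is a functionality in many real-world applications involving trajectory data analysis. Recently, deep-learning-based trajectory similarity techniques have offered improved efficiency and adaptability over traditional similarity techniques. Nevertheless, existing trajectory similarity learning proposals emphasize spatial similarity over temporal similarity, making them suboptimal for time-aware analyses. To this end, we propose ST2Vec, a trajectory-representation-learning-based architecture that considers fine-grained spatial and temporal correlations between pairs of trajectories for spatio-temporal similarity learning in road networks. To the best of our knowledge, this is the first deep learning proposal for spatio-temporal trajectory similarity analytics. Specifically, ST2Vec encompasses three phases: (i) training data preparation, where representative training samples are selected; (ii) spatial and temporal modeling, which encodes the spatial and temporal characteristics of trajectories and for which a generic temporal modeling module (TMM) is designed; and (iii) spatio-temporal co-attention fusion (STCF), where a unified fusion (UF) approach is developed to help generate unified spatio-temporal trajectory embeddings that capture the spatio-temporal similarity relations between trajectories. Further, inspired by the curriculum concept, ST2Vec employs curriculum learning for model optimization to improve both convergence and effectiveness. An experimental study offers evidence that ST2Vec substantially outperforms all state-of-the-art competitors in terms of effectiveness, efficiency, and scalability, while showing low parameter sensitivity and good model robustness.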
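Graph similarity learning refers to computing a similarity score between two graphs, which is required in many realistic applications, such as visual tracking, graph classification, and collaborative filtering. As most existing graph neural networks yield effective representations of a single graph, little effort has been made to jointly learn the representations of two graphs and compute their similarity score. In addition, existing unsupervised graph similarity learning methods are mainly clustering-based, which ignores the valuable information embodied in graph pairs. To this end, we propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning, in order to compute the similarity between any two input graph objects. Specifically, we generate two augmented views for each graph in a pair. Then, we employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning. The former enforces the consistency of node representations across the two views. The latter is used to identify node differences between different graphs. Finally, we transform node representations into graph-level representations via pooling operations for graph similarity computation. We have evaluated CGMN on eight real-world datasets, and the experimental results show that the proposed approach is superior to state-of-the-art methods on graph similarity learning downstream tasks.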
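Although machine learning on hypergraphs has attracted considerable attention, most works have focused on (semi-)supervised learning, which may cause heavy labeling costs and poor generalization. Recently, contrastive learning has emerged as a successful unsupervised representation learning method. Despite the prosperous development of contrastive learning in other domains, contrastive learning on hypergraphs remains little explored. In this paper, we propose Tricon (Tri-directional Contrastive Learning), a general framework for contrastive learning on hypergraphs. Its main idea is tri-directional contrast; specifically, it aims to maximize, across two augmented views, the agreement (a) between the same node, (b) between the same group of nodes, and (c) between each group and its members. Together with simple but surprisingly effective data augmentation and negative sampling schemes, these three forms of contrast enable Tricon to capture both microscopic and mesoscopic structural information in node embeddings. Our extensive experiments using 13 baseline approaches, five datasets, and two tasks demonstrate the effectiveness of Tricon; most noticeably, Tricon consistently outperforms not only unsupervised competitors but also (semi-)supervised competitors, mostly by significant margins, for node classification.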
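Graphs are a universal data structure, widely used to organize real-world data. Various real-world networks, such as transportation networks and social and academic networks, can be represented by graphs. Recent years have witnessed the rapid development of methods for representing the vertices of a network in a low-dimensional vector space, referred to as network representation learning. Representation learning can facilitate the design of new algorithms on graph data. In this survey, we conduct a comprehensive review of the current literature on network representation learning. Existing algorithms can be categorized into three groups: shallow embedding models, heterogeneous network embedding models, and graph-neural-network-based models. We review state-of-the-art algorithms for each category and discuss the essential differences between them. One advantage of this survey is that we systematically study the theoretical foundations underlying the different categories of algorithms, which offers deep insights for better understanding the development of the network representation learning field.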
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
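Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. In practice, however, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning constructs an instance discrimination task that pulls together positive pairs (augmentation pairs of the same graph) and pushes apart negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, since for a query its negatives are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue: the negatives are likely to share the same semantic structure as the query, leading to performance degradation. To mitigate this sampling bias issue, in this paper we propose a Prototypical Graph Contrastive Learning (PGCL) approach. Specifically, PGCL models the underlying semantic structure of the graph data by clustering semantically similar graphs into the same group, and simultaneously encourages clustering consistency for different augmentations of the same graph. Then, given a query, it performs negative sampling by drawing graphs from clusters that differ from the query's cluster, which ensures the semantic difference between the query and its negative samples. Moreover, for a query, PGCL further reweights its negative samples based on the distance between their prototypes (cluster centroids) and the query's prototype, such that negatives at a moderate prototype distance receive relatively large weights. This reweighting strategy is shown to be more effective than uniform sampling. Experimental results on various graph benchmarks demonstrate the advantages of PGCL over state-of-the-art methods. The code is publicly available at https://github.com/ha-lins/pgcl.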