Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains a challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm, PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure and node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We deploy PinSage at Pinterest and train it on 7.5 billion examples on a graph with 3 billion nodes representing pins and boards, and 18 billion edges. According to offline metrics, user studies and A/B tests, PinSage generates higher-quality recommendations than comparable deep learning and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date, and it paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.
Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions. [Footnotes: * The first two authors made equal contributions. 1. While it is common to refer to these data structures as social or biological networks, we use the term graph to avoid ambiguity with neural network terminology.]
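The sample-and-aggregate idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain mean aggregator with no learned weights or nonlinearity, a toy adjacency dictionary, and hypothetical helper names (`sample_neighbors`, `mean_aggregate`, `embed`). The point it shows is that the embedding is a function of features and sampled neighbors, so it can be evaluated for a node that was never seen before.

```python
import random

def sample_neighbors(adj, node, k, rng):
    """Uniformly sample k neighbors of `node` with replacement (hypothetical helper)."""
    neighbors = adj[node]
    if not neighbors:
        return []
    return [rng.choice(neighbors) for _ in range(k)]

def mean_aggregate(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def embed(adj, features, node, k=3, depth=1, rng=None):
    """Concatenate a node's own representation with the mean of its sampled
    neighbors' representations, recursing `depth` hops. Assumption: a real
    model would apply learned weight matrices and a nonlinearity at each hop."""
    rng = rng or random.Random(0)
    if depth == 0:
        return list(features[node])
    self_vec = embed(adj, features, node, k, depth - 1, rng)
    sampled = sample_neighbors(adj, node, k, rng)
    if sampled:
        neigh_vec = mean_aggregate(
            [embed(adj, features, nb, k, depth - 1, rng) for nb in sampled]
        )
    else:
        neigh_vec = [0.0] * len(self_vec)  # isolated node: no neighbor signal
    return self_vec + neigh_vec

# Toy graph: "a" is linked to "b" and "c"; "d" is isolated.
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 1.0], "d": [1.0, 1.0]}
print(embed(adj, features, "a"))  # [1.0, 0.0, 0.0, 1.0]
```

Because nothing is stored per node, adding a new node only requires its feature vector and neighbor list; no retraining of a per-node embedding table is involved in this sketch.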
Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
Learning vector representations (a.k.a. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user's (or an item's) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. In this work, we propose to integrate the user-item interactions, more specifically the bipartite graph structure, into the embedding process. We develop a new recommendation framework, Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in the user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec [40] and Collaborative Memory Network [5]. Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. Codes are available at https://github.com/xiangwang1223/neural_graph_collaborative_filtering. CCS Concepts: Information systems → Recommender systems. [Footnote: * In the version published in ACM Digital Library, we find some small bugs; the bugs do not change the comparison results and the empirical findings. In this latest version, we update and correct the experimental results (i.e., the preprocessing of the Yelp2018 dataset and the ndcg metric). All updates are highlighted in footnotes.]
Pre-publication draft of a book to be published by Morgan & Claypool publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
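Graph representation learning is a fast-growing field in which a primary goal is to generate meaningful representations of graphs in lower-dimensional spaces. The learned embeddings have been successfully applied to various prediction tasks, such as link prediction, node classification, clustering, and visualization. The collective effort of the graph learning community has delivered hundreds of methods, but no single method excels under all evaluation metrics, such as prediction accuracy, running time, and scalability. This survey aims to evaluate all major classes of graph embedding methods by considering algorithmic variations, parameter selections, scalability, hardware and software platforms, downstream ML tasks, and diverse datasets. We organize graph embedding techniques using a taxonomy that includes methods from manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks. We evaluate these classes of algorithms on node classification, link prediction, clustering, and visualization tasks using widely used benchmark graphs. We design our experiments on top of the PyTorch Geometric and DGL libraries and run them on different multicore CPU and GPU platforms. We rigorously scrutinize the performance of embedding methods under various performance metrics and summarize the results. Thus, this paper can serve as a comparative guide to help users choose the methods that are most suitable for their tasks.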
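Cold start is an essential and persistent problem in recommender systems. State-of-the-art solutions rely on training hybrid models for both cold-start and existing users/items, based on auxiliary information. Such a hybrid model would compromise the performance of existing users/items, which might make these solutions inapplicable in real-world recommender systems where the experience of existing users/items must be guaranteed. Meanwhile, graph neural networks (GNNs) have been shown to perform warm (non-cold-start) recommendations effectively. However, they have never been applied to handle the cold-start problem in a user-item bipartite graph. This is a challenging but rewarding task, since cold-start users/items have no links. Besides, it is nontrivial to design an appropriate GNN that conducts cold-start recommendations while maintaining the performance for existing users/items. To bridge the gap, we propose a tailored GNN-based framework (GPatch) that contains two separate but correlated components. First, an efficient GNN architecture, GWarmer, is designed to model the warm users/items. Second, we construct correlated Patching Networks to simulate and patch GWarmer by conducting cold-start recommendations. Experiments on benchmark and large-scale commercial datasets demonstrate that GPatch is significantly superior in providing recommendations for both existing and cold-start users/items.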
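Graph embedding methods, including traditional shallow models and deep graph neural networks (GNNs), have led to promising applications. Nevertheless, shallow models, especially random-walk-based algorithms, fail to adequately exploit neighbor proximity in sampled subgraphs or sequences due to their optimization paradigm. GNN-based algorithms suffer from insufficient utilization of high-order information and easily cause over-smoothing problems when too many layers are stacked, which may deteriorate the recommendation of low-degree (long-tail) items and limit expressiveness and scalability. In this paper, we propose a novel framework, SAC (Spatial Autoregressive Coding), to solve the above problems in a unified way. To adequately leverage neighbor proximity and high-order information, we design a novel spatial autoregressive paradigm. Specifically, we first randomly mask multi-hop neighbors and embed the target node by integrating all other surrounding neighbors with explicit multi-hop attention. Then we reinforce the model to learn a neighbor-predictive coding for the target node by contrasting the coding with the masked neighbors' embedding, equipped with a new hard negative sampling strategy. To learn the minimal sufficient representation for the target-to-neighbor prediction task and remove the redundancy of neighbors, we devise a Neighbor Information Bottleneck that maximizes the mutual information between the target predictive coding and the masked neighbors' embedding while simultaneously constraining the mutual information between the coding and the surrounding neighbors' embedding. Experimental results on public recommendation datasets and a real-scenario web-scale dataset, Douyin-Friend-Recommendation, demonstrate the superiority of SAC over state-of-the-art methods.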
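Temporal graphs represent the dynamic relationships among entities and occur in many real-life applications such as social networks, e-commerce, communication, road networks, and biological systems. They necessitate research beyond the work on static graphs in terms of their generative modeling and representation learning. In this survey, we comprehensively review the neural time-dependent graph representation learning and generative modeling approaches recently proposed for handling temporal graphs. Finally, we identify the weaknesses of existing approaches and discuss the research proposal of our recently published paper TIGGER [24].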
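In recent years, owing to their outstanding performance in graph representation learning, graph neural network (GNN) techniques have gained considerable interest in many real-world scenarios, such as recommender systems and social networks. In recommender systems, the main challenge is to learn effective user/item representations from their interactions. However, the many publications that use GNNs for recommender systems are difficult to compare, because they differ in datasets and evaluation metrics. Furthermore, many of them only provide a demo that runs experiments on small datasets, which is far from applicable to real-world recommender systems. To address this problem, we introduce Graph4Rec, a universal toolkit that unifies the paradigm for training GNN models into the following parts: graph input, random walk generation, ego graph generation, pair generation, and GNN selection. With this training pipeline, one can easily build one's own GNN model with a few configurations. In addition, we develop a large-scale graph engine and a parameter server to support distributed GNN training. We conduct systematic and comprehensive experiments to compare the performance of different GNN models in several scenarios at different scales. Extensive experiments are conducted to identify the key components of GNNs. We also try to figure out how sparse and dense parameters affect the performance of GNNs. Finally, we investigate methods including negative sampling, ego graph construction order, and warm start strategy to find a more effective and efficient practice for GNNs on recommender systems. Our toolkit is based on PGL (https://github.com/paddlePaddle/pgl), and the code is open-sourced at https://github.com/paddlepaddle/pgl/tree/main/apps/graph4rec.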
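Two of the pipeline stages named in the abstract above, random walk generation and pair generation, can be sketched in a few lines. This is an illustration only and does not use the Graph4Rec or PGL API: the function names (`random_walk`, `generate_pairs`), the uniform walk, and the symmetric context window are all assumptions chosen for clarity.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes; stops early at a dead end."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def generate_pairs(walk, window):
    """Emit (source, context) training pairs for nodes that co-occur
    within `window` steps of each other on the walk."""
    pairs = []
    for i, src in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((src, walk[j]))
    return pairs

# Toy user-item bipartite graph (users u*, items i*).
adj = {"u1": ["i1", "i2"], "u2": ["i1"], "i1": ["u1", "u2"], "i2": ["u1"]}
walk = random_walk(adj, "u1", length=5, rng=random.Random(0))
pairs = generate_pairs(walk, window=1)
```

In a full pipeline the pairs would serve as positive examples, with negatives drawn by a separate sampler; that step, and the ego graph generation that feeds the GNN encoder, are left out of this sketch.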
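Knowledge graphs are generally incorporated into recommender systems to improve overall performance. Due to the generality and scale of the knowledge graph, most knowledge relationships are not helpful for a given target user-item prediction. To exploit the knowledge graph to capture target-specific knowledge relationships in recommender systems, we need to distill the knowledge graph to retain the useful information and refine the knowledge to capture users' preferences. To address these issues, we propose the Knowledge-aware Conditional Attention Network (KCAN), an end-to-end model that incorporates the knowledge graph into a recommender system. Specifically, we first use a knowledge-aware attention propagation scheme to obtain node representations that capture the global semantic similarity on the user-item network and the knowledge graph. Then, given a target, i.e., a user-item pair, we automatically distill the knowledge graph into a target-specific subgraph based on the knowledge-aware attention. Afterward, by applying conditional attention aggregation on the subgraph, we refine the knowledge graph to obtain target-specific node representations. We can therefore gain both representability and personalization, which improves overall performance. Experimental results on real-world datasets demonstrate the effectiveness of our framework over state-of-the-art algorithms.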
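User-item interactions in recommendations can be naturally denoted as a user-item bipartite graph. Given the success of graph neural networks (GNNs) in graph representation learning, GNN-based CF methods have been proposed to advance recommender systems. These methods often make recommendations based on the learned user and item embeddings. However, we found that they do not perform well on sparse user-item graphs, which are quite common in real-world recommendations. Therefore, in this work, we introduce a novel perspective for building GNN-based CF methods for recommendation, which leads to the proposed framework Localized Graph Collaborative Filtering (LGCF). One key advantage of LGCF is that it does not need to learn an embedding for each user and item, which is challenging in sparse scenarios. Instead, LGCF aims at encoding useful CF information into a localized graph and making recommendations based on that graph. Extensive experiments on various datasets validate the effectiveness of LGCF, especially in sparse scenarios. Furthermore, empirical results demonstrate that LGCF provides complementary information to embedding-based CF models, which can be utilized to boost recommendation performance.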
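Graph neural network (GNN) based methods have saturated the field of recommender systems. The gains of these systems have been significant, showing the advantages of interpreting data through a network structure. However, despite the noticeable benefits of using graph structures in recommendation tasks, this representational form has also bred new challenges that exacerbate the complexity of mitigating algorithmic bias. When GNNs are integrated into downstream tasks, such as recommendation, bias mitigation can become even more difficult. Furthermore, the intractability of applying existing fairness-promotion methods to large, real-world datasets places even more serious constraints on mitigation attempts. Our work sets out to fill this gap for downstream recommendation tasks by taking an existing method for promoting individual fairness on graphs and extending it to support mini-batch, or sub-sample based, training. We evaluate two popular GNN methods: Graph Convolutional Network (GCN), which trains on the entire graph, and GraphSAGE, which uses probabilistic random walks to create subgraphs for mini-batch training, and we assess the effects of sub-sampling on individual fairness. We implement an individual fairness notion called REDRESS, proposed by Dong et al., which uses rank optimization to learn individually fair node, or item, embeddings. We empirically show on two real-world datasets that GraphSAGE is able to achieve not just comparable accuracy but also improved fairness compared with the GCN model. These findings have consequences for individual fairness promotion, GNNs, and, downstream, recommender systems, showing that mini-batch training facilitates individual fairness promotion by allowing local nuance to guide the process of fairness promotion in representation learning.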
Graph Neural Networks (GNNs) have become increasingly important in recent years due to their state-of-the-art performance on many important downstream applications. Existing GNNs have mostly focused on learning a single node representation, even though a node often exhibits polysemous behavior in different contexts. In this work, we develop a persona-based graph neural network framework called PersonaSAGE that learns multiple persona-based embeddings for each node in the graph. Such disentangled representations are more interpretable and useful than a single embedding. Furthermore, PersonaSAGE learns the appropriate set of persona embeddings for each node in the graph, and every node can have a different number of assigned persona embeddings. The framework is flexible, and its general design makes the learned embeddings widely applicable across domains. We use publicly available benchmark datasets to evaluate our approach against a variety of baselines. The experiments demonstrate the effectiveness of PersonaSAGE for a variety of important tasks, including link prediction, where we achieve an average gain of 15%, while remaining competitive for node classification. Finally, we also demonstrate the utility of PersonaSAGE with a case study for personalized recommendation of different entity types in a data management platform.
Graph Convolution Network (GCN) has become the new state-of-the-art for collaborative filtering. Nevertheless, the reasons for its effectiveness in recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses on GCN, which was originally designed for graph classification tasks and is equipped with many neural network operations. However, we empirically find that the two most common designs in GCNs, feature transformation and nonlinear activation, contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, including only the most essential component in GCN, neighborhood aggregation, for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. Such a simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.0% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF), a state-of-the-art GCN-based recommender model, under exactly the same experimental setting. Further analyses are provided towards the rationality of the simple LightGCN from both analytical and empirical perspectives. Our implementations are available in both TensorFlow
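The two ingredients named in this abstract, linear propagation on the user-item graph and a weighted sum over layers, can be sketched as follows. This is a minimal pure-Python illustration, not the released implementation. The abstract does not state the normalization or the layer weights, so the symmetric 1/sqrt(deg(u) * deg(i)) coefficient and the uniform weights 1/(K+1) used here are assumptions, as are the function names.

```python
import math

def propagate(adj, emb):
    """One linear propagation step: each node becomes a normalized sum of its
    neighbors' embeddings (no feature transformation, no nonlinearity).
    Assumption: symmetric normalization 1 / sqrt(deg(node) * deg(neighbor))."""
    out = {}
    for node, neighbors in adj.items():
        acc = [0.0] * len(emb[node])
        for nb in neighbors:
            w = 1.0 / math.sqrt(len(neighbors) * len(adj[nb]))
            for d, value in enumerate(emb[nb]):
                acc[d] += w * value
        out[node] = acc
    return out

def lightgcn_embeddings(adj, emb0, num_layers, weights=None):
    """Final embedding = weighted sum of the embeddings from layers 0..num_layers.
    Assumption: uniform weights 1 / (num_layers + 1) unless given explicitly."""
    if weights is None:
        weights = [1.0 / (num_layers + 1)] * (num_layers + 1)
    layers = [emb0]
    for _ in range(num_layers):
        layers.append(propagate(adj, layers[-1]))
    final = {}
    for node in emb0:
        dim = len(emb0[node])
        final[node] = [
            sum(w * layer[node][d] for w, layer in zip(weights, layers))
            for d in range(dim)
        ]
    return final

# Toy graph with one user "u" and one item "i" joined by a single interaction.
# `adj` must be symmetric and contain every node that appears in `emb0`.
adj = {"u": ["i"], "i": ["u"]}
emb0 = {"u": [1.0, 0.0], "i": [0.0, 1.0]}
final = lightgcn_embeddings(adj, emb0, num_layers=1)
```

The only trainable quantities in such a model would be the layer-0 embeddings; a preference score for a user-item pair is then typically taken as the inner product of the two final embeddings.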
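Graph neural networks (GNNs) have been widely applied to recommendation tasks and have achieved very appealing performance. However, most GNN-based recommendation methods suffer from data sparsity in practice. Meanwhile, pre-training techniques have achieved great success in mitigating data sparsity in various domains such as natural language processing (NLP) and computer vision (CV). Thus, graph pre-training has great potential to alleviate data sparsity in GNN-based recommendation. However, pre-training GNNs for recommendation faces unique challenges. For example, user-item interaction graphs in different recommendation tasks have distinct sets of users and items, and they often have different properties. Therefore, the successful mechanisms commonly used in NLP and CV to transfer knowledge from pre-training tasks to downstream tasks, such as sharing learned embeddings or feature extractors, are not directly applicable to existing GNN-based recommendation models. To tackle these challenges, we carefully design an adaptive graph pre-training framework for localized collaborative filtering (ADAPT). It does not require transferring user/item embeddings, and it is able to capture both the common knowledge across different graphs and the uniqueness of each graph. Extensive experimental results demonstrate the effectiveness and superiority of ADAPT.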
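Graph Convolutional Networks (GCNs) have been widely applied in recommender systems for their representation learning capability on user and item embeddings. However, GCNs are vulnerable to noisy and incomplete graphs, which are common in the real world, due to their recursive message propagation mechanism. In the literature, some works propose removing the feature transformation during message propagation, but this makes them unable to effectively capture the graph structural features. Moreover, they model users and items in the Euclidean space, which has been shown to have high distortion when modeling complex graphs, further degrading the capability to capture graph structural features and leading to sub-optimal performance. To this end, in this paper, we propose a simple yet effective Quaternion Graph Convolutional Network (QGCN) recommendation model. In the proposed model, we utilize the hyper-complex quaternion space to learn user and item representations and perform feature transformation, to improve both performance and robustness. Specifically, we first embed all users and items into the quaternion space. Then, we introduce quaternion embedding propagation layers with quaternion feature transformation to perform message propagation. Finally, we combine the embeddings generated at each layer with a mean pooling strategy to obtain the final embeddings for recommendation. Extensive experiments on three public benchmark datasets demonstrate that our proposed QGCN model outperforms baseline methods by a large margin.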
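For readers unfamiliar with quaternion space, the basic operation behind a quaternion feature transformation is the Hamilton product. The sketch below shows only that standard product on 4-tuples; the abstract does not give QGCN's layer equations, so how the product is combined with propagation and pooling in the actual model is not shown, and the function name `hamilton_product` is a label chosen here.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples.
    The product is not commutative: p * q generally differs from q * p."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )

# Basis elements: i * j = k, while j * i = -k.
i_unit = (0.0, 1.0, 0.0, 0.0)
j_unit = (0.0, 0.0, 1.0, 0.0)
print(hamilton_product(i_unit, j_unit))  # (0.0, 0.0, 0.0, 1.0)
```

In quaternion neural layers generally, one quaternion weight mixes all four components of an embedding at once, which is one way a feature transformation can be kept while using fewer free parameters than a real-valued matrix of the same size.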
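Subgraph-based graph representation learning (SGRL) has recently been proposed to deal with some fundamental challenges encountered by canonical graph neural networks (GNNs), and it has demonstrated advantages in many important data science applications such as link, relation, and motif prediction. However, current SGRL approaches suffer from scalability issues because they require extracting a subgraph for each training or test query. Recent solutions that scale up canonical GNNs may not apply to SGRL. Here, we propose a novel framework, SUREL, for scalable SGRL by co-designing the learning algorithm and its system support. SUREL adopts walk-based decomposition of subgraphs and reuses the walks to form subgraphs, which substantially reduces the redundancy of subgraph extraction and supports parallel computation. Experiments on six homogeneous, heterogeneous, and higher-order graphs with millions of nodes and edges demonstrate the effectiveness and scalability of SUREL. In particular, compared to SGRL baselines, SUREL achieves a 10× speed-up with comparable or even better prediction performance; compared to canonical GNNs, SUREL achieves a 50% improvement in prediction accuracy.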
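There has been increased interest in applying machine learning techniques to relational structured data based on an observed graph. Often, this graph is not fully representative of the true relationships among nodes. In these settings, building a generative model conditioned on the observed graph makes it possible to take graph uncertainty into account. Various existing techniques either rely on restrictive assumptions, fail to preserve topological properties within the samples, or are prohibitively expensive for larger graphs. In this work, we introduce the node copying model for constructing a distribution over graphs. A random graph is sampled by replacing each node's neighbors with those of a randomly sampled similar node. The sampled graphs preserve key characteristics of the graph structure without explicitly targeting them. Additionally, sampling from this model is extremely simple and scales linearly with the number of nodes. We show the usefulness of the copying model in three tasks. First, in node classification, a Bayesian formulation based on node copying achieves higher accuracy in sparse data settings. Second, we employ the proposed model to mitigate the effect of adversarial attacks on the graph topology. Last, incorporating the model in a recommender system setting improves recall over state-of-the-art methods.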
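The sampling step of a node copying model can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the abstract does not define how similarity between nodes is measured, so the sketch takes a precomputed `similar` map (for example, nodes sharing a label or cluster) as a hypothetical input, and the function name `sample_copied_graph` is also an assumption.

```python
import random

def sample_copied_graph(adj, similar, rng):
    """Draw one random graph: every node takes over the neighbor list of a
    donor chosen uniformly from its candidate set `similar[node]`.
    One draw and one list copy per node, so the number of draws is linear
    in the number of nodes."""
    sampled = {}
    for node in adj:
        donor = rng.choice(similar[node])
        sampled[node] = list(adj[donor])
    return sampled

# Toy graph; nodes "a" and "b" are treated as similar, as are "c" and "d".
adj = {"a": ["c"], "b": ["d"], "c": ["a"], "d": ["b"]}
similar = {"a": ["a", "b"], "b": ["a", "b"], "c": ["c", "d"], "d": ["c", "d"]}
graph_sample = sample_copied_graph(adj, similar, random.Random(0))
```

Note that the result is a directed neighbor map: a copied neighbor list is not mirrored on the other endpoint, and a node may end up listing itself. Repeating the call yields an ensemble of graphs over which downstream predictions can be averaged.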