Link prediction (LP) has been recognized as an important task in graph learning with wide practical applications. A typical application of LP is to retrieve the top-scoring neighbors for a given source node, e.g., friend recommendation. These services desire high inference scalability to find the top-scoring neighbors among many candidate nodes at low latency. Two popular decoders are mainly used to compute edge scores from node embeddings: the HadamardMLP and dot product decoders. After theoretical and empirical analysis, we find that the HadamardMLP decoder is generally more effective for LP. However, HadamardMLP lacks the scalability to retrieve top-scoring neighbors on large graphs, since, to the best of our knowledge, no algorithm exists to retrieve the top-scoring neighbors of a HadamardMLP decoder in sublinear time. To make HadamardMLP scalable, we propose the Flashlight algorithm to accelerate the top-scoring neighbor retrieval for HadamardMLP: a flexible algorithm that progressively applies approximate maximum inner product search (MIPS) techniques with adaptively adjusted query embeddings. Empirical results show that Flashlight improves the inference speed of LP by more than 100 times without sacrificing effectiveness. Our work paves the way for large-scale LP applications with the effective HadamardMLP decoder by greatly accelerating its inference.
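To make the decoder distinction concrete, here is a minimal NumPy sketch (not the authors' implementation) contrasting a dot-product decoder with a HadamardMLP-style decoder; the embedding dimension, hidden size, and weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden, n_candidates = 64, 32, 1000

# Node embeddings: one source/query node and many candidate targets.
src = rng.normal(size=d)
cands = rng.normal(size=(n_candidates, d))

# Dot-product decoder: score(u, v) = <u, v>. Top-k retrieval can use MIPS directly.
dot_scores = cands @ src

# HadamardMLP-style decoder: score(u, v) = MLP(u * v), here a 2-layer MLP sketch.
W1 = rng.normal(size=(d, hidden)) / np.sqrt(d)
W2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)
hadamard = cands * src                      # element-wise (Hadamard) product per pair
mlp_scores = (np.maximum(hadamard @ W1, 0.0) @ W2).ravel()

# Exhaustive top-k is linear in the number of candidates -- the retrieval
# bottleneck that Flashlight's adaptive MIPS-based search is designed to avoid.
k = 10
print(np.argsort(-dot_scores)[:k], np.argsort(-mlp_scores)[:k])
```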
Knowledge graphs (KGs) facilitate a wide variety of applications due to their ability to store relational knowledge applicable to many areas. Despite great efforts invested in their creation and maintenance, even the largest KGs are far from complete. Hence, KG completion (KGC) has become one of the most crucial tasks in KG research. Recently, a considerable body of literature in this space has centered around the use of graph neural networks (GNNs) to learn powerful embeddings that leverage the topological structure of KGs. Specifically, dedicated efforts have been made to extend GNNs, which are typically designed for simple homogeneous, uni-relational graphs, to the KG setting by designing more complex aggregation schemes over neighboring nodes (crucial to GNN performance) in order to appropriately exploit multi-relational information. The success of these methods is naturally attributed to the use of GNNs over simpler multi-layer perceptron (MLP) models, owing to their additional aggregation functionality. In this work, we find that simple MLP models are able to achieve performance comparable to GNNs, suggesting that aggregation may not be as important as previously believed. With further exploration, we show that careful scoring function and loss function design have a much stronger influence on KGC model performance, and that aggregation is in fact not practically required. This suggests a conflation of scoring function design, loss function design, and aggregation in prior work, and offers promising insights into the scalability of today's state-of-the-art KGC methods, as well as careful attention to more suitable aggregation designs for KGC tasks tomorrow. The implementation is available online: https://github.com/juanhui28/are_mpnns_helpful.
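As a hedged illustration of the point about scoring and loss design (not the paper's exact setup), the sketch below scores triples with a DistMult-style function over plain entity/relation embeddings and evaluates a binary cross-entropy loss against sampled negatives; the dimensions and negative-sampling scheme are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, d = 100, 10, 32

# Entity/relation embeddings; in the spirit of the paper, no GNN aggregation is used.
ent = rng.normal(size=(n_entities, d))
rel = rng.normal(size=(n_relations, d))

def score(h, r, t):
    """DistMult-style scoring function: sum_i h_i * r_i * t_i."""
    return np.sum(ent[h] * rel[r] * ent[t], axis=-1)

def bce_loss(pos_scores, neg_scores):
    """Binary cross-entropy with sampled negatives, one common KGC loss."""
    def sig(x):
        return 1.0 / (1.0 + np.exp(-x))
    eps = 1e-9
    return (-np.log(sig(pos_scores) + eps).mean()
            - np.log(1.0 - sig(neg_scores) + eps).mean())

# A toy positive triple (h, r, t) and corrupted-tail negatives.
h, r, t = np.array([3]), np.array([1]), np.array([7])
neg_t = rng.integers(0, n_entities, size=16)
loss = bce_loss(score(h, r, t), score(np.repeat(h, 16), np.repeat(r, 16), neg_t))
print(float(loss))
```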
Explaining machine learning models is an important and increasingly popular area of research interest. The Shapley value from game theory has been proposed as a prime approach to compute feature importance towards model predictions on images, text, tabular data, and recently graph neural networks (GNNs) on graphs. In this work, we revisit the appropriateness of the Shapley value for GNN explanation, where the task is to identify the most important subgraph and constituent nodes for GNN predictions. We claim that the Shapley value is a non-ideal choice for graph data because it is by definition not structure-aware. We propose a Graph Structure-aware eXplanation (GStarX) method to leverage the critical graph structure information to improve the explanation. Specifically, we define a scoring function based on a new structure-aware value from the cooperative game theory proposed by Hamiache and Navarro (HN). When used to score node importance, the HN value utilizes graph structures to attribute cooperation surplus between neighbor nodes, resembling message passing in GNNs, so that node importance scores reflect not only the node feature importance, but also the node structural roles. We demonstrate that GStarX produces qualitatively more intuitive explanations, and quantitatively improves explanation fidelity over strong baselines on chemical graph property prediction and text graph sentiment classification.
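For context on the Shapley baseline that GStarX argues against, here is a generic Monte Carlo Shapley estimator over graph nodes; the value function v (a GNN prediction restricted to a node subset) is a stand-in, not GStarX's structure-aware HN value.

```python
import numpy as np

def shapley_node_importance(nodes, value_fn, n_samples=200, seed=0):
    """Permutation-sampling estimate of Shapley values for each node.

    value_fn(subset) should return the model output (e.g., a class probability)
    when only `subset` of the graph's nodes is kept. Note that the estimator
    itself ignores edges entirely -- the non-structure-awareness GStarX criticizes.
    """
    rng = np.random.default_rng(seed)
    phi = {v: 0.0 for v in nodes}
    for _ in range(n_samples):
        perm = rng.permutation(nodes)
        prev = value_fn(frozenset())
        chosen = set()
        for v in perm:
            chosen.add(v)
            cur = value_fn(frozenset(chosen))
            phi[v] += (cur - prev) / n_samples   # marginal contribution of v
            prev = cur
    return phi

# Toy additive value function standing in for a GNN prediction on a masked graph.
toy_weights = {0: 0.5, 1: 0.3, 2: 0.2}
importance = shapley_node_importance(
    nodes=list(toy_weights), value_fn=lambda s: sum(toy_weights[v] for v in s))
print(importance)   # roughly recovers the additive weights
```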
Graph neural networks (GNNs) have achieved unprecedented success in learning graph representations to identify the categorical labels of graphs. However, most existing graph classification work with GNNs follows a balanced data-splitting protocol, which is misaligned with many real-world scenarios in which some classes have far fewer labels than others. Directly training GNNs under this imbalanced situation may lead to uninformative representations of graphs in minority classes and compromise the overall performance of downstream classification, which underscores the importance of developing effective GNNs for handling imbalanced graph classification. Existing methods are either tailored for non-graph structured data or designed specifically for imbalanced node classification, while few focus on imbalanced graph classification. To this end, we introduce a novel framework, Graph-of-Graph Neural Networks (G²GNN), which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from the graphs themselves. Globally, we construct a graph of graphs (GoG) based on kernel similarity and perform GoG propagation to aggregate neighboring graph representations, which are initially obtained by node-level propagation with pooling via a GNN encoder. Locally, we employ topological augmentation via masking nodes or dropping edges to improve the model's generalizability in discerning the topology of unseen testing graphs. Extensive graph classification experiments conducted on seven benchmark datasets demonstrate that our proposed G²GNN outperforms numerous baselines by roughly 5% in both F1-macro and F1-micro scores. The implementation of G²GNN is available at https://github.com/yuwvandy/g2gnn.
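Below is a minimal sketch of the two ingredients described above: a kernel-similarity graph of graphs with one propagation step, and edge-dropping as local topological augmentation. Cosine similarity over pooled graph embeddings stands in for the kernel similarity, and all sizes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_graph_of_graphs(graph_embeds, k=3):
    """Build a graph of graphs: connect each graph to its k most similar graphs.

    Similarity here is cosine over pooled graph embeddings, standing in for the
    kernel similarity used to construct the GoG."""
    normed = graph_embeds / np.linalg.norm(graph_embeds, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)            # no self loops
    return np.argsort(-sim, axis=1)[:, :k]    # row i = GoG neighbors of graph i

def gog_propagate(graph_embeds, neighbors):
    """One GoG propagation step: mix each graph's embedding with its neighbors'."""
    pooled = graph_embeds[neighbors].mean(axis=1)
    return 0.5 * (graph_embeds + pooled)

def drop_edges(edge_index, drop_prob=0.2, rng=rng):
    """Local topological augmentation: randomly drop edges from a 2 x E edge list."""
    keep = rng.random(edge_index.shape[1]) >= drop_prob
    return edge_index[:, keep]

graph_embeds = rng.normal(size=(10, 16))       # pooled embeddings of 10 graphs
neighbors = knn_graph_of_graphs(graph_embeds)
print(gog_propagate(graph_embeds, neighbors).shape)   # (10, 16)

edges = np.array([[0, 1, 2, 3], [1, 2, 3, 0]])        # toy edge list
print(drop_edges(edges))
```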
Given the prevalence of large-scale graphs in real-world applications, the storage and time required to train neural models have raised concerns. To alleviate these concerns, we propose and study the problem of graph condensation for graph neural networks (GNNs). Specifically, we aim to condense the large, original graph into a small, synthetic and highly informative graph, such that GNNs trained on the small graph and the large graph achieve comparable performance. We approach the condensation problem by imitating the GNN training trajectory on the original graph through the optimization of a gradient matching loss, and design a strategy to condense node features and structural information simultaneously. Extensive experiments demonstrate the effectiveness of the proposed framework in condensing different graph datasets into informative smaller graphs. In particular, we are able to approximate the original test accuracy by 95.3% on Reddit, 99.8% on Flickr, and 99.0% on Citeseer, while reducing the graph size by more than 99.9%, and the condensed graphs can be used to train various GNN architectures. Code is released at https://github.com/chandlerbang/gcond.
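To illustrate the gradient matching idea (on a toy linear classifier rather than a GNN, and without the structure-condensation part), the following sketch computes the distance between the gradients induced by a large "real" dataset and a tiny synthetic one; condensation would minimize this quantity over the synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_grad(W, X, y):
    """Gradient of the logistic loss w.r.t. W for a linear classifier (toy model)."""
    p = 1.0 / (1.0 + np.exp(-(X @ W)))
    return X.T @ (p - y) / len(y)

def gradient_matching_loss(W, X_real, y_real, X_syn, y_syn):
    """Distance between gradients induced by the real and the synthetic data.

    Graph condensation minimizes such a loss over the synthetic data (and, in the
    full method, over condensed structure as well) so that training on the small
    synthetic set imitates the training trajectory on the original graph."""
    g_real = logistic_grad(W, X_real, y_real)
    g_syn = logistic_grad(W, X_syn, y_syn)
    return float(np.sum((g_real - g_syn) ** 2))

# Toy "real" features/labels and a much smaller synthetic set.
X_real, y_real = rng.normal(size=(1000, 8)), rng.integers(0, 2, size=1000)
X_syn, y_syn = rng.normal(size=(10, 8)), rng.integers(0, 2, size=10)
W = rng.normal(size=8)
print(gradient_matching_loss(W, X_real, y_real, X_syn, y_syn))
```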
Message passing neural networks (MPNNs) are a common type of graph neural network (GNN) in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors, akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable; however, their expressiveness is upper-bounded by the first-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and, sometimes, generalization performance. Our work stands in between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., k-egonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than the encoding of its immediate neighbors only (i.e., a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability), designing a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNN-AK (GNN As Kernel), as the framework resembles a convolutional neural network in replacing the kernel with GNNs. Theoretically, we show that our framework is indeed more powerful than 1-WL and 2-WL, and no more powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins on several well-known graph ML tasks; specifically, 0.08 MAE on ZINC, and 74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.
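A minimal, standard-library sketch of the k-egonet extraction step referenced above: in a GNN-AK-style framework, a base GNN would encode the induced subgraph on the returned node set in place of plain 1-hop (star) aggregation. The graph representation and BFS routine here are illustrative.

```python
from collections import deque

def k_egonet(adj, root, k):
    """Return the node set of the k-hop ego network around `root`.

    `adj` is a dict mapping each node to an iterable of its neighbors."""
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

# Toy graph: a path 0-1-2-3-4 plus a chord 1-3.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(k_egonet(adj, root=0, k=2))   # {0, 1, 2, 3}
```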
Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges and subgraphs in an online manner, so as to detect anomalous behavior using constant time and memory? For example, in intrusion detection, existing work seeks to detect either anomalous edges or anomalous subgraphs, but not both. In this paper, we first extend the Count-Min sketch data structure to a higher-order sketch. This higher-order sketch has the useful property of preserving dense subgraph structure (dense subgraphs in the input turn into dense submatrices in the data structure). We then propose four online algorithms that utilize this enhanced data structure, which (a) detect both edge and graph anomalies; (b) process each edge in constant memory and constant update time per newly arriving edge; and (c) outperform state-of-the-art baselines on four real-world datasets. Our method is the first streaming approach that incorporates dense subgraph search to detect graph anomalies in constant memory and time.
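The sketch below shows a simplified higher-order Count-Min sketch for streaming edges: each hash function maps an edge (src, dst) to a cell of a small matrix rather than a 1-D counter array, so dense subgraphs in the stream show up as dense submatrices. The hashing scheme and sizes are illustrative, not the paper's exact construction.

```python
import numpy as np

class HigherOrderCMS:
    """Simplified higher-order Count-Min sketch for a stream of edges."""

    def __init__(self, depth=4, width=32, seed=0):
        rng = np.random.default_rng(seed)
        self.tables = np.zeros((depth, width, width), dtype=np.int64)
        # Independent salts per hash function for rows (sources) and columns (dests).
        self.salts = rng.integers(1, 2**31 - 1, size=(depth, 2))
        self.width = width

    def _cells(self, src, dst):
        for d, (a, b) in enumerate(self.salts):
            yield d, hash((a, src)) % self.width, hash((b, dst)) % self.width

    def update(self, src, dst, weight=1):
        for d, r, c in self._cells(src, dst):
            self.tables[d, r, c] += weight

    def estimate(self, src, dst):
        # Count-Min estimate: minimum over the hash tables (never underestimates).
        return min(self.tables[d, r, c] for d, r, c in self._cells(src, dst))

cms = HigherOrderCMS()
for _ in range(5):
    cms.update("10.0.0.1", "10.0.0.2")
print(cms.estimate("10.0.0.1", "10.0.0.2"))   # >= 5
```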
When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.
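A minimal sketch of the kind of similarity-query objective described above: behavior pairs a user labels as similar are pulled together in feature space and pairs labeled different are pushed apart. The margin, feature vectors, and loss form are placeholders, not the study's actual model.

```python
import numpy as np

def similarity_query_loss(feat_a, feat_b, same, margin=1.0):
    """Contrastive-style loss over user similarity judgments.

    feat_a, feat_b: learned feature vectors for two behaviors.
    same: True if the user judged the behaviors similar, False otherwise.
    """
    dist = np.linalg.norm(feat_a - feat_b)
    if same:
        return dist ** 2                        # pull similar behaviors together
    return max(0.0, margin - dist) ** 2         # push dissimilar ones apart

print(similarity_query_loss(np.array([0.1, 0.2]), np.array([0.1, 0.25]), same=True))
print(similarity_query_loss(np.array([0.1, 0.2]), np.array([0.9, 0.8]), same=False))
```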
We address the problem of extracting key steps from unlabeled procedural videos, motivated by the potential of Augmented Reality (AR) headsets to revolutionize job training and performance. We decompose the problem into two steps: representation learning and key steps extraction. We employ self-supervised representation learning via a training strategy that adapts off-the-shelf video features using a temporal module. Training implements self-supervised learning losses involving multiple cues such as appearance, motion and pose trajectories extracted from videos to learn generalizable representations. Our method extracts key steps via a tunable algorithm that clusters the representations extracted from procedural videos. We quantitatively evaluate our approach with key step localization and also demonstrate the effectiveness of the extracted representations on related downstream tasks like phase classification. Qualitative results demonstrate that the extracted key steps are meaningful to succinctly represent the procedural tasks.
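As a hedged stand-in for the tunable clustering stage described above, the sketch below clusters per-segment video embeddings with k-means and returns the segment closest to each cluster center as a key step; the use of scikit-learn and all sizes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_key_steps(segment_embeds, n_steps=5, seed=0):
    """Cluster temporal segment embeddings and return one representative per cluster."""
    km = KMeans(n_clusters=n_steps, n_init=10, random_state=seed).fit(segment_embeds)
    key_steps = []
    for c in range(n_steps):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(segment_embeds[members] - km.cluster_centers_[c], axis=1)
        key_steps.append(int(members[np.argmin(dists)]))
    return sorted(key_steps)                   # indices of representative segments

# Placeholder features standing in for the learned video representations.
embeds = np.random.default_rng(0).normal(size=(200, 64))
print(extract_key_steps(embeds, n_steps=5))
```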
Kernels are efficient in representing nonlocal dependence and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean when the observation noise becomes small, if the data induces a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small noise limit. The data-adaptive prior's covariance is the inversion operator with a hyper-parameter selected adaptive to data by the L-curve method. Furthermore, we provide a detailed analysis on the computational practice of the data-adaptive prior, and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of the four types of errors: discretization error, model error, partial observation and wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small noise limits.
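To illustrate the kind of L-curve-based hyperparameter selection mentioned above (on a generic Tikhonov-regularized Toeplitz problem, not the paper's data-adaptive prior itself), the sketch below scans regularization strengths and picks a corner of the L-curve with a simple discrete maximum-curvature heuristic; the operator, noise level, and grid are toy choices.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)

# Toy ill-posed problem: a smoothing Toeplitz operator A and noisy data y = A x + noise.
n = 50
A = toeplitz(np.exp(-0.5 * np.arange(n)))
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
y = A @ x_true + 1e-3 * rng.normal(size=n)

def tikhonov(A, y, lam):
    """Regularized solution x_lam = argmin ||A x - y||^2 + lam ||x||^2."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

# L-curve: log residual norm vs. log solution norm over a grid of lambdas.
lams = np.logspace(-8, 0, 40)
rho, eta = [], []
for lam in lams:
    x = tikhonov(A, y, lam)
    rho.append(np.log(np.linalg.norm(A @ x - y)))
    eta.append(np.log(np.linalg.norm(x)))
rho, eta = np.array(rho), np.array(eta)

# Discrete curvature of the parametric curve (rho, eta); the corner is its maximum.
d1r, d1e = np.gradient(rho), np.gradient(eta)
d2r, d2e = np.gradient(d1r), np.gradient(d1e)
curvature = (d1r * d2e - d2r * d1e) / (d1r**2 + d1e**2) ** 1.5
print(f"L-curve choice: lambda = {lams[np.argmax(curvature)]:.2e}")
```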