The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet, it has not achieved competitive performance on popular leaderboards of graph-level prediction compared to mainstream GNN variants. Therefore, it remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture, and could attain excellent results on a broad range of graph representation learning tasks, especially on the recent OGB Large-Scale Challenge. Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding the structural information of a graph into the model. To this end, we propose several simple yet effective structural encoding methods to help Graphormer better model graph-structured data. Besides, we mathematically characterize the expressive power of Graphormer and show that, with our ways of encoding the structural information of graphs, many popular GNN variants can be covered as special cases of Graphormer. The code and models of Graphormer will be made publicly available at https://github.com/Microsoft/Graphormer.
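The abstract does not spell out the encodings, but the published Graphormer adds a learned bias indexed by shortest-path distance to every attention logit (its "spatial encoding"). Below is a minimal PyTorch sketch of that one idea; the tensor shapes, the single-head form, and the `max_dist` cutoff are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpatialBiasAttention(nn.Module):
    """Single-head self-attention with a learned bias per shortest-path distance,
    in the spirit of Graphormer's spatial encoding (a sketch, not the reference code)."""

    def __init__(self, dim, max_dist=8):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        # one learnable scalar bias per clipped shortest-path distance
        self.dist_bias = nn.Embedding(max_dist + 1, 1)
        self.max_dist = max_dist

    def forward(self, x, spd):
        # x:   (n_nodes, dim) node features
        # spd: (n_nodes, n_nodes) integer shortest-path distances
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.t() / x.size(-1) ** 0.5
        scores = scores + self.dist_bias(spd.clamp(max=self.max_dist)).squeeze(-1)
        return torch.softmax(scores, dim=-1) @ v

x = torch.randn(5, 16)                  # 5 nodes, 16-dim features
spd = torch.randint(0, 4, (5, 5))       # toy shortest-path distance matrix
out = SpatialBiasAttention(16)(x, spd)  # (5, 16)
```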
In recent years, graph Transformers have demonstrated advantages on various graph learning tasks. However, the complexity of existing graph Transformers scales quadratically with the number of nodes, making it difficult to scale to graphs with thousands of nodes. To this end, we propose a Neighborhood Aggregation Graph Transformer (NAGphormer) that is scalable to large graphs with millions of nodes. Before feeding node features into the Transformer model, NAGphormer constructs tokens for each node with a neighborhood aggregation module called Hop2Token. For each node, Hop2Token aggregates the neighborhood features from each hop into a representation, yielding a sequence of token vectors. The resulting sequence of different hop information then serves as the input to the Transformer model. By treating each node as a sequence, NAGphormer can be trained in a mini-batch manner and thus scales to large graphs. NAGphormer further develops an attention-based readout function to learn the importance of each hop. We conduct extensive experiments on various popular benchmarks, including six small datasets and three large datasets. The results show that NAGphormer consistently outperforms existing graph Transformers and mainstream graph neural networks.
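A hedged sketch of what a Hop2Token-style module could look like: for every node, the features aggregated over its 0..K-hop neighborhoods (here simply repeated propagation with a row-normalized adjacency) become a length-(K+1) token sequence that a standard Transformer encoder can consume per node. The shapes and the normalization choice are assumptions made for illustration.

```python
import torch

def hop2token(adj, x, num_hops=3):
    """Build a (num_nodes, num_hops+1, dim) token sequence per node by
    aggregating features over increasingly distant hops (illustrative sketch)."""
    deg = adj.sum(dim=1).clamp(min=1.0)
    norm_adj = adj / deg.unsqueeze(1)      # simple row-normalized adjacency
    tokens, h = [x], x
    for _ in range(num_hops):
        h = norm_adj @ h                   # propagate one more hop
        tokens.append(h)
    return torch.stack(tokens, dim=1)      # hop dimension acts as sequence length

adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
x = torch.randn(3, 8)
seq = hop2token(adj, x)                    # (3 nodes, 4 tokens, 8 dims)
# Each node's token sequence can now be fed to a standard Transformer encoder
# in mini-batches, independently of the rest of the graph.
```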
We propose a recipe for how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications, but they lack a common foundation about what constitutes a good positional or structural encoding, and what differentiates them. In this paper, we summarize the different types of encodings with clearer definitions and categorize them as local, global, or relative. Further, GTs remain constrained to small graphs with a few hundred nodes, and we propose the first architecture with complexity linear in the number of nodes and edges, O(N+E), by decoupling the local real-edge aggregation from the fully-connected Transformer. We argue that this decoupling does not negatively affect expressivity, and that our architecture is a universal function approximator on graphs. Our GPS recipe consists of choosing 3 main ingredients: (i) positional/structural encoding, (ii) local message-passing mechanism, and (iii) global attention mechanism. We build and open-source a modular framework, GraphGPS, that supports multiple types of encodings and provides efficiency and scalability both on small and large graphs. We test our architecture on 11 benchmarks and show highly competitive results on all of them, showcasing the empirical benefits gained by the modularity and the combination of different strategies.
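A hedged sketch of one GPS-style layer following the recipe: run a local message-passing update and a global multi-head attention update on the same node features, sum them, and apply a feed-forward block. The specific modules, the residual scheme, and the dense adjacency below are assumptions for illustration, not the GraphGPS reference implementation; in the actual framework the global module can be a linear-attention Transformer to keep the O(N+E) complexity mentioned above.

```python
import torch
import torch.nn as nn

class GPSLayerSketch(nn.Module):
    """Local MPNN + global attention in parallel, followed by a feed-forward block."""

    def __init__(self, dim):
        super().__init__()
        self.local_lin = nn.Linear(dim, dim)                 # toy MPNN weight
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(),
                                 nn.Linear(2 * dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, adj):
        # x: (n, dim) node features; adj: (n, n) dense adjacency (small graphs only)
        local = adj @ self.local_lin(x)                      # 1-hop neighbor aggregation
        global_, _ = self.attn(x.unsqueeze(0), x.unsqueeze(0), x.unsqueeze(0))
        h = x + local + global_.squeeze(0)                   # combine both views
        return self.norm(h + self.ffn(h))

x, adj = torch.randn(6, 32), (torch.rand(6, 6) > 0.5).float()
out = GPSLayerSketch(32)(x, adj)                             # (6, 32)
```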
Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor capture of long-range dependencies, which can be solved using global attention but at the cost of quadratic complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. We introduce a new class of GNNs, called Graph MLP-Mixer, with three key properties. First, they capture long-range dependency and mitigate the issue of over-squashing as demonstrated on the Long Range Graph Benchmark (LRGB) and the TreeNeighbourMatch datasets. Second, they offer better speed and memory efficiency with complexity linear in the number of nodes and edges, surpassing the related Graph Transformer and expressive GNN models. Third, they show high expressivity in terms of graph isomorphism as they can distinguish at least 3-WL non-isomorphic graphs. We test our architecture on 4 simulated datasets and 7 real-world benchmarks, and show highly competitive results on all of them.
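As a hedged illustration of the idea: once a graph is cut into patches (e.g., by a graph partitioner) and each patch is embedded by a small GNN, a standard MLP-Mixer block can mix information across patches and across channels. The partitioning and patch-embedding steps are abstracted away below, and the block itself is an assumption-level sketch rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Token-mixing across graph patches followed by channel-mixing, as in MLP-Mixer."""

    def __init__(self, num_patches, dim):
        super().__init__()
        self.token_mlp = nn.Sequential(nn.Linear(num_patches, num_patches), nn.GELU(),
                                       nn.Linear(num_patches, num_patches))
        self.channel_mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                         nn.Linear(dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, patches):
        # patches: (num_patches, dim), one embedding per graph patch
        h = patches + self.token_mlp(self.norm1(patches).t()).t()  # mix across patches
        return h + self.channel_mlp(self.norm2(h))                 # mix across channels

patch_embeddings = torch.randn(8, 64)      # 8 patches from one graph, 64-dim each
out = MixerBlock(8, 64)(patch_embeddings)  # (8, 64), later pooled for graph prediction
```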
The Transformer architecture has recently gained attention in graph representation learning, as it naturally overcomes several limitations of graph neural networks (GNNs) by avoiding their strict structural inductive biases and instead encoding the graph structure only via positional encodings. Here, we show that the node representations generated by a Transformer with positional encodings do not necessarily capture the structural similarity between nodes. To address this issue, we propose the Structure-Aware Transformer, a class of simple and flexible graph Transformers built upon a new self-attention mechanism. This new self-attention incorporates structural information by extracting a subgraph representation rooted at each node before computing attention. We propose several methods for automatically generating the subgraph representations and show theoretically that the resulting representations are at least as expressive as the subgraph representations. Empirically, our method achieves state-of-the-art performance on five graph prediction benchmarks. Our structure-aware framework can leverage any existing GNN to extract the subgraph representations, and we show that it systematically improves performance over the base GNN model, successfully combining the advantages of GNNs and Transformers. Our code is available at https://github.com/borgwardtlab/sat.
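A hedged sketch of the structure-aware attention idea: queries and keys are computed not from the raw node features but from a small GNN pass over each node's neighborhood, so attention scores reflect local structure. The one-round neighborhood averaging and dense adjacency below are illustrative assumptions, not the SAT reference code.

```python
import torch
import torch.nn as nn

class StructureAwareAttention(nn.Module):
    """Self-attention whose queries/keys come from subgraph-aware node encodings."""

    def __init__(self, dim):
        super().__init__()
        self.subgraph_gnn = nn.Linear(dim, dim)   # stand-in for a rooted-subgraph GNN
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, adj):
        # One round of neighborhood averaging approximates "encode the rooted subgraph".
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        structural = torch.relu(self.subgraph_gnn((adj @ x) / deg + x))
        q, k = self.q(structural), self.k(structural)   # structure-aware attention scores
        attn = torch.softmax(q @ k.t() / x.size(-1) ** 0.5, dim=-1)
        return attn @ self.v(x)                         # values still use raw features

x, adj = torch.randn(5, 16), (torch.rand(5, 5) > 0.6).float()
out = StructureAwareAttention(16)(x, adj)               # (5, 16)
```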
Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially in the case of insufficient labeled molecules. Recent studies suggest that big GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, they often require large-scale datasets and considerable computational resources, which is time-consuming, computationally expensive, and environmentally unfriendly. To alleviate these limitations, we propose a novel pre-training model for molecular representation learning, Bi-branch Masked Graph Transformer Autoencoder (BatmanNet). BatmanNet features two tailored and complementary graph autoencoders to reconstruct the missing nodes and edges from a masked molecular graph. Surprisingly, we found that a high masking ratio (60%) of the atoms and bonds achieves the best performance. We further propose an asymmetric graph-based encoder-decoder architecture for both nodes and edges, where a transformer-based encoder only takes the visible subset of nodes or edges, and a lightweight decoder reconstructs the original molecule from the latent representation and mask tokens. With this simple yet effective asymmetrical design, our BatmanNet can learn efficiently even from a much smaller-scale unlabeled molecular dataset to capture the underlying structural and semantic information, overcoming a major limitation of current deep neural networks for molecular representation learning. For instance, using only 250K unlabelled molecules as pre-training data, our BatmanNet with 2.575M parameters achieves a 0.5% improvement on the average AUC compared with the current state-of-the-art method with 100M parameters pre-trained on 11M molecules.
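A hedged sketch of the masking logic described above: hide a large fraction of the atoms, encode only the visible ones with a Transformer encoder, then let a lightweight decoder reconstruct the masked features from the latent vectors plus mask tokens. The module sizes, the node-only masking, and the MSE objective are illustrative assumptions, not BatmanNet's actual design.

```python
import torch
import torch.nn as nn

class MaskedNodeAutoencoderSketch(nn.Module):
    """Encode visible atoms only; decode the full set from latents + mask tokens."""

    def __init__(self, dim, mask_ratio=0.6):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(dim))
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.Linear(dim, dim)          # deliberately lightweight

    def forward(self, node_feats):
        n = node_feats.size(0)
        num_masked = int(self.mask_ratio * n)
        perm = torch.randperm(n)
        masked, visible = perm[:num_masked], perm[num_masked:]
        z = self.encoder(node_feats[visible].unsqueeze(0)).squeeze(0)
        full = torch.zeros_like(node_feats)
        full[visible] = z
        full[masked] = self.mask_token              # decoder sees mask tokens here
        recon = self.decoder(full)
        return nn.functional.mse_loss(recon[masked], node_feats[masked])

loss = MaskedNodeAutoencoderSketch(dim=32)(torch.randn(10, 32))
loss.backward()   # one self-supervised pre-training step on an unlabeled molecule
```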
Graph classification is an important area in both modern research and industry. Multiple applications, especially in chemistry and novel drug discovery, encourage rapid development of machine learning models in this area. To keep up with the pace of new research, proper experimental design, fair evaluation, and independent benchmarks are essential. Design of strong baselines is an indispensable element of such works. In this thesis, we explore multiple approaches to graph classification. We focus on Graph Neural Networks (GNNs), which emerged as a de facto standard deep learning technique for graph representation learning. Classical approaches, such as graph descriptors and molecular fingerprints, are also addressed. We design a fair experimental evaluation protocol and choose a suitable collection of datasets. This allows us to perform numerous experiments and rigorously analyze modern approaches. We arrive at many conclusions, which shed new light on the performance and quality of novel algorithms. We investigate the application of the Jumping Knowledge GNN architecture to graph classification, which proves to be an efficient tool for improving base graph neural network architectures. Multiple improvements to baseline models are also proposed and experimentally verified, which constitutes an important contribution to the field of fair model comparison.
Dynamic graph representation learning is an important task with wide applications. Previous methods for dynamic graph learning are usually sensitive to noisy graph information such as missing or spurious connections, which can yield degraded performance and generalization. To overcome this challenge, we propose a Transformer-based dynamic graph representation learning method, named Dynamic Graph Transformer (DGT), with spatial-temporal encoding to effectively learn graph topology and capture implicit links. To improve generalization ability, we introduce two complementary self-supervised pre-training tasks and show that jointly optimizing them leads to a smaller Bayesian error rate via an information-theoretic analysis. We also propose a temporal-union graph structure and a target-context node sampling strategy for efficient and scalable training. Extensive experiments on real-world datasets illustrate that DGT achieves superior performance compared to several state-of-the-art baselines.
Transformer-based models have been widely used and have achieved state-of-the-art performance in various domains such as natural language processing and computer vision. Recent works show that Transformers can also be generalized to graph-structured data. However, success has been limited to small-scale graphs due to technical challenges such as quadratic complexity in the number of nodes and non-local aggregation, which often lead to inferior generalization performance compared to conventional graph neural networks. In this paper, to address these issues, we propose Deformable Graph Transformer (DGT), which performs sparse attention with dynamically sampled key and value pairs. Specifically, our framework first constructs multiple node sequences with various criteria to consider both structural and semantic proximity. Sparse attention is then applied to the node sequences to learn node representations at a reduced computational cost. We also design simple and effective positional encodings to capture structural similarity and distances between nodes. Experiments show that our novel graph Transformer consistently outperforms existing Transformer-based models and shows competitive performance compared to state-of-the-art models on 8 graph benchmark datasets, including large-scale graphs.
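A hedged sketch of attention restricted to a small sampled key/value set per query node: instead of attending over all N nodes, each node attends to k sampled candidates (drawn uniformly here, whereas the paper builds candidate sequences from structural and semantic proximity). The shapes and the uniform sampler are assumptions made only to illustrate the cost reduction.

```python
import torch
import torch.nn as nn

class SampledSparseAttention(nn.Module):
    """Each node attends only to num_samples candidate nodes, not the full graph."""

    def __init__(self, dim, num_samples=4):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.num_samples = num_samples

    def forward(self, x):
        n, d = x.shape
        # Uniform candidate sampling as a stand-in for proximity-based sequences.
        idx = torch.randint(0, n, (n, self.num_samples))   # (n, k)
        q = self.q(x).unsqueeze(1)                          # (n, 1, d)
        k, v = self.k(x)[idx], self.v(x)[idx]               # (n, k, d)
        scores = (q * k).sum(-1) / d ** 0.5                 # (n, k)
        attn = torch.softmax(scores, dim=-1)
        return (attn.unsqueeze(-1) * v).sum(1)              # (n, d)

out = SampledSparseAttention(16)(torch.randn(100, 16))      # cost O(n * k), not O(n^2)
```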
Graph neural networks (GNNs) based on the message passing (MP) paradigm exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks cannot capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been increasing interest in developing Transformer-based methods for graphs that can consider full node connectivity beyond the original sparse structure, thus enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop message passing, when combined with positional feature representations, often perform better on several existing graph benchmarks, thereby limiting the perceived utility and ranking of Transformer-like architectures. Here, we present the Long Range Graph Benchmark (LRGB) with 5 graph learning datasets — PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and Peptides-struct — that arguably require LRI reasoning to achieve strong performance on the given task. We benchmark baseline GNNs and Graph Transformer networks to verify that models which capture long-range dependencies perform significantly better on these tasks. These datasets are therefore suitable for benchmarking and exploring MP-GNN and Graph Transformer architectures intended to capture LRI.
This technical report presents GPS++, the first-place solution to the Open Graph Benchmark Large-Scale Challenge (OGB-LSC 2022) for the PCQM4Mv2 molecular property prediction task. Our approach implements several key principles from the prior literature. At its core our GPS++ method is a hybrid MPNN/Transformer model that incorporates 3D atom positions and an auxiliary denoising task. The effectiveness of GPS++ is demonstrated by achieving 0.0719 mean absolute error on the independent test-challenge PCQM4Mv2 split. Thanks to Graphcore IPU acceleration, GPS++ scales to deep architectures (16 layers), training at 3 minutes per epoch, and to a large ensemble (112 models), completing the final predictions in 1 hour 32 minutes, well under the 4-hour inference budget allocated. Our implementation is publicly available at: https://github.com/graphcore/ogb-lsc-pcqm4mv2.
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of the value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest in exploring more powerful PE for Transformers and GNNs in a robust experimental setting.
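The graph positional encoding studied with this benchmark is the Laplacian eigenvector PE: the k eigenvectors of the normalized graph Laplacian with the smallest non-trivial eigenvalues are attached to each node as extra input features. A hedged NumPy sketch of that computation (dense eigendecomposition, so only suitable for small graphs; the eigenvector sign ambiguity handled in practice by random flipping is ignored here):

```python
import numpy as np

def laplacian_positional_encoding(adj, k=2):
    """Return the k eigenvectors of the symmetric normalized Laplacian with the
    smallest non-zero eigenvalues: one k-dim positional vector per node."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]               # skip the trivial constant eigenvector

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
pe = laplacian_positional_encoding(adj, k=2)  # (4 nodes, 2 PE dims), concatenated to node features
```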
Graph similarity measurement, which computes the distance/similarity between two graphs, arises in various graph-related tasks. Recent learning-based methods lack interpretability, as they directly transform the interaction information between two graphs into one hidden vector and then map it to similarity. To address this issue, this study proposes a more interpretable end-to-end paradigm for graph similarity learning, named Similarity Computation via Maximum Common Subgraph Inference (INFMCS). Our key insight behind INFMCS is the strong correlation between the similarity score and the Maximum Common Subgraph (MCS). We implicitly infer the MCS to obtain a normalized MCS size, with only the similarity score as supervision during training. To capture more global information, we also stack some vanilla Transformer encoder layers with graph convolution layers and propose a novel permutation-invariant node positional encoding. The whole model is quite simple yet effective. Comprehensive experiments demonstrate that INFMCS consistently outperforms state-of-the-art baselines on graph classification and regression tasks. Ablation experiments verify the effectiveness of the proposed computation paradigm and other components. Also, visualizations and statistics of the results reveal the interpretability of INFMCS.
Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Various supervised machine learning models have demonstrated promising performance, but the vast chemical space and the limited availability of property labels make supervised learning challenging. Recently, unsupervised transformer-based language models pretrained on a large unlabelled corpus have produced state-of-the-art results in many downstream natural language processing tasks. Inspired by this development, we present molecular embeddings obtained by training an efficient transformer encoder model, MoLFormer, which uses rotary positional embeddings. This model employs a linear attention mechanism, coupled with highly distributed training, on SMILES sequences of 1.1 billion unlabelled molecules from the PubChem and ZINC datasets. We show that the learned molecular representation outperforms existing baselines, including supervised and self-supervised graph neural networks and language models, on several downstream tasks from ten benchmark datasets. They perform competitively on two others. Further analyses, specifically through the lens of attention, demonstrate that MoLFormer trained on chemical SMILES indeed learns the spatial relationships between atoms within a molecule. These results provide encouraging evidence that large-scale molecular language models can capture sufficient chemical and structural information to predict various distinct molecular properties, including quantum-chemical properties.
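A hedged sketch of the rotary positional embedding mentioned above: each query/key vector is split into channel pairs and rotated by a position-dependent angle before the attention dot product, so relative token positions in the SMILES string enter the score directly. The dimensions and base frequency follow a common RoPE convention and are assumptions here, not MoLFormer's exact configuration.

```python
import torch

def apply_rotary(x, base=10000.0):
    """Rotate channel pairs of x (seq_len, dim) by position-dependent angles (RoPE)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # standard 2-D rotation applied channel-pair-wise
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(12, 64)                     # 12 SMILES tokens, 64-dim query vectors
k = torch.randn(12, 64)
q_rot, k_rot = apply_rotary(q), apply_rotary(k)
scores = q_rot @ k_rot.t() / 64 ** 0.5      # attention logits now encode relative position
```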
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice. Given a graph, we simply treat all nodes and edges as independent tokens, augment them with token embeddings, and feed them to a Transformer. With an appropriate choice of token embeddings, we prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers, which is already more expressive than all message-passing graph neural networks (GNNs). When trained on a large-scale graph dataset (PCQM4Mv2), our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results compared to GNN baselines and competitive results compared to Transformer variants with sophisticated graph-specific inductive biases. Our implementation is available at https://github.com/jw9730/tokengt.
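A hedged sketch of the tokenization: every node and every edge becomes one token, built from its feature vector plus node-identifier embeddings (an edge gets the identifiers of its two endpoints, a node gets its own identifier twice) and a type embedding, and the resulting sequence goes into an off-the-shelf Transformer encoder. Orthonormal identifiers are approximated below by random vectors, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

def tokenize_graph(node_feats, edge_index, edge_feats, id_dim=8):
    """Return one token per node and per edge: [features | id_u | id_v | type]."""
    n = node_feats.size(0)
    node_ids = torch.randn(n, id_dim)    # stand-in for orthonormal node identifiers
    type_emb = torch.eye(2)              # row 0 = node token, row 1 = edge token
    node_tokens = torch.cat([node_feats, node_ids, node_ids,
                             type_emb[0].expand(n, -1)], dim=-1)
    u, v = edge_index                    # (2, num_edges)
    edge_tokens = torch.cat([edge_feats, node_ids[u], node_ids[v],
                             type_emb[1].expand(u.size(0), -1)], dim=-1)
    return torch.cat([node_tokens, edge_tokens], dim=0)   # (n + num_edges, token_dim)

node_feats = torch.randn(4, 6)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_feats = torch.randn(3, 6)
tokens = tokenize_graph(node_feats, edge_index, edge_feats)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=tokens.size(-1), nhead=2, batch_first=True), 1)
graph_repr = encoder(tokens.unsqueeze(0)).mean(dim=1)      # pooled graph representation
```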
Transformers have become the method of choice in many applications thanks to their ability to represent complex interactions between elements. However, extending the Transformer architecture to non-sequential data such as molecules, and making it trainable on small datasets, remains a challenge. In this work, we introduce a Transformer-based architecture for molecular property prediction that is able to capture the geometry of the molecule. We modify the classical positional encoder with an initial encoding of the molecular geometry, as well as a learned gated self-attention mechanism. We further propose an augmentation scheme for molecular data to avoid the overfitting induced by the overparameterized architecture. The proposed framework outperforms state-of-the-art methods while being based purely on machine learning, i.e., the method does not incorporate domain knowledge from quantum chemistry and does not use extended geometric inputs beyond pairwise atomic distances.
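A hedged sketch of how pairwise atomic distances could enter self-attention as a learned gate: a small MLP maps each interatomic distance to a multiplicative factor on the attention weights. The gating form, shapes, and renormalization are assumptions used only to illustrate the kind of geometry-conditioned attention described above, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class DistanceGatedAttention(nn.Module):
    """Self-attention whose weights are modulated by a function of pairwise atom distances."""

    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.gate = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, atom_feats, dist):
        # atom_feats: (n_atoms, dim); dist: (n_atoms, n_atoms) pairwise distances
        q, k, v = self.q(atom_feats), self.k(atom_feats), self.v(atom_feats)
        scores = torch.softmax(q @ k.t() / atom_feats.size(-1) ** 0.5, dim=-1)
        gate = self.gate(dist.unsqueeze(-1)).squeeze(-1)   # (n_atoms, n_atoms) in (0, 1)
        weights = scores * gate                            # geometry-gated attention
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-9)
        return weights @ v

coords = torch.randn(6, 3)                                 # toy 3-D conformer
dist = torch.cdist(coords, coords)
out = DistanceGatedAttention(32)(torch.randn(6, 32), dist) # (6, 32)
```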
Log analysis is one of the main techniques that engineers use to troubleshoot failures of large-scale software systems. Over the past decades, many log analysis approaches have been proposed to detect system anomalies reflected by logs. They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms, including deep learning models, to detect system anomalies. These anomalies are often identified as violations of quantitative relational patterns or sequential patterns of log events in log sequences. However, existing methods fail to leverage the spatial structural relationships among log events, resulting in potential false alarms and unstable performance. In this study, we propose a novel graph-based log anomaly detection method, LogGD, to effectively address this issue by transforming log sequences into graphs. We exploit the power of Graph Transformer Neural Networks, which combine graph structure and node semantics, for log-based anomaly detection. We evaluate the proposed method on four widely-used public log datasets. Experimental results show that LogGD can outperform state-of-the-art quantitative-based and sequence-based methods and achieve stable performance under different window size settings. The results confirm that LogGD is effective for log-based anomaly detection.
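A hedged sketch of the kind of transformation described above: turn a window of parsed log-event template IDs into a directed graph whose nodes are the distinct events and whose weighted edges count how often one event directly follows another; such a graph (plus node semantics) is what a graph transformer would then classify. The exact construction in LogGD may differ; this is an assumption-level illustration.

```python
from collections import defaultdict

def log_window_to_graph(event_sequence):
    """Map a sequence of log-event template IDs to (nodes, weighted directed edges)."""
    nodes = sorted(set(event_sequence))
    index = {event: i for i, event in enumerate(nodes)}
    edge_weights = defaultdict(int)
    for src, dst in zip(event_sequence, event_sequence[1:]):
        edge_weights[(index[src], index[dst])] += 1   # count direct transitions
    return nodes, dict(edge_weights)

window = ["E1", "E2", "E2", "E5", "E2", "E1"]          # one sliding window of parsed events
nodes, edges = log_window_to_graph(window)
# nodes -> ['E1', 'E2', 'E5']
# edges -> {(0, 1): 1, (1, 1): 1, (1, 2): 1, (2, 1): 1, (1, 0): 1}
```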
Many applications of machine learning require a model to make accurate predictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training. An effective approach to this challenge is to pre-train a model on related tasks where data is abundant, and then fine-tune it on a downstream task of interest. While pre-training has been effective in many language and vision domains, it remains an open question how to effectively use pre-training on graph datasets. In this paper, we develop a new strategy and self-supervised methods for pre-training Graph Neural Networks (GNNs). The key to the success of our strategy is to pre-train an expressive GNN at the level of individual nodes as well as entire graphs so that the GNN can learn useful local and global representations simultaneously. We systematically study pre-training on multiple graph classification datasets. We find that naïve strategies, which pre-train GNNs at the level of either entire graphs or individual nodes, give limited improvement and can even lead to negative transfer on many downstream tasks. In contrast, our strategy avoids negative transfer and improves generalization significantly across downstream tasks, leading up to 9.4% absolute improvements in ROC-AUC over non-pre-trained models and achieving state-of-the-art performance for molecular property prediction and protein function prediction. However, pre-training on graph datasets remains a hard challenge.
The most popular design paradigm for Graph Neural Networks (GNNs) is 1-hop message passing, which iteratively aggregates features from 1-hop neighbors. However, the expressive power of 1-hop message passing is bounded by the Weisfeiler-Lehman (1-WL) test. Recently, researchers have extended 1-hop message passing to K-hop message passing by simultaneously aggregating information from the K-hop neighbors of a node. However, there is no work analyzing the expressive power of K-hop message passing. In this work, we theoretically characterize the expressive power of K-hop message passing. Specifically, we first formally differentiate two kinds of kernels of K-hop message passing, which are often misused in previous works. We then characterize the expressive power of K-hop message passing by showing that it is more powerful than 1-hop message passing. Despite the higher expressive power, we show that K-hop message passing still cannot distinguish some simple regular graphs. To further enhance its expressive power, we introduce the KP-GNN framework, which improves K-hop message passing by leveraging the peripheral subgraph information in each hop. We prove that KP-GNN can distinguish almost all regular graphs, including some distance-regular graphs that cannot be distinguished by previous distance-encoding methods. Experimental results verify the expressive power and effectiveness of KP-GNN. KP-GNN achieves competitive results across all benchmark datasets.
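A hedged sketch of shortest-path-kernel K-hop message passing: each node aggregates, separately for every hop distance k ≤ K, the features of nodes exactly k hops away, and then combines the per-hop summaries. The dense-matrix hop computation and mean aggregation below are illustrative assumptions, and the sketch deliberately omits the peripheral-subgraph enhancement of KP-GNN.

```python
import torch
import torch.nn as nn

class KHopMessagePassing(nn.Module):
    """Aggregate features from exactly-k-hop neighbors (shortest-path kernel), k = 1..K."""

    def __init__(self, dim, num_hops=3):
        super().__init__()
        self.num_hops = num_hops
        self.combine = nn.Linear(dim * (num_hops + 1), dim)

    def forward(self, x, adj):
        reach = torch.eye(adj.size(0))     # nodes already reachable within k-1 hops
        frontier = adj.clone()
        per_hop = [x]
        for _ in range(self.num_hops):
            exactly_k = ((frontier > 0) & (reach == 0)).float()  # exactly k hops away
            denom = exactly_k.sum(dim=1, keepdim=True).clamp(min=1.0)
            per_hop.append((exactly_k @ x) / denom)              # mean over k-hop neighbors
            reach = reach + exactly_k
            frontier = frontier @ adj                            # extend walks by one hop
        return torch.relu(self.combine(torch.cat(per_hop, dim=-1)))

adj = torch.tensor([[0., 1., 0., 0.], [1., 0., 1., 0.], [0., 1., 0., 1.], [0., 0., 1., 0.]])
out = KHopMessagePassing(dim=8)(torch.randn(4, 8), adj)          # (4, 8)
```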