There is increasing adoption of artificial intelligence in drug discovery. However, most existing machine-learning works utilize only the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. We present a multi-modal molecule structure-text model, MoleculeSTM, that jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct the largest multi-modal dataset to date, namely PubChemSTM, with over 280K chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM possesses two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
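As a hedged illustration of the kind of objective such a model could use, the following is a minimal sketch of a symmetric InfoNCE-style contrastive loss over paired molecule and text embeddings; the encoder outputs, batch pairing, and temperature value are assumptions here, not MoleculeSTM's exact formulation.

```python
import torch
import torch.nn.functional as F

def structure_text_contrastive_loss(mol_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of paired
    (molecule, description) embeddings, each of shape (B, D)."""
    mol_emb = F.normalize(mol_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = mol_emb @ text_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched pairs sit on the diagonal; off-diagonal entries act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```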
We explore the downstream task performance of graph neural network (GNN) self-supervised learning (SSL) methods trained on subgraphs extracted from relational databases (RDBs). Intuitively, this joint use of SSL and GNNs should make it possible to leverage more of the available data, which could translate into better results. However, we find that naively porting contrastive SSL techniques can cause "negative transfer": linear evaluation on fixed representations from a pretrained model performs worse than on representations from a randomly initialized model. Based on the conjecture that contrastive SSL conflicts with the message-passing layers of the GNN, we propose InfoNode: a contrastive loss that aims to maximize the mutual information between a node's initial- and final-layer representations. Our empirical results support the conjecture and the effectiveness of InfoNode.
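A minimal sketch of what such a node-level contrastive objective could look like, assuming the positive pair is the same node's initial- and final-layer embedding and the other nodes in the batch act as negatives; the function name and temperature are illustrative, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def infonode_loss(h_init, h_final, temperature=0.5):
    """InfoNCE-style bound on the mutual information between each node's
    initial-layer and final-layer representation.
    h_init, h_final: (N, D) node embeddings from the same batch."""
    z0 = F.normalize(h_init, dim=-1)
    zL = F.normalize(h_final, dim=-1)
    logits = z0 @ zL.t() / temperature          # (N, N)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Positive pair: the same node before and after message passing;
    # all other nodes in the batch serve as negatives.
    return F.cross_entropy(logits, targets)
```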
Pretraining molecular representations is essential to drug and material discovery because labeled molecules are scarce, yet most existing work focuses on pretraining over 2D molecular graphs. The power of pretraining on 3D geometric structures has been less explored, the key difficulty being to find a sufficient proxy task that empowers pretraining to effectively extract essential features from the geometry. Motivated by the dynamic nature of 3D molecules, where the continuous motion of a molecule in 3D Euclidean space forms a smooth potential energy surface, we propose a 3D coordinate denoising pretraining framework to model such an energy landscape. Leveraging an SE(3)-invariant score matching method, we propose SE(3)-DDM, in which the coordinate denoising proxy task effectively boils down to denoising the pairwise atomic distances within a molecule. Our comprehensive experiments confirm the effectiveness and robustness of the proposed method.
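The core reduction, from coordinate denoising to distance denoising, can be roughly sketched as follows; the `model` interface and noise scale are assumptions, and the actual SE(3)-DDM objective uses score matching rather than this plain regression.

```python
import torch

def distance_denoising_loss(model, pos, sigma=0.3):
    """Sketch of a coordinate-denoising proxy task expressed on pairwise
    distances, the SE(3)-invariant quantity the abstract refers to.
    model: predicts pairwise distances from noisy coordinates (assumed API).
    pos: (N, 3) atom coordinates of one molecule."""
    noisy_pos = pos + sigma * torch.randn_like(pos)   # perturb coordinates
    d_clean = torch.cdist(pos, pos)                   # (N, N) target distances
    d_pred = model(noisy_pos)                         # denoised distance estimates
    # Matching distances instead of raw coordinates keeps the loss
    # invariant to global rotations and translations (SE(3)).
    return ((d_pred - d_clean) ** 2).mean()
```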
Graph self-supervised learning (GSSL) paves the way for learning graph embeddings without expert annotation, which is particularly impactful for molecular graphs, since the space of possible molecules is vast and labels are expensive. By design, however, GSSL methods are not trained to perform well on a single downstream task, but rather to transfer to many, making evaluation less straightforward. As a step toward profiling molecular graph embeddings along diverse and interpretable properties, we introduce Molecular Graph Representation Evaluation (MolGraphEval), a suite of probe tasks split into (i) topological, (ii) substructure, and (iii) embedding-space properties. Benchmarking existing GSSL methods on both existing downstream datasets and MolGraphEval, we find surprising discrepancies between conclusions drawn from existing datasets alone and those from more fine-grained probing, suggesting that current evaluation protocols do not give the whole picture. Our modular, automated end-to-end GSSL pipeline code will be released upon acceptance, including standardized graph loading, experiment management, and embedding evaluation.
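Probe tasks of this kind are commonly implemented as linear evaluations on frozen embeddings; here is a rough sketch for one binary probe property, with the probe target and metric as placeholders rather than MolGraphEval's exact protocol.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def linear_probe(train_emb, train_y, test_emb, test_y):
    """Generic linear probe: fit a logistic regression on frozen graph
    embeddings and report test ROC-AUC for one binary probe property
    (e.g. a topological statistic or a substructure indicator)."""
    clf = LogisticRegression(max_iter=1000).fit(train_emb, train_y)
    scores = clf.predict_proba(test_emb)[:, 1]
    return roc_auc_score(test_y, scores)
```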
Graph neural networks (GNNs) are effective tools for graph representation learning. Most GNNs rely on a recursive neighborhood aggregation scheme known as message passing, so their theoretical expressive power is bounded by the first-order Weisfeiler-Lehman test (1-WL). Motivated by the success of retrieval-based models and off-the-shelf high-performance retrieval systems, we propose a non-parametric, model-agnostic scheme called GraphRetrieval to enhance existing GNN models. In GraphRetrieval, similar training graphs, together with their ground-truth labels, are retrieved as enhancements to be jointly exploited with the input graph representation for various graph property prediction tasks. In particular, to effectively "absorb" useful information from the retrieved graphs while "ignoring" possible noise, we introduce a self-attention-based adapter that explicitly learns the interaction between an input graph and its retrieved similar graphs. Experiments with three classic GNN models on 12 different datasets show that GraphRetrieval brings substantial improvements to existing GNN models without noticeable overhead in model size or prediction efficiency. Our work is also the first to verify the feasibility and effectiveness of retrieval-enhanced graph neural networks.
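The adapter idea might be sketched as follows; the module name, residual fusion, and the assumption that label information is already fused into each retrieved embedding are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn

class RetrievalAdapter(nn.Module):
    """Hypothetical self-attention adapter: the input-graph embedding
    attends over embeddings of its K retrieved training graphs, letting
    the model absorb useful neighbors and down-weight noisy ones."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_emb, retrieved_emb):
        # query_emb: (B, D); retrieved_emb: (B, K, D), assumed to already
        # carry ground-truth label information of the retrieved graphs.
        q = query_emb.unsqueeze(1)                       # (B, 1, D)
        ctx, _ = self.attn(q, retrieved_emb, retrieved_emb)
        return query_emb + ctx.squeeze(1)                # residual fusion
```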
Graph neural networks (GNNs) have been shown to possess strong representation power, which can be exploited for downstream prediction tasks on graph-structured data such as molecules and social networks. They typically learn representations by aggregating information from the $K$-hop neighborhoods of individual vertices or from enumerated walks in the graph. Prior studies have demonstrated the effectiveness of incorporating weighting schemes into GNNs, but so far this has mostly been limited to $K$-hop neighborhood GNNs. In this paper, we aim to design an algorithm that incorporates weighting schemes into walk-based GNNs and to analyze their effect. We propose a novel GNN model, called AWARE, that aggregates information about walks in a graph using attention schemes. This leads to an end-to-end supervised learning method for graph-level prediction tasks in the standard setting, where the input is the adjacency and vertex information of a graph and the output is the predicted label of the graph. We then perform theoretical, empirical, and interpretability analyses of AWARE. Our theoretical analysis in a simplified setting identifies conditions for success with provable guarantees, demonstrating how graph information is encoded in the representation and how the weighting schemes in AWARE affect the representation and learning performance. Our experiments demonstrate the strong performance of AWARE on graph-level prediction tasks in the standard setting across molecular property prediction and social-network domains. Finally, our interpretability study shows that AWARE can successfully capture important substructures of the input graph. The code is available at https://github.com/mehmetfdemirel/aware.
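A hedged sketch of the walk-level idea, attention-weighted pooling of walk embeddings into a graph vector, follows; AWARE's actual weighting scheme is richer than this single pooling step, and all names here are illustrative.

```python
import torch
import torch.nn as nn

class WalkAttentionPooling(nn.Module):
    """Sketch of attention-weighted walk aggregation: score each
    enumerated walk embedding, then pool into a graph representation."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, walk_emb):
        # walk_emb: (num_walks, D) embeddings of walks in one graph.
        weights = torch.softmax(self.score(walk_emb), dim=0)   # (num_walks, 1)
        return (weights * walk_emb).sum(dim=0)                 # (D,) graph vector
```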
Transformer, originally devised for natural language processing, has also achieved significant success in computer vision. Thanks to its superior expressive power, researchers are investigating ways to deploy transformers in reinforcement learning (RL), and transformer-based models have demonstrated their potential on representative RL benchmarks. In this paper, we collect and dissect recent advances in transformer-based RL (TRL) in order to explore its development trajectory and future trend. We group existing developments into two categories, architecture enhancement and trajectory optimization, and examine the main applications of TRL in robotic manipulation, text-based games, navigation, and autonomous driving. Architecture-enhancement methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework; they model agents and environments much more precisely than earlier deep RL methods but remain limited by the inherent drawbacks of traditional RL algorithms, such as bootstrapping and the "deadly triad". Trajectory-optimization methods instead treat RL as sequence modeling and train a joint state-action model over entire trajectories under the behavior cloning framework; they can extract policies from static datasets and fully exploit the long-sequence modeling capability of the transformer. Given these advancements, we review extensions and challenges in TRL and discuss proposals for future directions. We hope this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field.
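To make the trajectory-optimization view concrete, here is a schematic (not any specific surveyed model) of behavior cloning with a causal transformer over interleaved state-action tokens; the dimensions and the discrete-action assumption are illustrative.

```python
import torch
import torch.nn as nn

class TrajectoryModel(nn.Module):
    """Schematic of RL as sequence modeling: embed states and actions as
    interleaved tokens and train a causal transformer to predict the
    next action (behavior cloning over full trajectories)."""
    def __init__(self, state_dim, num_actions, dim=128, layers=4):
        super().__init__()
        self.state_emb = nn.Linear(state_dim, dim)
        self.action_emb = nn.Embedding(num_actions, dim)
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(dim, num_actions)

    def forward(self, states, actions):
        # states: (B, T, state_dim); actions: (B, T) discrete action ids.
        tokens = torch.stack([self.state_emb(states),
                              self.action_emb(actions)], dim=2)
        tokens = tokens.flatten(1, 2)          # (B, 2T, dim): s1, a1, s2, a2, ...
        mask = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)).to(tokens.device)
        h = self.backbone(tokens, mask=mask)   # causal self-attention
        return self.head(h[:, 0::2])           # action logits at each state token
```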
This paper proposes a novel stroke-based rendering (SBR) method that translates images into vivid oil paintings. Previous SBR techniques usually formulate oil painting as pixel-wise approximation. Departing from this technical route, we treat the creation of an oil painting as an adaptive sampling problem. First, we compute a probability density map based on the texture complexity of the input image. Then we use a Voronoi algorithm to sample a set of pixels as stroke anchors. Next, we search for and generate an individual oil stroke at each anchor. Finally, we place all the strokes on the canvas to obtain the oil painting. By adjusting the maximum sampling probability hyperparameter, we can control the fineness of the painting in a linear manner. Comparisons with existing state-of-the-art oil painting techniques show that our results have higher fidelity and more realistic textures. A user opinion test demonstrates that people prefer our oil paintings over the results of other methods. More interesting results and the code are available at https://github.com/tzysjtu/im2oil.
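The adaptive-sampling step could be sketched as below, using gradient magnitude as a stand-in for the texture-complexity measure and skipping the Voronoi relaxation and stroke search; the function and parameter names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def sample_stroke_anchors(gray, num_anchors=2000, p_max=0.9):
    """Sketch of adaptive anchor sampling: build a density map from local
    texture complexity (approximated here by gradient magnitude), cap it
    at p_max (the fineness control the abstract mentions), and draw
    stroke anchors from the resulting distribution."""
    gx = ndimage.sobel(gray.astype(float), axis=1)
    gy = ndimage.sobel(gray.astype(float), axis=0)
    density = np.hypot(gx, gy)                         # texture-complexity proxy
    density = np.clip(density / (density.max() + 1e-8), 0.0, p_max)
    density /= density.sum()                           # normalize to a pmf
    idx = np.random.choice(density.size, size=num_anchors,
                           replace=False, p=density.ravel())
    return np.stack(np.unravel_index(idx, density.shape), axis=1)  # (N, 2) anchors
```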
Many existing autonomous driving paradigms involve a multi-stage discrete pipeline of tasks. To better predict control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit design, in this paper we formulate the problem in an interpretable vision-based setting. In particular, we propose a spatial-temporal feature learning scheme, named ST-P3, towards a set of more representative features for perception, prediction, and planning tasks simultaneously. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird's-eye-view transformation for perception; a dual-pathway modeling is devised to take past motion variations into account for future prediction; and a temporal-based refinement unit is introduced to compensate for recognizing vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-art methods on the open-loop nuScenes dataset as well as closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, model, and protocol details are publicly available at https://github.com/openperceptionx/st-p3.
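A rough sketch of the egocentric-aligned accumulation idea, warping past bird's-eye-view features into the current ego frame before fusing, is given below; the affine-transform input format and mean fusion are assumptions, not ST-P3's exact design.

```python
import torch
import torch.nn.functional as F

def egocentric_accumulate(past_bev, ego_motions):
    """Warp each past BEV feature map into the current ego frame before
    fusing, so geometry stays aligned across time.
    past_bev: list of (B, C, H, W) feature maps; ego_motions: list of
    (B, 2, 3) affine transforms into the current frame (assumed format)."""
    aligned = []
    for feat, theta in zip(past_bev, ego_motions):
        grid = F.affine_grid(theta, feat.shape, align_corners=False)
        aligned.append(F.grid_sample(feat, grid, align_corners=False))
    return torch.stack(aligned).mean(dim=0)   # fuse the aligned history
```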
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
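A hedged sketch of the implicit-alignment idea, shared 3D-coordinate position encodings added to both token streams before DETR-style decoding, follows; the module sizes and the coordinate-sampling interface are assumptions, not CMT's exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Add 3D-coordinate position encodings to image and point-cloud
    tokens, concatenate them, and let object queries attend over the
    joint token set, so alignment is implicit in the encoding."""
    def __init__(self, dim=256, num_queries=900):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))
        self.queries = nn.Embedding(num_queries, dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)

    def forward(self, img_tokens, img_xyz, pts_tokens, pts_xyz):
        # *_tokens: (B, N, dim); *_xyz: (B, N, 3) associated 3D coordinates.
        tokens = torch.cat([img_tokens + self.pos_mlp(img_xyz),
                            pts_tokens + self.pos_mlp(pts_xyz)], dim=1)
        q = self.queries.weight.unsqueeze(0).expand(tokens.size(0), -1, -1)
        return self.decoder(q, tokens)   # (B, num_queries, dim) box features
```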