Artificial intelligence is increasingly adopted in drug discovery. However, existing machine learning approaches mainly use the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. We present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct the largest multi-modal dataset to date, PubChemSTM, with over 280K chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
translated by Google Translate
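The MoleculeSTM abstract says only that structures and texts are aligned "via a contrastive learning strategy". As a minimal sketch of what such an objective can look like, the block below computes a symmetric InfoNCE-style loss over a batch of paired embeddings. The InfoNCE form, the temperature value, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def log_softmax_at(scores, index):
    # log(softmax(scores)[index]), computed with the usual max-shift for stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[index] - log_z


def symmetric_contrastive_loss(structure_embs, text_embs, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss; pair i of each list is the positive."""
    s = [normalize(v) for v in structure_embs]
    t = [normalize(v) for v in text_embs]
    n = len(s)
    # Cosine similarity of every structure with every text, scaled by temperature.
    sim = [[dot(s[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    # Structure -> text: each row should score its own text highest.
    s2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text -> structure: each column should score its own structure highest.
    t2s = -sum(
        log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (s2t + t2s)


structures = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(structures, [[1.0, 0.1], [0.1, 1.0]])
shuffled = symmetric_contrastive_loss(structures, [[0.1, 1.0], [1.0, 0.1]])
```

With correctly paired embeddings the loss is close to zero, while swapping the texts makes it large. That gap is the signal that pulls matching structure-text pairs together during training.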
We explore the downstream task performance of graph neural network (GNN) self-supervised learning (SSL) methods trained on subgraphs extracted from relational databases (RDBs). Intuitively, this joint use of SSL and GNNs should allow us to leverage more of the available data, which could translate to better results. However, we found that naively porting contrastive SSL techniques can cause "negative transfer": linear evaluation on fixed representations from a pretrained model performs worse than on representations from the randomly initialized model. Based on the conjecture that contrastive SSL conflicts with the message passing layers of the GNN, we propose InfoNode: a contrastive loss aiming to maximize the mutual information between a node's initial- and final-layer representations. The primary empirical results support our conjecture and the effectiveness of InfoNode.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Formulating and answering logical queries is a standard communication interface for knowledge graphs (KGs). To alleviate the notorious incompleteness of real-world KGs, neural methods have achieved impressive results in link prediction and complex query answering tasks by learning representations of entities, relations, and queries. Still, most existing query answering methods rely on transductive entity embeddings and cannot generalize to KGs containing new entities without retraining the entity embeddings. In this work, we study the inductive query answering task, where inference is performed on a graph containing new entities with queries over both seen and unseen entities. To this end, we devise two mechanisms leveraging inductive node and relational structure representations powered by graph neural networks (GNNs). Experimentally, we show that inductive models are able to perform logical reasoning at inference time over unseen nodes, generalizing to graphs up to 500% larger than the training ones. Exploring the efficiency-effectiveness trade-off, we find that the inductive relational structure representation method generally achieves higher performance, while the inductive node representation method is able to answer complex queries in the inference-only regime without any training on queries and scales to graphs of millions of nodes. Code is available at https://github.com/DeepGraphLearning/InductiveQE.
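To make the notion of a logical query concrete, the following sketch answers a two-branch conjunctive query by plain symbolic traversal of a toy graph, first on a training graph and then on an inference graph that contains a previously unseen entity. It is not the neural method of the paper; the toy triples, relation names, and helper functions are all hypothetical. Symbolic traversal handles the new entity trivially but cannot recover missing edges, which is the gap that learned inductive representations aim to close.

```python
from collections import defaultdict


def build_index(triples):
    # Map (head, relation) to the set of tails reachable in one hop.
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(head, relation)].add(tail)
    return index


def project(index, entities, relation):
    # Relation projection: follow `relation` from every entity in the set.
    result = set()
    for entity in entities:
        result |= index.get((entity, relation), set())
    return result


def answer_query(triples):
    # Query: ?x such that employs(uni_x, ?x) AND written_by(paper_1, ?x)
    index = build_index(triples)
    employed = project(index, {"uni_x"}, "employs")
    authors = project(index, {"paper_1"}, "written_by")
    return employed & authors  # set intersection implements the conjunction


train_triples = [
    ("uni_x", "employs", "alice"),
    ("uni_x", "employs", "bob"),
    ("paper_1", "written_by", "alice"),
    ("paper_1", "written_by", "carol"),
]
# At inference time the graph grows and contains the unseen entity "dana".
inference_triples = train_triples + [
    ("uni_x", "employs", "dana"),
    ("paper_1", "written_by", "dana"),
]

train_answers = answer_query(train_triples)
inference_answers = answer_query(inference_triples)
```

A transductive model with one embedding per training entity has no vector for "dana" at all, so it cannot return it as an answer without retraining.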
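Most graph neural networks (GNNs) predict the labels of unseen graphs by learning the correlation between input graphs and labels. However, through a graph classification study on training graphs with severe bias, we find that GNNs consistently tend to exploit spurious correlations to make decisions, even when the causal correlation is always present. This implies that existing GNNs trained on such biased datasets will suffer from poor generalization. Analyzing this problem from a causal view, we find that disentangling and decorrelating the causal and bias latent variables from biased graphs are both crucial for debiasing. Inspired by this, we propose a general disentangled GNN framework that learns the causal substructure and the bias substructure separately. In particular, we design a parameterized edge mask generator to explicitly split the input graph into causal and bias subgraphs. Two GNN modules, supervised by causal-aware and bias-aware loss functions respectively, are then trained to encode the causal and bias subgraphs into their corresponding representations. With the disentangled representations, we synthesize counterfactual unbiased training samples to further decorrelate the causal and bias variables. Moreover, to better benchmark the severe bias problem, we construct three new graph datasets that have controllable bias degrees and are easier to visualize and explain. Experimental results demonstrate that our approach achieves superior generalization performance over existing baselines. Furthermore, owing to the learned edge mask, the proposed model has appealing interpretability and transferability. Code and data are available at: https://github.com/googlebaba/disc.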
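The edge-mask idea described above can be sketched as follows: each edge receives a score, a sigmoid turns it into a mask value, and the mask and its complement weight the same edge in the causal and bias subgraphs. This is only an illustration under stated assumptions. In the described framework the scores come from a learned, parameterized generator, whereas here they are hard-coded; the soft complementary weighting and every name below are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def split_by_edge_mask(edges, edge_logits):
    """Split one graph into two edge-weighted subgraphs (hypothetical sketch).

    edges:       list of (u, v) pairs
    edge_logits: one raw score per edge; assumed to come from a learned
                 mask generator, hard-coded in this example
    """
    causal_weights, bias_weights = {}, {}
    for edge, logit in zip(edges, edge_logits):
        mask = sigmoid(logit)
        causal_weights[edge] = mask        # weight of the edge in the causal subgraph
        bias_weights[edge] = 1.0 - mask    # complementary weight in the bias subgraph
    return causal_weights, bias_weights


edges = [(0, 1), (1, 2), (2, 3)]
edge_logits = [3.0, 0.0, -3.0]  # illustrative values only
causal, bias = split_by_edge_mask(edges, edge_logits)
```

Each subgraph would then be passed to its own GNN encoder. Because the mask is defined per edge, it can also be inspected directly, which is consistent with the interpretability claim in the abstract.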
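Recently, sparse training has emerged as a promising paradigm for efficient deep learning on edge devices. Current research mainly focuses on reducing training costs by further increasing model sparsity. However, increasing sparsity is not always ideal, since it inevitably introduces severe accuracy degradation at extremely high sparsity levels. This paper explores other possible directions for effectively and efficiently reducing sparse training costs while preserving accuracy. To this end, we investigate two techniques: layer freezing and data sieving. First, layer freezing has been successful in dense model training and fine-tuning, but it has never been adopted in the sparse training domain, and the unique characteristics of sparse training may hinder its incorporation. We therefore analyze the feasibility and potential of using layer freezing in sparse training and find that it can save considerable training costs. Second, we propose a data sieving method for dataset-efficient training, which further reduces training costs by ensuring that only part of the dataset is used throughout the entire training process. We show that both techniques can be well incorporated into the sparse training algorithm to form a generic framework, which we dub SPFDE. Our extensive experiments demonstrate that SPFDE can significantly reduce training costs while preserving accuracy along three dimensions: weight sparsity, layer freezing, and dataset sieving.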
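The abstract does not specify the freezing schedule or the sieving criterion, so the sketch below fills both in with simple stand-ins to show how the two cost-saving mechanisms work: front layers are frozen progressively as training proceeds, and only a fraction of the samples (here, those with the highest loss) is kept for training. The schedule, the loss-based criterion, and both function names are assumptions, not the paper's method.

```python
def trainable_layers(num_layers, epoch, freeze_every):
    """Return indices of layers still being updated at `epoch`.

    Assumed schedule: one more front layer is frozen every `freeze_every`
    epochs, and the last layer always stays trainable.
    """
    frozen = min(epoch // freeze_every, num_layers - 1)
    return list(range(frozen, num_layers))


def sieve_dataset(sample_losses, keep_fraction):
    """Return sorted indices of the samples kept for training.

    Assumed criterion: keep the `keep_fraction` of samples with the highest
    current loss; at least one sample is always kept.
    """
    keep = max(1, int(len(sample_losses) * keep_fraction))
    ranked = sorted(
        range(len(sample_losses)),
        key=lambda i: sample_losses[i],
        reverse=True,
    )
    return sorted(ranked[:keep])


start = trainable_layers(num_layers=4, epoch=0, freeze_every=10)
middle = trainable_layers(num_layers=4, epoch=25, freeze_every=10)
late = trainable_layers(num_layers=4, epoch=100, freeze_every=10)
kept = sieve_dataset([0.1, 2.0, 0.5, 1.5], keep_fraction=0.5)
```

Frozen layers need no gradient computation or weight updates, and sieved-out samples need no forward or backward pass, so both mechanisms reduce cost independently of weight sparsity.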
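This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks. Track 1 targets the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed videos. In Track 1, we use the popular DIV2K dataset as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos: the LDV 2.0 dataset (335 videos) plus 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art of super-resolution on compressed images and videos. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is at https://github.com/renyang-home/aim22_compresssr.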
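Siamese network-based trackers formulate 3D single object tracking as cross-correlation learning between the point features of a template and a search area. Because the appearance of the template and the search area can differ greatly during tracking, learning a robust cross-correlation between them to identify the potential target in the search area remains a challenging problem. In this paper, we explicitly use Transformers to form a 3D Siamese Transformer network that learns a robust cross-correlation between the template and the search area of point clouds. Specifically, we develop a Siamese point Transformer network to learn the shape context information of the target. Its encoder uses self-attention to capture non-local information of the point clouds to characterize the shape of the object, while its decoder uses cross-attention to extract discriminative point features. We then develop an iterative coarse-to-fine correlation network to learn the robust cross-correlation between the template and the search area. It formulates cross-feature augmentation, which associates the template with the potential target in the search area via cross-attention. To further enhance the potential target, it employs ego-feature augmentation, which applies self-attention to the local k-NN graph of the feature space to aggregate target features. Experiments on the KITTI, nuScenes, and Waymo datasets show that our method achieves state-of-the-art performance on the 3D single object tracking task.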
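The difference between the self-attention and cross-attention steps mentioned above comes down to where the queries, keys, and values originate. The following sketch implements plain scaled dot-product attention without learned projections or multiple heads (both simplifications are assumptions), and applies it once as cross-attention from search-area features to template features. The feature values and variable names are illustrative only.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors.

    Each output row is a weighted average of `values`, weighted by how
    strongly the corresponding query matches each key.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))
            ]
        )
    return outputs


# Toy point features: 3 search-area points and 2 template points, 2-D each.
search = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
template = [[2.0, 0.0], [0.0, 2.0]]

# Cross-attention: queries from the search area, keys/values from the template.
fused = attention(search, template, template)

# Self-attention: queries, keys, and values all come from the same point set.
refined = attention(search, search, search)
```

In the cross-attention call, every search point is rewritten as a mixture of template features, so points that resemble the template stand out. Restricting the self-attention call to each point's k nearest neighbours in feature space would give the local variant described in the abstract.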
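Detecting 3D objects from point clouds is a practical yet challenging task that has recently attracted increasing attention. In this paper, we propose a label-guided auxiliary training method for 3D object detection (LG3D), which serves as an auxiliary network to enhance the feature learning of existing 3D object detectors. Specifically, we propose two novel modules: a Label-Annotation-Inducer, which maps the annotations and point clouds in bounding boxes to task-specific representations, and a Label-Knowledge-Mapper, which helps the original features obtain detection-critical representations. The proposed auxiliary network is discarded at inference and therefore adds no computational cost at test time. We conduct extensive experiments on both indoor and outdoor datasets to verify the effectiveness of our approach. For example, our proposed LG3D improves VoteNet by 2.5% and 3.1% mAP on the SUN RGB-D and ScanNetV2 datasets, respectively.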
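Building document-grounded dialogue systems has received growing interest, as documents convey a wealth of human knowledge and commonly exist in enterprises. Within this area, how to comprehend and retrieve information from documents is a challenging research problem. Previous work ignores the visual properties of documents and treats them as plain text, resulting in an incomplete modality. In this paper, we propose a layout-aware document-level information extraction dataset, LIE, to facilitate research on extracting both structural and semantic knowledge from visually rich documents (VRDs), so as to generate accurate responses in dialogue systems. LIE contains 62K annotations for three extraction tasks from 4,061 pages of product and official documents, making it, to the best of our knowledge, the largest VRD-based information extraction dataset. We also develop benchmark methods that extend token-based language models to consider layout features as humans do. Empirical results show that layout is critical for VRD-based extraction, and a system demonstration further verifies that the extracted knowledge can help locate the answers that users care about.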