Multimodal machine translation (MMT) aims to improve translation quality by incorporating information from other modalities, such as vision. Previous MMT systems mainly focus on better access and use of visual information and tend to validate their methods on image-related datasets. These studies face two challenges: first, they can only utilize triple data (bilingual texts paired with images), which is scarce; second, current benchmarks are relatively restricted and do not correspond to realistic scenarios. This paper therefore contributes both new methods and a new dataset for MMT. First, we propose 2/3-Triplet, a framework with two new approaches that enhance MMT by exploiting large-scale non-triple data: monolingual image-text data and parallel text-only data. Second, we construct an English-Chinese e-commercial multimodal translation dataset (with training and test splits), named EMMT, whose test set is carefully curated to contain ambiguous words that are likely to be mistranslated without the help of images. Experiments show that our method is better suited to real-world scenarios and can significantly improve translation performance by using more non-triple data. In addition, our model also rivals various SOTA models on conventional multimodal translation benchmarks.
Transformer-based models have gained wide popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in the time domain, recent works also explore learning attention in frequency domains (e.g., the Fourier domain, the wavelet domain), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in different time and frequency domains. Theoretically, we show that attention models in different domains are equivalent under linear conditions (i.e., a linear kernel for the attention scores). Empirically, we analyze how attention models in different domains behave differently through various synthetic experiments with seasonality, trend, and noise, with emphasis on the role of the softmax operation therein. Both the theoretical and empirical analyses motivate us to propose a new method, TDformer (Trend Decomposition Transformer), which first applies seasonal-trend decomposition and then additively combines an MLP that predicts the trend component with Fourier attention that predicts the seasonal component to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models.
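To make the additive design concrete, here is a minimal sketch of a TDformer-style forecaster, assuming a moving-average seasonal-trend decomposition and a toy frequency-domain block standing in for the paper's Fourier attention; all module names and sizes are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn

class MovingAvgDecomp(nn.Module):
    """Split a series into trend (moving average) and seasonal (residual) parts."""
    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.avg = nn.AvgPool1d(kernel_size, stride=1,
                                padding=kernel_size // 2, count_include_pad=False)

    def forward(self, x):                          # x: (batch, length, channels)
        trend = self.avg(x.transpose(1, 2)).transpose(1, 2)
        return x - trend, trend                    # seasonal, trend

class ToyFourierAttention(nn.Module):
    """Frequency-domain mixing standing in for the paper's Fourier attention."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        freq = torch.fft.rfft(x, dim=1)            # time -> frequency domain
        mixed = self.proj(freq.real) + 1j * self.proj(freq.imag)
        return torch.fft.irfft(mixed, n=x.size(1), dim=1)

class TDformerSketch(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.decomp = MovingAvgDecomp()
        self.trend_mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                       nn.Linear(d_model, d_model))
        self.seasonal_attn = ToyFourierAttention(d_model)

    def forward(self, x):                          # x: (batch, length, d_model)
        seasonal, trend = self.decomp(x)
        # Additive recombination: the MLP models the trend, the Fourier
        # block models the seasonal component.
        return self.trend_mlp(trend) + self.seasonal_attn(seasonal)
```

The point the sketch isolates is the additive recombination: the trend path never passes through attention, and the seasonal path is modeled entirely in the frequency domain.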
Classifying forecasting methods as being either of a "machine learning" or "statistical" nature has become commonplace in parts of the forecasting literature and community, as exemplified by the M4 competition and the conclusions drawn by its organizers. We argue that this distinction does not stem from fundamental differences in the methods assigned to either class. Instead, the distinction is probably of a tribal nature, and it limits insight into the appropriateness and effectiveness of different forecasting methods. We provide alternative characteristics of forecasting methods which, in our view, allow meaningful conclusions to be drawn. Further, we discuss areas of forecasting that could benefit most from cross-pollination between the ML and statistics communities.
Ionic Liquids (ILs) provide a promising solution for CO$_2$ capture and storage to mitigate global warming. However, identifying and designing high-capacity ILs within the giant chemical space requires expensive and exhaustive simulations and experiments. Machine learning (ML) can accelerate the search for desirable ionic molecules through accurate and efficient property predictions in a data-driven manner. However, existing descriptors and ML models for ionic molecules suffer from inefficient adaptation to molecular graph structure. Besides, few works have investigated the explainability of ML models to help understand the learned features that can guide the design of efficient ionic molecules. In this work, we develop both fingerprint-based ML models and Graph Neural Networks (GNNs) to predict CO$_2$ absorption in ILs. Fingerprints operate on the graph structure only at the feature-extraction stage, while GNNs directly handle the molecular structure at both the feature-extraction and model-prediction stages. We show that our method outperforms previous ML models, reaching high accuracy (MAE of 0.0137, $R^2$ of 0.9884). Furthermore, we take advantage of the feature representations of GNNs and develop a substructure-based explanation method that provides insight into how chemical fragments within IL molecules contribute to the CO$_2$ absorption predictions of ML models. We also show that our explanation results agree with ground truth from the theoretical reaction mechanism of CO$_2$ absorption in ILs, which can inform the design of novel and efficient functional ILs in the future.
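As an illustration of the GNN half of this setup, a minimal message-passing regressor for a molecule-level property such as CO$_2$ absorption might look as follows (a sketch using PyTorch Geometric; the layer choices are assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class ILAbsorptionGNN(nn.Module):
    def __init__(self, num_atom_features: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(num_atom_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.readout = nn.Linear(hidden, 1)   # scalar CO2 absorption

    def forward(self, x, edge_index, batch):
        # Two rounds of message passing over the molecular graph.
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        # Pool atom embeddings into one molecule-level embedding, then regress.
        return self.readout(global_mean_pool(h, batch)).squeeze(-1)
```

Because the pooled molecule embedding is a sum of atom-level contributions, substructure-level attributions of the kind described above can be read off by grouping the pre-pooling node embeddings by chemical fragment.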
Understanding the origin and influence of a publication's ideas is critical to conducting scientific research. However, the proliferation of scientific publications makes it difficult for researchers to trace the evolution of all the relevant literature. To this end, we introduce IdeaReader, a machine reading system that discovers which papers are most likely to have inspired, or been influenced by, a target publication, and summarizes the ideas of these papers in natural language. Specifically, IdeaReader first clusters the references and citations (first-order or higher-order) of the target publication, and treats the obtained clusters as the topics that inspired, or were influenced by, the target publication. It then picks out the important papers from each cluster to extract the skeleton of the idea flow. Finally, IdeaReader automatically generates a literature review of the important papers in each topic. Our system can help researchers gain insight into how scientific ideas flow from the target publication's references to its citations, via automatically generated surveys and a visualization of the idea flow.
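A rough sketch of this three-step pipeline, with community detection and PageRank standing in for whatever clustering and paper-selection methods IdeaReader actually uses (both are assumptions), and summarization left as a stub:

```python
import networkx as nx

def idea_flow(graph: nx.DiGraph, target: str, top_k: int = 3):
    # Step 1: cluster the target's references and citations into topics
    # (Louvain communities here; the real system's method may differ).
    neighborhood = set(graph.predecessors(target)) | set(graph.successors(target))
    communities = nx.community.louvain_communities(
        graph.subgraph(neighborhood).to_undirected())
    # Step 2: pick the most important papers in each topic (PageRank here).
    rank = nx.pagerank(graph)
    topics = [sorted(c, key=lambda p: -rank[p])[:top_k] for c in communities]
    # Step 3: a text-generation model would summarize each topic's key papers.
    return topics
```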
Variational Bayesian posterior inference often requires simplifying approximations, such as mean-field parameterizations, to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of overparameterized models contribute to this phenomenon, because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean fields. In particular, we show that the mean-field approximation incurs an additional gap in the evidence lower bound compared to a purpose-built posterior that takes known invariances into account. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariance in a linear model with a single data point. We show that, while the true posterior can be constructed from a mean-field parameterization, this is achieved only if the objective function takes the invariance gap into account. We then transfer the analysis from linear models to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
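As a minimal concrete instance of such a likelihood invariance (an illustration consistent with the abstract, not the paper's exact construction), consider an overparameterized linear model with one redundant weight:

```latex
% Overparameterized linear model: f_w(x) = (w_1 + w_2)\, x.
% The likelihood is invariant under the translation (w_1, w_2) -> (w_1 + c, w_2 - c):
p\bigl(y \mid x,\; w_1 + c,\; w_2 - c\bigr) \;=\; p\bigl(y \mid x,\; w_1,\; w_2\bigr)
\qquad \forall\, c \in \mathbb{R}
```

The posterior is then a ridge along the line $w_1 + w_2 = \text{const}$; a factorized Gaussian $q(w_1)\,q(w_2)$ can only cover this diagonal ridge by inflating its marginal variances, which costs ELBO, giving one way to picture the invariance gap described above.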
Graph neural networks (GNNs), which are able to learn representations from graph data, are a natural fit for modeling molecular systems. This review introduces GNNs and their various applications to small organic molecules. GNNs rely on the message-passing operation, a generic yet powerful framework, to iteratively update node features. Many studies design GNN architectures to effectively learn the topological information of 2D molecular graphs as well as the geometric information of 3D molecular systems. GNNs have been implemented in a wide variety of molecular applications, including molecular property prediction, molecular scoring and docking, molecular optimization and de novo generation, and molecular dynamics simulation. Moreover, the review also summarizes recent developments in self-supervised learning for molecules with GNNs.
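For reference, the message-passing operation the review refers to is commonly written in the following generic form (standard notation in the style of Gilmer et al., not any single reviewed architecture), where $h_v^{(t)}$ denotes the features of node $v$ after $t$ rounds, $e_{uv}$ the edge features, and $M_t$ and $U_t$ are learned message and update functions:

```latex
m_v^{(t+1)} = \sum_{u \in \mathcal{N}(v)} M_t\bigl(h_v^{(t)},\, h_u^{(t)},\, e_{uv}\bigr),
\qquad
h_v^{(t+1)} = U_t\bigl(h_v^{(t)},\, m_v^{(t+1)}\bigr)
```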
Accurate prediction of polymer properties is of great significance in polymer development and design. Conventionally, expensive and time-consuming experiments or simulations are required to evaluate polymer functions. Recently, Transformer models, equipped with attention mechanisms, have exhibited superior performance in various natural language processing tasks. However, such methods have not yet been investigated in polymer science. Herein, we report TransPolymer, a Transformer-based language model for polymer property prediction. Owing to our proposed polymer tokenizer with chemical awareness, TransPolymer can learn representations directly from polymer sequences. The model learns expressive representations by being pretrained on a large unlabeled dataset, followed by finetuning on downstream datasets concerning various polymer properties. TransPolymer achieves superior performance on all eight datasets and surpasses the other baselines significantly on most downstream tasks. Moreover, the improvement of pretrained TransPolymer over supervised TransPolymer and other language models strengthens the case for the significant benefits of pretraining on large unlabeled data in representation learning. Experimental results further demonstrate the important role of the attention mechanism in understanding polymer sequences. We highlight this model as a promising computational tool for promoting the study of structure-property relationships from a data science view.
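The pretrain-then-finetune recipe can be sketched roughly as below, with a RoBERTa-style masked language model standing in for TransPolymer and the chemistry-aware polymer tokenizer treated as a given; all names and sizes here are assumptions, not the released artifacts:

```python
import torch.nn as nn
from transformers import RobertaConfig, RobertaForMaskedLM, RobertaModel

config = RobertaConfig(vocab_size=512, hidden_size=256,
                       num_hidden_layers=4, num_attention_heads=4)

# Stage 1: masked-language-model pretraining on unlabeled polymer sequences
# (input_ids come from a hypothetical chemistry-aware polymer tokenizer).
mlm = RobertaForMaskedLM(config)
# loss = mlm(input_ids=batch_ids, labels=masked_labels).loss  # per batch

# Stage 2: finetune the pretrained encoder with a regression head on a
# labeled downstream polymer-property dataset.
class PropertyRegressor(nn.Module):
    def __init__(self, encoder: RobertaModel):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(h[:, 0])   # predict the property from the <s> token

regressor = PropertyRegressor(mlm.roberta)   # reuse the pretrained weights
```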
Conventionally, forecasting of Earth systems (e.g., weather and climate) has relied on numerical simulation with complex physical models, and is therefore both computationally expensive and demanding of domain expertise. With the explosive growth of spatiotemporal Earth observation data in the past decade, data-driven models that apply deep learning (DL) have demonstrated their potential for various Earth system forecasting tasks. The Transformer, an emerging DL architecture, has seen limited adoption in this area despite its broad success in other domains. In this paper, we propose Earthformer, a spatiotemporal Transformer for Earth system forecasting. Earthformer is based on a generic, flexible, and efficient spatiotemporal attention block named Cuboid Attention. The idea is to decompose the data into cuboids and apply cuboid-level self-attention in parallel. These cuboids are further connected with a collection of global vectors. We conduct experiments on the MovingMNIST dataset and a newly proposed chaotic N-body MNIST dataset to verify the effectiveness of Cuboid Attention and to figure out the best design for Earthformer. Experiments on two real-world benchmarks, precipitation nowcasting and El Nino/Southern Oscillation (ENSO) forecasting, show that Earthformer achieves state-of-the-art performance.
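A toy version of the cuboid decomposition at the core of this design might look like the following (illustrative only; the actual Earthformer block also includes the global vectors, multiple decomposition strategies, and shifted windows):

```python
import torch
import torch.nn as nn

class CuboidSelfAttention(nn.Module):
    def __init__(self, d_model: int, cuboid=(2, 4, 4), heads: int = 4):
        super().__init__()
        self.cuboid = cuboid
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, T, H, W, d_model); T, H, W divisible by the cuboid size.
        B, T, H, W, C = x.shape
        t, h, w = self.cuboid
        # Decompose the spatiotemporal tensor into non-overlapping cuboids.
        x = x.view(B, T // t, t, H // h, h, W // w, w, C)
        x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, t * h * w, C)
        # Self-attention within each cuboid, over all cuboids in parallel.
        x, _ = self.attn(x, x, x)
        # Reassemble the original spatiotemporal layout.
        x = x.view(B, T // t, H // h, W // w, t, h, w, C)
        x = x.permute(0, 1, 4, 2, 5, 3, 6, 7).reshape(B, T, H, W, C)
        return x
```

Because attention is computed only within each cuboid, the cost scales with the cuboid volume rather than with the full sequence length T*H*W, which is what makes the block efficient for dense Earth observation grids.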
Twitter bot detection has become an increasingly important task in combating misinformation, facilitating social media moderation, and preserving the integrity of online discourse. State-of-the-art bot detection methods generally leverage the graph structure of the Twitter network, and they show promising performance when confronting novel Twitter bots that traditional methods fail to detect. However, very few of the existing Twitter bot detection datasets are graph-based, and even these graph-based datasets suffer from limited dataset scale, incomplete graph structure, and low annotation quality. In fact, the lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of graph-based bot detection approaches. In this paper, we propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diversified entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to promote a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented codes and datasets into the TwiBot-22 evaluation framework, where researchers can consistently evaluate new models and datasets. The TwiBot-22 Twitter bot detection benchmark and evaluation framework are publicly available at https://twibot22.github.io/.