Recent pre-trained language models have shown promising capabilities in generating fluent and realistic natural language text. However, generating multi-sentence text with global content planning has been a long-standing research question. Current approaches for controlled text generation can hardly address this issue, as they usually condition on a single known control attribute. In this study, we propose a low-cost yet effective framework that explicitly models the global content plan of the generated text. Specifically, it optimizes the joint distribution of the natural language sequence and the global content plan in a plug-and-play manner. We conduct extensive experiments on the well-established Recipe1M+ benchmark. Both automatic and human evaluations verify that our model achieves state-of-the-art performance on the task of recipe generation.
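To make the plug-and-play idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a decoding step that combines a language model's next-token likelihood with a separate scorer for the global content plan. The stand-in modules, the `joint_step` helper, and the weighting `lam` are illustrative assumptions.

```python
# Hypothetical sketch: pick the next token by jointly scoring
# log p(token | history) + lam * log p(plan step | candidate token).
# Both modules below are random stand-ins for pretrained networks.
import torch
import torch.nn.functional as F

VOCAB, PLAN_STEPS = 100, 8
lm = torch.nn.Linear(VOCAB, VOCAB)                 # stand-in next-token scorer
plan_scorer = torch.nn.Linear(VOCAB, PLAN_STEPS)   # stand-in content-plan classifier

def joint_step(prev_token_onehot, target_plan_step, lam=0.5):
    """Choose the next token by combining LM likelihood with plan guidance."""
    lm_logp = F.log_softmax(lm(prev_token_onehot), dim=-1)            # p(next token | history)
    plan_logp = F.log_softmax(plan_scorer(torch.eye(VOCAB)), dim=-1)  # p(plan step | candidate token)
    guidance = plan_logp[:, target_plan_step]                         # how well each token fits the plan
    return torch.argmax(lm_logp + lam * guidance).item()

prev = F.one_hot(torch.tensor(3), VOCAB).float()
print(joint_step(prev, target_plan_step=2))
```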
Recent progress in neural language models has shown that expressive meaning representations can be derived by exploiting linguistic associations in large-scale natural language data. These latent, gestalt-like representations have enabled state-of-the-art performance in many practical applications. It appears we are on a path toward empirically deriving powerful and expressive computable semantics. A key question that arises is whether language data alone can enable computers to understand the necessary truths about the physical world. This question deserves attention because our future interactions with intelligent machines depend on how correctly our technologies represent and process the concepts (objects, properties, and processes) that humans commonly observe. After reviewing existing protocols, the goal of this work is to explore this question using novel and tightly controlled reasoning tests, and to highlight what such models may be learning directly from pure linguistic data.
End-to-end speech-to-text translation models are often initialized with a pre-trained speech encoder and a pre-trained text decoder. This leads to a significant gap between pre-training and fine-tuning, largely due to the modality difference between the speech outputs and the text inputs expected by the decoder. In this work, we aim to bridge the modality gap between speech and text to improve translation quality. We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text. While shrinking the speech sequence, M-Adapter produces the features required for speech-to-text translation by modeling both global and local dependencies of the speech sequence. Our experimental results show that our model outperforms strong baselines by up to 1 BLEU score on the MuST-C En$\rightarrow$De dataset. Our code is available at https://github.com/mingzi151/w2v2-st.
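Below is a rough sketch, under assumed dimensions and layer choices, of an adapter in the spirit described above: it shortens the speech sequence with pooling, then mixes global context through self-attention and local context through a depthwise convolution. `SpeechToTextAdapter` and its hyperparameters are hypothetical, not the paper's configuration.

```python
# Hypothetical adapter: pool to shrink the speech sequence, then model
# global (self-attention) and local (depthwise conv) dependencies.
import torch
import torch.nn as nn

class SpeechToTextAdapter(nn.Module):
    def __init__(self, dim=256, pool_stride=4, heads=4, kernel=3):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size=pool_stride, stride=pool_stride)      # shrink the sequence
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)            # global dependencies
        self.local = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)  # local dependencies
        self.norm = nn.LayerNorm(dim)

    def forward(self, speech):                                   # speech: (batch, time, dim)
        x = self.pool(speech.transpose(1, 2)).transpose(1, 2)    # (batch, time // stride, dim)
        g, _ = self.attn(x, x, x)                                # global mixing
        l = self.local(x.transpose(1, 2)).transpose(1, 2)        # local mixing
        return self.norm(x + g + l)                              # representation fed to the text decoder

out = SpeechToTextAdapter()(torch.randn(2, 80, 256))
print(out.shape)  # torch.Size([2, 20, 256])
```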
Masked language models (MLMs), such as BERT and RoBERTa, have revolutionized the field of natural language understanding in the past few years. However, existing pre-trained MLMs often output an anisotropic distribution of token representations that occupies a narrow subset of the whole representation space. Such token representations are not ideal, especially for tasks that demand discriminative semantic meanings of different tokens. In this work, we propose TaCL (Token-aware Contrastive Learning), a novel continual pre-training approach that encourages BERT to learn an isotropic and discriminative distribution of token representations. TaCL is fully unsupervised and requires no additional data. We extensively test our approach on a wide range of English and Chinese benchmarks. The results show that TaCL brings consistent and notable improvements over the original BERT model. Furthermore, we conduct detailed analyses to reveal the merits and inner workings of our approach.
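The following is a simplified, hypothetical sketch of a token-level contrastive objective of the kind described above: each token representation from a trainable "student" encoder treats the same position from a frozen "teacher" encoder as its positive and the other positions as negatives. The helper name and temperature are assumptions, not the paper's exact formulation.

```python
# Hypothetical token-level contrastive loss between student and teacher encoders.
import torch
import torch.nn.functional as F

def token_contrastive_loss(student_h, teacher_h, temperature=0.07):
    """student_h, teacher_h: (seq_len, dim) token representations of one sentence."""
    s = F.normalize(student_h, dim=-1)
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.T / temperature        # similarity of every student token to every teacher token
    targets = torch.arange(s.size(0))     # the positive pair is the same token position
    return F.cross_entropy(logits, targets)

loss = token_contrastive_loss(torch.randn(12, 768), torch.randn(12, 768))
print(loss.item())
```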
Finding and localizing conceptual changes between two images of the same scene taken at different times, in terms of objects that have been added or removed, is of great significance in special-care applications. This is mainly because the addition or removal of important objects can be harmful in some environments. As a result, there is a need for a program that locates these differences using machine vision. The most important challenges of this problem are changing lighting conditions and the presence of shadows in the scene, so the proposed methods must be robust to these challenges. In this article, a method based on deep convolutional neural networks with transfer learning is introduced, trained with an intelligent data-synthesis process. The results of this method are tested and presented on a dataset provided for this purpose. It is shown that the presented method is more efficient than other methods and can be used in a variety of real industrial environments.
Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens. Many architectures attempt to reduce model complexity by limiting the self-attention mechanism to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. More specifically, we reformulate the self-attention mechanism to capture both spatial and channel relations across the whole feature dimension while staying computationally efficient. Furthermore, we redesign the skip-connection path by including a cross-attention module to ensure feature reusability and enhance localization power. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights. The code is publicly available at https://github.com/mindflow-institue/DAEFormer.
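As a hedged illustration of the two ingredients mentioned above, the sketch below implements generic versions of an "efficient" spatial attention (softmax applied to queries and keys separately, avoiding the quadratic token-by-token matrix) and an attention computed over the channel dimension. This is not the DAE-Former code; the scaling choices are assumptions.

```python
# Generic efficient spatial attention and channel attention, for illustration only.
import torch
import torch.nn.functional as F

def efficient_spatial_attention(q, k, v):
    """q, k, v: (tokens, dim). Cost is linear in the number of tokens."""
    q = F.softmax(q, dim=-1)     # normalize over the feature dimension
    k = F.softmax(k, dim=0)      # normalize over tokens
    context = k.T @ v            # (dim, dim) global context summary
    return q @ context           # (tokens, dim)

def channel_attention(q, k, v):
    """Attention over channels: a (dim, dim) map relating feature channels."""
    attn = F.softmax(q.T @ k / q.size(0) ** 0.5, dim=-1)   # (dim, dim)
    return v @ attn.T                                      # (tokens, dim)

x = torch.randn(196, 64)
print(efficient_spatial_attention(x, x, x).shape, channel_attention(x, x, x).shape)
```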
Stock market prediction has been a traditional yet complex problem researched across diverse research areas and application domains due to its non-linear, highly volatile, and complex nature. Existing surveys on stock market prediction often focus on traditional machine learning methods rather than deep learning methods. Deep learning has come to dominate many domains and has gained much success and popularity in stock market prediction in recent years. This motivates us to provide a structured and comprehensive overview of research on stock market prediction with a focus on deep learning techniques. We present four elaborated subtasks of stock market prediction and propose a novel taxonomy to summarize state-of-the-art models based on deep neural networks from 2011 to 2022. In addition, we provide detailed statistics on the datasets and evaluation metrics commonly used in the stock market. Finally, we highlight some open issues and point out several future directions by sharing some new perspectives on stock market prediction.
Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences. In this paper, we use entropy to measure a model's uncertainty, i.e., how it chooses to distribute the probability mass over the set of allowed alignments. Furthermore, we evaluate the effect of entropy regularization in encouraging the model to distribute the probability mass over only a smaller subset of allowed alignments. Experiments show that entropy regularization enables a much simpler decoding method without sacrificing word error rate, and provides better time-alignment quality.
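A toy, brute-force illustration of the quantity in question, under an assumed alignment model with no blank symbol: enumerate every monotonic segmentation of the frames into the label sequence, score each segmentation under the per-frame posteriors, and take the entropy of the resulting alignment distribution; adding this term to the training loss would act as the regularizer. This is a conceptual sketch, not the paper's implementation, which would require an efficient dynamic-programming computation.

```python
# Brute-force entropy of the alignment distribution for a tiny example.
import itertools
import torch

def alignment_entropy(log_probs, labels):
    """log_probs: (frames, vocab) per-frame log posteriors; labels: list of label ids."""
    T, L = log_probs.size(0), len(labels)
    scores = []
    # every way to place the L labels on T frames while keeping their order
    for boundaries in itertools.combinations(range(1, T), L - 1):
        segs = zip((0,) + boundaries, boundaries + (T,))
        score = sum(log_probs[t, lab] for (s, e), lab in zip(segs, labels) for t in range(s, e))
        scores.append(score)
    log_p = torch.log_softmax(torch.stack(scores), dim=0)   # normalized distribution over alignments
    return -(log_p.exp() * log_p).sum()                      # entropy over allowed alignments

lp = torch.log_softmax(torch.randn(6, 5), dim=-1)
print(alignment_entropy(lp, labels=[2, 4, 1]).item())
```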
Climate change has increased the intensity, frequency, and duration of extreme weather events and natural disasters across the world. While the increased data on natural disasters improves the scope of machine learning (ML) in this field, progress is relatively slow. One bottleneck is the lack of benchmark datasets that would allow ML researchers to quantify their progress against a standard metric. The objective of this short paper is to explore the state of benchmark datasets for ML tasks related to natural disasters, categorizing them according to the disaster management cycle. We compile a list of existing benchmark datasets introduced in the past five years. We propose a web platform, NADBenchmarks, where researchers can search for benchmark datasets for natural disasters, and we develop a preliminary version of this platform using our compiled list. This paper is intended to aid researchers in finding benchmark datasets to train their ML models on, and to provide general directions for topics where they can contribute new benchmark datasets.
We present a new algorithm to learn a deep neural network model that is robust against adversarial attacks. Previous algorithms have demonstrated that an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize that the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly, we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a principled method for adversarially training BNNs. Our model demonstrates significantly improved robustness, up to 20%, compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both the CIFAR-10 and STL-10 datasets.
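To illustrate the stated intuition (predictions on benign and adversarial copies of an input should carry the same information), here is a hedged sketch that approximates such an objective with a symmetric KL penalty between the two predictive distributions. The model, the stand-in perturbation, and the weighting are placeholders; this is not the authors' exact formulation.

```python
# Hypothetical information-gain-style penalty: keep the predictive distributions
# on benign and adversarial inputs close to each other.
import torch
import torch.nn.functional as F

def information_gain_penalty(model, x_benign, x_adv):
    p = F.log_softmax(model(x_benign), dim=-1)
    q = F.log_softmax(model(x_adv), dim=-1)
    # symmetric KL: forces the two predictions to carry the same information
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))

model = torch.nn.Linear(32, 10)
x = torch.randn(4, 32)
x_adv = x + 0.035 * torch.randn_like(x).sign()   # stand-in perturbation, not a real PGD attack
loss = F.cross_entropy(model(x), torch.randint(0, 10, (4,))) + information_gain_penalty(model, x, x_adv)
print(loss.item())
```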