Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax." In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can make successful zero-shot generalizations when the differences between training and test commands are small, so that they can apply "mix-and-match" strategies to solve the task. However, when generalization requires systematic compositional skills (as in the "dax" example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, suggesting that lack of systematicity might be partially responsible for neural networks' notorious training data thirst.
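To make the kind of compositional mapping SCAN asks for concrete, here is a minimal sketch that interprets a tiny fragment of SCAN-like commands. The reduced grammar and function names are illustrative assumptions, not the dataset's full specification.

```python
# Minimal sketch of SCAN-style command interpretation (illustrative fragment only).
PRIMITIVES = {"jump": ["JUMP"], "walk": ["WALK"], "run": ["RUN"], "look": ["LOOK"]}

def interpret(command: str) -> list[str]:
    """Map a command such as 'jump twice' or 'walk and jump' to an action sequence."""
    if " and " in command:                        # conjunction: execute left part, then right part
        left, right = command.split(" and ", 1)
        return interpret(left) + interpret(right)
    if command.endswith(" twice"):                # repetition modifiers bind to the verb phrase
        return interpret(command[: -len(" twice")]) * 2
    if command.endswith(" thrice"):
        return interpret(command[: -len(" thrice")]) * 3
    return PRIMITIVES[command]                    # bare primitive verb

print(interpret("jump twice"))            # ['JUMP', 'JUMP']
print(interpret("walk and jump thrice"))  # ['WALK', 'JUMP', 'JUMP', 'JUMP']
```

The zero-shot test is whether a model that has only ever seen "jump" in isolation can nonetheless produce the right action sequences for composed commands like the two above.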
Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. In this work, we analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias (i.e., a source sequence already mapped to a target sequence is less likely to be mapped to other target sequences), and the tendency to memorize whole examples rather than separating structures from contents. We propose two techniques to address these two issues respectively: Mutual Exclusivity Training that prevents the model from producing seen generations when facing novel, unseen examples via an unlikelihood-based loss; and prim2primX data augmentation that automatically diversifies the arguments of every syntactic function to prevent memorizing and provide a compositional inductive bias without exposing test-set data. Combining these two techniques, we show substantial empirical improvements using standard sequence-to-sequence models (LSTMs and Transformers) on two widely-used compositionality datasets: SCAN and COGS. Finally, we provide analysis characterizing the improvements as well as the remaining challenges, and provide detailed ablations of our method. Our code is available at https://github.com/owenzx/met-primaug
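A hedged sketch of the unlikelihood-style idea described above: alongside the usual likelihood loss on observed pairs, the model is penalized for placing probability mass on already-seen target tokens when it is conditioned on novel inputs. The exact formulation in the paper may differ; the tensor names, masking scheme, and weighting below are assumptions.

```python
import torch
import torch.nn.functional as F

def met_style_loss(logits_seen, targets_seen, logits_novel, seen_target_mask, alpha=1.0):
    """Sketch of a mutual-exclusivity-style objective (not the paper's exact loss).

    logits_seen:      (B, T, V) decoder logits for training (seen) examples
    targets_seen:     (B, T)    gold target token ids for those examples
    logits_novel:     (B2, T, V) decoder logits when the model is fed novel/unseen inputs
    seen_target_mask: (V,)      1.0 for vocabulary items occurring in seen targets, else 0.0
    """
    # Standard likelihood term on seen examples.
    nll = F.cross_entropy(logits_seen.reshape(-1, logits_seen.size(-1)),
                          targets_seen.reshape(-1))

    # Unlikelihood term: discourage probability mass on seen outputs for novel inputs.
    probs_novel = logits_novel.softmax(dim=-1)              # (B2, T, V)
    p_seen = (probs_novel * seen_target_mask).sum(dim=-1)   # (B2, T)
    unlikelihood = -torch.log(torch.clamp(1.0 - p_seen, min=1e-6)).mean()

    return nll + alpha * unlikelihood
```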
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
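A minimal encoder-decoder sketch in the spirit of the architecture described above: encode the source into a fixed-dimensional state, then decode the target from it. This is a toy PyTorch reduction, not the paper's multilayer, reversed-input WMT setup; dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class Seq2SeqLSTM(nn.Module):
    """Toy sequence-to-sequence model: encode source into a fixed state, decode target from it."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The final encoder state is the fixed-dimensional summary of the source sentence.
        _, state = self.encoder(self.src_emb(src_ids))
        # The decoder is conditioned on that state and trained with teacher forcing.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)          # (batch, tgt_len, tgt_vocab) logits

model = Seq2SeqLSTM(src_vocab=10000, tgt_vocab=10000)
logits = model(torch.randint(0, 10000, (4, 12)), torch.randint(0, 10000, (4, 9)))
```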
Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks. This approach typically models the local distribution over the next word with a powerful neural network that can condition on arbitrary context. While flexible and performant, these models often require large datasets for training and can fail spectacularly on benchmarks designed to test for compositional generalization. This work explores an alternative, hierarchical approach to sequence-to-sequence learning with quasi-synchronous grammars, where each node in the target tree is transduced by a node in the source tree. Both the source and target trees are treated as latent and induced during training. We develop a neural parameterization of the grammar that enables parameter sharing over the combinatorial space of derivation rules without manual feature engineering. We apply this latent neural grammar to various domains -- a diagnostic language navigation task designed to test for compositional generalization (SCAN), style transfer, and small-scale machine translation -- and find that it performs respectably compared to standard baselines.
In this paper, we attempt to build a connection between the two schools by introducing syntactic inductive biases into deep learning models. We propose two families of inductive biases, one for constituency structure and another for dependency structure. The constituency inductive bias encourages deep learning models to use different units (or neurons) to separately process long-term and short-term information. This separation gives deep learning models a way to build latent hierarchical representations from sequential inputs: a higher-level representation is composed of lower-level representations and can be decomposed into a sequence of lower-level representations. For example, without knowing the ground-truth structure, our proposed model learns to process logical expressions by composing representations of variables and operators according to their syntactic structure. The dependency inductive bias, on the other hand, encourages models to find latent relations between entities in the input sequence. For natural language, these latent relations are usually modeled as a directed dependency graph in which a word has exactly one parent node and zero or more child nodes. After applying this constraint to a Transformer-like model, we find that the model is able to induce directed graphs that are close to human expert annotations, and it also outperforms the standard Transformer model on different tasks. We believe these experimental results point to an interesting alternative for the future development of deep learning models.
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference -sometimes prohibitively so in the case of very large data sets and large models. Several authors have also charged that NMT systems lack robustness, particularly when input sentences contain rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using residual connections as well as attention connections from the decoder network to the encoder. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. To directly optimize the translation BLEU scores, we consider refining the models by using reinforcement learning, but we found that the improvement in the BLEU scores did not reflect in the human evaluation. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
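The decoding heuristics mentioned above are usually written as a length penalty and a coverage penalty applied to the candidate's log-probability. The sketch below follows the commonly cited form of that scoring function; the constants (5, alpha, beta) are tunable, and the production system's exact implementation and edge-case handling may differ.

```python
import math

def gnmt_style_score(log_prob, attn, alpha=0.6, beta=0.2):
    """Length-normalized, coverage-penalized candidate score in the spirit of GNMT.

    log_prob: total log-probability of the candidate target sequence Y
    attn:     attn[i][j] = attention weight from target step i to source word j
    alpha, beta: tuning constants (illustrative defaults)
    """
    tgt_len = len(attn)
    src_len = len(attn[0])
    # Length penalty: divide by a function of target length so short outputs are not favored.
    lp = ((5.0 + tgt_len) ** alpha) / ((5.0 + 1.0) ** alpha)
    # Coverage penalty: reward candidates whose attention covers every source word.
    cp = beta * sum(
        math.log(min(max(sum(attn[i][j] for i in range(tgt_len)), 1e-9), 1.0))
        for j in range(src_len)
    )
    return log_prob / lp + cp
```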
Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
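The (soft-)search described above is typically realized as an additive attention layer: the decoder state queries all encoder states, and a softmax over the alignment scores yields a context vector. The sketch below is a generic rendering of that idea, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style (soft-)alignment: score each encoder state against the decoder state."""
    def __init__(self, enc_dim, dec_dim, attn_dim=128):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.w_enc(enc_states) +
                                   self.w_dec(dec_state).unsqueeze(1))).squeeze(-1)
        weights = scores.softmax(dim=-1)                    # (batch, src_len) soft alignment
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights                             # context feeds the next prediction
```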
The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training to better exploit the GPU hardware and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
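The gated linear units mentioned above split each convolution's output channels into a content half and a gate half; only the sigmoid-gated product is passed on, which keeps most of the gradient path linear. A minimal sketch, with illustrative channel sizes:

```python
import torch
import torch.nn as nn

class GLUConvBlock(nn.Module):
    """1-D convolution followed by a gated linear unit: out = A * sigmoid(B)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # The convolution produces 2x channels: half content (A), half gate (B).
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                       # x: (batch, channels, seq_len)
        a, b = self.conv(x).chunk(2, dim=1)     # split the channel dimension in half
        return a * torch.sigmoid(b)             # gated output, same shape as x

block = GLUConvBlock(channels=256)
y = block(torch.randn(4, 256, 20))              # (4, 256, 20)
```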
Inspired by humans' remarkable ability to master arithmetic and generalize to unseen problems, we present a new dataset, HINT, to study machines' capability of learning generalizable concepts at three levels: perception, syntax, and semantics. Learning agents must discover how concepts are perceived from raw signals such as images (i.e., perception), how multiple concepts are structurally combined to form valid expressions (i.e., syntax), and how concepts are realized to afford various reasoning tasks (i.e., semantics), all in a weakly supervised manner. Focusing on systematic generalization, we carefully design a five-fold test set to evaluate both the interpolation and the extrapolation of learned concepts w.r.t. these three levels. We further design a few-shot learning split to test whether models can rapidly learn new concepts and generalize them to more complex scenarios. To understand the limitations of existing models, we conduct extensive experiments with a variety of sequence-to-sequence models, including RNNs, Transformers, and GPT-3 (with chain-of-thought prompting). The results show that current models still struggle to extrapolate to long-range syntactic dependencies and semantics. When tested with new concepts in a few-shot setting, models show a significant gap from human-level generalization. Moreover, we find that it is infeasible to solve HINT by simply scaling up the dataset and the model size; this strategy contributes little to the extrapolation of syntax and semantics. Finally, in zero-shot GPT-3 experiments, chain-of-thought prompting shows impressive results and significantly boosts test accuracy. We believe the proposed dataset, together with the experimental findings, is of great interest to the community working on systematic generalization.
The ability to generalize compositionally is key to understanding the potentially infinite number of sentences that can be constructed in a human language from only a finite number of words. Investigating whether NLP models possess this ability has been a topic of interest: SCAN (Lake and Baroni, 2018) is one task designed specifically to test for this property. Previous work has achieved impressive empirical results using a group-equivariant neural network that naturally encodes a useful inductive bias for SCAN (Gordon et al., 2020). Inspired by this, we introduce a novel group-equivariant architecture that incorporates a group-invariant hard alignment mechanism. We find that our network's structure allows it to develop stronger equivariance properties than existing group-equivariant approaches. We also find that it empirically outperforms previous group-equivariant networks on the SCAN task. Our results suggest that integrating group-equivariance into a variety of neural architectures is a potentially fruitful avenue of research, and they demonstrate the value of careful analysis of the theoretical properties of such architectures.
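For reference, the property these architectures are built around can be stated compactly: a map is equivariant to a group G if transforming the input and then applying the map gives the same result as applying the map and then transforming the output, and invariant if the output does not change at all. In SCAN-style settings the group is typically a set of permutations over primitive verbs. In generic notation (a restatement, not the paper's formalism):

```latex
% f is G-equivariant if, for every group element g and input x,
%   f(g \cdot x) = g \cdot f(x),
% and G-invariant if the output is unchanged, f(g \cdot x) = f(x).
\forall g \in G,\ \forall x:\quad
f(g \cdot x) = g \cdot f(x) \quad \text{(equivariance)}, \qquad
f(g \cdot x) = f(x) \quad \text{(invariance)}
```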
The word alignment task, despite its prominence in the era of statistical machine translation (SMT), is niche and under-explored today. In this two-part tutorial, we argue for the continued relevance of word alignment. The first part provides a historical background to word alignment as a core component of the traditional SMT pipeline. We zero in on GIZA++, an unsupervised, statistical word aligner with surprising longevity. Jumping forward to the era of neural machine translation (NMT), we show how insights from word alignment inspired the attention mechanism fundamental to present-day NMT. The second part shifts to a survey approach. We cover neural word aligners, showing the slow but steady progress towards surpassing GIZA++ performance. Finally, we cover the present-day applications of word alignment, from cross-lingual annotation projection to improving translation.
Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? To tease apart these possibilities, we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural language models (an LSTM, a Transformer, Transformer-XL, and GPT-2). For local structure -- e.g., individual dependencies -- model-generated text is substantially less novel than our baseline of human-generated text from each model's test set. For larger-scale structure -- e.g., overall sentence structure -- model-generated text is as novel or even more novel than the human-generated baseline, but models still sometimes copy substantially, in some cases duplicating passages over 1,000 words long from the training set. We also perform extensive manual analysis showing that GPT-2's novel text is usually well-formed morphologically and syntactically but has reasonably frequent semantic issues (e.g., being self-contradictory).
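A rough sketch of the sequential-structure side of such an analysis: collect the n-grams of the training corpus and measure what fraction of a generated text's n-grams never occur there. The tokenization, n-gram order, and scoring below are illustrative assumptions, not RAVEN's actual protocol.

```python
def ngrams(tokens, n):
    """Set of n-grams (as tuples) occurring in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_rate(generated_tokens, training_tokens, n=4):
    """Fraction of generated n-grams that never occur in the training data."""
    train = ngrams(training_tokens, n)
    gen = [tuple(generated_tokens[i:i + n]) for i in range(len(generated_tokens) - n + 1)]
    if not gen:
        return 0.0
    return sum(g not in train for g in gen) / len(gen)

train = "the cat sat on the mat".split()
sample = "the cat sat on the hat".split()
print(novelty_rate(sample, train, n=3))   # 0.25: one of the four generated trigrams is new
```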
I train models for the neural machine translation task using the Hunglish2 corpus. The main contribution of this work is the evaluation of different data augmentation methods during the training of NMT models. I propose five augmentation methods that are structure-aware, meaning that instead of randomly selecting words for blanking or replacement, the dependency tree of the sentence is used as the basis for augmentation. I begin with a detailed literature review on neural networks, sequence modeling, neural machine translation, dependency parsing, and data augmentation. After a detailed exploratory data analysis and preprocessing of the Hunglish2 corpus, I run experiments with the proposed data augmentation techniques. The best model for Hungarian reaches a BLEU score of 33.9, while the best English-Hungarian model reaches a BLEU score of 28.6.
Compositionality has traditionally been understood as a major factor in the productivity of language and, more broadly, in human cognition. Recently, however, some research has begun to question its status, showing that artificial neural networks are good at generalizing even without obviously compositional behavior. We argue that some of these conclusions are too strong and/or incomplete. In the context of a two-agent communication game, we show that compositionality is indeed crucial for successful generalization when evaluated on a suitable dataset.
Deep learning models generalize well to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In sequence-to-sequence (seq2seq) learning, Transformers often fail to predict correct outputs for examples longer than those seen in training. This paper introduces iterative decoding, an alternative to seq2seq that (i) improves Transformer compositional generalization on the PCFG and Cartesian product datasets and (ii) provides evidence that, on these datasets, seq2seq Transformers do not learn iterations that are not unrolled. In iterative decoding, training examples are broken down into a sequence of intermediate steps that the Transformer learns iteratively. At inference time, the intermediate outputs are fed back into the Transformer until an end-of-iteration token is predicted. We conclude by illustrating some limitations of iterative decoding on the CFQ dataset.
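A hedged sketch of the inference loop just described: the model produces an intermediate output, which is fed back in as the next input until a designated end-of-iteration token appears. The function and token names are assumptions, and the paper's concrete setup may differ; the toy "model" here is only to make the loop runnable.

```python
END_OF_ITERATION = "<done>"
MAX_ITERS = 20

def iterative_decode(model_step, source_tokens):
    """Repeatedly apply a seq2seq step to its own output until it signals completion.

    model_step: callable mapping a token sequence to the next intermediate sequence
                (a trained Transformer in the paper; any callable for this sketch).
    """
    current = list(source_tokens)
    for _ in range(MAX_ITERS):                     # cap iterations to avoid non-termination
        nxt = model_step(current)
        if nxt and nxt[-1] == END_OF_ITERATION:    # model predicts the end-of-iteration token
            return nxt[:-1]                        # final output, with the marker stripped
        current = nxt
    return current                                 # fallback if the marker never appears

# Toy usage: a "model" whose intermediate step sorts one adjacent pair (illustrative only).
def bubble_step(seq):
    out = list(seq)
    for i in range(len(out) - 1):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
            return out
    return out + [END_OF_ITERATION]

print(iterative_decode(bubble_step, [3, 1, 2]))    # [1, 2, 3]
```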
The rapid advancement of AI technology has made text generation tools like GPT-3 and ChatGPT increasingly accessible, scalable, and effective. This can pose a serious threat to the credibility of various forms of media, including scientific literature and news sources, if these technologies are used for plagiarism. Despite the development of automated methods for paraphrase identification, detecting this type of plagiarism remains a challenge due to the disparate nature of the datasets on which these methods are trained. In this study, we review traditional and current approaches to paraphrase identification and propose a refined typology of paraphrases. We also investigate how this typology is represented in popular datasets and how under-representation of certain types of paraphrases impacts detection capabilities. Finally, we outline new directions for future research and datasets in the pursuit of more effective paraphrase detection using AI.
Can recurrent neural networks, inspired by human sequential data processing, learn to understand language? We construct simplified datasets that reflect core properties of natural language as modeled in formal syntax and semantics: recursive syntactic structure and compositionality. We find that LSTM and GRU networks generalize to compositional interpretation well, but only in the most favorable learning settings, with a suitable curriculum, extensive training data, and left-to-right (but not right-to-left) composition.
We propose a synthetic task, LEGO (Learning Equality and Group Operations), which encapsulates the problem of following a chain of reasoning, and we study how the Transformer architecture learns this task. We pay special attention to data effects such as pretraining (on a seemingly unrelated NLP task) and dataset composition (e.g., differing chain lengths at training and test time), as well as architectural variants such as weight-tied layers or adding convolutional components. We study how the trained models eventually succeed at the task, and in particular we manage to understand, to some extent, some of the attention heads and how information flows through the network. Based on these observations, we propose the hypothesis that pretraining helps here merely as a smart initialization rather than through deep knowledge stored in the network. We also observe that, in some data regimes, the trained Transformer finds "shortcut" solutions to following the chain of reasoning, which impede the model's ability to generalize to simple variants of the main task, and moreover we find that such shortcuts can be prevented by appropriate architectural modifications or careful data preparation. Motivated by our findings, we begin to explore the task of learning to execute C programs, where a convolutional modification to the Transformer, namely adding convolutional structure to the key/query/value maps, shows an encouraging advantage.
Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of the neural machine translation using two models; RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network. We show that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns a grammatical structure of a sentence automatically.