An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker. 1
translated by 谷歌翻译
The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. 1 Compared to recurrent models, computations over all elements can be fully parallelized during training to better exploit the GPU hardware and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
translated by 谷歌翻译
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference -sometimes prohibitively so in the case of very large data sets and large models. Several authors have also charged that NMT systems lack robustness, particularly when input sentences contain rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using residual connections as well as attention connections from the decoder network to the encoder. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. To directly optimize the translation BLEU scores, we consider refining the models by using reinforcement learning, but we found that the improvement in the BLEU scores did not reflect in the human evaluation. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
translated by 谷歌翻译
The word alignment task, despite its prominence in the era of statistical machine translation (SMT), is niche and under-explored today. In this two-part tutorial, we argue for the continued relevance for word alignment. The first part provides a historical background to word alignment as a core component of the traditional SMT pipeline. We zero-in on GIZA++, an unsupervised, statistical word aligner with surprising longevity. Jumping forward to the era of neural machine translation (NMT), we show how insights from word alignment inspired the attention mechanism fundamental to present-day NMT. The second part shifts to a survey approach. We cover neural word aligners, showing the slow but steady progress towards surpassing GIZA++ performance. Finally, we cover the present-day applications of word alignment, from cross-lingual annotation projection, to improving translation.
translated by 谷歌翻译
Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
translated by 谷歌翻译
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
translated by 谷歌翻译
Neural Machine Translation (NMT) has obtained state-of-the art performance for several language pairs, while only using parallel data for training. Targetside monolingual data plays an important role in boosting fluency for phrasebased statistical machine translation, and we investigate the use of monolingual data for NMT. In contrast to previous work, which combines NMT models with separately trained language models, we note that encoder-decoder NMT architectures already have the capacity to learn the same information as a language model, and we explore strategies to train with monolingual data without changing the neural network architecture. By pairing monolingual training data with an automatic backtranslation, we can treat it as additional parallel training data, and we obtain substantial improvements on the WMT 15 task English↔German (+2.8-3.7 BLEU), and for the low-resourced IWSLT 14 task Turkish→English (+2.1-3.4 BLEU), obtaining new state-of-the-art results. We also show that fine-tuning on in-domain monolingual and parallel data gives substantial improvements for the IWSLT 15 task English→German.
translated by 谷歌翻译
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixedlength vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
translated by 谷歌翻译
Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character ngram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by up to 1.1 and 1.3 BLEU, respectively.
translated by 谷歌翻译
Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of the neural machine translation using two models; RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network. We show that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns a grammatical structure of a sentence automatically.
translated by 谷歌翻译
我用Hunglish2语料库训练神经电脑翻译任务的模型。这项工作的主要贡献在培训NMT模型期间评估不同的数据增强方法。我提出了5种不同的增强方法,这些方法是结构感知的,这意味着而不是随机选择用于消隐或替换的单词,句子的依赖树用作增强的基础。我首先关于神经网络的详细文献综述,顺序建模,神经机翻译,依赖解析和数据增强。经过详细的探索性数据分析和Hunglish2语料库的预处理之后,我使用所提出的数据增强技术进行实验。匈牙利语的最佳型号达到了33.9的BLEU得分,而英国匈牙利最好的模型达到了28.6的BLEU得分。
translated by 谷歌翻译
机器翻译历史上的重要突破之一是变压器模型的发展。不仅对于各种翻译任务,而且对于大多数其他NLP任务都是革命性的。在本文中,我们针对一个基于变压器的系统,该系统能够将德语用源句子转换为其英语的对应目标句子。我们对WMT'13数据集的新闻评论德语 - 英语并行句子进行实验。此外,我们研究了来自IWSLT'16数据集的培训中包含其他通用域数据以改善变压器模型性能的效果。我们发现,在培训中包括IWSLT'16数据集,有助于在WMT'13数据集的测试集中获得2个BLEU得分点。引入定性分析以分析通用域数据的使用如何有助于提高产生的翻译句子的质量。
translated by 谷歌翻译
神经机翻译(NMT)模型足以从源语言传达到目标语言的语义和句法信息。然而,这些模型遭受需要大量数据来学习参数。因此,对于具有稀缺数据的语言,这些模型面临效率低廉的风险。我们建议增加基于神经网络的关注,并重新排序信息来缓解缺乏数据。这一增强可以通过基线模型,通过最多6%的Bleu绝对提高英语到波斯语和波斯语的翻译质量。
translated by 谷歌翻译
The machine translation mechanism translates texts automatically between different natural languages, and Neural Machine Translation (NMT) has gained attention for its rational context analysis and fluent translation accuracy. However, processing low-resource languages that lack relevant training attributes like supervised data is a current challenge for Natural Language Processing (NLP). We incorporated a technique known Active Learning with the NMT toolkit Joey NMT to reach sufficient accuracy and robust predictions of low-resource language translation. With active learning, a semi-supervised machine learning strategy, the training algorithm determines which unlabeled data would be the most beneficial for obtaining labels using selected query techniques. We implemented two model-driven acquisition functions for selecting the samples to be validated. This work uses transformer-based NMT systems; baseline model (BM), fully trained model (FTM) , active learning least confidence based model (ALLCM), and active learning margin sampling based model (ALMSM) when translating English to Hindi. The Bilingual Evaluation Understudy (BLEU) metric has been used to evaluate system results. The BLEU scores of BM, FTM, ALLCM and ALMSM systems are 16.26, 22.56 , 24.54, and 24.20, respectively. The findings in this paper demonstrate that active learning techniques helps the model to converge early and improve the overall quality of the translation system.
translated by 谷歌翻译
随着自然语言处理领域的最新发展,在使用不同架构的神经机翻译中的使用情况上升了。变压器架构用于实现最先进的准确性,但它们是训练的非常昂贵的昂贵。每个人都不能拥有由高端GPU和其他资源组成的等待。我们在低计算资源上培训我们的模型,并调查结果。正如预期的那样,变形金刚表现出其他架构,但结果有一些令人惊讶的结果。由更多编码器和解码器组成的变形金刚需要花更多的时间来训练,但有更少的BLEU分数。LSTM在实验中表现良好,比较少花时间训练而不是变压器,适合在具有时间限制的情况下使用。
translated by 谷歌翻译
最近,作品数量表明,通过使用视觉信息,可以在一定程度上改进神经机器翻译(NMT)的性能。但是,这些结论中的大多数是根据基于有限的双语句子图像对的实验结果的分析得出的,例如Multi30k。在这类数据集中,必须通过手动注释的图像很好地表示一个双语平行句子对的内容,这与实际翻译情况不同。提出了一些先前的作品,以通过从退出的句子图像对中检索图像与主题模型来解决问题。但是,由于他们使用的句子图像对收集有限,因此很难处理其图像检索方法,并且很难证明视觉信息增强了NMT,而不是图像和图像的共发生句子。在本文中,我们提出了一种开放式摄影图像检索方法,以使用图像搜索引擎收集双语平行语料库的描述性图像。接下来,我们提出文本感知的专注视觉编码器,以过滤错误收集的噪声图像。多30K和其他两个翻译数据集的实验结果表明,我们提出的方法对强基础可取得重大改进。
translated by 谷歌翻译
We describe an open-source toolkit for neural machine translation (NMT). The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques.
translated by 谷歌翻译
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.
translated by 谷歌翻译
We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes an encoder, decoder and attention module, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT'14 and WMT'15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hints at a universal interlingua representation in our models and show some interesting examples when mixing languages.
translated by 谷歌翻译
Directly training a document-to-document (Doc2Doc) neural machine translation (NMT) via Transformer from scratch, especially on small datasets usually fails to converge. Our dedicated probing tasks show that 1) both the absolute position and relative position information gets gradually weakened or even vanished once it reaches the upper encoder layers, and 2) the vanishing of absolute position information in encoder output causes the training failure of Doc2Doc NMT. To alleviate this problem, we propose a position-aware Transformer (P-Transformer) to enhance both the absolute and relative position information in both self-attention and cross-attention. Specifically, we integrate absolute positional information, i.e., position embeddings, into the query-key pairs both in self-attention and cross-attention through a simple yet effective addition operation. Moreover, we also integrate relative position encoding in self-attention. The proposed P-Transformer utilizes sinusoidal position encoding and does not require any task-specified position embedding, segment embedding, or attention mechanism. Through the above methods, we build a Doc2Doc NMT model with P-Transformer, which ingests the source document and completely generates the target document in a sequence-to-sequence (seq2seq) way. In addition, P-Transformer can be applied to seq2seq-based document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent) translation. Extensive experimental results of Doc2Doc NMT show that P-Transformer significantly outperforms strong baselines on widely-used 9 document-level datasets in 7 language pairs, covering small-, middle-, and large-scales, and achieves a new state-of-the-art. Experimentation on discourse phenomena shows that our Doc2Doc NMT models improve the translation quality in both BLEU and discourse coherence. We make our code available on Github.
translated by 谷歌翻译