$k$NN-based neural machine translation ($k$NN-MT) has achieved state-of-the-art results on MT tasks. A significant shortcoming of $k$NN-MT lies in its inefficiency in identifying the $k$ nearest neighbors of the query representation from the entire datastore, which is prohibitively slow when the datastore is large. In this work, we propose \textbf{Faster $k$NN-MT} to address this issue. The core idea of Faster $k$NN-MT is to use a hierarchical clustering strategy to approximate the distance between the query and a data point in the datastore, which is decomposed into two parts: the distance between the query and the center of the cluster the data point belongs to, and the distance between the data point and its cluster center. We propose practical methods to compute these two parts in a significantly faster way. Through extensive experiments on different MT benchmarks, we show that \textbf{Faster $k$NN-MT} is faster than Fast $k$NN-MT \citep{Meng2021Fast} and only slightly (1.2 times) slower than its vanilla counterpart, while preserving the model performance of $k$NN-MT. Faster $k$NN-MT enables the deployment of $k$NN-MT models on real-world MT services.
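Below is a minimal sketch of the distance decomposition described above, assuming a squared-Euclidean metric: the query-to-datapoint distance is approximated by the query-to-centroid distance (computed once per cluster) plus the precomputed datapoint-to-centroid distance, with the cross term dropped. The clustering step, array names, and sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy datastore of key vectors plus an offline clustering of the keys.
D, N, C = 64, 10_000, 100                      # dim, datastore size, #clusters
keys = rng.normal(size=(N, D)).astype(np.float32)

# Offline: pick initial centroids, assign keys, refine centroids once
# (purely illustrative; any k-means / hierarchical clustering would do).
centroids = keys[rng.choice(N, C, replace=False)].copy()
d2 = (keys ** 2).sum(-1, keepdims=True) - 2 * keys @ centroids.T + (centroids ** 2).sum(-1)
assign = d2.argmin(axis=1)
for c in range(C):
    centroids[c] = keys[assign == c].mean(axis=0)
# Cache the second part of the decomposition: ||x - c(x)||^2, one scalar per key.
key_to_centroid_sq = ((keys - centroids[assign]) ** 2).sum(-1)

def approx_knn(query, k=8):
    """Approximate neighbors via  ||q - x||^2 ~ ||q - c(x)||^2 + ||x - c(x)||^2.
    Per query only C centroid distances are computed instead of N key distances;
    the cross term -2 (q - c) . (x - c) is dropped, so the result is approximate."""
    q_to_centroid_sq = ((query[None, :] - centroids) ** 2).sum(-1)   # (C,)
    approx_dist = q_to_centroid_sq[assign] + key_to_centroid_sq      # (N,)
    return np.argsort(approx_dist)[:k]

query = rng.normal(size=D).astype(np.float32)
print("approximate nearest neighbors:", approx_knn(query))
```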
Semi-parametric models, which augment generation with retrieval, have led to impressive results in language modeling and machine translation, due to their ability to retrieve fine-grained information from a datastore of examples. One of the most prominent approaches, $k$NN-MT, exhibits strong domain adaptation capabilities by retrieving tokens from domain-specific datastores \citep{khandelwal2020nearest}. However, $k$NN-MT requires an expensive retrieval operation for every single generated token, leading to a very low decoding speed (around 8 times slower than a parametric model). In this paper, we introduce a \textit{chunk-based} $k$NN-MT model which retrieves chunks of tokens from the datastore, instead of a single token. We propose several strategies for incorporating the retrieved chunks into the generation process, and for selecting the steps at which the model needs to search for neighbors in the datastore. Experiments on machine translation in two settings, static and ``on-the-fly'' domain adaptation, show that the chunk-based $k$NN-MT model leads to significant speed-ups (up to 4 times) with only a small drop in translation quality.
In this work, motivated by the notion that {\it to copy is easier than to memorize}, we introduce GNN-LM, which extends a vanilla neural language model (LM) by allowing it to reference similar contexts in the entire training corpus. We build a directed heterogeneous graph between the input context and semantically related neighbors selected from the training corpus, where nodes are tokens in the input context and in the retrieved neighbor contexts, and edges represent connections between nodes. Graph neural networks (GNNs) are constructed on top of the graph to aggregate information from similar contexts for decoding the next token. This learning paradigm provides direct access to reference contexts and helps improve the generalization ability of the model. We conduct comprehensive experiments to validate the effectiveness of GNN-LM: GNN-LM achieves a new state-of-the-art perplexity of 14.8 on WikiText-103 (a 4.5-point improvement over its vanilla LM counterpart) and shows substantial improvements over strong baselines on the One Billion Word and Enwik8 datasets. In-depth ablation studies are conducted to understand the mechanisms of GNN-LM. Code can be found at \url{https://github.com/shannonai/gnn-lm}.
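A heavily simplified sketch of the aggregation idea: tokens of the input context and tokens from retrieved neighbor contexts form a small graph, and one message-passing step mixes each context token's state with those of its connected neighbors. The real model uses a heterogeneous graph with attention-based GNN layers; the mean aggregation, weights, and shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16

# Hidden states of the current input context (queries) and of token positions
# inside retrieved neighbor contexts (random features stand in for real ones).
context_nodes = rng.normal(size=(5, D))       # 5 tokens in the input context
neighbor_nodes = rng.normal(size=(12, D))     # 12 tokens from retrieved contexts

# Directed edges neighbor -> context: connect each context token to its
# 3 most similar neighbor tokens by dot product.
sim = context_nodes @ neighbor_nodes.T                     # (5, 12)
topk = np.argsort(-sim, axis=1)[:, :3]                     # (5, 3)

W_self = rng.normal(size=(D, D)) * 0.1
W_neigh = rng.normal(size=(D, D)) * 0.1

def gnn_layer(ctx, neigh, edges):
    """One mean-aggregation message-passing step: every context node mixes its
    own state with the average state of the neighbor tokens it points to."""
    agg = neigh[edges].mean(axis=1)                        # (5, D)
    return np.tanh(ctx @ W_self + agg @ W_neigh)

augmented = gnn_layer(context_nodes, neighbor_nodes, topk)
# augmented[-1] would then replace the vanilla LM hidden state of the last
# position when computing next-token logits.
print(augmented.shape)
```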
We study the problem of online learning from human feedback in human-in-the-loop machine translation, where human translators revise machine-generated translations and the corrected translations are then used to improve the neural machine translation (NMT) system. However, previous methods require online model updating or additional translation memory networks to achieve high-quality performance, making them inflexible and inefficient in practice. In this paper, we propose a novel non-parametric online learning method that does not change the model structure. The approach introduces two k-nearest-neighbor (KNN) modules: one module memorizes the human feedback, i.e., the correct sentences provided by human translators, while the other module adaptively balances the use of historical human feedback and the original NMT model. Experiments on the EMEA and JRC-Acquis benchmarks show that our proposed method achieves substantial improvements in translation accuracy and better adaptation performance with fewer human correction operations.
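A minimal sketch of the non-parametric idea, under simplifying assumptions: one module appends human-corrected (hidden state, token) pairs to a growing datastore, and the final distribution interpolates the NMT prediction with a retrieval distribution. The distance-based mixing weight below is a crude stand-in for the paper's second, adaptive balancing module.

```python
import numpy as np

class FeedbackKNN:
    """Growing datastore of (decoder hidden state, corrected target token) pairs."""
    def __init__(self, dim, vocab_size, temperature=10.0):
        self.keys = np.empty((0, dim), dtype=np.float32)
        self.values = np.empty((0,), dtype=np.int64)
        self.vocab_size = vocab_size
        self.temperature = temperature

    def add_feedback(self, hidden_states, corrected_tokens):
        # Called after a human translator post-edits a sentence.
        self.keys = np.vstack([self.keys, hidden_states])
        self.values = np.concatenate([self.values, corrected_tokens])

    def retrieval_distribution(self, query, k=4):
        if len(self.values) == 0:
            return np.zeros(self.vocab_size), np.inf
        dist = ((self.keys - query) ** 2).sum(-1)
        idx = np.argsort(dist)[:k]
        weights = np.exp(-dist[idx] / self.temperature)
        p = np.zeros(self.vocab_size)
        np.add.at(p, self.values[idx], weights)
        return p / p.sum(), dist[idx].min()

def combined_distribution(p_nmt, query, memory, scale=1.0):
    """Second module (simplified): balance NMT and feedback memory adaptively.
    The closer the nearest memorized context, the more weight retrieval gets."""
    p_knn, nearest = memory.retrieval_distribution(query)
    lam = 0.0 if np.isinf(nearest) else np.exp(-nearest * scale)
    return lam * p_knn + (1.0 - lam) * p_nmt

# Toy usage
rng = np.random.default_rng(0)
mem = FeedbackKNN(dim=8, vocab_size=100)
mem.add_feedback(rng.normal(size=(3, 8)).astype(np.float32), np.array([5, 17, 42]))
p_nmt = np.full(100, 1.0 / 100)
print(combined_distribution(p_nmt, mem.keys[0], mem).argmax())
```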
k-nearest-neighbor neural machine translation (kNN-MT) successfully incorporates external corpora by retrieving word-level representations at test time. Typically, kNN-MT borrows an off-the-shelf context representation from the translation task, e.g., the output of the last decoder layer, as the query vector for the retrieval task. In this work, we highlight that coupling the representations of these two tasks is sub-optimal for fine-grained retrieval. To alleviate this, we leverage supervised contrastive learning to learn a distinctive retrieval representation derived from the original context representation. We also propose a fast and effective approach to constructing hard negative samples. Experimental results on five domains show that our approach improves both retrieval accuracy and BLEU scores compared with vanilla kNN-MT.
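A minimal sketch of learning a separate retrieval representation with an InfoNCE-style supervised contrastive loss on top of frozen context vectors: contexts that produced the same target token act as positives, and similar contexts with different targets act as hard negatives. The projection head, temperature, and the way positives and negatives are picked here are illustrative, not the authors' recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
D, P = 32, 16        # context dim, retrieval-representation dim
W = rng.normal(size=(D, P)) * 0.1   # projection head on top of frozen contexts

def project(h):
    z = h @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def supervised_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE: pull the anchor towards a context that produced the same target
    token (positive) and push it away from hard negatives (similar contexts
    that produced different target tokens)."""
    a = project(anchor)                                      # (P,)
    cand = project(np.vstack([positive[None], negatives]))   # (1+M, P)
    logits = cand @ a / tau
    log_prob = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob

anchor = rng.normal(size=D)
positive = anchor + 0.1 * rng.normal(size=D)        # same-target context
hard_negatives = rng.normal(size=(8, D))            # similar contexts, other targets
print("loss:", supervised_contrastive_loss(anchor, positive, hard_negatives))
```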
Retrieval-based language models (R-LMs) define the probability of natural language text by combining a standard language model (LM) with examples retrieved from an external datastore at test time. While effective, a major bottleneck of using these models in practice is the computationally costly datastore search, which can be performed as frequently as every time step. In this paper, we present RetoMaton (retrieval automaton), which is based on (1) saving pointers between consecutive datastore entries and (2) clustering entries into "states". This effectively results in a weighted finite automaton built on top of the datastore, rather than representing the datastore as a flat list. The creation of the automaton is unsupervised, and a RetoMaton can be constructed from any text collection: either the original training corpus or another domain. Traversing this automaton at inference time, in parallel with the LM inference, reduces the perplexity by up to 1.85, or alternatively saves up to 83% of the nearest-neighbor searches over $k$NN-LM (Khandelwal et al., 2020) without hurting perplexity. Our code and trained models are available at https://github.com/neulab/retomaton.
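A minimal sketch of the pointer part of the idea (the clustering of entries into states is omitted): each datastore entry keeps a pointer to the entry created at the next corpus position, and at decoding time the neighbor set is extended by following pointers from previously matched entries whose value agrees with the emitted token, falling back to a full search only when too few pointers survive. All names and thresholds below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, V = 16, 2000, 50

# Datastore built from a running corpus: entry i has a key (context vector),
# a value (next token), and a pointer to the entry created at the next step.
keys = rng.normal(size=(N, D)).astype(np.float32)
values = rng.integers(0, V, size=N)
pointer = np.arange(1, N + 1)
pointer[-1] = -1                                      # last entry points nowhere

def full_search(query, k=8):
    dist = ((keys - query) ** 2).sum(-1)
    return np.argsort(dist)[:k]

def next_candidates(prev_entries, generated_token, query, k=8, min_alive=2):
    """Instead of searching the whole datastore, follow pointers from previously
    matched entries whose value agrees with the token just generated; fall back
    to a full kNN search only when too few pointers survive."""
    alive = [pointer[i] for i in prev_entries
             if values[i] == generated_token and pointer[i] != -1]
    if len(alive) < min_alive:
        return full_search(query, k), True            # had to pay for a search
    return np.array(alive), False                     # saved a datastore search

query = rng.normal(size=D).astype(np.float32)
entries = full_search(query)
tok = values[entries[0]]                              # pretend the LM emitted this
entries, searched = next_candidates(entries, tok, rng.normal(size=D).astype(np.float32))
print("fell back to full search:", searched)
```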
Non-parametric neural language models (NLMs) learn predictive distributions over text by utilizing an external datastore, which allows them to learn by explicitly memorizing training data points. While effective, these models often require retrieval from a large datastore at test time, significantly increasing the inference overhead and thus limiting the deployment of non-parametric NLMs in practical applications. In this paper, we take the recently proposed $k$-nearest-neighbor language model (Khandelwal et al., 2020) as an example and explore methods to improve its efficiency along various dimensions. Experiments on the standard WikiText-103 benchmark and domain-adaptation datasets show that our methods are able to achieve up to a 6x speed-up in inference while retaining comparable performance. The empirical analysis we present may provide guidelines for future research seeking to develop or deploy more efficient non-parametric NLMs.
kNN-MT presents a new paradigm for domain adaptation by building an external datastore, which usually saves all target language token occurrences in the parallel corpus. As a result, the constructed datastore is usually large and possibly redundant. In this paper, we investigate the interpretability issue of this approach: what knowledge does the NMT model need? We propose the notion of local correctness (LAC) as a new angle, which describes the potential translation correctness for a single entry and for a given neighborhood. Empirical study shows that our investigation successfully finds the conditions where the NMT model could easily fail and need related knowledge. Experiments on six diverse target domains and two language-pairs show that pruning according to local correctness brings a light and more explainable memory for kNN-MT domain adaptation.
How can we effectively adapt neural machine translation (NMT) models to emerging cases without retraining? Despite the great success of neural machine translation, updating deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising, but are prone to overfitting the retrieved examples. In this work, we propose KSTER (Kernel-Smoothed Translation with Example Retrieval), an effective approach to adapt neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that, even without expensive retraining, KSTER is able to achieve improvements of 1.1 to 1.5 BLEU over the best existing online adaptation methods. Code and trained models are released at https://github.com/jiangqn/kster.
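A minimal sketch of kernel-smoothed example retrieval: distances to the retrieved examples are turned into a soft distribution over their target tokens with a Gaussian kernel and mixed with the base NMT distribution. In KSTER the bandwidth and mixing weight are produced by small learned networks; here they are fixed constants for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K = 100, 32, 8

def kernel_smoothed_distribution(query, keys, values, p_nmt,
                                 bandwidth=10.0, mix=0.5):
    """Mix the NMT distribution with a Gaussian-kernel-smoothed distribution
    over the target tokens of the K retrieved examples.
    (In KSTER, bandwidth and mix are produced by small learned networks.)"""
    dist = ((keys - query) ** 2).sum(-1)
    idx = np.argsort(dist)[:K]
    weights = np.exp(-dist[idx] / bandwidth)
    p_knn = np.zeros(V)
    np.add.at(p_knn, values[idx], weights)
    p_knn /= p_knn.sum()
    return mix * p_knn + (1.0 - mix) * p_nmt

keys = rng.normal(size=(1000, D)).astype(np.float32)     # example datastore
values = rng.integers(0, V, size=1000)                   # their target tokens
p_nmt = rng.dirichlet(np.ones(V))                        # base model distribution
query = rng.normal(size=D).astype(np.float32)
p = kernel_smoothed_distribution(query, keys, values, p_nmt)
print("next token:", p.argmax(), "sum:", round(p.sum(), 6))
```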
Recent work has improved language models (LMs) remarkably by equipping them with a non-parametric memory component. However, most existing approaches only introduce memories at testing time or represent them using a separately trained encoder, resulting in suboptimal training of the language model. In this work, we present TRIME, a novel yet simple training approach designed for training LMs with memory augmentation. Our approach uses a training objective that directly takes in-batch examples as accessible memory. We also present new methods for memory construction and data batching, which are used for adapting to different sets of memories--local, long-term, and external memory--at testing time. We evaluate TRIME on multiple language modeling and machine translation benchmarks and show that it is able to achieve significant improvements across all the settings. Concretely, TRIME reduces the perplexity from 18.70 to 15.37 on WIKITEXT-103, by effectively leveraging a large memory set from the training corpus. Compared to standard LM training, TRIME adds negligible computational overhead and is compatible with different neural architectures, making it a versatile solution for training memory-augmented LMs.
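A simplified, un-batched sketch of such a memory-augmented objective: the score of a token is the usual output-embedding score plus the scores of in-batch memory positions whose gold token matches it, and the loss is the negative log of the resulting normalized probability. The similarity function and scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, B = 50, 16, 8          # vocab, hidden dim, in-batch positions

E = rng.normal(size=(V, D)) * 0.1          # output token embeddings
hidden = rng.normal(size=(B, D))           # hidden states of in-batch positions
targets = rng.integers(0, V, size=B)       # their gold next tokens

def memory_augmented_loss(q, gold, mem_keys, mem_labels):
    """Memory-augmented objective: the unnormalized score of token y is the
    output-embedding score plus the scores of in-batch memory slots whose gold
    token is y (a simplified, un-batched form of the training objective)."""
    scores = np.exp(q @ E.T)                                   # (V,) token scores
    mem_scores = np.exp(q @ mem_keys.T)                        # (M,) memory scores
    np.add.at(scores, mem_labels, mem_scores)                  # add memory mass
    return -np.log(scores[gold] / scores.sum())

# Position 0 uses the other in-batch positions as accessible memory.
loss = memory_augmented_loss(hidden[0], targets[0], hidden[1:], targets[1:])
print("loss:", loss)
```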
End-to-end speech translation (E2E-ST) has attracted increasing attention due to its potential for lower error propagation, lower latency, and fewer parameters. However, the effectiveness of neural approaches to this task is severely limited by the available training corpora, especially for domain adaptation where in-domain triplet training data is scarce or nonexistent. In this paper, we propose a novel non-parametric method that leverages a domain-specific text-translation corpus to achieve domain adaptation of E2E-ST systems. To this end, we first incorporate an additional encoder into the pre-trained E2E-ST model to realize text-translation modeling, and then unify the decoder's output representations for the text and speech translation tasks by reducing the representation mismatch on the available triplet training data. During domain adaptation, a k-nearest-neighbor (KNN) classifier is introduced to produce the final translation distribution using an external datastore constructed from the domain-specific text-translation corpus, while the universal output representation is adopted to perform the similarity search. Experiments on the Europarl-ST benchmark demonstrate that, when only in-domain text-translation data is involved, our proposed approach significantly improves the baseline on average across all translation directions, even outperforming the strong in-domain fine-tuning approach.
Nearest Neighbor Machine Translation (kNNMT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism. The effectiveness of kNNMT directly depends on the quality of retrieved neighbors. However, original kNNMT builds datastores based on representations from NMT models, which would result in poor retrieval accuracy when NMT models are not good enough, leading to sub-optimal translation performance. In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT. Better representations from pre-trained models allow us to build datastores of better quality. We also design a novel contrastive alignment objective to mitigate the representation gap between the NMT model and pre-trained models, enabling the NMT model to retrieve from better datastores. We conduct extensive experiments on both bilingual and multilingual translation benchmarks, including WMT17 English $\leftrightarrow$ Chinese, WMT14 English $\leftrightarrow$ German, IWSLT14 German $\leftrightarrow$ English, and IWSLT14 multilingual datasets. Empirical results demonstrate the effectiveness of PRED.
Non-autoregressive (NAR) models can generate sentences with less computation than autoregressive models, but at the cost of generation quality. Previous studies addressed this issue through iterative decoding. This study proposes using nearest neighbors as the initial state of a NAR decoder and editing them iteratively. We present a novel training strategy to learn the edit operations on neighbors so as to improve NAR text generation. Experimental results show that the proposed method achieves higher translation quality (1.69 points higher than the vanilla Transformer) with fewer decoding iterations (fewer than one tenth) on the JRC-Acquis En-De dataset, thanks to the proposed neighbor-based translation. We also confirm the effectiveness of the proposed method on a data-to-text task (WikiBio). In addition, the proposed method outperforms the NAR baseline on the WMT'14 En-De dataset. We also report an analysis of the neighbor examples used in the proposed method.
Multilingual neural machine translation with Transformer models has shown great success. Deploying these models is challenging, as they usually require a large vocabulary size to cover various languages, which limits the speed of predicting output tokens in the final vocabulary projection layer. To alleviate these challenges, this paper proposes a fast vocabulary projection method via clustering, which can be used for multilingual Transformers on GPUs. First, we offline split the vocabulary search space into disjoint clusters given the hidden context vector of the decoder output, which results in a much smaller vocabulary column for the vocabulary projection. Second, at inference time, the proposed method predicts the cluster and the candidate tokens for the vocabulary projection given the hidden context vector. This paper also includes an analysis of different ways of building these clusters in a multilingual setting. Our results show an end-to-end speed-up of up to 25% in float16 GPU inference while maintaining the BLEU score, with only a slight increase in memory cost. The proposed method speeds up the vocabulary projection step itself by up to 2.6x. We also conduct an extensive human evaluation to verify that the proposed method preserves the translation quality of the original model.
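One way to realize this kind of clustered projection, sketched under assumptions (the paper analyzes several clustering choices): partition the output-embedding rows into disjoint clusters offline, then at inference score only the tokens belonging to the clusters whose centroids best match the decoder hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, C = 8_000, 128, 64          # vocab, hidden dim, number of clusters

W_out = rng.normal(size=(V, D)).astype(np.float32) * 0.02   # projection weights

# Offline: partition the vocabulary into C disjoint clusters (one k-means-like
# assignment step here; the paper studies several ways to build the clusters).
centroids = W_out[rng.choice(V, C, replace=False)].copy()
assign = (W_out @ centroids.T).argmax(axis=1)
for c in range(C):
    members = np.flatnonzero(assign == c)
    if len(members):
        centroids[c] = W_out[members].mean(axis=0)
cluster_tokens = [np.flatnonzero(assign == c) for c in range(C)]

def fast_projection(h, top_clusters=8):
    """Score only the tokens that live in the clusters whose centroid best
    matches the decoder hidden state, instead of all V tokens."""
    best = np.argsort(-(centroids @ h))[:top_clusters]
    cand = np.concatenate([cluster_tokens[c] for c in best if len(cluster_tokens[c])])
    logits = W_out[cand] @ h
    return cand[logits.argmax()]

h = rng.normal(size=D).astype(np.float32)
print("predicted token id:", fast_projection(h))
```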
To better handle long-tail cases in the sequence labeling (SL) task, in this work, we introduce graph neural networks sequence labeling (GNN-SL), which augments the vanilla SL model output with similar tagging examples retrieved from the whole training set. Since not all the retrieved tagging examples benefit the model prediction, we construct a heterogeneous graph, and leverage graph neural networks (GNNs) to transfer information between the retrieved tagging examples and the input word sequence. The augmented node which aggregates information from neighbors is used to do prediction. This strategy enables the model to directly acquire similar tagging examples and improves the general quality of predictions. We conduct a variety of experiments on three typical sequence labeling tasks: Named Entity Recognition (NER), Part of Speech Tagging (POS), and Chinese Word Segmentation (CWS) to show the significant performance of our GNN-SL. Notably, GNN-SL achieves SOTA results of 96.9 (+0.2) on PKU, 98.3 (+0.4) on CITYU, 98.5 (+0.2) on MSR, and 96.9 (+0.2) on AS for the CWS task, and results comparable to SOTA performances on NER datasets, and POS datasets.
In this paper, we propose a new paradigm for paraphrase generation by treating the task as unsupervised machine translation (UMT), based on the assumption that there must be pairs of sentences expressing the same meaning in a large unlabeled monolingual corpus. The proposed paradigm first splits a large unlabeled corpus into multiple clusters and trains multiple UMT models using pairs of these clusters. Then, based on the paraphrase pairs produced by these UMT models, a unified surrogate model can be trained to serve as the final model for generating paraphrases, which can either be used directly at test time in the unsupervised setting, or be fine-tuned on labeled datasets in the supervised setting. The proposed method offers the merits of machine-translation-based paraphrasing approaches while avoiding reliance on bilingual sentence pairs. It also allows human intervention in the model so that more diverse paraphrases can be generated using different filtering criteria. Extensive experiments on existing paraphrase datasets in both supervised and unsupervised settings demonstrate the effectiveness of the proposed paradigm.
Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model. For Machine Translation (MT), these examples are typically randomly sampled from the development dataset with a similar distribution as the evaluation set. However, it is unclear how the choice of these in-context examples and their ordering impacts the output translation quality. In this work, we aim to understand the properties of good in-context examples for MT in both in-domain and out-of-domain settings. We show that the translation quality and the domain of the in-context examples matter and that 1-shot noisy unrelated example can have a catastrophic impact on output quality. While concatenating multiple random examples reduces the effect of noise, a single good prompt optimized to maximize translation quality on the development dataset can elicit learned information from the pre-trained language model. Adding similar examples based on an n-gram overlap with the test source significantly and consistently improves the translation quality of the outputs, outperforming a strong kNN-MT baseline in 2 out of 4 out-of-domain datasets.
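A minimal sketch of n-gram-overlap-based example selection: rank the candidate pool by the number of source-side n-grams shared with the test source and prepend the top matches to the prompt. The toy pool, whitespace tokenization, and bigram setting are assumptions for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_score(source, candidate_source, n=2):
    """Number of n-grams the test source shares with a candidate example's source."""
    a, b = ngrams(source.split(), n), ngrams(candidate_source.split(), n)
    return sum((a & b).values())

def select_prompt_examples(test_source, pool, k=4, n=2):
    """Pick the k pool examples whose source side overlaps most with the test
    source; these would be prepended to the prompt before the test sentence."""
    ranked = sorted(pool, key=lambda ex: overlap_score(test_source, ex[0], n), reverse=True)
    return ranked[:k]

pool = [("the cat sat on the mat", "die Katze sass auf der Matte"),
        ("stock prices fell sharply", "die Aktienkurse fielen stark"),
        ("the dog sat on the rug", "der Hund sass auf dem Teppich")]
print(select_prompt_examples("the cat sat on the sofa", pool, k=2))
```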
The word alignment task, despite its prominence in the era of statistical machine translation (SMT), is niche and under-explored today. In this two-part tutorial, we argue for the continued relevance for word alignment. The first part provides a historical background to word alignment as a core component of the traditional SMT pipeline. We zero-in on GIZA++, an unsupervised, statistical word aligner with surprising longevity. Jumping forward to the era of neural machine translation (NMT), we show how insights from word alignment inspired the attention mechanism fundamental to present-day NMT. The second part shifts to a survey approach. We cover neural word aligners, showing the slow but steady progress towards surpassing GIZA++ performance. Finally, we cover the present-day applications of word alignment, from cross-lingual annotation projection, to improving translation.
In almost all text generation applications, word sequences are constructed in a left-to-right (L2R) or right-to-left (R2L) manner, as natural language sentences are written either L2R or R2L. However, we find that the natural language written order is not essential for text generation. In this paper, we propose Spiral Language Modeling (SLM), a general approach that enables one to construct natural language sentences beyond the L2R and R2L orders. SLM allows one to start from an arbitrary token within the resulting text and expand the rest of the tokens around the selected one. It makes the decoding order a new optimization objective in addition to language model perplexity, which further improves the diversity and quality of the generated text. Furthermore, SLM makes it possible to manipulate the text construction process by selecting a proper starting token. SLM also introduces generation-order-based regularization to improve model robustness in low-resource scenarios. Experiments on 8 widely studied neural machine translation (NMT) tasks show that SLM achieves up to 4.7 BLEU improvement compared with the conventional L2R decoding approach.
A long-standing issue with paraphrase generation is how to obtain reliable supervision signals. In this paper, we propose an unsupervised paradigm based on the assumption that the probabilities of generating two sentences with the same meaning given the same context should be the same. Inspired by this fundamental idea, we propose a pipelined system that consists of paraphrase candidate generation based on contextual language models, candidate filtering using scoring functions, and paraphrase model training based on the selected candidates. The proposed paradigm offers advantages over existing paraphrase generation methods: (1) using a contextual regularizer on meanings, the model is able to generate massive amounts of high-quality paraphrase pairs; and (2) by using human-interpretable scoring functions to select paraphrase pairs from the candidates, the proposed framework provides a channel for developers to intervene in the data generation process, leading to a more controllable model. Experimental results on different tasks and datasets demonstrate the effectiveness of the proposed model in both supervised and unsupervised settings.