We study the problem of online learning with human feedback in human-in-the-loop machine translation, in which human translators revise machine-generated translations and the corrected translations are then used to improve the neural machine translation (NMT) system. However, previous methods require online model updating or additional translation memory networks to achieve high-quality performance, making them inflexible and inefficient in practice. In this paper, we propose a novel non-parametric online learning method that does not change the model structure. The approach introduces two k-nearest-neighbor (kNN) modules: one module memorizes the human feedback, i.e., the correct sentences provided by human translators, while the other adaptively balances the use of the historical human feedback and the original NMT model. Experiments on the EMEA and JRC-Acquis benchmarks show that the proposed method obtains substantial improvements in translation accuracy and achieves better adaptation performance with fewer human correction operations.
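The two-module mechanism is only sketched above; the following minimal numpy snippet is my own illustration (function names, the fixed softmax temperature, and the scalar mixing weight are assumptions, not the authors' formulation) of how a token-level kNN memory of human-corrected translations could be read out and adaptively interpolated with the base NMT distribution.

import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    # L2 distance from the current decoder state to every cached key of human feedback
    dists = np.linalg.norm(keys - query, axis=1)
    idx = np.argsort(dists)[:k]                      # the k nearest cached states
    weights = np.exp(-dists[idx] / temperature)
    weights /= weights.sum()
    p = np.zeros(vocab_size)
    for w, token_id in zip(weights, values[idx]):
        p[token_id] += w                             # aggregate neighbour votes per target token
    return p

def combine(p_nmt, p_feedback, lam):
    # lam would be set adaptively from retrieval statistics rather than fixed
    return lam * p_feedback + (1.0 - lam) * p_nmt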
Semi-parametric models, which augment generation with retrieval, have led to impressive results in language modeling and machine translation, due to their ability to retrieve fine-grained information from a datastore of examples. One of the most prominent approaches, $k$NN-MT, exhibits strong domain adaptation capabilities by retrieving tokens from domain-specific datastores \citep{khandelwal2020nearest}. However, $k$NN-MT requires an expensive retrieval operation for every single generated token, leading to a very low decoding speed (around 8 times slower than a parametric model). In this paper, we introduce a \textit{chunk-based} $k$NN-MT model which retrieves chunks of tokens from the datastore, instead of a single token. We propose several strategies for incorporating the retrieved chunks into the generation process, and for selecting the steps at which the model needs to search for neighbors in the datastore. Experiments on machine translation in two settings, static and ``on-the-fly'' domain adaptation, show that the chunk-based $k$NN-MT model leads to significant speed-ups (up to 4 times) with only a small drop in translation quality.
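As a rough illustration of the chunk idea (interface and variable names are mine, not the paper's code), a datastore entry can map a cached decoder state to the span of target tokens that followed it, so a single retrieval can inform several decoding steps before another search is needed.

import numpy as np

class ChunkDatastore:
    def __init__(self, keys, chunks):
        self.keys = keys          # (N, d) array of cached decoder states
        self.chunks = chunks      # list of N token-id lists: the target span that followed each state

    def retrieve(self, query, k=4):
        # distances from the current decoder state to every cached key
        dists = np.linalg.norm(self.keys - query, axis=1)
        nearest = np.argsort(dists)[:k]
        # one search yields whole candidate spans, which later decoding steps can consume
        return [self.chunks[i] for i in nearest], dists[nearest]

Consuming tokens from a retrieved span across several steps is what amortizes the cost of each datastore search.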
How can a neural machine translation (NMT) model be effectively adapted to emerging cases without retraining? Despite the great success of neural machine translation, updating deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising, but are prone to overfitting the retrieved examples. In this work, we propose Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that, even without expensive retraining, KSTER achieves improvements of 1.1 to 1.5 BLEU over the best existing online adaptation methods. The code and trained models are released at https://github.com/jiangqn/kster.
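A hedged sketch of kernel smoothing over retrieved examples is given below; in KSTER the bandwidth and mixing weight are learned, whereas here they are fixed scalars purely for illustration.

import numpy as np

def kernel_smoothed_distribution(query, keys, values, p_nmt, bandwidth=1.0, lam=0.3):
    d2 = np.sum((keys - query) ** 2, axis=1)      # squared distance to each retrieved key
    kernel = np.exp(-d2 / bandwidth)              # Gaussian kernel weight per retrieved example
    kernel /= kernel.sum()
    p_ret = np.zeros_like(p_nmt)
    for w, token_id in zip(kernel, values):
        p_ret[token_id] += w                      # smooth the retrieved targets into a distribution
    return (1.0 - lam) * p_nmt + lam * p_ret      # mix with the base NMT prediction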
End-to-end speech translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters. However, the effectiveness of neural approaches to this task is severely limited by the available training corpora, especially for domain adaptation, where in-domain triplet training data is scarce or nonexistent. In this paper, we propose a novel non-parametric method that leverages a domain-specific text translation corpus to achieve domain adaptation for E2E-ST systems. To this end, we first incorporate an additional encoder into the pre-trained E2E-ST model to enable text translation modelling, and then unify the decoder's output representations for the text and speech translation tasks by reducing the corresponding representation mismatch on the available triplet training data. During domain adaptation, a k-nearest-neighbor (kNN) classifier is introduced to produce the final translation distribution using an external datastore built from the domain-specific text translation corpus, while the universal output representation is adopted to perform the similarity search. Experiments on the Europarl-ST benchmark show that, when only in-domain text translation data is involved, the proposed method significantly improves the baseline on average in all translation directions, even outperforming the strong in-domain fine-tuning method.
$k$NN-based neural machine translation ($k$NN-MT) has achieved state-of-the-art results on MT tasks. A significant drawback of $k$NN-MT lies in its inefficiency in identifying the $k$ nearest neighbors of the query representation from the entire datastore, which is prohibitively slow when the datastore is large. In this work, we propose \textbf{Faster $k$NN-MT} to address this issue. The core idea of Faster $k$NN-MT is to use a hierarchical clustering strategy to approximate the distance between a query and a data point in the datastore, which is decomposed into two parts: the distance between the query and the center of the cluster that the data point belongs to, and the distance between the data point and its cluster center. We propose practical ways to compute these two parts in a significantly faster manner. Through extensive experiments on different MT benchmarks, we show that \textbf{Faster $k$NN-MT} is faster than Fast $k$NN-MT \citep{meng2021fast} and only slightly (1.2 times) slower than its vanilla counterpart, while preserving model performance on par with $k$NN-MT. Faster $k$NN-MT enables the deployment of $k$NN-MT models in real-world MT services.
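The decomposition described above can be written as dist(q, x) ≈ dist(q, c(x)) + dist(x, c(x)), where c(x) is the centroid of the cluster containing datapoint x; the second term is query-independent and can be precomputed. The short numpy sketch below (notation mine, not the paper's implementation) shows how approximate distances to all datapoints then follow from one distance computation per cluster.

import numpy as np

def approx_distances(query, centroids, assignments, point_to_centroid_dist):
    # one exact distance per cluster centre
    q_to_c = np.linalg.norm(centroids - query, axis=1)
    # broadcast to every datapoint via its cluster assignment, plus the precomputed
    # datapoint-to-centroid distance, which does not depend on the query
    return q_to_c[assignments] + point_to_centroid_dist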
Nearest Neighbor Machine Translation (kNNMT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism. The effectiveness of kNNMT directly depends on the quality of retrieved neighbors. However, original kNNMT builds datastores based on representations from NMT models, which would result in poor retrieval accuracy when NMT models are not good enough, leading to sub-optimal translation performance. In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT. Better representations from pre-trained models allow us to build datastores of better quality. We also design a novel contrastive alignment objective to mitigate the representation gap between the NMT model and pre-trained models, enabling the NMT model to retrieve from better datastores. We conduct extensive experiments on both bilingual and multilingual translation benchmarks, including WMT17 English $\leftrightarrow$ Chinese, WMT14 English $\leftrightarrow$ German, IWSLT14 German $\leftrightarrow$ English, and IWSLT14 multilingual datasets. Empirical results demonstrate the effectiveness of PRED.
kNN-MT presents a new paradigm for domain adaptation by building an external datastore, which usually saves all target language token occurrences in the parallel corpus. As a result, the constructed datastore is usually large and possibly redundant. In this paper, we investigate the interpretability issue of this approach: what knowledge does the NMT model need? We propose the notion of local correctness (LAC) as a new angle, which describes the potential translation correctness for a single entry and for a given neighborhood. Empirical study shows that our investigation successfully finds the conditions where the NMT model could easily fail and need related knowledge. Experiments on six diverse target domains and two language-pairs show that pruning according to local correctness brings a light and more explainable memory for kNN-MT domain adaptation.
K-nearest-neighbor neural machine translation (kNN-MT) successfully incorporates an external corpus by retrieving word-level representations at test time. In general, kNN-MT borrows an off-the-shelf context representation from the translation task, e.g., the output of the last decoder layer, as the query vector for the retrieval task. In this work, we highlight that coupling the representations of these two tasks is sub-optimal for high-quality retrieval. To alleviate this, we leverage supervised contrastive learning to learn a distinctive retrieval representation derived from the original context representation. We also propose a fast and effective approach to constructing hard negative samples. Experimental results on five domains show that our approach improves retrieval accuracy and BLEU score compared with vanilla kNN-MT.
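As a hedged illustration of the contrastive objective (my own simplification, not the paper's training code), an InfoNCE-style loss can pull the learned retrieval representation of a query towards a positive key that shares the same target token and away from hard negative samples.

import numpy as np

def info_nce(query, positive, negatives, temperature=0.1):
    # positive: retrieval key with the same target token; negatives: hard negative samples
    keys = np.vstack([positive[None, :], negatives])
    logits = keys @ query / temperature
    logits -= logits.max()                         # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                       # the positive sits at index 0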
Non-autoregressive (NAR) models require less computation than autoregressive models but sacrifice generation quality when generating sentences. Previous studies addressed this issue through iterative decoding. This study instead proposes using nearest neighbors as the initial state of a NAR decoder and editing them iteratively. We present a novel training strategy to learn the edit operations on neighbors so as to improve NAR text generation. Experimental results show that the proposed method achieves higher translation quality (1.69 points above the vanilla Transformer) with fewer decoding iterations (fewer than one tenth) on the JRC-Acquis En-De dataset by making use of nearest-neighbor translations. We also confirm the effectiveness of the proposed method on a data-to-text task (WikiBio). In addition, the proposed method outperforms a NAR baseline on the WMT'14 En-De dataset. We also report an analysis of the neighbor examples used in the proposed method.
Recent work has improved language models (LMs) remarkably by equipping them with a non-parametric memory component. However, most existing approaches only introduce mem-ories at testing time or represent them using a separately trained encoder, resulting in suboptimal training of the language model. In this work, we present TRIME, a novel yet simple training approach designed for training LMs with memory augmentation. Our approach uses a training objective that directly takes in-batch examples as accessible memory. We also present new methods for memory construction and data batching, which are used for adapting to different sets of memories--local, long-term, and external memory--at testing time. We evaluate TRIME on multiple language modeling and machine translation benchmarks and show that it is able to achieve significant improvements across all the settings. Concretely, TRIME reduces the perplexity from 18.70 to 15.37 on WIKITEXT-103, by effectively leveraging a large memory set from the training corpus. Compared to standard LM training, TRIME adds negligible computational overhead and is compatible with different neural architectures, making it a versatile solution for training memory-augmented LMs.
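A rough sketch of a memory-augmented objective in the spirit described above (names and the exact scoring are assumptions on my part): the score of each vocabulary item combines the usual output-embedding logit with similarity to in-batch memory states that were followed by that token.

import numpy as np

def memory_augmented_probs(h, output_emb, mem_states, mem_tokens):
    scores = np.exp(output_emb @ h)         # usual output-embedding scores, one per vocabulary item
    mem_scores = np.exp(mem_states @ h)     # similarity to each in-batch memory state
    for s, tok in zip(mem_scores, mem_tokens):
        scores[tok] += s                    # add memory mass to the token that followed that state
    return scores / scores.sum()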
Retrieval-based language models (R-LMs) model the probability of natural language text by combining a standard language model (LM) with examples retrieved from an external datastore at test time. While effective, a major bottleneck of using these models in practice is the computationally costly datastore search, which can be performed as frequently as every time step. In this paper, we present RetoMaton (retrieval automaton), based on (1) saving pointers between consecutive datastore entries, and (2) clustering entries into "states". This effectively results in a weighted finite automaton built on top of the datastore, instead of representing the datastore as a flat list. The creation of the automaton is unsupervised, and a RetoMaton can be constructed from any text collection: either the original training corpus or another domain. Traversing this automaton at inference time, in parallel to the LM inference, reduces its perplexity by up to 1.85, or alternatively saves up to 83% of the nearest-neighbor searches over $k$NN-LM (Khandelwal et al., 2020) without hurting perplexity. Our code and trained models are available at https://github.com/neulab/retomaton.
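A much-simplified sketch of the pointer idea (not the released implementation): each datastore entry i keeps a pointer to entry i+1, its continuation in the original corpus, and a fresh nearest-neighbor search is only needed when no saved pointer matches the token just emitted.

def next_candidates(prev_entry_ids, emitted_token, values, full_search):
    # follow saved pointers from the datastore entries used at the previous step
    followed = [i + 1 for i in prev_entry_ids
                if values[i] == emitted_token and i + 1 < len(values)]
    # only when no pointer survives do we pay for a real nearest-neighbour search
    return followed if followed else full_search()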
This paper presents a unified end-to-end framework for streaming and non-streaming speech translation. While training recipes for non-streaming speech translation have matured, recipes for streaming speech translation have not yet been established. In this work, we focus on developing a unified model (UniST) that supports streaming and non-streaming ST from the perspective of its fundamental components, including the training objective, the attention mechanism, and the decoding policy. Experiments on the most popular speech-to-text translation benchmark dataset, MuST-C, show that UniST achieves a better trade-off between BLEU score and latency metrics for streaming ST compared with end-to-end baselines and cascaded models. We will make our code and evaluation tools publicly available.
Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model. For Machine Translation (MT), these examples are typically randomly sampled from the development dataset with a similar distribution as the evaluation set. However, it is unclear how the choice of these in-context examples and their ordering impacts the output translation quality. In this work, we aim to understand the properties of good in-context examples for MT in both in-domain and out-of-domain settings. We show that the translation quality and the domain of the in-context examples matter and that 1-shot noisy unrelated example can have a catastrophic impact on output quality. While concatenating multiple random examples reduces the effect of noise, a single good prompt optimized to maximize translation quality on the development dataset can elicit learned information from the pre-trained language model. Adding similar examples based on an n-gram overlap with the test source significantly and consistently improves the translation quality of the outputs, outperforming a strong kNN-MT baseline in 2 out of 4 out-of-domain datasets.
Non-parametric neural language models (NLMs) learn predictive distributions over text by utilizing an external datastore, which allows them to learn through explicitly memorizing training datapoints. While effective, these models often require retrieval from a large datastore at test time, significantly increasing the inference overhead and thus limiting the deployment of non-parametric NLMs in practical applications. In this paper, we take the recently proposed $k$-nearest-neighbor language model (Khandelwal et al., 2020) as an example and explore methods to improve its efficiency along various dimensions. Experiments on the standard WikiText-103 benchmark and domain-adaptation datasets show that our methods achieve up to a 6x speed-up in inference while retaining comparable performance. The empirical analysis we present may provide guidelines for future research seeking to develop or deploy more efficient non-parametric NLMs.
Incorporating personal preference is crucial in advanced machine translation tasks. Despite recent advances in machine translation, properly reflecting personal style remains a demanding task. In this paper, we introduce a personalized automatic post-editing framework to address this challenge, which effectively generates sentences that take distinct personal behaviors into account. To build this framework, we first collect post-editing data that reflects user preference from a live machine translation system. Specifically, real-world users enter source sentences for translation and edit the machine-translated outputs according to their preferred style. We then propose a model that combines a discriminator module and user-specific parameters on top of the APE framework. Experimental results show that the proposed method outperforms other baseline models on four different metrics (i.e., BLEU, TER, YiSi-1, and human evaluation).
Pre-trained language models (PLMs) have exhibited remarkable few-shot learning capabilities when provided a few examples in a natural language prompt as demonstrations of test instances, i.e., in-context learning. However, the performance of in-context learning is susceptible to the choice of prompt format, training examples and the ordering of the training examples. In this paper, we propose a novel nearest-neighbor calibration framework for in-context learning to ease this issue. It is inspired by a phenomenon that the in-context learning paradigm produces incorrect labels when inferring training instances, which provides a useful supervised signal to calibrate predictions. Thus, our method directly augments the predictions with a $k$-nearest-neighbor ($k$NN) classifier over a datastore of cached few-shot instance representations obtained by PLMs and their corresponding labels. Then adaptive neighbor selection and feature regularization modules are introduced to make full use of a few support instances to reduce the $k$NN retrieval noise. Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning, while even achieving comparable performance with state-of-the-art tuning-based approaches in some sentiment analysis tasks.
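The calibration step can be pictured as interpolating the in-context prediction with a kNN vote over the cached few-shot representations; the snippet below is an illustrative simplification (variable names and the fixed interpolation weight are assumptions, not the paper's method in full).

import numpy as np

def knn_calibrate(p_icl, query_rep, cached_reps, cached_labels, num_labels, k=4, alpha=0.5):
    dists = np.linalg.norm(cached_reps - query_rep, axis=1)
    nearest = np.argsort(dists)[:k]
    # label histogram of the k nearest cached few-shot instances
    p_knn = np.bincount(cached_labels[nearest], minlength=num_labels).astype(float)
    p_knn /= p_knn.sum()
    return alpha * p_knn + (1.0 - alpha) * p_icl   # calibrated prediction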
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios. Different from previous works that make use of mutually similar but redundant translation memories~(TMs), we propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence while individually contrastive to each other providing maximal information gains in three phases. First, in TM retrieval phase, we adopt a contrastive retrieval algorithm to avoid redundancy and uninformativeness of similar translation pieces. Second, in memory encoding stage, given a set of TMs we propose a novel Hierarchical Group Attention module to gather both local context of each TM and global context of the whole TM set. Finally, in training phase, a Multi-TM contrastive learning objective is introduced to learn salient feature of each TM with respect to target sentence. Experimental results show that our framework obtains improvements over strong baselines on the benchmark datasets.
Large language models (LLMs) that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages. We probe this ability in an in-depth study of the pathways language model (PaLM), which has demonstrated the strongest machine translation (MT) performance among similarly-trained LLMs to date. We investigate various strategies for choosing translation examples for few-shot prompting, concluding that example quality is the most important factor. Using optimized prompts, we revisit previous assessments of PaLM's MT capabilities with more recent test sets, modern MT metrics, and human evaluation, and find that its performance, while impressive, still lags that of state-of-the-art supervised systems. We conclude by providing an analysis of PaLM's MT output which reveals some interesting properties and prospects for future work.
In this work, motivated by the concept that {\it it is easier to copy than to memorize}, we introduce GNN-LM, which extends the vanilla neural language model (LM) by allowing it to reference similar contexts in the entire training corpus. We build a directed heterogeneous graph between an input context and its semantically related neighbors selected from the training corpus, where nodes are tokens in the input context and the retrieved neighbor contexts, and edges represent connections between nodes. Graph neural networks (GNNs) are constructed upon the graph to aggregate information from similar contexts to decode the token. This learning paradigm provides direct access to the reference contexts and helps improve the model's generalization ability. We conduct comprehensive experiments to validate the effectiveness of GNN-LM: GNN-LM achieves a new state-of-the-art perplexity of 14.8 on WikiText-103 (a 4.5-point improvement over its vanilla LM counterpart) and shows substantial improvements over strong baselines on the One Billion Word and Enwik8 datasets. In-depth ablation studies are performed to understand the mechanism of GNN-LM. The code can be found at \url{https://github.com/shannonai/gnn-lm}.
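A toy single aggregation step over such a graph is sketched below; edge construction and attention weights are deliberately simplified to uniform averaging, so this is only a schematic of the message-passing idea, not the paper's GNN.

import numpy as np

def aggregate(node_feats, edges):
    # node_feats: (num_nodes, d) array; edges: list of (source, destination) index pairs
    out = node_feats.copy()
    for dst in range(len(node_feats)):
        srcs = [s for s, d in edges if d == dst]
        if srcs:
            # uniform averaging of neighbour messages stands in for learned attention
            out[dst] = (node_feats[dst] + np.mean(node_feats[srcs], axis=0)) / 2.0
    return out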
Existing document-level neural machine translation (NMT) models have sufficiently explored different context settings to provide guidance for target generation. However, little attention has been paid to exploiting more diverse context for richer context information. In this paper, we propose a selective memory-augmented neural document translation model to deal with documents containing a large hypothesis space of context. Specifically, we retrieve similar bilingual sentence pairs from the training corpus to augment the global context, and then extend the two-stream attention model with a selective mechanism to capture the local context and the diverse global context. This unified approach allows our model to be trained elegantly on three publicly available document-level machine translation datasets and significantly outperforms previous document-level NMT models.