我们定义了一个名为“扩展单词对齐”的新颖概念,以提高后编辑辅助效率。基于扩展的单词对齐方式,我们进一步提出了一个名为精制单词级量化宽松的新颖任务,该任务输出精制标签和单词级对应关系。与原始单词级别的量化宽松相比,新任务能够直接指出编辑操作,从而提高效率。为了提取扩展单词对齐,我们采用了基于Mbert的监督方法。为了解决精致的单词级量化宽松,我们首先通过训练基于Mbert和XLM-R的序列标记的回归模型来预测原始量化量子标签。然后,我们使用扩展单词对齐来完善原始文字标签。另外,我们提取源差距对应关系,同时获得GAP标签。两种语言对的实验显示了我们方法的可行性,并为我们提供了进一步改进的灵感。
translated by 谷歌翻译
机器翻译(MT)的单词级质量估计(QE)旨在在不参考的情况下找出翻译句子中的潜在翻译错误。通常,关于文字级别量化宽松的传统作品旨在根据文章编辑工作来预测翻译质量,其中通过比较MT句子之间的单词来自动生成单词标签(“ OK”和“ BAD”)。通过翻译错误率(TER)工具包编辑的句子。虽然可以使用后编辑的工作来在一定程度上测量翻译质量,但我们发现它通常与人类对单词是否良好或翻译不良的判断相抵触。为了克服限制,我们首先创建了一个金色基准数据集,即\ emph {hjqe}(人类对质量估计的判断),专家翻译直接注释了对其判断的不良翻译单词。此外,为了进一步利用平行语料库,我们提出了使用两个标签校正策略的自我监督的预训练,即标记改进策略和基于树的注释策略,以使基于TER的人工量化量子ceper更接近\ emph {HJQE}。我们根据公开可用的WMT en-de和en-ZH Corpora进行实质性实验。结果不仅表明我们提出的数据集与人类的判断更加一致,而且还确认了提议的标签纠正策略的有效性。 。}
translated by 谷歌翻译
With the recent advance in neural machine translation demonstrating its importance, research on quality estimation (QE) has been steadily progressing. QE aims to automatically predict the quality of machine translation (MT) output without reference sentences. Despite its high utility in the real world, there remain several limitations concerning manual QE data creation: inevitably incurred non-trivial costs due to the need for translation experts, and issues with data scaling and language expansion. To tackle these limitations, we present QUAK, a Korean-English synthetic QE dataset generated in a fully automatic manner. This consists of three sub-QUAK datasets QUAK-M, QUAK-P, and QUAK-H, produced through three strategies that are relatively free from language constraints. Since each strategy requires no human effort, which facilitates scalability, we scale our data up to 1.58M for QUAK-P, H and 6.58M for QUAK-M. As an experiment, we quantitatively analyze word-level QE results in various ways while performing statistical analysis. Moreover, we show that datasets scaled in an efficient way also contribute to performance improvements by observing meaningful performance gains in QUAK-M, P when adding data up to 1.58M.
translated by 谷歌翻译
翻译质量估计(QE)是预测机器翻译(MT)输出质量的任务,而无需任何参考。作为MT实际应用中的重要组成部分,这项任务已越来越受到关注。在本文中,我们首先提出了XLMRScore,这是一种基于使用XLM-Roberta(XLMR)模型计算的BertScore的简单无监督的QE方法,同时讨论了使用此方法发生的问题。接下来,我们建议两种减轻问题的方法:用未知令牌和预训练模型的跨语性对准替换未翻译的单词,以表示彼此之间的一致性单词。我们在WMT21 QE共享任务的四个低资源语言对上评估了所提出的方法,以及本文介绍的新的英语FARSI测试数据集。实验表明,我们的方法可以在两个零射击方案的监督基线中获得可比的结果,即皮尔森相关性的差异少于0.01,同时在所有低资源语言对中的平均低资源语言对中的无人看管竞争对手的平均水平超过8%的平均水平超过8%。 。
translated by 谷歌翻译
Translating training data into many languages has emerged as a practical solution for improving cross-lingual transfer. For tasks that involve span-level annotations, such as information extraction or question answering, an additional label projection step is required to map annotated spans onto the translated texts. Recently, a few efforts have utilized a simple mark-then-translate method to jointly perform translation and projection by inserting special markers around the labeled spans in the original sentence. However, as far as we are aware, no empirical analysis has been conducted on how this approach compares to traditional annotation projection based on word alignment. In this paper, we present an extensive empirical study across 42 languages and three tasks (QA, NER, and Event Extraction) to evaluate the effectiveness and limitations of both methods, filling an important gap in the literature. Experimental results show that our optimized version of mark-then-translate, which we call EasyProject, is easily applied to many languages and works surprisingly well, outperforming the more complex word alignment-based methods. We analyze several key factors that affect end-task performance, and show EasyProject works well because it can accurately preserve label span boundaries after translation. We will publicly release all our code and data.
translated by 谷歌翻译
We report the result of the first edition of the WMT shared task on Translation Suggestion (TS). The task aims to provide alternatives for specific words or phrases given the entire documents generated by machine translation (MT). It consists two sub-tasks, namely, the naive translation suggestion and translation suggestion with hints. The main difference is that some hints are provided in sub-task two, therefore, it is easier for the model to generate more accurate suggestions. For sub-task one, we provide the corpus for the language pairs English-German and English-Chinese. And only English-Chinese corpus is provided for the sub-task two. We received 92 submissions from 5 participating teams in sub-task one and 6 submissions for the sub-task 2, most of them covering all of the translation directions. We used the automatic metric BLEU for evaluating the performance of each submission.
translated by 谷歌翻译
我们假设现有的句子级机器翻译(MT)指标在人类参考包含歧义时会效率降低。为了验证这一假设,我们提出了一种非常简单的方法,用于扩展预审计的指标以在文档级别合并上下文。我们将我们的方法应用于三个流行的指标,即Bertscore,Prism和Comet,以及无参考的公制Comet-QE。我们使用提供的MQM注释评估WMT 2021指标共享任务的扩展指标。我们的结果表明,扩展指标的表现在约85%的测试条件下优于其句子级别的级别,而在排除低质量人类参考的结果时。此外,我们表明我们的文档级扩展大大提高了其对话语现象任务的准确性,从而优于专用基线高达6.1%。我们的实验结果支持我们的初始假设,并表明对指标的简单扩展使他们能够利用上下文来解决参考中的歧义。
translated by 谷歌翻译
We present, Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. In each language, it contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location and Organization) for 9 out of the 11 languages. The training dataset has been automatically created from the Samanantar parallel corpus by projecting automatically tagged entities from an English sentence to the corresponding Indian language sentence. We also create manually annotated testsets for 8 languages containing approximately 1000 sentences per language. We demonstrate the utility of the obtained dataset on existing testsets and the Naamapadam-test data for 8 Indic languages. We also release IndicNER, a multilingual mBERT model fine-tuned on the Naamapadam training set. IndicNER achieves the best F1 on the Naamapadam-test set compared to an mBERT model fine-tuned on existing datasets. IndicNER achieves an F1 score of more than 80 for 7 out of 11 Indic languages. The dataset and models are available under open-source licenses at https://ai4bharat.iitm.ac.in/naamapadam.
translated by 谷歌翻译
Machine Translation Quality Estimation (QE) is the task of evaluating translation output in the absence of human-written references. Due to the scarcity of human-labeled QE data, previous works attempted to utilize the abundant unlabeled parallel corpora to produce additional training data with pseudo labels. In this paper, we demonstrate a significant gap between parallel data and real QE data: for QE data, it is strictly guaranteed that the source side is original texts and the target side is translated (namely translationese). However, for parallel data, it is indiscriminate and the translationese may occur on either source or target side. We compare the impact of parallel data with different translation directions in QE data augmentation, and find that using the source-original part of parallel corpus consistently outperforms its target-original counterpart. Moreover, since the WMT corpus lacks direction information for each parallel sentence, we train a classifier to distinguish source- and target-original bitext, and carry out an analysis of their difference in both style and domain. Together, these findings suggest using source-original parallel data for QE data augmentation, which brings a relative improvement of up to 4.0% and 6.4% compared to undifferentiated data on sentence- and word-level QE tasks respectively.
translated by 谷歌翻译
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
translated by 谷歌翻译
Word alignment is to find translationally equivalent words between source and target sentences. Previous work has demonstrated that self-training can achieve competitive word alignment results. In this paper, we propose to use word alignments generated by a third-party word aligner to supervise the neural word alignment training. Specifically, source word and target word of each word pair aligned by the third-party aligner are trained to be close neighbors to each other in the contextualized embedding space when fine-tuning a pre-trained cross-lingual language model. Experiments on the benchmarks of various language pairs show that our approach can surprisingly do self-correction over the third-party supervision by finding more accurate word alignments and deleting wrong word alignments, leading to better performance than various third-party word aligners, including the currently best one. When we integrate all supervisions from various third-party aligners, we achieve state-of-the-art word alignment performances, with averagely more than two points lower alignment error rates than the best third-party aligner. We released our code at https://github.com/sdongchuanqi/Third-Party-Supervised-Aligner.
translated by 谷歌翻译
自动编辑(APE)旨在通过自动纠正机器翻译输出中的错误来减少手动后编辑工作。由于人类注销的培训数据数量有限,数据稀缺是所有猿类系统所面临的主要挑战之一。为了减轻缺乏真正的培训数据,当前的大多数猿类系统采用数据增强方法来生成大规模的人工语料库。鉴于APE数据增强的重要性,我们分别研究了人工语料库的构建方法和人工数据域对猿类模型性能的影响。此外,猿类的难度在不同的机器翻译(MT)系统之间有所不同。我们在困难的猿数据集上研究了最先进的APE模型的输出,以分析现有的APE系统中的问题。首先,我们发现1)具有高质量源文本和机器翻译文本的人工语料库更有效地改善了猿类模型的性能; 2)内域人工训练数据可以更好地改善猿类模型的性能,而无关紧要的外域数据实际上会干扰该模型; 3)现有的APE模型与包含长源文本或高质量机器翻译文本的案例斗争; 4)最先进的猿类模型在语法和语义添加问题上很好地工作,但是输出容易出现实体和语义遗漏误差。
translated by 谷歌翻译
The word alignment task, despite its prominence in the era of statistical machine translation (SMT), is niche and under-explored today. In this two-part tutorial, we argue for the continued relevance for word alignment. The first part provides a historical background to word alignment as a core component of the traditional SMT pipeline. We zero-in on GIZA++, an unsupervised, statistical word aligner with surprising longevity. Jumping forward to the era of neural machine translation (NMT), we show how insights from word alignment inspired the attention mechanism fundamental to present-day NMT. The second part shifts to a survey approach. We cover neural word aligners, showing the slow but steady progress towards surpassing GIZA++ performance. Finally, we cover the present-day applications of word alignment, from cross-lingual annotation projection, to improving translation.
translated by 谷歌翻译
最近的神经机翻译研究探索了灵活的发行订单,作为左右一代的替代品。然而,培训非单调模型带来了新的并发症:如何在同一最终结果到达的订单组合爆炸时搜索良好的订单?此外,这些自动排序如何与人类翻译的实际行为进行比较?目前的模型依靠手动构建的偏见或留下自己的所有可能性。在本文中,我们分析了人工后编辑所产生的排序,并使用它们培训自动编辑后系统。我们将生成的系统与由左右和随机编辑排序训练的人进行比较。我们观察到人类倾向于遵循几乎左右的顺序,而是有趣的偏差,例如首选通过纠正标点符号或动词而开始。
translated by 谷歌翻译
In the absence of readily available labeled data for a given task and language, annotation projection has been proposed as one of the possible strategies to automatically generate annotated data which may then be used to train supervised systems. Annotation projection has often been formulated as the task of projecting, on parallel corpora, some labels from a source into a target language. In this paper we present T-Projection, a new approach for annotation projection that leverages large pretrained text2text language models and state-of-the-art machine translation technology. T-Projection decomposes the label projection task into two subtasks: (i) The candidate generation step, in which a set of projection candidates using a multilingual T5 model is generated and, (ii) the candidate selection step, in which the candidates are ranked based on translation probabilities. We evaluate our method in three downstream tasks and five different languages. Our results show that T-projection improves the average F1 score of previous methods by more than 8 points.
translated by 谷歌翻译
As machine translation (MT) metrics improve their correlation with human judgement every year, it is crucial to understand the limitations of such metrics at the segment level. Specifically, it is important to investigate metric behaviour when facing accuracy errors in MT because these can have dangerous consequences in certain contexts (e.g., legal, medical). We curate ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. We use ACES to evaluate a wide range of MT metrics including the submissions to the WMT 2022 metrics shared task and perform several analyses leading to general recommendations for metric developers. We recommend: a) combining metrics with different strengths, b) developing metrics that give more weight to the source and less to surface-level overlap with the reference and c) explicitly modelling additional language-specific information beyond what is available via multilingual embeddings.
translated by 谷歌翻译
We present the task of PreQuEL, Pre-(Quality-Estimation) Learning. A PreQuEL system predicts how well a given sentence will be translated, without recourse to the actual translation, thus eschewing unnecessary resource allocation when translation quality is bound to be low. PreQuEL can be defined relative to a given MT system (e.g., some industry service) or generally relative to the state-of-the-art. From a theoretical perspective, PreQuEL places the focus on the source text, tracing properties, possibly linguistic features, that make a sentence harder to machine translate. We develop a baseline model for the task and analyze its performance. We also develop a data augmentation method (from parallel corpora), that improves results substantially. We show that this augmentation method can improve the performance of the Quality-Estimation task as well. We investigate the properties of the input text that our model is sensitive to, by testing it on challenge sets and different languages. We conclude that it is aware of syntactic and semantic distinctions, and correlates and even over-emphasizes the importance of standard NLP features.
translated by 谷歌翻译
逆文本归一化(ITN)是自动语音识别(ASR)中必不可少的后处理步骤。它将数字,日期,缩写和其他符号类别从ASR产生的口头形式转换为其书面形式。人们可以将ITN视为机器翻译任务,并使用神经序列到序列模型来解决它。不幸的是,这种神经模型容易产生可能导致不可接受的错误的幻觉。为了减轻此问题,我们提出了一个单个令牌分类器模型,将ITN视为标记任务。该模型将替换片段分配给每个输入令牌,或将其标记为删除或复制而无需更改。我们提出了基于ITN示例的粒状对齐方式的数据集准备方法。提出的模型不太容易出现幻觉错误。该模型在Google文本归一化数据集上进行了培训,并在英语和俄罗斯测试集上实现了最先进的句子精度。标签和输入单词之间的一对一对应关系可改善模型预测的解释性,简化调试并允许后处理更正。该模型比序列到序列模型更简单,并且在生产设置中更易于优化。准备数据集的模型和代码作为NEMO项目的一部分发布。
translated by 谷歌翻译
监督机器翻译的绝大多数评估指标,即(i)假设参考翻译的存在,(ii)受到人体得分的培训,或(iii)利用并行数据。这阻碍了其适用于此类监督信号的情况。在这项工作中,我们开发了完全无监督的评估指标。为此,我们利用评估指标,平行语料库开采和MT系统之间的相似性和协同作用。特别是,我们使用无监督的评估指标来开采伪并行数据,我们用来重塑缺陷的基础向量空间(以迭代方式),并诱导无监督的MT系统,然后提供伪引用作为伪参考作为在中的附加组件中的附加组件指标。最后,我们还从伪并行数据中诱导无监督的多语言句子嵌入。我们表明,我们完全无监督的指标是有效的,即,他们在5个评估数据集中的4个击败了受监督的竞争对手。
translated by 谷歌翻译
我们引入了翻译误差校正(TEC),这是自动校正人类生成的翻译的任务。机器翻译(MT)的瑕疵具有长期的动机系统,可以通过自动编辑后改善变化后的转换。相比之下,尽管人类直觉上犯了不同的错误,但很少有人注意自动纠正人类翻译的问题,从错别字到翻译约定的矛盾之处。为了调查这一点,我们使用三个TEC数据集构建和释放ACED语料库。我们表明,与自动后编辑数据集中的MT错误相比,TEC中的人类错误表现出更加多样化的错误,翻译流利性误差要少得多,这表明需要专门用于纠正人类错误的专用TEC模型。我们表明,基于人类错误的合成错误的预训练可将TEC F-SCORE提高多达5.1点。我们通过九名专业翻译编辑进行了人类的用户研究,发现我们的TEC系统的帮助使他们产生了更高质量的修订翻译。
translated by 谷歌翻译