This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions, and the biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three sub-tasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.
A large number of machine translation approaches have recently been developed to facilitate the fluent transfer of content across languages. However, the literature suggests that many obstacles must still be addressed to achieve better automatic translation. One of these obstacles is lexical and syntactic ambiguity. A promising way to overcome this problem is to use Semantic Web technologies. This article presents the results of a systematic review of machine translation approaches that rely on Semantic Web technologies for translating texts. Overall, our survey suggests that while Semantic Web technologies can improve the quality of machine translation output for a variety of problems, the combination of the two remains in its infancy.
For languages with no annotated resources, the transfer of natural language processing models, such as named entity recognition (NER), from resource-rich languages would be an appealing capability. However, differences in words and word order across languages make this a challenging problem. To improve the mapping of lexical items across languages, we propose a translation method based on bilingual word embeddings. To improve robustness to word-order differences, we propose the use of self-attention, which allows for sufficient flexibility with respect to word order. We demonstrate that these methods achieve state-of-the-art or competitive NER performance on commonly tested languages under a cross-lingual setting, with much lower resource requirements than past approaches. We also evaluate the challenges of applying these methods to Uyghur, a low-resource language.
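The lexical-mapping idea above can be sketched as nearest-neighbour translation in a shared bilingual embedding space. This is a toy illustration, not the authors' implementation: the vocabularies and vectors below are invented, and real systems would obtain aligned embeddings from far larger data.

```python
import numpy as np

# Toy bilingual embedding space (hypothetical vectors; in practice these
# come from monolingual embeddings aligned into one space).
src_vocab = ["perro", "gato"]
src_emb = np.array([[0.9, 0.1], [0.1, 0.9]])
tgt_vocab = ["dog", "cat", "house"]
tgt_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

def translate(word: str) -> str:
    """Map a source word to its cosine nearest neighbour on the target side."""
    v = src_emb[src_vocab.index(word)]
    sims = tgt_emb @ v / (np.linalg.norm(tgt_emb, axis=1) * np.linalg.norm(v))
    return tgt_vocab[int(np.argmax(sims))]

print(translate("perro"))  # -> dog
```

In practice the neighbour search is noisy, which is one reason the paper pairs it with an order-robust (self-attentive) sequence model.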
In this work, we focus on effectively leveraging and integrating information from the concept level and the word level by projecting concepts and words into a lower-dimensional space while retaining the most critical semantics. In the broader context of opinion-understanding systems, we investigate the use of the fused embeddings in several core NLP tasks: named entity detection and classification, automatic speech recognition re-ranking, and targeted sentiment analysis.
Automatic speech recognition (ASR) systems often need to be developed for extremely low-resource languages to serve end uses such as audio content categorization and search. While universal phone recognition is natural when no transcribed speech is available to train an ASR system in a language, adapting universal phone models using very small amounts (several hours or less) of transcribed speech also needs to be studied, especially with state-of-the-art DNN-based acoustic models. DARPA LORELEI provides a framework for such very-low-resource ASR research, as well as an extrinsic metric for evaluating ASR performance in a humanitarian assistance, disaster relief setting. This paper presents our Kaldi-based systems, which follow a universal phone modeling approach to ASR, and describes recipes for very rapid adaptation of such universal ASR systems. The results we obtain significantly outperform those of many competing approaches on the NIST LoReHLT 2017 evaluation datasets.
Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, and organization. NER serves as the basis for a variety of natural language applications, such as question answering, text summarization, and machine translation. Although early NER systems succeeded in producing decent recognition accuracy, they often required considerable human effort to carefully engineer rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. We then systematically categorize existing works along a taxonomy with three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recently applied deep learning techniques in new NER problem settings and applications. Finally, we present the challenges faced by NER systems and outline future directions in this area.
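The survey's three-axis taxonomy can be pictured as a pipeline of three interchangeable components. The sketch below is purely schematic, with trivial stand-in functions of my own invention rather than real models; it only shows how the axes compose.

```python
from typing import Callable, List

# Three axes of the taxonomy as function types:
# input representation -> context encoder -> tag decoder.
InputRepr = Callable[[List[str]], List[List[float]]]
ContextEncoder = Callable[[List[List[float]]], List[List[float]]]
TagDecoder = Callable[[List[List[float]]], List[str]]

def ner_tagger(embed: InputRepr, encode: ContextEncoder, decode: TagDecoder):
    """Compose one choice per axis into a complete tagger."""
    def tag(tokens: List[str]) -> List[str]:
        return decode(encode(embed(tokens)))
    return tag

# Trivial stand-ins: a 1-d "embedding" marking capitalization, an identity
# encoder, and a threshold decoder. Real systems would plug in e.g.
# word/character embeddings, a BiLSTM or Transformer, and a CRF.
embed = lambda toks: [[1.0 if t[:1].isupper() else 0.0] for t in toks]
encode = lambda vecs: vecs
decode = lambda vecs: ["B-ENT" if v[0] > 0.5 else "O" for v in vecs]

tag = ner_tagger(embed, encode, decode)
print(tag(["Paris", "is", "nice"]))  # ['B-ENT', 'O', 'O']
```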
The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language (words, phrases, and sentences) is an important part of natural language processing (NLP) and is being increasingly employed to improve the performance of several NLP applications. In this article, we attempt to conduct a comprehensive and application-independent survey of data-driven phrasal and sentential paraphrase generation methods, while also conveying an appreciation for the importance and potential use of paraphrases in the field of NLP research. Recent work done in manual and automatic construction of paraphrase corpora is also examined. We also discuss the strategies used for evaluating paraphrase generation techniques and briefly explore some future trends in paraphrase generation.
In this paper we give an overview of the Tri-lingual Entity Discovery and Linking (EDL) task at the Knowledge Base Population (KBP) track at TAC2017, and of the Ten Low Resource Language EDL Pilot. We will summarize several new and effective research directions including multilingual common space construction for cross-lingual knowledge transfer, rapid approaches for silver-standard training data generation and joint entity and word representation. We will also sketch out remaining challenges and future research directions.
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.
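One widely used family of the models surveyed learns an orthogonal map between two pretrained embedding spaces from a seed dictionary (orthogonal Procrustes). A minimal numerical sketch, with toy vectors of my own invention standing in for real aligned word pairs:

```python
import numpy as np

# Rows of X and Y are embeddings of the same translation pairs in the
# source and target spaces (toy values; the target space here is simply
# a 90-degree rotation of the source space).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R = np.array([[0.0, -1.0], [1.0, 0.0]])
Y = X @ R

# Orthogonal Procrustes: with SVD(Y^T X) = U S V^T, the orthogonal W
# minimizing ||X W^T - Y||_F is W = U V^T.
U, _, Vt = np.linalg.svd(Y.T @ X)
W = U @ Vt

mapped = X @ W.T
print(np.allclose(mapped, Y))  # True: the rotation is recovered exactly
```

With noisy, real seed dictionaries the fit is only approximate, and the survey's point is that many seemingly different objectives reduce to variants of this alignment.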
Recent work in NLP has attempted to deal with low-resource languages but still assumed a resource level that is not present for most languages, e.g., the availability of Wikipedia in the target language. We propose a simple method for cross-lingual named entity recognition (NER) that works well in settings with very minimal resources. Our approach makes use of a lexicon to "translate" annotated data available in one or several high-resource language(s) into the target language, and learns a standard monolingual NER model there. Further, when Wikipedia is available in the target language, our method can enhance Wikipedia-based methods to yield state-of-the-art NER results; we evaluate on 7 diverse languages, improving the state-of-the-art by an average of 5.5 F1 points. With the minimal resources required, this is an extremely portable cross-lingual NER approach, as illustrated using a truly low-resource language, Uyghur.
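The lexicon-based "translation" step can be sketched as word-for-word annotation projection: each tagged word of a high-resource sentence is replaced by its lexicon entry while the NER tag is kept. This is a simplification of the paper's method; the lexicon, sentence, and drop-on-miss policy below are toy assumptions (real lexicons are noisy and one-to-many).

```python
# Toy lexicon: English -> German word-for-word entries (illustrative only).
lexicon = {"Berlin": "Berlin", "is": "ist", "big": "gross"}

def project(tagged_sentence):
    """Translate a (word, tag) sequence via the lexicon, keeping the tags."""
    out = []
    for word, tag in tagged_sentence:
        if word in lexicon:        # keep only lexicon-covered words in this sketch
            out.append((lexicon[word], tag))
    return out

src = [("Berlin", "B-LOC"), ("is", "O"), ("big", "O")]
print(project(src))  # [('Berlin', 'B-LOC'), ('ist', 'O'), ('gross', 'O')]
```

A monolingual NER model is then trained on the projected silver-standard data as if it were native target-language annotation.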
This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.
Out-of-vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, especially for low-resource languages (LRLs). This paper adapts a variant of the seq2seq model to perform the transduction of such words from Hindi to Bhojpuri (an instance of an LRL), learning from a set of cognate pairs built from a Hindi-Bhojpuri word-level bilingual lexicon. We demonstrate that our model can be effectively used for languages with only a limited amount of parallel corpora: by working at the character level, it captures the phonetic and orthographic similarities across multiple types of word adaptations, whether synchronic or diachronic, loanwords or cognates. We provide a comprehensive overview of the training aspects of character-level NMT systems adapted to this task, combined with a detailed analysis of their respective error cases. Using our method, we achieve an improvement of more than 6 BLEU on the Hindi-to-Bhojpuri translation task. Further, we show that such transductions generalize well to other languages by successfully applying them to Hindi-Bangla cognate pairs. Our work can be seen as an important step in the process of (i) resolving the OOV-word problem arising in MT tasks, (ii) creating effective parallel corpora for resource-constrained languages, and (iii) transferring the enhanced semantic knowledge captured by word-level embeddings onto character-level tasks.
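Building the training set of cognate pairs from a bilingual lexicon can be sketched as filtering entries by character-level surface similarity. This is not the paper's exact extraction procedure: the Latin-script toy entries below stand in for Devanagari Hindi-Bhojpuri pairs, and the 0.5 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Toy lexicon entries (source word, target word); illustrative only.
lexicon = [("ghar", "ghar"), ("ladka", "laika"), ("computer", "sanganak")]

def cognate_pairs(entries, threshold=0.5):
    """Keep pairs whose character-level similarity ratio clears the threshold."""
    return [(s, t) for s, t in entries
            if SequenceMatcher(None, s, t).ratio() >= threshold]

print(cognate_pairs(lexicon))  # [('ghar', 'ghar'), ('ladka', 'laika')]
```

Pairs that survive the filter serve as character-level source/target sequences for training the seq2seq transduction model; dissimilar entries (like the non-cognate third pair) are excluded.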