We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, allowing it to learn more effectively from limited annotations. Our parser's performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
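As a rough illustration of the input representation described in this abstract, the following sketch concatenates a multilingual word embedding, a token-level language embedding, and a fine-grained POS embedding into one token vector. The class name, dimensions, and vocabulary sizes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultilingualTokenEncoder(nn.Module):
    """Hypothetical sketch: builds a parser input vector from
    (i) a multilingual word embedding, (ii) a token-level language
    embedding, and (iii) a language-specific fine-grained POS embedding."""

    def __init__(self, vocab_size, n_langs, n_pos,
                 word_dim=100, lang_dim=8, pos_dim=20):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)  # shared multilingual space
        self.lang_emb = nn.Embedding(n_langs, lang_dim)     # token-level language id
        self.pos_emb = nn.Embedding(n_pos, pos_dim)         # fine-grained POS tags

    def forward(self, word_ids, lang_ids, pos_ids):
        # Concatenate the three views into one token representation
        # that the parser's sentence encoder can consume.
        return torch.cat([self.word_emb(word_ids),
                          self.lang_emb(lang_ids),
                          self.pos_emb(pos_ids)], dim=-1)
```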
For languages with no annotated resources, the transfer of natural language processing models, such as named entity recognition (NER), from resource-rich languages would be an appealing capability. However, differences in words and word order across languages make this a challenging problem. To improve the mapping of lexical items across languages, we propose a translation method based on bilingual word embeddings. To improve robustness to word-order differences, we propose using self-attention, which allows for sufficient flexibility with respect to word order. We show that these methods achieve state-of-the-art or competitive NER performance on commonly tested languages in a cross-lingual setting, with much lower resource requirements than past approaches. We also evaluate the challenges of applying these methods to Uyghur, a low-resource language.
We present an unsupervised method for obtaining cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call a multilingual neural language model, takes sentences in multiple languages as input. It contains bidirectional LSTMs that perform forward and backward language modeling, and these networks are shared across all languages. The remaining parameters, namely the word embeddings and the linear transformation between hidden states and outputs, are specific to each language. The shared LSTMs can capture sentence structure common to all languages. Therefore, the word embeddings of each language are mapped into a common latent space, making it possible to measure the similarity of words across multiple languages. We evaluate the quality of the cross-lingual embeddings on a word alignment task. Our experiments demonstrate that our model can obtain higher-quality cross-lingual embeddings than existing supervised models when only a small amount of monolingual data (i.e., 50k sentences) is available, or when the domains of the monolingual data differ across languages.
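A minimal sketch of the parameter-sharing scheme described above: the forward and backward LSTMs are shared across languages, while the word embeddings and output projections are per-language. Class and variable names, layer sizes, and the single-layer LSTM choice are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class MultilingualNeuralLM(nn.Module):
    """Illustrative sketch: shared forward/backward LSTMs force all
    languages into one latent space; embeddings and softmax layers
    stay language-specific."""

    def __init__(self, vocab_sizes, emb_dim=300, hid_dim=300):
        super().__init__()
        self.embs = nn.ModuleList([nn.Embedding(v, emb_dim) for v in vocab_sizes])
        self.outs = nn.ModuleList([nn.Linear(hid_dim, v) for v in vocab_sizes])
        # One shared forward LM and one shared backward LM.
        self.fwd_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.bwd_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, token_ids, lang):
        x = self.embs[lang](token_ids)                      # language-specific lookup
        h_fwd, _ = self.fwd_lstm(x)                         # shared parameters
        h_bwd, _ = self.bwd_lstm(torch.flip(x, dims=[1]))
        h_bwd = torch.flip(h_bwd, dims=[1])
        # Language-specific output layers predict next/previous words.
        return self.outs[lang](h_fwd), self.outs[lang](h_bwd)
```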
How do we parse languages for which no treebanks are available? This contribution addresses the cross-lingual viewpoint on statistical dependency parsing, in which we attempt to make use of resource-rich source-language treebanks to build and adapt models for under-resourced target languages. We outline the benefits and indicate the drawbacks of the current major approaches. We emphasize synthetic treebanking: the automatic creation of target-language treebanks by means of annotation projection and machine translation. We present competitive results in cross-lingual dependency parsing using a combination of various techniques that contribute to the overall success of the method. We further include a detailed discussion of the impact of part-of-speech label accuracy on parsing results, which provides guidance for practical applications of cross-lingual methods to truly under-resourced languages.
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.
Leveraging zero-shot learning to learn mapping functions between vector spaces of different languages is a promising approach to bilingual dictionary induction. However, methods using this approach have not yet achieved high accuracy on the task. In this paper, we propose a bridging approach whose main contribution is a knowledge distillation training objective: rich-resource translation paths serve as teachers, and translation paths involving low-resource languages learn from them. Our training objective allows seamless addition of teacher translation paths for any given low-resource pair. Since our approach relies on the quality of monolingual word embeddings, we also propose to enhance the vector representations of both the source and target languages with linguistic information. Our experiments on various languages show large performance gains from our distillation training objective, obtaining accuracy improvements of up to 17%.
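One way to picture a teacher-student objective of this kind is sketched below: a direct "student" mapping for a low-resource pair is trained to mimic a composed "teacher" path through rich-resource mappings. The pivot structure, loss choice, and all names here are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn as nn

dim = 300
W_student = nn.Linear(dim, dim, bias=False)      # direct source -> low-resource target map
W_src2pivot = nn.Linear(dim, dim, bias=False)    # pretrained rich-resource teacher pieces
W_pivot2tgt = nn.Linear(dim, dim, bias=False)
for p in list(W_src2pivot.parameters()) + list(W_pivot2tgt.parameters()):
    p.requires_grad = False                      # teachers stay frozen

def distillation_loss(x_src):
    # Teacher: translate through the rich-resource pivot path.
    teacher = W_pivot2tgt(W_src2pivot(x_src))
    # Student: translate directly; learn to match the teacher's output.
    student = W_student(x_src)
    return nn.functional.mse_loss(student, teacher)
```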
Cross-lingual embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be learned effectively by aligning two disjoint monolingual vector spaces through a linear transformation, using no more than a small bilingual dictionary. In this work, we propose applying an additional transformation after the initial alignment step, which moves cross-lingual synonyms toward the midpoint between them. By applying this transformation, we aim to obtain a better cross-lingual integration of the vector spaces. Moreover, and surprisingly, the monolingual spaces are also improved by this transformation. This is in contrast to the original alignment, which is typically learned so that the structure of the monolingual spaces is preserved. Our experiments confirm that the resulting cross-lingual embeddings outperform state-of-the-art models on both monolingual and cross-lingual evaluation tasks.
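A minimal sketch of the post-alignment idea, assuming the two spaces are already linearly aligned and a small translation dictionary is available; the function name, the interpolation parameter, and averaging exactly to the midpoint are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def average_synonyms(X_src, X_tgt, dictionary, alpha=1.0):
    """Move each dictionary translation pair toward its midpoint.

    X_src, X_tgt : aligned embedding matrices (vocab x dim), numpy arrays
    dictionary   : list of (src_index, tgt_index) translation pairs
    alpha        : 1.0 places both words exactly at the midpoint
    """
    X_src, X_tgt = X_src.copy(), X_tgt.copy()
    for i, j in dictionary:
        mid = (X_src[i] + X_tgt[j]) / 2.0
        X_src[i] = (1 - alpha) * X_src[i] + alpha * mid
        X_tgt[j] = (1 - alpha) * X_tgt[j] + alpha * mid
    return X_src, X_tgt
```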
We present a novel, count-based approach to obtaining interlingual word representations based on inverted indexing of Wikipedia. We present experiments applying these representations to 17 datasets in document classification, POS tagging, dependency parsing, and word alignment. Our approach has the advantage that it is simple, computationally efficient, and almost parameter-free, and, more importantly, it enables multi-source cross-lingual learning. In 14 of 17 cases, we improve over using state-of-the-art bilingual embeddings.
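A toy sketch of the inverted-indexing idea as I read it: each word is represented by counts over the language-linked Wikipedia concepts whose articles contain it, so words from different languages that co-occur with the same concepts get comparable sparse vectors. The data layout and function name are assumptions, not the authors' code.

```python
from collections import defaultdict

def inverted_index_vectors(articles):
    """articles: iterable of (concept_id, language, tokenized_text)
    returns:  {(language, word): {concept_id: count}}"""
    vectors = defaultdict(lambda: defaultdict(int))
    for concept_id, lang, tokens in articles:
        for tok in tokens:
            # The concept (interlingually linked article) acts as a
            # shared dimension across all languages.
            vectors[(lang, tok)][concept_id] += 1
    return vectors

# Words sharing many concepts across languages end up with similar sparse
# representations; similarity can then be computed with e.g. cosine.
```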
Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two methods for learning cross-lingual language models: an unsupervised one that relies only on monolingual data, and a supervised one that leverages parallel data with a new cross-lingual language model objective. We obtain state-of-the-art results on cross-lingual classification, and on unsupervised and supervised machine translation. On XNLI, our approach pushes the state of the art by an absolute gain of 4.9% accuracy. On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised machine translation, we obtain a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming the previous best approach by more than 4 BLEU. Our code and pretrained models will be made publicly available.
This paper investigates the problem of cross-lingual dependency parsing, aiming to induce dependency parsers for low-resource languages while using only training data from a resource-rich language (e.g. English). Existing approaches typically do not include lexical features, which are not transferable across languages. In this paper, we bridge the lexical feature gap by using distributed feature representations and their composition. We provide two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space. Consequently, both lexical and non-lexical features can be used in our model for cross-lingual transfer. Furthermore, our framework is able to incorporate additional useful features such as cross-lingual word clusters. Our combined contributions achieve an average relative error reduction of 10.9% in labeled attachment score compared with the delexicalized parser, trained on the English universal treebank and transferred to three other languages. It also significantly outperforms McDonald et al. (2013) augmented with projected cluster features on identical data.
State-of-the-art methods for learning cross-lingual word embeddings rely on bilingual dictionaries or parallel corpora. Recent studies have shown that the need for parallel data supervision can be alleviated with character-level information. While these methods show encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on the task for some language pairs. Our experiments demonstrate that our method also works for distant language pairs, such as English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, for which only a limited amount of parallel data exists, to show the potential impact of our method on fully unsupervised machine translation. Our code, embeddings, and dictionaries are publicly available.
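Approaches in this family typically refine the alignment with an orthogonal Procrustes step once a seed dictionary has been induced. The sketch below shows that standard refinement step under the assumption that row-aligned matrices of dictionary-pair embeddings are available; it is a textbook recipe, not this paper's released code.

```python
import numpy as np

def procrustes(X_src, X_tgt):
    """Orthogonal Procrustes: find the orthogonal W minimizing
    ||X_src @ W - X_tgt||_F, where rows of X_src and X_tgt are the
    embeddings of matched dictionary pairs."""
    M = X_src.T @ X_tgt
    U, _, Vt = np.linalg.svd(M)
    W = U @ Vt
    return W  # map a source row vector x into the target space as x @ W
```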
This paper proposes to learn language-independent word representations to address cross-lingual dependency parsing, which aims to predict dependency parse trees for sentences in the target language by training a dependency parser with labeled sentences from a source language. We first combine all sentences from both languages to induce real-valued distributed representations of words under a deep neural network architecture, which is expected to capture semantic similarities of words not only within the same language but also across different languages. We then use the induced interlingual word representations as augmenting features to train a delexicalized dependency parser on labeled sentences in the source language and apply it to the target sentences. To investigate the effectiveness of the proposed technique, extensive experiments are conducted on cross-lingual dependency parsing tasks across nine languages. The experimental results demonstrate the superior cross-lingual generalizability of the word representations induced by the proposed approach compared to alternative methods.
Even for common NLP tasks, sufficient supervision is not available in many languages; morphological tagging is no exception. In the work presented here, we explore a transfer learning scheme whereby we train character-level recurrent neural taggers to predict morphological taggings for high-resource and low-resource languages together. Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30%.
Cross-lingual transfer is a major means by which high-resource languages can help low-resource languages. In this paper, we study cross-lingual transfer for dependency parsing. We hypothesize that encoders based on recurrent neural networks (RNNs), because they explicitly encode surface word order, transfer poorly from English to distant languages, whereas self-attention models are more flexible in modeling word-order information and should therefore be more robust in the cross-lingual transfer setting. We test our hypothesis by training dependency parsers only on English and evaluating them on 31 other languages. Through a detailed analysis, we find interesting patterns showing that RNN-based architectures transfer well to languages that are close to English, while self-attention models have better overall cross-lingual transferability across a wide range of languages.
We present a novel approach for inducing unsupervised dependency parsers for languages that have no labeled training data, but have translated text in a resource-rich language. We train probabilistic parsing models for resource-poor languages by transferring cross-lingual knowledge from a resource-rich language with entropy regularization. Our method can be used as a purely monolingual dependency parser, requiring no human translations for the test data, thus making it applicable to a wide range of resource-poor languages. We perform experiments on three data sets, versions 1.0 and 2.0 of the Google Universal Dependency Treebanks and treebanks from the CoNLL shared tasks, across ten languages. We obtain state-of-the-art performance on all three data sets when compared with previously studied unsupervised and projected parsing systems.
Multilingual word embeddings (MWEs) represent words from multiple languages in a single distributional vector space. Unsupervised MWE (UMWE) methods acquire multilingual embeddings without cross-lingual supervision, which is a significant advantage over traditional supervised approaches and opens many new possibilities for low-resource languages. However, prior techniques for learning UMWEs rely solely on a number of independently trained unsupervised bilingual word embeddings (UBWEs) to obtain multilingual embeddings. These methods fail to fully leverage the interdependencies that exist among many languages. To address this shortcoming, we propose a fully unsupervised framework for learning MWEs that directly exploits the relations between all language pairs. Our model substantially outperforms previous approaches in experiments on multilingual word translation and cross-lingual word similarity. In addition, our model even beats supervised approaches trained with cross-lingual resources.
Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling the transfer of NLP tools. However, previous attempts had expensive resource requirements, had difficulty incorporating monolingual data, or were unable to handle polysemy. We address these drawbacks in our method, which takes advantage of a high-coverage dictionary in an EM-style training algorithm over monolingual corpora in two languages. Our model achieves state-of-the-art performance on the bilingual lexicon induction task, exceeding models that use large bilingual corpora, and competitive results on monolingual word similarity and cross-lingual document classification tasks.
We propose a novel neural network model for joint part-of-speech (POS) tagging and dependency parsing. Our model extends the well-known BIST graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating a BiLSTM-based tagging component to produce automatically predicted POS tags for the parser. On the benchmark English Penn treebank, our model obtains UAS and LAS scores of 94.51% and 92.87%, respectively, producing a 1.5+% absolute improvement over the BIST graph-based parser, and also achieving a state-of-the-art POS tagging accuracy of 97.97%. Furthermore, experimental results on parsing 61 "big" Universal Dependencies treebanks from raw text show that our model outperforms the baseline UDPipe (Straka and Straková, 2017) with a 0.8% higher average POS tagging score and a 3.6% higher average LAS score. In addition, with our model, we also obtain state-of-the-art downstream task scores for biomedical event extraction and opinion analysis applications. Our code is available, together with all pretrained models, at: https://github.com/datquocnguyen/jPTDP
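A minimal sketch of the joint architecture described above: a BiLSTM tagging component predicts POS tags, and the predicted tag's embedding is concatenated with the word embedding before being passed to a graph-based parsing component. Layer sizes, names, and the use of argmax-decoded tags are illustrative assumptions, not the released jPTDP code.

```python
import torch
import torch.nn as nn

class JointTaggerParserInput(nn.Module):
    """Tagging component whose predicted POS tags feed the parser input."""

    def __init__(self, vocab, n_tags, w_dim=100, t_dim=32, hid=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, w_dim)
        self.tag_bilstm = nn.LSTM(w_dim, hid, bidirectional=True, batch_first=True)
        self.tag_out = nn.Linear(2 * hid, n_tags)
        self.tag_emb = nn.Embedding(n_tags, t_dim)

    def forward(self, word_ids):
        w = self.word_emb(word_ids)
        h, _ = self.tag_bilstm(w)
        tag_logits = self.tag_out(h)            # trained with a tagging loss
        pred_tags = tag_logits.argmax(dim=-1)   # automatically predicted POS tags
        # The graph-based parser encoder consumes word + predicted-tag embeddings.
        parser_input = torch.cat([w, self.tag_emb(pred_tags)], dim=-1)
        return tag_logits, parser_input
```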
Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches for inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans four different tasks, including intrinsic evaluation on monolingual and cross-lingual similarity, and extrinsic evaluation on downstream semantic and syntactic applications. We show that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks.