Multi-lingual language models (LM), such as mBERT, XLM-R, mT5, mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model separately and independently of one another, and the model appears to implicitly learn cross-lingual connections. This raises several questions that motivate our study, such as: Are the cross-lingual connections between every language pair equally strong? What properties of source and target language impact the strength of cross-lingual transfer? Can we quantify the impact of those properties on the cross-lingual transfer? In our investigation, we analyze a pre-trained mT5 to discover the attributes of cross-lingual connections learned by the model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret cross-lingual understanding of the mT5 model. Through these observations, one can favorably choose the best source language for a task, and can anticipate its training data demands. A key finding of this work is that similarity of syntax, morphology and phonology are good predictors of cross-lingual transfer, significantly more than just the lexical similarity of languages. For a given language, we are able to predict zero-shot performance, that increases on a logarithmic scale with the number of few-shot target language data points.
translated by 谷歌翻译
Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We release the benchmark 1 to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.
translated by 谷歌翻译
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The transfer is more successful between some languages, but it is not well understood what leads to this variation and whether it fairly reflects difference between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages. We demonstrate that the distance between the distributions of different languages is highly consistent with the syntactic difference in terms of linguistic formalisms. Such difference learnt via self-supervision plays a crucial role in the zero-shot transfer performance and can be predicted by variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity and provide insights into the mechanism of cross-lingual transfer.
translated by 谷歌翻译
虽然最近关于多语种语言模型的工作已经证明了他们对下游任务的交叉零射击传输的能力,但社区缺乏符合语言之间的共享属性,可以实现这种转移。涉及成对的自然语言的分析通常是不确定的,并且矛盾以来,许多语言方面同时不同。在本文中,我们进行大规模的实证研究,通过测量四种不同的自然语言和通过修改脚本,单词顺序和语法等方面构造的零拍摄传递来隔离各种语言特性的影响。在其他事情之外,我们的实验表明,当语言的单词顺序不同时,缺乏子字重叠显着影响零拍摄传输,并且在语言之间的传输性能和Word嵌入对准之间存在强烈相关性(例如,r = 0.94关于NLI的任务)。我们的结果呼吁专注于在明确改进语言之间的嵌入对齐而不是依赖于隐含的出现。
translated by 谷歌翻译
多语言语言模型(\ mllms),如mbert,xlm,xlm-r,\ textit {etc。}已成为一种可行的选择,使预先估计到大量语言的力量。鉴于他们的成功在零射击转移学习中,在(i)建立更大的\ mllms〜覆盖了大量语言(ii)创建覆盖更广泛的任务和语言来评估的详尽工作基准mllms〜(iii)分析单音零点,零拍摄交叉和双语任务(iv)对Monolingual的性能,了解\ mllms〜(v)增强(通常)学习的通用语言模式(如果有的话)有限的容量\ mllms〜以提高他们在已见甚至看不见语言的表现。在这项调查中,我们审查了现有的文学,涵盖了上述与\ MLLMS有关的广泛研究领域。根据我们的调查,我们建议您有一些未来的研究方向。
translated by 谷歌翻译
与辅助语言的元学习已经表明了对交叉语言自然语言处理的有希望的改进。然而,以前的研究采样使用相同语言的元培训和元测试数据,这限制了模型交叉传输的能力。在本文中,我们提出了XLA-MAML,在元学习阶段执行直接交叉调整。我们对自然语言推理和问题进行零射击和几次拍摄实验。实验结果表明了我们在不同语言,任务和预磨料模型中的方法的有效性。我们还对元学习的各种交叉特定设置进行了分析,包括采样策略和并行性。
translated by 谷歌翻译
In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2019) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.
translated by 谷歌翻译
多语种预训练模型在许多多语言NLP任务中展示了它们的有效性,并使从高资源语言到低资源的零射击或几秒钟传输。然而,由于某种语言之间的显着的类型差异和矛盾,这些模型通常在许多语言和交叉语言设置上表现不佳,这表明了学习单一模型同时处理大规模不同语言的难度。为了减轻这个问题,我们提出了一个新的多语言预训练管道。我们建议从多语言预先训练的模型产生语言表示,并进行语言分析,以表明语言表示相似度反映了从多个角度来看的语言相似度,包括语言家庭,地理蓝星,词汇表演和语法。然后,我们将所有目标语言集成到多个组中,并将每个组名称为表示SprachBund。因此,在同一表示SprachBund中的语言应该在培训和微调中互相提升,因为它们共享丰富的语言相似性。我们预先列车为每个代表斯普拉克班达一个多语言模型。实验在交叉基准上进行,与强基线相比,实现了显着的改进。
translated by 谷歌翻译
几乎没有射击转移通常显示出比零射击转移〜\ c​​ite {lauscher2020zero}的大幅增长,这实际上是对基于多语言的基于模型的系统的完全监督和无监督的学习方法之间的一个有用的权衡。本文探讨了选择注释数据的各种策略,这些策略可能会导致更好的几杆转移。所提出的方法依赖于多种措施,例如使用$ n $ gram语言模型,预测性熵和梯度嵌入。我们提出了一种用于序列标记任务的损失嵌入方法,该方法诱导了类似于梯度嵌入的多样性和不确定性采样。评估了建议的数据选择策略,并将最多20种语言的POS标签,NER和NLI任务进行比较。我们的实验表明,基于梯度和损失嵌入的策略始终优于随机数据选择基线,并且随着零拍传输的初始性能而变化。此外,即使模型使用原始特定任务特定标记的训练数据进行了零拍传输的较低比例,提出的方法也显示出改进的相似趋势。
translated by 谷歌翻译
一种有效的横向传输方法是在一种语言中微调在监督数据集上的双语或多语言模型,并以零拍方式在另一种语言上进行评估。在培训时间或推理时间翻译例子也是可行的替代方案。然而,存在与文献中很少有关的这些方法相关的成本。在这项工作中,我们在其有效性(例如,准确性),开发和部署成本方面分析交叉语言方法,以及推理时间的延迟。我们的三个任务的实验表明最好的交叉方法是高度任务依赖性的。最后,通过结合零射和翻译方法,我们在这项工作中使用的三个数据集中实现了最先进的。基于这些结果,我们对目标语言手动标记的培训数据有所了解。代码和翻译的数据集可在https://github.com/unicamp-dl/cross-lingsual-analysis上获得
translated by 谷歌翻译
随着预培训的语言模型变得更加要求资源,因此资源丰富的语言(例如英语和资源筛选)语言之间的不平等正在恶化。这可以归因于以下事实:每种语言中的可用培训数据量都遵循幂律分布,并且大多数语言都属于分布的长尾巴。一些研究领域试图缓解这个问题。例如,在跨语言转移学习和多语言培训中,目标是通过从资源丰富的语言中获得的知识使长尾语言受益。尽管成功,但现有工作主要集中于尝试尽可能多的语言。结果,有针对性的深入分析主要不存在。在这项研究中,我们专注于单一的低资源语言,并使用跨语性培训(XPT)进行广泛的评估和探测实验。为了使转移方案具有挑战性,我们选择韩语作为目标语言,因为它是一种孤立的语言,因此与英语几乎没有类型的分类。结果表明,XPT不仅优于表现或与单语模型相当,该模型训练有大小的数据,而且在传输过程中也很高。
translated by 谷歌翻译
在过去几年中,已经提出了多语言预训练的语言模型(PLMS)的激增,以实现许多交叉曲线下游任务的最先进的性能。但是,了解为什么多语言PLMS表现良好仍然是一个开放域。例如,目前尚不清楚多语言PLM是否揭示了不同语言的一致令牌归因。要解决此问题,请在本文中提出了令牌归因(CCTA)评估框架的交叉致新一致性。三个下游任务中的广泛实验表明,多语言PLMS为多语素同义词分配了显着不同的归因。此外,我们有以下观察结果:1)当它用于培训PLMS时,西班牙语在不同语言中实现了最常见的令牌归属;2)令牌归属的一致性与下游任务中的性能强烈相关。
translated by 谷歌翻译
GPT-3等大型自回归语言模型是几秒钟的学习者,可以在没有微调的情况下执行各种语言任务。虽然已知这些模型能够共同代表许多不同的语言,但他们的培训数据由英语主导,可能限制了它们的交叉概括。在这项工作中,我们在覆盖多种语言的平衡语料库上培训多语言自回归语言模型,并在广泛的任务中研究他们几乎没有零点的学习能力。我们最大的模型,具有75亿参数,在20多种代表语言中,在几种代表语言中,在几种代表性语言中,在几种代表性语言中,在多语言型号推理中表现出可比大小的GPT-3(在0次设置和0次拍摄设置中的绝对精度改善+ 7.4% 4-拍摄设置中的9.4%)和自然语言推理(每次拍摄和4次设置中的每一个+ 5.4%)。在Flores-101机器翻译基准测试中,我们的模型优于GPT-3在182个翻译方向上有32个培训例子,同时超过45个方向的官方监督基线。我们介绍了模型成功和失败的位置的详细分析,特别是它尤其显示在某些任务中实现交叉语境的内容学习,而仍然存在改善表面的鲁棒性和适应没有a的任务的余地自然冻结形式。最后,我们评估我们在仇恨语音检测中以五种语言的仇恨语音检测的模型,并发现它具有与可比大小的GPT-3模型类似的限制。
translated by 谷歌翻译
虽然对多语言视觉语言预测的模型实现了一些好处,但是当将多句预训练的视力语言模型应用于非英语数据时,各种任务和语言的最新基准测试表明,跨语性概括不佳,并且在有监督之间存在很大的差距( )英语表现和(零射)跨语性转移。在这项工作中,我们探讨了这些模型在零拍的跨语性视觉响应(VQA)任务上的糟糕性能,其中模型在英语视觉问题数据上进行了微调,并对7种类型上多样的语言进行了评估。我们通过三种策略改善了跨语性转移:(1)我们引入了语言的先验目标,以增加基于相似性损失以指导模型在培训期间的跨渗透损失,(2)我们学习了一个特定于任务的子网络,改善跨语性概括并减少不修改模型的方差,(3)我们使用合成代码混合来扩大培训示例,以促进源和目标语言之间的嵌入。我们使用预审计的多语言多模式变压器UC2和M3P进行的XGQA实验证明了针对7种语言提出的微调策略的一致有效性,以稀疏模型优于现有的转移方法。复制我们发现的代码和数据已公开可用。
translated by 谷歌翻译
Universal cross-lingual sentence embeddings map semantically similar cross-lingual sentences into a shared embedding space. Aligning cross-lingual sentence embeddings usually requires supervised cross-lingual parallel sentences. In this work, we propose mSimCSE, which extends SimCSE to multilingual settings and reveal that contrastive learning on English data can surprisingly learn high-quality universal cross-lingual sentence embeddings without any parallel data. In unsupervised and weakly supervised settings, mSimCSE significantly improves previous sentence embedding methods on cross-lingual retrieval and multilingual STS tasks. The performance of unsupervised mSimCSE is comparable to fully supervised methods in retrieving low-resource languages and multilingual STS. The performance can be further enhanced when cross-lingual NLI data is available. Our code is publicly available at https://github.com/yaushian/mSimCSE.
translated by 谷歌翻译
深层语言语言模型(LMS)如Elmo,BERT及其继任者通过预先训练单个模型来迅速缩放自然语言处理的景观,然后是任务特定的微调。此外,像XLM-R和MBERT这样的这种模型的多语言版本使得有希望的零射击交叉传输导致,可能在许多不足和资源不足的语言中实现NLP应用。由于此初步成功,预先接受的模型被用作“通用语言模型”作为不同任务,域和语言的起点。这项工作通过识别通用模型应该能够扩展的七个维度来探讨“普遍性”的概念,即同样良好或相当良好地执行,在不同的环境中有用。我们概述了当前支持这些维度的模型性能的当前理论和经验结果,以及可能有助于解决其当前限制的扩展。通过这项调查,我们为理解大规模上下文语言模型的能力和限制奠定了基础,并帮助辨别研究差距和未来工作的方向,使这些LMS包含多样化和公平的应用,用户和语言现象。
translated by 谷歌翻译
对于许多任务,基于变压器的体系结构已经实现了最新的结果,从而导致实践从使用特定于任务的架构到预先训练的语言模型的微调。持续的趋势包括具有越来越多的数据和参数的培训模型,这需要大量资源。它导致了强有力的搜索,以提高基于仅针对英语评估的算法和硬件改进的算法和硬件改进。这引发了有关其可用性的疑问,当应用于小规模的学习问题时,对于资源不足的语言任务,有限的培训数据可用。缺乏适当尺寸的语料库是应用数据驱动和转移学习的方法的障碍。在本文中,我们建立了致力于基于变压器模型的可用性的最新努力,并建议评估这些改进的法语表现,而法语的效果很少。我们通过通过数据增强,超参数优化和跨语性转移来调查各种培训策略来解决与数据稀缺有关的不稳定。我们还为法国弗拉伯特(Fralbert)引入了一种新的紧凑型模型,该模型在低资源环境中被证明具有竞争力。
translated by 谷歌翻译
基于变压器的架构在许多下游流动任务中显示出显着的结果,包括问题应答。另一方面,数据的可用性阻碍了获得低资源语言的合法性能。在本文中,我们调查了预先训练的多语言模型的适用性,以提高低资源语言的问题的表现。我们使用与MLQA DataSet类似的七种语言进行多语言变压器架构测试了四种语言和任务适配器的组合。此外,我们还提出了使用语言和任务适配器回答的低资源问题的零拍摄转移学习。我们观察到堆叠语言和任务适配器对低资源语言的微语文变压器模型的性能显着提高。
translated by 谷歌翻译
Translating training data into many languages has emerged as a practical solution for improving cross-lingual transfer. For tasks that involve span-level annotations, such as information extraction or question answering, an additional label projection step is required to map annotated spans onto the translated texts. Recently, a few efforts have utilized a simple mark-then-translate method to jointly perform translation and projection by inserting special markers around the labeled spans in the original sentence. However, as far as we are aware, no empirical analysis has been conducted on how this approach compares to traditional annotation projection based on word alignment. In this paper, we present an extensive empirical study across 42 languages and three tasks (QA, NER, and Event Extraction) to evaluate the effectiveness and limitations of both methods, filling an important gap in the literature. Experimental results show that our optimized version of mark-then-translate, which we call EasyProject, is easily applied to many languages and works surprisingly well, outperforming the more complex word alignment-based methods. We analyze several key factors that affect end-task performance, and show EasyProject works well because it can accurately preserve label span boundaries after translation. We will publicly release all our code and data.
translated by 谷歌翻译
Pre-trained multilingual language models show significant performance gains for zero-shot cross-lingual model transfer on a wide range of natural language understanding (NLU) tasks. Previously, for zero-shot cross-lingual evaluation, pre-trained models are only fine-tuned on English data and tested on a variety of target languages. In this paper, we do cross-lingual evaluation on various NLU tasks (sentence classification, sequence labeling, question answering) using prompt-tuning and compare it with fine-tuning. The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets, with only 0.1% to 0.3% tuned parameters. Additionally, we demonstrate through the analysis that prompt tuning can have better cross-lingual transferability of representations on downstream tasks with better aligned decision boundaries.
translated by 谷歌翻译