Low-resource languages, such as the Baltic languages, benefit from large multilingual language models (LMs) with remarkable cross-lingual transfer capabilities. This work is an interpretation and analysis study of the cross-lingual representations of multilingual LMs. Previous work hypothesized that these LMs internally project representations of different languages into a shared cross-lingual space. However, the literature yields contradictory results. In this paper, we revisit prior work claiming that "BERT is not an interlingua" and show that, under an alternative pooling strategy or similarity index, different languages do converge to a shared space in such language models. We then conduct a cross-lingual representation analysis of the two most popular multilingual LMs using 378 pairwise language comparisons. We find that while most languages share a joint cross-lingual space, some do not. Nevertheless, we observe that the Baltic languages do belong to the shared space.
Related works used indexes like CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models. In this paper, we argue that assumptions of CKA/CCA align poorly with one of the motivating goals of cross-lingual learning analysis, i.e., explaining zero-shot cross-lingual transfer. We highlight what valuable aspects of cross-lingual similarity these indexes fail to capture and provide a motivating case study \textit{demonstrating the problem empirically}. Then, we introduce \textit{Average Neuron-Wise Correlation (ANC)} as a straightforward alternative that is exempt from the difficulties of CKA/CCA and is good specifically in a cross-lingual context. Finally, we use ANC to construct evidence that the previously introduced ``first align, then predict'' pattern takes place not only in masked language models (MLMs) but also in multilingual models with \textit{causal language modeling} objectives (CLMs). Moreover, we show that the pattern extends to the \textit{scaled versions} of the MLMs and CLMs (up to 85x original mBERT).\footnote{Our code is publicly available at \url{https://github.com/TartuNLP/xsim}}
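The Average Neuron-Wise Correlation index described above is simple enough to sketch directly. A minimal version, assuming ANC is the mean of per-neuron Pearson correlations computed over hidden states of parallel sentences in two languages (the exact centering and use of absolute values are assumptions of this sketch, not necessarily the paper's choices):

```python
import numpy as np

def average_neuron_wise_correlation(x, y):
    """ANC sketch: x and y are (n_sentences, n_neurons) hidden states for
    parallel sentences in two languages. Correlate each neuron's activations
    across the parallel corpus, then average over neurons."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    num = (x * y).sum(axis=0)
    den = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum(axis=0))
    per_neuron = num / den          # Pearson correlation per neuron
    return float(np.abs(per_neuron).mean())
```

Unlike CKA/CCA, this score has no rotation invariance: it rewards neurons that individually behave the same way across languages.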
Universal cross-lingual sentence embeddings map semantically similar cross-lingual sentences into a shared embedding space. Aligning cross-lingual sentence embeddings usually requires supervised cross-lingual parallel sentences. In this work, we propose mSimCSE, which extends SimCSE to multilingual settings and reveals that contrastive learning on English data can surprisingly learn high-quality universal cross-lingual sentence embeddings without any parallel data. In unsupervised and weakly supervised settings, mSimCSE significantly improves previous sentence embedding methods on cross-lingual retrieval and multilingual STS tasks. The performance of unsupervised mSimCSE is comparable to fully supervised methods in retrieving low-resource languages and multilingual STS. The performance can be further enhanced when cross-lingual NLI data is available. Our code is publicly available at https://github.com/yaushian/mSimCSE.
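The contrastive objective underlying SimCSE-style training is an in-batch InfoNCE loss: each anchor's positive is the same sentence re-encoded (e.g., under a different dropout mask), while the other sentences in the batch serve as negatives. A minimal sketch, with cosine similarity and a temperature (the default temperature here is an assumption):

```python
import numpy as np

def simcse_loss(anchors, positives, temperature=0.05):
    """In-batch InfoNCE sketch: row i of `positives` is the positive for
    row i of `anchors`; every other row acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = (a @ p.T) / temperature              # (batch, batch) similarities
    sim = sim - sim.max(axis=1, keepdims=True) # numerical stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

The surprising finding above is that minimizing this loss on English alone already aligns the other languages' sentence spaces.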
Some Transformer-based models can perform cross-lingual transfer learning: such models can be trained on a specific task in one language and give relatively good results on the same task in another language, despite having been pre-trained on monolingual tasks only. However, there is currently no consensus on whether these Transformer-based models learn universal patterns across languages. We propose a word-level, task-agnostic method for evaluating the alignment of the contextualized representations built by such models. We show that our method provides more accurate translated word pairs than previous methods for evaluating word-level alignment. Our results show that some inner layers of multilingual Transformer-based models outperform other explicitly aligned representations, and even more so according to a stricter definition of multilingual alignment.
In the past few years, a surge of multilingual pre-trained language models (PLMs) has been proposed to achieve state-of-the-art performance on many cross-lingual downstream tasks. However, understanding why multilingual PLMs perform well remains an open question. For example, it is unclear whether multilingual PLMs reveal consistent token attributions across different languages. To address this, in this paper we propose a Cross-lingual Consistency of Token Attributions (CCTA) evaluation framework. Extensive experiments on three downstream tasks demonstrate that multilingual PLMs assign significantly different attributions to multilingual synonyms. Moreover, we make the following observations: 1) Spanish achieves the most consistent token attributions across languages when used for training PLMs; 2) the consistency of token attributions correlates strongly with performance on downstream tasks.
Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, there has been a large body of work on (i) building larger MLLMs covering a large number of languages, (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs, (iii) analysing the performance of MLLMs on monolingual, zero-shot cross-lingual and bilingual tasks, (iv) understanding the universal language patterns (if any) learnt by MLLMs, and (v) augmenting the (often) limited capacity of MLLMs to improve their performance on seen or even unseen languages. In this survey, we review the existing literature covering the above broad areas of research pertaining to MLLMs. Based on our survey, we recommend some promising directions for future research.
Pre-trained multilingual language models show significant performance gains for zero-shot cross-lingual model transfer on a wide range of natural language understanding (NLU) tasks. Previously, for zero-shot cross-lingual evaluation, pre-trained models are only fine-tuned on English data and tested on a variety of target languages. In this paper, we do cross-lingual evaluation on various NLU tasks (sentence classification, sequence labeling, question answering) using prompt-tuning and compare it with fine-tuning. The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets, with only 0.1% to 0.3% tuned parameters. Additionally, we demonstrate through the analysis that prompt tuning can have better cross-lingual transferability of representations on downstream tasks with better aligned decision boundaries.
Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in both unlabeled settings (+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between the high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
The token embeddings in multilingual BERT (m-BERT) contain both language and semantic information. We find that a language's representation can be obtained by simply averaging the embeddings of that language's tokens. Given this language representation, we control the output languages of multilingual BERT by manipulating the token embeddings, thereby achieving unsupervised token translation. Based on this observation, we further propose a computationally cheap but effective approach to improve the cross-lingual ability of m-BERT.
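The averaging-and-shifting idea lends itself to a compact sketch. Here the language vector is the mean of a language's token embeddings, and "translation" subtracts the source language vector, adds the target one, and looks up the nearest target token by cosine similarity; the function names and nearest-neighbour lookup are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def language_vector(token_embs):
    """A language's representation: the average of its token embeddings."""
    return token_embs.mean(axis=0)

def translate(emb, src_embs, tgt_embs):
    """Shift one embedding from the source language's region of the space
    to the target language's, then return the nearest target token index."""
    shifted = emb - language_vector(src_embs) + language_vector(tgt_embs)
    sims = tgt_embs @ shifted / (
        np.linalg.norm(tgt_embs, axis=1) * np.linalg.norm(shifted) + 1e-9)
    return int(sims.argmax())
```

If the spaces are well aligned up to a translation offset, the shifted source embedding lands on (or near) the corresponding target token.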
In this paper, we propose to align sentence representations from different languages into a unified embedding space, in which semantic similarities (both cross-lingual and monolingual) can be computed with a simple dot product. Pre-trained language models are fine-tuned with the translation ranking task. Existing work (Feng et al., 2020) uses sentences within the same batch as negatives, which can suffer from the problem of easy negatives. We adapt MoCo (He et al., 2020) to further improve the quality of alignment. As our experimental results show, the sentence representations produced by our model achieve a new state of the art on several tasks, including Tatoeba en-zh similarity search (Artetxe and Schwenk, 2019b), BUCC en-zh bitext mining, and semantic textual similarity on 7 datasets.
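The MoCo adaptation mentioned above keeps a FIFO queue of embeddings from past batches as extra negatives, so ranking is not limited to the easy in-batch ones. A minimal sketch of the queue and the dot-product scoring; the queue size and the scoring layout are assumptions for illustration:

```python
import numpy as np
from collections import deque

class MoCoQueue:
    """FIFO queue of past sentence embeddings used as extra negatives."""
    def __init__(self, max_size=4096):
        self.queue = deque(maxlen=max_size)

    def enqueue(self, embs):
        for e in embs:
            self.queue.append(np.asarray(e))

    def negatives(self):
        return np.stack(self.queue) if self.queue else None

def ranking_scores(src, tgt, queue_negs=None):
    """Dot-product scores of each source sentence against its translation,
    the in-batch negatives, and (optionally) the queued negatives."""
    cands = tgt if queue_negs is None else np.concatenate([tgt, queue_negs])
    return src @ cands.T
```

A softmax cross-entropy over each row (with the true translation's column as the label) then gives the translation ranking loss.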
Providing better language tools for low-resource and endangered languages is imperative for equitable growth. Recent progress with massively multilingual pretrained models has proven surprisingly effective at performing zero-shot transfer to a wide variety of languages. However, this transfer is not universal, with many languages not currently understood by multilingual approaches. It is estimated that only 72 languages possess a "small set of labeled datasets" on which we could test a model's performance, the vast majority of languages not having the resources available to simply evaluate performances on. In this work, we attempt to clarify which languages do and do not currently benefit from such transfer. To that end, we develop a general approach that requires only unlabelled text to detect which languages are not well understood by a cross-lingual model. Our approach is derived from the hypothesis that if a model's understanding is insensitive to perturbations to text in a language, it is likely to have a limited understanding of that language. We construct a cross-lingual sentence similarity task to evaluate our approach empirically on 350, primarily low-resource, languages.
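The hypothesis above suggests a direct measurement: perturb text in a language and see how far the model's embedding moves. This sketch assumes an `encode` callable (the model) and a `perturb` callable (the text corruption), both hypothetical stand-ins for whatever the paper uses:

```python
import numpy as np

def perturbation_sensitivity(encode, sentences, perturb, rng):
    """Average drop in cosine similarity between a sentence's embedding and
    its perturbed version. Near-zero sensitivity suggests the model has a
    limited understanding of the language."""
    drops = []
    for s in sentences:
        e, ep = encode(s), encode(perturb(s, rng))
        cos = float(e @ ep / (np.linalg.norm(e) * np.linalg.norm(ep)))
        drops.append(1.0 - cos)
    return float(np.mean(drops))
```

A model that is blind to a language behaves like a constant encoder on it and scores near zero here, which is the signal the approach exploits.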
Translation Quality Estimation (QE) is the task of predicting the quality of machine translation (MT) output without any reference. As an important component in practical applications of MT, this task has received increasing attention. In this paper, we first propose XLMRScore, a simple unsupervised QE method based on BERTScore computed with the XLM-RoBERTa (XLMR) model, and discuss the issues that arise when using this method. We then suggest two approaches to mitigate these issues: replacing untranslated words with the unknown token, and cross-lingual alignment of the pre-trained model so that aligned words are represented closer to each other. We evaluate the proposed method on four low-resource language pairs of the WMT21 QE shared task, as well as a new English-Farsi test dataset introduced in this paper. Experiments show that our method achieves results comparable to the supervised baseline for two zero-shot scenarios, i.e., with less than 0.01 difference in Pearson correlation, while outperforming the unsupervised rivals by more than 8% on average for all low-resource language pairs.
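Since XLMRScore builds on BERTScore, its greedy-matching core is worth sketching: every token is matched to its most similar counterpart by cosine similarity, and precision and recall are combined into F1. IDF weighting and baseline rescaling are omitted here, and the function name is illustrative:

```python
import numpy as np

def bertscore_f1(src_embs, hyp_embs):
    """BERTScore-style greedy matching over token embeddings.
    src_embs: (n_src_tokens, dim), hyp_embs: (n_hyp_tokens, dim)."""
    a = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    b = hyp_embs / np.linalg.norm(hyp_embs, axis=1, keepdims=True)
    sim = a @ b.T
    recall = sim.max(axis=1).mean()     # best match for each source token
    precision = sim.max(axis=0).mean()  # best match for each hypothesis token
    return float(2 * precision * recall / (precision + recall))
```

In the QE setting the "reference" side is the source sentence itself, which is why cross-lingual alignment of the encoder matters so much.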
Multilingual language models have been shown to allow non-trivial transfer across scripts and languages. In this work, we study the structure of the internal representations that enable this transfer. We focus on representations of gender distinctions as a practical case study, and examine the extent to which the gender concept is encoded in shared subspaces across different languages. Our analysis shows that gender representations consist of several prominent components shared across languages, alongside language-specific components. The existence of language-independent and language-specific components provides an explanation for an intriguing empirical observation we make: while gender classification transfers well across languages, interventions for gender removal trained on a single language do not transfer easily to others.
Knowledge bases such as Wikidata amass vast amounts of named-entity information, including multilingual labels, which can be highly useful for various multilingual and cross-lingual applications. However, from an information-consistency standpoint, such labels are not guaranteed to match across languages, which greatly harms their usefulness for fields such as machine translation. In this work, we investigate the application of word and sentence alignment techniques, coupled with a matching algorithm, to align cross-lingual entity labels extracted from Wikidata in 10 languages. Our results indicate that the mapping between Wikidata's main labels is greatly improved by any of the employed methods (up to 20 F1-score points). We show how methods relying on sentence embeddings outperform all others, even across different scripts. We believe that the application of such techniques to measure the similarity of label pairs, coupled with a knowledge base rich in high-quality entity labels, is an excellent asset for machine translation.
Cross-lingual word embeddings (CLWE) have proven useful in many cross-lingual tasks. However, most existing approaches to learning CLWE, including those with contextual embeddings, are sense-agnostic. In this work, we propose a novel framework to align contextual embeddings at the sense level by leveraging cross-lingual signal from bilingual dictionaries only. We operationalize our framework by first proposing a novel sense-aware cross-entropy loss to model word senses explicitly. Monolingual ELMo and BERT models pretrained with the sense-aware cross-entropy loss show significant improvements on word sense disambiguation tasks. We then propose a sense-alignment objective, in addition to the sense-aware cross-entropy loss, for cross-lingual model pretraining, and pretrain cross-lingual models for several language pairs (English to German/Spanish/Japanese/Chinese). Compared with the best baseline results, our cross-lingual models achieve average performance improvements of 0.52%, 2.09%, and 1.29% on zero-shot cross-lingual transfer, sentiment classification, and XNLI tasks, respectively.
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The transfer is more successful between some languages, but it is not well understood what leads to this variation and whether it fairly reflects difference between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages. We demonstrate that the distance between the distributions of different languages is highly consistent with the syntactic difference in terms of linguistic formalisms. Such difference learnt via self-supervision plays a crucial role in the zero-shot transfer performance and can be predicted by variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity and provide insights into the mechanism of cross-lingual transfer.
State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in crosslingual language understanding (XLU) and low-resource cross-language transfer. In this work, we construct an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus (MultiNLI) to 15 languages, including low-resource languages such as Swahili and Urdu. We hope that our dataset, dubbed XNLI, will catalyze research in cross-lingual sentence understanding by providing an informative standard evaluation task. In addition, we provide several baselines for multilingual sentence understanding, including two based on machine translation systems, and two that use parallel data to train aligned multilingual bag-of-words and LSTM encoders. We find that XNLI represents a practical and challenging evaluation suite, and that directly translating the test data yields the best performance among available baselines.
Evaluation metrics are a key ingredient for text generation systems. In recent years, several BERT-based evaluation metrics have been proposed (including BERTScore, MoverScore, BLEURT, etc.) which correlate much better with human assessment of text generation quality than BLEU or ROUGE, invented two decades ago. However, little is known about what these metrics, which are based on black-box language model representations, actually capture (it is usually assumed they model semantic similarity). In this work, we use a simple regression-based global explainability technique to disentangle metric scores along linguistic factors, including semantics, syntax, morphology, and lexical overlap. We show that the different metrics capture aspects to differing degrees, but that all of them are substantially sensitive to lexical overlap, just like BLEU and ROUGE. This exposes limitations of these newly proposed metrics, which we also highlight in an adversarial test scenario.
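The regression-based global explanation described above reduces to fitting metric scores against per-example factor scores and reading off the weights. A minimal least-squares sketch; the factor set and any scaling are assumptions of this illustration:

```python
import numpy as np

def factor_attribution(scores, factors):
    """Fit metric scores as a linear function of linguistic factor scores
    (e.g., columns for semantics, syntax, morphology, lexical overlap)
    and return the per-factor weights."""
    X = np.column_stack([np.ones(len(scores)), factors])  # add intercept
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef[1:]  # drop the intercept, keep factor weights
```

A large weight on the lexical-overlap column relative to the semantics column is exactly the kind of finding reported above.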
While pretrained language models (PLMs) primarily serve as general-purpose text encoders that can be fine-tuned for a wide variety of downstream tasks, recent work has shown that they can also be rewired to produce high-quality word representations (i.e., static word embeddings) and yield good performance in type-level lexical tasks. While existing work has mainly focused on lexical specialization of PLMs in monolingual and bilingual settings, in this work we expose massively multilingual transformers (MMTs, e.g., mBERT or XLM-R) to multilingual lexical knowledge at scale, leveraging BabelNet as a readily available, rich source of multilingual and cross-lingual type-level lexical knowledge. Concretely, we use BabelNet's multilingual synsets to create synonym pairs across 50 languages and then subject the MMTs (mBERT and XLM-R) to a lexical specialization procedure guided by a contrastive objective. We show that such massively multilingual lexical specialization brings substantial gains on two standard cross-lingual lexical tasks, bilingual lexicon induction and cross-lingual word similarity, as well as on cross-lingual sentence retrieval. Crucially, we observe gains for languages unseen during specialization, indicating that multilingual lexical specialization generalizes beyond the lexical constraints. In a series of subsequent controlled experiments, we show that the pretraining quality of the MMT's word representations in the languages involved in specialization has a greater impact on performance than the linguistic diversity of the constraint set. Encouragingly, this suggests that lexical tasks involving low-resource languages benefit most from the lexical knowledge of resource-rich languages, which is generally much more available.
In this work, we present a systematic empirical study focused on the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs. We first treat these models as multilingual text encoders and benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR, a setup with no relevance judgments for IR-specific fine-tuning, pretrained multilingual encoders on average fail to significantly outperform earlier models based on CLWEs. For sentence-level retrieval, we do obtain state-of-the-art performance: the peak scores, however, are achieved by multilingual encoders that have been further specialized, in a supervised fashion, for sentence understanding tasks, rather than by their vanilla off-the-shelf variants. Following these results, we introduce localized relevance matching for document-level CLIR, where we independently score a query against document sections. In the second part, we evaluate multilingual encoders fine-tuned on English relevance data in a series of zero-shot language and domain transfer CLIR experiments. Our results show that supervised re-ranking rarely improves over multilingual transformers used as unsupervised base rankers. Finally, only with in-domain contrastive fine-tuning (i.e., same domain, language transfer only) do we manage to improve ranking quality. We uncover substantial empirical differences between cross-lingual retrieval results and results of (zero-shot) cross-lingual transfer for monolingual retrieval in the target languages, which points to "monolingual overfitting" of retrieval models trained on monolingual data.