Frame shift is a cross-linguistic phenomenon in translation in which corresponding language material evokes different frames. The ability to predict frame shifts enables the automatic creation of multilingual FrameNets through annotation projection. Here, we propose the task of frame shift prediction and demonstrate that graph attention networks, combined with auxiliary training, can learn cross-linguistic frame-to-frame correspondences and predict frame shifts.
Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. We find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer comparably small improvements on semantic tasks over a non-contextual baseline.
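To make the probing setup concrete, here is a minimal sketch of an edge probe: a small classifier trained on top of frozen contextual representations to label a token span. The span indices, the three-way label set, and mean pooling are illustrative simplifications of the paper's design (which uses self-attentive span pooling and span pairs), not a faithful reimplementation.

```python
# Minimal edge-probing sketch: a small classifier is trained on top of
# frozen contextual representations to predict a label for a token span.
# The span indices and the three-way label set are illustrative only.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")
encoder.eval()  # representations stay frozen; only the probe is trained

probe = nn.Sequential(  # the trainable edge probe
    nn.Linear(encoder.config.hidden_size, 256),
    nn.ReLU(),
    nn.Linear(256, 3),  # e.g. 3 hypothetical span labels
)

sentence = "The cat sat on the mat ."
span = (1, 2)  # subword span covering "cat" (illustrative)

with torch.no_grad():
    inputs = tokenizer(sentence, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state[0]  # (seq_len, dim)

# Mean-pool the span and classify; real edge probing uses self-attentive
# pooling, this sketch keeps only the core idea.
span_repr = hidden[span[0] : span[1] + 1].mean(dim=0)
logits = probe(span_repr)
```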
After a period of decline, interest in word alignments is on the rise again, owing to their usefulness in areas such as typological research, cross-lingual annotation projection, and machine translation. Generally, alignment algorithms only use bitexts and do not exploit the fact that many parallel corpora are multiparallel. Here, we compute high-quality word alignments between multiple language pairs by considering all language pairs together. First, we create a multiparallel word alignment graph, joining all bilingual word alignment pairs in one graph. Next, we use graph neural networks (GNNs) to exploit the graph structure. Our GNN approach (i) utilizes information about the meaning, position, and language of the input words, (ii) incorporates information from multiple parallel sentences, (iii) adds and removes edges from the initial alignments, and (iv) yields a prediction model that can generalize beyond the training sentences. We show that community detection provides valuable information for multiparallel word alignment. Our method outperforms previous work on three word alignment datasets and on a downstream task.
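As a concrete illustration of the graph construction, the sketch below pools toy bilingual alignments from three languages into one multiparallel graph with networkx and runs an off-the-shelf community detection algorithm. The tokens, alignment links, and the choice of greedy modularity are stand-ins for illustration, not the paper's exact pipeline.

```python
# Sketch of the multiparallel alignment graph: nodes are word tokens
# tagged with their language and sentence position, edges are bilingual
# alignment links pooled over all language pairs. The toy alignments
# below are invented for illustration.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# (lang, token_index) -> surface form, for one multiparallel sentence.
tokens = {
    ("en", 0): "house", ("de", 0): "Haus", ("fr", 1): "maison",
}
# Bilingual alignments from any pairwise aligner, over all language pairs.
bilingual_alignments = [
    (("en", 0), ("de", 0)),
    (("en", 0), ("fr", 1)),
    (("de", 0), ("fr", 1)),
]

G = nx.Graph()
for node, form in tokens.items():
    G.add_node(node, form=form, lang=node[0], position=node[1])
G.add_edges_from(bilingual_alignments)

# Community detection groups mutually aligned words; in the paper's
# pipeline such structure informs a GNN that then adds and removes
# edges from the initial alignments.
for community in greedy_modularity_communities(G):
    print(sorted(community))
```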
Data-hungry deep neural networks have established themselves as the standard for many NLP tasks, including traditional sequence tagging. Despite their state-of-the-art performance on high-resource languages, they still lag behind their statistical counterparts in low-resource scenarios. One methodology to counter this problem is text augmentation, i.e., generating new synthetic training data points from existing data. Although NLP has recently witnessed a wave of textual augmentation techniques, the field still lacks a systematic performance analysis across multiple languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methods that perform changes at the syntax level (e.g., cropping sub-sentences), the token level (e.g., random word insertion), and the character level (e.g., character swapping). We systematically compare them on part-of-speech tagging, dependency parsing, and semantic role labeling across a diverse set of language families using various models, including architectures that rely on pretrained multilingual contextualized language models such as mBERT. Augmentation most significantly improves dependency parsing, followed by part-of-speech tagging and semantic role labeling. We find the investigated techniques to be effective on morphologically rich languages in general, rather than on analytic languages such as Vietnamese. Our results suggest that augmentation techniques can further improve over strong baselines based on mBERT. We identify character-level methods as the most consistent performers, while synonym replacement and syntactic augmenters provide inconsistent improvements. Finally, we discuss how the results depend most heavily on the task, language pair, and model type.
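The three augmentation levels can be illustrated with toy implementations. Note that real syntax-level cropping operates on dependency subtrees, so the window-based crop below is a simplification; the vocabulary and example sentence are invented.

```python
# Toy implementations of the three augmentation levels compared in the
# study: syntax-level cropping, token-level random insertion, and
# character-level swapping.
import random

def crop(tokens, min_len=3):
    """Syntax-level (simplified): keep a contiguous sub-sentence."""
    if len(tokens) <= min_len:
        return tokens[:]
    start = random.randrange(len(tokens) - min_len + 1)
    end = random.randrange(start + min_len, len(tokens) + 1)
    return tokens[start:end]

def random_insertion(tokens, vocabulary):
    """Token-level: insert a random vocabulary word at a random slot."""
    out = tokens[:]
    out.insert(random.randrange(len(out) + 1), random.choice(vocabulary))
    return out

def char_swap(token):
    """Character-level: swap two adjacent characters inside one token."""
    if len(token) < 2:
        return token
    i = random.randrange(len(token) - 1)
    chars = list(token)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

sent = "the quick brown fox jumps over the lazy dog".split()
print(crop(sent))
print(random_insertion(sent, vocabulary=["very", "quite"]))
print([char_swap(t) for t in sent])
```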
Transformer-based language models have recently achieved remarkable results on many natural language tasks. However, leaderboard performance is generally achieved by leveraging massive amounts of training data, and rarely by encoding explicit linguistic knowledge into neural models. This has led many to question the relevance of linguistics for modern natural language processing. In this dissertation, I present several case studies to illustrate how theoretical linguistics and neural language models remain relevant to each other. First, language models are useful to linguists by providing an objective tool to measure semantic distance, which is difficult to do using traditional methods. On the other hand, linguistic theory contributes to language modelling research by providing frameworks and sources of data to probe our language models for specific aspects of language understanding. This thesis contributes three studies that explore different aspects of the syntax-semantics interface in language models. In the first part of the thesis, I apply language models to the problem of word class flexibility. Using mBERT as a source of semantic distance measurements, I present evidence in favour of analyzing word class flexibility as a directional process. In the second part of the thesis, I propose a method to measure surprisal at intermediate layers of language models. My experiments show that sentences containing morphosyntactic anomalies trigger surprisal earlier in language models than semantic and commonsense anomalies. Finally, in the third part of the thesis, I adapt several psycholinguistic studies to show that language models contain knowledge of argument structure constructions. In summary, my thesis develops new connections between natural language processing, linguistic theory, and psycholinguistics to offer fresh perspectives for the interpretation of language models.
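The "objective tool to measure semantic distance" idea can be sketched as follows: compare mBERT's contextual embeddings of the same word in two different usages. The sentences, the simplified token lookup, and the choice of cosine distance are illustrative assumptions, not the dissertation's exact procedure.

```python
# A sketch of using a language model as a semantic distance measure:
# cosine distance between mBERT's contextual embeddings of one word in
# two different sentences.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of the first subword of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    # Locate the word's first subword token (simplified lookup).
    word_id = tokenizer(word, add_special_tokens=False)["input_ids"][0]
    position = (enc["input_ids"][0] == word_id).nonzero()[0].item()
    return hidden[position]

a = embed("They will walk to school.", "walk")      # verb usage
b = embed("It is a short walk from here.", "walk")  # noun usage
distance = 1 - torch.cosine_similarity(a, b, dim=0).item()
print(f"semantic distance between the two usages: {distance:.3f}")
```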
Idiomatic expressions can be problematic for natural language processing applications because their meaning cannot be inferred from their constituent words. A lack of successful methodological approaches and sufficiently large datasets has prevented the development of machine learning approaches for detecting idioms, especially for expressions that do not occur in the training set. We present an approach, called MICE, that uses contextual embeddings for this purpose. We present a new dataset of multi-word expressions with literal and idiomatic meanings and use it to train classifiers based on two state-of-the-art contextual word embeddings: ELMo and BERT. We show that deep neural networks using both embeddings perform better than existing approaches and are capable of detecting idiomatic word use even for expressions that were not present in the training set. We demonstrate the cross-lingual transfer of the developed models and analyze the size of the required dataset.
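A condensed sketch of this kind of classifier is shown below: contextual embeddings of the expression feed a small feed-forward network that decides between literal and idiomatic use. Only BERT is used here (MICE combines ELMo and BERT), and the example sentence and label inventory are invented for illustration.

```python
# Sketch: pool BERT's contextual vectors over a multi-word expression's
# subwords and classify the usage as literal or idiomatic.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")
encoder.eval()

classifier = nn.Sequential(
    nn.Linear(encoder.config.hidden_size, 128),
    nn.ReLU(),
    nn.Linear(128, 2),  # 0 = literal, 1 = idiomatic
)

def mwe_embedding(sentence: str, mwe: str) -> torch.Tensor:
    """Mean of contextual vectors over the expression's subwords."""
    enc = tokenizer(sentence, return_tensors="pt")
    mwe_ids = tokenizer(mwe, add_special_tokens=False)["input_ids"]
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]
    mask = torch.isin(enc["input_ids"][0], torch.tensor(mwe_ids))
    return hidden[mask].mean(dim=0)

x = mwe_embedding("He finally kicked the bucket.", "kicked the bucket")
logits = classifier(x)  # trained with cross-entropy on labeled MWEs
```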
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The transfer is more successful between some languages, but it is not well understood what leads to this variation and whether it fairly reflects difference between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages. We demonstrate that the distance between the distributions of different languages is highly consistent with the syntactic difference in terms of linguistic formalisms. Such difference learnt via self-supervision plays a crucial role in the zero-shot transfer performance and can be predicted by variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity and provide insights into the mechanism of cross-lingual transfer.
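One way to make "distance between the distributions" concrete is to represent each language as a frequency distribution over induced grammatical relation labels and compare the two with a symmetric divergence. The frequencies below are invented, and Jensen-Shannon distance is one reasonable choice of metric for illustration, not necessarily the one used in the paper.

```python
# Sketch of comparing two languages by their distributions of induced
# grammatical relations. The relation frequencies are hypothetical.
import numpy as np
from scipy.spatial.distance import jensenshannon

relations = ["nsubj", "obj", "amod", "case", "advmod"]
# Hypothetical relative frequencies of induced relations per language.
english = np.array([0.30, 0.22, 0.18, 0.20, 0.10])
japanese = np.array([0.24, 0.18, 0.12, 0.34, 0.12])

distance = jensenshannon(english, japanese)
print(f"distributional distance (Jensen-Shannon): {distance:.3f}")
```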
The word alignment task, despite its prominence in the era of statistical machine translation (SMT), is niche and under-explored today. In this two-part tutorial, we argue for the continued relevance of word alignment. The first part provides a historical background to word alignment as a core component of the traditional SMT pipeline. We zero in on GIZA++, an unsupervised, statistical word aligner with surprising longevity. Jumping forward to the era of neural machine translation (NMT), we show how insights from word alignment inspired the attention mechanism fundamental to present-day NMT. The second part shifts to a survey approach. We cover neural word aligners, showing the slow but steady progress towards surpassing GIZA++ performance. Finally, we cover the present-day applications of word alignment, from cross-lingual annotation projection, to improving translation.
We target the task of cross-lingual machine reading comprehension (MRC) in the direct zero-shot setting by incorporating syntactic features from Universal Dependencies (UD); the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt inter-sentence syntactic relations, in addition to the basic intra-sentence relations, to further utilize the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build an Inter-Sentence Dependency Graph (ISDG) connecting dependency trees to form global syntactic relations across sentences. We then propose the ISDG encoder, which encodes the global dependency graph, addressing inter-sentence relations explicitly via both one-hop and multi-hop dependency paths. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder, trained only on English, is able to improve zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on average, and up to 5.2 F1 / 11.2 EM on certain languages. Further analysis shows that the improvement can be attributed to attention over cross-linguistically consistent syntactic paths.
While much work has been done on understanding the representations learned within deep NLP models and the knowledge they capture, little attention has been paid to individual neurons. We present a technique called Linguistic Correlation Analysis to extract salient neurons in a model, with respect to any extrinsic property, with the goal of understanding how such knowledge is preserved within neurons. We carry out a fine-grained analysis to answer the following questions: (i) can we identify subsets of neurons in the network that capture specific linguistic properties? (ii) how localized or distributed are neurons across the network? (iii) how redundantly is the information preserved? (iv) how does fine-tuning pre-trained models towards downstream NLP tasks affect the learned linguistic knowledge? and (v) how do architectures vary in learning different linguistic properties? Our data-driven, quantitative analysis illuminates interesting findings: (i) we discovered small subsets of neurons that can predict different linguistic tasks; (ii) neurons capturing basic lexical information, such as suffixation, are localized in the lowermost layers, while those learning complex concepts, such as syntactic role, are predominantly found in the middle and higher layers; (iii) salient linguistic neurons relocate from higher to lower layers during transfer learning, as the network preserves the higher layers for task-specific information; (iv) we found interesting differences across pre-trained models in how linguistic information is preserved within them; and (v) we found that concepts exhibit similar neuron distributions across different languages in multilingual transformer models. Our code is publicly available as part of the NeuroX toolkit.
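A minimal sketch of the salient-neuron extraction idea: fit an elastic-net-regularized linear probe from neuron activations to an extrinsic property, then rank neurons by the magnitude of their learned weights. The activations below are synthetic, with the property deliberately planted in one known neuron so the ranking can be checked; this is an illustration of the principle, not the paper's implementation.

```python
# Sketch of linguistic correlation analysis via a regularized linear
# probe: neurons with the largest probe weights are deemed salient.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_tokens, n_neurons = 1000, 768
activations = rng.normal(size=(n_tokens, n_neurons))
# Plant the property in neuron 42 so the ranking has a known answer.
labels = (activations[:, 42] + 0.1 * rng.normal(size=n_tokens)) > 0

# Elastic-net regularization balances selecting focused neurons
# against distributed groups of neurons.
probe = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, max_iter=2000)
probe.fit(activations, labels)

salience = np.abs(probe.coef_[0])
top_neurons = np.argsort(salience)[::-1][:5]
print("most salient neurons:", top_neurons)  # neuron 42 should rank first
```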
Multilingual Language Models (MLLMs) such as mBERT, XLM, and XLM-R have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, a large body of work has emerged on (i) building bigger MLLMs covering a large number of languages, (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs, (iii) analysing the performance of MLLMs on monolingual, zero-shot cross-lingual, and bilingual tasks, (iv) understanding the universal language patterns (if any) learnt by MLLMs, and (v) augmenting the (often) limited capacity of MLLMs to improve their performance on seen or even unseen languages. In this survey, we review the existing literature covering the above broad areas of research pertaining to MLLMs. Based on our survey, we recommend some promising directions for future research.
In this work, we attempt to build a connection between the two schools by introducing syntactic inductive biases into deep learning models. We propose two families of inductive biases, one for constituency structure and another for dependency structure. The constituency inductive bias encourages deep learning models to use different units (or neurons) to separately process long-term and short-term information. This separation provides a way for deep learning models to build latent hierarchical representations from sequential inputs: a higher-level representation is composed of, and can be decomposed into, a series of lower-level representations. For example, without knowing the ground-truth structure, our proposed model learns to process logical expressions by composing the representations of variables and operators according to their syntactic structure. The dependency inductive bias, on the other hand, encourages models to find latent relations between entities in the input sequence. For natural language, such latent relations are usually modeled as a directed dependency graph, in which a word has exactly one parent node and zero or more child nodes. After applying this constraint to a Transformer-like model, we find that the model is capable of inducing directed graphs that are close to human expert annotations, and that it also outperforms the standard Transformer model on different tasks. We believe these experimental results demonstrate an interesting alternative for the future development of deep learning models.
Aspectual meaning refers to how the internal temporal structure of a situation is presented. This includes whether a situation is described as a state or as an event, whether the situation is finished or ongoing, and whether it is viewed as a whole or with a focus on a particular phase. This survey gives an overview of computational approaches to modeling lexical and grammatical aspect, along with intuitive explanations of the necessary linguistic concepts and terminology. In particular, we describe the concepts of stativity, telicity, habituality, perfective and imperfective, as well as influential inventories of eventuality and situation types. We argue that, because aspect is a crucial component of semantics, especially when it comes to reporting the temporal structure of situations in a precise way, future NLP approaches need to be able to handle and evaluate it systematically in order to achieve human-level language understanding.
We propose a transition-based approach that, by training a single model, can efficiently parse any input sentence with both constituent and dependency trees, supporting both continuous/projective and discontinuous/non-projective syntactic structures. To that end, we develop a Pointer Network architecture with two separate task-specific decoders and a common encoder, and follow a multitask learning strategy to jointly train them. The resulting quadratic system not only becomes the first parser that can jointly produce both unrestricted constituent and dependency trees from a single model, but also proves that both syntactic formalisms can benefit from each other during training, achieving state-of-the-art accuracies in several widely-used benchmarks such as the continuous English and Chinese Penn Treebanks, as well as the discontinuous German NEGRA and TIGER datasets.
Prepositions are frequently occurring polysemous words. Disambiguating them is crucial in tasks such as semantic role labelling, question answering, textual entailment, and noun compound paraphrasing. In this paper, we propose a novel methodology for preposition sense disambiguation (PSD) that does not use any linguistic tools. In a supervised setting, the machine learning model is presented with sentences in which prepositions have been annotated with senses. These sense IDs come from The Preposition Project (TPP). We use hidden-layer representations from pre-trained BERT and BERT variants. The latent representations are then classified into the correct sense ID using a multi-layer perceptron. The dataset used for this task is from SemEval-2007 Task 6. Our methodology achieves an accuracy of 86.85%, which is better than the state of the art.
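A sketch of this pipeline follows, with the layer choice, the size of the sense inventory, and the example sentence as placeholder assumptions rather than the paper's settings.

```python
# Sketch of the PSD pipeline: read a hidden-layer representation of the
# preposition token from pre-trained BERT and classify it into a TPP
# sense ID with a multi-layer perceptron.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

NUM_SENSES = 10   # stand-in for the TPP sense inventory size
LAYER = 8         # hidden layer to read representations from

mlp = nn.Sequential(
    nn.Linear(model.config.hidden_size, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_SENSES),
)

sentence = "She walked over the bridge."
enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**enc, output_hidden_states=True).hidden_states

prep_id = tokenizer("over", add_special_tokens=False)["input_ids"][0]
position = (enc["input_ids"][0] == prep_id).nonzero()[0].item()
logits = mlp(hidden_states[LAYER][0, position])  # train with cross-entropy
```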
Transformer-based models have pushed the state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression. We then outline directions for future research.
We propose a method to make natural language understanding models more parameter efficient by storing knowledge in an external knowledge graph (KG) and retrieving from this KG using a dense index. Given (possibly multilingual) downstream task data, e.g., sentences in German, we retrieve entities from the KG and use their multimodal representations to improve downstream task performance. We use the recently released VisualSem KG as our external knowledge repository, which covers a subset of Wikipedia and WordNet entities, and compare a mix of tuple-based and graph-based algorithms to learn entity and relation representations that are grounded in the KG's multimodal information. We demonstrate the usefulness of the learned entity representations on two downstream tasks, improving performance on the multilingual named entity recognition task by 0.3%-0.7% F1 and achieving up to 2.5% improvement in accuracy on the visual sense disambiguation task. All our code and data are made available at: https://github.com/iacercalixto/visualsem-kg.
In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2019) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.
Relation extraction (RE) is a sub-discipline of information extraction (IE) which focuses on the prediction of a relational predicate from a natural-language input unit (such as a sentence, a clause, or even a short paragraph consisting of multiple sentences and/or clauses). Together with named-entity recognition (NER) and disambiguation (NED), RE forms the basis for many advanced IE tasks such as knowledge-base (KB) population and verification. In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE by encoding structured information about the sentences' principal units, such as subjects, objects, verbal phrases, and adverbials, into various forms of vectorized (and hence unstructured) representations of the sentences. Our main conjecture is that the decomposition of long and possibly convoluted sentences into multiple smaller clauses via OpenIE even helps to fine-tune context-sensitive language models such as BERT (and its plethora of variants) for RE. Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models compared to existing RE approaches. Our best results reach 92% and 71% of F1 score for KnowledgeNet and FewRel, respectively, proving the effectiveness of our approach on competitive benchmarks.
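The enrichment idea can be sketched as follows. The clause decomposition is hard-coded here in place of a real OpenIE system, and the pooling scheme and relation inventory are illustrative assumptions, not the paper's architecture.

```python
# Sketch: decompose a long sentence into clauses (stand-in for OpenIE
# output), encode each clause, and feed the pooled clause vectors to a
# relation classifier.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")
encoder.eval()

sentence = ("Marie Curie, who was born in Warsaw, received the Nobel "
            "Prize in Physics in 1903.")
clauses = [  # stand-in for OpenIE clause extraction
    "Marie Curie was born in Warsaw.",
    "Marie Curie received the Nobel Prize in Physics in 1903.",
]

def encode(text: str) -> torch.Tensor:
    """[CLS] vector as a clause representation."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return encoder(**enc).last_hidden_state[0, 0]

# Pool clause vectors and classify into a toy relation inventory.
pooled = torch.stack([encode(c) for c in clauses]).mean(dim=0)
relation_head = nn.Linear(encoder.config.hidden_size, 4)  # 4 toy relations
logits = relation_head(pooled)
```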
The rapid advancement of AI technology has made text generation tools like GPT-3 and ChatGPT increasingly accessible, scalable, and effective. This can pose a serious threat to the credibility of various forms of media if these technologies are used for plagiarism, including scientific literature and news sources. Despite the development of automated methods for paraphrase identification, detecting this type of plagiarism remains a challenge due to the disparate nature of the datasets on which these methods are trained. In this study, we review traditional and current approaches to paraphrase identification and propose a refined typology of paraphrases. We also investigate how this typology is represented in popular datasets and how under-representation of certain types of paraphrases impacts detection capabilities. Finally, we outline new directions for future research and datasets in the pursuit of more effective paraphrase detection using AI.