伯特的预先接受的语言模型在广泛的自然语言处理任务中取得了巨大成功。然而,由于缺乏两个域知识,即短语级和产品级别,BERT不能很好地支持电子商务相关任务。一方面,许多电子商务任务需要准确地了解域短语,而这种细粒度的短语级知识没有通过BERT的训练目标明确建模。另一方面,产品级知识如产品关联可以增强电子商务的语言建模,但它们不是事实知识,从而不分青红皂白可以引入噪音。为了解决问题,我们提出了一个统一的训练框架,即E-BERT。具体地,为了保留短语级知识,我们引入自适应混合屏蔽,其允许模型基于两种模式的拟合进度自适应地切换到学习复杂短语的学习初步知识。为了利用产品级知识,我们引入了邻居产品重建,该重建将E-BERT列举,以预测产品的相关邻居,具有去噪的杂交层。我们的调查揭示了四个下游任务,即基于审查的问题回答,方面提取,宽度情绪分类和产品分类。
translated by 谷歌翻译
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. Span-BERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT large , our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0 respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even gains on GLUE. 1 * Equal contribution. 1 Our code and pre-trained models are available at https://github.com/facebookresearch/ SpanBERT.
translated by 谷歌翻译
事实证明,将先验知识纳入预训练的语言模型中对知识驱动的NLP任务有效,例如实体键入和关系提取。当前的培训程序通常通过使用知识掩盖,知识融合和知识更换将外部知识注入模型。但是,输入句子中包含的事实信息尚未完全开采,并且尚未严格检查注射的外部知识。结果,无法完全利用上下文信息,并将引入额外的噪音,或者注入的知识量受到限制。为了解决这些问题,我们提出了MLRIP,该MLRIP修改了Ernie-Baidu提出的知识掩盖策略,并引入了两阶段的实体替代策略。进行全面分析的广泛实验说明了MLRIP在军事知识驱动的NLP任务中基于BERT的模型的优势。
translated by 谷歌翻译
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a;Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
translated by 谷歌翻译
NLP是与计算机或机器理解和解释人类语言的能力有关的人工智能和机器学习的一种形式。语言模型在文本分析和NLP中至关重要,因为它们允许计算机解释定性输入并将其转换为可以在其他任务中使用的定量数据。从本质上讲,在转移学习的背景下,语言模型通常在大型通用语料库上进行培训,称为预训练阶段,然后对特定的基本任务进行微调。结果,预训练的语言模型主要用作基线模型,该模型包含了对上下文的广泛掌握,并且可以进一步定制以在新的NLP任务中使用。大多数预训练的模型都经过来自Twitter,Newswire,Wikipedia和Web等通用领域的Corpora培训。在一般文本中训练的现成的NLP模型可能在专业领域效率低下且不准确。在本文中,我们提出了一个名为Securebert的网络安全语言模型,该模型能够捕获网络安全域中的文本含义,因此可以进一步用于自动化,用于许多重要的网络安全任务,否则这些任务将依靠人类的专业知识和繁琐的手动努力。 Securebert受到了我们从网络安全和一般计算域的各种来源收集和预处理的大量网络安全文本培训。使用我们提出的令牌化和模型权重调整的方法,Securebert不仅能够保留对一般英语的理解,因为大多数预训练的语言模型都可以做到,而且在应用于具有网络安全含义的文本时也有效。
translated by 谷歌翻译
预训练的语言模型(PLM)在自然语言理解中的许多下游任务中取得了显着的性能增长。已提出了各种中文PLM,以学习更好的中文表示。但是,大多数当前模型都使用中文字符作为输入,并且无法编码中文单词中包含的语义信息。虽然最近的预训练模型同时融合了单词和字符,但它们通常会遭受不足的语义互动,并且无法捕获单词和字符之间的语义关系。为了解决上述问题,我们提出了一个简单而有效的PLM小扣手,该小扣子采用了对单词和性格表示的对比度学习。特别是,Clower通过对多透明信息的对比学习将粗粒的信息(即单词)隐式编码为细粒度表示(即字符)。在现实的情况下,小电动器具有很大的价值,因为它可以轻松地将其纳入任何现有的基于细粒的PLM中而无需修改生产管道。在一系列下游任务上进行的扩展实验表明,小动物的卓越性能超过了几个最先进的实验 - 艺术基线。
translated by 谷歌翻译
Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code and experiment details of this paper can be obtained from https:// github.com/thunlp/ERNIE.
translated by 谷歌翻译
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMS, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
translated by 谷歌翻译
蒙版语言建模(MLM)已被广泛用作培训前语言模型(PRLMS)中的剥夺目标。现有的PRLMS通常采用随机掩盖策略,在该策略中应用固定的掩蔽率,并且在整个培训中都有均等的概率掩盖了不同的内容。但是,该模型可能会受到训练前状态的复杂影响,随着训练时间的发展,这种影响会发生相应的变化。在本文中,我们表明这种时间不变的MLM设置对掩盖比和掩盖内容不太可能提供最佳结果,这激发了我们探索时间变化的MLM设置的影响。我们提出了两种计划的掩蔽方法,可在不同的训练阶段适应掩盖比和内容,从而提高了训练前效率和在下游任务上验证的效率。我们的工作是一项关于比率和内容的时间变化掩盖策略的先驱研究,并更好地了解掩盖比率和掩盖内容如何影响MLM的MLM预训练。
translated by 谷歌翻译
本文旨在通过介绍第一个中国数学预训练的语言模型〜(PLM)来提高机器的数学智能,以有效理解和表示数学问题。与其他标准NLP任务不同,数学文本很难理解,因为它们在问题陈述中涉及数学术语,符号和公式。通常,它需要复杂的数学逻辑和背景知识来解决数学问题。考虑到数学文本的复杂性质,我们设计了一种新的课程预培训方法,用于改善由基本和高级课程组成的数学PLM的学习。特别是,我们首先根据位置偏见的掩盖策略执行令牌级预训练,然后设计基于逻辑的预训练任务,旨在分别恢复改组的句子和公式。最后,我们介绍了一项更加困难的预训练任务,该任务强制执行PLM以检测和纠正其生成的解决方案中的错误。我们对离线评估(包括九个与数学相关的任务)和在线$ A/B $测试进行了广泛的实验。实验结果证明了与许多竞争基线相比,我们的方法的有效性。我们的代码可在:\ textColor {blue} {\ url {https://github.com/rucaibox/jiuzhang}}}中获得。
translated by 谷歌翻译
来自变压器(BERT)的双向编码器表示显示了各种NLP任务的奇妙改进,并且已经提出了其连续的变体来进一步提高预先训练的语言模型的性能。在本文中,我们的目标是首先介绍中国伯特的全文掩蔽(WWM)策略,以及一系列中国预培训的语言模型。然后我们还提出了一种简单但有效的型号,称为Macbert,这在几种方面提高了罗伯塔。特别是,我们提出了一种称为MLM作为校正(MAC)的新掩蔽策略。为了展示这些模型的有效性,我们创建了一系列中国预先培训的语言模型,作为我们的基线,包括BERT,Roberta,Electra,RBT等。我们对十个中国NLP任务进行了广泛的实验,以评估创建的中国人托管语言模型以及提议的麦克白。实验结果表明,Macbert可以在许多NLP任务上实现最先进的表演,我们还通过几种可能有助于未来的研究的调查结果来消融细节。我们开源我们的预先培训的语言模型,以进一步促进我们的研究界。资源可用:https://github.com/ymcui/chinese-bert-wwm
translated by 谷歌翻译
用于预培训语言模型的自我监督学习的核心包括预训练任务设计以及适当的数据增强。语言模型中的大多数数据增强都是独立于上下文的。最近在电子中提出了一个开创性的增强,并通过引入辅助生成网络(发电机)来实现最先进的性能,以产生用于培训主要辨别网络(鉴别者)的上下文化数据增强。然而,这种设计引入了发电机的额外计算成本,并且需要调整发电机和鉴别器之间的相对能力。在本文中,我们提出了一种自增强策略(SAS),其中单个网络用于审视以后的时期的培训常规预训练和上下文化数据增强。基本上,该策略消除了单独的发电机,并使用单个网络共同执行具有MLM(屏蔽语言建模)和RTD(替换令牌检测)头的两个预训练任务。它避免了寻找适当大小的发电机的挑战,这对于在电子中证明的性能至关重要,以及其随后的变体模型至关重要。此外,SAS是一项常规策略,可以与最近或将来的许多新技术无缝地结合,例如杜伯塔省的解除关注机制。我们的实验表明,SAS能够在具有相似或更少的计算成本中优于胶水任务中的电磁和其他最先进的模型。
translated by 谷歌翻译
Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression. We then outline directions for future research.
translated by 谷歌翻译
基于方面的情绪分析旨在确定产品评论中特定方面的情感极性。我们注意到,大约30%的评论不包含明显的观点词,但仍然可以传达清晰的人类感知情绪取向,称为隐含情绪。然而,最近的基于神经网络的方法几乎没有关注隐性情绪,这一审查有所关注。为了克服这个问题,我们通过域名语言资源检索的大规模情绪注释的Corpora采用监督对比培训。通过将隐式情感表达式的表示对准与具有相同情绪标签的人,预培训过程可以更好地捕获隐含和明确的情绪方向,以便在评论中的方面。实验结果表明,我们的方法在Semeval2014基准上实现了最先进的性能,综合分析验证了其对学习隐含情绪的有效性。
translated by 谷歌翻译
与伯特(Bert)等语言模型相比,已证明知识增强语言表示的预培训模型在知识基础构建任务(即〜关系提取)中更有效。这些知识增强的语言模型将知识纳入预训练中,以生成实体或关系的表示。但是,现有方法通常用单独的嵌入表示每个实体。结果,这些方法难以代表播出的实体和大量参数,在其基础代币模型之上(即〜变压器),必须使用,并且可以处理的实体数量为由于内存限制,实践限制。此外,现有模型仍然难以同时代表实体和关系。为了解决这些问题,我们提出了一个新的预培训模型,该模型分别从图书中学习实体和关系的表示形式,并分别在文本中跨越跨度。通过使用SPAN模块有效地编码跨度,我们的模型可以代表实体及其关系,但所需的参数比现有模型更少。我们通过从Wikipedia中提取的知识图对我们的模型进行了预训练,并在广泛的监督和无监督的信息提取任务上进行了测试。结果表明,我们的模型比基线学习对实体和关系的表现更好,而在监督的设置中,微调我们的模型始终优于罗伯塔,并在信息提取任务上取得了竞争成果。
translated by 谷歌翻译
Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to forget the previous general knowledge acquired by general PLMs, which leads to a catastrophic forgetting phenomenon and sub-optimal performance. To alleviate this problem, we propose a new framework of General Memory Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM by a memory representation built from the frozen general PLM without losing any general knowledge. Specifically, we propose a new memory-augmented layer, and based on it, different augmented strategies are explored to build the memory representation and then adaptively fuse it into the domain-specific PLM. We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds (text classification, QA, NER) of tasks, and the extensive results show that the proposed G-MAP can achieve SOTA results on all tasks.
translated by 谷歌翻译
本文对过去二十年来对自然语言生成(NLG)的研究提供了全面的审查,特别是与数据到文本生成和文本到文本生成深度学习方法有关,以及NLG的新应用技术。该调查旨在(a)给出关于NLG核心任务的最新综合,以及该领域采用的建筑;(b)详细介绍各种NLG任务和数据集,并提请注意NLG评估中的挑战,专注于不同的评估方法及其关系;(c)强调一些未来的强调和相对近期的研究问题,因为NLG和其他人工智能领域的协同作用而增加,例如计算机视觉,文本和计算创造力。
translated by 谷歌翻译
Pre-trained Language Model (PLM) has become a representative foundation model in the natural language processing field. Most PLMs are trained with linguistic-agnostic pre-training tasks on the surface form of the text, such as the masked language model (MLM). To further empower the PLMs with richer linguistic features, in this paper, we aim to propose a simple but effective way to learn linguistic features for pre-trained language models. We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original MLM pre-training task, using a linguistically-informed pre-training (LIP) strategy. We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements over various comparable baselines. Furthermore, we also conduct analytical experiments in various linguistic aspects, and the results prove that the design of LERT is valid and effective. Resources are available at https://github.com/ymcui/LERT
translated by 谷歌翻译
在这项工作中,我们探索如何学习专用的语言模型,旨在学习从文本文件中学习关键词的丰富表示。我们在判别和生成设置中进行预训练变压器语言模型(LMS)的不同掩蔽策略。在歧视性设定中,我们引入了一种新的预训练目标 - 关键边界,用替换(kbir)infifiling,在使用Kbir预先训练的LM进行微调时显示出在Sota上的性能(F1中高达9.26点)的大量增益关键酶提取的任务。在生成设置中,我们为BART - 键盘介绍了一个新的预训练设置,可再现与CATSeq格式中的输入文本相关的关键字,而不是Denoised原始输入。这也导致在关键词中的性能(F1 @ M)中的性能(高达4.33点),用于关键正版生成。此外,我们还微调了在命名实体识别(ner),问题应答(qa),关系提取(重新),抽象摘要和达到与SOTA的可比性表现的预训练的语言模型,表明学习丰富的代表关键词确实有利于许多其他基本的NLP任务。
translated by 谷歌翻译
As an important fine-grained sentiment analysis problem, aspect-based sentiment analysis (ABSA), aiming to analyze and understand people's opinions at the aspect level, has been attracting considerable interest in the last decade. To handle ABSA in different scenarios, various tasks are introduced for analyzing different sentiment elements and their relations, including the aspect term, aspect category, opinion term, and sentiment polarity. Unlike early ABSA works focusing on a single sentiment element, many compound ABSA tasks involving multiple elements have been studied in recent years for capturing more complete aspect-level sentiment information. However, a systematic review of various ABSA tasks and their corresponding solutions is still lacking, which we aim to fill in this survey. More specifically, we provide a new taxonomy for ABSA which organizes existing studies from the axes of concerned sentiment elements, with an emphasis on recent advances of compound ABSA tasks. From the perspective of solutions, we summarize the utilization of pre-trained language models for ABSA, which improved the performance of ABSA to a new stage. Besides, techniques for building more practical ABSA systems in cross-domain/lingual scenarios are discussed. Finally, we review some emerging topics and discuss some open challenges to outlook potential future directions of ABSA.
translated by 谷歌翻译