语言模型是使用大量通用数据(如Book Copus,Common Crawl和Wikipedia)进行预训练的,这对于模型了解语言的语言特征至关重要。新的研究建议将域自适应预训练(DAPT)和任务自适应预训练(TAPT)作为最终填充任务之前的中间步骤。此步骤有助于涵盖目标域词汇,并改善下游任务的模型性能。在这项工作中,我们仅研究训练在TAPT和特定于任务的填充过程中嵌入层对模型性能的影响。基于我们的研究,我们提出了一种简单的方法,以通过对BERT层进行选择性预训练,使基于BERT的模型的中间步骤更有效。我们表明,在TAPT期间仅训练BERT嵌入层足以适应目标域的词汇并实现可比的性能。我们的方法在计算上是有效的,在TAPT期间训练了78%的参数。所提出的嵌入层列式方法也可以是一种有效的域适应技术。
translated by 谷歌翻译
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets. 1
translated by 谷歌翻译
转移学习已通过深度审慎的语言模型广泛用于自然语言处理,例如来自变形金刚和通用句子编码器的双向编码器表示。尽管取得了巨大的成功,但语言模型应用于小型数据集时会过多地适合,并且很容易忘记与分类器进行微调时。为了解决这个忘记将深入的语言模型从一个域转移到另一个领域的问题,现有的努力探索了微调方法,以减少忘记。我们建议DeepeMotex是一种有效的顺序转移学习方法,以检测文本中的情绪。为了避免忘记问题,通过从Twitter收集的大量情绪标记的数据来仪器进行微调步骤。我们使用策划的Twitter数据集和基准数据集进行了一项实验研究。 DeepeMotex模型在测试数据集上实现多级情绪分类的精度超过91%。我们评估了微调DeepeMotex模型在分类Emoint和刺激基准数据集中的情绪时的性能。这些模型在基准数据集中的73%的实例中正确分类了情绪。所提出的DeepeMotex-Bert模型优于BI-LSTM在基准数据集上的BI-LSTM增长23%。我们还研究了微调数据集的大小对模型准确性的影响。我们的评估结果表明,通过大量情绪标记的数据进行微调提高了最终目标任务模型的鲁棒性和有效性。
translated by 谷歌翻译
Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to forget the previous general knowledge acquired by general PLMs, which leads to a catastrophic forgetting phenomenon and sub-optimal performance. To alleviate this problem, we propose a new framework of General Memory Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM by a memory representation built from the frozen general PLM without losing any general knowledge. Specifically, we propose a new memory-augmented layer, and based on it, different augmented strategies are explored to build the memory representation and then adaptively fuse it into the domain-specific PLM. We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds (text classification, QA, NER) of tasks, and the extensive results show that the proposed G-MAP can achieve SOTA results on all tasks.
translated by 谷歌翻译
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining indomain (domain-adaptive pretraining) leads to performance gains, under both high-and low-resource settings. Moreover, adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. Finally, we show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable. Overall, we consistently find that multiphase adaptive pretraining offers large gains in task performance.
translated by 谷歌翻译
自然语言处理领域(NLP)最近看到使用预先接受训练的语言模型来解决几乎任何任务的大量变化。尽管对各种任务的基准数据集显示了很大的改进,但这些模型通常在非标准域中对临床领域的临床域进行次优,其中观察到预训练文件和目标文件之间的巨大差距。在本文中,我们的目标是通过对语言模型的域特定培训结束这种差距,我们调查其对多种下游任务和设置的影响。我们介绍了预先训练的Clin-X(临床XLM-R)语言模型,并展示了Clin-X如何通过两种语言的十个临床概念提取任务的大幅度优于其他预先训练的变压器模型。此外,我们展示了如何通过基于随机分裂和交叉句子上下文的集合来利用我们所提出的任务和语言 - 无人机模型架构进一步改善变压器模型。我们在低资源和转移设置中的研究显​​示,尽管只有250个标记的句子,但在只有250个标记的句子时,缺乏带注释数据的稳定模型表现。我们的结果突出了专业语言模型作为非标准域中的概念提取的Clin-X的重要性,但也表明我们的任务 - 无人机模型架构跨越测试任务和语言是强大的,以便域名或任务特定的适应不需要。 Clin-Xlanguage模型和用于微调和传输模型的源代码在https://github.com/boschresearch/clin\_x/和Huggingface模型集线器上公开使用。
translated by 谷歌翻译
GPT-2和BERT展示了在各种自然语言处理任务上使用预训练的语言模型(LMS)的有效性。但是,在应用于资源丰富的任务时,LM微调通常会遭受灾难性的遗忘。在这项工作中,我们引入了一个协同的培训框架(CTNMT),该框架是将预训练的LMS集成到神经机器翻译(NMT)的关键。我们提出的CTNMT包括三种技术:a)渐近蒸馏,以确保NMT模型可以保留先前的预训练知识; b)动态的开关门,以避免灾难性忘记预训练的知识; c)根据计划的政策调整学习步伐的策略。我们在机器翻译中的实验表明,WMT14英语 - 德语对的CTNMT获得了最高3个BLEU得分,甚至超过了先前的最先进的预培训辅助NMT NMT的NMT。尽管对于大型WMT14英语法国任务,有400万句话,但我们的基本模型仍然可以显着改善最先进的变压器大型模型,超过1个BLEU得分。代码和模型可以从https://github.com/bytedance/neurst/tree/Master/Master/examples/ctnmt下载。
translated by 谷歌翻译
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a;Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
translated by 谷歌翻译
Recent advances in NLP are brought by a range of large-scale pretrained language models (PLMs). These PLMs have brought significant performance gains for a range of NLP tasks, circumventing the need to customize complex designs for specific tasks. However, most current work focus on finetuning PLMs on a domain-specific datasets, ignoring the fact that the domain gap can lead to overfitting and even performance drop. Therefore, it is practically important to find an appropriate method to effectively adapt PLMs to a target domain of interest. Recently, a range of methods have been proposed to achieve this purpose. Early surveys on domain adaptation are not suitable for PLMs due to the sophisticated behavior exhibited by PLMs from traditional models trained from scratch and that domain adaptation of PLMs need to be redesigned to take effect. This paper aims to provide a survey on these newly proposed methods and shed light in how to apply traditional machine learning methods to newly evolved and future technologies. By examining the issues of deploying PLMs for downstream tasks, we propose a taxonomy of domain adaptation approaches from a machine learning system view, covering methods for input augmentation, model optimization and personalization. We discuss and compare those methods and suggest promising future research directions.
translated by 谷歌翻译
深语模型在NLP域中取得了显着的成功。培养深层语言模型的标准方法是从大型未标记的语料库中雇用无监督的学习。但是,这种大型公司仅适用于广泛采用和高资源语言和域名。本研究提出了第一款深语型号DPRK-BERT为朝鲜语言。我们通过编制朝鲜语言的第一个未标记的语料库和微调预先存在的ROK语言模型来实现这一目标。我们将所提出的模型与现有方法进行比较,并显示两个DPRK数据集的显着改进。我们还提供了这种模型的交叉语言版本,其在两种韩语语言中产生了更好的泛化。最后,我们提供与朝鲜语言相关的各种NLP工具,这些工具将培养未来的研究。
translated by 谷歌翻译
NLP是与计算机或机器理解和解释人类语言的能力有关的人工智能和机器学习的一种形式。语言模型在文本分析和NLP中至关重要,因为它们允许计算机解释定性输入并将其转换为可以在其他任务中使用的定量数据。从本质上讲,在转移学习的背景下,语言模型通常在大型通用语料库上进行培训,称为预训练阶段,然后对特定的基本任务进行微调。结果,预训练的语言模型主要用作基线模型,该模型包含了对上下文的广泛掌握,并且可以进一步定制以在新的NLP任务中使用。大多数预训练的模型都经过来自Twitter,Newswire,Wikipedia和Web等通用领域的Corpora培训。在一般文本中训练的现成的NLP模型可能在专业领域效率低下且不准确。在本文中,我们提出了一个名为Securebert的网络安全语言模型,该模型能够捕获网络安全域中的文本含义,因此可以进一步用于自动化,用于许多重要的网络安全任务,否则这些任务将依靠人类的专业知识和繁琐的手动努力。 Securebert受到了我们从网络安全和一般计算域的各种来源收集和预处理的大量网络安全文本培训。使用我们提出的令牌化和模型权重调整的方法,Securebert不仅能够保留对一般英语的理解,因为大多数预训练的语言模型都可以做到,而且在应用于具有网络安全含义的文本时也有效。
translated by 谷歌翻译
Understanding customer feedback is becoming a necessity for companies to identify problems and improve their products and services. Text classification and sentiment analysis can play a major role in analyzing this data by using a variety of machine and deep learning approaches. In this work, different transformer-based models are utilized to explore how efficient these models are when working with a German customer feedback dataset. In addition, these pre-trained models are further analyzed to determine if adapting them to a specific domain using unlabeled data can yield better results than off-the-shelf pre-trained models. To evaluate the models, two downstream tasks from the GermEval 2017 are considered. The experimental results show that transformer-based models can reach significant improvements compared to a fastText baseline and outperform the published scores and previous models. For the subtask Relevance Classification, the best models achieve a micro-averaged $F1$-Score of 96.1 % on the first test set and 95.9 % on the second one, and a score of 85.1 % and 85.3 % for the subtask Polarity Classification.
translated by 谷歌翻译
预审前的语言模型在自然语言处理的各个领域都取得了成功,包括阅读理解任务。但是,当将机器学习方法应用于新域时,标记的数据可能并不总是可用。为了解决这个问题,我们使用对源域数据进行预处理的监督,以降低特定于域的下游任务的样本复杂性。我们通过将任务转移与域适应性相结合以微调验证的模型,而没有目标任务中的数据来评估特定于领域的阅读理解任务的零射击性能。我们的方法在4个域中的3个域中的下游域特异性阅读理解任务上超过了域自适应预测。
translated by 谷歌翻译
Seeking legal advice is often expensive. Recent advancements in machine learning for solving complex problems can be leveraged to help make legal services more accessible to the public. However, real-life applications encounter significant challenges. State-of-the-art language models are growing increasingly large, making parameter-efficient learning increasingly important. Unfortunately, parameter-efficient methods perform poorly with small amounts of data, which are common in the legal domain (where data labelling costs are high). To address these challenges, we propose parameter-efficient legal domain adaptation, which uses vast unsupervised legal data from public legal forums to perform legal pre-training. This method exceeds or matches the fewshot performance of existing models such as LEGAL-BERT on various legal tasks while tuning only approximately 0.1% of model parameters. Additionally, we show that our method can achieve calibration comparable to existing methods across several tasks. To the best of our knowledge, this work is among the first to explore parameter-efficient methods of tuning language models in the legal domain.
translated by 谷歌翻译
深层语言语言模型(LMS)如Elmo,BERT及其继任者通过预先训练单个模型来迅速缩放自然语言处理的景观,然后是任务特定的微调。此外,像XLM-R和MBERT这样的这种模型的多语言版本使得有希望的零射击交叉传输导致,可能在许多不足和资源不足的语言中实现NLP应用。由于此初步成功,预先接受的模型被用作“通用语言模型”作为不同任务,域和语言的起点。这项工作通过识别通用模型应该能够扩展的七个维度来探讨“普遍性”的概念,即同样良好或相当良好地执行,在不同的环境中有用。我们概述了当前支持这些维度的模型性能的当前理论和经验结果,以及可能有助于解决其当前限制的扩展。通过这项调查,我们为理解大规模上下文语言模型的能力和限制奠定了基础,并帮助辨别研究差距和未来工作的方向,使这些LMS包含多样化和公平的应用,用户和语言现象。
translated by 谷歌翻译
基于方面的情绪分析(ABSA)是一种文本分析方法,其定义了与特定目标相关的某些方面的意见的极性。 ABSA的大部分研究都是英文,阿拉伯语有少量的工作。最先前的阿拉伯语研究依赖于深度学习模型,主要依赖于独立于上下文的单词嵌入(例如,e.g.word2vec),其中每个单词都有一个独立于其上下文的固定表示。本文探讨了从预先培训的语言模型(如BERT)的上下文嵌入的建模功能,例如BERT,以及在阿拉伯语方面情感极度分类任务中使用句子对输入。特别是,我们开发一个简单但有效的基于伯特的神经基线来处理这项任务。根据三种不同阿拉伯语数据集的实验结果,我们的BERT架构与简单的线性分类层超出了最先进的作品。在Arabic Hotel评论数据库中实现了89.51%的准确性,73%的人类注册书评论数据集和阿拉伯新闻数据集的85.73%。
translated by 谷歌翻译
由于表现强劲,预用的语言模型已成为许多NLP任务的标准方法,但他们培训价格昂贵。我们提出了一个简单高效的学习框架TLM,不依赖于大规模预制。给定一些标记的任务数据和大型常规语料库,TLM使用任务数据作为查询来检索一般语料库的微小子集,并联合优化任务目标和从头开始的语言建模目标。在四个域中的八个分类数据集上,TLM实现了比预用语言模型(例如Roberta-Light)更好地或类似的结果,同时减少了两个数量级的训练拖鞋。高精度和效率,我们希望TLM将有助于民主化NLP并加快发展。
translated by 谷歌翻译
The field of cybersecurity is evolving fast. Experts need to be informed about past, current and - in the best case - upcoming threats, because attacks are becoming more advanced, targets bigger and systems more complex. As this cannot be addressed manually, cybersecurity experts need to rely on machine learning techniques. In the texutual domain, pre-trained language models like BERT have shown to be helpful, by providing a good baseline for further fine-tuning. However, due to the domain-knowledge and many technical terms in cybersecurity general language models might miss the gist of textual information, hence doing more harm than good. For this reason, we create a high-quality dataset and present a language model specifically tailored to the cybersecurity domain, which can serve as a basic building block for cybersecurity systems that deal with natural language. The model is compared with other models based on 15 different domain-dependent extrinsic and intrinsic tasks as well as general tasks from the SuperGLUE benchmark. On the one hand, the results of the intrinsic tasks show that our model improves the internal representation space of words compared to the other models. On the other hand, the extrinsic, domain-dependent tasks, consisting of sequence tagging and classification, show that the model is best in specific application scenarios, in contrast to the others. Furthermore, we show that our approach against catastrophic forgetting works, as the model is able to retrieve the previously trained domain-independent knowledge. The used dataset and trained model are made publicly available
translated by 谷歌翻译
变压器负责自然语言处理的绝大多数近期进步。这些模型的大多数实际的自然语言处理应用程序通常通过转移学习启用。本文研究了用于微调用于微调的特异性标记提高了模型的结果。通过一系列实验,我们证明这种令牌化与词汇令牌的初始化和微调策略相结合,加速了转移并提高了微调模型的性能。我们称之为转让促进词汇转移的这个方面。
translated by 谷歌翻译
随着预培训的语言模型变得更加要求资源,因此资源丰富的语言(例如英语和资源筛选)语言之间的不平等正在恶化。这可以归因于以下事实:每种语言中的可用培训数据量都遵循幂律分布,并且大多数语言都属于分布的长尾巴。一些研究领域试图缓解这个问题。例如,在跨语言转移学习和多语言培训中,目标是通过从资源丰富的语言中获得的知识使长尾语言受益。尽管成功,但现有工作主要集中于尝试尽可能多的语言。结果,有针对性的深入分析主要不存在。在这项研究中,我们专注于单一的低资源语言,并使用跨语性培训(XPT)进行广泛的评估和探测实验。为了使转移方案具有挑战性,我们选择韩语作为目标语言,因为它是一种孤立的语言,因此与英语几乎没有类型的分类。结果表明,XPT不仅优于表现或与单语模型相当,该模型训练有大小的数据,而且在传输过程中也很高。
translated by 谷歌翻译