Most NER methods rely on extensive labeled data for model training and therefore struggle in low-resource scenarios with limited training data. Compared with resource-rich source domains, existing dominant approaches usually face the challenge that the target domain has a different label set, which can be characterized as class transfer and domain transfer. In this paper, we propose a lightweight tuning paradigm for low-resource NER via pluggable prompting (LightNER). Specifically, we construct a unified learnable verbalizer of entity categories to generate the entity span sequence and entity categories without any label-specific classifiers, thus addressing the class transfer problem. We further propose a pluggable guidance module by incorporating learnable parameters into the self-attention layers as guidance, which can re-modulate the attention and adapt the pre-trained weights. Note that we only tune those inserted modules with all parameters of the pre-trained language model fixed, which makes our approach lightweight and flexible for low-resource scenarios and better at transferring knowledge across domains. Experimental results show that LightNER obtains comparable performance in the standard supervised setting and outperforms strong baselines in low-resource settings. The code is available at https://github.com/zjunlp/deepke/tree/main/main/example/ner/few-shot.
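To make the guidance idea concrete, here is a minimal, hedged sketch in the spirit of the description above: learnable prefix key/value vectors are injected into a self-attention layer while the pre-trained weights stay frozen. This is an illustrative prefix-tuning-style stand-in, not the authors' implementation; all names and shapes are assumptions.

```python
# Sketch of a pluggable guidance module: learnable prefix keys/values are
# concatenated to the frozen layer's keys/values, re-modulating attention.
import torch
import torch.nn as nn

class PluggableGuidance(nn.Module):
    """Learnable prefix keys/values injected into one attention layer."""
    def __init__(self, prefix_len: int, num_heads: int, head_dim: int):
        super().__init__()
        self.prefix_k = nn.Parameter(torch.randn(num_heads, prefix_len, head_dim) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(num_heads, prefix_len, head_dim) * 0.02)

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, seq, head_dim) produced by the frozen PLM layer
        bsz = q.size(0)
        k = torch.cat([self.prefix_k.unsqueeze(0).expand(bsz, -1, -1, -1), k], dim=2)
        v = torch.cat([self.prefix_v.unsqueeze(0).expand(bsz, -1, -1, -1), v], dim=2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return attn @ v  # attention re-modulated by the learnable prefix

guidance = PluggableGuidance(prefix_len=10, num_heads=12, head_dim=64)
q = k = v = torch.randn(2, 12, 16, 64)
print(guidance(q, k, v).shape)  # torch.Size([2, 12, 16, 64])
# In practice only the guidance parameters are optimized; the PLM stays frozen:
# for p in plm.parameters(): p.requires_grad_(False)
```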
Pre-trained Language Models (PLMs) have been applied to NLP tasks and achieve promising results. Nevertheless, the fine-tuning procedure needs labeled data from the target domain, making it difficult to learn in low-resource scenarios where labeling is non-trivial. To address these challenges, we propose Prompt-based Text Entailment (PTE) for low-resource named entity recognition, which better leverages knowledge in the PLMs. We first reformulate named entity recognition as a text entailment task. The original sentence with entity type-specific prompts is fed into PLMs to obtain entailment scores for each candidate. The entity type with the top score is then selected as the final label. We then inject tagging labels into the prompts and treat words as basic units instead of n-gram spans, reducing the time complexity of generating candidates by n-gram enumeration. Experimental results demonstrate that the proposed method PTE achieves competitive performance on the CoNLL03 dataset and performs better than fine-tuned counterparts on the MIT Movie and Few-NERD datasets in low-resource settings.
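A hedged sketch of the entailment-style scoring described above, using an off-the-shelf NLI checkpoint from Hugging Face; the prompt wording and candidate span are illustrative choices, not the paper's exact templates.

```python
# Score entity-type hypotheses against the sentence with an NLI model and
# pick the type whose entailment probability is highest.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

# Look up the entailment class index from the checkpoint's own label map.
ent_id = {v.lower(): k for k, v in model.config.id2label.items()}["entailment"]

sentence = "Steve Jobs founded Apple in Cupertino."
candidate = "Apple"
entity_types = ["person", "organization", "location"]

scores = {}
for t in entity_types:
    hypothesis = f"{candidate} is a {t} entity."   # type-specific prompt (assumed wording)
    inputs = tokenizer(sentence, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    scores[t] = logits.softmax(-1)[0, ent_id].item()

print(max(scores, key=scores.get))  # the type with the top entailment score
```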
Few-shot named entity recognition (NER) is essential for entity tagging in domains with limited labeled resources and has therefore received considerable attention in recent years. Existing few-shot approaches are mainly evaluated under in-domain settings. In contrast, little is known about how these inherently faithful models perform on cross-domain NER using only a few labeled in-domain examples. This paper proposes a two-step, rationale-centric data augmentation method to improve the model's generalization ability. Results on several datasets show that our model-agnostic method significantly improves performance on cross-domain NER tasks compared with previous state-of-the-art methods, including counterfactual data augmentation and prompt-tuning methods. Our code is available at https://github.com/lifan-yuan/factmix.
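As a toy illustration of rationale-centric augmentation in this spirit, the sketch below keeps the labeled entity tokens (the "rationales") fixed and perturbs only non-entity context words to create additional training examples. The substitution strategy shown (random swaps from a small context vocabulary) is a simplification, not the paper's exact procedure.

```python
# Toy context-word augmentation: entity tokens and labels are left untouched.
import random

def augment(tokens, labels, context_vocab, ratio=0.3, seed=0):
    rng = random.Random(seed)
    new_tokens = list(tokens)
    for i, (tok, lab) in enumerate(zip(tokens, labels)):
        if lab == "O" and rng.random() < ratio:   # only touch non-entity tokens
            new_tokens[i] = rng.choice(context_vocab)
    return new_tokens, list(labels)               # entity labels are unchanged

tokens = ["John", "flew", "to", "Paris", "yesterday"]
labels = ["B-PER", "O", "O", "B-LOC", "O"]
print(augment(tokens, labels, context_vocab=["traveled", "drove", "today", "quickly"]))
```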
Transferring knowledge from one domain to another is crucial for many tasks in natural language processing, especially when the amount of available data in the target domain is limited. In this work, we propose a novel domain adaptation approach in the context of named entity recognition (NER). We propose a two-step approach consisting of a variable base module and a template module that leverages the knowledge captured in pre-trained language models with the help of simple descriptive patterns. Our approach is simple yet versatile and can be applied in both few-shot and zero-shot settings. Evaluating our lightweight approach across a number of different datasets shows that it can boost the performance of state-of-the-art baselines by 2-5% in F1 score.
Prompting methods are regarded as one of the key advances in few-shot natural language processing. Recent research on prompting has moved from discrete-token-based "hard prompts" to continuous "soft prompts", which employ learnable vectors as pseudo prompt tokens and achieve better performance. Although they show promising prospects, these soft-prompting methods have been observed to rely heavily on good initialization to be effective. Unfortunately, obtaining a perfect initialization for soft prompts requires understanding of the language model's inner workings and careful design, which is no easy task and has to be restarted from scratch for each new task. To address this issue, we propose a generalized soft prompting method called MetaPrompting, which adopts the well-recognized model-agnostic meta-learning algorithm to automatically find a better prompt initialization that facilitates fast adaptation to new prompting tasks. Extensive experiments show that MetaPrompting tackles the soft prompt initialization problem and brings significant improvements on four different datasets (over 6 points of accuracy in the 1-shot setting), achieving new state-of-the-art performance.
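The following pseudo-code-level sketch shows how a MAML-style outer loop could learn a soft-prompt initialization, as described above. Here `tasks` and `model` are hypothetical placeholders (an episode sampler and a frozen PLM whose loss takes a prepended soft prompt), not a real library API, and the update rule is a simplified single inner step.

```python
# MAML-style meta-learning of a soft prompt initialization (illustrative only).
import torch

def meta_learn_prompt_init(model, tasks, prompt_len=20, dim=768,
                           inner_lr=1e-2, outer_lr=1e-3, meta_steps=1000):
    prompt_init = torch.randn(prompt_len, dim, requires_grad=True)
    meta_opt = torch.optim.Adam([prompt_init], lr=outer_lr)
    for _ in range(meta_steps):
        support, query = tasks.sample()                    # hypothetical sampler
        # Inner loop: adapt the prompt on the support set of one episode.
        loss_s = model.loss(prompt_init, support)          # hypothetical interface
        grad = torch.autograd.grad(loss_s, prompt_init, create_graph=True)[0]
        adapted = prompt_init - inner_lr * grad
        # Outer loop: the adapted prompt should do well on the query set.
        loss_q = model.loss(adapted, query)
        meta_opt.zero_grad()
        loss_q.backward()
        meta_opt.step()
    return prompt_init.detach()  # initialization reused for new prompting tasks
```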
Neural table-to-text generation approaches are data-hungry, which limits their adaptation to low-resource real-world applications. Previous work mostly resorts to pre-trained language models (PLMs) to generate fluent summaries of a table. However, they often contain hallucinated content due to the uncontrolled nature of PLMs. Moreover, the topological differences between tables and sequences are rarely studied. Last but not least, fine-tuning a PLM on only a few instances may lead to over-fitting and catastrophic forgetting. To alleviate these problems, we propose a prompt-based approach, the Prefix-Controlled Generator (PCG), for few-shot table-to-text generation. We prepend a task-specific prefix for the PLM to make the table structure better fit the pre-trained input. In addition, we generate an input-specific prefix to control the factual content and word order of the generated text. Both automatic and human evaluations on different domains (humans, books and songs) of the WikiBio dataset show substantial improvements over baseline approaches.
Few-shot named entity recognition (NER) targets generalizing to unseen labels and/or domains with few labeled examples. Existing metric learning methods compute token-level similarities between query and support sets, but are not able to fully incorporate label semantics into modeling. To address this issue, we propose a simple method to largely improve metric learning for NER: 1) multiple prompt schemas are designed to enhance label semantics; 2) we propose a novel architecture to effectively combine multiple prompt-based representations. Empirically, our method achieves new state-of-the-art (SOTA) results under 16 of the 18 considered settings, substantially outperforming the previous SOTA by an average of 8.84% and a maximum of 34.51% in relative gains of micro F1. Our code is available at https://github.com/AChen-qaq/ProML.
Controllable Text Generation (CTG) is an emerging area in the field of natural language generation (NLG). It is regarded as crucial for the development of advanced text generation technologies that are more natural and better meet the specific constraints of practical applications. In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse and fluent text. However, due to the low interpretability of deep neural networks, the controllability of these methods needs to be guaranteed. To this end, controllable text generation using transformer-based PLMs has become a rapidly growing yet challenging research hotspot. A diverse range of approaches has emerged in the last 3-4 years, targeting different CTG tasks that may require different types of controlled constraints. In this paper, we present a systematic critical review of the common tasks, main approaches, and evaluation methods in this area. Finally, we discuss the challenges the field is facing and put forward various promising future directions. To the best of our knowledge, this is the first survey paper to summarize CTG techniques from the perspective of PLMs. We hope it can help researchers in related fields to quickly track the academic frontier, providing them with a landscape of the area and a roadmap for future research.
Pre-trained models have been shown to be effective in many code intelligence tasks. These models are pre-trained on large-scale unlabeled corpora and then fine-tuned on downstream tasks. However, because the inputs of pre-training and downstream tasks take different forms, it is hard to fully exploit the knowledge of pre-trained models. Moreover, the performance of fine-tuning strongly depends on the amount of downstream data, whereas in practice scenarios with scarce data are common. Recent studies in natural language processing (NLP) show that prompt tuning, a new tuning paradigm, alleviates the above issues and achieves promising results on various NLP tasks. In prompt tuning, the prompts inserted during tuning provide task-specific knowledge, which is especially beneficial for tasks with relatively little data. In this paper, we empirically evaluate the usage and effect of prompt tuning on code intelligence tasks. We conduct prompt tuning on the popular pre-trained models CodeBERT and CodeT5 and experiment with three code intelligence tasks: defect prediction, code summarization, and code translation. Our experimental results show that prompt tuning consistently outperforms fine-tuning on all three tasks. In addition, prompt tuning shows great potential in low-resource scenarios, e.g., improving the BLEU score of fine-tuning by more than 26% on average for code summarization. Our results suggest that, instead of fine-tuning, we can adapt prompt tuning for code intelligence tasks to achieve better performance, especially when task-specific data is scarce.
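A hedged illustration of what hard-prompt formatting for code summarization can look like with CodeT5. The template wording is an assumption, and the base checkpoint shown here is not fine-tuned for summarization, so its output is for illustration only; it is not the evaluation setup of the paper.

```python
# Wrap a code snippet in a natural-language prompt and decode with CodeT5.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

code = "def add(a, b):\n    return a + b"
prompt = f"Summarize the following Python function: {code} Summary:"  # assumed template

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
```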
Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance, yet they still follow a parametric learning paradigm, in which forgetting and rote memorization may lead to unstable generalization. Specifically, vanilla prompt learning may struggle to exploit atypical instances by rote memorization during fully supervised training, or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt, motivated by decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge store from training instances and implements a retrieval mechanism during input, training, and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt achieves better performance in both few-shot and zero-shot settings. Furthermore, we show that RetroPrompt yields better generalization ability on new datasets. Our detailed analysis of memorization shows that RetroPrompt can indeed reduce the reliance of language models on memorization, thereby improving generalization on downstream tasks.
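A minimal sketch of the retrieval idea, assuming a simple embedding-based datastore over training instances; the encoder, similarity metric, and prompt format below are illustrative choices, not the paper's exact design.

```python
# Build an "open-book" store of training texts, retrieve neighbors for a query,
# and prepend them to the prompt before it is fed to the PLM.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

train_texts = ["the movie was wonderful", "a dull and tedious film", "great acting throughout"]

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch).last_hidden_state[:, 0]   # [CLS] vectors
    return F.normalize(out, dim=-1)

store = embed(train_texts)                           # open-book knowledge store

def retrieve(query: str, k: int = 2):
    sims = (embed([query]) @ store.T).squeeze(0)
    return [train_texts[i] for i in sims.topk(k).indices.tolist()]

query = "an absolutely brilliant performance"
prompt = " [SEP] ".join(retrieve(query)) + " [SEP] " + query + " It was [MASK]."
print(prompt)  # retrieved neighbors augment the prompt as extra context
```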
The spread of rumors alongside breaking events seriously hinders the truth in the era of social media. Previous studies reveal that, due to the lack of annotated resources, rumors presented in minority languages are hard to detect. Furthermore, unforeseen breaking events not covered in yesterday's news exacerbate the scarcity of data resources. In this work, we propose a novel zero-shot framework based on prompt learning to detect rumors falling in different domains or presented in different languages. More specifically, we first represent a rumor circulating on social media as diverse propagation threads, then design a hierarchical prompt encoding mechanism to learn language-agnostic contextual representations for both prompts and rumor data. To further enhance domain adaptation, we model domain-invariant structural features from the propagation threads to incorporate structural position representations of influential community responses. In addition, a new virtual response augmentation method is used to improve model training. Extensive experiments conducted on three real-world datasets demonstrate that our proposed model achieves much better performance than state-of-the-art methods and exhibits a superior capacity for detecting rumors at early stages.
We present a new open-source and extensible knowledge extraction toolkit named DeepKE (Deep learning based Knowledge Extraction), which supports standard fully supervised, low-resource few-shot, and document-level scenarios. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction, and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured text according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementations for different tasks and scenarios, but also organizes all components within a consistent framework to maintain sufficient modularity and extensibility. In addition, we present an online platform at http://deepke.zjukg.cn/ for real-time extraction across various tasks. DeepKE is equipped with Google Colab tutorials and comprehensive documentation for beginners. We release the source code at https://github.com/zjunlp/deepke, together with a demo video.
This work introduces a new multi-task, parameter-efficient language model (LM) tuning method that learns to transfer knowledge across different tasks via a mixture of soft prompts, i.e., small prefix embedding vectors pre-trained for different tasks. Our method, called ATTEMPT (ATTEntional Mixtures of Prompt Tuning), obtains source prompts as encodings of large-scale source tasks into a small number of parameters and trains an attention module to interpolate the source prompts and a newly initialized target prompt for every instance in the target task. During training, only the target task prompt and the attention weights, which are shared between tasks in multi-task training, are updated, while the original LM and source prompts are kept intact. ATTEMPT is highly parameter-efficient (e.g., it updates 2,300 times fewer parameters than full fine-tuning) while achieving high task performance using knowledge from high-resource tasks. Moreover, it is modular thanks to its use of pre-trained soft prompts and can flexibly add or remove source prompts for effective knowledge transfer. Our experimental results across 21 diverse NLP datasets show that ATTEMPT significantly outperforms prompt tuning and outperforms or matches fully fine-tuned or other parameter-efficient tuning approaches that use over ten times more parameters. Finally, ATTEMPT outperforms previous work in few-shot learning settings.
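A sketch of the attentional prompt mixing described above; the shapes, the pooling choice, and the attention form are simplifying assumptions rather than the exact ATTEMPT architecture, and the source prompts are random here (in practice they would be pre-trained on source tasks and kept frozen).

```python
# Attention over frozen source prompts produces an instance-specific prompt,
# which is combined with a trainable target prompt.
import torch
import torch.nn as nn

class PromptMixer(nn.Module):
    def __init__(self, num_source: int, prompt_len: int, dim: int):
        super().__init__()
        # Frozen source prompts (pre-trained in practice, random for illustration).
        self.register_buffer("source_prompts", torch.randn(num_source, prompt_len, dim))
        # Trainable target prompt and attention projection.
        self.target_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, instance_repr: torch.Tensor) -> torch.Tensor:
        # instance_repr: (batch, dim) pooled encoding of the input instance
        q = self.query_proj(instance_repr)                       # (batch, dim)
        keys = self.source_prompts.mean(dim=1)                   # (num_source, dim)
        attn = torch.softmax(q @ keys.T / keys.size(-1) ** 0.5, dim=-1)
        mixed = torch.einsum("bs,sld->bld", attn, self.source_prompts)
        return mixed + self.target_prompt                        # instance-specific prompt

mixer = PromptMixer(num_source=6, prompt_len=10, dim=768)
print(mixer(torch.randn(4, 768)).shape)  # torch.Size([4, 10, 768])
```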
Recent advances in NLP have been brought about by a range of large-scale pre-trained language models (PLMs). These PLMs have yielded significant performance gains for a range of NLP tasks, circumventing the need to customize complex designs for specific tasks. However, most current work focuses on fine-tuning PLMs on domain-specific datasets, ignoring the fact that the domain gap can lead to overfitting and even performance drops. It is therefore practically important to find an appropriate method to effectively adapt PLMs to a target domain of interest. Recently, a range of methods have been proposed to achieve this purpose. Early surveys on domain adaptation are not suitable for PLMs because PLMs exhibit behavior that differs considerably from traditional models trained from scratch, and domain adaptation of PLMs needs to be redesigned to take effect. This paper aims to provide a survey of these newly proposed methods and shed light on how to apply traditional machine learning methods to newly evolved and future technologies. By examining the issues of deploying PLMs for downstream tasks, we propose a taxonomy of domain adaptation approaches from a machine learning system view, covering methods for input augmentation, model optimization, and personalization. We discuss and compare these methods and suggest promising future research directions.
We study the problem of few-shot fine-grained entity typing (FET), where only a few annotated entity mentions with contexts are given for each entity type. Recently, prompt-based tuning has demonstrated superior performance in few-shot scenarios by formulating the entity typing task as a "fill-in-the-blank" problem, which allows effective use of the strong language modeling capability of pre-trained language models (PLMs). Despite the success of current prompt-based tuning approaches, two major challenges remain: (1) the verbalizer in the prompt is either manually designed or constructed from external knowledge bases, without considering the target corpus and label hierarchy information, and (2) current approaches mainly utilize the representation power of PLMs but have not explored their generation power acquired through extensive general-domain pre-training. In this work, we propose a novel framework for few-shot FET consisting of two modules: (1) an entity type label interpretation module that automatically learns to relate type labels to the vocabulary by jointly leveraging the few-shot instances and the label hierarchy, and (2) a type-based contextualized instance generator that produces new instances based on the given instances to enlarge the training set for better generalization. On three benchmark datasets, our model outperforms existing methods by significant margins. The code can be found at https://github.com/teapot123/fine-graining-entity-typing.
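To illustrate the "fill-in-the-blank" formulation of entity typing, here is a small, hedged example with a masked language model; the template and the label-word lists are assumptions made for the sake of the example, not the learned verbalizer from the paper.

```python
# Entity typing as masked-token prediction: score label words at the [MASK]
# position and pick the type whose best label word scores highest.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-cased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-cased").eval()

sentence = "He joined Microsoft as an engineer in 2010."
mention = "Microsoft"
label_words = {
    "person": ["person"],
    "organization": ["company", "organization"],
    "location": ["city", "country"],
}

text = f"{sentence} In this sentence, {mention} is a {tok.mask_token}."
inputs = tok(text, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = mlm(**inputs).logits[0, mask_pos]

scores = {t: max(logits[tok.convert_tokens_to_ids(w)].item() for w in ws)
          for t, ws in label_words.items()}
print(max(scores, key=scores.get))  # likely "organization" for this example
```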
Recent parameter-efficient language model tuning (PELT) methods can match the performance of fine-tuning with far fewer trainable parameters and perform especially well when training data is limited. However, different PELT methods may perform rather differently on the same task, so choosing the most appropriate method for a specific task is non-trivial, especially given the rapidly growing number of new PELT methods and tasks. In light of model diversity and the difficulty of model selection, we propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns, via a gating mechanism, to activate the ones that best suit the current data or task setup. On the GLUE benchmark, UniPELT consistently achieves 1-4% gains over the best individual PELT method it incorporates, and its fusion even surpasses fine-tuning under different setups. Moreover, UniPELT generally surpasses the upper bound obtained by taking the best performance of all its submodules used individually on each task, indicating that a mixture of multiple PELT methods may be inherently more effective than single methods.
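A toy sketch of the gating idea: each parameter-efficient submodule produces a residual update of the hidden states, and learned gates scale how much each contributes. The submodules here are simplified stand-ins, not the exact adapter, prefix, and LoRA implementations combined in UniPELT.

```python
# Gated combination of two simplified parameter-efficient submodules.
import torch
import torch.nn as nn

class GatedPELT(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64, rank: int = 8):
        super().__init__()
        self.adapter = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(),
                                     nn.Linear(bottleneck, dim))
        self.low_rank = nn.Sequential(nn.Linear(dim, rank, bias=False),
                                      nn.Linear(rank, dim, bias=False))
        self.gate = nn.Linear(dim, 2)   # one gate per submodule, from the pooled input

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, dim) from the frozen backbone layer
        g = torch.sigmoid(self.gate(hidden.mean(dim=1)))       # (batch, 2)
        out = hidden
        out = out + g[:, 0, None, None] * self.adapter(hidden)
        out = out + g[:, 1, None, None] * self.low_rank(hidden)
        return out

layer = GatedPELT(dim=768)
print(layer(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```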
Pre-trained language models (PLMs) have achieved remarkable success on various natural language understanding tasks. On the other hand, simply fine-tuning PLMs can be suboptimal for domain-specific tasks, because PLMs cannot possibly cover knowledge from all domains. Although adaptive pre-training of PLMs can help them obtain domain-specific knowledge, it requires a large training cost. Moreover, adaptive pre-training can harm a PLM's performance on downstream tasks by causing catastrophic forgetting of its general knowledge. To overcome such limitations of adaptive pre-training for PLM adaptation, we propose a novel domain adaptation framework for PLMs, Knowledge-Augmented Language model Adaptation (KALA), which modulates the intermediate hidden representations of PLMs with domain knowledge consisting of entities and their relational facts. We validate the performance of KALA on question answering and named entity recognition tasks over multiple datasets across various domains. The results show that, despite being computationally efficient, KALA largely outperforms adaptive pre-training. The code is available at: https://github.com/nardien/kala/.
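A rough sketch of knowledge-conditioned feature modulation in the spirit of what is described above: entity embeddings from a domain knowledge graph produce scale and shift parameters applied to the PLM's intermediate hidden states. The shapes, the entity-to-token alignment, and the zero initialization are illustrative assumptions, not the exact KALA design.

```python
# Feature-wise modulation of hidden states by token-aligned entity embeddings.
import torch
import torch.nn as nn

class KnowledgeModulation(nn.Module):
    def __init__(self, hidden_dim: int, entity_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(entity_dim, hidden_dim)
        self.to_shift = nn.Linear(entity_dim, hidden_dim)
        # Zero init so the modulation starts as an identity mapping.
        for layer in (self.to_scale, self.to_shift):
            nn.init.zeros_(layer.weight)
            nn.init.zeros_(layer.bias)

    def forward(self, hidden: torch.Tensor, entity_emb: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_dim); entity_emb: (batch, seq, entity_dim),
        # with zeros for tokens that are not linked to any entity.
        gamma = 1.0 + self.to_scale(entity_emb)
        beta = self.to_shift(entity_emb)
        return gamma * hidden + beta

mod = KnowledgeModulation(hidden_dim=768, entity_dim=100)
h = torch.randn(2, 16, 768)
e = torch.randn(2, 16, 100)
print(mod(h, e).shape)  # torch.Size([2, 16, 768]); identity at initialization
```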
We present Pre-trained Machine Reader (PMR), a novel method to retrofit Pre-trained Language Models (PLMs) into Machine Reading Comprehension (MRC) models without acquiring labeled data. PMR is capable of resolving the discrepancy between model pre-training and downstream fine-tuning of existing PLMs, and provides a unified solver for tackling various extraction tasks. To achieve this, we construct a large volume of general-purpose and high-quality MRC-style training data with the help of Wikipedia hyperlinks and design a Wiki Anchor Extraction task to guide the MRC-style pre-training process. Although conceptually simple, PMR is particularly effective in solving extraction tasks including Extractive Question Answering and Named Entity Recognition, where it shows tremendous improvements over previous approaches, especially under low-resource settings. Moreover, viewing the sequence classification task as a special case of extraction in our MRC formulation, PMR is even capable of extracting high-quality rationales to explain the classification process, providing more explainability for its predictions.
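To make the MRC-style formulation of extraction tangible, the following hedged example casts NER as extractive question answering: an entity type becomes a question and the answer span is the entity. The checkpoint and question templates are generic stand-ins, not PMR itself.

```python
# NER as extractive QA with an off-the-shelf reading comprehension model.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = "Barack Obama visited Berlin in 2013."
questions = {
    "person": "Which person is mentioned?",
    "location": "Which location is mentioned?",
}

for etype, q in questions.items():
    ans = qa(question=q, context=context)
    print(etype, ans["answer"], round(ans["score"], 3))
```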
Parameter-efficient methods (such as prompts or adapters) for adapting pre-trained language models to downstream tasks have become popular recently. However, hindrances still prevent these methods from reaching their full potential. For example, two significant challenges are few-shot adaptation and cross-task generalization ability. To tackle these issues, we propose a general framework to enhance the few-shot adaptation and cross-domain generalization ability of parameter-efficient methods. In our framework, we prime the self-supervised model for parameter-efficient methods to rapidly adapt to various downstream few-shot tasks. To evaluate the authentic generalization ability of these parameter-efficient methods, we conduct experiments on a few-shot cross-domain benchmark containing 160 diverse NLP tasks. The experimental results reveal that priming by tuning only the PLM with extra training tasks leads to the best performance. We also perform a comprehensive analysis of various parameter-efficient methods under few-shot cross-domain scenarios.
Abstractive summarization is the process of generating a summary given a document as input. Although significant progress has been made, factual inconsistency between the document and the generated summary still limits its practical applications. Previous work found that the probabilities assigned by the generation model reflect its preferences for the generated summary, including the preference for factual consistency as well as the preference for the language or knowledge prior. To separate out the preference for factual consistency, we propose an unsupervised framework named CoP that controls the preference of the generation model with the help of a prompt. More specifically, the framework performs an extra inference step in which a text prompt is introduced as an additional input. In this way, another preference is described by the generation probability of this extra inference process. The difference between the above two preferences, i.e., the difference between the probabilities, can be used as a measurement for detecting factual inconsistencies. Interestingly, we found that with a properly designed prompt, our framework can evaluate specific preferences and serve as a measurement for fine-grained categories of inconsistency, such as entity-related inconsistency, coreference-related inconsistency, etc. Moreover, our framework can also be extended to the supervised setting to learn better prompts from labeled data. Experiments show that our framework achieves new SOTA results on three factual inconsistency detection tasks.
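A hedged sketch of the preference-difference idea: score the same summary under the generation model with and without an extra text prompt appended to the input, and use the gap between the two log-probabilities as an inconsistency signal. The checkpoint, the prompt construction (appending the candidate summary to the document), and the use of average sequence log-probability are illustrative simplifications of the described framework.

```python
# Compare the model's preference for a summary with and without an extra prompt.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn").eval()

def avg_logprob(source: str, summary: str) -> float:
    enc = tok(source, return_tensors="pt", truncation=True)
    labels = tok(summary, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**enc, labels=labels).loss     # mean token negative log-likelihood
    return -loss.item()

document = "The company reported record profits of 2 billion dollars in 2021."
summary = "The company reported record losses in 2021."

base = avg_logprob(document, summary)
prompted = avg_logprob(document + " " + summary, summary)   # extra inference with prompt
print(prompted - base)  # the probability gap serves as an inconsistency signal
```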