在本文中,我们描述了我们参与Case-2022的子任务1,即与休闲新闻语料库的事件因果关系识别。我们通过在少数带注释的示例(即几次配置)上利用一组简单但互补的技术来解决因果关系识别(CRI)任务。我们遵循一种基于迅速的预测方法,用于微调LMS,其中CRI任务被视为掩盖语言建模问题(MLM)。这种方法允许LMS在MLM问题上进行本地预先训练,可以直接生成对CRI特异性提示的文本响应。我们将此方法的性能与在整个数据集中训练的集合技术进行比较。我们表现​​最佳的提交仅接受了每班256个实例,整个数据集的一小部分培训,但能够获得第二好的精度(0.82),第三好的精度(0.82)和F1得分。 (0.85)非常接近获胜者团队(0.86)的报道。
translated by 谷歌翻译
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF-better few-shot fine-tuning of language models 1 -a suite of simple and complementary techniques for finetuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning. 2 * The first two authors contributed equally. 1 Alternatively, language models' best friends forever. 2 Our implementation is publicly available at https:// github.com/princeton-nlp/LM-BFF.
translated by 谷歌翻译
最近,与“预训练,及时和预测”的新范式相比,与“预训练,微调”范式相比,新的范式“预训练,及时和预测”取得了显着的成就。在基于及时的GPT-3成功之后,一系列基于蒙版的语言模型(MLM)(例如Bert,Roberta)及时学习方法变得流行并广泛使用。但是,另一个有效的预训练的判别模型Electra可能被忽略了。在本文中,我们尝试使用拟议的替换代替令牌检测(RTD)基于基于的及时学习方法来完成零摄像的几个NLP任务。实验结果表明,基于RTD-Prompt学习的Electra模型可达到令人惊讶的最先进的零拍性能。在数字上,与MLM-Roberta-Large和MLM-Bert-Large相比,我们的RTD-Electra-Large在所有15个任务上平均提高了约8.4%和13.7%。特别是在SST-2任务上,我们的RTD-Electra-Large在没有任何培训数据的情况下达到了令人惊讶的90.1%精度。总体而言,与预先训练的蒙版语言模型相比,预先训练的代替令牌检测模型在零拍学习中的性能更好。因此,Electra是一位出色的零球学习者。源代码可在以下网址获得:https://github.com/nishiwen1214/rtd-electra。
translated by 谷歌翻译
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations. This form of "black-box" classifier training has become increasingly important as the cost of training and inference in large-scale LMs grows. But existing black-box LM classifier learning approaches are themselves computationally inefficient, typically specializing LMs to the target task by searching in a large space of (discrete or continuous) prompts using zeroth-order optimization methods. Instead of directly optimizing in prompt space, PromptBoosting obtains a small pool of prompts via a gradient-free approach and then constructs a large pool of weak learners by pairing these prompts with different elements of the LM's output distribution. These weak learners are then ensembled using the AdaBoost algorithm. The entire learning process requires only a small number of forward passes and no backward pass. Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
translated by 谷歌翻译
The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fillin-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge, however, its usage is limited by the manual effort and guesswork required to write suitable prompts. To address this, we develop AUTOPROMPT, an automated method to create prompts for a diverse set of tasks, based on a gradient-guided search. Using AUTO-PROMPT, we show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning, sometimes achieving performance on par with recent state-of-the-art supervised models. We also show that our prompts elicit more accurate factual knowledge from MLMs than the manually created prompts on the LAMA benchmark, and that MLMs can be used as relation extractors more effectively than supervised relation extraction models. These results demonstrate that automatically generated prompts are a viable parameter-free alternative to existing probing methods, and as pretrained LMs become more sophisticated and capable, potentially a replacement for finetuning.
translated by 谷歌翻译
提示将下游应用程序作为语言建模任务施放,与使用预训练的模型进行标准微调相比,已显示出样本有效的效率。但是,提示的一个陷阱是需要手动设计的模式,其结果可能是不直觉的,需要大量的验证集来调整。为了应对挑战,我们提出了一种全自动提示方法Autoseq:(1)我们在序列到序列模型上采用自然语言提示,从而实现自由形式生成和更大的标签搜索空间; (2)我们提出了标签序列 - 无限长度的短语以口头表达标签 - 这消除了手动模板的需求,并且比单个标签单词更具有表现力; (3)我们使用Beam Search自动生成大量的标签序列候选物,并提出对比度重新排列以获得最佳组合。 Autoseq显着胜过其他无手动设计方法,例如软提示调整,适配器调整和自动搜索单个标签单词;生成的标签序列比各种任务上的精选手动序列更好。我们的方法揭示了几次学习中序列模型的潜力,并阐明了通用通用和自动提示的途径。本文的源代码可以从https://github.com/thunlp/seq2seq-prompt获得。
translated by 谷歌翻译
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a;Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
translated by 谷歌翻译
大型预训练的语言模型(PLM)的最新进展导致了自然语言理解(NLU)任务的令人印象深刻的增长,并具有特定于任务的微调。但是,直接调整PLM在很大程度上依赖大量的标记实例,这些实例通常很难获得。迅速对PLM的调整已被证明对各种少数次任务很有价值。现有的作品研究基于迅速的NLU任务的基于及时的调整,主要集中于用语言器来得出正确的标签单词或生成及时的模板,以从PLM中启发语义。此外,还对常规数据增强方法进行了验证,可用于少量射击任务。但是,目前几乎没有针对基于及时的调整范式设计的数据增强方法。因此,我们研究了迅速的少数射击学习者的新数据增强问题。由于标签语义对于迅速的调整至关重要,因此我们提出了一种新颖的标签引导数据增强方法促进DA,该方法利用了丰富的标签语义信息以进行数据增强。很少的文本分类任务的广泛实验结果表明,我们提出的框架通过有效利用标签语义和数据扩展来实现自然语言理解来实现卓越的性能。
translated by 谷歌翻译
Pre-trained large language models can efficiently interpolate human-written prompts in a natural way. Multitask prompted learning can help generalization through a diverse set of tasks at once, thus enhancing the potential for more effective downstream fine-tuning. To perform efficient multitask-inference in the same batch, parameter-efficient fine-tuning methods such as prompt tuning have been proposed. However, the existing prompt tuning methods may lack generalization. We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning. The novel component of SPT is a memory bank from where memory prompts are retrieved based on discrete prompts. Extensive experiments, such as (i) fine-tuning a full language model with SPT on 31 different tasks from 8 different domains and evaluating zero-shot generalization on 9 heldout datasets under 5 NLP task categories and (ii) pretraining SPT on the GLUE datasets and evaluating fine-tuning on the SuperGLUE datasets, demonstrate effectiveness of SPT.
translated by 谷歌翻译
How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks: including text classification, text entailment, similar text retrieval, and paraphrasing. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by big margins, with absolute gains of 12.8% in accuracy on text classification and 18.9% on the GLUE benchmark.
translated by 谷歌翻译
已显示迅速学习可以在大多数文本分类任务中实现近调调节性能,但很少有培训示例。对于样品稀缺的NLP任务是有利的。在本文中,我们试图将其应用于实际情况,即恢复信息提取,并增强现有方法,以使其更适用于简历信息提取任务。特别是,我们根据简历的文本特征创建了多组手动模板和语言器。此外,我们比较了蒙版语言模型(MLM)预培训语言模型(PLM)和SEQ2SEQ PLM在此任务上的性能。此外,我们改进了口头设计的设计方法,用于知识渊博的及时调整,以便为其他基于应用程序的NLP任务的迅速模板和语言设计的设计提供了示例。在这种情况下,我们提出了手动知识渊博的语言器(MKV)的概念。构造与应用程序方案相对应的知识渊博的口头表的规则。实验表明,基于我们的规则设计的模板和言语器比现有的手动模板更有效,更强大,并自动生成及时方法。已经确定,当前可用的自动提示方法无法与手动设计的及时模板竞争一些现实的任务方案。最终混淆矩阵的结果表明,我们提出的MKV显着解决了样本不平衡问题。
translated by 谷歌翻译
最紧迫的社会问题之一是与虚假新闻的斗争。虚假的主张很难暴露,造成了很多损害。为了解决这个问题,事实验证变得至关重要,因此是不同研究社区中感兴趣的话题。仅使用数据的文本形式,我们建议解决问题的解决方案,并通过其他方法实现竞争结果。我们基于两种方法(基于训练的语言模型)基于两种方法和基于提示的方法提供解决方案。基于PLM的方法使用传统的监督学习,其中训练模型以“ X”为输入和输出预测为P(Y | X)。鉴于,基于及时的学习反映了设计输入以适合模型的想法,以便可以将原始目标重新构成(掩盖)语言建模的问题。我们可能会进一步刺激PLM提供的丰富知识,以通过采用额外提示来微调PLM,以更好地完成下游任务。我们的实验表明,所提出的方法的性能不仅仅是微调PLM。我们在Trancify数据集中获得了0.6946的F1分数,在比赛负责人板上获得了第七名。
translated by 谷歌翻译
Pre-trained Language Models (PLMs) have been applied in NLP tasks and achieve promising results. Nevertheless, the fine-tuning procedure needs labeled data of the target domain, making it difficult to learn in low-resource and non-trivial labeled scenarios. To address these challenges, we propose Prompt-based Text Entailment (PTE) for low-resource named entity recognition, which better leverages knowledge in the PLMs. We first reformulate named entity recognition as the text entailment task. The original sentence with entity type-specific prompts is fed into PLMs to get entailment scores for each candidate. The entity type with the top score is then selected as final label. Then, we inject tagging labels into prompts and treat words as basic units instead of n-gram spans to reduce time complexity in generating candidates by n-grams enumeration. Experimental results demonstrate that the proposed method PTE achieves competitive performance on the CoNLL03 dataset, and better than fine-tuned counterparts on the MIT Movie and Few-NERD dataset in low-resource settings.
translated by 谷歌翻译
语言模型(LMS)已被证明在各种下游应用程序中很有用,例如摘要,翻译,问答和文本分类。由于它们可以存储的大量信息,LMS正在成为人工智能中越来越重要的工具。在这项工作中,我们提出了道具(提示为探测),该道具利用GPT-3(最初由OpenAI在2020年提出的大型语言模型)来执行知识基础构建任务(KBC)。 Prop实施了一种多步骤方法,该方法结合了各种提示技术来实现这一目标。我们的结果表明,手动提示策划是必不可少的,必须鼓励LM给出可变长度的答案集,特别是包括空的答案集,True/False问题是提高LM生成的建议精度的有用设备。 LM的大小是至关重要的因素,并且实体字典别名提高了LM评分。我们的评估研究表明,这些提出的技术可以大大提高最终预测的质量:Prop赢得了LM-KBC竞争的轨道2,表现优于基线36.4个百分点。我们的实施可在https://github.com/hemile/iswc-challenge上获得。
translated by 谷歌翻译
及时调整是将预训练模型调整到下游任务的极其有效的工具。但是,基于标准及时的方法主要考虑下游任务的足够数据的情况。目前尚不清楚是否可以将优势传输到几杆式制度,在每个下游任务中只有有限的数据。尽管有些作品证明了在几次弹奏设置下及时调整的潜力,但通过搜索离散提示或使用有限数据调整软提示的主流方法仍然非常具有挑战性。通过广泛的实证研究,我们发现迅速调整和完全微调之间的学习差距仍然存在差距。为了弥合差距,我们提出了一个新的及时调整框架,称为软模板调整(STT)。 STT结合了手册和自动提示,并将下游分类任务视为掩盖语言建模任务。对不同设置的全面评估表明,STT可以在不引入其他参数的情况下缩小微调和基于及时的方法之间的差距。值得注意的是,它甚至可以胜过情感分类任务的时间和资源消耗的微调方法。
translated by 谷歌翻译
With an increasing amount of data in the art world, discovering artists and artworks suitable to collectors' tastes becomes a challenge. It is no longer enough to use visual information, as contextual information about the artist has become just as important in contemporary art. In this work, we present a generic Natural Language Processing framework (called ArtLM) to discover the connections among contemporary artists based on their biographies. In this approach, we first continue to pre-train the existing general English language models with a large amount of unlabelled art-related data. We then fine-tune this new pre-trained model with our biography pair dataset manually annotated by a team of professionals in the art industry. With extensive experiments, we demonstrate that our ArtLM achieves 85.6% accuracy and 84.0% F1 score and outperforms other baseline models. We also provide a visualisation and a qualitative analysis of the artist network built from ArtLM's outputs.
translated by 谷歌翻译
迅速的学习方法通​​过诱导更好的几次表现,在他们仍然遵循基于参数的学习范式的同时,引起了自然语言处理的波动。学习中的遗忘和死记硬背的记忆问题可能会遇到不稳定的概括问题。具体而言,香草及时的学习可能难以利用死记硬背的非典型实例,在完全监督的培训或过度贴身模式的情况下使用低射击数据。为了减轻此类局限性,我们以将知识从记忆中解耦的动机发展为有助于模型在概括和记忆之间取得平衡。与香草及时学习相反,重新启动构造了培训实例中的开放式知识店,并在输入,培训和推理过程中实现检索机制,从而使该模型能够从培训语料库中检索相关环境作为能力为提示增强。广泛的实验表明,Retroppt可以在几次射击和零拍设置中获得更好的性能。此外,我们进一步说明,我们提出的撤退可以通过新数据集获得更好的概括能力。对记忆的详细分析确实显示逆转可以减少语言模型对记忆的依赖;因此,改善下游任务的概括。
translated by 谷歌翻译
Recent work has presented intriguing results examining the knowledge contained in language models (LM) by having the LM fill in the blanks of prompts such as "Obama is a by profession". These prompts are usually manually created, and quite possibly suboptimal; another prompt such as "Obama worked as a " may result in more accurately predicting the correct profession. Because of this, given an inappropriate prompt, we might fail to retrieve facts that the LM does know, and thus any given prompt only provides a lower bound estimate of the knowledge contained in an LM. In this paper, we attempt to more accurately estimate the knowledge contained in LMs by automatically discovering better prompts to use in this querying process. Specifically, we propose mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts, as well as ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark for extracting relational knowledge from LMs demonstrate that our methods can improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on what LMs know. We have released the code and the resulting LM Prompt And Query Archive (LPAQA) at https://github. com/jzbjyb/LPAQA.1 Some models we use in this paper, e.g. BERT (Devlin et al., 2019), are bi-directional, and do not directly define probability distribution over text, which is the underlying definition of an LM. Nonetheless, we call them LMs for simplicity.
translated by 谷歌翻译
随着预训练的语言模型(PLM)的继续增长,精细调整PLM的硬件和数据要求也会增长。因此,研究人员提出了一种称为\ textit {提示学习}的较轻方法。但是,在调查过程中,我们观察到及时的学习方法是脆弱的,很容易被一些非法构造的提示攻击,从而导致分类错误和PLM的严重安全问题。当前的大多数研究都忽略了基于及时方法的安全问题。因此,在本文中,我们提出了一种恶意提示模板构建方法(\ textbf {stressAttack})来探测PLM的安全性能。研究了几种不友好的模板构建方法,以指导模型错误分类任务。在三个数据集和三个PLM上进行了广泛的实验证明了我们提出的方法提示的有效性。我们还进行实验,以验证我们的方法是否适用于几种镜头。
translated by 谷歌翻译
In this work, we explore "prompt tuning," a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signals from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's few-shot learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant because large models are costly to share and serve and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021) and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer and enables efficient "prompt ensembling." * Work done as a Google AI Resident.
translated by 谷歌翻译