Prompt tuning recently becomes a hot-spot in the applications of large pretrained language models on specific downstream tasks. Regarding the Language Model as a Service (LMaaS), black-box tuning using derivative-free optimization (DFO) provides a novel approach to expand the practical scenarios of pretrained models and enrich the researches of few-shot learning. In this report, we present our solution in this competition that is based on the LMaaS scenario. Our solution consists of several modifications to BBTv2, including multiple label words, selection of P0, rolling update strategy, multi-task loss from MLP classifier, and finally using the ensemble method to further improve generalization ability. We also shared some strategies that we tried but didn't use in the final submission for further discussion. In the end we raised a question about the SNLI dataset and the impact on the results, as well as our concerns about the competition.
translated by 谷歌翻译
非常大的预培训的语言模型(PTM)(如GPT-3)通常被释放为服务,允许用户设计特定于任务的提示以通过一些黑盒API查询PTMS。在这样的场景中,我们调用语言模型 - AS-Service(LMAAS),PTM的梯度通常不可用。我们可以通过仅访问模型推断API来优化任务提示吗?基于最近的观察结果,大型PTMS具有非常低的内在维度,这项工作提出了黑匣子调谐,通过无衍生算法优化PTM。特别是,我们通过迭代调用PTM推断API来调用CMA-es以优化预先提示的连续提示。我们的实验结果表明,黑匣子调整罗伯塔在少数标签样本上不仅显着优于手动提示和GPT-3的上下文学习,而且还超越了基于梯度的对应物,即提示调整和完整的模型调整。
translated by 谷歌翻译
With the evergrowing sizes of pre-trained models (PTMs), it has been an emerging practice to only provide the inference APIs for users, namely model-as-a-service (MaaS) setting. To adapt PTMs with model parameters frozen, most current approaches focus on the input side, seeking for powerful prompts to stimulate models for correct answers. However, we argue that input-side adaptation could be arduous due to the lack of gradient signals and they usually require thousands of API queries, resulting in high computation and time costs. In light of this, we present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side. Specifically, DecT first extracts prompt-stimulated output scores for initial predictions. On top of that, we train an additional decoder network on the output representations to incorporate posterior data knowledge. By gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample. Empirically, we conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $10^3\times$ speed-up.
translated by 谷歌翻译
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations. This form of "black-box" classifier training has become increasingly important as the cost of training and inference in large-scale LMs grows. But existing black-box LM classifier learning approaches are themselves computationally inefficient, typically specializing LMs to the target task by searching in a large space of (discrete or continuous) prompts using zeroth-order optimization methods. Instead of directly optimizing in prompt space, PromptBoosting obtains a small pool of prompts via a gradient-free approach and then constructs a large pool of weak learners by pairing these prompts with different elements of the LM's output distribution. These weak learners are then ensembled using the AdaBoost algorithm. The entire learning process requires only a small number of forward passes and no backward pass. Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
translated by 谷歌翻译
How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks: including text classification, text entailment, similar text retrieval, and paraphrasing. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by big margins, with absolute gains of 12.8% in accuracy on text classification and 18.9% on the GLUE benchmark.
translated by 谷歌翻译
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF-better few-shot fine-tuning of language models 1 -a suite of simple and complementary techniques for finetuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning. 2 * The first two authors contributed equally. 1 Alternatively, language models' best friends forever. 2 Our implementation is publicly available at https:// github.com/princeton-nlp/LM-BFF.
translated by 谷歌翻译
最近,与“预训练,及时和预测”的新范式相比,与“预训练,微调”范式相比,新的范式“预训练,及时和预测”取得了显着的成就。在基于及时的GPT-3成功之后,一系列基于蒙版的语言模型(MLM)(例如Bert,Roberta)及时学习方法变得流行并广泛使用。但是,另一个有效的预训练的判别模型Electra可能被忽略了。在本文中,我们尝试使用拟议的替换代替令牌检测(RTD)基于基于的及时学习方法来完成零摄像的几个NLP任务。实验结果表明,基于RTD-Prompt学习的Electra模型可达到令人惊讶的最先进的零拍性能。在数字上,与MLM-Roberta-Large和MLM-Bert-Large相比,我们的RTD-Electra-Large在所有15个任务上平均提高了约8.4%和13.7%。特别是在SST-2任务上,我们的RTD-Electra-Large在没有任何培训数据的情况下达到了令人惊讶的90.1%精度。总体而言,与预先训练的蒙版语言模型相比,预先训练的代替令牌检测模型在零拍学习中的性能更好。因此,Electra是一位出色的零球学习者。源代码可在以下网址获得:https://github.com/nishiwen1214/rtd-electra。
translated by 谷歌翻译
随着预训练的语言模型(PLM)的继续增长,精细调整PLM的硬件和数据要求也会增长。因此,研究人员提出了一种称为\ textit {提示学习}的较轻方法。但是,在调查过程中,我们观察到及时的学习方法是脆弱的,很容易被一些非法构造的提示攻击,从而导致分类错误和PLM的严重安全问题。当前的大多数研究都忽略了基于及时方法的安全问题。因此,在本文中,我们提出了一种恶意提示模板构建方法(\ textbf {stressAttack})来探测PLM的安全性能。研究了几种不友好的模板构建方法,以指导模型错误分类任务。在三个数据集和三个PLM上进行了广泛的实验证明了我们提出的方法提示的有效性。我们还进行实验,以验证我们的方法是否适用于几种镜头。
translated by 谷歌翻译
Natural language prompts have been shown to facilitate cross-task generalization for large language models. However, with no or limited labeled examples, the cross-task performance is highly sensitive to the choice of prompts, while selecting a high-performing prompt is challenging given the scarcity of labels. To address the issue, we propose a Zero-Label Prompt Selection (ZPS) method that selects prompts without any labeled data or gradient update. Specifically, given the candidate human-written prompts for a task, ZPS labels a set of unlabeled data with a prompt ensemble and uses the pseudo-labels for prompt selection. Experiments show that ZPS improves over prior methods by a sizeable margin in zero-label performance. We also extend ZPS to a few-shot setting and show its advantages over strong baselines such as prompt tuning and model tuning.
translated by 谷歌翻译
Pre-trained language models (PLMs) have exhibited remarkable few-shot learning capabilities when provided a few examples in a natural language prompt as demonstrations of test instances, i.e., in-context learning. However, the performance of in-context learning is susceptible to the choice of prompt format, training examples and the ordering of the training examples. In this paper, we propose a novel nearest-neighbor calibration framework for in-context learning to ease this issue. It is inspired by a phenomenon that the in-context learning paradigm produces incorrect labels when inferring training instances, which provides a useful supervised signal to calibrate predictions. Thus, our method directly augments the predictions with a $k$-nearest-neighbor ($k$NN) classifier over a datastore of cached few-shot instance representations obtained by PLMs and their corresponding labels. Then adaptive neighbor selection and feature regularization modules are introduced to make full use of a few support instances to reduce the $k$NN retrieval noise. Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning, while even achieving comparable performance with state-of-the-art tuning-based approaches in some sentiment analysis tasks.
translated by 谷歌翻译
几乎没有射击的内在学习(ICL)使预训练的语言模型能够通过为输入的一部分提供少量的培训示例来执行以前的任务,而无需任何基于梯度的培训。 ICL会产生大量的计算,内存和存储成本,因为它每次进行预测时都涉及处理所有培训示例。参数有效的微调(PEFT)(例如,适配器模块,提示调谐,稀疏更新方法等)提供了替代范式,其中训练了一组少量参数以启用模型来执行新任务。在本文中,我们严格地比较了几个ICL和PEFT,并证明后者提供了更好的准确性,并大大降低了计算成本。在此过程中,我们引入了一种称为(IA)$^3 $的新PEFT方法,该方法通过学习的向量来扩展激活,从而获得更强的性能,同时仅引入相对少量的新参数。我们还提出了一个基于称为T-FEW的T0模型的简单食谱,可以将其应用于新任务,而无需特定于任务的调整或修改。我们通过将T-FEW应用于木筏基准,首次实现超人性能,并以6%的绝对性能优于最先进的方法来验证T-FEW对完全看不见的任务的有效性。我们实验中使用的所有代码均可公开使用。
translated by 谷歌翻译
已显示迅速学习可以在大多数文本分类任务中实现近调调节性能,但很少有培训示例。对于样品稀缺的NLP任务是有利的。在本文中,我们试图将其应用于实际情况,即恢复信息提取,并增强现有方法,以使其更适用于简历信息提取任务。特别是,我们根据简历的文本特征创建了多组手动模板和语言器。此外,我们比较了蒙版语言模型(MLM)预培训语言模型(PLM)和SEQ2SEQ PLM在此任务上的性能。此外,我们改进了口头设计的设计方法,用于知识渊博的及时调整,以便为其他基于应用程序的NLP任务的迅速模板和语言设计的设计提供了示例。在这种情况下,我们提出了手动知识渊博的语言器(MKV)的概念。构造与应用程序方案相对应的知识渊博的口头表的规则。实验表明,基于我们的规则设计的模板和言语器比现有的手动模板更有效,更强大,并自动生成及时方法。已经确定,当前可用的自动提示方法无法与手动设计的及时模板竞争一些现实的任务方案。最终混淆矩阵的结果表明,我们提出的MKV显着解决了样本不平衡问题。
translated by 谷歌翻译
GPT-3等大型语言模型是优秀的几次学习者,允许他们通过自然文本提示来控制。最近的研究报告称,基于及时的直接分类消除了对微调的需求,但缺乏数据和推理可扩展性。本文提出了一种新的数据增强技术,利用大规模语言模型来生成来自真实样本的混合的现实文本样本。我们还建议利用语言模型预测的软标签,从大规模语言模型中有效地蒸馏知识并同时创建文本扰动。我们对各种分类任务进行数据增强实验,并显示我们的方法非常优于现有的文本增强方法。消融研究和定性分析为我们的方法提供了更多的见解。
translated by 谷歌翻译
文本匹配是信息检索和自然语言处理的基本技术。文本匹配任务共享确定两个给定文本之间关系的相同范例。这些关系因任务而异,例如〜在文档检索中相关性,释义识别中的语义一致性和所回答的可回答判断。但是,文本匹配的基本信号保留在有限范围中,即〜精确匹配,语义匹配和推理匹配。理想情况下,良好的文本匹配模型可以学会捕获和汇总这些信号,以实现不同的匹配任务以实现竞争性能,而最近的最新文本匹配模型,例如〜预训练的语言模型(PLM)很难概括。这是因为在特定于任务的数据集上的端到端监督学习使模型过分强调了数据示例偏置和特定于任务的信号,而不是基本的匹配信号。为了克服这个问题,我们采用了专业化的将军培训策略,并将其称为比赛推出。在专业阶段,对不同匹配任务的描述映射到一些提示令牌。在概括阶段,匹配模型通过接受各种匹配任务的培训来探索基本匹配信号。高不同的匹配任务避免了模型拟合特定任务的数据偏差,因此该模型可以专注于学习基本匹配信号。同时,在第一步中获得的提示令牌有助于模型区分不同的特定任务匹配信号。公共数据集上的实验结果表明,匹配点可以提高PLM在文本匹配中的多任务概括能力,并产生更好的内域多任务,外域多任务和新任务适应性性能由以前的微调范式训练的特定于任务模型。
translated by 谷歌翻译
我们提出了Patron,这是一种新方法,它使用基于及时的不确定性估计,用于在冷启动场景下进行预训练的语言模型进行微调的数据选择,即,没有初始标记的数据可用。在顾客中,我们设计(1)一种基于迅速的不确定性传播方法来估计数据点的重要性和(2)分区 - 然后 - 剥离(PTR)策略,以促进对注释的样品多样性。六个文本分类数据集的实验表明,赞助人的表现优于最强的冷启动数据选择基准,高达6.9%。此外,仅具有128个标签,顾客分别基于香草微调和及时的学习,获得了91.0%和92.1%的全面监督性能。我们的赞助人实施可在\ url {https://github.com/yueyu1030/patron}上获得。
translated by 谷歌翻译
及时调整是将预训练模型调整到下游任务的极其有效的工具。但是,基于标准及时的方法主要考虑下游任务的足够数据的情况。目前尚不清楚是否可以将优势传输到几杆式制度,在每个下游任务中只有有限的数据。尽管有些作品证明了在几次弹奏设置下及时调整的潜力,但通过搜索离散提示或使用有限数据调整软提示的主流方法仍然非常具有挑战性。通过广泛的实证研究,我们发现迅速调整和完全微调之间的学习差距仍然存在差距。为了弥合差距,我们提出了一个新的及时调整框架,称为软模板调整(STT)。 STT结合了手册和自动提示,并将下游分类任务视为掩盖语言建模任务。对不同设置的全面评估表明,STT可以在不引入其他参数的情况下缩小微调和基于及时的方法之间的差距。值得注意的是,它甚至可以胜过情感分类任务的时间和资源消耗的微调方法。
translated by 谷歌翻译
立场检测旨在确定文本的作者是否赞成,反对或中立。这项任务的主要挑战是两个方面的:由于不同目标以及缺乏目标的上下文信息而产生的几乎没有学习。现有作品主要通过设计基于注意力的模型或引入嘈杂的外部知识来解决第二期,而第一个问题仍未探索。在本文中,受到预训练的语言模型(PLM)的潜在能力(PLM)的启发,我们建议介绍基于立场检测的及时基于迅速的微调。 PLM可以为目标提供基本的上下文信息,并通过提示启用几次学习。考虑到目标在立场检测任务中的关键作用,我们设计了目标感知的提示并提出了一种新颖的语言。我们的语言器不会将每个标签映射到具体单词,而是将每个标签映射到矢量,并选择最能捕获姿势与目标之间相关性的标签。此外,为了减轻通过单人工提示来处理不同目标的可能缺陷,我们建议将信息从多个提示中学到的信息提炼。实验结果表明,我们提出的模型在全数据和少数场景中的表现出色。
translated by 谷歌翻译
One of the most impressive results of recent NLP history is the ability of pre-trained language models to solve new tasks in a zero-shot setting. To achieve this, NLP tasks are framed as natural language prompts, generating a response indicating the predicted output. Nonetheless, the performance in such settings often lags far behind its supervised counterpart, suggesting a large space for potential improvement. In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance. Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency, encouraging consistent predictions over this diverse set of prompts. Our method makes it possible to fine-tune the model either with extra unlabeled training data, or directly on test input at inference time in an unsupervised manner. In experiments, our approach outperforms the state-of-the-art zero-shot learner, T0 (Sanh et al., 2022), on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy. The gains are often attained with a small number of unlabeled examples.
translated by 谷歌翻译
在本文中,我们描述了我们参与Case-2022的子任务1,即与休闲新闻语料库的事件因果关系识别。我们通过在少数带注释的示例(即几次配置)上利用一组简单但互补的技术来解决因果关系识别(CRI)任务。我们遵循一种基于迅速的预测方法,用于微调LMS,其中CRI任务被视为掩盖语言建模问题(MLM)。这种方法允许LMS在MLM问题上进行本地预先训练,可以直接生成对CRI特异性提示的文本响应。我们将此方法的性能与在整个数据集中训练的集合技术进行比较。我们表现​​最佳的提交仅接受了每班256个实例,整个数据集的一小部分培训,但能够获得第二好的精度(0.82),第三好的精度(0.82)和F1得分。 (0.85)非常接近获胜者团队(0.86)的报道。
translated by 谷歌翻译
Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, the ICL performance does not scale well with the number of available training samples as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have revealed that language models are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of the pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL, which leverages the best of both worlds. PALP inherits the scalability of linear probing and the capability of enforcing language models to derive more meaningful representations via tailoring input into a more conceivable form. Throughout in-depth investigations on various datasets, we verified that PALP significantly enhances the input representations closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario.
translated by 谷歌翻译