We introduce TeSS (Text Similarity Comparison using Sentence Encoder), a framework for zero-shot classification where the assigned label is determined by the embedding similarity between the input text and each candidate label prompt. We leverage representations from sentence encoders optimized to locate semantically similar samples closer to each other in embedding space during pre-training. The label prompt embeddings serve as prototypes of their corresponding class clusters. Furthermore, to compensate for the potentially poorly descriptive labels in their original format, we retrieve semantically similar sentences from external corpora and additionally use them with the original label prompt (TeSS-R). TeSS outperforms strong baselines on various closed-set and open-set classification datasets under zero-shot setting, with further gains when combined with label prompt diversification through retrieval. These results are robustly attained to verbalizer variations, an ancillary benefit of using a bi-encoder. Altogether, our method serves as a reliable baseline for zero-shot classification and a simple interface to assess the quality of sentence encoders.
translated by 谷歌翻译
How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks: including text classification, text entailment, similar text retrieval, and paraphrasing. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by big margins, with absolute gains of 12.8% in accuracy on text classification and 18.9% on the GLUE benchmark.
translated by 谷歌翻译
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 9 closed-set tasks and 7 open-set tasks demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better on dealing with rare patterns (word senses or facts), and predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
translated by 谷歌翻译
Pre-trained language models (PLMs) have exhibited remarkable few-shot learning capabilities when provided a few examples in a natural language prompt as demonstrations of test instances, i.e., in-context learning. However, the performance of in-context learning is susceptible to the choice of prompt format, training examples and the ordering of the training examples. In this paper, we propose a novel nearest-neighbor calibration framework for in-context learning to ease this issue. It is inspired by a phenomenon that the in-context learning paradigm produces incorrect labels when inferring training instances, which provides a useful supervised signal to calibrate predictions. Thus, our method directly augments the predictions with a $k$-nearest-neighbor ($k$NN) classifier over a datastore of cached few-shot instance representations obtained by PLMs and their corresponding labels. Then adaptive neighbor selection and feature regularization modules are introduced to make full use of a few support instances to reduce the $k$NN retrieval noise. Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning, while even achieving comparable performance with state-of-the-art tuning-based approaches in some sentiment analysis tasks.
translated by 谷歌翻译
Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in both unlabeled settings (+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between the high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
translated by 谷歌翻译
Contrastive learning has become a new paradigm for unsupervised sentence embeddings. Previous studies focus on instance-wise contrastive learning, attempting to construct positive pairs with textual data augmentation. In this paper, we propose a novel Contrastive learning method with Prompt-derived Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts, we construct virtual semantic prototypes to each instance, and derive negative prototypes by using the negative form of the prompts. Using a prototypical contrastive loss, we enforce the anchor sentence embedding to be close to its corresponding semantic prototypes, and far apart from the negative prototypes as well as the prototypes of other sentences. Extensive experimental results on semantic textual similarity, transfer, and clustering tasks demonstrate the effectiveness of our proposed model compared to strong baselines. Code is available at https://github.com/lemon0830/promptCSE.
translated by 谷歌翻译
迅速的学习方法通​​过诱导更好的几次表现,在他们仍然遵循基于参数的学习范式的同时,引起了自然语言处理的波动。学习中的遗忘和死记硬背的记忆问题可能会遇到不稳定的概括问题。具体而言,香草及时的学习可能难以利用死记硬背的非典型实例,在完全监督的培训或过度贴身模式的情况下使用低射击数据。为了减轻此类局限性,我们以将知识从记忆中解耦的动机发展为有助于模型在概括和记忆之间取得平衡。与香草及时学习相反,重新启动构造了培训实例中的开放式知识店,并在输入,培训和推理过程中实现检索机制,从而使该模型能够从培训语料库中检索相关环境作为能力为提示增强。广泛的实验表明,Retroppt可以在几次射击和零拍设置中获得更好的性能。此外,我们进一步说明,我们提出的撤退可以通过新数据集获得更好的概括能力。对记忆的详细分析确实显示逆转可以减少语言模型对记忆的依赖;因此,改善下游任务的概括。
translated by 谷歌翻译
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF-better few-shot fine-tuning of language models 1 -a suite of simple and complementary techniques for finetuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning. 2 * The first two authors contributed equally. 1 Alternatively, language models' best friends forever. 2 Our implementation is publicly available at https:// github.com/princeton-nlp/LM-BFF.
translated by 谷歌翻译
最近的几种方法,例如参数有效的微调(PEFT)和模式开发训练(PET),在标签筛选设置中取得了令人印象深刻的结果。但是,它们很难使用,因为它们会受到手动制作的提示的高度可变性,并且通常需要十亿参数语言模型才能达到高精度。为了解决这些缺点,我们提出了SETFIT(句子变压器微调),这是一个有效且迅速的框架,用于对句子变形金刚(ST)进行几次微调。 SetFit首先以对比的暹罗方式对少数文本对进行微调验证的st。然后将所得模型用于生成丰富的文本嵌入,这些嵌入方式用于训练分类头。这个简单的框架不需要任何提示或口头化,并且比现有技术少的参数较少,因此可以实现高精度。我们的实验表明,SetFit通过PEFT和PET技术获得了可比的结果,同时训练的速度更快。我们还表明,SETFIT可以通过简单地切换ST主体来应用于多语言设置。我们的代码可从https://github.com/huggingface/setFit以及我们的数据集获得,网址为https://huggingface.co/setfit。
translated by 谷歌翻译
大型预训练的语言模型(PLM)的最新进展导致了自然语言理解(NLU)任务的令人印象深刻的增长,并具有特定于任务的微调。但是,直接调整PLM在很大程度上依赖大量的标记实例,这些实例通常很难获得。迅速对PLM的调整已被证明对各种少数次任务很有价值。现有的作品研究基于迅速的NLU任务的基于及时的调整,主要集中于用语言器来得出正确的标签单词或生成及时的模板,以从PLM中启发语义。此外,还对常规数据增强方法进行了验证,可用于少量射击任务。但是,目前几乎没有针对基于及时的调整范式设计的数据增强方法。因此,我们研究了迅速的少数射击学习者的新数据增强问题。由于标签语义对于迅速的调整至关重要,因此我们提出了一种新颖的标签引导数据增强方法促进DA,该方法利用了丰富的标签语义信息以进行数据增强。很少的文本分类任务的广泛实验结果表明,我们提出的框架通过有效利用标签语义和数据扩展来实现自然语言理解来实现卓越的性能。
translated by 谷歌翻译
大型语言模型在各种任务上显示出令人印象深刻的几次结果。但是,当知识是此类结果的关键时,就像问题回答和事实检查之类的任务一样,似乎需要存储知识的大量参数计数。众所周知,检索增强模型可以在不需要多个参数的情况下在知识密集的任务上表现出色,但是目前尚不清楚它们是否在几个弹药设置中工作。在这项工作中,我们介绍了地图集,这是一个经过精心设计和预先训练的增强语言模型,能够通过很少的培训示例学习知识密集型任务。我们对包括MMLU,苏格兰短裙和归类等各种任务进行评估,并研究文档索引内容的影响,表明它可以很容易地进行更新。值得注意的是,在自然问题上仅使用64个示例在自然问题上达到超过42 \%的准确性,尽管参数少了50倍,但比540B参数模型的表现优于540b参数模型。
translated by 谷歌翻译
上下文学习是最近的自然语言理解的范例,其中大型预先接受的语言模型(LM)观察测试实例和一些训练示例作为其输入,并直接对输出进行解码,而不会对其参数进行任何更新。但是,表现已被证明强烈依赖于所选培训示例(称为提示)。在这项工作中,我们提出了一种有效的方法,用于使用注释的数据和LM检索内心学习的提示。给定输入输出对,我们估计给出输入和候选训练示例的输出的概率作为提示,以及基于这种概率的正面或负标记训练示例。然后,我们从该数据中培训一个有效的密集鼠尾,用于检索训练示例作为测试时间的提示。我们在三个序列到序列任务中评估我们的方法,其中语言话语映射到意义表示,并发现它基本上优于前面的工作和电路板的多个基线。
translated by 谷歌翻译
基础模型由于在广泛的下游应用中的有效性而受到了很多关注。尽管在体系结构方面存在很大的融合,但大多数审慎的模型通常仍用于特定任务或模式。在这项工作中,我们建议将语言模型用作各种基础模型的通用接口。一系列预处理的编码者感知到了多种方式(例如视觉和语言),并与扮演通用任务层角色的语言模型对接。我们提出了一个半伴侣的语言建模目标,以共同确定界面和模块化编码器。我们从因果关系和非因果建模中涵盖了优势和能力,从而结合了两个世界的最佳状态。具体而言,所提出的方法不仅从因果语言建模中继承了内在学习和开放式生成的能力,而且由于双向编码器而有利于填补。更重要的是,我们的方法无缝地解锁了上述功能的组合,例如,通过填充编码器启用了文本学习或指导。各种仅语言和视觉语言基准的实验结果表明,我们的模型表现优于或与鉴定,零弹性概括和几乎没有的学习的专业模型竞争。
translated by 谷歌翻译
We introduce INSTRUCTOR, a new method for computing text embeddings given task instructions: every text input is embedded together with instructions explaining the use case (e.g., task and domain descriptions). Unlike encoders from prior work that are more specialized, INSTRUCTOR is a single embedder that can generate text embeddings tailored to different downstream tasks and domains, without any further training. We first annotate instructions for 330 diverse tasks and train INSTRUCTOR on this multitask mixture with a contrastive loss. We evaluate INSTRUCTOR on 70 embedding evaluation tasks (66 of which are unseen during training), ranging from classification and information retrieval to semantic textual similarity and text generation evaluation. INSTRUCTOR, while having an order of magnitude fewer parameters than the previous best model, achieves state-of-the-art performance, with an average improvement of 3.4% compared to the previous best results on the 70 diverse datasets. Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.
translated by 谷歌翻译
预先训练的蒙版语言模型通过将下游任务作为文本填充来成功执行几次学习。但是,作为全镜头环境中的强大替代方案,诸如Electra之类的判别预训练模型不适合范式。在这项工作中,我们调整了基于及时的几次学习来进行电信,并表明它在广泛的任务中优于蒙面的语言模型。Electra是预先训练的,以区分令牌是产生还是原始。我们自然地将其扩展到基于迅速的几次学习,通过培训来评分目标选项的原创性,而无需引入新参数。我们的方法很容易适应涉及多token预测的任务,而无需额外的计算开销。分析表明,Electra学习分布与下游任务更好。
translated by 谷歌翻译
Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, which leads to poor generalization to unseen concepts. Towards non-spurious and efficient prompt learning from limited examples, this paper presents a novel \underline{\textbf{C}}ounterfactual \underline{\textbf{P}}rompt \underline{\textbf{L}}earning (CPL) method for vision and language models, which simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework. Particularly, CPL constructs counterfactual by identifying minimal non-spurious feature change between semantically-similar positive and negative samples that causes concept change, and learns more generalizable prompt representation from both factual and counterfactual examples via contrastive learning. Extensive experiments demonstrate that CPL can obtain superior few-shot performance on different vision and language tasks than previous prompt tuning methods on CLIP. On image classification, we achieve 3.55\% average relative improvement on unseen classes across seven datasets; on image-text retrieval and visual question answering, we gain up to 4.09\% and 25.08\% relative improvements across three few-shot scenarios on unseen test sets respectively.
translated by 谷歌翻译
Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way. Specifically, we introduce a method for accurately answering yes-no questions given only unlabeled model activations. It works by finding a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values. We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models: across 6 models and 10 question-answering datasets, it outperforms zero-shot accuracy by 4\% on average. We also find that it cuts prompt sensitivity in half and continues to maintain high accuracy even when models are prompted to generate incorrect answers. Our results provide an initial step toward discovering what language models know, distinct from what they say, even when we don't have access to explicit ground truth labels.
translated by 谷歌翻译
及时调整是将预训练模型调整到下游任务的极其有效的工具。但是,基于标准及时的方法主要考虑下游任务的足够数据的情况。目前尚不清楚是否可以将优势传输到几杆式制度,在每个下游任务中只有有限的数据。尽管有些作品证明了在几次弹奏设置下及时调整的潜力,但通过搜索离散提示或使用有限数据调整软提示的主流方法仍然非常具有挑战性。通过广泛的实证研究,我们发现迅速调整和完全微调之间的学习差距仍然存在差距。为了弥合差距,我们提出了一个新的及时调整框架,称为软模板调整(STT)。 STT结合了手册和自动提示,并将下游分类任务视为掩盖语言建模任务。对不同设置的全面评估表明,STT可以在不引入其他参数的情况下缩小微调和基于及时的方法之间的差距。值得注意的是,它甚至可以胜过情感分类任务的时间和资源消耗的微调方法。
translated by 谷歌翻译
在维持预审预定序列模型的灵活性的同时,是否有利于常识性推理,这仍然是一个悬而未决的问题。为了调查这个问题,我们开发了生成的知识提示,该提示包括从语言模型中生成知识,然后在回答问题时提供知识作为附加输入。我们的方法不需要特定于任务的监督知识集成或访问结构化的知识库,但它可以提高四个常识性推理任务上的大规模,最先进的模型的性能,从而实现最先进-ART结果取决于数值常识(NumerSense),通用常识性(Commonsenseqa 2.0)和科学常识(QASC)基准。产生的知识促使大型语言模型是灵活的外部知识来源,以改善常识性推理。我们的代码可从https://github.com/liujch1998/gkp获得
translated by 谷歌翻译
In this work, we explore "prompt tuning," a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signals from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's few-shot learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant because large models are costly to share and serve and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021) and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer and enables efficient "prompt ensembling." * Work done as a Google AI Resident.
translated by 谷歌翻译