Nowadays, Transformer-based models have gradually become the default choice of AI practitioners, and these models show advantages even in few-shot scenarios. In this paper, we revisit classical methods and propose a new few-shot alternative. Specifically, we study the few-shot one-class problem, which in practice uses known samples as references to detect whether an unknown instance belongs to the same class. This problem can be studied from the perspective of sequence matching. Our results show that with meta-learning, the classical sequence matching method, i.e., Compare-Aggregate, significantly outperforms Transformers, while requiring far less training cost. Furthermore, we conduct an empirical comparison of the two sequence matching approaches under simple fine-tuning and under meta-learning. Meta-learning leads the Transformer model's features to have highly correlated dimensions, and the cause is closely related to the number of layers and heads in the Transformer model. Experimental code and data are available at https://github.com/hmt2014/fewone
Metric-based meta-learning is one of the de facto standards in few-shot learning. It consists of representation learning and metric calculation designs. Previous works construct class representations in different ways, varying from mean output embedding to covariance and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while complex statistical modeling poses difficulty for metric designs. In this work, we use tensor fields (``areas'') to model classes from the geometrical perspective for few-shot learning. We present a simple and effective method, dubbed hypersphere prototypes (HyperProto), where class information is represented by hyperspheres of dynamic size with two sets of learnable parameters: the hypersphere's center and its radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than with statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot learning tasks across NLP and CV and comparison with 20+ competitive baselines demonstrate the effectiveness of our approach.
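To make the geometry concrete, below is a minimal sketch of hypersphere-prototype classification, assuming one learnable (center, radius) pair per class; the class and variable names are illustrative, not the authors' implementation.

```python
import torch

class HyperProtoHead(torch.nn.Module):
    """Hypersphere prototypes: one learnable center and radius per class."""
    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.centers = torch.nn.Parameter(torch.randn(num_classes, dim))
        self.radii = torch.nn.Parameter(torch.ones(num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Distance from each embedding to each hypersphere *surface*:
        # | ||x - c|| - r |, smaller means closer to that class "area".
        dist_to_center = torch.cdist(x, self.centers)          # (B, C)
        dist_to_surface = (dist_to_center - self.radii).abs()  # (B, C)
        return -dist_to_surface  # negate so larger logit = closer class

head = HyperProtoHead(num_classes=5, dim=64)
logits = head(torch.randn(8, 64))  # 8 query embeddings, 5-way episode
pred = logits.argmax(dim=-1)
```

The key point is that the score is the distance to the sphere's surface rather than to a single point, so each class claims an area of embedding space.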
Few-shot text classification aims to classify text under few-shot scenarios. Most previous methods adopt optimization-based meta-learning to obtain task distributions. However, owing to the mismatch between the few samples and complex models, as well as the difficulty of distinguishing useful task features, these methods suffer from overfitting. To address this issue, we propose a novel Adaptive Meta-learner via Gradient Similarity (AMGS) method to improve the model's generalization ability. Specifically, the proposed AMGS alleviates overfitting in two ways: (i) it acquires latent semantic representations of samples and improves model generalization through a self-supervised auxiliary task in the inner loop, and (ii) it leverages the adaptive meta-learner via gradient similarity to impose constraints on the gradients obtained by the base learner in the outer loop. Moreover, we systematically analyze the effect of regularization on the whole framework. Experimental results on several benchmarks show that the proposed AMGS consistently improves few-shot text classification performance compared with state-of-the-art optimization-based meta-learning methods.
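The abstract does not spell out the exact constraint, but the following hedged sketch shows one plausible reading of gradient-similarity gating in the outer loop: the base learner's gradient is kept only to the degree that it aligns with a reference gradient, here taken from the self-supervised auxiliary task.

```python
import torch

def gate_by_gradient_similarity(task_grads, aux_grads):
    """Scale task gradients by their cosine similarity to auxiliary gradients.

    One plausible reading of gradient-similarity gating; the exact AMGS
    rule may differ. Both arguments are lists of per-parameter gradients.
    """
    flat_t = torch.cat([g.flatten() for g in task_grads])
    flat_a = torch.cat([g.flatten() for g in aux_grads])
    sim = torch.nn.functional.cosine_similarity(flat_t, flat_a, dim=0)
    weight = sim.clamp(min=0.0)  # discard conflicting (negative) directions
    return [weight * g for g in task_grads]
```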
Learning with limited data is a key challenge for visual recognition. Many few-shot learning methods address this challenge by learning an instance embedding function from seen classes and applying the function to instances from unseen classes with limited labels. This style of transfer learning is task-agnostic: the embedding function is not learned optimally discriminative with respect to the unseen classes, where discerning among them leads to the target task. In this paper, we propose a novel approach to adapt the instance embeddings to the target classification task with a set-to-set function, yielding embeddings that are task-specific and discriminative. We empirically investigated various instantiations of such set-to-set functions and observed the Transformer is most effective, as it naturally satisfies key properties of our desired model. We denote this model as FEAT (few-shot embedding adaptation w/ Transformer) and validate it on both the standard few-shot classification benchmark and four extended few-shot learning settings with essential use cases, i.e., cross-domain, transductive, generalized few-shot learning, and low-shot learning. It achieved consistent improvements over baseline models as well as previous methods, and established the new state-of-the-art results on two benchmarks.
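As a rough illustration of the set-to-set adaptation idea (not the exact FEAT architecture), the sketch below runs a Transformer encoder over the support embeddings so each one is contextualized by the rest of the task before prototypes are formed; hyperparameters are placeholders.

```python
import torch

dim = 64
encoder_layer = torch.nn.TransformerEncoderLayer(
    d_model=dim, nhead=4, batch_first=True)
set_to_set = torch.nn.TransformerEncoder(encoder_layer, num_layers=1)

support = torch.randn(1, 5, dim)   # one episode, 5 support embeddings
adapted = set_to_set(support)      # task-specific, contextualized embeddings
prototypes = adapted.squeeze(0)    # 1-shot: each adapted embedding is a prototype
query = torch.randn(8, dim)
logits = -torch.cdist(query, prototypes)  # nearest adapted prototype wins
```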
Few-shot relation extraction (FSRE) aims at recognizing unseen relations by learning with merely a handful of annotated instances. To generalize to new relations more effectively, this paper proposes a novel pipeline for the FSRE task based on queRy-information guided Attention and adaptive Prototype fuSion, namely RAPS. Specifically, RAPS first derives the relation prototype by the query-information guided attention module, which exploits rich interactive information between the support instances and the query instances, in order to obtain more accurate initial prototype representations. Then RAPS elaborately combines the derived initial prototype with the relation information by the adaptive prototype fusion mechanism to get the integrated prototype for both training and prediction. Experiments on the benchmark dataset FewRel 1.0 show a significant improvement of our method against state-of-the-art methods.
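The query-guided attention step can be pictured with the loose sketch below, in which support instances of a relation are weighted by their similarity to the query before being pooled into an initial prototype; the adaptive prototype fusion step is not reproduced here.

```python
import torch

def query_guided_prototype(support: torch.Tensor, query: torch.Tensor):
    """Pool support instances into a prototype, weighted by query relevance.

    support: (K, D) instances of one relation; query: (D,) query embedding.
    """
    attn = torch.softmax(support @ query / support.shape[-1] ** 0.5, dim=0)
    return (attn.unsqueeze(1) * support).sum(dim=0)

proto = query_guided_prototype(torch.randn(5, 128), torch.randn(128))
```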
Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, however small, can still contain mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, but this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple yet effective feature aggregation methods, improving the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot learning (TraNFS). TraNFS leverages the Transformer's attention mechanism to weigh down mislabeled samples. (3) Finally, we conduct extensive tests on noisy versions of miniImageNet and tieredImageNet. Our results show that TraNFS is on par with leading FSL methods on clean support sets, yet outperforms them by far in the presence of label noise.
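As a simplified picture of contribution (1), the sketch below weights support samples by their agreement with the rest of the class, so a mislabeled outlier contributes less to the prototype; TraNFS itself uses full Transformer attention, which this mean-similarity weighting only approximates.

```python
import torch

def robust_prototype(support: torch.Tensor) -> torch.Tensor:
    """Aggregate a class prototype while down-weighting likely outliers.

    support: (K, D) embeddings of one class, possibly with noisy labels.
    """
    sim = support @ support.mean(dim=0)  # agreement with the class as a whole
    weights = torch.softmax(sim, dim=0)  # outliers receive small weight
    return (weights.unsqueeze(1) * support).sum(dim=0)

proto = robust_prototype(torch.randn(5, 64))
```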
Prompting methods are regarded as one of the key advances in few-shot natural language processing. Recent research has shifted from discrete-token-based ``hard prompts'' to continuous ``soft prompts'', which use learnable vectors as pseudo prompt tokens and achieve better performance. Though showing promising prospects, these soft-prompting methods are observed to rely heavily on good initialization to take effect. Unfortunately, obtaining a perfect initialization for soft prompts requires an understanding of the language model's inner workings as well as elaborate design, which is no easy task and has to be restarted from scratch for every new task. To remedy this, we propose a generalized soft-prompting method called MetaPrompting, which adopts the well-recognized model-agnostic meta-learning algorithm to automatically find a better prompt initialization that facilitates fast adaptation to new prompting tasks. Extensive experiments show that MetaPrompting tackles the soft-prompt initialization problem and brings significant improvements on four different datasets (over 6 points of accuracy gain in the 1-shot setting), achieving new state-of-the-art performance.
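A condensed, first-order sketch of the idea follows: the soft prompt is adapted on each task's support set in an inner loop, and the query loss of the adapted prompt updates the shared initialization. `loss_fn` stands in for the full prompted-LM forward pass and is an assumption, not a real API; the learning rates are placeholders.

```python
import torch

def meta_step(prompt_init, tasks, loss_fn, inner_lr=0.1, outer_lr=0.01):
    """One first-order meta-update of a soft-prompt initialization.

    loss_fn(prompt, task, split) is an assumed stand-in for running the
    frozen LM with the given soft prompt on the task's support/query split.
    """
    meta_grad = torch.zeros_like(prompt_init)
    for task in tasks:
        prompt = prompt_init.detach().clone().requires_grad_(True)
        # inner loop: adapt the soft prompt on the task's support set
        g, = torch.autograd.grad(loss_fn(prompt, task, "support"), prompt)
        adapted = (prompt - inner_lr * g).detach().requires_grad_(True)
        # outer loop: the query loss of the adapted prompt drives the init
        g_out, = torch.autograd.grad(loss_fn(adapted, task, "query"), adapted)
        meta_grad += g_out
    return prompt_init - outer_lr * meta_grad / len(tasks)
```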
Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances. Many few-shot visual recognition methods adopt the metric-based meta-learning paradigm by comparing the query representation with class representations to predict the category of query instances. However, current metric-based methods generally treat all instances equally and consequently often obtain biased class representations, considering that not all instances are equally significant when summarizing instance-level representations into class-level representations. For example, some instances may contain unrepresentative information, such as excessive background or information about irrelevant concepts, which skews the results. To address the above issues, we propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition. Specifically, we develop an adaptive instance revaluing network to address the biased representation issue when generating class representations, by learning and assigning adaptive weights to different instances according to their relative significance in the support set of the corresponding class. Additionally, we design an improved bilinear instance representation and incorporate two novel structural losses, i.e., an intra-class instance clustering loss and an inter-class representation distinguishing loss, to further regulate the instance revaluing process and refine the class representations. We conduct extensive experiments on four commonly adopted few-shot benchmarks: the miniImageNet, tieredImageNet, CIFAR-FS, and FC100 datasets. The experimental results demonstrate the superiority of our ICRL-Net compared with state-of-the-art methods.
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Few-shot image classification is a challenging problem that aims to achieve the human level of recognition based only on a small number of training images. One main solution to few-shot image classification is deep metric learning. These methods, by classifying unseen samples according to their distances to the few seen samples in an embedding space learned by a powerful deep neural network, can avoid overfitting to the few training images in few-shot image classification and have achieved state-of-the-art performance. In this paper, we provide an up-to-date review of deep metric learning methods for few-shot image classification from 2018 to 2022 and categorize them into three groups according to the three stages of metric learning, namely, learning feature embeddings, learning class representations, and learning distance measures. Under this taxonomy, we identify the novelties of the different methods and the problems they face. We conclude with a discussion of current challenges and future trends in few-shot image classification.
Graph neural networks (GNNs) have been used to tackle the few-shot learning (FSL) problem and have shown great potential under the transductive setting. Under the inductive setting, however, existing GNN-based methods are less competitive. This is because they use an instance GNN as a label propagation/classification module, which is jointly learned with a feature embedding network. This design is problematic because the classifier needs to adapt quickly to new tasks while the embedding does not. To overcome this problem, this paper proposes a novel hybrid GNN (HGNN) model consisting of two GNNs, an instance GNN and a prototype GNN. Instead of label propagation, they act as feature embedding adaptation modules for quick adaptation of the meta-learned feature embedding to new tasks. Importantly, they are designed to deal with a fundamental yet often neglected challenge in FSL: with only a handful of shots per class, any few-shot classifier is sensitive to badly sampled shots that are either outliers or cause inter-class distribution overlap. Our two GNNs are designed to address these two types of badly sampled few-shots respectively, and their complementarity is exploited in the hybrid GNN model. Extensive experiments show that our HGNN obtains new state-of-the-art results on three FSL benchmarks.
One of the core components of goal-oriented dialog systems is the task of intent detection. Few-shot learning for intent detection is challenging due to the scarcity of available annotated utterances. Although recent works have proposed metric-based and optimization-based methods, the task remains challenging in large label spaces with much smaller numbers of shots. Generalized few-shot learning is even more difficult because both novel and seen classes are present during the testing phase. In this work, we propose a simple and effective method based on natural language inference that not only tackles the few-shot intent detection problem but also proves useful in zero-shot and generalized few-shot learning problems. Our extensive experiments on a number of natural language understanding (NLU) and spoken language understanding (SLU) datasets show the effectiveness of our approach. In addition, we highlight the settings in which our NLI-based method outperforms the baselines by huge margins.
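The reformulation is easy to picture: each candidate intent becomes an NLI hypothesis, and the intent whose hypothesis is most entailed by the utterance wins. The sketch below uses an off-the-shelf zero-shot NLI pipeline as a convenient stand-in for the paper's trained model; the intents and template are illustrative.

```python
from transformers import pipeline

# Intent detection cast as NLI: score each intent hypothesis by entailment.
nli = pipeline("zero-shot-classification", model="roberta-large-mnli")
utterance = "I need to move money from checking to savings"
intents = ["transfer money", "check balance", "report lost card"]
result = nli(utterance, candidate_labels=intents,
             hypothesis_template="The user wants to {}.")
print(result["labels"][0])  # highest-entailment intent
```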
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF (better few-shot fine-tuning of language models; alternatively, language models' best friends forever), a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning. Our implementation is publicly available at https://github.com/princeton-nlp/LM-BFF.
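A bare-bones sketch of component (1), prompt-based fine-tuning, is shown below: the input is wrapped in a template ending in a mask token, and the label words' mask logits act as class scores. The template and label words here are illustrative, not the automatically generated ones from the paper; actual fine-tuning would additionally minimize cross-entropy over these label-word logits.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

text = "A gripping, beautifully shot film."
prompt = f"{text} It was {tok.mask_token}."            # illustrative template
label_words = {"positive": " great", "negative": " terrible"}

inputs = tok(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
logits = model(**inputs).logits[0, mask_pos]           # mask-position logits
scores = {lbl: logits[tok.encode(w, add_special_tokens=False)[0]].item()
          for lbl, w in label_words.items()}
print(max(scores, key=scores.get))  # predicted class from label-word logits
```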
It has been experimentally demonstrated that humans are able to learn in a manner that allows them to make predictions on categories for which they have not seen any examples (Malaviya et al., 2022). Sucholutsky and Schonlau (2020) have recently presented a machine learning approach that aims to do the same. They utilise synthetically generated data and demonstrate that it is possible to achieve sub-linear scaling and develop models that can learn to recognise N classes from M training samples where M is less than N - aka less-than-one shot learning. Their method was, however, defined for univariate or simple multivariate data (Sucholutsky et al., 2021). We extend it to work on large, high-dimensional and real-world datasets and empirically validate it in this new and challenging setting. We apply this method to learn previously unseen NLP tasks from very few examples (4, 8 or 16). We first generate compact, sophisticated less-than-one shot representations called soft-label prototypes which are fitted on training data, capturing the distribution of different classes across the input domain space. We then use a modified k-Nearest Neighbours classifier to demonstrate that soft-label prototypes can classify data competitively, even outperforming much more computationally complex few-shot learning methods.
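A hedged sketch of the classification step follows: each prototype carries a distribution over classes rather than a hard label, and a query is scored by the distance-weighted sum of the soft labels of its nearest prototypes. Fitting the prototypes themselves is not shown, and the inverse-distance weighting is an assumption about the modified kNN.

```python
import numpy as np

def soft_label_knn(query, protos, soft_labels, k=3, eps=1e-8):
    """Classify a query from the soft labels of its k nearest prototypes.

    protos: (P, D) prototype locations; soft_labels: (P, C) distributions.
    """
    dists = np.linalg.norm(protos - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)  # closer prototypes count more
    scores = (weights[:, None] * soft_labels[nearest]).sum(axis=0)
    return scores.argmax()

rng = np.random.default_rng(0)
protos = rng.normal(size=(10, 64))
soft_labels = rng.dirichlet(np.ones(3), size=10)  # 3 classes
print(soft_label_knn(rng.normal(size=64), protos, soft_labels))
```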
Few-shot learning (FSL) aims to generate a classifier using limited labeled examples. Many existing works take the meta-learning approach, constructing a few-shot learner that can learn from few examples to generate a classifier. Typically, the few-shot learner is constructed or meta-trained by sampling multiple few-shot tasks in turn and optimizing the few-shot learner's performance in generating classifiers for those tasks. The performance is measured by how well the resulting classifiers classify the test (i.e., query) examples of those tasks. In this paper, we point out two potential weaknesses of this approach. First, the sampled query examples may not provide sufficient supervision for meta-training the few-shot learner. Second, the effectiveness of meta-learning diminishes sharply with increasing numbers of shots. To resolve these issues, we propose a novel meta-training objective for the few-shot learner, which is to encourage the few-shot learner to generate classifiers that perform like strong classifiers. Concretely, we associate each sampled few-shot task with a strong classifier, which is trained with ample labeled examples. The strong classifiers can be seen as the target classifiers that we hope the few-shot learner to generate given few-shot examples, and we use the strong classifiers to supervise the few-shot learner. We present an efficient way to construct the strong classifiers, making our proposed objective an easily plug-and-play term to existing meta-learning based FSL methods. We validate our approach, LastShot, in combination with many representative meta-learning methods. On several benchmark datasets, our approach leads to a notable improvement across a variety of tasks. More importantly, with our approach, meta-learning based FSL methods can outperform non-meta-learning based methods at different numbers of shots.
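One plausible instantiation of the plug-and-play term is sketched below: besides the usual query-set cross-entropy, the classifier generated by the few-shot learner is pushed, via a distillation-style KL term, toward the strong classifier's predictions. The temperature and weighting are illustrative, not the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def strong_teacher_loss(fs_logits, strong_logits, labels, alpha=0.5, T=4.0):
    """Query loss plus KL toward the strong classifier's softened outputs."""
    ce = F.cross_entropy(fs_logits, labels)
    kl = F.kl_div(F.log_softmax(fs_logits / T, dim=-1),
                  F.softmax(strong_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kl
```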
Few-shot learning (FSL) is a central problem in meta-learning, where learners must efficiently learn from few labeled examples. Within FSL, feature pre-training has recently become an increasingly popular strategy to significantly improve generalization performance. However, the contribution of pre-training is often overlooked and understudied, with limited theoretical understanding of its impact on meta-learning performance. Further, pre-training requires a consistent set of global labels shared across training tasks, which may be unavailable in practice. In this work, we address the above issues by first showing the connection between pre-training and meta-learning. We discuss why pre-training yields more robust meta-representation and connect the theoretical analysis to existing works and empirical results. Secondly, we introduce Meta Label Learning (MeLa), a novel meta-learning algorithm that learns task relations by inferring global labels across tasks. This allows us to exploit pre-training for FSL even when global labels are unavailable or ill-defined. Lastly, we introduce an augmented pre-training procedure that further improves the learned meta-representation. Empirically, MeLa outperforms existing methods across a diverse range of benchmarks, in particular under a more challenging setting where the number of training tasks is limited and labels are task-specific. We also provide an extensive ablation study to highlight its key properties.
Model-agnostic meta-learning (MAML) is arguably one of the most popular meta-learning algorithms nowadays. Nevertheless, its performance on few-shot classification is far behind many recent algorithms dedicated to the problem. In this paper, we point out several key facets of how to train MAML for few-shot classification. First, we find that MAML needs a large number of gradient steps in its inner loop update, which contradicts its common usage. Second, we find that MAML is sensitive to the class label assignments during meta-testing. Concretely, MAML meta-trains the initialization of an $N$-way classifier. These $N$ ways, during meta-testing, then have $N!$ permutations to be paired with $N$ novel classes. We find that these permutations lead to a huge variance in accuracy, making MAML unstable. Third, we investigate several approaches to make MAML permutation-invariant, among which meta-training a single vector to initialize all the $N$ weight vectors in the classification head performs best. On benchmark datasets such as miniImageNet and tieredImageNet, our method, which we name UNICORN-MAML, performs on par with or even outperforms many recent few-shot classification algorithms, without sacrificing MAML's simplicity.
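The permutation-invariant head is simple to sketch: a single meta-trained vector initializes every one of the $N$ classification weight vectors, so no class-to-slot assignment can matter at meta-test time; the inner loop then differentiates the rows using the support set. Variable names below are illustrative.

```python
import torch

dim, n_way = 64, 5
shared_init = torch.nn.Parameter(torch.randn(dim))  # single meta-trained vector

def build_head(n_way: int) -> torch.Tensor:
    # Every class weight starts from the same vector; the inner-loop
    # updates on the support set are what differentiate the classes.
    return shared_init.unsqueeze(0).repeat(n_way, 1).clone()

head = build_head(n_way)  # (5, 64), identical rows before adaptation
```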
How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks, including text classification, text entailment, similar text retrieval, and paraphrasing. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by big margins, with absolute gains of 12.8% in accuracy on text classification and 18.9% on the GLUE benchmark.
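A simplified sketch of the nonparametric verbalizer follows: instead of hand-built label words, each class name is expanded to its nearest neighbors in the LM's own embedding space, and their mask-position logits are aggregated into a class score. The choice of k and the mean aggregation are assumptions, not the paper's exact recipe.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
emb = model.get_input_embeddings().weight  # (V, D) vocabulary embeddings

def label_word_ids(class_name: str, k: int = 8) -> torch.Tensor:
    """Expand a class name to its k nearest vocabulary tokens."""
    anchor_id = tok.encode(" " + class_name, add_special_tokens=False)[0]
    sims = torch.nn.functional.cosine_similarity(
        emb[anchor_id].unsqueeze(0), emb, dim=-1)
    return sims.topk(k).indices

def class_score(mask_logits: torch.Tensor, class_name: str) -> torch.Tensor:
    """Aggregate mask-position logits over the class's related tokens."""
    return mask_logits[label_word_ids(class_name)].mean()
```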
GPT-2 and BERT have demonstrated the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (CTNMT) that is the key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed CTNMT consists of three techniques: a) asymptotic distillation, to ensure that the NMT model can retain the previously pre-trained knowledge; b) a dynamic switching gate, to avoid catastrophic forgetting of the pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments in machine translation show that CTNMT gains up to 3 BLEU points on the WMT14 English-German pair, even surpassing the previous state-of-the-art pre-training-aided NMT. For the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU point. Code and models can be downloaded from https://github.com/bytedance/neurst/tree/master/examples/ctnmt
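Technique (b) can be pictured with the hedged sketch below: a learned, input-dependent gate mixes the pre-trained LM's features with the NMT encoder's own features per position, letting the model fall back on pre-trained knowledge where useful; the exact gate parameterization in CTNMT may differ.

```python
import torch

class SwitchGate(torch.nn.Module):
    """Input-dependent soft switch between LM and NMT encoder features."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = torch.nn.Linear(2 * dim, dim)

    def forward(self, h_lm: torch.Tensor, h_nmt: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([h_lm, h_nmt], dim=-1)))
        return g * h_lm + (1 - g) * h_nmt  # per-dimension soft switch

fused = SwitchGate(512)(torch.randn(2, 10, 512), torch.randn(2, 10, 512))
```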
Recent advances in natural language understanding (NLU) have been driven, in part, by benchmarks such as GLUE, SuperGLUE, and SQuAD. Indeed, many NLU models now match or exceed "human-level" performance on many tasks in these benchmarks. Most of these benchmarks, however, give models access to relatively large amounts of labeled data for training, so the models are provided far more data than a human needs to achieve strong performance. This has motivated a line of work focusing on improving the few-shot learning performance of NLU models. However, there is a lack of standardized evaluation benchmarks for few-shot NLU, resulting in different experimental settings across papers. To help accelerate this line of work, we introduce CLUES (Constrained Language Understanding Evaluation Standard), a benchmark for evaluating the few-shot learning capabilities of NLU models. We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge performance gap in the few-shot setting for most tasks. We also show differences between alternative model families and adaptation techniques in the few-shot setting. Finally, we discuss considerations in designing experimental settings for evaluating true few-shot learning performance and propose a unified standardized approach to few-shot learning evaluation. We aim to encourage research on NLU models that can generalize to new tasks with a small number of examples. Code and data for CLUES are available at https://github.com/microsoft/clues