Current state-of-the-art approaches to text classification typically leverage BERT-style Transformer models with a softmax classifier, jointly fine-tuned to predict class labels of a target task. In this paper, we instead propose an alternative training objective in which we learn task-specific embeddings of text: our proposed objective learns embeddings such that all texts that share the same target class label should be close together in the embedding space, while all others should be far apart. This allows us to replace the softmax classifier with a more interpretable k-nearest-neighbor classification approach. In a series of experiments, we show that this yields a number of interesting benefits: (1) The resulting order induced by distances in the embedding space can be used to directly explain classification decisions. (2) This facilitates qualitative inspection of the training data, helping us to better understand the problem space and identify labelling quality issues. (3) The learned distances to some degree generalize to unseen classes, allowing us to incrementally add new classes without retraining the model. We present extensive experiments which show that the benefits of ante-hoc explainability and incremental learning come at no cost in overall classification accuracy, thus pointing to practical applicability of our proposed approach.
translated by 谷歌翻译
我们提出了一种具有有限目标语言数据的交叉语言内容标记的新颖框架,这在预测性能方面显着优于现有的工作。该框架基于最近的邻居架构。它是Vanilla K-最近邻模型的现代实例化,因为我们在所有组件中使用变压器表示。我们的框架可以适应新的源语言实例,而无需从头开始侦察。与基于邻域的方法的事先工作不同,我们基于查询邻的交互对邻居信息进行编码。我们提出了两个编码方案,并使用定性和定量分析显示其有效性。我们的评估结果是来自两个不同数据集的八种语言,用于滥用语言检测,在强大的基线上,可以在F1中显示最多9.5(对于意大利语)的大量改进。平均水平,我们在拼图式多语言数据集中的三种语言中实现了3.6的F1改进,2.14在WUL数据集的F1中的改进。
translated by 谷歌翻译
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a;Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
translated by 谷歌翻译
BERT (Devlin et al., 2018) and RoBERTa has set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering.In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where it outperforms other state-of-the-art sentence embeddings methods. 1
translated by 谷歌翻译
Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully-crafted class descriptions to obtain class-specific keywords but also require substantial amount of unlabeled data and takes a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords. Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
translated by 谷歌翻译
Using a single model across various tasks is beneficial for training and applying deep neural sequence models. We address the problem of developing generalist representations of text that can be used to perform a range of different tasks rather than being specialised to a single application. We focus on processing short questions and developing an embedding for these questions that is useful on a diverse set of problems, such as question topic classification, equivalent question recognition, and question answering. This paper introduces QBERT, a generalist model for processing questions. With QBERT, we demonstrate how we can train a multi-task network that performs all question-related tasks and has achieved similar performance compared to its corresponding single-task models.
translated by 谷歌翻译
Text classification of unseen classes is a challenging Natural Language Processing task and is mainly attempted using two different types of approaches. Similarity-based approaches attempt to classify instances based on similarities between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents. Although existing studies have already investigated individual approaches to these categories, the experiments in literature do not provide a consistent comparison. This paper addresses this gap by conducting a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes. Different state-of-the-art approaches are benchmarked on four text classification datasets, including a new dataset from the medical domain. Additionally, novel SimCSE and SBERT-based baselines are proposed, as other baselines used in existing work yield weak classification results and are easily outperformed. Finally, the novel similarity-based Lbl2TransformerVec approach is presented, which outperforms previous state-of-the-art approaches in unsupervised text classification. Our experiments show that similarity-based approaches significantly outperform zero-shot approaches in most cases. Additionally, using SimCSE or SBERT embeddings instead of simpler text representations increases similarity-based classification results even further.
translated by 谷歌翻译
数据增强是通过转换为机器学习的人工创建数据的人工创建,是一个跨机器学习学科的研究领域。尽管它对于增加模型的概括功能很有用,但它还可以解决许多其他挑战和问题,从克服有限的培训数据到正规化目标到限制用于保护隐私的数据的数量。基于对数据扩展的目标和应用的精确描述以及现有作品的分类法,该调查涉及用于文本分类的数据增强方法,并旨在为研究人员和从业者提供简洁而全面的概述。我们将100多种方法划分为12种不同的分组,并提供最先进的参考文献来阐述哪种方法可以通过将它们相互关联,从而阐述了哪种方法。最后,提供可能构成未来工作的基础的研究观点。
translated by 谷歌翻译
我们渴望基于匹配的细粒度方面的新的科学文档相似模式。我们的模型采用了将相关论文方面描述为一种新颖的文本监督形式的共同引用上下文培训。我们使用多向量文档表示,最近在设置中探讨了短查询文本,但在具有挑战性的文档文件设置中探索。我们呈现了一种快速方法,涉及仅匹配单句对,以及一种具有最佳传输的稀疏多匹配的方法。我们的模型在四个数据集中提高了文档相似任务的性能。此外,我们的快速单匹赛方法实现了竞争力的结果,开辟了将细粒度的文档相似性模型应用于大型科学集团的可能性。
translated by 谷歌翻译
Short text classification is a crucial and challenging aspect of Natural Language Processing. For this reason, there are numerous highly specialized short text classifiers. However, in recent short text research, State of the Art (SOTA) methods for traditional text classification, particularly the pure use of Transformers, have been unexploited. In this work, we examine the performance of a variety of short text classifiers as well as the top performing traditional text classifier. We further investigate the effects on two new real-world short text datasets in an effort to address the issue of becoming overly dependent on benchmark datasets with a limited number of characteristics. Our experiments unambiguously demonstrate that Transformers achieve SOTA accuracy on short text classification tasks, raising the question of whether specialized short text techniques are necessary.
translated by 谷歌翻译
建模法检索和检索作为预测问题最近被出现为法律智能的主要方法。专注于法律文章检索任务,我们展示了一个名为Lamberta的深度学习框架,该框架被设计用于民法代码,并在意大利民法典上专门培训。为了我们的知识,这是第一项研究提出了基于伯特(来自变压器的双向编码器表示)学习框架的意大利法律制度对意大利法律制度的高级法律文章预测的研究,最近引起了深度学习方法的增加,呈现出色的有效性在几种自然语言处理和学习任务中。我们通过微调意大利文章或其部分的意大利预先训练的意大利预先训练的伯爵来定义Lamberta模型,因为法律文章作为分类任务检索。我们Lamberta框架的一个关键方面是我们构思它以解决极端的分类方案,其特征在于课程数量大,少量学习问题,以及意大利法律预测任务的缺乏测试查询基准。为了解决这些问题,我们为法律文章的无监督标签定义了不同的方法,原则上可以应用于任何法律制度。我们提供了深入了解我们Lamberta模型的解释性和可解释性,并且我们对单一标签以及多标签评估任务进行了广泛的查询模板实验分析。经验证据表明了Lamberta的有效性,以及对广泛使用的深度学习文本分类器和一些构思的几次学习者来说,其优越性是对属性感知预测任务的优势。
translated by 谷歌翻译
长期以来,共同基金或交易所交易基金(ETF)的分类已为财务分析师提供服务,以进行同行分析,以从竞争对手分析开始到量化投资组合多元化。分类方法通常依赖于从n-1a表格中提取的结构化格式的基金组成数据。在这里,我们启动一项研究,直接从使用自然语言处理(NLP)的表格中描绘的非结构化数据中学习分类系统。将输入数据仅作为表格中报告的投资策略描述,而目标变量是Lipper全球类别,并且使用各种NLP模型,我们表明,分类系统确实可以通过高准确率。我们讨论了我们发现的含义和应用,以及现有的预培训架构的局限性在应用它们以学习基金分类时。
translated by 谷歌翻译
Deep Learning and Machine Learning based models have become extremely popular in text processing and information retrieval. However, the non-linear structures present inside the networks make these models largely inscrutable. A significant body of research has focused on increasing the transparency of these models. This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods. More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking. The concluding section suggests some possible directions for future research on this topic.
translated by 谷歌翻译
标记数据可以是昂贵的任务,因为它通常由域专家手动执行。对于深度学习而言,这是繁琐的,因为它取决于大型标记的数据集。主动学习(AL)是一种范式,旨在通过仅使用二手车型认为最具信息丰富的数据来减少标签努力。在文本分类设置中,在AL上完成了很少的研究,旁边没有涉及最近的最先进的自然语言处理(NLP)模型。在这里,我们介绍了一个实证研究,可以将基于不确定性的基于不确定性的算法与Bert $ _ {base} $相比,作为使用的分类器。我们评估两个NLP分类数据集的算法:斯坦福情绪树木银行和kvk-Front页面。此外,我们探讨了旨在解决不确定性的al的预定问题的启发式;即,它是不可规范的,并且易于选择异常值。此外,我们探讨了查询池大小对al的性能的影响。虽然发现,AL的拟议启发式没有提高AL的表现;我们的结果表明,使用BERT $ _ {Base} $概率使用不确定性的AL。随着查询池大小变大,性能的这种差异可以减少。
translated by 谷歌翻译
近年来,预制语言模型彻底改变了NLP世界,同时在各种下游任务中实现了最先进的性能。但是,在许多情况下,当标记数据稀缺时,这些模型不会表现良好,并且预计模型将在零或几秒钟内执行。最近,有几项工作表明,与下游任务更好地对准的预先预测或执行第二阶段,可以导致改进的结果,尤其是在稀缺数据设置中。在此,我们建议利用携带的情绪话语标记来产生大规模的弱标记数据,这又可以用于适应语言模型进行情感分析。广泛的实验结果显示了我们在各种基准数据集中的方法的价值,包括金融域。在https://github.com/ibm/tslm-discourse-markers上提供代码,模型和数据。
translated by 谷歌翻译
在线学习系统具有成绩单,书籍和问题形式的多个数据存储库。为了易于访问,此类系统会根据层次性质(主题 - 主题)的明确分类法组织内容。将输入分类为层次标签的任务通常被视为平坦的多类分类问题。这种方法忽略了输入中的术语与层次标签中的令牌之间的语义相关性。当它们仅将叶片节点视为标签时,替代方法也患有类不平衡。为了解决这些问题,我们将任务制定为一个密集的检索问题,以检索每个内容的适当层次标签。在本文中,我们处理问题。我们将层次标签建模为其令牌的组成,并使用有效的交叉注意机制将信息与内容术语表示融合。我们还提出了一种自适应内部的硬采样方法,随着培训的进行,该方法可以更好地取消负面影响。我们证明了所提出的方法\ textit {tagrec ++}在问题数据集上的现有最新方法均超过了receal@k所测量的现有最新方法。此外,我们演示了\ textit {tagrec ++}的零射击功能以及适应标签更改的能力。
translated by 谷歌翻译
我们大多数人不是特定领域的专家,例如鸟类学。尽管如此,我们确实有一般的图像和语言理解,我们用来匹配我们所看到的专家资源。这使我们能够扩展我们的知识并在没有临时外部监督的情况下执行新的任务。相反,除非培训专门考虑到​​这一知识,否则机器更加难以咨询专家策划知识库。因此,在本文中,我们考虑了一个新问题:没有专家注释的细粒度的图像识别,我们通过利用Web百科全书中提供的广泛知识来解决这些问题。首先,我们学习模型来描述使用非专家图像描述来描述对象的视觉外观。然后,我们培训一个细粒度的文本相似性模型,它与句子级别的文件描述匹配。我们在两个数据集上评估该方法,并与跨模型检索的几个强大的基线和最先进的技术相比。代码可用:https://github.com/subhc/clever
translated by 谷歌翻译
Metric-based meta-learning is one of the de facto standards in few-shot learning. It composes of representation learning and metrics calculation designs. Previous works construct class representations in different ways, varying from mean output embedding to covariance and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while statistical complex modeling poses difficulty to metric designs. In this work, we use tensor fields (``areas'') to model classes from the geometrical perspective for few-shot learning. We present a simple and effective method, dubbed hypersphere prototypes (HyperProto), where class information is represented by hyperspheres with dynamic sizes with two sets of learnable parameters: the hypersphere's center and the radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot learning tasks across NLP and CV and comparison with 20+ competitive baselines demonstrate the effectiveness of our approach.
translated by 谷歌翻译
当前的文本分类方法通常仅在幼稚或复杂的分类器之前将文本编码为嵌入,该分类器忽略了标签文本中包含的建议信息。实际上,人类主要基于子类别的语义含义对文档进行分类。我们通过暹罗伯特(Siamese Bert)和名为Ideas(交互式双重注意力)的交互式双重注意提出了一种新颖的模型结构,以捕获文本和标签名称的信息交换。交互式双重注意力使该模型能够从粗糙到细小的类中开利的类和类内部信息,这涉及区分所有标签并匹配地面真实标签的语义子类。我们提出的方法的表现优于最新方法,使用标签文本显着,结果更稳定。
translated by 谷歌翻译
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining indomain (domain-adaptive pretraining) leads to performance gains, under both high-and low-resource settings. Moreover, adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. Finally, we show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable. Overall, we consistently find that multiphase adaptive pretraining offers large gains in task performance.
translated by 谷歌翻译