Label noise is ubiquitous in various machine learning scenarios such as self-labeling with model predictions and erroneous data annotation. Many existing approaches are based on heuristics such as sample losses, which might not be flexible enough to achieve optimal solutions. Meta learning based methods address this issue by learning a data selection function, but can be hard to optimize. In light of these pros and cons, we propose Selection-Enhanced Noisy label Training (SENT) that does not rely on meta learning while having the flexibility of being data-driven. SENT transfers the noise distribution to a clean set and trains a model to distinguish noisy labels from clean ones using model-based features. Empirically, on a wide range of tasks including text classification and speech recognition, SENT improves performance over strong baselines under the settings of self-training and label corruption.
translated by 谷歌翻译
We present Noisy Student Training, a semi-supervised learning approach that works well even when labeled data is abundant. Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2.Noisy Student Training extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. On Im-ageNet, we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300M unlabeled images. We then train a larger Efficient-Net as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher. 1 * This work was conducted at Google.
translated by 谷歌翻译
除了使用硬标签的标准监督学习外,通常在许多监督学习设置中使用辅助损失来改善模型的概括。例如,知识蒸馏增加了第二个教师模仿模型训练的损失,在该培训中,教师可能是一个验证的模型,可以输出比标签更丰富的分布。同样,在标记数据有限的设置中,弱标记信息以标签函数的形式使用。此处引入辅助损失来对抗标签函数,这些功能可能是基于嘈杂的规则的真实标签近似值。我们解决了学习以原则性方式结合这些损失的问题。我们介绍AMAL,该AMAL使用元学习在验证度量上学习实例特定的权重,以实现损失的最佳混合。在许多知识蒸馏和规则降解域中进行的实验表明,Amal在这些领域中对竞争基准的增长可显着。我们通过经验分析我们的方法,并分享有关其提供性能提升的机制的见解。
translated by 谷歌翻译
深度学习在大量大数据的帮助下取得了众多域中的显着成功。然而,由于许多真实情景中缺乏高质量标签,数据标签的质量是一个问题。由于嘈杂的标签严重降低了深度神经网络的泛化表现,从嘈杂的标签(强大的培训)学习是在现代深度学习应用中成为一项重要任务。在本调查中,我们首先从监督的学习角度描述了与标签噪声学习的问题。接下来,我们提供62项最先进的培训方法的全面审查,所有这些培训方法都按照其方法论差异分为五个群体,其次是用于评估其优越性的六种性质的系统比较。随后,我们对噪声速率估计进行深入分析,并总结了通常使用的评估方法,包括公共噪声数据集和评估度量。最后,我们提出了几个有前途的研究方向,可以作为未来研究的指导。所有内容将在https://github.com/songhwanjun/awesome-noisy-labels提供。
translated by 谷歌翻译
Positive-Unlabeled (PU) learning aims to learn a model with rare positive samples and abundant unlabeled samples. Compared with classical binary classification, the task of PU learning is much more challenging due to the existence of many incompletely-annotated data instances. Since only part of the most confident positive samples are available and evidence is not enough to categorize the rest samples, many of these unlabeled data may also be the positive samples. Research on this topic is particularly useful and essential to many real-world tasks which demand very expensive labelling cost. For example, the recognition tasks in disease diagnosis, recommendation system and satellite image recognition may only have few positive samples that can be annotated by the experts. These methods mainly omit the intrinsic hardness of some unlabeled data, which can result in sub-optimal performance as a consequence of fitting the easy noisy data and not sufficiently utilizing the hard data. In this paper, we focus on improving the commonly-used nnPU with a novel training pipeline. We highlight the intrinsic difference of hardness of samples in the dataset and the proper learning strategies for easy and hard data. By considering this fact, we propose first splitting the unlabeled dataset with an early-stop strategy. The samples that have inconsistent predictions between the temporary and base model are considered as hard samples. Then the model utilizes a noise-tolerant Jensen-Shannon divergence loss for easy data; and a dual-source consistency regularization for hard data which includes a cross-consistency between student and base model for low-level features and self-consistency for high-level features and predictions, respectively.
translated by 谷歌翻译
Data Augmentation (DA) is frequently used to automatically provide additional training data without extra human annotation. However, data augmentation may introduce noisy data that impairs training. To guarantee the quality of augmented data, existing methods either assume no noise exists in the augmented data and adopt consistency training or use simple heuristics such as training loss and diversity constraints to filter out ``noisy'' data. However, those filtered examples may still contain useful information, and dropping them completely causes loss of supervision signals. In this paper, based on the assumption that the original dataset is cleaner than the augmented data, we propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data. A simple self-regularization module is applied to force the model prediction to be consistent across two distinct dropouts to further prevent overfitting on noisy labels. Our method can be applied to augmentation techniques in general and can consistently improve the performance on both text classification and question-answering tasks.
translated by 谷歌翻译
尽管在许多自然语言处理(NLP)任务中进行了预先训练的语言模型(LMS),但它们需要过多标记的数据来进行微调以实现令人满意的性能。为了提高标签效率,研究人员采取了活跃的学习(AL),而大多数事先工作则忽略未标记数据的潜力。要释放未标记数据的强大功能以获得更好的标签效率和模型性能,我们开发ATM,一个新的框架,它利用自我训练来利用未标记的数据,并且对于特定的AL算法不可知,用作改善现有的插件模块Al方法。具体地,具有高不确定性的未标记数据暴露于Oracle以进行注释,而具有低不确定性的人则可用于自培训。为了缓解自我训练中的标签噪声传播问题,我们设计一个简单且有效的基于动量的内存库,可以动态地从所有轮次汇总模型预测。通过广泛的实验,我们证明了ATM优于最强大的积极学习和自我训练基线,平均将标签效率提高51.9%。
translated by 谷歌翻译
不完美的标签在现实世界数据集中无处不在,严重损害了模型性能。几个最近处理嘈杂标签的有效方法有两个关键步骤:1)将样品分开通过培训丢失,2)使用半监控方法在错误标记的集合中生成样本的伪标签。然而,由于硬样品和噪声之间的类似损失分布,目前的方法总是损害信息性的硬样品。在本文中,我们提出了PGDF(先前引导的去噪框架),通过生成样本的先验知识来学习深层模型来抑制噪声的新框架,这被集成到分割样本步骤和半监督步骤中。我们的框架可以将更多信息性硬清洁样本保存到干净标记的集合中。此外,我们的框架还通过抑制当前伪标签生成方案中的噪声来促进半监控步骤期间伪标签的质量。为了进一步增强硬样品,我们在训练期间在干净的标记集合中重新重量样品。我们使用基于CiFar-10和CiFar-100的合成数据集以及现实世界数据集WebVision和服装1M进行了评估了我们的方法。结果表明了最先进的方法的大量改进。
translated by 谷歌翻译
基于深度学习的组织病理学图像分类是帮助医生提高癌症诊断的准确性和迅速性的关键技术。然而,在复杂的手动注释过程中,嘈杂的标签通常是不可避免的,因此误导了分类模型的培训。在这项工作中,我们介绍了一种用于组织病理学图像分类的新型硬样本感知噪声稳健学习方法。为了区分来自有害嘈杂的内容漏洞,我们通过使用样本培训历史来构建一个简单/硬/噪声(EHN)检测模型。然后,我们将EHN集成到自动训练架构中,通过逐渐校正降低噪声速率。通过获得的几乎干净的数据集,我们进一步提出了一种噪声抑制和硬增强(NSHE)方案来训练噪声鲁棒模型。与以前的作品相比,我们的方法可以节省更多清洁样本,并且可以直接应用于实际嘈杂的数据集场景,而无需使用清洁子集。实验结果表明,该方案在合成和现实世界嘈杂的数据集中优于当前最先进的方法。源代码和数据可在https://github.com/bupt-ai-cz/hsa-nrl/处获得。
translated by 谷歌翻译
在标签噪声下训练深神网络的能力很有吸引力,因为不完美的注释数据相对便宜。最先进的方法基于半监督学习(SSL),该学习选择小损失示例为清洁,然后应用SSL技术来提高性能。但是,选择步骤主要提供一个中等大小的清洁子集,该子集可俯瞰丰富的干净样品。在这项工作中,我们提出了一个新颖的嘈杂标签学习框架Promix,试图最大程度地提高清洁样品的实用性以提高性能。我们方法的关键是,我们提出了一种匹配的高信心选择技术,该技术选择了那些具有很高置信的示例,并与给定标签进行了匹配的预测。结合小损失选择,我们的方法能够达到99.27的精度,并在检测CIFAR-10N数据集上的干净样品时召回98.22。基于如此大的清洁数据,Promix将最佳基线方法提高了CIFAR-10N的 +2.67%,而CIFAR-100N数据集则提高了 +1.61%。代码和数据可从https://github.com/justherozen/promix获得
translated by 谷歌翻译
样品选择是减轻标签噪声在鲁棒学习中的影响的有效策略。典型的策略通常应用小损失标准来识别干净的样品。但是,这些样本位于决策边界周围,通常会与嘈杂的例子纠缠在一起,这将被此标准丢弃,从而导致概括性能的严重退化。在本文中,我们提出了一种新颖的选择策略,\ textbf {s} elf- \ textbf {f} il \ textbf {t} ering(sft),它利用历史预测中嘈杂的示例的波动来过滤它们,可以过滤它们,这可以是可以过滤的。避免在边界示例中的小损失标准的选择偏置。具体来说,我们介绍了一个存储库模块,该模块存储了每个示例的历史预测,并动态更新以支持随后的学习迭代的选择。此外,为了减少SFT样本选择偏置的累积误差,我们设计了一个正规化术语来惩罚自信的输出分布。通过通过此术语增加错误分类类别的重量,损失函数在轻度条件下标记噪声是可靠的。我们对具有变化噪声类型的三个基准测试并实现了新的最先进的实验。消融研究和进一步分析验证了SFT在健壮学习中选择样本的优点。
translated by 谷歌翻译
Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at https://github.com/LiJunnan1992/DivideMix.
translated by 谷歌翻译
最近关于使用嘈杂标签的学习的研究通过利用小型干净数据集来显示出色的性能。特别是,基于模型不可知的元学习的标签校正方法进一步提高了性能,通过纠正了嘈杂的标签。但是,标签错误矫予没有保障措施,导致不可避免的性能下降。此外,每个训练步骤都需要至少三个背部传播,显着减慢训练速度。为了缓解这些问题,我们提出了一种强大而有效的方法,可以在飞行中学习标签转换矩阵。采用转换矩阵使分类器对所有校正样本持怀疑态度,这减轻了错误的错误问题。我们还介绍了一个双头架构,以便在单个反向传播中有效地估计标签转换矩阵,使得估计的矩阵紧密地遵循由标签校正引起的移位噪声分布。广泛的实验表明,我们的方法在训练效率方面表现出比现有方法相当或更好的准确性。
translated by 谷歌翻译
Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples. Unfortunately, the distant supervision may induce noisy labels, thus undermining the robustness of the learned models and restricting the practical application. To relieve this problem, recent works adopt self-training teacher-student frameworks to gradually refine the training labels and improve the generalization ability of NER models. However, we argue that the performance of the current self-training frameworks for DS-NER is severely underestimated by their plain designs, including both inadequate student learning and coarse-grained teacher updating. Therefore, in this paper, we make the first attempt to alleviate these issues by proposing: (1) adaptive teacher learning comprised of joint training of two teacher-student networks and considering both consistent and inconsistent predictions between two teachers, thus promoting comprehensive student learning. (2) fine-grained student ensemble that updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise. To verify the effectiveness of our proposed method, we conduct experiments on four DS-NER datasets. The experimental results demonstrate that our method significantly surpasses previous SOTA methods.
translated by 谷歌翻译
大多数现有的少量学习(FSL)方法都需要大量的元训练中标记数据,这是一个主要限制。为了减少标签的需求,已经为FSL提出了半监督的元训练设置,其中仅包括几个标记的样品和基础类别中的未标记样本数量。但是,此设置下的现有方法需要从未标记的集合中选择类吸引的样本选择,这违反了未标记集的假设。在本文中,我们提出了一个实用的半监督元训练环境,并使用真正的未标记数据。在新设置下,现有方法的性能显着下降。为了更好地利用标签和真正未标记的数据,我们提出了一个简单有效的元训练框架,称为基于元学习(PLML)的伪标记。首先,我们通过常见的半监督学习(SSL)训练分类器,并使用它来获取未标记数据的伪标记。然后,我们从标记和伪标记的数据中构建了几个射击任务,并在构造的任务上运行元学习以学习FSL模型。令人惊讶的是,通过在两个FSL数据集的广泛实验中,我们发现这个简单的元训练框架有效地防止了在有限的标记数据下FSL的性能降解。此外,从元培训中受益,提出的方法还改善了两种代表性SSL算法所学的分类器。
translated by 谷歌翻译
We demonstrate that self-learning techniques like entropy minimization and pseudo-labeling are simple and effective at improving performance of a deployed computer vision model under systematic domain shifts. We conduct a wide range of large-scale experiments and show consistent improvements irrespective of the model architecture, the pre-training technique or the type of distribution shift. At the same time, self-learning is simple to use in practice because it does not require knowledge or access to the original training data or scheme, is robust to hyperparameter choices, is straight-forward to implement and requires only a few adaptation epochs. This makes self-learning techniques highly attractive for any practitioner who applies machine learning algorithms in the real world. We present state-of-the-art adaptation results on CIFAR10-C (8.5% error), ImageNet-C (22.0% mCE), ImageNet-R (17.4% error) and ImageNet-A (14.8% error), theoretically study the dynamics of self-supervised adaptation methods and propose a new classification dataset (ImageNet-D) which is challenging even with adaptation.
translated by 谷歌翻译
最先进的自动语音识别(ASR)系统经过数以万计的标记语音数据训练。人类转录很昂贵且耗时。诸如转录的质量和一致性之类的因素可以极大地影响使用这些数据训练的ASR模型的性能。在本文中,我们表明我们可以通过利用最近的自学和半监督学习技术来培训强大的教师模型来生产高质量的伪标签。具体来说,我们仅使用(无监督/监督培训)和迭代嘈杂的学生教师培训来培训6亿个参数双向教师模型。该模型在语音搜索任务上达到了4.0%的单词错误率(WER),比基线相对好11.1%。我们进一步表明,通过使用这种强大的教师模型来生成用于训练的高质量伪标签,与使用人类标签相比,流媒体模型可以实现13.6%的相对减少(5.9%至5.1%)。
translated by 谷歌翻译
Information Extraction (IE) aims to extract structured information from heterogeneous sources. IE from natural language texts include sub-tasks such as Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE). Most IE systems require comprehensive understandings of sentence structure, implied semantics, and domain knowledge to perform well; thus, IE tasks always need adequate external resources and annotations. However, it takes time and effort to obtain more human annotations. Low-Resource Information Extraction (LRIE) strives to use unsupervised data, reducing the required resources and human annotation. In practice, existing systems either utilize self-training schemes to generate pseudo labels that will cause the gradual drift problem, or leverage consistency regularization methods which inevitably possess confirmation bias. To alleviate confirmation bias due to the lack of feedback loops in existing LRIE learning paradigms, we develop a Gradient Imitation Reinforcement Learning (GIRL) method to encourage pseudo-labeled data to imitate the gradient descent direction on labeled data, which can force pseudo-labeled data to achieve better optimization capabilities similar to labeled data. Based on how well the pseudo-labeled data imitates the instructive gradient descent direction obtained from labeled data, we design a reward to quantify the imitation process and bootstrap the optimization capability of pseudo-labeled data through trial and error. In addition to learning paradigms, GIRL is not limited to specific sub-tasks, and we leverage GIRL to solve all IE sub-tasks (named entity recognition, relation extraction, and event extraction) in low-resource settings (semi-supervised IE and few-shot IE).
translated by 谷歌翻译
很少有射击分类旨在学习一个模型,该模型只有几个标签样本可用,可以很好地推广到新任务。为了利用在实际应用中更丰富的未标记数据,Ren等人。 \ shortcite {ren2018meta}提出了一种半监督的少数射击分类方法,该方法通过手动定义的度量标记为每个未标记的样本分配了适当的标签。但是,手动定义的度量未能捕获数据中的内在属性。在本文中,我们提出了a \ textbf {s} elf- \ textbf {a} daptive \ textbf {l} abel \ textbf {a} u摄孔方法,称为\ textbf {sala},用于半精神分裂的几个分类。萨拉(Sala)的主要新颖性是任务自适应指标,可以以端到端的方式适应不同任务的指标。萨拉(Sala)的另一个吸引人的特征是一种进步的邻居选择策略,该策略在整个训练阶段逐渐逐渐信心选择未标记的数据。实验表明,SALA优于在基准数据集上半监督的几种射击分类的几种最新方法。
translated by 谷歌翻译
灵感来自深度学习的广泛成功,已经提出了图表神经网络(GNNS)来学习表达节点表示,并在各种图形学习任务中表现出有希望的性能。然而,现有的努力主要集中在提供相对丰富的金色标记节点的传统半监督设置。虽然数据标签是难以忍受的事实令人生畏的事实并且需要强化领域知识,但特别是在考虑图形结构数据的异质性时,它通常是不切实际的。在几次半监督的环境下,大多数现有GNN的性能不可避免地受到过度装备和过天际问题的破坏,在很大程度上由于标记数据的短缺。在本文中,我们提出了一种配备有新型元学习算法的解耦的网络架构来解决这个问题。从本质上讲,我们的框架META-PN通过META学习的标签传播策略在未标记节点上乘坐高质量的伪标签,这有效增强了稀缺标记的数据,同时在培训期间启用大型接受领域。广泛的实验表明,与各种基准数据集上的现有技术相比,我们的方法提供了简单且实质性的性能。
translated by 谷歌翻译