在标签 - 噪声学习中,估计过渡矩阵是一个热门话题,因为矩阵在构建统计上一致的分类器中起着重要作用。传统上,从干净的标签到嘈杂的标签(即,清洁标签过渡矩阵(CLTM))已被广泛利用,以通过使用嘈杂的数据来学习干净的标签分类器。该分类器的动机主要是输出贝叶斯的最佳预测标签,在本文中,我们研究以直接建模从贝叶斯最佳标签过渡到嘈杂标签(即贝叶斯标签,贝叶斯标签,是BLTM)),并学习分类器以预测贝叶斯最佳的分类器标签。请注意,只有嘈杂的数据,它不足以估计CLTM或BLTM。但是,贝叶斯最佳标签与干净标签相比,贝叶斯最佳标签的不确定性较小,即,贝叶斯最佳标签的类后代是一热矢量,而干净标签的载体则不是。这使两个优点能够估算BLTM,即(a)一组具有理论上保证的贝叶斯最佳标签的示例可以从嘈杂的数据中收集; (b)可行的解决方案空间要小得多。通过利用优势,我们通过采用深层神经网络来估计BLTM参数,从而更好地概括和出色的分类性能。
translated by 谷歌翻译
深度学习在大量大数据的帮助下取得了众多域中的显着成功。然而,由于许多真实情景中缺乏高质量标签,数据标签的质量是一个问题。由于嘈杂的标签严重降低了深度神经网络的泛化表现,从嘈杂的标签(强大的培训)学习是在现代深度学习应用中成为一项重要任务。在本调查中,我们首先从监督的学习角度描述了与标签噪声学习的问题。接下来,我们提供62项最先进的培训方法的全面审查,所有这些培训方法都按照其方法论差异分为五个群体,其次是用于评估其优越性的六种性质的系统比较。随后,我们对噪声速率估计进行深入分析,并总结了通常使用的评估方法,包括公共噪声数据集和评估度量。最后,我们提出了几个有前途的研究方向,可以作为未来研究的指导。所有内容将在https://github.com/songhwanjun/awesome-noisy-labels提供。
translated by 谷歌翻译
最近关于使用嘈杂标签的学习的研究通过利用小型干净数据集来显示出色的性能。特别是,基于模型不可知的元学习的标签校正方法进一步提高了性能,通过纠正了嘈杂的标签。但是,标签错误矫予没有保障措施,导致不可避免的性能下降。此外,每个训练步骤都需要至少三个背部传播,显着减慢训练速度。为了缓解这些问题,我们提出了一种强大而有效的方法,可以在飞行中学习标签转换矩阵。采用转换矩阵使分类器对所有校正样本持怀疑态度,这减轻了错误的错误问题。我们还介绍了一个双头架构,以便在单个反向传播中有效地估计标签转换矩阵,使得估计的矩阵紧密地遵循由标签校正引起的移位噪声分布。广泛的实验表明,我们的方法在训练效率方面表现出比现有方法相当或更好的准确性。
translated by 谷歌翻译
深神经网络(DNN)的记忆效果在许多最先进的标签噪声学习方法中起着枢轴作用。为了利用这一财产,通常采用早期停止训练早期优化的伎俩。目前的方法通常通过考虑整个DNN来决定早期停止点。然而,DNN可以被认为是一系列层的组成,并且发现DNN中的后一个层对标签噪声更敏感,而其前同行是非常稳健的。因此,选择整个网络的停止点可以使不同的DNN层对抗彼此影响,从而降低最终性能。在本文中,我们建议将DNN分离为不同的部位,逐步培训它们以解决这个问题。而不是早期停止,它一次列举一个整体DNN,我们最初通过用相对大量的时期优化DNN来训练前DNN层。在培训期间,我们通过使用较少数量的时期使用较少的地层来逐步培训后者DNN层,以抵消嘈杂标签的影响。我们将所提出的方法术语作为渐进式早期停止(PES)。尽管其简单性,与早期停止相比,PES可以帮助获得更有前景和稳定的结果。此外,通过将PE与现有的嘈杂标签培训相结合,我们在图像分类基准上实现了最先进的性能。
translated by 谷歌翻译
作为标签噪声,最受欢迎的分布变化之一,严重降低了深度神经网络的概括性能,具有嘈杂标签的强大训练正在成为现代深度学习中的重要任务。在本文中,我们提出了我们的框架,在子分类器(ALASCA)上创造了自适应标签平滑,该框架提供了具有理论保证和可忽略的其他计算的可靠特征提取器。首先,我们得出标签平滑(LS)会产生隐式Lipschitz正则化(LR)。此外,基于这些推导,我们将自适应LS(ALS)应用于子分类器架构上,以在中间层上的自适应LR的实际应用。我们对ALASCA进行了广泛的实验,并将其与以前的几个数据集上的噪声燃烧方法相结合,并显示我们的框架始终优于相应的基线。
translated by 谷歌翻译
在标签噪声下训练深神网络的能力很有吸引力,因为不完美的注释数据相对便宜。最先进的方法基于半监督学习(SSL),该学习选择小损失示例为清洁,然后应用SSL技术来提高性能。但是,选择步骤主要提供一个中等大小的清洁子集,该子集可俯瞰丰富的干净样品。在这项工作中,我们提出了一个新颖的嘈杂标签学习框架Promix,试图最大程度地提高清洁样品的实用性以提高性能。我们方法的关键是,我们提出了一种匹配的高信心选择技术,该技术选择了那些具有很高置信的示例,并与给定标签进行了匹配的预测。结合小损失选择,我们的方法能够达到99.27的精度,并在检测CIFAR-10N数据集上的干净样品时召回98.22。基于如此大的清洁数据,Promix将最佳基线方法提高了CIFAR-10N的 +2.67%,而CIFAR-100N数据集则提高了 +1.61%。代码和数据可从https://github.com/justherozen/promix获得
translated by 谷歌翻译
强大的学习方法旨在从嘈杂和损坏的训练数据中学习清洁的目标分布,其中经常假设特定的损坏模式是先验的。我们所提出的方法不仅可以成功地从脏数据集中学习清洁目标分布,还可以估计潜在的噪声模式。为此,我们利用了专家模型,可以区分两种不同类型的预测性不确定性,炼术和认识性的不确定性。我们表明,估计不确定性的能力在阐明腐败模式时发挥着重要作用,因为这两个目标紧密交织在一起。我们还提出了一种用于评估腐败模式估计性能的新型验证方案。我们提出的方法通过许多域来广泛地评估了鲁棒性和腐败模式估计,包括计算机视觉和自然语言处理。
translated by 谷歌翻译
The existence of label noise imposes significant challenges (e.g., poor generalization) on the training process of deep neural networks (DNN). As a remedy, this paper introduces a permutation layer learning approach termed PermLL to dynamically calibrate the training process of the DNN subject to instance-dependent and instance-independent label noise. The proposed method augments the architecture of a conventional DNN by an instance-dependent permutation layer. This layer is essentially a convex combination of permutation matrices that is dynamically calibrated for each sample. The primary objective of the permutation layer is to correct the loss of noisy samples mitigating the effect of label noise. We provide two variants of PermLL in this paper: one applies the permutation layer to the model's prediction, while the other applies it directly to the given noisy label. In addition, we provide a theoretical comparison between the two variants and show that previous methods can be seen as one of the variants. Finally, we validate PermLL experimentally and show that it achieves state-of-the-art performance on both real and synthetic datasets.
translated by 谷歌翻译
深神经网络(DNN)的记忆效应在最近的标签噪声学习方法中起关键作用。为了利用这种效果,已经广泛采用了基于模型预测的方法,该方法旨在利用DNN在学习的早期阶段以纠正嘈杂标签的效果。但是,我们观察到该模型在标签预测期间会犯错误,从而导致性能不令人满意。相比之下,在学习早期阶段产生的特征表现出更好的鲁棒性。受到这一观察的启发,在本文中,我们提出了一种基于特征嵌入的新方法,用于用标签噪声,称为标签NoissiLution(Lend)。要具体而言,我们首先根据当前的嵌入式特征计算一个相似性矩阵,以捕获训练数据的局部结构。然后,附近标记的数据(\ textIt {i.e。},标签噪声稀释)使错误标记的数据携带的嘈杂的监督信号淹没了,其有效性是由特征嵌入的固有鲁棒性保证的。最后,带有稀释标签的培训数据进一步用于培训强大的分类器。从经验上讲,我们通过将我们的贷款与几种代表性的强大学习方法进行比较,对合成和现实世界嘈杂数据集进行了广泛的实验。结果验证了我们贷款的有效性。
translated by 谷歌翻译
对标签噪声的学习是一个至关重要的话题,可以保证深度神经网络的可靠表现。最近的研究通常是指具有模型输出概率和损失值的动态噪声建模,然后分离清洁和嘈杂的样本。这些方法取得了显着的成功。但是,与樱桃挑选的数据不同,现有方法在面对不平衡数据集时通常无法表现良好,这是现实世界中常见的情况。我们彻底研究了这一现象,并指出了两个主要问题,这些问题阻碍了性能,即\ emph {类间损耗分布差异}和\ emph {由于不确定性而引起的误导性预测}。第一个问题是现有方法通常执行类不足的噪声建模。然而,损失分布显示在类失衡下的类别之间存在显着差异,并且类不足的噪声建模很容易与少数族裔类别中的嘈杂样本和样本混淆。第二个问题是指该模型可能会因认知不确定性和不确定性而导致的误导性预测,因此仅依靠输出概率的现有方法可能无法区分自信的样本。受我们的观察启发,我们提出了一个不确定性的标签校正框架〜(ULC)来处理不平衡数据集上的标签噪声。首先,我们执行认识不确定性的班级特异性噪声建模,以识别可信赖的干净样本并精炼/丢弃高度自信的真实/损坏的标签。然后,我们在随后的学习过程中介绍了不确定性,以防止标签噪声建模过程中的噪声积累。我们对几个合成和现实世界数据集进行实验。结果证明了提出的方法的有效性,尤其是在数据集中。
translated by 谷歌翻译
Label noise is a significant obstacle in deep learning model training. It can have a considerable impact on the performance of image classification models, particularly deep neural networks, which are especially susceptible because they have a strong propensity to memorise noisy labels. In this paper, we have examined the fundamental concept underlying related label noise approaches. A transition matrix estimator has been created, and its effectiveness against the actual transition matrix has been demonstrated. In addition, we examined the label noise robustness of two convolutional neural network classifiers with LeNet and AlexNet designs. The two FashionMINIST datasets have revealed the robustness of both models. We are not efficiently able to demonstrate the influence of the transition matrix noise correction on robustness enhancements due to our inability to correctly tune the complex convolutional neural network model due to time and computing resource constraints. There is a need for additional effort to fine-tune the neural network model and explore the precision of the estimated transition model in future research.
translated by 谷歌翻译
标签平滑(LS)是一种出现的学习范式,它使用硬训练标签和均匀分布的软标签的正加权平均值。结果表明,LS是带有硬标签的训练数据的常规器,因此改善了模型的概括。后来,据报道,LS甚至有助于用嘈杂的标签学习时改善鲁棒性。但是,我们观察到,当我们以高标签噪声状态运行时,LS的优势就会消失。从直觉上讲,这是由于$ \ mathbb {p}的熵增加(\ text {noisy label} | x)$当噪声速率很高时,在这种情况下,进一步应用LS会倾向于“超平滑”估计后部。我们开始发现,文献中的几种学习与噪声标签的解决方案相反,与负面/不标签平滑(NLS)更紧密地关联,它们与LS相反,并将其定义为使用负重量来结合硬和软标签呢我们在使用嘈杂标签学习时对LS和NLS的性质提供理解。在其他已建立的属性中,我们从理论上表明,当标签噪声速率高时,NLS被认为更有益。我们在多个基准测试中提供了广泛的实验结果,以支持我们的发现。代码可在https://github.com/ucsc-real/negative-label-smooth上公开获取。
translated by 谷歌翻译
Modeling noise transition matrix is a kind of promising method for learning with label noise. Based on the estimated noise transition matrix and the noisy posterior probabilities, the clean posterior probabilities, which are jointly called Label Distribution (LD) in this paper, can be calculated as the supervision. To reliably estimate the noise transition matrix, some methods assume that anchor points are available during training. Nonetheless, if anchor points are invalid, the noise transition matrix might be poorly learned, resulting in poor performance. Consequently, other methods treat reliable data points, extracted from training data, as pseudo anchor points. However, from a statistical point of view, the noise transition matrix can be inferred from data with noisy labels under the clean-label-domination assumption. Therefore, we aim to estimate the noise transition matrix without (pseudo) anchor points. There is evidence showing that samples are more likely to be mislabeled as other similar class labels, which means the mislabeling probability is highly correlated with the inter-class correlation. Inspired by this observation, we propose an instance-specific Label Distribution Regularization (LDR), in which the instance-specific LD is estimated as the supervision, to prevent DCNNs from memorizing noisy labels. Specifically, we estimate the noisy posterior under the supervision of noisy labels, and approximate the batch-level noise transition matrix by estimating the inter-class correlation matrix with neither anchor points nor pseudo anchor points. Experimental results on two synthetic noisy datasets and two real-world noisy datasets demonstrate that our LDR outperforms existing methods.
translated by 谷歌翻译
Despite being robust to small amounts of label noise, convolutional neural networks trained with stochastic gradient methods have been shown to easily fit random labels. When there are a mixture of correct and mislabelled targets, networks tend to fit the former before the latter. This suggests using a suitable two-component mixture model as an unsupervised generative model of sample loss values during training to allow online estimation of the probability that a sample is mislabelled. Specifically, we propose a beta mixture to estimate this probability and correct the loss by relying on the network prediction (the so-called bootstrapping loss). We further adapt mixup augmentation to drive our approach a step further. Experiments on CIFAR-10/100 and TinyImageNet demonstrate a robustness to label noise that substantially outperforms recent state-of-the-art. Source code is available at https://git.io/fjsvE.
translated by 谷歌翻译
Partial label learning (PLL) is a typical weakly supervised learning, where each sample is associated with a set of candidate labels. The basic assumption of PLL is that the ground-truth label must reside in the candidate set. However, this assumption may not be satisfied due to the unprofessional judgment of the annotators, thus limiting the practical application of PLL. In this paper, we relax this assumption and focus on a more general problem, noisy PLL, where the ground-truth label may not exist in the candidate set. To address this challenging problem, we further propose a novel framework called "Automatic Refinement Network (ARNet)". Our method consists of multiple rounds. In each round, we purify the noisy samples through two key modules, i.e., noisy sample detection and label correction. To guarantee the performance of these modules, we start with warm-up training and automatically select the appropriate correction epoch. Meanwhile, we exploit data augmentation to further reduce prediction errors in ARNet. Through theoretical analysis, we prove that our method is able to reduce the noise level of the dataset and eventually approximate the Bayes optimal classifier. To verify the effectiveness of ARNet, we conduct experiments on multiple benchmark datasets. Experimental results demonstrate that our ARNet is superior to existing state-of-the-art approaches in noisy PLL. Our code will be made public soon.
translated by 谷歌翻译
Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at https://github.com/LiJunnan1992/DivideMix.
translated by 谷歌翻译
Semi-supervised learning based methods are current SOTA solutions to the noisy-label learning problem, which rely on learning an unsupervised label cleaner first to divide the training samples into a labeled set for clean data and an unlabeled set for noise data. Typically, the cleaner is obtained via fitting a mixture model to the distribution of per-sample training losses. However, the modeling procedure is \emph{class agnostic} and assumes the loss distributions of clean and noise samples are the same across different classes. Unfortunately, in practice, such an assumption does not always hold due to the varying learning difficulty of different classes, thus leading to sub-optimal label noise partition criteria. In this work, we reveal this long-ignored problem and propose a simple yet effective solution, named \textbf{C}lass \textbf{P}rototype-based label noise \textbf{C}leaner (\textbf{CPC}). Unlike previous works treating all the classes equally, CPC fully considers loss distribution heterogeneity and applies class-aware modulation to partition the clean and noise data. CPC takes advantage of loss distribution modeling and intra-class consistency regularization in feature space simultaneously and thus can better distinguish clean and noise labels. We theoretically justify the effectiveness of our method by explaining it from the Expectation-Maximization (EM) framework. Extensive experiments are conducted on the noisy-label benchmarks CIFAR-10, CIFAR-100, Clothing1M and WebVision. The results show that CPC consistently brings about performance improvement across all benchmarks. Codes and pre-trained models will be released at \url{https://github.com/hjjpku/CPC.git}.
translated by 谷歌翻译
已证明深度神经网络容易受到对抗噪声的影响,从而促进了针对对抗攻击的防御。受到对抗噪声包含良好的特征的动机,并且对抗数据和自然数据之间的关系可以帮助推断自然数据并做出可靠的预测,在本文中,我们研究通过学习对抗性标签之间的过渡关系来建模对抗性噪声(即用于生成对抗数据的翻转标签)和天然标签(即自然数据的地面真实标签)。具体而言,我们引入了一个依赖实例的过渡矩阵来关联对抗标签和天然标签,可以将其无缝嵌入目标模型(使我们能够建模更强的自适应对手噪声)。经验评估表明,我们的方法可以有效提高对抗性的准确性。
translated by 谷歌翻译
Learning with noisy labels is one of the hottest problems in weakly-supervised learning. Based on memorization effects of deep neural networks, training on small-loss instances becomes very promising for handling noisy labels. This fosters the state-of-the-art approach "Co-teaching" that cross-trains two deep neural networks using the small-loss trick. However, with the increase of epochs, two networks converge to a consensus and Co-teaching reduces to the self-training MentorNet. To tackle this issue, we propose a robust learning paradigm called Co-teaching+, which bridges the "Update by Disagreement" strategy with the original Co-teaching. First, two networks feed forward and predict all data, but keep prediction disagreement data only. Then, among such disagreement data, each network selects its small-loss data, but back propagates the small-loss data from its peer network and updates its own parameters. Empirical results on benchmark datasets demonstrate that Co-teaching+ is much superior to many state-of-theart methods in the robustness of trained models.
translated by 谷歌翻译
Learning with noisy labels is a vital topic for practical deep learning as models should be robust to noisy open-world datasets in the wild. The state-of-the-art noisy label learning approach JoCoR fails when faced with a large ratio of noisy labels. Moreover, selecting small-loss samples can also cause error accumulation as once the noisy samples are mistakenly selected as small-loss samples, they are more likely to be selected again. In this paper, we try to deal with error accumulation in noisy label learning from both model and data perspectives. We introduce mean point ensemble to utilize a more robust loss function and more information from unselected samples to reduce error accumulation from the model perspective. Furthermore, as the flip images have the same semantic meaning as the original images, we select small-loss samples according to the loss values of flip images instead of the original ones to reduce error accumulation from the data perspective. Extensive experiments on CIFAR-10, CIFAR-100, and large-scale Clothing1M show that our method outperforms state-of-the-art noisy label learning methods with different levels of label noise. Our method can also be seamlessly combined with other noisy label learning methods to further improve their performance and generalize well to other tasks. The code is available in https://github.com/zyh-uaiaaaa/MDA-noisy-label-learning.
translated by 谷歌翻译