The performance of Deep Learning (DL) models depends on the quality of labels. In some areas, the involvement of human annotators may lead to noise in the data. When these corrupted labels are blindly regarded as the ground truth (GT), DL models suffer a drop in performance. This paper presents a method that aims to learn a confident model in the presence of noisy labels, in conjunction with estimating the uncertainty of multiple annotators. We robustly estimate the predictions given only the noisy labels by adding an entropy- or information-based regularizer to the classifier network. We conduct our experiments on noisy versions of the MNIST, CIFAR-10, and FMNIST datasets. Our empirical results demonstrate the robustness of our method, which outperforms or performs comparably to other state-of-the-art (SOTA) methods. In addition, we evaluated the proposed method on a curated dataset, where the noise type and level of various annotators depend on the input image style. We show that our approach performs well and is adept at learning annotators' confusion. Moreover, we demonstrate how our model is more confident in predicting GT than other baselines. Finally, we assess our approach for the segmentation problem and showcase its effectiveness with experiments.
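As a minimal sketch of the kind of objective described above, the snippet below adds an entropy penalty on the classifier's predictive distribution to the usual cross-entropy on the noisy labels; the exact regularizer form and the weight `lam` are assumptions for illustration, not the authors' precise formulation.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_loss(logits, noisy_labels, lam=0.1):
    """Cross-entropy on the noisy labels plus an entropy penalty on the
    predictive distribution, pushing the classifier toward confident
    (low-entropy) predictions. A sketch only, not the paper's exact objective."""
    ce = F.cross_entropy(logits, noisy_labels)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1).mean()
    return ce + lam * entropy
```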
Deep learning with noisy labels is a practically challenging problem in weakly supervised learning. The state-of-the-art approaches "Decoupling" and "Co-teaching+" claim that the "disagreement" strategy is crucial for alleviating the problem of learning with noisy labels. In this paper, we start from a different perspective and propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of two networks during training. Specifically, we first use two networks to make predictions on the same mini-batch data and calculate a joint loss with Co-Regularization for each training example. Then we select small-loss examples to update the parameters of both networks simultaneously. Trained by the joint loss, the two networks become more and more similar due to the effect of Co-Regularization. Extensive experimental results on corrupted data from benchmark datasets including MNIST, CIFAR-10, CIFAR-100 and Clothing1M demonstrate that JoCoR is superior to many state-of-the-art approaches for learning with noisy labels.
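A rough sketch of the co-regularized joint loss with small-loss selection described above (hyper-parameter values such as `co_lambda` and `forget_rate` are placeholders, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def jocor_loss(logits1, logits2, labels, co_lambda=0.65, forget_rate=0.2):
    """Per-example joint loss: supervised CE for both networks plus a symmetric
    KL co-regularization term, followed by small-loss selection. Both networks
    are updated with the same selected loss."""
    ce1 = F.cross_entropy(logits1, labels, reduction="none")
    ce2 = F.cross_entropy(logits2, labels, reduction="none")
    p1, p2 = F.softmax(logits1, dim=1), F.softmax(logits2, dim=1)
    kl12 = F.kl_div(F.log_softmax(logits1, dim=1), p2, reduction="none").sum(dim=1)
    kl21 = F.kl_div(F.log_softmax(logits2, dim=1), p1, reduction="none").sum(dim=1)
    joint = (1 - co_lambda) * (ce1 + ce2) + co_lambda * (kl12 + kl21)
    num_keep = int((1 - forget_rate) * labels.size(0))
    keep = torch.argsort(joint)[:num_keep]   # small-loss examples
    return joint[keep].mean()
```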
Robust learning methods aim to learn a clean target distribution from noisy and corrupted training data, where a specific corruption pattern is often assumed a priori. Our proposed method can not only successfully learn the clean target distribution from a dirty dataset but also estimate the underlying noise pattern. To this end, we leverage an expert model that can distinguish two different types of predictive uncertainty, aleatoric and epistemic uncertainty. We show that the ability to estimate uncertainty plays a significant role in elucidating the corruption pattern, as these two objectives are tightly intertwined. We also present a novel validation scheme for evaluating the performance of corruption pattern estimation. Our proposed method is extensively assessed in terms of both robustness and corruption pattern estimation across a number of domains, including computer vision and natural language processing.
Deep learning has achieved remarkable success in numerous domains with the help of massive amounts of big data. However, the quality of data labels is a concern because of the lack of high-quality labels in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological difference, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise rate estimation and summarize the typically used evaluation methodologies, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies. All the contents will be available at https://github.com/songhwanjun/awesome-noisy-labels.
Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data of clean labels and then those of noisy labels. Therefore, in this paper, we propose a new deep learning paradigm called "Co-teaching" for combating noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other given every mini-batch: firstly, each network feeds forward all data and selects some data of possibly clean labels; secondly, the two networks communicate with each other about what data in this mini-batch should be used for training; finally, each network back-propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models. * The first two authors (Bo Han and Quanming Yao) made equal contributions. The implementation is available at https://github.com/bhanML/Co-teaching. 32nd Conference on Neural Information Processing Systems (NIPS 2018).
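The mini-batch procedure described above can be sketched as follows (a simplified illustration of the cross-update idea; the schedule of `forget_rate` used in the paper is not reproduced):

```python
import torch
import torch.nn.functional as F

def co_teaching_step(net1, net2, opt1, opt2, x, y, forget_rate):
    """One Co-teaching update: each network picks its small-loss subset of the
    mini-batch, and its peer is trained on that subset."""
    with torch.no_grad():
        loss1 = F.cross_entropy(net1(x), y, reduction="none")
        loss2 = F.cross_entropy(net2(x), y, reduction="none")
    num_keep = int((1.0 - forget_rate) * y.size(0))
    idx1 = torch.argsort(loss1)[:num_keep]   # samples net1 believes are clean
    idx2 = torch.argsort(loss2)[:num_keep]   # samples net2 believes are clean

    opt1.zero_grad()
    F.cross_entropy(net1(x[idx2]), y[idx2]).backward()  # net1 learns from net2's picks
    opt1.step()

    opt2.zero_grad()
    F.cross_entropy(net2(x[idx1]), y[idx1]).backward()  # net2 learns from net1's picks
    opt2.step()
```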
Learning with noisy labels is one of the hottest problems in weakly-supervised learning. Based on memorization effects of deep neural networks, training on small-loss instances becomes very promising for handling noisy labels. This fosters the state-of-the-art approach "Co-teaching" that cross-trains two deep neural networks using the small-loss trick. However, with the increase of epochs, the two networks converge to a consensus and Co-teaching reduces to the self-training MentorNet. To tackle this issue, we propose a robust learning paradigm called Co-teaching+, which bridges the "Update by Disagreement" strategy with the original Co-teaching. First, two networks feed forward and predict all data, but keep prediction-disagreement data only. Then, among such disagreement data, each network selects its small-loss data, but back-propagates the small-loss data from its peer network and updates its own parameters. Empirical results on benchmark datasets demonstrate that Co-teaching+ is much superior to many state-of-the-art methods in the robustness of trained models.
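A sketch of the "disagreement first, then small-loss" selection described above (details such as how the kept fraction evolves over epochs are omitted):

```python
import torch
import torch.nn.functional as F

def disagreement_small_loss(net1, net2, x, y, forget_rate):
    """Co-teaching+-style selection: restrict to samples on which the two
    networks disagree, then apply the small-loss trick within that subset.
    Each returned index set is intended to update the *other* network."""
    with torch.no_grad():
        logits1, logits2 = net1(x), net2(x)
        disagree = (logits1.argmax(dim=1) != logits2.argmax(dim=1)).nonzero(as_tuple=True)[0]
        if disagree.numel() == 0:
            return disagree, disagree
        loss1 = F.cross_entropy(logits1[disagree], y[disagree], reduction="none")
        loss2 = F.cross_entropy(logits2[disagree], y[disagree], reduction="none")
        num_keep = max(1, int((1.0 - forget_rate) * disagree.numel()))
        idx_for_net2 = disagree[torch.argsort(loss1)[:num_keep]]  # net1's small-loss picks
        idx_for_net1 = disagree[torch.argsort(loss2)[:num_keep]]  # net2's small-loss picks
    return idx_for_net1, idx_for_net2
```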
The memorization effect of deep neural networks (DNNs) plays a pivotal role in many state-of-the-art label-noise learning methods. To exploit this property, the early stopping trick, which stops the optimization at an early stage of training, is usually adopted. Current methods generally decide the early stopping point by considering the DNN as a whole. However, a DNN can be considered as a composition of a series of layers, and it has been found that the latter layers in a DNN are much more sensitive to label noise, while their former counterparts are quite robust. Therefore, selecting a stopping point for the whole network may make different DNN layers antagonistically affect each other, degrading the final performance. In this paper, we propose to separate the DNN into different parts and progressively train them to address this problem. Instead of early stopping, which trains the whole DNN all at once, we initially train the former DNN layers by optimizing the DNN with a relatively large number of epochs. During training, we progressively train the latter DNN layers with smaller numbers of epochs, keeping the preceding layers fixed, to counteract the impact of noisy labels. We term the proposed method Progressive Early Stopping (PES). Despite its simplicity, compared with early stopping, PES can help to obtain more promising and stable results. Furthermore, by combining PES with existing approaches on noisy-label training, we achieve state-of-the-art performance on image classification benchmarks.
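The progressive idea can be roughly sketched as below: earlier layer groups are trained for more epochs, later groups for fewer, with preceding groups frozen. This is only an illustration under assumed helpers (`train_one_epoch` is a user-supplied callback), and the paper's actual PES procedure differs in its details.

```python
import torch
from torch import nn, optim

def progressive_early_stopping(parts, train_one_epoch, epochs_per_part, lr=0.01):
    """`parts` is a list of nn.Module layer groups ordered from early to late;
    later groups get fewer epochs (e.g. epochs_per_part=[25, 7, 5]) because
    they are more sensitive to label noise. At stage i only parts[i] is
    optimized, while earlier parts stay fixed."""
    for i, part in enumerate(parts):
        for prev in parts[:i]:
            for p in prev.parameters():
                p.requires_grad_(False)          # freeze preceding layers
        optimizer = optim.SGD(part.parameters(), lr=lr, momentum=0.9)
        for _ in range(epochs_per_part[i]):
            train_one_epoch(nn.Sequential(*parts), optimizer)
```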
Deep learning has achieved many remarkable successes in many fields, but datasets often contain noisy labels. The state-of-the-art methods for learning with noisy labels, Co-teaching and Co-teaching+, confront noisy labels through mutual information exchange between dual networks. However, the dual networks always tend to converge, which weakens the dual-network mechanism's ability to resist noisy labels. In this paper, we propose a noise-tolerant framework named MLC, trained in an end-to-end manner. It adjusts the dual networks with different regularizations to ensure the effectiveness of the mechanism. In addition, we correct the label distribution according to the agreement between the dual networks. The proposed method can exploit noisy data to improve the accuracy, generalization, and robustness of the network. We test the proposed method on the simulated noisy datasets MNIST and CIFAR-10 and the real-world noisy dataset Clothing1M. The experimental results show that our method outperforms previous state-of-the-art methods. In addition, our method is network-agnostic, so it is applicable to many tasks. Our code is available at https://github.com/jiarunliu/mlc.
Data lies at the core of modern deep learning. The impressive performance of supervised learning is built upon a base of massive, accurately labeled data. However, in some real-world applications, accurate labeling might not be viable; instead, multiple annotators provide multiple noisy labels (rather than one accurate label) for each data sample. Learning a classifier on such a noisy training dataset is a challenging task. Previous methods usually assume that all data samples share the same set of parameters related to annotator errors, while we demonstrate that label error learning should be both annotator- and data-sample-dependent. Motivated by this observation, we propose a novel learning algorithm. The proposed method shows advantages compared with several state-of-the-art baseline methods on MNIST, CIFAR-100, and ImageNet-100. Our code is available at: https://github.com/zhengqigao/learning-from-multiple-annotator-noisy-labels.
Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when the supervision is noisy, as the distilled information might be irrelevant. In fact, recent research shows that networks can easily overfit all labels, including corrupted ones, and hence hardly generalize to clean datasets. In this paper, we focus on the problem of learning with noisy labels and introduce a compression inductive bias into the network architecture to alleviate this overfitting problem. More precisely, we revisit a classical regularization named Dropout and its variant Nested Dropout. Dropout can serve as a compression constraint through its feature-dropping mechanism, while Nested Dropout further learns feature representations ordered with respect to feature importance. In addition, the trained models with compression regularization are combined with Co-teaching to improve performance. Theoretically, we conduct a bias-variance decomposition of the objective function under compression regularization and analyze it for both single models and Co-teaching. This decomposition provides three insights: (i) it shows that overfitting is indeed an issue in learning with noisy labels; (ii) through an information-bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; (iii) it gives an explanation of the performance boost brought by incorporating compression regularization into Co-teaching. Experiments show that our simple approach has comparable or even better performance than state-of-the-art methods on benchmarks with real-world label noise, including Clothing1M and Animal-10N. Our implementation is available at https://yingyichen-cyy.github.io/compressfeatnoisylabels/.
Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at https://github.com/LiJunnan1992/DivideMix.
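The co-divide step can be sketched as below: fit a two-component Gaussian mixture to per-sample losses and treat the low-loss component as (probably) clean. This is a simplified single-network illustration; it assumes `loader` iterates the training set in a fixed order so the returned mask aligns with it.

```python
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

def divide_by_loss(model, loader, device, threshold=0.5):
    """Fit a 2-component GMM to normalized per-sample losses; samples whose
    posterior probability under the low-loss component exceeds `threshold` go
    to the labeled (clean) set, the rest become unlabeled for SSL training."""
    model.eval()
    losses = []
    with torch.no_grad():
        for x, y in loader:
            logits = model(x.to(device))
            losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    losses = torch.cat(losses).numpy().reshape(-1, 1)
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4).fit(losses)
    clean_prob = gmm.predict_proba(losses)[:, gmm.means_.argmin()]
    return clean_prob > threshold   # boolean mask in the iteration order of `loader`
```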
The existence of label noise imposes significant challenges (e.g., poor generalization) on the training process of deep neural networks (DNN). As a remedy, this paper introduces a permutation layer learning approach termed PermLL to dynamically calibrate the training process of the DNN subject to instance-dependent and instance-independent label noise. The proposed method augments the architecture of a conventional DNN by an instance-dependent permutation layer. This layer is essentially a convex combination of permutation matrices that is dynamically calibrated for each sample. The primary objective of the permutation layer is to correct the loss of noisy samples mitigating the effect of label noise. We provide two variants of PermLL in this paper: one applies the permutation layer to the model's prediction, while the other applies it directly to the given noisy label. In addition, we provide a theoretical comparison between the two variants and show that previous methods can be seen as one of the variants. Finally, we validate PermLL experimentally and show that it achieves state-of-the-art performance on both real and synthetic datasets.
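One way to picture the permutation layer is sketched below: a small head maps per-sample features to convex mixing weights over a fixed bank of permutation matrices, and the mixture is applied to the model's class probabilities before computing the loss against the noisy label. The bank construction and head design here are illustrative assumptions, not the paper's parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PermutationLayer(nn.Module):
    """Instance-dependent convex combination of permutation matrices applied
    to the classifier's prediction (the label-side variant would instead apply
    it to the one-hot noisy label)."""
    def __init__(self, feat_dim, num_classes, num_perms=8):
        super().__init__()
        perms = [torch.eye(num_classes)]
        for _ in range(num_perms - 1):
            perms.append(torch.eye(num_classes)[torch.randperm(num_classes)])
        self.register_buffer("perm_bank", torch.stack(perms))   # (K, C, C)
        self.coef_head = nn.Linear(feat_dim, num_perms)

    def forward(self, features, class_probs):
        w = F.softmax(self.coef_head(features), dim=1)           # (B, K) convex weights
        mix = torch.einsum("bk,kij->bij", w, self.perm_bank)     # (B, C, C)
        return torch.bmm(mix, class_probs.unsqueeze(2)).squeeze(2)
```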
Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily-labelled data. For the problem of robust learning under such noisy data, several algorithms have been proposed. A prominent class of algorithms rely on sample selection strategies wherein, essentially, a fraction of samples with loss values below a certain threshold are selected for training. These algorithms are sensitive to such thresholds, and it is difficult to fix or learn these thresholds. Often, these algorithms also require information such as label noise rates which are typically unavailable in practice. In this paper, we propose an adaptive sample selection strategy that relies only on batch statistics of a given mini-batch to provide robustness against label noise. The algorithm does not have any additional hyperparameters for sample selection, does not need any information on noise rates and does not need access to separate data with clean labels. We empirically demonstrate the effectiveness of our algorithm on benchmark datasets.
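A simple instantiation of a threshold-free, batch-statistics rule is sketched below (keeping samples whose loss is below the mini-batch mean); the paper's exact per-class statistic may differ.

```python
import torch
import torch.nn.functional as F

def batch_statistics_loss(logits, labels):
    """Select samples whose loss falls below the mini-batch mean loss and
    train only on them; no extra hyper-parameters, noise rates, or clean
    validation data are needed."""
    with torch.no_grad():
        losses = F.cross_entropy(logits, labels, reduction="none")
        selected = (losses <= losses.mean()).nonzero(as_tuple=True)[0]
    return F.cross_entropy(logits[selected], labels[selected])
```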
Training accurate deep neural networks (DNNs) in the presence of noisy labels is an important and challenging task. Though a number of approaches have been proposed for learning with noisy labels, many open issues remain. In this paper, we show that DNN learning with Cross Entropy (CE) exhibits overfitting to noisy labels on some classes ("easy" classes), but more surprisingly, it also suffers from significant under learning on some other classes ("hard" classes). Intuitively, CE requires an extra term to facilitate learning of hard classes, and more importantly, this term should be noise tolerant, so as to avoid overfitting to noisy labels. Inspired by the symmetric KL-divergence, we propose the approach of Symmetric cross entropy Learning (SL), boosting CE symmetrically with a noise robust counterpart Reverse Cross Entropy (RCE). Our proposed SL approach simultaneously addresses both the under learning and overfitting problem of CE in the presence of noisy labels. We provide a theoretical analysis of SL and also empirically show, on a range of benchmark and real-world datasets, that SL outperforms state-of-the-art methods. We also show that SL can be easily incorporated into existing methods in order to further enhance their performance.
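The SL objective combines CE with a Reverse Cross Entropy term in which log 0 on the one-hot target is truncated to a constant; a sketch follows (the defaults for `alpha`, `beta`, and `A` are placeholders, not the paper's tuned values):

```python
import torch
import torch.nn.functional as F

def symmetric_cross_entropy(logits, labels, alpha=0.1, beta=1.0, A=-4.0):
    """SL = alpha * CE + beta * RCE, where RCE swaps the roles of prediction
    and target and replaces log(0) on the one-hot target with the constant A."""
    ce = F.cross_entropy(logits, labels)
    pred = F.softmax(logits, dim=1).clamp(min=1e-7, max=1.0)
    one_hot = F.one_hot(labels, logits.size(1)).float()
    log_target = torch.where(one_hot > 0, torch.zeros_like(one_hot),
                             torch.full_like(one_hot, A))
    rce = -(pred * log_target).sum(dim=1).mean()
    return alpha * ce + beta * rce
```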
Recent studies on learning with noisy labels have shown remarkable performance by exploiting a small clean dataset. In particular, model-agnostic meta-learning-based label correction methods further improve performance by correcting noisy labels on the fly. However, there is no safeguard against label miscorrection, resulting in unavoidable performance degradation. Moreover, every training step requires at least three back-propagations, significantly slowing down the training speed. To mitigate these issues, we propose a robust and efficient method that learns a label transition matrix on the fly. Employing the transition matrix makes the classifier skeptical about all the corrected samples, which alleviates the miscorrection issue. We also introduce a two-head architecture to efficiently estimate the label transition matrix within a single back-propagation, so that the estimated matrix closely follows the shifted noise distribution induced by label correction. Extensive experiments demonstrate that our approach achieves comparable or better accuracy than existing methods while being more efficient to train.
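A minimal forward-correction sketch with a single learnable, row-stochastic transition matrix is given below; the paper's actual two-head, single-back-propagation estimator is more elaborate, so treat this only as an illustration of why a transition matrix keeps the clean head skeptical of corrected labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionMatrixClassifier(nn.Module):
    """Clean-label head plus a learnable transition matrix T with
    T[i, j] = p(noisy = j | clean = i); training on noisy labels goes through
    the composed noisy posterior."""
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        self.clean_head = nn.Linear(feat_dim, num_classes)
        # unconstrained parameters mapped to a row-stochastic matrix by softmax
        self.transition_logits = nn.Parameter(torch.eye(num_classes) * 4.0)

    def forward(self, x):
        clean_probs = F.softmax(self.clean_head(self.backbone(x)), dim=1)
        T = F.softmax(self.transition_logits, dim=1)
        noisy_probs = clean_probs @ T
        return clean_probs, noisy_probs

def forward_correction_loss(noisy_probs, noisy_labels):
    return F.nll_loss(torch.log(noisy_probs.clamp_min(1e-8)), noisy_labels)
```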
Noisy labels damage the performance of deep networks. For robust learning, a prominent two-stage pipeline alternates between eliminating possibly incorrect labels and semi-supervised training. However, discarding part of the observed labels could result in a loss of information, especially when the corruption is not completely random, e.g., class-dependent or instance-dependent. Moreover, from the training dynamics of the representative two-stage method DivideMix, we identify the domination of confirmation bias: pseudo-labels fail to correct a considerable amount of noisy labels, and consequently the errors accumulate. To sufficiently exploit the observed labels and mitigate wrong corrections, we propose robust label refurbishment (Robust LR), a new hybrid method that integrates pseudo-labeling and confidence estimation techniques to refurbish noisy labels. We show that our method successfully alleviates the damage of both label noise and confirmation bias. As a result, it achieves state-of-the-art results across datasets and noise types. For example, Robust LR achieves up to a 4.5% absolute top-1 accuracy improvement over the previous best on the real-world noisy dataset WebVision.
Label noise significantly degrades the generalization ability of deep models in applications. Effective strategies and approaches, e.g., re-weighting or loss correction, are designed to alleviate the negative impact of label noise when training a neural network. These existing works usually rely on a pre-specified architecture and manual tuning of additional hyper-parameters. In this paper, we propose warped probabilistic inference (WarPI) to adaptively rectify the training procedure of the classification network within a meta-learning scenario. In contrast to deterministic models, WarPI is formulated as a hierarchical probabilistic model by learning an amortized meta-network, which can resolve sample ambiguity and is therefore more robust to severe label noise. Unlike existing approximated weighting functions that directly produce sample weight values, our meta-network is learned to estimate a rectifying vector from the input of the logits and labels, which has the capability of leveraging the sufficient information lying in them. This provides an effective way to rectify the learning procedure of the classification network, demonstrating a significant improvement in generalization ability. Besides, modeling the rectifying vector as a latent variable and learning the meta-network can be seamlessly integrated into the SGD optimization of the classification network. We evaluate WarPI on four benchmarks of robust learning with noisy labels and achieve the new state-of-the-art under various noise types. Extensive studies and analyses also demonstrate the effectiveness of our model.
Label noise is a significant obstacle in deep learning model training. It can have a considerable impact on the performance of image classification models, particularly deep neural networks, which are especially susceptible because they have a strong propensity to memorise noisy labels. In this paper, we have examined the fundamental concept underlying related label noise approaches. A transition matrix estimator has been created, and its effectiveness against the actual transition matrix has been demonstrated. In addition, we examined the label noise robustness of two convolutional neural network classifiers with LeNet and AlexNet designs. The two FashionMNIST datasets have revealed the robustness of both models. Due to time and computing resource constraints that prevented us from properly tuning the more complex convolutional neural network model, we were not able to convincingly demonstrate the influence of the transition matrix noise correction on robustness enhancements. Additional effort is needed in future research to fine-tune the neural network model and to explore the precision of the estimated transition matrix.
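A common way to build such an estimator is the anchor-point approach sketched below, reading each row of the transition matrix off the model's softmax output at a near-maximal sample for that class; the `percentile` guard is an assumed detail, not necessarily what was used here.

```python
import numpy as np

def estimate_transition_matrix(probs, percentile=97):
    """Estimate a C x C transition matrix from softmax outputs `probs`
    (shape N x C) on the noisy training data: for each class i, pick a sample
    predicted to belong to i with (near-)maximal probability as an anchor
    point and take its predicted distribution as row i."""
    n, c = probs.shape
    T = np.zeros((c, c))
    for i in range(c):
        threshold = np.percentile(probs[:, i], percentile)
        anchor = np.argmin(np.abs(probs[:, i] - threshold))
        T[i, :] = probs[anchor, :]
    return T / T.sum(axis=1, keepdims=True)   # normalize rows
```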
Training deep neural networks (DNNs) with noisy labels is practically challenging, since inaccurate labels severely degrade the generalization ability of DNNs. Previous efforts tend to handle part or all of the data in a unified denoising flow by identifying noisy data with a coarse small-loss criterion to mitigate the interference from noisy labels, ignoring the fact that the difficulties of noisy samples are different; thus a rigid and unified data selection pipeline cannot tackle this problem well. In this paper, we propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner. At the coarse level, clean and noisy sets are first separated in terms of credibility in a statistical sense. Since it is practically impossible to categorize all noisy samples correctly, we further process them in a fine-grained manner by modeling the credibility of each sample. Specifically, for the clean set, we deliberately design a memory-based modulation scheme to dynamically adjust the contribution of each sample during training in terms of its historical credibility sequence, thus alleviating the effect of noisy samples incorrectly grouped into the clean set. Meanwhile, for samples categorized into the noisy set, a selective label update strategy is proposed to correct noisy labels while mitigating the problem of correction error. Extensive experiments are conducted on benchmarks of different modalities, including image classification (CIFAR, Clothing1M, etc.) and text recognition (IMDB), with either synthetic or natural semantic noise, demonstrating the superiority and generality of CREMA.
As label noise, one of the most popular distribution shifts, severely degrades the generalization performance of deep neural networks, robust training with noisy labels is becoming an important task in modern deep learning. In this paper, we propose our framework, coined Adaptive Label Smoothing on Sub-Classifiers (ALASCA), which provides a robust feature extractor with theoretical guarantees and negligible additional computation. First, we derive that label smoothing (LS) induces an implicit Lipschitz regularization (LR). Furthermore, based on these derivations, we apply adaptive LS (ALS) on sub-classifier architectures for the practical application of adaptive LR on intermediate layers. We conduct extensive experiments with ALASCA, combine it with previous noise-robust methods on several datasets, and show that our framework consistently outperforms the corresponding baselines.
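The building block, label smoothing applied at sub-classifiers attached to intermediate layers, can be sketched as below; how ALASCA adapts the smoothing coefficients is not reproduced, and the per-layer coefficients here are placeholders.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, labels, smoothing=0.2):
    """Cross-entropy against a smoothed target: the one-hot label is mixed
    with a uniform distribution over the remaining classes."""
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    target = torch.full_like(log_probs, smoothing / (num_classes - 1))
    target.scatter_(1, labels.unsqueeze(1), 1.0 - smoothing)
    return -(target * log_probs).sum(dim=1).mean()

def sub_classifier_loss(sub_logits_list, labels, smoothings):
    """Average smoothed losses over the sub-classifiers attached to
    intermediate layers (one smoothing coefficient per sub-classifier)."""
    losses = [label_smoothing_ce(l, labels, s)
              for l, s in zip(sub_logits_list, smoothings)]
    return torch.stack(losses).mean()
```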