Given data with label noise (i.e., incorrectly labeled data), deep neural networks gradually memorize the noisy labels, degrading model performance. To relieve this issue, curriculum learning has been proposed to improve model performance and generalization by ordering training samples in a meaningful (e.g., easy-to-hard) sequence. Previous work treats incorrect samples as generic hard ones without discriminating between hard samples (i.e., hard samples within the correct data) and incorrect samples. Indeed, a model should learn from hard samples to promote generalization rather than overfit to incorrect ones. In this paper, we address this problem by appending a novel loss function, DiscrimLoss, on top of the existing task loss. Its main effect is, in the early stages of training, to automatically and stably estimate the importance of easy samples and difficult samples (the latter covering both hard and incorrect samples) to improve model performance. Then, in the following stages, DiscrimLoss is dedicated to discriminating between hard and incorrect samples to improve model generalization. Such a training strategy can be formulated dynamically in a self-supervised manner, effectively mimicking the main principle of curriculum learning. Experiments on image classification, image regression, text sequence regression, and event relation reasoning demonstrate the versatility and effectiveness of our method, particularly in the presence of diversified noise levels.
Training deep neural networks (DNNs) with noisy labels is practically challenging, since inaccurate labels severely degrade the generalization ability of DNNs. Previous efforts tend to handle part or all of the data in a unified denoising flow by identifying noisy data with a coarse small-loss criterion, ignoring the fact that the difficulty of noisy samples varies, so a rigid and unified data-selection pipeline cannot tackle the problem well. In this paper, we propose a coarse-to-fine robust learning method called CREMA that handles noisy data in a divide-and-conquer manner. At the coarse level, the clean and noisy sets are first separated in terms of credibility in a statistical sense. Since it is practically impossible to categorize all noisy samples correctly, we further process them in a fine-grained manner by modeling the credibility of each sample. Specifically, for the clean set, we deliberately design a memory-based modulation scheme that dynamically adjusts the contribution of each sample during training according to its historical credibility sequence, thereby alleviating the effect of noisy samples incorrectly grouped into the clean set. Meanwhile, for samples categorized into the noisy set, a selective label-update strategy is proposed to correct noisy labels while mitigating the problem of correction error. Extensive experiments on benchmarks of different modalities, including image classification (CIFAR, Clothing1M, etc.) and text recognition (IMDB), with either synthetic or natural semantic noise, demonstrate the superiority and generality of CREMA.
Sample selection is an effective strategy to mitigate the effect of label noise in robust learning. Typical strategies commonly apply the small-loss criterion to identify clean samples. However, samples that lie around the decision boundary are often entangled with noisy examples and would be discarded under this criterion, leading to severe degradation of generalization performance. In this paper, we propose a novel selection strategy, Self-Filtering (SFT), that exploits the fluctuation of noisy examples in their historical predictions to filter them out, which avoids the selection bias of the small-loss criterion against boundary examples. Specifically, we introduce a memory-bank module that stores the historical predictions of each example and is updated dynamically to support the selection in subsequent learning iterations. Moreover, to reduce the accumulated error from SFT's sample-selection bias, we design a regularization term that penalizes over-confident output distributions. By increasing the weight of the misclassified categories through this term, the loss function becomes robust to label noise under mild conditions. We conduct experiments on three benchmarks with varying noise types and achieve a new state of the art. Ablation studies and further analysis verify the virtue of SFT for sample selection in robust learning.
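To make the memory-bank idea concrete, here is a minimal sketch (not the authors' code; the class name, the per-epoch update scheme, and the exact fluctuation test are illustrative assumptions): each example's predicted class is recorded every epoch, and an example is flagged when its prediction once agreed with its given label but later flips away.

```python
import torch

class PredictionBank:
    """Toy memory bank in the spirit of SFT: store each example's predicted
    class per epoch and flag examples whose predictions fluctuate away from
    their given label (a symptom of a noisy label)."""

    def __init__(self, num_examples: int):
        self.history = [[] for _ in range(num_examples)]  # per-example predicted classes

    def update(self, indices: torch.Tensor, logits: torch.Tensor):
        preds = logits.argmax(dim=1)
        for idx, p in zip(indices.tolist(), preds.tolist()):
            self.history[idx].append(p)

    def fluctuates(self, idx: int, label: int) -> bool:
        # "Fluctuation": the model once agreed with the given label
        # but later flipped away from it.
        agreed = False
        for p in self.history[idx]:
            if p == label:
                agreed = True
            elif agreed:
                return True
        return False

    def select_clean(self, labels: torch.Tensor) -> torch.Tensor:
        keep = [not self.fluctuates(i, y) for i, y in enumerate(labels.tolist())]
        return torch.tensor(keep)  # boolean mask of examples kept for training
```

In use, `update` would be called on every mini-batch, and `select_clean` would produce the training mask for the next epoch.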
Training accurate deep neural networks (DNNs) in the presence of noisy labels is an important and challenging task. Though a number of approaches have been proposed for learning with noisy labels, many open issues remain. In this paper, we show that DNN learning with Cross Entropy (CE) exhibits overfitting to noisy labels on some classes ("easy" classes), but more surprisingly, it also suffers from significant under learning on some other classes ("hard" classes). Intuitively, CE requires an extra term to facilitate learning of hard classes, and more importantly, this term should be noise tolerant, so as to avoid overfitting to noisy labels. Inspired by the symmetric KL-divergence, we propose the approach of Symmetric cross entropy Learning (SL), boosting CE symmetrically with a noise robust counterpart Reverse Cross Entropy (RCE). Our proposed SL approach simultaneously addresses both the under learning and overfitting problem of CE in the presence of noisy labels. We provide a theoretical analysis of SL and also empirically show, on a range of benchmark and real-world datasets, that SL outperforms state-of-the-art methods. We also show that SL can be easily incorporated into existing methods in order to further enhance their performance.
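A compact PyTorch rendering of the SL loss as described above; the clamp value A and the alpha/beta weights are illustrative hyperparameters, not prescribed settings:

```python
import torch
import torch.nn.functional as F

def symmetric_cross_entropy(logits, targets, alpha=0.1, beta=1.0, A=-4.0):
    """Sketch of SL: alpha * CE + beta * Reverse CE (RCE).
    RCE swaps the roles of prediction and label; log(0) arising from the
    one-hot label is truncated to the constant A."""
    ce = F.cross_entropy(logits, targets)
    pred = F.softmax(logits, dim=1).clamp(min=1e-7, max=1.0)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    # log of the one-hot label: 0 at the labeled class, A elsewhere.
    log_label = torch.where(one_hot > 0,
                            torch.zeros_like(one_hot),
                            A * torch.ones_like(one_hot))
    rce = -(pred * log_label).sum(dim=1).mean()
    return alpha * ce + beta * rce
```

The CE term drives learning of hard classes while the noise-tolerant RCE term counteracts overfitting to noisy labels, matching the two failure modes described in the abstract.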
Deep learning has achieved remarkable success in numerous domains with the help of massive amounts of big data. However, the quality of data labels is a concern, because high-quality labels are lacking in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised-learning perspective. Next, we provide a comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological differences, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise-rate estimation and summarize the commonly used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies. All contents will be available at https://github.com/songhwanjun/awesome-noisy-labels.
As label noise, one of the most common distribution shifts, severely degrades the generalization performance of deep neural networks, robust training with noisy labels is becoming an important task in modern deep learning. In this paper, we propose our framework, coined Adaptive LAbel smoothing on Sub-ClAssifier (ALASCA), which provides a robust feature extractor with theoretical guarantees and negligible extra computation. First, we derive that label smoothing (LS) induces implicit Lipschitz regularization (LR). Then, based on these derivations, we apply adaptive LS (ALS) on a sub-classifier architecture as a practical application of adaptive LR on intermediate layers. We conduct extensive experiments with ALASCA and combine it with previous noise-robust methods on several datasets, showing that our framework consistently outperforms the corresponding baselines.
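For reference, below is a minimal sketch of plain label smoothing, the building block this abstract starts from; ALASCA's adaptive smoothing strength and sub-classifier heads are omitted, and the implementation details are assumptions:

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, targets, eps=0.1):
    """Plain label smoothing (LS): mix the one-hot target with a uniform
    distribution over the remaining classes. ALASCA builds on this by
    making eps adaptive and attaching auxiliary sub-classifiers at
    intermediate layers; both refinements are omitted here."""
    n = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    soft = torch.full_like(log_probs, eps / (n - 1))      # mass on wrong classes
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - eps)     # mass on the labeled class
    return -(soft * log_probs).sum(dim=1).mean()
```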
Label noise significantly degrades the generalization ability of deep models in applications. Effective strategies and approaches, e.g., re-weighting or loss correction, are designed to alleviate the negative impact of label noise when training neural networks. These existing works usually rely on a pre-specified architecture and manual tuning of additional hyper-parameters. In this paper, we propose Warped Probabilistic Inference (WarPI) to adaptively rectify the training procedure of a classification network within a meta-learning scenario. In contrast to deterministic models, WarPI is formulated as a hierarchical probabilistic model by learning an amortized meta-network, which can resolve sample ambiguity and is therefore more robust to severe label noise. Unlike existing approximated weighting functions that directly generate weight values from losses, our meta-network is learned to estimate a rectifying vector from the input of logits and labels, which has the capability of leveraging the rich information lying in them. This provides an effective way to rectify the learning procedure of the classification network, demonstrating a significant improvement in generalization ability. Besides, by modeling the rectifying vector as a latent variable, the learning of the meta-network can be seamlessly integrated into the SGD optimization of the classification network. We evaluate WarPI on four benchmarks of robust learning with noisy labels and achieve a new state of the art under various noise types. Extensive studies and analysis also demonstrate the effectiveness of our model.
Most existing methods for coping with noisy labels usually assume that the class distribution is well balanced, and thus lack the ability to handle the practical scenario in which training samples follow an imbalanced distribution. To this end, this paper makes an early effort to tackle the image classification task under both long-tailed distribution and label noise. In this scenario, existing noise-learning methods cannot work well, because distinguishing noisy samples from clean samples of the tail classes is challenging. To address this problem, we propose a new learning paradigm that screens out noisy samples based on matching the inferences on weak and strong data augmentations, and we introduce a leave-noise-out regularization to eliminate the effect of the recognized noisy samples. Furthermore, we incorporate a novel prediction penalty based on the online prior distribution to avoid bias towards head classes. Compared with existing long-tailed classification methods, this mechanism is superior in capturing the per-class fitting degree in real time. Exhaustive experiments demonstrate that the proposed method outperforms state-of-the-art algorithms in solving the distribution-imbalance problem of long-tailed classification under noisy labels.
Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at https://github.com/LiJunnan1992/DivideMix.
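The per-sample loss-modeling step can be sketched as follows (an illustrative reconstruction, not the released code; the normalization and GMM settings are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_clean_probability(per_sample_losses: np.ndarray) -> np.ndarray:
    """DivideMix-style division step (sketch): fit a 2-component GMM to the
    per-sample loss distribution; the component with the smaller mean is
    treated as 'clean', and each sample's posterior under that component is
    its probability of being clean."""
    losses = per_sample_losses.reshape(-1, 1)
    # Normalize to [0, 1] for a stable fit (a common preprocessing choice).
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    clean_component = gmm.means_.argmin()
    return gmm.predict_proba(losses)[:, clean_component]

# Usage sketch: samples with probability > 0.5 form the labeled set;
# the rest are treated as unlabeled for the semi-supervised phase.
```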
Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep network, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on the samples whose labels are probably correct. Unlike the existing curricula that are usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images with real-world noisy labels. The code is at https://github.com/google/mentornet.
In supervised machine learning, the use of correct labels is extremely important for ensuring high accuracy. Unfortunately, most datasets contain corrupted labels. Machine learning models trained on such datasets do not generalize well, so detecting their label errors can significantly improve their efficacy. We propose a novel framework named CTRL (Clustering TRaining Losses for label error detection) to detect label errors in multi-class datasets. Based on the observation that models learn clean and noisy labels in different ways, it detects label errors in two steps. First, we train a neural network on the noisy training dataset and obtain a loss curve for each sample. Then, we apply a clustering algorithm to the training losses to group samples into two categories: cleanly labeled and noisily labeled. After label error detection, we remove the samples with noisy labels and retrain the model. Our experimental results demonstrate state-of-the-art error-detection accuracy on both image (CIFAR-10 and CIFAR-100) and tabular datasets under simulated noise. We also provide a theoretical analysis that offers insight into why CTRL performs so well.
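The two-step recipe can be sketched as follows (an illustrative reading of the abstract; CTRL's actual clustering setup, e.g., any per-class treatment, may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def detect_label_errors(loss_curves: np.ndarray) -> np.ndarray:
    """CTRL-style detection (sketch): cluster per-sample training-loss curves
    into two groups; the cluster whose curves stay higher on average is
    flagged as noisily labeled.

    loss_curves: shape (num_samples, num_epochs), each sample's loss
    recorded at every training epoch.
    """
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(loss_curves)
    # The higher-loss cluster corresponds to labels the model resists fitting.
    noisy_cluster = km.cluster_centers_.mean(axis=1).argmax()
    return km.labels_ == noisy_cluster  # boolean mask of suspected label errors
```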
Recent studies on learning with noisy labels have shown excellent performance by exploiting a small clean dataset. In particular, model-agnostic meta-learning-based label-correction methods further improve performance by correcting noisy labels on the fly. However, there is no safeguard against label miscorrection, which results in unavoidable performance degradation. Moreover, every training step requires at least three back-propagations, which significantly slows down training. To mitigate these issues, we propose a robust and efficient method that learns a label transition matrix on the fly. Employing the transition matrix makes the classifier skeptical about all the corrected samples, which alleviates the miscorrection issue. We also introduce a two-head architecture to efficiently estimate the label transition matrix within a single back-propagation, so that the estimated matrix closely follows the shifting noise distribution induced by label correction. Extensive experiments demonstrate that our method shows the best training efficiency while achieving comparable or better accuracy than existing methods.
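For intuition, here is a minimal sketch of training through a label transition matrix in its standard forward-correction form; the paper's two-head, on-the-fly estimation of T is omitted, and the function shown is an assumption, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def forward_corrected_loss(logits, noisy_targets, T):
    """Sketch of training through a label transition matrix T, where
    T[i, j] ~= P(observed label = j | true label = i). The classifier's
    clean-label posterior is pushed through T before the loss, so the
    model stays 'skeptical' of possibly miscorrected labels."""
    clean_probs = F.softmax(logits, dim=1)   # P(true label | x)
    noisy_probs = clean_probs @ T            # P(observed label | x)
    return F.nll_loss(torch.log(noisy_probs + 1e-8), noisy_targets)
```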
Convolutional Neural Networks (CNNs) have demonstrated superiority in learning patterns, but are sensitive to label noises and may overfit noisy labels during training. The early stopping strategy averts updating CNNs during the early training phase and is widely employed in the presence of noisy labels. Motivated by biological findings that the amplitude spectrum (AS) and phase spectrum (PS) in the frequency domain play different roles in the animal's vision system, we observe that PS, which captures more semantic information, can increase the robustness of DNNs to label noise, more so than AS can. We thus propose early stops at different times for AS and PS by disentangling the features of some layer(s) into AS and PS using Discrete Fourier Transform (DFT) during training. Our proposed Phase-AmplituDe DisentangLed Early Stopping (PADDLES) method is shown to be effective on both synthetic and real-world label-noise datasets. PADDLES outperforms other early stopping methods and obtains state-of-the-art performance.
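The amplitude/phase disentanglement can be sketched with torch.fft (an illustrative sketch; PADDLES' choice of layers and its separate early-stopping schedules for AS and PS are omitted):

```python
import torch

def amplitude_phase(features: torch.Tensor):
    """Disentangle a feature map into amplitude and phase spectra via the
    2-D DFT, as PADDLES does before applying separate early-stop schedules
    to each component (the schedules themselves are not shown)."""
    freq = torch.fft.fft2(features)       # complex frequency spectrum
    return freq.abs(), freq.angle()       # amplitude spectrum, phase spectrum

def recombine(amplitude: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
    """Rebuild features from (possibly separately processed) AS and PS."""
    freq = torch.polar(amplitude, phase)  # amplitude * exp(i * phase)
    return torch.fft.ifft2(freq).real
```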
Learning with noisy labels is a practically challenging problem in weakly supervised learning. In the existing literature, open-set noise is always considered poisonous for generalization, similar to closed-set noise. In this paper, we empirically show that open-set noisy labels can be non-toxic and can even benefit robustness against inherent noisy labels. Inspired by this observation, we propose a simple yet effective regularization that introduces open-set samples with dynamic noisy labels (ODNL) into training. With ODNL, the extra capacity of the neural network can be largely consumed in a way that does not interfere with learning patterns from the clean data. Through the lens of SGD noise, we show that the noise induced by our method is random in direction and unbiased, which may help the model converge to a flat minimum with superior stability and enforce the model to produce conservative predictions on out-of-distribution instances. Extensive experimental results on benchmark datasets with various types of noisy labels demonstrate that the proposed method not only improves the performance of many existing robust algorithms but also achieves significant improvements on out-of-distribution detection tasks, even in the label-noise setting.
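A minimal sketch of the regularizer (the function name and the loss weighting are illustrative assumptions): auxiliary open-set inputs receive freshly randomized labels at every step.

```python
import torch
import torch.nn.functional as F

def odnl_regularizer(model, open_set_inputs, num_classes):
    """Sketch of ODNL: feed auxiliary open-set images (drawn from some
    unrelated dataset) with labels re-drawn uniformly at random at every
    step, so the network's spare capacity absorbs this harmless noise
    instead of memorizing the in-distribution noisy labels."""
    random_labels = torch.randint(
        0, num_classes, (open_set_inputs.size(0),), device=open_set_inputs.device
    )
    return F.cross_entropy(model(open_set_inputs), random_labels)

# Usage sketch: total_loss = task_loss + eta * odnl_regularizer(...),
# with eta a weighting hyperparameter.
```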
Deep Learning with noisy labels is a practically challenging problem in weakly supervised learning. The state-of-the-art approaches "Decoupling" and "Co-teaching+" claim that the "disagreement" strategy is crucial for alleviating the problem of learning with noisy labels. In this paper, we start from a different perspective and propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of two networks during training. Specifically, we first use two networks to make predictions on the same mini-batch data and calculate a joint loss with Co-Regularization for each training example. Then we select small-loss examples to update the parameters of both networks simultaneously. Trained by the joint loss, these two networks would become more and more similar due to the effect of Co-Regularization. Extensive experimental results on corrupted data from benchmark datasets including MNIST, CIFAR-10, CIFAR-100 and Clothing1M demonstrate that JoCoR is superior to many state-of-the-art approaches for learning with noisy labels.
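The joint loss and small-loss selection can be sketched as follows (an illustrative reconstruction; the co-regularization weight and keep ratio are assumed values):

```python
import torch
import torch.nn.functional as F

def jocor_loss(logits1, logits2, targets, lam=0.1, keep_ratio=0.7):
    """Sketch of JoCoR's joint loss: per-example cross entropy of both
    networks plus a symmetric-KL co-regularization term, after which only
    the smallest-loss fraction of the mini-batch contributes to the update
    of both networks."""
    ce = F.cross_entropy(logits1, targets, reduction="none") \
       + F.cross_entropy(logits2, targets, reduction="none")
    p1 = F.log_softmax(logits1, dim=1)
    p2 = F.log_softmax(logits2, dim=1)
    kl = F.kl_div(p1, p2.exp(), reduction="none").sum(dim=1) \
       + F.kl_div(p2, p1.exp(), reduction="none").sum(dim=1)
    per_example = (1 - lam) * ce + lam * kl
    # Small-loss selection: keep the examples most likely to be clean.
    num_keep = max(1, int(keep_ratio * targets.size(0)))
    selected = torch.topk(per_example, num_keep, largest=False).values
    return selected.mean()
```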
We propose self-adaptive training, a unified training algorithm that dynamically calibrates and enhances the training process using model predictions without incurring extra computational cost, to advance both supervised and self-supervised learning of deep neural networks. We analyze the training dynamics of deep networks on training data corrupted by, e.g., random noise and adversarial examples. Our analysis shows that model predictions can amplify useful underlying information in the data, and that this phenomenon occurs even in the absence of any label information, highlighting that model predictions can substantially benefit the training process: self-adaptive training improves the generalization of deep networks under noise and enhances self-supervised representation learning. The analysis also sheds light on the understanding of deep learning, e.g., a potential explanation of the recently discovered double-descent phenomenon in empirical risk minimization and the collapsing issue of state-of-the-art self-supervised learning algorithms. Experiments on the CIFAR, STL, and ImageNet datasets verify the effectiveness of our method in three applications: classification with label noise, selective classification, and linear evaluation. To facilitate future research, the code has been made publicly available at https://github.com/LayneH/self-adaptive-training.
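The calibration-by-predictions idea can be sketched as an exponential moving average of soft targets (a simplified reading of the abstract, not the released implementation; the momentum value is illustrative):

```python
import torch
import torch.nn.functional as F

class SoftTargets:
    """Sketch in the spirit of self-adaptive training: keep an exponential
    moving average of the model's predictions per example and train against
    these soft targets instead of the raw (possibly noisy) labels."""

    def __init__(self, labels: torch.Tensor, num_classes: int, momentum=0.9):
        self.targets = F.one_hot(labels, num_classes).float()  # kept on CPU
        self.momentum = momentum

    def update_and_loss(self, indices, logits):
        probs = F.softmax(logits.detach(), dim=1).cpu()
        # Blend the stored target with the current prediction.
        self.targets[indices] = (
            self.momentum * self.targets[indices] + (1 - self.momentum) * probs
        )
        soft = self.targets[indices].to(logits.device)
        return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```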
Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily-labelled data. For the problem of robust learning under such noisy data, several algorithms have been proposed. A prominent class of algorithms rely on sample selection strategies wherein, essentially, a fraction of samples with loss values below a certain threshold are selected for training. These algorithms are sensitive to such thresholds, and it is difficult to fix or learn these thresholds. Often, these algorithms also require information such as label noise rates which are typically unavailable in practice. In this paper, we propose an adaptive sample selection strategy that relies only on batch statistics of a given mini-batch to provide robustness against label noise. The algorithm does not have any additional hyperparameters for sample selection, does not need any information on noise rates and does not need access to separate data with clean labels. We empirically demonstrate the effectiveness of our algorithm on benchmark datasets.
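One simple instantiation consistent with this abstract, using only mini-batch statistics (the exact statistic used by the paper may differ; the batch-mean threshold here is an assumption for illustration):

```python
import torch
import torch.nn.functional as F

def batch_statistics_select(logits, targets):
    """A minimal sketch of sample selection driven purely by mini-batch
    statistics: keep the examples whose loss is at most the batch mean, so
    no global threshold, noise-rate estimate, or clean subset is needed."""
    losses = F.cross_entropy(logits, targets, reduction="none")
    mask = losses <= losses.mean()   # at least one example always qualifies
    return losses[mask].mean()
```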
Annotating the dataset with high-quality labels is crucial for performance of deep network, but in real world scenarios, the labels are often contaminated by noise. To address this, some methods were proposed to automatically split clean and noisy labels, and learn a semi-supervised learner in a Learning with Noisy Labels (LNL) framework. However, they leverage a handcrafted module for clean-noisy label splitting, which induces a confirmation bias in the semi-supervised learning phase and limits the performance. In this paper, we for the first time present a learnable module for clean-noisy label splitting, dubbed SplitNet, and a novel LNL framework which complementarily trains the SplitNet and main network for the LNL task. We propose to use a dynamic threshold based on a split confidence by SplitNet to better optimize semi-supervised learner. To enhance SplitNet training, we also present a risk hedging method. Our proposed method performs at a state-of-the-art level especially in high noise ratio settings on various LNL benchmarks.
Adequate classification of proximal femur fractures from X-ray images is crucial for the treatment choice and the patients' clinical outcome. We rely on the commonly used AO system, which describes a hierarchical knowledge tree that classifies the images into types and subtypes according to the fracture's location and complexity. In this paper, we propose a method for the automatic classification of proximal femur fractures into 3 and 7 AO classes based on a Convolutional Neural Network (CNN). As is known, CNNs need large and representative datasets with reliable labels, which are hard to collect for the application at hand. In this paper, we design a curriculum learning (CL) approach that improves over the basic CNN's performance under such conditions. Our novel formulation unites three curriculum strategies: individually weighting training samples, reordering the training set, and sampling subsets of data. At the core of these strategies is a scoring function that ranks the training samples. We define two novel scoring functions: one derived from domain-specific prior knowledge and an original self-paced uncertainty score. We perform experiments on a clinical dataset of proximal femur radiographs. The curriculum improves proximal femur fracture classification up to the performance of experienced trauma surgeons. The best curriculum method, which reorders the training set based on prior knowledge, yields a classification improvement of 15%. Using the publicly available MNIST dataset, we further discuss and demonstrate the benefits of our unified CL formulation for three controlled and challenging digit-recognition scenarios: with limited amounts of data, under class imbalance, and in the presence of label noise. The code of our work is available at https://github.com/ameliajimenez/curriculum-learning-prior-uncertainty.
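Of the three strategies, the reordering one admits a one-line sketch (the scores would come from the paper's prior-knowledge or self-paced uncertainty functions; the helper below is illustrative):

```python
import numpy as np

def curriculum_order(scores: np.ndarray, easiest_first: bool = True) -> np.ndarray:
    """Sketch of the 'reordering' curriculum strategy: rank training samples
    by a scoring function and present them from easy to hard."""
    order = np.argsort(scores)            # ascending difficulty score
    return order if easiest_first else order[::-1]

# Usage sketch: iterate over dataset[curriculum_order(scores)] epoch by epoch,
# optionally re-scoring samples as training progresses.
```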
The memorization effect of deep neural networks (DNNs) plays a pivotal role in many state-of-the-art label-noise learning methods. To exploit this property, the early stopping trick, which stops the optimization at an early stage of training, is usually adopted. Current methods generally decide the early stopping point by considering the DNN as a whole. However, a DNN can be regarded as a composition of a series of layers, and we find that the latter layers of a DNN are much more sensitive to label noise, while their former counterparts are quite robust. Therefore, selecting a stopping point for the whole network may make different DNN layers antagonistically affect each other, degrading the final performance. In this paper, we propose to separate the DNN into different parts and progressively train them to address this problem. Instead of early stopping, which trains the whole DNN at once, we first train the former DNN layers by optimizing the DNN with a relatively large number of epochs. During training, we progressively train the latter DNN layers using a smaller number of epochs, with the preceding layers fixed, to counteract the impact of noisy labels. We term the proposed method progressive early stopping (PES). Despite its simplicity, PES can help obtain more promising and stable results compared with early stopping. Furthermore, by combining PES with existing noisy-label training approaches, we achieve state-of-the-art performance on image classification benchmarks.
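The progressive scheme can be sketched as follows (a schematic outline; the epoch counts, the part granularity, and the train_fn stand-in are assumptions):

```python
def progressive_early_stopping(parts, train_fn, epochs_per_part):
    """Sketch of PES (progressive early stopping): view the DNN as a
    sequence of parts (e.g., nn.Module chunks); train the earlier,
    noise-robust parts with more epochs, then freeze them and train the
    later, noise-sensitive parts with progressively fewer epochs.
    `train_fn(epochs)` stands in for the usual optimization loop over the
    whole model, which only updates unfrozen parameters."""
    for i, (part, epochs) in enumerate(zip(parts, epochs_per_part)):
        for earlier in parts[:i]:
            for p in earlier.parameters():
                p.requires_grad_(False)   # freeze already-trained parts
        train_fn(epochs)                  # e.g., epochs_per_part = [25, 7, 5]
```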