In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate approach would be to look for loss functions that are inherently noise-tolerant. For binary classification there exist theoretical results on loss functions that are robust to label noise. In this paper, we provide some sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems. These results generalize the existing results on noise-tolerant loss functions for binary classification. We study some of the widely used loss functions in deep networks and show that the loss function based on mean absolute value of error is inherently robust to label noise. Thus standard back propagation is enough to learn the true classifier even under label noise. Through experiments, we illustrate the robustness of risk minimization with such loss functions for learning neural networks.
translated by 谷歌翻译
In this paper, we theoretically study the problem of binary classification in the presence of random classification noise -the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, random label noise is class-conditional -the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of iid data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a very remarkable consequence -methods used in practice such as biased SVM and weighted logistic regression are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with respect to recently proposed methods for dealing with label noise in several benchmark datasets.
translated by 谷歌翻译
Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. Yet, their superior performance comes with the expensive cost of requiring correctly annotated large-scale datasets. Moreover, due to DNNs' rich capacity, errors in training labels can hamper performance. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly-used categorical cross entropy (CCE) loss. However, as we show in this paper, MAE can perform poorly with DNNs and challenging datasets. Here, we present a theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE. Proposed loss functions can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios. We report results from experiments conducted with CIFAR-10, CIFAR-100 and FASHION-MNIST datasets and synthetically generated noisy labels.
translated by 谷歌翻译
In this paper, we study a classification problem in which sample labels are randomly corrupted. In this scenario, there is an unobservable sample with noise-free labels. However, before being observed, the true labels are independently flipped with a probability ρ ∈ [0, 0.5), and the random label noise can be class-conditional. Here, we address two fundamental problems raised by this scenario. The first is how to best use the abundant surrogate loss functions designed for the traditional classification problem when there is label noise. We prove that any surrogate loss function can be used for classification with noisy labels by using importance reweighting, with consistency assurance that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample. The other is the open problem of how to obtain the noise rate ρ. We show that the rate is upper bounded by the conditional probability P ( Ŷ |X) of the noisy sample. Consequently, the rate can be estimated, because the upper bound can be easily reached in classification problems. Experimental results on synthetic and real datasets confirm the efficiency of our methods.
translated by 谷歌翻译
通常,用于训练排名模型的数据受到标签噪声。例如,在Web搜索中,由于ClickStream数据创建的标签是嘈杂的,这是因为诸如SERP上的项目描述中的信息不足,用户查询重新进行的,以及不稳定的或意外的用户行为。在实践中,很难处理标签噪声而不对标签生成过程做出强烈的假设。结果,如果不考虑标签噪声,从业人员通常会直接在此嘈杂的数据上训练他们的学习到秩(LTR)模型。令人惊讶的是,我们经常看到以这种方式训练的LTR模型的出色表现。在这项工作中,我们描述了一类耐噪声的LTR损失,即使在类条件标签噪声的背景下,经验风险最小化也是一致的程序。我们还开发了常用损失函数的耐噪声类似物。实验结果进一步支持了我们理论发现的实际意义。
translated by 谷歌翻译
We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted into another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting, and thus providing an end-to-end framework. Extensive experiments on MNIST, IMDB, CIFAR-10, CIFAR-100 and a large scale dataset of clothing images employing a diversity of architectures -stacking dense, convolutional, pooling, dropout, batch normalization, word embedding, LSTM and residual layers -demonstrate the noise robustness of our proposals. Incidentally, we also prove that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise.
translated by 谷歌翻译
原始收集的培训数据通常带有从多个不完美的注释器中收集的单独的嘈杂标签(例如,通过众包)。通常,首先将单独的嘈杂标签汇总为一个,并应用标准培训方法。文献还广泛研究了有效的聚合方法。本文重新审视了此选择,并旨在为一个问题提供一个答案,即是否应该将单独的嘈杂标签汇总为单个单个标签或单独使用它们作为给定标签。我们从理论上分析了许多流行损失功能的经验风险最小化框架下的两种方法的性能,包括专门为使用嘈杂标签学习的问题而设计的损失功能。我们的定理得出的结论是,当噪声速率较高时,标签分离优于标签聚集,或者标记器/注释的数量不足。广泛的经验结果证明了我们的结论。
translated by 谷歌翻译
Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily-labelled data. For the problem of robust learning under such noisy data, several algorithms have been proposed. A prominent class of algorithms rely on sample selection strategies wherein, essentially, a fraction of samples with loss values below a certain threshold are selected for training. These algorithms are sensitive to such thresholds, and it is difficult to fix or learn these thresholds. Often, these algorithms also require information such as label noise rates which are typically unavailable in practice. In this paper, we propose an adaptive sample selection strategy that relies only on batch statistics of a given mini-batch to provide robustness against label noise. The algorithm does not have any additional hyperparameters for sample selection, does not need any information on noise rates and does not need access to separate data with clean labels. We empirically demonstrate the effectiveness of our algorithm on benchmark datasets.
translated by 谷歌翻译
监督学习的关键假设是培训和测试数据遵循相同的概率分布。然而,这种基本假设在实践中并不总是满足,例如,由于不断变化的环境,样本选择偏差,隐私问题或高标签成本。转移学习(TL)放松这种假设,并允许我们在分销班次下学习。通常依赖于重要性加权的经典TL方法 - 基于根据重要性(即测试过度训练密度比率)的训练损失培训预测器。然而,由于现实世界机器学习任务变得越来越复杂,高维和动态,探讨了新的新方法,以应对这些挑战最近。在本文中,在介绍基于重要性加权的TL基础之后,我们根据关节和动态重要预测估计审查最近的进步。此外,我们介绍一种因果机制转移方法,该方法包含T1中的因果结构。最后,我们讨论了TL研究的未来观点。
translated by 谷歌翻译
近年来,有监督的深度学习取得了巨大的成功,从大量完全标记的数据中,对预测模型进行了培训。但是,实际上,标记这样的大数据可能非常昂贵,甚至出于隐私原因甚至可能是不可能的。因此,在本文中,我们旨在学习一个无需任何类标签的准确分类器。更具体地说,我们考虑了多组未标记的数据及其类先验的情况,即每个类别的比例。在此问题设置下,我们首先得出了对分类风险的无偏估计量,可以从给定未标记的集合中估算,并理论上分析了学习分类器的概括误差。然后,我们发现获得的分类器往往会导致过度拟合,因为其经验风险在训练过程中呈负面。为了防止过度拟合,我们进一步提出了一个部分风险正规化,该风险正规化在某些级别上保持了未标记的数据集和类方面的部分风险。实验表明,我们的方法有效地减轻了过度拟合和优于从多个未标记集中学习的最先进方法。
translated by 谷歌翻译
Training accurate deep neural networks (DNNs) in the presence of noisy labels is an important and challenging task. Though a number of approaches have been proposed for learning with noisy labels, many open issues remain. In this paper, we show that DNN learning with Cross Entropy (CE) exhibits overfitting to noisy labels on some classes ("easy" classes), but more surprisingly, it also suffers from significant under learning on some other classes ("hard" classes). Intuitively, CE requires an extra term to facilitate learning of hard classes, and more importantly, this term should be noise tolerant, so as to avoid overfitting to noisy labels. Inspired by the symmetric KL-divergence, we propose the approach of Symmetric cross entropy Learning (SL), boosting CE symmetrically with a noise robust counterpart Reverse Cross Entropy (RCE). Our proposed SL approach simultaneously addresses both the under learning and overfitting problem of CE in the presence of noisy labels. We provide a theoretical analysis of SL and also empirically show, on a range of benchmark and real-world datasets, that SL outperforms state-of-the-art methods. We also show that SL can be easily incorporated into existing methods in order to further enhance their performance.
translated by 谷歌翻译
深度学习在大量大数据的帮助下取得了众多域中的显着成功。然而,由于许多真实情景中缺乏高质量标签,数据标签的质量是一个问题。由于嘈杂的标签严重降低了深度神经网络的泛化表现,从嘈杂的标签(强大的培训)学习是在现代深度学习应用中成为一项重要任务。在本调查中,我们首先从监督的学习角度描述了与标签噪声学习的问题。接下来,我们提供62项最先进的培训方法的全面审查,所有这些培训方法都按照其方法论差异分为五个群体,其次是用于评估其优越性的六种性质的系统比较。随后,我们对噪声速率估计进行深入分析,并总结了通常使用的评估方法,包括公共噪声数据集和评估度量。最后,我们提出了几个有前途的研究方向,可以作为未来研究的指导。所有内容将在https://github.com/songhwanjun/awesome-noisy-labels提供。
translated by 谷歌翻译
In the presence of noisy labels, designing robust loss functions is critical for securing the generalization performance of deep neural networks. Cross Entropy (CE) loss has been shown to be not robust to noisy labels due to its unboundedness. To alleviate this issue, existing works typically design specialized robust losses with the symmetric condition, which usually lead to the underfitting issue. In this paper, our key idea is to induce a loss bound at the logit level, thus universally enhancing the noise robustness of existing losses. Specifically, we propose logit clipping (LogitClip), which clamps the norm of the logit vector to ensure that it is upper bounded by a constant. In this manner, CE loss equipped with our LogitClip method is effectively bounded, mitigating the overfitting to examples with noisy labels. Moreover, we present theoretical analyses to certify the noise-tolerant ability of LogitClip. Extensive experiments show that LogitClip not only significantly improves the noise robustness of CE loss, but also broadly enhances the generalization performance of popular robust losses.
translated by 谷歌翻译
我们推出了可实现的机器学习模型的贝叶斯风险和泛化误差的信息 - 理论下限。特别地,我们采用了一个分析,其中模型参数的速率失真函数在训练样本和模型参数之间界定了所需的互信息,以便向贝叶斯风险约束学习模型。对于可实现的模型,我们表明,速率失真函数和相互信息承认的表达式,方便分析。对于在其参数中(大致)较低的LipsChitz的模型,我们将从下面的速率失真函数绑定,而对于VC类,相互信息以高于$ d_ \ mathrm {vc} \ log(n)$。当这些条件匹配时,贝叶斯相对于零一个损耗尺度的风险不足于$ \ oomega(d_ \ mathrm {vc} / n)$,它与已知的外界和最小界限匹配对数因子。我们还考虑标签噪声的影响,在训练和/或测试样本损坏时提供下限。
translated by 谷歌翻译
神经崩溃的概念是指在各种规范分类问题中经验观察到的几种新兴现象。在训练深度神经网络的终端阶段,同一类的所有示例的特征嵌入往往会崩溃为单一表示,而不同类别的特征往往会尽可能分开。通常通过简化的模型(称为无约束的特征表示)来研究神经崩溃,其中假定模型具有“无限表达性”,并且可以将每个数据点映射到任何任意表示。在这项工作中,我们提出了不受约束的功能表示的更现实的变体,该变体考虑到了网络的有限表达性。经验证据表明,嘈杂数据点的记忆导致神经崩溃的降解(扩张)。使用记忆 - 稀释(M-D)现象的模型,我们展示了一种机制,通过该机制,不同的损失导致嘈杂数据上受过训练的网络的不同性能。我们的证据揭示了为什么标签平滑性(经验观察到产生正则化效果的跨凝性的修改)导致分类任务的概括改善的原因。
translated by 谷歌翻译
We introduce a tunable loss function called $\alpha$-loss, parameterized by $\alpha \in (0,\infty]$, which interpolates between the exponential loss ($\alpha = 1/2$), the log-loss ($\alpha = 1$), and the 0-1 loss ($\alpha = \infty$), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $\alpha$-loss and Arimoto conditional entropy, verify the classification-calibration of $\alpha$-loss in order to demonstrate asymptotic optimality via Rademacher complexity generalization techniques, and build-upon a notion called strictly local quasi-convexity in order to quantitatively characterize the optimization landscape of $\alpha$-loss. Practically, we perform class imbalance, robustness, and classification experiments on benchmark image datasets using convolutional-neural-networks. Our main practical conclusion is that certain tasks may benefit from tuning $\alpha$-loss away from log-loss ($\alpha = 1$), and to this end we provide simple heuristics for the practitioner. In particular, navigating the $\alpha$ hyperparameter can readily provide superior model robustness to label flips ($\alpha > 1$) and sensitivity to imbalanced classes ($\alpha < 1$).
translated by 谷歌翻译
我们研究由SGD的变体训练的Relu神经网络的隐式偏置,其中在每个步骤中,标签以概率$ P $更改为随机标签(标记平滑是该过程的关闭变体)。我们的实验表明,标签噪声在以下意义上推动网络到稀疏解决方案:对于典型的输入,一小部分神经元是有效的,并且隐藏层的烧制图案是稀疏的。实际上,对于某些情况,适当的标签噪声不仅缩小网络,而且还减少了测试错误。然后,我们转向这些稀疏机制的理论分析,重点关注$ p = 1 $的极值案例。我们展示在这种情况下,网络沿着实验预期,但令人惊讶的是,以不同的方式依赖于学习率和偏见的存在,有重量消失或释放的神经元。
translated by 谷歌翻译
深度神经网络的成功在很大程度上取决于大量高质量注释的数据的可用性,但是这些数据很难或昂贵。由此产生的标签可能是类别不平衡,嘈杂或人类偏见。从不完美注释的数据集中学习无偏分类模型是一项挑战,我们通常会遭受过度拟合或不足的折磨。在这项工作中,我们彻底研究了流行的软马克斯损失和基于保证金的损失,并提供了一种可行的方法来加强通过最大化最小样本余量来限制的概括误差。我们为此目的进一步得出了最佳条件,该条件指示了类原型应锚定的方式。通过理论分析的激励,我们提出了一种简单但有效的方法,即原型锚定学习(PAL),可以轻松地将其纳入各种基于学习的分类方案中以处理不完美的注释。我们通过对合成和现实世界数据集进行广泛的实验来验证PAL对班级不平衡学习和降低噪声学习的有效性。
translated by 谷歌翻译
当可能的许多标签是可能的时,选择单个可以导致低精度。一个常见的替代方案,称为顶级k $分类,是选择一些数字$ k $(通常约5),并返回最高分数的$ k $标签。不幸的是,对于明确的案例,$ k> 1 $太多,对于非常暧昧的情况,$ k \ leq 5 $(例如)太小。另一种明智的策略是使用一种自适应方法,其中返回的标签数量随着计算的歧义而变化,但必须平均到所有样本的某些特定的$ k $。我们表示这种替代方案 - $ k $分类。本文在平均值的含量较低的误差率时,本文正式地表征了模糊性曲线,比固定的顶级k $分类更低。此外,它为固定尺寸和自适应分类器提供了自然估计程序,并证明了它们的一致性。最后,它报告了实际图像数据集的实验,揭示了平均值的效益 - 在实践中的价格超过高度k $分类。总的来说,当含糊不清的歧义时,平均值-$ k $永远不会比Top-$ K $更差,并且在我们的实验中,当估计时,这也持有。
translated by 谷歌翻译
最近关于使用嘈杂标签的学习的研究通过利用小型干净数据集来显示出色的性能。特别是,基于模型不可知的元学习的标签校正方法进一步提高了性能,通过纠正了嘈杂的标签。但是,标签错误矫予没有保障措施,导致不可避免的性能下降。此外,每个训练步骤都需要至少三个背部传播,显着减慢训练速度。为了缓解这些问题,我们提出了一种强大而有效的方法,可以在飞行中学习标签转换矩阵。采用转换矩阵使分类器对所有校正样本持怀疑态度,这减轻了错误的错误问题。我们还介绍了一个双头架构,以便在单个反向传播中有效地估计标签转换矩阵,使得估计的矩阵紧密地遵循由标签校正引起的移位噪声分布。广泛的实验表明,我们的方法在训练效率方面表现出比现有方法相当或更好的准确性。
translated by 谷歌翻译