The success of deep neural networks requires both high annotation quality and massive data. However, the size and the quality of a dataset are usually a trade-off in practice, as data collection and cleaning are expensive and time-consuming. Therefore, automatic noisy label detection (NLD) techniques are critical to real-world applications, especially those using crowdsourcing datasets. As this is an under-explored topic in automatic speaker verification (ASV), we present a simple but effective solution to the task. First, we compare the effectiveness of various commonly used metric learning loss functions under different noise settings. Then, we propose two ranking-based NLD methods, inter-class inconsistency and intra-class inconsistency ranking. They leverage the inconsistent nature of noisy labels and show high detection precision even under a high level of noise. Our solution gives rise to both efficient and effective cleaning of large-scale speaker recognition datasets.
translated by 谷歌翻译
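Noisy labels are common in real-world data and degrade the performance of deep neural networks. Cleaning data manually is labor-intensive and time-consuming. Previous research has mostly focused on making classification models robust to noisy labels, while the robustness of deep metric learning (DML) to noisy labels remains less explored. In this paper, we bridge this gap by proposing Probabilistic Ranking-based Instance Selection with Memory (PRISM) for DML. PRISM calculates the probability that a label is clean and filters out potentially noisy samples. Specifically, we propose a new method, von Mises-Fisher Distribution Similarity (vMF-Sim), which calculates this probability by estimating a von Mises-Fisher (vMF) distribution for each data class. Compared with the existing average similarity method (AvgSim), vMF-Sim considers the variance of each class in addition to the average similarity. With this design, the proposed approach can handle challenging DML situations in which the majority of samples are noisy. Extensive experiments on both synthetic and real-world noisy datasets show that the proposed approach improves Precision@1 by up to 8.37% within a reasonable training time.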
Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability. In this paper, we first introduce an Additive Angular Margin Loss (ArcFace), which not only has a clear geometric interpretation but also significantly enhances the discriminative power. Since ArcFace is susceptible to the massive label noise, we further propose sub-center ArcFace, in which each class contains K sub-centers and training samples only need to be close to any of the K positive sub-centers. Sub-center ArcFace encourages one dominant sub-class that contains the majority of clean faces and non-dominant sub-classes that include hard or noisy faces. Based on this self-propelled isolation, we boost the performance through automatically purifying raw web faces under massive real-world noise. Besides discriminative feature embedding, we also explore the inverse problem, mapping feature vectors to face images. Without training any additional generator or discriminator, the pre-trained ArcFace model can generate identity-preserved face images for both subjects inside and outside the training data only by using the network gradient and Batch Normalization (BN) priors. Extensive experiments demonstrate that ArcFace can enhance the discriminative feature embedding as well as strengthen the generative face synthesis.
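The additive angular margin described above can be illustrated with a small sketch. It assumes the embedding and class centers are already L2-normalized, so the inputs are plain cosine similarities. The function names and the default values `margin=0.5` and `scale=64.0` are illustrative assumptions and do not come from the abstract; capping the shifted angle at pi is a simplification chosen here to keep the target logit monotonic.

```python
import math


def arcface_logits(cosines, target, margin=0.5, scale=64.0):
    """Return scaled logits with an additive angular margin on the target class.

    cosines: cosine similarity between one embedding and every class center.
    target:  index of the labeled class.
    margin, scale: illustrative defaults (assumptions, not from the abstract).
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            theta = math.acos(c)
            # Simplification: cap the shifted angle at pi.
            c = math.cos(min(theta + margin, math.pi))
        logits.append(scale * c)
    return logits


def softmax_cross_entropy(logits, target):
    """Numerically stable softmax cross entropy for a single sample."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


cosines = [0.8, 0.1, -0.3]
with_margin = softmax_cross_entropy(arcface_logits(cosines, target=0), 0)
without_margin = softmax_cross_entropy([64.0 * c for c in cosines], 0)
print(with_margin > without_margin)
```

The margin lowers only the target-class logit, so the same sample incurs a larger loss until its angle to the class center shrinks further.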
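Learning with noisy labels (LNL) aims to design strategies that improve model performance and generalization by mitigating the effects of overfitting to noisy labels. The key to success in LNL lies in identifying as many clean samples as possible from massive noisy data while correcting the wrongly assigned labels. Recent advances use the predicted label distributions of individual samples to perform noise verification and noisy label correction, which easily gives rise to confirmation bias. To mitigate this issue, we propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its nearest neighbors in feature space. Specifically, our method consists of two steps: 1) Neighborhood Collective Noise Verification, which separates all training samples into a clean and a noisy subset, and 2) Neighborhood Collective Label Correction, which relabels noisy samples; auxiliary techniques are then used to assist further model optimization. Extensive experiments on four commonly used benchmark datasets (CIFAR-10, CIFAR-100, Clothing-1M, and WebVision-1.0) demonstrate that our proposed method considerably outperforms state-of-the-art methods.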
Training accurate deep neural networks (DNNs) in the presence of noisy labels is an important and challenging task. Though a number of approaches have been proposed for learning with noisy labels, many open issues remain. In this paper, we show that DNN learning with Cross Entropy (CE) exhibits overfitting to noisy labels on some classes ("easy" classes), but more surprisingly, it also suffers from significant under learning on some other classes ("hard" classes). Intuitively, CE requires an extra term to facilitate learning of hard classes, and more importantly, this term should be noise tolerant, so as to avoid overfitting to noisy labels. Inspired by the symmetric KL-divergence, we propose the approach of Symmetric cross entropy Learning (SL), boosting CE symmetrically with a noise robust counterpart Reverse Cross Entropy (RCE). Our proposed SL approach simultaneously addresses both the under learning and overfitting problem of CE in the presence of noisy labels. We provide a theoretical analysis of SL and also empirically show, on a range of benchmark and real-world datasets, that SL outperforms state-of-the-art methods. We also show that SL can be easily incorporated into existing methods in order to further enhance their performance.
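A minimal sketch of the symmetric loss for one sample with a one-hot label follows. The abstract only states that CE is combined with a Reverse Cross Entropy term; the weights `alpha` and `beta`, the constant `log_zero` that stands in for log(0) on the one-hot label, and the function name are assumptions made for illustration.

```python
import math


def symmetric_cross_entropy(probs, target, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Weighted sum of cross entropy and reverse cross entropy for one sample.

    probs:    predicted class probabilities (assumed to sum to 1).
    target:   index of the (possibly noisy) label.
    log_zero: finite stand-in for log(0) on the one-hot label (an assumption).
    """
    eps = 1e-7
    # CE: -sum_k q_k * log p_k, which reduces to -log p_target for a one-hot q.
    ce = -math.log(max(probs[target], eps))
    # RCE: -sum_k p_k * log q_k. log q_target = 0, and log q_k is replaced
    # by log_zero for every other class.
    rce = -sum(p * log_zero for k, p in enumerate(probs) if k != target)
    return alpha * ce + beta * rce


print(symmetric_cross_entropy([0.7, 0.2, 0.1], target=0))
```

Because the reverse term is bounded by `-log_zero`, a confidently "wrong" prediction on a mislabeled sample contributes a capped penalty, unlike the unbounded CE term.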
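Our prior experiments show that humans and machines seem to use different approaches to speaker discrimination, especially in the presence of speaking-style variability. The experiments examined read versus conversational speech. Listeners focused on speaker-specific idiosyncrasies when "telling speakers together", and on relative distances in a shared acoustic space when "telling speakers apart". However, automatic speaker verification (ASV) systems use the same loss function irrespective of target or non-target trials. To improve ASV performance in the presence of style variability, insights learned from human perception are used to design a new training loss function that we call "CllrCE loss". CllrCE loss uses both speaker-specific idiosyncrasies and relative acoustic distances between speakers to train the ASV system. On the UCLA speaker variability database, in the x-vector and conditioning setups, CllrCE loss yields significant relative improvements in EER of 1-66%, and in minDCF of 1-31% and 1-56%, respectively, compared with the x-vector baseline. On the SITW evaluation tasks, which involve different conversational speech tasks, the proposed loss combined with self-attention conditioning yields significant relative improvements in EER of 2-5% and in minDCF of 6-12% over the baseline. In the SITW case, the performance improvements were consistent only with conditioning.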
Modeling noise transition matrix is a kind of promising method for learning with label noise. Based on the estimated noise transition matrix and the noisy posterior probabilities, the clean posterior probabilities, which are jointly called Label Distribution (LD) in this paper, can be calculated as the supervision. To reliably estimate the noise transition matrix, some methods assume that anchor points are available during training. Nonetheless, if anchor points are invalid, the noise transition matrix might be poorly learned, resulting in poor performance. Consequently, other methods treat reliable data points, extracted from training data, as pseudo anchor points. However, from a statistical point of view, the noise transition matrix can be inferred from data with noisy labels under the clean-label-domination assumption. Therefore, we aim to estimate the noise transition matrix without (pseudo) anchor points. There is evidence showing that samples are more likely to be mislabeled as other similar class labels, which means the mislabeling probability is highly correlated with the inter-class correlation. Inspired by this observation, we propose an instance-specific Label Distribution Regularization (LDR), in which the instance-specific LD is estimated as the supervision, to prevent DCNNs from memorizing noisy labels. Specifically, we estimate the noisy posterior under the supervision of noisy labels, and approximate the batch-level noise transition matrix by estimating the inter-class correlation matrix with neither anchor points nor pseudo anchor points. Experimental results on two synthetic noisy datasets and two real-world noisy datasets demonstrate that our LDR outperforms existing methods.
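Open intent detection is a significant problem in natural language understanding. It aims to detect unseen open intents using prior knowledge of only the known intents. Current methods face two core challenges in this task. On the one hand, they are limited in learning friendly representations for detecting the open intent. On the other hand, there is no effective approach for obtaining specific and compact decision boundaries for known intents. To address these issues, this paper presents an original framework, DA-ADB, which successively learns distance-aware intent representations and adaptive decision boundaries for open intent detection. Specifically, we first leverage distance information to enhance the distinguishing capability of the intent representations. Then, we design a novel loss function that obtains appropriate decision boundaries by balancing empirical and open space risks. Extensive experiments show the effectiveness of the distance-aware and boundary learning strategies. Compared with state-of-the-art methods, our method achieves substantial improvements on three benchmark datasets. It also yields robust performance with different proportions of labeled data and known categories. The full data and code are available at https://github.com/thuiar/textoir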
Voice anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems. A major challenge is caused by unseen attacks empowered by advanced speech synthesis technologies. Our previous research on one-class learning has improved the generalization ability to unseen attacks by compacting the bona fide speech in the embedding space. However, such compactness lacks consideration of the diversity of speakers. In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes away spoofing attacks from all the attractors in a high-dimensional embedding space. For training, we propose an algorithm for the co-optimization of bona fide speech clustering and bona fide/spoof classification. For inference, we propose strategies to enable anti-spoofing for speakers without enrollment. Our proposed system outperforms existing state-of-the-art single systems with a relative improvement of 38% on equal error rate (EER) on the ASVspoof2019 LA evaluation set.
Point cloud segmentation is a fundamental task in 3D. Despite recent progress on point cloud segmentation with the power of deep networks, current learning methods based on the clean-label assumption may fail with noisy labels. Yet, class labels are often mislabeled at both instance-level and boundary-level in real-world datasets. In this work, we take the lead in solving the instance-level label noise by proposing a Point Noise-Adaptive Learning (PNAL) framework. Compared to noise-robust methods on image tasks, our framework is noise-rate blind, to cope with the spatially variant noise rate specific to point clouds. Specifically, we propose a point-wise confidence selection to obtain reliable labels from the historical predictions of each point. A cluster-wise label correction is proposed with a voting strategy to generate the best possible label by considering the neighbor correlations. To handle boundary-level label noise, we also propose a variant "PNAL-boundary" with a progressive boundary label cleaning strategy. Extensive experiments demonstrate its effectiveness on both synthetic and real-world noisy datasets. Even with 60% symmetric noise and high-level boundary noise, our framework significantly outperforms its baselines, and is comparable to the upper bound trained on completely clean data. Moreover, we cleaned the popular real-world dataset ScanNetV2 for rigorous experiment. Our code and data are available at https://github.com/pleaseconnectwifi/PNAL.
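Deep learning has achieved remarkable success in numerous domains with the help of large amounts of data. However, the quality of data labels is a concern because high-quality labels are lacking in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological differences, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise rate estimation and summarize the commonly used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies. All the contents will be available at https://github.com/songhwanjun/awesome-noisy-labels.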
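Deep models trained with noisy labels are prone to overfitting and struggle to generalize. Most existing solutions are based on the ideal assumption that label noise is class-conditional, i.e., instances of the same class share the same noise model, which is independent of features. In practice, real-world noise patterns are usually more fine-grained and instance-dependent, which poses a major challenge, especially in the presence of inter-class imbalance. In this paper, we propose a two-stage clean sample identification method to address this challenge. First, we employ a class-level feature clustering procedure for the early identification of clean samples that lie near the class-wise prediction centers. Notably, we address the class imbalance problem based on the prediction entropy of rare classes. Second, for the remaining clean samples that lie close to the ground-truth class boundary (and are usually mixed with samples that carry instance-dependent noise), we propose a novel consistency-based classification method that identifies them using the consistency of two classifier heads: the higher the consistency, the larger the probability that a sample is clean. Extensive experiments on several challenging benchmarks demonstrate the superior performance of our method compared with state-of-the-art methods.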
In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax [10] and Angular Softmax [9] have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available.
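As a rough illustration of an additive margin on the Softmax loss, the sketch below subtracts a fixed margin from the target-class cosine before scaling, which is one common way AM-Softmax is written. Features and class weights are assumed to be L2-normalized already, so the inputs are cosine similarities. The function name and the defaults `margin=0.35` and `scale=30.0` are illustrative assumptions, not values stated in the abstract.

```python
import math


def am_softmax_loss(cosines, target, margin=0.35, scale=30.0):
    """Softmax cross entropy with an additive margin on the target cosine.

    cosines: cosine similarity between one normalized feature and each
             normalized class weight.
    target:  index of the labeled class.
    margin, scale: illustrative defaults (assumptions, not from the abstract).
    """
    logits = [
        scale * (c - margin if j == target else c)
        for j, c in enumerate(cosines)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


cosines = [0.8, 0.1, -0.3]
print(am_softmax_loss(cosines, target=0) > am_softmax_loss(cosines, target=0, margin=0.0))
```

With `margin=0.0` the function reduces to a scaled, normalized Softmax loss, which makes the effect of the margin easy to compare in isolation.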
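Real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowdsourcing, ambiguity in expressions, the subjectivity of annotators, and inter-class similarity. However, recent deep networks have a strong capacity to memorize noisy annotations, which leads to corrupted feature embeddings and poor generalization. To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on a dynamic class-specific threshold during training. Specifically, DNFER combines supervised training on the selected clean samples with unsupervised consistency training on all samples. During training, the mean posterior class probability of each mini-batch is used as the dynamic class-specific threshold to select clean samples for supervised training. Unlike other methods, this threshold is independent of the noise rate and does not need any clean data. In addition, to learn from all samples, the posterior distributions of the weakly augmented and strongly augmented images are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on both synthetic and real noisy annotated FER datasets such as RAFDB, FERPlus, SFEW, and AffectNet.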
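The dynamic class-specific threshold can be sketched as follows. The abstract states only that the mean posterior class probability of each mini-batch serves as the threshold; the selection rule used here (keep a sample when the probability of its given label exceeds the threshold of that class) and the function name are assumptions made for illustration.

```python
def select_clean_samples(batch_probs, labels):
    """Pick likely-clean samples in one mini-batch.

    batch_probs: list of per-sample posterior probability vectors.
    labels:      list of (possibly noisy) label indices, one per sample.
    Returns (indices of selected samples, per-class thresholds).
    """
    num_samples = len(batch_probs)
    num_classes = len(batch_probs[0])
    # Threshold for class c: mean posterior of class c over the mini-batch.
    thresholds = [
        sum(p[c] for p in batch_probs) / num_samples
        for c in range(num_classes)
    ]
    # Assumed rule: keep samples whose labeled-class posterior beats the
    # threshold of that class.
    selected = [
        i for i, (p, y) in enumerate(zip(batch_probs, labels))
        if p[y] > thresholds[y]
    ]
    return selected, thresholds


probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
selected, thresholds = select_clean_samples(probs, labels=[0, 1, 1, 0])
print(selected, thresholds)
```

Since the thresholds are recomputed from each mini-batch, no noise-rate estimate or clean validation subset is needed in this sketch.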
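Robust learning on noisily labeled data is an important task in real applications, because label noise directly leads to poor generalization of deep learning models. Existing label-noise learning methods usually assume that the ground-truth classes of the training data are balanced. However, real-world data is often imbalanced, which leads to an inconsistency between the observed class distribution and the intrinsic class distribution caused by label noise. This distribution inconsistency makes label-noise learning more challenging, because it is hard to distinguish clean samples from noisy samples in the intrinsic tail classes. In this paper, we propose a learning framework for label-noise learning with intrinsically long-tailed data. Specifically, we propose a robust sample selection method called two-stage bi-dimensional sample selection (TBSS) to better separate clean samples from noisy samples, especially for the tail classes. TBSS consists of two new separation metrics that jointly separate the samples in each class. Extensive experiments on multiple noisily labeled datasets with intrinsically long-tailed class distributions demonstrate the effectiveness of our method.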
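Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, can still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, yet this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple yet effective feature aggregation methods that improve the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot Learning (TraNFS). TraNFS leverages the transformer's attention mechanism to weigh mislabeled samples. (3) Finally, we extensively test these methods on noisy versions of MiniImageNet and TieredImageNet. Our results show that TraNFS is on par with leading FSL methods on clean support sets, yet outperforms them by far in the presence of label noise.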
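Large-scale datasets in NLP suffer from noisy labels due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise and aim to capture this noise through an auxiliary noise model on top of the classifier. We first assign each training sample a probability score of having a noisy label, using a beta mixture model fitted to the losses at an early epoch of training. Then, we use this score to selectively guide the learning of the noise model and the classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy and prevent overfitting to the noise.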
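Training deep neural networks (DNNs) with noisy labels is challenging in practice, since inaccurate labels severely degrade the generalization ability of DNNs. Previous efforts tend to handle part or all of the data in a unified denoising flow, identifying noisy data with a coarse small-loss criterion to mitigate the interference from noisy labels. They ignore the fact that noisy samples differ in difficulty, so a rigid and unified data selection pipeline cannot solve this problem well. In this paper, we propose a coarse-to-fine robust learning method called CREMA, which handles noisy data in a divide-and-conquer manner. At the coarse level, clean and noisy sets are first separated in terms of credibility in a statistical sense. Since it is practically impossible to categorize all noisy samples correctly, we further process them at a fine-grained level by modeling the credibility of each sample. Specifically, for the clean set, we deliberately design a memory-based modulation scheme that dynamically adjusts the contribution of each sample according to its historical credibility sequence during training, thus alleviating the effect of noisy samples incorrectly grouped into the clean set. Meanwhile, for samples categorized into the noisy set, a selective label update strategy is proposed to correct noisy labels while mitigating the problem of correction error. Extensive experiments are conducted on benchmarks of different modalities, including image classification (CIFAR, Clothing1M, etc.) and text recognition (IMDB), with either synthetic or natural semantic noise, demonstrating the superiority and generality of CREMA.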
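In this paper, we describe the top-scoring submission of the RTZR team to the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22) in the closed-dataset speaker verification Track 1. The best-performing system is a fusion of 7 models that covers 3 different types of model architecture. We focus on training models to learn periodic information. Therefore, all models were trained with 4-6 second segments of each utterance. In addition, for some of our fusion models we apply the large margin fine-tuning strategy, which has shown good performance in previous challenges. During evaluation, we apply scoring methods with adaptive symmetric normalization (AS-Norm) and matrix score average (MSA). Finally, we blend the models with logistic regression to fuse all the trained models. The final submission achieves 0.165 DCF and 2.912% EER on the VoxSRC22 test set.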
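The abstract names adaptive symmetric normalization (AS-Norm) without giving details. The sketch below shows one commonly used form, given as an assumption rather than the team's implementation: each side of a trial is scored against a cohort, the top-N cohort scores give a mean and standard deviation, and the raw score is normalized by both sides and averaged. The function name, `top_n`, and the use of the population standard deviation are illustrative choices.

```python
from statistics import mean, pstdev


def as_norm(raw_score, enroll_cohort_scores, test_cohort_scores, top_n=2):
    """Normalize one trial score with top-N cohort statistics from both sides.

    enroll_cohort_scores: scores of the enrollment embedding against a cohort.
    test_cohort_scores:   scores of the test embedding against the same cohort.
    top_n:                number of highest cohort scores to keep (assumption).
    """
    def top_stats(scores):
        top = sorted(scores, reverse=True)[:top_n]
        # Guard against a zero spread in degenerate cohorts.
        return mean(top), max(pstdev(top), 1e-8)

    mu_e, sd_e = top_stats(enroll_cohort_scores)
    mu_t, sd_t = top_stats(test_cohort_scores)
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)


print(as_norm(0.6, [0.1, 0.2, 0.3, 0.4], [0.0, 0.5, 0.1, 0.3]))
```

In practice the cohort is usually much larger and `top_n` is in the hundreds; the tiny values here only keep the arithmetic easy to follow.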
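Inspired by the remarkable zero-shot generalization capacity of vision-language pre-trained models, we seek to leverage supervision from the CLIP model to alleviate the burden of data labeling. However, such supervision inevitably contains label noise, which significantly degrades the discriminative power of the classification model. In this work, we propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch. First, a class-conditional contrastive learning mechanism is proposed to reduce the reliance on pseudo labels and increase the tolerance to noisy labels. Second, ensemble labels are adopted as a pseudo-label updating strategy to stabilize the training of deep neural networks with noisy labels. By combining both techniques, the framework effectively reduces the impact of noisy labels from the CLIP model. Experiments on multiple benchmark datasets demonstrate substantial improvements over other state-of-the-art methods.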