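We introduce Goldilocks Selection, a technique for faster model training that selects a sequence of training points that are "just right". We propose an information-theoretic acquisition function, the reducible validation loss, and compute it with a small proxy model, GoldiProx, to efficiently choose training points that maximize information about a validation set. We show that the "hard" (e.g. high-loss) points usually selected in the optimization literature are typically noisy, while the "easy" (e.g. low-noise) samples often prioritized in curriculum learning provide less information. Moreover, points with uncertain labels, which are typically targeted by active learning, tend to be less relevant to the task. In contrast, Goldilocks Selection chooses points that are "just right" and empirically outperforms the above approaches. In addition, the selected sequence can transfer to other architectures, so practitioners can share and reuse it without having to recreate it.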
Translated by Google Translate
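Training on web-scale data can take months. But much computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique that selects approximately those training points that most reduce the model's generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select "hard" (e.g. high-loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes "easy" points, but such points need not be trained on once they have been learnt. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.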
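The selection rule described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it assumes the reducible holdout loss of a point is its current training loss minus the loss assigned by a separate model trained only on held-out data, a detail the abstract does not spell out. The function name `select_rho_loss` and the toy loss values are hypothetical.

```python
import heapq

def select_rho_loss(train_losses, irreducible_losses, k):
    """Return the indices of the k points with the largest reducible loss.

    Assumption: reducible loss = current training loss minus the loss of a
    model trained only on holdout data (the "irreducible" part).
    """
    if len(train_losses) != len(irreducible_losses):
        raise ValueError("loss lists must have the same length")
    reducible = [t - i for t, i in zip(train_losses, irreducible_losses)]
    # nlargest returns indices ordered from highest to lowest reducible loss.
    return heapq.nlargest(k, range(len(reducible)), key=reducible.__getitem__)

# Toy batch: point 0 has high loss but is mostly irreducible (likely noisy),
# point 1 is already learnt, points 2 and 3 are learnable and not yet learnt.
train_losses = [2.3, 0.1, 1.9, 1.2]
irreducible_losses = [2.2, 0.05, 0.4, 0.3]
selected = select_rho_loss(train_losses, irreducible_losses, k=2)
print(selected)
```

In this toy batch the high-loss point 0 is skipped because almost all of its loss is irreducible, which matches the motivation given for avoiding noisy "hard" points.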
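Expanding on MacKay (1992), we argue that conventional model-based methods for active learning, such as BALD, have a fundamental shortfall: they fail to directly account for the test-time distribution of the input variables. This can lead to pathologies in the acquisition strategy, because what is maximally informative for the model parameters may not be maximally informative for prediction, for example when the data in the pool set is more dispersed than that of the final prediction task, or when the distributions of pool and test samples differ. To correct this, we revisit an acquisition strategy based on maximizing the expected information gained about possible future predictions, referring to this as the Expected Predictive Information Gain (EPIG). As EPIG does not scale well to batch acquisition, we further examine an alternative strategy, a hybrid between BALD and EPIG, which we call the Joint Expected Predictive Information Gain (JEPIG). We consider both for active learning with Bayesian neural networks on a variety of datasets, examining their behavior under distribution shift in the pool set.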
We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time (1 − 1/e)-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state-of-the-art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.
Acquiring labeled data is challenging in many machine learning applications with limited budgets. Active learning gives a procedure to select the most informative data points and improve data efficiency by reducing the cost of labeling. The info-max learning principle of maximizing mutual information, as in BALD, has been successful and widely adopted in various active learning applications. However, this pool-based objective inherently introduces redundant selection and further requires a high computational cost for batch selection. In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of the underlying softmax probability and the label variable. To do this, we approximate each marginal distribution by a Beta distribution. The Beta approximation enables us to formulate BalEntAcq as a ratio between an augmented entropy and the marginalized joint entropy. The closed-form expression of BalEntAcq facilitates parallelization by estimating two parameters in each marginal Beta distribution. BalEntAcq is a purely standalone measure that requires no relational computations with other data points. Nevertheless, BalEntAcq captures a well-diversified selection near the decision boundary with a margin, unlike other existing uncertainty measures such as BALD, Entropy, or Mean Standard Deviation (MeanSD). Finally, we demonstrate that our balanced entropy learning principle with BalEntAcq consistently outperforms well-known linearly scalable active learning methods, including the recently proposed PowerBALD, a simple but diversified version of BALD, by showing experimental results obtained from the MNIST, CIFAR-100, SVHN, and TinyImageNet datasets.
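Active learning is a popular approach for reducing the amount of data needed to train deep neural network models. Its success hinges on the choice of an effective acquisition function, which ranks not-yet-labeled data points according to their expected informativeness. In uncertainty sampling, the uncertainty that the current model has about the class label of a point is the main criterion for this type of ranking. This paper proposes a new approach to uncertainty sampling when training a convolutional neural network (CNN). The main idea is to use the feature representation extracted by the CNN as data for training a sum-product network (SPN). Since SPNs are typically used for estimating the distribution of a dataset, they are well suited to the task of estimating class probabilities that can be used directly by standard acquisition functions such as max entropy and variation ratio. Moreover, we enhance these acquisition functions with weights calculated with the help of the SPN model. These weights make the acquisition function more sensitive to the diversity of plausible class labels for a data point. The effectiveness of our method is demonstrated in an experimental study on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, where we compare it to the state-of-the-art methods MC Dropout and Bayesian Batch.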
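The two standard acquisition functions named above, max entropy and variation ratio, can be illustrated with a short sketch. It assumes the class probabilities for each unlabeled point are already available (from an SPN, a softmax layer, or any other estimator). The SPN-based weighting proposed in the paper is not reproduced here, and the pool entries are made-up values.

```python
import math

def max_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def variation_ratio(probs):
    """One minus the probability of the most likely class."""
    return 1.0 - max(probs)

# Hypothetical unlabeled pool with predicted class probabilities.
pool = {
    "confident": [0.9, 0.05, 0.05],
    "uncertain": [0.4, 0.35, 0.25],
}

# Rank the pool so that the most uncertain point is queried first.
ranked_by_entropy = sorted(pool, key=lambda name: max_entropy(pool[name]), reverse=True)
ranked_by_ratio = sorted(pool, key=lambda name: variation_ratio(pool[name]), reverse=True)
print(ranked_by_entropy, ranked_by_ratio)
```

Both scores depend only on the predicted distribution of a single point, which is why the quality of the class probability estimates matters so much for uncertainty sampling.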
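Active learning has demonstrated data efficiency in many fields. Existing active learning algorithms, especially in the context of deep Bayesian active models, rely heavily on the quality of the model's uncertainty estimates. However, such uncertainty estimates can be heavily biased, especially with limited and imbalanced training data. In this paper, we propose BALanCe, a Bayesian deep active learning framework that mitigates the effect of such biases. Concretely, BALanCe employs a novel acquisition function that leverages the structure captured by equivalence hypothesis classes and facilitates differentiation among different equivalence classes. Intuitively, each equivalence class consists of instantiations of deep models with similar predictions, and BALanCe adaptively adjusts the size of the equivalence classes as learning progresses. Besides the fully sequential setting, we further propose Batch-BALanCe, a generalization of the sequential algorithm to the batched setting, to efficiently select batches of training examples that are jointly effective for model improvement. We show that Batch-BALanCe achieves state-of-the-art performance on several benchmark datasets for active learning, and that both algorithms can effectively handle realistic challenges that often involve multi-class and imbalanced data.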
The generalisation performance of a convolutional neural network (CNN) is largely determined by the quantity, quality, and diversity of the training images. All the training data needs to be annotated beforehand, yet in many real-world applications data is easy to acquire but expensive and time-consuming to label. The goal of active learning is to draw the most informative samples from the unlabeled pool, which can then be used for training after annotation. With an entirely different objective, self-supervised learning (SSL) has been rapidly gaining popularity by closing the performance gap with supervised methods on large computer vision benchmarks. SSL methods have been shown to produce low-level representations that are invariant to distortions of the input sample and can encode invariance to artificially created distortions, e.g. rotation, solarization, cropping, etc. SSL approaches also rely on simpler and more scalable frameworks for learning. In this paper, we unify these two families of approaches from the angle of active learning using the self-supervised learning manifold and propose Deep Active Learning using Barlow Twins (DALBT), an active learning method for all datasets that combines a classifier with the self-supervised loss framework of Barlow Twins, in a setting where the model can encode invariance to artificially created distortions, e.g. rotation, solarization, cropping, etc.
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
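The weighting step can be sketched under a first-order reading of the description above. If the example weights start at zero and one SGD step is taken on the weighted training loss, the derivative of the validation loss with respect to each weight is proportional to the negative dot product between that example's gradient and the validation gradient. The sketch below therefore scores each example by that dot product, clips negative scores to zero, and normalizes. The clipping and normalization are assumptions not stated in the abstract, and the gradient vectors are toy values rather than gradients from a real network.

```python
def example_weights(train_grads, val_grad):
    """Weight examples by how well their gradient aligns with the validation gradient.

    train_grads: one gradient vector per training example in the mini-batch.
    val_grad: gradient of the loss on the clean validation set.
    """
    # A descent step on the weights from zero moves each weight in the
    # direction of the dot product between the two gradients.
    scores = [sum(g * v for g, v in zip(grad, val_grad)) for grad in train_grads]
    # Assumption: negative weights are clipped to zero, then normalized to sum to 1.
    clipped = [max(0.0, s) for s in scores]
    total = sum(clipped)
    return [c / total for c in clipped] if total > 0 else clipped

# Toy mini-batch: the second example's gradient conflicts with the validation gradient.
train_grads = [[1.0, 0.0], [-1.0, 0.5], [0.5, 0.5]]
val_grad = [1.0, 1.0]
weights = example_weights(train_grads, val_grad)
print(weights)
```

Examples whose gradients point against the validation gradient, as a mislabeled example might, receive zero weight in this sketch.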
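The mutual information between predictions and model parameters, also known as expected information gain or BALD in machine learning, measures informativeness. It is a popular acquisition function in Bayesian active learning and Bayesian optimal experimental design. In data subset selection, i.e. active learning and active sampling, several recent works use Fisher information, Hessians, similarity matrices based on gradients, or simply gradient lengths to compute the acquisition scores that guide sample selection. Are these different approaches connected, and if so, how? In this paper, we revisit the Fisher information and use it to show how several otherwise disparate methods are connected as approximations of information-theoretic quantities.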
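As a concrete reference point, the mutual information between a prediction and the model parameters is commonly estimated from a handful of parameter samples as the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch below shows that standard Monte Carlo estimator, not the Fisher-based approximations studied in the paper. The sampled predictions are made-up values standing in for, say, MC dropout passes.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def bald_score(sampled_probs):
    """Entropy of the mean prediction minus the mean entropy of each prediction.

    sampled_probs: one class distribution per sampled set of model parameters.
    """
    n = len(sampled_probs)
    num_classes = len(sampled_probs[0])
    mean_probs = [sum(s[c] for s in sampled_probs) / n for c in range(num_classes)]
    return entropy(mean_probs) - sum(entropy(s) for s in sampled_probs) / n

# Parameter samples that disagree confidently: high mutual information.
disagree = [[1.0, 0.0], [0.0, 1.0]]
# Parameter samples that agree the label is a coin flip: no mutual information.
agree = [[0.5, 0.5], [0.5, 0.5]]
print(bald_score(disagree), bald_score(agree))
```

Both toy points have the same predictive entropy, but only the first one would reduce uncertainty about the parameters if labeled, which is the distinction the mutual information captures.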
Deep neural networks may easily memorize noisy labels present in real-world data, which degrades their ability to generalize. It is therefore important to track and evaluate the robustness of models against noisy label memorization. We propose a metric, called susceptibility, to gauge such memorization for neural networks. Susceptibility is simple and easy to compute during training. Moreover, it does not require access to ground-truth labels and it only uses unlabeled data. We empirically show the effectiveness of our metric in tracking memorization on various architectures and datasets and provide theoretical insights into the design of the susceptibility metric. Finally, we show through extensive experiments on datasets with synthetic and real-world label noise that one can utilize susceptibility and the overall training accuracy to distinguish models that maintain a low memorization on the training set and generalize well to unseen clean data.
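Active learning has proven useful for minimizing labeling costs by selecting the most informative samples. However, existing active learning methods do not work well in realistic scenarios such as imbalance or rare classes, out-of-distribution data in the unlabeled set, and redundancy. In this work, we propose SIMILAR (Submodular Information Measures based Active Learning), a unified active learning framework that uses recently proposed submodular information measures (SIM) as acquisition functions. We argue that SIMILAR not only works in standard active learning, but also extends easily to the realistic settings considered above and acts as a one-stop solution for active learning that scales to large real-world datasets. Empirically, we show that SIMILAR significantly outperforms existing active learning algorithms in the case of rare classes, and by ~5% - 10% in the case of out-of-distribution data, on several image classification tasks such as CIFAR-10, MNIST, and ImageNet. SIMILAR is available as part of the DISTIL toolkit: "https://github.com/decile-team/distil".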
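A practical notation can convey valuable intuitions and concisely express new ideas. Information theory is important to machine learning, but the notation for information-theoretic quantities is sometimes opaque. We propose a practical and unified notation and extend it to include information-theoretic quantities between observed outcomes (events) and random variables. This includes the point-wise mutual information known in NLP, as well as mixed quantities such as specific surprise and specific information in the cognitive sciences and information gain in Bayesian optimal experimental design. We apply our notation to prove, using new intuitions, a version of an approximation for binomial coefficients mentioned by MacKay (2003). We also concisely rederive the evidence lower bound for variational auto-encoders and for variational inference in approximate Bayesian neural networks. Furthermore, we apply the notation to a popular information-theoretic acquisition function in Bayesian active learning, which selects the most informative (unlabeled) samples to be labeled by an expert, and we extend this acquisition function to the core-set problem, where the goal is to select the most informative samples given the labels.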
As an important data selection schema, active learning emerges as the essential component when iterating an Artificial Intelligence (AI) model. It becomes even more critical given the dominance of deep neural network based models, which are composed of a large number of parameters and data hungry, in application. Despite its indispensable role for developing AI models, research on active learning is not as intensive as other research directions. In this paper, we present a review of active learning through deep active learning approaches from the following perspectives: 1) technical advancements in active learning, 2) applications of active learning in computer vision, 3) industrial systems leveraging or with potential to leverage active learning for data iteration, 4) current limitations and future research directions. We expect this paper to clarify the significance of active learning in a modern AI model manufacturing process and to bring additional research attention to active learning. By addressing data automation challenges and coping with automated machine learning systems, active learning will facilitate democratization of AI technologies by boosting model production at scale.
Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Deep learning poses several difficulties when used in an active learning setting. First, active learning (AL) methods generally rely on being able to learn and update models from small amounts of data. Recent advances in deep learning, on the other hand, are notorious for their dependence on large amounts of data. Second, many AL acquisition functions rely on model uncertainty, yet deep learning methods rarely represent such model uncertainty. In this paper we combine recent advances in Bayesian deep learning into the active learning framework in a practical way. We develop an active learning framework for high dimensional data, a task which has been extremely challenging so far, with very sparse existing literature. Taking advantage of specialised models such as Bayesian convolutional neural networks, we demonstrate our active learning techniques with image data, obtaining a significant improvement on existing active learning approaches. We demonstrate this on both the MNIST dataset, as well as for skin cancer diagnosis from lesion images (ISIC2016 task).
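Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in "untidy" or raw data, practitioners face significant data quality and diversity issues that can be prohibitively laborious to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play, and often require a priori knowledge or metadata such as domain labels. Our work is orthogonal to these methods: we instead focus on providing a unified and efficient framework for Metadata Archaeology, that is, uncovering and inferring metadata of examples in a dataset. Using simple transformations, we curate different subsets of data that might exist in a dataset (e.g. mislabeled, atypical, or out-of-distribution examples), and leverage differences in learning dynamics between these probe suites to infer the metadata of interest. Our method is on par with far more sophisticated mitigation methods across different tasks: identifying and correcting mislabeled examples, classifying minority-group samples, prioritizing points relevant for training, and enabling scalable human auditing of relevant examples.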
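There is an increasing need for effective active learning algorithms that are compatible with deep neural networks. This paper motivates and revisits a classic, Fisher-based active selection objective, and proposes BAIT, a practical, tractable, and high-performing algorithm that makes it viable for use with neural models. BAIT draws inspiration from the theoretical analysis of maximum likelihood estimators (MLE) for parametric models. It selects batches of samples by optimizing a bound on the MLE error in terms of the Fisher information, which we show can be implemented efficiently at scale by exploiting linear-algebraic structure that is especially amenable to execution on modern hardware. Our experiments demonstrate that BAIT outperforms the previous state of the art on both classification and regression problems, and is flexible enough to be used with a variety of model architectures.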
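Labeling data can be an expensive task, as it is usually performed manually by domain experts. This is cumbersome for deep learning, which depends on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by using only the data that the model in use deems most informative. Little research has been done on AL in a text classification setting, and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT$_{base}$ as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL, namely that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. While the proposed heuristics were not found to improve the performance of AL, our results show that uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger.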
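We introduce supervised contrastive active learning (SCAL) and propose efficient active learning strategies based on feature similarity (featuresim) and principal component analysis based feature-reconstruction error (FRE) to select informative data samples with diverse feature representations. We demonstrate that our proposed method achieves state-of-the-art accuracy and model calibration, and reduces sampling bias, in an active learning setup for balanced and imbalanced datasets on image classification tasks. We also evaluate the robustness of models to distribution shift resulting from different query strategies in the active learning setting. Using extensive experiments, we show that our proposed approach outperforms high-performing compute-intensive methods, resulting in 9.9% lower mean corruption error, 7.2% lower expected calibration error under dataset shift, and 8.9% higher AUROC for out-of-distribution detection.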
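The great success of deep learning has been attributed to training large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing systems to train state-of-the-art networks. A large body of research has been devoted to reducing the cost per iteration of training through various model compression techniques such as pruning and quantization. Less effort has been spent on reducing the number of iterations. Previous work, such as forget scores and GraNd/EL2N scores, addresses this problem by identifying important samples within a full dataset and pruning the remaining samples, thereby reducing the iterations per epoch. Although these methods decrease the training time, they use expensive static scoring algorithms prior to training. When the scoring mechanism is accounted for, the total run time often increases. In this work, we address this shortcoming with dynamic data pruning algorithms. Surprisingly, we find that uniform random dynamic pruning can outperform the prior work at aggressive pruning rates. We attribute this to the existence of "sometimes" samples: points that are important to the learned decision boundary only some of the training time. To better exploit the subtlety of sometimes samples, we propose two algorithms, based on reinforcement learning techniques, that dynamically prune samples and achieve even higher accuracy than the random dynamic method. We test all our methods against a full-dataset baseline and the prior work on CIFAR-10 and CIFAR-100, and we can reduce the training time by up to 2x without significant performance loss. Our results suggest that data pruning should be understood as a dynamic process that is closely tied to a model's training trajectory, rather than a static step based solely on the dataset.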
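The uniform random dynamic pruning baseline mentioned above is simple enough to sketch: instead of scoring the dataset once before training, a fresh random subset is drawn at the start of every epoch. The following is a minimal illustration with hypothetical names and sizes. The reinforcement-learning-based algorithms proposed in the paper are not shown.

```python
import random

def dynamic_random_subset(num_examples, prune_rate, rng):
    """Draw the indices to train on for one epoch, pruning the rest at random."""
    keep = max(1, round(num_examples * (1.0 - prune_rate)))
    # Sampling without replacement, so no index appears twice in an epoch.
    return rng.sample(range(num_examples), keep)

rng = random.Random(0)  # fixed seed so that runs are repeatable
# Redraw the subset every epoch: a point pruned now can return later,
# which lets "sometimes" samples contribute when they matter.
epoch_subsets = [dynamic_random_subset(10, 0.7, rng) for _ in range(3)]
for epoch, subset in enumerate(epoch_subsets):
    print(epoch, sorted(subset))
```

Because the subset is redrawn each epoch, there is no up-front scoring cost, which is the overhead the abstract attributes to static methods.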