Acquiring labeled data is challenging in many machine learning applications with limited budgets. Active learning gives a procedure to select the most informative data points and improve data efficiency by reducing the cost of labeling. The info-max learning principle of maximizing mutual information, as in BALD, has been successful and widely adopted in various active learning applications. However, this pool-based objective inherently introduces redundant selection and further requires a high computational cost for batch selection. In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of the underlying softmax probability and the label variable. To do this, we approximate each marginal distribution with a Beta distribution. The Beta approximation enables us to formulate BalEntAcq as a ratio between an augmented entropy and the marginalized joint entropy. The closed-form expression of BalEntAcq facilitates parallelization by estimating two parameters in each marginal Beta distribution. BalEntAcq is a purely standalone measure that requires no relational computations with other data points. Nevertheless, BalEntAcq captures a well-diversified selection near the decision boundary with a margin, unlike other existing uncertainty measures such as BALD, Entropy, or Mean Standard Deviation (MeanSD). Finally, we demonstrate that our balanced entropy learning principle with BalEntAcq consistently outperforms well-known linearly scalable active learning methods, including the recently proposed PowerBALD, a simple but diversified version of BALD, by showing experimental results obtained from the MNIST, CIFAR-100, SVHN, and TinyImageNet datasets.
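As a point of reference for the baseline measures named in the abstract above (Entropy, BALD, MeanSD), the following is a minimal sketch of how they are commonly computed from several stochastic forward passes. It does not implement BalEntAcq itself. The function names and the two toy pool points are assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (nats) of a probability vector; 0 * log 0 is taken as 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mean_prediction(samples):
    """Average the per-sample softmax vectors into one predictive distribution."""
    n = len(samples)
    return [sum(s[c] for s in samples) / n for c in range(len(samples[0]))]


def predictive_entropy(samples):
    """Entropy of the averaged prediction (total uncertainty)."""
    return entropy(mean_prediction(samples))


def bald(samples):
    """Mutual information between label and parameters:
    H[mean prediction] minus the mean of the per-sample entropies."""
    expected_entropy = sum(entropy(s) for s in samples) / len(samples)
    return predictive_entropy(samples) - expected_entropy


def mean_sd(samples):
    """Per-class standard deviation across samples, averaged over classes."""
    mean = mean_prediction(samples)
    n = len(samples)
    sds = [
        math.sqrt(sum((s[c] - mean[c]) ** 2 for s in samples) / n)
        for c in range(len(mean))
    ]
    return sum(sds) / len(sds)


# Two hypothetical pool points, each with 4 stochastic forward passes.
disagree = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # passes disagree
ambiguous = [[0.5, 0.5] for _ in range(4)]                   # passes agree on 50/50

for name, point in [("disagree", disagree), ("ambiguous", ambiguous)]:
    print(name, predictive_entropy(point), bald(point), mean_sd(point))
```

Both toy points have the same predictive entropy, but only the point on which the forward passes disagree receives a high BALD or MeanSD score.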
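Bayesian neural networks have been used successfully to design and optimize robust neural network models in many application problems, including uncertainty quantification. Despite this recent success, the information-theoretic understanding of Bayesian neural networks is still at an early stage. Mutual information is one example of an uncertainty measure in a Bayesian neural network, used to quantify epistemic uncertainty. Still, no analytic formula is known to describe it, even though it is one of the fundamental information measures for understanding the Bayesian deep learning framework. In this paper, we derive an analytical formula for the mutual information between the model parameters and the predictive output by leveraging the notion of point process entropy. Then, as an application, we discuss parameter estimation for the Dirichlet distribution and show its practical use in active learning uncertainty measures by demonstrating that our analytical formula can further improve the performance of active learning in practice.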
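Expanding on MacKay (1992), we argue that conventional model-based methods for active learning, such as BALD, have a fundamental shortfall: they fail to directly account for the test-time distribution of the input variables. This can lead to pathologies in the acquisition strategy, because what is maximally informative about the model parameters may not be maximally informative for prediction, for example when the data in the pool set is more dispersed than that of the final prediction task, or when the distributions of pool and test samples differ. To correct this, we revisit an acquisition strategy based on maximizing the expected information gained about possible future predictions, which we refer to as the Expected Predictive Information Gain (EPIG). Because EPIG does not scale well to batch acquisition, we further examine an alternative strategy, a hybrid between BALD and EPIG, which we call the Joint Expected Predictive Information Gain (JEPIG). We consider using both for active learning with Bayesian neural networks on a variety of datasets, examining their behavior under distribution shift in the pool set.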
We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time (1 − 1/e)-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state-of-the-art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.
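The redundancy problem described in the abstract above can be illustrated with a small sketch that contrasts scoring points independently (top-k BALD) with greedily maximizing the joint mutual information of the batch. This is not the BatchBALD implementation: the joint entropy is computed by exact enumeration over label configurations, which is exponential in the batch size and only workable for toy inputs. The function names and the three-point pool are hypothetical.

```python
import itertools
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def bald(point):
    """point[s][c]: class probabilities of one input under posterior sample s."""
    k = len(point)
    mean = [sum(s[c] for s in point) / k for c in range(len(point[0]))]
    return entropy(mean) - sum(entropy(s) for s in point) / k


def joint_mutual_information(points):
    """I(y_1..y_b; w) = H(y_1..y_b) - E_w[H(y_1..y_b | w)], by exact enumeration."""
    k = len(points[0])
    n_classes = len(points[0][0])
    joint = []
    for labels in itertools.product(range(n_classes), repeat=len(points)):
        # p(labels) = mean over samples of the product of per-point probabilities
        total = 0.0
        for s in range(k):
            prod = 1.0
            for pt, y in zip(points, labels):
                prod *= pt[s][y]
            total += prod
        joint.append(total / k)
    # Given w the labels are independent, so the conditional entropy is additive.
    conditional = sum(sum(entropy(pt[s]) for s in range(k)) / k for pt in points)
    return entropy(joint) - conditional


def top_k_bald(pool, batch_size):
    """Score every point on its own and keep the batch_size best."""
    order = sorted(range(len(pool)), key=lambda i: -bald(pool[i]))
    return order[:batch_size]


def greedy_joint(pool, batch_size):
    """Grow the batch one point at a time, maximizing the joint score."""
    chosen = []
    for _ in range(batch_size):
        best, best_score = None, -math.inf
        for i in range(len(pool)):
            if i in chosen:
                continue
            score = joint_mutual_information([pool[j] for j in chosen] + [pool[i]])
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
b = [row[:] for row in a]  # exact duplicate of a
c = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
pool = [a, b, c]

print(top_k_bald(pool, 2))    # [0, 1]: picks the duplicate pair
print(greedy_joint(pool, 2))  # [0, 2]: the duplicate adds no joint information
```

In this toy pool the duplicate point has the highest individual score after the first pick, yet it adds nothing to the joint mutual information, so the greedy joint criterion prefers the less certain but complementary third point.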
Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators. We study and propose a novel framework that formulates batch active learning from the sparse approximation's perspective. Our active learning method aims to find an informative subset from the unlabeled data pool such that the corresponding training loss function approximates its full data pool counterpart. We realize the framework as sparsity-constrained discontinuous optimization problems, which explicitly balance uncertainty and representation for large-scale applications and could be solved by greedy or proximal iterative hard thresholding algorithms. The proposed method can adapt to various settings, including both Bayesian and non-Bayesian neural networks. Numerical experiments show that our work achieves competitive performance across different settings with lower computational complexity.
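Active learning has demonstrated data efficiency in many fields. Existing active learning algorithms, especially in the context of deep Bayesian active models, rely heavily on the quality of the model's uncertainty estimates. However, such uncertainty estimates can be heavily biased, especially with limited and imbalanced training data. In this paper, we propose BALanCe, a Bayesian deep active learning framework that mitigates the effect of such biases. Specifically, BALanCe employs a novel acquisition function that leverages the structure captured by equivalence hypothesis classes and facilitates differentiation among different equivalence classes. Intuitively, each equivalence class consists of instantiations of deep models with similar predictions, and BALanCe adaptively adjusts the size of the equivalence classes as learning progresses. Beyond the fully sequential setting, we further propose Batch-BALanCe, a generalization of the sequential algorithm to the batched setting, to efficiently select batches of training examples that are jointly effective for model improvement. We show that Batch-BALanCe achieves state-of-the-art performance on several benchmark datasets for active learning, and that both algorithms can effectively handle realistic challenges that often involve multi-class and imbalanced data.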
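Active learning is a popular approach for reducing the amount of data needed to train a deep neural network model. Its success hinges on the choice of an effective acquisition function, which ranks not-yet-labeled data points according to their expected informativeness. In uncertainty sampling, the uncertainty that the current model has about a point's class label is the main criterion for this type of ranking. This paper proposes a new approach to uncertainty sampling in training a convolutional neural network (CNN). The main idea is to use the feature representation extracted by the CNN as data for training a sum-product network (SPN). Since SPNs are typically used for estimating the distribution of a dataset, they are well suited to the task of estimating class probabilities that can be used directly by standard acquisition functions such as max entropy and variation ratio. Moreover, we enhance these acquisition functions with weights calculated with the help of the SPN model. These weights make the acquisition function more sensitive to the diversity of conceivable class labels for data points. The effectiveness of our method is demonstrated in an experimental study on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, where we compare it to the state-of-the-art methods MC Dropout and Bayesian Batch.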
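The acquisition of labels for supervised learning can be expensive. To improve the sample efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations, and selection methods. Our framework encompasses many existing Bayesian methods based on Gaussian process approximations of neural networks as well as non-Bayesian methods. Additionally, we propose to replace the commonly used last-layer features with sketched finite-width neural tangent kernels, and to combine them with a novel clustering method. To evaluate different methods, we introduce an open-source benchmark consisting of 15 large tabular regression datasets. Our proposed method outperforms the state of the art on our benchmark, scales to large datasets, and works out of the box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and that can be used to reproduce our results.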
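Modern deep learning methods constitute incredibly powerful tools for tackling a myriad of challenging problems. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offer a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This tutorial provides an overview of the relevant literature and a complete toolset to design, implement, train, use, and evaluate Bayesian neural networks, i.e., stochastic artificial neural networks trained using Bayesian methods.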
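A practical notation can convey valuable intuitions and concisely express new ideas. Information theory is important to machine learning, but the notation for information-theoretic quantities is sometimes opaque. We propose a practical and unified notation and extend it to include information-theoretic quantities between observed outcomes (events) and random variables. This includes the point-wise mutual information known in NLP, as well as mixed quantities such as specific surprise and specific information in the cognitive sciences and information gain in Bayesian optimal experimental design. We apply our notation to prove, using new intuitions, a version of Stirling's approximation for binomial coefficients mentioned by MacKay (2003). We also concisely rederive the evidence lower bound for variational auto-encoders and for variational inference in approximate Bayesian neural networks. Furthermore, we apply the notation to a popular information-theoretic acquisition function in Bayesian active learning, which selects the most informative (unlabeled) samples to be labeled by an expert, and we extend this acquisition function to the core-set problem, where the goal is to select the most informative samples given the labels.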
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high-confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimating predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited by their high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improving the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration, and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
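The first of the two changes described in the abstract above, spectral normalization of a hidden weight matrix, can be sketched with plain power iteration. This is an illustrative sketch and not the code from the linked repository: the function names, the default bound of 0.95, the iteration count, and the choice to rescale only when the estimated norm exceeds the bound are all assumptions. The Gaussian process output layer is not shown, and the matrix is assumed to be nonzero.

```python
import math


def matvec(w, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def transpose(w):
    return [list(col) for col in zip(*w)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(w, n_iter=50):
    """Estimate the largest singular value of w by power iteration."""
    wt = transpose(w)
    v = normalize([1.0] * len(w[0]))  # deterministic start vector
    for _ in range(n_iter):
        u = normalize(matvec(w, v))
        v = normalize(matvec(wt, u))
    return math.sqrt(sum(x * x for x in matvec(w, v)))


def spectral_normalize(w, bound=0.95):
    """Rescale w so its spectral norm does not exceed bound (hypothetical default)."""
    sigma = spectral_norm(w)
    if sigma <= bound:
        return w
    scale = bound / sigma
    return [[scale * x for x in row] for row in w]


weights = [[3.0, 0.0], [0.0, 1.0]]
print(spectral_norm(weights))
print(spectral_norm(spectral_normalize(weights)))
```

Bounding the largest singular value limits how much the layer can stretch distances between inputs, which is the property the abstract ties to distance awareness.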
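There is an increasing need for effective active learning algorithms that are compatible with deep neural networks. This paper motivates and revisits a classic, Fisher-based active selection objective, and proposes BAIT, a practical, tractable, and high-performing algorithm that makes it viable for use with neural models. BAIT draws inspiration from the theoretical analysis of maximum likelihood estimators (MLE) for parametric models. It selects batches of samples by optimizing a bound on the MLE error in terms of the Fisher information, which we show can be implemented efficiently at scale by exploiting linear-algebraic structure that is especially amenable to execution on modern hardware. Our experiments demonstrate that BAIT outperforms the previous state of the art on both classification and regression problems, and is flexible enough to be used with a variety of model architectures.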
It is widely believed that given the same labeling budget, active learning algorithms like uncertainty sampling achieve better predictive performance than passive learning (i.e. uniform sampling), albeit at a higher computational cost. Recent empirical evidence suggests that this added cost might be in vain, as uncertainty sampling can sometimes perform even worse than passive learning. While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that passive learning outperforms uncertainty sampling even for noiseless data and when using the uncertainty of the Bayes optimal classifier. Insights from our proof indicate that this high-dimensional phenomenon is exacerbated when the separation between the classes is small. We corroborate this intuition with experiments on 20 high-dimensional datasets spanning a diverse range of applications, from finance and histology to chemistry and computer vision.
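To make the two strategies compared in the abstract above concrete, here is a minimal sketch of one selection step of uncertainty sampling and of passive (uniform) sampling for a logistic-regression model. It illustrates the selection rules only and does not reproduce the paper's analysis. The weight vector, the pool, and the function names are hypothetical.

```python
import math
import random


def predict_proba(w, x):
    """Logistic-regression probability of the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def uncertainty_sampling(w, pool):
    """Index of the pool point whose predicted probability is closest to 0.5."""
    return min(range(len(pool)), key=lambda i: abs(predict_proba(w, pool[i]) - 0.5))


def passive_sampling(pool, rng):
    """Index drawn uniformly at random, ignoring the model."""
    return rng.randrange(len(pool))


w = [1.0, -1.0]                               # hypothetical current weights
pool = [[2.0, 0.0], [0.1, 0.0], [-3.0, 1.0]]  # hypothetical unlabeled pool
rng = random.Random(0)

print(uncertainty_sampling(w, pool))  # 1: the point nearest the decision boundary
print(passive_sampling(pool, rng))
```

Uncertainty sampling always queries near the current decision boundary, whereas passive sampling covers the pool uniformly. That difference in which points get labeled is what the two strategies' comparison rests on.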
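Estimating personalized treatment effects from high-dimensional observational data is essential in situations where experimental designs are infeasible, unethical, or expensive. Existing approaches rely on fitting deep models to the outcomes observed for the treated and control populations. However, when measuring individual outcomes is costly, as is the case with a tumor biopsy, a sample-efficient strategy for acquiring each outcome is required. Deep Bayesian active learning provides a framework for efficient data acquisition by selecting points with high uncertainty. However, existing methods bias training data acquisition towards regions of non-overlapping support between the treated and control populations. These methods are not sample-efficient, because the treatment effect is not identifiable in such regions. We introduce causal, Bayesian acquisition functions grounded in information theory that bias data acquisition towards regions with overlapping support, to maximize sample efficiency for learning personalized treatment effects. We demonstrate the performance of the proposed acquisition strategies on the synthetic and semi-synthetic datasets IHDP and CMNIST and their extensions, which aim to simulate common dataset biases and pathologies.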
Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Deep learning poses several difficulties when used in an active learning setting. First, active learning (AL) methods generally rely on being able to learn and update models from small amounts of data. Recent advances in deep learning, on the other hand, are notorious for their dependence on large amounts of data. Second, many AL acquisition functions rely on model uncertainty, yet deep learning methods rarely represent such model uncertainty. In this paper we combine recent advances in Bayesian deep learning into the active learning framework in a practical way. We develop an active learning framework for high-dimensional data, a task that has so far been extremely challenging and has very sparse existing literature. Taking advantage of specialised models such as Bayesian convolutional neural networks, we demonstrate our active learning techniques with image data, obtaining a significant improvement on existing active learning approaches. We demonstrate this on both the MNIST dataset and skin cancer diagnosis from lesion images (ISIC2016 task).
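Popular approaches for quantifying predictive uncertainty in deep neural networks often involve a set of weights or models, for instance via ensembling or Monte Carlo dropout. These techniques usually incur overhead because they must train multiple model instances, or they do not produce very diverse predictions. This survey aims to familiarize the reader with an alternative class of models based on the concept of evidential deep learning: for unfamiliar data, they admit "what they don't know" and fall back onto a prior belief. Furthermore, they allow uncertainty estimation in a single model and a single forward pass by parameterizing distributions over distributions. This survey recapitulates existing work, focusing on the implementation in a classification setting. Finally, we survey the application of the same paradigm to regression problems. We also reflect on the strengths and weaknesses of the surveyed approaches compared to existing ones, and provide the most central theoretical results in order to inform future research.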
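We propose a new method for approximating active learning acquisition strategies that are based on retraining with hypothetically labeled candidate data points. Although this is usually infeasible with deep networks, we use the neural tangent kernel to approximate the result of retraining, and prove that this approximation holds asymptotically even in an active learning setup, approximating "look-ahead" selection criteria with far less computation. This also enables us to conduct sequential active learning, i.e., updating the model in a streaming regime, without needing to retrain the model with SGD after adding each new data point. Moreover, our querying strategy, which better understands how the model's predictions will change when new data points are added than the standard ("myopic") criteria do, beats other look-ahead strategies by large margins, and achieves equal or better performance compared to state-of-the-art methods on several benchmark datasets in pool-based active learning.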
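Labeling data can be an expensive task, as it is usually performed manually by domain experts. This is cumbersome for deep learning, which depends on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by using only the data that the model in use deems most informative. Little research has been done on AL in a text classification setting, and next to none has involved the more recent, state-of-the-art natural language processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT$_{base}$ as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL, namely that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Although the proposed heuristics were not found to improve the performance of AL, our results show that uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger.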
Estimating how uncertain an AI system is in its predictions is important to improve the safety of such systems. Predictive uncertainty can result from uncertainty in model parameters, irreducible data uncertainty, and uncertainty due to distributional mismatch between the test and training data distributions. Different actions might be taken depending on the source of the uncertainty, so it is important to be able to distinguish between them. Recently, baseline tasks and metrics have been defined and several practical methods for estimating uncertainty have been developed. These methods, however, attempt to model uncertainty due to distributional mismatch either implicitly through model uncertainty or as data uncertainty. This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs), which explicitly models distributional uncertainty. PNs do this by parameterizing a prior distribution over predictive distributions. This work focuses on uncertainty for classification and evaluates PNs on the tasks of identifying out-of-distribution (OOD) samples and detecting misclassification on the MNIST and CIFAR-10 datasets, where they are found to outperform previous methods. Experiments on synthetic and MNIST data show that, unlike previous non-Bayesian methods, PNs are able to distinguish between data and distributional uncertainty.
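The idea of a prior distribution over predictive distributions, as described in the abstract above, can be illustrated for the common case where that prior is a Dirichlet with concentration parameters alpha. The sketch below splits total uncertainty (the entropy of the expected categorical) into expected data uncertainty and the remaining mutual information, which plays the role of distributional uncertainty. The Dirichlet parameterization, the function names, and the hand-rolled digamma approximation are assumptions of this sketch, not code from the paper.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def dirichlet_uncertainties(alpha):
    """Return (total, expected_data, distributional) uncertainty in nats."""
    alpha0 = sum(alpha)
    mean = [a / alpha0 for a in alpha]
    # Entropy of the expected categorical distribution.
    total = -sum(m * math.log(m) for m in mean if m > 0.0)
    # Expected entropy of a categorical drawn from the Dirichlet.
    expected_data = -sum(
        m * (digamma(a + 1.0) - digamma(alpha0 + 1.0)) for m, a in zip(mean, alpha)
    )
    return total, expected_data, total - expected_data


flat = dirichlet_uncertainties([1.0, 1.0, 1.0])         # little evidence
sharp = dirichlet_uncertainties([100.0, 100.0, 100.0])  # confident class overlap

print(flat)
print(sharp)
```

Both settings have the same expected prediction and therefore the same total uncertainty, but the flat Dirichlet assigns more of it to the distributional term, while the sharp Dirichlet attributes almost all of it to data uncertainty.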
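This paper addresses some of the challenges in adopting machine learning in insurance as model deployment becomes democratized. The first challenge is reducing the labeling effort (and hence focusing on data quality) with the help of active learning, a feedback loop between model inference and an oracle: since unlabeled data is usually abundant in insurance, active learning can become a significant asset in reducing the labeling cost. To that end, this paper sketches out various classical active learning methodologies before studying their empirical impact on both synthetic and real datasets. Another key challenge in insurance is the fairness issue in model inference. We introduce and integrate a post-processing fairness method for multi-class tasks into this active learning framework to address both issues. Finally, numerical experiments on unfair datasets highlight that the proposed setup offers a good compromise between model accuracy and fairness.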