随着神经网络分类器部署在现实世界应用中,它们可以可靠地检测到它们的故障至关重要。一个实际解决方案是为每个预测分配置信度分数,然后使用这些分数来过滤可能的错误分类。然而,现有的置信度量尚未充分可靠地对此作用。本文介绍了一种新的框架,可以产生用于检测错误分类错误的定量度量。此框架红色在基本分类器的顶部构建错误检测器,并估计使用高斯过程的检测分数的不确定性。在125 UCI数据集上具有其他错误检测方法的实验比较证明了这种方法是有效的。在两个概率基础分类器上进一步实现以及视觉任务中的两个大型深度学习架构进一步证实了该方法是坚固且可扩展的。第三,用分布外和对抗样本的红色的实证分析表明,该方法不仅可以检测错误,还可以使用,而且可以了解它们来自哪里。因此,红色可以使用未来更广泛地提高神经网络分类器的可信度。
translated by 谷歌翻译
深度神经网络具有令人印象深刻的性能,但是他们无法可靠地估计其预测信心,从而限制了其在高风险领域中的适用性。我们表明,应用多标签的一VS损失揭示了分类的歧义并降低了模型的过度自信。引入的Slova(单标签One-Vs-All)模型重新定义了单个标签情况的典型单VS-ALL预测概率,其中只有一个类是正确的答案。仅当单个类具有很高的概率并且其他概率可忽略不计时,提议的分类器才有信心。与典型的SoftMax函数不同,如果所有其他类的概率都很小,Slova自然会检测到分布的样本。该模型还通过指数校准进行了微调,这使我们能够与模型精度准确地对齐置信分数。我们在三个任务上验证我们的方法。首先,我们证明了斯洛伐克与最先进的分布校准具有竞争力。其次,在数据集偏移下,斯洛伐克的性能很强。最后,我们的方法在检测到分布样品的检测方面表现出色。因此,斯洛伐克是一种工具,可以在需要不确定性建模的各种应用中使用。
translated by 谷歌翻译
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However their practicality in real-time, industrial-scale applications are limited due to the high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improve uncertainty property of a single network, based on a single, deterministic representation. By formalizing the uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
translated by 谷歌翻译
人工智能的最新趋势是将验证的模型用于语言和视觉任务,这些模型已经实现了非凡的表现,但也令人困惑。因此,以各种方式探索这些模型的能力对该领域至关重要。在本文中,我们探讨了模型的可靠性,在其中我们将可靠的模型定义为一个不仅可以实现强大的预测性能,而且在许多涉及不确定性(例如选择性预测,开放式设置识别)的决策任务上,在许多决策任务上表现出色,而且表现良好。强大的概括(例如,准确性和适当的评分规则,例如在分布数据集中和分发数据集上的对数可能性)和适应性(例如,主动学习,几乎没有射击不确定性)。我们设计了40个数据集的10种任务类型,以评估视觉和语言域上可靠性的不同方面。为了提高可靠性,我们分别开发了VIT-PLEX和T5-PLEX,分别针对视觉和语言方式扩展了大型模型。 PLEX极大地改善了跨可靠性任务的最先进,并简化了传统协议,因为它可以改善开箱即用的性能,并且不需要设计分数或为每个任务调整模型。我们演示了高达1B参数的模型尺寸的缩放效果,并预处理数据集大小最多4B示例。我们还展示了PLEX在具有挑战性的任务上的功能,包括零射门的开放式识别,主动学习和对话语言理解中的不确定性。
translated by 谷歌翻译
Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a variety of factors including sample bias and non-stationarity. In such settings, well calibrated uncertainty estimates convey information about when a model's output should (or should not) be trusted. Many probabilistic deep learning methods, including Bayesian-and non-Bayesian methods, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous largescale empirical comparison of these methods under dataset shift. We present a largescale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration. We find that traditional post-hoc calibration does indeed fall short, as do several other previous methods. However, some methods that marginalize over models give surprisingly strong results across a broad spectrum of tasks.
translated by 谷歌翻译
Deep neural networks have attained remarkable performance when applied to data that comes from the same distribution as that of the training set, but can significantly degrade otherwise. Therefore, detecting whether an example is out-of-distribution (OoD) is crucial to enable a system that can reject such samples or alert users. Recent works have made significant progress on OoD benchmarks consisting of small image datasets. However, many recent methods based on neural networks rely on training or tuning with both in-distribution and out-of-distribution data. The latter is generally hard to define a-priori, and its selection can easily bias the learning. We base our work on a popular method ODIN 1 [21], proposing two strategies for freeing it from the needs of tuning with OoD data, while improving its OoD detection performance. We specifically propose to decompose confidence scoring as well as a modified input pre-processing method. We show that both of these significantly help in detection performance. Our further analysis on a larger scale image dataset shows that the two types of distribution shifts, specifically semantic shift and non-semantic shift, present a significant difference in the difficulty of the problem, providing an analysis of when ODIN-like strategies do or do not work.
translated by 谷歌翻译
Detecting test samples drawn sufficiently far away from the training distribution statistically or adversarially is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. We obtain the class conditional Gaussian distributions with respect to (low-and upper-level) features of the deep models under Gaussian discriminant analysis, which result in a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples, but not both, the proposed method achieves the state-of-the-art performances for both cases in our experiments. Moreover, we found that our proposed method is more robust in harsh cases, e.g., when the training dataset has noisy labels or small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training deep models.
translated by 谷歌翻译
深度神经网络对各种任务取得了出色的性能,但它们具有重要问题:即使对于完全未知的样本,也有过度自信的预测。已经提出了许多研究来成功过滤出这些未知的样本,但它们仅考虑狭窄和特定的任务,称为错误分类检测,开放式识别或分布外检测。在这项工作中,我们认为这些任务应该被视为根本存在相同的问题,因为理想的模型应该具有所有这些任务的检测能力。因此,我们介绍了未知的检测任务,以先前的单独任务的整合,用于严格检查深度神经网络对广谱的广泛未知样品的检测能力。为此,构建了不同尺度上的统一基准数据集,并且存在现有流行方法的未知检测能力进行比较。我们发现深度集合始终如一地优于检测未知的其他方法;但是,所有方法只针对特定类型的未知方式成功。可重复的代码和基准数据集可在https://github.com/daintlab/unknown-detection-benchmarks上获得。
translated by 谷歌翻译
Reliable application of machine learning-based decision systems in the wild is one of the major challenges currently investigated by the field. A large portion of established approaches aims to detect erroneous predictions by means of assigning confidence scores. This confidence may be obtained by either quantifying the model's predictive uncertainty, learning explicit scoring functions, or assessing whether the input is in line with the training distribution. Curiously, while these approaches all state to address the same eventual goal of detecting failures of a classifier upon real-life application, they currently constitute largely separated research fields with individual evaluation protocols, which either exclude a substantial part of relevant methods or ignore large parts of relevant failure sources. In this work, we systematically reveal current pitfalls caused by these inconsistencies and derive requirements for a holistic and realistic evaluation of failure detection. To demonstrate the relevance of this unified perspective, we present a large-scale empirical study for the first time enabling benchmarking confidence scoring functions w.r.t all relevant methods and failure sources. The revelation of a simple softmax response baseline as the overall best performing method underlines the drastic shortcomings of current evaluation in the abundance of publicized research on confidence scoring. Code and trained models are at https://github.com/IML-DKFZ/fd-shifts.
translated by 谷歌翻译
检测到分布(OOD)数据是一项任务,它正在接受计算机视觉的深度学习领域越来越多的研究注意力。但是,通常在隔离任务上评估检测方法的性能,而不是考虑串联中的潜在下游任务。在这项工作中,我们检查了存在OOD数据(SCOD)的选择性分类。也就是说,检测OOD样本的动机是拒绝它们,以便降低它们对预测质量的影响。我们在此任务规范下表明,与仅在OOD检测时进行评估时,现有的事后方法的性能大不相同。这是因为如果ID数据被错误分类,将分布分配(ID)数据与OOD数据混合在一起的问题不再是一个问题。但是,正确和不正确的预测的ID数据中的汇合变得不受欢迎。我们还提出了一种新颖的SCOD,SoftMax信息保留(SIRC)的方法,该方法通过功能不足信息来增强基于软疗法的置信度得分,以便在不牺牲正确和错误的ID预测之间的分离的情况下,可以提高其识别OOD样品的能力。在各种成像网尺度数据集和卷积神经网络体系结构上进行的实验表明,SIRC能够始终如一地匹配或胜过SCOD的基线,而现有的OOD检测方法则无法做到。
translated by 谷歌翻译
缺乏精心校准的置信度估计值使神经网络在安全至关重要的领域(例如自动驾驶或医疗保健)中不足。在这些设置中,有能力放弃对分布(OOD)数据进行预测的能力,就像正确分类分布数据一样重要。我们介绍了$ P $ -DKNN,这是一种新颖的推理程序,该过程采用了经过训练的深神经网络,并分析了其中间隐藏表示形式的相似性结构,以计算与端到端模型预测相关的$ p $值。直觉是,在潜在表示方面执行的统计测试不仅可以用作分类器,还可以提供统计上有充分根据的不确定性估计。 $ P $ -DKNN是可扩展的,并利用隐藏层学到的表示形式的组成,这使深度表示学习成功。我们的理论分析基于Neyman-Pearson的分类,并将其与选择性分类的最新进展(拒绝选项)联系起来。我们证明了在放弃预测OOD输入和保持分布输入的高精度之间的有利权衡。我们发现,$ p $ -DKNN强迫自适应攻击者制作对抗性示例(一种最差的OOD输入形式),以对输入引入语义上有意义的更改。
translated by 谷歌翻译
已知现代深度神经网络模型将错误地将分布式(OOD)测试数据分类为具有很高信心的分数(ID)培训课程之一。这可能会对关键安全应用产生灾难性的后果。一种流行的缓解策略是训练单独的分类器,该分类器可以在测试时间检测此类OOD样本。在大多数实际设置中,在火车时间尚不清楚OOD的示例,因此,一个关键问题是:如何使用合成OOD样品来增加ID数据以训练这样的OOD检测器?在本文中,我们为称为CNC的OOD数据增强提出了一种新颖的复合腐败技术。 CNC的主要优点之一是,除了培训集外,它不需要任何固定数据。此外,与当前的最新技术(SOTA)技术不同,CNC不需要在测试时间进行反向传播或结合,从而使我们的方法在推断时更快。我们与过去4年中主要会议的20种方法进行了广泛的比较,表明,在OOD检测准确性和推理时间方面,使用基于CNC的数据增强训练的模型都胜过SOTA。我们包括详细的事后分析,以研究我们方法成功的原因,并确定CNC样本的较高相对熵和多样性是可能的原因。我们还通过对二维数据集进行零件分解分析提供理论见解,以揭示(视觉和定量),我们的方法导致ID类别周围的边界更紧密,从而更好地检测了OOD样品。源代码链接:https://github.com/cnc-ood
translated by 谷歌翻译
最近,深度学习中的不确定性估计已成为提高安全至关重要应用的可靠性和鲁棒性的关键领域。尽管有许多提出的方法要么关注距离感知模型的不确定性,要么是分布式检测的不确定性,要么是针对分布校准的输入依赖性标签不确定性,但这两种类型的不确定性通常都是必要的。在这项工作中,我们提出了用于共同建模模型和数据不确定性的HETSNGP方法。我们表明,我们提出的模型在这两种类型的不确定性之间提供了有利的组合,因此在包括CIFAR-100C,ImagEnet-C和Imagenet-A在内的一些具有挑战性的分发数据集上优于基线方法。此外,我们提出了HETSNGP Ensemble,这是我们方法的结合版本,该版本还对网络参数的不确定性进行建模,并优于其他集合基线。
translated by 谷歌翻译
飞机行业不断努力在人类的努力,计算时间和资源消耗方面寻求更有效的设计优化方法。当替代模型和最终过渡到HF模型的开关机制均被正确校准时,混合替代物优化保持了高效果,同时提供快速的设计评估。前馈神经网络(FNN)可以捕获高度非线性输入输出映射,从而为飞机绩效因素提供有效的替代物。但是,FNN通常无法概括分布(OOD)样本,这阻碍了它们在关键飞机设计优化中的采用。通过Smood,我们基于平滑度的分布检测方法,我们建议用优化的FNN替代物来编码一个依赖模型的OOD指标,以产生具有选择性但可信度的预测的值得信赖的替代模型。与常规的不确定性接地方法不同,Smood利用了HF模拟的固有平滑性特性,可以通过揭示其可疑敏感性有效地暴露OOD,从而避免对OOD样品的过度自信不确定性估计。通过使用SMOOD,仅将高风险的OOD输入转发到HF模型以进行重新评估,从而以低开销成本获得更准确的结果。研究了三个飞机性能模型。结果表明,基于FNN的代理在预测性能方面优于其高斯过程。此外,在所有研究案例中,Smood的确覆盖了85%的实际OOD。当Smood Plus FNN替代物被部署在混合替代优化设置中时,它们的错误率分别降低了34.65%和计算速度的降低率分别为58.36次。
translated by 谷歌翻译
机器学习模型通常会遇到与训练分布不同的样本。无法识别分布(OOD)样本,因此将该样本分配给课堂标签会显着损害模​​型的可靠性。由于其对在开放世界中的安全部署模型的重要性,该问题引起了重大关注。由于对所有可能的未知分布进行建模的棘手性,检测OOD样品是具有挑战性的。迄今为止,一些研究领域解决了检测陌生样本的问题,包括异常检测,新颖性检测,一级学习,开放式识别识别和分布外检测。尽管有相似和共同的概念,但分别分布,开放式检测和异常检测已被独立研究。因此,这些研究途径尚未交叉授粉,创造了研究障碍。尽管某些调查打算概述这些方法,但它们似乎仅关注特定领域,而无需检查不同领域之间的关系。这项调查旨在在确定其共同点的同时,对各个领域的众多著名作品进行跨域和全面的审查。研究人员可以从不同领域的研究进展概述中受益,并协同发展未来的方法。此外,据我们所知,虽然进行异常检测或单级学习进行了调查,但没有关于分布外检测的全面或最新的调查,我们的调查可广泛涵盖。最后,有了统一的跨域视角,我们讨论并阐明了未来的研究线,打算将这些领域更加紧密地融为一体。
translated by 谷歌翻译
Deep neural networks (NNs) are powerful black box predictors that have recently achieved impressive performance on a wide spectrum of tasks. Quantifying predictive uncertainty in NNs is a challenging and yet unsolved problem. Bayesian NNs, which learn a distribution over weights, are currently the state-of-the-art for estimating predictive uncertainty; however these require significant modifications to the training procedure and are computationally expensive compared to standard (non-Bayesian) NNs. We propose an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates. Through a series of experiments on classification and regression benchmarks, we demonstrate that our method produces well-calibrated uncertainty estimates which are as good or better than approximate Bayesian NNs. To assess robustness to dataset shift, we evaluate the predictive uncertainty on test examples from known and unknown distributions, and show that our method is able to express higher uncertainty on out-of-distribution examples. We demonstrate the scalability of our method by evaluating predictive uncertainty estimates on ImageNet.
translated by 谷歌翻译
背景。通常,深度神经网络(DNN)概括了从类似于训练集的分布的样本概括。然而,当测试样本从不同的分布中抽出时,DNNS的预测是脆性和不可靠的。这是在现实世界应用中部署的主要关注点,这种行为可能以相当大的成本,例如工业生产线,自治车辆或医疗保健应用。贡献。我们将DNN中的分布(OOD)检测出来作为统计假设检测问题。在我们所提出的框架内产生的测试将证据组合来自整个网络。与以前的检测启发式不同,此框架返回每个测试样本的$ p $ -value。有保证维护I型错误(T1E - 错误地识别OOD样本为ID)进行测试数据。此外,这允许在保持T1E的同时组合多个检测器。在此框架上建立,我们建议一种基于低阶统计数据的新型程序。我们的方法在不接受的EOD基准上的最新方法实现了比较或更好的结果,而无需再培训网络参数或假设测试分配的现有知识 - 并且以计算成本的一小部分。
translated by 谷歌翻译
我们表明,著名的混音的有效性[Zhang等,2018],如果而不是将其用作唯一的学习目标,就可以进一步改善它,而是将其用作标准跨侧面损失的附加规则器。这种简单的变化不仅提供了太大的准确性,而且在大多数情况下,在各种形式的协变量转移和分布外检测实验下,在大多数情况下,混合量的预测不确定性估计质量都显着提高了。实际上,我们观察到混合物在检测出分布样本时可能会产生大量退化的性能,因为我们在经验上表现出来,因为它倾向于学习在整个过程中表现出高渗透率的模型。很难区分分布样本与近分离样本。为了显示我们的方法的功效(RegMixup),我们在视觉数据集(Imagenet&Cifar-10/100)上提供了详尽的分析和实验,并将其与最新方法进行比较,以进行可靠的不确定性估计。
translated by 谷歌翻译
Commonly used AI networks are very self-confident in their predictions, even when the evidence for a certain decision is dubious. The investigation of a deep learning model output is pivotal for understanding its decision processes and assessing its capabilities and limitations. By analyzing the distributions of raw network output vectors, it can be observed that each class has its own decision boundary and, thus, the same raw output value has different support for different classes. Inspired by this fact, we have developed a new method for out-of-distribution detection. The method offers an explanatory step beyond simple thresholding of the softmax output towards understanding and interpretation of the model learning process and its output. Instead of assigning the class label of the highest logit to each new sample presented to the network, it takes the distributions over all classes into consideration. A probability score interpreter (PSI) is created based on the joint logit values in relation to their respective correct vs wrong class distributions. The PSI suggests whether the sample is likely to belong to a specific class, whether the network is unsure, or whether the sample is likely an outlier or unknown type for the network. The simple PSI has the benefit of being applicable on already trained networks. The distributions for correct vs wrong class for each output node are established by simply running the training examples through the trained network. We demonstrate our OOD detection method on a challenging transmission electron microscopy virus image dataset. We simulate a real-world application in which images of virus types unknown to a trained virus classifier, yet acquired with the same procedures and instruments, constitute the OOD samples.
translated by 谷歌翻译
最近出现了一系列用于估计具有单个正向通行证的深神经网络中的认知不确定性的新方法,最近已成为贝叶斯神经网络的有效替代方法。在信息性表示的前提下,这些确定性不确定性方法(DUM)在检测到分布(OOD)数据的同时在推理时添加可忽略的计算成本时实现了强大的性能。但是,目前尚不清楚dums是否经过校准,可以无缝地扩展到现实世界的应用 - 这都是其实际部署的先决条件。为此,我们首先提供了DUMS的分类法,并在连续分配转移下评估其校准。然后,我们将它们扩展到语义分割。我们发现,尽管DUMS尺度到现实的视觉任务并在OOD检测方面表现良好,但当前方法的实用性受到分配变化下的校准不良而破坏的。
translated by 谷歌翻译