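For many applications, analyzing the uncertainty of a machine learning model is indispensable. While research on uncertainty quantification (UQ) techniques is very advanced for computer vision applications, UQ methods for spatio-temporal data are less studied. In this paper, we focus on models for online handwriting recognition, one particular type of spatio-temporal data. The data is observed from a sensor-enhanced pen, with the goal of classifying written characters. We conduct a broad evaluation of aleatoric (data) and epistemic (model) UQ based on two prominent techniques for Bayesian inference, Stochastic Weight Averaging-Gaussian (SWAG) and Deep Ensembles. Besides giving a better understanding of the model, UQ techniques can detect out-of-distribution data and domain shifts when combining right-handed and left-handed writers (an underrepresented group).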
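The split between aleatoric and epistemic uncertainty for an ensemble classifier can be illustrated with a common entropy-based decomposition. The sketch below is an assumption for illustration, not necessarily the measure used in the paper above: the function names `entropy` and `decompose_uncertainty` are hypothetical, and the inputs are made-up class-probability vectors, one per ensemble member.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for a single input.

    member_probs: one class-probability vector per ensemble member.
    Returns (total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual members,
      epistemic = total - aleatoric (disagreement between members).
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(p[c] for p in member_probs) / n_members for c in range(n_classes)
    ]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(p) for p in member_probs) / n_members
    return total, aleatoric, total - aleatoric


# Members agree that the input is ambiguous: uncertainty is purely aleatoric.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
# Members are individually certain but contradict each other: purely epistemic.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

In both toy cases the total uncertainty is the same; only the decomposition shows whether it comes from the data or from disagreement between models.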
translated by 谷歌翻译
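Purpose. Handwriting is one of the most frequently occurring patterns in everyday life, and it comes with challenging applications such as handwriting recognition (HWR), writer identification, and signature verification. In contrast to offline HWR, which only uses spatial information (i.e., images), online HWR (OnHWR) uses richer spatio-temporal information (i.e., trajectory data or inertial data). While many offline HWR datasets exist, only little data is available for developing OnHWR methods on paper, because this requires hardware-integrated pens. Methods. This paper presents data and benchmark models for real-time sequence-to-sequence (seq2seq) learning and single character-based recognition. Our data is recorded by a sensor-enhanced ballpoint pen, yielding sensor data streams from triaxial accelerometers, a gyroscope, a magnetometer, and a force sensor at 100 Hz. We propose a variety of datasets, including equations and words, for both writer-dependent and writer-independent tasks. Our datasets allow a comparison between classical OnHWR on tablets and on paper with sensor-enhanced pens. We provide an evaluation benchmark for seq2seq and single character-based HWR using recurrent and temporal convolutional networks and Transformers, combined with a connectionist temporal classification (CTC) loss and cross-entropy (CE) losses. Results. Our convolutional network combined with BiLSTMs outperforms Transformer-based architectures, is on par with InceptionTime for sequence-based classification tasks, and yields better results than 28 state-of-the-art techniques. Time-series augmentation methods improve the sequence-based task, and we show that CE variants can improve the single classification task.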
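The inability of artificial neural networks to assess the uncertainty of their predictions is an impediment to their widespread use. We distinguish two types of learnable uncertainty: model uncertainty due to a lack of training data, and noise-induced observational uncertainty. Bayesian neural networks use solid mathematical foundations to learn the model uncertainty of their predictions. The observational uncertainty can be calculated by adding one layer to these networks and augmenting their loss functions. Our contribution is to apply these uncertainty concepts to predictive process monitoring tasks, training uncertainty-based models to predict the remaining time and outcomes. Our experiments show that uncertainty estimates allow more and less accurate predictions to be differentiated and confidence intervals to be constructed in both regression and classification tasks. These conclusions remain true even in the early stages of running processes. Moreover, the deployed techniques are fast and produce more accurate predictions. The learned uncertainty could increase users' confidence in their process prediction systems, promote better cooperation between humans and these systems, and enable earlier implementations with smaller datasets.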
Several recent works find empirically that the average test error of deep neural networks can be estimated via the prediction disagreement of models, which does not require labels. In particular, Jiang et al. (2022) show for the disagreement between two separately trained networks that this "Generalization Disagreement Equality" follows from the well-calibrated nature of deep ensembles under the notion of a proposed "class-aggregated calibration." In this reproduction, we show that the suggested theory might be impractical because a deep ensemble's calibration can deteriorate as prediction disagreement increases, which is precisely when the coupling of test error and disagreement is of interest, while labels are needed to estimate the calibration on new datasets. Further, we simplify the theoretical statements and proofs, showing them to be straightforward within a probabilistic context, unlike the original hypothesis space view employed by Jiang et al. (2022).
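The quantity being compared with test error can be made concrete with a small sketch. It computes the label-free disagreement rate between two models and, separately, each model's labeled error rate. The helper names and the toy predictions are assumptions chosen for illustration; they are not data from the paper.

```python
def disagreement_rate(preds_a, preds_b):
    """Fraction of inputs on which two models predict different labels.

    No ground-truth labels are required.
    """
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and equally long")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def error_rate(preds, labels):
    """Fraction of inputs a model gets wrong; this does require labels."""
    if len(preds) != len(labels) or not preds:
        raise ValueError("inputs must be non-empty and equally long")
    return sum(p != y for p, y in zip(preds, labels)) / len(preds)


# Made-up predictions of two separately trained models on eight inputs.
labels = [0, 1, 2, 1, 0, 2, 1, 0]
preds_a = [0, 1, 2, 0, 0, 2, 1, 1]
preds_b = [0, 1, 1, 1, 0, 2, 1, 1]

disagreement = disagreement_rate(preds_a, preds_b)
mean_error = (error_rate(preds_a, labels) + error_rate(preds_b, labels)) / 2
```

In this hand-picked toy case the two numbers coincide; whether they coincide on real data is exactly what depends on the calibration property discussed in the abstract.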
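Data-driven forecasts of air quality have recently achieved more accurate short-term predictions. Despite this success, most current data-driven solutions lack a proper quantification of model uncertainty that communicates how much to trust the forecasts. Recently, several practical tools for estimating uncertainty have been developed in probabilistic deep learning. However, there have been no empirical applications or extensive comparisons of these tools in the domain of air quality forecasting. This work therefore applies state-of-the-art uncertainty quantification techniques in a real-world setting of air quality forecasts. Through extensive experiments, we describe how to train probabilistic models and evaluate their predictive uncertainties based on empirical performance, reliability of the confidence estimates, and practical applicability. We also propose improving these models using "free" adversarial training and by exploiting the temporal and spatial correlations inherent in air quality data. Our experiments show that the proposed models quantify uncertainty in data-driven air quality forecasts better than previous work. Overall, Bayesian neural networks provide more reliable uncertainty estimates but can be challenging to implement and scale. Other scalable methods, such as deep ensembles, Monte Carlo (MC) dropout, and stochastic weight averaging-Gaussian (SWAG), can perform well if applied correctly, but with different tradeoffs and slight variations in performance metrics. Finally, our results show the practical impact of uncertainty estimation and demonstrate that probabilistic models are indeed more suitable for making informed decisions. Code and datasets are available at https://github.com/abdulmajid-murad/deep_probabilistic_forecast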
There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations. On the other hand, epistemic uncertainty accounts for uncertainty in the model, that is, uncertainty which can be explained away given enough data. Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks. For this we present a Bayesian deep learning framework combining input-dependent aleatoric uncertainty together with epistemic uncertainty. We study models under the framework with per-pixel semantic segmentation and depth regression tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, also giving new state-of-the-art results on segmentation and depth regression benchmarks.
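The idea of "learned attenuation" can be sketched for regression with a Gaussian likelihood whose log-variance is predicted per input. The function below is a minimal illustration under that assumption; the name `attenuated_l2_loss` and the toy numbers are hypothetical, and a real model would produce `means` and `log_vars` as network outputs.

```python
import math


def attenuated_l2_loss(targets, means, log_vars):
    """Average heteroscedastic Gaussian loss (up to an additive constant).

    Each residual is down-weighted by exp(-log_var), and the 0.5 * log_var
    term penalizes the model for claiming large noise everywhere.
    """
    if not (len(targets) == len(means) == len(log_vars)) or not targets:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for y, mu, s in zip(targets, means, log_vars):
        total += 0.5 * math.exp(-s) * (y - mu) ** 2 + 0.5 * s
    return total / len(targets)


targets = [1.0, 2.0]
means = [1.0, 4.0]  # the second prediction is far off

# Unit variance everywhere: the loss reduces to half the mean squared error.
plain = attenuated_l2_loss(targets, means, [0.0, 0.0])
# Predicting a larger variance for the noisy point attenuates its residual.
attenuated = attenuated_l2_loss(targets, means, [0.0, math.log(4.0)])
```

Predicting the log-variance rather than the variance keeps the loss numerically stable, since `exp(-s)` is always positive and no division by zero can occur.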
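In this work we use variational inference to quantify the degree of uncertainty in deep learning model predictions for radio galaxy classification. We show that the level of model posterior variance for individual test samples is correlated with human uncertainty when labelling radio galaxies. We explore model performance and uncertainty calibration for a variety of weight priors and show that a sparse prior produces better-calibrated uncertainty estimates. Using the posterior distributions of individual weights, we show that we can prune 30% of the fully connected layer weights without significant loss of performance by removing the weights with the lowest signal-to-noise ratio (SNR). We demonstrate that a larger degree of pruning can be achieved using a ranking based on Fisher information, but we note that both pruning methods affect the uncertainty calibration for Fanaroff-Riley type I and type II radio galaxies. Finally, we show that, like other work in this field, we experience a cold posterior effect, whereby the posterior must be down-weighted to achieve good predictive performance. We examine whether adapting the cost function to accommodate model misspecification can compensate for this effect, but find that it does not make a significant difference. We also examine the effect of principled data augmentation and find that it improves on the baseline but likewise does not compensate for the observed effect. We interpret this as the cold posterior effect arising from the overly effective curation of our training sample, which leads to likelihood misspecification, and we raise this as a potential issue for radio galaxy classification in the future.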
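This paper provides both an introduction to and a detailed overview of the principles and practice of classifier calibration. A well-calibrated classifier correctly quantifies the level of uncertainty or confidence associated with its instance-wise predictions. This is essential for critical applications, optimal decision making, cost-sensitive classification, and some types of context change. Calibration research has a rich history that predates the birth of machine learning as an academic field by decades. However, a recent increase in interest in calibration has led to new methods and to the extension from the binary to the multiclass setting. The space of options and issues to consider is large, and navigating it requires the right set of concepts and tools. We provide both introductory material and up-to-date technical details on the main concepts and methods, including proper scoring rules and other evaluation metrics, visualisation approaches, a comprehensive account of post-hoc calibration methods for binary and multiclass classification, and several advanced topics.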
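The performance of a machine learning model degrades when it is applied to data from a domain that is similar to, but different from, the one it was originally trained on. To mitigate this domain shift problem, domain adaptation (DA) techniques search for an optimal transformation that converts the (current) input data from a source domain to a target domain, in order to learn a domain-invariant representation that reduces the domain discrepancy. This paper proposes a novel supervised DA method based on two steps. First, we search for an optimal class-dependent transformation from the source to the target domain using a few samples. We consider optimal transport methods such as the earth mover's distance, Sinkhorn transport, and correlation alignment. Second, we use embedding similarity techniques to select the corresponding transformation at inference time. We use correlation metrics and higher-order moment matching techniques. We conduct an extensive evaluation on time-series datasets with domain shift, including simulated data and various online handwriting datasets, to demonstrate the performance.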
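Quantifying the uncertainty of supervised learning models plays an important role in making more reliable predictions. Epistemic uncertainty, which is usually due to insufficient knowledge about the model, can be reduced by collecting more data or refining the learning model. Over the last few years, scholars have proposed many techniques for handling epistemic uncertainty, which can be roughly grouped into two categories: Bayesian and ensemble. This paper provides a comprehensive review of epistemic uncertainty learning techniques in supervised learning over the last five years. We first decompose epistemic uncertainty into bias and variance terms. We then introduce a hierarchical categorization of epistemic uncertainty learning techniques along with their representative models. In addition, several applications, such as computer vision (CV) and natural language processing (NLP), are presented, followed by a discussion of research gaps and possible future research directions.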
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
Objective: Convolutional neural networks (CNNs) have demonstrated promise in automated cardiac magnetic resonance image segmentation. However, when using CNNs in a large real-world dataset, it is important to quantify segmentation uncertainty and identify segmentations which could be problematic. In this work, we performed a systematic study of Bayesian and non-Bayesian methods for estimating uncertainty in segmentation neural networks. Methods: We evaluated Bayes by Backprop, Monte Carlo Dropout, Deep Ensembles, and Stochastic Segmentation Networks in terms of segmentation accuracy, probability calibration, uncertainty on out-of-distribution images, and segmentation quality control. Results: We observed that Deep Ensembles outperformed the other methods except for images with heavy noise and blurring distortions. We showed that Bayes by Backprop is more robust to noise distortions while Stochastic Segmentation Networks are more resistant to blurring distortions. For segmentation quality control, we showed that segmentation uncertainty is correlated with segmentation accuracy for all the methods. With the incorporation of uncertainty estimates, we were able to reduce the percentage of poor segmentation to 5% by flagging 31–48% of the most uncertain segmentations for manual review, substantially lower than random review without using neural network uncertainty (reviewing 75–78% of all images). Conclusion: This work provides a comprehensive evaluation of uncertainty estimation methods and shows that Deep Ensembles outperformed other methods in most cases. Significance: Neural network uncertainty measures can help identify potentially inaccurate segmentations and alert users for manual review.
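A biosignal is a signal that can be measured continuously from the human body, such as respiratory sounds, heart activity (ECG), or brain waves (EEG). Based on such signals, machine learning models have been developed with very promising performance for automatic disease detection and health status monitoring. However, dataset shift, i.e., the data distribution at inference differing from the training distribution, is not uncommon in real biosignal-based applications. To improve robustness, probabilistic models with uncertainty quantification are adopted to capture how reliable a prediction is. Yet assessing the quality of the estimated uncertainty remains a challenge. In this work, we propose a framework to evaluate how well the estimated uncertainty captures different types of biosignal dataset shift. In particular, we use three classification tasks based on respiratory sounds and electrocardiography signals to benchmark five representative uncertainty quantification methods. Extensive experiments show that, although ensemble and Bayesian models can provide relatively better uncertainty estimates under dataset shift, all tested models fail to deliver on the promise of trustworthy prediction and model calibration. Our work paves the way for a comprehensive evaluation of any newly developed biosignal classifier.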
Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a variety of factors including sample bias and non-stationarity. In such settings, well calibrated uncertainty estimates convey information about when a model's output should (or should not) be trusted. Many probabilistic deep learning methods, including Bayesian and non-Bayesian methods, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous large-scale empirical comparison of these methods under dataset shift. We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration. We find that traditional post-hoc calibration does indeed fall short, as do several other previous methods. However, some methods that marginalize over models give surprisingly strong results across a broad spectrum of tasks.
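Calibration in classification benchmarks of this kind is often summarized by the expected calibration error (ECE). The sketch below is one common formulation with equal-width confidence bins, written as an illustration only: the function name, the binning convention (left-closed bins, with a confidence of 1.0 assigned to the last bin), and the toy inputs are assumptions, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy per bin.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: 1 if the prediction was right, else 0.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and equally long")
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += hit
    n = len(confidences)
    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, hit_sums):
        if count:
            ece += (count / n) * abs(hits / count - conf_sum / count)
    return ece


# Made-up predictions: four at 90% confidence (three right), two at 60% (one right).
ece = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0, 1, 0]
)
```

A model whose accuracy in every bin matches its average confidence in that bin gets an ECE of zero; tracking how this value changes on shifted data is one way to observe calibration degrading under dataset shift.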
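Uncertainty is an essential consideration in time-series forecasting tasks. In this work, we focus specifically on quantifying the uncertainty of traffic forecasts. To this end, we develop Deep Spatio-Temporal Uncertainty Quantification (DeepSTUQ), which can estimate both aleatoric and epistemic uncertainty. We first use a spatio-temporal model to capture the complex spatio-temporal correlations in traffic data. Subsequently, two independent sub-networks that maximize the heterogeneous log-likelihood are developed to estimate aleatoric uncertainty. To estimate epistemic uncertainty, we combine the merits of variational inference and deep ensembling by integrating Monte Carlo dropout with an Adaptive Weight Averaging re-training method. Finally, we propose a post-processing calibration approach based on temperature scaling, which improves the model's ability to generalize when estimating uncertainty. Extensive experiments on four public datasets show that the proposed method outperforms state-of-the-art methods in terms of both point prediction and uncertainty quantification.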
The ability to estimate epistemic uncertainty is often crucial when deploying machine learning in the real world, but modern methods often produce overconfident, uncalibrated uncertainty predictions. A common approach to quantify epistemic uncertainty, usable across a wide class of prediction models, is to train a model ensemble. In a naive implementation, the ensemble approach has high computational cost and high memory demand. This challenges in particular modern deep learning, where even a single deep network is already demanding in terms of compute and memory, and has given rise to a number of attempts to emulate the model ensemble without actually instantiating separate ensemble members. We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation (FiLM). That technique was originally developed for multi-task learning, with the aim of decoupling different tasks. We show that the idea can be extended to uncertainty quantification: by modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity, and consequently well-calibrated estimates of epistemic uncertainty, with low computational overhead in comparison. Empirically, FiLM-Ensemble outperforms other implicit ensemble methods, and it comes very close to the upper bound of an explicit ensemble of networks (sometimes even beating it), at a fraction of the memory cost.
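Uncertainty quantification is crucial for robotic perception, because overconfident estimators or point estimators can lead to collisions and damage to the environment and the robot. In this paper, we evaluate scalable approaches to uncertainty quantification in single-view supervised depth learning, specifically MC dropout and deep ensembles. For MC dropout in particular, we explore the effect of dropout at different levels of the architecture. We show that adding dropout in all layers of the encoder gives better results than other variations found in the literature. This configuration performs similarly to deep ensembles with a much lower memory footprint, which is relevant for applications. Finally, we explore the use of depth uncertainty for pseudo-RGBD ICP and demonstrate its potential to estimate accurate two-view relative motion at real scale.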
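Deep learning (DL) has shown great potential in digital pathology applications. The robustness of a diagnostic DL-based solution is essential for safe clinical deployment. In this work, we evaluate whether adding uncertainty estimates to DL predictions in digital pathology can increase their value for clinical applications, either by boosting general predictive performance or by detecting mispredictions. We compare the effectiveness of model-integrated methods (MC dropout and deep ensembles) with a model-agnostic approach (test-time augmentation, TTA). Moreover, four uncertainty metrics are compared. Our experiments focus on two domain shift scenarios: a shift to a different medical center and a shift to an underrepresented subtype of cancer. Our results show that uncertainty estimates can add some reliability and reduce sensitivity to the choice of classification threshold. While advanced metrics and deep ensembles perform best in our comparison, their added value over simpler metrics and TTA is small. Importantly, the benefit of all evaluated uncertainty estimation methods is diminished by domain shift.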
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.
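The moment-tracking idea can be sketched in a few lines. The class below is a simplified, diagonal-only illustration: it keeps running first and second moments of weight vectors collected during training and samples from the resulting Gaussian. It omits the low-rank covariance term described in the abstract, and the class name, method names, and toy weight vectors are assumptions made for this example.

```python
import random


class DiagonalSWAG:
    """Running Gaussian approximation over weight vectors (diagonal only)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim     # running first moment (the SWA solution)
        self.sq_mean = [0.0] * dim  # running second moment

    def collect(self, weights):
        """Update both moments with one snapshot of the weights."""
        self.n += 1
        for i, w in enumerate(weights):
            self.mean[i] += (w - self.mean[i]) / self.n
            self.sq_mean[i] += (w * w - self.sq_mean[i]) / self.n

    def variance(self):
        """Per-weight variance, clamped at zero against rounding error."""
        return [max(s - m * m, 0.0) for m, s in zip(self.mean, self.sq_mean)]

    def sample(self, rng, scale=1.0):
        """Draw one weight vector from N(mean, scale * diag(variance))."""
        return [
            m + (scale * v) ** 0.5 * rng.gauss(0.0, 1.0)
            for m, v in zip(self.mean, self.variance())
        ]


# Two made-up weight snapshots standing in for SGD iterates.
swag = DiagonalSWAG(dim=2)
swag.collect([1.0, 0.0])
swag.collect([3.0, 0.0])
draw = swag.sample(random.Random(0))
```

For Bayesian model averaging, one would load each sampled weight vector into the network and average the resulting predictions; that step depends on the model and is left out here.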
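Monitoring and managing Earth's forests in an informed manner is an important requirement for addressing challenges such as biodiversity loss and climate change. While traditional in situ or aerial campaigns for forest assessment provide accurate data for analysis at the regional level, scaling them to entire countries and beyond at high resolution is hardly possible. In this work, we propose a Bayesian deep learning approach to estimate forest structure variables at country scale with 10-meter resolution, using freely available satellite imagery as input. Our method jointly transforms Sentinel-2 optical images and Sentinel-1 synthetic aperture radar images into maps of five different forest structure variables: 95th height percentile, mean height, density, Gini coefficient, and fractional cover. We train and test our model on data from 41 airborne laser scanning missions in Norway and demonstrate that it generalizes to unseen test regions, achieving normalized mean errors between 11% and 15%, depending on the variable. Our work is also the first to propose a Bayesian deep learning approach for predicting forest structure variables with well-calibrated uncertainty estimates. These increase the trustworthiness of the model and its suitability for downstream tasks that require reliable confidence estimates, such as informed decision making. We present an extensive set of experiments to validate the accuracy of the predicted maps as well as the quality of the predicted uncertainties. To demonstrate scalability, we provide Norway-wide maps of the five forest structure variables.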