We propose a new method for performing approximate Bayesian inference in complex models such as Bayesian neural networks. The method is more scalable to large data than Markov chain Monte Carlo, admits more expressive models than variational inference, and does not rely on adversarial training (or density-ratio estimation). We adopt the recent approach of constructing two models: (1) a primary model, tasked with performing regression or classification; and (2) a secondary, expressive (e.g. implicit) model that defines an approximate posterior distribution over the parameters of the primary model. However, we optimise the parameters of the posterior model via gradient descent according to a Monte Carlo estimate of the posterior predictive distribution, which is our only approximation (aside from the posterior model itself). Only a likelihood needs to be specified, and it can take various forms, such as loss functions and synthetic likelihoods, thereby providing a form of likelihood-free method. Furthermore, we formulate the method such that the posterior samples can be either independent of, or conditionally dependent upon, the inputs of the primary model. The latter approach is shown to be capable of increasing the apparent complexity of the primary model, which we argue is useful in applications such as surrogate and physics-based models. To promote the idea that the Bayesian paradigm offers more than just uncertainty quantification, we demonstrate uncertainty quantification, multimodality, and an application with a neural network architecture that achieves state-of-the-art predictions.
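A minimal sketch of the training signal this kind of method uses (the toy model and all names are illustrative, not the authors' implementation): an auxiliary sampler network maps noise to primary-model weights, and its parameters are trained by gradient descent on a Monte Carlo estimate of the negative log posterior predictive.

```python
import math
import torch

def main_model(x, theta):
    # Toy primary model: a linear regressor whose weights are produced
    # by the auxiliary sampler rather than trained directly.
    w, b = theta[:-1], theta[-1]
    return x @ w + b

n_weights = 4                                    # 3 inputs + 1 bias (toy choice)
sampler = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, n_weights))

def neg_mc_log_predictive(x, y, k=16):
    thetas = sampler(torch.randn(k, 8))          # k approximate posterior draws
    preds = torch.stack([main_model(x, t) for t in thetas])   # (k, n)
    log_lik = -0.5 * (preds - y) ** 2            # Gaussian log-likelihood, unit variance
    # Monte Carlo posterior predictive: log (1/k) sum_k p(y_i | theta_k), summed over i
    return -(torch.logsumexp(log_lik, dim=0) - math.log(k)).sum()

x, y = torch.randn(32, 3), torch.randn(32)
opt = torch.optim.Adam(sampler.parameters(), lr=1e-3)
loss = neg_mc_log_predictive(x, y)
opt.zero_grad(); loss.backward(); opt.step()
```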
Modern deep learning methods constitute incredibly powerful tools for tackling a myriad of challenging problems. However, because deep learning methods operate as black boxes, the uncertainty associated with their predictions is often difficult to quantify. Bayesian statistics offers a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This tutorial provides an overview of the relevant literature and a complete toolset to design, implement, train, use, and evaluate Bayesian neural networks, i.e., stochastic artificial neural networks trained using Bayesian methods.
Statistical models are central to machine learning, with broad applicability across a range of downstream tasks. The models are typically controlled by free parameters that are estimated from data by maximum-likelihood estimation. However, when faced with real-world data sets, many models run into a critical problem: they are formulated in terms of fully observed data, whereas in practice data sets are plagued by missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method for estimating the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as VAEs and normalising flows from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.
Implicit processes (IPs) represent a flexible framework that can be used to describe a wide variety of models, from Bayesian neural networks, neural samplers, and data generators to many others. IPs also allow for approximate inference in function space. This change of formulation solves intrinsic degeneracy problems of parameter-space approximate inference, namely the high number of parameters and their strong dependencies in large models. To this end, previous works in the literature have attempted to employ IPs both to set up the prior and to approximate the resulting posterior. However, this has proven to be a challenging task. Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods that produce flexible predictive distributions by using another IP to approximate the posterior process cannot tune the prior IP to the observed data. We propose here the first method that can accomplish both goals. For this, we rely on an inducing-point representation of the prior IP, as is often done in the context of sparse Gaussian processes. The result is a scalable method for approximate inference with IPs that can tune the prior IP parameters to the data and provides accurate non-Gaussian predictive distributions.
Approximate Bayesian deep learning methods hold significant promise for addressing several issues that arise when deploying deep learning components in intelligent systems, including mitigating the occurrence of overconfident errors and providing enhanced robustness to out-of-distribution examples. However, the computational requirements of existing approximate Bayesian inference methods can make them ill-suited for deployment in intelligent IoT systems that include lower-powered edge devices. In this paper, we present a range of approximate Bayesian inference methods for supervised deep learning and highlight the challenges and opportunities of applying these methods on current edge hardware. We highlight several potential solutions for decreasing model storage requirements and improving computational scalability, including model pruning and distillation methods.
Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. In practice, however, efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated with big data and high-dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior-encoding variational autoencoder ($\pi$VAE). $\pi$VAE is finitely exchangeable and Kolmogorov consistent, and thus is a continuous stochastic process. We use $\pi$VAE to learn low-dimensional embeddings of function classes. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions to enable statistical inference (such as the integral of a log Gaussian process). For popular tasks, such as spatial interpolation, $\pi$VAE achieves state-of-the-art performance in terms of both accuracy and computational efficiency. Perhaps most usefully, we demonstrate that the learned low-dimensional, independently distributed latent space representation provides an elegant and scalable means of performing Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.
We develop an optimization algorithm suitable for Bayesian learning in complex models. Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations. It applies to the class of exponential-family variational posterior distributions; we discuss the Gaussian case in detail, for which the updates take a rather simple form. Our Quasi Black-box Variational Inference (QBVI) framework is readily applicable to a wide class of Bayesian inference problems and is simple to implement, as the updates of the variational posterior involve neither gradients with respect to the model parameters nor the prescription of the Fisher information matrix. We develop QBVI under different hypotheses for the posterior covariance matrix, discuss details of its robust and feasible implementation, and provide a number of real-world applications to demonstrate its effectiveness.
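As a rough illustration of black-box variational updates that need only log-joint evaluations, here is a generic score-function sketch for a diagonal Gaussian posterior. This is not the QBVI update itself, and the toy `log_joint` target is hypothetical.

```python
import numpy as np

def log_joint(theta):
    # Hypothetical stand-in for log p(data, theta): a standard normal target.
    return -0.5 * np.sum(theta ** 2)

def black_box_step(mu, log_var, lr=0.05, n_samples=64):
    """One update of a diagonal Gaussian q = N(mu, diag(exp(log_var))) using
    only log-joint evaluations, via grad E_q[f] = E_q[f * grad log q]."""
    sigma2 = np.exp(log_var)
    eps = np.random.randn(n_samples, mu.size)
    theta = mu + np.sqrt(sigma2) * eps
    f = np.array([log_joint(t) for t in theta])
    f -= f.mean()                                  # baseline for variance reduction
    grad_mu = (f[:, None] * (theta - mu) / sigma2).mean(axis=0)
    grad_lv = (f[:, None] * 0.5 * (eps ** 2 - 1.0)).mean(axis=0) + 0.5  # + entropy term
    # Crude natural-gradient scaling for the mean (the Fisher block is 1/sigma2).
    return mu + lr * sigma2 * grad_mu, log_var + lr * grad_lv

mu, log_var = np.zeros(2), np.zeros(2)
for _ in range(200):
    mu, log_var = black_box_step(mu, log_var)
```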
Compared to the point estimates calculated by standard neural networks, Bayesian neural networks (BNNs) provide probability distributions over the output predictions and model parameters, i.e., the weights. Training the weight distribution of a BNN, however, is more involved due to the intractability of the underlying Bayesian inference problem and thus requires efficient approximations. In this paper, we propose a novel approach for BNN learning via closed-form Bayesian inference. For this purpose, the calculation of the predictive distribution of the output and the update of the weight distribution are treated as Bayesian filtering and smoothing problems, where the weights are modeled as Gaussian random variables. This allows closed-form expressions for training the network's parameters in a sequential/online fashion without gradient descent. We demonstrate our method on several UCI datasets and compare it to the state of the art.
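To make the filtering view concrete, here is the linear-Gaussian special case of such a closed-form update (a sketch, not the paper's full method, which also has to handle nonlinear layers): observing one data point updates a Gaussian weight posterior by a standard Kalman step, with no gradients involved.

```python
import numpy as np

def kalman_weight_update(m, P, x, y, obs_var=0.1):
    """Closed-form online update of a Gaussian weight posterior N(m, P) for a
    linear observation y = w.x + noise: the linear-Gaussian special case of
    treating network training as Bayesian filtering."""
    Px = P @ x
    s = x @ Px + obs_var             # innovation variance
    k = Px / s                       # Kalman gain
    m = m + k * (y - m @ x)          # updated posterior mean
    P = P - np.outer(k, Px)          # updated posterior covariance
    return m, P

m, P = np.zeros(3), np.eye(3)        # prior over 3 weights
for x, y in [(np.ones(3), 1.0), (np.arange(3.0), 2.0)]:
    m, P = kalman_weight_update(m, P, x, y)
```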
When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms have difficulty moving between modes, and default variational or mode-based approximate inference will understate posterior uncertainty. Moreover, even if the most important modes are found, it is difficult to evaluate their relative weights in the posterior. Here we propose an approach that uses parallel runs of MCMC, variational, or mode-based inference to hit as many modes or separated regions as possible, and then combines these approximations using Bayesian stacking, a scalable method for constructing a weighted average of distributions. The result of stacking draws from multimodal posterior distributions minimizes cross-validated prediction error and represents posterior uncertainty better than variational inference, but it is not necessarily equivalent, even asymptotically, to fully Bayesian inference. We present theoretical consistency results in which the stacked inference approximates the true data-generating process under a misspecified model and a non-mixing sampler, with predictive performance superior to full Bayesian inference; hence this can be seen as a blessing rather than a curse under model misspecification. We demonstrate practical implementation in several model families: latent Dirichlet allocation, Gaussian process regression, hierarchical regression, horseshoe variable selection, and neural networks.
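A minimal sketch of the stacking step, assuming held-out log predictive densities are already available from K parallel runs; the optimiser and the softmax parameterisation of the simplex are implementation choices, not the paper's code.

```python
import numpy as np
from scipy.optimize import minimize

def stacking_weights(lpd):
    """Combine K parallel posterior approximations by stacking.

    lpd: (n_points, K) array of held-out log predictive densities, one
    column per run. Returns simplex weights maximizing the average log
    score of the weighted mixture of predictive distributions."""
    n, K = lpd.shape

    def neg_log_score(z):
        w = np.exp(z - z.max()); w /= w.sum()        # softmax -> simplex
        m = lpd.max(axis=1, keepdims=True)           # stable log-mixture
        mix = m.squeeze(1) + np.log(np.exp(lpd - m) @ w)
        return -mix.mean()

    res = minimize(neg_log_score, np.zeros(K), method="L-BFGS-B")
    w = np.exp(res.x - res.x.max())
    return w / w.sum()

# Toy usage: three runs, the middle one clearly best on held-out data.
w = stacking_weights(np.column_stack([-2 * np.ones(50), -1 * np.ones(50), -3 * np.ones(50)]))
```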
Gravitational wave (GW) detection is now commonplace, and as the sensitivity of the global network of GW detectors improves, we will observe $\mathcal{O}(100)$ transient GW events per year. The current methods used to estimate their source parameters employ optimally sensitive but computationally costly Bayesian inference approaches, where typical analyses take between 6 hours and 5 days. For binary neutron star and neutron star-black hole systems, prompt counterparts are expected on timescales of 1 second to 1 minute, and the fastest methods for alerting electromagnetic (EM) follow-up observers can provide estimates in $\mathcal{O}(1)$ minutes, albeit on a limited range of key source parameters. Here we show that a conditional variational autoencoder pre-trained on binary black hole signals can return Bayesian posterior probability estimates. The training procedure need only be performed once for a given prior parameter space, and the resulting trained machine can then generate samples describing the posterior distribution $\sim 6$ orders of magnitude faster than existing techniques.
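The payoff of the amortised approach is that, after training, posterior sampling is just forward passes. A hypothetical sketch (all networks and dimensions are illustrative stand-ins for the trained conditional VAE):

```python
import torch

# Illustrative stand-ins for the trained networks of a conditional VAE:
# an encoder mapping detector data to a latent distribution, and a
# decoder mapping (latent, data) to source-parameter samples.
latent_dim, data_dim, param_dim = 8, 128, 15
encoder = torch.nn.Linear(data_dim, 2 * latent_dim)
decoder = torch.nn.Linear(latent_dim + data_dim, param_dim)

def sample_posterior(data, n=5000):
    """Amortised posterior sampling: one cheap forward pass per draw,
    with no per-event MCMC. `data` has shape (1, data_dim)."""
    mu, log_var = encoder(data).chunk(2, dim=-1)
    z = mu + torch.exp(0.5 * log_var) * torch.randn(n, latent_dim)
    return decoder(torch.cat([z, data.expand(n, -1)], dim=-1))

samples = sample_posterior(torch.randn(1, data_dim))   # (5000, param_dim)
```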
We investigate the efficacy of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary. To this end, we prove that expressive predictive distributions require only small amounts of stochasticity. In particular, partially stochastic networks with only $n$ stochastic biases are universal probabilistic predictors for $n$-dimensional predictive problems. In empirical investigations, we find no systematic benefit of full stochasticity across four different inference modalities and eight datasets; partially stochastic networks can match and sometimes even outperform fully stochastic networks, despite their reduced memory costs.
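A minimal sketch of a partially stochastic layer in the sense described above (an illustrative construction, assuming deterministic weights and a factorised Gaussian over only the biases):

```python
import torch

class StochasticBiasLayer(torch.nn.Module):
    """Layer whose weights are deterministic but whose biases are Gaussian
    random variables: a minimal partially stochastic construction in the
    spirit of the result that n stochastic biases can suffice."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.b_mu = torch.nn.Parameter(torch.zeros(d_out))
        self.b_rho = torch.nn.Parameter(torch.full((d_out,), -3.0))

    def forward(self, x):
        sigma = torch.nn.functional.softplus(self.b_rho)
        b = self.b_mu + sigma * torch.randn_like(sigma)  # only the bias is sampled
        return x @ self.weight.t() + b

out = StochasticBiasLayer(5, 2)(torch.randn(16, 5))      # a fresh draw per call
```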
Large multilayer neural networks trained with backpropagation have recently achieved state-of-the-art results in a wide range of problems. However, using backprop for neural net learning still has some disadvantages, e.g., having to tune a large number of hyperparameters to the data, lack of calibrated probabilistic predictions, and a tendency to overfit the training data. In principle, the Bayesian approach to learning neural networks does not have these problems. However, existing Bayesian techniques lack scalability to large dataset and network sizes. In this work we present a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP). Similar to classical backpropagation, PBP works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients. A series of experiments on ten real-world datasets show that PBP is significantly faster than other techniques, while offering competitive predictive abilities. Our experiments also show that PBP provides accurate estimates of the posterior variance on the network weights.
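The forward pass of probabilities can be illustrated by moment matching through a single linear layer (a sketch under independence assumptions; PBP additionally propagates moments through the nonlinearities and performs its gradient-based backward updates):

```python
import numpy as np

def linear_moments(m_in, v_in, m_w, v_w, m_b, v_b):
    """Propagate an input mean/variance through a linear layer with
    factorised Gaussian weights, as in a PBP-style forward pass.

    m_in, v_in: (d_in,) input mean and variance
    m_w, v_w:   (d_out, d_in) weight means and variances
    m_b, v_b:   (d_out,) bias means and variances
    Returns pre-activation mean and variance, assuming independence:
    Var[w x] = E[w]^2 Var[x] + Var[w] E[x]^2 + Var[w] Var[x]."""
    m_out = m_w @ m_in + m_b
    v_out = (m_w ** 2) @ v_in + v_w @ (m_in ** 2) + v_w @ v_in + v_b
    return m_out, v_out

m, v = linear_moments(np.ones(4), 0.1 * np.ones(4),
                      0.5 * np.ones((2, 4)), 0.05 * np.ones((2, 4)),
                      np.zeros(2), 0.01 * np.ones(2))
```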
With the increasing adoption of deep neural networks in many fields of science and engineering, modelling and estimating their uncertainty has become of primary importance. Various approaches have been studied, including Bayesian neural networks, ensembles, and deterministic approximations, among others. Despite the growing literature on uncertainty quantification in deep learning, the quality of the uncertainty estimates remains an open question. In this work, we attempt to assess the performance of several algorithms on sampling and regression tasks by evaluating the quality of the confidence intervals and how well the generated samples represent the unknown target distribution. Towards this end, several sampling and regression tasks are considered, and the selected algorithms are compared in terms of coverage probabilities, kernelized Stein discrepancy, and maximum mean discrepancy.
We apply Physics Informed Neural Networks (PINNs) to the problem of wildfire fire-front modelling. The PINN is an approach that integrates a differential equation into the optimisation loss function of a neural network to guide the neural network to learn the physics of a problem. We apply the PINN to the level-set equation, which is a Hamilton-Jacobi partial differential equation that models a fire-front with the zero-level set. This results in a PINN that simulates a fire-front as it propagates through a spatio-temporal domain. We demonstrate the agility of the PINN to learn physical properties of a fire under extreme changes in external conditions (such as wind) and show that this approach encourages continuity of the PINN's solution across time. Furthermore, we demonstrate how data assimilation and uncertainty quantification can be incorporated into the PINN in the wildfire context. This is a significant contribution to wildfire modelling, as the level-set method, a standard solver for the level-set equation, does not naturally provide this capability.
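A minimal PINN sketch for the level-set equation phi_t + s * |grad(phi)| = 0 (the network size, the constant spread rate, and the collocation sampling are illustrative choices; the paper's loss also includes data and initial/boundary terms):

```python
import torch

net = torch.nn.Sequential(                 # phi(x, y, t): the level-set function
    torch.nn.Linear(3, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def levelset_residual(xyt, speed):
    """Residual of the Hamilton-Jacobi level-set equation
    phi_t + s * |grad phi| = 0 at collocation points xyt of shape (N, 3);
    `speed` is a stand-in fire-spread-rate function of (x, y, t)."""
    xyt = xyt.requires_grad_(True)
    phi = net(xyt)
    grads = torch.autograd.grad(phi.sum(), xyt, create_graph=True)[0]
    phi_x, phi_y, phi_t = grads[:, 0], grads[:, 1], grads[:, 2]
    return phi_t + speed(xyt) * torch.sqrt(phi_x**2 + phi_y**2 + 1e-12)

# Training step: penalise the PDE residual at random collocation points.
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
xyt = torch.rand(1024, 3)
loss = levelset_residual(xyt, lambda p: 1.0).pow(2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```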
The core principle of variational inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimisation problem. This property makes VI faster than several sampling-based techniques. However, traditional VI algorithms do not scale to large data sets and cannot readily infer out-of-sample data points without re-running the optimisation process. Recent developments in the field, such as stochastic, black-box, and amortised VI, have helped address these issues. Generative modelling tasks nowadays make wide use of amortised VI for its efficiency and scalability, as it utilises a parametrised function to learn the parameters of the approximate posterior density. In this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortised VI. In addition, we provide an overview of recent trends that address problems of amortised VI, such as the amortisation gap, generalisation issues, inconsistent representation learning, and posterior collapse. Finally, we analyse alternative divergence measures that improve VI optimisation.
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation (rules for gradient backpropagation through stochastic variables) and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
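The core trick can be shown in a few lines (a minimal sketch with illustrative linear networks): the reparameterisation z = mu + sigma * eps is what lets gradients of the lower bound flow through the sampling step and back into the recognition model.

```python
import torch

encoder = torch.nn.Linear(10, 2 * 4)        # recognition model: outputs mu, log-variance
decoder = torch.nn.Linear(4, 10)            # generative model: mean of a unit-variance Gaussian

def lower_bound(x):
    mu, log_var = encoder(x).chunk(2, dim=-1)
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # differentiable sample
    log_px = -0.5 * ((x - decoder(z)) ** 2).sum(-1)            # Gaussian reconstruction term
    # Analytic KL( N(mu, sigma^2) || N(0, I) )
    kl = 0.5 * (mu ** 2 + log_var.exp() - 1.0 - log_var).sum(-1)
    return (log_px - kl).mean()

loss = -lower_bound(torch.randn(32, 10))
loss.backward()     # gradients reach both the generative and recognition models
```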
We present the GPry algorithm for fast Bayesian inference of general (non-Gaussian) posteriors with a moderate number of parameters. GPry does not need any pre-training or special hardware such as GPUs, and is intended as a drop-in replacement for traditional Monte Carlo methods for Bayesian inference. Our algorithm is based on generating a Gaussian Process surrogate model of the log-posterior, aided by a Support Vector Machine classifier that excludes extreme or non-finite values. An active learning scheme allows us to reduce the number of required posterior evaluations by two orders of magnitude compared to traditional Monte Carlo inference. Our algorithm allows for parallel evaluations of the posterior at optimal locations, further reducing wall-clock times. We significantly improve performance using properties of the posterior in our active learning scheme and for the definition of the GP prior. In particular we account for the expected dynamical range of the posterior in different dimensionalities. We test our model against a number of synthetic and cosmological examples. GPry outperforms traditional Monte Carlo methods when the evaluation time of the likelihood (or the calculation of theoretical observables) is of the order of seconds; for evaluation times of over a minute it can perform inference in days that would take months using traditional methods. GPry is distributed as an open source Python package (pip install gpry) and can also be found at https://github.com/jonaselgammal/GPry.
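For intuition, here is a generic sketch of a GP-surrogate active-learning loop of this kind. This is not GPry's algorithm or API: the acquisition rule, candidate sampling, and toy log-posterior are illustrative, and the SVM classifier and parallel proposals are omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def log_post(x):
    return -0.5 * np.sum(x ** 2)          # stand-in (expensive) log-posterior

X = list(np.random.uniform(-3, 3, size=(5, 2)))   # initial design points
y = [log_post(x) for x in X]
for _ in range(40):                                # active-learning loop
    gp = GaussianProcessRegressor(ConstantKernel() * RBF()).fit(X, y)
    cand = np.random.uniform(-3, 3, size=(512, 2))
    mu, sd = gp.predict(cand, return_std=True)
    # Acquisition: favour high surrogate posterior mass and high uncertainty.
    x_new = cand[np.argmax(mu + 2.0 * sd)]
    X.append(x_new); y.append(log_post(x_new))     # one expensive evaluation per step
```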
Neural networks have recently shown promise for likelihood-free inference, providing orders-of-magnitude speed-ups over classical methods. However, current implementations are suboptimal when estimating parameters from independent replicates. In this paper, we use a decision-theoretic framework to argue that permutation-invariant neural networks are ideally placed for constructing Bayes estimators for arbitrary models, provided that simulation from these models is straightforward. We illustrate the potential of these estimators on both conventional spatial models and highly parameterised models for spatial extremes, and show that they considerably outperform neural estimators that do not appropriately account for replication in their network design. At the same time, they are highly competitive with, and much faster than, traditional likelihood-based estimators. We apply our estimators in a spatial analysis of sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates, and bootstrap-based uncertainty quantification of those estimates, from hundreds of spatial fields in a fraction of a second.
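A minimal sketch of the permutation-invariant (DeepSets-style) architecture such estimators use (sizes and pooling choice are illustrative): each replicate is passed through an inner network, the outputs are mean-pooled so the estimate is invariant to the ordering of replicates, and an outer network maps the pooled summary to parameter estimates. Training would minimise an expected loss (e.g. MSE, targeting the posterior mean) over simulated (parameter, data) pairs.

```python
import torch

class SetEstimator(torch.nn.Module):
    """Permutation-invariant point estimator for i.i.d. replicates:
    psi encodes each replicate, mean pooling removes order dependence,
    and phi maps the pooled summary to a parameter estimate."""
    def __init__(self, d_obs, d_param, d_hidden=64):
        super().__init__()
        self.psi = torch.nn.Sequential(
            torch.nn.Linear(d_obs, d_hidden), torch.nn.ReLU(),
            torch.nn.Linear(d_hidden, d_hidden))
        self.phi = torch.nn.Sequential(
            torch.nn.Linear(d_hidden, d_hidden), torch.nn.ReLU(),
            torch.nn.Linear(d_hidden, d_param))

    def forward(self, x):                  # x: (batch, n_replicates, d_obs)
        return self.phi(self.psi(x).mean(dim=1))   # pooling gives invariance

est = SetEstimator(d_obs=1, d_param=2)(torch.randn(16, 100, 1))   # (16, 2)
```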
We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.
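A compact sketch of the weight-uncertainty machinery (an illustrative layer, not the paper's code): each forward pass draws reparameterised weights, and the per-layer KL terms would be summed and added to the data negative log-likelihood to form the variational free energy that is minimised.

```python
import torch

class BayesLinear(torch.nn.Module):
    """Linear layer with a factorised Gaussian over its weights, trained by
    minimising a compression cost (variational free energy), in the spirit
    of Bayes by Backprop."""
    def __init__(self, d_in, d_out, prior_std=1.0):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.zeros(d_out, d_in))
        self.rho = torch.nn.Parameter(torch.full((d_out, d_in), -5.0))
        self.prior_std = prior_std

    def forward(self, x):
        sigma = torch.nn.functional.softplus(self.rho)
        w = self.mu + sigma * torch.randn_like(sigma)   # reparameterised sample
        # KL(q || prior) for factorised Gaussians, stored for the loss.
        self.kl = (torch.log(self.prior_std / sigma)
                   + (sigma ** 2 + self.mu ** 2) / (2 * self.prior_std ** 2) - 0.5).sum()
        return x @ w.t()

layer = BayesLinear(5, 2)
out = layer(torch.randn(16, 5))
# Mini-batch loss would be: nll(out, targets) + layer.kl / num_batches
```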
Inverse problems are ubiquitous in nature, arising in almost all areas of science and engineering, from geophysics and climate science to astrophysics and biomechanics. One of the central challenges in solving inverse problems is tackling their ill-posed nature. Bayesian inference provides a principled approach to overcoming this by formulating the inverse problem within a statistical framework. However, it is computationally challenging when the fields to be inferred have large-dimensional discrete representations (the so-called "curse of dimensionality") and/or when prior information is available only in the form of previously acquired solutions. In this work, we present a novel method for efficient and accurate Bayesian inversion using deep generative models. Specifically, we demonstrate how using the approximate distribution learned by a generative adversarial network (GAN) in a Bayesian update, and reformulating the resulting inference problem in the low-dimensional latent space of the GAN, enables the efficient solution of large-scale Bayesian inverse problems. Our statistical framework preserves the underlying physics and is demonstrated to yield accurate results with reliable uncertainty estimates, even in the absence of information about the underlying noise model, which is a significant challenge for many existing methods. We demonstrate the effectiveness of the proposed approach on a variety of inverse problems, including both synthetic and experimentally observed data.
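A toy sketch of inference in a generator's latent space (a MAP version for brevity; the paper performs a full Bayesian update, but the dimension-reduction idea is analogous, and the generator here is an untrained stand-in):

```python
import torch

# Hypothetical trained generator standing in for a GAN: it maps a
# low-dimensional latent z to a high-dimensional field.
G = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(),
                        torch.nn.Linear(64, 256))

def invert_in_latent_space(y, steps=200, noise_var=0.01):
    """MAP sketch of Bayesian inversion in the generator's latent space:
    maximise log p(y | G(z)) + log p(z) over z, so the inference runs in
    8 dimensions instead of 256."""
    z = torch.zeros(1, 8, requires_grad=True)
    opt = torch.optim.Adam([z], lr=0.05)
    for _ in range(steps):
        loss = ((G(z) - y) ** 2).sum() / (2 * noise_var) + 0.5 * (z ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()

y_obs = torch.randn(1, 256)                # stand-in observed field
z_map = invert_in_latent_space(y_obs)
```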