Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as a generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system with an unrestricted number of hidden variables. This weak assumption is useful for devising realistic models, but it renders statistical inference very difficult. The main challenge is the intractability of the likelihood function. Several likelihood-free inference methods have been proposed which share the basic idea of identifying the parameters by finding values for which the discrepancy between simulated and observed data is small. A major obstacle to using these methods is their computational cost. The cost is largely due to the need to repeatedly simulate data sets and the lack of knowledge about how the parameters affect the discrepancy. We propose a strategy which combines probabilistic modeling of the discrepancy with optimization to facilitate likelihood-free inference. The strategy is implemented using Bayesian optimization and is shown to accelerate the inference through a reduction in the number of required simulations by several orders of magnitude.
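To make the strategy concrete, here is a minimal sketch of its core loop under simplifying assumptions: a one-dimensional toy simulator, a squared-exponential GP surrogate for the discrepancy with fixed hyperparameters, and a lower-confidence-bound acquisition minimised over a grid. All names and settings are illustrative; the paper's actual implementation is considerably more refined.

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = 2.0 + 0.1 * rng.standard_normal(50)        # "observed" data, true theta = 2

def discrepancy(theta):
    y_sim = theta + 0.1 * rng.standard_normal(50)  # one run of the toy simulator
    return np.abs(y_sim.mean() - y_obs.mean())

def gp_posterior(X, y, Xs, ell=0.5, sf=1.0, noise=1e-2):
    # Squared-exponential GP regression with fixed hyperparameters.
    k = lambda A, B: sf**2 * np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ell**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = sf**2 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

grid = np.linspace(0.0, 4.0, 200)
X = list(rng.uniform(0.0, 4.0, 5))                 # small initial design
y = [discrepancy(t) for t in X]
for _ in range(20):                                # each iteration costs one simulation
    mu, sd = gp_posterior(np.array(X), np.array(y), grid)
    theta_next = grid[np.argmin(mu - 2.0 * sd)]    # explore where discrepancy may be small
    X.append(theta_next); y.append(discrepancy(theta_next))

print("parameter with the smallest observed discrepancy:", X[int(np.argmin(y))])
```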
Approximate Bayesian computation (ABC) can be used for model fitting when the likelihood function is intractable but simulating from the model is feasible. However, even a single evaluation of a complex model may take several hours, limiting the number of model evaluations available. Modelling the discrepancy between the simulated and observed data using a Gaussian process (GP) can reduce the number of model evaluations required by ABC, but the sensitivity of this approach to the specific GP formulation has not yet been thoroughly investigated. We begin with a comprehensive empirical evaluation of using GPs in ABC, including various transformations of the discrepancies and two novel GP formulations. Our results indicate that the choice of GP may significantly affect the accuracy of the estimated posterior distribution. Selection of an appropriate GP model is thus important. We formulate expected utility to measure the accuracy of classifying discrepancies below or above the ABC threshold, and show that it can be used to automate the GP model selection step. Finally, based on the understanding gained with toy examples, we fit a population genetic model for bacteria, providing insight into horizontal gene transfer events within the population and from external origins.
We present an adaptive approach to constructing Gaussian process surrogates for Bayesian inference with expensive-to-evaluate forward models. Our approach relies on a fully Bayesian treatment for training the Gaussian process model and utilizes the expected-improvement idea from Bayesian global optimization. We adaptively construct the training design by maximizing the expected improvement in the fit of the Gaussian process model to the noisy observational data. Numerical experiments on model problems with synthetic data demonstrate the effectiveness of the resulting adaptive designs, relative to fixed non-adaptive designs, in obtaining accurate posterior estimates at a fraction of the cost of inference with the forward model.
We consider the problem of learning the level set of a noisy black-box function above a given threshold. To efficiently reconstruct the level set, we investigate Gaussian process (GP) metamodels. Our focus is on strongly stochastic samplers, in particular with heavy-tailed simulation noise and low signal-to-noise ratio. To guard against noise misspecification, we assess the performance of three variants: (i) GPs with Student-$t$ observations; (ii) Student-$t$ processes (TPs); and (iii) classification GPs that model the sign of the response. As a fourth extension, we study GP surrogates with monotonicity constraints, which are relevant when the level set is known to be connected. In conjunction with these models, we analyze several acquisition functions for guiding the sequential experimental design, extending existing stepwise uncertainty reduction criteria to the stochastic contour-finding setting. This also motivates our development of (approximate) updating formulas to efficiently compute such acquisition functions. Our schemes are benchmarked with a variety of synthetic experiments in 1 to 6 dimensions. We also consider an application of level set estimation to determining optimal exercise policies in the valuation of Bermudan financial options.
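As an illustration of sequential contour finding with a GP metamodel, the sketch below uses the classical "straddle" criterion, a simple stepwise uncertainty reduction heuristic, on a noisy one-dimensional toy function. The test function, noise level, and kernel settings are illustrative assumptions, not the paper's benchmarks.

```python
import numpy as np

rng = np.random.default_rng(1)
threshold = 0.0
noise_sd = 0.3
f = lambda x: np.sin(3 * x) + noise_sd * rng.standard_normal()  # noisy oracle

def gp_posterior(X, y, Xs, ell=0.3, sf=1.0, noise=noise_sd**2):
    k = lambda A, B: sf**2 * np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ell**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = sf**2 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

grid = np.linspace(0.0, 3.0, 300)
X = list(rng.uniform(0.0, 3.0, 4))               # initial design
y = [f(x) for x in X]
for _ in range(25):
    mu, sd = gp_posterior(np.array(X), np.array(y), grid)
    straddle = 1.96 * sd - np.abs(mu - threshold)  # large near an uncertain contour
    x_next = grid[np.argmax(straddle)]
    X.append(x_next); y.append(f(x_next))

mu, _ = gp_posterior(np.array(X), np.array(y), grid)
crossings = grid[np.nonzero(np.diff(np.sign(mu - threshold)))[0]]
print("estimated contour locations:", crossings)  # sin(3x) = 0 at x = 0, 1.047, 2.094
```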
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. The class includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes, and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters, and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
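The Laplace approximation at the heart of this approach can be illustrated in miniature: for a single latent log-rate with a Poisson response, Newton iteration finds the posterior mode, and the curvature there supplies a Gaussian approximation to the marginal. The toy counts and prior below are illustrative assumptions; INLA nests and integrates such approximations over whole latent fields and their hyperparameters.

```python
import numpy as np

y = np.array([3, 5, 4, 6, 2])            # Poisson counts
mu0, tau0 = 0.0, 1.0                     # Gaussian prior on the latent log-rate eta

eta = 0.0
for _ in range(50):                      # Newton iterations on log p(eta | y)
    grad = y.sum() - len(y) * np.exp(eta) - (eta - mu0) / tau0**2
    hess = -len(y) * np.exp(eta) - 1.0 / tau0**2
    eta -= grad / hess

hess = -len(y) * np.exp(eta) - 1.0 / tau0**2          # curvature at the mode
print("Laplace approximation: eta | y ~ N(%.3f, %.3f^2)" % (eta, np.sqrt(-1.0 / hess)))
```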
Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to the gradient or a large number of likelihood evaluations. We introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). VBMC combines variational inference with Gaussian-process-based, active-sampling Bayesian quadrature, using the latter to efficiently approximate the intractable integral in the variational objective. Our method produces both a nonparametric approximation of the posterior distribution and an approximate lower bound of the model evidence, useful for model selection. We demonstrate VBMC on several synthetic likelihoods and on a neuronal model with data from real neurons. In all tested problems and dimensions (up to $D = 10$), VBMC consistently reconstructs the posterior and the model evidence with a limited budget of likelihood evaluations, unlike other methods that work only in very low dimensions. Our framework shows promise as a novel tool for posterior and model inference with expensive, black-box likelihoods.
We propose a novel approach for nonlinear regression using a two-layer neural network (NN) model structure with sparsity-favoring hierarchical priors on the network weights. We present an expectation propagation (EP) approach for approximate integration over the posterior distribution of the weights, the hierarchical scale parameters of the priors, and the residual scale. Using a factorized posterior approximation we derive a computationally efficient algorithm, whose complexity scales similarly to an ensemble of independent sparse linear models. The approach enables flexible definition of weight priors with different sparseness properties such as independent Laplace priors with a common scale parameter or Gaussian automatic relevance determination (ARD) priors with different relevance parameters for all inputs. The approach can be extended beyond standard activation functions and NN model structures to form flexible nonlinear predictors from multiple sparse linear models. The effects of the hierarchical priors and the predictive performance of the algorithm are assessed using both simulated and real-world data. Comparisons are made to two alternative models with ARD priors: a Gaussian process with a NN covariance function and marginal maximum a posteriori estimates of the relevance parameters, and a NN with Markov chain Monte Carlo integration over all the unknown model parameters.
Markov chain Monte Carlo and sequential Monte Carlo methods have emerged as the two main tools to sample from high dimensional probability distributions. Although asymptotic convergence of Markov chain Monte Carlo algorithms is ensured under weak assumptions, the performance of these algorithms is unreliable when the proposal distributions that are used to explore the space are poorly chosen and/or if highly correlated variables are updated independently. We show here how it is possible to build efficient high dimensional proposal distributions by using sequential Monte Carlo methods. This allows us not only to improve over standard Markov chain Monte Carlo schemes but also to make Bayesian inference feasible for a large class of statistical models where this was not previously so. We demonstrate these algorithms on a non-linear state space model and a Lévy-driven stochastic volatility model.
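For reference, the sketch below implements the basic sequential Monte Carlo ingredient, a bootstrap particle filter, on a toy nonlinear state-space model; the model and noise scales are illustrative assumptions, not the paper's examples, and the paper's contribution is to use such filters to build MCMC proposals.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, sigma_v, sigma_w = 60, 1000, 1.0, 0.5

# Simulate data from x_t = 0.5 x_{t-1} + v_t,  y_t = x_t^2 / 20 + w_t.
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + sigma_v * rng.standard_normal()
    y[t] = x[t] ** 2 / 20 + sigma_w * rng.standard_normal()

particles = rng.standard_normal(N)
loglik = 0.0
for t in range(1, T):
    particles = 0.5 * particles + sigma_v * rng.standard_normal(N)  # propagate
    logw = -0.5 * ((y[t] - particles ** 2 / 20) / sigma_w) ** 2     # log-weights (up to const.)
    m = logw.max()
    loglik += m + np.log(np.mean(np.exp(logw - m)))                 # likelihood increment
    w = np.exp(logw - m); w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]               # multinomial resampling

print("filtered mean of x_T:", particles.mean(), " true x_T:", x[-1])
print("log-likelihood estimate (up to an additive constant):", loglik)
```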
Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.]
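The classical ABC rejection algorithm reviewed above fits in a few lines: draw parameters from the prior, simulate data, and keep draws whose summary statistics land within a tolerance of the observed ones. The toy Gaussian model, summary statistic, and tolerance below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
y_obs = rng.normal(loc=1.5, scale=1.0, size=100)   # pretend this is the observed data
s_obs = y_obs.mean()                               # summary statistic
eps = 0.05                                         # tolerance

accepted = []
while len(accepted) < 500:
    theta = rng.normal(0.0, 5.0)                   # draw from the prior
    y_sim = rng.normal(theta, 1.0, size=100)       # simulate from the model
    if abs(y_sim.mean() - s_obs) < eps:            # accept if summaries are close
        accepted.append(theta)

print("ABC posterior mean:", np.mean(accepted))
```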
Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a model-based method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model's hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy.
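For intuition, the sketch below shows vanilla Bayesian Quadrature (without the paper's non-negativity modelling or active learning) for a one-dimensional integral on $[0, 1]$: a GP is fitted to the integrand, and the integral estimate $z^\top K^{-1} y$ uses the closed-form integrated RBF kernel. The integrand, design, and hyperparameters are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

f = lambda x: np.exp(-x) * np.sin(5 * x)    # toy integrand on [0, 1]
ell, sf = 0.2, 1.0                          # RBF kernel hyperparameters (fixed)
X = np.linspace(0.0, 1.0, 12)               # fixed design for simplicity
y = f(X)

K = sf**2 * np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ell**2)
K += 1e-10 * np.eye(len(X))                 # jitter for numerical stability
# z_i = \int_0^1 k(x, X_i) dx has a closed form for the RBF kernel:
z = sf**2 * ell * np.sqrt(np.pi / 2) * (
    erf((1 - X) / (np.sqrt(2) * ell)) + erf(X / (np.sqrt(2) * ell)))
bq_estimate = z @ np.linalg.solve(K, y)

print("BQ estimate:   ", bq_estimate)
print("quad reference:", quad(lambda x: np.exp(-x) * np.sin(5 * x), 0, 1)[0])
```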
Gaussian process (GP) models are widely used in disease mapping as they provide a natural framework for modeling spatial correlations. Their challenges, however, lie in computational burden and memory requirements. In disease mapping models, the other difficulty is inference, which is analytically intractable due to the non-Gaussian observation model. In this paper, we address both these challenges. We show how to efficiently build fully and partially independent conditional (FIC/PIC) sparse approximations for the GP on a two-dimensional surface, and how to conduct approximate inference using the expectation propagation (EP) algorithm and Laplace approximation (LA). We also propose to combine FIC with a compactly supported covariance function to construct a computationally efficient additive model that can model long and short length-scale spatial correlations simultaneously. The benefit of these approximations is computational. The sparse GPs speed up the computations and reduce the memory requirements. The posterior inference via EP and Laplace approximation is much faster and is practically as accurate as via Markov chain Monte Carlo.
We introduce an algorithm to locate contours of functions that are expensive to evaluate. The problem of locating contours arises in many applications, including classification, constrained optimization, and performance analysis of mechanical and dynamical systems (reliability, probability of failure, stability, etc.). Our algorithm locates contours using information from multiple sources, which are available in the form of relatively inexpensive, biased, and possibly noisy approximations to the original function. Considering multiple information sources can lead to significant cost savings. We also introduce the concept of contour entropy, a formal measure of uncertainty about the location of the zero contour of a function approximated by a statistical model. Our algorithm locates contours efficiently by maximizing the reduction of contour entropy per unit cost.
We explore probability modelling of discretization uncertainty for system states defined implicitly by ordinary or partial differential equations. Accounting for this uncertainty can avoid posterior under-coverage when likelihoods are constructed from a coarsely discretized approximation to system equations. A formalism is proposed for inferring a fixed but a priori unknown model trajectory through Bayesian updating of a prior process conditional on model information. A one-step-ahead sampling scheme for interrogating the model is described, its consistency and first order convergence properties are proved, and its computational complexity is shown to be proportional to that of numerical explicit one-step solvers. Examples illustrate the flexibility of this framework to deal with a wide variety of complex and large-scale systems. Within the calibration problem, discretization uncertainty defines a layer in the Bayesian hierarchy, and a Markov chain Monte Carlo algorithm that targets this posterior distribution is presented. This formalism is used for inference on the JAK-STAT delay differential equation model of protein dynamics from indirectly observed measurements. The discussion outlines implications for the new field of probabilistic numerics.
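One simple way to expose discretization uncertainty, in the spirit of (though not identical to) the formalism above, is to perturb each explicit Euler step with a Gaussian draw scaled to the solver's local error, producing an ensemble of plausible trajectories rather than a single numerical solution. The test ODE and noise scaling below are illustrative assumptions; the paper instead conditions a prior process on model interrogations.

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda t, u: -2.0 * u                 # du/dt = -2u, exact solution exp(-2t)
t_grid = np.linspace(0.0, 2.0, 41)
h = t_grid[1] - t_grid[0]

n_samples = 200                           # ensemble of candidate trajectories
U = np.zeros((n_samples, len(t_grid)))
U[:, 0] = 1.0
for j in range(len(t_grid) - 1):
    euler = U[:, j] + h * f(t_grid[j], U[:, j])               # deterministic step
    # Perturbation of order h^{3/2}, matching the local error of a first-order method.
    U[:, j + 1] = euler + np.sqrt(h**3) * rng.standard_normal(n_samples)

print("ensemble final state: %.4f +/- %.4f" % (U[:, -1].mean(), U[:, -1].std()))
print("exact final state:    %.4f" % np.exp(-2 * t_grid[-1]))
```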
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited for optimization over continuous domains of fewer than 20 dimensions, and it tolerates stochastic noise in the function evaluations. It builds a surrogate for the objective, quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information-source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
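Of the three acquisition functions named, expected improvement is the simplest to state: for minimisation in the noise-free setting, $\mathrm{EI}(x) = (m - \mu)\Phi(u) + \sigma\phi(u)$ with $u = (m - \mu)/\sigma$ and $m$ the incumbent best value. The sketch below implements this; the GP posterior mean and standard deviation are supplied as illustrative inputs rather than computed from a fitted surrogate.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sd, best):
    """EI for minimisation, given GP posterior mean/std over candidate points."""
    sd = np.maximum(sd, 1e-12)
    u = (best - mu) / sd
    return (best - mu) * norm.cdf(u) + sd * norm.pdf(u)

# Example: pick the next point to evaluate on a candidate grid.
mu = np.array([0.9, 0.5, 0.2, 0.6])    # illustrative posterior means
sd = np.array([0.05, 0.30, 0.10, 0.40])  # illustrative posterior std deviations
print("next index:", np.argmax(expected_improvement(mu, sd, best=0.3)))
```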
Bayesian experimental design involves the optimal allocation of resources in an experiment, with the aim of optimizing cost and performance. For implicit models, where the likelihood is intractable but sampling from the model is possible, this task is particularly difficult and therefore largely unexplored. This is mainly due to technical difficulties associated with approximating posterior distributions and utility functions. We devise a novel experimental design framework for implicit models that improves upon previous work. First, we use the mutual information between parameters and data as the utility function, which was not feasible before. We achieve this by utilizing Likelihood-Free Inference by Ratio Estimation (LFIRE) to approximate posterior distributions, instead of the traditional approximate Bayesian computation or synthetic likelihood methods. Second, we use Bayesian optimization to solve the optimal design problem, as opposed to the typically used grid search. We find that this increases efficiency and allows us to consider higher design dimensions.
This paper deals with the problem of estimating the volume of the excursion set of a function $f: \mathbb{R}^d \to \mathbb{R}$ above a given threshold, under a probability measure on $\mathbb{R}^d$ that is assumed to be known. In the industrial world, this corresponds to the problem of estimating a probability of failure of a system. When only an expensive-to-simulate model of the system is available, the budget for simulations is usually severely limited and therefore classical Monte Carlo methods ought to be avoided. One of the main contributions of this article is to derive SUR (stepwise uncertainty reduction) strategies from a Bayesian-theoretic formulation of the problem of estimating a probability of failure. These sequential strategies use a Gaussian process model of $f$ and aim at performing evaluations of $f$ as efficiently as possible to infer the value of the probability of failure. We compare these strategies to other strategies also based on a Gaussian process model for estimating a probability of failure.
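A minimal sketch of the idea follows, under illustrative assumptions: a cheap stand-in for $f$, a fixed-hyperparameter GP, and a simple SUR-type rule that samples where the probability of misclassifying a point relative to the threshold is largest. The paper's criteria are derived more carefully from the Bayesian decision-theoretic formulation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
f = lambda x: np.sin(3 * x) + x           # expensive model (toy stand-in)
T = 1.5                                   # failure threshold: "fail" if f(x) > T

def gp_posterior(X, y, Xs, ell=0.4, sf=1.0, noise=1e-6):
    k = lambda A, B: sf**2 * np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ell**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = sf**2 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

grid = np.linspace(0.0, 3.0, 300)         # uniform measure on [0, 3]
X = list(rng.uniform(0.0, 3.0, 4))
y = [f(t) for t in X]
for _ in range(15):
    mu, sd = gp_posterior(np.array(X), np.array(y), grid)
    p_misclass = norm.cdf(-np.abs(mu - T) / sd)   # largest near the uncertain contour
    x_next = grid[np.argmax(p_misclass)]
    X.append(x_next); y.append(f(x_next))

mu, sd = gp_posterior(np.array(X), np.array(y), grid)
print("estimated probability of failure:", np.mean(norm.cdf((mu - T) / sd)))
```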
We consider the problem of parametric statistical inference when likelihood computations are prohibitively expensive but sampling from the model is possible. Several so-called likelihood-free methods have been developed to perform inference in the absence of a likelihood function. The popular synthetic likelihood approach infers the parameters by modelling summary statistics of the data by a Gaussian probability distribution. In another popular approach called approximate Bayesian computation, the inference is performed by identifying parameter values for which the summary statistics of the simulated data are close to those of the observed data. Synthetic likelihood is easier to use as no measure of "closeness" is required but the Gaussianity assumption is often limiting. Moreover, both approaches require judiciously chosen summary statistics. We here present an alternative inference approach that is as easy to use as synthetic likelihood but not as restricted in its assumptions, and that, in a natural way, enables automatic selection of relevant summary statistics from a large set of candidates. The basic idea is to frame the problem of estimating the posterior as a problem of estimating the ratio between the data generating distribution and the marginal distribution. This problem can be solved by logistic regression, and including regularising penalty terms enables automatic selection of the summary statistics relevant to the inference task. We illustrate the general theory on canonical examples and employ it to perform inference for challenging stochastic nonlinear dynamical systems and high-dimensional summary statistics.
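The ratio-estimation idea can be sketched directly: for each candidate parameter, an $L_1$-penalised logistic regression discriminates summaries simulated at that parameter from summaries simulated from the marginal, and its log-odds at the observed summaries estimate the log-ratio that converts prior into posterior. The toy Gaussian model, summaries, and penalty strength below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

def summaries(data):                        # a small set of candidate summaries
    return np.array([data.mean(), data.std(), np.median(data)])

y_obs = rng.normal(1.0, 1.0, 50)            # "observed" data (toy Gaussian model)
prior_draws = rng.normal(0.0, 3.0, 200)     # draws from the prior over theta

def log_ratio_at(theta, n=100):
    # Class 1: summaries simulated at theta; class 0: from the marginal of the data.
    s1 = np.array([summaries(rng.normal(theta, 1.0, 50)) for _ in range(n)])
    s0 = np.array([summaries(rng.normal(rng.choice(prior_draws), 1.0, 50))
                   for _ in range(n)])
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(np.vstack([s1, s0]), np.r_[np.ones(n), np.zeros(n)])
    return clf.decision_function(summaries(y_obs)[None, :])[0]  # ~ log p(x|theta)/p(x)

thetas = np.linspace(-1.0, 3.0, 9)
log_post = [log_ratio_at(t) - 0.5 * (t / 3.0)**2 for t in thetas]  # + log prior
print("approximate posterior mode:", thetas[int(np.argmax(log_post))])
```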
Probabilistic bisection algorithms (PBA) perform root finding based on knowledge acquired from noisy oracle responses. We consider a generalized PBA setting (G-PBA) where the statistical distribution of the oracle is unknown and location-dependent, so that model inference and Bayesian knowledge updating must be performed simultaneously. To this end, we propose to leverage the spatial structure of a typical oracle by constructing a statistical surrogate for the underlying logistic regression step. We investigate several non-parametric surrogates, including binomial Gaussian processes (B-GP), and polynomial, kernel, and spline logistic regression. In parallel, we develop strategies for adaptively balancing learning the oracle distribution and learning the root. One of our proposals mimics active learning with B-GPs and provides a novel look-ahead predictive variance formula. The resulting gains of our spatial PBA algorithms relative to earlier G-PBA models are illustrated with synthetic examples and a challenging stochastic root-finding problem from Bermudan option pricing.
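For background, the classical probabilistic bisection update that G-PBA generalises assumes a known, constant oracle correctness probability $p$: query the noisy sign oracle at the posterior median of the root location, then multiply the density by $p$ on the side the response points to and by $1-p$ on the other. The sketch below implements that baseline with an illustrative toy problem; learning an unknown, location-dependent $p(x)$ is precisely what G-PBA adds.

```python
import numpy as np

rng = np.random.default_rng(7)
g = lambda x: x - 0.6180339887         # increasing function with an unknown root
p = 0.7                                # known oracle correctness probability

grid = np.linspace(0.0, 1.0, 2001)
density = np.ones_like(grid)           # uniform prior over the root location
density /= density.sum()

for _ in range(60):
    cdf = np.cumsum(density)
    x_q = grid[np.searchsorted(cdf, 0.5)]                 # query the posterior median
    true_sign = np.sign(g(x_q))
    sign = true_sign if rng.random() < p else -true_sign  # noisy oracle response
    # Since g is increasing, a negative response points to the root lying to the right.
    favoured = grid >= x_q if sign < 0 else grid < x_q
    density = np.where(favoured, p * density, (1 - p) * density)  # Bayes update
    density /= density.sum()

print("estimated root:", grid[np.argmax(density)], " (true root is about 0.618)")
```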
Bayesian optimization is a sample-efficient method for black-box global optimization. However, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments: active user modelling with preferences, and hierarchical reinforcement learning. We close with a discussion of the pros and cons of Bayesian optimization based on our experiences.