Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but it is possible to simulate from the model. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared with common BO strategies.
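As background, the accept/reject loop at the heart of the simplest ABC algorithm (plain rejection ABC, without the Bayesian-optimization machinery this paper adds) can be sketched in a few lines. The toy model, tolerance, and function names below are our own illustrations, not from the paper:

```python
import random

def abc_rejection(observed, simulate, prior_sample, n_draws, eps):
    """Plain rejection ABC: draw parameters from the prior, simulate data,
    and keep draws whose simulated data lie within `eps` of the observation."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        x = simulate(theta)
        if abs(x - observed) < eps:  # discrepancy between simulated and observed data
            accepted.append(theta)
    return accepted

# Toy model (illustrative): the observation is a noisy reading of the unknown mean.
random.seed(0)
observed = 2.0
simulate = lambda th: th + random.gauss(0.0, 0.5)  # simulator with intractable likelihood
prior_sample = lambda: random.uniform(-5.0, 5.0)   # flat prior on [-5, 5]

posterior_draws = abc_rejection(observed, simulate, prior_sample, n_draws=20000, eps=0.3)
posterior_mean = sum(posterior_draws) / len(posterior_draws)
```

Each accepted draw costs many rejected simulations, which is exactly the inefficiency that motivates replacing the blind prior sampling above with a model-based choice of evaluation locations.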
Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as a generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system with an unrestricted number of hidden variables. This weak assumption is useful for devising realistic models but it renders statistical inference very difficult. The main challenge is the intractability of the likelihood function. Several likelihood-free inference methods have been proposed which share the basic idea of identifying the parameters by finding values for which the discrepancy between simulated and observed data is small. A major obstacle to using these methods is their computational cost. The cost is largely due to the need to repeatedly simulate data sets and the lack of knowledge about how the parameters affect the discrepancy. We propose a strategy which combines probabilistic modeling of the discrepancy with optimization to facilitate likelihood-free inference. The strategy is implemented using Bayesian optimization and is shown to accelerate the inference through a reduction in the number of required simulations by several orders of magnitude.
Acquiring information about noisy, expensive black-box functions (computer simulations or physical experiments) is a tremendously challenging problem. Finite computational and financial resources restrict the application of traditional methods for design of experiments. The problem is compounded by hurdles such as numerical errors and stochastic approximation errors when the quantity of interest (QoI) in a problem depends on an expensive black-box function. Bayesian optimal design of experiments has been reasonably successful in guiding the designer towards the QoI for problems of this kind. This is usually achieved by sequentially querying the function at designs selected by an infill-sampling criterion compatible with utility theory. However, most current methods are semantically designed to work only on optimizing or inferring the black-box function itself. We aim to construct a heuristic that can unequivocally deal with the above problems, irrespective of the QoI. This paper applies such a heuristic to infer a specific QoI, namely the expectation (expected value) of the function. The Kullback-Leibler (KL) divergence is fairly conspicuous among techniques used to quantify information gain. In this paper, we derive an expression for the expected KL divergence to sequentially infer our QoI. The analytical tractability provided by the Karhunen-Loève expansion around the Gaussian process (GP) representation of the black-box function allows us to circumvent the numerical issues associated with sample averaging. The proposed methodology can be extended to any QoI under reasonable assumptions. The proposed method is verified and validated on three synthetic functions with varying levels of complexity and dimensionality. We demonstrate our methodology on a steel wire manufacturing problem.
We consider the problem of learning the level set where a noisy black-box function exceeds a given threshold. To efficiently reconstruct the level set, we investigate Gaussian process (GP) metamodels. Our focus is on strongly stochastic samplers, in particular with heavy-tailed simulation noise and low signal-to-noise ratio. To guard against noise misspecification, we assess the performance of three variants: (i) GPs with Student-$t$ observations; (ii) Student-$t$ processes (TPs); and (iii) classification GPs modeling the sign of the response. As a fourth extension, we study GP surrogates with monotonicity constraints, which are relevant when the level set is known to be connected. In conjunction with these models, we analyze several acquisition functions for guiding the sequential experimental design, extending existing stepwise uncertainty reduction criteria to the stochastic contour-finding context. This also motivates our development of (approximate) updating formulas to efficiently compute such acquisition functions. Our schemes are benchmarked using a variety of synthetic experiments in 1-6 dimensions. We also consider an application of level-set estimation to determining the optimal exercise policy and valuation of Bermudan options in finance.
Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to gradients or a large number of likelihood evaluations. We introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). VBMC combines variational inference with Gaussian-process-based, active-sampling Bayesian quadrature, using the latter to efficiently approximate the intractable integral in the variational objective. Our method produces both a nonparametric approximation of the posterior distribution and an approximate lower bound of the model evidence, useful for model selection. We demonstrate VBMC on several synthetic likelihoods and on a neuronal model with data from real neurons. Across all tested problems and dimensions (up to $D = 10$), VBMC consistently reconstructs the posterior and the model evidence with a limited budget of likelihood evaluations, unlike other methods that work only in very low dimensions. Our framework thus stands as a novel tool for posterior and model inference with expensive, black-box likelihoods.
We develop an efficient, Bayesian Uncertainty Quantification framework using a novel treed Gaussian process model. The tree is adaptively constructed using information conveyed by the observed data about the length scales of the underlying process. On each leaf of the tree, we utilize Bayesian Experimental Design techniques in order to learn a multi-output Gaussian process. The constructed surrogate can provide analytical point estimates, as well as error bars, for the statistics of interest. We numerically demonstrate the effectiveness of the suggested framework in identifying discontinuities, local features and unimportant dimensions in the solution of stochastic differential equations.
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited for optimization over continuous domains of less than 20 dimensions, and it tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample next. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information-source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
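For reference, the noise-free form of expected improvement has a simple closed form under a Gaussian process surrogate. The sketch below (our own helper names, minimization convention) evaluates it from the GP posterior mean and standard deviation at a candidate point:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """Noise-free expected improvement for minimization at a point where
    the GP surrogate predicts mean `mu` and standard deviation `sigma`;
    `best` is the smallest objective value observed so far."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * norm_cdf(z) + sigma * norm_pdf(z)

# A candidate whose predicted mean equals the incumbent still has positive EI
# when its uncertainty is high; this is what drives exploration.
ei_uncertain = expected_improvement(mu=1.0, sigma=1.0, best=1.0)    # ~0.399
ei_confident = expected_improvement(mu=1.0, sigma=1e-12, best=1.0)  # ~0
```

The noisy generalization discussed in the tutorial replaces `best` with a quantity derived from the GP posterior rather than the raw (noise-corrupted) observations.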
We propose a novel approach for nonlinear regression using a two-layer neural network (NN) model structure with sparsity-favoring hierarchical priors on the network weights. We present an expectation propagation (EP) approach for approximate integration over the posterior distribution of the weights, the hierarchical scale parameters of the priors, and the residual scale. Using a factorized posterior approximation we derive a computationally efficient algorithm, whose complexity scales similarly to an ensemble of independent sparse linear models. The approach enables flexible definition of weight priors with different sparseness properties such as independent Laplace priors with a common scale parameter or Gaussian automatic relevance determination (ARD) priors with different relevance parameters for all inputs. The approach can be extended beyond standard activation functions and NN model structures to form flexible nonlinear predictors from multiple sparse linear models. The effects of the hierarchical priors and the predictive performance of the algorithm are assessed using both simulated and real-world data. Comparisons are made to two alternative models with ARD priors: a Gaussian process with a NN covariance function and marginal maximum a posteriori estimates of the relevance parameters, and a NN with Markov chain Monte Carlo integration over all the unknown model parameters.
Probabilistic bisection algorithms (PBA) perform root finding based on knowledge acquired from noisy oracle responses. We consider a generalized PBA setting (G-PBA) where the statistical distribution of the oracle is unknown and location-dependent, so that model inference and Bayesian knowledge updating must be performed simultaneously. To this end, we propose to leverage the spatial structure of a typical oracle by constructing a statistical surrogate for the underlying logistic regression step. We investigate several non-parametric surrogates, including Binomial Gaussian processes (B-GP) and polynomial, kernel, and spline logistic regressions. In parallel, we develop sampling policies that adaptively balance learning the oracle distribution and learning the root. One of our proposals mimics active learning with B-GPs and provides a novel look-ahead predictive variance formula. The resulting gains of our spatial PBA algorithms relative to earlier G-PBA models are illustrated with synthetic examples and with a challenging stochastic root-finding problem from Bermudan option pricing.
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions and does not require detailed knowledge of the conditional likelihood, only needing its evaluation as a black-box function. Using a mixture of Gaussians as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved by using an augmented prior via inducing variables. The method supports the most common sparse GP approximations, as well as parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively on small, medium-scale, and large datasets, showing its competitiveness under different likelihood models and sparsity levels. On large-scale experiments involving the prediction of airline delays and the classification of handwritten digits, we show that our method is on par with state-of-the-art hard-coded approaches for scalable GP regression and classification.
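The central estimator in this approach, approximating the expected log-likelihood term of the evidence lower bound with samples from a univariate Gaussian, is easy to illustrate for a single latent value. The function names and the toy Gaussian likelihood below are our own; the point is that only black-box evaluations of log p(y | f) are needed:

```python
import math
import random

def mc_expected_log_lik(y, m, v, log_lik, n_samples=2000, seed=1):
    """Monte Carlo estimate of E_q[log p(y | f)] with q(f) = N(m, v),
    using f = m + sqrt(v) * eps, eps ~ N(0, 1).  The likelihood enters
    only through point evaluations, i.e. as a black box."""
    rng = random.Random(seed)
    s = math.sqrt(v)
    total = 0.0
    for _ in range(n_samples):
        f = m + s * rng.gauss(0.0, 1.0)
        total += log_lik(y, f)
    return total / n_samples

# Toy Gaussian observation model with unit noise, for which the expectation
# is available in closed form and can serve as a sanity check.
log_lik = lambda y, f: -0.5 * (y - f) ** 2 - 0.5 * math.log(2.0 * math.pi)
estimate = mc_expected_log_lik(y=0.5, m=0.0, v=0.25, log_lik=log_lik)
exact = -0.5 * ((0.5 - 0.0) ** 2 + 0.25) - 0.5 * math.log(2.0 * math.pi)
```

In the full method this estimate (and its gradient) is formed per data point and summed, with the mixture-of-Gaussians variational family reducing to a sum of such univariate expectations.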
This paper considers the robust and efficient implementation of Gaussian process regression with a Student-t observation model, which has a non-log-concave likelihood. The challenge with the Student-t model is the analytically intractable inference which is why several approximative methods have been proposed. Expectation propagation (EP) has been found to be a very accurate method in many empirical studies but the convergence of EP is known to be problematic with models containing non-log-concave site functions. In this paper we illustrate the situations where standard EP fails to converge and review different modifications and alternative algorithms for improving the convergence. We demonstrate that convergence problems may occur during the type-II maximum a posteriori (MAP) estimation of the hyperparameters and show that standard EP may not converge in the MAP values with some difficult data sets. We present a robust implementation which relies primarily on parallel EP updates and uses a moment-matching-based double-loop algorithm with adaptively selected step size in difficult cases. The predictive performance of EP is compared with Laplace, variational Bayes, and Markov chain Monte Carlo approximations.
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments: active user modelling with preferences, and hierarchical reinforcement learning. We close with a discussion of the pros and cons of Bayesian optimization based on our experiences.
The automated construction of coarse-grained models is a pivotal component of computer simulation of physical systems and a key enabler of uncertainty quantification. Pertinent methods are severely inhibited by the high dimension of the parametric input and by the limited number of training input/output pairs that can be generated when computationally demanding forward models are considered. Such cases are frequently encountered in the modeling of random heterogeneous media, where the scale of the microstructure necessitates the use of high-dimensional random vectors and very fine discretizations of the governing equations. This paper proposes a probabilistic machine learning framework that is capable of operating in the presence of small data by exploiting aspects of the physical structure of the problem as well as contextual knowledge. As a result, it can perform comparably well under extrapolative conditions. It unifies the tasks of dimensionality and model-order reduction through an encoder-decoder scheme that simultaneously identifies a sparse set of salient, lower-dimensional microstructural features and calibrates an inexpensive, coarse-grained model that is predictive of the output. Information loss is accounted for and quantified in the form of probabilistic predictive estimates. The learning engine is based on stochastic variational inference. We demonstrate how the variational objective can be used not only to train the coarse-grained model, but also to suggest refinements that lead to improved predictions.
Optimizing nonlinear systems involving expensive (computer) experiments with regard to conflicting objectives is a common challenge. When the number of experiments is severely restricted and/or when the number of objectives increases, uncovering the whole set of optimal solutions (the Pareto front) is out of reach, even for surrogate-based approaches. As Pareto-optimal solutions that make no compromise between the objectives are usually of little interest in applications, this work restricts the search to relevant solutions that are close to the center of the Pareto front. The article begins by characterizing this center. Next, a Bayesian multi-objective optimization method for directing the search towards it is proposed. A criterion for detecting convergence to the center is described. If the criterion is triggered, a widened central part of the Pareto front is targeted, such that sufficiently accurate convergence to it is forecast within the remaining budget. Numerical experiments show how the resulting algorithm, C-EHI, better locates the central part of the Pareto front when compared to state-of-the-art Bayesian algorithms.
In many global optimization problems motivated by engineering applications, the number of function evaluations is severely limited by time or cost. To ensure that each evaluation contributes to the localization of good candidates for the role of global minimizer, a sequential choice of evaluation points is usually carried out. In particular, when Kriging is used to interpolate past evaluations, the uncertainty associated with the lack of information on the function can be expressed and used to compute a number of criteria accounting for the interest of an additional evaluation at any given point. This paper introduces minimizer entropy as a new Kriging-based criterion for the sequential choice of points at which the function should be evaluated. Based on stepwise uncertainty reduction, it accounts for the informational gain on the minimizer expected from a new evaluation. The criterion is approximated using conditional simulations of the Gaussian process model behind Kriging, and then inserted into an algorithm similar in spirit to the Efficient Global Optimization (EGO) algorithm. An empirical comparison is carried out between our criterion and expected improvement, one of the reference criteria in the literature. Experimental results indicate major evaluation savings over EGO. Finally, the method, which we call IAGO (for Informational Approach to Global Optimization) is extended to robust optimization problems, where both the factors to be tuned and the function evaluations are corrupted by noise.
Gaussian process (GP) models are widely used in disease mapping as they provide a natural framework for modeling spatial correlations. Their challenges, however, lie in computational burden and memory requirements. In disease mapping models, the other difficulty is inference, which is analytically intractable due to the non-Gaussian observation model. In this paper, we address both these challenges. We show how to efficiently build fully and partially independent conditional (FIC/PIC) sparse approximations for the GP in two-dimensional surface, and how to conduct approximate inference using expectation propagation (EP) algorithm and Laplace approximation (LA). We also propose to combine FIC with a compactly supported covariance function to construct a computationally efficient additive model that can model long and short length-scale spatial correlations simultaneously. The benefit of these approximations is computational. The sparse GPs speed up the computations and reduce the memory requirements. The posterior inference via EP and Laplace approximation is much faster and is practically as accurate as via Markov chain Monte Carlo.
We explore probability modelling of discretization uncertainty for system states defined implicitly by ordinary or partial differential equations. Accounting for this uncertainty can avoid posterior under-coverage when likelihoods are constructed from a coarsely discretized approximation to system equations. A formalism is proposed for inferring a fixed but a priori unknown model trajectory through Bayesian updating of a prior process conditional on model information. A one-step-ahead sampling scheme for interrogating the model is described, its consistency and first order convergence properties are proved, and its computational complexity is shown to be proportional to that of numerical explicit one-step solvers. Examples illustrate the flexibility of this framework to deal with a wide variety of complex and large-scale systems. Within the calibration problem, discretization uncertainty defines a layer in the Bayesian hierarchy, and a Markov chain Monte Carlo algorithm that targets this posterior distribution is presented. This formalism is used for inference on the JAK-STAT delay differential equation model of protein dynamics from indirectly observed measurements. The discussion outlines implications for the new field of probabilistic numerics.
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
1. Overview
1.1. Computer models and calibration
Various sciences use mathematical models to describe processes that would otherwise be very difficult to analyse, and these models are typically implemented in computer codes. Often, the mathematical model is highly complex, and the resulting computer code is large and may be expensive in terms of the computer time required for a single run. Nevertheless, running the computer model will be much cheaper than making direct observations of the process. Sacks, Welch, Mitchell and Wynn (1989) have given several examples.
The codes that we consider are deterministic, i.e. running the code with the same inputs always produces the same output. Computer models are generally designed to be applicable to a wide range of particular contexts. However, to use a model to make predictions in a specific context it may be necessary first to calibrate the model by using some observed data. To illustrate this process we introduce a simple example. Two more examples are described in detail in Section 2.2. To decide on a dose regime (e.g. size, frequency and release rates of tablets) for a new drug, a pharmacokinetic model is used. This models the movement of the drug through various 'compartments' of the patient's body and its eventual elimination (e.g. by chemical reactions
When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NP-complete. To address this issue, we describe a polynomial-time approximation that is within (1 − 1/e) of the optimum by exploiting the submodularity of mutual information. We also show how submodularity can be used to obtain online bounds, and design branch and bound search procedures. We then extend our algorithm to exploit lazy evaluations and local structure in the GP, yielding significant speedups. We also extend our approach to find placements which are robust against node failures and uncertainties in the model. These extensions are again associated with rigorous theoretical approximation guarantees, exploiting the submodularity of the objective function. We demonstrate the advantages of our approach towards optimizing mutual information in a very extensive empirical study on two real-world data sets.
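The (1 − 1/e) factor above comes from the classical greedy guarantee for monotone submodular set functions (Nemhauser et al., 1978). A generic greedy sketch, with a toy coverage objective standing in for the mutual-information criterion (all names here are illustrative, not from the paper):

```python
def greedy_submodular(candidates, gain, k):
    """Greedy maximization of a monotone submodular set function: repeatedly
    add the candidate with the largest marginal gain.  For such functions the
    greedy set achieves at least (1 - 1/e) of the optimal value."""
    chosen = []
    remaining = list(candidates)
    for _ in range(k):
        best = max(remaining, key=lambda c: gain(chosen, c))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy objective: each candidate sensor "covers" some grid cells; the marginal
# gain of a candidate is the number of cells it newly covers.
coverage = {
    "a": {1, 2, 3, 7},
    "b": {3, 4, 6},
    "c": {5},
    "d": {1, 2},
}

def marginal_gain(chosen, c):
    covered = set().union(*(coverage[x] for x in chosen)) if chosen else set()
    return len(coverage[c] - covered)

picks = greedy_submodular(list(coverage), marginal_gain, k=2)  # ["a", "b"]
```

The lazy-evaluation speedup mentioned in the abstract exploits the same submodularity: a candidate's marginal gain can only shrink as the chosen set grows, so stale gains serve as valid upper bounds.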