We present an adaptive approach to the construction of Gaussian process surrogates for Bayesian inference with expensive-to-evaluate forward models. Our approach relies on a fully Bayesian treatment for training the Gaussian process model and exploits the expected-improvement idea from Bayesian global optimization. We adaptively construct the training design by maximizing the expected improvement in the fit of the Gaussian process model to the noisy observational data. Numerical experiments on model problems with synthetic data demonstrate the effectiveness of the resulting adaptive designs, relative to fixed non-adaptive designs, in producing accurate posterior estimates at a reduced cost in forward-model evaluations.
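As a concrete illustration of this kind of adaptive design loop, here is a minimal sketch (not the paper's code) using scikit-learn. The acquisition shown is a simple stand-in that favors regions of high surrogate likelihood and high posterior uncertainty, not the paper's exact expected-improvement criterion, and the log-likelihood is a hypothetical toy.

```python
# Sketch of an adaptive design loop for a GP surrogate of an expensive
# log-likelihood. The acquisition is a stand-in, not the paper's criterion.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def expensive_log_likelihood(theta):
    # hypothetical stand-in for the forward model plus data misfit
    return -0.5 * np.sum((theta - 0.3) ** 2) / 0.01

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 1))   # initial space-filling design
y = np.array([expensive_log_likelihood(x) for x in X])

for _ in range(20):                   # adaptive refinement loop
    gp = GaussianProcessRegressor(ConstantKernel() * RBF(),
                                  normalize_y=True).fit(X, y)
    cand = rng.uniform(-1, 1, size=(500, 1))   # random candidate pool
    mu, sd = gp.predict(cand, return_std=True)
    acq = np.exp(mu - mu.max()) * sd  # favor plausible yet uncertain regions
    x_new = cand[np.argmax(acq)]
    X = np.vstack([X, x_new])
    y = np.append(y, expensive_log_likelihood(x_new))
```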
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions, and does not require detailed knowledge of the conditional likelihood, which only needs to be evaluated as a black-box function. Using a Gaussian mixture as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved through an augmented prior using inducing variables. The method supports sparse GP approximations, along with parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively on small, medium-scale, and large datasets, showing its competitiveness under different likelihood models and levels of sparsity. In large-scale experiments involving airline-delay prediction and handwritten-digit classification, we show that our method performs on par with state-of-the-art hard-coded approaches for scalable GP regression and classification.
We consider the problem of learning the level set for which a noisy black-box function exceeds a given threshold. To efficiently reconstruct the level set, we investigate Gaussian process (GP) metamodels. Our focus is on strongly stochastic samplers, in particular with heavy-tailed simulation noise and low signal-to-noise ratios. To guard against noise misspecification, we assess the performance of three variants: (i) GPs with Student-$t$ observations; (ii) Student-$t$ processes (TPs); and (iii) classification GPs that model the sign of the response. As a fourth extension, we study GP surrogates with monotonicity constraints, which are relevant when the level set is known to be connected. In conjunction with these models, we analyze several acquisition functions for guiding the sequential experimental design, extending existing stepwise uncertainty reduction criteria to the stochastic contour-finding setting. This also motivates our development of (approximate) updating formulas for computing the acquisition functions efficiently. Our schemes are benchmarked in a variety of synthetic experiments in 1 to 6 dimensions. We also consider an application of level-set estimation to determining optimal exercise policies and valuing Bermudan financial options.
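For orientation, here is a generic stand-in for the kind of contour-finding acquisition analyzed here, assuming a plain Gaussian-noise GP rather than the Student-$t$ variants studied: query where the surrogate is most likely to misclassify the sign of f(x) - T.

```python
# Sketch: level-set acquisition that maximizes the GP's probability of
# misclassifying the sign of f(x) - T. Toy data, generic stand-in criterion.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

T = 0.0                                 # threshold defining the level set
rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, (8, 1))
y = np.sin(3 * X.ravel()) + 0.4 * rng.standard_normal(8)   # noisy black box

gp = GaussianProcessRegressor(alpha=0.16, normalize_y=True).fit(X, y)
grid = np.linspace(-1, 1, 400)[:, None]
mu, sd = gp.predict(grid, return_std=True)
x_next = grid[np.argmax(norm.cdf(-np.abs(mu - T) / sd))]   # max misclassification prob.
```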
We develop an automated variational method for approximate inference in Gaussian process (GP) models whose posteriors are often intractable. Using a mixture of Gaussians as the variational distribution, we show that (i) the variational objective and its gradients can be approximated efficiently via sampling from univariate Gaussian distributions and (ii) the gradients w.r.t. the GP hyperparameters can be obtained analytically regardless of the model likelihood. We further propose two instances of the variational distribution whose covariance matrices can be parametrized linearly in the number of observations. These results allow gradient-based optimization to be done efficiently in a black-box manner. Our approach is thoroughly verified on five models using six benchmark datasets, performing as well as the exact or hard-coded implementations while running orders of magnitude faster than the alternative MCMC sampling approaches. Our method can be a valuable tool for practitioners and researchers to investigate new models with minimal effort in deriving model-specific inference algorithms.
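A minimal sketch of ingredient (i), assuming a diagonal-Gaussian variational family rather than the paper's mixture, with a toy Gaussian joint density standing in for the GP model: the expected-log-joint gradients are estimated from univariate standard-normal samples via the reparameterization f = m + s * eps.

```python
# Sketch: reparameterization-based ELBO gradients for a diagonal-Gaussian q.
# The joint density here is a toy (N(0,1) prior, unit-variance Gaussian
# likelihood); the paper's mixture family and GP terms are omitted.
import numpy as np

def joint_grad(f, y):                 # d/df [log p(y|f) + log p(f)] for the toy model
    return (y - f) - f

def elbo_gradients(m, log_s, y, n=64, rng=np.random.default_rng(1)):
    s = np.exp(log_s)
    eps = rng.standard_normal((n, m.size))
    g = joint_grad(m + s * eps, y)    # gradients at reparameterized samples
    grad_m = g.mean(axis=0)
    grad_log_s = (g * eps).mean(axis=0) * s + 1.0   # + entropy gradient
    return grad_m, grad_log_s

m, log_s = np.zeros(2), np.zeros(2)
for _ in range(200):                  # plain stochastic gradient ascent on the ELBO
    gm, gs = elbo_gradients(m, log_s, y=np.array([0.5, -1.0]))
    m, log_s = m + 0.05 * gm, log_s + 0.05 * gs
```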
Approximate Bayesian computation (ABC) is a method of Bayesian inference for cases where the likelihood is unavailable but simulation from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimization (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimization enables one to intelligently decide where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density that arises from the lack of simulations needed to estimate this quantity accurately, and we define a loss function that measures this uncertainty. We then propose to select the next evaluation location so as to minimize the expected loss. Experiments show that the proposed method often produces the most accurate approximations compared to common BO strategies.
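A rough, hypothetical sketch of the ingredients: a GP models the ABC discrepancy as a function of theta, and the GP-implied acceptance probability carries uncertainty. The paper's expected-loss rule integrates the reduction in posterior uncertainty over the parameter space; the pointwise proxy below is only a stand-in.

```python
# Sketch: GP on the ABC discrepancy; pick the next theta where the
# acceptance-probability estimate is shakiest (crude stand-in criterion).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(9)
simulate = lambda t: t ** 2 + 0.2 * rng.standard_normal()   # toy simulator
y_obs, eps = 1.0, 0.5                    # observation and ABC tolerance
thetas = rng.uniform(-2, 2, (8, 1))
disc = np.array([abs(simulate(t[0]) - y_obs) for t in thetas])

gp = GaussianProcessRegressor(normalize_y=True).fit(thetas, disc)
grid = np.linspace(-2, 2, 200)[:, None]
mu, sd = gp.predict(grid, return_std=True)
p_accept = norm.cdf((eps - mu) / sd)     # GP-implied acceptance probability
proxy = p_accept * (1 - p_accept) * sd   # crude pointwise uncertainty proxy
theta_next = grid[np.argmax(proxy)]      # evaluate where the estimate is shakiest
```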
Standard Gaussian processes (GPs) model observation noise as constant throughout input space. This is often too restrictive an assumption, but one that is needed for GP inference to be tractable. In this work we present a non-standard variational approximation that allows accurate inference in heteroscedastic GPs (i.e., under input-dependent noise conditions). Computational cost is roughly twice that of the standard GP, and also scales as O(n^3). Accuracy is verified by comparison with the gold-standard MCMC, and the method's effectiveness is illustrated on several synthetic and real datasets of diverse characteristics. An application to volatility forecasting is also considered.
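For reference, a minimal generative sketch of a heteroscedastic GP (two latent GPs, one for the mean and one for the log noise level); the paper's variational inversion of this model is not reproduced here.

```python
# Generative sketch of the heteroscedastic GP model:
#   f ~ GP (latent mean), g ~ GP (latent log noise), y = f(x) + N(0, exp(g(x))).
import numpy as np

def rbf(X1, X2, ls=0.3, var=1.0):
    d = X1[:, None] - X2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 100)
K = rbf(x, x) + 1e-8 * np.eye(x.size)
f = rng.multivariate_normal(np.zeros(x.size), K)        # latent mean GP
g = rng.multivariate_normal(np.zeros(x.size), K) - 2.0  # latent log-noise GP
y = f + rng.standard_normal(x.size) * np.exp(0.5 * g)   # input-dependent noise
```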
We propose a novel sampling framework for inference in probabilistic models: an active learning approach that converges more quickly (in wall-clock time) than Markov chain Monte Carlo (MCMC) benchmarks. The central challenge in probabilistic inference is numerical integration, to average over ensembles of models or unknown (hyper-)parameters (for example to compute the marginal likelihood or a partition function). MCMC has provided approaches to numerical integration that deliver state-of-the-art inference, but can suffer from sample inefficiency and poor convergence diagnostics. Bayesian quadrature techniques offer a model-based solution to such problems, but their uptake has been hindered by prohibitive computation costs. We introduce a warped model for probabilistic integrands (likelihoods) that are known to be non-negative, permitting a cheap active learning scheme to optimally select sample locations. Our algorithm is demonstrated to offer faster convergence (in seconds) relative to simple Monte Carlo and annealed importance sampling on both synthetic and real-world examples.
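One way to make the warping concrete is a square-root-style warp of the kind described (a sketch under that assumption, which may differ in detail from the paper's): place the GP on g, where likelihood ≈ alpha + 0.5 g^2, so the implied integrand model is non-negative by construction. The active sampling scheme is omitted.

```python
# Sketch of a square-root warp for a non-negative integrand: GP on
# g = sqrt(2 * (lik - alpha)), with a moment-matched mean for the integrand.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.linspace(-3, 3, 12)[:, None]
lik = np.exp(-0.5 * X.ravel() ** 2)              # toy non-negative integrand
alpha = 0.8 * lik.min()
g_obs = np.sqrt(2.0 * (lik - alpha))             # warped observations
gp = GaussianProcessRegressor(normalize_y=True).fit(X, g_obs)
grid = np.linspace(-3, 3, 200)[:, None]
g_mu, g_sd = gp.predict(grid, return_std=True)
lik_mu = alpha + 0.5 * (g_mu ** 2 + g_sd ** 2)   # moment-matched mean, >= alpha
```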
This paper considers the robust and efficient implementation of Gaussian process regression with a Student-t observation model, which has a non-log-concave likelihood. The challenge with the Student-t model is the analytically intractable inference, which is why several approximate methods have been proposed. Expectation propagation (EP) has been found to be a very accurate method in many empirical studies, but the convergence of EP is known to be problematic with models containing non-log-concave site functions. In this paper we illustrate the situations where standard EP fails to converge and review different modifications and alternative algorithms for improving the convergence. We demonstrate that convergence problems may occur during the type-II maximum a posteriori (MAP) estimation of the hyperparameters and show that standard EP may not converge at the MAP values for some difficult data sets. We present a robust implementation which relies primarily on parallel EP updates and uses a moment-matching-based double-loop algorithm with adaptively selected step size in difficult cases. The predictive performance of EP is compared with Laplace, variational Bayes, and Markov chain Monte Carlo approximations.
We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.
Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a model-based method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model's hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy.
We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
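A minimal sketch of the general recipe, assuming a one-dimensional Gaussian approximating family and a generic optimizer in place of the paper's exponential-family updates: minimize a Monte Carlo estimate of KL(q || p) for a target known only up to its normalizing constant, with frozen base samples so the objective is smooth in the variational parameters.

```python
# Sketch: fit q = N(m, s^2) to an unnormalized log target by minimizing
# a fixed-sample KL(q || p) estimate (the log-normalizer is a constant).
import numpy as np
from scipy.optimize import minimize

log_p_tilde = lambda z: -0.5 * z ** 2 - 0.1 * z ** 4   # toy non-Gaussian target
eps = np.random.default_rng(3).standard_normal(2000)   # frozen base samples

def kl_estimate(params):
    m, log_s = params
    z = m + np.exp(log_s) * eps
    log_q = -0.5 * eps ** 2 - log_s          # log N(z; m, s) up to a constant
    return np.mean(log_q - log_p_tilde(z))   # KL(q||p) up to the log-normalizer

res = minimize(kl_estimate, x0=[0.0, 0.0], method="Nelder-Mead")
m_hat, s_hat = res.x[0], np.exp(res.x[1])
```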
Acquiring information about noisy and expensive black-box functions (computer simulations or physical experiments) is an immensely challenging problem. Limited computational and financial resources restrict the application of traditional methods of experimental design. When the quantity of interest (QoI) in a problem depends on an expensive black-box function, the problem is compounded by obstacles such as numerical error and stochastic approximation error. Bayesian optimal design of experiments has been fairly successful at guiding the designer towards the QoI in such problems. This is typically achieved by sequentially querying the function at designs selected by an infill-sampling criterion compatible with utility theory. However, most current methods are semantically designed to work only for optimizing or inferring the black-box function itself. Our objective is to construct a heuristic that explicitly deals with the above problems regardless of the QoI. This paper applies such a heuristic to infer a specific QoI, namely the expectation (expected value) of the function. The Kullback-Leibler (KL) divergence features prominently in techniques for quantifying information gain. In this paper, we derive an expression for the expected KL divergence to sequentially infer our QoI. The analytical tractability afforded by a Karhunen-Loève expansion of the Gaussian process (GP) representation of the black-box function allows us to bypass the numerical issues associated with sample averaging. The proposed methodology can be extended to any QoI under reasonable assumptions. The method is verified and validated on three synthetic functions with varying levels of complexity and dimensionality. We demonstrate our methodology on a steel-wire manufacturing problem.
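To make the Karhunen-Loève ingredient concrete, here is a short sketch of sampling a GP functional (the spatial mean of f, a stand-in QoI) from a truncated eigenexpansion; the expected-KL acquisition built on top of this is not reproduced.

```python
# Sketch: truncated Karhunen-Loeve expansion of a GP, used to draw cheap
# samples of a functional of f (here, its average over the domain).
import numpy as np

x = np.linspace(0, 1, 200)
d = x[:, None] - x[None, :]
K = np.exp(-0.5 * (d / 0.2) ** 2)           # squared-exponential covariance
vals, vecs = np.linalg.eigh(K)
idx = np.argsort(vals)[::-1][:10]           # keep the 10 leading modes
lam, phi = vals[idx], vecs[:, idx]

rng = np.random.default_rng(8)
xi = rng.standard_normal((1000, 10))
f_samples = (xi * np.sqrt(lam)) @ phi.T     # KL-expansion draws of f
qoi_samples = f_samples.mean(axis=1)        # induced samples of the QoI
```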
Statistical inference for analytically intractable posteriors is a difficult problem because of the marginalization of correlated variables, and stochastic methods such as MCMC and VI are in widespread use. We argue that the stochastic KL-divergence minimization used by MCMC and VI is noisy, and we propose instead EL_2O, expectation optimization of the L_2 distance between the approximate log posterior q and the un-normalized log posterior p. When sampling from q, the solutions agree with stochastic KL-divergence minimization based on VI in the large-sample limit, but the EL_2O approach has no sampling noise, has better optimization properties, and, if q covers p, requires only as many sample evaluations as there are parameters being optimized. Increasing the expressivity of q therefore improves both the quality of the results and the rate of convergence, allowing EL_2O to approach exact inference. Using automatic differentiation methods, we develop Hessian, gradient, and gradient-free versions of the method, which can determine M(M+2)/2+1, M+1, and 1 parameter(s) of q, respectively, from a single sample. EL_2O provides a reliable estimate of the quality of the approximate posterior, and converges rapidly for a full-rank Gaussian approximation of q and for extensions beyond it, such as nonlinear transformations and Gaussian mixtures. These can handle generic posteriors while still allowing fast analytic marginalization. We test it on several examples, including a realistic 13-dimensional galaxy clustering analysis, showing that it is several orders of magnitude faster than MCMC while providing smooth and accurate non-Gaussian posteriors, often requiring only tens of iterations.
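A minimal one-dimensional sketch of the EL_2O idea as described: fit the parameters of log q to the unnormalized log posterior by least squares on samples from q, with a free constant absorbing the unknown normalization; iterating re-centers the samples.

```python
# Sketch: least-squares fit of a Gaussian log q (quadratic in z) to an
# unnormalized log posterior, sampled from the current q; toy target.
import numpy as np

log_p_tilde = lambda z: -0.5 * z ** 2 - 0.1 * z ** 4
rng = np.random.default_rng(4)
m, s = 0.0, 1.0
for _ in range(10):
    z = m + s * rng.standard_normal(50)
    # fit log_p_tilde(z) ~ c0 + c1*z + c2*z^2 (c2 < 0 for a log-concave target)
    A = np.stack([np.ones_like(z), z, z ** 2], axis=1)
    c0, c1, c2 = np.linalg.lstsq(A, log_p_tilde(z), rcond=None)[0]
    s = np.sqrt(-0.5 / c2)   # match the quadratic term: -1/(2 s^2) = c2
    m = c1 * s ** 2          # match the linear term: m/s^2 = c1
```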
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
We provide a comprehensive overview of many recent algorithms for approximate inference in Gaussian process models for probabilistic binary classification. The relationships between several approaches are elucidated theoretically, and the properties of the different algorithms are corroborated by experimental results. We examine both 1) the quality of the predictive distributions and 2) the suitability of the different marginal likelihood approximations for model selection (selecting hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods produce good predictive distributions although their marginal likelihood approximations are poor. Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost always the method of choice unless the computational budget is very tight. We also extend existing methods in various ways, and provide unifying code implementing all approaches.
Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system with an unrestricted number of hidden variables. This weak assumption is useful for devising realistic models but it renders statistical inference very difficult. The main challenge is the intractability of the likelihood function. Several likelihood-free inference methods have been proposed which share the basic idea of identifying the parameters by finding values for which the discrepancy between simulated and observed data is small. A major obstacle to using these methods is their computational cost. The cost is largely due to the need to repeatedly simulate data sets and the lack of knowledge about how the parameters affect the discrepancy. We propose a strategy which combines probabilistic modeling of the discrepancy with optimization to facilitate likelihood-free inference. The strategy is implemented using Bayesian optimization and is shown to accelerate the inference through a reduction in the number of required simulations by several orders of magnitude.
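A hedged sketch of the overall strategy (GP regression of the discrepancy plus a lower-confidence-bound acquisition; the paper's construction of an approximate likelihood from the fitted GP is omitted):

```python
# Sketch: regress the simulator discrepancy on theta with a GP, then use
# a lower-confidence-bound acquisition to choose the next simulation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(5)
simulate = lambda t: t + 0.3 * rng.standard_normal()   # toy simulator
y_obs = 0.7
thetas = rng.uniform(0, 2, size=(5, 1))
disc = np.array([abs(simulate(t[0]) - y_obs) for t in thetas])

for _ in range(25):
    gp = GaussianProcessRegressor(normalize_y=True).fit(thetas, disc)
    grid = np.linspace(0, 2, 300)[:, None]
    mu, sd = gp.predict(grid, return_std=True)
    t_next = grid[np.argmin(mu - 2.0 * sd)]            # LCB acquisition
    thetas = np.vstack([thetas, t_next])
    disc = np.append(disc, abs(simulate(t_next[0]) - y_obs))
```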
We propose a novel approach for nonlinear regression using a two-layer neural network (NN) model structure with sparsity-favoring hierarchical priors on the network weights. We present an expectation propagation (EP) approach for approximate integration over the posterior distribution of the weights, the hierarchical scale parameters of the priors, and the residual scale. Using a factorized posterior approximation we derive a computationally efficient algorithm, whose complexity scales similarly to an ensemble of independent sparse linear models. The approach enables flexible definition of weight priors with different sparseness properties such as independent Laplace priors with a common scale parameter or Gaussian automatic relevance determination (ARD) priors with different relevance parameters for all inputs. The approach can be extended beyond standard activation functions and NN model structures to form flexible nonlinear predictors from multiple sparse linear models. The effects of the hierarchical priors and the predictive performance of the algorithm are assessed using both simulated and real-world data. Comparisons are made to two alternative models with ARD priors: a Gaussian process with a NN covariance function and marginal maximum a posteriori estimates of the relevance parameters, and a NN with Markov chain Monte Carlo integration over all the unknown model parameters.
Gaussian process (GP) models are widely used in disease mapping as they provide a natural framework for modeling spatial correlations. Their challenges, however, lie in computational burden and memory requirements. In disease mapping models, the other difficulty is inference, which is analytically intractable due to the non-Gaussian observation model. In this paper, we address both these challenges. We show how to efficiently build fully and partially independent conditional (FIC/PIC) sparse approximations for the GP in two-dimensional surface, and how to conduct approximate inference using expectation propagation (EP) algorithm and Laplace approximation (LA). We also propose to combine FIC with a compactly supported covariance function to construct a computationally efficient additive model that can model long and short length-scale spatial correlations simultaneously. The benefit of these approximations is computational. The sparse GPs speed up the computations and reduce the memory requirements. The posterior inference via EP and Laplace approximation is much faster and is practically as accurate as via Markov chain Monte Carlo.
Probabilistic bisection algorithms (PBA) perform root finding based on knowledge acquired from noisy oracle responses. We consider a generalized PBA setting (G-PBA) where the statistical distribution of the oracle is unknown and location-dependent, so that model inference and Bayesian knowledge updating must be carried out simultaneously. To this end, we propose to exploit the spatial structure of a typical oracle by constructing a statistical surrogate for the underlying logistic regression step. We investigate several non-parametric surrogates, including binomial Gaussian processes (B-GP), and polynomial, kernel, and spline logistic regression. In parallel, we develop strategies for adaptively balancing learning the oracle distribution and learning the root. One of our proposals mimics active learning with B-GPs and provides a novel look-ahead predictive variance formula. The gains of our spatial PBA algorithms over earlier G-PBA models are illustrated with synthetic examples and a challenging stochastic root-finding problem from Bermudan option pricing.
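For orientation, here is a sketch of a classical probabilistic-bisection step with a known oracle accuracy p; the G-PBA setting of the paper additionally learns the location-dependent oracle distribution, which is not shown.

```python
# Sketch of classical probabilistic bisection: query the posterior median,
# observe a noisy sign, and reweight the belief density over the root.
import numpy as np

grid = np.linspace(0.0, 1.0, 1000)
pdf = np.ones_like(grid) / grid.size   # prior belief over the root location
p, root = 0.7, 0.37                    # oracle accuracy and hidden root
rng = np.random.default_rng(6)

for _ in range(60):
    x = grid[np.searchsorted(np.cumsum(pdf), 0.5)]   # query the posterior median
    says_right = (root > x) == (rng.random() < p)    # noisy oracle response
    w = np.where(grid > x, p, 1 - p) if says_right else np.where(grid <= x, p, 1 - p)
    pdf = pdf * w
    pdf /= pdf.sum()                                 # Bayes update of the belief
```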
We present an improved Bayesian framework for performing inference on affine transformations of constrained functions. We focus on quadrature with non-negative functions, a common task in Bayesian inference. We consider constraints on the range of the function of interest, such as non-negativity or boundedness. While our framework is general, we derive explicit approximation schemes for these constraints, and argue for the use of the log transformation for functions with high dynamic range, such as likelihood surfaces. We propose a novel method for optimizing hyperparameters in this framework: we optimize the marginal likelihood in the original space, as opposed to in the transformed space. The result is a model that better explains the actual data. Experiments on synthetic and real-world data demonstrate that our framework achieves superior estimates using less wall-clock time than existing Bayesian quadrature procedures.