Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well-calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples from it directly. To efficiently optimize the hyperparameters, we introduce the Moving Window MCEM algorithm. This results in significantly better predictions at a lower computational cost than its VI counterpart. Thus our method establishes a new state-of-the-art for inference in DGPs.
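As a rough, self-contained illustration of the sampling step (a sketch, not the authors' implementation), a naive stochastic gradient Hamiltonian Monte Carlo update on a generic log-posterior can be written as follows; the callback name, step size, and friction value are all illustrative assumptions:

```python
import numpy as np

def sghmc(grad_log_post, theta0, n_iters=10000, step=0.01, friction=0.1, rng=None):
    """Minimal SGHMC sketch (Chen et al. 2014-style naive update): draws
    approximate samples from exp(log_post) using possibly noisy gradients.
    Hyperparameters here are illustrative placeholders."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)                 # momentum
    samples = []
    for _ in range(n_iters):
        grad = grad_log_post(theta)          # a minibatch gradient in practice
        # friction counteracts the extra noise injected by stochastic gradients
        noise = np.sqrt(2 * friction * step) * rng.standard_normal(theta.shape)
        v = (1 - friction) * v + step * grad + noise
        theta = theta + v
        samples.append(theta.copy())
    return np.array(samples)

# Toy usage: sample a 2-D standard Gaussian (grad log p(theta) = -theta).
draws = sghmc(lambda th: -th, theta0=np.zeros(2))
print(draws.mean(axis=0), draws.std(axis=0))
```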
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions and does not require detailed knowledge of the conditional likelihood, only needing to evaluate it as a black-box function. Using a Gaussian mixture as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved by using an augmented prior via inducing variables. The method supports most sparse GP approximations, as well as parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively on small, medium-scale, and large datasets, showing its competitiveness under different likelihood models and sparsity levels. On large-scale experiments involving the prediction of airline delays and the classification of handwritten digits, we show that our method is on par with state-of-the-art hard-coded approaches for scalable GP regression and classification.
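To make the black-box idea concrete, here is a minimal sketch, assuming a single Gaussian factor q(f) = N(m, v) for brevity rather than the paper's full Gaussian mixture: the expected log-likelihood term of the lower bound is estimated from univariate Gaussian samples, touching the likelihood only as a black box. All names below are illustrative:

```python
import numpy as np

def mc_expected_loglik(log_lik, m, v, y, n_samples=500, rng=None):
    """Monte Carlo estimate of E_{q(f)}[log p(y | f)] with q(f) = N(m, v),
    evaluating log_lik purely as a black box (no structure or gradients)."""
    rng = np.random.default_rng() if rng is None else rng
    f = m + np.sqrt(v) * rng.standard_normal((n_samples,) + np.shape(m))
    return np.mean([log_lik(y, fi) for fi in f], axis=0)

# Toy usage: Bernoulli likelihood with a logistic link, one latent value.
log_lik = lambda y, f: y * f - np.log1p(np.exp(f))   # log Bernoulli(sigmoid(f))
print(mc_expected_loglik(log_lik, m=0.3, v=0.5, y=1))
```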
Recent work on Bayesian optimization has shown its effectiveness in global optimization of difficult black-box objective functions. Many real-world optimization problems of interest also have constraints which are unknown a priori. In this paper, we study Bayesian optimization for constrained problems in the general case that noise may be present in the constraint functions, and the objective and constraints may be evaluated independently. We provide motivating practical examples, and present a general framework to solve such problems. We demonstrate the effectiveness of our approach on optimizing the performance of online latent Dirichlet allocation subject to topic sparsity constraints, tuning a neural network given test-time memory constraints, and optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed time, subject to passing standard convergence diagnostics.
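One common building block in this setting is expected improvement weighted by the probability of constraint satisfaction, under independent GP models for objective and constraint; the paper's framework is more general (e.g., noisy and decoupled evaluations), but a hedged sketch of that baseline reads:

```python
import numpy as np
from scipy.stats import norm

def constrained_ei(mu_f, var_f, best_feasible, mu_c, var_c):
    """Expected improvement times probability of feasibility (c(x) <= 0
    deemed feasible). mu/var are GP posterior moments at a candidate point;
    independent objective and constraint models are assumed (minimization)."""
    sd_f = np.sqrt(var_f)
    z = (best_feasible - mu_f) / sd_f
    ei = sd_f * (z * norm.cdf(z) + norm.pdf(z))       # closed-form EI
    prob_feasible = norm.cdf(-mu_c / np.sqrt(var_c))  # P(c(x) <= 0)
    return ei * prob_feasible

# Toy usage at a single candidate point.
print(constrained_ei(mu_f=0.2, var_f=0.3, best_feasible=0.5, mu_c=-0.1, var_c=0.2))
```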
We present an adaptive approach to the construction of Gaussian process surrogates for Bayesian inference with expensive-to-evaluate forward models. Our method relies on a fully Bayesian approach to training Gaussian process models and exploits the expected improvement idea from Bayesian global optimization. We adaptively construct the training design by maximizing the expected improvement of the Gaussian process model's fit to the noisy observational data. Numerical experiments on model problems with synthetic data demonstrate the effectiveness of the resulting adaptive designs, relative to fixed non-adaptive designs, in obtaining accurate posterior estimates at a reduced cost in forward-model evaluations.
We develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters which otherwise grow linearly in proportion to the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables handling datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.
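The effect of the reparametrization can be seen in a few lines: rather than storing one variational parameter set per data point (growing linearly with the sample size), a small network maps each input to its parameters, so the parameter count depends only on the network size. The architecture below is a hypothetical stand-in, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H = 100000, 5, 32   # many data points, small recognition network

# Recognition network weights: parameter count is independent of N.
W1, b1 = rng.standard_normal((D, H)) * 0.1, np.zeros(H)
W2, b2 = rng.standard_normal((H, 2)) * 0.1, np.zeros(2)

def recognition(X):
    """Map inputs to per-point variational parameters (mean, log-variance)."""
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    return out[:, 0], out[:, 1]      # q(f_n) = N(mean_n, exp(logvar_n))

X = rng.standard_normal((N, D))
mean, logvar = recognition(X)        # O(N) values, but not O(N) parameters
print(mean.shape, logvar.shape)
```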
Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.
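The central computational move can be sketched as propagating samples layer by layer, each layer drawing from a Gaussian conditioned on the previous layer's sample, so dependence between layers is preserved. The per-layer posteriors below are placeholders, not the actual sparse GP conditionals:

```python
import numpy as np

def sample_through_layers(X, layer_posteriors, rng=None):
    """Propagate one sample through a stack of GP layers. Each entry of
    layer_posteriors maps inputs to per-point posterior (mean, variance);
    sampling layer by layer keeps dependence between layers, unlike a
    fully factorized approximation."""
    rng = np.random.default_rng() if rng is None else rng
    F = X
    for posterior in layer_posteriors:
        mean, var = posterior(F)
        F = mean + np.sqrt(var) * rng.standard_normal(mean.shape)
    return F

# Toy usage: two "layers" with hand-rolled, purely illustrative posteriors.
toy_layer = lambda F: (np.sin(F).sum(axis=1, keepdims=True),
                       0.01 * np.ones((F.shape[0], 1)))
X = np.random.default_rng(1).standard_normal((4, 3))
print(sample_through_layers(X, [toy_layer, toy_layer]))
```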
Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to the gradient or a large number of likelihood evaluations. We introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). VBMC combines variational inference with Gaussian-process-based, active-sampling Bayesian quadrature, using the latter to efficiently approximate the intractable integral in the variational objective. Our method produces both a nonparametric approximation of the posterior distribution and an approximate lower bound of the model evidence, useful for model selection. We demonstrate VBMC on several synthetic likelihoods and on a neuronal model with data from real neurons. Across all tested problems and dimensions (up to $D = 10$), VBMC consistently reconstructs the posterior and the model evidence with a limited budget of likelihood evaluations, unlike other methods that work only in very low dimensions. Our framework thus serves as a novel tool for posterior and model inference with expensive, black-box likelihoods.
Conditional density estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GPs) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery to apply it to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows sharing of learned structure between conditions. We illustrate the effectiveness and wide applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on Omniglot images.
Acquiring information about noisy, expensive black-box functions (computer simulations or physical experiments) is a tremendously challenging problem. Limited computational and financial resources restrict the application of traditional methods for design of experiments. When the quantity of interest (QoI) in a problem depends on an expensive black-box function, the problem is further beset by hurdles such as numerical errors and stochastic approximation errors. Bayesian optimal design of experiments has been reasonably successful in guiding the designer towards the QoI for problems of this kind. This is usually achieved by sequentially querying the function at designs selected by an infill-sampling criterion compatible with utility theory. However, most current methods are semantically designed to work only on optimizing or inferring the black-box function itself. We aim to construct a heuristic that can unequivocally deal with the above problems, irrespective of the QoI. This paper applies such a heuristic to infer a specific QoI, namely the expectation (expected value) of the function. The Kullback-Leibler (KL) divergence is fairly conspicuous among the techniques used to quantify information gain. In this paper, we derive an expression for the expected KL divergence to sequentially infer our QoI. The analytical tractability provided by the Karhunen-Loève expansion around the Gaussian process (GP) representation of the black-box function allows us to circumvent the numerical issues associated with sample averaging. The proposed methodology can be extended to any QoI under reasonable assumptions. The proposed method is verified and validated on three synthetic functions with varying levels of complexity and dimensionality. We demonstrate our methodology on a steel wire manufacturing problem.
The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.
We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of treating a large number of training instances together with high-dimensional input data. Our key idea is a representation trick over the inducing variables, called subspace inducing inputs. This is combined with matrix-preconditioning-based parametrizations of the variational distributions, which lead to a simplified and numerically stable variational lower bound. Our illustrative applications are based on challenging extreme multi-label classification problems, with the extra burden of a very large number of class labels. We demonstrate the usefulness of our approach by presenting predictive performance together with low computational times in datasets with extremely large numbers of instances and input dimensions.
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.
Standard Gaussian processes (GPs) model observations' noise as constant throughout input space. This is often an overly restrictive assumption, but one that is needed for GP inference to be tractable. In this work we present a non-standard variational approximation that allows accurate inference in heteroscedastic GPs (i.e., under input-dependent noise conditions). Computational cost is roughly twice that of the standard GP, and also scales as O(n^3). Accuracy is verified by comparing with the gold standard MCMC, and effectiveness is illustrated on several synthetic and real datasets of diverse characteristics. An application to volatility forecasting is also considered.
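To make the modeling assumption concrete, a heteroscedastic GP can be read generatively as placing a second latent function on the log noise scale, so the observation variance varies over the input space. A minimal generative sketch under that assumption (not the paper's variational scheme):

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.6, variance=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100)[:, None]

# Two latent GPs: f for the signal, g for the log noise standard deviation.
K = rbf(X, X) + 1e-6 * np.eye(len(X))       # jitter for numerical stability
L = np.linalg.cholesky(K)
f = L @ rng.standard_normal(len(X))
g = L @ rng.standard_normal(len(X)) - 1.0   # input-dependent log noise scale

y = f + np.exp(g) * rng.standard_normal(len(X))  # noise std exp(g) varies with x
print(y[:5])
```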
We report on an empirical study of the main strategies for conditional quantile estimation in the context of stochastic computer experiments. To ensure sufficient diversity, six metamodels are presented, divided into three categories based on order statistics, functional approaches, and a Bayesian inspiration. The metamodels are tested on several problems characterized by the size of the training set, the input dimension, the order of the quantile, and the value of the probability density function in the neighborhood of the quantile. The metamodels studied reveal good contrasts across our 480 experiments, from which several patterns can be extracted. Based on our results, guidelines are proposed to allow users to select the best method for a given problem.
This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. This gives rise to an approximation that inherits the benefits of the variational approach but with the representational power and computational scalability of spectral representations. The work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances. We derive these expressions for Matérn kernels in one dimension, and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the dataset, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM^2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data and M is the number of features.
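The single-pass property has a simple mechanical reading: with M fixed features and Gaussian noise, one accumulates an M x M Gram matrix and an M-vector in one sweep over the data, then solves a small system. The sketch below uses generic trigonometric features as a placeholder for the paper's Matérn-specific variational Fourier features:

```python
import numpy as np

def fit_feature_gp(X, y, feature_fn, noise_var=0.1, prior_var=1.0):
    """One pass over the data: accumulate Phi^T Phi (M x M) and Phi^T y (M),
    then solve for the posterior mean weights. Cost is O(N M^2) + O(M^3)."""
    A, b, M = None, None, None
    for xi, yi in zip(X, y):                  # streaming pass; minibatches work too
        phi = feature_fn(xi)                  # M-dimensional feature vector
        if A is None:
            M = len(phi)
            A, b = np.zeros((M, M)), np.zeros(M)
        A += np.outer(phi, phi)
        b += phi * yi
    A = A / noise_var + np.eye(M) / prior_var
    return np.linalg.solve(A, b / noise_var)  # posterior mean of the weights

# Toy usage: cosine/sine features on [0, 1] (illustrative, not the Matern case).
freqs = np.arange(1, 6)
feat = lambda x: np.concatenate([np.cos(2 * np.pi * freqs * x),
                                 np.sin(2 * np.pi * freqs * x)])
X = np.random.default_rng(0).uniform(0, 1, 500)
y = np.sin(2 * np.pi * X) + 0.1 * np.random.default_rng(1).standard_normal(500)
print(fit_feature_gp(X, y, feat)[:4])
```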
We develop an efficient, Bayesian Uncertainty Quantification framework using a novel treed Gaussian process model. The tree is adaptively constructed using information conveyed by the observed data about the length scales of the underlying process. On each leaf of the tree, we utilize Bayesian Experimental Design techniques in order to learn a multi-output Gaussian process. The constructed surrogate can provide analytical point estimates, as well as error bars, for the statistics of interest. We numerically demonstrate the effectiveness of the suggested framework in identifying discontinuities, local features and unimportant dimensions in the solution of stochastic differential equations.
Multi-fidelity methods are prominently used when cheaply obtained, but possibly biased and noisy, observations must be effectively combined with limited or expensive true data in order to construct reliable models. This arises in both fundamental machine learning procedures, such as Bayesian optimization, and in more practical science and engineering applications. In this paper we develop a novel multi-fidelity model which treats the layers of a deep Gaussian process as fidelity levels and uses a variational inference scheme to propagate uncertainty across them. This allows nonlinear correlations between fidelities to be captured with a lower risk of overfitting than existing methods that exploit compositional structure, which are conversely burdened by structural assumptions and constraints. We show that the proposed approach achieves substantial improvements in quantifying and propagating uncertainty in multi-fidelity settings, which in turn improves the effectiveness of decision-making pipelines.
In many global optimization problems motivated by engineering applications, the number of function evaluations is severely limited by time or cost. To ensure that each evaluation contributes to the localization of good candidates for the role of global minimizer, a sequential choice of evaluation points is usually carried out. In particular, when Kriging is used to interpolate past evaluations, the uncertainty associated with the lack of information on the function can be expressed and used to compute a number of criteria accounting for the interest of an additional evaluation at any given point. This paper introduces minimizer entropy as a new Kriging-based criterion for the sequential choice of points at which the function should be evaluated. Based on stepwise uncertainty reduction, it accounts for the informational gain on the minimizer expected from a new evaluation. The criterion is approximated using conditional simulations of the Gaussian process model behind Kriging, and then inserted into an algorithm similar in spirit to the Efficient Global Optimization (EGO) algorithm. An empirical comparison is carried out between our criterion and expected improvement, one of the reference criteria in the literature. Experimental results indicate major evaluation savings over EGO. Finally, the method, which we call IAGO (for Informational Approach to Global Optimization), is extended to robust optimization problems, where both the factors to be tuned and the function evaluations are corrupted by noise.
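A simplified reading of the conditional-simulation step (illustrative only; the actual IAGO criterion measures the expected reduction of this entropy after a candidate evaluation): the distribution of the minimizer over a finite grid can be estimated by sampling posterior trajectories and recording where each attains its minimum.

```python
import numpy as np

def minimizer_entropy(mean, cov, n_sims=2000, rng=None):
    """Estimate the entropy of the minimizer's location over a finite candidate
    grid by sampling GP posterior trajectories and recording each argmin
    (a simplified sketch of the conditional-simulation idea)."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(mean)))
    sims = mean[None, :] + rng.standard_normal((n_sims, len(mean))) @ L.T
    counts = np.bincount(sims.argmin(axis=1), minlength=len(mean))
    p = counts / n_sims
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Toy usage on a 50-point grid with an RBF posterior covariance.
x = np.linspace(0, 1, 50)
cov = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2)
print(minimizer_entropy(np.sin(6 * x), cov))
```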