We propose a sparse method for scalable automated variational inference (AVI) in a large class of models with Gaussian process (GP) priors, multiple latent functions, multiple outputs, and non-linear likelihoods. Our approach maintains the statistical efficiency property of the original AVI method, requiring only expectations over univariate Gaussian distributions to approximate the posterior with a mixture of Gaussians. Experiments on small datasets for various problems, including regression, classification, log Gaussian Cox processes, and warped GPs, show that our method can perform as well as the full method under high sparsity levels. In larger experiments on the MNIST and SARCOS datasets, we show that our method can provide superior performance to previously published scalable approaches that have been handcrafted for specific likelihood models.
We develop an automated variational method for approximate inference in Gaussian process (GP) models whose posteriors are often intractable. Using a mixture of Gaussians as the variational distribution, we show that (i) the variational objective and its gradients can be approximated efficiently via sampling from univariate Gaussian distributions and (ii) the gradients with respect to the GP hyperparameters can be obtained analytically regardless of the model likelihood. We further propose two instances of the variational distribution whose covariance matrices can be parametrized linearly in the number of observations. These results allow gradient-based optimization to be done efficiently in a black-box manner. Our approach is thoroughly verified on five models using six benchmark datasets, performing as well as the exact or hard-coded implementations while running orders of magnitude faster than the alternative MCMC sampling approaches. Our method can be a valuable tool for practitioners and researchers to investigate new models with minimal effort in deriving model-specific inference algorithms.
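As a hedged illustration of point (i), the expectation of the log-likelihood under the variational posterior can be estimated using only draws from univariate Gaussians, one per data point. The sketch below assumes per-point marginals q(f_n) = N(m_n, s_n^2) and a generic pointwise log-likelihood; the names and signatures are illustrative, not the paper's implementation:

```python
import numpy as np

def expected_log_lik(y, m, s, log_lik, n_samples=100, seed=0):
    """Monte Carlo estimate of sum_n E_{q(f_n)}[log p(y_n | f_n)]
    using only univariate Gaussian draws f_n ~ N(m_n, s_n^2)."""
    rng = np.random.default_rng(seed)
    f = m + s * rng.standard_normal((n_samples, len(m)))  # (S, N) samples
    return log_lik(y, f).mean(axis=0).sum()

# Example pointwise likelihood: Bernoulli with a logit link, y in {0, 1}
def bernoulli_logit(y, f):
    return y * f - np.logaddexp(0.0, f)
```

Because the estimator only ever touches one-dimensional Gaussians, it applies unchanged to any likelihood that factorizes over data points.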
We investigate the capabilities and limitations of Gaussian process models by jointly exploring three complementary directions: (i) scalable and statistically efficient inference; (ii) flexible kernels; and (iii) objective functions for hyperparameter learning alternative to the marginal likelihood. Our approach outperforms all previously reported GP methods on the standard MNIST dataset; performs comparably to previous kernel-based methods on the rectangles-image dataset; and breaks the 1% error-rate barrier in GP models on the MNIST8M dataset, showing along the way the scalability of our method at an unprecedented scale for GP models (8 million observations) in classification problems. Overall, our approach represents a significant breakthrough in kernel methods and GP models, bridging the gap between deep learning approaches and kernel machines.
Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and therefore require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully applied to various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean-field variational inference and then review recent advances in (a) scalable VI, including stochastic approximations; (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models; (c) accurate VI, which includes variational models beyond the mean-field approximation or employing atypical divergences; and (d) amortized VI, which implements inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
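For reference, the standard mean-field setting that the review begins with maximizes the evidence lower bound (ELBO) over a fully factorized variational family; in common (assumed) notation:

```latex
\mathcal{L}(q) = \mathbb{E}_{q(\mathbf{z})}[\log p(\mathbf{x}, \mathbf{z})]
              - \mathbb{E}_{q(\mathbf{z})}[\log q(\mathbf{z})]
              \le \log p(\mathbf{x}),
\qquad q(\mathbf{z}) = \prod_{i} q_i(z_i).
```

The four directions (a)-(d) can each be read as relaxing one ingredient of this setup: the optimizer, the model class, the family q, or the treatment of local latent variables.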
We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of handling a large number of training instances together with high-dimensional input data. Our key idea is a representation trick over the inducing variables, called subspace inducing inputs. This is combined with a matrix-preconditioning-based parametrization of the variational distribution, which leads to simplified and numerically stable variational lower bounds. Our illustrative applications are based on challenging extreme multi-label classification problems, with the additional burden of a very large number of class labels. We demonstrate the usefulness of our approach by reporting predictive performance together with low computational times on datasets with extremely large numbers of instances and input dimensions.
The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.
We generalize the log Gaussian Cox process (LGCP) framework to model multiple correlated point data jointly. The observations are treated as realizations of multiple LGCPs, whose log intensities are given by linear combinations of latent functions drawn from Gaussian process priors. The combination coefficients are also drawn from Gaussian processes and can incorporate additional dependencies. We derive closed-form expressions for the moments of the intensity functions and develop an efficient variational inference algorithm that is orders of magnitude faster than competing deterministic and stochastic approximations of multivariate LGCPs, coregionalization models, and multi-task permanental processes. Our approach outperforms these benchmarks in multiple problems, offering the current state of the art in modeling multivariate point processes.
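Concretely, the model sketched in this abstract can be written as follows (notation assumed here; d indexes the correlated point processes and q the shared latent functions):

```latex
\lambda_d(x) = \exp\!\Big( \sum_{q=1}^{Q} w_{dq}(x)\, f_q(x) \Big),
\qquad f_q \sim \mathcal{GP}(0, k_f), \quad w_{dq} \sim \mathcal{GP}(0, k_w),
```

with the events of process d modeled as a Poisson process with intensity \lambda_d(x).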
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes, and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
Gaussian processes (GPs) are flexible distributions over functions that enable high-level assumptions about unknown functions to be encoded in a parsimonious, flexible and general way. Although elegant, the application of GPs is limited by computational and analytical intractabilities that arise when data are sufficiently numerous or when employing non-Gaussian models. Consequently, a wealth of GP approximation schemes have been developed over the last 15 years to address these key limitations. Many of these schemes employ a small set of pseudo data points to summarise the actual data. In this paper we develop a new pseudo-point approximation framework using Power Expectation Propagation (Power EP) that unifies a large number of these pseudo-point approximations. Unlike much of the previous venerable work in this area, the new framework is built on standard methods for approximate inference (variational free-energy, EP and Power EP methods) rather than employing approximations to the probabilistic generative model itself. In this way all of the approximation is performed at 'inference time' rather than at 'modelling time', resolving awkward philosophical and empirical questions that trouble previous approaches. Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks.
Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.
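A minimal sketch of the doubly stochastic idea, under assumed notation: a joint posterior sample is propagated layer by layer, so each layer's input is a sample from the layer below rather than an independent approximation. The per-layer moment functions stand in for sparse-GP predictive equations and are placeholders, not the authors' code:

```python
import numpy as np

def sample_dgp(x, layers, seed=0):
    """Draw one sample from a DGP posterior by ancestral sampling:
    layers is a list of (mean_fn, var_fn) pairs giving each GP layer's
    predictive moments conditioned on the sampled input from below."""
    rng = np.random.default_rng(seed)
    h = x
    for mean_fn, var_fn in layers:
        m, v = mean_fn(h), var_fn(h)
        h = m + np.sqrt(v) * rng.standard_normal(m.shape)  # reparameterized draw
    return h
```

Averaging the likelihood over such samples gives one source of stochasticity; mini-batching over data gives the other, hence "doubly stochastic".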
Gaussian process classification is a popular method with a number of appealing properties. We show how to scale the model within a variational inducing point framework, outperforming the state of the art on benchmark datasets. Importantly, the variational formulation can be exploited to allow classification in problems with millions of data points, as we demonstrate in experiments.
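The variational inducing-point bound that makes this possible decomposes into a sum over data points plus a single KL term, so it can be optimized with mini-batches; in standard (assumed) notation with inducing variables u:

```latex
\mathcal{L} = \sum_{n=1}^{N} \mathbb{E}_{q(f_n)}\big[\log p(y_n \mid f_n)\big]
            - \mathrm{KL}\big[\, q(\mathbf{u}) \,\|\, p(\mathbf{u}) \,\big].
```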
We consider multi-task regression models in which the observations are assumed to be linear combinations of several latent node functions and weight functions, all drawn from Gaussian process priors. Motivated by the problem of developing scalable methods for forecasting distributed solar and other renewable power generation, we propose coupled priors over groups of (node or weight) processes to exploit spatial dependence between functions. Forecasts are evaluated on solar power at multiple distributed sites and on ground wind speed at multiple proximate weather stations. Our results show that our approach maintains or improves point-prediction accuracy relative to the solar benchmarks, and improves over the benchmark models on all metrics for wind forecasting. At the same time, our approach provides better quantification of predictive uncertainty.
We provide a comprehensive overview of many recent algorithms for approximate inference in Gaussian process models for probabilistic binary classification. The relationships between several approaches are elucidated theoretically, and the properties of the different algorithms are corroborated by experimental results. We examine both 1) the quality of the predictive distributions and 2) the suitability of the different marginal likelihood approximations for model selection (selecting hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods produce good predictive distributions although their marginal likelihood approximations are poor. Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost always the method of choice unless the computational budget is very tight. We also extend existing methods in various ways, and provide unifying code implementing all approaches.
We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.
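A hedged sketch of one such stochastic update for a diagonal Gaussian variational distribution, using gradient information from the log joint density as described; the names and learning-rate scheme are illustrative only:

```python
import numpy as np

def svi_step(mu, log_s, grad_log_joint, lr=1e-3, rng=None):
    """One reparameterized gradient-ascent step on the ELBO for
    q(theta) = N(mu, diag(exp(log_s))^2)."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(mu.shape)
    theta = mu + np.exp(log_s) * eps               # reparameterized draw from q
    g = grad_log_joint(theta)                      # gradient of log p(x, theta)
    new_mu = mu + lr * g                           # d theta / d mu = I
    new_log_s = log_s + lr * (g * eps * np.exp(log_s) + 1.0)  # + entropy gradient
    return new_mu, new_log_s
```

The `+ 1.0` term is the exact gradient of the Gaussian entropy with respect to log_s, so the model only needs to supply the gradient of its joint density.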
We present a novel extension of multi-output Gaussian processes for handling heterogeneous outputs. We assume that each output has its own likelihood function and use a vector-valued Gaussian process prior to jointly model the parameters in all likelihoods as latent functions. Our multi-output Gaussian process uses a covariance function with a linear model of coregionalisation form. Assuming conditional independence across the underlying latent functions together with an inducing variable framework, we are able to obtain tractable variational bounds amenable to stochastic variational inference. We illustrate the performance of the model on synthetic data and two real datasets: a human behavioral study and a demographic high-dimensional dataset.
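The linear model of coregionalisation form mentioned here makes each output-specific latent function a fixed linear combination of shared latent GPs, which is what induces cross-output correlations (assumed notation):

```latex
f_d(x) = \sum_{q=1}^{Q} a_{d,q}\, u_q(x),
\qquad
\operatorname{cov}\big[f_d(x), f_{d'}(x')\big]
  = \sum_{q=1}^{Q} a_{d,q}\, a_{d',q}\, k_q(x, x').
```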
We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
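Since the target is available only up to a constant, the minimized objective can be written in terms of the unnormalized posterior \tilde{p}(\theta) \propto p(\theta \mid x) (assumed notation):

```latex
\mathrm{KL}\big[\, q \,\|\, p(\cdot \mid x) \,\big]
  = \mathbb{E}_{q}[\log q(\theta)] - \mathbb{E}_{q}[\log \tilde{p}(\theta)] + \log Z,
```

where the normalizing constant log Z does not depend on q, so minimizing the first two terms suffices.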
This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. This gives rise to an approximation that inherits the benefits of the variational approach but with the representational power and computational scalability of spectral representations. The work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances. We derive these expressions for Matérn kernels in one dimension, and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the dataset, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM^2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data and M is the number of features.
Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. However, practitioners face a fragmented literature that offers a bewildering array of algorithmic choices: first, the variational family; second, the granularity of the updates, e.g. whether they are local to each data point and employ message passing, or global; third, the optimization method (bespoke or black-box, closed-form or stochastic updates, etc.). This paper presents a new framework, termed Partitioned Variational Inference (PVI), that explicitly acknowledges these algorithmic dimensions of VI, unifies disparate literature, and provides guidance on usage. Crucially, the proposed PVI framework allows us to identify new ways of performing VI that are well suited to challenging learning scenarios, including federated learning (where distributed computation is used to process non-centralized data) and continual learning (where new data and tasks arrive over time and must be accommodated quickly). We showcase these new capabilities by developing communication-efficient federated training of Bayesian neural networks and continual learning for Gaussian process models with private pseudo-points. The new methods significantly outperform the state of the art, whilst being almost as straightforward to implement as standard VI.
Gaussian process (GP) models are widely used in disease mapping as they provide a natural framework for modeling spatial correlations. Their challenges, however, lie in computational burden and memory requirements. In disease mapping models, the other difficulty is inference, which is analytically intractable due to the non-Gaussian observation model. In this paper, we address both these challenges. We show how to efficiently build fully and partially independent conditional (FIC/PIC) sparse approximations for the GP on a two-dimensional surface, and how to conduct approximate inference using the expectation propagation (EP) algorithm and the Laplace approximation (LA). We also propose to combine FIC with a compactly supported covariance function to construct a computationally efficient additive model that can model long and short length-scale spatial correlations simultaneously. The benefit of these approximations is computational. The sparse GPs speed up the computations and reduce the memory requirements. The posterior inference via EP and Laplace approximation is much faster and is practically as accurate as via Markov chain Monte Carlo.
Deep Gaussian processes (DGPs) are hierarchical generalizations of Gaussian processes that combine well-calibrated uncertainty estimates with the high flexibility of multi-layer models. One of the biggest challenges for these models is that exact inference is intractable. The current state-of-the-art inference method, variational inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation to what is typically a multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and apply a stochastic gradient Hamiltonian Monte Carlo method to sample directly from it. To efficiently optimize the hyperparameters, we introduce the moving-window MCEM algorithm. Compared with its VI counterpart, this yields better predictions at similar computational cost. Our method thus establishes a new state of the art for inference in DGPs.