This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of the eigenfunctions of the Laplace operator on a compact subset of $\mathbb{R}^d$. On this approximate eigenbasis, the eigenvalues of the covariance function can be expressed as simple functions of the spectral density of the Gaussian process, which allows the GP inference to be solved at a computational cost scaling as $\mathcal{O}(nm^2)$ (initial) and $\mathcal{O}(m^3)$ (hyperparameter learning), with $m$ basis functions and $n$ data points. The method also allows for rigorous error analysis using Hilbert space theory, and we show that the approximation becomes exact when the size of the compact subset and the number of eigenfunctions go to infinity. The expansion generalises to Hilbert spaces with an inner product defined as an integral over a specified input density. The method is compared to previously proposed methods both theoretically and through empirical tests with simulated and real data.
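As a rough illustration of the construction summarised above, the following NumPy sketch implements the reduced-rank regression equations for a one-dimensional squared-exponential kernel on the interval $[-L, L]$; the kernel choice, function names and parameter values are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def hilbert_gp_fit(x, y, m=32, L=5.0, ell=0.5, sigma_f=1.0, sigma_n=0.1):
    """Reduced-rank GP regression with Laplacian eigenfunctions on [-L, L] (sketch).

    The covariance is approximated as k(x, x') ~ sum_j S(sqrt(lam_j)) phi_j(x) phi_j(x'),
    where (phi_j, lam_j) are Dirichlet eigenpairs of the negative Laplacian and S is
    the spectral density of the squared-exponential kernel.
    """
    j = np.arange(1, m + 1)
    sqrt_lam = np.pi * j / (2.0 * L)                      # square roots of the eigenvalues
    S = sigma_f**2 * np.sqrt(2.0 * np.pi) * ell * np.exp(-0.5 * (ell * sqrt_lam) ** 2)
    Phi = np.sqrt(1.0 / L) * np.sin(sqrt_lam[None, :] * (x[:, None] + L))   # n x m
    # Posterior over the m basis-function weights: O(n m^2) to form, O(m^3) to solve.
    A = Phi.T @ Phi + sigma_n**2 * np.diag(1.0 / S)
    w = np.linalg.solve(A, Phi.T @ y)
    return w, sqrt_lam

def hilbert_gp_predict(x_star, w, sqrt_lam, L=5.0):
    Phi_star = np.sqrt(1.0 / L) * np.sin(sqrt_lam[None, :] * (x_star[:, None] + L))
    return Phi_star @ w
```

Because the basis functions do not depend on the kernel hyperparameters, the feature matrix can be precomputed once and reused while `ell`, `sigma_f` and `sigma_n` are optimised, which is where the $\mathcal{O}(m^3)$ hyperparameter-learning cost comes from.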
In this paper we introduce a novel framework for making exact nonparametric Bayesian inference on latent functions that is particularly suitable for Big Data tasks. Firstly, we introduce a class of stochastic processes we refer to as string Gaussian processes (string GPs, which are not to be mistaken for Gaussian processes operating on text). We construct string GPs so that their finite-dimensional marginals exhibit suitable local conditional independence structures, which allow for scalable, distributed, and flexible nonparametric Bayesian inference, without resorting to approximations, and while ensuring some mild global regularity constraints. Furthermore, string GP priors naturally cope with heterogeneous input data, and the gradient of the learned latent function is readily available for explanatory analysis. Secondly, we provide some theoretical results relating our approach to the standard GP paradigm. In particular, we prove that some string GPs are Gaussian processes, which provides a complementary global perspective on our framework. Finally, we derive a scalable and distributed MCMC scheme for supervised learning tasks under string GP priors. The proposed MCMC scheme has computational time complexity O(N) and memory requirement O(dN), where N is the data size and d the dimension of the input space. We illustrate the efficacy of the proposed approach on several synthetic and real-world data sets, including a data set with 6 million input points and 8 attributes.
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
We present a new sparse Gaussian Process (GP) model for regression. The key novel idea is to sparsify the spectral representation of the GP. This leads to a simple, practical algorithm for regression tasks. We compare the achievable trade-offs between predictive accuracy and computational requirements, and show that these are typically superior to existing state-of-the-art sparse approximations. We discuss both the weight space and function space representations, and note that the new construction implies priors over functions which are always stationary, and can approximate any covariance function in this class.
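The following sketch illustrates the general flavour of a sparse spectral approximation for a squared-exponential kernel: spectral points are drawn from the kernel's spectral density and the GP reduces to Bayesian linear regression on trigonometric features. This is an assumed minimal reading of the idea, not the paper's algorithm; in particular it keeps the sampled frequencies fixed.

```python
import numpy as np

def sparse_spectrum_fit(X, y, num_freq=50, ell=1.0, sigma_f=1.0, sigma_n=0.1, seed=0):
    """Sparse-spectrum GP sketch: cosine/sine features at sampled spectral points."""
    rng = np.random.default_rng(seed)
    # For an SE kernel the spectral points are Gaussian with std 1/(2*pi*ell).
    S = rng.normal(scale=1.0 / (2.0 * np.pi * ell), size=(num_freq, X.shape[1]))
    proj = 2.0 * np.pi * X @ S.T
    Phi = np.hstack([np.cos(proj), np.sin(proj)])                 # n x 2m features
    prior_var = sigma_f**2 / num_freq                             # prior weight variance
    A = Phi.T @ Phi + (sigma_n**2 / prior_var) * np.eye(2 * num_freq)
    w = np.linalg.solve(A, Phi.T @ y)                             # posterior mean weights
    return w, S

def sparse_spectrum_predict(X_star, w, S):
    proj = 2.0 * np.pi * X_star @ S.T
    return np.hstack([np.cos(proj), np.sin(proj)]) @ w
```

The trade-off between predictive accuracy and computational cost that the abstract refers to is controlled by the number of spectral points, which `num_freq` stands in for here.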
The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.
Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification. This paper considers constraining GPs to arbitrarily-shaped domains with boundary conditions. We solve a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest, which both constrains the GP and attains a low-rank representation that is used for speeding up inference. The method scales as $\mathcal{O}(nm^2)$ in prediction and $\mathcal{O}(m^3)$ in hyperparameter learning for regression, where $n$ is the number of data points and $m$ the number of features. Furthermore, we make use of variational methods to allow the approach to deal with non-Gaussian likelihoods. The experiments cover both simulated and empirical data in which the boundary conditions allow for the inclusion of additional physical information.
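A small sketch of the one-dimensional special case, assuming Dirichlet (zero) boundary conditions on $[0, L]$: because every harmonic feature vanishes on the boundary, any function built from them, and hence every draw from the approximate prior, satisfies the constraint. The names and values below are illustrative only.

```python
import numpy as np

# Dirichlet eigenfunctions of the Laplacian on [0, L] vanish at both ends, so any
# linear combination of them, i.e. every sample of the reduced-rank GP prior built
# from these features, obeys f(0) = f(L) = 0.
L, m = 1.0, 8
x = np.linspace(0.0, L, 101)
j = np.arange(1, m + 1)
Phi = np.sqrt(2.0 / L) * np.sin(np.pi * j[None, :] * x[:, None] / L)

f_draw = Phi @ np.random.default_rng(0).normal(size=m)   # one random prior draw
print(f_draw[0], f_draw[-1])                              # numerically zero at the boundary
```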
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions and does not require detailed knowledge of the conditional likelihood, only that it can be evaluated as a black-box function. Using a Gaussian mixture as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved by using an augmented prior with inducing variables. The method supports the sparsest GP approximations, as well as parallel computation and stochastic optimisation. We evaluate our approach quantitatively and qualitatively on small, medium-sized and large datasets, showing its competitiveness under different likelihood models and sparsity levels. In large-scale experiments involving airline delay prediction and handwritten digit classification, we show that our method is on par with state-of-the-art hard-coded approaches for scalable GP regression and classification.
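A minimal sketch of the estimator hinted at above: the expected log-likelihood term of the evidence lower bound is approximated with univariate Gaussian samples, so the likelihood only has to be evaluated point-wise as a black box. The helper name and the Bernoulli-logit example are hypothetical choices for illustration.

```python
import numpy as np

def expected_log_lik(y, q_mean, q_var, log_lik, num_samples=64, seed=0):
    """MC estimate of sum_i E_{q(f_i)}[log p(y_i | f_i)] with a black-box likelihood."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((num_samples, y.size))
    f = q_mean[None, :] + np.sqrt(q_var)[None, :] * eps       # samples from the marginals
    return np.mean(np.sum(log_lik(y[None, :], f), axis=1))

# Example: a Bernoulli-logit likelihood treated purely as a black box.
log_lik = lambda y, f: y * f - np.log1p(np.exp(f))
y = np.array([1.0, 0.0, 1.0])
print(expected_log_lik(y, q_mean=np.zeros(3), q_var=np.ones(3), log_lik=log_lik))
```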
The use of covariance kernels is ubiquitous in the field of spatial statistics. Kernels allow data to be mapped into high-dimensional feature spaces and can thus extend simple linear additive methods to nonlinear methods with higher order interactions. However, until recently, there has been a strong reliance on a limited class of stationary kernels such as the Matérn or squared exponential, limiting the expressiveness of these modelling approaches. Recent machine learning research has focused on spectral representations to model arbitrary stationary kernels and introduced more general representations that include classes of nonstationary kernels. In this paper, we exploit the connections between Fourier feature representations, Gaussian processes and neural networks to generalise previous approaches and develop a simple and efficient framework to learn arbitrarily complex nonstationary kernel functions directly from the data, while taking care to avoid overfitting using state-of-the-art methods from deep learning. We highlight the very broad array of kernel classes that could be created within this framework. We apply this to a time series dataset and a remote sensing problem involving land surface temperature in Eastern Africa. We show that without increasing the computational or storage complexity, nonstationary kernels can be used to improve generalisation performance and provide more interpretable results.
We present a practical way of introducing convolutional structure into Gaussian processes, making them more suited to high-dimensional inputs like images. The main contribution of our work is the construction of an inter-domain inducing point approximation that is well-tailored to the convolutional kernel. This allows us to gain the generalisation benefit of a convolutional kernel, together with fast but accurate posterior inference. We investigate several variations of the convolutional kernel, and apply it to MNIST and CIFAR-10, which have both been known to be challenging for Gaussian processes. We also show how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance. We hope that this illustration of the usefulness of a marginal likelihood will help automate discovering architectures in larger models.
Standard sparse pseudo-input approximations to the Gaussian process (GP) cannot handle complex functions well. Sparse spectrum alternatives attempt to answer this but are known to over-fit. We suggest the use of variational inference for the sparse spectrum approximation to avoid both issues. We model the covariance function with a finite Fourier series approximation and treat it as a random variable. The random covariance function has a posterior, on which a variational distribution is placed. The variational distribution transforms the random covariance function to fit the data. We study the properties of our approximate inference, compare it to alternative ones, and extend it to the distributed and stochastic domains. Our approximation captures complex functions better than standard approaches and avoids over-fitting.
Gaussian process classification is a popular method with a number of appealing properties. We show how to scale the model within a variational inducing point framework, outperforming the state of the art on benchmark datasets. Importantly, the variational formulation can be exploited to allow classification in problems with millions of data points, as we demonstrate in experiments.
Gaussian process (GP) models form a core part of probabilistic machine learning. Considerable research effort has been made into attacking three issues with GP models: how to compute efficiently when the number of data is large; how to approximate the posterior when the likelihood is not Gaussian; and how to estimate covariance function parameter posteriors. This paper simultaneously addresses these, using a variational approximation to the posterior which is sparse in support of the function but otherwise free-form. The result is a Hybrid Monte-Carlo sampling scheme which allows for a non-Gaussian approximation over the function values and covariance parameters simultaneously, with efficient computations based on inducing-point sparse GPs. Code to replicate each experiment in this paper will be available shortly.
We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
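As a concrete (and heavily simplified) instance of the idea, the sketch below fits a factorised Gaussian approximation by stochastically minimising KL(q || p) when the target is available only up to a constant; it assumes access to the gradient of the unnormalised log posterior and uses a single-sample reparameterised gradient, which is one possible realisation rather than the paper's algorithm.

```python
import numpy as np

def fit_gaussian_kl(grad_log_p, dim, steps=3000, lr=0.02, seed=0):
    """Minimise KL(q || p) for q = N(mu, diag(exp(2*log_s))), p known up to a constant."""
    rng = np.random.default_rng(seed)
    mu, log_s = np.zeros(dim), np.zeros(dim)
    for _ in range(steps):
        eps = rng.standard_normal(dim)
        x = mu + np.exp(log_s) * eps                 # reparameterised sample from q
        g = grad_log_p(x)                            # gradient of the unnormalised log target
        grad_mu = -g                                 # single-sample KL gradient wrt mu
        grad_log_s = -1.0 - g * eps * np.exp(log_s)  # ... and wrt log_s (entropy gives the -1)
        mu -= lr * grad_mu
        log_s -= lr * grad_log_s
    return mu, np.exp(log_s)

# Example: approximate a correlated zero-mean Gaussian with precision matrix A.
A = np.array([[2.0, 0.9], [0.9, 1.0]])
mu_hat, s_hat = fit_gaussian_kl(lambda x: -A @ x, dim=2)
print(mu_hat, s_hat)          # mean roughly 0, scales roughly 1/sqrt(diag(A))
```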
Gaussian processes (GPs) are flexible distributions over functions that enable high-level assumptions about unknown functions to be encoded in a parsimonious, flexible and general way. Although elegant, the application of GPs is limited by computational and analytical intractabilities that arise when data are sufficiently numerous or when employing non-Gaussian models. Consequently, a wealth of GP approximation schemes have been developed over the last 15 years to address these key limitations. Many of these schemes employ a small set of pseudo data points to summarise the actual data. In this paper we develop a new pseudo-point approximation framework using Power Expectation Propagation (Power EP) that unifies a large number of these pseudo-point approximations. Unlike much of the previous venerable work in this area, the new framework is built on standard methods for approximate inference (variational free-energy, EP and Power EP methods) rather than employing approximations to the probabilistic generative model itself. In this way all of the approximation is performed at 'inference time' rather than at 'modelling time', resolving awkward philosophical and empirical questions that trouble previous approaches. Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks.
Gaussian process (GP) models are widely used in disease mapping as they provide a natural framework for modeling spatial correlations. Their challenges, however, lie in computational burden and memory requirements. In disease mapping models, the other difficulty is inference, which is analytically intractable due to the non-Gaussian observation model. In this paper, we address both these challenges. We show how to efficiently build fully and partially independent conditional (FIC/PIC) sparse approximations for the GP on a two-dimensional surface, and how to conduct approximate inference using the expectation propagation (EP) algorithm and the Laplace approximation (LA). We also propose to combine FIC with a compactly supported covariance function to construct a computationally efficient additive model that can model long and short length-scale spatial correlations simultaneously. The benefit of these approximations is computational. The sparse GPs speed up the computations and reduce the memory requirements. The posterior inference via EP and Laplace approximation is much faster and is practically as accurate as via Markov chain Monte Carlo.
We propose a novel approach for nonlinear regression using a two-layer neural network (NN) model structure with sparsity-favoring hierarchical priors on the network weights. We present an expectation propagation (EP) approach for approximate integration over the posterior distribution of the weights, the hierarchical scale parameters of the priors, and the residual scale. Using a factorized posterior approximation we derive a computationally efficient algorithm, whose complexity scales similarly to an ensemble of independent sparse linear models. The approach enables flexible definition of weight priors with different sparseness properties such as independent Laplace priors with a common scale parameter or Gaussian automatic relevance determination (ARD) priors with different relevance parameters for all inputs. The approach can be extended beyond standard activation functions and NN model structures to form flexible nonlinear predictors from multiple sparse linear models. The effects of the hierarchical priors and the predictive performance of the algorithm are assessed using both simulated and real-world data. Comparisons are made to two alternative models with ARD priors: a Gaussian process with a NN covariance function and marginal maximum a posteriori estimates of the relevance parameters, and a NN with Markov chain Monte Carlo integration over all the unknown model parameters.
The growing field of large-scale time domain astronomy requires methods for probabilistic data analysis that are computationally tractable, even with large datasets. Gaussian Processes are a popular class of models used for this purpose but, since the computational cost scales, in general, as the cube of the number of data points, their application has been limited to small datasets. In this paper, we present a novel method for Gaussian Process modeling in one dimension where the computational requirements scale linearly with the size of the dataset. We demonstrate the method by applying it to simulated and real astronomical time series datasets. These demonstrations are examples of probabilistic inference of stellar rotation periods, asteroseismic oscillation spectra, and transiting planet parameters. The method exploits structure in the problem when the covariance function is expressed as a mixture of complex exponentials, without requiring evenly spaced observations or uniform noise. This form of covariance arises naturally when the process is a mixture of stochastically-driven damped harmonic oscillators (providing a physical motivation for and interpretation of this choice), but we also demonstrate that it can be a useful effective model in some other cases. We present a mathematical description of the method and compare it to existing scalable Gaussian Process methods. The method is fast and interpretable, with a range of potential applications within astronomical data analysis and beyond. We provide well-tested and documented open-source implementations of this method in C++, Python, and Julia.
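A small sketch of the covariance family the method is built around, assuming a single term: exponentials multiplied by sinusoids, the kernel of a stochastically driven damped harmonic oscillator. The naive Cholesky-based likelihood below is only a reference point; the paper's contribution is computing the same quantity in linear time by exploiting the semiseparable structure of this matrix, which is not reproduced here. All names and parameter values are illustrative.

```python
import numpy as np

def mixture_exponential_kernel(tau, a, b, c, d):
    """k(tau) = sum_j a_j exp(-c_j |tau|) cos(d_j |tau|) + b_j exp(-c_j |tau|) sin(d_j |tau|)."""
    tau = np.abs(np.asarray(tau))[..., None]
    return np.sum(np.exp(-c * tau) * (a * np.cos(d * tau) + b * np.sin(d * tau)), axis=-1)

def naive_log_likelihood(t, y, yerr, a, b, c, d):
    """O(N^3) reference; the paper's algorithm evaluates the same quantity in O(N)."""
    K = mixture_exponential_kernel(t[:, None] - t[None, :], a, b, c, d)
    K[np.diag_indices_from(K)] += yerr**2
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * (y @ alpha) - np.sum(np.log(np.diag(L))) - 0.5 * len(y) * np.log(2 * np.pi)

t = np.sort(np.random.default_rng(1).uniform(0, 10, 50))
y = np.sin(t) + 0.1 * np.random.default_rng(2).standard_normal(50)
print(naive_log_likelihood(t, y, 0.1 * np.ones(50),
                           a=np.array([1.0]), b=np.array([0.1]),
                           c=np.array([0.5]), d=np.array([2.0])))
```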
Kernel methods are a highly popular technique for extending linear models to nonlinear problems through a mapping to an implicit, high-dimensional feature space. While kernel methods are computationally cheaper than an explicit feature mapping, they still suffer from a cost that is cubic in the number of points. Given only a few thousand locations, this computational cost rapidly outstrips the computing power currently available. The aim of this paper is to provide an overview of kernel methods from first principles (with a focus on ridge regression), before progressing to a review of random Fourier features (RFF), a set of methods that enables kernel methods to scale to big datasets. At each stage, the associated R code is provided. We begin by showing how the dual representation of ridge regression depends only on inner products and allows the use of kernels to map the data into a high-dimensional space. We progress to RFF, showing how only a few lines of code provide a significant computational speed-up for a negligible cost in accuracy. We provide an example of RFF implemented on a simulated spatial dataset to illustrate these properties. Finally, we summarise the main issues with RFF and highlight some advanced techniques aimed at alleviating them.
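The paper supplies its examples in R; the sketch below is a loose Python analogue of the two steps it walks through, with a dual-form (Gram-matrix) ridge regression followed by the RFF approximation that replaces the $n \times n$ kernel solve with an $m \times m$ one. The kernel choice, data and parameter values are made up for illustration.

```python
import numpy as np

def kernel_ridge(X, y, X_star, ell=1.0, lam=1e-2):
    """Dual-form ridge regression: only inner products (via the kernel) are needed."""
    sq = lambda A, B: np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    K = np.exp(-sq(X, X) / (2 * ell**2))              # n x n Gram matrix, O(n^3) solve
    K_star = np.exp(-sq(X_star, X) / (2 * ell**2))
    return K_star @ np.linalg.solve(K + lam * np.eye(len(X)), y)

def rff_ridge(X, y, X_star, num_feat=200, ell=1.0, lam=1e-2, seed=0):
    """Same model, with the RBF kernel approximated by random Fourier features."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / ell, size=(num_feat, X.shape[1]))
    b = rng.uniform(0.0, 2.0 * np.pi, num_feat)
    z = lambda A: np.sqrt(2.0 / num_feat) * np.cos(A @ W.T + b)
    Z, Z_star = z(X), z(X_star)
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(num_feat), Z.T @ y)   # m x m solve
    return Z_star @ w

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
X_star = np.linspace(-3, 3, 5)[:, None]
print(kernel_ridge(X, y, X_star))
print(rff_ridge(X, y, X_star))
```

With only 500 points the two solves cost about the same, but the RFF version scales with the number of features rather than the number of data points, which is the scaling argument the paper develops.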
We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally we point directly to designs of new better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.

Regression models based on Gaussian processes (GPs) are simple to implement, flexible, fully probabilistic models, and thus a powerful tool in many areas of application. Their main limitation is that memory requirements and computational demands grow as the square and cube, respectively, of the number of training cases n, effectively limiting a direct implementation to problems with at most a few thousand cases. To overcome the computational limitations numerous authors have recently suggested a wealth of sparse approximations. Common to all these approximation schemes is that only a subset of the latent variables are treated exactly, and the remaining variables are given some approximate, but computationally cheaper treatment. However, the published algorithms have widely different motivations, emphasis and exposition, so it is difficult to get an overview (see Rasmussen and Williams, 2006, chapter 8) of how they relate to each other, and which can be expected to give rise to the best algorithms. In this paper we provide a unifying view of sparse approximations for GP regression. Our approach is simple, but powerful: for each algorithm we analyze the posterior, and compute the effective prior which it is using. Thus, we reinterpret the algorithms as "exact inference with an approximated prior", rather than the existing (ubiquitous) interpretation "approximate inference with the exact prior". This approach has the advantage of directly expressing the approximations in terms of prior assumptions about the function, which makes the consequences of the approximations much easier to understand. While our view of the approximations is not the only one possible, it has the advantage of putting all existing probabilistic sparse approximations under one umbrella, thus enabling direct comparison and revealing the relation between them. In Section 1 we briefly introduce GP models for regression. In Section 2 we present our unifying framework and write out the key equations in preparation for the unifying analysis of sparse approximations.
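As a pointer to what "effective prior" means in practice, the sketch below writes down two well-known examples built from a set of inducing inputs: the Nystrom-type covariance $Q_{ff} = K_{fu} K_{uu}^{-1} K_{uf}$ used as-is by the subset-of-regressors/DTC family, and the FITC variant that restores the exact prior variances on the diagonal. The RBF kernel and all values are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * ell**2))

def effective_priors(X, X_u, ell=1.0):
    """Effective prior covariances over f(X) implied by two pseudo-point schemes.

    Q_ff = K_fu K_uu^{-1} K_uf is the low-rank term; SoR/DTC uses Q_ff itself as the
    prior covariance, while FITC corrects its diagonal back to the exact K_ff diagonal."""
    K_ff = rbf(X, X)
    K_fu = rbf(X, X_u)
    K_uu = rbf(X_u, X_u) + 1e-9 * np.eye(len(X_u))     # jitter for numerical stability
    Q_ff = K_fu @ np.linalg.solve(K_uu, K_fu.T)
    sor_prior = Q_ff
    fitc_prior = Q_ff + np.diag(np.diag(K_ff - Q_ff))
    return K_ff, sor_prior, fitc_prior

X = np.linspace(0, 5, 50)[:, None]
X_u = np.linspace(0, 5, 7)[:, None]                    # inducing inputs
K_ff, sor, fitc = effective_priors(X, X_u)
print(np.abs(K_ff - sor).max(), np.abs(np.diag(K_ff) - np.diag(fitc)).max())
```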