We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of handling a large number of training instances together with high-dimensional input data. Our key idea is a representation trick over the inducing variables, called subspace inducing inputs. This is combined with a matrix-preconditioned parameterisation of the variational distribution, which leads to simplified and numerically stable variational lower bounds. Our illustrative applications are based on challenging extreme multi-label classification problems with the extra burden of a very large number of class labels. We demonstrate the usefulness of our approach by presenting predictive performance together with low computational times on datasets with extremely large numbers of instances and input dimensions.
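The abstract does not spell out the construction, but one way to read "subspace inducing inputs", sketched below purely as an assumption rather than the authors' implementation, is that the inducing inputs live in a low-dimensional linear subspace of the input space, so that after a single projection of the data every kernel computation is independent of the original input dimension. The RBF kernel, the projection matrix W, and all sizes are illustrative.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of A and rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

N, D, d, M = 1000, 5000, 20, 100              # many instances, very high input dimension
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))               # training inputs
W = rng.standard_normal((D, d)) / np.sqrt(D)  # projection to a d-dim subspace (learned in practice)
Z = rng.standard_normal((M, d))               # "subspace inducing inputs" live directly in R^d

# All kernel matrices are built in the d-dimensional subspace, so after the one-off
# projection of the data the cost no longer scales with the input dimension D.
Xp  = X @ W                                   # project the data once: O(N D d)
Kuu = rbf(Z, Z)                               # M x M
Kuf = rbf(Z, Xp)                              # M x N
print(Kuu.shape, Kuf.shape)
```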
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions, and requires no detailed knowledge of the conditional likelihood, only that it can be evaluated as a black-box function. Using a Gaussian mixture as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved by augmenting the prior with inducing variables. The method supports highly sparse GP approximations, as well as parallel computation and stochastic optimisation. We evaluate our approach quantitatively and qualitatively on small, medium-scale, and large datasets, showing its competitiveness under different likelihood models and sparsity levels. In large-scale experiments involving flight-delay prediction and handwritten-digit classification, we show that our method is on par with state-of-the-art hard-coded approaches for scalable GP regression and classification.
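A minimal sketch of the sampling trick stated above: because only the univariate marginals q(f_n) = N(mu_n, s_n^2) enter the expected log-likelihood term, the lower bound can be estimated with one-dimensional Gaussian samples while the likelihood is treated as a black box. The Bernoulli likelihood and the marginal means and variances below are placeholders, not taken from the paper.

```python
import numpy as np

def expected_log_lik(y, mu, var, log_lik, num_samples=64, rng=None):
    """Monte Carlo estimate of sum_n E_{q(f_n)}[log p(y_n | f_n)] using only
    univariate Gaussian samples; `log_lik` is treated as a black box."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((num_samples, len(y)))   # S x N standard normals
    f = mu[None, :] + np.sqrt(var)[None, :] * eps      # samples from each marginal q(f_n)
    return np.mean(np.sum(log_lik(y[None, :], f), axis=1))

# Example black-box likelihood: Bernoulli with logit f, y in {0, 1}.
def bernoulli_log_lik(y, f):
    return y * f - np.log1p(np.exp(f))

y = np.array([1.0, 0.0, 1.0])
mu = np.array([0.5, -1.0, 2.0])      # marginal means of q(f_n)
var = np.array([0.2, 0.3, 0.1])      # marginal variances of q(f_n)
print(expected_log_lik(y, mu, var, bernoulli_log_lik))
```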
The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real-world datasets, including high-resolution video data.
Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.
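A rough numpy sketch of the forward pass implied by the abstract: each layer is a sparse GP with a variational posterior q(u_l) = N(m_l, S_l), and a sample is propagated layer by layer through each layer's conditional rather than assuming independence between layers. The RBF kernels, zero mean functions, one-dimensional hidden layers, and random variational parameters are illustrative simplifications, not the paper's setup.

```python
import numpy as np

def rbf(A, B, ls=1.0, var=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return var * np.exp(-0.5 * d2 / ls**2)

def sample_layer(F_in, Z, m, S, rng):
    """Draw a sample of this layer's output at inputs F_in, given q(u) = N(m, S)."""
    Kzz = rbf(Z, Z) + 1e-6 * np.eye(len(Z))
    Kzx = rbf(Z, F_in)
    A = np.linalg.solve(Kzz, Kzx)                      # Kzz^{-1} Kzx
    mean = A.T @ m
    # marginal variances of the layer's posterior at each input
    var = (rbf(F_in, F_in).diagonal()
           - np.sum(Kzx * A, axis=0)
           + np.sum(A * (S @ A), axis=0))
    eps = rng.standard_normal(len(F_in))
    return mean + np.sqrt(np.maximum(var, 1e-12)) * eps

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50)[:, None]
layers = []
for _ in range(3):                                     # a toy 3-layer DGP with 1-D hidden layers
    Z = np.linspace(-3, 3, 10)[:, None]
    m = rng.standard_normal(10) * 0.1
    L = rng.standard_normal((10, 10)) * 0.1
    layers.append((Z, m, L @ L.T))                     # S kept positive semi-definite

F = X
for Z, m, S in layers:
    F = sample_layer(F, Z, m, S, rng)[:, None]         # feed the sample into the next layer
print(F.shape)
```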
We introduce a novel variational method that allows us to approximately integrate out kernel hyperparameters, such as length-scales, in Gaussian process regression. This approach consists of a novel variant of the variational framework that has been recently developed for the Gaussian process latent variable model, which additionally makes use of a standardised representation of the Gaussian process. We consider this technique for learning Mahalanobis distance metrics in a Gaussian process regression setting and provide experimental evaluations and comparisons with existing methods by considering datasets with high-dimensional inputs.
We propose a sparse method for scalable automated variational inference (AVI) in a large class of models with Gaussian process (GP) priors, multiple latent functions, multiple outputs and non-linear likelihoods. Our approach maintains the statistical efficiency property of the original AVI method, requiring only expectations over univariate Gaussian distributions to approximate the posterior with a mixture of Gaussians. Experiments on small datasets for various problems including regression, classification, Log Gaussian Cox processes, and warped GPs show that our method can perform as well as the full method under high sparsity levels. On larger experiments using the MNIST and the SARCOS datasets we show that our method can provide superior performance to previously published scalable approaches that have been handcrafted to specific likelihood models.
Deep learning has underpinned the large improvements in image classification. To make predictions more robust, Bayesian approximations have been used to learn the parameters of deep neural networks. We take an alternative approach by using Gaussian processes as the building blocks of a Bayesian deep learning model, which has recently become feasible thanks to inference for convolutional and deep structures. We investigate deep convolutional Gaussian processes and identify a problem that holds back their performance. To remedy this, we introduce a translation-sensitive convolutional kernel, which removes the constraint that identical patch inputs must produce identical outputs. We empirically show that this convolutional kernel improves the performance of both shallow and deep models. On MNIST, FASHION-MNIST and CIFAR-10 we improve on previous GP models in terms of accuracy, with the added benefit of predictive probabilities that are better calibrated than those of simpler DNN models.
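The abstract describes the new kernel only through its effect (identical patches at different image locations are no longer forced to produce identical outputs). One way such a kernel could be built, sketched here as a guess and not as the paper's construction, is to weight each pair of patch responses by a kernel over the patch locations, so that location now matters:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

def patches(img, w=3):
    """Extract all w x w patches of a 2-D image, plus their (row, col) locations."""
    H, W = img.shape
    P, locs = [], []
    for i in range(H - w + 1):
        for j in range(W - w + 1):
            P.append(img[i:i+w, j:j+w].ravel())
            locs.append((i, j))
    return np.array(P), np.array(locs, dtype=float)

def location_aware_conv_kernel(img1, img2, loc_ls=5.0):
    """Average over patch pairs of k_patch * k_loc: identical patches at different
    locations now get correlated, but no longer identical, responses."""
    P1, L1 = patches(img1)
    P2, L2 = patches(img2)
    return np.mean(rbf(P1, P2) * rbf(L1, L2, ls=loc_ls))

rng = np.random.default_rng(0)
a, b = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
print(location_aware_conv_kernel(a, b), location_aware_conv_kernel(a, a))
```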
We introduce a conceptually simple and scalable framework for continual learning, in which tasks are learned in sequence. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: a knowledge base, capable of solving previously encountered problems, which is connected to an active column that is used to learn the current task efficiently. After a new task has been learned, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storage of previous data, and no task-specific parameters. We demonstrate the progress and compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
We investigate the capabilities and limitations of Gaussian process models by jointly exploring three complementary directions: (i) scalable and statistically efficient inference; (ii) flexible kernels; and (iii) objective functions for hyperparameter learning alternative to the marginal likelihood. Our approach outperforms all previously reported GP methods on the standard MNIST dataset; performs comparably to previous kernel-based methods using the rectangles-image dataset; and breaks the 1% error-rate barrier in GP models using the MNIST8M dataset, showing along the way the scalability of our method at unprecedented scale for GP models (8 million observations) in classification problems. Overall, our approach represents a significant breakthrough in kernel methods and GP models, bridging the gap between deep learning approaches and kernel machines.
Gaussian process classification is a popular method with a number of appealing properties. We show how to scale the model within a variational inducing point framework, outperforming the state of the art on benchmark datasets. Importantly, the variational formulation can be exploited to allow classification in problems with millions of data points, as we demonstrate in experiments.
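In a variational inducing-point formulation, the data-fit part of the bound typically reduces to one-dimensional expectations of the log-likelihood under each marginal q(f_n) = N(mu_n, s_n^2), which can be computed cheaply with Gauss-Hermite quadrature. A small illustrative sketch (not the paper's code), assuming a Bernoulli likelihood with a logistic link:

```python
import numpy as np

def expected_bernoulli_log_lik(y, mu, var, deg=20):
    """E_{f ~ N(mu, var)}[log p(y | f)] for a Bernoulli likelihood with logit f,
    computed per data point with Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(deg)            # nodes/weights for weight e^{-x^2}
    f = mu[:, None] + np.sqrt(2.0 * var)[:, None] * x[None, :]
    log_p = np.where(y[:, None] == 1, -np.log1p(np.exp(-f)), -np.log1p(np.exp(f)))
    return (log_p @ w) / np.sqrt(np.pi)                    # one value per data point

y = np.array([1, 0, 1])
mu = np.array([1.0, -0.5, 0.2])      # marginal means of q(f_n)
var = np.array([0.3, 0.3, 1.0])      # marginal variances of q(f_n)
print(expected_bernoulli_log_lik(y, mu, var))
```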
Gaussian process (GP) models form a core part of probabilistic machine learning. Considerable research effort has been made into attacking three issues with GP models: how to compute efficiently when the number of data is large; how to approximate the posterior when the likelihood is not Gaussian; and how to estimate covariance function parameter posteriors. This paper simultaneously addresses these, using a variational approximation to the posterior which is sparse in support of the function but otherwise free-form. The result is a Hybrid Monte-Carlo sampling scheme which allows for a non-Gaussian approximation over the function values and covariance parameters simultaneously, with efficient computations based on inducing-point sparse GPs. Code to replicate each experiment in this paper will be available shortly.
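For orientation, a bare-bones Hamiltonian (hybrid) Monte Carlo transition of the kind such a scheme builds on is sketched below for a generic differentiable log-density; in the paper's setting the sampled variables would be the inducing outputs and covariance parameters jointly, with the sparse-GP structure keeping the log-density cheap to evaluate. The toy two-dimensional Gaussian target, step size, and number of leapfrog steps are illustrative choices.

```python
import numpy as np

def hmc_step(theta, log_prob, grad_log_prob, step=0.1, n_leapfrog=20, rng=None):
    """One Hamiltonian Monte Carlo transition for an arbitrary differentiable log-density."""
    rng = rng or np.random.default_rng()
    p = rng.standard_normal(theta.shape)
    theta_new, p_new = theta.copy(), p.copy()
    p_new = p_new + 0.5 * step * grad_log_prob(theta_new)   # half step for momentum
    for _ in range(n_leapfrog - 1):
        theta_new = theta_new + step * p_new
        p_new = p_new + step * grad_log_prob(theta_new)
    theta_new = theta_new + step * p_new
    p_new = p_new + 0.5 * step * grad_log_prob(theta_new)
    log_accept = (log_prob(theta_new) - 0.5 * p_new @ p_new
                  - log_prob(theta) + 0.5 * p @ p)
    return theta_new if np.log(rng.uniform()) < log_accept else theta

# Toy target: a correlated 2-D Gaussian standing in for the joint posterior over
# function values and covariance parameters.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)
log_prob = lambda x: -0.5 * x @ prec @ x
grad_log_prob = lambda x: -prec @ x

rng = np.random.default_rng(0)
x, samples = np.zeros(2), []
for _ in range(2000):
    x = hmc_step(x, log_prob, grad_log_prob, rng=rng)
    samples.append(x)
samples = np.array(samples)
print(np.cov(samples[500:].T))        # should be close to `cov`
```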
The experiments used in current continual learning research do not faithfully assess fundamental challenges of learning continually. We examine standard evaluations and show why these evaluations make some types of continual learning approaches look better than they are. In particular, current evaluations are biased towards continual learning approaches that treat previous models as a prior (e.g., EWC, VCL). We introduce desiderata for continual learning evaluations and explain why their absence creates misleading comparisons. Our analysis calls for a reprioritization of research effort by the community.
We introduce a variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices. Specifically, we employ a matrix variate Gaussian \cite{gupta1999matrix} parameter posterior distribution where we explicitly model the covariance among the input and output dimensions of each layer. Furthermore, with approximate covariance matrices we can achieve a more efficient way to represent those correlations that is also cheaper than fully factorized parameter posteriors. We further show that with the "local reparametrization trick" \cite{kingma2015variational} on this posterior distribution we arrive at a Gaussian Process \cite{rasmussen2006gaussian} interpretation of the hidden units in each layer and we, similarly with \cite{gal2015dropout}, provide connections with deep Gaussian processes. We continue in taking advantage of this duality and incorporate "pseudo-data" \cite{snelson2005sparse} in our model, which in turn allows for more efficient sampling while maintaining the properties of the original model. The validity of the proposed approach is verified through extensive experiments.
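A minimal sketch of what the matrix variate Gaussian posterior buys: the covariance over all D_in x D_out weights of a layer is represented by one small matrix over input dimensions and one over output dimensions, and a sample needs only their Cholesky factors rather than a full Kronecker-product covariance. The shapes and covariance values below are illustrative, not the paper's.

```python
import numpy as np

def sample_matrix_gaussian(M, U, V, rng):
    """Draw W ~ MN(M, U, V): mean M (D_in x D_out), row covariance U, column covariance V.
    Equivalent to vec(W) ~ N(vec(M), V kron U), but never forms the big Kronecker matrix."""
    A = np.linalg.cholesky(U)          # U = A A^T
    B = np.linalg.cholesky(V)          # V = B B^T
    E = rng.standard_normal(M.shape)
    return M + A @ E @ B.T

rng = np.random.default_rng(0)
d_in, d_out = 4, 3
M = np.zeros((d_in, d_out))
U = np.eye(d_in) + 0.3                 # correlations across input dimensions
V = np.eye(d_out) + 0.1                # correlations across output dimensions
W = sample_matrix_gaussian(M, U, V, rng)
print(W.shape)
```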
We develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters which otherwise grow linearly in proportion to the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables us to handle datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.
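A minimal sketch of the amortisation idea, assuming a toy one-hidden-layer perceptron as the recognition model: instead of storing a separate variational mean and variance for every data point, a fixed set of network weights maps each observation to its variational parameters, so the parameter count stays constant as the dataset grows. Sizes and the tanh nonlinearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 10, 32                                              # observation dim, hidden width

# Recognition-network parameters: their number is fixed, independent of dataset size.
W1, b1 = rng.standard_normal((D, H)) * 0.1, np.zeros(H)
W2, b2 = rng.standard_normal((H, 2)) * 0.1, np.zeros(2)    # outputs (mean, log-variance)

def recognition(Y):
    """Map a batch of observations to per-point variational means and variances."""
    h = np.tanh(Y @ W1 + b1)
    out = h @ W2 + b2
    mu, log_var = out[:, 0], out[:, 1]
    return mu, np.exp(log_var)

Y = rng.standard_normal((100000, D))       # 100k observations, no extra variational parameters
mu, var = recognition(Y)
print(mu.shape, var.shape)
```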
We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.
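A tiny sketch of the kind of update the abstract refers to: with a Gaussian variational distribution q(theta) = N(mu, sigma^2) and the reparameterisation theta = mu + sigma * eps, noisy gradients of the evidence lower bound require only the gradient of the model's log joint density at sampled points. The conjugate toy model (one Gaussian observation with a Gaussian prior), step size, and sample count are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

y = 2.0                                            # single observation in a toy model
def dlogjoint(theta):                              # d/dtheta [log N(y|theta,1) + log N(theta|0,1)]
    return (y - theta) - theta

rng = np.random.default_rng(0)
mu, log_sigma = 0.0, 0.0                           # variational parameters of q = N(mu, sigma^2)
lr, S = 0.05, 32                                   # step size and samples per iteration

for _ in range(2000):
    eps = rng.standard_normal(S)
    theta = mu + np.exp(log_sigma) * eps           # reparameterised samples from q
    g = dlogjoint(theta)                           # model gradient, evaluated at the samples
    grad_mu = np.mean(g)
    grad_log_sigma = np.mean(g * eps) * np.exp(log_sigma) + 1.0   # plus entropy gradient
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

print(mu, np.exp(log_sigma))                       # exact posterior here is N(1.0, 0.5)
```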
We present a practical way of introducing convolutional structure into Gaussian processes, making them more suited to high-dimensional inputs like images. The main contribution of our work is the construction of an inter-domain inducing point approximation that is well-tailored to the convolutional kernel. This allows us to gain the generalisation benefit of a convolutional kernel, together with fast but accurate posterior inference. We investigate several variations of the convolutional kernel, and apply it to MNIST and CIFAR-10, which have both been known to be challenging for Gaussian processes. We also show how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance. We hope that this illustration of the usefulness of a marginal likelihood will help automate discovering architectures in larger models.
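A rough sketch of the inter-domain construction: the inducing inputs live in patch space rather than image space, so the covariance between an inducing patch and an image averages the patch-response kernel over all patches of that image. The RBF patch kernel and the uniform patch weighting are simplifying assumptions.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

def image_patches(img, w=3):
    H, W = img.shape
    return np.array([img[i:i+w, j:j+w].ravel()
                     for i in range(H - w + 1) for j in range(W - w + 1)])

rng = np.random.default_rng(0)
images = rng.standard_normal((5, 8, 8))    # a tiny batch of "images"
Z = rng.standard_normal((10, 9))           # 10 inducing inputs living in 3x3 patch space

Kuu = rbf(Z, Z)                            # covariance between inducing patches
# Cross-covariance between each inducing patch and each image: the convolutional GP
# output for an image is the average patch response, so Kuf averages over patches.
Kuf = np.stack([rbf(Z, image_patches(img)).mean(axis=1) for img in images], axis=1)
print(Kuu.shape, Kuf.shape)                # (10, 10) and (10, 5)
```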
Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure exploiting algebra. We show improved performance over stand-alone deep networks, SVMs, and state of the art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.
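A minimal sketch of the deep kernel described above: a base kernel is applied to the output features of a neural network, and in the additive variant separate base kernels act on subsets of those features and are summed. The random, untrained feature network here stands in for one that would be learned jointly with the kernel through the marginal likelihood; sizes and the number of feature groups are illustrative.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(0)
D, H, F = 20, 64, 8                                 # input dim, hidden width, feature dim
W1, W2 = rng.standard_normal((D, H)) * 0.1, rng.standard_normal((H, F)) * 0.1

def features(X):
    """Deep feature extractor g(x); in deep kernel learning its weights are trained
    jointly with the base kernels through the GP marginal likelihood."""
    return np.tanh(np.tanh(X @ W1) @ W2)

def additive_deep_kernel(X1, X2, n_groups=4):
    """Sum of base kernels applied to subsets of the network's output features."""
    G1, G2 = features(X1), features(X2)
    splits = np.array_split(np.arange(G1.shape[1]), n_groups)
    return sum(rbf(G1[:, idx], G2[:, idx]) for idx in splits)

X = rng.standard_normal((6, D))
print(additive_deep_kernel(X, X).shape)             # (6, 6)
```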
Deep Gaussian processes (DGPs) are hierarchical generalisations of Gaussian processes that combine well-calibrated uncertainty estimates with the high flexibility of multi-layer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, variational inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and apply a stochastic gradient Hamiltonian Monte Carlo method to sample directly from it. To efficiently optimise the hyperparameters, we introduce the Moving Window MCEM algorithm. This yields better predictions at a computational cost competitive with its VI counterpart. Our method therefore establishes a new state of the art for inference in DGPs.
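For illustration, a bare-bones stochastic gradient Hamiltonian Monte Carlo update of the general kind referred to above, run on a toy one-dimensional target; the step size, friction term, zero estimate of the gradient-noise variance, and the target itself are illustrative choices, and the Moving Window MCEM step for hyperparameters is not shown.

```python
import numpy as np

def grad_U(theta):               # gradient of the negative log-density of a toy N(0, 1) target
    return theta

rng = np.random.default_rng(0)
theta, v = 0.0, 0.0
eta, alpha = 1e-3, 0.05          # step size and friction (illustrative values)
samples = []
for t in range(50000):
    # In the DGP setting, theta would collect the layers' inducing outputs and
    # grad_U would be a minibatch estimate of the gradient.
    noise = rng.normal(0.0, np.sqrt(2 * alpha * eta))
    v = v - eta * grad_U(theta) - alpha * v + noise
    theta = theta + v
    if t > 10000:
        samples.append(theta)

print(np.mean(samples), np.std(samples))   # should be close to the target's 0 and 1
```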
Gaussian processes (GPs) are flexible distributions over functions that enable high-level assumptions about unknown functions to be encoded in a parsimonious, flexible and general way. Although elegant, the application of GPs is limited by computational and analytical intractabilities that arise when data are sufficiently numerous or when employing non-Gaussian models. Consequently, a wealth of GP approximation schemes have been developed over the last 15 years to address these key limitations. Many of these schemes employ a small set of pseudo data points to summarise the actual data. In this paper we develop a new pseudo-point approximation framework using Power Expectation Propagation (Power EP) that unifies a large number of these pseudo-point approximations. Unlike much of the previous venerable work in this area, the new framework is built on standard methods for approximate inference (variational free-energy, EP and Power EP methods) rather than employing approximations to the probabilistic generative model itself. In this way all of the approximation is performed at 'inference time' rather than at 'modelling time', resolving awkward philosophical and empirical questions that trouble previous approaches. Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks.
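In generic notation (not lifted from the paper), the pseudo-point approximations being unified share an approximate posterior of the form

$q(f) = p(f_{\neq u} \mid u)\, q(u), \qquad q(u) = \mathcal{N}(m, S),$

where $u = f(Z)$ are the function values at $M \ll N$ pseudo-inputs $Z$; varying the Power EP parameter $\alpha$ then recovers familiar special cases, with $\alpha \to 0$ corresponding to the variational free-energy approximation and $\alpha = 1$ to EP/FITC.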