Conditional density estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity, and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GPs) to map this augmented input onto samples of the conditional distribution. Our Bayesian approach allows for modeling small datasets, but we also provide the machinery to apply it to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows sharing of learned structure between conditions. We illustrate the effectiveness and wide applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on Omniglot images.
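Concretely, one way to write the generative model sketched above (in our notation, with w the latent variable that augments the input) is

```latex
w_n \sim \mathcal{N}(0, I), \qquad
f \sim \mathcal{GP}\big(0,\, k((x, w), (x', w'))\big), \qquad
y_n = f(x_n, w_n) + \varepsilon_n,
```

so the conditional density p(y | x) = ∫ p(y | f(x, w)) p(w) dw is in general non-Gaussian, even though the prior over f is a GP.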
Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.
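For reference, the bound optimised by this scheme has a simple form (notation ours, paraphrasing the paper): with U^l the inducing outputs of layer l and f_n^L the final-layer function value at input x_n,

```latex
\mathcal{L} \;=\; \sum_{n=1}^{N} \mathbb{E}_{q(f_n^{L})}\big[\log p(y_n \mid f_n^{L})\big]
\;-\; \sum_{l=1}^{L} \mathrm{KL}\big(q(U^{l}) \,\|\, p(U^{l})\big).
```

The expectation is evaluated by sampling the function values layer by layer (the first source of stochasticity), and the sum over n is subsampled in minibatches (the second).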
Variational inference is a powerful tool for approximate inference, and it has been recently applied for representation learning with deep generative models. We develop the variational Gaussian process (VGP), a Bayesian nonparametric variational family, which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity. We prove a universal approximation theorem for the VGP, demonstrating its representative power for learning any model. For inference we present a variational objective inspired by auto-encoders and perform black box inference over a wide class of models. The VGP achieves new state-of-the-art results for unsupervised learning, inferring models such as the deep latent Gaussian model and the recently proposed DRAW.
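Schematically (our notation, not the paper's), a VGP sample is produced by drawing a latent input and warping it through a GP-distributed mapping:

```latex
\xi \sim \mathcal{N}(0, I), \qquad
f \sim \mathcal{GP}(0, k) \ \text{(conditioned on learned variational data)}, \qquad
z = f(\xi),
```

so the marginal q(z) is an implicitly defined, highly flexible distribution whose complexity is governed by the learned mappings.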
The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.
Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully applied to various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean-field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean-field approximation or with atypical divergences, and (d) amortized VI, which implements inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
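For orientation, all of the variants surveyed here build on the same objective: VI maximises the evidence lower bound (ELBO), and in the standard mean-field setting the variational distribution factorises across the latent dimensions. In our notation:

```latex
\log p(x) \;\ge\; \mathbb{E}_{q(z)}\big[\log p(x, z)\big] \;-\; \mathbb{E}_{q(z)}\big[\log q(z)\big] \;=:\; \mathrm{ELBO}(q),
\qquad q(z) \;=\; \prod_{j} q_j(z_j).
```

Since the ELBO equals log p(x) minus KL(q(z) || p(z | x)), maximising it over the variational family is equivalent to minimising the KL divergence to the posterior.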
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.
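The generic shape of the bound being maximised follows from Jensen's inequality (with Y the observations and X the latent inputs); the technical contribution is making the first term tractable for GP mappings via inducing variables:

```latex
\log p(Y) \;\ge\; \mathbb{E}_{q(X)}\big[\log p(Y \mid X)\big] \;-\; \mathrm{KL}\big(q(X) \,\|\, p(X)\big).
```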
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.
In this paper we introduce deep Gaussian process (GP) models. Deep GPs are deep belief networks based on Gaussian process mappings. The data is modeled as the output of a multivariate GP, and the inputs to that Gaussian process are in turn governed by another GP. A single-layer model is equivalent to a standard GP or the GP latent variable model (GP-LVM). We perform inference in the model by approximate variational marginalization. This results in a strict lower bound on the marginal likelihood of the model, which we use for model selection (number of layers and number of nodes per layer). Deep belief networks are typically applied to relatively large datasets using stochastic gradient descent for optimization. Our fully Bayesian treatment allows for the application of deep models even when data is scarce. Model selection by our variational bound shows that a five-layer hierarchy is justified even when modelling a digit dataset containing only 150 examples.
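In symbols (our notation), an L-layer deep GP composes GP mappings, so an observation is generated as

```latex
y \;=\; f_L\big(f_{L-1}(\cdots f_1(x)\cdots)\big) + \varepsilon,
\qquad f_l \sim \mathcal{GP}(0, k_l), \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I),
```

with L = 1 recovering a standard GP when x is observed, and the GP-LVM when x is latent.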
Gaussian processes (GPs) provide a powerful non-parametric framework for reasoning over functions. Despite appealing theory, their superlinear computational and memory complexities have presented a long-standing challenge. State-of-the-art sparse variational inference methods trade modeling accuracy against complexity. However, the complexities of these methods still scale superlinearly in the number of basis functions, implying that sparse GP methods can learn from large datasets only when using a small model. Recently, a decoupled approach was proposed that removes the unnecessary coupling between the complexities of modeling the mean and the covariance functions of a GP. It achieves linear complexity in the number of mean parameters, so an expressive posterior mean function can be modeled. While promising, this approach suffers from optimization difficulties due to ill-conditioning and non-convexity. In this work, we propose an alternative decoupled parametrization. It adopts an orthogonal basis in the mean function to model the residues that cannot be learned by the standard coupled approach. As a result, our method extends, rather than replaces, the coupled approach, achieving strictly better performance. This construction admits a straightforward natural gradient update rule, so the structure of the information manifold that is lost during decoupling can be leveraged to speed up learning. Empirically, our algorithm demonstrates significantly faster convergence in multiple experiments.
The natural gradient method has been used effectively in conjugate Gaussian process models, but the non-conjugate case has been largely unexplored. We examine how natural gradients can be used in non-conjugate stochastic settings, together with hyperparameter learning. We conclude that the natural gradient can significantly improve performance in terms of wall-clock time. For ill-conditioned posteriors the benefit of the natural gradient method is especially pronounced, and we demonstrate a practical setting where ordinary gradients are unusable. We show how natural gradients can be computed efficiently and automatically in any parameterization, using automatic differentiation. Our code is integrated into the GPflow package.
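As a concrete illustration of the resulting workflow, here is a minimal sketch assuming GPflow 2's `NaturalGradient` optimizer and a synthetic binary-classification task (the data and settings below are illustrative): natural-gradient steps update the variational parameters of an SVGP model, while Adam handles the remaining hyperparameters.

```python
import numpy as np
import tensorflow as tf
import gpflow

# Synthetic non-conjugate problem: Bernoulli likelihood, 1-D inputs.
X = np.random.rand(200, 1)
Y = (np.sin(10 * X) > 0).astype(float)

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.Matern32(),
    likelihood=gpflow.likelihoods.Bernoulli(),
    inducing_variable=X[:20].copy(),
)
loss = model.training_loss_closure((X, Y))

# Exclude q(u) from Adam: the natural-gradient optimizer updates it instead.
gpflow.utilities.set_trainable(model.q_mu, False)
gpflow.utilities.set_trainable(model.q_sqrt, False)

natgrad = gpflow.optimizers.NaturalGradient(gamma=0.1)
adam = tf.keras.optimizers.Adam(learning_rate=0.01)

for _ in range(100):
    # Natural-gradient step in the variational parameters (q_mu, q_sqrt)...
    natgrad.minimize(loss, var_list=[(model.q_mu, model.q_sqrt)])
    # ...interleaved with ordinary gradient steps on the hyperparameters.
    adam.minimize(loss, var_list=model.trainable_variables)
```

Interleaving the two optimizers in this way reflects the hybrid scheme the paper discusses: the variational parameters live on a distribution manifold where the natural gradient is cheap and well-conditioned, while hyperparameters are left to a standard optimizer.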
Bayesian optimization using Gaussian processes is a popular approach for handling the optimization of expensive black-box functions. However, because of the stationarity assumed a priori by the covariance of classical Gaussian processes, this method may not be suited to the non-stationary functions involved in the optimization problem. To overcome this issue, a new Bayesian optimization approach is proposed, based on deep Gaussian processes as surrogate models instead of classical Gaussian processes. This modeling technique increases the representational power needed to capture non-stationarity by simply considering a functional composition of stationary Gaussian processes, providing a multi-layer structure. This paper proposes a new global optimization algorithm by coupling deep Gaussian processes with Bayesian optimization. The specificities of this optimization method are discussed and highlighted with academic test cases. The performance of the proposed algorithm is assessed on analytical test cases and an aerospace design optimization problem, and compared with state-of-the-art stationary and non-stationary Bayesian optimization methods.
We develop a method that combines Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches. Specifically, we improve the variational distribution by running a few MCMC steps. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI. The VCD captures a notion of discrepancy between the initial variational distribution and its improved version (obtained after running the MCMC steps), and it converges asymptotically to the symmetrized KL divergence between the variational distribution and the posterior of interest. The VCD objective can be optimized with respect to the variational parameters via stochastic optimization. We show experimentally that optimizing the VCD leads to better predictive performance on two latent variable models: logistic matrix factorization and variational autoencoders (VAEs).
Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: how to specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. HVMs generalize other expressive variational distributions and maintain higher fidelity to the posterior.
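In our notation, an HVM places a prior q(λ; θ) on the parameters λ of the base approximation and marginalises it out:

```latex
q_{\mathrm{HVM}}(z; \theta) \;=\; \int q(z \mid \lambda)\, q(\lambda; \theta)\, \mathrm{d}\lambda,
```

which can induce dependence between the components of z even when the conditional q(z | λ) is fully factorised (mean-field).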
Unnormalised latent variable models are a broad and flexible class of statistical models. However, learning their parameters from data is intractable, and few estimation techniques are currently available for such models. To increase the number of techniques in our arsenal, we propose variational noise-contrastive estimation (VNCE), which builds on NCE, a method that applies only to unnormalised models. The core idea is to use a variational lower bound on the NCE objective function, which can be optimised in the same fashion as the evidence lower bound (ELBO) in standard variational inference (VI). We show that VNCE can be used both for parameter estimation of unnormalised models and for posterior inference of latent variables. The developed theory shows that VNCE has the same level of generality as standard VI, meaning that advances made in the VI literature can be directly imported into the unnormalised setting. We validate VNCE on toy models and apply it to the realistic problem of estimating an undirected graphical model from incomplete data.
Many functions and signals of interest are formed by the addition of multiple underlying components, often nonlinearly transformed and modified by noise. Examples may be found in the literature on Generalized Additive Models [1] and Underdetermined Source Separation [2] or other mode decomposition techniques. Recovery of the underlying component processes often depends on finding and exploiting statistical regularities within them. Gaussian Processes (GPs) [3] have become the dominant way to model statistical expectations over functions. Recent advances make inference of the GP posterior efficient for large scale datasets and arbitrary likelihoods [4, 5]. Here we extend these methods to the additive GP case [6, 7], thus achieving scalable marginal posterior inference over each latent function in settings such as those above.
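To make the additive construction concrete: in the conjugate special case (Gaussian noise, no nonlinear transformation), the marginal posterior over each component is available in closed form, since cov(f1, y) = K1 gives E[f1 | y] = K1 (K1 + K2 + sigma^2 I)^{-1} y. The numpy toy below is our illustration of that identity, not the paper's scalable inference scheme.

```python
import numpy as np

def rbf(x, xp, lengthscale, variance):
    """Squared-exponential kernel matrix between 1-D input vectors."""
    d = x[:, None] - xp[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
jitter = 1e-8 * np.eye(x.size)  # numerical stability when sampling

# Two additive components: a slow trend plus a fast wiggle.
K1 = rbf(x, x, lengthscale=3.0, variance=1.0)
K2 = rbf(x, x, lengthscale=0.3, variance=0.5)
noise = 0.1

# Sample ground truth from the additive prior and observe the noisy sum.
f1 = rng.multivariate_normal(np.zeros_like(x), K1 + jitter)
f2 = rng.multivariate_normal(np.zeros_like(x), K2 + jitter)
y = f1 + f2 + np.sqrt(noise) * rng.standard_normal(x.size)

# Closed-form posterior means of the individual components given y.
Ky = K1 + K2 + noise * np.eye(x.size)
alpha = np.linalg.solve(Ky, y)
mean_f1 = K1 @ alpha  # recovers the slow trend
mean_f2 = K2 @ alpha  # recovers the fast wiggle
```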
This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. This gives rise to an approximation that inherits the benefits of the variational approach but with the representational power and computational scalability of spectral representations. The work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances. We derive these expressions for Matérn kernels in one dimension, and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the dataset, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM^2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data points and M is the number of features.
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions, and does not require detailed knowledge of the conditional likelihood, needing only its evaluation as a black-box function. Using a mixture of Gaussians as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved through an augmented prior using inducing variables, the device underpinning most sparse GP approximations, along with parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively on small, medium-scale and large datasets, showing its competitiveness under different likelihood models and sparsity levels. On large-scale experiments involving the prediction of airline delays and the classification of handwritten digits, we show that our method is on par with state-of-the-art hard-coded approaches for scalable GP regression and classification.
We develop an automated variational method for approximate inference in Gaussian process (GP) models whose posteriors are often intractable. Using a mixture of Gaussians as the variational distribution, we show that (i) the variational objective and its gradients can be approximated efficiently via sampling from univariate Gaussian distributions and (ii) the gradients with respect to the GP hyperparameters can be obtained analytically regardless of the model likelihood. We further propose two instances of the variational distribution whose covariance matrices can be parametrized linearly in the number of observations. These results allow gradient-based optimization to be done efficiently in a black-box manner. Our approach is thoroughly verified on five models using six benchmark datasets, performing as well as the exact or hard-coded implementations while running orders of magnitude faster than the alternative MCMC sampling approaches. Our method can be a valuable tool for practitioners and researchers to investigate new models with minimal effort in deriving model-specific inference algorithms.
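A minimal numpy sketch of the sampling result in (i), under the assumption of factorised Gaussian marginals over the latent function values (the names and the example likelihood are illustrative, not the paper's code): the expected log-likelihood term is estimated from univariate Gaussian samples, treating the likelihood as a black box.

```python
import numpy as np

def expected_log_lik(y, m, s, log_lik, num_samples=100, seed=0):
    """Monte Carlo estimate of E_{N(f; m, s^2)}[log p(y | f)], per data point."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((num_samples, m.size))
    f = m[None, :] + s[None, :] * eps            # reparameterised samples
    return log_lik(y[None, :], f).mean(axis=0)   # average over samples

# Example black-box likelihood: Bernoulli with a logit link,
# log p(y | f) = y * f - log(1 + exp(f)).
log_bernoulli = lambda y, f: y * f - np.log1p(np.exp(f))

y = np.array([1.0, 0.0, 1.0])
m = np.array([0.5, -1.0, 2.0])  # variational marginal means
s = np.array([0.3, 0.2, 0.5])   # variational marginal std devs
ell = expected_log_lik(y, m, s, log_bernoulli)  # one value per data point
```

Because the estimator only needs likelihood evaluations at sampled function values, the conditional likelihood can be treated as a black box.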
We introduce a novel variational method that allows us to approximately integrate out kernel hyperparameters, such as length-scales, in Gaussian process regression. This approach consists of a novel variant of the variational framework recently developed for the Gaussian process latent variable model, which additionally makes use of a standardised representation of the Gaussian process. We consider this technique for learning Mahalanobis distance metrics in a Gaussian process regression setting and provide experimental evaluations and comparisons with existing methods by considering datasets with high-dimensional inputs.
We introduce a methodology for nonlinear inverse problems using a variational Bayesian approach where the unknown quantity is a spatial field. A structured Bayesian Gaussian process latent variable model is used both to construct a low-dimensional generative model of the sample-based stochastic prior and as a surrogate for forward evaluations. Its Bayesian formulation captures the epistemic uncertainty introduced by the limited number of input and output examples, automatically selects an appropriate dimensionality for the learned representation of the data, and rigorously propagates the uncertainty of the data-driven dimensionality reduction of the stochastic space through the forward-model surrogate. The structured Gaussian process model explicitly leverages spatial information for an informative generative prior, improving sample efficiency while achieving computational tractability through Kronecker product decompositions of the relevant kernel matrices. Importantly, the Bayesian inversion is carried out by solving a variational optimization problem, replacing traditional computationally expensive Monte Carlo sampling. The method is demonstrated on an elliptic PDE and is shown to return well-calibrated posteriors, remaining tractable with latent spaces of over 100 dimensions.