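Bayesian methods estimate a measure of uncertainty by using the posterior distribution. One source of difficulty in these methods is the computation of the normalizing constant. Computing the exact posterior is generally intractable, so it is usually approximated. Variational inference (VI) methods use optimization to approximate the posterior with a distribution that is usually chosen from a simple family. The main contribution of this work is a set of update rules for natural gradient variational inference with mixtures of Gaussians, which can be run independently for each mixture component, potentially in parallel.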
translated by 谷歌翻译
Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
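The core principle of variational inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property makes VI faster than several sampling-based techniques. However, the traditional VI algorithm does not scale to large datasets and cannot readily infer out-of-bounds data points without re-running the optimization process. Recent developments in the field, such as stochastic, black-box, and amortized VI, have helped address these issues. Generative modeling tasks now widely use amortized VI for its efficiency and scalability, as it employs a parameterized function to learn the parameters of the approximate posterior density. In this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. We also provide an overview of recent trends that address several issues of amortized VI, such as the amortization gap, generalization issues, inconsistent representation learning, and posterior collapse. Finally, we analyze alternative divergence measures that improve VI optimization.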
How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
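The reparameterization idea in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's estimator: it assumes a one-dimensional Gaussian variational distribution and a hypothetical objective f(z) = z^2, and the function name `reparam_grad` is made up for illustration.

```python
import numpy as np

def reparam_grad(mu, sigma, f_prime, n_samples=100_000, seed=0):
    """Monte Carlo gradient of E_{z ~ N(mu, sigma^2)}[f(z)] w.r.t. mu and sigma."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)   # noise that does not depend on (mu, sigma)
    z = mu + sigma * eps                   # reparameterized sample
    df = f_prime(z)                        # df/dz evaluated at each sample
    grad_mu = df.mean()                    # chain rule with dz/dmu = 1
    grad_sigma = (df * eps).mean()         # chain rule with dz/dsigma = eps
    return grad_mu, grad_sigma

# Toy objective (an assumption): f(z) = z^2, so E[f] = mu^2 + sigma^2
# and the exact gradient is (2 * mu, 2 * sigma).
g_mu, g_sigma = reparam_grad(mu=1.5, sigma=0.5, f_prime=lambda z: 2.0 * z)
print(g_mu, g_sigma)  # close to 3.0 and 1.0
```

Because the randomness enters only through `eps`, the sample `z` is a differentiable function of the parameters, which is what lets standard stochastic gradient methods be applied.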
Variational inference has become a widely used method to approximate posteriors in complex latent variable models. However, deriving a variational inference algorithm generally requires significant model-specific analysis, and these efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black box sampling-based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
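A gradient computed from Monte Carlo samples of the variational distribution, as described above, is commonly built on the score-function identity. The sketch below is an illustration under stated assumptions rather than the paper's algorithm: it uses a one-dimensional Gaussian, a hypothetical objective f(z) = z^2, and a simple mean baseline as one possible variance-reduction device. The name `score_function_grad` is invented for this example.

```python
import numpy as np

def score_function_grad(mu, sigma, f, n_samples=200_000, seed=0, use_baseline=True):
    """Estimate d/dmu E_{z ~ N(mu, sigma^2)}[f(z)] without differentiating f."""
    rng = np.random.default_rng(seed)
    z = rng.normal(mu, sigma, size=n_samples)
    score = (z - mu) / sigma**2               # d/dmu log N(z; mu, sigma^2)
    fz = f(z)
    # Subtracting a constant baseline leaves the expectation unchanged because
    # E[score] = 0. Using the sample mean adds a small O(1/n) bias.
    baseline = fz.mean() if use_baseline else 0.0
    per_sample = (fz - baseline) * score
    return per_sample.mean(), per_sample.var()

f = lambda z: z**2                            # toy objective; exact gradient is 2 * mu
g_plain, v_plain = score_function_grad(1.5, 0.5, f, use_baseline=False)
g_base, v_base = score_function_grad(1.5, 0.5, f, use_baseline=True)
print(g_plain, v_plain)
print(g_base, v_base)
```

Only evaluations of `f` are needed, which is why this style of estimator requires little model-specific derivation. The per-sample variance is typically much larger than for a reparameterized estimator, hence the emphasis on variance reduction.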
We develop an optimization algorithm suitable for Bayesian learning in complex models. Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations. It applies within the class of exponential-family variational posterior distributions; we extensively discuss the Gaussian case, for which the updates have a rather simple form. Our Quasi Black-box Variational Inference (QBVI) framework is readily applicable to a wide class of Bayesian inference problems and is simple to implement, as the updates of the variational posterior involve neither gradients with respect to the model parameters nor the prescription of the Fisher information matrix. We develop QBVI under different hypotheses for the posterior covariance matrix, discuss details about its robust and feasible implementation, and provide a number of real-world applications to demonstrate its effectiveness.
One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
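The "posit a family, then find the closest member" idea above can be made concrete with a small numerical sketch. It assumes a hypothetical one-dimensional Gaussian target and a Gaussian family with fixed unit scale, and it uses the closed-form KL divergence between two univariate Gaussians. The grid search stands in for a real optimizer purely for illustration.

```python
import numpy as np

def kl_gauss(m_q, s_q, m_p, s_p):
    """KL( N(m_q, s_q^2) || N(m_p, s_p^2) ) in closed form."""
    return np.log(s_p / s_q) + (s_q**2 + (m_q - m_p) ** 2) / (2.0 * s_p**2) - 0.5

# Hypothetical target density and candidate family (assumptions for illustration).
target_mean, target_std = 2.0, 1.0
candidate_means = np.linspace(-5.0, 5.0, 101)

# Evaluate the divergence for every family member and keep the closest one.
kls = kl_gauss(candidate_means, 1.0, target_mean, target_std)
best_mean = candidate_means[np.argmin(kls)]
print(best_mean, kls.min())
```

In practice the target is a posterior known only up to a normalizing constant, so the KL divergence cannot be evaluated directly and an equivalent objective (the evidence lower bound) is optimized instead.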
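Approximating complex probability densities is a core problem in modern statistics. In this paper, we introduce the concept of variational inference (VI), a popular method in machine learning that uses optimization techniques to estimate complex probability densities. This property allows VI to converge faster than classical methods such as Markov chain Monte Carlo sampling. Conceptually, VI works by choosing a family of probability density functions and then finding the member closest to the actual probability density, often using the Kullback-Leibler (KL) divergence as the optimization metric. We introduce the evidence lower bound to tractably compute the approximated probability density, and we review the ideas behind mean-field variational inference. Finally, we discuss applications of VI to variational autoencoders (VAE) and the VAE-generative adversarial network (VAE-GAN). With this paper, we aim to explain the concept of VI and to assist future research with this approach.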
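To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We relate this optimization problem to sequential Bayesian inference on a latent state-space model with a discriminatively specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy-ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.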
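Annealed importance sampling (AIS) is a popular algorithm for estimating the intractable marginal likelihood of deep generative models. Although AIS is guaranteed to provide an unbiased estimate for any set of hyperparameters, common implementations rely on simple heuristics, such as geometric-average bridging distributions between the initial and target distributions, which hurt estimation performance when the computation budget is limited. Optimizing a fully parametric AIS remains challenging because of the Metropolis-Hastings (MH) correction steps used in the Markov transitions. We present a parametric AIS process with flexible intermediate distributions and optimize the bridging distributions so that fewer sampling steps are needed. We propose a reparameterization method that allows us to optimize both the distribution sequence and the parameters of the Markov transitions, and that is applicable to a large class of Markov kernels with MH correction. We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it with other estimators.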
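Variational inference with Gaussian mixture models (GMMs) makes it possible to learn highly tractable yet multi-modal approximations of intractable target distributions. GMMs are particularly relevant for problem settings with up to a few hundred dimensions, for example in robotics, for modeling distributions over trajectories or joint distributions. This work focuses on two very effective methods for GMM-based variational inference, both of which employ independent natural gradient updates for the individual components and for the categorical distribution of the weights. We show for the first time that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish the two approaches, namely with respect to sample selection, natural gradient estimation, step-size adaptation, and whether trust regions are enforced or the number of components is adapted. We perform extensive ablations on these design choices and show that they strongly affect the efficiency of the optimization and the variability of the learned distribution. Based on our insights, we propose a novel instantiation of our generalized framework that combines first-order natural gradient estimates with trust regions and component adaptation, and that significantly outperforms both previous methods in all our experiments.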
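Diffusion models have shown incredible capabilities as generative models. Indeed, they power the current state-of-the-art models for text-conditioned image generation, such as Imagen and DALL-E 2. We first derive variational diffusion models (VDM) as a special case of a Markovian hierarchical variational autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and explicitly connect the variational perspective of a diffusion model with the score-based generative modeling perspective through Tweedie's formula. Lastly, we cover how to learn a conditional distribution with diffusion models via guidance.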
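This report explains, implements, and extends the work presented in "Tighter Variational Bounds are Not Necessarily Better" (T Rainforth et al., 2018). We provide theoretical and empirical evidence that increasing the number of importance samples $k$ in the importance weighted autoencoder (IWAE) (Burda et al., 2016) degrades the signal-to-noise ratio (SNR) of the gradient estimator in the inference network, thereby affecting the full learning process. In other words, even though increasing $k$ decreases the standard deviation of the gradients, it reduces the magnitude of the true gradient faster, thereby increasing the relative variance of the gradient updates. Extensive experiments are performed to understand the importance of $k$. These experiments suggest that tighter variational bounds are beneficial for the generative network, whereas looser bounds are preferable for the inference network. With these insights, three methods are implemented and studied: the partially importance weighted autoencoder (PIWAE), the multiply importance weighted autoencoder (MIWAE), and the combination importance weighted autoencoder (CIWAE). Each of these three methods contains IWAE as a special case but uses the importance weights in a different way to ensure a higher SNR of the gradient estimators. In our study and analysis, the efficacy of these algorithms is tested on multiple datasets such as MNIST and Omniglot. Finally, we demonstrate that the three presented IWAE variants are able to produce approximate posterior distributions that are much closer to the true posterior than those of the IWAE, while matching the performance of the IWAE generative network or, in the case of PIWAE, potentially outperforming it.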
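The effect of the number of importance samples on the tightness of the bound can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the report's experimental setup: it assumes z ~ N(0, 1), x | z ~ N(z, 1), and a deliberately crude proposal equal to the prior, so the exact log marginal log N(x; 0, 2) is available for comparison. The function names are invented for this example, and the sketch covers only bound tightness, not the gradient SNR analysis.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def iwae_bound(x, k, n_rep=20_000, seed=0):
    """Average over n_rep replicates of log((1/k) * sum_i w_i)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_rep, k))       # z_i ~ q(z) = N(0, 1), the prior
    log_w = log_normal(x, z, 1.0)             # log p(x, z) - log q(z) = log p(x | z)
    m = log_w.max(axis=1, keepdims=True)      # stabilize the log-mean-exp
    log_mean_w = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))
    return log_mean_w.mean()

x = 1.0
exact = log_normal(x, 0.0, 2.0)               # exact log marginal likelihood
for k in (1, 5, 50):
    print(k, iwae_bound(x, k), exact)
```

With k = 1 the quantity reduces to the standard ELBO, and the printed values should approach the exact log marginal from below as k grows.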
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
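We propose a variance reduction framework for variational inference using a multilevel Monte Carlo (MLMC) method. Our framework is built on reparameterized gradient estimators and "recycles" parameters obtained from the past update history during optimization. In addition, our framework provides a new optimization algorithm based on stochastic gradient descent (SGD) that adaptively estimates the sample size used for gradient estimation according to the ratio of the gradient variances. We show theoretically that, with our method, the variance of the gradient estimator decreases as optimization proceeds and that a learning rate scheduler function helps improve convergence. We also show that, in terms of the signal-to-noise ratio, our method can improve the quality of gradient estimation through the learning rate scheduler function without increasing the initial sample size. Finally, through experimental comparisons with baseline methods on several benchmark datasets, we confirm that our method achieves faster convergence and reduces the variance of the gradient estimator.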
We investigate a local reparameterization technique for greatly reducing the variance of stochastic gradients for variational Bayesian inference (SGVB) of a posterior over model parameters, while retaining parallelizability. This local reparameterization translates uncertainty about global parameters into local noise that is independent across datapoints in the minibatch. Such parameterizations can be trivially parallelized and have variance that is inversely proportional to the minibatch size, generally leading to much faster convergence. Additionally, we explore a connection with dropout: Gaussian dropout objectives correspond to SGVB with local reparameterization, a scale-invariant prior and proportionally fixed posterior variance. Our method allows inference of more flexibly parameterized posteriors; specifically, we propose variational dropout, a generalization of Gaussian dropout where the dropout rates are learned, often leading to better models. The method is demonstrated through several experiments.
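The dual tasks of quantum Hamiltonian learning and quantum Gibbs sampling are relevant to many important problems in physics and chemistry. In the low-temperature regime, algorithms for these tasks often suffer from intractabilities, for example poor sample or time complexity. To address such intractabilities, we introduce a generalization of quantum natural gradient descent to parameterized mixed states and provide a robust first-order approximating algorithm, Quantum-Probabilistic Mirror Descent. We prove data sample efficiency for the dual tasks using tools from information geometry and quantum metrology, thereby generalizing the seminal result of classical Fisher efficiency to a variational quantum algorithm for the first time. Our approaches extend previous sample-efficient techniques to allow flexibility in model choice, including spectrally decomposed models such as Quantum Hamiltonian-Based Models, which may circumvent intractable time complexities. Our first-order algorithm is derived using a novel quantum generalization of the classical mirror descent duality. Both results require a special choice of metric, namely the Bogoliubov-Kubo-Mori metric. To test our proposed algorithms numerically, we compare their performance with existing baselines on the task of quantum Gibbs sampling for the transverse-field Ising model. Finally, we propose an initialization strategy that leverages geometric locality for modeling sequences of states, such as those arising from quantum-stochastic processes. We demonstrate its effectiveness empirically for both real and imaginary time evolution, while defining a broader class of potential applications.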
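We formulate natural gradient variational inference (VI), expectation propagation (EP), and posterior linearisation (PL) as extensions of Newton's method for optimising the parameters of a Bayesian posterior distribution. This viewpoint explicitly casts inference algorithms under the framework of numerical optimisation. We show that common approximations to Newton's method from the optimisation literature, namely Gauss-Newton and quasi-Newton methods (e.g., the BFGS algorithm), are still valid under this "Bayes-Newton" framework. This leads to a suite of novel algorithms that are guaranteed to result in positive semi-definite covariance matrices, unlike standard VI and EP. Our unifying viewpoint provides new insights into the connections between various inference schemes. All the presented methods apply to any model with a Gaussian prior and a non-conjugate likelihood, which we demonstrate with (sparse) Gaussian processes and state space models.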
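Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization in which each data example is assigned dual parameters, similar to the site parameters used in expectation propagation. Our dual parameterization speeds up inference using natural gradient descent and provides a tighter evidence lower bound for hyperparameter learning. The approach has the same memory cost as current SVGP methods, but it is faster and more accurate.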
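Auto-encoding variational Bayes (AEVB) is a powerful and general algorithm for fitting latent variable models (a promising direction for unsupervised learning), and is well known for training the variational autoencoder (VAE). In this tutorial, we focus on motivating AEVB from the classic expectation-maximization (EM) algorithm rather than from deterministic autoencoders. Though natural and somewhat self-evident, the connection between EM and AEVB is not emphasized in the recent deep learning literature, and we believe that emphasizing this connection can improve the community's understanding of AEVB. In particular, we find it especially helpful to view (1) optimizing the evidence lower bound (ELBO) with respect to the inference parameters as an approximate E-step and (2) optimizing the ELBO with respect to the generative parameters as an approximate M-step; doing both simultaneously, as in AEVB, then amounts to tightening and pushing up the ELBO at the same time. We discuss how the approximate E-step can be interpreted as performing variational inference. Important concepts such as amortization and the reparameterization trick are discussed in detail. Finally, we derive from scratch the AEVB training procedures for one non-deep and several deep latent variable models, including the VAE, conditional VAE, Gaussian mixture VAE, and variational RNN. We hope that readers will recognize AEVB as a general algorithm that can be used to fit a wide range of latent variable models (not just VAEs), and will apply AEVB to such models arising in their own fields of research. PyTorch code for all included models is publicly available.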