由于本地潜在变量的数量与数据集缩放,因此难以使用分层模型中的变分推理。因此,分层模型中的推断仍然是大规模的挑战。使用与后部匹配的结构进行变形家庭是有帮助的,但由于局部分布的巨大数量,优化仍然缓慢。相反,本文建议摊销方法,其中共享参数同时表示所有本地分布。这种方法类似地是使用给定的联合分布(例如,全级高斯),但在数据集上是可行的,这些数量幅度较大。它也比使用结构化的变分布速度更快。
translated by 谷歌翻译
分层模型代表了推理算法的挑战性设置。 MCMC方法难以扩展到具有许多局部变量和观测值的大型模型,并且由于使用简单的变异家族,变异推理(VI)可能无法提供准确的近似值。一些变异方法(例如,重要性加权VI)整合了蒙特卡洛方法以提供更好的准确性,但是这些方法往往不适合层次模型,因为它们不允许亚采样,并且其性能往往会降低高维模型。我们基于分别针对每组局部随机变量的拧紧方法(例如重要性加权)的应用,为分层模型提出了一个新的差异界限家族。我们表明,我们的方法自然允许使用子采样来获得公正的梯度,并且它完全利用了通过在较低维空间中独立应用它们来建立更紧密的下限的方法的力量基线。
translated by 谷歌翻译
Variational inference has become a widely used method to approximate posteriors in complex latent variables models. However, deriving a variational inference algorithm generally requires significant model-specific analysis, and these efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black box sampling based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
translated by 谷歌翻译
One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
translated by 谷歌翻译
项目反应理论(IRT)是一个无处不在的模型,可以根据他们对问题的回答理解人类行为和态度。大型现代数据集为捕捉人类行为的更多细微差别提供了机会,从而有可能改善心理测量模型,从而改善科学理解和公共政策。但是,尽管较大的数据集允许采用更灵活的方法,但许多用于拟合IRT模型的当代算法也可能具有禁止现实世界应用的巨大计算需求。为了解决这种瓶颈,我们引入了IRT的变异贝叶斯推理算法,并表明它在不牺牲准确性的情况下快速可扩展。将此方法应用于认知科学和教育的五个大规模项目响应数据集中,比替代推理算法更高的对数可能性和更高的准确性。然后,使用这种新的推论方法,我们将IRT概括为具有表现力的贝叶斯响应模型,利用深度学习的最新进展来捕获具有神经网络的非线性项目特征曲线(ICC)。使用TIMSS的特定级数学测试,我们显示我们的非线性IRT模型可以捕获有趣的不对称ICC。该算法实现是开源的,易于使用。
translated by 谷歌翻译
自动编码变化贝叶斯(AEVB)是一种用于拟合潜在变量模型(无监督学习的有前途的方向)的强大而通用的算法,并且是训练变量自动编码器(VAE)的众所周知的。在本教程中,我们专注于从经典的期望最大化(EM)算法中激励AEVB,而不是确定性自动编码器。尽管自然而有些不言而喻,但在最近的深度学习文献中并未强调EM与AEVB之间的联系,我们认为强调这种联系可以改善社区对AEVB的理解。特别是,我们发现(1)优化有关推理参数的证据下限(ELBO)作为近似E-step,并且(2)优化ELBO相对于生成参数作为近似M-step;然后,与AEVB中的同时进行同时进行,然后同时拧紧并推动Elbo。我们讨论如何将近似E-Step解释为执行变异推断。详细讨论了诸如摊销和修复技巧之类的重要概念。最后,我们从划痕中得出了非深度和几个深层变量模型的AEVB训练程序,包括VAE,有条件的VAE,高斯混合物VAE和变异RNN。我们希望读者能够将AEVB认识为一种通用算法,可用于拟合广泛的潜在变量模型(不仅仅是VAE),并将AEVB应用于自己的研究领域中出现的此类模型。所有纳入型号的Pytorch代码均可公开使用。
translated by 谷歌翻译
We develop an optimization algorithm suitable for Bayesian learning in complex models. Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations. It applies within the class of exponential-family variational posterior distributions, for which we extensively discuss the Gaussian case for which the updates have a rather simple form. Our Quasi Black-box Variational Inference (QBVI) framework is readily applicable to a wide class of Bayesian inference problems and is of simple implementation as the updates of the variational posterior do not involve gradients with respect to the model parameters, nor the prescription of the Fisher information matrix. We develop QBVI under different hypotheses for the posterior covariance matrix, discuss details about its robust and feasible implementation, and provide a number of real-world applications to demonstrate its effectiveness.
translated by 谷歌翻译
统计模型是机器学习的核心,具有广泛适用性,跨各种下游任务。模型通常由通过最大似然估计从数据估计的自由参数控制。但是,当面对现实世界数据集时,许多模型运行到一个关键问题:它们是在完全观察到的数据方面配制的,而在实践中,数据集会困扰缺失数据。来自不完整数据的统计模型估计理论在概念上类似于潜在变量模型的估计,其中存在强大的工具,例如变分推理(VI)。然而,与标准潜在变量模型相比,具有不完整数据的参数估计通常需要估计缺失变量的指数 - 许多条件分布,因此使标准的VI方法是棘手的。通过引入变分Gibbs推理(VGI),是一种新的通用方法来解决这个差距,以估计来自不完整数据的统计模型参数。我们在一组合成和实际估算任务上验证VGI,从不完整的数据中估算重要的机器学习模型,VAE和标准化流程。拟议的方法,同时通用,实现比现有的特定模型特定估计方法竞争或更好的性能。
translated by 谷歌翻译
We investigate a local reparameterizaton technique for greatly reducing the variance of stochastic gradients for variational Bayesian inference (SGVB) of a posterior over model parameters, while retaining parallelizability. This local reparameterization translates uncertainty about global parameters into local noise that is independent across datapoints in the minibatch. Such parameterizations can be trivially parallelized and have variance that is inversely proportional to the minibatch size, generally leading to much faster convergence. Additionally, we explore a connection with dropout: Gaussian dropout objectives correspond to SGVB with local reparameterization, a scale-invariant prior and proportionally fixed posterior variance. Our method allows inference of more flexibly parameterized posteriors; specifically, we propose variational dropout, a generalization of Gaussian dropout where the dropout rates are learned, often leading to better models. The method is demonstrated through several experiments.
translated by 谷歌翻译
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
translated by 谷歌翻译
概率分布允许从业者发现数据中的隐藏结构,并构建模型,以使用有限的数据解决监督的学习问题。该报告的重点是变异自动编码器,这是一种学习大型复杂数据集概率分布的方法。该报告提供了对变异自动编码器的理论理解,并巩固了该领域的当前研究。该报告分为多个章节,第一章介绍了问题,描述了变异自动编码器并标识了该领域的关键研究方向。第2、3、4和5章深入研究了每个关键研究领域的细节。第6章总结了报告,并提出了未来工作的指示。具有机器学习基本思想但想了解机器学习研究中的一般主题的读者可以从报告中受益。该报告解释了有关学习概率分布的中心思想,人们为使这种危险做些什么,并介绍了有关当前如何应用深度学习的细节。该报告还为希望为这个子场做出贡献的人提供了温和的介绍。
translated by 谷歌翻译
变异推理(VI)的核心原理是将计算复杂后概率密度计算的统计推断问题转换为可拖动的优化问题。该属性使VI比几种基于采样的技术更快。但是,传统的VI算法无法扩展到大型数据集,并且无法轻易推断出越野数据点,而无需重新运行优化过程。该领域的最新发展,例如随机,黑框和摊销VI,已帮助解决了这些问题。如今,生成的建模任务广泛利用摊销VI来实现其效率和可扩展性,因为它利用参数化函数来学习近似的后验密度参数。在本文中,我们回顾了各种VI技术的数学基础,以构成理解摊销VI的基础。此外,我们还概述了最近解决摊销VI问题的趋势,例如摊销差距,泛化问题,不一致的表示学习和后验崩溃。最后,我们分析了改善VI优化的替代差异度量。
translated by 谷歌翻译
How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
translated by 谷歌翻译
Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
translated by 谷歌翻译
Recent advances in coreset methods have shown that a selection of representative datapoints can replace massive volumes of data for Bayesian inference, preserving the relevant statistical information and significantly accelerating subsequent downstream tasks. Existing variational coreset constructions rely on either selecting subsets of the observed datapoints, or jointly performing approximate inference and optimizing pseudodata in the observed space akin to inducing points methods in Gaussian Processes. So far, both approaches are limited by complexities in evaluating their objectives for general purpose models, and require generating samples from a typically intractable posterior over the coreset throughout inference and testing. In this work, we present a black-box variational inference framework for coresets that overcomes these constraints and enables principled application of variational coresets to intractable models, such as Bayesian neural networks. We apply our techniques to supervised learning problems, and compare them with existing approaches in the literature for data summarization and inference.
translated by 谷歌翻译
这项正在进行的工作旨在为统计学习提供统一的介绍,从诸如GMM和HMM等经典模型到现代神经网络(如VAE和扩散模型)缓慢地构建。如今,有许多互联网资源可以孤立地解释这一点或新的机器学习算法,但是它们并没有(也不能在如此简短的空间中)将这些算法彼此连接起来,或者与统计模型的经典文献相连现代算法出现了。同样明显缺乏的是一个单一的符号系统,尽管对那些已经熟悉材料的人(如这些帖子的作者)不满意,但对新手的入境造成了重大障碍。同样,我的目的是将各种模型(尽可能)吸收到一个用于推理和学习的框架上,表明(以及为什么)如何以最小的变化将一个模型更改为另一个模型(其中一些是新颖的,另一些是文献中的)。某些背景当然是必要的。我以为读者熟悉基本的多变量计算,概率和统计以及线性代数。这本书的目标当然不是​​完整性,而是从基本知识到过去十年中极强大的新模型的直线路径或多或少。然后,目标是补充而不是替换,诸如Bishop的\ emph {模式识别和机器学习}之类的综合文本,该文本现在已经15岁了。
translated by 谷歌翻译
贝叶斯神经网络具有潜在变量(BNN + LVS)通过明确建模模型不确定性(通过网络权重)和环境暂停(通过潜在输入噪声变量)来捕获预测的不确定性。在这项工作中,我们首先表明BNN + LV具有严重形式的非可识别性:可以在模型参数和潜在变量之间传输解释性,同时拟合数据。我们证明,在无限数据的极限中,网络权重和潜变量的后部模式从地面真理渐近地偏离。由于这种渐近偏差,传统的推理方法可以在实践中,产量参数概括不确定和不确定的不确定性。接下来,我们开发一种新推断过程,明确地减轻了训练期间不可识别性的影响,并产生高质量的预测以及不确定性估计。我们展示我们的推理方法在一系列合成和实际数据集中改善了基准方法。
translated by 谷歌翻译
随机梯度马尔可夫链蒙特卡洛(SGMCMC)被认为是大型模型(例如贝叶斯神经网络)中贝叶斯推断的金标准。由于从业人员在这些模型中面临速度与准确性权衡,因此变异推理(VI)通常是可取的选择。不幸的是,VI对后部的分解和功能形式做出了有力的假设。在这项工作中,我们提出了一个新的非参数变分近似,该近似没有对后验功能形式进行假设,并允许从业者指定算法应尊重或断裂的确切依赖性。该方法依赖于在修改的能量函数上运行的新的langevin型算法,其中潜在变量的一部分是在马尔可夫链的早期迭代中平均的。这样,统计依赖性可以以受控的方式破裂,从而使链条混合更快。可以以“辍学”方式进一步修改该方案,从而导致更大的可扩展性。我们在CIFAR-10,SVHN和FMNIST上测试RESNET-20的计划。在所有情况下,与SG-MCMC和VI相比,我们都会发现收敛速度和/或最终精度的提高。
translated by 谷歌翻译
The choice of approximate posterior distribution is one of the core problems in variational inference. Most applications of variational inference employ simple families of posterior approximations in order to allow for efficient inference, focusing on mean-field or other simple structured approximations. This restriction has a significant impact on the quality of inferences made using variational methods. We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. Our approximations are distributions constructed through a normalizing flow, whereby a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained. We use this view of normalizing flows to develop categories of finite and infinitesimal flows and provide a unified view of approaches for constructing rich posterior approximations. We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provides a clear improvement in performance and applicability of variational inference.
translated by 谷歌翻译
贝叶斯结构学习允许从数据推断贝叶斯网络结构,同时推理认识性不确定性 - 朝着实现现实世界系统的主动因果发现和设计干预的关键因素。在这项工作中,我们为贝叶斯结构学习(DIBS)提出了一般,完全可微分的框架,其在潜在概率图表表示的连续空间中运行。与现有的工作相反,DIBS对局部条件分布的形式不可知,并且允许图形结构和条件分布参数的关节后部推理。这使得我们的配方直接适用于复杂贝叶斯网络模型的后部推理,例如,具有由神经网络编码的非线性依赖性。使用DIBS,我们设计了一种高效,通用的变分推理方法,用于近似结构模型的分布。在模拟和现实世界数据的评估中,我们的方法显着优于关节后部推理的相关方法。
translated by 谷歌翻译