可逆的神经网络(Inns)已被用于设计生成模型,实现节省内存梯度计算,并解决逆问题。在这项工作中,我们展示了普通二手纪念架构遭受爆炸逆,因此易于变得数值不可逆转。在广泛的Inn用例中,我们揭示了包括在分配和分配的变化(OOD)数据的变化公式的不适用性的失败,用于节省内存返回的不正确渐变,以及无法从标准化流量模型中采样。我们进一步推出了普通架构原子构建块的双嘴唇特性。这些见解对旅馆的稳定性然后提供了前进的方法来解决这些故障。对于本地可释放足够的任务,如记忆保存的倒退,我们提出了一种灵活且高效的常规器。对于必要的全球可逆性的问题,例如在ood数据上应用标准化流动,我们展示了设计稳定的旅馆构建块的重要性。
translated by 谷歌翻译
We show that standard ResNet architectures can be made invertible, allowing the same model to be used for classification, density estimation, and generation. Typically, enforcing invertibility requires partitioning dimensions or restricting network architectures. In contrast, our approach only requires adding a simple normalization step during training, already available in standard frameworks. Invertible ResNets define a generative model which can be trained by maximum likelihood on unlabeled data. To compute likelihoods, we introduce a tractable approximation to the Jacobian log-determinant of a residual block. Our empirical evaluation shows that invertible ResNets perform competitively with both stateof-the-art image classifiers and flow-based generative models, something that has not been previously achieved with a single architecture.
translated by 谷歌翻译
正常化流动在过去几年中已经变得更加流行;然而,他们继续计算得昂贵,使得它们难以被接受到更广泛的机器学习界中。在本文中,我们介绍了一个简单的一维一层网络,其封闭形式的Lipschitz常数;使用此,我们介绍了一种新的精确嘴唇流(ELF),这些流量(ELF)结合了剩余流量的易于采样,并具有自回归流的强烈性能。此外,我们表明,与多个其他流相比,ELF被证明是通用密度近似器,更新和参数有效,并且在多个大规模数据集上实现最先进的性能。
translated by 谷歌翻译
Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact. The goal of this survey article is to give a coherent and comprehensive review of the literature around the construction and use of Normalizing Flows for distribution learning. We aim to provide context and explanation of the models, review current state-of-the-art literature, and identify open questions and promising future directions.
translated by 谷歌翻译
Normalizing flows provide a general mechanism for defining expressive probability distributions, only requiring the specification of a (usually simple) base distribution and a series of bijective transformations. There has been much recent work on normalizing flows, ranging from improving their expressive power to expanding their application. We believe the field has now matured and is in need of a unified perspective. In this review, we attempt to provide such a perspective by describing flows through the lens of probabilistic modeling and inference. We place special emphasis on the fundamental principles of flow design, and discuss foundational topics such as expressive power and computational trade-offs. We also broaden the conceptual framing of flows by relating them to more general probability transformations. Lastly, we summarize the use of flows for tasks such as generative modeling, approximate inference, and supervised learning.
translated by 谷歌翻译
平均场游戏(MFGS)是针对具有大量交互代理的系统的建模框架。他们在经济学,金融和游戏理论中有应用。标准化流(NFS)是一个深层生成模型的家族,通过使用可逆映射来计算数据的可能性,该映射通常通过使用神经网络进行参数化。它们对于密度建模和数据生成很有用。尽管对这两种模型进行了积极的研究,但很少有人注意到两者之间的关系。在这项工作中,我们通过将NF的训练视为解决MFG来揭示MFGS和NFS之间的联系。这是通过根据试剂轨迹重新解决MFG问题的实现,并通过流量体系结构对所得MFG的离散化进行参数化。通过这种联系,我们探讨了两个研究方向。首先,我们采用表达的NF体系结构来准确地求解高维MFG,以避开传统数值方法中维度的诅咒。与其他深度学习方法相比,我们的基于轨迹的公式编码神经网络中的连续性方程,从而更好地近似人口动态。其次,我们对NFS进行运输成本的培训正规,并显示了控制模型Lipschitz绑定的有效性,从而获得了更好的概括性能。我们通过对各种合成和现实生活数据集的全面实验来展示数值结果。
translated by 谷歌翻译
标准化流是生成模型,其通过从简单的基本分布到复杂的目标分布的可逆性转换提供易于变换的工艺模型。然而,该技术不能直接模拟支持未知的低维歧管的数据,在诸如图像数据之类的现实世界域中的公共发生。最近的补救措施的尝试引入了击败归一化流量的中央好处的几何并发症:精确密度估计。我们通过保形嵌入流量来恢复这种福利,这是一种设计流动与贸易密度的流动的流动的框架。我们争辩说,使用培训保育嵌入的标准流量是模型支持数据的最自然的方式。为此,我们提出了一系列保形构建块,并在具有合成和实际数据的实验中应用它们,以证明流动可以在不牺牲贸易可能性的情况下模拟歧管支持的分布。
translated by 谷歌翻译
归一化流提供一种优雅的方法,用于通过使用可逆的变换获得来自分布的易于密度估计。主要挑战是提高模型的表现,同时保持可逆性约束完整。我们建议通过纳入本地化的自我关注来这样做。然而,传统的自我关注机制不满足获得可逆流的要求,并且不能胆无利地结合到标准化流中。为了解决这一点,我们介绍了一种称为细微的收缩流(ACF)的新方法,它利用了一种特殊类别的基于流的生成模型 - 收缩流。我们证明可以以即插即用的方式将ACF引入到最新的现有技术的状态。这被证明是不仅改善了这些模型的表示力(改善了每次昏暗度量的比特),而且还导致训练它们的速度明显更快。在包括测试图像之间的分隔的定性结果证明样本更加现实并捕获数据中的本地相关性。我们通过使用AWGN进行扰动分析来进一步评估结果,证明ACF模型(特别是点 - 产品变体)表现出更好,更加一致的恢复能力噪声。
translated by 谷歌翻译
A normalizing flow models a complex probability density as an invertible transformation of a simple base density. Flows based on either coupling or autoregressive transforms both offer exact density evaluation and sampling, but rely on the parameterization of an easily invertible elementwise transformation, whose choice determines the flexibility of these models. Building upon recent work, we propose a fully-differentiable module based on monotonic rational-quadratic splines, which enhances the flexibility of both coupling and autoregressive transforms while retaining analytic invertibility. We demonstrate that neural spline flows improve density estimation, variational inference, and generative modeling of images.
translated by 谷歌翻译
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However their practicality in real-time, industrial-scale applications are limited due to the high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improve uncertainty property of a single network, based on a single, deterministic representation. By formalizing the uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
translated by 谷歌翻译
A neural network deployed in the wild may be asked to make predictions for inputs that were drawn from a different distribution than that of the training data. A plethora of work has demonstrated that it is easy to find or synthesize inputs for which a neural network is highly confident yet wrong. Generative models are widely viewed to be robust to such mistaken confidence as modeling the density of the input features can be used to detect novel, out-of-distribution inputs. In this paper we challenge this assumption. We find that the density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses (i.e. CIFAR-10) from those of house numbers (i.e. SVHN), assigning a higher likelihood to the latter when the model is trained on the former. Moreover, we find evidence of this phenomenon when pairing several popular image data sets: FashionMNIST vs MNIST, CelebA vs SVHN, ImageNet vs CIFAR-10 / CIFAR-100 / SVHN. To investigate this curious behavior, we focus analysis on flow-based generative models in particular since they are trained and evaluated via the exact marginal likelihood. We find such behavior persists even when we restrict the flows to constant-volume transformations. These transformations admit some theoretical analysis, and we show that the difference in likelihoods can be explained by the location and variances of the data and the model curvature. Our results caution against using the density estimates from deep generative models to identify inputs similar to the training distribution until their behavior for out-of-distribution inputs is better understood.
translated by 谷歌翻译
作为一种概率建模技术,基于流的模型在无损压缩\ cite {idf,idf ++,lbb,ivpf,iflow}的领域表现出了巨大的潜力。与其他深层生成模型(例如自动回应,VAE)\ cite {bitswap,hilloc,pixelcnn ++,pixelsnail},这些模型明确地模拟了数据分布概率,因此基于流的模型的性能更好,因为它们的出色概率密度估计和满意度的概率和满意度的概率。在基于流量的模型中,多尺度体系结构提供了从浅层到输出层的快捷方式,从而大大降低了计算复杂性并避免添加更多层时性能降解。这对于构建基于先进的基于流动的可学习射击映射至关重要。此外,实用压缩任务中模型设计的轻量级要求表明,具有多尺度体系结构的流量在编码复杂性和压缩效率之间取得了最佳的权衡。
translated by 谷歌翻译
归一化流量是具有易于易变量的神经网络的可逆性网络,其允许通过最大可能性优化它们的参数来有效地执行。然而,通常假设感兴趣的数据生活在嵌入在高维环境空间中的一些(通常未知)的低维歧管中。结果是自建设中以来的建模不匹配 - 可逆性要求意味着学习分布的高维支持。注射流量,从低到高维空间的映射,旨在通过学习歧管的分布来解决这种差异,但是由此产生的体积变化术语变得更具挑战性。目前方法避免完全使用各种启发式计算该术语,或者假设歧管预先已知,因此不广泛适用。相反,我们提出了两种方法来对模型的参数来促进该术语的梯度,依赖于仔细使用来自数值线性代数的自动分化和技术。两种方法都对将其投射到这种歧管上的数据执行端到端非线性歧管学习和密度估计。我们研究了我们所提出的方法之间的权衡,经验验证我们优于更准确地学习歧管和对应的相应分布忽略音量变化术语的优先级,并显示出对分布外检测的有希望的结果。我们的代码可在https://github.com/layer6ai-labs/rectangular-flows中找到。
translated by 谷歌翻译
将数据作为几个原子的组合表示数据的字典学习问题,长期以来作为一种流行的学习统计信息和信号处理方法。最受欢迎的字典学习算法在稀疏编码和词典上的交替交替,富有的文献研究了其理论融合。神经卓越的展开稀疏编码网络的日益普及导致了经验发现,通过这种网络的反向化执行字典学习。本文通过Pudle提供了这些经验结果的第一个理论证明,可提供展开的展开字典学习方法。我们突出了损失,展开和背交对融合的影响。我们发现隐式加速:作为展开的函数,BackPropagated梯度会收敛得更快,比梯度从交替最小化更准确。我们通过合成和图像去噪实验补充我们的研究结果。调查结果支持使用加速深度学习优化器和展开网络用于字典学习。
translated by 谷歌翻译
我们介绍了用于生成建模的广义能量模型(GEBM)。这些模型组合了两个训练有素的组件:基本分布(通常是隐式模型),可以在高维空间中学习具有低固有尺寸的数据的支持;和能量功能,优化学习支持的概率质量。能量函数和基座都共同构成了最终模型,与GANS不同,它仅保留基本分布(“发电机”)。通过在学习能量和基础之间交替进行培训GEBMS。我们表明,两种培训阶段都明确定义:通过最大化广义可能性来学习能量,并且由此产生的能源的损失提供了学习基础的信息梯度。可以通过MCMC获得来自训练模型的潜在空间的后部的样品,从而在该空间中找到产生更好的质量样本的区域。经验上,图像生成任务上的GEBM样本比来自学习发电机的图像更好,表明所有其他相同,GEBM将优于同样复杂性的GAN。当使用归一化流作为基础测量时,GEBMS成功地启动密度建模任务,返回相当的性能以直接相同网络的最大可能性。
translated by 谷歌翻译
现代生成的对抗网络(GANS)主要使用判别者(或批评者)中的分段线性激活功能,包括Relu和Leaceryru。这些模型学习分段线性映射,其中每个部分处理输入空间的子集,每个子​​集的梯度​​是分段常数。在这样一类鉴别者(或批评者)函数下,我们呈现梯度标准化(Gran),一种新的输入相关标准化方法,可确保输入空间中的分段k-lipschitz约束。与光谱归一化相比,Gran不约束各个网络层的处理,并且与梯度惩罚不同,严格执行几乎无处不在的分段Lipschitz约束。凭经验,我们展示了多个数据集的改进了图像生成性能(包括Cifar-10/100,STL-10,LSUN卧室和Celeba),GaN丢失功能和指标。此外,我们分析了在几个标准GAN中改变了经常无核的Lipschitz常数K,而不仅仅是实现显着的性能增益,还可以在普通的ADAM优化器中找到K和培训动态之间的连接,特别是在低梯度损失平台之间。
translated by 谷歌翻译
We propose a deep learning framework for modeling complex high-dimensional densities called Non-linear Independent Component Estimation (NICE). It is based on the idea that a good representation is one in which the data has a distribution that is easy to model. For this purpose, a non-linear deterministic transformation of the data is learned that maps it to a latent space so as to make the transformed data conform to a factorized distribution, i.e., resulting in independent latent variables. We parametrize this transformation so that computing the determinant of the Jacobian and inverse Jacobian is trivial, yet we maintain the ability to learn complex non-linear transformations, via a composition of simple building blocks, each based on a deep neural network. The training criterion is simply the exact log-likelihood, which is tractable. Unbiased ancestral sampling is also easy. We show that this approach yields good generative models on four image datasets and can be used for inpainting.
translated by 谷歌翻译
基于似然或显式的深层生成模型使用神经网络来构建灵活的高维密度。该公式直接与歧管假设相矛盾,该假设指出,观察到的数据位于嵌入高维环境空间中的低维歧管上。在本文中,我们研究了在这种维度不匹配的情况下,最大可能的训练的病理。我们正式证明,在学习歧管本身而不是分布的情况下,可以实现堕落的优点,而我们称之为多种歧视的现象过于拟合。我们提出了一类两步程序,该过程包括降低降低步骤,然后进行最大样子密度估计,并证明它们在非参数方面恢复了数据生成分布,从而避免了多种歧视。我们还表明,这些过程能够对隐式模型(例如生成对抗网络)学到的流形进行密度估计,从而解决了这些模型的主要缺点。最近提出的几种方法是我们两步程序的实例。因此,我们统一,扩展和理论上证明了一大批模型。
translated by 谷歌翻译
在概率密度范围内相对于Wassersein度量的空间的梯度流程通常具有很好的特性,并且已在几种机器学习应用中使用。计算Wasserstein梯度流量的标准方法是有限差异,使网格上的基础空间离散,并且不可扩展。在这项工作中,我们提出了一种可扩展的近端梯度型算法,用于Wassersein梯度流。我们的方法的关键是目标函数的变分形式,这使得可以通过引流 - 双重优化实现JKO近端地图。可以通过替代地更新内部和外环中的参数来有效地解决该原始问题。我们的框架涵盖了包括热方程和多孔介质方程的所有经典Wasserstein梯度流。我们展示了若干数值示例的算法的性能和可扩展性。
translated by 谷歌翻译
标准化流动,扩散归一化流量和变形自动置换器是强大的生成模型。在本文中,我们提供了一个统一的框架来通过马尔可夫链处理这些方法。实际上,我们考虑随机标准化流量作为一对马尔可夫链,满足一些属性,并表明许多用于数据生成的最先进模型适合该框架。马尔可夫链的观点使我们能够将确定性层作为可逆的神经网络和随机层作为大都会加速层,Langevin层和变形自身偏移,以数学上的声音方式。除了具有Langevin层的密度的层,扩散层或变形自身形式,也可以处理与确定性层或大都会加热器层没有密度的层。因此,我们的框架建立了一个有用的数学工具来结合各种方法。
translated by 谷歌翻译