Normalizing flows are a powerful tool for generative modelling, density estimation and posterior reconstruction in Bayesian inverse problems. In this paper, we introduce proximal residual flows, a new architecture of normalizing flows. Based on the fact that proximal neural networks are, by definition, averaged operators, we ensure the invertibility of certain residual blocks. Moreover, we extend the architecture to conditional proximal residual flows for posterior reconstruction within Bayesian inverse problems. We demonstrate the performance of proximal residual flows on numerical examples.
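To make the invertibility mechanism concrete: if the residual branch $g$ has Lipschitz constant strictly below one (which a suitably scaled averaged operator provides), the block $x \mapsto x + g(x)$ can be inverted by Banach fixed-point iteration. The following is a minimal numpy sketch of that mechanism only; the toy $g$, scaling and tolerances are ours for illustration, not the paper's proximal neural network construction.

```python
import numpy as np

def invert_residual_block(g, y, n_iter=100, tol=1e-10):
    """Invert y = x + g(x) by Banach fixed-point iteration.

    Converges whenever g is contractive (Lip(g) < 1)."""
    x = y.copy()
    for _ in range(n_iter):
        x_new = y - g(x)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# toy contractive branch: linear map with spectral norm 0.5, then tanh
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
W *= 0.5 / np.linalg.norm(W, 2)     # rescale spectral norm to 0.5
g = lambda x: np.tanh(W @ x)        # Lip(tanh) = 1, so Lip(g) <= 0.5

x_true = rng.standard_normal(3)
y = x_true + g(x_true)
print(np.allclose(invert_residual_block(g, y), x_true))  # True
```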
Normalizing flows, diffusion normalizing flows and variational autoencoders are powerful generative models. In this paper, we provide a unified framework to handle these approaches via Markov chains. Indeed, we consider stochastic normalizing flows as pairs of Markov chains fulfilling some properties, and show that many state-of-the-art models for data generation fit into this framework. The Markov chains viewpoint enables us to couple deterministic layers, such as invertible neural networks, and stochastic layers, such as Metropolis-Hastings layers, Langevin layers and variational autoencoders, in a mathematically sound way. Besides layers with densities, such as Langevin layers, diffusion layers or variational autoencoders, layers without densities, such as deterministic layers or Metropolis-Hastings layers, can also be handled. Hence, our framework establishes a useful mathematical tool to combine the various approaches.
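As a rough illustration of the deterministic/stochastic pairing (a sketch under our own simplifications, not the paper's Metropolis-Hastings construction): a deterministic invertible layer contributes a log-determinant, while a Langevin layer is a Markov kernel whose forward transition density is Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)

def affine_layer(x, a=1.5, b=0.3):
    """Deterministic invertible layer; log|det| = log|a| per dimension."""
    return a * x + b

def langevin_layer(x, grad_log_p, step=1e-2):
    """Stochastic layer: one unadjusted Langevin step, i.e. a Markov
    kernel with transition density N(x + step*grad_log_p(x), 2*step)."""
    return x + step * grad_log_p(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)

# push a standard normal through the chain toward a target N(2, 0.25)
grad_log_target = lambda x: -(x - 2.0) / 0.25
x = rng.standard_normal(10_000)
x = affine_layer(x)
for _ in range(200):
    x = langevin_layer(x, grad_log_target)
print(x.mean(), x.std())   # ~2.0 and ~0.5, up to the O(step) ULA bias
```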
To overcome topological constraints and improve the expressiveness of normalizing flow architectures, Wu, Köhler and Noé introduced stochastic normalizing flows, which combine deterministic, learnable flow transformations with stochastic sampling methods. In this paper, we consider stochastic normalizing flows from a Markov chain point of view. In particular, we replace transition densities by general Markov kernels and establish proofs via Radon-Nikodym derivatives, which allows distributions without densities to be incorporated in a sound way. Further, we generalize the results to sampling from posterior distributions, as required in inverse problems. The performance of the proposed conditional stochastic normalizing flows is demonstrated by numerical examples.
Learning neural networks from only a small amount of data is an important research topic with tremendous potential for applications. In this paper, we introduce a regularizer for the variational modelling of inverse problems in imaging based on normalizing flows. Our regularizer, called PatchNR, involves a normalizing flow learned on patches of very few images. In particular, the training is independent of the considered inverse problem, so the same regularizer can be used for different forward operators acting on the same class of images. By investigating the distribution of patches versus that of the whole image class, we prove that our variational model is indeed a MAP approach. If additional supervised information is available, the model can be generalized to conditional PatchNRs. Numerical examples for superresolution of material images and for low-dose or limited-angle computed tomography (CT) show that our method provides high-quality results among methods with similar assumptions, while requiring only very few data.
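A schematic of the resulting variational objective, in our own notation: `patch_nll` stands in for the negative log-likelihood of a flow pre-trained on clean patches and is a hypothetical callable, as is `forward_op`.

```python
import numpy as np

def extract_patches(img, p=6, stride=3):
    """Collect all p-by-p patches of a 2D image as flattened rows."""
    H, W = img.shape
    return np.stack([img[i:i + p, j:j + p].ravel()
                     for i in range(0, H - p + 1, stride)
                     for j in range(0, W - p + 1, stride)])

def patchnr_objective(x, y, forward_op, patch_nll, lam=1.0):
    """MAP-style objective: data fidelity plus flow-based patch regularizer."""
    fidelity = 0.5 * np.sum((forward_op(x) - y) ** 2)
    regularizer = np.mean(patch_nll(extract_patches(x)))
    return fidelity + lam * regularizer
```

Minimizing this over $x$ with the patch flow held fixed is what makes the regularizer independent of the forward operator.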
Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact. The goal of this survey article is to give a coherent and comprehensive review of the literature around the construction and use of Normalizing Flows for distribution learning. We aim to provide context and explanation of the models, review current state-of-the-art literature, and identify open questions and promising future directions.
We study the approximation of probability measures supported on n-dimensional manifolds embedded in R^m by injective flows: neural networks composed of invertible flows and a single injective layer. We show that when m <= 3n, injective flows between R^n and R^m universally approximate measures supported on images of extendable embeddings, which are a proper subset of standard embeddings; in this regime, topological obstructions may preclude certain manifolds as admissible targets. When m >= 3n + 1, we use an argument from algebraic topology, known as the "clean trick", to prove that the topological obstructions vanish and injective flows universally approximate any differentiable embedding. Along the way, we show that the optimality of injective flow networks can be established "in reverse", resolving a conjecture made in Brehmer and Cranmer 2020. Moreover, the networks designed can be simple enough to be equipped with other properties, such as a novel projection result.
Gradient flows in the space of probability densities, with respect to the Wasserstein metric, often have nice properties and have been used in several machine learning applications. The standard approach to computing Wasserstein gradient flows is finite differences, which discretizes the underlying space on a grid and is not scalable. In this work, we propose a scalable proximal-gradient-type algorithm for Wasserstein gradient flows. The key to our method is a variational form of the objective function, which makes it possible to realize the JKO proximal map via primal-dual optimization. This primal-dual problem can be solved efficiently by alternately updating the parameters in the inner and outer loops. Our framework covers all the classical Wasserstein gradient flows, including the heat equation and the porous medium equation. We demonstrate the performance and scalability of the algorithm on several numerical examples.
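For reference, the JKO step that the proximal map realizes is the Wasserstein analogue of a proximal-gradient step (standard notation, ours): given a step size $\tau$ and the current density $\rho_k$,

$$ \rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho} \; F(\rho) \;+\; \frac{1}{2\tau} W_2^2(\rho, \rho_k) , $$

so each step decreases the energy $F$ while staying Wasserstein-close to $\rho_k$; the variational form mentioned above replaces this infinite-dimensional problem by a tractable primal-dual one.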
Normalizing flows provide a general mechanism for defining expressive probability distributions, only requiring the specification of a (usually simple) base distribution and a series of bijective transformations. There has been much recent work on normalizing flows, ranging from improving their expressive power to expanding their application. We believe the field has now matured and is in need of a unified perspective. In this review, we attempt to provide such a perspective by describing flows through the lens of probabilistic modeling and inference. We place special emphasis on the fundamental principles of flow design, and discuss foundational topics such as expressive power and computational trade-offs. We also broaden the conceptual framing of flows by relating them to more general probability transformations. Lastly, we summarize the use of flows for tasks such as generative modeling, approximate inference, and supervised learning.
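As a pointer to the fundamental principle the review builds on, recall the change-of-variables formula: writing the flow as $x = T(z) = T_K \circ \dots \circ T_1(z)$ with intermediates $z_0 = z$, $z_k = T_k(z_{k-1})$ and $z_K = x$, the model log-density is

$$ \log p_x(x) \;=\; \log p_z(z_0) \;-\; \sum_{k=1}^{K} \log \bigl| \det J_{T_k}(z_{k-1}) \bigr| . $$

Flow design then amounts to choosing the $T_k$ so that inversion and these log-determinants remain cheap; this is the source of the expressiveness/computation trade-offs the review discusses.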
Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks and avoids inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamics of the gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and the difficulty of performing end-to-end training of deep flow networks. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with existing flow and diffusion models at a significantly reduced computational and memory cost.
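A minimal skeleton of the block-wise training described above, under our own simplifications: `make_block` and `block_objective` are hypothetical placeholders for a residual-block constructor and the per-block JKO objective; neither is the paper's API.

```python
import torch

def train_blockwise(make_block, block_objective, data_loader,
                    n_blocks=8, n_epochs=10):
    """Fit residual blocks one at a time: each new block is trained on the
    pushforward of the data through the already-frozen prefix, so no
    end-to-end backpropagation through the whole deep flow is needed."""
    frozen = []
    for _ in range(n_blocks):
        block = make_block()
        opt = torch.optim.Adam(block.parameters(), lr=1e-3)
        for _ in range(n_epochs):
            for x in data_loader:
                with torch.no_grad():          # frozen prefix: no gradients
                    for b in frozen:
                        x = b(x)
                loss = block_objective(block, x)
                opt.zero_grad()
                loss.backward()
                opt.step()
        block.requires_grad_(False)
        frozen.append(block)
    return frozen
```

Caching the pushforward once per block, instead of recomputing it every epoch, is the obvious practical refinement.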
Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling and beyond. Monotone triangular transport maps, approximations of the Knothe-Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on the properties of the optimization problem that arises when learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima. We show that, for target distributions satisfying certain tail conditions, the unique global minimizer corresponds to the KR map. Given samples from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.
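Concretely, a monotone triangular map has the lower-triangular structure $T(x) = (T_1(x_1), T_2(x_1, x_2), \dots, T_d(x_1, \dots, x_d))$ with $\partial_{x_k} T_k > 0$. One rectified form of the "invertible transformation of a smooth function $f_k$" (our notation, a sketch rather than the paper's exact parameterization) is

$$ T_k(x_{1:k}) \;=\; f_k(x_{1:k-1}, 0) \;+\; \int_0^{x_k} g\bigl( \partial_t f_k(x_{1:k-1}, t) \bigr) \, dt , \qquad g > 0 , $$

which is monotone in $x_k$ by construction, since the integrand is strictly positive.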
We show that standard ResNet architectures can be made invertible, allowing the same model to be used for classification, density estimation, and generation. Typically, enforcing invertibility requires partitioning dimensions or restricting network architectures. In contrast, our approach only requires adding a simple normalization step during training, already available in standard frameworks. Invertible ResNets define a generative model which can be trained by maximum likelihood on unlabeled data. To compute likelihoods, we introduce a tractable approximation to the Jacobian log-determinant of a residual block. Our empirical evaluation shows that invertible ResNets perform competitively with both state-of-the-art image classifiers and flow-based generative models, something that has not been previously achieved with a single architecture.
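A PyTorch sketch of both ingredients: spectral normalization keeps the residual branch contractive (the "simple normalization step"), and the log-determinant of a block is approximated by a truncated power series with a Hutchinson trace probe. Layer sizes, the 0.9 scaling and the truncation length are our choices; training through the estimator would additionally require `create_graph=True`.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

class Contractive(nn.Module):
    """Residual branch g with Lip(g) <= 0.9 < 1: each linear map is
    spectrally normalized (1-Lipschitz), tanh is 1-Lipschitz, and the
    final scaling makes the bound strict."""
    def __init__(self, dim, hidden=64, scale=0.9):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(dim, hidden)), nn.Tanh(),
            spectral_norm(nn.Linear(hidden, dim)))
        self.scale = scale

    def forward(self, x):
        return self.scale * self.net(x)

def log_det_estimate(g, x, n_terms=10):
    """log det(I + J_g) = sum_{k>=1} (-1)^{k+1} tr(J_g^k) / k, truncated,
    with each trace estimated by one Hutchinson probe v via repeated
    vector-Jacobian products."""
    x = x.requires_grad_(True)
    y = g(x)
    v = torch.randn_like(x)
    w, ld = v, torch.zeros(x.shape[0])
    for k in range(1, n_terms + 1):
        (w,) = torch.autograd.grad(y, x, w, retain_graph=True)  # w <- J^T w
        ld = ld + (-1) ** (k + 1) * (v * w).sum(dim=1) / k
    return ld

g = Contractive(dim=2)
x = torch.randn(8, 2)
z = x + g(x)                   # forward pass of one invertible block
ld = log_det_estimate(g, x)    # per-sample log|det(I + J_g(x))|
```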
Normalizing flows have grown more popular over the last few years; however, they continue to be computationally expensive, making them difficult to adopt in the wider machine learning community. In this paper, we introduce a simple one-dimensional, one-layer network with a closed-form Lipschitz constant. Using this, we introduce a new Exact-Lipschitz Flow (ELF) that combines the ease of sampling of residual flows with the strong performance of autoregressive flows. Moreover, we show that ELF is a universal density approximator, is more computationally and parameter efficient than a multitude of other flows, and achieves state-of-the-art performance on multiple large-scale datasets.
Triangular flows, also known as Knothe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flow models such as real-valued non-volume preserving transformations (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical models. In particular, we establish the statistical consistency and finite-sample convergence rates of the Kullback-Leibler estimator of the Knothe-Rosenblatt measure coupling, using tools from empirical process theory. Our results highlight the anisotropic geometry of the function classes at play in triangular flows, shed light on optimal coordinate ordering, and lead to statistical guarantees for Jacobian flows. We conduct numerical experiments on synthetic data to illustrate the practical implications of our theoretical findings.
The modeling of probability distributions, specifically generative modeling and density estimation, has become an immensely popular subject in recent years by virtue of its outstanding performance on sophisticated data such as images and texts. Nevertheless, a theoretical understanding of its success is still incomplete. One mystery is the paradox between memorization and generalization: In theory, the model is trained to be exactly the same as the empirical distribution of the finite samples, whereas in practice, the trained model can generate new samples or estimate the likelihood of unseen samples. Likewise, the overwhelming diversity of distribution learning models calls for a unified perspective on this subject. This paper provides a mathematical framework such that all the well-known models can be derived based on simple principles. To demonstrate its efficacy, we present a survey of our results on the approximation error, training error and generalization error of these models, which can all be established based on this framework. In particular, the aforementioned paradox is resolved by proving that these models enjoy implicit regularization during training, so that the generalization error at early-stopping avoids the curse of dimensionality. Furthermore, we provide some new results on landscape analysis and the mode collapse phenomenon.
We introduce the Generalized Energy Based Model (GEBM) for generative modelling. These models combine two trained components: a base distribution (generally an implicit model), which can learn the support of data with low intrinsic dimension in a high-dimensional space; and an energy function, to refine the probability mass on the learned support. Both the energy function and the base jointly constitute the final model, unlike GANs, which retain only the base distribution (the "generator"). GEBMs are trained by alternating between learning the energy and the base. We show that both training stages are well-defined: the energy is learned by maximizing a generalized likelihood, and the resulting energy-based loss provides an informative gradient for learning the base. Samples from the posterior on the latent space of the trained model can be obtained via MCMC, thus finding regions in this space that produce better-quality samples. Empirically, GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that, all else being equal, a GEBM will outperform a GAN of the same complexity. When using normalizing flows as base measures, GEBMs also succeed on density modelling tasks, returning performance comparable to direct maximum likelihood with the same networks.
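A heavily simplified skeleton of the alternating scheme, under our own assumptions: `gen_likelihood_loss` is a hypothetical placeholder for the generalized-likelihood objective (not reproduced here), `base.sample()` is a hypothetical reparameterized sampler, and the base stage shown simply descends the learned energy, a simplification of the actual base loss.

```python
import torch

def train_gebm(base, energy, data_iter, gen_likelihood_loss,
               n_outer=1000, n_energy=5):
    """Alternate the two stages: (i) fit the energy on data vs. base
    samples, (ii) use the learned energy as the training signal for
    the base ('generator')."""
    opt_e = torch.optim.Adam(energy.parameters(), lr=1e-4)
    opt_b = torch.optim.Adam(base.parameters(), lr=1e-4)
    for _ in range(n_outer):
        for _ in range(n_energy):                  # energy stage
            x, z = next(data_iter), base.sample().detach()
            loss_e = gen_likelihood_loss(energy, x, z)
            opt_e.zero_grad()
            loss_e.backward()
            opt_e.step()
        z = base.sample()                          # base stage: descend the
        loss_b = energy(z).mean()                  # energy on generated samples
        opt_b.zero_grad()
        loss_b.backward()
        opt_b.step()
```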
In recent years, deep learning has achieved remarkable empirical success in image reconstruction. This has catalyzed an ongoing quest to precisely characterize the correctness and reliability of data-driven methods in critical use cases, for instance in medical imaging. Notwithstanding the excellent performance and efficacy of deep-learning-based methods, concerns have been raised regarding their stability, or lack thereof, with serious practical implications. Significant progress has been made in recent years to unravel the inner workings of data-driven image recovery methods, challenging their widely perceived black-box nature. In this article, we specify relevant notions of convergence for data-driven image reconstruction, which form the basis of a survey of learned methods with mathematically rigorous reconstruction guarantees. One example that is highlighted is the role of ICNNs, which offer the possibility of combining the power of deep learning with classical convex regularization theory for designing methods that are provably convergent. This survey article aims to advance the understanding of data-driven image reconstruction, for both methodological researchers and practitioners, by providing an accessible description of convergence concepts and by placing some of the existing empirical practices on a solid mathematical foundation.
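Since ICNNs are highlighted, here is a minimal sketch of the construction that makes $x \mapsto f(x)$ provably convex, in the style of Amos et al. 2017 (sizes and initialization are illustrative choices of ours): nonnegative weights on the hidden-state path, plus a convex, nondecreasing activation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Input-convex neural network: z_{l+1} = act(W_x x + W_z^+ z_l) is
    convex in x because W_z^+ >= 0 and act is convex and nondecreasing."""
    def __init__(self, dim, hidden=64, depth=3):
        super().__init__()
        self.x_layers = nn.ModuleList(
            [nn.Linear(dim, hidden) for _ in range(depth - 1)] +
            [nn.Linear(dim, 1)])
        self.z_weights = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(hidden, hidden))
             for _ in range(depth - 2)] +
            [nn.Parameter(0.1 * torch.randn(1, hidden))])
        self.act = nn.Softplus()

    def forward(self, x):
        z = self.act(self.x_layers[0](x))
        for lin_x, w_z in zip(self.x_layers[1:], self.z_weights):
            # clamp enforces the nonnegativity constraint on the z-path
            z = self.act(lin_x(x) + F.linear(z, w_z.clamp(min=0)))
        return z          # shape (batch, 1), convex in x
```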
Performing exact Bayesian inference for complex models is computationally intractable. Markov chain Monte Carlo (MCMC) algorithms can provide reliable approximations of the posterior distribution, but are expensive for large datasets and high-dimensional models. A standard approach to mitigate this complexity consists of using subsampling techniques or distributing the data across a cluster. However, these approaches are typically unreliable in high-dimensional scenarios. We focus here on a recent class of alternative MCMC schemes exploiting a splitting strategy akin to the one used by the celebrated alternating direction method of multipliers (ADMM) optimization algorithm. These methods appear to provide empirically state-of-the-art performance, but their theoretical behavior in high dimension is currently unknown. In this paper, we propose a detailed theoretical study of one of these algorithms, known as the split Gibbs sampler. Under regularity conditions, we establish explicit convergence rates for this scheme using Ricci curvature and coupling ideas. We support our theory with numerical illustrations.
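The splitting at the heart of the scheme can be written compactly (standard notation, ours): for a target $\pi(x \mid y) \propto \exp(-f(x; y) - g(x))$, introduce an auxiliary variable $z$ and a coupling parameter $\rho > 0$, and sample from

$$ \pi_\rho(x, z \mid y) \;\propto\; \exp\!\Bigl( -f(x; y) - g(z) - \tfrac{1}{2\rho^2} \lVert x - z \rVert^2 \Bigr) . $$

Gibbs sampling alternates between the conditionals of $x$ and $z$, each involving only one of the two potentials, and the $x$-marginal approaches the original posterior as $\rho \to 0$.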
We present a framework for learning probability distributions on topologically non-trivial manifolds, utilizing normalizing flows. Current methods focus on manifolds that are homeomorphic to Euclidean space, enforce strong structural priors on the learned models, or use operations that do not scale easily to high dimensions. In contrast, our method learns distributions on a data manifold by "gluing" together multiple local models, thus defining an open cover of the data manifold. We demonstrate the efficiency of our approach on synthetic data of known manifolds, as well as on higher-dimensional manifolds of unknown topology, where our method exhibits better sample efficiency and competitive or superior performance in many tasks.
Inverse medium scattering solvers generally reconstruct a single solution without an associated measure of uncertainty. This is true both for the classical iterative solvers and for the emerging deep learning methods. But ill-posedness and noise can make this single estimate inaccurate or misleading. While deep networks such as conditional normalizing flows can be used to sample posteriors in inverse problems, they often yield low-quality samples and uncertainty estimates. In this paper, we propose U-Flow, a Bayesian U-Net based on conditional normalizing flows, which generates high-quality posterior samples and estimates physically-meaningful uncertainty. We show that the proposed model significantly outperforms the recent normalizing flows in terms of posterior sample quality while having comparable performance with the U-Net in point estimation.
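For context, the basic mechanism a conditional normalizing flow uses to sample posteriors is a coupling layer whose scale and shift depend on the measurement $y$. The sketch below is the generic construction, not the paper's U-Flow architecture; all sizes are illustrative and `dim` is assumed even.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """Affine coupling layer conditioned on the measurement y: half of x
    passes through unchanged; the other half gets a scale and shift
    predicted from the first half together with y."""
    def __init__(self, dim, cond_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim))        # outputs (log_s, t), dim//2 each

    def forward(self, x, y):
        x1, x2 = x.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([x1, y], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)          # keep the scales bounded
        z2 = x2 * log_s.exp() + t
        return torch.cat([x1, z2], dim=1), log_s.sum(dim=1)  # log|det J|

    def inverse(self, z, y):
        z1, z2 = z.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([z1, y], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        x2 = (z2 - t) * (-log_s).exp()
        return torch.cat([z1, x2], dim=1)
```

Stacking such layers with permutations in between, drawing $z$ from the base distribution, and applying the inverse pass given $y$ yields approximate posterior samples $x \sim p(x \mid y)$.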
For distributions $\mathbb{P}$ and $\mathbb{Q}$ with different supports or undefined densities, the divergence $\textrm{D}(\mathbb{P}||\mathbb{Q})$ may not exist. We define a Spread Divergence $\tilde{\textrm{D}}(\mathbb{P}||\mathbb{Q})$ on modified $\mathbb{P}$ and $\mathbb{Q}$ and describe sufficient conditions for the existence of such a divergence. We demonstrate how to maximize the discriminatory power of a given divergence by parameterizing and learning the spread. We also give examples of using a Spread Divergence to train implicit generative models, including linear models (Independent Components Analysis) and non-linear models (Deep Generative Networks).
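In symbols (our notation): both distributions are smoothed with a fixed noise kernel $k$, for instance a Gaussian, so that the resulting "spreads" have densities with common support, and the divergence is evaluated on the spreads:

$$ \tilde p(y) = \int k(y \mid x) \, p(x) \, dx , \qquad \tilde{\textrm{D}}(\mathbb{P} \,\|\, \mathbb{Q}) = \textrm{D}(\tilde{\mathbb{P}} \,\|\, \tilde{\mathbb{Q}}) . $$

Parameterizing $k$ and learning its parameters is what maximizing the discriminatory power of the divergence refers to.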