We propose the Wasserstein Auto-Encoder (WAE), a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE) [1]. This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE) [2]. Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.
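As a concrete illustration of the latent regularizer, here is a minimal PyTorch sketch of an MMD-style penalty between the aggregate encoded distribution and the prior, one of the instantiations proposed in the WAE paper; the kernel choice and all names here are illustrative assumptions, not taken from the abstract:

```python
import torch

def imq_kernel(a, b, latent_dim, sigma2=1.0):
    # Inverse multiquadratic kernel k(x, y) = C / (C + ||x - y||^2);
    # C is commonly tied to the latent dimensionality, and the heavy tails
    # make the penalty less prone to vanishing gradients than an RBF kernel.
    C = 2.0 * latent_dim * sigma2
    return C / (C + torch.cdist(a, b) ** 2)

def mmd_penalty(z_q, z_p, latent_dim):
    """MMD^2 estimate between encoded codes z_q ~ Q_Z and prior samples z_p ~ P_Z."""
    n = z_q.size(0)
    off_diag = 1.0 - torch.eye(n, device=z_q.device)
    k_qq = imq_kernel(z_q, z_q, latent_dim)  # similarities among encoded codes
    k_pp = imq_kernel(z_p, z_p, latent_dim)  # similarities among prior samples
    k_qp = imq_kernel(z_q, z_p, latent_dim)  # cross similarities
    return ((k_qq * off_diag).sum() + (k_pp * off_diag).sum()) / (n * (n - 1)) \
           - 2.0 * k_qp.mean()

# WAE-style loss (sketch): reconstruction_cost + lam * mmd_penalty(encoder(x), prior_samples, d)
```

Matching the encoded codes to the prior only in aggregate, rather than per-sample as in the VAE's KL term, is what allows the WAE regularizer to differ from the VAE's.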
A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: ($i$) from observed data fed through the encoder to yield codes, and ($ii$) from latent codes drawn from a simple prior and propagated through the decoder to manifest data. Lower bounds are learned for the marginal log-likelihoods of observed data and latent codes. When learning with the variational bound, one seeks to minimize the symmetric Kullback-Leibler divergence between the joint density functions from ($i$) and ($ii$), while simultaneously seeking to maximize the two marginal log-likelihoods. To facilitate learning, a new form of adversarial training is developed. An extensive set of experiments is performed, in which we demonstrate state-of-the-art data reconstruction and generation on several image benchmark datasets.
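In standard notation (assumed here, not taken from the abstract), the two symmetric factorizations of the joint and the divergence being minimized can be sketched as:

$$
q_\phi(x,z) = q(x)\,q_\phi(z|x), \qquad p_\theta(x,z) = p(z)\,p_\theta(x|z),
$$
$$
\min_{\theta,\phi}\;\; \mathrm{KL}\big(q_\phi(x,z)\,\|\,p_\theta(x,z)\big) + \mathrm{KL}\big(p_\theta(x,z)\,\|\,q_\phi(x,z)\big)
$$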
We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon β-VAE by providing a better trade-off between disentanglement and reconstruction quality. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.
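Concretely, FactorVAE augments the per-sample ELBO with a total-correlation penalty on the aggregate code distribution $q(z)$, pushing it toward a factorial form (standard notation assumed; the paper estimates the intractable penalty with a density-ratio discriminator):

$$
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\Big[\mathbb{E}_{q(z|x^{(i)})}\big[\log p(x^{(i)}|z)\big] - \mathrm{KL}\big(q(z|x^{(i)})\,\|\,p(z)\big)\Big] \;-\; \gamma\,\mathrm{KL}\Big(q(z)\,\Big\|\,\textstyle\prod_{j} q(z_j)\Big)
$$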
Generative adversarial networks (GANs) are a family of generative models that do not minimize a single training criterion. Unlike other generative models, the data distribution is learned via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost. GANs are designed to reach a Nash equilibrium at which each player cannot reduce their cost without changing the other players' parameters. One useful approach for the theory of GANs is to show that a divergence between the training distribution and the model distribution obtains its minimum value at equilibrium. Several recent research directions have been motivated by the idea that this divergence is the primary guide for the learning process and that every step of learning should decrease the divergence. We show that this view is overly restrictive. During GAN training, the discriminator provides learning signal in situations where the gradients of the divergences between distributions would not be useful. We provide empirical counterexamples to the view of GAN training as divergence minimization. Specifically, we demonstrate that GANs are able to learn distributions in situations where the divergence minimization point of view predicts they would fail. We also show that gradient penalties motivated from the divergence minimization perspective are equally helpful when applied in other contexts in which the divergence minimization perspective does not predict they would be helpful. This contributes to a growing body of evidence that GAN training may be more usefully viewed as approaching Nash equilibria via trajectories that do not necessarily minimize a specific divergence at each step.
We introduce the adversarially learned inference (ALI) model, which jointly learns a generation network and an inference network using an adversarial process. The generation network maps samples from stochastic latent variables to the data space, while the inference network maps training examples in data space to the space of latent variables. An adversarial game is cast between these two networks, and a discriminative network is trained to distinguish joint latent/data-space samples from the generative network from joint samples from the inference network. We illustrate the ability of the model to learn mutually coherent inference and generation networks through inspection of model samples and reconstructions, and confirm the usefulness of the learned representations by achieving performance competitive with the state of the art on semi-supervised SVHN and CIFAR10 tasks.
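For orientation, the adversarial game on joint (data, code) pairs described above is usually written in standard GAN notation (assumed here) with encoder $E$, generator $G$, and discriminator $D$:

$$
\min_{G,E}\,\max_{D}\;\; \mathbb{E}_{x\sim q(x)}\big[\log D(x, E(x))\big] \;+\; \mathbb{E}_{z\sim p(z)}\big[\log\big(1 - D(G(z), z)\big)\big]
$$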
Deep generative models provide powerful tools for distributions over complicated manifolds, such as those of natural images. But many of these methods, including generative adversarial networks (GANs), can be difficult to train, in part because they are prone to mode collapse, meaning that they characterize only a few modes of the true distribution. To address this, we introduce VEEGAN, which features a reconstructor network that reverses the action of the generator by mapping from data to noise. Our training objective retains the original asymptotic consistency guarantee of GANs and can be interpreted as a novel autoencoder loss over the noise. In sharp contrast to a traditional autoencoder over data points, VEEGAN does not require specifying a loss function over the data, but only over the representations, which are standard normal by assumption. On an extensive set of synthetic and real-world image datasets, VEEGAN indeed resists mode collapse to a far greater extent than other recent GAN variants, and produces more realistic samples.
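A sketch of the noise-space autoencoder term this describes, in assumed notation with generator $G_\gamma$ and reconstructor $F_\theta$ (the full objective also retains a standard adversarial term):

$$
\mathcal{L}_{\mathrm{rec}}(\theta, \gamma) \;=\; \mathbb{E}_{z\sim p(z)}\big[\,\|z - F_\theta(G_\gamma(z))\|_2^2\,\big]
$$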
Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, thus a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether exact or approximate) does not necessarily result in a good latent representation, as we demonstrate both theoretically and empirically. In particular, we derive variational lower and upper bounds on the mutual information between the input and the latent variable, and use these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy. Using this framework, we demonstrate that there is a family of models with identical ELBO, but different quantitative and qualitative characteristics. Our framework also suggests a simple new method to ensure that latent variable models with powerful stochastic decoders do not ignore their latent code.
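The rate-distortion framing described here can be sketched with two standard quantities (notation assumed): the rate $R$, the average KL between the encoder and the prior, and the distortion $D$, the expected negative reconstruction log-likelihood. The negative ELBO is their sum, so points along a line of constant $D + R$ have identical ELBO, and weighting the rate traces out the tradeoff:

$$
R = \mathbb{E}_{p_d(x)}\big[\mathrm{KL}\big(q(z|x)\,\|\,p(z)\big)\big], \quad
D = -\,\mathbb{E}_{p_d(x)}\,\mathbb{E}_{q(z|x)}\big[\log p(x|z)\big], \quad
-\mathrm{ELBO} = D + R, \quad \min\; D + \beta R
$$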
Variational autoencoders (VAEs) are expressive latent variable models that can be used to learn complex probability distributions from training data. However, the quality of the resulting model crucially relies on the expressiveness of the inference model. We introduce Adversarial Variational Bayes (AVB), a technique for training variational autoencoders with arbitrarily expressive inference models. We achieve this by introducing an auxiliary discriminative network that allows the maximum-likelihood problem to be rephrased as a two-player game, thereby establishing a principled connection between VAEs and generative adversarial networks (GANs). We show that in the nonparametric limit our method yields an exact maximum-likelihood assignment for the parameters of the generative model, as well as the exact posterior distribution over the latent variables given an observation. In contrast to competing approaches that combine VAEs with GANs, our approach has a clear theoretical justification, retains most advantages of standard variational autoencoders, and is easy to implement.
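In the nonparametric limit referred to above, the auxiliary discriminator $T(x,z)$ recovers a log density ratio, which lets the intractable KL term of the ELBO be replaced even when the inference model has no tractable density; a sketch in standard notation (assumed):

$$
T^*(x,z) = \log q_\phi(z|x) - \log p(z), \qquad
\mathcal{L}(\theta,\phi) = \mathbb{E}_{p_{\mathcal{D}}(x)}\,\mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z) - T^*(x,z)\big]
$$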
We introduce autoregressive implicit quantile networks (AIQN), an approach to generative modeling fundamentally different from those in common use, which implicitly captures the distribution using quantile regression. AIQN achieves superior perceptual quality and improvements in evaluation metrics without incurring a loss of sample diversity. The approach can be applied to many existing models and architectures. In this work we extend the PixelCNN model with AIQN and demonstrate results on CIFAR-10 and ImageNet using Inception score, FID, non-cherry-picked samples, and inpainting results. We consistently observe that AIQN yields a highly stable algorithm that improves perceptual quality while maintaining a highly diverse distribution.
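For reference, quantile regression of the kind implicit quantile networks build on fits a model $G_\theta(\tau)$ of the $\tau$-quantile by minimizing the pinball loss over uniformly sampled quantile levels (standard notation, assumed here rather than taken from the abstract):

$$
\rho_\tau(u) = u\big(\tau - \mathbb{1}\{u < 0\}\big), \qquad
\mathcal{L}(\theta) = \mathbb{E}_{\tau\sim U[0,1]}\,\mathbb{E}_{x\sim p_{\mathrm{data}}}\big[\rho_\tau\big(x - G_\theta(\tau)\big)\big]
$$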
We propose a novel autoencoding model called pairwise augmented GANs. The generator and the encoder are trained jointly in an adversarial manner. The generator network learns to sample realistic objects. In turn, the encoder network is simultaneously trained to map the true data distribution to the prior in latent space. To ensure good reconstructions, we introduce an augmented adversarial reconstruction loss: we train a discriminator to distinguish two types of pairs, an object paired with its augmentation and an object paired with its reconstruction. We show that this adversarial loss compares objects based on their content rather than on an exact match. We experimentally demonstrate that our model generates samples and reconstructions of quality competitive with the state of the art on the MNIST, CIFAR10, and CelebA datasets, and achieves good quantitative results on CIFAR10.
VAEs require a standard Gaussian distribution as the prior in latent space. Since all codes tend to follow the same prior, VAEs often suffer from so-called "posterior collapse". To avoid this, this paper introduces class-specific distributions for the latent codes. Unlike CVAE, however, we propose a method that disentangles the latent space into label-relevant and label-irrelevant dimensions, $\bm{\mathrm{z}}_s$ and $\bm{\mathrm{z}}_u$, for a single input. We apply two separate encoders to map the input into $\bm{\mathrm{z}}_s$ and $\bm{\mathrm{z}}_u$ respectively, and then feed the concatenated codes to the decoder to reconstruct the input. The label-irrelevant code $\bm{\mathrm{z}}_u$ represents characteristics common to all inputs, so it is constrained by a standard Gaussian, and its encoder is trained in the amortized variational inference manner of a VAE. $\bm{\mathrm{z}}_s$, in contrast, is assumed to follow a Gaussian mixture distribution in which each component corresponds to a particular class. The parameters of the Gaussian components in the $\bm{\mathrm{z}}_s$ encoder are optimized with label supervision in a global, stochastic manner. Theoretically, we show that our method is in fact equivalent to adding a KL divergence term on the joint distribution of $\bm{\mathrm{z}}_s$ and the class label $c$, which can directly increase the mutual information between $\bm{\mathrm{z}}_s$ and the label $c$. Our model can also be extended to a GAN by adding a discriminator in the pixel domain, enabling it to generate high-quality and diverse images.
Generative models, in particular generative adversarial networks (GANs), have recently received significant attention. A number of GAN variants have been proposed and utilized in many applications. Despite large strides in theory, evaluating and comparing GANs remains a daunting task. While several measures have been introduced, there is as yet no consensus on which measure best captures the strengths and limitations of models and should be used for fair model comparison. As in other areas of computer vision and machine learning, it is critical to settle on one or a few good measures to steer progress in this field. In this paper, I review and critically discuss more than 24 quantitative and 5 qualitative measures for evaluating generative models, with a particular emphasis on GAN-derived models. I also provide a set of 7 desiderata, followed by an evaluation of whether a given measure or family of measures is compatible with them.
Generative adversarial networks (GANs) have received wide attention in the machine learning field for their potential to learn high-dimensional, complex data distributions. Specifically, they do not rely on any assumptions about the distribution and can generate realistic samples from latent space in a simple manner. This powerful property allows GANs to be applied to various applications such as image synthesis, image attribute editing, image translation, domain adaptation, and other academic fields. In this paper, we aim to discuss the details of GANs for readers who are familiar with them but do not comprehend GANs deeply, or who wish to view GANs from various perspectives. In addition, we explain how GANs operate and the fundamental meaning of the various objective functions that have been proposed recently. We then focus on how a GAN can be combined with an autoencoder framework. Finally, we enumerate GAN variants that are applied to various tasks and other fields, for those interested in exploiting GANs for their research.
Adversarial learning of probabilistic models has recently emerged as a promising alternative to maximum likelihood. Implicit models such as generative adversarial networks (GAN) often generate better samples compared to explicit models trained by maximum likelihood. Yet, GANs sidestep the characterization of an explicit density which makes quantitative evaluations challenging. To bridge this gap, we propose Flow-GANs, a generative adversarial network for which we can perform exact likelihood evaluation, thus supporting both adversarial and maximum likelihood training. When trained adversarially, Flow-GANs generate high-quality samples but attain extremely poor log-likelihood scores, inferior even to a mixture model memorizing the training data; the opposite is true when trained by maximum likelihood. Results on MNIST and CIFAR-10 demonstrate that hybrid training can attain high held-out likelihoods while retaining visual fidelity in the generated samples.
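The exact likelihood evaluation mentioned above comes from using an invertible generator, so the density follows the standard change-of-variables formula for normalizing flows (generic flow notation, assumed here):

$$
\log p_\theta(x) = \log p_Z\big(f_\theta^{-1}(x)\big) + \log\left|\det\frac{\partial f_\theta^{-1}(x)}{\partial x}\right|
$$

Because this quantity is tractable, the same generator can be trained either adversarially or by maximizing the exact log-likelihood, which is what enables the comparison the abstract describes.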
Recent progress in variational inference has paid much attention to the flexibility of variational posteriors. One promising direction is to use implicit distributions, i.e., distributions without tractable densities as the variational posterior. However, existing methods on implicit posteriors still face challenges of noisy estimation and computational infeasibility when applied to models with high-dimensional latent variables. In this paper, we present a new approach named Kernel Implicit Variational Inference that addresses these challenges. As far as we know, for the first time implicit variational inference is successfully applied to Bayesian neural networks, which shows promising results on both regression and classification tasks.
We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data-independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is well-defined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs each trained on four datasets (MNIST, FashionMNIST, CIFAR-10 and CelebA) and discuss the results.
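Concretely, the GILBO is an instance of the Barber-Agakov variational lower bound on mutual information, evaluated on the generative joint $p(z)\,p(x|z)$ with a tractable auxiliary encoder $e(z|x)$ trained to tighten the bound (a sketch in assumed notation):

$$
I(X;Z) \;\geq\; \mathrm{GILBO} \;=\; \max_{e}\; \mathbb{E}_{p(z)\,p(x|z)}\!\left[\log\frac{e(z|x)}{p(z)}\right]
$$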
We present a probabilistic deep learning methodology for constructing predictive data-driven surrogates of stochastic systems. Leveraging recent advances in variational inference with implicit distributions, we put forth a statistical inference framework that enables end-to-end training of surrogate models on paired input-output observations that may be stochastic in nature, originate from different information sources of variable fidelity, or be corrupted by complex noise processes. The resulting surrogates can accommodate high-dimensional inputs and outputs and are able to return predictions with quantified uncertainty. The effectiveness of our approach is demonstrated through a series of canonical studies, including the regression of noisy data, multi-fidelity modeling of stochastic processes, and uncertainty propagation in high-dimensional dynamical systems.
The main idea of this paper is to explore the possibility of generating samples from neural networks, focusing primarily on the colorization of grayscale images. I compare existing colorization methods and explore the possibility of using new generative modeling for the colorization task. The contribution of this paper is to compare existing structures with similar generative structures (decoders) and to apply novel structures, including conditional VAE (CVAE), conditional Wasserstein GAN with gradient penalty (CWGAN-GP), CWGAN-GP with L1 reconstruction loss, Adversarial Generative Encoders (AGE), and Introspective VAE (IVAE). I trained these models on CIFAR-10 images. To measure performance, I use the Inception Score (IS), which captures the distinctness of each image and the diversity of the overall sample relative to CIFAR-10 images, together with inspection by the human eye. It turns out that the CVAE with L1 reconstruction loss and the IVAE achieve the highest IS. CWGAN-GP with L1 learns faster than CWGAN-GP, but its IS does not improve over CWGAN-GP's. CWGAN-GP tends to generate more diverse images than the other models that use a reconstruction loss. In addition, I found that appropriate regularization plays an important role in generative modeling.
Learning disentangled representations from visual data, in which different high-level generative factors are independently encoded, is important for many computer vision tasks. Solving this problem, however, typically requires explicitly labeling all the factors of interest in the training images. To reduce the annotation cost, we introduce a learning setting which we refer to as "reference-based disentangling". Given a pool of unlabeled images, the goal is to learn a representation in which a set of target factors is disentangled from the others. The only supervision comes from an auxiliary "reference set" containing images in which the factors of interest are constant. To address this problem, we propose reference-based variational autoencoders, a novel deep generative model designed to exploit the weak supervision provided by the reference set. By addressing tasks such as feature learning, conditional image generation, and attribute transfer, we validate the ability of the proposed model to learn disentangled representations from this minimal form of supervision.
Building on the success of deep learning, two modern approaches to learning a probability model of observed data are generative adversarial networks (GANs) and variational autoencoders (VAEs). VAEs consider an explicit probability model for the data and compute a generative distribution by maximizing a variational lower bound on the log-likelihood function. GANs, however, compute a generative model by minimizing a distance between the observed and generated probability distributions, without considering an explicit model for the observed data. The lack of an explicit probability model in GANs prohibits the computation of sample likelihoods within their framework and limits their use in statistical inference problems. In this work, we show that an optimal transport GAN with entropy regularization can be viewed as a generative model that maximizes a lower bound on average sample likelihoods, the approach that VAEs are based on. In particular, our proof constructs an explicit probability model for GANs that can be used to compute likelihood statistics within the GAN framework. Our numerical results on several datasets demonstrate trends consistent with the proposed theory.