最近,在深度生成模型中,不可能是非线性ICA的可识别性的文艺复兴。对于i.I.D.数据,先前的作品已经假定访问足够丰富的辅助观察集,表示$ \ mathbf {u} $。我们在这里展示了在没有这种侧面信息的情况下可以获得可识别性。以前的方法必须制定强烈的假设,以获得可识别的模型。在这里,我们在一组宽松的约束集中获得了经验识别的模型。特别是,我们专注于在其潜在空间中执行聚类的生成模型 - 一种匹配以前可识别模型的模型结构,而是使用学习群集提供辅助信息的合成形式。我们评估我们的提案,包括通过统计测试,并发现学习群集有效功能:具有潜在群集的深度生成模型是经验识别的,与依赖侧面信息的模型相同。
translated by 谷歌翻译
The framework of variational autoencoders allows us to efficiently learn deep latent-variable models, such that the model's marginal distribution over observed variables fits the data. Often, we're interested in going a step further, and want to approximate the true joint distribution over observed and latent variables, including the true prior and posterior distributions over latent variables. This is known to be generally impossible due to unidentifiability of the model. We address this issue by showing that for a broad family of deep latentvariable models, identification of the true joint distribution over observed and latent variables is actually possible up to very simple transformations, thus achieving a principled and powerful form of disentanglement. Our result requires a factorized prior distribution over the latent variables that is conditioned on an additionally observed variable, such as a class label or almost any other observation. We build on recent developments in nonlinear ICA, which we extend to the case with noisy or undercomplete observations, integrated in a maximum likelihood framework. The result also trivially contains identifiable flow-based generative models as a special case.
translated by 谷歌翻译
我们证明了(a)具有通用近似功能的广泛的深层变量模型的可识别性,并且(b)是通常在实践中使用的变异自动编码器的解码器。与现有工作不同,我们的分析不需要弱监督,辅助信息或潜在空间中的条件。最近,研究了此类模型的可识别性。在这些作品中,主要的假设是,还可以观察到辅助变量$ u $(也称为侧面信息)。同时,几项作品从经验上观察到,这在实践中似乎并不是必需的。在这项工作中,我们通过证明具有通用近似功能的广泛生成(即无监督的)模型来解释这种行为,无需侧面信息$ u $:我们证明了整个生成模型的可识别性$ u $,仅观察数据$ x $。我们考虑的模型与实践中使用的自动编码器体系结构紧密连接,该体系结构利用了潜在空间中的混合先验和编码器中的Relu/Leaky-Relu激活。我们的主要结果是可识别性层次结构,该层次结构显着概括了先前的工作,并揭示了不同的假设如何导致可识别性的“优势”不同。例如,我们最薄弱的结果确定了(无监督的)可识别性,直到仿射转换已经改善了现有工作。众所周知,这些模型具有通用近似功能,而且它们已被广泛用于实践中来学习数据表示。
translated by 谷歌翻译
The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12 000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on seven different data sets. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.
translated by 谷歌翻译
We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon β-VAE by providing a better trade-off between disentanglement and reconstruction quality. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.
translated by 谷歌翻译
We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels. While prior work has typically conflated these by learning latent variables that directly correspond to label values, we argue this is contrary to the intended effect of supervision in VAEs-capturing rich label characteristics with the latents. For example, we may want to capture the characteristics of a face that make it look young, rather than just the age of the person. To this end, we develop the CCVAE, a novel VAE model and concomitant variational objective which captures label characteristics explicitly in the latent space, eschewing direct correspondences between label values and latents. Through judicious structuring of mappings between such characteristic latents and labels, we show that the CCVAE can effectively learn meaningful representations of the characteristics of interest across a variety of supervision schemes. In particular, we show that the CCVAE allows for more effective and more general interventions to be performed, such as smooth traversals within the characteristics for a given label, diverse conditional generation, and transferring characteristics across datapoints.
translated by 谷歌翻译
The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
translated by 谷歌翻译
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
translated by 谷歌翻译
Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g.\ vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing -- something that most existing approaches either cannot handle, or do so to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image-image) and CUB (image-text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.
translated by 谷歌翻译
基于连续的潜在空间(例如变异自动编码器)的概率模型可以理解为无数混合模型,其中组件连续取决于潜在代码。它们具有用于生成和概率建模的表达性工具,但与可牵引的概率推断不符,即计算代表概率分布的边际和条件。同时,可以将概率模型(例如概率电路(PC))理解为层次离散混合模型,从而使它们可以执行精确的推断,但是与连续的潜在空间模型相比,它们通常显示出低于标准的性能。在本文中,我们研究了一种混合方法,即具有较小潜在尺寸的可拖动模型的连续混合物。尽管这些模型在分析上是棘手的,但基于一组有限的集成点,它们非常适合数值集成方案。有足够数量的集成点,近似值变得精确。此外,使用一组有限的集成点,可以将近似方法编译成PC中,以“在近似模型中的精确推断”执行。在实验中,我们表明这种简单的方案被证明非常有效,因为PC在许多标准密度估计基准上以这种方式为可拖动模型设定了新的最新模型。
translated by 谷歌翻译
提出了一种新的双峰生成模型,用于生成条件样品和关节样品,并采用学习简洁的瓶颈表示的训练方法。所提出的模型被称为变异Wyner模型,是基于网络信息理论中的两个经典问题(分布式仿真和信道综合)设计的,其中Wyner的共同信息是对公共表示简洁性的基本限制。该模型是通过最大程度地减少对称的kullback的训练 - 差异 - 变异分布和模型分布之间具有正则化项,用于常见信息,重建一致性和潜在空间匹配项,该术语是通过对逆密度比率估计技术进行的。通过与合成和现实世界数据集的联合和有条件生成的实验以及具有挑战性的零照片图像检索任务,证明了所提出的方法的实用性。
translated by 谷歌翻译
深度学习在学习高维数据的低维表示方面取得了巨大的成功。如果在感兴趣的数据中没有隐藏的低维结构,那么这一成功将是不可能的。这种存在是由歧管假设提出的,该假设指出数据在于固有维度低的未知流形。在本文中,我们认为该假设无法正确捕获数据中通常存在的低维结构。假设数据在于单个流形意味着整个数据空间的内在维度相同,并且不允许该空间的子区域具有不同数量的变异因素。为了解决这一缺陷,我们提出了多种假设的结合,该假设适应了非恒定固有维度的存在。我们从经验上验证了在常用图像数据集上的这一假设,发现确实应该允许内在维度变化。我们还表明,具有较高内在维度的类更难分类,以及如何使用这种见解来提高分类精度。然后,我们将注意力转移到该假设的影响下,在深层生成模型(DGM)的背景下。当前的大多数DGM都难以建模具有几个连接组件和/或不同固有维度的数据集建模。为了解决这些缺点,我们提出了群集的DGM,首先将数据聚集,然后在每个群集上训练DGM。我们表明,聚类的DGM可以模拟具有不同固有维度的多个连接组件,并在没有增加计算要求的情况下经验优于其非簇的非群体。
translated by 谷歌翻译
带有变异自动编码器(VAE)的学习分解表示通常归因于损失的正则化部分。在这项工作中,我们强调了数据与损失的重建项之间的相互作用,这是VAE中解散的主要贡献者。我们注意到,标准化的基准数据集的构建方式有利于学习似乎是分解的表示形式。我们设计了一个直观的对抗数据集,该数据集利用这种机制破坏了现有的最新分解框架。最后,我们提供了一种解决方案,可以通过修改重建损失来实现分离,从而影响VAES如何感知数据点之间的距离。
translated by 谷歌翻译
在这项工作中,我们寻求弥合神经网络中地形组织和设备的概念。为实现这一目标,我们介绍了一种新颖的方法,用于有效地培训具有地形组织的潜变量的深度生成模型。我们表明,这种模型确实学会根据突出的特征,例如在MNIST上的数字,宽度和样式等突出特征来组织激活。此外,通过地形组织随着时间的推移(即时间相干),我们展示了如何鼓励预定义的潜空间转换运营商,以便观察到的转换输入序列 - 这是一种无监督的学习设备的原始形式。我们展示了该模型直接从序列中直接从序列中学习大约成反比的特征(即“胶囊”)并在相应变换测试序列上实现更高的似然性。通过测量推理网络的近似扩展和序列变换来定量验证标准验证。最后,我们展示了复杂转化的近似值,扩大了现有组的常量神经网络的能力。
translated by 谷歌翻译
本文提出了在适当的监督信息下进行分解的生成因果代表(亲爱的)学习方法。与实施潜在变量独立性的现有分解方法不同,我们考虑了一种基本利益因素可以因果关系相关的一般情况。我们表明,即使在监督下,先前具有独立先验的方法也无法解散因果关系。在这一发现的激励下,我们提出了一种称为DEAR的新的解开学习方法,该方法可以使因果可控的产生和因果代表学习。这种新公式的关键要素是使用结构性因果模型(SCM)作为双向生成模型的先验分布。然后,使用合适的GAN算法与发电机和编码器共同训练了先验,并与有关地面真相因子及其基本因果结构的监督信息合并。我们提供了有关该方法的可识别性和渐近收敛性的理论理由。我们对合成和真实数据集进行了广泛的实验,以证明DEAR在因果可控生成中的有效性,以及在样本效率和分布鲁棒性方面,学到的表示表示对下游任务的好处。
translated by 谷歌翻译
基于似然或显式的深层生成模型使用神经网络来构建灵活的高维密度。该公式直接与歧管假设相矛盾,该假设指出,观察到的数据位于嵌入高维环境空间中的低维歧管上。在本文中,我们研究了在这种维度不匹配的情况下,最大可能的训练的病理。我们正式证明,在学习歧管本身而不是分布的情况下,可以实现堕落的优点,而我们称之为多种歧视的现象过于拟合。我们提出了一类两步程序,该过程包括降低降低步骤,然后进行最大样子密度估计,并证明它们在非参数方面恢复了数据生成分布,从而避免了多种歧视。我们还表明,这些过程能够对隐式模型(例如生成对抗网络)学到的流形进行密度估计,从而解决了这些模型的主要缺点。最近提出的几种方法是我们两步程序的实例。因此,我们统一,扩展和理论上证明了一大批模型。
translated by 谷歌翻译
A grand goal in deep learning research is to learn representations capable of generalizing across distribution shifts. Disentanglement is one promising direction aimed at aligning a models representations with the underlying factors generating the data (e.g. color or background). Existing disentanglement methods, however, rely on an often unrealistic assumption: that factors are statistically independent. In reality, factors (like object color and shape) are correlated. To address this limitation, we propose a relaxed disentanglement criterion - the Hausdorff Factorized Support (HFS) criterion - that encourages a factorized support, rather than a factorial distribution, by minimizing a Hausdorff distance. This allows for arbitrary distributions of the factors over their support, including correlations between them. We show that the use of HFS consistently facilitates disentanglement and recovery of ground-truth factors across a variety of correlation settings and benchmarks, even under severe training correlations and correlation shifts, with in parts over +60% in relative improvement over existing disentanglement methods. In addition, we find that leveraging HFS for representation learning can even facilitate transfer to downstream tasks such as classification under distribution shifts. We hope our original approach and positive empirical results inspire further progress on the open problem of robust generalization.
translated by 谷歌翻译
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
translated by 谷歌翻译
在没有监督信号的情况下学习简洁的数据表示是机器学习的基本挑战。实现此目标的一种突出方法是基于可能性的模型,例如变异自动编码器(VAE),以基于元元素来学习潜在表示,这是对下游任务有益的一般前提(例如,disentanglement)。但是,这种方法通常偏离原始的可能性体系结构,以应用引入的元优势,从而导致他们的培训不良变化。在本文中,我们提出了一种新颖的表示学习方法,Gromov-Wasserstein自动编码器(GWAE),该方法与潜在和数据分布直接匹配。 GWAE模型不是基于可能性的目标,而是通过最小化Gromov-Wasserstein(GW)度量的训练优化。 GW度量测量了在无与伦比的空间上支持的分布之间的面向结构的差异,例如具有不同的维度。通过限制可训练的先验的家庭,我们可以介绍元主题来控制下游任务的潜在表示。与现有基于VAE的方法的经验比较表明,GWAE模型可以通过更改先前的家族而无需进一步修改GW目标来基于元家庭学习表示。
translated by 谷歌翻译
我们研究是否使用两个条件型号$ p(x | z)$和$ q(z | x)$,以使用循环的两个条件型号,我们如何建模联合分配$ p(x,z)$。这是通过观察到深入生成模型的动机,除了可能的型号$ p(x | z)$,通常也使用推理型号$ q(z | x)$来提取表示,但它们通常依赖不表征的先前分配$ P(z)$来定义联合分布,这可能会使后塌和歧管不匹配等问题。为了探讨仅使用$ p(x | z)$和$ q(z | x)$模拟联合分布的可能性,我们研究其兼容性和确定性,对应于其条件分布一致的联合分布的存在和唯一性跟他们。我们为可操作的等价标准开发了一般理论,以实现兼容性,以及足够的确定条件。基于该理论,我们提出了一种新颖的生成建模框架来源,仅使用两个循环条件模型。我们开发方法以实现兼容性和确定性,并使用条件模型适合和生成数据。通过预先删除的约束,Cygen更好地适合数据并捕获由合成和现实世界实验支持的更多代表性特征。
translated by 谷歌翻译