The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features. Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding. By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences. We present techniques for solving the difficult learning problem presented by this model, demonstrate its effectiveness in imputing missing words, explore many interesting properties of the model's latent sentence space, and present negative results on the use of the model in language modeling.
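Below is a minimal sketch of the kind of RNN-based sentence VAE this abstract describes, assuming PyTorch; the layer names and hyperparameters are illustrative, not the authors' implementation. An encoder GRU summarizes the sentence into a Gaussian latent code, and a decoder GRU is initialized from that code.

```python
import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    """Minimal sentence VAE sketch: GRU encoder -> Gaussian latent -> GRU decoder."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.z_to_h = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        emb = self.embed(tokens)
        _, h = self.encoder(emb)                       # final hidden state summarizes the sentence
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterized sample
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)   # latent code initializes the decoder
        dec_out, _ = self.decoder(emb, h0)             # sketch only: real code shifts/drops decoder inputs
        logits = self.out(dec_out)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return logits, kl.mean()
```

Training would then minimize token-level cross-entropy plus a (typically annealed) weight on the KL term, consistent with the learning techniques mentioned in the abstract.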
In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without making use of the latent space, a phenomenon known as posterior collapse. To mitigate this, state-of-the-art models weaken the powerful decoder by applying uniformly random dropout to the decoder input. We show theoretically that this removes pointwise mutual information provided by the decoder input, which is then compensated for by utilizing the latent space. We then propose an adversarial training strategy that realizes information-based stochastic dropout. Compared with uniform dropout on standard text benchmark datasets, our targeted approach simultaneously improves sequence modeling performance and the amount of information captured in the latent space.
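For concreteness, the uniform word-dropout baseline discussed above can be sketched as follows (a generic illustration; the unk id and dropout rate are arbitrary). The paper's contribution is to choose which tokens to drop adversarially, based on information content, rather than uniformly at random.

```python
import torch

def word_dropout(tokens: torch.LongTensor, unk_id: int, p: float = 0.3) -> torch.LongTensor:
    """Replace each decoder-input token with unk_id with probability p (uniform dropout),
    forcing the decoder to rely more on the latent code."""
    mask = torch.rand_like(tokens, dtype=torch.float) < p
    return torch.where(mask, torch.full_like(tokens, unk_id), tokens)
```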
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
In this paper, we explore the inclusion of latent random variables into the hidden state of a recurrent neural network (RNN) by combining the elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against other related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamics.
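A single-timestep sketch of the VRNN idea, assuming PyTorch and made-up dimensions (the full model also uses feature extractors and is trained over whole sequences): the prior over z_t is conditioned on the previous hidden state, the approximate posterior additionally sees x_t, and both x_t and z_t feed the recurrence.

```python
import torch
import torch.nn as nn

class VRNNCell(nn.Module):
    def __init__(self, x_dim=64, z_dim=16, h_dim=128):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)               # p(z_t | h_{t-1})
        self.posterior = nn.Linear(h_dim + x_dim, 2 * z_dim)    # q(z_t | x_t, h_{t-1})
        self.decoder = nn.Linear(h_dim + z_dim, x_dim)          # p(x_t | z_t, h_{t-1})
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)

    def forward(self, x_t, h_prev):
        p_mu, p_logvar = self.prior(h_prev).chunk(2, dim=-1)
        q_mu, q_logvar = self.posterior(torch.cat([x_t, h_prev], -1)).chunk(2, -1)
        z_t = q_mu + torch.randn_like(q_mu) * torch.exp(0.5 * q_logvar)
        x_recon = self.decoder(torch.cat([z_t, h_prev], -1))
        h_t = self.rnn(torch.cat([x_t, z_t], -1), h_prev)
        # KL(q || p) between the two diagonal Gaussians at this timestep
        kl = 0.5 * torch.sum(p_logvar - q_logvar
                             + (q_logvar.exp() + (q_mu - p_mu).pow(2)) / p_logvar.exp()
                             - 1, dim=-1)
        return x_recon, h_t, kl
```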
Approximating complex probability densities is a core problem in modern statistics. In this paper, we introduce the concept of variational inference (VI), a popular method in machine learning that uses optimization techniques to estimate complex probability densities. This property allows VI to converge faster than classical methods such as Markov chain Monte Carlo sampling. Conceptually, VI works by choosing a family of probability density functions and then finding the member of that family closest to the true probability density, typically using the Kullback-Leibler (KL) divergence as the optimization metric. We introduce the evidence lower bound to make the approximate probability density tractable, and we review the ideas behind mean-field variational inference. Finally, we discuss applications of VI to variational autoencoders (VAEs) and VAE-generative adversarial networks (VAE-GANs). With this paper, we aim to explain the concept of VI and to assist future research that uses this approach.
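As a concrete reference for the quantities named above, the evidence lower bound (ELBO) follows from the standard decomposition of the log marginal likelihood:

```latex
\log p(x) \;=\; \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x, z) - \log q(z)\right]}_{\text{ELBO}}
          \;+\; \mathrm{KL}\!\left(q(z)\,\|\,p(z \mid x)\right)
\;\;\ge\;\; \mathbb{E}_{q(z)}\!\left[\log p(x \mid z)\right] - \mathrm{KL}\!\left(q(z)\,\|\,p(z)\right)
```

so maximizing the ELBO over the chosen family simultaneously tightens the bound on log p(x) and minimizes the KL divergence from the approximation to the true posterior.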
Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" (where the latents are ignored when they are paired with a powerful autoregressive decoder) typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
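The quantisation step at the core of this approach can be sketched as follows (a generic PyTorch illustration, not the released code; the codebook size, code dimension, and commitment weight beta are placeholder values): each encoder output is snapped to its nearest codebook vector, and a straight-through estimator copies gradients past the non-differentiable lookup.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.beta = beta

    def forward(self, z_e):                               # z_e: (batch, code_dim) encoder outputs
        dists = torch.cdist(z_e, self.codebook.weight)    # pairwise L2 distances to all codes
        idx = dists.argmin(dim=-1)                        # nearest codebook index per input
        z_q = self.codebook(idx)
        # codebook loss pulls codes toward encoder outputs; commitment loss does the reverse
        loss = ((z_q - z_e.detach()) ** 2).mean() + self.beta * ((z_q.detach() - z_e) ** 2).mean()
        z_q = z_e + (z_q - z_e).detach()                  # straight-through gradient copy
        return z_q, idx, loss
```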
We propose a VAE for Transformers by developing a variational information bottleneck regularizer for Transformer embeddings. We formalize the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components supported by nonparametric methods captures the variable number of vectors accessible through attention, and the exchangeability of our nonparametric distributions captures the permutation invariance of attention. NVIB can therefore regularize both the number of vectors accessible through attention and the amount of information in individual vectors. By regularizing the attention of a Transformer encoder with NVIB, we propose a nonparametric variational autoencoder (NVAE). Initial experiments training an NVAE on natural language text indicate that the induced embedding space has the desired properties of a VAE for Transformers.
The variational autoencoder (VAE) is one of the most widely used unsupervised machine learning models. However, while the default choice of a Gaussian distribution for both the prior and the posterior is mathematically convenient and often leads to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue, we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments, we show how this hyperspherical VAE, or $\mathcal{S}$-VAE, is better suited for capturing data with a hyperspherical structure, while outperforming the normal $\mathcal{N}$-VAE in low dimensions on other data types. Code is available at http://github.com/nicola-decao/s-vae-tf and https://github.com/nicola-decao/s-vae-pytorch.
We study models that can generate arbitrary natural language text (e.g., all English sentences) from a bounded, convex, and well-behaved control space. We call them universal vec2text models. Such models would allow semantic decisions to be made in vector space (e.g., via reinforcement learning), with natural language generation handled by the vec2text model. We propose four desired properties that such a vec2text model should have: universality, diversity, fluency, and semantic structure, and we provide quantitative and qualitative methods to assess them. We implement a vec2text model by adding a bottleneck to a 250M-parameter Transformer model and training it with an auto-encoding objective on 400M sentences (10B tokens) extracted from a large web corpus. We propose a simple data augmentation technique based on round-trip translation and show in extensive experiments that the resulting vec2text model surprisingly leads to vector spaces that satisfy our four desired properties, and that this model strongly outperforms both standard and denoising auto-encoders.
GPT-2 is often adopted in story generation models because it provides powerful generative capabilities. However, it still fails to generate consistent stories and lacks diversity. Current story generation models feed additional information, such as plots or commonsense knowledge, into GPT-2 to guide the generation process. These approaches focus on improving the generation quality of stories, whereas our work addresses both quality and diversity. We explore combining BERT and GPT-2 to build a variational autoencoder (VAE), and extend it by adding additional objectives to learn global features such as story topic and discourse relations. Our evaluation shows that our enhanced VAE provides a better quality-diversity trade-off, generating less repetitive story content and learning more informative latent variables.
Vector quantized variational autoencoders (VQ-VAEs) are generative models based on discrete latent representations of the data, in which inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior distribution over the discrete states must be trained separately. This prior is typically very complex and leads to slow generation. In this work, we propose a new model that trains the prior and the encoder/decoder networks simultaneously. We build a diffusion bridge between the continuous coded vectors and a non-informative prior distribution, and the discrete latent states are then given as a random function of these continuous vectors. We show that our model is competitive with autoregressive priors on the mini-Imagenet and CIFAR datasets, and that it is efficient in both optimization and sampling. Our framework also extends the standard VQ-VAE and enables end-to-end training.
How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
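A minimal illustration of the reparameterization idea at the heart of this estimator, in PyTorch with toy shapes: the latent sample is written as a deterministic, differentiable function of the variational parameters and externally drawn noise, so a one-sample Monte Carlo estimate of the (negative) lower bound can be optimized with ordinary stochastic gradients.

```python
import torch

mu = torch.zeros(8, requires_grad=True)
log_var = torch.zeros(8, requires_grad=True)

eps = torch.randn_like(mu)                    # noise drawn outside the computation graph
z = mu + torch.exp(0.5 * log_var) * eps       # differentiable w.r.t. mu and log_var

# One-sample Monte Carlo estimate of the negative lower bound with a toy unit-Gaussian likelihood:
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
neg_elbo = -torch.distributions.Normal(z, 1.0).log_prob(torch.ones(8)).sum() + kl
neg_elbo.backward()                           # gradients flow back to mu and log_var
```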
Item response theory (IRT) is a ubiquitous model for understanding human behaviors and attitudes based on their responses to questions. Large modern datasets offer the opportunity to capture more nuances of human behavior, potentially improving psychometric modeling and, in turn, scientific understanding and public policy. However, while larger datasets allow more flexible approaches, many contemporary algorithms for fitting IRT models also have computational demands that can be prohibitive for real-world applications. To address this bottleneck, we introduce a variational Bayesian inference algorithm for IRT and show that it is fast and scalable without sacrificing accuracy. Applying this method to five large-scale item response datasets from cognitive science and education yields higher log-likelihoods and higher accuracy than alternative inference algorithms. Using this new inference approach, we then generalize IRT with expressive Bayesian response models, leveraging recent advances in deep learning to capture nonlinear item characteristic curves (ICCs) with neural networks. Using a specific grade's mathematics test from TIMSS, we show our nonlinear IRT model can capture interesting asymmetric ICCs. The algorithm implementation is open-source and easy to use.
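To make the ICC terminology concrete, the sketch below contrasts the classical two-parameter-logistic (2PL) ICC with a small neural ICC of the kind the abstract gestures at; the network architecture and sizes are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn

def icc_2pl(theta, a, b):
    """Classical 2PL ICC: P(correct | ability theta) = sigmoid(a * (theta - b))."""
    return torch.sigmoid(a * (theta - b))

class NeuralICC(nn.Module):
    """A per-item MLP mapping ability to P(correct), allowing asymmetric curves."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, theta):
        return torch.sigmoid(self.net(theta.unsqueeze(-1))).squeeze(-1)

theta = torch.linspace(-3, 3, 7)                       # a grid of ability values
print(icc_2pl(theta, a=torch.tensor(1.5), b=torch.tensor(0.0)))
print(NeuralICC()(theta))                              # untrained, for shape illustration only
```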
The past several years have witnessed the advantages of variational autoencoders in various text generation tasks. However, due to the sequential nature of text, autoregressive decoders tend to ignore latent variables and degenerate into simple language models, known as the KL vanishing problem, which is further aggravated when VAEs are combined with Transformer-based structures. To address this problem, we propose DELLA, a novel variational Transformer framework. DELLA learns a series of layer-wise latent variables, each inferred from those of lower layers and tightly coupled with the hidden states via low-rank tensor product. In this way, DELLA forces these posterior latent variables to be fused deeply into the whole computation path, thereby incorporating more information. Theoretically, our method can be regarded as entangling latent variables to avoid posterior information decay through the layers, enabling DELLA to obtain higher non-zero KL values even without any annealing or thresholding tricks. Experiments on four unconditional and three conditional generation tasks show that DELLA can better alleviate KL vanishing and improve both quality and diversity compared with several strong baselines.
The central principle of variational inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property makes VI faster than several sampling-based techniques. However, traditional VI algorithms cannot scale to large datasets and cannot readily infer unseen data points without re-running the optimization process. Recent developments in the field, such as stochastic, black-box, and amortized VI, have helped address these issues. Generative modeling tasks now widely leverage amortized VI for its efficiency and scalability, as it uses a parameterized function to learn the parameters of the approximate posterior density. In this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. In addition, we overview recent trends that address issues of amortized VI, such as the amortization gap, generalization problems, inconsistent representation learning, and posterior collapse. Finally, we analyze alternative divergence measures that improve VI optimization.
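The amortization idea mentioned above can be made concrete with a small contrast (illustrative PyTorch, arbitrary dimensions): classical VI keeps free variational parameters per datapoint and must re-optimize them for new data, whereas amortized VI trains one shared inference network that outputs approximate-posterior parameters in a single forward pass.

```python
import torch
import torch.nn as nn

# Classical / per-datapoint VI: one free (mu, log_var) pair of parameters per example.
per_point_params = [torch.zeros(2, 8, requires_grad=True) for _ in range(100)]

# Amortized VI: a single inference network shared across all datapoints.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2 * 8))

x_new = torch.randn(5, 32)                   # previously unseen datapoints
mu, log_var = encoder(x_new).chunk(2, -1)    # posterior parameters without re-running optimization
```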
Data augmentation, the artificial creation of training data for machine learning through transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization capabilities, it can also address many other challenges and problems, from overcoming limited training data, to regularizing the objective, to limiting the amount of data used for privacy protection. Based on a precise description of the goals and applications of data augmentation and a taxonomy of existing works, this survey covers data augmentation methods for text classification and aims to provide a concise and comprehensive overview for researchers and practitioners. We divide more than 100 methods into 12 different groupings and provide state-of-the-art references, expounding which methods are highly promising by relating them to each other. Finally, research perspectives that may constitute a basis for future work are given.
Current domain-independent classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, the knowledge is encoded in subsymbolic representations that are incompatible with symbolic systems such as planners. We propose Latplan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of transitions allowed in the environment (training input), Latplan learns a complete propositional PDDL action model of the environment. Later, when given a pair of images representing the initial and goal states (planning input), Latplan finds a plan to the goal state in the symbolic latent space and returns a visualized plan execution. We evaluate Latplan using image-based versions of 6 planning domains: 8-puzzle, 15-puzzle, Blocksworld, Sokoban, and two variations of LightsOut.
Recent advances in deep neural language models, combined with the capacity of large-scale datasets, have accelerated the development of natural language generation systems that produce fluent and coherent text (to varying degrees of success) in a multitude of tasks and application contexts. However, controlling the output of these models for desired user attributes remains an open challenge. This is crucial not only for customizing the content and style of the generated language, but also for their safe and reliable deployment in the real world. We present an extensive survey on the emerging topic of constrained neural language generation, in which we formally define and categorize the problems of natural language generation by distinguishing between conditions and constraints (the latter being testable conditions on the output text rather than the input), present constrained text generation tasks, and review existing methods and evaluation metrics for constrained text generation. Our aim is to highlight recent progress and trends in this emerging field, informing on the most promising directions and limitations towards advancing the state of the art of constrained neural language generation research.
We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels. While prior work has typically conflated these by learning latent variables that directly correspond to label values, we argue this is contrary to the intended effect of supervision in VAEs: capturing rich label characteristics with the latents. For example, we may want to capture the characteristics of a face that make it look young, rather than just the age of the person. To this end, we develop the CCVAE, a novel VAE model and concomitant variational objective which captures label characteristics explicitly in the latent space, eschewing direct correspondences between label values and latents. Through judicious structuring of mappings between such characteristic latents and labels, we show that the CCVAE can effectively learn meaningful representations of the characteristics of interest across a variety of supervision schemes. In particular, we show that the CCVAE allows for more effective and more general interventions to be performed, such as smooth traversals within the characteristics for a given label, diverse conditional generation, and transferring characteristics across datapoints.
We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when only limited paired data is available. To address the intractability of the exact model under a realistic data setup, we propose a variational inference approximation. To train this variational model with categorical data, we propose a KL encoder loss approach which has connections to the wake-sleep algorithm. Identifying the joint or conditional distributions by observing only unpaired samples from the marginals is possible only under certain conditions on the data distribution, and we discuss under what types of conditional independence assumptions this might be achieved, which guides the architecture designs. Experimental results show that even a tiny amount of paired data (5 minutes) is sufficient to learn to relate the two modalities (here graphemes and phonemes) when a massive amount of unpaired data is available, paving the way to adopting this principled approach for all seq2seq models in low-data-resource regimes.