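Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches to this objective employ probabilistic reasoning to recover a distribution over "intent" conditioned on the robot's perceived state. However, these approaches typically assume that task-specific notions of human intent (e.g., labelled goals) are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework that can be used to learn such a distribution over intent in an unsupervised manner. The DiSCVAE leverages recent advances in unsupervised learning to derive a disentangled latent representation of sequential data, separating time-varying local features from time-invariant global aspects. Unlike previous disentanglement frameworks, however, the proposed variant also infers a discrete variable to form a latent mixture model and enable clustering of global sequence concepts, e.g., intentions from observed human behaviour. To evaluate the DiSCVAE, we first validate its capacity to discover classes from unlabelled sequences using video datasets of bouncing digits and 2D animations. We then report results from a real-world human-robot interaction experiment conducted on a robotic wheelchair. Our findings offer insights into how the inferred discrete variable coincides with human intent and can thus serve to improve assistance in collaborative settings, such as shared control.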
translated by Google Translate
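The latent mixture model described above can be illustrated with a small sketch of how a discrete cluster variable is inferred from a continuous global latent code. This is not the DiSCVAE implementation. It assumes a diagonal-Gaussian mixture prior, and the function name `cluster_responsibilities` and all parameter values are hypothetical.

```python
import numpy as np

def cluster_responsibilities(z, means, log_vars, log_prior):
    """Posterior p(y = k | z) under a diagonal-Gaussian mixture (illustrative assumption).

    z: (D,) latent code; means, log_vars: (K, D); log_prior: (K,).
    """
    diff = z[None, :] - means
    # Log-density of z under each component, summed over latent dimensions.
    log_lik = -0.5 * np.sum(
        log_vars + diff ** 2 / np.exp(log_vars) + np.log(2.0 * np.pi), axis=1
    )
    log_joint = log_prior + log_lik
    log_joint = log_joint - log_joint.max()  # subtract the max for numerical stability
    probs = np.exp(log_joint)
    return probs / probs.sum()

# Hypothetical two-cluster example in a 2D latent space.
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
log_vars = np.zeros((2, 2))
log_prior = np.log(np.array([0.5, 0.5]))
resp = cluster_responsibilities(np.array([1.5, 0.3]), means, log_vars, log_prior)
```

Here `resp` holds one probability per cluster, and the most probable cluster would play the role of the inferred "intent" for the sequence.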
We present a unified probabilistic model that learns a representative set of discrete vehicle actions and predicts the probability of each action given a particular scenario. Our model also enables us to estimate the distribution over continuous trajectories conditioned on a scenario, representing what each discrete action would look like if executed in that scenario. While our primary objective is to learn representative action sets, these capabilities combine to produce accurate multimodal trajectory predictions as a byproduct. Although our learned action representations closely resemble semantically meaningful categories (e.g., "go straight", "turn left", etc.), our method is entirely self-supervised and does not utilize any manually generated labels or categories. Our method builds upon recent advances in variational inference and deep unsupervised clustering, resulting in full distribution estimates based on deterministic model evaluations.
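Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models that describe an observed sequence with dynamic latent variables and a static latent variable. The former encode information at the same frame rate as the observations, while the latter globally governs the entire sequence. This introduces an inductive bias and facilitates unsupervised disentanglement of the underlying local and global factors. In this paper, we show that the vanilla DSAE is sensitive to the choice of model architecture and to the capacity of the dynamic latent variables, and is prone to collapsing the static latent variable. As a countermeasure, we propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions, which are subsequently employed to regularise the model and to facilitate auxiliary objectives that promote disentanglement. The proposed framework is fully unsupervised and robust against the global factor collapse problem across a wide range of model configurations. It also avoids typical solutions such as adversarial training, which generally involves laborious parameter tuning, and domain-specific data augmentation. We conduct quantitative and qualitative evaluations to demonstrate its robustness in terms of disentanglement on both artificial and real-world music audio datasets.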
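Active inference is a mathematical framework that originated in computational neuroscience as a theory of how the brain implements action, perception, and learning. Recently, it has been shown to be a promising approach to the problems of state estimation and control under uncertainty, as well as a foundation for constructing goal-driven behaviours in robotics and artificial agents in general. Here, we review the state-of-the-art theory and implementations of active inference for state estimation, control, planning, and learning, describing current achievements with a particular focus on robotics. We showcase relevant experiments that illustrate its potential in terms of adaptation, generalization, and robustness. Furthermore, we connect this approach with other frameworks and discuss its expected benefits and challenges: a unified framework with functional biological plausibility using variational Bayesian inference.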
Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" (where the latents are ignored when they are paired with a powerful autoregressive decoder) that are typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
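The vector quantisation step mentioned above can be sketched as a nearest-neighbour lookup: each continuous encoder output is replaced by its closest codebook entry, and the index of that entry is the discrete code. The snippet below is a minimal NumPy illustration with a hand-picked, hypothetical codebook. It omits everything needed for training, such as gradient handling through the lookup and the codebook updates.

```python
import numpy as np

def quantise(z_e, codebook):
    """Map each encoder output to its nearest codebook vector.

    z_e: (N, D) continuous encoder outputs; codebook: (K, D) embedding vectors.
    Returns the discrete indices (N,) and the quantised vectors (N, D).
    """
    # Squared Euclidean distance between every output and every code: (N, K).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Hypothetical 3-entry codebook in a 2D embedding space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z_e = np.array([[0.9, 1.2], [-0.8, 1.7], [0.1, -0.2]])
indices, z_q = quantise(z_e, codebook)
print(indices)  # [1 2 0]
```

The integer `indices` are what an autoregressive prior would be fitted over, while `z_q` is what the decoder would receive.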
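Using artificial intelligence (AI) to create dance choreography with intention is still at an early stage. Methods that conditionally generate dance sequences remain limited in their ability to follow choreographer-specific creative intentions, often relying on external prompts or supervised learning. Likewise, fully annotated dance datasets are rare and labor-intensive to produce. To fill this gap and help make deep learning a meaningful tool for choreographers, we propose "PirouNet", a semi-supervised conditional recurrent autoencoder together with a dance labeling web application. PirouNet allows dance professionals to annotate data with their own subjective creative labels and subsequently generate new choreography based on their aesthetic criteria. Thanks to the proposed semi-supervised approach, PirouNet only requires a small portion of the dataset to be labeled, typically on the order of 1%. We demonstrate PirouNet's capabilities as it generates original choreography based on the "Laban Time Effort", an established dance notion describing the intention behind a movement's time dynamics. We extensively evaluate PirouNet's dance creations through a series of qualitative and quantitative metrics, validating its applicability as a tool for choreographers.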
The task of video prediction and generation is known to be notoriously difficult, with the research in this area largely limited to short-term predictions. Though plagued with noise and stochasticity, videos consist of features that are organised in a spatiotemporal hierarchy, different features possessing different temporal dynamics. In this paper, we introduce Dynamic Latent Hierarchy (DLH) -- a deep hierarchical latent model that represents videos as a hierarchy of latent states that evolve over separate and fluid timescales. Each latent state is a mixture distribution with two components, representing the immediate past and the predicted future, causing the model to learn transitions only between sufficiently dissimilar states, while clustering temporally persistent states closer together. Using this unique property, DLH naturally discovers the spatiotemporal structure of a dataset and learns disentangled representations across its hierarchy. We hypothesise that this simplifies the task of modeling temporal dynamics of a video, improves the learning of long-term dependencies, and reduces error accumulation. As evidence, we demonstrate that DLH outperforms state-of-the-art benchmarks in video prediction, is able to better represent stochasticity, as well as to dynamically adjust its hierarchical and temporal structure. Our paper shows, among other things, how progress in representation learning can translate into progress in prediction tasks.
The combination of machine learning models with physical models is a recent research path to learn robust data representations. In this paper, we introduce p$^3$VAE, a generative model that integrates a perfect physical model which partially explains the true underlying factors of variation in the data. To fully leverage our hybrid design, we propose a semi-supervised optimization procedure and an inference scheme that comes with meaningful uncertainty estimates. We apply p$^3$VAE to the semantic segmentation of high-resolution hyperspectral remote sensing images. Our experiments on a simulated data set demonstrate the benefits of our hybrid model over conventional machine learning models in terms of extrapolation capabilities and interpretability. In particular, we show that p$^3$VAE naturally has high disentanglement capabilities. Our code and data have been made publicly available at https://github.com/Romain3Ch216/p3VAE.
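The free energy principle, and its corollary active inference, constitute a bio-inspired theory which assumes that biological agents act to remain in a restricted set of preferred states of the world, i.e., they minimize their free energy. Under this principle, biological agents learn a generative model of the world and plan future actions that will maintain the agent in a homeostatic state satisfying its preferences. This framework lends itself to being realized in silico, as it comprises important aspects that make it computationally affordable, such as variational inference and amortized planning. In this work, we investigate the tools of deep learning for designing and realizing artificial agents based on active inference, presenting a deep-learning-oriented account of the free energy principle, surveying works relevant to both the machine learning and active inference areas, and discussing the design choices involved in the implementation process. This manuscript probes newer perspectives on the active inference framework, grounding its theoretical aspects in more pragmatic affairs, offering a practical guide to active inference newcomers and a starting point for deep learning practitioners who would like to investigate implementations of the free energy principle.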
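We study real-time collaborative robot (cobot) handling, where the cobot maneuvers a workpiece under human commands. This is useful when it is risky for humans to handle the workpiece directly. However, it is hard to make the cobot both easy to command and flexible in its possible operations. In this work, we propose a Real-Time Collaborative Robot Handling (RTCoHand) framework that allows the cobot to be controlled via user-customized dynamic gestures. This is hard because of variations among users, human motion uncertainties, and noisy human input. We model the task as a probabilistic generative process, referred to as the Conditional Collaborative Handling Process (CCHP), and learn it from collaboration between humans. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot handling task with a Kinova Gen3 robot arm. We achieve seamless human-robot collaboration with both experienced and new users. Compared with classical controllers, RTCoHand allows more complex maneuvers and a lower cognitive burden on the user. It also eliminates the need for trial and error, which makes it advantageous in safety-critical tasks.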
The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12 000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on seven different data sets. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.
We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels. While prior work has typically conflated these by learning latent variables that directly correspond to label values, we argue this is contrary to the intended effect of supervision in VAEs: capturing rich label characteristics with the latents. For example, we may want to capture the characteristics of a face that make it look young, rather than just the age of the person. To this end, we develop the CCVAE, a novel VAE model and concomitant variational objective which captures label characteristics explicitly in the latent space, eschewing direct correspondences between label values and latents. Through judicious structuring of mappings between such characteristic latents and labels, we show that the CCVAE can effectively learn meaningful representations of the characteristics of interest across a variety of supervision schemes. In particular, we show that the CCVAE allows for more effective and more general interventions to be performed, such as smooth traversals within the characteristics for a given label, diverse conditional generation, and transferring characteristics across datapoints.
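Humans are skillful navigators: we aptly maneuver through new places, realize when we are back at a location we have seen before, and can even conceive of shortcuts through parts of our environment we have never visited. Current methods in model-based reinforcement learning, on the other hand, struggle to generalize about environment dynamics outside the training distribution. We argue that two principles can help bridge this gap: latent learning and parsimonious dynamics. Humans tend to think about environment dynamics in simple terms: we reason about trajectories not in reference to what we expect to see along a path, but rather in an abstract latent space that contains information about the places' spatial coordinates. Moreover, we assume that moving around in novel parts of our environment works the same way as in the parts we are familiar with. These two principles work in tandem: it is in the latent space that the dynamics show parsimonious characteristics. We develop a model that learns such parsimonious dynamics. Using a variational objective, our model is trained to reconstruct experienced transitions in a latent space using locally linear transformations, while being encouraged to invoke as few distinct transformations as possible. Using our framework, we demonstrate the utility of learning parsimonious latent dynamics models in a range of policy learning and planning tasks.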
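Understanding the latent causal factors of a dynamical system from visual observations is considered a crucial step towards agents that reason in complex environments. In this paper, we propose CITRIS, a variational autoencoder framework that learns causal representations from temporal sequences of images in which the underlying causal factors may have been intervened upon. In contrast to the recent literature, CITRIS exploits temporality and observed intervention targets to identify scalar and multidimensional causal factors, such as 3D rotation angles. Furthermore, by introducing a normalizing flow, CITRIS can easily be extended to leverage and disentangle representations obtained by already pretrained autoencoders. Extending previous results on scalar causal factors, we prove identifiability in a more general setting, in which only some components of a causal factor are affected by interventions. In experiments on 3D rendered image sequences, CITRIS outperforms previous methods at recovering the underlying causal variables. Moreover, using pretrained autoencoders, CITRIS can even generalize to unseen instantiations of causal factors, opening future research areas in sim-to-real generalization for causal representation learning.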
Latent variable models such as the Variational Auto-Encoder (VAE) have become a go-to tool for analyzing biological data, especially in the field of single-cell genomics. One remaining challenge is the interpretability of latent variables as biological processes that define a cell's identity. Outside of biological applications, this problem is commonly referred to as learning disentangled representations. Although several disentanglement-promoting variants of the VAE were introduced, and applied to single-cell genomics data, this task has been shown to be infeasible from independent and identically distributed measurements, without additional structure. Instead, recent methods propose to leverage non-stationary data, as well as the sparse mechanism shift assumption in order to learn disentangled representations with a causal semantic. Here, we extend the application of these methodological advances to the analysis of single-cell genomics data with genetic or chemical perturbations. More precisely, we propose a deep generative model of single-cell gene expression data for which each perturbation is treated as a stochastic intervention targeting an unknown, but sparse, subset of latent variables. We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization. Finally, we apply those approaches to two real-world large-scale gene perturbation data sets and find that models that exploit the sparse mechanism shift hypothesis surpass contemporary methods on a transfer learning task. We implement our new model and benchmarks using the scvi-tools library, and release it as open-source software at https://github.com/Genentech/sVAE.
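We propose a deep latent variable model for high-dimensional sequential data. Our model factorises the latent space into content and motion variables. To model diverse dynamics, we split the motion space into subspaces and introduce a unique Hamiltonian operator for each subspace. The Hamiltonian formulation provides reversible dynamics that learn to constrain the motion path so as to conserve invariant properties. The explicit split of the motion space decomposes the Hamiltonian into symmetry groups and gives long-term separability of the dynamics. This split also means that representations can be learnt that are easy to interpret and control. We demonstrate the utility of our model for swapping the motion of two videos, generating sequences of various actions from a given image, and unconditional sequence generation.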
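The idea of reversible, invariant-conserving dynamics acting separately on motion subspaces can be illustrated with a toy linear flow. The sketch below is an illustrative stand-in, not the operator used in the paper. It assumes every motion subspace is two-dimensional and evolves under a rotation (the flow generated by a skew-symmetric matrix), and the name `rotation_step` and the frequencies are hypothetical.

```python
import numpy as np

def rotation_step(z, omegas, dt):
    """Advance each 2D motion subspace by its own rotation (toy assumption).

    z: (2 * S,) motion latent split into S subspaces; omegas: (S,) angular rates.
    A rotation preserves the norm of each subspace and is undone by stepping with -dt.
    """
    pairs = z.reshape(-1, 2)
    out = np.empty_like(pairs)
    for i, (pair, w) in enumerate(zip(pairs, omegas)):
        c, s = np.cos(w * dt), np.sin(w * dt)
        rot = np.array([[c, -s], [s, c]])
        out[i] = rot @ pair
    return out.reshape(-1)

# Two hypothetical subspaces with different rates.
z0 = np.array([1.0, 0.0, 0.5, 0.5])
omegas = np.array([1.0, 3.0])
z1 = rotation_step(z0, omegas, 0.1)
z_back = rotation_step(z1, omegas, -0.1)  # reversing time recovers z0
```

Because each subspace has its own operator, the subspaces never mix, which is the sense in which the split keeps the dynamics separable in this toy setting.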
In this paper, we explore the inclusion of latent random variables into the hidden state of a recurrent neural network (RNN) by combining the elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against other related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamics.
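Current domain-independent classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, its knowledge is encoded in a subsymbolic representation that is incompatible with symbolic systems such as planners. We propose Latplan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of the transitions allowed in the environment (training inputs), Latplan learns a complete propositional PDDL action model of the environment. Later, when given a pair of images representing the initial state and the goal state (planning inputs), Latplan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. We evaluate Latplan using image-based versions of 6 planning domains: 8-Puzzle, 15-Puzzle, Blocksworld, Sokoban, and two variations of LightsOut.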
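Meaningful and simplified representations of neural activity can yield insights into how and what information is being processed within a neural circuit. However, without labels, finding representations that reveal the link between the brain and behavior can be challenging. Here, we introduce a novel unsupervised approach for learning disentangled representations of neural activity called Swap-VAE. Our approach combines a generative modeling framework with an instance-specific alignment loss that tries to maximize the representational similarity between transformed views of the input (brain state). These transformed (or augmented) views are created by dropping out neurons and jittering samples in time, which intuitively should lead the network to a representation that maintains both temporal consistency and invariance to the specific neurons used to represent the neural state. Through evaluations on both synthetic data and neural recordings from hundreds of neurons in different primate brains, we show that it is possible to build representations that disentangle neural datasets along relevant latent dimensions linked to behavior.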
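Each year, expert-level performance is attained in increasingly complex multiagent domains, with notable examples including Go, Poker, and StarCraft II. This rapid progress is accompanied by a commensurate need to better understand how such agents attain this performance, in order to enable their safe deployment, identify their limitations, and reveal potential ways of improving them. In this paper, we take a step back from performance-focused multiagent learning and instead turn our attention to agent behavior analysis. We introduce a model-agnostic method for discovering behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumptions about the agents' underlying learning algorithms, does not require access to their latent states or models, and can be trained using entirely offline observational data. We illustrate the effectiveness of our method for enabling a coupled understanding of behaviors at the joint and local agent levels, detecting behavior changepoints throughout training, and discovering core behavioral concepts (e.g., those that facilitate higher returns), and we demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain.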