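A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that, in the absence of useful reward signals, an effective exploration strategy should seek out {\it decision states}. These states lie at critical junctions in the state space, from which the agent can transition to new, potentially unexplored regions. We propose to learn about decision states from prior experience. By training a goal-conditioned policy with an information bottleneck, we can identify decision states by examining where the model actually makes use of the goal state. We find that this simple mechanism effectively identifies decision states, even in partially observed settings. In effect, the model learns the cues that correlate with potential subgoals. In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.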
translated by Google Translate
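We explore blind (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating a simulated environment, gathering the necessary visual information only through first-person vision before finally answering. A blind baseline that ignores the environment and all visual information is therefore a degenerate solution. Yet our experiments on the EQAv1 dataset show that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.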
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the \emph{entire visual processing} by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (\MRN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.
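To make the idea of conditioning batch normalization on language concrete, here is a minimal pure-Python sketch for a single channel. It is an illustration under stated assumptions, not the paper's implementation: the helper names `predict_deltas` and `modulated_batch_norm`, the linear map from the embedding to the offsets, and the choice to add offsets to the pretrained scale and shift are all hypothetical, since the abstract only says that the batch normalization parameters are conditioned on a language embedding.

```python
import math


def predict_deltas(embedding, w_gamma, w_beta):
    """Hypothetical linear map from a language embedding to (delta_gamma, delta_beta)."""
    d_gamma = sum(w * e for w, e in zip(w_gamma, embedding))
    d_beta = sum(w * e for w, e in zip(w_beta, embedding))
    return d_gamma, d_beta


def modulated_batch_norm(x, gamma, beta, embedding, w_gamma, w_beta, eps=1e-5):
    """Batch-normalize one channel, then scale and shift with language-dependent parameters.

    x         : list of activations for one channel across the batch
    gamma/beta: pretrained scale and shift (assumed frozen)
    embedding : language embedding (e.g. of the question), a list of floats
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    d_gamma, d_beta = predict_deltas(embedding, w_gamma, w_beta)
    g = gamma + d_gamma  # assumption: the language input shifts the pretrained scale
    b = beta + d_beta    # assumption: the language input shifts the pretrained offset
    return [g * (v - mean) / math.sqrt(var + eps) + b for v in x]
```

With an all-zero embedding the offsets vanish and the function reduces to ordinary batch normalization, so in this sketch the pretrained behavior is the starting point and the language input only perturbs it.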
Artificial Neural Networks are computational network models inspired by signal processing in the brain. These models have dramatically improved the performance of many learning tasks, including speech and object recognition. However, today's computing hardware is inefficient at implementing neural networks, in large part because much of it was designed for von Neumann computing schemes. Significant effort has been made to develop electronic architectures tuned to implement artificial neural networks that improve upon both computational speed and energy efficiency. Here, we propose a new architecture for a fully-optical neural network that, using unique advantages of optics, promises a computational speed enhancement of at least two orders of magnitude over the state-of-the-art and three orders of magnitude in power efficiency for conventional learning tasks. We experimentally demonstrate essential parts of our architecture using a programmable nanophotonic processor.
We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight sharing scheme inspired by restricted Boltzmann machines to yield an estimator that is both tractable and has good generalization performance. We discuss how they achieve competitive performance in modeling both binary and real-valued observations. We also present how deep NADE models can be trained to be agnostic to the ordering of input dimensions used by the autoregressive product rule decomposition. Finally, we also show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE.
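The product-rule decomposition and weight sharing mentioned above can be written out as follows. This is a sketch for binary observations; the symbols $W$, $V$, $b$, $c$ and the use of the logistic sigmoid $\sigma$ are notational assumptions, not taken from the abstract.

```latex
% Autoregressive factorization of a D-dimensional observation x
p(\mathbf{x}) = \prod_{d=1}^{D} p\left(x_d \mid \mathbf{x}_{<d}\right)

% Each conditional is a small neural network; the same matrix W is
% shared across all D conditionals (only its first d-1 columns are used)
\mathbf{h}_d = \sigma\left(W_{\cdot,<d}\,\mathbf{x}_{<d} + \mathbf{c}\right),
\qquad
p\left(x_d = 1 \mid \mathbf{x}_{<d}\right) = \sigma\left(V_{d,\cdot}\,\mathbf{h}_d + b_d\right)
```

Because every factor is a normalized conditional, the product is a normalized, exactly computable density, which is what makes the estimator tractable.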
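We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder (VAE) with a generative adversarial network (GAN), we can use the learned feature representations in the GAN discriminator as the basis for the VAE reconstruction objective. We thereby replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance to, for example, translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g., wearing glasses) can be modified using simple arithmetic.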
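In this paper, we present a fully automatic brain tumor segmentation method based on deep neural networks (DNNs). The proposed networks are tailored to glioblastomas (both low and high grade) pictured in MR images. By their very nature, these tumors can appear anywhere in the brain and have almost any shape, size, and contrast. These reasons motivate our exploration of a machine learning solution that exploits a flexible, high-capacity DNN while being extremely efficient. Here, we describe the different model choices that we found necessary for obtaining competitive performance. In particular, we explore different architectures based on convolutional neural networks (CNNs), i.e., DNNs specifically adapted to image data. We present a novel CNN architecture that differs from those traditionally used in computer vision. Our CNN exploits both local features and more global contextual features simultaneously. Also, unlike most traditional uses of CNNs, our networks use a final layer that is a convolutional implementation of a fully connected layer, which allows a 40-fold speed-up. We also describe a two-phase training procedure that allows us to tackle difficulties related to the imbalance of tumor labels. Finally, we explore a cascade architecture in which the output of a basic CNN is treated as an additional source of information for a subsequent CNN. Results reported on the 2013 BRATS test dataset show that our architecture improves over the currently published state of the art while being over 30 times faster.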
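Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application to video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second, we propose a temporal attention mechanism that goes beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state of the art for both the BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger, and more challenging dataset of paired video and natural language descriptions.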
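The temporal attention step described above can be sketched in a few lines of pure Python. This is an illustrative sketch only: the function name `temporal_attention` is hypothetical, and the dot-product relevance score is an assumption made for brevity, since the abstract does not specify how relevance between a temporal segment and the RNN state is computed.

```python
import math


def temporal_attention(segment_feats, decoder_state):
    """Soft-select temporal segments given the current state of the text-generating RNN.

    segment_feats : list of per-segment feature vectors (lists of floats)
    decoder_state : current RNN hidden state (list of floats, same length)
    Returns (weights, context): one weight per segment and the weighted-average feature.
    """
    # Relevance score per segment (assumption: plain dot product with the state).
    scores = [sum(f * h for f, h in zip(feat, decoder_state)) for feat in segment_feats]
    # Numerically stable softmax over time.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the segment features.
    dim = len(segment_feats[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, segment_feats)) for i in range(dim)]
    return weights, context
```

The weights are recomputed at every generated word, so different words can draw on different parts of the video rather than on a single pooled representation.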
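There has been a lot of recent interest in designing neural network models to estimate a distribution from a set of examples. We introduce a simple modification for autoencoder neural networks that yields powerful generative models. Our method masks the autoencoder's parameters to respect autoregressive constraints: each input is reconstructed only from previous inputs in a given ordering. Constrained this way, the autoencoder outputs can be interpreted as a set of conditional probabilities, and their product as the full joint probability. We can also train a single network that can decompose the joint probability in multiple different orderings. Our simple framework can be applied to multiple architectures, including deep ones. Vectorized implementations, such as on GPUs, are simple and fast. Experiments demonstrate that this approach is competitive with state-of-the-art tractable distribution estimators. At test time, the method is significantly faster and scales better than other autoregressive estimators.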
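The masking idea can be illustrated with a small pure-Python sketch for a single hidden layer. The degree-assignment scheme used here (each unit gets an integer "degree", and a connection is kept only if it cannot leak information from later inputs) is one common way to build such masks and is an assumption of this sketch; the abstract does not spell out the construction. The function names `made_masks` and `connectivity` are hypothetical.

```python
import random


def made_masks(n_inputs, n_hidden, seed=0):
    """Build binary masks for a one-hidden-layer masked autoencoder (n_inputs >= 2)."""
    rng = random.Random(seed)
    # Input d (0-based) gets degree d + 1, i.e. the natural ordering 1..D.
    in_deg = list(range(1, n_inputs + 1))
    # Each hidden unit gets a degree in 1..D-1.
    hid_deg = [rng.randint(1, n_inputs - 1) for _ in range(n_hidden)]
    # Hidden unit k may read input d only if hid_deg[k] >= in_deg[d].
    mask_in = [[int(hid_deg[k] >= in_deg[d]) for d in range(n_inputs)]
               for k in range(n_hidden)]
    # Output d may read hidden unit k only if in_deg[d] > hid_deg[k].
    mask_out = [[int(in_deg[d] > hid_deg[k]) for k in range(n_hidden)]
                for d in range(n_inputs)]
    return mask_in, mask_out


def connectivity(mask_in, mask_out):
    """conn[d][j] == 1 if output d can depend on input j through some hidden unit."""
    n_hidden = len(mask_in)
    n_inputs = len(mask_out)
    return [[int(any(mask_out[d][k] and mask_in[k][j] for k in range(n_hidden)))
             for j in range(n_inputs)]
            for d in range(n_inputs)]
```

In a real model the masks would be multiplied element-wise with the weight matrices before each forward pass. The `connectivity` helper is only a check: the resulting matrix is strictly lower triangular, so output d never depends on input d or on any later input.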
Cross-language learning allows us to use training data from one language to build models for a different language. Many approaches to bilingual learning require that we have word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are aligned between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. Since training autoencoders on word observations presents certain computational issues, we propose and compare different variations adapted to this setting. We also propose an explicit correlation maximizing regularizer that leads to significant improvement in the performance. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). These experiments demonstrate that our approaches are competitive with the state-of-the-art, achieving up to 10-14 percentage point improvements over the best reported results on this task.