We introduce a new approach to probabilistic unsupervised learning based on the recognition-parametrised model (RPM): a normalized semi-parametric hypothesis class for joint distributions over observed and latent variables. Under the key assumption that observations are conditionally independent given the latents, the RPM directly encodes the "recognition" process, parametrizing both the prior distribution over the latents and their conditional distributions given the observations. This recognition model is paired with non-parametric descriptions of the marginal distribution of each observed variable. The focus is thus on learning a good latent representation that captures the dependence between the measurements. The RPM permits exact maximum-likelihood learning in settings with discrete latents, where it remains tractable, even when the mapping between continuous observations and latents is expressed through a flexible model such as a neural network. We develop effective approximations for continuous latent variables with tractable priors. Unlike the approximations required by dually-parametrized models such as the Helmholtz machine and the variational autoencoder, these RPM approximations introduce only minor bias, which may often vanish asymptotically. Furthermore, where the prior over the latents is intractable, the RPM can be combined effectively with standard probabilistic techniques such as variational Bayes. We demonstrate the model in high-dimensional data settings, including a form of weakly supervised learning on MNIST digits and the discovery of latent maps from sensory observations. The RPM provides an effective way to discover, represent, and reason about the latent structure underlying observational data, a capability critical to both animal and artificial intelligence.
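As a concrete illustration of the hypothesis class described above (our notation, reconstructed from the abstract rather than taken from the paper, so the precise form should be checked against the original), a recognition-parametrised joint over $J$ conditionally independent observations $x_1,\dots,x_J$ and a latent $z$ can be written as

$$p_\theta(x_1,\dots,x_J,z) \;=\; p_\theta(z)\prod_{j=1}^{J}\frac{f_{\theta,j}(z\mid x_j)}{F_{\theta,j}(z)}\,p_{0,j}(x_j), \qquad F_{\theta,j}(z)=\int f_{\theta,j}(z\mid x_j)\,p_{0,j}(x_j)\,\mathrm{d}x_j,$$

where each $f_{\theta,j}$ is a parametric recognition factor (e.g., a neural network) and each $p_{0,j}$ is a non-parametric, e.g. empirical, marginal over the $j$-th observed variable, in which case $F_{\theta,j}$ reduces to an average of the recognition factor over the training data.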
A key goal of unsupervised learning is to go beyond density estimation and sample generation to reveal the structure inherent in the observed data. Such structure can be expressed in the pattern of interactions between explanatory latent variables, captured through a probabilistic graphical model. Although the learning of structured graphical models has a long history, much recent work in unsupervised modelling has instead emphasized flexible deep-network-based generation, either transforming independent latent generators to model complex data or assuming that distinct observed variables derive from different latent nodes. Here, we extend the output of amortized variational inference to incorporate structured factors over multiple variables, able to capture the observation-induced posterior dependence between latents that results from "explaining away", thus allowing complex observations to depend on multiple nodes of a structured graph. We show that appropriately parametrized factors can be combined efficiently with variational message passing in elaborate graphical structures. We instantiate the framework on a Gaussian process factor analysis model and empirically evaluate its improvement over existing methods on synthetic data with a known generative process. We then fit the structured model to high-dimensional neural spike-train time series recorded from the hippocampus of freely moving rodents, showing that the model identifies latent signals that correlate with behavioural covariates.
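The multiplicative fusion of a structured prior factor with per-observation recognition factors is easiest to see in the Gaussian case. Below is a minimal numpy sketch of that idea (our simplification with hypothetical names, not the paper's code): multiplying Gaussian factors amounts to summing their natural parameters, and posterior dependence between latents ("explaining away") enters through off-diagonal precision terms.

```python
import numpy as np

# Gaussian factors in natural-parameter form: (precision Lambda, precision-mean eta).
# Multiplying Gaussian factors corresponds to summing their natural parameters.

def fuse_factors(prior_prec, prior_eta, recog_precs, recog_etas):
    """Combine a structured prior factor over all latents with
    per-observation recognition factors; explaining away enters
    through the off-diagonal structure of the precisions."""
    post_prec = prior_prec + sum(recog_precs)
    post_eta = prior_eta + sum(recog_etas)
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ post_eta
    return post_mean, post_cov

# Two scalar latents coupled by the prior; a single observation informs
# both latents at once, so the recognition factor is also coupled.
prior_prec = np.array([[2.0, 0.8], [0.8, 2.0]])
prior_eta = np.zeros(2)
recog_precs = [np.array([[1.0, 0.3], [0.3, 0.5]])]
recog_etas = [np.array([0.7, -0.2])]
mean, cov = fuse_factors(prior_prec, prior_eta, recog_precs, recog_etas)
print(mean, cov)
```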
We propose Learning by Retracing, a novel self-supervised approach for learning the state representations (and the associated dynamics models) of reinforcement learning tasks. In addition to the predictive (reconstructive) supervision in the forward direction, we propose to include "retraced" transitions for representation and model learning, using a cycle-consistency constraint between the original and retraced states, thereby improving sample efficiency. Moreover, learning by retracing explicitly propagates information about backward transitions for inferring preceding states, thus facilitating stronger representation learning. We introduce the Cycle-Consistency World Model (CCWM), a concrete instantiation of learning by retracing implemented within an existing model-based reinforcement learning framework. In addition, we propose a novel adaptive "truncation" mechanism to counteract the negative effects brought by "irreversible" transitions, so that learning by retracing can be maximally effective. Through extensive empirical studies on continuous control benchmarks, we show that CCWM achieves state-of-the-art performance in terms of both sample efficiency and asymptotic performance.
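A minimal sketch of the cycle-consistency ("retracing") objective, assuming hypothetical forward and backward latent dynamics models; this illustrates the idea only and is not the authors' implementation:

```python
import torch
import torch.nn as nn

latent_dim, action_dim = 16, 4
forward_model = nn.Sequential(nn.Linear(latent_dim + action_dim, 64),
                              nn.ReLU(), nn.Linear(64, latent_dim))
backward_model = nn.Sequential(nn.Linear(latent_dim + action_dim, 64),
                               nn.ReLU(), nn.Linear(64, latent_dim))

def retracing_loss(z_t, a_t, z_next):
    # Forward prediction loss: standard dynamics-model learning.
    z_pred = forward_model(torch.cat([z_t, a_t], dim=-1))
    forward_loss = (z_pred - z_next).pow(2).mean()
    # Retrace the predicted next state back through the backward model;
    # the retraced state should match the original (cycle consistency).
    z_retraced = backward_model(torch.cat([z_pred, a_t], dim=-1))
    cycle_loss = (z_retraced - z_t).pow(2).mean()
    return forward_loss + cycle_loss

z_t = torch.randn(32, latent_dim)
a_t = torch.randn(32, action_dim)
z_next = torch.randn(32, latent_dim)
retracing_loss(z_t, a_t, z_next).backward()
```

The adaptive truncation described above would gate the cycle term for transitions judged irreversible; that gating is omitted here.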
Reinforcement learning (RL) involves performing exploratory actions in an unknown system, which can place the learning agent in dangerous and potentially catastrophic system states. Current approaches to safe learning in RL trade off safe exploration against task fulfillment. In this paper, we introduce a new generation of RL solvers that learn to minimize safety violations while maximizing the task reward to the extent that the safe policy can tolerate. Our approach introduces a novel two-player framework for safe RL called DESTA (Distributive Exploration Safety Training Algorithm). At the core of DESTA is a game between two adaptive agents: a safety agent whose task is to minimize safety violations, and a task agent whose goal is to maximize the environment reward. Specifically, the safety agent can selectively take control of the system at any given point to prevent safety violations, while the task agent is free to execute its policy in all other states. This framework enables the safety agent to learn to take actions at certain states that minimize future safety violations, both at training and at test time, while the task agent performs actions that maximize task performance everywhere else. Theoretically, we prove that DESTA converges to stable points, enabling the safety violations of pretrained policies to be minimized. Empirically, we demonstrate DESTA's ability to improve the safety of existing policies and, secondly, to construct safe RL policies when the task agent and the safety agent are trained concurrently. We show DESTA's superior performance over leading RL methods on Lunar Lander and Frozen Lake from OpenAI Gym.
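The control-switching mechanism at the heart of DESTA can be sketched as follows; the interfaces and the learned binary intervention rule are hypothetical stand-ins, not the authors' code:

```python
import numpy as np

def desta_step(state, task_policy, safety_policy, intervene):
    """Select the executed action: the safety agent may selectively
    take control to avoid predicted safety violations; otherwise the
    task agent's action is executed."""
    if intervene(state):                 # learned binary intervention rule
        return safety_policy(state)     # action minimising safety violations
    return task_policy(state)           # action maximising task reward

# Dummy stand-ins for illustration only.
task_policy = lambda s: np.clip(s.mean(), -1.0, 1.0)
safety_policy = lambda s: 0.0                  # e.g. a conservative action
intervene = lambda s: np.abs(s).max() > 2.0    # intervene in risky states

state = np.array([0.5, -2.5, 1.0])
print(desta_step(state, task_policy, safety_policy, intervene))
```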
We study Policy-extended Value Function Approximators (PeVFA) in reinforcement learning (RL), which extend the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables a PeVFA to preserve the values of multiple policies simultaneously and brings an appealing property: value generalization among policies. We formally analyze value generalization under Generalized Policy Iteration (GPI). Through both theoretical and empirical lenses, we show that the generalized value estimates offered by a PeVFA may have a lower initial approximation error with respect to the true values of successive policies, which is expected to improve consecutive value approximation during GPI. Based on these insights, we introduce a new form of GPI with PeVFA that leverages value generalization along the policy-improvement path. Moreover, we propose a representation learning framework for RL policies, providing several approaches to learn effective policy embeddings from policy network parameters or state-action pairs. In our experiments, we evaluate the efficacy of the value generalization offered by PeVFA and of policy representation learning on several OpenAI Gym continuous control tasks. As a representative instance of the algorithmic implementation, Proximal Policy Optimization (PPO) re-implemented under the paradigm of GPI with PeVFA achieves about 40% performance improvement over its vanilla counterpart in most environments.
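A minimal PyTorch sketch of a policy-extended value function, under our assumption that the policy representation arrives as a fixed-size embedding vector; the names are hypothetical:

```python
import torch
import torch.nn as nn

class PeVFA(nn.Module):
    """Critic that conditions on a policy embedding in addition to the
    state, so one approximator can hold the values of many policies."""
    def __init__(self, state_dim, policy_embed_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + policy_embed_dim, hidden),
            nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, state, policy_embedding):
        # V(s, chi(pi)): values generalize along the policy-improvement
        # path because nearby embeddings yield nearby value estimates.
        return self.net(torch.cat([state, policy_embedding], dim=-1))

# A policy embedding could be learned from the policy network's
# parameters or from sampled state-action pairs, as described above.
vf = PeVFA(state_dim=8, policy_embed_dim=16)
v = vf(torch.randn(32, 8), torch.randn(32, 16))
```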
Previous studies of unsupervised sentence embeddings have focused on data augmentation methods such as dropout and rule-based sentence transformations. However, these methods are limited in controlling the fine-grained semantics of the augmented views of a sentence. This results in supervision signals that are insufficient for capturing the semantic similarity of similar sentences. In this work, we find that using neighboring sentences enables capturing a more accurate semantic similarity between similar sentences. Based on this finding, we propose RankEncoder, which uses the relations between an input sentence and sentences in a corpus to train an unsupervised sentence encoder. We evaluate RankEncoder from three perspectives: 1) semantic textual similarity performance, 2) efficacy on similar sentence pairs, and 3) the universality of RankEncoder. Experimental results show that RankEncoder achieves 80.07% Spearman's correlation, an absolute improvement of 1.1% over the previous state-of-the-art performance. The improvement is even more significant on similar sentence pairs, at 1.73%. In addition, we demonstrate that RankEncoder is universally applicable to existing unsupervised sentence encoders.
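One way to picture the use of corpus neighbors (our simplification, not the released implementation) is to represent each sentence by its similarity profile against a corpus and compare those profiles instead of the raw embeddings:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def similarity_profile(embedding, corpus_embeddings):
    # Similarities of one sentence against every corpus sentence; two
    # sentences with similar meanings should rank the corpus similarly.
    return np.array([cosine(embedding, c) for c in corpus_embeddings])

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 32))        # pre-computed corpus embeddings
e1, e2 = rng.normal(size=32), rng.normal(size=32)
v1 = similarity_profile(e1, corpus)
v2 = similarity_profile(e2, corpus)
print(cosine(v1, v2))                      # neighbor-based similarity score
```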
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
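As an illustration of what a global photometric alignment can look like (a deliberately simplified stand-in for the proposed module, not the paper's code), per-channel statistics of a source image can be matched to a target-domain image so that low-level image statistics agree:

```python
import numpy as np

def photometric_align(src_img, tgt_img):
    """Shift and scale each channel of src_img to match the
    per-channel mean and standard deviation of tgt_img."""
    aligned = np.empty_like(src_img, dtype=np.float64)
    for c in range(src_img.shape[-1]):
        s = src_img[..., c].astype(np.float64)
        t = tgt_img[..., c]
        aligned[..., c] = (s - s.mean()) / (s.std() + 1e-8) * t.std() + t.mean()
    return aligned

src = np.random.rand(64, 64, 3)            # source-domain image (HWC)
tgt = np.random.rand(64, 64, 3) * 0.5 + 0.2
print(photometric_align(src, tgt).mean(axis=(0, 1)))
```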
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
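A minimal sketch of the style-aware adaptation idea, under our reading of the abstract: the style code predicts a channel-wise modulation of a feed-forward layer's output rather than being concatenated to its input. The class name and the sigmoid gating are our assumptions:

```python
import torch
import torch.nn as nn

class StyleAdaptiveFF(nn.Module):
    """Feed-forward layer whose effective weights are modulated by a
    style code, a simplified take on style-aware adaptation."""
    def __init__(self, dim, style_dim):
        super().__init__()
        self.ff = nn.Linear(dim, dim)
        self.to_scale = nn.Linear(style_dim, dim)   # style -> channel scales

    def forward(self, x, style_code):
        scale = torch.sigmoid(self.to_scale(style_code))   # (batch, dim)
        # Modulate the feed-forward output channel-wise by the style code,
        # letting the reference speaking style reshape the computation.
        return self.ff(x) * scale.unsqueeze(1)

layer = StyleAdaptiveFF(dim=256, style_dim=64)
x = torch.randn(2, 50, 256)          # (batch, tokens, dim)
style = torch.randn(2, 64)           # style code from the style encoder
out = layer(x, style)
```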
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, rendering predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing the photometric error based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, with improvements ranging from 2% to over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
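A schematic of the two-stage setup, with hypothetical modules of our own; the real pipeline optimizes a photometric reconstruction error via view synthesis, which we replace here with a pose-distillation stand-in for brevity:

```python
import torch
import torch.nn as nn

# Stage-1 modules: depth and relative pose from two consecutive frames.
depth_net = nn.Conv2d(3, 1, 3, padding=1)                 # per-pixel depth
pose_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(6))  # 6-DoF pose from a frame pair
# Stage-2 module: ego-motion from the current frame alone.
visual_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(6))

frame_t = torch.randn(4, 3, 64, 64)
frame_t1 = torch.randn(4, 3, 64, 64)

# Stage 1: joint depth and pose prediction; in the full pipeline these
# drive a view-synthesis photometric loss, omitted here.
depth = depth_net(frame_t)
pose = pose_net(torch.cat([frame_t, frame_t1], dim=1))

# Stage 2: the encoder predicts ego-motion from one frame; we distil it
# toward the stage-1 pose as a stand-in for the photometric objective.
ego_motion = visual_encoder(frame_t)
loss = (ego_motion - pose.detach()).pow(2).mean()
loss.backward()
```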
Increasing research interest focuses on sequential recommender systems, which aim to model dynamic sequence representations precisely. However, the loss functions most commonly used in state-of-the-art sequential recommendation models have essential limitations. To name a few: Bayesian Personalized Ranking (BPR) loss suffers from the vanishing gradient problem caused by extensive negative sampling and prediction biases; Binary Cross-Entropy (BCE) loss is sensitive to the number of negative samples, and is therefore likely to ignore valuable negative examples and reduce training efficiency; Cross-Entropy (CE) loss only focuses on the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representations. To avoid these limitations, in this paper we propose to calculate Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, and enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements of 125.63%, 69.90%, and 33.24% in full-ranking NDCG@5, respectively. With CCE, the performance curve of the models on the test data rises rapidly with wall-clock time and remains superior to that of other loss functions throughout almost the entire course of model training.
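A minimal PyTorch sketch of the CCE idea: full-catalogue cross-entropy is computed at every timestamp of the training sequence rather than only the last one, with no negative sampling. Averaging over timestamps, as below, equals the cumulative sum up to a constant factor:

```python
import torch
import torch.nn.functional as F

def cce_loss(logits, targets):
    """logits: (batch, seq_len, num_items) scores over the full catalogue;
    targets: (batch, seq_len) next-item labels at every timestamp."""
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

logits = torch.randn(8, 20, 1000, requires_grad=True)
targets = torch.randint(0, 1000, (8, 20))
cce_loss(logits, targets).backward()
```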