智能机器之间合作的必要性已在人工智能(AI)研究界普及了合作的多代理增强学习(MARL)。但是,许多研究的努力一直集中在开发实用的MARL算法上,其有效性仅在经验上进行了研究,从而缺乏理论保证。正如最近的研究所表明的那样,MARL方法通常达到奖励单调性或收敛性次优的性能。为了解决这些问题,在本文中,我们介绍了一个名为异质的镜像学习(HAML)的新颖框架,该框架为MARL算法设计提供了一个通用模板。我们证明,源自HAML模板的算法满足了关节奖励的单调改善的所需特性以及与NASH平衡的收敛性。我们通过证明当前最新的合作社Marl算法,HATRPO和HAPKO实际上是HAML实例,来验证HAML的实用性。接下来,作为我们理论的自然结果,我们提出了两种众所周知的RL算法HAA2C(用于A2C)和HADDPG(用于DDPG)的HAML扩展,并证明了它们针对StarcraftII和多代理Mujoco任务的强大基准的有效性。
translated by 谷歌翻译
实现人类水平的灵活性是机器人技术中的重要开放问题。但是,即使在婴儿级别,灵巧的手动操纵任务也是通过增强学习(RL)的挑战。困难在于高度的自由度和异质因素(例如手指关节)之间所需的合作。在这项研究中,我们提出了双人灵感手基准(BI-DEXHANDS),这是一种模拟器,涉及两只灵巧的手,其中包含数十只双人操纵任务和数千个目标对象。具体而言,根据认知科学文献,BI-DEXHANDS中的任务旨在匹配不同级别的人类运动技能。我们在ISSAC体育馆里建造了Bi-Dexhands;这可以实现高效的RL培训,仅在一个NVIDIA RTX 3090中达到30,000+ fps。我们在不同的设置下为流行的RL算法提供了全面的基准;这包括单代理/多代理RL,离线RL,多任务RL和META RL。我们的结果表明,PPO类型的上车算法可以掌握简单的操纵任务,该任务等效到48个月的人类婴儿(例如,捕获飞行的物体,打开瓶子),而多代理RL可以进一步帮助掌握掌握需要熟练的双人合作的操作(例如,举起锅,堆叠块)。尽管每个任务都取得了成功,但在获得多个操纵技能方面,现有的RL算法无法在大多数多任务和少量学习设置中工作,这需要从RL社区进行更实质性的发展。我们的项目通过https://github.com/pku-marl/dexteroushands开放。
translated by 谷歌翻译
近年来,基于梯度的Meta-RL(GMRL)方法在发现一个单一任务的有效在线超参数中取得了显着的成功(XU等,2018)或学习多任务转移学习的良好初始化(Finn等人。 ,2017)。尽管有经验的成功,但经常被忽视,通过香草背交计算元梯度是不明定义的。在本文中,我们认为许多现有的MGRL方法采用的随机元梯度估计实际上是偏见的;偏差来自两个来源:1)在组成优化问题的结构中自然的成分偏差和2)由直接自动分化引起的多步粗糙估计的偏差。为了更好地了解元梯度偏差,我们首先执行其研究,以量化每个研究。我们首先为现有的GMRL算法提供统一的推导,然后理论上分析偏差和现有梯度估计方法的方差。了解偏见的基本原则,我们提出了两种缓解解决方案,基于脱离政策校正和多步理估计技术。已经进行了综合烧蚀研究,结果显示:(1)当与不同估计器/示例大小/步骤和学习率相结合时,它们的存在以及它们如何影响元梯度估计。 (2)这些缓解方法对Meta梯度估计的有效性,从而最终回报率两种实用的Meta-RL算法:Lola-Dice和Meta-梯度加固学习。
translated by 谷歌翻译
在解决双球员零和游戏时,多代理强化学习(MARL)算法通常会在每次迭代时创造代理人群,在每次迭代时,将被发现为对对手人口对混合的最佳响应。在这样的过程中,“遵循”(即对手混合物)和“如何击败它们”(即寻找最佳响应)的更新规则是由手动开发的游戏理论原则基础,如虚构的游戏和双倍甲骨文。在本文中,我们介绍了一种新颖的框架 - 神经自动课程(NAC) - 利用元梯度下降来自动化学习更新规则的发现,而无明确的人类设计。具体而言,我们通过优化子程序参数通过神经网络和最佳响应模块参数化对手选择模块,并通过与游戏引擎的交互仅更新其参数,其中播放器旨在最大限度地减少其利用性。令人惊讶的是,即使没有人类的设计,发现的Marl算法也可以通过基于最先进的人口的游戏,在技能游戏,可微分的乐透,不转化的混合物游戏中实现竞争或更好的性能,实现竞争或更好的性能。迭代匹配的便士和kuhn扑克。此外,我们表明NAC能够从小型游戏到大型游戏,例如Kuhn Poker培训,在LEDUC扑克上表现优于PSRO。我们的工作激发了一个未来的未来方向,以完全从数据发现一般的Marl算法。
translated by 谷歌翻译
Increasing research interests focus on sequential recommender systems, aiming to model dynamic sequence representation precisely. However, the most commonly used loss function in state-of-the-art sequential recommendation models has essential limitations. To name a few, Bayesian Personalized Ranking (BPR) loss suffers the vanishing gradient problem from numerous negative sampling and predictionbiases; Binary Cross-Entropy (BCE) loss subjects to negative sampling numbers, thereby it is likely to ignore valuable negative examples and reduce the training efficiency; Cross-Entropy (CE) loss only focuses on the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representation. To avoid these limitations, in this paper, we propose to calculate Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, which enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models GRU4Rec, SASRec, and S3-Rec can reach 125.63%, 69.90%, and 33.24% average improvement of full ranking NDCG@5, respectively. Using CCE, the performance curve of the models on the test data increases rapidly with the wall clock time, and is superior to that of other loss functions in almost the whole process of model training.
translated by 谷歌翻译
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
translated by 谷歌翻译
Supervised Deep-Learning (DL)-based reconstruction algorithms have shown state-of-the-art results for highly-undersampled dynamic Magnetic Resonance Imaging (MRI) reconstruction. However, the requirement of excessive high-quality ground-truth data hinders their applications due to the generalization problem. Recently, Implicit Neural Representation (INR) has appeared as a powerful DL-based tool for solving the inverse problem by characterizing the attributes of a signal as a continuous function of corresponding coordinates in an unsupervised manner. In this work, we proposed an INR-based method to improve dynamic MRI reconstruction from highly undersampled k-space data, which only takes spatiotemporal coordinates as inputs. Specifically, the proposed INR represents the dynamic MRI images as an implicit function and encodes them into neural networks. The weights of the network are learned from sparsely-acquired (k, t)-space data itself only, without external training datasets or prior images. Benefiting from the strong implicit continuity regularization of INR together with explicit regularization for low-rankness and sparsity, our proposed method outperforms the compared scan-specific methods at various acceleration factors. E.g., experiments on retrospective cardiac cine datasets show an improvement of 5.5 ~ 7.1 dB in PSNR for extremely high accelerations (up to 41.6-fold). The high-quality and inner continuity of the images provided by INR has great potential to further improve the spatiotemporal resolution of dynamic MRI, without the need of any training data.
translated by 谷歌翻译
Recent studies have shown that using an external Language Model (LM) benefits the end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The long-tail prediction problems have been widely studied in many applications, but only been addressed by a few studies for ASR and LMs. In this paper, we propose a new memory augmented lookup dictionary based Transformer architecture for LM. The newly introduced lookup dictionary incorporates rich contextual information in training set, which is vital to correctly predict long-tail tokens. With intensive experiments on Chinese and English data sets, our proposed method is proved to outperform the baseline Transformer LM by a great margin on both word/character error rate and tail tokens error rate. This is achieved without impact on the decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting the ASR decoding performance, especially for long-tail tokens.
translated by 谷歌翻译
The objective of this paper is to learn dense 3D shape correspondence for topology-varying generic objects in an unsupervised manner. Conventional implicit functions estimate the occupancy of a 3D point given a shape latent code. Instead, our novel implicit function produces a probabilistic embedding to represent each 3D point in a part embedding space. Assuming the corresponding points are similar in the embedding space, we implement dense correspondence through an inverse function mapping from the part embedding vector to a corresponded 3D point. Both functions are jointly learned with several effective and uncertainty-aware loss functions to realize our assumption, together with the encoder generating the shape latent code. During inference, if a user selects an arbitrary point on the source shape, our algorithm can automatically generate a confidence score indicating whether there is a correspondence on the target shape, as well as the corresponding semantic point if there is one. Such a mechanism inherently benefits man-made objects with different part constitutions. The effectiveness of our approach is demonstrated through unsupervised 3D semantic correspondence and shape segmentation.
translated by 谷歌翻译
Patients take care of what their teeth will be like after the orthodontics. Orthodontists usually describe the expectation movement based on the original smile images, which is unconvincing. The growth of deep-learning generative models change this situation. It can visualize the outcome of orthodontic treatment and help patients foresee their future teeth and facial appearance. While previous studies mainly focus on 2D or 3D virtual treatment outcome (VTO) at a profile level, the problem of simulating treatment outcome at a frontal facial image is poorly explored. In this paper, we build an efficient and accurate system for simulating virtual teeth alignment effects in a frontal facial image. Our system takes a frontal face image of a patient with visible malpositioned teeth and the patient's 3D scanned teeth model as input, and progressively generates the visual results of the patient's teeth given the specific orthodontics planning steps from the doctor (i.e., the specification of translations and rotations of individual tooth). We design a multi-modal encoder-decoder based generative model to synthesize identity-preserving frontal facial images with aligned teeth. In addition, the original image color information is used to optimize the orthodontic outcomes, making the results more natural. We conduct extensive qualitative and clinical experiments and also a pilot study to validate our method.
translated by 谷歌翻译