最先进的多机构增强学习(MARL)方法为各种复杂问题提供了有希望的解决方案。然而,这些方法都假定代理执行同步的原始操作执行,因此它们不能真正可扩展到长期胜利的真实世界多代理/机器人任务,这些任务固有地要求代理/机器人以异步的理由,涉及有关高级动作选择的理由。不同的时间。宏观行动分散的部分可观察到的马尔可夫决策过程(MACDEC-POMDP)是在完全合作的多代理任务中不确定的异步决策的一般形式化。在本论文中,我们首先提出了MacDec-Pomdps的一组基于价值的RL方法,其中允许代理在三个范式中使用宏观成果功能执行异步学习和决策:分散学习和控制,集中学习,集中学习和控制,以及分散执行的集中培训(CTDE)。在上述工作的基础上,我们在三个训练范式下制定了一组基于宏观行动的策略梯度算法,在该训练范式下,允许代理以异步方式直接优化其参数化策略。我们在模拟和真实的机器人中评估了我们的方法。经验结果证明了我们在大型多代理问题中的方法的优势,并验证了我们算法在学习具有宏观actions的高质量和异步溶液方面的有效性。
translated by 谷歌翻译
在现实设置中跨多个代理的决策同步是有问题的,因为它要求代理等待其他代理人终止和交流有关终止的终止。理想情况下,代理应该学习和执行异步。这样的异步方法还允许暂时扩展的动作,这些操作可能会根据执行的情况和操作花费不同的时间。不幸的是,当前的策略梯度方法不适用于异步设置,因为他们认为代理在每个时间步骤中都同步推理了动作选择。为了允许异步学习和决策,我们制定了一组异步的多代理参与者 - 批判性方法,这些方法使代理可以在三个标准培训范式中直接优化异步策略:分散的学习,集中学习,集中学习和集中培训以进行分解执行。各种现实域中的经验结果(在模拟和硬件中)证明了我们在大型多代理问题中的优势,并验证了我们算法在学习高质量和异步解决方案方面的有效性。
translated by 谷歌翻译
分散执行的集中培训,其中培训是以集中的离线方式完成的,已成为多智能经纪增强学习中的流行解决方案范例。许多这样的方法采用了基于国家的批评者的演员 - 评论家的形式,因为集中式训练允许访问真正的系统状态,尽管在执行时间没有可用,但在训练期间可以有用。基于国家的评论家已成为一个共同的经验选择,尽管是一个具有有限的理论性理由或分析。在本文中,我们表明,国家基本批评者可以在政策梯度估计中引入偏差,可能会破坏算法的渐近保证。我们还表明,即使国家的批评者没有引入任何偏差,它们仍然可以导致更大的梯度方差,与常见的直觉相反。最后,我们通过比较了在实践中的影响,通过比较不同形式的集中评论家对广泛的共同基准,以及详细的各种环境特性与不同类型批评者的有效性有关。
translated by 谷歌翻译
风险的准确器官(OAR)分割对于减少治疗后并发症的放射治疗至关重要。达人指南推荐头部和颈部(H&N)区域的一套超过40桨的桨,然而,由于这项任务的可预测的禁止劳动力成本,大多数机构通过划定较小的桨子和忽视的少数,选择了大量简化的协议与其他桨相关的剂量分布。在这项工作中,我们提出了一种使用深度学习的新颖,自动化和高效的分层OAR分段(SOARS)系统,精确地描绘了一套全面的42 H&N OAR。 SOARS将42桨分层进入锚,中级和小型和硬质子类别,通过神经结构搜索(NAS)原则,专门为每个类别提供神经网络架构。我们在内在机构中使用176名培训患者建立了SOAR模型,并在六个不同的机构中独立评估了1327名外部患者。对于每个机构评估,它始终如一地表现出其他最先进的方法至少3-5%的骰子得分(在其他度量的相对误差减少36%)。更重要的是,广泛的多用户研究明显证明,98%的SOARE预测只需要非常轻微或没有直接临床验收的修订(节省90%的辐射脑神经工作负载),并且它们的分割和剂量准确度在于或小于帧 - 用户的变化。这些调查结果证实了H&N癌症放射疗法工作流OAR描绘过程的强烈临床适用性,提高了效率,全面性和质量。
translated by 谷歌翻译
政策梯度方法在多智能体增强学习中变得流行,但由于存在环境随机性和探索代理(即非公平性​​),它们遭受了高度的差异,这可能因信用分配难度而受到困扰。结果,需要一种方法,该方法不仅能够有效地解决上述两个问题,而且需要足够强大地解决各种任务。为此,我们提出了一种新的多代理政策梯度方法,称为强大的本地优势(ROLA)演员 - 评论家。 Rola允许每个代理人将个人动作值函数作为当地评论家,以及通过基于集中评论家的新型集中培训方法来改善环境不良。通过使用此本地批评,每个代理都计算基准,以减少对其策略梯度估计的差异,这导致含有其他代理的预期优势动作值,这些选项可以隐式提高信用分配。我们在各种基准测试中评估ROLA,并在许多最先进的多代理政策梯度算法上显示其鲁棒性和有效性。
translated by 谷歌翻译
用于分散执行的集中培训,其中代理商使用集中信息训练,但在线以分散的方式执行,在多智能体增强学习界中获得了普及。特别是,具有集中评论家和分散的演员的演员 - 批评方法是这个想法的常见实例。然而,即使它是许多算法的标准选择,也没有完全讨论和理解使用集中评论批读的影响。因此,我们正式分析集中和分散的批评批评方法,了解对评论家选择的影响。由于我们的理论使得不切实际的假设,我们还经验化地比较了广泛的环境中集中式和分散的批评方法来验证我们的理论并提供实用建议。我们展示了当前文献中集中评论家存在误解,并表明集中式评论家设计并不是严格用的,而是集中和分散的批评者具有不同的利弊,算法设计人员应该考虑到不同的利弊。
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors, which can not be extended to novel domains and classes. To tackle these limitations, we introduce embedding learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve the state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
translated by 谷歌翻译
In this work, we tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images. Mainly, we investigate the question: what would be good road scene-level representations for these two tasks? We contend that a scene-level representation must capture higher-level semantic and geometric representations of traffic scenes around ego-vehicle while performing actions to their destinations. To this end, we introduce the representation of semantic regions, which are areas where ego-vehicles visit while taking an afforded action (e.g., left-turn at 4-way intersections). We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm. Extensive evaluations are conducted on the HDD and nuScenes datasets, and the learned representations lead to state-of-the-art performance for driver intention prediction and risk object identification.
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译