Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxiliary heads for each client, greatly improving training efficiency in the case of heterogeneous data. This approach allows individual models to preserve and enhance performance on their private tasks while also dramatically improving their performance on the global aggregated data distribution. We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve their performance compared to learning in isolation.
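As a rough illustration of the distillation round described above, here is a minimal PyTorch sketch. It is not the authors' implementation: the neighbour-averaged soft labels, the KL distillation loss, and the omission of the auxiliary heads are all assumptions, and names such as `distill_round` are hypothetical.

```python
import torch
import torch.nn.functional as F

def distill_round(models, optimizers, public_loader, graph, temperature=2.0):
    """One communication round: each agent matches the averaged soft labels of its
    graph neighbours on the unlabeled public dataset (no data or weights shared).
    `graph[i]` lists the neighbours of agent i; the loader yields unlabeled batches."""
    for x in public_loader:
        with torch.no_grad():  # neighbours act as teachers on the public batch
            soft = [F.softmax(m(x) / temperature, dim=-1) for m in models]
        for i, (model, opt) in enumerate(zip(models, optimizers)):
            target = torch.stack([soft[j] for j in graph[i]]).mean(dim=0)
            log_p = F.log_softmax(model(x) / temperature, dim=-1)
            loss = F.kl_div(log_p, target, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
```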
FM synthesis is a well-known algorithm used to generate complex timbres from a compact set of design primitives. Because it typically exposes a MIDI interface, it is usually impractical to control from an audio source. Differentiable digital signal processing (DDSP), on the other hand, has enabled nuanced audio rendering by deep neural networks (DNNs) that learn to control differentiable synthesis layers from arbitrary sound inputs. The training process relies on a corpus of audio for supervision and on spectral reconstruction loss functions. While such losses are well suited to matching spectral amplitudes, they lack pitch direction, which can hinder the joint optimization of FM synthesizer parameters. In this paper we take steps towards continuous control of a well-established FM synthesis architecture from an audio input. First, we discuss a set of design constraints that ease the spectral optimization of a differentiable FM synthesizer via a standard reconstruction loss. Next, we present the Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds in terms of a compact set of parameters. We train the model on instrument samples extracted from the URMP dataset and quantitatively demonstrate comparable audio quality against selected benchmarks.
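For readers unfamiliar with the underlying primitive, the sketch below shows basic two-operator FM synthesis in NumPy; a DX7 patch chains six such operators. This is only an illustration of the signal model, not the DDX7 architecture itself.

```python
import numpy as np

def fm_tone(f0, mod_ratio, mod_index, dur=1.0, sr=16000):
    """Two-operator FM: a modulator at f0*mod_ratio phase-modulates a carrier at f0;
    mod_index controls how much spectral richness (brightness) is produced."""
    t = np.arange(int(dur * sr)) / sr
    modulator = np.sin(2 * np.pi * f0 * mod_ratio * t)
    return np.sin(2 * np.pi * f0 * t + mod_index * modulator)

# e.g. a bright, bell-like 440 Hz tone
audio = fm_tone(440.0, mod_ratio=3.5, mod_index=4.0)
```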
Loss-based attribution methods are used to interpret the decision process of deep learning models. In this work, we evaluate loss-based attribution methods by occluding part of the input and comparing the performance on the occluded input with that on the original input. We observe that, under certain conditions, the occluded input performs better on the test dataset than the original input. Similar behavior is observed in both sound and image recognition tasks. We explore different loss-based attribution methods, occlusion levels, and replacement values to explain this phenomenon of improved performance under occlusion.
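A minimal sketch of the evaluation procedure described above, assuming the attribution method yields a per-element score for a single input; the function names and the top-k selection rule are assumptions for illustration.

```python
import torch

def occlude_top_k(x, attribution, k, replace_value=0.0):
    """Replace the k input elements with the highest attribution scores, so the
    occluded input can be re-scored and compared with the original input.
    Operates on a single (unbatched) input tensor."""
    idx = torch.topk(attribution.flatten(), k).indices
    x_occ = x.clone().flatten()
    x_occ[idx] = replace_value
    return x_occ.view_as(x)

def accuracy(model, inputs, labels):
    """Top-1 accuracy, used to compare occluded vs. original performance."""
    with torch.no_grad():
        return (model(inputs).argmax(dim=-1) == labels).float().mean().item()
```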
在这项工作中,我们提出了一种超大形态器,一种基于变压器的模型,用于几次学习,直接从支持样品产生卷积神经网络(CNN)的权重。由于小生成的CNN模型对特定任务的依赖性由高容量变压器模型编码,因此我们有效地将大型任务空间的复杂性与各个任务的复杂性分离。我们的方法对于小目标CNN架构特别有效,其中学习固定的通用任务无关的嵌入不是最佳的,并且在关于任务的信息可以调制所有模型参数时实现更好的性能。对于较大的模型,我们发现单独生成最后一层允许我们产生比使用最先进的方法获得的竞争或更好的结果,同时端到端可分辨率。最后,我们将我们的方法扩展到一个半监督的政权,利用支持集中的未标记样本,进一步提高少量射击性能。
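A toy sketch of the "generate the last layer from the support set" idea, not the paper's architecture: the token construction, pooling rule, and class names here are assumptions, and it presumes every class appears in the support set.

```python
import torch
import torch.nn as nn

class LastLayerGenerator(nn.Module):
    """Toy weight generator: a transformer encoder reads support-sample embeddings
    (plus label embeddings) and emits the weight matrix of the target CNN's final
    classification layer. embed_dim must be divisible by nhead."""
    def __init__(self, embed_dim, n_classes, feat_dim):
        super().__init__()
        self.label_embed = nn.Embedding(n_classes, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_weights = nn.Linear(embed_dim, feat_dim)
        self.n_classes = n_classes

    def forward(self, support_feats, support_labels):
        # support_feats: (S, embed_dim), support_labels: (S,) ints
        tokens = support_feats + self.label_embed(support_labels)
        encoded = self.encoder(tokens.unsqueeze(0)).squeeze(0)  # contextualise support set
        # one weight row per class: pool the encoded tokens of that class
        rows = [self.to_weights(encoded[support_labels == c].mean(0))
                for c in range(self.n_classes)]
        return torch.stack(rows)  # (n_classes, feat_dim), used as F.linear(query_feats, W)
```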
We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art. Through this process we create two new MobileNet models for release: MobileNetV3-Large and MobileNetV3-Small, which are targeted at high- and low-resource use cases. These models are then adapted and applied to the tasks of object detection and semantic segmentation. For the task of semantic segmentation (or any dense pixel prediction), we propose a new efficient segmentation decoder, Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). We achieve new state-of-the-art results for mobile classification, detection and segmentation. MobileNetV3-Large is 3.2% more accurate on ImageNet classification while reducing latency by 20% compared to MobileNetV2. MobileNetV3-Small is 6.6% more accurate compared to a MobileNetV2 model with comparable latency. MobileNetV3-Large detection is over 25% faster at roughly the same accuracy as MobileNetV2 on COCO detection. MobileNetV3-Large LR-ASPP is 34% faster than MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation.
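The abstract only names the "novel architecture advances"; one concrete component of the published MobileNetV3 design is the hard-swish activation, sketched below as a generic illustration rather than any particular implementation.

```python
import torch
import torch.nn.functional as F

def hard_swish(x):
    """h-swish(x) = x * ReLU6(x + 3) / 6: a piecewise-linear stand-in for the swish
    activation that is cheap to compute and quantization-friendly on mobile CPUs."""
    return x * F.relu6(x + 3.0) / 6.0
```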
Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8× faster than MobileNetV2 [29] with 0.5% higher accuracy and 2.3× faster than NASNet [36] with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet.
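The sketch below shows one way a measured latency can be folded into the search objective as a soft accuracy/latency trade-off; the exact reward form, target, and exponent here are assumptions for illustration, not a claim about the paper's exact settings.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms=75.0, w=-0.07):
    """Soft multi-objective reward for architecture search: accuracy scaled by a
    latency penalty. With w < 0, models slower than the target are discounted and
    models faster than it are mildly rewarded, so the search trades the two off."""
    return accuracy * (latency_ms / target_ms) ** w
```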
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and the number of operations measured by multiply-adds (MAdd), as well as actual latency and the number of parameters.
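A minimal PyTorch sketch of the inverted residual block with a linear bottleneck described above. Expansion factor, ReLU6 activations, and batch-norm placement follow the common open-source pattern and should be read as an illustrative configuration rather than the paper's exact one.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Inverted residual: expand (1x1), filter (3x3 depthwise), project back (1x1,
    no activation), with the skip connection between the thin bottleneck ends."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),  # linear bottleneck: no non-linearity on the narrow output
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out
```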
Features learned from a single radiological image cannot provide information about whether, and by how much, a lesion may change over time. Time-dependent features computed from repeated images can capture these changes and identify malignant lesions through their temporal behavior. However, longitudinal medical imaging presents the unique challenge of sparse, irregular time intervals in data acquisition. While self-attention has been shown to be a versatile and efficient learning mechanism for time series and natural images, its potential for interpreting temporal distance between sparse, irregularly sampled spatial features has not yet been explored. In this work, we propose two interpretations of a time-distance vision transformer (ViT), using (1) vector embeddings of continuous time and (2) a temporal emphasis model to scale the self-attention weights. The two algorithms are evaluated on benign-versus-malignant lung cancer discrimination of synthetic pulmonary nodules and on lung screening computed tomography studies from the National Lung Screening Trial (NLST). Experiments evaluating the time-distance ViT on synthetic nodules show a fundamental improvement in classifying irregularly sampled longitudinal images compared with a standard ViT. In cross-validation on screening chest CTs from the NLST, our methods (0.785 and 0.786 AUC, respectively) significantly outperform a cross-sectional approach (0.734 AUC) and match the discriminative performance of the leading longitudinal medical imaging algorithm (0.779 AUC) on benign-versus-malignant classification. This work represents the first self-attention-based framework for classifying longitudinal medical images. Our code is available at https://github.com/tom1193/time-distance-transformer.
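A sketch of the second idea, scaling self-attention weights by temporal distance. The exponential-decay emphasis and the renormalization below are assumed functional forms chosen for illustration; the paper's exact temporal emphasis model may differ.

```python
import torch
import torch.nn.functional as F

def time_emphasized_attention(q, k, v, times, alpha=0.1):
    """Self-attention over tokens from different scan dates, with attention weights
    down-scaled by the continuous time distance between scans.
    q, k, v: (T, d) token projections; times: (T,) acquisition times (e.g. days)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5                 # content-based scores
    dist = (times.unsqueeze(-1) - times.unsqueeze(-2)).abs()    # pairwise time distances
    emphasis = torch.exp(-alpha * dist)                         # assumed decay form
    weights = F.softmax(scores, dim=-1) * emphasis
    weights = weights / weights.sum(dim=-1, keepdim=True)       # renormalize rows
    return weights @ v
```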
We investigate boosted ensemble models for off-policy learning from logged bandit feedback. To this end, we propose a new boosting algorithm that directly optimizes an estimate of the policy's expected reward. We analyze the algorithm and prove that, provided a "weak" learning condition is satisfied, the empirical risk decreases (possibly exponentially fast) with each round of boosting. We further show how the base learner reduces to standard supervised learning problems. Experiments show that our algorithm can outperform deep off-policy learning and methods that simply regress on the observed rewards, demonstrating the benefits of both boosting and choosing the right learning objective.
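To make the setting concrete, here is a rough functional-gradient-boosting sketch that maximizes an inverse-propensity-scored (IPS) estimate of the policy's expected reward with regression trees as base learners. This is an assumed instantiation for illustration only, not the paper's algorithm or analysis.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def softmax(scores):
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def boost_policy(X, actions, rewards, propensities, n_actions, rounds=50, lr=0.1):
    """Boost a softmax policy toward a larger IPS estimate of expected reward:
        V_hat = mean_i r_i * pi(a_i | x_i) / p_i
    Each round fits one regression tree per action to the gradient of V_hat with
    respect to the per-example scores and adds it to the ensemble (gradient ascent)."""
    n = len(X)
    scores = np.zeros((n, n_actions))
    ensemble = []
    for _ in range(rounds):
        pi = softmax(scores)
        ips = rewards * pi[np.arange(n), actions] / propensities   # per-example IPS term
        # dV_hat/dscore[i, k]  ∝  ips_i * (1{k == a_i} - pi[i, k])
        grad = -pi * ips[:, None]
        grad[np.arange(n), actions] += ips
        trees = []
        for k in range(n_actions):          # base learner = supervised regression
            t = DecisionTreeRegressor(max_depth=3).fit(X, grad[:, k])
            scores[:, k] += lr * t.predict(X)
            trees.append(t)
        ensemble.append(trees)
    return ensemble
```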
Field-of-view (FOV) tissue truncation beyond the lungs is common in routine lung screening computed tomography (CT). This poses a limitation for opportunistic CT-based body composition (BC) assessment, since key anatomical structures are missing. Traditionally, extending the FOV of CT has been treated as a CT reconstruction problem using limited data. However, this approach relies on projection-domain data that may not be available in application. In this work, we formulate the problem from a semantic image extension perspective, which requires only image data as input. The proposed two-stage method identifies a new FOV border based on the estimated extent of the complete body and imputes the missing tissue in the truncated region. Training samples are simulated using CT slices whose complete body lies within the FOV, making the model development self-supervised. We evaluate the effectiveness of the proposed method for automatic BC assessment on lung screening CT with limited FOV. The proposed method effectively restores the missing tissue and reduces the BC assessment error introduced by FOV tissue truncation. In BC assessment on a large-scale lung screening CT dataset, this correction improves both intra-subject consistency and the correlation with anthropometric approximations. The developed method is available at https://github.com/masilab/s-efov.
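A small sketch of how the self-supervised training pairs described above could be simulated from complete-body slices; the circular FOV geometry, the radius fraction, and the air-like padding value (-1000 HU) are assumptions for illustration, not the method's exact simulation pipeline.

```python
import numpy as np

def simulate_fov_truncation(ct_slice, radius_frac=0.8, pad_value=-1000.0):
    """Create a self-supervised training pair by cropping a complete-body CT slice
    to a smaller circular field of view: pixels outside the simulated FOV are set
    to an air-like HU value and must be imputed by the extension model."""
    h, w = ct_slice.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = radius_frac * min(h, w) / 2.0
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    truncated = np.where(mask, ct_slice, pad_value)
    return truncated, mask   # (input with simulated truncation, valid-FOV mask)
```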