Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained quantized INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
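To make the constraints concrete, here is a minimal sketch, in TensorFlow, of the kind of NPU-friendly 3X upscaling network and full-integer INT8 conversion the challenge calls for; the architecture and layer widths are illustrative assumptions, not any participant's submission, and `rep_dataset` stands in for real DIV2K calibration patches.

```python
# A minimal sketch (not any participant's actual model) of an NPU-friendly
# 3x super-resolution network with post-training INT8 quantization.
# Assumes TensorFlow 2.x; layer widths and patch size are illustrative only.
import tensorflow as tf

def build_sr_model(scale=3, channels=16):
    inp = tf.keras.Input(shape=(64, 64, 3))
    x = tf.keras.layers.Conv2D(channels, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    # Predict scale^2 * 3 channels, then rearrange into a (3*H, 3*W) image.
    x = tf.keras.layers.Conv2D(3 * scale * scale, 3, padding="same")(x)
    out = tf.nn.depth_to_space(x, scale)
    return tf.keras.Model(inp, out)

model = build_sr_model()

# Post-training full-integer quantization; `rep_dataset` should yield
# low-resolution DIV2K patches as float32 tensors (random stand-ins here).
def rep_dataset():
    for _ in range(100):
        yield [tf.random.uniform((1, 64, 64, 3))]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_int8 = converter.convert()
```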
Federated learning is a widely adopted method for training neural networks over distributed data. One main limitation is the performance degradation that occurs when data is heterogeneously distributed. While many works have attempted to address this problem, these methods have been driven by the nature of the data rather than by an understanding of the neural network itself. In this work, we verify that only certain important layers in a neural network require regularization for effective training. We also verify that Centered Kernel Alignment (CKA) most accurately computes the similarity between the layers of neural networks trained on different data. By applying CKA-based regularization to the important layers during training, we significantly improve performance in heterogeneous settings. We present FedCKA: a simple framework that outperforms previous state-of-the-art methods on various deep learning tasks while also improving efficiency and scalability.
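For reference, linear CKA between the activations of two networks on the same inputs can be computed as follows; the `cka_penalty` regularizer is a hedged guess at the shape of the CKA-based term described above, not FedCKA's exact formulation.

```python
# Linear Centered Kernel Alignment (CKA) between one layer's activations
# from two models, plus a hypothetical CKA-based penalty of the kind the
# abstract describes (the penalty form is an assumption, not the paper's).
import numpy as np

def linear_cka(X, Y):
    """X, Y: (n_samples, n_features) activation matrices for the same inputs."""
    X = X - X.mean(axis=0, keepdims=True)  # center features
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

# Hypothetical regularizer: keep an "important" local layer similar (in CKA)
# to the corresponding layer of the global model received from the server.
def cka_penalty(local_acts, global_acts, weight=1.0):
    return weight * (1.0 - linear_cka(local_acts, global_acts))
```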
Open-set recognition (OSR) assumes that unknown instances appear out of the blue at inference time. The main challenge of OSR is that a model's response to unknowns is completely unpredictable. Furthermore, the diversity of the open set makes the situation harder, since instances come at different difficulty levels. We therefore propose a novel framework, the Difficulty-Aware Simulator (DIAS), which generates fakes at diverse difficulty levels to simulate the real world. We first study fakes from a generative adversarial network (GAN) from the classifier's perspective and observe that these fakes are not challenging. This leads us to define the criterion for difficulty by treating GAN-generated samples as being of moderate difficulty. To produce hard-difficulty examples, we introduce an imitator that mimics the classifier's behavior. Our modified GAN and the imitator then also produce moderate- and easy-difficulty samples, respectively. As a result, DIAS outperforms state-of-the-art methods on both the AUROC and F-score metrics. Our code is available at https://github.com/wjun0830/difficulty-aware-simulator.
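One plausible reading of the imitator is a network distilled to match the frozen classifier's outputs; the following sketch shows only that distillation step, with PyTorch assumed and all names hypothetical, since DIAS's actual objective may differ.

```python
# A hedged sketch of training an imitator network to mimic a frozen
# classifier's behavior, in the spirit of the abstract; DIAS's actual
# training objective and architecture may differ.
import torch
import torch.nn.functional as F

def imitator_step(imitator, classifier, images, optimizer, temperature=1.0):
    classifier.eval()
    with torch.no_grad():
        teacher_logits = classifier(images)
    student_logits = imitator(images)
    # KL divergence between softened distributions (standard distillation).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```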
We focus on the problem of adversarial attacks against models in a black-box setting, where the attacker aims to craft adversarial examples with limited query access to the victim model. Existing black-box attacks are mostly based on greedy algorithms that perturb pre-computed key positions, which severely restricts the search space and may lead to suboptimal solutions. To this end, we propose a query-efficient black-box attack using Bayesian optimization, which dynamically computes important positions using an automatic relevance determination (ARD) categorical kernel. We introduce block decomposition and history subsampling techniques to improve the scalability of Bayesian optimization when the input sequence is long. Moreover, we develop a post-optimization algorithm that finds adversarial examples with smaller perturbation sizes. Experiments on natural language and protein classification tasks demonstrate that, compared to previous state-of-the-art methods, our method consistently achieves higher attack success rates with significant reductions in query count and modification rate.
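A minimal sketch of such a Bayesian-optimization loop over perturbation positions might look as follows, using scikit-learn's Gaussian process with a per-position (ARD) RBF length-scale as a simplified stand-in for the paper's ARD categorical kernel; `attack_loss` is a dummy placeholder for victim-model queries.

```python
# A simplified Bayesian-optimization loop over binary perturbation masks.
# Assumes scikit-learn and SciPy; the real method uses an ARD categorical
# kernel plus block decomposition and history subsampling, omitted here.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def attack_loss(mask):
    """Dummy stand-in for querying the victim model with the positions
    selected by `mask` perturbed; replace with real queries. Lower is better."""
    return float(mask.sum())

def expected_improvement(gp, candidates, best_y):
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best_y - mu) / sigma  # minimization form
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_attack(seq_len, n_init=10, n_iter=50, n_cand=256, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_init, seq_len)).astype(float)  # binary masks
    y = np.array([attack_loss(x) for x in X])
    kernel = RBF(length_scale=np.ones(seq_len))  # ARD: one scale per position
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
        cand = rng.integers(0, 2, size=(n_cand, seq_len)).astype(float)
        x_next = cand[np.argmax(expected_improvement(gp, cand, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, attack_loss(x_next))
    return X[np.argmin(y)]  # best mask found under the query budget
```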
Modern consumer electronic devices have adopted deep-learning-based intelligence services for their key features. Vendors have recently begun to execute intelligence services on devices to keep personal data on the device and to reduce network and cloud costs. We identify this trend as an opportunity to update neural networks with user data without exposing the data outside the device: on-device training. For example, we may add a new class, my dog Alpha, for a robot vacuum cleaner, adapt speech recognition to a user's accent, or let text-to-speech speak as if the user were speaking. However, the resource constraints of target devices pose significant difficulties. We propose NNTrainer, a lightweight on-device training framework. We describe the neural network optimization techniques implemented by NNTrainer and evaluate them alongside conventional approaches. The evaluations show that NNTrainer can reduce memory consumption down to 1/28 without degrading accuracy or training time, and that it effectively personalizes applications on devices. NNTrainer is cross-platform and practical open-source software, currently being deployed to millions of devices by the authors' affiliation.
Deep neural networks have become the driving force of modern image recognition systems. However, the vulnerability of neural networks to adversarial attacks poses a serious threat to the people affected by these systems. In this paper, we focus on a real-world threat model in which a man-in-the-middle adversary maliciously intercepts and perturbs images that web users upload online. This type of attack can raise severe ethical concerns on top of simple performance degradation. To prevent this attack, we devise a novel bi-level optimization algorithm that finds points in the vicinity of natural images that are robust to adversarial perturbations. Experiments on CIFAR-10 and ImageNet show that our method can effectively robustify natural images within a given modification budget. We also show that the proposed method can improve robustness further when used jointly with randomized smoothing.
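The bi-level structure can be sketched as an inner PGD attack nested inside an outer update of the robustified point; the following PyTorch sketch is a hedged approximation of that idea, with illustrative step sizes and budgets rather than the paper's exact algorithm.

```python
# A hedged sketch of the bi-level idea: find a point x_r near a natural image
# x that remains correctly classified even under a worst-case perturbation.
import torch
import torch.nn.functional as F

def inner_attack(model, x, y, eps=8/255, steps=10, alpha=2/255):
    """Inner problem: PGD attack finding the worst-case perturbation of x."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

def robustify(model, x, y, budget=8/255, outer_steps=20, lr=2/255):
    """Outer problem: move x within `budget` to minimize the worst-case loss."""
    x_r = x.clone()
    for _ in range(outer_steps):
        delta = inner_attack(model, x_r, y)
        x_adv = (x_r + delta).requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Gradient step on the robustified point, projected to the budget ball.
        x_r = (x_r - lr * grad.sign()).clamp(x - budget, x + budget)
        x_r = x_r.clamp(0, 1).detach()
    return x_r
```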
In the robotics and computer vision communities, extensive studies have been conducted on surveillance tasks, including human detection, tracking, and motion recognition with a camera. Deep learning algorithms are widely utilized in these tasks, as in other computer vision tasks. However, existing public datasets are insufficient for developing learning-based methods that handle surveillance in outdoor and extreme situations such as harsh weather and low-illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named the eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), containing more than 500,000 image pairs and first-person view data annotated by well-trained annotators. Each pair contains multi-modal data (e.g., an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person-view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics and describe methods of exploiting our dataset with deep-learning-based algorithms. The latest information on the dataset and our study is available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.
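As a hedged illustration of how such multi-modal pairs might be consumed, the following PyTorch `Dataset` sketch assumes a hypothetical file layout (per-modality folders of PNGs plus LiDAR `.npy` arrays); consult https://github.com/lge-robot-navi for the actual format.

```python
# A hypothetical loader sketch for multi-modal pairs like those in X-MAS.
# The directory layout and field names are assumptions for illustration only.
import os
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class XMASPairs(Dataset):
    MODALITIES = ("rgb", "ir", "thermal", "depth")

    def __init__(self, root, ids):
        self.root, self.ids = root, ids

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        sample = {}
        for m in self.MODALITIES:
            path = os.path.join(self.root, m, f"{self.ids[i]}.png")  # assumed layout
            sample[m] = np.asarray(Image.open(path))
        # LiDAR scans stored as Nx4 arrays (x, y, z, intensity) -- assumed.
        sample["lidar"] = np.load(os.path.join(self.root, "lidar", f"{self.ids[i]}.npy"))
        return sample
```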
Training agents via off-policy deep reinforcement learning (RL) requires a large memory, called replay memory, that stores past experiences used for learning. These experiences are sampled, uniformly or non-uniformly, to create the batches used for training. When calculating the loss function, off-policy algorithms assume that all samples are of the same importance. In this paper, we hypothesize that training can be enhanced by assigning a different importance to each experience based on its temporal-difference (TD) error, directly in the training objective. We propose a novel method that introduces a weighting factor for each experience when calculating the loss function at the learning stage. In addition to improving convergence speed when used with uniform sampling, the method can be combined with prioritization methods for non-uniform sampling. Combining the proposed method with prioritization methods improves sampling efficiency while increasing the performance of TD-based off-policy RL algorithms. The effectiveness of the proposed method is demonstrated by experiments in six environments of the OpenAI Gym suite. The experimental results show that the proposed method achieves a 33%~76% reduction in convergence time in three environments, and an 11% increase in returns together with a 3%~10% increase in success rate in the other three environments.
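The core idea can be sketched as a TD-error-dependent weight applied inside a DQN-style loss; the weighting function below (|TD|^alpha, normalized per batch) is an illustrative assumption rather than the paper's exact choice.

```python
# A minimal sketch of weighting each sampled transition by its TD error
# directly in the loss, as the abstract describes. PyTorch assumed; the
# weighting function is an illustrative assumption.
import torch
import torch.nn.functional as F

def weighted_td_loss(q_net, target_net, batch, gamma=0.99, alpha=0.6):
    s, a, r, s_next, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
        td_error = (target - q).abs()
        # Larger TD error -> larger weight; normalize so weights average to 1.
        w = td_error.pow(alpha)
        w = w / (w.mean() + 1e-8)
    return (w * F.smooth_l1_loss(q, target, reduction="none")).mean()
```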
A fundamental challenge to providing edge-AI services is the need for a machine learning (ML) model that achieves personalization (i.e., to individual clients) and generalization (i.e., to unseen data) properties concurrently. Existing techniques in federated learning (FL) have encountered a steep tradeoff between these objectives and impose large computational requirements on edge devices during training and inference. In this paper, we propose SplitGP, a new split learning solution that can simultaneously capture generalization and personalization capabilities for efficient inference across resource-constrained clients (e.g., mobile/IoT devices). Our key idea is to split the full ML model into client-side and server-side components and assign them different roles: the client-side model is trained to have strong personalization capability optimized to each client's main task, while the server-side model is trained to have strong generalization capability for handling all clients' out-of-distribution tasks. We analytically characterize the convergence behavior of SplitGP, revealing that all client models approach stationary points asymptotically. Further, we analyze the inference time in SplitGP and provide bounds for determining model split ratios. Experimental results show that SplitGP outperforms existing baselines by wide margins in inference time and test accuracy for varying amounts of out-of-distribution samples.
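The split inference path can be sketched as confidence-based routing between the personalized client-side head and the generalized server-side model; the threshold rule below is an illustrative assumption, not necessarily SplitGP's exact criterion.

```python
# A hedged sketch of split inference: the client-side model serves its
# personalized main task, and uncertain (likely out-of-distribution) inputs
# are forwarded to the server-side model. Assumes a batch of size one.
import torch

def split_inference(client_body, client_head, server_model, x, tau=0.8):
    feats = client_body(x)                    # shared client-side features
    local_probs = torch.softmax(client_head(feats), dim=1)
    conf, pred = local_probs.max(dim=1)
    if conf.item() >= tau:
        return pred.item(), "client"          # personalized path, no uplink
    # Low confidence: send intermediate features to the server-side model,
    # trained for generalization across all clients' tasks.
    server_probs = torch.softmax(server_model(feats), dim=1)
    return server_probs.argmax(dim=1).item(), "server"
```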
Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to plug-and-play control the generation process for various tasks without fine-tuning the diffusion model. However, the direct use of publicly available off-the-shelf models for guidance fails due to their poor performance on noisy inputs. For that, the existing practice is to fine-tune the guidance models with labeled data corrupted with noise. In this paper, we argue that this practice has limitations in two aspects: (1) performing well on inputs with widely varying noise levels is too hard for a single model; (2) collecting labeled datasets hinders scaling up for various tasks. To tackle these limitations, we propose a novel strategy that leverages multiple experts, where each expert specializes in a particular noise range and guides the reverse process at its corresponding timesteps. However, since managing multiple networks and utilizing labeled data is impractical, we present a practical guidance framework termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient fine-tuning and data-free knowledge transfer. We conduct exhaustive ImageNet class-conditional generation experiments to show that our method can successfully guide diffusion with few trainable parameters and no labeled data. Finally, we show that image classifiers, depth estimators, and semantic segmentation models can guide publicly available GLIDE through our framework in a plug-and-play manner.
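The expert-routing step can be sketched as picking the expert whose noise range covers the current timestep and applying a classifier-guidance-style correction; all names and boundaries below are illustrative, and PPAP's full framework additionally involves parameter-efficient fine-tuning and data-free knowledge transfer.

```python
# A hedged sketch of per-noise-range expert guidance in the reverse process.
# `experts` must have len(boundaries) + 1 entries; experts[i] covers
# timesteps below boundaries[i], and the last expert covers the rest.
import bisect
import torch

def guided_score(x_t, t, base_score_fn, experts, boundaries, target, scale=1.0):
    expert = experts[bisect.bisect_right(boundaries, t)]
    x_t = x_t.detach().requires_grad_(True)
    # Log-probability of the desired class under the expert at noise level t.
    log_p = expert(x_t, t).log_softmax(dim=1)[:, target].sum()
    grad, = torch.autograd.grad(log_p, x_t)
    # Classifier-guidance-style update: base score plus scaled condition gradient.
    return base_score_fn(x_t.detach(), t) + scale * grad
```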