在本文中,我们提出了用于卷积神经网络的可分散的信道稀疏性搜索(DCS)。与需要用户手动设置每个卷积层的紫星比的传统信道修剪算法不同,DCSS自动搜索稀疏的最佳组合。灵感来自可怜的架构搜索(飞镖),我们从连续放松中汲取课程,并利用梯度信息来平衡计算成本和指标。由于直接应用飞镖方案引起形状不匹配和过度的记忆消耗,因此在过滤器内引入一种名为重量共享的新技术。这种技术优雅地消除了具有可忽略额外资源的形状不匹配的问题。我们不仅开展全面的实验,不仅是图像分类,还可以找到包括语义分割和图像超分辨率的粒度任务,以验证DCSS的有效性。与以前的网络修剪方法相比,DCSS实现了图像分类的最先进结果。语义分割和图像超分辨率的实验结果表明,特定于任务特定搜索的性能比转移超薄模型实现了更好的性能,展示了广泛的适用性和高效率的DCSS。
translated by 谷歌翻译
卷积神经网络(CNN)具有一定量的参数冗余,滤波器修剪旨在去除冗余滤波器,并提供在终端设备上应用CNN的可能性。但是,以前的作品更加注重设计了滤波器重要性的评估标准,然后缩短了具有固定修剪率的重要滤波器或固定数量,以减少卷积神经网络的冗余。它不考虑为每层预留有多少筛选器是最合理的选择。从这个角度来看,我们通过搜索适当的过滤器(SNF)来提出新的过滤器修剪方法。 SNF专用于搜索每层的最合理的保留过滤器,然后是具有特定标准的修剪过滤器。它可以根据不同的拖鞋定制最合适的网络结构。通过我们的方法进行过滤器修剪导致CIFAR-10的最先进(SOTA)精度,并在Imagenet ILSVRC-2012上实现了竞争性能。基于Reset-56网络,在Top-中增加了0.14%的增加0.14% 1对CIFAR-10拖出的52.94%的精度为52.94%。在减少68.68%拖鞋时,CiFar-10上的修剪Resnet-110还提高了0.03%的1 0.03%的精度。对于Imagenet,我们将修剪速率设置为52.10%的拖鞋,前1个精度只有0.74%。该代码可以在https://github.com/pk-l/snf上获得。
translated by 谷歌翻译
深度学习技术在各种任务中都表现出了出色的有效性,并且深度学习具有推进多种应用程序(包括在边缘计算中)的潜力,其中将深层模型部署在边缘设备上,以实现即时的数据处理和响应。一个关键的挑战是,虽然深层模型的应用通常会产生大量的内存和计算成本,但Edge设备通常只提供非常有限的存储和计算功能,这些功能可能会在各个设备之间差异很大。这些特征使得难以构建深度学习解决方案,以释放边缘设备的潜力,同时遵守其约束。应对这一挑战的一种有希望的方法是自动化有效的深度学习模型的设计,这些模型轻巧,仅需少量存储,并且仅产生低计算开销。该调查提供了针对边缘计算的深度学习模型设计自动化技术的全面覆盖。它提供了关键指标的概述和比较,这些指标通常用于量化模型在有效性,轻度和计算成本方面的水平。然后,该调查涵盖了深层设计自动化技术的三类最新技术:自动化神经体系结构搜索,自动化模型压缩以及联合自动化设计和压缩。最后,调查涵盖了未来研究的开放问题和方向。
translated by 谷歌翻译
Low-rankness plays an important role in traditional machine learning, but is not so popular in deep learning. Most previous low-rank network compression methods compress the networks by approximating pre-trained models and re-training. However, the optimal solution in the Euclidean space may be quite different from the one in the low-rank manifold. A well-pre-trained model is not a good initialization for the model with low-rank constraints. Thus, the performance of a low-rank compressed network degrades significantly. Compared to other network compression methods such as pruning, low-rank methods attracts less attention in recent years. In this paper, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. First, we propose to alternately perform stochastic gradient descent training and projection onto the low-rank manifold. Compared to re-training on the compact model, this enables full utilization of model capacity since solution space is relaxed back to Euclidean space after projection. Second, the matrix energy (the sum of squares of singular values) reduction caused by projection is compensated by energy transfer. We uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. Third, we propose batch normalization (BN) rectification to cut off its effect on the optimal low-rank approximation of the weight matrix, which further improves the performance. Comprehensive experiments on CIFAR-10 and ImageNet have justified that our method is superior to other low-rank compression methods and also outperforms recent state-of-the-art pruning methods. Our code is available at https://github.com/BZQLin/LRPET.
translated by 谷歌翻译
过滤器修剪方法通过去除选定的过滤器来引入结构稀疏性,因此对于降低复杂性特别有效。先前的作品从验证较小规范的过滤器的角度从经验修剪网络中造成了较小的最终结果贡献。但是,此类标准已被证明对过滤器的分布敏感,并且由于修剪后的容量差距是固定的,因此准确性可能很难恢复。在本文中,我们提出了一种称为渐近软簇修剪(ASCP)的新型过滤器修剪方法,以根据过滤器的相似性来识别网络的冗余。首先通过聚类来区分来自参数过度的网络的每个过滤器,然后重建以手动将冗余引入其中。提出了一些聚类指南,以更好地保留特征提取能力。重建后,允许更新过滤器,以消除错误选择的效果。此外,还采用了各种修剪率的衰减策略来稳定修剪过程并改善最终性能。通过逐渐在每个群集中生成更相同的过滤器,ASCP可以通过通道添加操作将其删除,几乎没有准确性下降。 CIFAR-10和Imagenet数据集的广泛实验表明,与许多最新算法相比,我们的方法可以取得竞争性结果。
translated by 谷歌翻译
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization. * Equal contribution. † Work done while visiting UC Berkeley.
translated by 谷歌翻译
边缘设备上卷积神经网络(CNN)的部署受到性能要求和可用处理能力之间的巨大差距的阻碍。尽管最近的研究在开发网络修剪方法以减少CNN的计算开销方面取得了长足的进步,但仍然存在相当大的准确性损失,尤其是在高修剪比率下。质疑为非封闭网络设计的架构可能对修剪网络没有效,我们建议通过定义新的搜索空间和新颖的搜索目标来搜索架构修剪方法。为了改善修剪网络的概括,我们提出了两个新型的原始孔和prunedlinearaare操作。具体而言,这些操作通过正规化修剪网络的目标函数来缓解不稳定梯度的问题。提出的搜索目标使我们能够培训有关修剪权重元素的体系结构参数。定量分析表明,我们的搜索架构优于在CIFAR-10和Imagenet上最先进的修剪网络中使用的体系结构。就硬件效率而言,PR-DARTS将Mobilenet-V2的准确性从73.44%提高到81.35%(+7.91%提高),并且运行3.87 $ \ times $的速度更快。
translated by 谷歌翻译
可区分架构搜索(飞镖)是基于解决双重优化问题的数据驱动神经网络设计的有效方法。尽管在许多体系结构搜索任务中取得了成功,但仍然担心一阶飞镖的准确性和二阶飞镖的效率。在本文中,我们制定了单个级别的替代方案和放松的体系结构搜索(RARTS)方法,该方法通过数据和网络拆分利用整个数据集在体系结构学习中,而无需涉及相应损失功能(如飞镖)的混合第二个衍生物。在我们制定网络拆分的过程中,两个具有不同但相关权重的网络在寻找共享体系结构时进行了合作。 RART比飞镖的优势通过收敛定理和可解析的模型证明是合理的。此外,RART在准确性和搜索效率方面优于飞镖及其变体,如足够的实验结果所示。对于搜索拓扑结构(即边缘和操作)的任务,RART获得了比CIFAR-10上的二阶Darts更高的精度和60 \%的计算成本降低。转移到Imagenet时,RART继续超越表演飞镖,并且与最近的飞镖变体相提并论,尽管我们的创新纯粹是在训练算法上,而无需修改搜索空间。对于搜索宽度的任务,即卷积层中的频道数量,RARTS还优于传统的网络修剪基准。关于公共体系结构搜索基准等NATS BENCH的进一步实验也支持RARTS的优势。
translated by 谷歌翻译
In this paper, we propose a novel meta learning approach for automatic channel pruning of very deep neural networks. We first train a PruningNet, a kind of meta network, which is able to generate weight parameters for any pruned structure given the target network. We use a simple stochastic structure sampling method for training the PruningNet. Then, we apply an evolutionary procedure to search for good-performing pruned networks. The search is highly efficient because the weights are directly generated by the trained PruningNet and we do not need any finetuning at search time. With a single PruningNet trained for the target network, we can search for various Pruned Networks under different constraints with little human participation. Compared to the state-of-the-art pruning methods, we have demonstrated superior performances on Mo-bileNet V1/V2 and ResNet. Codes are available on https: //github.com/liuzechun/MetaPruning. This work is done when Zechun Liu and Haoyuan Mu are interns at Megvii Technology.
translated by 谷歌翻译
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
translated by 谷歌翻译
卷积神经网络(CNN)已在许多物联网(IoT)设备中应用于多种下游任务。但是,随着边缘设备上的数据量的增加,CNN几乎无法及时完成某些任务,而计算和存储资源有限。最近,过滤器修剪被认为是压缩和加速CNN的有效技术,但是从压缩高维张量的角度来看,现有的方法很少是修剪CNN。在本文中,我们提出了一种新颖的理论,可以在三维张量中找到冗余信息,即量化特征图(QSFM)之间的相似性,并利用该理论来指导滤波器修剪过程。我们在数据集(CIFAR-10,CIFAR-100和ILSVRC-12)上执行QSFM和Edge设备,证明所提出的方法可以在神经网络中找到冗余信息,具有可比的压缩和可耐受的准确性下降。没有任何微调操作,QSFM可以显着压缩CIFAR-56(48.7%的Flops和57.9%的参数),而TOP-1的准确性仅损失0.54%。对于边缘设备的实际应用,QSFM可以将Mobilenet-V2推理速度加速1.53倍,而ILSVRC-12 TOP-1的精度仅损失1.23%。
translated by 谷歌翻译
Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverage reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policy by having higher compression ratio, better preserving the accuracy and freeing human labor. Under 4× FLOPs reduction, we achieved 2.7% better accuracy than the handcrafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved 1.81× speedup of measured inference latency on an Android phone and 1.43× speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy.
translated by 谷歌翻译
近年来,通过开发大型的深层模型,图像修复任务已经见证了绩效的巨大提高。尽管表现出色,但深层模型要求的重量计算限制了图像恢复的应用。为了提高限制,需要减少网络的大小,同时保持准确性。最近,N:M结构化修剪似乎是使模型具有准确性约束的有效且实用的修剪方法之一。但是,它无法解释图像恢复网络不同层的不同计算复杂性和性能要求。为了进一步优化效率和恢复精度之间的权衡,我们提出了一种新型的修剪方法,该方法确定了每一层N:M结构稀疏性的修剪比。关于超分辨率和脱张任务的广泛实验结果证明了我们方法的功效,该方法的表现胜过以前的修剪方法。拟议方法的Pytorch实施将在https://github.com/junghunoh/sls_cvpr2r2022上公开获得。
translated by 谷歌翻译
可扩展的网络已经证明了它们在处理灾难性遗忘问题方面的优势。考虑到不同的任务可能需要不同的结构,最近的方法设计了通过复杂技能适应不同任务的动态结构。他们的例程是首先搜索可扩展的结构,然后训练新任务,但是,这将任务分为多个培训阶段,从而导致次优或过度计算成本。在本文中,我们提出了一个名为E2-AEN的端到端可训练的可自适应扩展网络,该网络动态生成了新任务的轻量级结构,而没有任何精确的先前任务下降。具体而言,该网络包含一个功能强大的功能适配器的序列,用于扩大以前学习的表示新任务的表示形式,并避免任务干扰。这些适配器是通过基于自适应门的修剪策略来控制的,该策略决定是否可以修剪扩展的结构,从而根据新任务的复杂性动态地改变网络结构。此外,我们引入了一种新颖的稀疏激活正则化,以鼓励模型学习具有有限参数的区分特征。 E2-aen可以降低成本,并且可以以端到端的方式建立在任何饲喂前架构上。关于分类(即CIFAR和VDD)和检测(即可可,VOC和ICCV2021 SSLAD挑战)的广泛实验证明了提出的方法的有效性,从而实现了新的出色结果。
translated by 谷歌翻译
网络压缩对于使深网的效率更高,更快且可推广到低端硬件至关重要。当前的网络压缩方法有两个开放问题:首先,缺乏理论框架来估计最大压缩率;其次,有些层可能会过多地进行,从而导致网络性能大幅下降。为了解决这两个问题,这项研究提出了一种基于梯度矩阵分析方法,以估计最大网络冗余。在最大速率的指导下,开发了一种新颖而有效的层次网络修剪算法,以最大程度地凝结神经元网络结构而无需牺牲网络性能。进行实质性实验以证明新方法修剪几个高级卷积神经网络(CNN)体系结构的功效。与现有的修剪方法相比,拟议的修剪算法实现了最先进的性能。与其他方法相比,在相同或相似的压缩比下,新方法提供了最高的网络预测准确性。
translated by 谷歌翻译
Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight Ghost-Net can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative of convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https: //github.com/huawei-noah/ghostnet.
translated by 谷歌翻译
We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries. We formally establish filter pruning as an optimization problem, and reveal that we need to prune filters based on statistics information computed from its next layer, not the current layer, which differentiates ThiNet from existing methods. Experimental results demonstrate the effectiveness of this strategy, which has advanced the state-of-the-art. We also show the performance of ThiNet on ILSVRC-12 benchmark. ThiNet achieves 3.31× FLOPs reduction and 16.63× compression on VGG-16, with only 0.52% top-5 accuracy drop. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can also reduce more than half of the parameters and FLOPs, at the cost of roughly 1% top-5 accuracy drop. Moreover, the original VGG-16 model can be further pruned into a very small model with only 5.05MB model size, preserving AlexNet level accuracy but showing much stronger generalization ability.
translated by 谷歌翻译
现有的可区分通道修剪方法通常将缩放因子或掩模在通道后面的掩盖范围内,以减少重要性的修剪过滤器,并假设输入样品统一贡献以过滤重要性。具体而言,实例复杂性对修剪性能的影响尚未得到充分研究。在本文中,我们提出了一个基于实例复杂性滤波器的重要性得分的简单而有效的可区分网络修剪方法上限。我们通过给硬样品给出更高的权重来定义每个样品的实例复杂性与重量相关的重量,并测量样品特异性软膜的加权总和,以模拟不同输入的非均匀贡献,这鼓励硬样品主导修剪过程和模型性能保存完好。此外,我们还引入了一个新的正规器,以鼓励面具两极分化,以便很容易找到甜蜜的位置以识别要修剪的过滤器。各种网络体系结构和数据集的性能评估表明,CAP在修剪大型网络方面具有优势。例如,CAP在删除65.64%的拖鞋后,CAP在CIFAR-10数据集上的RESNET56的准确性提高了0.33%,而Prunes在ImagEnet数据集上的RESNET50的PRUNES 87.75%,只有0.89%的TOP-1精度损失。
translated by 谷歌翻译
While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensure a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday's applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
translated by 谷歌翻译
混合精确的深神经网络达到了硬件部署所需的能源效率和吞吐量,尤其是在资源有限的情况下,而无需牺牲准确性。但是,不容易找到保留精度的最佳每层钻头精度,尤其是在创建巨大搜索空间的大量模型,数据集和量化技术中。为了解决这一困难,最近出现了一系列文献,并且已经提出了一些实现有希望的准确性结果的框架。在本文中,我们首先总结了文献中通常使用的量化技术。然后,我们对混合精液框架进行了彻底的调查,该调查是根据其优化技术进行分类的,例如增强学习和量化技术,例如确定性舍入。此外,讨论了每个框架的优势和缺点,我们在其中呈现并列。我们最终为未来的混合精液框架提供了指南。
translated by 谷歌翻译