动态模型修剪是最近的方向,其允许不同的子网络中的部署过程中每个输入采样的推断。然而,当前的动态方法依赖于学习的连续通道通过诱导稀疏性损失通过正则化门控。这一提法介绍了平衡不同损失的复杂性(如任务的损失,正规化损失)。此外,基于正则化方法缺乏透明的折衷选择超参数,实现计算的预算。我们的贡献是双重的:1)分离任务和修剪培训。 2)简单的超参数选择,使训练前FLOPS减少估计。在神经科学的赫布理论的启发:“神经元一起火一起丝”,我们提出来预测基于其上一层的活化层口罩方法K过滤器。我们提出的问题,因为自监督二元分类问题。每个掩模预测模块被训练以预测,如果对数似然在当前层中的每个过滤器属于前k激活的过滤器。值k被动态地估计基于使用热图的质量的新颖标准每个输入。我们发现在几个神经结构,如VGG,RESNET和MobileNet上CIFAR和ImageNet数据集实验。在CIFAR,我们得出了类似的精度SOTA方法有15%和24%以上FLOPS减少。同样,在ImageNet,我们达到的精度低下降高达13%的改善FLOPS减少。
translated by 谷歌翻译
由于稀疏神经网络通常包含许多零权重,因此可以在不降低网络性能的情况下潜在地消除这些不必要的网络连接。因此,设计良好的稀疏神经网络具有显着降低拖鞋和计算资源的潜力。在这项工作中,我们提出了一种新的自动修剪方法 - 稀疏连接学习(SCL)。具体地,重量被重新参数化为可培训权重变量和二进制掩模的元素方向乘法。因此,由二进制掩模完全描述网络连接,其由单位步进函数调制。理论上,从理论上证明了使用直通估计器(STE)进行网络修剪的基本原理。这一原则是STE的代理梯度应该是积极的,确保掩模变量在其最小值处收敛。在找到泄漏的Relu后,SoftPlus和Identity Stes可以满足这个原理,我们建议采用SCL的身份STE以进行离散面膜松弛。我们发现不同特征的面具梯度非常不平衡,因此,我们建议将每个特征的掩模梯度标准化以优化掩码变量训练。为了自动训练稀疏掩码,我们将网络连接总数作为我们的客观函数中的正则化术语。由于SCL不需要由网络层设计人员定义的修剪标准或超级参数,因此在更大的假设空间中探讨了网络,以实现最佳性能的优化稀疏连接。 SCL克服了现有自动修剪方法的局限性。实验结果表明,SCL可以自动学习并选择各种基线网络结构的重要网络连接。 SCL培训的深度学习模型以稀疏性,精度和减少脚波特的SOTA人类设计和自动修剪方法训练。
translated by 谷歌翻译
卷积神经网络(CNN)具有一定量的参数冗余,滤波器修剪旨在去除冗余滤波器,并提供在终端设备上应用CNN的可能性。但是,以前的作品更加注重设计了滤波器重要性的评估标准,然后缩短了具有固定修剪率的重要滤波器或固定数量,以减少卷积神经网络的冗余。它不考虑为每层预留有多少筛选器是最合理的选择。从这个角度来看,我们通过搜索适当的过滤器(SNF)来提出新的过滤器修剪方法。 SNF专用于搜索每层的最合理的保留过滤器,然后是具有特定标准的修剪过滤器。它可以根据不同的拖鞋定制最合适的网络结构。通过我们的方法进行过滤器修剪导致CIFAR-10的最先进(SOTA)精度,并在Imagenet ILSVRC-2012上实现了竞争性能。基于Reset-56网络,在Top-中增加了0.14%的增加0.14% 1对CIFAR-10拖出的52.94%的精度为52.94%。在减少68.68%拖鞋时,CiFar-10上的修剪Resnet-110还提高了0.03%的1 0.03%的精度。对于Imagenet,我们将修剪速率设置为52.10%的拖鞋,前1个精度只有0.74%。该代码可以在https://github.com/pk-l/snf上获得。
translated by 谷歌翻译
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
translated by 谷歌翻译
过滤器修剪的目标是搜索不重要的过滤器以删除以便使卷积神经网络(CNNS)有效而不牺牲过程中的性能。挑战在于找到可以帮助确定每个过滤器关于神经网络的最终输出的重要或相关的信息的信息。在这项工作中,我们分享了我们的观察说,预先训练的CNN的批量标准化(BN)参数可用于估计激活输出的特征分布,而无需处理训练数据。在观察时,我们通过基于预先训练的CNN的BN参数评估每个滤波器的重要性来提出简单而有效的滤波修剪方法。 CiFar-10和Imagenet的实验结果表明,该方法可以在准确性下降和计算复杂性的计算复杂性和降低的折衷方面具有和不进行微调的卓越性能。
translated by 谷歌翻译
过滤器修剪方法通过去除选定的过滤器来引入结构稀疏性,因此对于降低复杂性特别有效。先前的作品从验证较小规范的过滤器的角度从经验修剪网络中造成了较小的最终结果贡献。但是,此类标准已被证明对过滤器的分布敏感,并且由于修剪后的容量差距是固定的,因此准确性可能很难恢复。在本文中,我们提出了一种称为渐近软簇修剪(ASCP)的新型过滤器修剪方法,以根据过滤器的相似性来识别网络的冗余。首先通过聚类来区分来自参数过度的网络的每个过滤器,然后重建以手动将冗余引入其中。提出了一些聚类指南,以更好地保留特征提取能力。重建后,允许更新过滤器,以消除错误选择的效果。此外,还采用了各种修剪率的衰减策略来稳定修剪过程并改善最终性能。通过逐渐在每个群集中生成更相同的过滤器,ASCP可以通过通道添加操作将其删除,几乎没有准确性下降。 CIFAR-10和Imagenet数据集的广泛实验表明,与许多最新算法相比,我们的方法可以取得竞争性结果。
translated by 谷歌翻译
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization. * Equal contribution. † Work done while visiting UC Berkeley.
translated by 谷歌翻译
通道修剪被广泛用于降低深网模型的复杂性。最近的修剪方法通常通过提出通道重要性标准来识别网络的哪些部分。但是,最近的研究表明,这些标准在所有情况下都不能很好地工作。在本文中,我们提出了一种新颖的功能最小化方法(FSM)方法来压缩CNN模型,该模型通过收敛功能和过滤器的信息来评估特征转移。具体而言,我们首先使用不同层深度的一些普遍方法研究压缩效率,然后提出特征转移概念。然后,我们引入了一种近似方法来估计特征移位的幅度,因为很难直接计算它。此外,我们提出了一种分布优化算法,以补偿准确性损失并提高网络压缩效率。该方法在各种基准网络和数据集上产生最先进的性能,并通过广泛的实验验证。这些代码可以在\ url {https://github.com/lscgx/fsm}上可用。
translated by 谷歌翻译
The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
translated by 谷歌翻译
在过去几年中,神经网络的性能在越来越多的浮点操作(拖鞋)的成本上显着提高。但是,当计算资源有限时,更多的拖鞋可能是一个问题。作为解决这个问题的尝试,修剪过滤器是一种常见的解决方案,但大多数现有的修剪方法不有效地保持模型精度,因此需要大量的芬降时期。在本文中,我们提出了一种自动修剪方法,该方法学习保存的神经元以保持模型精度,同时将絮凝到预定目标。为了完成这项任务,我们介绍了一种可训练的瓶颈,只需要一个单一的单一时期,只需要一个数据集的25.6%(Cifar-10)或7.49%(ILSVRC2012)来了解哪些过滤器。在各种架构和数据集上的实验表明,该方法不仅可以在修剪后保持精度,而且在FineTuning之后也优越现有方法。我们在Reset-50上达到了52.00%的拖鞋,在ILSVRC2012上的灌溉后的前1个精度为47.51%,最先进的(SOTA)精度为76.63%。代码可用(链接匿名审核)。
translated by 谷歌翻译
This paper proposed a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model after pruning. SFP has two advantages over previous works: (1) Larger model capacity. Updating previously pruned filters provides our approach with larger optimization space than fixing the filters to zero. Therefore, the network trained by our method has a larger model capacity to learn from the training data. (2) Less dependence on the pretrained model. Large capacity enables SFP to train from scratch and prune the model simultaneously. In contrast, previous filter pruning methods should be conducted on the basis of the pre-trained model to guarantee their performance. Empirically, SFP from scratch outperforms the previous filter pruning methods. Moreover, our approach has been demonstrated effective for many advanced CNN architectures. Notably, on ILSCRC-2012, SFP reduces more than 42% FLOPs on ResNet-101 with even 0.2% top-5 accuracy improvement, which has advanced the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/softfilter-pruning
translated by 谷歌翻译
Structured channel pruning has been shown to significantly accelerate inference time for convolution neural networks (CNNs) on modern hardware, with a relatively minor loss of network accuracy. Recent works permanently zero these channels during training, which we observe to significantly hamper final accuracy, particularly as the fraction of the network being pruned increases. We propose Soft Masking for cost-constrained Channel Pruning (SMCP) to allow pruned channels to adaptively return to the network while simultaneously pruning towards a target cost constraint. By adding a soft mask re-parameterization of the weights and channel pruning from the perspective of removing input channels, we allow gradient updates to previously pruned channels and the opportunity for the channels to later return to the network. We then formulate input channel pruning as a global resource allocation problem. Our method outperforms prior works on both the ImageNet classification and PASCAL VOC detection datasets.
translated by 谷歌翻译
现有的可区分通道修剪方法通常将缩放因子或掩模在通道后面的掩盖范围内,以减少重要性的修剪过滤器,并假设输入样品统一贡献以过滤重要性。具体而言,实例复杂性对修剪性能的影响尚未得到充分研究。在本文中,我们提出了一个基于实例复杂性滤波器的重要性得分的简单而有效的可区分网络修剪方法上限。我们通过给硬样品给出更高的权重来定义每个样品的实例复杂性与重量相关的重量,并测量样品特异性软膜的加权总和,以模拟不同输入的非均匀贡献,这鼓励硬样品主导修剪过程和模型性能保存完好。此外,我们还引入了一个新的正规器,以鼓励面具两极分化,以便很容易找到甜蜜的位置以识别要修剪的过滤器。各种网络体系结构和数据集的性能评估表明,CAP在修剪大型网络方面具有优势。例如,CAP在删除65.64%的拖鞋后,CAP在CIFAR-10数据集上的RESNET56的准确性提高了0.33%,而Prunes在ImagEnet数据集上的RESNET50的PRUNES 87.75%,只有0.89%的TOP-1精度损失。
translated by 谷歌翻译
Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-tosparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static * .
translated by 谷歌翻译
尽管现在使用自我监督方法构建的计算机视觉模型现在很普遍,但仍然存在一些重要问题。自我监督的模型是否学习高度冗余的频道功能?如果一个自我监督的网络可以动态选择重要的渠道并摆脱不必要的渠道怎么办?目前,与计算机视觉中的有监督的对手相比,通过自我训练预先训练的Convnet在下游任务上获得了可比的性能。但是,有一些自我监督模型的缺点,包括大量参数,计算昂贵的培训策略以及对下游任务更快推断的明确需求。在这项工作中,我们的目标是通过研究如何将用于监督学习的标准渠道选择方法应用于经过自学训练的网络。我们验证我们在一系列目标预算上验证我们的发现$ t_ {d} $,用于跨不同数据集的图像分类任务的频道计算,特别是CIFAR-10,CIFAR-100和IMAGENET-100,获得了与原始网络的可比性性能when selecting all channels but at a significant reduction in computation reported in terms of FLOPs.
translated by 谷歌翻译
深度学习技术在各种任务中都表现出了出色的有效性,并且深度学习具有推进多种应用程序(包括在边缘计算中)的潜力,其中将深层模型部署在边缘设备上,以实现即时的数据处理和响应。一个关键的挑战是,虽然深层模型的应用通常会产生大量的内存和计算成本,但Edge设备通常只提供非常有限的存储和计算功能,这些功能可能会在各个设备之间差异很大。这些特征使得难以构建深度学习解决方案,以释放边缘设备的潜力,同时遵守其约束。应对这一挑战的一种有希望的方法是自动化有效的深度学习模型的设计,这些模型轻巧,仅需少量存储,并且仅产生低计算开销。该调查提供了针对边缘计算的深度学习模型设计自动化技术的全面覆盖。它提供了关键指标的概述和比较,这些指标通常用于量化模型在有效性,轻度和计算成本方面的水平。然后,该调查涵盖了深层设计自动化技术的三类最新技术:自动化神经体系结构搜索,自动化模型压缩以及联合自动化设计和压缩。最后,调查涵盖了未来研究的开放问题和方向。
translated by 谷歌翻译
Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and secondorder Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-ofthe-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning.
translated by 谷歌翻译
Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resourcelimited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance of non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion is not updated, very little damage would be done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over the state-of-the-arts in terms of FLOPs and parameters reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2%-FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With Res-50, we achieve a 43.8%-FLOPs reduction by removing 36.7% of the parameters, with only a loss of 1.17% in the top-1 accuracy on ImageNet. The codes can be available at https://github.com/lmbxmu/HRank.
translated by 谷歌翻译
We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with finetuning by backpropagation-a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, e.g. the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102) relaying only on the first order gradient information. We also show that pruning can lead to more than 10× theoretical reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier. Finally, we show results for the largescale ImageNet dataset to emphasize the flexibility of our approach.
translated by 谷歌翻译
While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensure a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday's applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
translated by 谷歌翻译