In the past few years, the performance of neural networks has improved significantly at the cost of an ever-growing number of floating point operations (FLOPs). However, more FLOPs can be a problem when computational resources are limited. As an attempt to solve this problem, pruning filters is a common solution, but most existing pruning methods do not preserve model accuracy efficiently and therefore require a large number of fine-tuning epochs. In this paper, we propose an automatic pruning method that learns which neurons to preserve in order to maintain model accuracy while reducing FLOPs to a predefined target. To accomplish this task, we introduce a trainable bottleneck that only requires one single epoch with 25.6% (CIFAR-10) or 7.49% (ILSVRC2012) of the dataset to learn which filters to prune. Experiments on various architectures and datasets show that the proposed method can not only preserve accuracy after pruning but also outperform existing methods after fine-tuning. We achieve a 52.00% FLOPs reduction on ResNet-50, with a top-1 accuracy of 47.51% after pruning and a state-of-the-art (SOTA) accuracy of 76.63% after fine-tuning on ILSVRC2012. Code is available (link anonymized for review).
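The abstract does not spell out what the trainable bottleneck looks like; the sketch below shows one plausible reading, assuming a per-channel sigmoid gate inserted after a convolution and trained briefly (with the backbone frozen) under a FLOPs-target penalty, so that gates near zero mark filters to prune. The gate placement, the penalty weight `lam`, and the `flops_per_channel` bookkeeping are all assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Trainable per-channel gate; small gate values mark prunable filters."""
    def __init__(self, channels: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(channels))  # sigmoid(0) = 0.5 at init

    def forward(self, x):                      # x: (N, C, H, W)
        g = torch.sigmoid(self.logits)         # soft keep-probability per channel
        return x * g.view(1, -1, 1, 1)

def gate_loss(task_loss, gate: ChannelGate, flops_per_channel, flops_target, lam=1.0):
    """Penalize expected FLOPs above the target while keeping the task loss low."""
    expected_flops = (torch.sigmoid(gate.logits) * flops_per_channel).sum()
    return task_loss + lam * torch.relu(expected_flops - flops_target)
```

After such a short gating phase, channels whose gate falls below a threshold would be removed and the remaining network fine-tuned.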
Existing differentiable channel pruning methods often attach scaling factors or masks behind channels to prune filters of lower importance, and they assume that input samples contribute uniformly to filter importance. In particular, the influence of instance complexity on pruning performance has not yet been fully investigated. In this paper, we propose a simple yet effective differentiable network pruning method, CAP, based on instance-complexity-aware filter importance scores. We define an instance-complexity-related weight for each sample by giving higher weights to hard samples, and measure the weighted sum of sample-specific soft masks to model the non-uniform contribution of different inputs, which encourages hard samples to dominate the pruning process so that model performance is well preserved. In addition, we introduce a new regularizer to encourage polarization of the masks, so that a sweet spot can easily be found to identify the filters to be pruned. Performance evaluations on various network architectures and datasets demonstrate that CAP has advantages when pruning large networks. For example, CAP improves the accuracy of ResNet56 on the CIFAR-10 dataset by 0.33% after removing 65.64% of the FLOPs, and prunes 87.75% of the FLOPs of ResNet50 on the ImageNet dataset with only a 0.89% top-1 accuracy loss.
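As a rough illustration of the weighting idea only (not the paper's exact formulation), the snippet below weights each sample's soft channel mask by a softmax over per-sample losses, so that harder samples dominate the aggregated filter-importance score; using the loss as the complexity proxy is an assumption.

```python
import torch

def instance_weighted_importance(soft_masks: torch.Tensor,
                                 sample_losses: torch.Tensor) -> torch.Tensor:
    """soft_masks: (N, C) per-sample soft channel masks in [0, 1].
    sample_losses: (N,) per-sample task losses used as a complexity proxy.
    Returns a (C,) importance score dominated by hard (high-loss) samples."""
    weights = torch.softmax(sample_losses, dim=0)          # harder sample -> larger weight
    return (weights.unsqueeze(1) * soft_masks).sum(dim=0)  # weighted sum over samples
```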
Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resource-limited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance of non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion is not updated, very little damage would be done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over state-of-the-art methods in terms of FLOPs and parameter reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2%-FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With ResNet-50, we achieve a 43.8%-FLOPs reduction by removing 36.7% of the parameters, with only a loss of 1.17% in the top-1 accuracy on ImageNet. The code is available at https://github.com/lmbxmu/HRank.
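A minimal sketch of the statistic HRank builds on — the average matrix rank of the feature maps a filter produces, computed over a few input batches — could look like the following; layer hooks and batching details are omitted.

```python
import torch

@torch.no_grad()
def average_feature_map_rank(feature_maps: torch.Tensor) -> torch.Tensor:
    """feature_maps: (N, C, H, W) activations of one conv layer.
    Returns a (C,) average rank per filter; low-rank filters are pruning candidates."""
    n, c, h, w = feature_maps.shape
    ranks = torch.linalg.matrix_rank(feature_maps.reshape(n * c, h, w).float())
    return ranks.reshape(n, c).float().mean(dim=0)
```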
This paper proposes a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model after pruning. SFP has two advantages over previous works: (1) Larger model capacity. Updating previously pruned filters provides our approach with larger optimization space than fixing the filters to zero. Therefore, the network trained by our method has a larger model capacity to learn from the training data. (2) Less dependence on the pretrained model. Large capacity enables SFP to train from scratch and prune the model simultaneously. In contrast, previous filter pruning methods should be conducted on the basis of the pre-trained model to guarantee their performance. Empirically, SFP from scratch outperforms the previous filter pruning methods. Moreover, our approach has been demonstrated effective for many advanced CNN architectures. Notably, on ILSVRC-2012, SFP reduces more than 42% FLOPs on ResNet-101 with even 0.2% top-5 accuracy improvement, which has advanced the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/softfilter-pruning
Filter pruning methods introduce structural sparsity by removing selected filters and are thus particularly effective at reducing complexity. Previous works empirically prune networks from the viewpoint that filters with smaller norms contribute less to the final results. However, such criteria have been shown to be sensitive to the distribution of filters, and accuracy may be hard to recover because the capacity gap is fixed once the network is pruned. In this paper, we propose a novel filter pruning method called Asymptotic Soft Cluster Pruning (ASCP) that identifies the redundancy of a network based on the similarity of its filters. Filters of the over-parameterized network are first distinguished by clustering and then reconstructed to manually introduce redundancy into them. Several clustering guidelines are also proposed to better preserve the feature extraction ability. After reconstruction, filters are allowed to be updated to eliminate the effect of incorrect selections. In addition, a decaying strategy over the pruning rate is adopted to stabilize the pruning process and improve the final performance. By gradually generating more identical filters within each cluster, ASCP can remove them through a channel addition operation with almost no accuracy drop. Extensive experiments on the CIFAR-10 and ImageNet datasets show that our method achieves competitive results compared with many state-of-the-art algorithms.
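One way to picture the clustering step (a hedged sketch, not the authors' implementation) is to run k-means on flattened filter weights and softly pull each filter toward its cluster centroid; repeating this with growing strength makes same-cluster filters converge so they can later be merged.

```python
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def soft_cluster_step(conv_weight: torch.Tensor, n_clusters: int, strength: float):
    """conv_weight: (C_out, C_in, k, k), e.g. conv.weight. Moves each filter a
    fraction `strength` toward its k-means centroid; repeated calls with a
    growing strength make same-cluster filters identical so they can be merged
    by a channel addition operation."""
    flat = conv_weight.reshape(conv_weight.size(0), -1).detach().cpu().numpy()
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(flat)
    for cluster in range(n_clusters):
        idx = torch.as_tensor((labels == cluster).nonzero()[0])
        centroid = conv_weight[idx].mean(dim=0, keepdim=True)
        conv_weight[idx] = (1 - strength) * conv_weight[idx] + strength * centroid
    return labels
```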
Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning.
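A compact sketch of the first-order criterion — the squared sum of gradient-times-weight over a filter's parameters, read off after a backward pass — is shown below; grouping every parameter of a filter into one score is a simplification of the paper's group formulation.

```python
import torch
import torch.nn as nn

def taylor_filter_importance(conv: nn.Conv2d) -> torch.Tensor:
    """First-order Taylor score per output filter: (sum_i g_i * w_i)^2.
    Call after loss.backward() so that conv.weight.grad is populated."""
    gw = (conv.weight.grad * conv.weight).sum(dim=(1, 2, 3))  # one scalar per filter
    return gw.pow(2)
```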
Low-rankness plays an important role in traditional machine learning, but is not so popular in deep learning. Most previous low-rank network compression methods compress the networks by approximating pre-trained models and re-training. However, the optimal solution in the Euclidean space may be quite different from the one in the low-rank manifold. A well-pre-trained model is not a good initialization for the model with low-rank constraints. Thus, the performance of a low-rank compressed network degrades significantly. Compared to other network compression methods such as pruning, low-rank methods attract less attention in recent years. In this paper, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. First, we propose to alternately perform stochastic gradient descent training and projection onto the low-rank manifold. Compared to re-training on the compact model, this enables full utilization of model capacity since solution space is relaxed back to Euclidean space after projection. Second, the matrix energy (the sum of squares of singular values) reduction caused by projection is compensated by energy transfer. We uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. Third, we propose batch normalization (BN) rectification to cut off its effect on the optimal low-rank approximation of the weight matrix, which further improves the performance. Comprehensive experiments on CIFAR-10 and ImageNet have justified that our method is superior to other low-rank compression methods and also outperforms recent state-of-the-art pruning methods. Our code is available at https://github.com/BZQLin/LRPET.
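A hedged sketch of the two core operations as described: truncated-SVD projection of a weight matrix onto the rank-r manifold, followed by uniformly rescaling the kept singular values so the matrix energy (sum of squared singular values) is preserved.

```python
import torch

@torch.no_grad()
def low_rank_project_with_energy_transfer(W: torch.Tensor, r: int) -> torch.Tensor:
    """W: 2-D weight matrix (conv weights would be reshaped to 2-D first).
    Projects W onto the rank-r manifold and transfers the discarded
    singular-value energy uniformly to the kept singular values."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    total_energy = (S ** 2).sum()
    kept_energy = (S[:r] ** 2).sum()
    scale = torch.sqrt(total_energy / kept_energy)   # uniform energy transfer
    return (U[:, :r] * (S[:r] * scale)) @ Vh[:r]
```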
Since deep learning models usually contain millions of trainable weights, there is an increasing demand for more efficient network structures with reduced storage space and improved run-time efficiency. Pruning is one of the most popular network compression techniques. In this paper, we propose a novel unstructured pruning pipeline, Attention-based Simultaneous sparse structure and Weight Learning (ASWL). Unlike traditional channel-wise or weight-wise attention mechanisms, ASWL uses an efficient algorithm to calculate the pruning ratio of each layer in a layer-wise manner, and both the weights of the dense network and those of the sparse network are tracked so that the pruned structure is learned simultaneously from randomly initialized weights. Our experiments on MNIST, CIFAR-10 and ImageNet show that ASWL achieves superior pruning results in terms of accuracy, pruning ratio and operating efficiency when compared with state-of-the-art network pruning methods.
Although deep convolutional neural networks achieve superior performance on many computer vision tasks, they are notoriously hard to compress for deployment on resource-constrained devices. Most existing network pruning methods require laborious human effort and prohibitive computational resources, especially when the constraints change. This practically limits the application of model compression when a model needs to be deployed on a wide range of devices. Moreover, existing methods are still challenged by missing theoretical guidance. In this paper, we propose an information-theory-inspired strategy for automatic model compression. The principle behind our method is the information bottleneck theory, i.e., hidden representations should compress information with respect to each other. We therefore introduce the normalized Hilbert-Schmidt Independence Criterion (nHSIC) on network activations as a stable and generalized indicator of layer importance. When a certain resource constraint is given, we integrate the HSIC indicator with the constraint to transform the architecture search problem into a linear programming problem with quadratic constraints. Such a problem is easily solved by a convex optimization method within a few seconds. We also provide a rigorous proof showing that optimizing the normalized HSIC simultaneously minimizes the mutual information between different layers. Without any search process, our method achieves a better compression tradeoff than state-of-the-art compression algorithms. For instance, with ResNet-50, we achieve a 45.3% FLOPs reduction with 75.75% top-1 accuracy on ImageNet. Code is available at https://github.com/mac-automl/itpruner/tree/master.
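For intuition, a standard biased HSIC estimator with Gaussian kernels, normalized in the usual CKA-like way, is sketched below; the kernel choice, bandwidth `sigma`, and normalization are assumptions rather than ITPruner's exact estimator.

```python
import torch

def _centered_gram(X: torch.Tensor, sigma: float) -> torch.Tensor:
    """Centered Gaussian-kernel Gram matrix for activations X of shape (n, d)."""
    d2 = torch.cdist(X, X).pow(2)
    K = torch.exp(-d2 / (2 * sigma ** 2))
    n = X.size(0)
    H = torch.eye(n, device=X.device) - torch.full((n, n), 1.0 / n, device=X.device)
    return H @ K @ H

def normalized_hsic(X: torch.Tensor, Y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """nHSIC(X, Y) = HSIC(X, Y) / sqrt(HSIC(X, X) * HSIC(Y, Y))."""
    Kx, Ky = _centered_gram(X, sigma), _centered_gram(Y, sigma)
    hsic = lambda A, B: (A * B).sum()   # trace(A @ B) for symmetric A, B
    return hsic(Kx, Ky) / torch.sqrt(hsic(Kx, Kx) * hsic(Ky, Ky) + 1e-12)
```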
Neural network pruning has shown remarkable performance in reducing the complexity of deep network models. Recent network pruning methods usually focus on removing unimportant or redundant filters from the network. In this paper, by exploring the similarities between feature maps, we propose a novel filter pruning method, Central Filter (CF), which suggests that a filter is approximately equal to a set of other filters after appropriate adjustments. Our method is based on the finding that the average similarity between feature maps changes very little, regardless of the number of input images. Based on this finding, we build similarity graphs on feature maps and compute the closeness centrality of each node to select the central filters. Moreover, we design a method to directly adjust the weights in the next layer corresponding to a central filter, effectively minimizing the error caused by pruning. Through experiments on various benchmark networks and datasets, CF yields state-of-the-art performance. For example, with ResNet-56, CF reduces approximately 39.7% of the FLOPs by removing 47.1% of the parameters, with even a 0.33% accuracy improvement on CIFAR-10. With GoogLeNet, CF reduces approximately 63.2% of the FLOPs by removing 55.6% of the parameters, with only a 0.35% top-1 accuracy drop on CIFAR-10. With ResNet-50, CF reduces approximately 47.9% of the FLOPs by removing 36.9% of the parameters, with only a 1.07% top-1 accuracy drop on ImageNet. The code is available at https://github.com/8ubpshlr23/centrter.
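A rough sketch of the selection step — building a similarity graph over batch-averaged feature maps and ranking filters by closeness centrality — might look like the following; the distance measure and the fully connected graph are assumptions.

```python
import torch
import networkx as nx

@torch.no_grad()
def central_filters(feature_maps: torch.Tensor, keep: int):
    """feature_maps: (N, C, H, W). Ranks the C filters by closeness centrality
    on a graph whose edge weights are distances between averaged feature maps,
    and returns the indices of the `keep` most central filters."""
    avg = feature_maps.mean(dim=0).reshape(feature_maps.size(1), -1)  # (C, H*W)
    dist = torch.cdist(avg, avg)                                      # pairwise distances
    g = nx.Graph()
    c = avg.size(0)
    for i in range(c):
        for j in range(i + 1, c):
            g.add_edge(i, j, weight=float(dist[i, j]))
    centrality = nx.closeness_centrality(g, distance="weight")
    return sorted(centrality, key=centrality.get, reverse=True)[:keep]
```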
Convolutional neural networks (CNNs) contain a certain amount of parameter redundancy, and filter pruning aims to remove the redundant filters, making it possible to deploy CNNs on terminal devices. However, previous works pay more attention to designing evaluation criteria of filter importance and then prune the less important filters with a fixed pruning rate or a fixed number to reduce the redundancy of convolutional neural networks. They do not consider how many filters it is most reasonable to reserve for each layer. From this perspective, we propose a new filter pruning method by searching the proper number of filters (SNF). SNF is dedicated to searching for the most reasonable number of reserved filters for each layer and then pruning filters with specific criteria. It can tailor the most suitable network structure under different FLOPs constraints. Filter pruning with our method leads to state-of-the-art (SOTA) accuracy on CIFAR-10 and achieves competitive performance on ImageNet ILSVRC-2012. Based on the ResNet-56 network, we increase the top-1 accuracy by 0.14% on CIFAR-10 at a 52.94% FLOPs reduction. When reducing 68.68% of the FLOPs, the pruned ResNet-110 on CIFAR-10 also improves the top-1 accuracy by 0.03%. For ImageNet, we set the pruning rate to 52.10% of the FLOPs with only a 0.74% drop in top-1 accuracy. The code is available at https://github.com/pk-l/snf.
The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a computation-heavy pretrained model to select "important" filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computation-economical and regularization-free manner for efficient image classification. Each filter in our DCFF is firstly given an inter-similarity distribution with a temperature parameter as a filter proxy, on top of which, a fresh Kullback-Leibler divergence based dynamic-coded criterion is proposed to evaluate the filter importance. In contrast to simply keeping high-score filters in other methods, we propose the concept of filter fusion, i.e., the weighted averages using the assigned proxies, as our preserved filters. We obtain a one-hot inter-similarity distribution as the temperature parameter approaches infinity. Thus, the relative importance of each filter can vary along with the training of the compact CNN, leading to dynamically changeable fused filters without both the dependency on the pretrained model and the introduction of sparse constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared counterparts. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters while reaching top-1 accuracy of 93.47% on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOPs and 58.6% parameter reductions, retaining 75.60% top-1 accuracy on ILSVRC-2012. Our code, narrower models and training logs are available at https://github.com/lmbxmu/DCFF.
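To make the fusion idea concrete, a hedged sketch: each filter is assigned a temperature-scaled softmax distribution over its similarities to all filters, and the preserved "fused" filter is the weighted average under that distribution; as the temperature grows the distribution approaches one-hot, matching the limit described above. The negative-distance similarity is an assumption, not DCFF's exact proxy.

```python
import torch

@torch.no_grad()
def fused_filters(conv_weight: torch.Tensor, temperature: float) -> torch.Tensor:
    """conv_weight: (C_out, C_in, k, k). Each fused filter is a weighted average
    of all filters, weighted by a temperature-scaled softmax over negative
    pairwise distances; as the temperature grows, each row of the distribution
    approaches one-hot and the fused filter collapses onto a single filter."""
    flat = conv_weight.reshape(conv_weight.size(0), -1)   # (C, d)
    sim = -torch.cdist(flat, flat)                        # higher = more similar
    proxy = torch.softmax(temperature * sim, dim=1)       # (C, C) row-stochastic
    return (proxy @ flat).reshape_as(conv_weight)
```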
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.
Since a sparse neural network usually contains many zero weights, those unnecessary network connections can potentially be eliminated without degrading network performance. Well-designed sparse neural networks therefore have the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method — Sparse Connectivity Learning (SCL). Specifically, a weight is re-parameterized as the element-wise multiplication of a trainable weight variable and a binary mask. The network connectivity is thus fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the proxy gradients of the STE should be positive, ensuring that the mask variables converge at their minima. After finding that the leaky ReLU, SoftPlus, and Identity STEs satisfy this principle, we propose to adopt the Identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are very unbalanced, and therefore propose to normalize the mask gradient of each feature to optimize mask variable training. In order to train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. As SCL does not require pruning criteria or hyperparameters defined by designers for network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results show that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
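A minimal sketch of the re-parameterization — trainable weight times a unit-step binary mask, with an identity straight-through estimator so gradients still reach the mask variable — assuming a simple linear layer:

```python
import torch
import torch.nn as nn

class StepIdentitySTE(torch.autograd.Function):
    """Forward: unit step (mask > 0 -> 1, else 0). Backward: identity STE,
    i.e., the incoming gradient is passed to the mask variable unchanged."""
    @staticmethod
    def forward(ctx, mask):
        return (mask > 0).to(mask.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

class SparseLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.mask = nn.Parameter(torch.full((out_features, in_features), 0.01))

    def forward(self, x):
        binary = StepIdentitySTE.apply(self.mask)
        return x @ (self.weight * binary).t()

    def connection_count(self):
        # total number of active connections, usable as a regularization term
        return StepIdentitySTE.apply(self.mask).sum()
```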
Network compression is crucial for making deep networks more efficient, faster, and generalizable to low-end hardware. Current network compression methods face two open problems: first, there is no theoretical framework for estimating the maximum compression rate; second, some layers may be pruned excessively, resulting in a substantial drop in network performance. To address these two problems, this study proposes a gradient-matrix-analysis-based method for estimating the maximum network redundancy. Guided by this maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to condense the neural network structure as much as possible without sacrificing network performance. Substantial experiments are conducted to demonstrate the efficacy of the new method in pruning several advanced convolutional neural network (CNN) architectures. Compared with existing pruning methods, the proposed pruning algorithm achieves state-of-the-art performance: at the same or similar compression ratios, it delivers the highest network prediction accuracy among the compared methods.
We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries. We formally establish filter pruning as an optimization problem, and reveal that we need to prune filters based on statistics information computed from its next layer, not the current layer, which differentiates ThiNet from existing methods. Experimental results demonstrate the effectiveness of this strategy, which has advanced the state-of-the-art. We also show the performance of ThiNet on ILSVRC-12 benchmark. ThiNet achieves 3.31× FLOPs reduction and 16.63× compression on VGG-16, with only 0.52% top-5 accuracy drop. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can also reduce more than half of the parameters and FLOPs, at the cost of roughly 1% top-5 accuracy drop. Moreover, the original VGG-16 model can be further pruned into a very small model with only 5.05MB model size, preserving AlexNet level accuracy but showing much stronger generalization ability.
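A greatly simplified sketch of the next-layer-statistics idea: given sampled per-channel contributions to the next layer's outputs, greedily discard the input channels whose removal perturbs those outputs least. Sampling, bias handling, and the stopping rule are all simplified assumptions.

```python
import torch

@torch.no_grad()
def greedy_channel_selection(contributions: torch.Tensor, n_prune: int):
    """contributions: (M, C) sampled per-channel contributions to next-layer
    outputs (M sampled locations, C input channels). Greedily chooses the
    n_prune channels whose summed removed contribution has the smallest
    squared norm, i.e., whose removal least changes the next layer's output."""
    m, c = contributions.shape
    pruned, removed_sum = [], torch.zeros(m, device=contributions.device)
    for _ in range(n_prune):
        best, best_err = None, None
        for ch in range(c):
            if ch in pruned:
                continue
            err = (removed_sum + contributions[:, ch]).pow(2).sum()
            if best_err is None or err < best_err:
                best, best_err = ch, err
        pruned.append(best)
        removed_sum += contributions[:, best]
    return pruned
```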
Deploying convolutional neural networks (CNNs) on mobile devices is difficult because of their limited memory and computation resources. We aim to design efficient neural networks for heterogeneous devices, including CPUs and GPUs, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that fully reveal the information underlying the intrinsic features. The proposed C-Ghost module can be used as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, and a lightweight C-GhostNet can then be easily established. We further consider efficient networks for GPU devices. Without involving too many GPU-inefficient operations (e.g., depth-wise convolution) in a building stage, we propose to utilize stage-wise feature redundancy to formulate a GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts, where the first part is processed by the original block with fewer output channels to generate intrinsic features, and the other part is generated by cheap operations that exploit the stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and the G-Ghost stage. C-GhostNet and G-GhostNet achieve the optimal trade-off between accuracy and latency on CPU and GPU, respectively. Code is available at https://github.com/huawei-noah/cv-backbones.
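A minimal PyTorch sketch of a Ghost-style module — a primary pointwise convolution producing intrinsic feature maps, plus a cheap depthwise convolution generating the remaining "ghost" maps — is given below; the kernel sizes and the ratio are illustrative defaults, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Produces out_channels feature maps: a primary 1x1 conv yields
    out_channels // ratio intrinsic maps, and a cheap depthwise conv
    generates the remaining 'ghost' maps from them."""
    def __init__(self, in_channels, out_channels, ratio=2, dw_kernel=3):
        super().__init__()
        intrinsic = out_channels // ratio
        ghost = out_channels - intrinsic
        self.primary = nn.Sequential(
            nn.Conv2d(in_channels, intrinsic, 1, bias=False),
            nn.BatchNorm2d(intrinsic), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(intrinsic, ghost, dw_kernel, padding=dw_kernel // 2,
                      groups=intrinsic, bias=False),
            nn.BatchNorm2d(ghost), nn.ReLU(inplace=True))

    def forward(self, x):
        intrinsic = self.primary(x)
        return torch.cat([intrinsic, self.cheap(intrinsic)], dim=1)
```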
Channel pruning is widely used to reduce the complexity of deep network models. Recent pruning methods usually identify which parts of the network to discard by proposing a channel importance criterion. However, recent studies have shown that these criteria do not work well in all conditions. In this paper, we propose a novel Feature Shift Minimization (FSM) method to compress CNN models, which evaluates the feature shift by converging the information of both features and filters. Specifically, we first investigate the compression efficiency of some prevalent methods at different layer depths and then propose the feature shift concept. We then introduce an approximation method to estimate the magnitude of the feature shift, since it is difficult to compute directly. In addition, we present a distribution-optimization algorithm to compensate for the accuracy loss and improve network compression efficiency. The proposed method yields state-of-the-art performance on various benchmark networks and datasets, as verified by extensive experiments. The code is available at \url{https://github.com/lscgx/fsm}.
Structured channel pruning has been shown to significantly accelerate inference time for convolution neural networks (CNNs) on modern hardware, with a relatively minor loss of network accuracy. Recent works permanently zero these channels during training, which we observe to significantly hamper final accuracy, particularly as the fraction of the network being pruned increases. We propose Soft Masking for cost-constrained Channel Pruning (SMCP) to allow pruned channels to adaptively return to the network while simultaneously pruning towards a target cost constraint. By adding a soft mask re-parameterization of the weights and channel pruning from the perspective of removing input channels, we allow gradient updates to previously pruned channels and the opportunity for the channels to later return to the network. We then formulate input channel pruning as a global resource allocation problem. Our method outperforms prior works on both the ImageNet classification and PASCAL VOC detection datasets.
Deep learning technologies have demonstrated remarkable effectiveness in a wide range of tasks, and deep learning holds the potential to advance a multitude of applications, including edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that, while the application of deep models often incurs substantial memory and computational costs, edge devices typically offer only very limited storage and computational capabilities that may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require only a small amount of storage, and incur only low computational overheads. This survey offers comprehensive coverage of studies on design automation techniques for deep learning models targeting edge computing. It provides an overview and comparison of the key metrics commonly used to quantify models in terms of effectiveness, lightness, and computational cost. The survey then covers three categories of the state of the art in deep model design automation techniques: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open issues and directions for future research.