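Convolutional neural networks (CNNs) are remarkably successful in many computer vision tasks. However, their high inference cost is problematic for embedded and real-time systems, so there are many studies on compressing the networks. On the other hand, recent advances in self-attention models have shown that convolution filters are preferable to self-attention in the earlier layers, which indicates that stronger inductive biases are better in the earlier layers. As convolutional filters show, strong biases can train specific filters and drive unnecessary filters to zero. This is analogous to classical image processing tasks, where choosing suitable filters yields a compact dictionary for representing features. We follow this idea and incorporate Gabor filters in the earlier layers of CNNs for compression. The parameters of the Gabor filters are learned through backpropagation, so the features are restricted to Gabor filters. We show that the first layer of VGG-16 for CIFAR-10 has 192 kernels/features, but learning Gabor filters requires an average of only 29.4 kernels. Also, using Gabor filters, an average of 83% and 94% of the kernels in the first and second layers, respectively, can be removed on an altered ResNet-20, in which the first five layers are exchanged for two layers of larger kernels, on CIFAR-10.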
The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
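The filter-removal step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the importance score is the sum of absolute kernel weights (the abstract does not name its criterion), and the function names `l1_norm` and `prune_filters` are hypothetical. It shows why removing a whole filter keeps the layers dense: the next layer simply loses the matching input channel.

```python
def l1_norm(weights):
    """Sum of absolute values over an arbitrarily nested list of weights."""
    if isinstance(weights, (int, float)):
        return abs(weights)
    return sum(l1_norm(w) for w in weights)


def prune_filters(layer, next_layer, num_to_prune):
    """Drop the `num_to_prune` filters with the smallest L1 norm (assumed score).

    layer:      list of filters, each shaped [in_channels][k][k]
    next_layer: list of filters, each shaped [len(layer)][k][k]
    Returns the pruned layer, the adjusted next layer, and the kept indices.
    """
    ranked = sorted(range(len(layer)), key=lambda i: l1_norm(layer[i]))
    kept = sorted(ranked[num_to_prune:])
    pruned_layer = [layer[i] for i in kept]
    # Removing filter i removes feature map i, so every filter in the
    # next layer loses its i-th input channel. Both layers stay dense.
    pruned_next = [[f[i] for i in kept] for f in next_layer]
    return pruned_layer, pruned_next, kept


# Toy example: 3 filters with 1 input channel and 1x1 kernels.
layer = [[[[0.9]]], [[[0.01]]], [[[-0.5]]]]
next_layer = [[[[1.0]], [[2.0]], [[3.0]]]]  # 1 filter, 3 input channels
new_layer, new_next, kept = prune_filters(layer, next_layer, 1)
print(kept)
```

In practice the pruned network would then be retrained to regain accuracy, as the abstract notes.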
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization. * Equal contribution. † Work done while visiting UC Berkeley.
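In recent years, deep neural networks have achieved wide success in various application domains. However, they require substantial computational and memory resources, which severely hinders their deployment, notably on mobile devices or in real-time applications. Neural networks usually involve a large number of parameters, which correspond to the weights of the network. These parameters, obtained through training, determine the performance of the network. However, they are also highly redundant. Pruning methods attempt to reduce the size of the parameter set by identifying and removing irrelevant weights. In this paper, we examine the impact of the training strategy on pruning efficiency. Two training modalities are considered and compared: (1) fine-tuning and (2) training from scratch. The experimental results obtained on four datasets (CIFAR10, CIFAR100, SVHN, and Caltech101) and for two different CNNs (VGG16 and MobileNet) demonstrate that a network that has been pre-trained on a large corpus (e.g., ImageNet) and then fine-tuned on a particular dataset can be pruned much more efficiently (up to 80% parameter reduction) than the same network trained from scratch.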
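Filter pruning methods introduce structural sparsity by removing selected filters and are therefore particularly effective for reducing complexity. Previous works empirically prune networks on the premise that filters with smaller norms contribute less to the final result. However, such criteria have been shown to be sensitive to the distribution of filters, and accuracy may be hard to recover because the capacity gap is fixed once the network is pruned. In this paper, we propose a novel filter pruning method called Asymptotic Soft Cluster Pruning (ASCP), which identifies the redundancy of a network based on the similarity of its filters. Each filter of the over-parameterized network is first distinguished by clustering and then reconstructed to manually introduce redundancy into it. Several clustering guidelines are proposed to better preserve the feature extraction ability. After reconstruction, filters are allowed to be updated to eliminate the effect of mistaken selections. In addition, various decay strategies for the pruning rate are adopted to stabilize the pruning process and improve the final performance. By gradually generating more identical filters within each cluster, ASCP can remove them through a channel addition operation with almost no accuracy drop. Extensive experiments on the CIFAR-10 and ImageNet datasets show that our method achieves competitive results compared with many state-of-the-art algorithms.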
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
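The channel selection step in this abstract can be sketched as below. This is an illustrative sketch only: it assumes each channel carries a learned scaling factor (for example a batch-normalization scale) that has been pushed toward zero by a sparsity penalty during training, and that channels are ranked against one global threshold. The abstract states neither detail, and `select_channels` is a hypothetical name.

```python
def select_channels(scales_per_layer, prune_ratio):
    """Return, per layer, the indices of channels to keep.

    scales_per_layer: list of lists of per-channel scaling factors (assumed).
    prune_ratio:      fraction of all channels in the network to remove.
    """
    all_scales = sorted(abs(s) for layer in scales_per_layer for s in layer)
    cut = int(len(all_scales) * prune_ratio)
    if cut == 0:
        return [list(range(len(layer))) for layer in scales_per_layer]
    # One global threshold: layers with many near-zero scales lose more
    # channels than layers whose scales are all large.
    threshold = all_scales[cut - 1]
    return [
        [i for i, s in enumerate(layer) if abs(s) > threshold]
        for layer in scales_per_layer
    ]


scales = [[0.9, 0.02, 0.4], [0.01, 0.7, 0.03]]
kept = select_channels(scales, prune_ratio=0.5)
print(kept)
```

Note that with a global threshold a layer could lose all of its channels, and ties at the threshold remove slightly more than the requested ratio; a real implementation would need to guard against both.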
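Convolutional neural networks (CNNs) have been applied in numerous Internet of Things (IoT) devices for a variety of downstream tasks. However, with the increasing amount of data on edge devices, CNNs can hardly complete some tasks in time with limited computing and storage resources. Recently, filter pruning has been regarded as an effective technique for compressing and accelerating CNNs, but existing methods rarely prune CNNs from the perspective of compressing high-dimensional tensors. In this paper, we propose a novel theory for finding redundant information in three-dimensional tensors, namely Quantified Similarity between Feature Maps (QSFM), and use this theory to guide the filter pruning procedure. We evaluate QSFM on datasets (CIFAR-10, CIFAR-100, and ILSVRC-12) and on edge devices, and demonstrate that the proposed method can find the redundant information in neural networks with comparable compression and a tolerable drop in accuracy. Without any fine-tuning, QSFM can significantly compress ResNet-56 on CIFAR-10 (reducing FLOPs by 48.7% and parameters by 57.9%) with a loss of only 0.54% in top-1 accuracy. For practical use on edge devices, QSFM can accelerate MobileNet-V2 inference by 1.53× with a loss of only 1.23% in ILSVRC-12 top-1 accuracy.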
Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of cheap linear transformations to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight GhostNet can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative to convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https://github.com/huawei-noah/ghostnet.
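Convolutional neural networks (CNNs) have succeeded in many practical applications. However, their high computation and storage requirements often make them difficult to deploy on resource-constrained devices. To tackle this issue, many pruning algorithms have been proposed for CNNs, but most of them cannot prune CNNs to a reasonable level. In this paper, we propose a novel algorithm for training and pruning CNNs based on recursive least squares (RLS) optimization. After training a CNN for some epochs, our algorithm combines inverse input autocorrelation matrices and weight matrices to evaluate and prune unimportant input channels or nodes layer by layer. Our algorithm then continues to train the pruned network, and does not perform the next pruning until the pruned network recovers the full performance of the old network. Besides CNNs, the proposed algorithm can also be used for feedforward neural networks (FNNs). Three experiments on the MNIST, CIFAR-10, and SVHN datasets show that our algorithm achieves more reasonable pruning and higher learning efficiency than four other popular pruning algorithms.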
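Neural network pruning performs remarkably well at reducing the complexity of deep network models. Recent network pruning methods usually focus on removing unimportant or redundant filters from the network. In this paper, by exploring the similarities between feature maps, we propose a novel filter pruning method, Central Filter (CF), which suggests that a filter is approximately equal to a set of other filters after appropriate adjustments. Our method is based on the discovery that the average similarity between feature maps changes very little, regardless of the number of input images. Based on this finding, we build similarity graphs on feature maps and compute the closeness centrality of each node to select the Central Filter. Moreover, we design a method to directly adjust the weights in the next layer corresponding to the Central Filter, effectively minimizing the error caused by pruning. In experiments on various benchmark networks and datasets, CF yields state-of-the-art performance. For example, with ResNet-56, CF reduces FLOPs by approximately 39.7% by removing 47.1% of the parameters, and even improves accuracy by 0.33% on CIFAR-10. With GoogLeNet, CF reduces FLOPs by approximately 63.2% by removing 55.6% of the parameters, with only a small loss of 0.35% in top-1 accuracy on CIFAR-10. With ResNet-50, CF reduces FLOPs by approximately 47.9% by removing 36.9% of the parameters, with only a small loss of 1.07% in top-1 accuracy on ImageNet. The code is available at https://github.com/8ubpshlr23/centrter.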
This paper proposes a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model after pruning. SFP has two advantages over previous works: (1) Larger model capacity. Updating previously pruned filters provides our approach with a larger optimization space than fixing the filters to zero. Therefore, the network trained by our method has a larger model capacity to learn from the training data. (2) Less dependence on the pre-trained model. Large capacity enables SFP to train from scratch and prune the model simultaneously. In contrast, previous filter pruning methods must be conducted on the basis of a pre-trained model to guarantee their performance. Empirically, SFP from scratch outperforms the previous filter pruning methods. Moreover, our approach has been demonstrated to be effective for many advanced CNN architectures. Notably, on ILSVRC-2012, SFP reduces more than 42% FLOPs on ResNet-101 with even 0.2% top-5 accuracy improvement, which has advanced the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/softfilter-pruning
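The difference between soft pruning and fixing filters to zero can be illustrated with a small sketch. It is not the authors' code: it assumes filters are ranked by their L2 norm (the abstract does not state the criterion), represents each filter as a flat list of weights, and uses hypothetical helper names. The point it shows is that a zeroed filter still receives gradient updates and can become non-zero again.

```python
import math


def l2_norm(filt):
    """Euclidean norm of a filter given as a flat list of weights."""
    return math.sqrt(sum(w * w for w in filt))


def soft_prune(filters, prune_rate):
    """Zero the lowest-norm filters but keep them in the model."""
    n_zero = int(len(filters) * prune_rate)
    order = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]))
    zeroed = set(order[:n_zero])
    return [
        [0.0] * len(f) if i in zeroed else list(f)
        for i, f in enumerate(filters)
    ]


def sgd_step(filters, grads, lr):
    """Plain gradient step applied to every filter, zeroed or not."""
    return [
        [w - lr * g for w, g in zip(f, gf)]
        for f, gf in zip(filters, grads)
    ]


filters = [[0.5, -0.5], [0.05, 0.0], [1.0, 1.0]]
after_prune = soft_prune(filters, prune_rate=0.4)   # filter 1 is zeroed
grads = [[0.0, 0.0], [-1.0, -1.0], [0.0, 0.0]]      # made-up gradients
after_step = sgd_step(after_prune, grads, lr=0.1)   # filter 1 is updated again
print(after_prune[1], after_step[1])
```

In a training loop, `soft_prune` would be applied at the end of each epoch, and only after the final epoch would the zeroed filters actually be removed.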
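Convolutional neural networks (CNNs) contain a considerable amount of parameter redundancy; filter pruning aims to remove the redundant filters and makes it possible to apply CNNs on terminal devices. However, previous works pay more attention to designing evaluation criteria for filter importance and then prune the less important filters with a fixed pruning rate or a fixed number to reduce the redundancy of convolutional neural networks. They do not consider how many filters it is most reasonable to keep in each layer. From this perspective, we propose a new filter pruning method based on searching for the proper number of filters (SNF). SNF is dedicated to searching for the most reasonable number of filters to keep in each layer and then pruning filters with specific criteria. It can tailor the most suitable network structure for different FLOPs budgets. Filter pruning with our method leads to state-of-the-art (SOTA) accuracy on CIFAR-10 and achieves competitive performance on ImageNet ILSVRC-2012. SNF based on the ResNet-56 network achieves an increase of 0.14% in top-1 accuracy at a 52.94% FLOPs reduction on CIFAR-10. Pruning ResNet-110 on CIFAR-10 also improves top-1 accuracy by 0.03% while reducing FLOPs by 68.68%. For ImageNet, we set the pruning rate to 52.10% of FLOPs, and top-1 accuracy drops by only 0.74%. The code is available at https://github.com/pk-l/snf.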
Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resource-limited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance of non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion is not updated, very little damage would be done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over the state of the art in terms of FLOPs and parameters reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2%-FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With Res-50, we achieve a 43.8%-FLOPs reduction by removing 36.7% of the parameters, with only a loss of 1.17% in the top-1 accuracy on ImageNet. The code is available at https://github.com/lmbxmu/HRank.
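Network compression is crucial for making deep networks more efficient, faster, and generalizable to low-end hardware. Current network compression methods have two open problems: first, there is no theoretical framework for estimating the maximum compression rate; second, some layers may be over-pruned, resulting in a significant drop in network performance. To solve these two problems, this study proposes a method based on gradient-matrix analysis to estimate the maximum network redundancy. Guided by this maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to maximally condense the neural network structure without sacrificing network performance. Substantial experiments are performed to demonstrate the efficacy of the new method in pruning several advanced convolutional neural network (CNN) architectures. Compared to existing pruning methods, the proposed pruning algorithm achieves state-of-the-art performance. At the same or a similar compression ratio, the new method provides the highest network prediction accuracy among the compared methods.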
We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries. We formally establish filter pruning as an optimization problem, and reveal that we need to prune filters based on statistics information computed from its next layer, not the current layer, which differentiates ThiNet from existing methods. Experimental results demonstrate the effectiveness of this strategy, which has advanced the state-of-the-art. We also show the performance of ThiNet on ILSVRC-12 benchmark. ThiNet achieves 3.31× FLOPs reduction and 16.63× compression on VGG-16, with only 0.52% top-5 accuracy drop. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can also reduce more than half of the parameters and FLOPs, at the cost of roughly 1% top-5 accuracy drop. Moreover, the original VGG-16 model can be further pruned into a very small model with only 5.05MB model size, preserving AlexNet level accuracy but showing much stronger generalization ability.
Low-rankness plays an important role in traditional machine learning, but is not so popular in deep learning. Most previous low-rank network compression methods compress the networks by approximating pre-trained models and re-training. However, the optimal solution in the Euclidean space may be quite different from the one in the low-rank manifold. A well-pre-trained model is not a good initialization for the model with low-rank constraints. Thus, the performance of a low-rank compressed network degrades significantly. Compared to other network compression methods such as pruning, low-rank methods have attracted less attention in recent years. In this paper, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. First, we propose to alternately perform stochastic gradient descent training and projection onto the low-rank manifold. Compared to re-training on the compact model, this enables full utilization of model capacity since the solution space is relaxed back to Euclidean space after projection. Second, the matrix energy (the sum of squares of singular values) reduction caused by projection is compensated by energy transfer. We uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. Third, we propose batch normalization (BN) rectification to cut off its effect on the optimal low-rank approximation of the weight matrix, which further improves the performance. Comprehensive experiments on CIFAR-10 and ImageNet have justified that our method is superior to other low-rank compression methods and also outperforms recent state-of-the-art pruning methods. Our code is available at https://github.com/BZQLin/LRPET.
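The energy-transfer idea can be written out as a short worked derivation. The abstract defines matrix energy as the sum of squared singular values and says the pruned energy is transferred uniformly to the remaining singular values; the specific choice below, a single common scaling factor $\alpha$, is one possible reading and is an assumption rather than the paper's stated formula. The symbols $\sigma_i$, $r$, $n$, and $\alpha$ are introduced here for illustration.

```latex
% Singular values of the weight matrix W, sorted in decreasing order:
%   \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0
% Matrix energy, as defined in the abstract:
E(W) = \sum_{i=1}^{n} \sigma_i^2
% Projection onto rank r keeps \sigma_1, \dots, \sigma_r and loses the energy
\Delta E = \sum_{i=r+1}^{n} \sigma_i^2
% Assumed compensation: rescale every kept singular value by the same factor
\sigma_i' = \alpha \, \sigma_i, \qquad
\alpha = \sqrt{\frac{\sum_{i=1}^{n} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2}},
\qquad i = 1, \dots, r
% so that the projected matrix has the same energy as the original:
\sum_{i=1}^{r} \sigma_i'^2 = \alpha^2 \sum_{i=1}^{r} \sigma_i^2 = E(W)
```

Under this reading, $\alpha \ge 1$ whenever any energy is pruned, so the kept singular values grow slightly after each projection, which is consistent with the abstract's claim that energy transfer counteracts the shrinkage caused by projection.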
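Weight pruning is an effective model compression technique for tackling the challenge of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios because of accuracy degradation, difficulty in leveraging hardware acceleration, and restrictions to certain types of DNN layers. In this paper, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers, enabled by our compiler optimizations, we further explore the new problem of determining the best-suited pruning scheme given the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods, one search-based and the other rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48× and 1.73× DNN inference acceleration on the CIFAR-10 and ImageNet datasets, respectively, without accuracy loss.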
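Filter pruning has been widely used for neural network compression because of the practical acceleration it enables. To date, most existing filter pruning works explore the importance of filters using intra-channel information. In this paper, starting from an inter-channel perspective, we propose to perform efficient filter pruning using channel independence, a metric that measures the correlations among different feature maps. A less independent feature map is interpreted as containing less useful information/knowledge, and hence its corresponding filter can be pruned without affecting model capacity. We systematically investigate the quantification metric, measuring scheme, and sensitivity/reliability of channel independence in the context of filter pruning. Our evaluation results for different models on various datasets show the superior performance of our approach. Notably, on the CIFAR-10 dataset our solution brings accuracy increases of 0.75% and 0.94% over the baseline ResNet-56 and ResNet-110 models, respectively. Meanwhile, the model size and FLOPs are reduced by 42.8% and 47.4% (for ResNet-56) and by 48.3% and 52.1% (for ResNet-110), respectively. On the ImageNet dataset, our approach achieves reductions of 40.8% and 44.8%, respectively, with a 0.15% accuracy increase. The code is available at https://github.com/eclipsess/chip_neurivs2021.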
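Pruning techniques are used extensively to compress convolutional neural networks (CNNs) for image classification. However, most pruning methods require a well pre-trained model to provide useful supporting parameters, such as the L1 norm, batch-normalization values, and gradient information, which may lead to inconsistent filter evaluation if the parameters of the pre-trained model are not well optimized. Therefore, we propose a sensitivity-based method that evaluates the importance of each layer by adding extra damage to the original model. Because accuracy is determined by the distribution of parameters across all layers rather than by individual parameters, the sensitivity-based method is robust to parameter updates. That is, we obtain similar importance evaluations for each convolutional layer from imperfectly trained and fully trained models. For VGG-16 on CIFAR-10, even when the original model is trained for only 50 epochs, we obtain the same evaluation of layer importance as when the model is fully trained. We then remove filters from each layer in proportion to the quantified sensitivity. Our sensitivity-based pruning framework is validated on CIFAR-10, MNIST, and CIFAR-100, with VGG-16 among the evaluated networks.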
In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least square reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances the compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results with a 5× speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks like ResNet and Xception, suffering only 1.4% and 1.0% accuracy loss under 2× speed-up, respectively, which is significant. Code has been made publicly available.