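The over-parameterized nature of deep neural networks (DNNs) poses considerable obstacles to deployment on low-end devices with time and space constraints. Network pruning strategies that sparsify DNNs using iterative prune-train schemes are often computationally expensive. As a result, techniques that prune at initialization, before training, have become increasingly popular. In this work, we propose neuron-to-neuron skip (N2NSkip) connections, which act as sparse weighted skip connections, to enhance the overall connectivity of pruned DNNs. Following a preliminary pruning step, N2NSkip connections are randomly added between individual neurons/channels of the pruned network while maintaining the overall sparsity of the network. We demonstrate that introducing N2NSkip connections into pruned networks yields significantly better performance than pruned networks without them, especially at high sparsity levels. In addition, we present a heat-diffusion-based connectivity analysis to quantitatively determine the connectivity of the pruned network relative to the reference network. We evaluate the efficacy of our approach on two different preliminary pruning methods that prune at initialization, and consistently obtain superior performance by exploiting the enhanced connectivity that N2NSkip connections provide.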
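Deep neural networks have been used successfully in a wide range of applications. However, because they contain millions of parameters, their highly complex nature causes problems during deployment in pipelines with low-latency requirements. It is therefore desirable to obtain lightweight neural networks that deliver the same performance at inference time. In this work, we propose a weight-based pruning method in which weights are pruned gradually based on their momentum over previous iterations. Each layer of the neural network is assigned an importance value based on its relative sparsity, followed by the magnitude of its weights in previous iterations. We evaluate our approach on networks such as AlexNet, VGG16, and ResNet50 using image classification datasets such as CIFAR-10 and CIFAR-100. We find that the results outperform previous methods in terms of accuracy and compression ratio. Our method achieves 15% compression for the same degradation in accuracy on both datasets.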
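Current deep neural networks (DNNs) are over-parameterized and use most of their neuronal connections during inference for every task. The human brain, by contrast, has developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy that introduces a simple importance-score metric to deactivate unimportant connections, tackling over-parameterization in DNNs and modulating their firing patterns. The aim is to find the smallest set of connections, i.e., a simpler sub-network, that can still solve a given task with comparable accuracy. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well with two different optimizers, Adam and SGD. The algorithm is not designed to minimize FLOPs given current hardware and software implementations, although it performs reasonably compared with the state of the art.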
Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.
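The abstract above does not spell out its saliency formula. As a minimal sketch, assuming connection sensitivity is the normalized magnitude of the product of each weight and its loss gradient on a mini-batch, single-shot pruning at initialization could look like this. The function names and toy numbers are hypothetical and not taken from the paper.

```python
def connection_sensitivity(weights, grads):
    """Normalized saliency |g_j * w_j| / sum_k |g_k * w_k| for each connection."""
    raw = [abs(w * g) for w, g in zip(weights, grads)]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw


def prune_mask(scores, keep):
    """Binary mask that retains the `keep` connections with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = set(ranked[:keep])
    return [1 if j in kept else 0 for j in range(len(scores))]


# Toy values (made up): initial weights and loss gradients from one mini-batch.
weights = [0.5, -1.0, 0.1, 2.0]
grads = [0.2, 0.05, -3.0, 0.01]

scores = connection_sensitivity(weights, grads)
mask = prune_mask(scores, keep=2)  # computed once, before any training
print(mask)
```

Note that the largest weight (2.0) is pruned here because its gradient is tiny, which is the point of a sensitivity score as opposed to a pure magnitude score.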
Pruning refers to the elimination of trivial weights from neural networks. The sub-networks within an overparameterized model produced after pruning are often called lottery tickets. This research aims to generate winning lottery tickets from a set of lottery tickets that can achieve accuracy similar to that of the original unpruned network. We introduce a novel winning ticket called Cyclic Overlapping Lottery Ticket (COLT), obtained by data splitting and cyclic retraining of the pruned network from scratch. We apply a cyclic pruning algorithm that keeps only the overlapping weights of different pruned models trained on different data segments. Our results demonstrate that COLT can achieve accuracies similar to those of the unpruned model while maintaining high sparsities. We show that the accuracy of COLT is on par with the winning tickets of the Lottery Ticket Hypothesis (LTH) and, at times, is better. Moreover, COLTs can be generated using fewer iterations than tickets generated by the popular Iterative Magnitude Pruning (IMP) method. In addition, we notice that COLTs generated on large datasets can be transferred to small ones without compromising performance, demonstrating their generalizing capability. We conduct all our experiments on the CIFAR-10, CIFAR-100, and TinyImageNet datasets and report performance superior to the state-of-the-art methods.
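The overlap step described above can be illustrated with a small sketch. It assumes each model copy is pruned by weight magnitude (an assumption; the abstract does not name the per-model criterion) and then keeps only the weights that survive in both masks. All names and values are hypothetical.

```python
def magnitude_mask(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def overlap(mask_a, mask_b):
    """A weight survives only if both pruned models kept it."""
    return [a & b for a, b in zip(mask_a, mask_b)]


# Hypothetical weights of two copies of one model, each trained on its own data split.
model_a = [0.9, -0.1, 0.4, 0.05, -0.7, 0.3]
model_b = [0.8, 0.5, -0.02, 0.1, -0.6, 0.35]

mask_a = magnitude_mask(model_a, sparsity=0.5)
mask_b = magnitude_mask(model_b, sparsity=0.5)
colt_mask = overlap(mask_a, mask_b)
print(colt_mask)
```

Because the intersection can only be as dense as the sparser of the two masks, each overlap round raises sparsity beyond the per-model pruning rate, which may be one reason fewer rounds are needed than with plain iterative pruning.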
Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first- and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over the state of the art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPs reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning.
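The abstract gives no formula, so the following is only a sketch of a first-order Taylor score under an assumed form: the importance of a filter is the squared sum of gradient-times-weight over its parameters, which approximates the squared change in loss if the filter were zeroed. The filter names and numbers are made up, and this is not the code from the linked repository.

```python
def taylor_importance(filter_weights, filter_grads):
    """First-order estimate of the squared loss change from removing one filter."""
    return sum(w * g for w, g in zip(filter_weights, filter_grads)) ** 2


# Hypothetical filters: name -> (flattened weights, flattened loss gradients).
filters = {
    "f0": ([0.5, -0.2], [0.1, 0.3]),
    "f1": ([1.0, 1.0], [0.01, -0.01]),
    "f2": ([-0.4, 0.6], [0.5, 0.5]),
}

scores = {name: taylor_importance(w, g) for name, (w, g) in filters.items()}
to_remove = min(scores, key=scores.get)  # lowest estimated contribution goes first
print(to_remove)
```

The filter with the largest weights ("f1") scores lowest here because its gradient terms cancel, which shows how a loss-based score can disagree with a magnitude-based one.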
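Network compression is crucial to making deep networks more efficient, faster, and deployable on low-end hardware. Current network compression methods have two open problems: first, there is no theoretical framework for estimating the maximum compression rate; second, some layers may be over-pruned, causing a significant drop in network performance. To address these two problems, this study proposes a gradient-matrix-based analysis method to estimate the maximum network redundancy. Guided by the maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to maximally condense the neural network structure without sacrificing network performance. Extensive experiments demonstrate the efficacy of the new method in pruning several advanced convolutional neural network (CNN) architectures. Compared with existing pruning methods, the proposed pruning algorithm achieves state-of-the-art performance. At the same or similar compression ratios, the new method provides the highest network prediction accuracy among the compared methods.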
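Existing methods for pruning deep neural networks focus on removing unnecessary parameters from a trained network and then fine-tuning the model to find a good solution that recovers the initial performance of the trained model. Unlike other works, our method pays special attention to the quality of the solution in the compressed model and to inference computation time by pruning neurons. By exploring the spectral radius of the Hessian, the proposed algorithm steers the parameters of the compressed model, which leads to better generalization on unseen data. Moreover, the method does not work with a pre-trained network and performs training and pruning simultaneously. Our results show that it improves on the state-of-the-art results for neuron compression. The method is able to achieve very small networks with a small accuracy drop across different neural network models.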
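Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Well-designed sparse neural networks therefore have the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL). Specifically, a weight is re-parameterized as the element-wise product of a trainable weight variable and a binary mask. Network connectivity is thus fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the proxy gradient of the STE should be positive, ensuring that the mask variables converge at their minima. After finding that Leaky ReLU, Softplus, and identity STEs satisfy this principle, we propose adopting the identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are highly unbalanced, so we propose normalizing the mask gradient of each feature to optimize mask-variable training. To train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. Because SCL requires neither pruning criteria nor hyperparameters defined by designers for network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results show that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained with SCL outperform state-of-the-art human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.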
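A minimal sketch of the re-parameterization described above, assuming the binary mask is a unit step of a real-valued mask variable thresholded at zero and that the regularizer is a coefficient `lam` times the number of active connections. The threshold, `lam`, and all function names are assumptions for illustration; the per-feature gradient normalization mentioned in the abstract is omitted.

```python
def step(m):
    """Unit step: a connection exists when its mask variable is non-negative."""
    return 1.0 if m >= 0 else 0.0


def forward(w, m):
    """Effective weight = trainable weight * binary mask."""
    return [wi * step(mi) for wi, mi in zip(w, m)]


def backward(w, m, grad_eff, lam):
    """Gradients w.r.t. weights and mask variables, given dLoss/d(effective weight).

    Identity STE: d step(m) / dm is treated as 1, so the mask variable receives
    grad_eff * w, plus lam from the connection-count regularizer.
    """
    grad_w = [g * step(mi) for g, mi in zip(grad_eff, m)]
    grad_m = [g * wi + lam for g, wi in zip(grad_eff, w)]
    return grad_w, grad_m


w = [0.5, -0.3]
m = [0.2, -0.1]  # the second connection is currently masked out
effective = forward(w, m)
grad_w, grad_m = backward(w, m, grad_eff=[1.0, 2.0], lam=0.1)
print(effective, grad_w, grad_m)
```

The masked connection still receives a mask gradient, so gradient descent can move its mask variable back above zero and re-enable it; that is what the straight-through estimator buys over a hard, non-differentiable mask.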
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization. * Equal contribution. † Work done while visiting UC Berkeley.
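Lightweight model design has become an important direction in applying deep learning, and pruning is an effective means of achieving large reductions in model parameters and FLOPs. Existing neural network pruning methods mostly start from the importance of parameters and design parameter-evaluation metrics to prune parameters iteratively. These methods do not study the problem from the perspective of model topology; they may be effective but not efficient, and they require completely different pruning for different datasets. In this paper, we study the graph structure of neural networks and propose regular graph based pruning (RGP) to perform one-shot neural network pruning. We generate a regular graph, set the node degree of the graph to meet the pruning ratio, and reduce the graph's average shortest path length by swapping edges to obtain the optimal edge distribution. Finally, the resulting graph is mapped onto a neural network structure to realize pruning. Experiments show that the average shortest path length of the graph is negatively correlated with the classification accuracy of the corresponding neural network, and that the proposed RGP preserves accuracy well at extremely high reductions in parameters (more than 90%) and FLOPs (more than 90%).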
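To make the graph quantity above concrete, here is a small sketch that computes the average shortest path length (ASPL) of two regular graphs with the same node degree. The circulant construction is an illustrative stand-in for the edge-swapping procedure, which the abstract does not detail, and both graphs are assumed to be connected.

```python
from collections import deque


def circulant(n, offsets):
    """Regular graph on n nodes: node i is linked to i + d and i - d for each offset d."""
    return {
        i: sorted({(i + s * d) % n for d in offsets for s in (1, -1)})
        for i in range(n)
    }


def average_shortest_path_length(adj):
    """Mean hop distance over all ordered node pairs, via breadth-first search."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ring = circulant(12, (1, 2))      # edges only between near neighbors
rewired = circulant(12, (1, 4))   # same degree, with longer-range edges
aspl_ring = average_shortest_path_length(ring)
aspl_rewired = average_shortest_path_length(rewired)
print(aspl_ring, aspl_rewired)
```

Both graphs have degree 4, so they correspond to the same pruning ratio, yet the second has the shorter average path. That difference in edge distribution at a fixed degree is what the edge swaps described above are meant to exploit.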
The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
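A sketch of whole-filter removal as described above. The ranking criterion used here, the sum of absolute kernel weights per filter, is an assumption, since the abstract only says the pruned filters have a small effect on output accuracy. Layers are plain nested lists and all names are hypothetical.

```python
def filter_l1_norms(layer):
    """layer[f][c] is the flattened kernel of filter f for input channel c."""
    return [sum(abs(x) for kernel in filt for x in kernel) for filt in layer]


def prune_filters(layer, next_layer, n_prune):
    """Drop the n_prune weakest filters and the matching input channels downstream."""
    norms = filter_l1_norms(layer)
    drop = set(sorted(range(len(layer)), key=lambda f: norms[f])[:n_prune])
    kept_layer = [filt for f, filt in enumerate(layer) if f not in drop]
    kept_next = [
        [kernel for c, kernel in enumerate(filt) if c not in drop]
        for filt in next_layer
    ]
    return kept_layer, kept_next


# Toy layer: 3 filters, 1 input channel, 2 weights per kernel.
layer = [[[1.0, -1.0]], [[0.1, 0.0]], [[0.5, 0.5]]]
# Next layer: 2 filters, each with one kernel per feature map produced above.
next_layer = [[[1], [2], [3]], [[4], [5], [6]]]

kept_layer, kept_next = prune_filters(layer, next_layer, n_prune=1)
print(kept_layer, kept_next)
```

Both returned layers are smaller but still dense, which is why this style of pruning works with ordinary dense matrix-multiplication libraries.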
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity. As research effort is focused on increasingly sophisticated pruning methods that lead to sparse subnetworks trainable from scratch, we argue for an orthogonal, under-explored theme: improving training techniques for pruned sub-networks, i.e., sparse training. Apart from the popular belief that only the quality of sparse masks matters for sparse training, in this paper we demonstrate an alternative opportunity: one can carefully customize the sparse training techniques to deviate from the default dense network training protocols, consisting of introducing "ghost" neurons and skip connections at the early stage of training, and strategically modifying the initialization as well as labels. Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks. By adopting our newly curated techniques, we demonstrate significant performance gains across various popular datasets (CIFAR-10, CIFAR-100, TinyImageNet), architectures (ResNet-18/32/104, Vgg16, MobileNet), and sparse mask options (lottery ticket, SNIP/GRASP, SynFlow, or even randomly pruning), compared to the default training protocols, especially at high sparsity levels. Code is at https://github.com/VITA-Group/ToST.
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9×, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the total number of parameters can be reduced by 13×, from 138 million to 10.3 million, again with no loss of accuracy.
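The three steps above (train, prune, retrain) can be sketched on a toy linear model. This is only an illustration under assumed choices: plain gradient descent, a magnitude criterion for "unimportant", and made-up data. It is not the authors' implementation.

```python
def train(w, mask, data, lr=0.1, epochs=500):
    """Gradient descent on mean squared error; masked weights are held at zero."""
    w = [wi * mi for wi, mi in zip(w, mask)]
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        w = [(wi - lr * gj) * mj for wi, gj, mj in zip(w, grad, mask)]
    return w


def prune_smallest(w, n_prune):
    """Mask out the n_prune weights with the smallest magnitude."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]))
    drop = set(order[:n_prune])
    return [0 if j in drop else 1 for j in range(len(w))]


# Toy data generated from y = 2*x0 + 0.05*x1 + 1*x2 (made up for illustration).
data = [
    ((1.0, 0.0, 1.0), 3.0),
    ((0.0, 1.0, 1.0), 1.05),
    ((1.0, 1.0, 0.0), 2.05),
    ((2.0, 1.0, 1.0), 5.05),
]

dense = train([0.0, 0.0, 0.0], [1, 1, 1], data)  # step 1: learn which connections matter
mask = prune_smallest(dense, n_prune=1)          # step 2: prune the unimportant ones
sparse = train(dense, mask, data)                # step 3: retrain the survivors
print(dense, mask, sparse)
```

For scale, 61 million / 6.7 million is about 9.1 and 138 million / 10.3 million is about 13.4, consistent with the 9× and 13× reductions quoted above.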
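Network pruning is a widely used technique for effectively compressing deep neural networks with little to no degradation in performance during inference. Iterative Magnitude Pruning (IMP) is one of the most established approaches to network pruning. It consists of several iterative training and pruning steps, in which a significant amount of the network's performance is lost after pruning and then recovered in the subsequent retraining phase. Although IMP is commonly used as a benchmark reference, it is often argued that a) it reaches suboptimal states by not incorporating sparsification into the training phase, b) its global selection criterion fails to properly determine optimal layer-wise pruning rates, and c) its iterative nature makes it slow and non-competitive. In light of recently proposed retraining techniques, we investigate these claims through rigorous and consistent experiments in which we compare IMP to pruning-during-training algorithms, evaluate proposed modifications of its selection criterion, and study the number of iterations and total training time actually required. We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches with no or only little computational overhead, that the global magnitude selection criterion is largely competitive with more complex approaches, and that only a few retraining epochs are needed in practice to achieve most of IMP's sparsity-versus-performance trade-off. Our goals are both to demonstrate that basic IMP can already provide state-of-the-art pruning results, on par with or even outperforming more complex or heavily parameterized approaches, and to establish a more realistic yet easily realizable baseline for future research.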
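A brief sketch of what a global magnitude selection criterion means in practice: one threshold is computed over the weights of all layers, so the per-layer pruning rates fall out of the weight distribution instead of being set by hand. Layer names and values are hypothetical, and ties at the threshold are all pruned in this simplified version.

```python
def global_magnitude_masks(layers, sparsity):
    """Prune the globally smallest weights; returns one binary mask per layer."""
    flat = sorted(abs(w) for layer in layers.values() for w in layer)
    n_prune = int(len(flat) * sparsity)
    threshold = flat[n_prune - 1] if n_prune > 0 else float("-inf")
    return {
        name: [0 if abs(w) <= threshold else 1 for w in layer]
        for name, layer in layers.items()
    }


def layer_sparsity(mask):
    """Fraction of weights removed from one layer."""
    return 1 - sum(mask) / len(mask)


# Hypothetical two-layer network with very different weight scales.
layers = {
    "conv": [0.9, 0.8, -0.7, 0.6],
    "fc": [0.05, -0.02, 0.5, 0.01],
}

masks = global_magnitude_masks(layers, sparsity=0.375)
rates = {name: layer_sparsity(mask) for name, mask in masks.items()}
print(masks, rates)
```

At 37.5% overall sparsity the small-scale layer loses 75% of its weights while the other loses none. Raising the target to 50% in this toy example would remove the "fc" layer entirely, which illustrates the concern in claim b) above about a global criterion choosing poor layer-wise rates.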
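Graph neural networks (GNNs) tend to suffer from high computational costs because of the exponentially growing scale of graph data and the number of model parameters, which limits their utility in practical applications. To this end, some recent works focus on sparsifying GNNs with the lottery ticket hypothesis (LTH) to reduce inference costs while maintaining performance. However, LTH-based methods have two major drawbacks: 1) they require exhaustive, iterative training of dense models, which incurs an extremely large training cost, and 2) they prune only graph structures and model parameters and ignore the node feature dimension, where significant redundancy exists. To overcome these limitations, we propose a comprehensive graph gradual pruning framework, termed CGP. This is achieved by designing a during-training graph pruning paradigm that dynamically prunes GNNs within a single training process. Unlike LTH-based methods, CGP requires no retraining, which significantly reduces computation costs. Furthermore, we design a joint strategy to comprehensively prune all three core elements of GNNs: graph structures, node features, and model parameters. To refine the pruning operation, we also introduce a regrowth process into the CGP framework to re-establish pruned but important connections. The proposed CGP is evaluated on node classification tasks across six GNN architectures, including shallow models (GCN and GAT), shallow-but-deep-propagation models (SGC and APPNP), and deep models (GCNII and ResGCN), on a total of 14 real-world graph datasets, including large-scale graph datasets from the challenging Open Graph Benchmark. Experiments show that the proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.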
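Recent advances in artificial intelligence (AI) at the Internet of Things (IoT)-enabled network edge have realized edge intelligence in several applications, such as smart agriculture, smart hospitals, and smart factories, by enabling low latency and computational efficiency. However, deploying state-of-the-art convolutional neural networks (CNNs) such as VGG-16 and ResNets on resource-constrained edge devices is practically infeasible because of their large number of parameters and floating-point operations (FLOPs). Network pruning, as a form of model compression, is therefore gaining attention for accelerating CNNs on low-power devices. State-of-the-art pruning approaches, whether structured or unstructured, do not consider the different underlying nature of the complexities exhibited by convolutional layers, and they follow a train-prune-retrain pipeline, which results in additional computational overhead. In this work, we propose a novel and computationally efficient pruning pipeline that exploits the inherent layer-level complexities of CNNs. Unlike typical methods, our complexity-driven algorithm selects a particular layer for filter pruning based on its contribution to overall network complexity. We follow a procedure that trains the pruned model directly and avoids the computationally complex ranking and fine-tuning steps. Moreover, we define three pruning modes, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to provide versatile compression of CNNs. Our results show the competitive performance of our approach in terms of accuracy and acceleration. Lastly, we present a trade-off between different resources and accuracy, which can help developers make the right decisions in resource-constrained IoT environments.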
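Channel pruning is used to reduce the number of weights in a convolutional neural network (CNN). Channel pruning removes slices of the weight tensor so that the convolution layer remains dense. Removing these weight slices from a single layer causes a mismatch in the number of feature maps between layers of the network. A simple solution is to force the number of feature maps between layers to match by removing weight slices from subsequent layers. This additional constraint becomes more apparent in DNNs with branches, where multiple channels need to be pruned together to keep the network dense. Popular pruning saliency metrics do not account for the structural dependencies that arise in DNNs with branches. We propose Domino metrics (built on existing channel saliency metrics) to reflect these structural constraints. We test Domino saliency metrics against the baseline channel saliency metrics on multiple networks with branches. Domino saliency metrics improve pruning rates in most of the networks tested, by up to 25% for AlexNet on CIFAR-10.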