Neural networks (NNs) have had a large impact on research and industry. However, as the accuracy of NNs has increased, so have their size, the number of computational operations required, and their energy consumption. This growth in resource consumption reduces the adoption of NNs and makes real-world deployment impractical. Consequently, NNs need to be compressed to make them available to a wider audience while lowering their runtime cost. In this work, we approach this challenge from a causal-inference perspective and propose a scoring mechanism to facilitate structured pruning of NNs. The approach is based on measuring mutual information under a maximum-entropy perturbation that is propagated sequentially through the NN. We demonstrate the method's performance on two datasets and across various NN sizes, and we show that it achieves competitive performance under challenging conditions.
Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first- and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over the state of the art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPs reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning.
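As an illustration of the kind of scoring this abstract describes, here is a minimal sketch (not the authors' released code) of a first-order Taylor criterion for convolutional filters, where each filter is scored by the squared sum of gradient-times-weight over its parameters; the function name and the toy model are ours.

```python
# Minimal sketch: first-order Taylor importance for conv filters,
# scored as the squared sum of gradient * weight per filter.
import torch
import torch.nn as nn

def taylor_filter_importance(model: nn.Module, loss: torch.Tensor) -> dict:
    """Return {layer_name: per-filter importance} after backprop of `loss`."""
    loss.backward()
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d) and module.weight.grad is not None:
            w, g = module.weight, module.weight.grad
            # Sum g*w over each filter's parameters, then square (one score per filter).
            scores[name] = (g * w).flatten(1).sum(dim=1).pow(2).detach()
    return scores

# Usage: filters with the smallest scores are candidates for removal.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 4, 3))
x, y = torch.randn(2, 3, 16, 16), torch.randn(2, 4, 12, 12)
loss = nn.functional.mse_loss(model(x), y)
print({k: v.shape for k, v in taylor_filter_importance(model, loss).items()})
```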
We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation, a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, e.g., the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102), relying only on first-order gradient information. We also show that pruning can lead to more than a 10× theoretical reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier. Finally, we show results for the large-scale ImageNet dataset to emphasize the flexibility of our approach.
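For the feature-map flavor of this criterion, a hedged sketch: a feature map's importance is approximated by the absolute mean of activation times its gradient over the batch and spatial dimensions. The hook wiring and toy model below are illustrative, not taken from the paper.

```python
# Hedged sketch of a feature-map Taylor criterion:
# importance of map k ~ | mean over batch/space of (activation * its gradient) |.
import torch
import torch.nn as nn

acts = {}

def attach(module: nn.Module, name: str):
    def fwd_hook(mod, inp, out):
        out.retain_grad()          # keep the gradient of this feature map
        acts[name] = out
    module.register_forward_hook(fwd_hook)

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.LazyLinear(10))
attach(model[0], "conv0")

x, y = torch.randn(4, 3, 16, 16), torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

a = acts["conv0"]                                            # shape [B, C, H, W]
importance = (a.detach() * a.grad).mean(dim=(0, 2, 3)).abs() # one score per feature map
print(importance)
```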
The importance of learning rate (LR) schedules on network pruning has been observed in a few recent works. As an example, Frankle and Carbin (2019) highlighted that winning tickets (i.e., accuracy-preserving subnetworks) cannot be found without applying a LR warmup schedule, and Renda, Frankle and Carbin (2020) demonstrated that rewinding the LR to its initial state at the end of each pruning cycle improves performance. In this paper, we go one step further by first providing a theoretical justification for the surprising effect of LR schedules. Next, we propose a LR schedule for network pruning called SILO, which stands for S-shaped Improved Learning rate Optimization. The advantages of SILO over existing state-of-the-art (SOTA) LR schedules are two-fold: (i) SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization. Specifically, SILO increases the LR upper bound (max_lr) in an S-shape. This leads to an improvement of 2% - 4% in extensive experiments with various types of networks (e.g., Vision Transformers, ResNet) on popular datasets such as ImageNet, CIFAR-10/100. (ii) In addition to the strong theoretical motivation, SILO is empirically optimal in the sense of matching an Oracle, which exhaustively searches for the optimal value of max_lr via grid search. We find that SILO is able to precisely adjust the value of max_lr to be within the Oracle optimized interval, resulting in performance competitive with the Oracle with significantly lower complexity.
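As a rough illustration of what an S-shaped max_lr schedule could look like, here is a hedged sketch that ramps the LR upper bound with a logistic curve across pruning cycles; the constants and the function name are illustrative, not the paper's.

```python
# Hedged sketch: an S-shaped schedule for the LR upper bound across pruning cycles.
import math

def s_shaped_max_lr(cycle: int, total_cycles: int,
                    lr_low: float = 0.01, lr_high: float = 0.1,
                    steepness: float = 8.0) -> float:
    """Logistic ramp of max_lr from lr_low to lr_high as pruning proceeds."""
    t = cycle / max(total_cycles - 1, 1)          # progress in [0, 1]
    s = 1.0 / (1.0 + math.exp(-steepness * (t - 0.5)))
    return lr_low + (lr_high - lr_low) * s

for c in range(5):
    print(f"cycle {c}: max_lr = {s_shaped_max_lr(c, 5):.4f}")
```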
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
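A minimal sketch of the channel-level sparsity mechanism, assuming the standard recipe of L1-penalizing BatchNorm scale factors during training and then pruning channels with small |gamma|; the penalty weight, threshold, and helper names are ours.

```python
# Minimal sketch: L1-penalize BatchNorm scale factors, then drop channels whose
# |gamma| falls below a global threshold after training.
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """Add lam * sum(|gamma|) over all BatchNorm layers to the training loss."""
    return lam * sum(bn.weight.abs().sum()
                     for bn in model.modules() if isinstance(bn, nn.BatchNorm2d))

def channels_to_keep(model: nn.Module, prune_fraction: float = 0.5) -> dict:
    """Rank all BN scale factors globally and keep the largest ones per layer."""
    gammas = torch.cat([bn.weight.detach().abs().flatten()
                        for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)])
    threshold = gammas.sort().values[int(prune_fraction * len(gammas))]
    return {name: (bn.weight.detach().abs() >= threshold).nonzero().flatten()
            for name, bn in model.named_modules() if isinstance(bn, nn.BatchNorm2d)}

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
loss = model(torch.randn(2, 3, 8, 8)).mean() + bn_l1_penalty(model)
loss.backward()
print(channels_to_keep(model))   # indices of surviving channels per BN layer
```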
To reduce the significant redundancy in deep Convolutional Neural Networks (CNNs), most existing methods prune neurons by only considering statistics of an individual layer or two consecutive layers (e.g., prune one layer to minimize the reconstruction error of the next layer), ignoring the effect of error propagation in deep networks. In contrast, we argue that it is essential to prune neurons in the entire neural network jointly based on a unified goal: minimizing the reconstruction error of important responses in the "final response layer" (FRL), which is the second-to-last layer before classification, for a pruned network to retain its predictive power. Specifically, we apply feature ranking techniques to measure the importance of each neuron in the FRL, and formulate network pruning as a binary integer optimization problem and derive a closed-form solution to it for pruning neurons in earlier layers. Based on our theoretical analysis, we propose the Neuron Importance Score Propagation (NISP) algorithm to propagate the importance scores of final responses to every neuron in the network. The CNN is pruned by removing neurons with least importance, and then fine-tuned to retain its predictive power. NISP is evaluated on several datasets with multiple CNN models and demonstrated to achieve significant acceleration and compression with negligible accuracy loss.
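One common reading of the propagation rule can be sketched on a toy fully-connected network: final-layer importance scores are pushed backwards through the absolute weight matrices so that every earlier neuron receives a score. This is a hedged illustration, not the released NISP implementation.

```python
# Hedged sketch: propagate final-layer importance scores backwards through |W|.
import torch

def propagate_importance(weights: list[torch.Tensor],
                         final_scores: torch.Tensor) -> list[torch.Tensor]:
    """weights[l] has shape [out_l, in_l]; returns per-neuron scores per layer."""
    scores = [final_scores]
    for W in reversed(weights):
        scores.append(W.abs().t() @ scores[-1])   # s_in = |W|^T s_out
    return list(reversed(scores))                 # input layer ... final layer

# Toy 3-layer MLP: 8 -> 6 -> 4 neurons, with scores given for the last layer.
weights = [torch.randn(6, 8), torch.randn(4, 6)]
final_scores = torch.rand(4)                      # e.g., from feature ranking
for layer, s in enumerate(propagate_importance(weights, final_scores)):
    print(f"layer {layer}: {s}")
```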
Current deep neural networks (DNNs) are over-parameterized and use most of their neuronal connections during inference for each task. The human brain, however, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy that introduces a simple importance-score metric to deactivate unimportant connections, tackling over-parameterization in DNNs and modulating the firing patterns. The aim is to find the smallest number of connections that is still capable of solving a given task with comparable accuracy, i.e., a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two different optimizers considered, Adam and SGD. The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations, although it performs reasonably compared to the state of the art.
This work explores the search for heterogeneous approximate-multiplier configurations for neural networks that produce high accuracy and low energy consumption. We discuss the validity of additive Gaussian noise added to accurate neural network computations as a surrogate model for the behavior of approximate multipliers. The continuous and differentiable properties of the solution space spanned by the additive Gaussian noise model are used as a heuristic that produces meaningful estimates of layer robustness without the need for combinatorial optimization techniques. Instead, the amount of noise injected into the accurate computations is learned during network training using backpropagation. A probabilistic model of multiplier error is presented to bridge the gap between the domains: the model estimates the standard deviation of the approximate-multiplier error, connecting solutions in the additive Gaussian noise space to actual hardware instances. Our experiments show that the combination of heterogeneous approximation and neural network retraining reduces the energy of multiplications by 70% to 79% for different ResNet variants on the CIFAR-10 dataset, with a Top-1 accuracy loss below one percentage point. For the more complex Tiny ImageNet task, our VGG16 model achieves a 53% reduction in energy consumption with a drop of 0.5 percentage points in Top-5 accuracy. We further demonstrate that our error model can predict the parameters of approximate multipliers with high accuracy in the context of the commonly used additive Gaussian noise (AGN) model. Our software implementation is available at https://github.com/etrommer/agn-approx.
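A hedged sketch of the surrogate idea: a layer that adds zero-mean Gaussian noise with a learnable standard deviation after an exact computation, so backpropagation can adjust the tolerated noise per layer. The module name, the softplus parameterization, and the scaling by activation magnitude are our assumptions, not the paper's exact formulation.

```python
# Hedged sketch: additive Gaussian noise with a learnable std, trained by backprop.
import torch
import torch.nn as nn

class LearnableAGN(nn.Module):
    """Adds N(0, softplus(rho)) noise, here scaled by the activation magnitude."""
    def __init__(self, init_rho: float = -3.0):
        super().__init__()
        self.rho = nn.Parameter(torch.tensor(init_rho))  # unconstrained std parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sigma = nn.functional.softplus(self.rho)
        if self.training:
            # Reparameterized noise keeps sigma differentiable.
            x = x + sigma * x.detach().abs() * torch.randn_like(x)
        return x

layer = nn.Sequential(nn.Conv2d(3, 8, 3), LearnableAGN(), nn.ReLU())
out = layer(torch.randn(2, 3, 16, 16))
out.mean().backward()
print("gradient w.r.t. the noise parameter:", layer[1].rho.grad)
```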
With the rapid development of deep learning, the sizes of neural networks have grown so large that training and inference often overwhelm hardware resources. Given that neural networks are often over-parameterized, one effective way to reduce this computational overhead is neural network pruning, which removes redundant parameters from a trained network. It has recently been observed that pruning can not only reduce computational overhead but also improve the empirical robustness of deep neural networks (NNs), possibly because it removes spurious correlations while preserving predictive accuracy. This paper demonstrates for the first time that pruning can generally improve the certified robustness of ReLU-based NNs under complete verification. Using the popular branch-and-bound (BaB) framework, we find that pruning can enhance the estimated bound tightness of certified robustness verification by alleviating linear relaxation and subdomain splitting problems. We validate our findings with off-the-shelf pruning methods and further present a new stability-based pruning method tailored to reducing neuron instability, which outperforms existing pruning methods in enhancing certified robustness. Our experiments show that by appropriately pruning an NN, its certified accuracy can be improved by up to 8.2% under standard training and by up to 24.5% under adversarial training on the CIFAR-10 dataset. We also observe the existence of certified lottery tickets that can match both the standard and the certified robust accuracies of the original dense models on different datasets. Our findings offer a new angle for studying the intriguing interaction between sparsity and robustness, i.e., interpreting the interplay of sparsity and certified robustness via neuron stability. Code is available at: https://github.com/vita-group/certifiedpruning.
Because sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. A well-designed sparse neural network therefore has the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL). Specifically, a weight is re-parameterized as the element-wise multiplication of a trainable weight variable and a binary mask. The network connectivity is thus fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the STE's proxy gradient should be positive, ensuring that the mask variables converge at their minima. After finding that the leaky ReLU, softplus, and identity STEs satisfy this principle, we propose to adopt the identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are very unbalanced, and therefore propose to normalize each feature's mask gradient to optimize mask-variable training. To train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. Because SCL does not require pruning criteria or hyperparameters defined per network layer by designers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results show that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained with SCL outperform SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
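The re-parameterization can be sketched with a custom autograd function implementing the unit-step forward pass and the identity STE backward pass, plus a connection-count regularizer; the layer and function names are ours, and the per-feature mask-gradient normalization from the abstract is omitted.

```python
# Hedged sketch: weight = w * step(s), with a unit step forward and identity STE backward.
import torch
import torch.nn as nn

class StepIdentitySTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, s: torch.Tensor) -> torch.Tensor:
        return (s >= 0).to(s.dtype)          # binary mask

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        return grad_out                       # identity straight-through gradient

class MaskedLinear(nn.Module):
    def __init__(self, in_f: int, out_f: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.1)
        self.mask_logit = nn.Parameter(torch.full((out_f, in_f), 0.1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = StepIdentitySTE.apply(self.mask_logit)
        return x @ (self.weight * mask).t()

    def sparsity_penalty(self) -> torch.Tensor:
        # Regularize the (relaxed) number of active connections.
        return StepIdentitySTE.apply(self.mask_logit).sum()

layer = MaskedLinear(16, 4)
loss = layer(torch.randn(8, 16)).pow(2).mean() + 1e-3 * layer.sparsity_penalty()
loss.backward()
print("mask gradient norm:", layer.mask_logit.grad.norm())
```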
The leap in performance of state-of-the-art computer vision methods is attributed to the development of deep neural networks. However, it often comes at a computational price that may hinder their deployment. To alleviate this limitation, structured pruning is a well-known technique that consists in removing channels, neurons, or filters, and is commonly applied to produce more compact models. In most cases, the computations to remove are selected according to a relative-importance criterion. At the same time, the demand for interpretable predictive models has grown considerably and has motivated the development of robust attribution methods, which highlight the relative importance of pixels in the input image or of feature maps. In this work, we discuss the limitations of existing pruning heuristics, including magnitude-based and gradient-based ones. We draw inspiration from attribution methods to design a novel integrated-gradient pruning criterion, in which the relevance of each neuron is defined as the integral of the gradient variation along a path towards that neuron's removal. Furthermore, we propose an entwined DNN pruning and fine-tuning flowchart to better preserve DNN accuracy while removing parameters. We show through extensive validation on several datasets, architectures, and pruning scenarios that the proposed method, dubbed SInGE, significantly outperforms existing state-of-the-art DNN pruning methods.
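A hedged sketch of an integrated-gradient-style channel score: the loss gradient with respect to a per-channel gate is accumulated along the path from gate = 0 (channel removed) to gate = 1 (channel present). The gate formulation, step count, and absolute value are our assumptions, not necessarily SInGE's exact criterion.

```python
# Hedged sketch: integrate the loss gradient w.r.t. a per-channel gate
# along the path from "removed" (gate = 0) to "present" (gate = 1).
import torch
import torch.nn as nn

def integrated_gradient_channel_scores(model, gate, x, y, steps: int = 16):
    scores = torch.zeros_like(gate.detach())
    for s in range(1, steps + 1):
        alpha = s / steps
        gate_a = (alpha * gate).requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x, gate_a), y)
        scores += torch.autograd.grad(loss, gate_a)[0]
    return (scores / steps).abs()            # Riemann approximation of the integral

class GatedConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.head = nn.Linear(8, 10)
    def forward(self, x, gate):
        h = torch.relu(self.conv(x)) * gate.view(1, -1, 1, 1)
        return self.head(h.mean(dim=(2, 3)))

net, gate = GatedConvNet(), torch.ones(8)
x, y = torch.randn(4, 3, 16, 16), torch.randint(0, 10, (4,))
print(integrated_gradient_channel_scores(net, gate, x, y))
```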
Motivated by neuroscience and clinical applications, we empirically examine whether observational measures of information flow can suggest interventions. We do so by performing experiments on artificial neural networks in the context of fairness in machine learning, where the goal is to induce fairness in the system through interventions. Using our recently developed $M$-information flow framework, we measure the flow of information about the true label (responsible for accuracy, and hence desirable) and, separately, the flow of information about a protected attribute (responsible for bias, and hence undesirable) on the edges of a trained neural network. We then compare the flow magnitudes against the effect of intervening on those edges by pruning. We show that pruning edges that carry larger information flows about the protected attribute reduces bias at the output to a greater extent. This demonstrates that $M$-information flow can meaningfully suggest interventions, answering the titular question in the affirmative. We also evaluate the bias-accuracy tradeoffs of different intervention strategies, analyzing how one might use estimates of desirable and undesirable information flows (here, accuracy and bias flows) to inform interventions that preserve the former while reducing the latter.
The rectified linear unit (ReLU) is a highly successful activation function in neural networks as it allows networks to easily obtain sparse representations, which reduces overfitting in overparameterized networks. However, in network pruning, we find that the sparsity introduced by ReLU, which we quantify by a term called dynamic dead neuron rate (DNR), is not beneficial for the pruned network. Interestingly, the more the network is pruned, the smaller the dynamic DNR becomes during optimization. This motivates us to propose a method to explicitly reduce the dynamic DNR for the pruned network, i.e., de-sparsify the network. We refer to our method as Activating-while-Pruning (AP). We note that AP does not function as a stand-alone method, as it does not evaluate the importance of weights. Instead, it works in tandem with existing pruning methods and aims to improve their performance by selective activation of nodes to reduce the dynamic DNR. We conduct extensive experiments using popular networks (e.g., ResNet, VGG) via two classical and three state-of-the-art pruning methods. The experimental results on public datasets (e.g., CIFAR-10/100) suggest that AP works well with existing pruning methods and improves the performance by 3% - 4%. For larger scale datasets (e.g., ImageNet) and state-of-the-art networks (e.g., vision transformer), we observe an improvement of 2% - 3% with AP as opposed to without. Lastly, we conduct an ablation study to examine the effectiveness of the components comprising AP.
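A minimal sketch of how a dynamic dead neuron rate could be measured, assuming it is the fraction of exactly-zero ReLU outputs on a batch; the selective re-activation mechanism of AP itself is not reproduced here.

```python
# Hedged sketch: measure the fraction of zero ReLU outputs per layer via forward hooks.
import torch
import torch.nn as nn

def relu_dead_rates(model: nn.Module, x: torch.Tensor) -> dict:
    rates, handles = {}, []
    for name, m in model.named_modules():
        if isinstance(m, nn.ReLU):
            handles.append(m.register_forward_hook(
                lambda mod, inp, out, name=name: rates.__setitem__(
                    name, (out == 0).float().mean().item())))
    model(x)
    for h in handles:
        h.remove()
    return rates

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU())
print(relu_dead_rates(model, torch.randn(128, 20)))
```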
We propose a novel, structured pruning algorithm for neural networks: the iterative, Sparse Structured Pruning algorithm, dubbed I-SpaSP. Inspired by ideas from sparse signal recovery, I-SpaSP operates by iteratively identifying a larger set of important parameter groups within a network (e.g., filters or neurons) that contribute most to the residual between pruned and dense network output, then thresholding these groups based on a smaller, pre-defined pruning ratio. For both two-layer and multi-layer network architectures with ReLU activations, we show that the error induced by pruning with I-SpaSP decays polynomially, where the degree of this polynomial becomes arbitrarily large based on the sparsity of the dense network's hidden representations. In our experiments, I-SpaSP is evaluated across a variety of datasets (i.e., MNIST and ImageNet) and architectures (i.e., feed-forward networks, ResNet34, and MobileNetV2), where it is shown to discover high-performing sub-networks and to improve upon the pruning efficiency of provable baseline methodologies by several orders of magnitude. Put simply, I-SpaSP is easy to implement with automatic differentiation, achieves strong empirical results, comes with theoretical convergence guarantees, and is efficient, thus distinguishing itself as one of the few computationally efficient, practical, and provable pruning algorithms.
Neural network pruning, the task of reducing the size of a network by removing parameters, has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.
Pruning neural networks reduces inference time and memory costs. On standard hardware, these benefits are especially prominent if coarse-grained structures, such as feature maps, are pruned. We devise two novel saliency-based methods for second-order structured pruning (SOSP) that include correlations among all structures and layers. Our main method, SOSP-H, employs an innovative second-order approximation that enables saliency evaluations via fast Hessian-vector products. SOSP-H thereby scales like a first-order method despite taking the full Hessian into account. We validate SOSP-H by comparing it against our second method, which uses a well-established Hessian approximation, and against numerous state-of-the-art methods. While SOSP-H performs on par or better in terms of accuracy, it has clear advantages in terms of scalability and efficiency. This allows us to scale SOSP-H to large-scale vision tasks, even though it captures correlations across all layers of the network. To underscore the global nature of our pruning methods, we evaluate their performance not only by removing structures from a pretrained network, but also by detecting architectural bottlenecks. We show that our algorithms allow us to systematically reveal architectural bottlenecks, which we then remove to further increase the accuracy of the networks.
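The key primitive, a Hessian-vector product obtained by double backpropagation, can be sketched as follows to evaluate a second-order estimate of the loss change for a removal direction; the helper name and toy removal direction are ours.

```python
# Hedged sketch: Hessian-vector product via double backprop, used to evaluate
# a second-order saliency g^T d + 0.5 * d^T H d without forming H.
import torch
import torch.nn as nn

def second_order_saliency(loss, params, direction):
    """Approximate change in loss for perturbation `direction` of `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    g = torch.cat([g.flatten() for g in grads])
    hv = torch.autograd.grad(g @ direction, params)          # H @ direction
    hv = torch.cat([h.flatten() for h in hv])
    return (g @ direction + 0.5 * direction @ hv).item()

model = nn.Linear(10, 1)
params = list(model.parameters())
loss = model(torch.randn(32, 10)).pow(2).mean()

# Direction that zeroes out the first input weight (a crude "structure" removal).
direction = torch.zeros(sum(p.numel() for p in params))
direction[0] = -params[0].detach().flatten()[0]
print("estimated loss change:", second_order_saliency(loss, params, direction))
```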
Quantifying which neurons are important with respect to the classification decision of a trained neural network is essential for understanding their inner workings. Previous work primarily attributed importance to individual neurons. In this work, we study which groups of neurons contain synergistic or redundant information using a multivariate mutual information method called the O-information. We observe the first layer is dominated by redundancy suggesting general shared features (i.e. detecting edges) while the last layer is dominated by synergy indicating local class-specific features (i.e. concepts). Finally, we show the O-information can be used for multi-neuron importance. This can be demonstrated by re-training a synergistic sub-network, which results in a minimal change in performance. These results suggest our method can be used for pruning and unsupervised representation learning.
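A hedged sketch of the O-information, estimated here under a Gaussian assumption from sampled activations (a common but approximate estimator, not necessarily the one used in the paper); positive values indicate redundancy-dominated groups, negative values synergy-dominated ones.

```python
# Hedged sketch: Omega(X) = (n-2) H(X) + sum_i [H(X_i) - H(X_{-i})],
# with entropies computed under a Gaussian assumption.
import numpy as np

def gaussian_entropy(cov: np.ndarray) -> float:
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def o_information(samples: np.ndarray) -> float:
    """samples: [num_samples, n] activations of n neurons. >0 redundancy, <0 synergy."""
    n = samples.shape[1]
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(n)   # regularize
    total = (n - 2) * gaussian_entropy(cov)
    for i in range(n):
        rest = np.delete(np.delete(cov, i, axis=0), i, axis=1)
        total += gaussian_entropy(cov[i:i+1, i:i+1]) - gaussian_entropy(rest)
    return total

rng = np.random.default_rng(0)
shared = rng.normal(size=(5000, 1))                          # one common source
redundant = shared + 0.1 * rng.normal(size=(5000, 4))        # 4 noisy copies of it
print("redundant group:", o_information(redundant))          # expected > 0
```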
Modern deep neural networks are often too large to be used in many practical scenarios. Neural network pruning is an important technique for reducing the size of such models and accelerating inference. Gibbs pruning is a novel framework for expressing and designing neural network pruning methods. Combining approaches from statistical physics and stochastic regularization, it can train and prune a network simultaneously in such a way that the learned weights and pruning mask are well adapted to each other. It can be used for structured or unstructured pruning, and we propose a number of specific methods for each. We compare our proposed methods with a number of contemporary neural network pruning methods and find that Gibbs pruning outperforms them. In particular, we achieve a new state-of-the-art result for pruning ResNet-56 on the CIFAR-10 dataset.
Network compression is critical for making deep networks more efficient, faster, and generalizable to low-end hardware. Current network compression methods have two open problems: first, there is no theoretical framework for estimating the maximum compression rate; second, some layers may be pruned excessively, resulting in a substantial drop in network performance. To solve these two problems, this study proposes a gradient-matrix-analysis method to estimate the maximum network redundancy. Guided by this maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to maximally condense the neural network structure without sacrificing network performance. Substantial experiments are conducted to demonstrate the efficacy of the new method in pruning several advanced convolutional neural network (CNN) architectures. Compared with existing pruning methods, the proposed pruning algorithm achieves state-of-the-art performance. At the same or similar compression ratios, the new method provides the highest network prediction accuracy among the compared methods.
Fairness of machine learning (ML) software has become a major concern in the recent past. Although recent research on testing and improving fairness has demonstrated impact on real-world software, providing fairness guarantees in practice is still lacking. Certification of ML models is challenging because of the complex decision-making process of the models. In this paper, we propose Fairify, an SMT-based approach to verify the individual fairness property of neural network (NN) models. Individual fairness ensures that any two similar individuals get similar treatment irrespective of their protected attributes, e.g., race, sex, age. Verifying this fairness property is hard because of the global checking and non-linear computation nodes in NNs. We propose a sound approach to make individual fairness verification tractable for developers. The key idea is that many neurons in the NN always remain inactive when a smaller part of the input domain is considered. So, Fairify leverages whitebox access to the models in production and then applies formal-analysis-based pruning. Our approach adopts input partitioning and then prunes the NN for each partition to provide a fairness certification or a counterexample. We leverage interval arithmetic and activation heuristics of the neurons to perform the pruning as necessary. We evaluated Fairify on 25 real-world neural networks collected from four different sources, and demonstrated its effectiveness, scalability, and performance over baselines and closely related work. Fairify is also configurable based on the domain and size of the NN. Our novel formulation of the problem can answer targeted verification queries with relaxations and counterexamples, which have practical implications.
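A hedged sketch of the interval-arithmetic pruning step: input bounds for one partition are propagated through a ReLU network, and neurons whose pre-activation upper bound is non-positive are provably inactive on that partition and can be pruned; the SMT query itself is not shown, and the helper names are ours.

```python
# Hedged sketch: interval bound propagation to find provably inactive ReLU neurons
# on one input partition, which can then be pruned before verification.
import numpy as np

def interval_forward(W: np.ndarray, b: np.ndarray, lo: np.ndarray, hi: np.ndarray):
    """Pre-activation bounds of W x + b for x in [lo, hi] (interval arithmetic)."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def inactive_neurons(layers, lo, hi):
    """layers: list of (W, b). Returns per-layer boolean masks of prunable neurons."""
    masks = []
    for W, b in layers:
        pre_lo, pre_hi = interval_forward(W, b, lo, hi)
        masks.append(pre_hi <= 0)                      # ReLU output is provably 0
        lo, hi = np.maximum(pre_lo, 0), np.maximum(pre_hi, 0)
    return masks

rng = np.random.default_rng(1)
layers = [(rng.normal(size=(16, 4)), rng.normal(size=16)),
          (rng.normal(size=(8, 16)), rng.normal(size=8))]
lo, hi = np.zeros(4), 0.2 * np.ones(4)                 # one small input partition
print([m.sum() for m in inactive_neurons(layers, lo, hi)])
```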