Deep neural networks have been used in many successful applications. However, their highly complex nature, comprising millions of parameters, makes them problematic to deploy in pipelines with low latency requirements. As a result, lightweight neural networks that achieve the same performance at inference time are highly desirable. In this work, we propose a weight-based pruning approach in which weights are pruned gradually based on the momentum of their magnitudes over previous iterations. Each layer of the neural network is assigned an importance value based on its relative sparsity, followed by the magnitude of its weights in previous iterations. We evaluate our approach on networks such as AlexNet, VGG16, and ResNet50 with image classification datasets such as CIFAR-10 and CIFAR-100. We find that the results outperform previous approaches with respect to accuracy and compression ratio. Our approach is able to obtain 15% compression on both datasets for the same degradation in accuracy.
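A minimal sketch of how such momentum-guided gradual magnitude pruning could look; the update rule, the density-based layer weighting, and all hyperparameters below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def momentum_prune_step(weights, vel, beta=0.9, step_frac=0.05):
    """One gradual pruning step (illustrative, not the paper's exact rule)."""
    masks = {}
    for name, w in weights.items():
        # Running momentum of weight magnitudes across iterations.
        vel[name] = beta * vel[name] + (1 - beta) * np.abs(w)
        # Layer importance from relative density: layers that are still
        # dense get pruned harder than already-sparse ones.
        density = np.count_nonzero(w) / w.size
        k = int(step_frac * density * w.size)  # weights to drop this step
        flat = vel[name].ravel()
        if 0 < k < flat.size:
            thresh = np.partition(flat, k - 1)[k - 1]
            masks[name] = (vel[name] > thresh).astype(w.dtype)
        else:
            masks[name] = np.ones_like(w)
        weights[name] = w * masks[name]
    return weights, masks

# usage: vel starts as zeros_like each layer; call once per pruning round,
# with normal training steps in between.
```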
The over-parameterized nature of deep neural networks poses a significant hurdle during deployment on low-end devices with time and space constraints. Network pruning strategies that sparsify DNNs using iterative prune-train schemes are often computationally expensive. As a result, techniques that prune at initialization, prior to training, have become increasingly popular. In this work, we propose neuron-to-neuron skip (N2NSkip) connections, which are sparse weighted skip connections that enhance the overall connectivity of pruned DNNs. Following a preliminary pruning step, N2NSkip connections are randomly added between individual neurons/channels of the pruned network, while maintaining the overall sparsity of the network. We demonstrate that introducing N2NSkip connections in pruned networks enables significantly superior performance compared to pruned networks without N2NSkip connections, especially at high sparsity levels. Additionally, we present a heat-diffusion-based connectivity analysis to quantitatively determine the connectivity of the pruned network with respect to the reference network. We evaluate the efficacy of our approach on two different preliminary pruning methods that prune at initialization, and consistently obtain superior performance by leveraging the enhanced connectivity resulting from N2NSkip connections.
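A rough sketch of the reallocation idea as described: after preliminary pruning, part of the connection budget is moved into random skip connections so overall sparsity is unchanged. Which layers the skips bridge and how the budget is rebalanced are assumptions here:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_n2nskip(layer_masks, skip_frac=0.1):
    """Reallocate part of a pruned network's connection budget to random
    neuron-to-neuron skip connections (illustrative sketch).

    layer_masks: list of 0/1 arrays; layer_masks[i] has shape (n_out_i, n_in_i).
    Returns trimmed masks plus a skip mask connecting layer i's inputs
    directly to layer i+1's outputs (skipping one layer).
    """
    skip_masks = []
    for i in range(len(layer_masks) - 1):
        m0, m1 = layer_masks[i], layer_masks[i + 1]
        budget = int(skip_frac * (m0.sum() + m1.sum()))
        # Random sparse skip connections.
        skip = np.zeros((m1.shape[0], m0.shape[1]), dtype=m0.dtype)
        idx = rng.choice(skip.size, size=min(budget, skip.size), replace=False)
        skip.ravel()[idx] = 1
        # Drop an equal number of surviving regular connections so the
        # overall sparsity stays unchanged.
        alive = np.flatnonzero(m1)
        drop = rng.choice(alive, size=min(budget, alive.size), replace=False)
        m1.ravel()[drop] = 0
        skip_masks.append(skip)
    return layer_masks, skip_masks
```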
Recent advances in artificial intelligence (AI) at the network edge, enabled by the Internet of Things (IoT), have brought ambient intelligence to a variety of applications (e.g., smart agriculture, smart hospitals, and smart factories) by providing low latency and computational efficiency. However, deploying state-of-the-art convolutional neural networks (CNNs) such as VGG-16 and ResNet on resource-constrained edge devices is practically infeasible due to their large numbers of parameters and floating-point operations (FLOPs). Thus, the concept of network pruning as a type of model compression is gaining attention for accelerating CNNs on low-power devices. State-of-the-art pruning approaches, whether structured or unstructured, do not consider the different underlying nature of complexity exhibited by convolutional layers and follow a train-prune-retrain pipeline, which results in additional computational overhead. In this work, we propose a novel and computationally efficient pruning pipeline that exploits the inherent layer-level complexity of CNNs. Unlike typical approaches, our proposed complexity-driven algorithm selects a particular layer for filter pruning based on its contribution to overall network complexity. We follow a procedure that directly trains the pruned model and avoids computationally complex ranking and fine-tuning steps. Moreover, we define three modes of pruning, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to introduce versatile compression of CNNs. Our results show the competitive performance of our approach in terms of accuracy and acceleration. Finally, we present trade-offs between different resources and accuracy, which can help developers make the right decisions in resource-constrained IoT environments.
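A sketch of what complexity-driven layer selection could look like under the three modes; the formulas below are standard parameter/FLOPs/activation counts, not necessarily the paper's exact criterion:

```python
def layer_complexity(shapes, mode="FA"):
    """Score conv layers by their contribution to network complexity
    (illustrative; the paper's exact criterion may differ).

    shapes: list of dicts with out_ch, in_ch, k (kernel), h, w (output size).
    mode:   "PA" = parameter-aware, "FA" = FLOPs-aware, "MA" = memory-aware.
    """
    scores = []
    for s in shapes:
        params = s["out_ch"] * s["in_ch"] * s["k"] ** 2
        flops = params * s["h"] * s["w"]          # MACs per forward pass
        memory = s["out_ch"] * s["h"] * s["w"]    # activation footprint
        scores.append({"PA": params, "FA": flops, "MA": memory}[mode])
    total = sum(scores)
    return [x / total for x in scores]  # relative contribution per layer

# The layer with the largest relative contribution becomes the next
# filter-pruning candidate under the chosen mode.
layers = [
    {"out_ch": 64, "in_ch": 3, "k": 3, "h": 32, "w": 32},
    {"out_ch": 128, "in_ch": 64, "k": 3, "h": 16, "w": 16},
]
print(layer_complexity(layers, mode="FA"))
```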
Pruning refers to the elimination of trivial weights from neural networks. The sub-networks produced within an overparameterized model after pruning are often called lottery tickets. This research aims to generate winning lottery tickets from a set of lottery tickets that can achieve accuracy similar to that of the original unpruned network. We introduce a novel winning ticket called Cyclic Overlapping Lottery Ticket (COLT) by data splitting and cyclic retraining of the pruned network from scratch. We apply a cyclic pruning algorithm that keeps only the overlapping weights of different pruned models trained on different data segments. Our results demonstrate that COLT can achieve accuracies similar to those obtained by the unpruned model while maintaining high sparsity. We show that the accuracy of COLT is on par with the winning tickets of the Lottery Ticket Hypothesis (LTH) and, at times, is better. Moreover, COLTs can be generated using fewer iterations than tickets generated by the popular Iterative Magnitude Pruning (IMP) method. In addition, we observe that COLTs generated on large datasets can be transferred to smaller ones without compromising performance, demonstrating the method's generalizing capability. We conduct all our experiments on Cifar-10, Cifar-100 & TinyImageNet datasets and report performance superior to state-of-the-art methods.
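A compact sketch of the overlap step at the level of binary masks (training and rewinding omitted); `masks_per_split` is a hypothetical input holding one pruning mask per data segment:

```python
import numpy as np

def colt_masks(masks_per_split):
    """Keep only weights that survive pruning on every data split
    (the overlapping-ticket idea; training loop omitted).

    masks_per_split: list of dicts, layer name -> 0/1 mask from a model
    pruned after training on one data segment.
    """
    overlap = {}
    for name in masks_per_split[0]:
        m = masks_per_split[0][name].copy()
        for other in masks_per_split[1:]:
            m *= other[name]          # elementwise AND of binary masks
        overlap[name] = m
    return overlap

# Cyclic retraining (sketch): rewind the surviving weights to their initial
# values, retrain on each split, re-prune, intersect again, and repeat
# until the target sparsity is reached.
```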
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
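The core of network slimming is compact enough to sketch: an L1 penalty on BatchNorm scaling factors during training, followed by a global threshold on their magnitudes. A condensed PyTorch sketch (hyperparameters illustrative):

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model, lam=1e-4):
    """Channel-sparsity regularizer: L1 on the scaling factors (gamma)
    of every BatchNorm layer."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return lam * penalty

def select_channels(model, prune_ratio=0.7):
    """Global threshold on |gamma|: channels below it are pruned."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    thresh = gammas.sort().values[int(prune_ratio * gammas.numel())]
    return {name: (m.weight.detach().abs() > thresh)
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}

# during training: loss = criterion(output, target) + bn_l1_penalty(model)
```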
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.
Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.
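A condensed sketch of the connection-sensitivity idea: score each weight by |g ⊙ w| on one mini-batch at initialization and keep the top fraction globally. Details such as score normalization are simplified:

```python
import torch

def snip_masks(model, loss_fn, batch, keep_frac=0.05):
    """Connection-sensitivity saliency in the spirit of SNIP: score each
    weight by |grad * weight| on one mini-batch at initialization and
    keep the top fraction globally (condensed sketch)."""
    x, y = batch
    loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.dim() > 1]
    grads = torch.autograd.grad(loss, params)
    scores = [(g * p).abs() for g, p in zip(grads, params)]
    all_scores = torch.cat([s.flatten() for s in scores])
    k = max(1, int(keep_frac * all_scores.numel()))
    thresh = all_scores.topk(k).values.min()
    return [(s >= thresh).float() for s in scores]

# The sparse network defined by these masks is then trained normally.
```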
Existing approaches to pruning deep neural networks focus on removing unnecessary parameters from a trained network and then fine-tuning the model to find a good solution that recovers the initial performance of the trained model. Unlike other works, our approach pays special attention to the quality of the solution of the compressed model, and to the inference computation time, by pruning neurons. The proposed algorithm directs the parameters of the compressed model toward flatter solutions by exploring the spectral radius of the Hessian, which leads to better generalization on unseen data. Moreover, the approach does not require a pre-trained network, and performs training and pruning simultaneously. Our results show that it improves the state-of-the-art results on neuron compression. The approach is able to achieve very small networks with a small drop in accuracy across different neural network models.
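The abstract does not show how the spectral radius enters the pruning objective, but the quantity itself can be estimated with power iteration on Hessian-vector products; a sketch:

```python
import torch

def hessian_spectral_radius(loss, params, iters=20):
    """Estimate the dominant Hessian eigenvalue magnitude by power
    iteration on Hessian-vector products (how the paper folds this into
    the pruning objective is not shown here)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    vnorm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / vnorm for u in v]
    eig = None
    for _ in range(iters):
        # Hessian-vector product via a second backward pass.
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        eig = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (eig + 1e-12) for h in hv]
    return eig
```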
In machine learning, artificial neural networks (ANNs) are a very powerful tool, broadly used in many applications. Often, the selected (deep) architectures include many layers, and therefore a large amount of parameters, which makes training, storage, and inference expensive. This motivated a stream of research about compressing the original networks into smaller ones without excessively sacrificing performance. Among the many proposed compression approaches, one of the most popular is \emph{pruning}, whereby entire elements of the network (links, nodes, channels, \ldots) and the corresponding weights are deleted. Since the nature of the problem is inherently combinatorial (which elements to prune and which not), we propose a new pruning method based on operations research tools. We start from a natural mixed-integer programming model for the problem, and we use the perspective reformulation technique to strengthen its continuous relaxation. Projecting away the indicator variables from this reformulation yields a new regularization term, which we call structured regularization, that leads to structured pruning of the initial architecture. We test our method on some ResNet architectures applied to the CIFAR-10, CIFAR-100, and ImageNet datasets, obtaining competitive performance w.r.t. the state of the art.
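A hedged reconstruction of the perspective trick such a method builds on (the paper's exact model may differ): with indicators $z_i$ that force weight groups to zero, strengthening the relaxation with the perspective term $w_i^2/z_i$ and then projecting out $z$ gives a closed-form penalty:

```latex
% Indicator model: z_i = 0 forces weight (group) w_i to zero.
\min_{w,z}\; L(w) + \lambda \sum_i z_i
\quad \text{s.t.}\quad w_i^2 \le M z_i,\; z_i \in \{0,1\}.
% The perspective strengthening replaces the big-M constraint with the
% convex term w_i^2 / z_i; minimizing over z_i \in [0,1] yields
\min_{z_i \in [0,1]} \lambda z_i + \mu \frac{w_i^2}{z_i}
  = \begin{cases}
      2\sqrt{\lambda\mu}\,\lvert w_i \rvert, & \lvert w_i \rvert \le \sqrt{\lambda/\mu},\\
      \lambda + \mu w_i^2, & \text{otherwise},
    \end{cases}
```

a reverse-Huber-style regularizer that, applied group-wise, induces structured sparsity.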
Neural network pruning-the task of reducing the size of a network by removing parameters-has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.
The advent of deep learning has led to numerous applications that have transformed the landscape of the research areas to which it has been applied. However, with this increase in popularity, the complexity of classical deep neural networks has grown over the years. As a result, this leads to considerable problems during deployment on devices with space and time constraints. In this work, we present a review of the current advancements in non-volatile memory and of how the use of resistive RAM, in particular memristors, can help advance the state of research in deep learning. In other words, we wish to put forward the view that advances in the field of memory technology can greatly influence and impact deep learning inference on edge devices.
Current deep neural networks (DNNs) are overparameterized and use most of their neuronal connections during inference for each task. The human brain, however, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy introducing a simple importance-score metric that deactivates unimportant connections, tackling overparameterization in DNNs and modulating the firing patterns. The aim is to find the smallest number of connections that is still able to solve a given task with comparable accuracy, i.e., a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two different optimizers considered, Adam and SGD. The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations, although it performs reasonably when compared with the state of the art.
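The abstract leaves the importance score itself unspecified; the loop it describes is the standard iterative prune-retrain cycle, sketched here with plain magnitude standing in for the paper's metric:

```python
import numpy as np

def iterative_prune(weights, importance, rounds=5, drop_per_round=0.2):
    """Generic iterative pruning loop (sketch). `importance` stands in
    for the paper's score; plain magnitude is used below for
    illustration. Retraining between rounds is omitted."""
    masks = {n: np.ones_like(w) for n, w in weights.items()}
    for _ in range(rounds):
        for name, w in weights.items():
            score = importance(w) * masks[name]      # dead weights score 0
            alive = score[masks[name] > 0]
            k = int(drop_per_round * alive.size)
            if k == 0:
                continue
            thresh = np.sort(alive)[k - 1]
            masks[name] = (score > thresh).astype(w.dtype)
            weights[name] = w * masks[name]
        # ...retrain `weights` here before the next round...
    return weights, masks

weights = {"fc1": np.random.randn(100, 784)}
iterative_prune(weights, importance=np.abs)
```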
Since deep learning models usually contain millions of trainable weights, there is a growing demand for more efficient network structures with reduced storage space and improved run-time efficiency. Pruning is one of the most popular network compression techniques. In this paper, we propose a novel unstructured pruning pipeline, Attention-based Simultaneous sparse structure and Weight Learning (ASWL). Unlike traditional channel-wise or weight-wise attention mechanisms, ASWL proposes an efficient algorithm to calculate the pruning ratio of each layer through layer-wise attention, and both the weights of the dense network and those of the sparse network are tracked, so that the pruned structure is learned simultaneously from randomly initialized weights. Our experiments on MNIST, CIFAR10, and ImageNet show that ASWL achieves superior pruning results in terms of accuracy, pruning ratio, and operating efficiency when compared with state-of-the-art network pruning methods.
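The attention computation itself is not given in the abstract, so the sketch below only illustrates the surrounding mechanics: a softmax over per-layer statistics (mean |w| as a stand-in) allocates layer-wise keep ratios, and the sparse weights are re-derived from the tracked dense weights at each step:

```python
import numpy as np

def aswl_step(dense_weights, temperature=1.0, global_sparsity=0.9):
    """Sketch of the layer-wise-ratio idea; the actual ASWL attention is
    not public in this abstract, so mean |w| is a stand-in signal."""
    names = list(dense_weights)
    stats = np.array([np.abs(dense_weights[n]).mean() for n in names])
    attn = np.exp(stats / temperature)
    attn /= attn.sum()
    # Layers with higher attention keep a larger fraction of weights.
    keep = np.clip(attn * (1.0 - global_sparsity) * len(names), 0.0, 1.0)
    sparse = {}
    for n, frac in zip(names, keep):
        w = dense_weights[n]
        k = int((1.0 - frac) * w.size)        # weights to zero out
        if k == 0:
            sparse[n] = w.copy()
            continue
        thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        sparse[n] = np.where(np.abs(w) > thresh, w, 0.0)
    return sparse  # gradients flow to the dense copy during training
```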
Network compression is crucial to making deep networks more efficient, faster, and generalizable to low-end hardware. Current network compression methods have two open problems: first, there is no theoretical framework to estimate the maximum compression rate; second, some layers may get over-pruned, resulting in a significant drop in network performance. To solve these two problems, this study proposes a gradient-matrix-analysis-based method to estimate the maximum network redundancy. Guided by this maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to maximally condense the neuronal network structure without sacrificing network performance. Substantial experiments are performed to demonstrate the efficacy of the new method for pruning several advanced convolutional neural network (CNN) architectures. Compared with existing pruning methods, the proposed pruning algorithm achieves state-of-the-art performance. At the same or similar compression ratios, the new method provides the highest network prediction accuracy among the compared methods.
The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
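The mechanics of this kind of filter pruning are simple to sketch in PyTorch: rank filters by L1 norm and rebuild a thinner layer. Handling of the following layer's input channels and of BatchNorm is omitted:

```python
import torch
import torch.nn as nn

def smallest_filters(conv: nn.Conv2d, n_prune: int):
    """Rank filters by L1 norm and return the indices of the weakest
    ones, as in magnitude-based filter pruning."""
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    return torch.argsort(l1)[:n_prune]

def drop_filters(conv: nn.Conv2d, idx) -> nn.Conv2d:
    """Build a thinner conv layer without the selected filters (the next
    layer's input channels must be shrunk accordingly; omitted here)."""
    keep = [i for i in range(conv.out_channels) if i not in set(idx.tolist())]
    new = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                    conv.stride, conv.padding, bias=conv.bias is not None)
    new.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        new.bias.data = conv.bias.data[keep].clone()
    return new
```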
This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that can then be employed to learn new tasks. By performing iterative pruning and network re-training, we are able to sequentially "pack" multiple tasks into a single network while ensuring minimal drop in performance and minimal storage overhead. Unlike prior work that uses proxy losses to maintain accuracy on older tasks, we always optimize for the task at hand. We perform extensive experiments on a variety of network architectures and large-scale datasets, and observe much better robustness against catastrophic forgetting than prior work. In particular, we are able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task. Code available at https://github.com/arunmallya/packnet
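A sketch of the "pack" step at the level of masks: after training a task on the currently-free parameters, the strongest fraction is frozen for that task and the rest are released for future tasks. Function and variable names are illustrative:

```python
import torch

def pack_task(weights, free_mask, task_keep=0.5):
    """Freeze the strongest currently-free weights for the task just
    trained; release the rest (sketch of the PackNet pack step)."""
    task_masks = {}
    for name, w in weights.items():
        free = free_mask[name].bool()
        scores = w.abs() * free                    # only free weights compete
        k = int(task_keep * int(free.sum()))
        thresh = scores[free].topk(k).values.min() if k > 0 else float("inf")
        keep = (scores >= thresh) & free
        task_masks[name] = keep
        free_mask[name] = free & ~keep             # frozen for this task now
        w.data[free_mask[name]] = 0.0              # freed weights reset
    return task_masks

# inference for task t: forward with w masked by the union of task masks
# up to t, which is why older tasks are never disturbed.
```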
Modern deep neural networks are often too large to use in many practical scenarios. Neural network pruning is an important technique for reducing the size of such models and accelerating inference. Gibbs pruning is a novel framework for expressing and designing neural network pruning methods. Combining approaches from statistical physics and stochastic regularization methods, it can train and prune a network simultaneously, in such a way that the learned weights and pruning mask are well-adapted to each other. It can be used for structured or unstructured pruning, and we propose a number of specific methods for each. We compare our proposed methods with a number of contemporary neural network pruning methods and find that Gibbs pruning outperforms them. In particular, we achieve a new state-of-the-art result for pruning ResNet-56 with the CIFAR-10 dataset.
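The abstract suggests masks drawn from a Gibbs distribution whose energy couples mask and weights; one natural Hamiltonian (an assumption here, not necessarily the paper's) is a magnitude coupling under a fixed sparsity budget:

```latex
P(m \mid w) \;=\; \frac{1}{Z(\beta)} \exp\!\bigl(-\beta\, H(m, w)\bigr),
\qquad
H(m, w) \;=\; -\sum_i m_i\, w_i^2,
\qquad
m \in \{0,1\}^N,\;\; \sum_i m_i = k.
```

Annealing the inverse temperature $\beta$ during training makes the mask distribution concentrate on the top-$k$-magnitude masks, so weights and mask co-adapt.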
Pruning techniques are used comprehensively to compress convolutional neural networks (CNNs) for image classification. However, most pruning methods require a well pre-trained model to provide useful supporting parameters, such as the L1-norm, BatchNorm values, and gradient information, which may lead to inconsistency in filter evaluation if the parameters of the pre-trained model are not well optimized. Therefore, we propose a sensitivity-based method to evaluate the importance of each layer by adding extra damage to the original model. Because the performance in terms of accuracy depends on the distribution of parameters across all layers rather than on individual parameters, the sensitivity-based method is robust to updates of the parameters. That is, we can obtain similar importance evaluations of each convolutional layer between an imperfectly trained model and a fully trained model. For VGG-16 on CIFAR-10, even when the original model is trained for only 50 epochs, we obtain the same evaluation of layer importance as when the model is fully trained. We then remove filters from each layer in proportion to the quantified sensitivity. Our sensitivity-based pruning framework is verified efficiently on VGG-16 with CIFAR-10, MNIST, and CIFAR-100, respectively.
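A sketch of the damage-based scoring described: perturb one layer at a time and record the loss increase. The exact form of the injected damage is an assumption here:

```python
import torch

@torch.no_grad()
def layer_sensitivity(model, loss_fn, batch, noise=0.1):
    """Score each conv layer by how much the loss grows when its weights
    are damaged with multiplicative noise (sketch; the paper's exact
    perturbation is not specified in the abstract)."""
    x, y = batch
    base = loss_fn(model(x), y).item()
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Conv2d):
            saved = module.weight.clone()
            module.weight.mul_(1 + noise * torch.randn_like(saved))
            scores[name] = loss_fn(model(x), y).item() - base
            module.weight.copy_(saved)   # restore the undamaged weights
    return scores  # higher score = more sensitive = prune fewer filters
```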
Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method: Sparse Connectivity Learning (SCL). Specifically, a weight is re-parameterized as an element-wise multiplication of a trainable weight variable and a binary mask. Thus, network connectivity is fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the proxy gradients of the STE should be positive, ensuring that the mask variables converge at their minima. After finding that the leaky ReLU, SoftPlus, and identity STEs satisfy this principle, we propose to adopt the identity STE in SCL for discrete mask relaxation. We also find that the mask gradients of different features are very unbalanced; hence, we propose to normalize the mask gradients of each feature to optimize mask variable training. In order to automatically train sparse masks, we include the total number of network connections as a regularization term in our objective function. Since SCL does not require pruning criteria or hyperparameters defined by designers for network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform the SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
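The reparameterization is straightforward to sketch: a unit step turns trainable scores into a binary mask in the forward pass, while the identity STE passes gradients through in the backward pass, and the live-connection count serves as the regularizer. Mask-gradient normalization is omitted:

```python
import torch

class StepSTE(torch.autograd.Function):
    """Unit step in the forward pass; identity straight-through estimator
    in the backward pass."""
    @staticmethod
    def forward(ctx, s):
        return (s >= 0).float()

    @staticmethod
    def backward(ctx, g):
        return g  # identity STE: pass the mask gradient straight through

class MaskedLinear(torch.nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(n_out, n_in) * 0.01)
        self.score = torch.nn.Parameter(torch.zeros(n_out, n_in))  # mask vars

    def forward(self, x):
        mask = StepSTE.apply(self.score)
        return x @ (self.weight * mask).t()

    def sparsity_penalty(self):
        # Differentiable (via STE) count of live connections, used as the
        # regularizer that drives automatic sparsification.
        return StepSTE.apply(self.score).sum()

# loss = task_loss + lam * sum(m.sparsity_penalty() for m in masked_layers)
```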
Over the last few years, the performance of neural networks has improved significantly at the cost of an increasing number of floating-point operations (FLOPs). However, more FLOPs can be an issue when computational resources are limited. As an attempt to solve this problem, pruning filters is a common solution, but most existing pruning methods do not preserve the model accuracy efficiently and therefore require a large number of fine-tuning epochs. In this paper, we propose an automatic pruning method that learns which neurons to preserve in order to maintain the model accuracy while reducing the FLOPs to a predefined target. To accomplish this task, we introduce a trainable bottleneck that requires only one single epoch with 25.6% (CIFAR-10) or 7.49% (ILSVRC2012) of the dataset to learn which filters to prune. Experiments on various architectures and datasets show that the proposed method can not only preserve the accuracy after pruning but also outperform existing methods after fine-tuning. We achieve a 52.00% FLOPs reduction on ResNet-50, with a top-1 accuracy of 47.51% after pruning and a state-of-the-art (SOTA) accuracy of 76.63% after fine-tuning on ILSVRC2012. Code is available (link anonymized for review).
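A sketch of what a trainable per-channel bottleneck with a FLOPs-style target could look like; the gating form and the penalty below are assumptions, not the paper's implementation:

```python
import torch

class ChannelGate(torch.nn.Module):
    """Trainable per-channel bottleneck inserted after a conv layer
    (sketch). Gates are trained briefly with a penalty steering the kept
    fraction toward a FLOPs target; channels whose gate stays low are
    then pruned."""
    def __init__(self, channels):
        super().__init__()
        self.logit = torch.nn.Parameter(torch.zeros(channels))

    def forward(self, x):                          # x: (N, C, H, W)
        g = torch.sigmoid(self.logit)
        return x * g.view(1, -1, 1, 1)

    def flops_penalty(self, target_frac):
        kept = torch.sigmoid(self.logit).mean()    # soft fraction kept
        return (kept - target_frac).pow(2)

# loss = task_loss + lam * sum(gate.flops_penalty(0.48) for gate in gates)
# afterwards: prune channels whose sigmoid(logit) falls below, e.g., 0.5
```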