One-shot network pruning at initialization (OPaI) is an effective way to reduce the cost of network pruning. Recently, there has been a growing belief that data is unnecessary in OPaI. However, we reach the opposite conclusion through ablation experiments on two representative OPaI methods, SNIP and GraSP. Specifically, we find that informative data is crucial to enhancing pruning performance. In this paper, we propose two novel methods, Discriminative One-shot Network Pruning (DOP) and Super Stitching, which prune networks using high-level visually discriminative image patches. Our contributions are as follows. (1) Extensive experiments reveal that OPaI is data-dependent. (2) Super Stitching performs significantly better than the original OPaI methods on the ImageNet benchmark, especially for highly compressed models.
Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.
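To make the criterion concrete, here is a minimal sketch of a connection-sensitivity-style saliency computed from a single batch, in the spirit of the method above: each weight is scored by |gradient × weight| and the globally largest fraction is kept. The PyTorch model, the global top-k thresholding, and the toy MLP are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def snip_style_masks(model: nn.Module, inputs, targets, sparsity=0.9):
    """One-batch connection sensitivity: score each weight by |dL/dw * w|,
    keep the globally largest (1 - sparsity) fraction, return 0/1 masks."""
    loss = nn.CrossEntropyLoss()(model(inputs), targets)
    weights = [p for p in model.parameters() if p.dim() > 1]   # weight tensors only
    grads = torch.autograd.grad(loss, weights)
    scores = [(g * w).abs() for g, w in zip(grads, weights)]
    flat = torch.cat([s.flatten() for s in scores])
    k = max(1, int((1.0 - sparsity) * flat.numel()))
    threshold = torch.topk(flat, k).values.min()               # global cut-off
    return [(s >= threshold).float() for s in scores]

# Tiny usage example on a toy MLP and random data.
if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
    x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
    masks = snip_style_masks(net, x, y, sparsity=0.9)
    kept = sum(int(m.sum()) for m in masks)
    total = sum(m.numel() for m in masks)
    print(f"kept {kept}/{total} weights")      # roughly 10% overall, split unevenly per layer
```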
Network pruning is an effective approach to reducing network complexity with an acceptable performance compromise. Existing studies achieve the sparsity of neural networks via time-consuming weight tuning or complex searching on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing and sparse sub-networks without the involvement of weight tuning, termed "lottery jackpots", exist in pre-trained models with expanded width. For example, we obtain a lottery jackpot that has only 10% of the parameters yet still reaches the performance of the original dense VGGNet-19, without any modifications of the pre-trained weights, on CIFAR-10. Furthermore, we observe that the sparse masks derived from many existing pruning criteria have a high overlap with the searched mask of our lottery jackpot, among which magnitude-based pruning yields the mask most similar to ours. Based on this insight, we initialize our sparse mask using magnitude-based pruning, resulting in at least a 3x reduction in the cost of the lottery jackpot search while achieving comparable or even better performance. Specifically, our magnitude-based lottery jackpot removes 90% of the weights in ResNet-50 while easily obtaining more than 70% top-1 accuracy using only 10 searching epochs on ImageNet. Our code is available at https://github.com/zyxxmu/lottery-jackpots.
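As a rough illustration of the magnitude-based mask initialization mentioned above, the sketch below builds per-layer masks that keep the largest-magnitude weights of a model; the randomly initialized stand-in network, the layer filtering, and the 90% sparsity value are assumptions for the example, not the paper's mask-search procedure itself.

```python
import torch
import torch.nn as nn

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude (1 - sparsity) fraction of entries in one tensor."""
    k = max(1, int(round((1.0 - sparsity) * weight.numel())))
    threshold = torch.topk(weight.abs().flatten(), k).values.min()
    return (weight.abs() >= threshold).float()

# Illustrative use: initialize search masks for every conv/linear layer at 90%
# sparsity; a mask-search procedure would then refine these masks while the
# pre-trained weights stay frozen.
if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
    masks = {name: magnitude_mask(p.data, 0.9)
             for name, p in net.named_parameters() if p.dim() > 1}
    for name, m in masks.items():
        print(name, f"kept {int(m.sum())}/{m.numel()} weights")
```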
Pruning refers to the elimination of trivial weights from neural networks. The sub-networks within an overparameterized model produced after pruning are often called lottery tickets. This research aims to generate winning lottery tickets from a set of lottery tickets that can achieve accuracy similar to that of the original unpruned network. We introduce a novel winning ticket called Cyclic Overlapping Lottery Ticket (COLT) by data splitting and cyclic retraining of the pruned network from scratch. We apply a cyclic pruning algorithm that keeps only the overlapping weights of different pruned models trained on different data segments. Our results demonstrate that COLT can achieve accuracies similar to those of the unpruned model while maintaining high sparsities. We show that the accuracy of COLT is on par with the winning tickets of the Lottery Ticket Hypothesis (LTH) and, at times, is better. Moreover, COLTs can be generated using fewer iterations than tickets generated by the popular Iterative Magnitude Pruning (IMP) method. In addition, we notice that COLTs generated on large datasets can be transferred to small ones without compromising performance, demonstrating their generalization capability. We conduct all our experiments on the CIFAR-10, CIFAR-100 & TinyImageNet datasets and report performance superior to the state-of-the-art methods.
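A minimal sketch of the overlap idea follows, assuming magnitude pruning of each split-trained copy and an elementwise intersection of the surviving weights; the helper names and the toy linear layers are hypothetical, and the real COLT procedure additionally retrains the overlapping network cyclically.

```python
import torch
import torch.nn as nn

def overlap_mask(state_dicts, sparsity=0.8):
    """Magnitude-prune each trained copy separately, then keep only the
    weights that survive in *every* copy (the overlapping ticket)."""
    def single_mask(w):
        k = max(1, int((1.0 - sparsity) * w.numel()))
        thr = torch.topk(w.abs().flatten(), k).values.min()
        return (w.abs() >= thr).float()

    names = [n for n, w in state_dicts[0].items() if w.dim() > 1]
    masks = {n: torch.ones_like(state_dicts[0][n]) for n in names}
    for sd in state_dicts:                     # one model per data split
        for n in names:
            masks[n] *= single_mask(sd[n])     # elementwise AND of the kept sets
    return masks

if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-ins for two copies of the same architecture trained on different splits.
    sd_a, sd_b = nn.Linear(16, 8).state_dict(), nn.Linear(16, 8).state_dict()
    m = overlap_mask([sd_a, sd_b], sparsity=0.8)
    print({n: f"{int(v.sum())}/{v.numel()}" for n, v in m.items()})
```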
Pruning neural networks before training not only compresses the original model but also accelerates the network training phase, which has substantial application value. Current work focuses on fine-grained pruning, which uses metrics to compute weight scores for weight screening, and has evolved from initial single-shot pruning to iterative pruning. Based on these works, we argue that network pruning can be summarized as an expressive-force transfer process of the weights, in which the retained weights take over the expressive force of the removed ones in order to maintain the performance of the original network. To achieve an optimal scheduling of expressive force, we propose a before-training pruning scheme named Neural Network Panning, which guides the expressive-force transfer through multi-index and multi-process steps, and we design a panning agent based on reinforcement learning to automate the process. Experimental results show that Panning outperforms various available pruning-before-training methods.
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.
Deep neural networks (DNNs) are effective in solving many real-world problems. Larger DNN models usually exhibit better quality (e.g., accuracy, precision), but their excessive computation results in long inference times. Model sparsification can reduce the computation and memory cost while maintaining model quality. Most existing sparsification algorithms remove weights unidirectionally, while others randomly or greedily explore a small subset of weights in each layer for pruning. The limitations of these algorithms reduce the level of achievable sparsity. In addition, many algorithms still require a pre-trained dense model and therefore suffer from a large memory footprint. In this paper, we propose a novel scheduled grow-and-prune (GaP) methodology that does not require pre-training a dense model. It addresses the shortcomings of previous works by repeatedly growing a subset of the layers to dense and then pruning them back to sparse after some training. Experiments show that models pruned with the proposed method match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks, such as image classification, object detection, 3D object segmentation, and translation. They also outperform other state-of-the-art (SOTA) methods for model sparsification. As an example, a 90% non-uniform sparse ResNet-50 model obtained via GaP achieves 77.9% top-1 accuracy on ImageNet, improving the previous SOTA result by 1.5%. All code will be publicly released.
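To make the scheduled grow-and-prune idea concrete, here is a simplified single-round sketch: each layer is grown back to dense in turn, trained briefly, and then magnitude-pruned back to the target sparsity. The training stub, the explicit layer list, and the 80% sparsity value are assumptions for illustration, not the paper's schedule.

```python
import torch
import torch.nn as nn

def magnitude_mask(w, sparsity):
    k = max(1, int((1.0 - sparsity) * w.numel()))
    thr = torch.topk(w.abs().flatten(), k).values.min()
    return (w.abs() >= thr).float()

def grow_and_prune_round(layers, masks, train_fn, sparsity=0.8):
    """One scheduled pass: densify layers one at a time, train briefly,
    then magnitude-prune the grown layer back to the target sparsity."""
    for i, layer in enumerate(layers):
        masks[i] = torch.ones_like(layer.weight)                 # grow layer i to dense
        train_fn(layers, masks)                                  # short training phase
        masks[i] = magnitude_mask(layer.weight.data, sparsity)   # prune it back
        layer.weight.data *= masks[i]
    return masks

if __name__ == "__main__":
    torch.manual_seed(0)
    layers = [nn.Linear(32, 32), nn.Linear(32, 10)]
    model = nn.Sequential(layers[0], nn.ReLU(), layers[1])
    masks = [magnitude_mask(l.weight.data, 0.8) for l in layers]
    for l, m in zip(layers, masks):
        l.weight.data *= m                                       # start from a sparse model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    def train_fn(layers, masks):                 # stand-in for a short training phase
        x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
        loss = nn.CrossEntropyLoss()(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        for l, m in zip(layers, masks):          # re-apply masks on pruned layers
            l.weight.data *= m

    grow_and_prune_round(layers, masks, train_fn, sparsity=0.8)
    print([f"{int(m.sum())}/{m.numel()}" for m in masks])
```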
We propose a novel structured pruning algorithm for neural networks: the iterative Sparse Structured Pruning algorithm, dubbed i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups within the network (e.g., filters or neurons) that contribute most to the residual between the pruned and dense network outputs, then thresholding these groups according to a smaller, pre-defined pruning ratio. For both two-layer and multi-layer network architectures with ReLU activations, we show that the error induced by pruning decays polynomially, with a degree that becomes arbitrarily large depending on the sparsity of the dense network's hidden representations. In our experiments, i-SpaSP is evaluated across a variety of datasets (i.e., MNIST and ImageNet) and architectures (i.e., feed-forward networks, ResNet34, and MobileNetV2), where it is shown to discover high-performing sub-networks and to improve the pruning efficiency of provable baseline methods by several orders of magnitude. Put simply, i-SpaSP is easy to implement via automatic differentiation, achieves strong empirical results, comes with theoretical convergence guarantees, and is efficient, thus distinguishing itself as one of the few computationally efficient, practical, and provable pruning algorithms.
Existing pruning techniques preserve a deep neural network's overall ability to make correct predictions, but they may also amplify hidden biases during the compression process. We propose a novel pruning method, fairness-aware gradient pruning (FairGRAPE), that minimizes the disproportionate impact of pruning on different sub-groups. Our method computes the per-group importance of each model weight and selects a subset of weights that maintains the relative between-group total importance during pruning. The proposed method then prunes the network edges with small importance values and repeats the procedure by updating the importance values. We demonstrate the effectiveness of our method on four different datasets (FairFace, UTKFace, CelebA, and ImageNet) for the task of face attribute classification, where our method reduces the disparity in performance degradation by up to 90% compared to state-of-the-art pruning algorithms. Our method is substantially more effective in settings with high pruning rates (99%). The code and datasets used in the experiments are available at https://github.com/bernardo1998/fairgrape
Pruning, the task of sparsifying deep neural networks, has recently received increasing attention. Although state-of-the-art pruning methods extract highly sparse models, they neglect two main challenges: (1) the process of finding these sparse models is often very expensive; (2) unstructured pruning provides no benefits in terms of GPU memory, training time, or carbon emissions. We propose Early Compression via Gradient Flow Preservation (EarlyCroP), which efficiently extracts state-of-the-art sparse models before training, addressing challenge (1), and can be applied in a structured manner, tackling challenge (2). This enables us to train sparse networks on commodity GPUs whose dense versions would be too large, thereby saving costs and reducing hardware requirements. We empirically show that EarlyCroP outperforms a rich set of baselines across many tasks (including classification and regression) and domains (including computer vision, natural language processing, and reinforcement learning). EarlyCroP leads to accuracy comparable to dense training while outperforming pruning baselines.
Existing methods for pruning deep neural networks focus on removing unnecessary parameters from a trained network and then fine-tuning the model to find a good solution that recovers the initial performance of the trained model. Unlike other works, our method pays special attention to the quality of the solution of the compressed model and to the inference computation time obtained by pruning neurons. The proposed algorithm guides the parameters of the compressed model by exploring the spectral radius of the Hessian, which leads to better generalization on unseen data. Moreover, the method does not operate on a pre-trained network but performs training and pruning simultaneously. Our results show that it improves the state-of-the-art results for neuron compression. The method is able to achieve very small networks with small accuracy drops across different neural network models.
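Since the abstract hinges on the spectral radius of the Hessian, the sketch below shows the standard way such a quantity is estimated without forming the Hessian: power iteration with Hessian-vector products on a toy loss. This is a hedged illustration of the quantity involved, not the paper's pruning algorithm; the toy network and the iteration count are assumptions.

```python
import torch
import torch.nn as nn

def hessian_spectral_radius(loss, params, iters=20):
    """Estimate the dominant Hessian eigenvalue by power iteration,
    using Hessian-vector products obtained via double backpropagation."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = torch.tensor(0.0)
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        dot = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(dot, params, retain_graph=True)
        eig = sum((h * x).sum() for h, x in zip(hv, v))   # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 2))
    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    loss = nn.CrossEntropyLoss()(net(x), y)
    print("estimated dominant Hessian eigenvalue:",
          hessian_spectral_radius(loss, list(net.parameters())))
```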
Deep reinforcement learning (RL) is a powerful framework for solving complex real-world problems. The large neural networks employed in this framework are traditionally associated with better generalization capabilities, but their increased size entails the drawbacks of extensive training durations, substantial hardware resources, and longer inference times. One way to tackle this problem is to prune neural networks, leaving only the necessary parameters. State-of-the-art pruning techniques are typically used for imposing sparsity in applications with a fixed data distribution; however, they have not yet been substantially explored in the context of RL. We close the gap between RL and single-shot pruning techniques and present a general pruning approach for offline RL. We leverage a fixed dataset to prune neural networks before the start of RL training. We then run experiments with varying network sparsity levels and evaluate the effectiveness of pruning-at-initialization techniques in continuous control tasks. Our results show that with 95% of the network weights pruned, offline-RL algorithms can still retain performance in the majority of our experiments. To the best of our knowledge, no prior work has used pruning in RL while retaining performance at such high levels of sparsity. Moreover, our pruning approach can be easily integrated into any existing offline-RL algorithm without changing the learning objective.
Network pruning is a widely used technique for effectively compressing deep neural networks with little to no degradation in performance during inference. Iterative Magnitude Pruning (IMP) is one of the most established approaches to network pruning, consisting of several iterative training and pruning steps in which a significant amount of the network's performance is lost after pruning and then recovered in the subsequent retraining phase. While commonly used as a benchmark reference, it is often argued that a) it reaches suboptimal states by not incorporating sparsification into the training phase, b) its global selection criterion fails to properly determine optimal layer-wise pruning rates, and c) its iterative nature makes it slow and uncompetitive. In light of recently proposed retraining techniques, we investigate these claims through rigorous and consistent experiments in which we compare IMP to pruning-during-training algorithms, evaluate proposed modifications of its selection criterion, and study the number of iterations and the total training time actually required. We find that IMP with SLR retraining can outperform state-of-the-art pruning-during-training approaches with little or no computational overhead, that the global magnitude selection criterion is largely competitive with more complex approaches, and that only a few retraining epochs are needed in practice to achieve most of the sparsity-versus-performance trade-off of IMP. Our goal is both to demonstrate that basic IMP can already provide state-of-the-art pruning results, even outperforming more complex or heavily parameterized approaches, and to establish a more realistic yet easily realizable baseline for future research.
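For readers unfamiliar with the baseline being defended here, the sketch below shows the bare IMP loop: train, globally magnitude-prune a fraction of the remaining weights, retrain, repeat. The toy model, the stand-in training routine, and the pruning fraction are assumptions for illustration; the learning-rate handling of the retraining phase (e.g., the SLR schedule mentioned above) is omitted.

```python
import torch
import torch.nn as nn

def global_magnitude_prune(masks, weights, fraction=0.2):
    """Remove the smallest-magnitude `fraction` of the currently unpruned weights."""
    alive = torch.cat([w[m.bool()].abs().flatten() for w, m in zip(weights, masks)])
    threshold = torch.quantile(alive, fraction)
    return [((w.abs() > threshold) & m.bool()).float() for w, m in zip(weights, masks)]

def imp(model, train_fn, rounds=5, fraction=0.2):
    """Iterative magnitude pruning: alternate (re)training with global pruning."""
    weights = [p for p in model.parameters() if p.dim() > 1]
    masks = [torch.ones_like(w) for w in weights]
    for _ in range(rounds):
        train_fn(model, masks)                                  # train / retrain phase
        masks = global_magnitude_prune(masks, [w.data for w in weights], fraction)
        for w, m in zip(weights, masks):
            w.data *= m                                         # apply the new masks
    return masks

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    def train_fn(model, masks):                                 # stand-in for full (re)training
        for _ in range(10):
            x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
            loss = nn.CrossEntropyLoss()(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
            for p, m in zip([q for q in model.parameters() if q.dim() > 1], masks):
                p.data *= m                                     # keep pruned weights at zero

    masks = imp(model, train_fn, rounds=5, fraction=0.2)
    print("remaining:", sum(int(m.sum()) for m in masks), "/", sum(m.numel() for m in masks))
```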
The goal of filter pruning is to search for unimportant filters to remove so as to make convolutional neural networks (CNNs) efficient without sacrificing performance in the process. The challenge lies in finding information that can help determine how important or relevant each filter is with respect to the final output of the neural network. In this work, we share our observation that the batch normalization (BN) parameters of a pre-trained CNN can be used to estimate the feature distribution of the activation outputs without processing any training data. Based on this observation, we propose a simple yet effective filter pruning method that evaluates the importance of each filter from the BN parameters of the pre-trained CNN. Experimental results on CIFAR-10 and ImageNet demonstrate that the proposed method can achieve outstanding performance, with and without fine-tuning, in terms of the trade-off between the accuracy drop and the reduction in computational complexity.
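As one plausible reading of the observation above, the sketch below scores each filter by the expected post-ReLU activation implied by its batch-normalization scale and shift, entirely without data; the closed-form Gaussian expectation and the random stand-in BN parameters are assumptions used for illustration, not necessarily the exact criterion of the paper.

```python
import torch
import torch.nn as nn

def bn_filter_scores(bn: nn.BatchNorm2d) -> torch.Tensor:
    """Data-free per-filter score from BN parameters: the expected post-ReLU
    activation if the BN output of channel c is ~ N(beta_c, gamma_c^2)."""
    gamma, beta = bn.weight.data.abs(), bn.bias.data
    z = beta / (gamma + 1e-12)
    normal = torch.distributions.Normal(0.0, 1.0)
    phi = torch.exp(normal.log_prob(z))            # standard normal pdf
    Phi = normal.cdf(z)                            # standard normal cdf
    return beta * Phi + gamma * phi                # E[ReLU(N(beta, gamma^2))]

if __name__ == "__main__":
    torch.manual_seed(0)
    bn = nn.BatchNorm2d(16)
    bn.weight.data.uniform_(0.0, 1.0)              # pretend these were learned
    bn.bias.data.normal_(0.0, 0.5)
    scores = bn_filter_scores(bn)
    prune_idx = torch.argsort(scores)[:4]          # e.g., drop the 4 lowest-scoring filters
    print("filters to prune:", prune_idx.tolist())
```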
Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPs reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning.
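A minimal sketch of a first-order Taylor filter score in the spirit of the method above: the squared sum of gradient times weight over each filter's parameters, read off after a backward pass. The toy network and the single random batch are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def taylor_filter_importance(conv: nn.Conv2d) -> torch.Tensor:
    """First-order Taylor importance per output filter: the squared sum of
    (gradient * weight) over that filter's parameters, after a backward pass."""
    gw = (conv.weight.grad * conv.weight).flatten(1).sum(dim=1)
    return gw.pow(2)

if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
    x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
    nn.CrossEntropyLoss()(net(x), y).backward()        # populate .grad
    scores = taylor_filter_importance(net[0])
    print("least important filter:", int(torch.argmin(scores)))
```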
Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity. As research effort is focused on increasingly sophisticated pruning methods that lead to sparse subnetworks trainable from scratch, we argue for an orthogonal, under-explored theme: improving training techniques for pruned sub-networks, i.e., sparse training. Apart from the popular belief that only the quality of sparse masks matters for sparse training, in this paper we demonstrate an alternative opportunity: one can carefully customize the sparse training techniques to deviate from the default dense network training protocols, consisting of introducing "ghost" neurons and skip connections at the early stage of training, and strategically modifying the initialization as well as the labels. Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks. By adopting our newly curated techniques, we demonstrate significant performance gains across various popular datasets (CIFAR-10, CIFAR-100, TinyImageNet), architectures (ResNet-18/32/104, VGG16, MobileNet), and sparse mask options (lottery ticket, SNIP/GraSP, SynFlow, or even random pruning), compared to the default training protocols, especially at high sparsity levels. Code is at https://github.com/VITA-Group/ToST.
The rectified linear unit (ReLU) is a highly successful activation function in neural networks as it allows networks to easily obtain sparse representations, which reduces overfitting in overparameterized networks. However, in network pruning, we find that the sparsity introduced by ReLU, which we quantify by a term called dynamic dead neuron rate (DNR), is not beneficial for the pruned network. Interestingly, the more the network is pruned, the smaller the dynamic DNR becomes during optimization. This motivates us to propose a method to explicitly reduce the dynamic DNR for the pruned network, i.e., de-sparsify the network. We refer to our method as Activating-while-Pruning (AP). We note that AP does not function as a stand-alone method, as it does not evaluate the importance of weights. Instead, it works in tandem with existing pruning methods and aims to improve their performance by selective activation of nodes to reduce the dynamic DNR. We conduct extensive experiments using popular networks (e.g., ResNet, VGG) via two classical and three state-of-the-art pruning methods. The experimental results on public datasets (e.g., CIFAR-10/100) suggest that AP works well with existing pruning methods and improves the performance by 3% - 4%. For larger scale datasets (e.g., ImageNet) and state-of-the-art networks (e.g., vision transformer), we observe an improvement of 2% - 3% with AP as opposed to without. Lastly, we conduct an ablation study to examine the effectiveness of the components comprising AP.
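To make the quantity being discussed concrete, here is a small sketch of one way to measure a per-batch dead-neuron rate (the share of ReLU outputs that are exactly zero); the paper's exact definition of the dynamic DNR may differ, so this is an assumption for illustration, and the toy MLP is a stand-in.

```python
import torch
import torch.nn as nn

def dynamic_dnr(model: nn.Module, inputs) -> float:
    """Fraction of ReLU outputs that are exactly zero for one batch -- one way
    to instantiate a 'dead neuron rate' style measurement."""
    zeros, total = 0, 0
    def hook(module, inp, out):
        nonlocal zeros, total
        zeros += int((out == 0).sum())
        total += out.numel()
    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(inputs)
    for h in handles:
        h.remove()
    return zeros / max(total, 1)

if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
    print("dynamic DNR on a random batch:", dynamic_dnr(net, torch.randn(128, 20)))
```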
Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on ImageNet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static.
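Below is a hedged single-layer sketch of the kind of magnitude-drop / gradient-grow topology update described above, under the assumption of a fixed per-layer sparsity; the function and parameter names, the update fraction, and the toy regression layer are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

def drop_grow_update(weight, grad, mask, update_fraction=0.3):
    """One topology update on a single layer: drop the smallest-magnitude
    active weights, then grow the same number of inactive weights with the
    largest gradient magnitude (keeping total sparsity fixed)."""
    n_update = int(update_fraction * int(mask.sum()))
    if n_update == 0:
        return mask
    # Drop: among active weights, find the n_update smallest magnitudes.
    active_mag = torch.where(mask.bool(), weight.abs(),
                             torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(active_mag.flatten(), n_update, largest=False).indices
    # Grow: among inactive weights, find the n_update largest gradient magnitudes.
    inactive_grad = torch.where(mask.bool(), torch.zeros_like(grad), grad.abs())
    grow_idx = torch.topk(inactive_grad.flatten(), n_update).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)

if __name__ == "__main__":
    torch.manual_seed(0)
    layer = nn.Linear(32, 32)
    x, y = torch.randn(8, 32), torch.randn(8, 32)
    mask = (torch.rand_like(layer.weight) < 0.2).float()      # ~80% sparse to start
    layer.weight.data *= mask
    nn.MSELoss()(layer(x), y).backward()                      # dense gradient used for growing
    mask = drop_grow_update(layer.weight.data, layer.weight.grad, mask)
    print("active weights:", int(mask.sum()), "/", mask.numel())
```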
Existing differentiable channel pruning methods often attach scaling factors or masks behind channels to prune the filters of lesser importance, and assume that the input samples contribute uniformly to filter importance. In particular, the influence of instance complexity on pruning performance has not yet been fully investigated. In this paper, we propose a simple yet effective differentiable network pruning method, CAP, based on instance-complexity-aware filter importance scores. We define an instance-complexity-related weight for each sample, giving higher weights to hard samples, and measure the weighted sum of sample-specific soft masks to model the non-uniform contribution of different inputs, which encourages hard samples to dominate the pruning process and preserves the model performance well. In addition, we introduce a new regularizer to encourage polarization of the masks, so that a sweet spot can easily be found to identify the filters to be pruned. Performance evaluations on various network architectures and datasets demonstrate that CAP has advantages in pruning large networks. For instance, CAP improves the accuracy of ResNet56 on the CIFAR-10 dataset by 0.33% after removing 65.64% of the FLOPs, and prunes 87.75% of the FLOPs of ResNet50 on the ImageNet dataset with only a 0.89% top-1 accuracy loss.
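A heavily simplified sketch of the instance-complexity idea follows: per-sample soft masks over channels are averaged with weights derived from per-sample losses, plus a polarization penalty that is small when masks sit near 0 or 1. The mask-predicting head, the softmax weighting of losses, and all names are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceAwareChannelScores(nn.Module):
    """Illustrative sketch: a tiny head predicts a per-sample soft mask over the
    channels of a feature map; filter importance is the complexity-weighted
    average of these masks, and a polarization term pushes masks toward 0/1."""
    def __init__(self, channels):
        super().__init__()
        self.head = nn.Linear(channels, channels)   # hypothetical mask predictor

    def forward(self, features, per_sample_loss):
        pooled = features.mean(dim=(2, 3))                    # (batch, channels)
        soft_masks = torch.sigmoid(self.head(pooled))         # sample-specific soft masks
        weights = F.softmax(per_sample_loss.detach(), dim=0)  # harder samples weigh more
        importance = (weights.unsqueeze(1) * soft_masks).sum(dim=0)
        polarization = (soft_masks * (1.0 - soft_masks)).mean()  # small near 0 or 1
        return importance, polarization

if __name__ == "__main__":
    torch.manual_seed(0)
    feats = torch.randn(8, 16, 14, 14)                 # stand-in feature maps
    per_sample_loss = torch.rand(8)                    # stand-in per-sample losses
    scorer = InstanceAwareChannelScores(16)
    importance, reg = scorer(feats, per_sample_loss)
    prune = torch.argsort(importance)[:4]              # e.g., the 4 least important channels
    print("channels to prune:", prune.tolist(), "| polarization reg:", float(reg))
```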
Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense models. Although a plethora of sparsity methods has been proposed for other domains, such methods have rarely been applied to TTS. In this work, we seek to answer the question: what are the characteristics of selected sparsity techniques in terms of performance and model complexity? We compare a Tacotron2 baseline with the results of applying five sparsity techniques. We then evaluate performance in terms of naturalness, intelligibility, and prosody, while reporting model size and training time. Complementary to prior research, we find that pruning before or during training can achieve performance similar to pruning after training, and that training can be accelerated, while removing entire neurons degrades performance far more than removing parameters. To the best of our knowledge, this is the first work that compares sparsity paradigms in text-to-speech synthesis.