Identifying the state of individual network units is crucial for understanding the mechanisms of convolutional neural networks (CNNs). However, it remains challenging to reliably indicate unit states, especially for units in different network models. To this end, we propose a novel method for quantitatively clarifying the state of units in a CNN using tools from algebraic topology. The unit state is indicated by a defined topological quantity, called feature entropy, which measures the degree of chaos in the global spatial pattern hidden in the unit. In this way, feature entropy can provide accurate indications of units across different networks and under various conditions, such as weight manipulations. Furthermore, we show that feature entropy varies as layers go deeper and exhibits almost synchronous trends during training. Finally, we show that by investigating the feature entropy of units on the training data alone, networks with different generalization abilities can be discriminated from the perspective of the effectiveness of their feature representations.
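To make the idea of a per-unit "chaos" score concrete, here is a toy illustration only: it scores a unit's spatial activation pattern with the Shannon entropy of its normalized feature map, not the paper's algebraic-topology construction. Function and variable names are assumptions.

```python
import torch

def spatial_entropy(feature_map: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """feature_map: (H, W) activations of a single unit for one input."""
    p = feature_map.abs().flatten()
    p = p / (p.sum() + eps)              # normalize into a distribution over spatial positions
    return -(p * (p + eps).log()).sum()  # Shannon entropy: low = concentrated, high = diffuse

# Usage on a random map; in practice one would average the score over a batch of inputs.
fmap = torch.rand(14, 14)
print(spatial_entropy(fmap))
```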
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms that assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.
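A minimal sketch of the comparison the paper makes for the predefined-architecture case, assuming arbitrary layer widths and an otherwise standard training setup: (a) train a large model, prune channels, and fine-tune, versus (b) simply train the slim target architecture from scratch.

```python
import torch.nn as nn

def make_net(width: int) -> nn.Sequential:
    # A toy convnet; only the channel width differs between the two regimes.
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(width, 10),
    )

large = make_net(width=64)    # regime (a): train this, prune half the channels, fine-tune
target = make_net(width=32)   # regime (b): train the predefined slim network from scratch
# The paper reports that, given a proper training budget, (b) matches or beats (a).
```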
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
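A condensed sketch of the winning-ticket search (iterative magnitude pruning with rewinding to the original initialization). The training loop, per-layer pruning rates, and output-layer exceptions of the paper are simplified away here; `train_fn` is an assumed callback.

```python
import copy
import torch

def find_winning_ticket(model, train_fn, rounds=5, prune_frac=0.2):
    init_state = copy.deepcopy(model.state_dict())            # remember theta_0
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_fn(model, masks)                                 # train with masked weights
        for n, p in model.named_parameters():                  # prune the smallest survivors
            alive = p[masks[n].bool()].abs()
            thresh = alive.quantile(prune_frac)
            masks[n] = masks[n] * (p.abs() > thresh).float()
        model.load_state_dict(init_state)                      # rewind weights to theta_0
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.mul_(masks[n])                               # winning ticket = init * mask
    return model, masks
```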
Deploying convolutional neural networks (CNNs) on mobile devices is difficult because of the limited memory and computational resources. We aim to design efficient neural networks for heterogeneous devices, including CPUs and GPUs, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that can fully reveal the information underlying the intrinsic features. The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, and then a lightweight C-GhostNet can be easily established. We further consider efficient networks for GPU devices. Without involving too many GPU-inefficient operations (e.g., depthwise convolution) in a building stage, we propose to utilize stage-wise feature redundancy to formulate a GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts, where the first part is processed using the original blocks with fewer output channels to generate intrinsic features, and the other part is generated by cheap operations that exploit stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and G-Ghost stage. C-GhostNet and G-GhostNet can achieve the best trade-off between accuracy and latency on CPUs and GPUs, respectively. Code is available at https://github.com/huawei-noah/cv-backbones.
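A simplified sketch of a C-Ghost-style module: a primary convolution produces a few intrinsic feature maps, and a cheap depthwise operation generates the remaining "ghost" maps, which are concatenated with the intrinsic ones. The ratio, kernel sizes, and normalization choices below are assumptions, not the exact GhostNet configuration.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        intrinsic = out_ch // ratio
        self.primary = nn.Sequential(                    # ordinary conv for intrinsic maps
            nn.Conv2d(in_ch, intrinsic, 1, bias=False),
            nn.BatchNorm2d(intrinsic), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                      # depthwise conv as the cheap linear op
            nn.Conv2d(intrinsic, out_ch - intrinsic, dw_kernel,
                      padding=dw_kernel // 2, groups=intrinsic, bias=False),
            nn.BatchNorm2d(out_ch - intrinsic), nn.ReLU(inplace=True))

    def forward(self, x):
        intrinsic = self.primary(x)
        ghosts = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghosts], dim=1)

y = GhostModule(16, 32)(torch.randn(1, 16, 8, 8))        # -> shape (1, 32, 8, 8)
```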
We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries. We formally establish filter pruning as an optimization problem, and reveal that we need to prune filters based on statistics information computed from its next layer, not the current layer, which differentiates ThiNet from existing methods. Experimental results demonstrate the effectiveness of this strategy, which has advanced the state-of-the-art. We also show the performance of ThiNet on ILSVRC-12 benchmark. ThiNet achieves 3.31× FLOPs reduction and 16.63× compression on VGG-16, with only 0.52% top-5 accuracy drop. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can also reduce more than half of the parameters and FLOPs, at the cost of roughly 1% top-5 accuracy drop. Moreover, the original VGG-16 model can be further pruned into a very small model with only 5.05MB model size, preserving AlexNet level accuracy but showing much stronger generalization ability.
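A rough sketch of the selection idea described above: decide which input channels of the next layer to keep by greedily dropping, one at a time, the channel whose removal increases the next layer's output reconstruction error the least. The data sampling and actual layer surgery are omitted, and the greedy direction is a simplification of the published procedure.

```python
import torch

def select_channels(contrib: torch.Tensor, keep: int):
    """contrib: (N, C) tensor; contrib[i, c] is channel c's contribution to one output
    unit of the next layer for sample i. keep: number of channels to retain."""
    target = contrib.sum(dim=1)                        # response with all channels present
    kept = list(range(contrib.shape[1]))
    while len(kept) > keep:
        errs = [((contrib[:, [k for k in kept if k != c]].sum(1) - target) ** 2).sum()
                for c in kept]                         # reconstruction error if c is removed
        kept.pop(int(torch.stack(errs).argmin()))      # drop the channel that hurts least
    return kept

kept = select_channels(torch.randn(256, 64), keep=32)
```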
We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation, a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, e.g. the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102), relying only on first-order gradient information. We also show that pruning can lead to more than 10× theoretical reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier. Finally, we show results for the large-scale ImageNet dataset to emphasize the flexibility of our approach.
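A hedged sketch of a first-order Taylor saliency of this kind: the score of a feature map is the magnitude of the summed activation-times-gradient over that map, i.e. an estimate of how much the loss would change if the map were zeroed. The hook bookkeeping and the paper's normalization details are not reproduced here.

```python
import torch

def taylor_scores(activations: torch.Tensor, gradients: torch.Tensor) -> torch.Tensor:
    """activations, gradients: (N, C, H, W) captured for one layer during backprop.
    Returns one saliency score per channel (C,)."""
    per_map = (activations * gradients).sum(dim=(2, 3))   # estimated loss change per map, per sample
    return per_map.abs().mean(dim=0)                      # average magnitude over the batch

# Channels with the smallest score are pruned first, interleaved with fine-tuning.
scores = taylor_scores(torch.rand(8, 16, 14, 14), torch.randn(8, 16, 14, 14))
```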
Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL). Specifically, a weight is re-parameterized as the element-wise multiplication of a trainable weight variable and a binary mask. Thus, the network connectivity is fully described by the binary masks, which are modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the proxy gradient of the STE should be positive, ensuring that the mask variables converge at their minima. After finding that the leaky ReLU, softplus, and identity STEs satisfy this principle, we propose to adopt the identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are very unbalanced; hence, we propose to normalize the mask gradient of each feature to optimize mask variable training. In order to train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. Since SCL does not require pruning criteria or hyperparameters defined by designers for network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity with the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained with SCL outperform SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
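A minimal sketch of the re-parameterization described above: each weight becomes w * H(s), where H is a unit step applied to a trainable mask variable s, and an identity straight-through estimator passes the gradient to s unchanged. Initialization values, the sparsity regularizer weight, and the layer type are assumptions.

```python
import torch

class StepIdentitySTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, s):
        return (s > 0).float()           # binary mask from the unit step function

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                  # identity STE: proxy gradient passes straight through

class MaskedLinear(torch.nn.Module):
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        self.mask_var = torch.nn.Parameter(torch.full((out_f, in_f), 0.1))

    def forward(self, x):
        mask = StepIdentitySTE.apply(self.mask_var)
        return torch.nn.functional.linear(x, self.weight * mask)

out = MaskedLinear(20, 10)(torch.randn(4, 20))
# A connectivity regularizer would add, e.g., lambda * mask.sum() (total connections) to the loss.
```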
There are good arguments to support the claim that feature representations eventually transition from general to specific through deep neural networks (DNNs), but this transition remains relatively under-explored. In this work, we move a tiny step towards understanding the transition of feature representations. We first characterize this transition by analyzing the class separation in intermediate layers, and then model the process of class separation as community evolution in dynamic graphs. Then, we introduce modularity, a common metric in graph theory, to quantify the evolution of communities. We find that modularity tends to rise as layers go deeper, but descends or reaches a plateau at particular layers. Through an asymptotic analysis, we show that modularity can provide a quantitative analysis of the transition of feature representations. With this understanding of feature representations, we also show that modularity can be used to identify and locate redundant layers in DNNs, which provides theoretical guidance for layer pruning. Based on this inspiring finding, we propose a layer pruning method based on modularity. Further experiments show that our method can prune redundant layers with minimal impact on performance. The code is available at https://github.com/yaolu-zjut/dynamic-graphs-construction.
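An illustrative sketch of the kind of measurement involved, not the paper's exact dynamic-graph construction: build a similarity graph over one layer's features, treat the class labels as communities, and compute graph modularity as a proxy for class separation at that layer. The similarity measure and threshold are assumptions.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import modularity

def layer_modularity(features: np.ndarray, labels: np.ndarray, threshold: float = 0.9):
    """features: (N, D) activations of one layer; labels: (N,) class ids."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = f @ f.T                                            # pairwise cosine similarity
    G = nx.Graph()
    G.add_nodes_from(range(len(labels)))
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if sim[i, j] > threshold:                        # connect highly similar samples
                G.add_edge(i, j)
    communities = [set(np.where(labels == c)[0]) for c in np.unique(labels)]
    return modularity(G, communities)                        # higher = better class separation
```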
Modern neural networks are notoriously large, yet also highly redundant and compressible. The deep learning literature offers many pruning strategies that yield sparse subnetworks, at more than 90% sparsity, of fully trained dense architectures while still maintaining their original accuracy. Among these methods, however, iterative magnitude pruning (IMP) dominates in practice due to its conceptual simplicity, ease of implementation, and efficacy, and is effectively the baseline to beat in the pruning community. Yet theoretical explanations as to why a simple method such as IMP works at all are scarce and limited. In this work, we leverage the notion of persistent homology to gain insight into the workings of IMP and show that it inherently encourages retention of those weights which preserve topological information in a trained network. Subsequently, we also provide bounds on how much different networks can be pruned while perfectly preserving their zero-order topological features, and present a modified version of IMP that does the same.
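To make "zero-order topological features" concrete: the zeroth Betti number of the network's weight graph counts its connected components. A hedged toy sketch follows (layer sizes and thresholds are arbitrary; this is not the paper's persistent-homology machinery).

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def betti0(weight_matrices, threshold=0.0):
    """weight_matrices: list of (out, in) arrays for an MLP; edges = surviving weights."""
    sizes = [weight_matrices[0].shape[1]] + [w.shape[0] for w in weight_matrices]
    offsets = np.cumsum([0] + sizes)
    n = offsets[-1]
    adj = np.zeros((n, n))
    for l, w in enumerate(weight_matrices):                  # bipartite edges between layers
        rows, cols = np.nonzero(np.abs(w) > threshold)
        adj[offsets[l] + cols, offsets[l + 1] + rows] = 1
    n_comp, _ = connected_components(csr_matrix(adj), directed=False)
    return n_comp

w1, w2 = np.random.randn(8, 5), np.random.randn(3, 8)
print(betti0([w1, w2]))                                                 # dense net: 1 component
print(betti0([w1 * (np.abs(w1) > 1.2), w2 * (np.abs(w2) > 1.2)]))       # pruned net: may fragment
```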
Neural network pruning-the task of reducing the size of a network by removing parameters-has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.
Filter pruning methods introduce structural sparsity by removing selected filters and are therefore especially effective for reducing complexity. Previous works empirically prune networks from the perspective that filters with smaller norms contribute less to the final results. However, such criteria have been shown to be sensitive to the distribution of filters, and accuracy may be hard to recover because the capacity gap after pruning is fixed. In this paper, we propose a novel filter pruning method called Asymptotic Soft Cluster Pruning (ASCP) to identify the redundancy of a network based on the similarity of filters. Each filter of the over-parameterized network is first distinguished by clustering and then reconstructed to manually introduce redundancy into it. Several clustering guidelines are also proposed to better preserve the feature extraction ability. After reconstruction, filters are allowed to be updated to eliminate the effect of mistaken selection. Moreover, a decaying strategy over various pruning rates is adopted to stabilize the pruning process and improve the final performance. By gradually generating more identical filters within each cluster, ASCP can remove them through a channel addition operation with almost no accuracy drop. Extensive experiments on the CIFAR-10 and ImageNet datasets show that our method can achieve competitive results compared with many state-of-the-art algorithms.
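A rough sketch of the clustering step described above: group a layer's filters by similarity with k-means and softly pull each filter toward its cluster centroid, so that filters within a cluster gradually become identical and can later be merged. The pull rate, cluster count, and update schedule are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def soft_cluster_step(filters: np.ndarray, n_clusters: int, pull: float = 0.1):
    """filters: (C_out, C_in*k*k) flattened conv filters of one layer."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(filters)
    centroids = km.cluster_centers_[km.labels_]        # centroid assigned to each filter
    return (1 - pull) * filters + pull * centroids     # asymptotically move filters together

layer = np.random.randn(64, 3 * 3 * 3)
layer = soft_cluster_step(layer, n_clusters=48)        # repeated across epochs, increasing the pull
```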
One of the most effective approaches to channel pruning is to prune according to the importance of each neuron. However, measuring the importance of each neuron is an NP-hard problem. Previous works prune by considering the statistics of a single layer or of multiple consecutive layers of neurons. These works cannot eliminate the influence of different data on the model's reconstruction error, and no existing work proves that the absolute value of a parameter can be directly used as a basis for judging the importance of a weight. A more reasonable approach is to eliminate the differences between batches of data so as to measure influence accurately. In this paper, we propose to use ensemble learning to train models on different batches of data, and to use influence functions (a classic technique from robust statistics) to trace a model's predictions back to the gradients of its training parameters, so that we can determine the responsibility of each parameter for a prediction, which we call its "influence". Furthermore, we theoretically prove that backpropagation in deep networks is a first-order Taylor approximation of the influence function of the weights. We perform extensive experiments to demonstrate that influence-function-based pruning with ensemble learning is more effective than focusing only on error reconstruction. Experiments on CIFAR show that influence pruning achieves state-of-the-art results.
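For context, a standard statement of the influence function from robust statistics (in the formulation popularized by Koh & Liang); the notation is ours, not the paper's. One common reading of the first-order claim above is that approximating the Hessian by the identity leaves only the gradient that backpropagation computes.

```latex
% Up-weight a training point z by epsilon and track the resulting parameter change:
\hat{\theta}_{\epsilon,z} = \arg\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n} L(z_i,\theta) + \epsilon\, L(z,\theta),
\qquad
\mathcal{I}(z) = \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0}
 = -H_{\hat{\theta}}^{-1}\,\nabla_{\theta} L(z,\hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n}\nabla^{2}_{\theta} L(z_i,\hat{\theta}).
```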
The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
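A minimal sketch of the magnitude-based filter selection described above: rank a convolutional layer's filters by their L1 norm and keep the largest ones. The surrounding layer surgery (rebuilding the conv, its batch norm, and the next layer's input channels) is omitted, and the keep ratio is an assumption.

```python
import torch

def filters_to_keep(conv: torch.nn.Conv2d, keep_ratio: float = 0.66):
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))    # L1 norm of each output filter
    n_keep = max(1, int(keep_ratio * norms.numel()))
    return torch.argsort(norms, descending=True)[:n_keep]    # indices of filters to retain

conv = torch.nn.Conv2d(64, 128, 3)
print(filters_to_keep(conv).shape)                           # -> torch.Size([84])
```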
Deep learning technologies have demonstrated remarkable effectiveness in a wide range of tasks, and deep learning holds the potential to advance a multitude of applications, including in edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that, while the application of deep models often incurs substantial memory and computational costs, edge devices typically offer only very limited storage and computational capabilities that may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require only a small amount of storage, and incur only low computational overheads. This survey provides comprehensive coverage of techniques for automating the design of deep learning models targeted at edge computing. It offers an overview and comparison of the key metrics commonly used to quantify how effective, lightweight, and computationally cheap a model is. The survey then covers three categories of state-of-the-art techniques for automating the design of deep models: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open problems and directions for future research.
Network compression is crucial for making deep networks more efficient, faster, and generalizable to low-end hardware. Current network compression methods face two open problems: first, there is a lack of a theoretical framework for estimating the maximum compression rate; second, some layers may be pruned excessively, causing a substantial drop in network performance. To address these two problems, this study proposes a gradient-matrix-analysis-based method to estimate the maximum network redundancy. Guided by that maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to maximally condense the neuronal network structure without sacrificing network performance. Substantial experiments are conducted to demonstrate the efficacy of the new method for pruning several advanced convolutional neural network (CNN) architectures. Compared with existing pruning methods, the proposed pruning algorithm achieves state-of-the-art performance. At the same or similar compression ratios, the new method provides the highest network prediction accuracy compared with other methods.
Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses both issues. First, we systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered. Second, we propose a common benchmark that allows for an objective comparison of approaches. It consists of five datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues. Surprisingly, we find that thorough hyper-parameter tuning on held-out validation data results in a highly competitive baseline and highlights a stunted growth of performance over the years. Indeed, only a single specialized method dating back to 2019 clearly wins our benchmark and outperforms the baseline classifier.
Lightweight model design has become an important direction in the application of deep learning technology, and pruning is an effective means of achieving large reductions in model parameters and FLOPs. Existing neural network pruning methods mostly start from the importance of parameters and design parameter evaluation metrics to perform parameter pruning iteratively. These methods are not studied from the perspective of model topology, may be effective but not efficient, and require completely different pruning for different datasets. In this paper, we study the graph structure of neural networks and propose Regular Graph-based Pruning (RGP) to perform one-shot neural network pruning. We generate a regular graph, set the node degree of the graph to meet the pruning ratio, and reduce the average shortest path length of the graph by swapping edges to obtain the optimal edge distribution. Finally, the obtained graph is mapped onto a neural network structure to realize pruning. Experiments show that the average shortest path length of the graph is negatively correlated with the classification accuracy of the corresponding neural network, and the proposed RGP shows a strong ability to preserve accuracy with extremely high parameter reduction (more than 90%) and FLOPs reduction (more than 90%).
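A hedged sketch of the graph-construction step described above: build a random regular graph whose degree reflects the pruning ratio, then accept double edge swaps only when they shorten the average shortest path length (ASPL). Mapping the resulting graph back onto layer connectivity is omitted, and the swap budget is an assumption.

```python
import networkx as nx

def build_rgp_graph(n_nodes=64, degree=6, swaps=200, seed=0):
    G = nx.random_regular_graph(degree, n_nodes, seed=seed)
    best = nx.average_shortest_path_length(G)
    for _ in range(swaps):
        H = G.copy()
        nx.double_edge_swap(H, nswap=1, max_tries=100)     # swap preserves the degree sequence
        if nx.is_connected(H):
            aspl = nx.average_shortest_path_length(H)
            if aspl < best:                                 # greedily keep improving swaps
                G, best = H, aspl
    return G, best

G, aspl = build_rgp_graph()
print(aspl)
```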
Recently there has been intense activity on the embedding of very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part, we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and diffusion maps, kernel-based methods, and random projections. The second part is concerned with topological embedding methods, in particular the mapping of topological properties into persistence diagrams, and the Mapper algorithm. Another type of data set experiencing tremendous growth is very high-dimensional network data. The task considered in the third part is how to embed such data into a vector space of moderate dimension so as to make the data amenable to traditional techniques such as clustering and classification. Arguably, this is where the contrast between algorithmic machine learning methods and statistical modeling (so-called stochastic block modeling) is sharpest, and we discuss the pros and cons of the two approaches. The final part of the survey deals with embedding in $\mathbb{R}^2$, i.e., visualization. Three methods are presented, drawing on the first, second, and third parts: $t$-SNE, UMAP, and LargeVis. They are illustrated and compared on two simulated data sets: one consisting of a triplet of noisy ranunculoid curves, the other of networks of increasing complexity generated with stochastic block models and with two types of nodes.
In recent years, the introduction of the Transformer model has sparked a revolution in natural language processing (NLP). BERT was one of the first text encoders to use only the attention mechanism, without any recurrent parts, and to achieve state-of-the-art results on many NLP tasks. This paper introduces a text classifier based on topological data analysis. We use BERT's attention maps, transformed into attention graphs, as the sole input to this classifier. The model can solve tasks such as distinguishing spam from ham messages, recognizing whether a sentence is grammatically correct, or evaluating a movie review as negative or positive. It performs comparably to the BERT baseline and outperforms it on some tasks. Additionally, we propose a new method to reduce the number of BERT attention heads considered by the topological classifier, which allows us to prune the number of heads from 144 down to as few as 10 without degrading performance. Our work also shows that the topological model exhibits greater robustness against adversarial attacks than the original BERT model, and that this robustness is maintained during pruning. To the best of our knowledge, this work is the first to confront topology-based models with adversarial attacks in the context of NLP.
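An illustrative sketch of attention-graph featurization: threshold one attention head's matrix into a graph over tokens and extract simple graph-topological statistics (edge count, connected components, independent cycles) as classifier features. The paper's persistent-homology features and head-selection procedure are not reproduced here; the threshold is an assumption.

```python
import numpy as np
import networkx as nx

def attention_graph_features(attn: np.ndarray, threshold: float = 0.1) -> list:
    """attn: (T, T) attention matrix of one head for a T-token input."""
    adj = (attn > threshold).astype(int)
    G = nx.from_numpy_array(np.maximum(adj, adj.T))           # symmetrize into an undirected graph
    n_edges = G.number_of_edges()
    n_components = nx.number_connected_components(G)
    n_cycles = n_edges - G.number_of_nodes() + n_components   # first Betti number of the graph
    return [n_edges, n_components, n_cycles]

features = attention_graph_features(np.random.rand(12, 12))
```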
In recent years, deep neural networks have enjoyed widespread success across various application domains. However, they require significant computational and memory resources, which severely hinders their deployment, particularly on mobile devices or in real-time applications. Neural networks usually involve a large number of parameters, which correspond to the weights of the network. These parameters, obtained during the training process, are a determining factor for network performance. However, they are also highly redundant. Pruning methods, in particular, attempt to reduce the size of the parameter set by identifying and removing irrelevant weights. In this paper, we study the impact of the training strategy on pruning efficiency. Two training modes are considered and compared: (1) fine-tuning and (2) training from scratch. Experimental results obtained on four datasets (CIFAR10, CIFAR100, SVHN, and Caltech101) and with two different CNNs (VGG16 and MobileNet) demonstrate that a network pre-trained on a large corpus (e.g., ImageNet) and then fine-tuned on a specific dataset can be pruned much more efficiently (up to 80% parameter reduction) than the same network trained from scratch.
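A minimal sketch of the two regimes compared above, using PyTorch's built-in unstructured magnitude pruning as a stand-in for the paper's exact pruning procedure; the dataset loading and the fine-tuning/scratch training loops are assumed.

```python
import torchvision.models as models
import torch.nn.utils.prune as prune
import torch.nn as nn

def prune_all_convs(model, amount=0.8):
    # Remove the smallest-magnitude 80% of weights in every conv layer.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            prune.l1_unstructured(m, name="weight", amount=amount)
    return model

pretrained = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)  # regime (1): fine-tune on the target set, then prune
scratch = models.vgg16(weights=None)                                   # regime (2): train on the target set from scratch, then prune
# The paper finds regime (1) tolerates much higher pruning rates (up to ~80% of parameters).
```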