深度神经网络在计算机视野领域取得了重大进展。最近的研究表明,神经网络架构的深度,宽度和快捷方式连接在其性能中起着至关重要的作用。最先进的神经网络架构DenSenet之一,通过密集连接实现了优异的收敛速率。但是,它仍然具有明显的缺点在内存量的使用情况。在本文中,我们介绍了一种新型的修剪工具,阈值,这是指MOSFET中阈值电压的原理。这项工作采用此方法以不同的方式连接不同深度的块以减少内存的使用情况。它表示为阈值。我们在CiFar10的数据集上评估阈值和其他不同网络。实验表明,HardNet是DenSenet的两倍,在此基础上,阈值比HardNet更快10%,误差率降低10%。
translated by 谷歌翻译
随着计算机愿景任务中的神经网络的不断发展,越来越多的网络架构取得了突出的成功。作为最先进的神经网络架构之一,DenSenet捷径所有特征映射都可以解决模型深度的问题。虽然这种网络架构在低MAC(乘法和累积)上具有优异的准确性,但它需要过度推理时间。为了解决这个问题,HardNet减少了特征映射之间的连接,使得其余连接类似于谐波。然而,这种压缩方法可能导致模型精度和增加的MAC和模型大小降低。该网络架构仅减少了内存访问时间,需要改进其整体性能。因此,我们提出了一种新的网络架构,使用阈值机制来进一步优化连接方法。丢弃不同卷积层的不同数量的连接以压缩阈值中的特征映射。所提出的网络架构使用了三个数据集,CiFar-10,CiFar-100和SVHN,以评估图像分类的性能。实验结果表明,与DENSENET相比,阈值可降低推理时间高达60%,并且在这些数据集上的硬盘相比,训练速度快高达35%的训练速度和20%的误差率降低。
translated by 谷歌翻译
卷积神经网络(CNN)通过堆叠卷积层增加深度,而更深的网络模型在图像识别方面的表现更好。实证研究表明,简单地堆叠卷积层不会使网络训练更好,而跳过连接(残留学习)可以改善网络模型性能。对于图像分类任务,具有全球密集连接体系结构的模型在ImageNet等大型数据集中表现良好,但不适用于CIFAR-10和SVHN等小型数据集。与密集的连接不同,我们提出了两种连接层的新算法。基线是一个密集的连接网络,由两个新算法连接的网络分别命名为ShortNet1和ShortNet2。CIFAR-10和SVHN上图像分类的实验结果表明,ShortNet1的测试错误率低5%,推理时间比基线快25%。ShortNet2将推理时间加速40%,测试准确性损失较小。
translated by 谷歌翻译
通过在计算机视觉(CV)领域深度学习算法的良好性能,卷积神经网络(CNN)体系结构已成为计算机视觉任务的主要骨干。随着移动设备的广泛使用,基于计算能力低的平台的神经网络模型逐渐引起人们的注意。但是,由于计算能力的限制,移动设备上通常无法使用深度学习算法。本文提出了一个轻巧的卷积神经网络TripLenet,可以在Raspberry Pi上轻松运行。从阈值中的块连接概念中采用,新提出的网络模型会压缩并加速网络模型,减少网络的参数量,并在确保准确性的同时缩短每个图像的推理时间。我们提出的TripLenet和其他最先进的(SOTA)神经网络在Raspberry Pi上使用CIFAR-10和SVHN数据集进行了图像分类实验。实验结果表明,与GhostNet,Mobilenet,Theashnet,EdefityNet和HardNet相比,每图像的推理时间分别缩短了15%,16%,17%,24%和30%。
translated by 谷歌翻译
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections-one between each layer and its subsequent layer-our network has L(L+1) 2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
translated by 谷歌翻译
虽然残留连接使训练非常深的神经网络,但由于其多分支拓扑而​​导致在线推断不友好。这鼓励许多研究人员在推动时没有残留连接的情况下设计DNN。例如,repvgg在部署时将多分支拓扑重新参数化为vgg型(单分支)模型,当网络相对较浅时显示出具有很大的性能。但是,RepVGG不能等效地将Reset转换为VGG,因为重新参数化方法只能应用于线性块,并且必须将非线性层(Relu)放在残余连接之外,这导致了有限的表示能力,特别是更深入网络。在本文中,我们的目标是通过在Resblock上的保留和合并(RM)操作等效地纠正此问题,并提出删除Vanilla Reset中的残留连接。具体地,RM操作允许输入特征映射通过块,同时保留其信息,并在每个块的末尾合并所有信息,这可以去除残差而不改变原始输出。作为一个插件方法,RM操作基本上有三个优点:1)其实现使其实现高比率网络修剪。 2)它有助于打破RepVGG的深度限制。 3)与Reset和RepVGG相比,它导致更好的精度速度折衷网络(RMNet)。我们相信RM操作的意识形态可以激发对未来社区的模型设计的许多见解。代码可用:https://github.com/fxmeng/rmnet。
translated by 谷歌翻译
Capturing feature information effectively is of great importance in vision tasks. With the development of convolutional neural networks (CNNs), concepts like residual connection and multiple scales promote continual performance gains on diverse deep learning vision tasks. However, the existing methods do not organically combined advantages of these valid ideas. In this paper, we propose a novel CNN architecture called GoogLe2Net, it consists of residual feature-reutilization inceptions (ResFRI) or split residual feature-reutilization inceptions (Split-ResFRI) which create transverse passages between adjacent groups of convolutional layers to enable features flow to latter processing branches and possess residual connections to better process information. Our GoogLe2Net is able to reutilize information captured by foregoing groups of convolutional layers and express multi-scale features at a fine-grained level, which improves performances in image classification. And the inception we proposed could be embedded into inception-like networks directly without any migration costs. Moreover, in experiments based on popular vision datasets, such as CIFAR10 (97.94%), CIFAR100 (85.91%) and Tiny Imagenet (70.54%), we obtain better results on image classification task compared with other modern models.
translated by 谷歌翻译
由于存储器和计算资源有限,部署在移动设备上的卷积神经网络(CNNS)是困难的。我们的目标是通过利用特征图中的冗余来设计包括CPU和GPU的异构设备的高效神经网络,这很少在神经结构设计中进行了研究。对于类似CPU的设备,我们提出了一种新颖的CPU高效的Ghost(C-Ghost)模块,以生成从廉价操作的更多特征映射。基于一组内在的特征映射,我们使用廉价的成本应用一系列线性变换,以生成许多幽灵特征图,可以完全揭示内在特征的信息。所提出的C-Ghost模块可以作为即插即用组件,以升级现有的卷积神经网络。 C-Ghost瓶颈旨在堆叠C-Ghost模块,然后可以轻松建立轻量级的C-Ghostnet。我们进一步考虑GPU设备的有效网络。在建筑阶段的情况下,不涉及太多的GPU效率(例如,深度明智的卷积),我们建议利用阶段明智的特征冗余来制定GPU高效的幽灵(G-GHOST)阶段结构。舞台中的特征被分成两个部分,其中使用具有较少输出通道的原始块处理第一部分,用于生成内在特征,另一个通过利用阶段明智的冗余来生成廉价的操作。在基准测试上进行的实验证明了所提出的C-Ghost模块和G-Ghost阶段的有效性。 C-Ghostnet和G-Ghostnet分别可以分别实现CPU和GPU的准确性和延迟的最佳权衡。代码可在https://github.com/huawei-noah/cv-backbones获得。
translated by 谷歌翻译
由于稀疏神经网络通常包含许多零权重,因此可以在不降低网络性能的情况下潜在地消除这些不必要的网络连接。因此,设计良好的稀疏神经网络具有显着降低拖鞋和计算资源的潜力。在这项工作中,我们提出了一种新的自动修剪方法 - 稀疏连接学习(SCL)。具体地,重量被重新参数化为可培训权重变量和二进制掩模的元素方向乘法。因此,由二进制掩模完全描述网络连接,其由单位步进函数调制。理论上,从理论上证明了使用直通估计器(STE)进行网络修剪的基本原理。这一原则是STE的代理梯度应该是积极的,确保掩模变量在其最小值处收敛。在找到泄漏的Relu后,SoftPlus和Identity Stes可以满足这个原理,我们建议采用SCL的身份STE以进行离散面膜松弛。我们发现不同特征的面具梯度非常不平衡,因此,我们建议将每个特征的掩模梯度标准化以优化掩码变量训练。为了自动训练稀疏掩码,我们将网络连接总数作为我们的客观函数中的正则化术语。由于SCL不需要由网络层设计人员定义的修剪标准或超级参数,因此在更大的假设空间中探讨了网络,以实现最佳性能的优化稀疏连接。 SCL克服了现有自动修剪方法的局限性。实验结果表明,SCL可以自动学习并选择各种基线网络结构的重要网络连接。 SCL培训的深度学习模型以稀疏性,精度和减少脚波特的SOTA人类设计和自动修剪方法训练。
translated by 谷歌翻译
在物联网(IoT)支持的网络边缘(IOT)上的人工智能(AI)的最新进展已通过启用低延期性和计算效率来实现多种应用程序(例如智能农业,智能医院和智能工厂)的优势情报。但是,部署最先进的卷积神经网络(CNN),例如VGG-16和在资源约束的边缘设备上的重新连接,由于其大量参数和浮点操作(Flops),因此实际上是不可行的。因此,将网络修剪作为一种模型压缩的概念正在引起注意在低功率设备上加速CNN。结构化或非结构化的最先进的修剪方法都不认为卷积层表现出的复杂性的不同基本性质,并遵循训练放回训练的管道,从而导致其他计算开销。在这项工作中,我们通过利用CNN的固有层层级复杂性来提出一种新颖和计算高效的修剪管道。与典型的方法不同,我们提出的复杂性驱动算法根据其对整体网络复杂性的贡献选择了特定层用于滤波器。我们遵循一个直接训练修剪模型并避免计算复杂排名和微调步骤的过程。此外,我们定义了修剪的三种模式,即参数感知(PA),拖网(FA)和内存感知(MA),以引入CNN的多功能压缩。我们的结果表明,我们的方法在准确性和加速方面的竞争性能。最后,我们提出了不同资源和准确性之间的权衡取舍,这对于开发人员在资源受限的物联网环境中做出正确的决策可能会有所帮助。
translated by 谷歌翻译
The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
translated by 谷歌翻译
修剪技术可全面使用图像分类压缩卷积神经网络(CNN)。但是,大多数修剪方法需要一个经过良好训练的模型,以提供有用的支持参数,例如C1-核心,批处理值和梯度信息,如果预训练的模型的参数为,这可能会导致过滤器评估的不一致性不太优化。因此,我们提出了一种基于敏感性的方法,可以通过为原始模型增加额外的损害来评估每一层的重要性。由于准确性的性能取决于参数在所有层而不是单个参数中的分布,因此基于灵敏度的方法将对参数的更新具有鲁棒性。也就是说,我们可以获得对不完美训练和完全训练的模型之间每个卷积层的相似重要性评估。对于CIFAR-10上的VGG-16,即使原始模型仅接受50个时期训练,我们也可以对层的重要性进行相同的评估,并在对模型进行充分训练时的结果。然后,我们将通过量化的灵敏度从每一层中删除过滤器。我们基于敏感性的修剪框架在VGG-16,分别具有CIFAR-10,MNIST和CIFAR-100的VGG-16上有效验证。
translated by 谷歌翻译
二进制神经网络(BNN)是卷积神经网络(CNN)的极端量化版本,其所有功能和权重映射到仅1位。尽管BNN节省了大量的内存和计算需求以使CNN适用于边缘或移动设备,但由于二进制后的表示能力降低,BNN遭受了网络性能的下降。在本文中,我们提出了一个新的可更换且易于使用的卷积模块reponv,该模块reponv通过复制输入或沿通道维度的输出来增强特征地图,而不是$ \ beta $ times,而没有额外的参数和卷积计算费用。我们还定义了一组Reptran规则,可以在整个BNN模块中使用Repconv,例如二进制卷积,完全连接的层和批处理归一化。实验表明,在Reptran转换之后,一组高度引用的BNN与原始BNN版本相比,实现了普遍的性能。例如,Rep-Recu-Resnet-20的前1位准确性,即REPBCONV增强的RECU-RESNET-20,在CIFAR-10上达到了88.97%,比原始网络高1.47%。 Rep-Adambnn-Reactnet-A在Imagenet上获得了71.342%的TOP-1精度,这是BNN的最新结果。代码和型号可在以下网址提供:https://github.com/imfinethanks/rep_adambnn。
translated by 谷歌翻译
Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight Ghost-Net can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative of convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https: //github.com/huawei-noah/ghostnet.
translated by 谷歌翻译
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9×, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the total number of parameters can be reduced by 13×, from 138 million to 10.3 million, again with no loss of accuracy.
translated by 谷歌翻译
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9×, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the total number of parameters can be reduced by 13×, from 138 million to 10.3 million, again with no loss of accuracy.
translated by 谷歌翻译
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
translated by 谷歌翻译
网络压缩对于使深网的效率更高,更快且可推广到低端硬件至关重要。当前的网络压缩方法有两个开放问题:首先,缺乏理论框架来估计最大压缩率;其次,有些层可能会过多地进行,从而导致网络性能大幅下降。为了解决这两个问题,这项研究提出了一种基于梯度矩阵分析方法,以估计最大网络冗余。在最大速率的指导下,开发了一种新颖而有效的层次网络修剪算法,以最大程度地凝结神经元网络结构而无需牺牲网络性能。进行实质性实验以证明新方法修剪几个高级卷积神经网络(CNN)体系结构的功效。与现有的修剪方法相比,拟议的修剪算法实现了最先进的性能。与其他方法相比,在相同或相似的压缩比下,新方法提供了最高的网络预测准确性。
translated by 谷歌翻译
Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior over their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layerdeep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at https: //github.com/szagoruyko/wide-residual-networks.
translated by 谷歌翻译
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems.This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry.The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
translated by 谷歌翻译