Post-training dropout based approaches achieve high sparsity and are well-established means of addressing computational cost and overfitting in neural network architectures. In contrast, pruning at initialization still lags far behind. Pruning at initialization is more efficient in terms of the network's computational cost, and it can handle overfitting as well as post-training dropout does. Acknowledging the above, this paper proposes two approaches to pruning at initialization; the goal is to achieve higher sparsity while preserving performance. 1) K-starts: start with k random p-sparse matrices at initialization. Over the first few epochs the network then determines the "winner" among these p-sparse matrices, in an attempt to find a "lottery ticket" p-sparse network. The approach is analogous to how evolutionary algorithms find the best individual. Depending on the neural network architecture, the fitness criterion can be based on the magnitude of the network weights, the magnitude of the accumulated gradients, or a combination of both. 2) Dissipating gradients: the aim is to remove weights that stay within a fraction of their initial value during the first few epochs. Removing weights in this way, regardless of their magnitude, preserves the network's performance best; on the other hand, this approach also requires the most epochs to reach higher sparsity. 3) A combination of dissipating gradients and K-starts consistently outperforms either approach alone as well as random dropout. The benefits of the proposed approaches are: 1) they require no task-specific knowledge, fixed dropout thresholds, or regularization parameters, and 2) retraining of the model is neither necessary nor does it affect the performance of the p-sparse networks.
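To make the K-starts idea concrete, here is a minimal sketch (not the authors' code): k random p-sparse masks are drawn for a layer, and after a few epochs a "winner" is chosen by a fitness score combining the magnitudes of the surviving weights and of the accumulated gradients. The function names, the mixing factor `alpha`, and the single-layer setting are illustrative assumptions.

```python
import numpy as np

def random_p_sparse_masks(shape, p, k, rng):
    """Draw k random binary masks that each keep a fraction p of the entries."""
    n = int(np.prod(shape))
    keep = max(1, int(round(p * n)))
    masks = []
    for _ in range(k):
        idx = rng.choice(n, size=keep, replace=False)
        m = np.zeros(n, dtype=bool)
        m[idx] = True
        masks.append(m.reshape(shape))
    return masks

def select_winner_mask(masks, weights, grad_accum, alpha=0.5):
    """Score each candidate mask by the magnitude of the weights and of the
    accumulated gradients it retains, and return the highest-scoring one."""
    def fitness(mask):
        return (alpha * np.abs(weights[mask]).sum()
                + (1 - alpha) * np.abs(grad_accum[mask]).sum())
    return max(masks, key=fitness)

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))   # a layer's weights after a few epochs
G = rng.normal(size=(256, 128))   # accumulated gradient magnitudes (stand-in)
candidates = random_p_sparse_masks(W.shape, p=0.05, k=8, rng=rng)
winner = select_winner_mask(candidates, W, G)
```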
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
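The identification algorithm referred to above is iterative magnitude pruning with a rewind to the original initialization. Below is a minimal PyTorch-style sketch under that reading; `make_model` and `train_fn` are placeholders supplied by the caller, and the global magnitude threshold (which here also ranks biases) is a simplification.

```python
import copy
import torch

def magnitude_mask(params, frac_remaining):
    """Keep the largest-magnitude weights globally; zero the rest."""
    flat = torch.cat([p.abs().flatten() for p in params])
    k = int(frac_remaining * flat.numel())
    threshold = torch.topk(flat, k).values.min()
    return [(p.abs() >= threshold).float() for p in params]

def find_winning_ticket(make_model, train_fn, rounds=5, prune_per_round=0.2):
    """Iterative magnitude pruning with rewinding to the original initialization."""
    model = make_model()
    init_state = copy.deepcopy(model.state_dict())      # theta_0
    masks = [torch.ones_like(p) for p in model.parameters()]
    remaining = 1.0
    for _ in range(rounds):
        train_fn(model, masks)                           # caller trains with masks applied
        remaining *= (1.0 - prune_per_round)
        masks = magnitude_mask(list(model.parameters()), remaining)
        model.load_state_dict(init_state)                # rewind to theta_0
        with torch.no_grad():
            for p, m in zip(model.parameters(), masks):
                p.mul_(m)                                # candidate winning ticket
    return model, masks
```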
Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static.
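The topology update described above alternates between dropping and growing connections. A rough numpy sketch of a single update follows, assuming `mask` is a 0/1 array and `dense_grad` is a gradient computed occasionally for all (including inactive) connections; the paper's actual schedule, per-layer allocation, and decay of the update fraction are omitted.

```python
import numpy as np

def drop_and_grow(weights, mask, dense_grad, drop_frac=0.3):
    """One topology update: drop the smallest-magnitude active connections and
    regrow the same number of inactive ones with the largest gradient magnitude."""
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(mask == 0)
    n_swap = int(drop_frac * active.size)

    drop = active[np.argsort(np.abs(weights.flat[active]))[:n_swap]]
    grow = inactive[np.argsort(-np.abs(dense_grad.flat[inactive]))[:n_swap]]

    new_mask = mask.copy()
    new_mask.flat[drop] = 0
    new_mask.flat[grow] = 1
    new_weights = weights * new_mask
    new_weights.flat[grow] = 0.0          # newly grown connections start at zero
    return new_weights, new_mask
```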
Sparse neural networks attract increasing interest as they exhibit comparable performance to their dense counterparts while being computationally efficient. Pruning dense neural networks is among the most widely used methods to obtain a sparse neural network. Driven by the high training cost of such methods, which can be unaffordable for low-resource devices, training sparse neural networks sparsely from scratch has recently gained attention. However, existing sparse training algorithms suffer from various issues, including poor performance in high sparsity scenarios, computing dense gradient information during training, or pure random topology search. In this paper, inspired by the evolution of the biological brain and the Hebbian learning theory, we present a new sparse training approach that evolves sparse neural networks according to the behavior of neurons in the network. Concretely, by exploiting the cosine similarity metric to measure the importance of the connections, our proposed method, Cosine similarity-based and Random Topology Exploration (CTRE), evolves the topology of sparse neural networks by adding the most important connections to the network without calculating dense gradients in the backward pass. We carried out different experiments on eight datasets, including tabular, image, and text datasets, and demonstrate that our proposed method outperforms several state-of-the-art sparse training algorithms in extremely sparse neural networks by a large margin. The implementation code is available at https://github.com/zahraatashgahi/CTRE
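As a loose illustration of the cosine-similarity criterion (not the full CTRE algorithm, which also includes a random-exploration variant and connection removal), the sketch below scores candidate connections between two layers by the cosine similarity of the activation vectors of the neurons they would join, and activates the highest-scoring missing connections.

```python
import numpy as np

def cosine_importance(pre_acts, post_acts):
    """Cosine similarity between every pre-neuron and post-neuron activation
    vector over a batch; used as an importance score for candidate connections."""
    pre = pre_acts / (np.linalg.norm(pre_acts, axis=0, keepdims=True) + 1e-12)
    post = post_acts / (np.linalg.norm(post_acts, axis=0, keepdims=True) + 1e-12)
    return pre.T @ post                     # shape: (n_pre, n_post)

def grow_most_similar(mask, pre_acts, post_acts, n_grow):
    """Activate the n_grow missing connections with the highest cosine similarity
    between the neurons they would join."""
    sim = np.abs(cosine_importance(pre_acts, post_acts))
    sim[mask.astype(bool)] = -np.inf        # only consider missing connections
    idx = np.argpartition(sim, -n_grow, axis=None)[-n_grow:]
    new_mask = mask.copy()
    new_mask.flat[idx] = 1
    return new_mask
```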
Activation functions (AFs) play a key role in the performance of neural networks. The rectified linear unit (ReLU) is currently the most commonly used AF. Several alternatives have been proposed, but the improvements have proven inconsistent. Some AFs exhibit better performance on specific tasks, but it is hard to know a priori how to select a suitable one. Studying standard fully connected neural networks (FCNs) and convolutional neural networks (CNNs), we propose a novel, three-population, coevolutionary algorithm to evolve AFs and compare it with four other methods, both evolutionary and non-evolutionary. Tested on four datasets (MNIST, FashionMNIST, KMNIST, and USPS), coevolution proved to be a performant algorithm for finding good AFs and AF architectures.
Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.
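A minimal sketch of the connection-sensitivity saliency on one mini-batch is given below; the variance-scaled initialization and per-layer handling discussed in the paper are omitted, and the toy model and random data are only there to make the snippet self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def connection_sensitivity_masks(model, inputs, targets, keep_frac):
    """Single-shot saliency at initialization: score each weight by
    |weight * gradient of the loss w.r.t. that weight| on one mini-batch,
    then keep the top keep_frac fraction globally."""
    loss = F.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    scores = [(p * g).abs() for p, g in zip(model.parameters(), grads)]
    flat = torch.cat([s.flatten() for s in scores])
    k = int(keep_frac * flat.numel())
    thresh = torch.topk(flat, k).values.min()
    return [(s >= thresh).float() for s in scores]

# usage sketch on a toy fully connected net and random data
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
masks = connection_sensitivity_masks(model, x, y, keep_frac=0.05)
```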
Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Well-designed sparse neural networks therefore have the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method: Sparse Connectivity Learning (SCL). Specifically, a weight is re-parameterized as an element-wise multiplication of a trainable weight variable and a binary mask, so the network connectivity is fully described by the binary masks, which are modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the STE's proxy gradients should be positive, which ensures that the mask variables converge at their minima. Having found that the leaky ReLU, softplus, and identity STEs satisfy this principle, we propose to adopt the identity STE in SCL for discrete mask relaxation. We also find that the mask gradients of different features are very unbalanced, and therefore propose to normalize the mask gradient of each feature to optimize mask variable training. To train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. Since SCL does not require pruning criteria or hyperparameters to be defined per network layer by designers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity with the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results show that SCL can automatically learn and select important network connections for various baseline network structures, and deep learning models trained with SCL outperform state-of-the-art human-designed and automatic pruning methods in terms of sparsity, accuracy, and FLOPs reduction.
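The reparameterization described above can be sketched as follows (a simplified reading, with the per-feature gradient normalization and the exact regularization weighting left out; the class and attribute names are ours): each weight is multiplied by the unit step of a trainable mask variable, and the identity STE passes gradients straight through the step in the backward pass.

```python
import torch
import torch.nn as nn

class StepWithIdentitySTE(torch.autograd.Function):
    """Unit step in the forward pass; identity straight-through estimator
    (gradient passed through unchanged) in the backward pass."""
    @staticmethod
    def forward(ctx, m):
        return (m >= 0).float()
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                      # identity STE

class MaskedLinear(nn.Module):
    """Weight reparameterized as (trainable weight) * step(trainable mask variable)."""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        self.mask_var = nn.Parameter(torch.full((out_f, in_f), 0.01))
    def forward(self, x):
        binary_mask = StepWithIdentitySTE.apply(self.mask_var)
        return x @ (self.weight * binary_mask).t()
    def connection_count(self):
        # surrogate for the sparsity regularizer: total number of active connections
        return StepWithIdentitySTE.apply(self.mask_var).sum()
```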
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9×, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the total number of parameters can be reduced by 13×, from 138 million to 10.3 million, again with no loss of accuracy.
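A compact sketch of the prune and retrain steps is shown below, assuming (as in common re-implementations of this three-step method) a per-layer threshold proportional to the standard deviation of the layer's weights; the `quality` factor and the bare masked SGD update are illustrative simplifications.

```python
import torch

def prune_by_std(weight, quality=1.0):
    """Zero all weights whose magnitude falls below quality * std of the layer;
    return the pruned weights and the fixed binary mask used during retraining."""
    threshold = quality * weight.std()
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask

def retrain_step(weight, grad, mask, lr=1e-2):
    """Fine-tune only the surviving connections: masked SGD update."""
    return weight - lr * grad * mask
```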
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
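The test-time approximation mentioned above is commonly implemented as "inverted" dropout, where the surviving activations are rescaled during training so the single unthinned network can be used unchanged at test time; the paper's original formulation instead scales the weights by the retention probability at test time. A small numpy sketch of the inverted variant:

```python
import numpy as np

def dropout_forward(x, p_drop, rng, train=True):
    """Inverted dropout: randomly zero units during training and rescale the
    survivors by 1/(1 - p_drop), so no change is needed at test time."""
    if not train or p_drop == 0.0:
        return x                      # test time: use the full (unthinned) network
    keep = rng.random(x.shape) >= p_drop
    return x * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((4, 8))                   # a batch of hidden activations
print(dropout_forward(h, p_drop=0.5, rng=rng))
```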
Classification is one of the most studied tasks in the data mining and machine learning fields, and many works in the literature have addressed classification problems across numerous domains of knowledge, such as medicine, biology, security, and remote sensing. Since no single classifier achieves the best results for every application, a good alternative is to adopt classifier fusion strategies. A key point for the success of classifier fusion is the combination of diversity and accuracy among the classifiers belonging to the ensemble. With the large number of classification models available in the literature, one challenge is choosing the most suitable classifiers to compose the final classification system, which creates the need for classifier selection strategies. We address this by proposing a framework for classifier selection and fusion based on a four-step protocol called CIF-E (Classifiers, Initialization, Fitness function, and Evolutionary algorithm). We implement and evaluate 24 varied ensemble approaches following the proposed CIF-E protocol and are able to find the most accurate ones. A comparative analysis is also carried out against the best approaches and many other baselines from the literature. The experiments show that the proposed evolutionary approach based on the Univariate Marginal Distribution Algorithm (UMDA) outperforms state-of-the-art literature approaches on many well-known UCI datasets.
Network pruning is a widely used technique for effectively compressing deep neural networks with little to no performance degradation during inference. Iterative Magnitude Pruning (IMP) is one of the most established approaches to network pruning; it consists of several iterative training and pruning steps, where a significant amount of the network's performance is lost after pruning and then recovered in a subsequent retraining phase. While commonly used as a benchmark reference, it is often argued that a) it reaches suboptimal states by not incorporating sparsity into the training phase, b) its global selection criterion fails to properly determine optimal layer-wise pruning rates, and c) its iterative nature makes it slow and uncompetitive. In light of recently proposed retraining techniques, we investigate these claims through rigorous and consistent experiments in which we compare IMP against pruning-during-training algorithms, evaluate proposed modifications of its selection criterion, and study the number of iterations and the total training time it actually requires. We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches with little or no computational overhead, that the global magnitude selection criterion is largely competitive with more complex approaches, and that only few retraining epochs are needed in practice to achieve most of the sparsity-versus-performance trade-off of IMP. Our goals are both to demonstrate that basic IMP can already provide state-of-the-art pruning results, even outperforming much more complex or heavily parameterized approaches, and to establish a more realistic yet easily achievable baseline for future research.
Current deep neural networks (DNNs) are overparameterized and use most of their neuronal connections during inference for every task. The human brain, in contrast, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy that introduces a simple importance-score metric to deactivate unimportant connections, tackling overparameterization in DNNs and modulating the firing patterns. The aim is to find the smallest number of connections that is still capable of solving a given task with comparable accuracy, i.e., a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two different optimizers considered, Adam and SGD. The algorithm is not designed to minimize FLOPs under current hardware and software implementations, although it performs reasonably in comparison with the state of the art.
Leveraging sparse networks to connect successive layers in deep neural networks has recently been shown to provide benefits to large-scale, state-of-the-art models. However, network connectivity also plays a significant role in the learning curves of shallow networks, such as the classic Restricted Boltzmann Machine (RBM). A fundamental question is how to efficiently find connectivity patterns that improve the learning curve. Recent principled approaches explicitly include network connections as parameters that must be optimized within the model, but typically rely on continuous functions to represent connections and on explicit penalization. This work presents a method to find optimal connectivity patterns for RBMs based on the idea of network gradients: the gradient of every possible connection is computed, given a specific connectivity pattern, and used to drive a continuous connection-strength parameter, which in turn determines the connectivity pattern. Learning the RBM parameters and learning the network connectivity are thus truly performed jointly, albeit with different learning rates and without changing the objective function. The method is applied to the MNIST dataset, showing that better RBM models are found for the benchmark tasks of sample generation and input classification.
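A rough sketch of the core idea, under our own reading of the abstract: a CD-1 gradient is computed for every possible connection while the sampling passes only use the currently active ones, and its magnitude drives a continuous connection-strength parameter from which the next connectivity pattern is read off. The function names, the strength-update rule, and the top-k selection are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_full_gradient(v0, W, b_h, b_v, mask, rng):
    """CD-1 weight gradient for every possible connection, while sampling uses
    only the active connections (W * mask)."""
    Wm = W * mask
    p_h0 = sigmoid(v0 @ Wm + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    v1 = sigmoid(h0 @ Wm.T + b_v)
    p_h1 = sigmoid(v1 @ Wm + b_h)
    return (v0.T @ p_h0 - v1.T @ p_h1) / v0.shape[0]   # data minus model statistics

def update_connectivity(strength, dense_grad, n_active, lr_s=0.1):
    """Let the dense gradient drive the connection-strength parameter, then keep
    the n_active strongest connections as the new pattern."""
    strength = strength + lr_s * np.abs(dense_grad)
    thresh = np.sort(strength, axis=None)[-n_active]
    return strength, (strength >= thresh).astype(float)
```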
Applying differentially private stochastic gradient descent (DP-SGD) to training modern, large-scale neural networks such as transformer-based models is a challenging task, as the magnitude of the noise added at each iteration scales with the model dimension, hindering the learning capability significantly. We propose a unified framework, $\textsf{LSG}$, that fully exploits the low-rank and sparse structure of neural networks to reduce the dimension of the gradient updates and thus alleviate the negative impacts of DP-SGD. The gradient updates are first approximated with a pair of low-rank matrices. Then, a novel strategy is used to sparsify the gradients, resulting in low-dimensional, less noisy updates that nevertheless preserve the performance of the neural networks. Empirical evaluations on natural language processing and computer vision tasks show that our method outperforms other state-of-the-art baselines.
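As a very rough illustration of the low-rank-then-sparse idea (our own sketch; it does not reproduce the paper's actual $\textsf{LSG}$ mechanism, its sparsification strategy, or its privacy accounting), one could project a gradient matrix onto its top singular directions, keep only the largest entries, and clip before adding Gaussian noise to the surviving coordinates.

```python
import numpy as np

def low_rank_then_sparse(grad, rank, keep_frac, clip, noise_std, rng):
    """Low-rank approximation of a gradient matrix, magnitude-based sparsification,
    norm clipping, and Gaussian noise on the surviving entries only."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]        # low-rank approximation

    k = max(1, int(keep_frac * approx.size))
    thresh = np.partition(np.abs(approx), -k, axis=None)[-k]
    sparse = np.where(np.abs(approx) >= thresh, approx, 0.0)

    sparse *= min(1.0, clip / (np.linalg.norm(sparse) + 1e-12))   # gradient clipping
    noise = rng.normal(0.0, noise_std, size=sparse.shape) * (sparse != 0)
    return sparse + noise
```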
Self-driving vehicles have attracted interest ever since the automation of various tasks began. Humans are prone to exhaustion and have slow response times on the road, and above all, roughly 1.35 million road traffic deaths occur every year, making driving an already dangerous task. Autonomous driving is expected to reduce the number of driving accidents around the world, which is why this problem is of interest to researchers. Currently, self-driving vehicles use different algorithms for the various sub-problems involved in making a vehicle autonomous. We focus on reinforcement learning algorithms, more specifically Q-learning, and on NeuroEvolution of Augmenting Topologies (NEAT), a combination of evolutionary algorithms and artificial neural networks, to train a model agent to learn how to drive on a given path. This paper focuses on the comparison between the two aforementioned algorithms.
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network during the early stages of training. Thus, the computational cost of subsequent training iterations, besides that of inference, is considerably reduced. Our method, based on variational inference principles using Gaussian scale mixture priors on neural network weights, learns the variational posterior distribution of Bernoulli random variables multiplying the units/filters similarly to adaptive dropout. Our algorithm ensures that the Bernoulli parameters practically converge to either 0 or 1, establishing a deterministic final network. We analytically derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection and leads to consistent pruning levels and prediction accuracy regardless of weight initialization or the size of the starting network. We prove the convergence properties of our algorithm, establishing theoretical and practical pruning conditions. We evaluate the proposed algorithm on the MNIST and CIFAR-10 data sets and the commonly used fully connected and convolutional LeNet and VGG16 architectures. The simulations show that our method achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, doing so in a manner robust with respect to network initialization and initial size.
Recently, sparse training methods have begun to establish themselves as the de facto approach for training and inference efficiency in artificial neural networks. Yet, this efficiency is only theoretical. In practice, everyone uses a binary mask to simulate sparsity, since typical deep learning software and hardware are optimized for dense matrix operations. In this paper, we take an orthogonal approach and show that we can train truly sparse neural networks to harvest their full potential. To achieve this goal, we introduce three novel contributions specially designed for sparse neural networks: (1) a parallel training algorithm and its corresponding sparse implementation, (2) an activation function with non-trainable parameters to favour gradient flow, and (3) a hidden-neuron importance metric to eliminate redundancies. All in all, we are able to break the record and train the largest neural network ever trained in terms of representational power, reaching the size of a bat's brain. The results show that our approach achieves state-of-the-art performance while opening the path towards an environmentally friendly artificial intelligence era.
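The difference between simulated and true sparsity can be illustrated with an actual sparse matrix format; the snippet below (a generic illustration using SciPy, not the authors' implementation) stores only the non-zero weights of a layer, so both memory and the matrix-vector product scale with the number of connections rather than with the layer size.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
# a 4096 x 4096 layer at 0.1% density stored in CSR: only the non-zeros are kept
W = sparse.random(4096, 4096, density=0.001, format="csr", random_state=0)
x = rng.normal(size=4096)
h = np.maximum(W @ x, 0.0)            # sparse matrix-vector product followed by ReLU
print(f"{W.nnz} stored weights vs {4096 * 4096} in an equivalent dense layer")
```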
We propose a novel structured pruning algorithm for neural networks: the iterative, Sparse Structured Pruning algorithm, dubbed i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups within the network (e.g., filters or neurons) that contribute most to the residual between the pruned and dense network outputs, and then thresholding these groups according to a smaller, pre-defined pruning ratio. For both two-layer and multi-layer network architectures with ReLU activations, we show that the error induced by pruning with i-SpaSP decays polynomially, where the degree of this polynomial becomes arbitrarily large depending on the sparsity of the dense network's hidden representations. In our experiments, i-SpaSP is evaluated across a variety of datasets (i.e., MNIST and ImageNet) and architectures (i.e., feed-forward networks, ResNet34, and MobileNetV2), where it is shown to discover high-performing subnetworks and to improve the pruning efficiency of provable baseline methods by several orders of magnitude. Put simply, i-SpaSP is easy to implement with automatic differentiation, achieves strong empirical results, comes with theoretical convergence guarantees, and is efficient, thus distinguishing itself as one of the few computationally efficient, practical, and provable pruning algorithms.
The lottery ticket hypothesis has sparked the rapid development of pruning algorithms that perform structure learning by identifying a sparse subnetwork of a large, randomly initialized neural network. The existence of such "winning tickets" has been proven theoretically, but only at suboptimal sparsity levels. Contemporary pruning algorithms also struggle to identify sparse lottery tickets for complex learning tasks. Is this suboptimal sparsity merely an artifact of the existence proofs and algorithms, or a general limitation of the pruning approach? And, if extremely sparse tickets do exist, are current algorithms able to find them, or are further improvements needed to achieve effective network compression? To answer these questions systematically, we derive a framework to plant and hide target architectures within large, randomly initialized neural networks. For three common challenges in machine learning, we hand-craft extremely sparse network topologies, plant them in large neural networks, and evaluate state-of-the-art lottery ticket pruning methods. We find that the current limitations of pruning algorithms in identifying extremely sparse tickets are algorithmic rather than fundamental in nature, and we anticipate that our planting framework will facilitate future developments of efficient pruning algorithms, as we have addressed the issue of missing baselines in the field raised by Frankle et al.
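A toy version of the planting step (our own illustration; the actual framework constructs task-specific target networks and hides them more carefully, e.g. via permutations) embeds a hand-crafted sparse layer inside a larger randomly initialized one and records the ground-truth ticket mask that a pruning method should ideally recover.

```python
import numpy as np

def plant_ticket(big_shape, target_weight, rng, scale=0.1):
    """Hide a hand-crafted sparse target layer inside a larger random layer:
    the target occupies the leading rows/columns, everything else is random."""
    big = rng.normal(scale=scale, size=big_shape)
    r, c = target_weight.shape
    big[:r, :c] = target_weight
    ticket_mask = np.zeros(big_shape, dtype=bool)
    ticket_mask[:r, :c] = target_weight != 0
    return big, ticket_mask          # a pruning method should recover ticket_mask

rng = np.random.default_rng(0)
target = np.array([[1.0, 0.0], [0.0, -1.0]])    # a tiny, extremely sparse target
planted, mask = plant_ticket((64, 32), target, rng)
```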