The compute-intensive nature of neural networks (NNs) limits their deployment in resource-constrained environments such as cell phones, drones, autonomous robots, etc. Hence, developing robust sparse models fit for safety-critical applications has been an issue of longstanding interest. Though adversarial training with model sparsification has been combined to attain the goal, conventional adversarial training approaches provide no formal guarantee that the models would be robust against any rogue samples in a restricted space around a benign sample. Recently proposed verified local robustness techniques provide such a guarantee. This is the first paper that combines the ideas from verified local robustness and dynamic sparse training to develop `SparseVLR'-- a novel framework to search verified locally robust sparse networks. Obtained sparse models exhibit accuracy and robustness comparable to their dense counterparts at sparsity as high as 99%. Furthermore, unlike most conventional sparsification techniques, SparseVLR does not require a pre-trained dense model, reducing the training time by 50%. We exhaustively investigated SparseVLR's efficacy and generalizability by evaluating various benchmark and application-specific datasets across several models.
translated by 谷歌翻译
深度神经网络近似高度复杂功能的能力是其成功的关键。但是,这种好处是以巨大的模型大小为代价的,这挑战了其在资源受限环境中的部署。修剪是一种用于限制此问题的有效技术,但通常以降低准确性和对抗性鲁棒性为代价。本文解决了这些缺点,并引入了Deadwooding,这是一种新型的全球修剪技术,它利用了Lagrangian双重方法来鼓励模型稀疏性,同时保持准确性并确保鲁棒性。所得模型显示出在鲁棒性和准确性度量方面的最先进研究大大优于最先进的模型。
translated by 谷歌翻译
随着深度学习的快速发展,神经网络的大小变得越来越大,以使培训和推理经常压倒硬件资源。鉴于神经网络通常被过度参数化,减少这种计算开销的一种有效方法是神经网络修剪,通过删除训练有素的神经网络中的冗余参数。最近观察到,修剪不仅可以减少计算开销,而且可以改善深神经网络(NNS)的经验鲁棒性,这可能是由于消除了虚假相关性的同时保留了预测精度。本文首次证明,修剪通常可以在完整的验证设置下改善基于RELU的NN的认证鲁棒性。使用流行的分支机构(BAB)框架,我们发现修剪可以通过减轻线性放松和子域分裂问题来增强认证稳健性验证的估计结合紧密度。我们通过现成的修剪方法验证了我们的发现,并进一步提出了一种基于稳定性的新修剪方法,该方法量身定制了用于减少神经元不稳定性的新方法,该方法在增强认证的鲁棒性方面优于现有的修剪方法。我们的实验表明,通过适当修剪NN,在标准培训下,其认证准确性最多可提高8.2%,在CIFAR10数据集中的对抗培训下最多可提高24.5%。我们还观察到存在经过认证的彩票票,这些彩票可以符合不同数据集的原始密集型号的标准和认证的稳健精度。我们的发现提供了一个新的角度来研究稀疏性与鲁棒性之间的有趣相互作用,即通过神经元稳定性解释稀疏性和认证鲁棒性的相互作用。代码可在以下网址提供:https://github.com/vita-group/certifiedpruning。
translated by 谷歌翻译
可认证的鲁棒性是在安全至关重要的情况下采用深层神经网络(DNN)的高度理想的属性,但通常需要建立乏味的计算。主要障碍在于大型DNN中的大量非线性。为了权衡DNN表现力(要求更多的非线性)和鲁棒性认证可伸缩性(更喜欢线性性),我们提出了一种新颖的解决方案来通过“授予”适当的线性水平来策略性地操纵神经元。我们建议的核心是首先将无关紧要的依赖神经元线性化,以消除既有用于DNN性能的多余的非线性组件,又对其认证有害。然后,我们优化替换线性激活的相关斜率和截距,以恢复模型性能,同时保持认证性。因此,典型的神经元修剪可以被视为一种特殊情况,即授予固定零斜率和截距的线性功能,这可能过于限制网络灵活性并牺牲其性能。在多个数据集和网络骨架上进行的广泛实验表明,我们的线性嫁接可以有效地收紧认证界限; (2)在没有认证的鲁棒培训的情况下实现竞争性认证的鲁棒性(即CIFAR-10型号的30%改进); (3)将完整的验证扩展到具有17m参数的大型对抗训练的模型。代码可在https://github.com/vita-group/linearity-grafting上找到。
translated by 谷歌翻译
Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-tosparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static * .
translated by 谷歌翻译
Pruning refers to the elimination of trivial weights from neural networks. The sub-networks within an overparameterized model produced after pruning are often called Lottery tickets. This research aims to generate winning lottery tickets from a set of lottery tickets that can achieve similar accuracy to the original unpruned network. We introduce a novel winning ticket called Cyclic Overlapping Lottery Ticket (COLT) by data splitting and cyclic retraining of the pruned network from scratch. We apply a cyclic pruning algorithm that keeps only the overlapping weights of different pruned models trained on different data segments. Our results demonstrate that COLT can achieve similar accuracies (obtained by the unpruned model) while maintaining high sparsities. We show that the accuracy of COLT is on par with the winning tickets of Lottery Ticket Hypothesis (LTH) and, at times, is better. Moreover, COLTs can be generated using fewer iterations than tickets generated by the popular Iterative Magnitude Pruning (IMP) method. In addition, we also notice COLTs generated on large datasets can be transferred to small ones without compromising performance, demonstrating its generalizing capability. We conduct all our experiments on Cifar-10, Cifar-100 & TinyImageNet datasets and report superior performance than the state-of-the-art methods.
translated by 谷歌翻译
Sparse neural networks attract increasing interest as they exhibit comparable performance to their dense counterparts while being computationally efficient. Pruning the dense neural networks is among the most widely used methods to obtain a sparse neural network. Driven by the high training cost of such methods that can be unaffordable for a low-resource device, training sparse neural networks sparsely from scratch has recently gained attention. However, existing sparse training algorithms suffer from various issues, including poor performance in high sparsity scenarios, computing dense gradient information during training, or pure random topology search. In this paper, inspired by the evolution of the biological brain and the Hebbian learning theory, we present a new sparse training approach that evolves sparse neural networks according to the behavior of neurons in the network. Concretely, by exploiting the cosine similarity metric to measure the importance of the connections, our proposed method, Cosine similarity-based and Random Topology Exploration (CTRE), evolves the topology of sparse neural networks by adding the most important connections to the network without calculating dense gradient in the backward. We carried out different experiments on eight datasets, including tabular, image, and text datasets, and demonstrate that our proposed method outperforms several state-of-the-art sparse training algorithms in extremely sparse neural networks by a large gap. The implementation code is available on https://github.com/zahraatashgahi/CTRE
translated by 谷歌翻译
野外的深度学习(DL)的成功采用需要模型:(1)紧凑,(2)准确,(3)强大的分布换档。不幸的是,同时满足这些要求的努力主要是不成功的。这提出了一个重要问题:无法创建紧凑,准确,强大的深神经网络(卡)基础?为了回答这个问题,我们对流行的模型压缩技术进行了大规模分析,该技术揭示了几种有趣模式。值得注意的是,与传统的修剪方法相比(例如,微调和逐渐修剪),我们发现“彩票式风格”方法令人惊讶地用于生产卡,包括二进制牌。具体而言,我们能够创建极其紧凑的卡,与其较大的对应物相比,具有类似的测试精度和匹配(或更好)的稳健性 - 仅通过修剪和(可选)量化。利用卡的紧凑性,我们开发了一种简单的域 - 自适应测试时间合并方法(卡片 - 甲板),它使用门控模块根据与测试样本的光谱相似性动态地选择相应的卡片。该拟议的方法建立了一个“赢得胜利”的卡片,即在CiFar-10-C精度(即96.8%标准和92.75%的鲁棒)和CiFar-100- C精度(80.6%标准和71.3%的稳健性),内存使用率比非压缩基线(Https://github.com/robustbench/robustbench提供的预制卡和卡片 - 甲板)。最后,我们为我们的理论支持提供了理论支持经验研究结果。
translated by 谷歌翻译
深度神经网络(DNN)的计算要求增加导致获得稀疏,且准确的DNN模型的兴趣。最近的工作已经调查了稀疏训练的更加困难的情况,其中DNN重量尽可能稀少,以减少训练期间的计算成本。现有的稀疏训练方法通常是经验的,并且可以具有相对于致密基线的准确性较低。在本文中,我们介绍了一种称为交替压缩/解压缩(AC / DC)训练DNN的一般方法,证明了算法变体的收敛,并表明AC / DC在类似的计算预算中准确地表现出现有的稀疏训练方法;在高稀疏水平下,AC / DC甚至优于现有的现有方法,依赖于准确的预训练密集模型。 AC / DC的一个重要属性是它允许联合培训密集和稀疏的型号,在训练过程结束时产生精确的稀疏密集模型对。这在实践中是有用的,其中压缩变体可能是为了在资源受限的设置中进行部署而不重新执行整个训练流,并且还为我们提供了深入和压缩模型之间的精度差距的见解。代码可在:https://github.com/ist-daslab/acdc。
translated by 谷歌翻译
近年来,深神经网络(DNN)应用的流行和成功促使对DNN压缩的研究,例如修剪和量化。这些技术加速了模型推断,减少功耗,并降低运行DNN所需的硬件的大小和复杂性,而准确性几乎没有损失。但是,由于DNN容易受到对抗输入的影响,因此重要的是要考虑压缩和对抗性鲁棒性之间的关系。在这项工作中,我们研究了几种不规则修剪方案和8位量化产生的模型的对抗性鲁棒性。此外,尽管常规修剪消除了DNN中最不重要的参数,但我们研究了一种非常规修剪方法的效果:根据对抗输入的梯度去除最重要的模型参数。我们称这种方法称贪婪的对抗修剪(GAP),我们发现这种修剪方法会导致模型可抵抗从其未压缩的对应物转移攻击的模型。
translated by 谷歌翻译
现代深度神经网络往往太大而无法在许多实际情况下使用。神经网络修剪是降低这种模型的大小的重要技术和加速推断。Gibbs修剪是一种表达和设计神经网络修剪方法的新框架。结合统计物理和随机正则化方法的方法,它可以同时培训和修剪网络,使得学习的权重和修剪面膜彼此很好地适应。它可用于结构化或非结构化修剪,我们为每个提出了许多特定方法。我们将拟议的方法与许多当代神经网络修剪方法进行比较,发现Gibbs修剪优于它们。特别是,我们通过CIFAR-10数据集来实现修剪Reset-56的新型最先进的结果。
translated by 谷歌翻译
最近对稀疏神经网络的作品已经证明了独立从头开始训练稀疏子网,以匹配其相应密集网络的性能。然而,识别这种稀疏的子网(获奖票)涉及昂贵的迭代火车 - 培训 - 培训过程(例如,彩票票证假设)或过度扩展的训练时间(例如,动态稀疏训练)。在这项工作中,我们在稀疏神经网络训练和深度合并技术之间汲取了独特的联系,产生了一个名为FreeTickets的新型集合学习框架。 FreeTickets而不是从密集的网络开始,随机初始化稀疏的子网,然后在动态调整其稀疏掩码的同时列举子网,从而在整个训练过程中产生许多不同的稀疏子网。 FreeTickets被定义为这些稀疏子网的集合,在这种单次通过,稀疏稀疏训练中自由获得,其仅使用Vanilla密集培训所需的计算资源的一小部分。此外,尽管是模型的集合,但与单一密集模型相比,FreeTickets的参数和训练拖鞋更少:这种看似反向直观的结果是由于每个子网的高稀疏性。与标准致密基线相比,观察到惯性基因术,以预测准确性,不确定度估计,鲁棒性和效率相比表现出显着的全面改进。 FreeTickets在ImageNet上只使用后者所需的四分之一的培训拖鞋,可以轻松地表达Naive Deep EndleBe。我们的结果提供了对稀疏神经网络的强度的见解,并表明稀疏性的好处超出了通常预期的推理效率。
translated by 谷歌翻译
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization. * Equal contribution. † Work done while visiting UC Berkeley.
translated by 谷歌翻译
网络修剪是一种有效的方法,可以通过可接受的性能妥协降低网络复杂性。现有研究通过耗时的重量调谐或具有扩展宽度的网络的复杂搜索来实现神经网络的稀疏性,这极大地限制了网络修剪的应用。在本文中,我们表明,在没有权重调谐的情况下,高性能和稀疏的子网被称为“彩票奖线”,存在于具有膨胀宽度的预先训练的模型中。例如,我们获得了一个只有10%参数的彩票奖金,仍然达到了原始密度Vggnet-19的性能,而无需对CiFar-10的预先训练的重量进行任何修改。此外,我们观察到,来自许多现有修剪标准的稀疏面具与我们的彩票累积的搜索掩码具有高重叠,其中,基于幅度的修剪导致与我们的最相似的掩模。根据这种洞察力,我们使用基于幅度的修剪初始化我们的稀疏掩模,导致彩票累积搜索至少3倍降低,同时实现了可比或更好的性能。具体而言,我们的幅度基彩票奖学金在Reset-50中除去90%的重量,而在ImageNet上仅使用10个搜索时期可以轻松获得超过70%的前1个精度。我们的代码可在https://github.com/zyxxmu/lottery-jackpots获得。
translated by 谷歌翻译
Recent works on Lottery Ticket Hypothesis have shown that pre-trained language models (PLMs) contain smaller matching subnetworks(winning tickets) which are capable of reaching accuracy comparable to the original models. However, these tickets are proved to be notrobust to adversarial examples, and even worse than their PLM counterparts. To address this problem, we propose a novel method based on learning binary weight masks to identify robust tickets hidden in the original PLMs. Since the loss is not differentiable for the binary mask, we assign the hard concrete distribution to the masks and encourage their sparsity using a smoothing approximation of L0 regularization.Furthermore, we design an adversarial loss objective to guide the search for robust tickets and ensure that the tickets perform well bothin accuracy and robustness. Experimental results show the significant improvement of the proposed method over previous work on adversarial robustness evaluation.
translated by 谷歌翻译
背景信息:在过去几年中,机器学习(ML)一直是许多创新的核心。然而,包括在所谓的“安全关键”系统中,例如汽车或航空的系统已经被证明是非常具有挑战性的,因为ML的范式转变为ML带来完全改变传统认证方法。目的:本文旨在阐明与ML为基础的安全关键系统认证有关的挑战,以及文献中提出的解决方案,以解决它们,回答问题的问题如何证明基于机器学习的安全关键系统?'方法:我们开展2015年至2020年至2020年之间发布的研究论文的系统文献综述(SLR),涵盖了与ML系统认证有关的主题。总共确定了217篇论文涵盖了主题,被认为是ML认证的主要支柱:鲁棒性,不确定性,解释性,验证,安全强化学习和直接认证。我们分析了每个子场的主要趋势和问题,并提取了提取的论文的总结。结果:单反结果突出了社区对该主题的热情,以及在数据集和模型类型方面缺乏多样性。它还强调需要进一步发展学术界和行业之间的联系,以加深域名研究。最后,它还说明了必须在上面提到的主要支柱之间建立连接的必要性,这些主要柱主要主要研究。结论:我们强调了目前部署的努力,以实现ML基于ML的软件系统,并讨论了一些未来的研究方向。
translated by 谷歌翻译
While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensure a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday's applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
translated by 谷歌翻译
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
translated by 谷歌翻译
人们通常认为,修剪网络不仅会降低深网的计算成本,而且还可以通过降低模型容量来防止过度拟合。但是,我们的工作令人惊讶地发现,网络修剪有时甚至会加剧过度拟合。我们报告了出乎意料的稀疏双后裔现象,随着我们通过网络修剪增加模型稀疏性,首先测试性能变得更糟(由于过度拟合),然后变得更好(由于过度舒适),并且终于变得更糟(由于忘记了有用的有用信息)。尽管最近的研究集中在模型过度参数化方面,但他们未能意识到稀疏性也可能导致双重下降。在本文中,我们有三个主要贡献。首先,我们通过广泛的实验报告了新型的稀疏双重下降现象。其次,对于这种现象,我们提出了一种新颖的学习距离解释,即$ \ ell_ {2} $稀疏模型的学习距离(从初始化参数到最终参数)可能与稀疏的双重下降曲线良好相关,并更好地反映概括比最小平坦。第三,在稀疏的双重下降的背景下,彩票票假设中的获胜票令人惊讶地并不总是赢。
translated by 谷歌翻译
Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.
translated by 谷歌翻译