This paper studies the effect of static sparsity on the robustness of trained networks to weight perturbations, data corruption, and adversarial examples. We show that, up to a certain sparsity achieved by increasing network width and depth while keeping the network capacity fixed, sparsified networks consistently match and often outperform their initially dense versions. Robustness and accuracy decline simultaneously at very high sparsity due to loose connectivity between network layers. Our findings show that the rapid drop in robustness caused by network compression observed in the literature is due to reduced network capacity rather than to sparsity itself.
In this paper, we conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them. Although it is a bold conjecture, we show how extensive empirical attempts fall short of refuting it. We further provide preliminary theoretical results to support our conjecture. Our conjecture has implications for the lottery ticket hypothesis, distributed training, and ensemble methods.
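As a minimal sketch (not from the paper) of how the barrier along the linear interpolation between two SGD solutions could be measured, the snippet below interpolates two state dicts and records the loss along the path; `model_a`, `model_b`, and `eval_loss` are assumed placeholders, and the permutation alignment the conjecture relies on is not shown.

```python
import copy
import torch

def interpolate_state(state_a, state_b, alpha):
    """Linearly interpolate two state dicts: (1 - alpha) * A + alpha * B."""
    return {
        k: (1 - alpha) * v + alpha * state_b[k] if v.is_floating_point() else v
        for k, v in state_a.items()
    }

@torch.no_grad()
def loss_barrier(model_a, model_b, eval_loss, steps=11):
    """Maximum increase of the loss along the linear path over the endpoints' average.

    A barrier close to zero suggests the two solutions are linearly mode-connected.
    Batch-norm statistics are interpolated naively here; re-estimating them on
    training data would be more faithful.
    """
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        probe.load_state_dict(interpolate_state(state_a, state_b, alpha))
        losses.append(eval_loss(probe))   # e.g., average loss on a held-out batch
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```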
Successful adoption of deep learning (DL) in the wild requires models that are: (1) compact, (2) accurate, and (3) robust to distributional shifts. Unfortunately, efforts to satisfy these requirements simultaneously have been largely unsuccessful. This raises an important question: is the inability to create compact, accurate, and robust deep neural networks (CARDs) fundamental? To answer this question, we perform a large-scale analysis of popular model compression techniques, which uncovers several intriguing patterns. Notably, in contrast to traditional pruning approaches (e.g., fine-tuning and gradual magnitude pruning), we find that "lottery-ticket-style" approaches are surprisingly effective at producing CARDs, including binary CARDs. Specifically, we are able to create extremely compact CARDs that have similar test accuracy and matching (or better) robustness compared to their larger counterparts, simply by pruning and (optionally) quantizing. Leveraging the compactness of CARDs, we develop a simple domain-adaptive test-time ensembling approach (CARD-Deck) that uses a gating module to dynamically select the appropriate CARD based on its spectral similarity to the test sample. The proposed approach builds a "winning hand" of CARDs that achieves strong CIFAR-10-C accuracy (i.e., 96.8% standard and 92.75% robust) and CIFAR-100-C accuracy (80.6% standard and 71.3% robust), with better memory usage than non-compressed baselines (pretrained CARDs and CARD-Decks are available at https://github.com/robustbench/robustbench). Finally, we provide theoretical support for our empirical findings.
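As a rough illustration of the kind of spectral-similarity gating the CARD-Deck abstract describes (the actual gating module may differ), the sketch below picks the CARD whose reference Fourier signature, assumed to be precomputed per corruption domain, is closest to the test batch; `cards` and `reference_spectra` are hypothetical names introduced here for illustration only.

```python
import torch
import torch.nn.functional as F

def log_spectrum(images):
    """Average log-magnitude Fourier spectrum of a batch of images (B, C, H, W)."""
    fft = torch.fft.fft2(images, dim=(-2, -1))
    return torch.log1p(fft.abs()).mean(dim=(0, 1))   # (H, W) spectral signature

def select_card(test_batch, cards, reference_spectra):
    """Pick the CARD whose calibration-domain spectrum best matches the test batch.

    `cards` is a list of compact models; `reference_spectra[i]` is a signature
    assumed to be computed offline from data the i-th card was calibrated on.
    """
    sig = log_spectrum(test_batch).flatten()
    sims = torch.stack([
        F.cosine_similarity(sig, ref.flatten(), dim=0) for ref in reference_spectra
    ])
    return cards[int(sims.argmax())]
```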
The ability of deep neural networks to approximate highly complex functions is key to their success. This benefit, however, comes at the cost of very large model sizes, which challenges their deployment in resource-constrained environments. Pruning is an effective technique for limiting this problem, but it often comes at the cost of reduced accuracy and adversarial robustness. This paper addresses these shortcomings and introduces Deadwooding, a novel global pruning technique that leverages a Lagrangian dual method to encourage model sparsity while retaining accuracy and ensuring robustness. The resulting models are shown to significantly outperform the state of the art on robustness and accuracy metrics.
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
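A compact sketch of the iterative magnitude-pruning-and-rewinding loop the abstract describes for identifying winning tickets; the `train` callable and the per-round pruning fraction are assumptions rather than the paper's exact schedule.

```python
import copy
import torch

def find_winning_ticket(model, train, prune_fraction=0.2, rounds=5):
    """Iterative magnitude pruning with rewinding to the original initialization.

    `train(model)` is assumed to train the model in place and is user-supplied.
    In a full implementation the mask would also be re-applied after every
    optimizer step so that pruned weights stay at zero during training.
    """
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train(model)
        for name, param in model.named_parameters():
            if name not in masks:
                continue
            alive = param[masks[name].bool()].abs()
            threshold = alive.quantile(prune_fraction)   # prune lowest-magnitude survivors
            masks[name] *= (param.abs() > threshold).float()
        # Rewind surviving weights to their initial values (the "winning ticket").
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in masks:
                    param *= masks[name]
    return model, masks
```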
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real-world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Current attack methods are able to manipulate the network's prediction by adding specific but small amounts of noise to the input. In turn, adversarial training (AT) aims to achieve robustness against such attacks and ideally a better model generalization ability by including adversarial samples in the training set. However, an in-depth analysis of the resulting robust models beyond adversarial robustness is still pending. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions, even on clean data, than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences. Data & Project website: https://github.com/GeJulia/robustness_confidences_evaluation
Modern neural networks excel at image classification, but they remain vulnerable to common image corruptions such as blur, speckle noise, or fog. Recent methods addressing this problem, such as AugMix and DeepAugment, introduce defenses that operate in expectation over a distribution of image corruptions. In contrast, the literature on $\ell_p$-norm-bounded perturbations focuses on defenses against worst-case corruptions. In this work, we reconcile the two approaches by proposing an adversarial-augmentation technique that optimizes the parameters of image-to-image models to generate adversarially corrupted augmented images. We theoretically motivate our method and provide sufficient conditions for the consistency of its idealized version, as well as that of DeepAugment. Our classifiers improve upon the state of the art on the CIFAR-10-C common image corruption benchmark evaluated in expectation, and improve worst-case performance against $\ell_p$-norm-bounded perturbations on both CIFAR-10 and ImageNet.
Adversarial training (AT) and its variants have made great strides over the past few years in improving the robustness of neural networks to adversarial perturbations and common corruptions. The algorithmic design of AT and its variants is focused on a specified perturbation strength $\epsilon$, and only the performance feedback of that $\epsilon$-robust model is used to improve the algorithm. In this work, we focus on models trained over a spectrum of $\epsilon$ values. We analyze three perspectives: model performance, intermediate-feature precision, and convolutional-filter sensitivity. In each case, we identify alternative improvements to AT that are otherwise not apparent at a single $\epsilon$. Specifically, we find that for a PGD attack of some strength $\delta$, there is a model trained at some slightly larger strength $\epsilon$, but no larger, that generalizes best to it. Accordingly, we propose overdesigning for robustness: we suggest training models at an $\epsilon$ slightly larger than $\delta$. Second, we observe (across various $\epsilon$ values) that robustness is highly sensitive to the precision of intermediate features, especially those after the first and second layers. Thus, we propose adding simple quantization to defenses to improve accuracy against seen and unseen adaptive attacks. Third, we analyze the convolutional filters of each layer of models trained at increasing $\epsilon$ and notice that the filters of the first and second layers may be entirely responsible for amplifying input perturbations. We present our findings and demonstrate our techniques through experiments with ResNet and WideResNet models on the CIFAR-10 and CIFAR-10-C datasets.
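The suggestion to add simple quantization to early intermediate features could look roughly like the following sketch; the uniform 16-level quantizer, the straight-through gradient, and the placement after the first convolution block are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class FeatureQuantize(nn.Module):
    """Uniformly quantize activations to a small number of levels (straight-through).

    Intended to be placed after the first one or two convolution blocks to blunt
    the amplification of small input perturbations.
    """
    def __init__(self, num_levels=16):
        super().__init__()
        self.num_levels = num_levels

    def forward(self, x):
        lo, hi = x.amin(), x.amax()
        scale = (hi - lo).clamp_min(1e-8) / (self.num_levels - 1)
        q = torch.round((x - lo) / scale) * scale + lo
        # Straight-through estimator: quantized forward pass, identity gradient.
        return x + (q - x).detach()

# Illustrative placement after an early convolution block:
early_block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    FeatureQuantize(num_levels=16),
)
```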
Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static.
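A hedged sketch of the magnitude-based drop and gradient-based grow update described above, for a single layer; the update fraction and the zero-initialization of regrown weights are assumptions, not the method's exact hyperparameters, and the infrequent dense gradient is assumed to be available at update time.

```python
import torch

@torch.no_grad()
def drop_and_grow(weight, grad, mask, update_fraction=0.3):
    """One topology update for a single sparse layer.

    `mask` is a 0/1 tensor with the same shape as `weight`; the total number of
    active connections (and hence the parameter count) stays fixed.
    """
    n_active = int(mask.sum().item())
    k = int(update_fraction * n_active)
    if k == 0:
        return mask

    # Drop: remove the k active weights with the smallest magnitude.
    active_mag = torch.where(mask.bool(), weight.abs(), torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(active_mag.view(-1), k, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0

    # Grow: activate the k inactive positions with the largest gradient magnitude.
    inactive_grad = torch.where(mask.bool(), torch.full_like(grad, -float("inf")), grad.abs())
    grow_idx = torch.topk(inactive_grad.view(-1), k).indices
    mask.view(-1)[grow_idx] = 1.0
    weight.view(-1)[grow_idx] = 0.0   # newly grown connections start at zero
    return mask
```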
Recent works on sparse neural networks have demonstrated that it is possible to train a sparse subnetwork independently from scratch to match the performance of its corresponding dense network. However, identifying such sparse subnetworks (winning tickets) either involves a costly iterative train-prune-retrain process (e.g., the lottery ticket hypothesis) or an over-extended training time (e.g., dynamic sparse training). In this work, we draw a unique connection between sparse neural network training and deep ensembling, yielding a novel ensemble learning framework called FreeTickets. Instead of starting from a dense network, FreeTickets randomly initializes a sparse subnetwork and then trains the subnetwork while dynamically adjusting its sparse mask, producing many diverse sparse subnetworks throughout the training process. FreeTickets is defined as the ensemble of these sparse subnetworks, obtained for free in this one-pass, sparse-to-sparse training, which uses only a fraction of the computational resources required by vanilla dense training. Moreover, despite being an ensemble of models, FreeTickets has fewer parameters and training FLOPs than a single dense model: this seemingly counter-intuitive outcome is due to the high sparsity of each subnetwork. Compared with the standard dense baseline, FreeTickets exhibits a significant all-round improvement in prediction accuracy, uncertainty estimation, robustness, and efficiency. FreeTickets can easily outperform the naive deep ensemble on ImageNet while using only a quarter of the training FLOPs required by the latter. Our results provide insights into the strength of sparse neural networks and suggest that the benefits of sparsity go well beyond the usually expected inference efficiency.
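A minimal sketch of how the free ensemble could be used at test time, assuming subnetwork snapshots (each with its sparse mask already applied to its weights) were saved at intervals during the single sparse-to-sparse training run; the snapshot mechanism itself is not shown.

```python
import torch

@torch.no_grad()
def freetickets_predict(subnetworks, x):
    """Average the softmax predictions of sparse subnetworks collected during training.

    `subnetworks` is assumed to be a list of model snapshots saved during one
    dynamic-sparse-training run, e.g. via copy.deepcopy(model).eval() every few epochs.
    """
    probs = torch.stack([torch.softmax(net(x), dim=-1) for net in subnetworks])
    return probs.mean(dim=0)
```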
It is generally believed that pruning not only reduces the computational cost of deep networks but also prevents overfitting by reducing model capacity. However, our work surprisingly finds that network pruning can sometimes even aggravate overfitting. We report an unexpected sparse double descent phenomenon: as we increase model sparsity via network pruning, test performance first gets worse (due to overfitting), then gets better (as overfitting is relieved), and finally gets worse again (due to forgetting useful information). While recent studies have focused on model over-parameterization, they fail to realize that sparsity may also cause double descent. In this paper, we make three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. Second, we propose a novel learning-distance interpretation of this phenomenon: the $\ell_{2}$ learning distance of sparse models (from initialized parameters to final parameters) may correlate well with the sparse double descent curve and reflect generalization better than minima flatness. Third, in the context of sparse double descent, the winning ticket of the lottery ticket hypothesis surprisingly does not always win.
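The proposed $\ell_{2}$ learning-distance measure could be computed along the lines of the following sketch; restricting the distance to unpruned weights via `masks` is an assumption about how the metric would be applied to sparse models, not a detail taken from the paper.

```python
import torch

@torch.no_grad()
def l2_learning_distance(init_state, final_state, masks=None):
    """L2 distance from initial to final parameters, optionally over unpruned weights only.

    `masks` (if given) maps parameter names to 0/1 tensors so that pruned entries,
    which stay at zero, do not contribute to the distance.
    """
    total = 0.0
    for name, w0 in init_state.items():
        if not w0.is_floating_point():
            continue
        diff = final_state[name] - w0
        if masks is not None and name in masks:
            diff = diff * masks[name]
        total += diff.pow(2).sum().item()
    return total ** 0.5
```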
The literature on robustness to common corruptions shows no consensus on whether adversarial training can improve performance in this setting. First, we show that, when used with an appropriately selected perturbation radius, $\ell_p$ adversarial training can serve as a strong baseline against common corruptions, improving both accuracy and calibration. We then explain why adversarial training performs better than data augmentation with simple Gaussian noise, which has been observed to be a meaningful baseline against common corruptions. Related to this, we identify a $\sigma$-overfitting phenomenon whereby Gaussian augmentation overfits to the particular standard deviation used for training, which has a significant detrimental effect on common-corruption accuracy. We discuss how to alleviate this problem, and then show how to further enhance $\ell_p$ adversarial training by introducing an efficient relaxation of adversarial training based on learned perceptual image patch similarity. Through experiments on CIFAR-10 and ImageNet-100, we show that our approach not only improves upon the $\ell_p$ adversarial training baseline but also yields cumulative gains with data augmentation methods such as AugMix, DeepAugment, ANT, and SIN, leading to state-of-the-art performance on common corruptions. The code of our experiments is publicly available at https://github.com/tml-epfl/adv-training-corruptions.
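For reference, a bare-bones $\ell_\infty$ PGD adversarial-training step of the kind used as the baseline above; the radius, step size, and iteration count are placeholders rather than the tuned values from the paper, and clipping of the perturbed input to the valid image range is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, alpha, steps):
    """Generate an L-infinity PGD perturbation of radius `eps` (a sketch)."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y, eps=4 / 255, alpha=1 / 255, steps=10):
    """One adversarial-training step on a mini-batch (x, y)."""
    delta = pgd_linf(model, x, y, eps, alpha, steps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```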
Aggressive data augmentation is a key component of the strong generalization ability of vision transformers (ViTs). One such data augmentation technique is adversarial training; however, many prior works have shown that it often leads to poor clean accuracy. In this work, we present pyramid adversarial training, a simple and effective technique to improve the overall performance of ViTs. We pair it with a "matched" Dropout and stochastic depth regularization, which uses the same Dropout and stochastic depth configuration for clean and adversarial samples. Similar to the improvements AdvProp brings to CNNs (which is not directly applicable to ViTs), our pyramid adversarial training breaks the trade-off between in-distribution accuracy and out-of-distribution robustness for ViTs and related architectures. When trained on ImageNet-1K data, it yields a $1.82\%$ absolute improvement in ImageNet clean accuracy for the ViT-B model, while simultaneously improving performance on $7$ robustness metrics by absolute margins ranging from $1.76\%$ to $11.45\%$. We set a new state of the art for ImageNet-C (41.4 mCE), ImageNet-R ($53.92\%$), and ImageNet-Sketch ($41.04\%$), using only a ViT-B/16 backbone and our pyramid adversarial training. Our code will be made publicly available upon acceptance.
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. A range of defense methods have been proposed to train adversarially robust DNNs, among which adversarial training has demonstrated promising results. However, despite preliminary understandings developed for adversarial training, it is still not clear, from the architectural perspective, what configurations can lead to more robust DNNs. In this paper, we address this gap via a comprehensive investigation of the impact of network width and depth on the robustness of adversarially trained DNNs. Specifically, we make the following key observations: 1) more parameters (higher model capacity) do not necessarily help adversarial robustness; 2) reducing capacity at the last stage (the last group of blocks) of the network can actually improve adversarial robustness; and 3) under the same parameter budget, there exists an optimal architectural configuration for adversarial robustness. We also provide a theoretical analysis explaining why such network configurations can help robustness. These architectural insights can help design adversarially robust DNNs. Code is available at \url{https://github.com/hanxunh/robustwrn}.
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., imperceptible perturbations of the input can mislead DNNs trained on clean images into making erroneous predictions. To tackle this, adversarial training is currently the most effective defense method, which augments the training set with adversarial samples generated on the fly. Interestingly, we discover for the first time that there exist subnetworks with inborn robustness, matching or surpassing the robust accuracy of adversarially trained networks, hidden within randomly initialized networks without any model training, indicating that adversarial training of model weights is not indispensable for adversarial robustness. We name such subnetworks Robust Scratch Tickets (RSTs), which are also by nature efficient. Distinct from the popular lottery ticket hypothesis, neither the original dense networks nor the identified RSTs need to be trained. To validate and understand this fascinating finding, we further conduct extensive experiments to study the existence and properties of RSTs under different models, datasets, sparsity patterns, and attacks, drawing insights regarding the relationship between DNNs' robustness and their initialization/overparameterization. Furthermore, we identify the poor adversarial transferability between RSTs of different sparsity ratios drawn from the same randomly initialized dense network, and propose a technique that randomly switches between different RSTs as a novel defense method built on top of RSTs for the first time. We believe our findings about RSTs open up a new perspective for studying model robustness and extending the lottery ticket hypothesis.
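A sketch of the random-switching defense idea: several sparsity masks drawn from the same random dense weights are kept, and one is sampled per forward pass so an attacker cannot target a single fixed subnetwork. The helper names are hypothetical and the mask-search procedure that finds the robust subnetworks is not shown.

```python
import random
import torch

@torch.no_grad()
def apply_mask(dense_model, masked_model, mask):
    """Copy the shared random weights into `masked_model` and zero out pruned entries."""
    masked_model.load_state_dict(dense_model.state_dict())
    for name, param in masked_model.named_parameters():
        if name in mask:
            param *= mask[name]

def random_switch_predict(dense_model, masked_model, masks, x):
    """Inference-time defense sketch: sample one sparsity mask per forward pass.

    `dense_model` holds the fixed random initialization, `masks` is a list of
    0/1 dicts (one per scratch ticket), and `masked_model` is a same-architecture
    buffer that receives the masked weights.
    """
    apply_mask(dense_model, masked_model, random.choice(masks))
    return masked_model(x)
```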
Adversarial examples are commonly viewed as a threat to ConvNets. Here we present an opposite perspective: adversarial examples can be used to improve image recognition models if harnessed in the right manner. We propose AdvProp, an enhanced adversarial training scheme which treats adversarial examples as additional examples, to prevent overfitting. Key to our method is the usage of a separate auxiliary batch norm for adversarial examples, as they have different underlying distributions to normal examples. We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger. For instance, by applying AdvProp to the latest EfficientNet-B7 [41] on ImageNet, we achieve significant improvements on ImageNet (+0.7%), ImageNet-C (+6.5%), ImageNet-A (+7.0%) and Stylized-ImageNet (+4.8%). With an enhanced EfficientNet-B8, our method achieves the state-of-the-art 85.5% ImageNet top-1 accuracy without extra data. This result even surpasses the best model in [24] which is trained with 3.5B Instagram images (∼3000× more than ImageNet) and ∼9.4× more parameters. Models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
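The auxiliary batch-norm idea can be sketched as below (a simplified stand-in, not the released AdvProp code): each normalization layer keeps separate statistics for clean and adversarial mini-batches, and the training loop toggles between them before each forward pass.

```python
import torch.nn as nn

class DualBatchNorm2d(nn.Module):
    """Batch norm with separate statistics for clean and adversarial batches."""
    def __init__(self, num_features):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(num_features)
        self.bn_adv = nn.BatchNorm2d(num_features)
        self.use_adv = False   # toggled by the training loop per mini-batch

    def forward(self, x):
        return self.bn_adv(x) if self.use_adv else self.bn_clean(x)

def set_adv_mode(model, use_adv):
    """Route all DualBatchNorm2d layers in `model` to the chosen statistics."""
    for module in model.modules():
        if isinstance(module, DualBatchNorm2d):
            module.use_adv = use_adv

# Sketch of a training step: forward clean examples with clean statistics,
# adversarial examples with auxiliary statistics, and sum the two losses.
# set_adv_mode(model, False); loss = criterion(model(x_clean), y)
# set_adv_mode(model, True);  loss = loss + criterion(model(x_adv), y)
```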
Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity. As research effort is focused on increasingly sophisticated pruning methods that lead to sparse subnetworks trainable from scratch, we argue for an orthogonal, under-explored theme: improving training techniques for pruned sub-networks, i.e. sparse training. Apart from the popular belief that only the quality of sparse masks matters for sparse training, in this paper we demonstrate an alternative opportunity: one can carefully customize the sparse training techniques to deviate from the default dense network training protocols, consisting of introducing "ghost" neurons and skip connections at the early stage of training, and strategically modifying the initialization as well as labels. Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks. By adopting our newly curated techniques, we demonstrate significant performance gains across various popular datasets (CIFAR-10, CIFAR-100, TinyImageNet), architectures (ResNet-18/32/104, Vgg16, MobileNet), and sparse mask options (lottery ticket, SNIP/GRASP, SynFlow, or even randomly pruning), compared to the default training protocols, especially at high sparsity levels. Code is at https://github.com/VITA-Group/ToST.
Recent self-supervised methods have achieved success in learning feature representations that are competitive with fully supervised ones, and have been shown to benefit models in several ways, for example by improving model robustness and out-of-distribution detection. In our paper, we conduct an empirical study to better understand how self-supervised learning, as a pre-training technique or as part of adversarial training, affects model robustness to $l_2$ and $l_{\infty}$ adversarial perturbations and natural image corruptions. Self-supervision can indeed improve model robustness, but it turns out the devil is in the details. If one simply adds a self-supervised loss in conjunction with adversarial training, one sees an improvement in model accuracy when evaluated with adversarial perturbations smaller than or comparable in value to $\epsilon_{train}$. However, if one looks at the accuracy for $\epsilon_{test} \ge \epsilon_{train}$, the model accuracy drops. In fact, the larger the weight of the self-supervised loss, the larger the performance drop, i.e., the more the model's robustness is harmed. We identify the primary ways in which self-supervision can be added to adversarial training, and observe that using a self-supervised loss both to optimize the network parameters and to find adversarial examples leads to the strongest improvement in model robustness, as this can be viewed as a form of ensemble adversarial training. Although self-supervised pre-training yields benefits for adversarial training compared to random weight initialization, we observe no improvement in either model robustness or accuracy if it is carried out within adversarial training itself.
We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which either occurs at initialization for small-scale settings (MNIST) or early in training for large-scale settings (ResNet-50 and Inception-v3 on ImageNet).
It is common practice in deep learning to use overparameterized networks and train for as long as possible; there are numerous studies that show, both theoretically and empirically, that such practices surprisingly do not unduly harm the generalization performance of the classifier. In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. We find that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models ($\ell_\infty$ and $\ell_2$). Based upon this observed effect, we show that the performance gains of virtually all recent algorithmic improvements upon adversarial training can be matched by simply using early stopping. We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting. Finally, we study several classical and modern deep learning remedies for overfitting, including regularization and data augmentation, and find that no approach in isolation improves significantly upon the gains achieved by early stopping. All code for reproducing the experiments as well as pretrained model weights and training logs can be found at https://github.com/locuslab/robust_overfitting.
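Since early stopping turns out to match most algorithmic improvements, a minimal sketch of checkpointing on robust validation accuracy is given below; `train_epoch` and `robust_eval` are assumed user-supplied routines that run one adversarial-training epoch and evaluate robust accuracy under the same threat model.

```python
import copy

def train_with_robust_early_stopping(model, train_epoch, robust_eval, epochs):
    """Keep the checkpoint with the best robust validation accuracy (a sketch)."""
    best_acc, best_state = -1.0, None
    for epoch in range(epochs):
        train_epoch(model)                 # one epoch of adversarial training
        acc = robust_eval(model)           # robust accuracy on a held-out set
        if acc > best_acc:
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)      # roll back to the best checkpoint
    return model, best_acc
```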