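Speaker recognition systems (SRSs) have recently been shown to be vulnerable to adversarial attacks, raising significant security concerns. In this work, we systematically investigate transformation- and adversarial-training-based defenses for securing SRSs. Guided by the characteristics of SRSs, we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks on speaker recognition (4 white-box and 3 black-box). Following best practices in defense evaluation, we analyze how well the transformations withstand adaptive attacks. We also evaluate and explain their effectiveness against adaptive attacks when they are combined with adversarial training. Our study provides many useful insights and findings, many of which are new or inconsistent with the conclusions drawn in the image and speech recognition domains. For example, variable and constant bit rate speech compression perform differently, and some non-differentiable transformations remain effective against current promising evasion techniques that often work well in the image domain. We demonstrate that the proposed novel feature-level transformation combined with adversarial training is considerably more effective than adversarial training alone in a complete white-box setting, e.g., increasing accuracy by 13.62% and attack cost by two orders of magnitude, while other transformations do not necessarily improve overall defense capability. This work further sheds light on research directions in this field. We also release our evaluation platform, SpeakerGuard, to foster further research.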
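As a concrete picture of what a waveform-level input transformation looks like, here is a minimal sketch of amplitude quantization: each 16-bit sample is rounded to the nearest multiple of a step `q`. This is one simple transformation of the general kind such defense studies evaluate, not the paper's proposed feature-level transformation; the function name and the default `q = 512` are illustrative assumptions.

```python
import numpy as np

def quantize_waveform(x: np.ndarray, q: int = 512) -> np.ndarray:
    """Round each 16-bit sample to the nearest multiple of q.

    Assumes x holds integer-valued samples in the int16 range [-32768, 32767].
    The default q = 512 is a hypothetical choice, not a recommended setting.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.round(x / q) * q
    # Rounding can step just outside the int16 range, so clip before casting back.
    return np.clip(y, -32768, 32767).astype(np.int16)
```

A perturbation on a sample is erased whenever it does not push that sample across a rounding boundary, which is why such transformations can blunt small adversarial noise; an adaptive attacker can still optimize against the quantized output directly.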
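Recent work has shown that speaker recognition systems (SRSs) are vulnerable to adversarial attacks, raising serious security concerns about deploying SRSs. However, these works considered only a few settings (e.g., some combinations of source and target speakers), leaving many interesting and important settings of real-world attack scenarios unexplored. In this work, we present AS2T, the first attack in this domain that covers all the settings, allowing the adversary to craft adversarial voices using arbitrary source and target speakers for any of the three main recognition tasks. Since none of the existing loss functions can be applied to all the settings, we explore many candidate loss functions for each setting, including existing and newly designed ones. We thoroughly evaluate their efficacy and find that some existing loss functions are suboptimal. Then, to improve the robustness of AS2T for practical over-the-air attacks, we study the distortions that may occur during over-the-air transmission, use different transformation functions with different parameters to model these distortions, and incorporate them into the generation of adversarial voices. Our simulated over-the-air evaluation validates the effectiveness of our solution in producing robust adversarial voices that remain effective across various hardware devices and acoustic environments with different reverberation, ambient noise, and noise levels. Finally, we use AS2T to perform the largest evaluation to date of transferability among 14 diverse SRSs. The transferability analysis provides many interesting and useful insights that challenge several findings and conclusions drawn in previous work in the image domain. Our study also sheds light on future directions for adversarial attacks in the speaker recognition domain.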
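The over-the-air step above amounts to sampling random distortions and applying them to the adversarial voice during generation. The sketch below shows one way to sample such a distortion: convolution with a synthetic, exponentially decaying room impulse response, followed by Gaussian noise at a random SNR. The distortion models, parameter ranges and function names are assumptions for illustration; AS2T's actual transformation functions may differ.

```python
import numpy as np

def synthetic_rir(rng, sr=16000, rt60=0.3, length_s=0.2):
    """Hypothetical room impulse response: exponentially decaying Gaussian noise."""
    n = int(sr * length_s)
    t = np.arange(n) / sr
    # Amplitude falls by 60 dB (a factor of 1e-3, ln(1e3) ~= 6.9078) after rt60 seconds.
    decay = np.exp(-6.9078 * t / rt60)
    h = rng.standard_normal(n) * decay
    h[0] = 1.0  # direct path
    return h / np.max(np.abs(h))

def add_noise_at_snr(x, snr_db, rng):
    """Add white Gaussian noise scaled so that the signal-to-noise ratio is snr_db."""
    noise = rng.standard_normal(x.shape)
    p_x = np.mean(x ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_x / (p_n * 10 ** (snr_db / 10)))
    return x + scale * noise

def random_over_the_air(x, rng, sr=16000):
    """Sample one simulated distortion: reverberation, then ambient noise at a random SNR."""
    rt60 = rng.uniform(0.2, 0.8)     # assumed reverberation-time range (seconds)
    snr_db = rng.uniform(10.0, 30.0)  # assumed noise-level range (dB)
    y = np.convolve(x, synthetic_rir(rng, sr, rt60), mode="full")[: len(x)]
    return add_noise_at_snr(y, snr_db, rng)
```

An attack in this style would average its loss over several calls to `random_over_the_air`, so that the perturbation survives the whole family of sampled distortions rather than a single one.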
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS, which illustrate a diverse set of defense strategies, can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
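Adversarial attacks against commercial black-box speech platforms, including cloud speech APIs and voice control devices, received little attention until recent years. Current "black-box" attacks all rely heavily on knowledge of prediction/confidence scores to craft effective adversarial examples (AEs), which service providers can intuitively defend against by not returning these scores. In this paper, we propose two novel adversarial attacks in more practical and rigorous scenarios. For commercial cloud speech APIs, we propose Occam, a decision-only black-box adversarial attack in which only final decisions are available to the adversary. In Occam, we formulate decision-only AE generation as a discontinuous, large-scale global optimization problem and solve it by adaptively decomposing this complicated problem into a set of sub-problems and cooperatively optimizing each one. Occam is a one-size-fits-all approach that achieves 100% attack success rates on a wide range of popular speech and speaker recognition APIs, including Google, Alibaba, Microsoft, Tencent, iFlytek, and Jingdong, outperforming state-of-the-art black-box attacks. For commercial voice control devices, we propose NI-Occam, the first non-interactive physical adversarial attack, in which the adversary does not need to query the oracle and has no access to its internal information or training data. We combine adversarial attacks with model inversion attacks, and thus generate physically effective audio AEs with high transferability without any interaction with the target devices. Our experimental results show that NI-Occam can successfully fool Apple Siri, Microsoft Cortana, Google Assistant, iFlytek, and Amazon Echo with an average attack success rate of 52% and an SNR of 9.65 dB, shedding light on non-interactive physical attacks against voice control devices.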
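To make the decomposition idea concrete, the toy below treats the target as a hard-label oracle (`is_adversarial` returns only True or False) and shrinks an already-adversarial input toward the original by repeatedly splitting the coordinates into random groups and optimizing one group at a time. It is a simplified illustration with assumed names, step sizes and 1-D inputs, not Occam's actual cooperative optimizer.

```python
import numpy as np

def decision_only_shrink(is_adversarial, x, x_adv, n_groups=4, iters=200, rng=None):
    """Reduce ||x_adv - x|| using only hard-label (decision) feedback.

    Each iteration randomly partitions the coordinates into n_groups sub-problems and,
    for each group in turn, moves only those coordinates toward x. A move is kept only
    if the oracle still reports the input as adversarial, so the distance never grows.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_adv = x_adv.copy()
    assert is_adversarial(x_adv), "the starting point must already be adversarial"
    for _ in range(iters):
        groups = np.array_split(rng.permutation(x.size), n_groups)
        for idx in groups:
            step = rng.uniform(0.05, 0.5)  # hypothetical step-size range
            cand = x_adv.copy()
            cand[idx] += step * (x[idx] - cand[idx])
            if is_adversarial(cand):
                x_adv = cand
    return x_adv
```

Because each sub-problem touches only a fraction of the coordinates, a single rejected move does not waste progress on the others, which is the intuition behind decomposing a large decision-only search.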
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
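A minimal sketch of the detection rule, assuming grayscale images in [0, 1], a `predict` function that returns a probability vector, and a threshold chosen on benign data: the input is squeezed by bit-depth reduction and by a 2x2 median filter, and the largest L1 distance between the original and squeezed predictions is compared with the threshold. The squeezer parameters and function names are illustrative assumptions.

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Squeeze pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, k=2):
    """k x k median filter on a 2-D image, with edge padding (simple reference version)."""
    pad_lo, pad_hi = (k - 1) // 2, k // 2
    xp = np.pad(x, ((pad_lo, pad_hi), (pad_lo, pad_hi)), mode="edge")
    h, w = x.shape
    windows = np.stack([xp[i:i + h, j:j + w] for i in range(k) for j in range(k)])
    return np.median(windows, axis=0)

def squeeze_score(predict, x, bits=1, k=2):
    """Largest L1 distance between predictions on the original and squeezed inputs."""
    p = predict(x)
    squeezed = [reduce_bit_depth(x, bits), median_smooth(x, k)]
    return max(np.abs(p - predict(s)).sum() for s in squeezed)

def is_adversarial(predict, x, threshold):
    # threshold is picked on benign inputs, e.g. to cap the false-positive rate.
    return squeeze_score(predict, x) > threshold
```

Benign inputs tend to give similar predictions before and after squeezing, so a large disagreement is treated as evidence of an adversarial perturbation.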
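Advances in deep learning have greatly improved the performance of automatic speech recognition (ASR), which now matches human listeners on many tasks. Voice interfaces are increasingly used as the input to applications and smart devices. However, existing research shows that DNNs are easily misled by slight perturbations into producing incorrect recognitions, which is very dangerous for smart speech applications controlled by voice.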
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.
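For shattered gradients, which arise from non-differentiable preprocessing, the remedy the paper describes is BPDA (Backward Pass Differentiable Approximation): run the real transformation on the forward pass and substitute a differentiable approximation of it, here the identity, on the backward pass. The sketch applies that idea to a hypothetical quantization step in front of a toy two-layer model; the model, quantizer and step sizes are assumptions.

```python
import numpy as np

def quantize(x, levels=4):
    """Non-differentiable preprocessing: its true gradient is zero almost everywhere."""
    return np.round(x * levels) / levels

def score(z, W, v):
    """Toy two-layer model; the attack tries to push this score down."""
    return v @ np.tanh(W @ z)

def grad_score(z, W, v):
    """Analytic gradient of score() with respect to its input z."""
    return W.T @ (v * (1.0 - np.tanh(W @ z) ** 2))

def bpda_attack(x, W, v, eps=0.3, alpha=0.05, steps=40):
    x_adv = x.copy()
    for _ in range(steps):
        z = quantize(x_adv)        # forward pass uses the real preprocessing
        g = grad_score(z, W, v)    # gradient w.r.t. the preprocessed input
        # BPDA: approximate d quantize / dx by the identity, so d score / dx ~= g.
        x_adv = x_adv - alpha * np.sign(g)
        x_adv = np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)
    return x_adv
```

Differentiating straight through `quantize` would give a zero gradient and a stalled attack; the identity approximation keeps the gradient informative while the forward pass still sees exactly what the defended model sees.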
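With advances in hardware and algorithms, ASR (automatic speech recognition) systems have evolved considerably. As models become simpler and development and deployment become easier, ASR systems are becoming ever more present in our lives. On the one hand, we often use ASR applications or APIs to generate subtitles and transcribe meetings. On the other hand, smart speakers and self-driving cars rely on ASR systems to control AIoT devices. In the past few years, there have been many works on adversarial example attacks against ASR systems: by adding small perturbations to the waveform, the recognition result can be changed drastically. In this paper, we describe the development of ASR systems, the different assumptions of attacks, and how to evaluate these attacks. Next, we introduce current work on adversarial example attacks under two attack assumptions: white-box attacks and black-box attacks. Unlike other surveys, we focus more on the layer of the ASR system at which each attack perturbs the waveform, the relationships among these attacks, and their implementation methods, with an emphasis on the effectiveness of these works.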
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate adversarial risk as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as obscurity to an adversary, and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
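One well-known gradient-free technique that can be repurposed this way is SPSA (Simultaneous Perturbation Stochastic Approximation), which estimates a gradient from loss values alone and therefore does not depend on the model exposing useful gradients. A minimal sketch, assuming a scalar `loss` function and Rademacher (±1) perturbation directions; `delta` and `n_samples` are illustrative.

```python
import numpy as np

def spsa_gradient(loss, x, delta=0.01, n_samples=32, rng=None):
    """Estimate grad loss(x) from loss evaluations only (no backpropagation).

    Each sample draws a random +/-1 direction v and uses a central difference along v.
    Since every entry of v is +/-1, dividing by v is the same as multiplying by v.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=x.shape)
        diff = loss(x + delta * v) - loss(x - delta * v)
        g += diff / (2.0 * delta) * v
    return g / n_samples
```

The estimate can replace the true gradient in a sign-gradient (PGD-style) update, which is how a gradient-free optimizer becomes an adversarial attack.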
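The lack of robustness of neural networks against attacks raises concerns in security-sensitive settings such as autonomous vehicles. While many countermeasures may look promising, only a few withstand rigorous evaluation. Defenses using random transformations (RT) have shown impressive results, particularly BaRT (Raff et al., 2019) on ImageNet. However, this type of defense has not been rigorously evaluated, leaving its robustness properties poorly understood. Its stochastic nature makes evaluation more challenging and renders many attacks proposed for deterministic models inapplicable. First, we show that the BPDA attack (Athalye et al., 2018a) used in BaRT's evaluation is ineffective and likely overestimates its robustness. We then attempt to construct the strongest possible RT defense through an informed selection of transformations and Bayesian optimization for tuning their parameters. Furthermore, we create the strongest possible attack to evaluate our RT defense. Our new attack vastly outperforms the baseline, reducing accuracy by 83% compared to the 19% reduction achieved by the commonly used EOT attack (a $4.3\times$ improvement). Our results indicate that the RT defense on the Imagenette dataset (a ten-class subset of ImageNet) is not robust against adversarial examples. Extending the study further, we use our new attack to adversarially train the RT defense (called AdvRT), resulting in a large robustness gain. Code is available at https://github.com/wagner-group/demystify-random-transform.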
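For reference, the EOT baseline mentioned above estimates the gradient of the expected loss by averaging gradients over sampled transformations. The sketch assumes a hypothetical transformation family t(x) = s·x + n (random scale s, Gaussian noise n) and a toy two-layer score function; it shows only the baseline, not the paper's stronger attack.

```python
import numpy as np

def eot_gradient(x, W, v, rng, n_samples=64, noise_std=0.05):
    """Monte Carlo estimate of grad_x E_t[f(t(x))] for f(z) = v @ tanh(W z)."""
    g = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        s = rng.uniform(0.8, 1.2)                          # random scale (assumed range)
        z = s * x + rng.normal(0.0, noise_std, x.shape)    # transformed input t(x)
        grad_z = W.T @ (v * (1.0 - np.tanh(W @ z) ** 2))
        g += s * grad_z                                    # chain rule: dz/dx = s * I
    return g / n_samples

def eot_pgd(x, W, v, rng, eps=0.1, alpha=0.02, steps=20):
    """L-inf PGD that lowers the expected score under the random transformation."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv - alpha * np.sign(eot_gradient(x_adv, W, v, rng))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

Averaging over many sampled transformations smooths out the randomness that would otherwise make single-sample gradients noisy, which is why EOT is the usual starting point for attacking stochastic defenses.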
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models.
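In this robust-optimization view, training solves $\min_\theta \mathbb{E}_{(x,y)}\big[\max_{\|\delta\|_\infty \le \epsilon} L(\theta, x+\delta, y)\big]$, and the inner maximization is approximated with projected gradient descent (PGD), the first-order adversary referred to above. A minimal sketch of the inner step for a binary logistic-regression model with labels in {-1, +1}; the model, step size and step count are illustrative assumptions.

```python
import numpy as np

def logistic_loss_grad_x(x, y, w, b):
    """Gradient w.r.t. x of log(1 + exp(-y * (w @ x + b))) for y in {-1, +1}."""
    m = y * (w @ x + b)
    return -y * w / (1.0 + np.exp(m))

def pgd_linf(x, y, w, b, eps=0.1, alpha=0.025, steps=10, rng=None):
    """Projected gradient ascent on the loss inside the L-inf ball of radius eps."""
    rng = np.random.default_rng() if rng is None else rng
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # random start inside the ball
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(logistic_loss_grad_x(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project back onto the ball
    return x_adv
```

Adversarial training then replaces each training example `(x, y)` with `pgd_linf(x, y, w, b)` before each update of `w` and `b`, so the outer minimization is taken over worst-case rather than clean inputs.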
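While automatic speech recognition (ASR) has been shown to be vulnerable to adversarial attacks, defenses against these attacks still lag behind. Existing, naive defenses can be partially broken with an adaptive attack. In classification tasks, the randomized smoothing paradigm has been shown to be effective at defending models. However, this paradigm is difficult to apply to ASR tasks because of their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools such as speech enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise and can only be broken with very high distortion.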
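A minimal sketch of the smoothing-plus-voting idea, assuming a black-box `transcribe` function that maps a waveform to a string: Gaussian noise is added to several copies of the input, each copy is transcribed, and the transcriptions are combined by a word-by-word majority vote. This is a deliberate simplification. It omits the speech-enhancement step, and real ROVER first aligns hypotheses into a word transition network so that it can handle insertions and deletions, whereas this sketch only votes among hypotheses of the most common length.

```python
from collections import Counter

import numpy as np

def smoothed_transcribe(transcribe, audio, sigma=0.01, n=8, rng=None):
    """Transcribe n noisy copies of audio and combine them by position-wise word voting."""
    rng = np.random.default_rng() if rng is None else rng
    hyps = [transcribe(audio + rng.normal(0.0, sigma, audio.shape)).split() for _ in range(n)]
    # Simplification (assumption): keep only hypotheses with the most common word count.
    length = Counter(len(h) for h in hyps).most_common(1)[0][0]
    same = [h for h in hyps if len(h) == length]
    voted = [Counter(h[i] for h in same).most_common(1)[0][0] for i in range(length)]
    return " ".join(voted)
```

The noise level `sigma` trades clean accuracy against robustness: larger noise drowns out small adversarial perturbations but also degrades the individual transcriptions being voted on.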
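Evasion attacks against machine learning models often succeed by iteratively probing a fixed target model, so an attack that succeeds once will succeed repeatedly. One promising approach to counter this threat is to make the model a moving target against adversarial inputs. To this end, we introduce Morphence-2.0, a scalable moving target defense (MTD) powered by out-of-distribution (OOD) detection to defend against adversarial examples. By regularly moving the decision function of a model, Morphence-2.0 makes it significantly harder for repeated or correlated attacks to succeed. Morphence-2.0 deploys a pool of models generated from a base model in a manner that introduces sufficient randomness when it responds to prediction queries. Through OOD detection, Morphence-2.0 is equipped with a scheduling approach that assigns adversarial examples to robust decision functions and benign samples to an undefended, accurate model. To ensure that repeated or correlated attacks fail, the deployed pool of models automatically expires after a query budget is reached, and the model pool is seamlessly replaced by a new model pool generated in advance. We evaluate Morphence-2.0 on two benchmark image classification datasets (MNIST and CIFAR10) against 4 reference attacks (3 white-box and 1 black-box). Morphence-2.0 consistently outperforms prior defenses while preserving accuracy on clean data and reducing attack transferability. We also show that, when powered by OOD detection, Morphence-2.0 can precisely perform input-based movement of the model's decision function, leading to higher prediction accuracy on both adversarial and benign queries.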
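The scheduling and expiry logic described above can be sketched as a small class. Everything here is hypothetical: the `ood_score` callable, the threshold, the uniform choice of robust model and the `pool_factory` stand in for Morphence-2.0's actual OOD detector and model-generation procedure.

```python
import numpy as np

class MovingTargetPool:
    """Toy moving-target scheduler (hypothetical names and policy).

    Queries that look in-distribution go to an accurate, undefended base model; queries
    flagged as out-of-distribution go to a randomly chosen robust model from the pool.
    After query_budget queries, the pool expires and the pre-generated pool takes over.
    """

    def __init__(self, base_model, pool_factory, ood_score, threshold, query_budget, rng=None):
        self.base_model = base_model
        self.pool_factory = pool_factory    # callable returning a fresh list of robust models
        self.ood_score = ood_score          # higher score = more likely adversarial/OOD
        self.threshold = threshold
        self.query_budget = query_budget
        self.rng = np.random.default_rng() if rng is None else rng
        self.pool = pool_factory()
        self.next_pool = pool_factory()     # generated in advance
        self.queries = 0
        self.generation = 0

    def predict(self, x):
        if self.queries >= self.query_budget:
            # Budget reached: swap in the pre-generated pool and prepare the next one.
            self.pool, self.next_pool = self.next_pool, self.pool_factory()
            self.queries = 0
            self.generation += 1
        self.queries += 1
        if self.ood_score(x) > self.threshold:
            model = self.pool[self.rng.integers(len(self.pool))]
            return model(x)
        return self.base_model(x)
```

Generating the next pool ahead of time keeps the swap cheap at query time, so the expiry does not stall serving.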
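Because real-world images come in different sizes, machine learning models are deployed as part of larger systems that include an upstream image-scaling algorithm. In this paper, we investigate the interplay between vulnerabilities of the image-scaling procedure and of machine learning models in the decision-based black-box setting. We propose a novel sampling strategy that lets a black-box attack exploit vulnerabilities in the scaling algorithm, the scaling defense, and the final machine learning model in an end-to-end manner. Based on this scaling-aware attack, we reveal that most existing scaling defenses are ineffective under threat from downstream models. Moreover, we empirically observe that standard black-box attacks can significantly improve their performance by exploiting the vulnerable scaling procedure. We further demonstrate this problem on a commercial image analysis API with decision-based black-box attacks.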
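The scaling vulnerability such attacks exploit can be seen with nearest-neighbour downscaling, which reads only one pixel per block: an attacker who controls just those pixels controls the whole downscaled image. The sketch uses a simplified sampling rule (the top-left pixel of each block) as an assumption; real libraries choose their own offsets, and this illustrates the underlying weakness rather than the paper's sampling strategy.

```python
import numpy as np

def nearest_downscale(img, factor):
    """Nearest-neighbour downscaling that keeps one pixel per factor x factor block.

    Simplified sampling rule (assumption): take the top-left pixel of each block.
    """
    return img[::factor, ::factor]

def embed_target(src, target, factor):
    """Overwrite only the pixels the downscaler reads, so that it outputs target."""
    out = src.copy()
    out[::factor, ::factor] = target
    return out
```

With a downscaling factor of 4, only 1 in 16 source pixels is ever read, so the modified image can look almost identical to the source at full resolution while its downscaled version is entirely attacker-chosen.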
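Evaluating the robustness of defense models is a challenging task in adversarial robustness research. Obfuscated gradients, a type of gradient masking, have previously been found in many defense methods and give a false signal of robustness. In this paper, we identify a more subtle situation, called imbalanced gradients, that can also lead to overestimated adversarial robustness. Imbalanced gradients occur when the gradient of one term of the margin loss dominates and pushes the attack toward a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes the margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose multi-targeted and ensemble versions of our MD attack. Investigating 17 defense models proposed since 2018, we find that 6 of them are susceptible to imbalanced gradients, and our MD attack can reduce their robustness, as evaluated by the best standalone baseline attack, by a further 2%. We also provide an in-depth analysis of the likely causes of imbalanced gradients and of effective countermeasures.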
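The two-stage idea can be illustrated on a toy linear model with logits z = W x. For the margin loss $z_y - \max_{i \ne y} z_i$, stage 1 follows the gradient of only one term and stage 2 minimizes the full margin. The linear model, the `which` switch and the step schedule are assumptions for illustration, not the MD attack's exact procedure.

```python
import numpy as np

def margin_terms(logits, y):
    """Split the margin z_y - max_{i != y} z_i into its two terms (and the runner-up index)."""
    others = np.delete(np.arange(len(logits)), y)
    j = others[np.argmax(logits[others])]
    return logits[y], logits[j], j

def md_attack_linear(x, y, W, eps=0.3, alpha=0.05, stage1_steps=10, stage2_steps=10, which=0):
    """Two-stage L-inf attack on logits z = W @ x.

    Stage 1 minimizes a single term (which=0: lower z_y; which=1: raise the top other
    logit z_j). Stage 2 minimizes the full margin z_y - z_j.
    """
    x_adv = x.copy()
    for step in range(stage1_steps + stage2_steps):
        _, _, j = margin_terms(W @ x_adv, y)
        if step < stage1_steps:
            grad = W[y] if which == 0 else -W[j]   # gradient of the single term being minimized
        else:
            grad = W[y] - W[j]                     # gradient of the full margin
        x_adv = np.clip(x_adv - alpha * np.sign(grad), x - eps, x + eps)
    return x_adv
```

When one term's gradient dominates, the full-margin gradient mostly follows that term; running stage 1 on the other term first explores a direction that a full-margin attack would tend to neglect.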
Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with stronger robustness to black-box attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks (Kurakin et al., 2017c). However, subsequent work found that more elaborate black-box attacks could significantly enhance transferability and reduce the accuracy of our models.
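The single-step attack with a random step described above can be sketched as follows: a random sign step of size `alpha`, then a gradient-sign (FGSM) step of size `eps - alpha` taken from the new point, so the total L-inf perturbation stays within `eps`. `grad_fn`, the [0, 1] input range and the default step sizes are illustrative assumptions.

```python
import numpy as np

def rand_fgsm(x, grad_fn, eps=16 / 255, alpha=8 / 255, rng=None):
    """Random sign step of size alpha, then an FGSM step of size eps - alpha.

    grad_fn(x) must return the gradient of the loss with respect to the input.
    """
    rng = np.random.default_rng() if rng is None else rng
    # The random step moves away from the data point, where curvature artifacts
    # can make the local linear approximation misleading.
    x_prime = x + alpha * np.sign(rng.standard_normal(x.shape))
    x_adv = x_prime + (eps - alpha) * np.sign(grad_fn(x_prime))
    return np.clip(x_adv, 0.0, 1.0)
```

Ensemble Adversarial Training would additionally use perturbations transferred from other, pre-trained models, so the model being trained cannot simply learn to weaken its own single-step attacks.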
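Deep convolutional neural networks (CNNs) are easily fooled by subtle, imperceptible changes to input images. To address this vulnerability, adversarial training creates perturbation patterns and includes them in the training set to robustify the model. In contrast to existing adversarial training methods that use only class-limited information (e.g., through a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries, which are in turn used to learn a robust model. Specifically, we use the style and content information of a target sample from another class, alongside its class-boundary information, to create adversarial perturbations. We apply our proposed multi-task objective in a deeply supervised manner, extracting multi-scale feature knowledge to create maximally separated adversaries. We then propose a max-margin adversarial training approach that minimizes the distance between the source image and its adversary and maximizes the distance between the adversary and the target image. Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses, generalizes well to naturally occurring corruptions and data distribution shifts, and retains the model's accuracy on clean examples.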
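The style, content and max-margin ingredients above map onto familiar building blocks. The sketch gives one plausible form of each, assuming features are stored as a `(channels, positions)` array: a Gram-matrix style distance, a mean-squared content distance, and a triplet-style hinge that pulls the adversary toward its source and pushes it away from the target. These exact formulas and the `margin` value are assumptions, not the paper's objective.

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, positions) feature map, normalized by positions."""
    return features @ features.T / features.shape[1]

def style_distance(fa, fb):
    """Squared Frobenius distance between Gram matrices (style mismatch)."""
    return np.sum((gram(fa) - gram(fb)) ** 2)

def content_distance(fa, fb):
    """Mean squared difference between feature maps (content mismatch)."""
    return np.mean((fa - fb) ** 2)

def max_margin_loss(f_src, f_adv, f_tgt, margin=1.0):
    """Hinge that keeps the adversary's features closer to the source than to the
    target image by at least `margin` (hypothetical form of the max-margin objective)."""
    d_pos = np.sum((f_src - f_adv) ** 2)
    d_neg = np.sum((f_adv - f_tgt) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

In a deeply supervised setup, terms like these would be summed over feature maps taken from several layers, which is where the multi-scale feature knowledge enters.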
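Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples, i.e., examples carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods that generate adversarial examples and the construction of defense techniques that guard against them. This paper aims to introduce this topic and its latest developments to the statistical community, focusing primarily on the generation of and defense against adversarial examples. The computing code (in Python and R) used in the numerical experiments is publicly available so that readers can explore the surveyed methods. The authors hope that this paper will encourage more statisticians to work in this important and exciting field of generating and defending against adversarial examples.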
This paper investigates strategies that defend against adversarial-example attacks on image-classification systems by transforming the inputs before feeding them to the system. Specifically, we study applying image transformations such as bit-depth reduction, JPEG compression, total variance minimization, and image quilting before feeding the image to a convolutional network classifier. Our experiments on ImageNet show that total variance minimization and image quilting are very effective defenses in practice, in particular, when the network is trained on transformed images. The strength of those defenses lies in their non-differentiable nature and their inherent randomness, which makes it difficult for an adversary to circumvent the defenses. Our best defense eliminates 60% of strong gray-box and 90% of strong black-box attacks by a variety of major attack methods.
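A simplified sketch of total variance minimization, assuming a grayscale image in [0, 1]: pixels are dropped at random, and an image is fitted that matches the kept pixels while having low (smoothed, anisotropic) total variation, using plain gradient descent. The keep probability, regularization weight, step size and solver are illustrative assumptions and may differ from the paper's setup.

```python
import numpy as np

def smoothed_tv(z, eps=1e-2):
    """Anisotropic total variation, smoothed with eps so that it is differentiable."""
    dx = np.diff(z, axis=1)
    dy = np.diff(z, axis=0)
    return np.sum(np.sqrt(dx ** 2 + eps)) + np.sum(np.sqrt(dy ** 2 + eps))

def smoothed_tv_grad(z, eps=1e-2):
    """Gradient of smoothed_tv with respect to every pixel of z."""
    g = np.zeros_like(z)
    dx = np.diff(z, axis=1)   # dx[i, j] = z[i, j + 1] - z[i, j]
    dy = np.diff(z, axis=0)   # dy[i, j] = z[i + 1, j] - z[i, j]
    gx = dx / np.sqrt(dx ** 2 + eps)
    gy = dy / np.sqrt(dy ** 2 + eps)
    g[:, 1:] += gx
    g[:, :-1] -= gx
    g[1:, :] += gy
    g[:-1, :] -= gy
    return g

def tv_reconstruct(x, keep_prob=0.5, lam=0.1, lr=0.01, steps=300, rng=None):
    """Drop pixels at random, then minimize sum(mask * (z - x)**2) + lam * TV(z)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < keep_prob
    z = x.astype(float).copy()
    for _ in range(steps):
        grad = 2.0 * mask * (z - x) + lam * smoothed_tv_grad(z)
        z -= lr * grad
    return np.clip(z, 0.0, 1.0)
```

Because the mask is random, repeated calls on the same input give different outputs, and the reconstruction has no useful gradient path back to the input unless the attacker approximates it.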