Our work targets at searching feasible adversarial perturbation to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting. This is intrinsically an NP-hard knapsack problem where the exploration space becomes explosively larger as the feature dimension increases. Without the help of domain knowledge, solving this problem via heuristic method, such as Branch-and-Bound, suffers from exponential complexity, yet can bring arbitrarily bad attack results. We address the challenge via the lens of multi-armed bandit based combinatorial search. Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in multi-armed bandit programming. Our objective is to achieve highly efficient and effective attack using an Orthogonal Matching Pursuit (OMP)-enhanced Upper Confidence Bound (UCB) exploration strategy. Our theoretical analysis bounding the regret gap of FEAT guarantees its practical attack performance. In empirical analysis, we compare FEAT with other state-of-the-art domain-agnostic attack methods over various real-world categorical data sets of different applications. Substantial experimental observations confirm the expected efficiency and attack effectiveness of FEAT applied in different application scenarios. Our work further hints the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
translated by 谷歌翻译
许多机器学习问题在表格域中使用数据。对抗性示例可能对这些应用尤其有害。然而,现有关于对抗鲁棒性的作品主要集中在图像和文本域中的机器学习模型。我们认为,由于表格数据和图像或文本之间的差异,现有的威胁模型不适合表格域。这些模型没有捕获该成本比不可识别更重要,也不能使对手可以将不同的价值归因于通过部署不同的对手示例获得的效用。我们表明,由于这些差异,用于图像的攻击和防御方法和文本无法直接应用于表格设置。我们通过提出新的成本和公用事业感知的威胁模型来解决这些问题,该模型量身定制了针对表格域的攻击者的攻击者的约束。我们介绍了一个框架,使我们能够设计攻击和防御机制,从而导致模型免受成本或公用事业意识的对手的影响,例如,受到一定美元预算约束的对手。我们表明,我们的方法在与对应于对抗性示例具有经济和社会影响的应用相对应的三个表格数据集中有效。
translated by 谷歌翻译
在过去几年中,已经提出了各种文字攻击方法来揭示自然语言处理中使用的深度神经网络的脆弱性。通常,这些方法涉及一个重要的优化步骤,以确定原始输入中的每个单词使用的替代。然而,从对问题理解和解决问题的角度来看,对这一步骤的目前的研究仍然是有限的。在本文中,我们通过揭示问题的理论属性并提出有效的本地搜索算法(LS)来解决这些问题来解决这些问题。我们建立了一个关于解决问题的第一个可提供的近似保证。涉及5个NLP任务,8个数据集和26个NLP模型的扩展实验表明,LS可能大大降低了Qualies数量,以实现高攻击成功率。进一步的实验表明,LS制造的对抗例通常具有更高的质量,表现出更好的可转移性,并且可以通过对抗培训为受害者模型带来更高的鲁棒性改善。
translated by 谷歌翻译
过去几年的对抗性文本攻击领域已经大大增长,其中常见的目标是加工可以成功欺骗目标模型的对抗性示例。然而,攻击的难以察觉,也是基本目标,通常被以前的研究遗漏。在这项工作中,我们倡导同时考虑两个目标,并提出一种新的多优化方法(被称为水合物转速),具有可提供的绩效保证,以实现高稳定性的成功攻击。我们通过基于分数和决策的设置,展示了HydroText通过广泛实验的效果,涉及五个基于基准数据集的现代NLP模型。与现有的最先进的攻击相比,Hydratext同时实现了更高的成功率,更低的修改率和与原始文本更高的语义相似性。人类评估研究表明,由水分精制成的对抗例保持良好的有效性和自然。最后,这些例子也表现出良好的可转移性,并且可以通过对抗性培训为目标模型带来显着的稳健性。
translated by 谷歌翻译
Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks. However, existing works measure the explanation robustness based on $\ell_p$-norm, which can be counter-intuitive to humans, who only pay attention to the top few salient features. We propose explanation ranking thickness as a more suitable explanation robustness metric. We then present a new practical adversarial attacking goal for manipulating explanation rankings. To mitigate the ranking-based attacks while maintaining computational feasibility, we derive surrogate bounds of the thickness that involve expensive sampling and integration. We use a multi-objective approach to analyze the convergence of a gradient-based attack to confirm that the explanation robustness can be measured by the thickness metric. We conduct experiments on various network architectures and diverse datasets to prove the superiority of the proposed methods, while the widely accepted Hessian-based curvature smoothing approaches are not as robust as our method.
translated by 谷歌翻译
我们研究对线性随机匪徒的对抗攻击:通过操纵奖励,对手旨在控制匪徒的行为。也许令人惊讶的是,我们首先表明某些攻击目标永远无法实现。这与无上下文的随机匪徒形成了鲜明的对比,并且本质上是由于线性随机陆上的臂之间的相关性。在这一发现的激励下,本文研究了$ k $武装的线性匪徒环境的攻击性。我们首先根据武器上下文向量的几何形状提供了攻击性的完全必要性和充分性表征。然后,我们提出了针对Linucb和鲁棒相消除的两阶段攻击方法。该方法首先断言给定环境是否可攻击;而且,如果是的话,它会付出巨大的奖励,以强迫算法仅使用sublinear成本来拉动目标臂线性时间。数值实验进一步验证了拟议攻击方法的有效性和成本效益。
translated by 谷歌翻译
随着深度神经网络(DNNS)的进步在许多关键应用中表现出前所未有的性能水平,它们的攻击脆弱性仍然是一个悬而未决的问题。我们考虑在测试时间进行逃避攻击,以防止在受约束的环境中进行深入学习,其中需要满足特征之间的依赖性。这些情况可能自然出现在表格数据中,也可能是特定应用程序域中功能工程的结果,例如网络安全中的威胁检测。我们提出了一个普通的基于迭代梯度的框架,称为围栏,用于制定逃避攻击,考虑到约束域和应用要求的细节。我们将其应用于针对两个网络安全应用培训的前馈神经网络:网络流量僵尸网络分类和恶意域分类,以生成可行的对抗性示例。我们广泛评估了攻击的成功率和绩效,比较它们对几个基线的改进,并分析影响攻击成功率的因素,包括优化目标和数据失衡。我们表明,通过最少的努力(例如,生成12个其他网络连接),攻击者可以将模型的预测从恶意类更改为良性并逃避分类器。我们表明,在具有更高失衡的数据集上训练的模型更容易受到我们的围栏攻击。最后,我们证明了在受限领域进行对抗训练的潜力,以提高针对这些逃避攻击的模型弹性。
translated by 谷歌翻译
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%.In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
translated by 谷歌翻译
Adversarial robustness assessment for video recognition models has raised concerns owing to their wide applications on safety-critical tasks. Compared with images, videos have much high dimension, which brings huge computational costs when generating adversarial videos. This is especially serious for the query-based black-box attacks where gradient estimation for the threat models is usually utilized, and high dimensions will lead to a large number of queries. To mitigate this issue, we propose to simultaneously eliminate the temporal and spatial redundancy within the video to achieve an effective and efficient gradient estimation on the reduced searching space, and thus query number could decrease. To implement this idea, we design the novel Adversarial spatial-temporal Focus (AstFocus) attack on videos, which performs attacks on the simultaneously focused key frames and key regions from the inter-frames and intra-frames in the video. AstFocus attack is based on the cooperative Multi-Agent Reinforcement Learning (MARL) framework. One agent is responsible for selecting key frames, and another agent is responsible for selecting key regions. These two agents are jointly trained by the common rewards received from the black-box threat models to perform a cooperative prediction. By continuously querying, the reduced searching space composed of key frames and key regions is becoming precise, and the whole query number becomes less than that on the original video. Extensive experiments on four mainstream video recognition models and three widely used action recognition datasets demonstrate that the proposed AstFocus attack outperforms the SOTA methods, which is prevenient in fooling rate, query number, time, and perturbation magnitude at the same.
translated by 谷歌翻译
我们专注于在黑框设置中对模型的对抗性攻击的问题,攻击者旨在制作对受害者模型的查询访问有限的对抗性示例。现有的黑框攻击主要基于贪婪的算法,使用预先计算的关键位置来扰动,从而严重限制了搜索空间,并可能导致次优的解决方案。为此,我们提出了使用贝叶斯优化的查询有效的黑盒攻击,该贝叶斯优化使用自动相关性确定(ARD)分类内核动态计算重要位置。我们引入了块分解和历史次采样技术,以提高输入序列长时间时贝叶斯优化的可伸缩性。此外,我们开发了一种优化后算法,该算法找到了具有较小扰动大小的对抗示例。关于自然语言和蛋白质分类任务的实验表明,与先前的最新方法相比,我们的方法始终达到更高的攻击成功率,查询计数和修改率的显着降低。
translated by 谷歌翻译
机器学习模型严重易于来自对抗性示例的逃避攻击。通常,对逆势示例的修改输入类似于原始输入的修改输入,在WhiteBox设置下由对手的WhiteBox设置构成,完全访问模型。然而,最近的攻击已经显示出使用BlackBox攻击的对逆势示例的查询号显着减少。特别是,警报是从越来越多的机器学习提供的经过培训的模型的访问界面中利用分类决定作为包括Google,Microsoft,IBM的服务提供商,并由包含这些模型的多种应用程序使用的服务提供商来利用培训的模型。对手仅利用来自模型的预测标签的能力被区别为基于决策的攻击。在我们的研究中,我们首先深入潜入最近的ICLR和SP的最先进的决策攻击,以突出发现低失真对抗采用梯度估计方法的昂贵性质。我们开发了一种强大的查询高效攻击,能够避免在梯度估计方法中看到的嘈杂渐变中的局部最小和误导中的截留。我们提出的攻击方法,ramboattack利用随机块坐标下降的概念来探索隐藏的分类器歧管,针对扰动来操纵局部输入功能以解决梯度估计方法的问题。重要的是,ramboattack对对对手和目标类别可用的不同样本输入更加强大。总的来说,对于给定的目标类,ramboattack被证明在实现给定查询预算的较低失真时更加强大。我们使用大规模的高分辨率ImageNet数据集来策划我们的广泛结果,并在GitHub上开源我们的攻击,测试样本和伪影。
translated by 谷歌翻译
Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
translated by 谷歌翻译
时间序列数据在许多现实世界中(例如,移动健康)和深神经网络(DNNS)中产生,在解决它们方面已取得了巨大的成功。尽管他们成功了,但对他们对对抗性攻击的稳健性知之甚少。在本文中,我们提出了一个通过统计特征(TSA-STAT)}称为时间序列攻击的新型对抗框架}。为了解决时间序列域的独特挑战,TSA-STAT对时间序列数据的统计特征采取限制来构建对抗性示例。优化的多项式转换用于创建比基于加性扰动的攻击(就成功欺骗DNN而言)更有效的攻击。我们还提供有关构建对抗性示例的统计功能规范的认证界限。我们对各种现实世界基准数据集的实验表明,TSA-STAT在欺骗DNN的时间序列域和改善其稳健性方面的有效性。 TSA-STAT算法的源代码可在https://github.com/tahabelkhouja/time-series-series-attacks-via-statity-features上获得
translated by 谷歌翻译
恶意软件是跨越多个操作系统和各种文件格式的计算机的最损害威胁之一。为了防止不断增长的恶意软件的威胁,已经提出了巨大的努力来提出各种恶意软件检测方法,试图有效和有效地检测恶意软件。最近的研究表明,一方面,现有的ML和DL能够卓越地检测新出现和以前看不见的恶意软件。然而,另一方面,ML和DL模型本质上易于侵犯对抗性示例形式的对抗性攻击,这通过略微仔细地扰乱了合法输入来混淆目标模型来恶意地产生。基本上,在计算机视觉领域最初广泛地研究了对抗性攻击,并且一些快速扩展到其他域,包括NLP,语音识别甚至恶意软件检测。在本文中,我们专注于Windows操作系统系列中的便携式可执行文件(PE)文件格式的恶意软件,即Windows PE恶意软件,作为在这种对抗设置中研究对抗性攻击方法的代表性案例。具体而言,我们首先首先概述基于ML / DL的Windows PE恶意软件检测的一般学习框架,随后突出了在PE恶意软件的上下文中执行对抗性攻击的三个独特挑战。然后,我们进行全面和系统的审查,以对PE恶意软件检测以及增加PE恶意软件检测的稳健性的相应防御,对近最新的对手攻击进行分类。我们首先向Windows PE恶意软件检测的其他相关攻击结束除了对抗对抗攻击之外,然后对未来的研究方向和机遇脱落。
translated by 谷歌翻译
Robustness evaluation against adversarial examples has become increasingly important to unveil the trustworthiness of the prevailing deep models in natural language processing (NLP). However, in contrast to the computer vision domain where the first-order projected gradient descent (PGD) is used as the benchmark approach to generate adversarial examples for robustness evaluation, there lacks a principled first-order gradient-based robustness evaluation framework in NLP. The emerging optimization challenges lie in 1) the discrete nature of textual inputs together with the strong coupling between the perturbation location and the actual content, and 2) the additional constraint that the perturbed text should be fluent and achieve a low perplexity under a language model. These challenges make the development of PGD-like NLP attacks difficult. To bridge the gap, we propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP. Specifically, we address the aforementioned challenges in a unified optimization framework. And we develop an effective convex relaxation method to co-optimize the continuously-relaxed site selection and perturbation variables and leverage an effective sampling method to establish an accurate mapping from the continuous optimization variables to the discrete textual perturbations. Moreover, as a first-order attack generation method, TextGrad can be baked into adversarial training to further improve the robustness of NLP models. Extensive experiments are provided to demonstrate the effectiveness of TextGrad not only in attack generation for robustness evaluation but also in adversarial defense.
translated by 谷歌翻译
We propose the Square Attack, a score-based black-box l2and l∞-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized search scheme which selects localized squareshaped updates at random positions so that at each iteration the perturbation is situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art methods, especially in the untargeted setting. In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks by a factor of at least 1.8 and up to 3 compared to the recent state-ofthe-art l∞-attack of Al-Dujaili & OReilly (2020). Moreover, although our attack is black-box, it can also outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate. The code of our attack is available at https://github.com/max-andr/square-attack.
translated by 谷歌翻译
在随着时间变化的组合环境中的在线决策激励,我们研究了将离线算法转换为其在线对应物的问题。我们专注于使用贪婪算法对局部错误的贪婪算法进行恒定因子近似的离线组合问题。对于此类问题,我们提供了一个通用框架,该框架可有效地将稳健的贪婪算法转换为使用Blackwell的易近算法。我们证明,在完整信息设置下,由此产生的在线算法具有$ O(\ sqrt {t})$(近似)遗憾。我们进一步介绍了Blackwell易接近性的强盗扩展,我们称之为Bandit Blackwell的可接近性。我们利用这一概念将贪婪的稳健离线算法转变为匪(t^{2/3})$(近似)$(近似)的遗憾。展示了我们框架的灵活性,我们将脱机之间的转换应用于收入管理,市场设计和在线优化的几个问题,包括在线平台中的产品排名优化,拍卖中的储备价格优化以及supperular tossodular最大化。 。我们还将还原扩展到连续优化的类似贪婪的一阶方法,例如用于最大化连续强的DR单调下调功能,这些功能受到凸约束的约束。我们表明,当应用于这些应用程序时,我们的转型会导致新的后悔界限或改善当前已知界限。我们通过为我们的两个应用进行数值模拟来补充我们的理论研究,在这两种应用中,我们都观察到,转换的数值性能在实际情况下优于理论保证。
translated by 谷歌翻译
已知深度学习系统容易受到对抗例子的影响。特别是,基于查询的黑框攻击不需要深入学习模型的知识,而可以通过提交查询和检查收益来计算网络上的对抗示例。最近的工作在很大程度上提高了这些攻击的效率,证明了它们在当今的ML-AS-A-Service平台上的实用性。我们提出了Blacklight,这是针对基于查询的黑盒对抗攻击的新防御。推动我们设计的基本见解是,为了计算对抗性示例,这些攻击在网络上进行了迭代优化,从而在输入空间中产生了非常相似的图像查询。 Blacklight使用在概率内容指纹上运行的有效相似性引擎来检测高度相似的查询来检测基于查询的黑盒攻击。我们根据各种模型和图像分类任务对八次最先进的攻击进行评估。 Blacklight通常只有几次查询后,都可以识别所有这些。通过拒绝所有检测到的查询,即使攻击者在帐户禁令或查询拒绝之后持续提交查询,Blacklight也可以防止任何攻击完成。 Blacklight在几个强大的对策中也很强大,包括最佳的黑盒攻击,该攻击近似于效率的白色框攻击。最后,我们说明了黑光如何推广到其他域,例如文本分类。
translated by 谷歌翻译
Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration nonstochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Hyperband, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems.
translated by 谷歌翻译
图表神经网络,一种流行的模型,在各种基于图形的学习任务中有效,已被证明易受对抗攻击的影响。虽然大多数文献侧重于节点级分类任务中的这种脆弱性,但很少努力致力于分析对图形级分类的对抗攻击,这是生物化学和社会网络分析等众多现实生活应用的重要问题。少数现有方法通常需要不切实际的设置,例如访问受害者模型的内部信息,或者是一个不切实际的查询。我们提出了一种新型贝叶斯优化的攻击方法,用于图形分类模型。我们的方法是黑匣子,查询效率和涉及扰动的效率和解析。我们经验验证了所提出的方法对涉及不同图形属性,约束和攻击方式的图形分类任务的效果和灵活性。最后,我们分析了产生的对手样本后面的常见可解释模式,这可能会在图形分类模型的对抗鲁棒性上流出进一步的光。
translated by 谷歌翻译