如今,基于机器学习(ML)的系统被广泛用于不同域。鉴于它们的受欢迎程度,ML模型已成为各种攻击的目标。结果,在安全与隐私的交汇处以及ML的研究蓬勃发展。研究界一直在分别探索攻击媒介和潜在的缓解。但是,从业人员可能需要同时部署防御措施,以防止几种威胁。针对特定问题的最佳解决方案可能会与旨在解决其他问题的解决方案进行负相互作用。在这项工作中,我们探讨了不同解决方案之间相互作用的潜力,从而增强了ML基本系统的安全/隐私。我们专注于模型和数据所有权;探索所有权验证技术如何与其他ML安全/隐私技术(如差异化私有培训)以及反对逃避模型的鲁棒性相互作用。我们提供一个框架,并对成对相互作用进行系统分析。我们表明许多对不兼容。在可能的情况下,我们为超参数或允许同时部署的技术提供放松。最后,我们讨论含义并为将来的工作提供指南。
translated by 谷歌翻译
用于训练机器学习(ML)模型的数据可能是敏感的。成员推理攻击(MIS),试图确定特定数据记录是否用于培训ML模型,违反会员隐私。 ML模型建设者需要一个原则的定义,使他们能够有效地定量(a)单独培训数据记录,(b)的隐私风险,有效地。未在会员资格危险风险指标上均未达到所有这些标准。我们提出了这种公制,SHAPR,它通过抑制其对模型的实用程序的影响来量化朔芙值以量化模型的记忆。这个记忆是衡量成功MIA的可能性的衡量标准。使用十个基准数据集,我们显示ShapR是有效的(精确度:0.94 $ \ PM 0.06 $,回忆:0.88 $ \ PM 0.06 $)在估算MIAS的培训数据记录的易感性时,高效(可在几分钟内计算,较小数据集和最大数据集的约〜90分钟)。 ShapR也是多功能的,因为它可以用于评估数据集的子集的公平或分配估值的其他目的。例如,我们显示Shapr正确地捕获不同子组的不成比例漏洞到MIS。使用SHAPR,我们表明,通过去除高风险训练数据记录,不一定改善数据集的成员隐私风险,从而确认在显着扩展的设置中从事工作(在十个数据集中,最多可删除50%的数据)的观察。
translated by 谷歌翻译
对机器学习模型的逃避攻击通常通过迭代探测固定目标模型成功,从而曾经成功的攻击将反复成功。应对这种威胁的一种有希望的方法是使模型成为对抗输入的行动目标。为此,我们介绍了Morphence-2.0,这是一个由分布外(OOD)检测提供动力的可扩展移动目标防御(MTD),以防止对抗性例子。通过定期移动模型的决策功能,Morphence-2.0使重复或相关攻击成功的挑战变得极大。 Morphence-2.0以基本模型生成的模型池以引入足够随机性的方式对预测查询进行响应。通过OOD检测,Morphence-2.0配备了调度方法,该方法将对抗性示例分配给了强大的决策功能,并将良性样本分配给了未防御的准确模型。为了确保重复或相关的攻击失败,已部署的模型池在达到查询预算后​​自动到期,并且模型池被提前生成的新模型池无缝替换。我们在两个基准图像分类数据集(MNIST和CIFAR10)上评估Morphence-2.0,以4个参考攻击(3个白框和1个黑色框)。 Morphence-2.0始终优于先前的防御能力,同时保留清洁数据的准确性和降低攻击转移性。我们还表明,当由OOD检测提供动力时,Morphence-2.0能够精确地对模型的决策功能进行基于输入的运动,从而导致对对抗和良性查询的预测准确性更高。
translated by 谷歌翻译
会员推理攻击是机器学习模型中最简单的隐私泄漏形式之一:给定数据点和模型,确定该点是否用于培训模型。当查询其培训数据时,现有会员推理攻击利用模型的异常置信度。如果对手访问模型的预测标签,则不会申请这些攻击,而不会置信度。在本文中,我们介绍了仅限标签的会员资格推理攻击。我们的攻击而不是依赖置信分数,而是评估模型预测标签在扰动下的稳健性,以获得细粒度的隶属信号。这些扰动包括常见的数据增强或对抗例。我们经验表明,我们的标签占会员推理攻击与先前攻击相符,以便需要访问模型信心。我们进一步证明,仅限标签攻击违反了(隐含或明确)依赖于我们呼叫信心屏蔽的现象的员工推论攻击的多种防御。这些防御修改了模型的置信度分数以挫败攻击,但留下模型的预测标签不变。我们的标签攻击展示了置信性掩蔽不是抵御会员推理的可行的防御策略。最后,我们调查唯一的案例标签攻击,该攻击推断为少量异常值数据点。我们显示仅标签攻击也匹配此设置中基于置信的攻击。我们发现具有差异隐私和(强)L2正则化的培训模型是唯一已知的防御策略,成功地防止所有攻击。即使差异隐私预算太高而无法提供有意义的可证明担保,这仍然存在。
translated by 谷歌翻译
A distribution inference attack aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, but the factors that impact distribution inference risk are not well understood and demonstrated attacks often rely on strong and unrealistic assumptions such as full knowledge of training environments even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that even outperforms the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary's knowledge under black-box access, like known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Code is available at https://github.com/iamgroot42/dissecting_distribution_inference
translated by 谷歌翻译
从公共机器学习(ML)模型中泄漏数据是一个越来越重要的领域,因为ML的商业和政府应用可以利用多个数据源,可能包括用户和客户的敏感数据。我们对几个方面的当代进步进行了全面的调查,涵盖了非自愿数据泄漏,这对ML模型很自然,潜在的恶毒泄漏是由隐私攻击引起的,以及目前可用的防御机制。我们专注于推理时间泄漏,这是公开可用模型的最可能场景。我们首先在不同的数据,任务和模型体系结构的背景下讨论什么是泄漏。然后,我们提出了跨非自愿和恶意泄漏的分类法,可用的防御措施,然后进行当前可用的评估指标和应用。我们以杰出的挑战和开放性的问题结束,概述了一些有希望的未来研究方向。
translated by 谷歌翻译
最近对机器学习(ML)模型的攻击,例如逃避攻击,具有对抗性示例,并通过提取攻击窃取了一些模型,构成了几种安全性和隐私威胁。先前的工作建议使用对抗性训练从对抗性示例中保护模型,以逃避模型的分类并恶化其性能。但是,这种保护技术会影响模型的决策边界及其预测概率,因此可能会增加模型隐私风险。实际上,仅使用对模型预测输出的查询访问的恶意用户可以提取它并获得高智能和高保真替代模型。为了更大的提取,这些攻击利用了受害者模型的预测概率。实际上,所有先前关于提取攻击的工作都没有考虑到出于安全目的的培训过程中的变化。在本文中,我们提出了一个框架,以评估具有视觉数据集对对抗训练的模型的提取攻击。据我们所知,我们的工作是第一个进行此类评估的工作。通过一项广泛的实证研究,我们证明了受对抗训练的模型比在自然训练情况下获得的模型更容易受到提取攻击的影响。他们可以达到高达$ \ times1.2 $更高的准确性和同意,而疑问低于$ \ times0.75 $。我们还发现,与从自然训练的(即标准)模型中提取的DNN相比,从鲁棒模型中提取的对抗性鲁棒性能力可通过提取攻击(即从鲁棒模型提取的深神经网络(DNN)提取的深神网络(DNN))传递。
translated by 谷歌翻译
Differential privacy is a strong notion for privacy that can be used to prove formal guarantees, in terms of a privacy budget, , about how much information is leaked by a mechanism. However, implementations of privacy-preserving machine learning often select large values of in order to get acceptable utility of the model, with little understanding of the impact of such choices on meaningful privacy. Moreover, in scenarios where iterative learning procedures are used, differential privacy variants that offer tighter analyses are used which appear to reduce the needed privacy budget but present poorly understood trade-offs between privacy and utility. In this paper, we quantify the impact of these choices on privacy in experiments with logistic regression and neural network models. Our main finding is that there is a huge gap between the upper bounds on privacy loss that can be guaranteed, even with advanced mechanisms, and the effective privacy loss that can be measured using current inference attacks. Current mechanisms for differentially private machine learning rarely offer acceptable utility-privacy trade-offs with guarantees for complex learning tasks: settings that provide limited accuracy loss provide meaningless privacy guarantees, and settings that provide strong privacy guarantees result in useless models.
translated by 谷歌翻译
有针对性的训练集攻击将恶意实例注入训练集中,以导致训练有素的模型错误地标记一个或多个特定的测试实例。这项工作提出了目标识别的任务,该任务决定了特定的测试实例是否是训练集攻击的目标。目标识别可以与对抗性识别相结合,以查找(并删除)攻击实例,从而减轻对其他预测的影响,从而减轻攻击。我们没有专注于单个攻击方法或数据模式,而是基于影响力估计,这量化了每个培训实例对模型预测的贡献。我们表明,现有的影响估计量的不良实际表现通常来自于他们对训练实例和迭代次数的过度依赖。我们重新归一化的影响估计器解决了这一弱点。他们的表现远远超过了原始估计量,可以在对抗和非对抗环境中识别有影响力的训练示例群体,甚至发现多达100%的对抗训练实例,没有清洁数据误报。然后,目标识别简化以检测具有异常影响值的测试实例。我们证明了我们的方法对各种数据域的后门和中毒攻击的有效性,包括文本,视觉和语音,以及针对灰色盒子的自适应攻击者,该攻击者专门优化了逃避我们方法的对抗性实例。我们的源代码可在https://github.com/zaydh/target_indistification中找到。
translated by 谷歌翻译
Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with stronger robustness to blackbox attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks (Kurakin et al., 2017c). However, subsequent work found that more elaborate black-box attacks could significantly enhance transferability and reduce the accuracy of our models.
translated by 谷歌翻译
员额推理攻击允许对训练的机器学习模型进行对手以预测模型的训练数据集中包含特定示例。目前使用平均案例的“精度”度量来评估这些攻击,该攻击未能表征攻击是否可以自信地识别培训集的任何成员。我们认为,应该通过计算其低(例如<0.1%)假阳性率来计算攻击来评估攻击,并在以这种方式评估时发现大多数事先攻击差。为了解决这一问题,我们开发了一个仔细结合文献中多种想法的似然比攻击(Lira)。我们的攻击是低于虚假阳性率的10倍,并且在攻击现有度量的情况下也严格占主导地位。
translated by 谷歌翻译
Speech-centric machine learning systems have revolutionized many leading domains ranging from transportation and healthcare to education and defense, profoundly changing how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to be considered more trustworthy for broader deployment. Specifically, concerns over privacy breaches, discriminating performance, and vulnerability to adversarial attacks have all been discovered in ML research fields. In order to address the above challenges and risks, a significant number of efforts have been made to ensure these ML systems are trustworthy, especially private, safe, and fair. In this paper, we conduct the first comprehensive survey on speech-centric trustworthy ML topics related to privacy, safety, and fairness. In addition to serving as a summary report for the research community, we point out several promising future research directions to inspire the researchers who wish to explore further in this area.
translated by 谷歌翻译
在模型提取攻击中,对手可以通过反复查询并根据获得的预测来窃取通过公共API暴露的机器学习模型。为了防止模型窃取,现有的防御措施专注于检测恶意查询,截断或扭曲输出,因此必然会为合法用户引入鲁棒性和模型实用程序之间的权衡。取而代之的是,我们建议通过要求用户在阅读模型的预测之前完成工作证明来阻碍模型提取。这可以通过大大增加(甚至高达100倍)来阻止攻击者,以利用查询访问模型提取所需的计算工作。由于我们校准完成每个查询的工作证明所需的努力,因此这仅为常规用户(最多2倍)引入一个轻微的开销。为了实现这一目标,我们的校准应用了来自差异隐私的工具来衡量查询揭示的信息。我们的方法不需要对受害者模型进行任何修改,可以通过机器学习从业人员来应用其公开暴露的模型免于轻易被盗。
translated by 谷歌翻译
Vertical federated learning (VFL) is an emerging paradigm that enables collaborators to build machine learning models together in a distributed fashion. In general, these parties have a group of users in common but own different features. Existing VFL frameworks use cryptographic techniques to provide data privacy and security guarantees, leading to a line of works studying computing efficiency and fast implementation. However, the security of VFL's model remains underexplored.
translated by 谷歌翻译
机器学习(ML)模型已广泛应用于各种应用,包括图像分类,文本生成,音频识别和图形数据分析。然而,最近的研究表明,ML模型容易受到隶属推导攻击(MIS),其目的是推断数据记录是否用于训练目标模型。 ML模型上的MIA可以直接导致隐私违规行为。例如,通过确定已经用于训练与某种疾病相关的模型的临床记录,攻击者可以推断临床记录的所有者具有很大的机会。近年来,MIS已被证明对各种ML模型有效,例如,分类模型和生成模型。同时,已经提出了许多防御方法来减轻米西亚。虽然ML模型上的MIAS形成了一个新的新兴和快速增长的研究区,但还没有对这一主题进行系统的调查。在本文中,我们对会员推论和防御进行了第一个全面调查。我们根据其特征提供攻击和防御的分类管理,并讨论其优点和缺点。根据本次调查中确定的限制和差距,我们指出了几个未来的未来研究方向,以激发希望遵循该地区的研究人员。这项调查不仅是研究社区的参考,而且还为该研究领域之外的研究人员带来了清晰的照片。为了进一步促进研究人员,我们创建了一个在线资源存储库,并与未来的相关作品继续更新。感兴趣的读者可以在https://github.com/hongshenghu/membership-inference-machine-learning-literature找到存储库。
translated by 谷歌翻译
由明确的反对派制作的对抗例子在机器学习中引起了重要的关注。然而,潜在虚假朋友带来的安全风险基本上被忽视了。在本文中,我们揭示了虚伪的例子的威胁 - 最初被错误分类但是虚假朋友扰乱的投入,以强迫正确的预测。虽然这种扰动的例子似乎是无害的,但我们首次指出,它们可能是恶意地用来隐瞒评估期间不合格(即,不如所需)模型的错误。一旦部署者信任虚伪的性能并在真实应用程序中应用“良好的”模型,即使在良性环境中也可能发生意外的失败。更严重的是,这种安全风险似乎是普遍存在的:我们发现许多类型的不合标准模型易受多个数据集的虚伪示例。此外,我们提供了第一次尝试,以称为虚伪风险的公制表征威胁,并试图通过一些对策来规避它。结果表明对策的有效性,即使在自适应稳健的培训之后,风险仍然是不可忽视的。
translated by 谷歌翻译
我们审查在机器学习(ML)中使用差异隐私(DP)对隐私保护的使用。我们表明,在维护学习模型的准确性的驱动下,基于DP的ML实现非常宽松,以至于它们不提供DP的事前隐私保证。取而代之的是,他们提供的基本上是与传统(经常受到批评的)统计披露控制方法相似的噪声。由于缺乏正式的隐私保证,因此所提供的实际隐私水平必须经过实验评估,这很少进行。在这方面,我们提出的经验结果表明,ML中的标准反拟合技术可以比DP实现更好的实用性/隐私/效率权衡。
translated by 谷歌翻译
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS-and which illustrate a diverse set of defense strategies-can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result-showing that a defense was ineffective-this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
translated by 谷歌翻译
与令人印象深刻的进步触动了我们社会的各个方面,基于深度神经网络(DNN)的AI技术正在带来越来越多的安全问题。虽然在考试时间运行的攻击垄断了研究人员的初始关注,但是通过干扰培训过程来利用破坏DNN模型的可能性,代表了破坏训练过程的可能性,这是破坏AI技术的可靠性的进一步严重威胁。在后门攻击中,攻击者损坏了培训数据,以便在测试时间诱导错误的行为。然而,测试时间误差仅在存在与正确制作的输入样本对应的触发事件的情况下被激活。通过这种方式,损坏的网络继续正常输入的预期工作,并且只有当攻击者决定激活网络内隐藏的后门时,才会发生恶意行为。在过去几年中,后门攻击一直是强烈的研究活动的主题,重点是新的攻击阶段的发展,以及可能对策的提议。此概述文件的目标是审查发表的作品,直到现在,分类到目前为止提出的不同类型的攻击和防御。指导分析的分类基于攻击者对培训过程的控制量,以及防御者验证用于培训的数据的完整性,并监控DNN在培训和测试中的操作时间。因此,拟议的分析特别适合于参考他们在运营的应用方案的攻击和防御的强度和弱点。
translated by 谷歌翻译
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples-inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. 1
translated by 谷歌翻译