We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
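A minimal sketch of the shadow-model idea described above, under assumed interfaces (scikit-learn estimators standing in for both the shadow models and the black-box target, which is assumed to expose a predict_proba-style query): shadow models are trained on data the attacker labels as "in" or "out", and their prediction vectors supervise the attack model.

```python
# Hypothetical sketch of the shadow-model membership inference attack, not the authors' code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

def train_attack_model(shadow_splits):
    """shadow_splits: iterable of (X_in, y_in, X_out, y_out), i.e. data a shadow
    model does and does not train on. The attack model learns to separate the
    two from the shadow model's prediction vectors."""
    feats, labels = [], []
    for X_in, y_in, X_out, y_out in shadow_splits:
        shadow = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X_in, y_in)
        feats += [shadow.predict_proba(X_in), shadow.predict_proba(X_out)]
        labels += [np.ones(len(X_in)), np.zeros(len(X_out))]
    return RandomForestClassifier(n_estimators=100).fit(np.vstack(feats), np.concatenate(labels))

def infer_membership(attack_model, target_model, X_query):
    # Black-box queries: only the target's prediction vectors are used.
    return attack_model.predict(target_model.predict_proba(X_query))  # 1 = "member"
```

In the paper's full attack, separate attack models are trained per output class and the record's true label is part of the attack features; the sketch collapses this for brevity.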
Machine learning (ML) has become a core component of many real-world applications, and training data is a key factor that drives current progress. This huge success has led Internet companies to deploy machine learning as a service (MLaaS). Recently, the first membership inference attack showed that extraction of information about the training set is possible in such MLaaS settings, which has severe security and privacy implications. However, the early demonstrations of the feasibility of such attacks rely on many assumptions about the adversary, such as using multiple so-called shadow models, knowledge of the target model structure, and having a dataset from the same distribution as the target model's training data. We relax all these key assumptions, thereby showing that such attacks are very broadly applicable at low cost and consequently pose a more severe risk than previously thought. We present the most comprehensive study so far on this emerging and developing threat, using eight diverse datasets which show the viability of the proposed attacks across domains. In addition, we propose the first effective defense mechanisms against this broader class of membership inference attacks that maintain a high level of utility of the ML model.
Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
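A hedged PyTorch sketch of the kind of white-box signal discussed above: per-example gradients of the loss with respect to the model parameters, reduced here to one norm per parameter tensor as candidate features for an inference model. The model interface and tensor shapes are assumptions, not the authors' implementation.

```python
# Illustrative only: per-example gradient features for a white-box inference model.
import torch
import torch.nn.functional as F

def gradient_features(model, x, y):
    """x: a single input tensor; y: its integer label as a 0-dim LongTensor."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    # One feature per parameter tensor: the L2 norm of its gradient for this example.
    return torch.stack([g.norm() for g in grads])
```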
Machine learning models leak information about the datasets on which they are trained. An adversary can build an algorithm to trace the individual members of a model's training dataset. As a fundamental inference attack, the adversary aims to distinguish between data points that were part of the model's training set and any other data points from the same distribution. This is known as the tracing (and also membership inference) attack. In this paper, we focus on such attacks against black-box models, where the adversary can only observe the output of the model, but not its parameters. This is the current setting of machine learning as a service on the Internet. We introduce a privacy mechanism to train machine learning models that provably achieve membership privacy: the model's predictions on its training data are indistinguishable from its predictions on other data points from the same distribution. We design a strategic mechanism where the privacy mechanism anticipates the membership inference attacks. The objective is to train a model such that not only does it have the minimum prediction error (high utility), but it is also the most robust model against its corresponding strongest inference attack (high privacy). We formalize this as a min-max game optimization problem, and design an adversarial training algorithm that minimizes the classification loss of the model as well as the maximum gain of the membership inference attack against it. This strategy, which guarantees membership privacy (as prediction indistinguishability), also acts as a strong regularizer and significantly generalizes the model. We evaluate our privacy mechanism on deep neural networks using different benchmark datasets. We show that our min-max strategy can mitigate the risk of membership inference attacks (bringing them close to random guessing) with a negligible cost in terms of the classification error.
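The min-max game can be sketched roughly as alternating updates, shown below in PyTorch under assumed module interfaces (a `classifier` and an `attack_net` that ends in a sigmoid and maps a softmax vector to a membership probability); this is an illustration of the adversarial-regularization idea, not the authors' implementation.

```python
# Rough sketch of one alternating min-max training step (illustrative assumptions).
import torch
import torch.nn.functional as F

def min_max_step(classifier, attack_net, opt_c, opt_a, member_batch, nonmember_batch, lam=1.0):
    (x_m, y_m), (x_n, _) = member_batch, nonmember_batch

    # Inner maximization: train the inference model to separate members from non-members.
    with torch.no_grad():
        p_m = F.softmax(classifier(x_m), dim=1)
        p_n = F.softmax(classifier(x_n), dim=1)
    preds = torch.cat([attack_net(p_m), attack_net(p_n)]).squeeze()
    truth = torch.cat([torch.ones(len(x_m)), torch.zeros(len(x_n))])
    a_loss = F.binary_cross_entropy(preds, truth)
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()

    # Outer minimization: classification loss plus the attack's gain on members.
    logits = classifier(x_m)
    c_loss = F.cross_entropy(logits, y_m) + lam * attack_net(F.softmax(logits, dim=1)).mean()
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
```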
Machine learning (ML) models have been widely adopted in many applications, including image classification, text generation, audio recognition, and graph data analysis. Recent studies, however, show that ML models are vulnerable to membership inference attacks (MIAs), which aim to infer whether a data record was used to train a target model. MIAs on ML models can directly lead to privacy breaches. For example, by identifying that a clinical record was used to train a model associated with a certain disease, an attacker can infer with high probability that the owner of that record has the disease. In recent years, MIAs have been shown to be effective against various ML models, such as classification models and generative models. Meanwhile, many defense methods have been proposed to mitigate MIAs. Although MIAs on ML models form a newly emerging and rapidly growing research area, there has been no systematic survey of this topic. In this paper, we conduct the first comprehensive survey of membership inference attacks and defenses. We provide a taxonomy of attacks and defenses based on their characteristics and discuss their strengths and weaknesses. Based on the limitations and gaps identified in this survey, we point out several promising future research directions to inspire researchers who wish to follow this area. This survey not only serves as a reference for the research community but also provides a clear picture for researchers outside this field. To further assist researchers, we have created an online resource repository that we keep updated with future relevant work. Interested readers can find the repository at https://github.com/hongshenghu/membership-inference-machine-learning-literature.
Differential privacy is a strong notion of privacy that can be used to prove formal guarantees, in terms of a privacy budget ε, about how much information is leaked by a mechanism. However, implementations of privacy-preserving machine learning often select large values of ε in order to get acceptable utility from the model, with little understanding of the impact of such choices on meaningful privacy. Moreover, in scenarios where iterative learning procedures are used, variants of differential privacy that offer tighter analyses are employed; these appear to reduce the needed privacy budget but present poorly understood trade-offs between privacy and utility. In this paper, we quantify the impact of these choices on privacy in experiments with logistic regression and neural network models. Our main finding is that there is a huge gap between the upper bounds on privacy loss that can be guaranteed, even with advanced mechanisms, and the effective privacy loss that can be measured using current inference attacks. Current mechanisms for differentially private machine learning rarely offer acceptable utility-privacy trade-offs with guarantees for complex learning tasks: settings that provide limited accuracy loss provide meaningless privacy guarantees, and settings that provide strong privacy guarantees result in useless models.
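To make the gap concrete, the sketch below contrasts two standard worst-case bounds implied by a privacy budget with what an empirical attack might measure: for an ε-DP mechanism the membership advantage (TPR minus FPR) is at most e^ε − 1, and under (ε, δ)-DP any attacker's TPR at a fixed FPR is at most e^ε·FPR + δ. The printed budget values are purely illustrative.

```python
# Illustrative only: worst-case leakage bounds implied by a privacy budget.
import math

def advantage_bound(epsilon):
    # epsilon-DP bounds membership advantage (TPR - FPR) by e^eps - 1 (vacuous once >= 1).
    return min(1.0, math.exp(epsilon) - 1.0)

def tpr_bound(epsilon, fpr, delta=1e-5):
    # (epsilon, delta)-DP bounds any attacker's TPR at a fixed FPR.
    return min(1.0, math.exp(epsilon) * fpr + delta)

for eps in (0.1, 1.0, 10.0, 100.0):
    print(f"eps={eps:6}: advantage <= {advantage_bound(eps):.3f}, "
          f"TPR at 1% FPR <= {tpr_bound(eps, 0.01):.3f}")
# At the large budgets common in practice, both bounds are vacuous, which is the
# gap between provable and empirically measured privacy loss described above.
```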
Attacks that steal controlled information, together with a growing number of information leakage incidents, have become emerging cybersecurity threats in recent years. Thanks to the rapid development and deployment of advanced analytics solutions, novel stealing attacks exploit machine learning (ML) algorithms to achieve high success rates and cause substantial damage. Detecting and defending against such attacks is challenging and urgent, so governments, organizations, and individuals should pay close attention to ML-based stealing attacks. This survey presents recent advances in this new type of attack and the corresponding countermeasures. ML-based stealing attacks are reviewed from the perspective of three categories of targeted controlled information: controlled user activities, controlled ML model-related information, and controlled authentication information. Recent publications are summarized to derive general attack methodologies, and the limitations and future directions of ML-based stealing attacks are discussed. Furthermore, countermeasures for effective protection are proposed from three aspects: detection, disruption, and isolation.
Membership inference is one of the simplest forms of privacy leakage in machine learning models: given a data point and a model, determine whether the point was used to train the model. Existing membership inference attacks exploit the model's abnormal confidence when queried on its training data. These attacks do not apply if the adversary only has access to the model's predicted labels, without confidence scores. In this paper, we introduce label-only membership inference attacks. Instead of relying on confidence scores, our attacks evaluate the robustness of the model's predicted labels under perturbations to obtain a fine-grained membership signal. These perturbations include common data augmentations or adversarial examples. We empirically show that our label-only membership inference attacks perform on par with prior attacks that require access to model confidences. We further demonstrate that label-only attacks break multiple defenses against membership inference that rely (implicitly or explicitly) on a phenomenon we call confidence masking. These defenses modify a model's confidence scores to thwart attacks but leave its predicted labels unchanged. Our label-only attacks demonstrate that confidence masking is not a viable defense strategy against membership inference. Finally, we investigate worst-case label-only attacks that infer membership for a small number of outlier data points. We show that label-only attacks also match confidence-based attacks in this setting. We find that training models with differential privacy and (strong) L2 regularization are the only known defense strategies that successfully prevent all attacks. This remains true even when the differential privacy budget is too high to offer meaningful provable guarantees.
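A minimal sketch of the label-only signal, with assumed helpers (`query_label_fn` returns predicted labels only) and an illustrative Gaussian perturbation standing in for the paper's data augmentations and adversarial examples:

```python
# Illustrative sketch: score a point by how often the predicted label survives perturbation.
import numpy as np

def label_only_score(query_label_fn, x, y_true, n_perturb=50, sigma=0.05, seed=0):
    """query_label_fn(batch) -> predicted labels only (no confidence scores)."""
    rng = np.random.default_rng(seed)
    noisy = x + sigma * rng.standard_normal((n_perturb,) + x.shape)
    return float(np.mean(query_label_fn(noisy) == y_true))

def infer_member(query_label_fn, x, y_true, threshold=0.9):
    # The threshold would be calibrated, e.g. on shadow models; 0.9 is a placeholder.
    return label_only_score(query_label_fn, x, y_true) >= threshold
```

Members tend to sit farther from the decision boundary, so their predicted labels are more robust under such perturbations than those of non-members.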
Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with his own training dataset, to build a joint model by training locally and periodically exchanging model updates. We demonstrate that these updates leak unintended information about participants' training data and develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points (for example, specific locations) in others' training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, he can infer when a specific person first appears in the photos used to train a binary gender classifier. We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.
Membership inference attacks allow an adversary to query a trained machine learning model to predict whether a specific example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics, which fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instead be evaluated by computing their true-positive rate at low (e.g., <0.1%) false-positive rates, and find that most prior attacks perform poorly when evaluated in this way. To address this, we develop a likelihood ratio attack (LiRA) that carefully combines multiple ideas from the literature. Our attack is 10x more powerful at low false-positive rates and also strictly dominates prior attacks on existing metrics.
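The likelihood-ratio idea can be sketched as below, assuming the attacker has already collected the target model's confidence on the example plus confidences from shadow models trained with and without it; this is a simplified single-example version, not the authors' full pipeline.

```python
# Simplified likelihood-ratio score (illustrative, not the authors' code).
import numpy as np
from scipy.stats import norm

def logit_scale(p, eps=1e-6):
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p) - np.log(1 - p)   # spreads out confidences saturated near 0 or 1

def lira_score(target_conf, in_confs, out_confs):
    """in_confs / out_confs: the example's confidences under shadow models trained
    with / without it; target_conf: the target model's confidence on the example."""
    obs = logit_scale(target_conf)
    mu_in, sd_in = logit_scale(in_confs).mean(), logit_scale(in_confs).std() + 1e-6
    mu_out, sd_out = logit_scale(out_confs).mean(), logit_scale(out_confs).std() + 1e-6
    # Larger score => the observation is better explained by "trained on this example".
    return norm.logpdf(obs, mu_in, sd_in) - norm.logpdf(obs, mu_out, sd_out)
```

Thresholding this score, rather than a raw confidence, is what enables confident detection of individual members at very low false-positive rates.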
While being trained, machine learning models may store or "learn" more information about the training data than is needed for the prediction or classification task. Property inference attacks exploit this by extracting statistical properties of a given model's training data without access to the training data itself. Such properties may include the quality of pictures, used to identify the camera model; the age distribution, used to reveal the target audience of a product; or the types of hosts included in a computer network, used to refine a malware attack. This attack is especially accurate when the attacker has access to all model parameters, i.e., in a white-box scenario. By defending against such attacks, model owners can ensure that their training data, the associated properties, and their intellectual property remain private, even if they deliberately share their models, e.g., for collaborative training, or if the model is leaked. In this paper, we introduce an effective defense mechanism against white-box property inference attacks that is independent of the training data type, model task, or number of properties. Our defense mitigates property inference attacks by systematically changing the trained weights and biases of the target model so that an adversary cannot extract the chosen properties. We empirically evaluate the defense on three different datasets, including tabular and image data, and on two types of artificial neural networks. Our results show that machine learning models can be protected against property inference attacks effectively and reliably, with a good privacy-utility trade-off. Furthermore, our approach indicates that the mechanism is also effective at removing multiple properties.
Leaking data from publicly available machine learning (ML) models is an area of growing importance, since commercial and government applications of ML can draw on multiple data sources, potentially including sensitive data of users and customers. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage that is natural to ML models, potential malicious leakage caused by privacy attacks, and the currently available defense mechanisms. We focus on inference-time leakage, which is the most likely scenario for publicly available models. We first discuss what leakage means in the context of different data, tasks, and model architectures. We then propose a taxonomy spanning involuntary and malicious leakage and the available defenses, followed by the currently available evaluation metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.
Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service ("predictive analytics") systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.
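For the logistic-regression case, the equation-solving idea reduces to linear algebra, sketched below against an assumed generic query function that returns confidence values (no specific service implied): since log(p / (1 − p)) = w·x + b, roughly d + 1 queries determine w and b.

```python
# Hedged sketch of equation-solving extraction for a logistic-regression target.
import numpy as np

def extract_logistic_regression(query_conf_fn, dim, n_queries=None, seed=0):
    """query_conf_fn(X) -> confidence of the positive class for each row of X."""
    rng = np.random.default_rng(seed)
    n = n_queries or dim + 1
    X = rng.standard_normal((n, dim))
    p = np.clip(query_conf_fn(X), 1e-9, 1 - 1e-9)
    A = np.hstack([X, np.ones((n, 1))])                  # unknowns: [w, b]
    sol, *_ = np.linalg.lstsq(A, np.log(p / (1 - p)), rcond=None)
    return sol[:-1], sol[-1]                             # (w_hat, b_hat)
```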
Machine learning algorithms, when applied to sensitive data, pose a distinct threat to privacy. A growing body of prior work demonstrates that models produced by these algorithms may leak specific private information in the training data to an attacker, either through the models' structure or their observable behavior. However, the underlying cause of this privacy risk is not well understood beyond a handful of anecdotal accounts that suggest overfitting and influence might play a role. This paper examines the effect that overfitting and influence have on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks. Using both formal and empirical analyses, we illustrate a clear relationship between these factors and the privacy risk that arises in several popular machine learning algorithms. We find that overfitting is sufficient to allow an attacker to perform membership inference and, when the target attribute meets certain conditions about its influence, attribute inference attacks. Interestingly, our formal analysis also shows that overfitting is not necessary for these attacks and begins to shed light on what other factors may be in play. Finally, we explore the connection between membership inference and attribute inference, showing that there are deep connections between the two that lead to effective new attacks.
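A minimal illustration of how overfitting translates into membership inference is the loss-threshold rule sketched below; the threshold choice (e.g., the average training loss) is an assumption for illustration, not the paper's exact procedure.

```python
# Guess "member" when the model's loss on the example is unusually small.
import numpy as np

def cross_entropy(probs, y, eps=1e-12):
    return -np.log(np.clip(probs[np.arange(len(y)), y], eps, None))

def loss_threshold_attack(predict_proba_fn, X, y, threshold):
    return cross_entropy(predict_proba_fn(X), y) < threshold  # True = guessed member
# The wider the train/test loss gap (i.e. the more the model overfits), the better
# this simple rule separates members from non-members.
```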
Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners who share their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large number of shadow models, which induces a large computational overhead. In this paper, we consider the setting of property inference attacks in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our theoretical analysis of model confidences under poisoning, we design an efficient property inference attack, SNAP, which obtains higher attack success and requires a lower amount of poisoning than the state-of-the-art poisoning-based property inference attack of Mahloujifar et al. For example, on the Census dataset, SNAP achieves a 34% higher success rate than Mahloujifar et al. while being 56.5x faster. We also extend our attack to determine whether a certain property is present at all in training, and to efficiently estimate the exact fraction of the property of interest. We evaluate our attack on multiple properties at various fractions across four datasets, demonstrating SNAP's generality and effectiveness.
Relying on the fact that not all inputs require the same amount of computation to yield a confident prediction, multi-exit networks are gaining attention as a prominent approach for pushing the limits of efficient deployment. Multi-exit networks endow a backbone model with early exits, allowing predictions to be obtained at intermediate layers of the model and thus saving computation time and/or energy. However, current designs of multi-exit networks have only considered achieving the best trade-off between resource usage efficiency and prediction accuracy; the privacy risks stemming from them have never been explored. This prompts the need for a comprehensive investigation of privacy risks in multi-exit networks. In this paper, we perform the first privacy analysis of multi-exit networks through the lens of membership leakage. In particular, we first leverage existing attack methodologies to quantify the vulnerability of multi-exit networks to membership leakage. Our experimental results show that multi-exit networks are less vulnerable to membership leakage, and that the exits (number and depth) attached to the backbone model are highly correlated with attack performance. Furthermore, we propose a hybrid attack that exploits exit information to improve the performance of existing attacks. We evaluate the membership leakage threat posed by our hybrid attack under three different adversary setups, ultimately arriving at a model-free and data-free adversary. These results clearly show that our hybrid attack is very broadly applicable, and hence the corresponding risk is much more severe than shown by existing membership inference attacks. We further present TimeGuard, a defense mechanism tailored to multi-exit networks, and show that TimeGuard perfectly mitigates the newly proposed attacks.
Data used to train machine learning (ML) models can be sensitive. Membership inference attacks (MIAs), which attempt to determine whether a particular data record was used to train an ML model, violate membership privacy. ML model builders need a principled definition of a metric that enables them to quantify the privacy risk of (a) individual training data records and (b) to do so efficiently. None of the prior metrics for membership privacy risk meet all of these criteria. We propose such a metric, SHAPr, which uses Shapley values to quantify a model's memorization of a training record by measuring the record's influence on the model's utility. This memorization is a measure of the likelihood of a successful MIA against the record. Using ten benchmark datasets, we show that SHAPr is effective (precision: 0.94 ± 0.06, recall: 0.88 ± 0.06) in estimating the susceptibility of training data records to MIAs, and efficient (computable within minutes for smaller datasets and around 90 minutes for the largest dataset). SHAPr is also versatile, in that it can be used for other purposes such as assessing fairness or assigning valuation to subsets of a dataset. For example, we show that SHAPr correctly captures the disproportionate vulnerability of different subgroups to MIAs. Using SHAPr, we show that the membership privacy risk of a dataset is not necessarily improved by removing high-risk training data records, thereby confirming an observation from prior work in a significantly extended setting (across ten datasets, with up to 50% of the data removed).
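The underlying notion can be illustrated with a generic Monte Carlo Shapley estimate of each record's marginal contribution to test utility, sketched below; this is far slower than the metric's actual estimator and uses assumed `train_fn` and `score_fn` callables purely for illustration.

```python
# Conceptual sketch only: per-record Shapley contributions to model utility.
import numpy as np

def shapley_contributions(train_fn, score_fn, X, y, n_perm=20, seed=0):
    """train_fn(X_sub, y_sub) -> fitted model; score_fn(model) -> utility on held-out data."""
    rng = np.random.default_rng(seed)
    n, phi = len(X), np.zeros(len(X))
    for _ in range(n_perm):
        order, prev = rng.permutation(n), 0.0
        for k in range(1, n + 1):               # add records to the training set one at a time
            cur = score_fn(train_fn(X[order[:k]], y[order[:k]]))
            phi[order[k - 1]] += (cur - prev) / n_perm
            prev = cur
    return phi   # high contribution ~ memorization ~ higher susceptibility to MIAs
```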
A distribution inference attack aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, but the factors that impact distribution inference risk are not well understood and demonstrated attacks often rely on strong and unrealistic assumptions such as full knowledge of training environments even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that even outperforms the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary's knowledge under black-box access, like known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Code is available at https://github.com/iamgroot42/dissecting_distribution_inference
Given access to a machine learning model, can an adversary reconstruct the model's training data? This work studies this question through the lens of a powerful informed adversary who knows all the training data points except one. By instantiating concrete attacks, we show that reconstructing the remaining data point in this stringent threat model is feasible. For convex models (e.g., logistic regression), reconstruction attacks are simple and can be derived in closed form. For more general models (e.g., neural networks), we propose a training-based attack strategy that receives as input the weights of the model under attack and produces the target data point. We demonstrate the effectiveness of our attack on image classifiers trained on MNIST and CIFAR-10, and systematically investigate which factors of standard machine learning pipelines affect reconstruction success. Finally, we theoretically investigate how much differential privacy suffices to mitigate reconstruction attacks by informed adversaries. Our work provides an effective reconstruction attack that model developers can use to assess the memorization of individual points in general settings beyond those considered in previous work (e.g., generative language models or access to training gradients); it shows that standard models have the capacity to store enough information to enable high-fidelity reconstruction of training data points; and it demonstrates that differential privacy can successfully mitigate such attacks in a parameter regime where utility degradation is minimal.
Machine learning models are increasingly used by businesses and organizations around the world to automate tasks and decision-making. Trained on potentially sensitive datasets, machine learning models have been shown to leak information about individuals in the dataset as well as global dataset information. We here take dataset property inference attacks one step further by proposing a new attack against ML models: a dataset correlation inference attack, in which the attacker's goal is to infer the correlations between the model's input variables. We first show that an attacker can exploit the spherical parametrization of correlation matrices to make an informed guess. This means that, using only the correlations between the input variables and the target variable, an attacker can infer the correlation between two input variables better than a random-guess baseline. We propose a second attack that exploits shadow modeling of the machine learning model to improve the guess. Our attack uses Gaussian copula-based generative modeling to generate synthetic datasets with a wide range of correlations in order to train a meta-model for the correlation inference task. We evaluate our attacks against logistic regression and multilayer perceptron models and show that the model-based attack outperforms the model-less attack. Our results show that the accuracy of the machine learning-based attack decreases with the number of variables and converges to the accuracy of the model-less attack. However, correlations between input variables that are highly correlated with the target variable remain more vulnerable regardless of the number of variables. Our work bridges the gap between what can be considered global leakage about the training dataset and individual-level leakage. Combined with marginal leakage attacks, it might also constitute a first step toward dataset reconstruction.
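The model-less baseline rests on a standard constraint on correlation matrices: given the correlations of two input variables with the target, positive semidefiniteness of the 3x3 correlation matrix confines their mutual correlation to an interval. A small illustrative sketch follows; the numeric correlations are hypothetical.

```python
# Feasible range of corr(X, Y) given corr(X, T) and corr(Y, T).
import math

def correlation_interval(rho_xt, rho_yt):
    slack = math.sqrt((1 - rho_xt ** 2) * (1 - rho_yt ** 2))
    return rho_xt * rho_yt - slack, rho_xt * rho_yt + slack

lo, hi = correlation_interval(0.8, 0.7)
print(f"corr(X, Y) must lie in [{lo:.2f}, {hi:.2f}]; midpoint guess: {(lo + hi) / 2:.2f}")
```

The stronger the two inputs' correlations with the target, the narrower this interval, which is consistent with the finding above that such variable pairs are the most vulnerable.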