神经网络修剪一直是减少对资源受限设备的深度神经网络的计算和记忆要求的重要技术。大多数现有的研究主要侧重于平衡修剪神经网络的稀疏性和准确性,通过策略性地删除无关紧要的参数并重新修剪修剪模型。由于记忆的增加而造成了严重的隐私风险,因此尚未调查这种训练样品的这种努力。在本文中,我们对神经网络修剪中的隐私风险进行了首次分析。具体而言,我们研究了神经网络修剪对培训数据隐私的影响,即成员推理攻击。我们首先探讨了神经网络修剪对预测差异的影响,在该预测差异中,修剪过程不成比例地影响了修剪的模型对成员和非会员的行为。同时,差异的影响甚至以细粒度的方式在不同类别之间有所不同。通过这种分歧,我们提出了对修剪的神经网络的自我发起会员推断攻击。进行了广泛的实验,以严格评估不同修剪方法,稀疏水平和对手知识的隐私影响。拟议的攻击表明,与现有的八次成员推理攻击相比,对修剪模型的攻击性能更高。此外,我们提出了一种新的防御机制,通过基于KL-Divergence距离来缓解预测差异,以保护修剪过程,该距离的预测差异已通过实验证明,可以有效地降低隐私风险,同时维持较修剪模型的稀疏性和准确性。
translated by 谷歌翻译
机器学习模型容易记住敏感数据,使它们容易受到会员推理攻击的攻击,其中对手的目的是推断是否使用输入样本来训练模型。在过去的几年中,研究人员产生了许多会员推理攻击和防御。但是,这些攻击和防御采用各种策略,并在不同的模型和数据集中进行。但是,缺乏全面的基准意味着我们不了解现有攻击和防御的优势和劣势。我们通过对不同的会员推理攻击和防御措施进行大规模测量来填补这一空白。我们通过研究九项攻击和六项防御措施来系统化成员的推断,并在整体评估中衡量不同攻击和防御的性能。然后,我们量化威胁模型对这些攻击结果的影响。我们发现,威胁模型的某些假设,例如相同架构和阴影和目标模型之间的相同分布是不必要的。我们也是第一个对从Internet收集的现实世界数据而不是实验室数据集进行攻击的人。我们进一步研究是什么决定了会员推理攻击的表现,并揭示了通常认为过度拟合水平不足以成功攻击。取而代之的是,成员和非成员样本之间的熵/横向熵的詹森 - 香农距离与攻击性能的相关性更好。这为我们提供了一种新的方法,可以在不进行攻击的情况下准确预测会员推理风险。最后,我们发现数据增强在更大程度上降低了现有攻击的性能,我们提出了使用增强作用的自适应攻击来训练阴影和攻击模型,以改善攻击性能。
translated by 谷歌翻译
对机器学习模型的会员推理攻击(MIA)可能会导致模型培训中使用的培训数据集的严重隐私风险。在本文中,我们提出了一种针对成员推理攻击(MIAS)的新颖有效的神经元引导的防御方法。我们确定了针对MIA的现有防御机制的关键弱点,在该机制中,他们不能同时防御两个常用的基于神经网络的MIA,表明应分别评估这两次攻击以确保防御效果。我们提出了Neuguard,这是一种新的防御方法,可以通过对象共同控制输出和内部神经元的激活,以指导训练集的模型输出和测试集的模型输出以具有近距离分布。 Neuguard由类别的差异最小化靶向限制最终输出神经元和层平衡输出控制的目标,旨在限制每一层中的内部神经元。我们评估Neuguard,并将其与最新的防御能力与两个基于神经网络的MIA,五个最强的基于度量的MIA,包括三个基准数据集中的新提出的仅标签MIA。结果表明,Neuguard通过提供大大改善的公用事业权衡权衡,一般性和间接费用来优于最先进的防御能力。
translated by 谷歌翻译
半监督学习(SSL)利用标记和未标记的数据来训练机器学习(ML)模型。最先进的SSL方法可以通过利用更少的标记数据来实现与监督学习相当的性能。但是,大多数现有作品都集中在提高SSL的性能。在这项工作中,我们通过研究SSL的培训数据隐私来采取不同的角度。具体而言,我们建议针对由SSL训练的ML模型进行的第一个基于数据增强的成员推理攻击。给定数据样本和黑框访问模型,成员推理攻击的目标是确定数据样本是否属于模型的训练数据集。我们的评估表明,拟议的攻击可以始终超过现有的成员推理攻击,并针对由SSL训练的模型实现最佳性能。此外,我们发现,SSL中会员泄漏的原因与受到监督学习中普遍认为的原因不同,即过度拟合(培训和测试准确性之间的差距)。我们观察到,SSL模型已被概括为测试数据(几乎为0个过度拟合),但“记住”训练数据通过提供更自信的预测,无论其正确性如何。我们还探索了早期停止,作为防止成员推理攻击SSL的对策。结果表明,早期停止可以减轻会员推理攻击,但由于模型的实用性降解成本。
translated by 谷歌翻译
人工智能(AI)软件中深度学习模型的规模正在迅速增加,这阻碍了对资源限制设备(例如智能手机)的大规模部署。为了减轻此问题,AI软件压缩起着至关重要的作用,旨在压缩模型大小的同时保持高性能。但是,大型模型中的固有缺陷可以由压缩后遗传。攻击者很容易利用此类缺陷,因为压缩模型通常部署在大量设备中而没有充分保护的设备中。在本文中,我们试图从安全性的合作观点来解决安全模型压缩问题。具体而言,受到软件工程中测试驱动的开发(TDD)范式的启发,我们提出了一个称为SafeCompress的测试驱动的稀疏训练框架。通过模拟攻击机制作为安全测试,SafeCompress可以在动态稀疏训练范式之后自动将大型模型压缩到一个小模型中。此外,考虑到代表性攻击,即成员推理攻击(MIA),我们开发了一种混凝土安全模型压缩机制,称为MIA-SAFECSPRASS。进行了广泛的实验,以评估用于计算机视觉和自然语言处理任务的五个数据集上的MIA量压缩。结果验证了我们方法的有效性和概括。我们还讨论了如何将SafeCompress适应除MIA以外的其他攻击,并证明了SafCompress的灵活性。
translated by 谷歌翻译
作为对培训数据隐私的长期威胁,会员推理攻击(MIA)在机器学习模型中无处不在。现有作品证明了培训的区分性与测试损失分布与模型对MIA的脆弱性之间的密切联系。在现有结果的激励下,我们提出了一个基于轻松损失的新型培训框架,并具有更可实现的学习目标,从而导致概括差距狭窄和隐私泄漏减少。 RelaseLoss适用于任何分类模型,具有易于实施和可忽略不计的开销的额外好处。通过对具有不同方式(图像,医疗数据,交易记录)的五个数据集进行广泛的评估,我们的方法始终优于针对MIA和模型效用的韧性,以最先进的防御机制优于最先进的防御机制。我们的防御是第一个可以承受广泛攻击的同时,同时保存(甚至改善)目标模型的效用。源代码可从https://github.com/dingfanchen/relaxloss获得
translated by 谷歌翻译
机器学习(ML)模型已广泛应用于各种应用,包括图像分类,文本生成,音频识别和图形数据分析。然而,最近的研究表明,ML模型容易受到隶属推导攻击(MIS),其目的是推断数据记录是否用于训练目标模型。 ML模型上的MIA可以直接导致隐私违规行为。例如,通过确定已经用于训练与某种疾病相关的模型的临床记录,攻击者可以推断临床记录的所有者具有很大的机会。近年来,MIS已被证明对各种ML模型有效,例如,分类模型和生成模型。同时,已经提出了许多防御方法来减轻米西亚。虽然ML模型上的MIAS形成了一个新的新兴和快速增长的研究区,但还没有对这一主题进行系统的调查。在本文中,我们对会员推论和防御进行了第一个全面调查。我们根据其特征提供攻击和防御的分类管理,并讨论其优点和缺点。根据本次调查中确定的限制和差距,我们指出了几个未来的未来研究方向,以激发希望遵循该地区的研究人员。这项调查不仅是研究社区的参考,而且还为该研究领域之外的研究人员带来了清晰的照片。为了进一步促进研究人员,我们创建了一个在线资源存储库,并与未来的相关作品继续更新。感兴趣的读者可以在https://github.com/hongshenghu/membership-inference-machine-learning-literature找到存储库。
translated by 谷歌翻译
Neural networks are susceptible to data inference attacks such as the membership inference attack, the adversarial model inversion attack and the attribute inference attack, where the attacker could infer useful information such as the membership, the reconstruction or the sensitive attributes of a data sample from the confidence scores predicted by the target classifier. In this paper, we propose a method, namely PURIFIER, to defend against membership inference attacks. It transforms the confidence score vectors predicted by the target classifier and makes purified confidence scores indistinguishable in individual shape, statistical distribution and prediction label between members and non-members. The experimental results show that PURIFIER helps defend membership inference attacks with high effectiveness and efficiency, outperforming previous defense methods, and also incurs negligible utility loss. Besides, our further experiments show that PURIFIER is also effective in defending adversarial model inversion attacks and attribute inference attacks. For example, the inversion error is raised about 4+ times on the Facescrub530 classifier, and the attribute inference accuracy drops significantly when PURIFIER is deployed in our experiment.
translated by 谷歌翻译
会员推理攻击是机器学习模型中最简单的隐私泄漏形式之一:给定数据点和模型,确定该点是否用于培训模型。当查询其培训数据时,现有会员推理攻击利用模型的异常置信度。如果对手访问模型的预测标签,则不会申请这些攻击,而不会置信度。在本文中,我们介绍了仅限标签的会员资格推理攻击。我们的攻击而不是依赖置信分数,而是评估模型预测标签在扰动下的稳健性,以获得细粒度的隶属信号。这些扰动包括常见的数据增强或对抗例。我们经验表明,我们的标签占会员推理攻击与先前攻击相符,以便需要访问模型信心。我们进一步证明,仅限标签攻击违反了(隐含或明确)依赖于我们呼叫信心屏蔽的现象的员工推论攻击的多种防御。这些防御修改了模型的置信度分数以挫败攻击,但留下模型的预测标签不变。我们的标签攻击展示了置信性掩蔽不是抵御会员推理的可行的防御策略。最后,我们调查唯一的案例标签攻击,该攻击推断为少量异常值数据点。我们显示仅标签攻击也匹配此设置中基于置信的攻击。我们发现具有差异隐私和(强)L2正则化的培训模型是唯一已知的防御策略,成功地防止所有攻击。即使差异隐私预算太高而无法提供有意义的可证明担保,这仍然存在。
translated by 谷歌翻译
Deep ensemble learning has been shown to improve accuracy by training multiple neural networks and averaging their outputs. Ensemble learning has also been suggested to defend against membership inference attacks that undermine privacy. In this paper, we empirically demonstrate a trade-off between these two goals, namely accuracy and privacy (in terms of membership inference attacks), in deep ensembles. Using a wide range of datasets and model architectures, we show that the effectiveness of membership inference attacks increases when ensembling improves accuracy. We analyze the impact of various factors in deep ensembles and demonstrate the root cause of the trade-off. Then, we evaluate common defenses against membership inference attacks based on regularization and differential privacy. We show that while these defenses can mitigate the effectiveness of membership inference attacks, they simultaneously degrade ensemble accuracy. We illustrate similar trade-off in more advanced and state-of-the-art ensembling techniques, such as snapshot ensembles and diversified ensemble networks. Finally, we propose a simple yet effective defense for deep ensembles to break the trade-off and, consequently, improve the accuracy and privacy, simultaneously.
translated by 谷歌翻译
依赖于并非所有输入都需要相同数量的计算来产生自信的预测的事实,多EXIT网络正在引起人们的注意,这是推动有效部署限制的重要方法。多EXIT网络赋予了具有早期退出的骨干模型,从而可以在模型的中间层获得预测,从而节省计算时间和/或能量。但是,当前的多种exit网络的各种设计仅被认为是为了实现资源使用效率和预测准确性之间的最佳权衡,从未探索过来自它们的隐私风险。这促使需要全面调查多EXIT网络中的隐私风险。在本文中,我们通过会员泄漏的镜头对多EXIT网络进行了首次隐私分析。特别是,我们首先利用现有的攻击方法来量化多exit网络对成员泄漏的脆弱性。我们的实验结果表明,多EXIT网络不太容易受到会员泄漏的影响,而在骨干模型上附加的退出(数字和深度)与攻击性能高度相关。此外,我们提出了一种混合攻击,该攻击利用退出信息以提高现有攻击的性能。我们评估了由三种不同的对手设置下的混合攻击造成的成员泄漏威胁,最终到达了无模型和无数据的对手。这些结果清楚地表明,我们的混合攻击非常广泛地适用,因此,相应的风险比现有的会员推理攻击所显示的要严重得多。我们进一步提出了一种专门针对多EXIT网络的TimeGuard的防御机制,并表明TimeGuard完美地减轻了新提出的攻击。
translated by 谷歌翻译
A distribution inference attack aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, but the factors that impact distribution inference risk are not well understood and demonstrated attacks often rely on strong and unrealistic assumptions such as full knowledge of training environments even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that even outperforms the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary's knowledge under black-box access, like known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Code is available at https://github.com/iamgroot42/dissecting_distribution_inference
translated by 谷歌翻译
Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex and often require high computational resources or may also jeopardize models' utility. In this work, we show that fine-tuning, one of the most common and easy-to-adopt machine learning training operations, can effectively remove backdoors from machine learning models while maintaining high model utility. Extensive experiments over three machine learning paradigms show that fine-tuning and our newly proposed super-fine-tuning achieve strong defense performance. Furthermore, we coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed. Empirical evaluation shows that, compared to other defense methods, super-fine-tuning leaves limited backdoor sequela. We hope our results can help machine learning model owners better protect their models from backdoor threats. Also, it calls for the design of more advanced attacks in order to comprehensively assess machine learning models' backdoor vulnerabilities.
translated by 谷歌翻译
机器学习模型容易受到会员推理攻击的影响,在这种攻击中,对手的目的是预测目标模型培训数据集中是否包含特定样本。现有的攻击方法通常仅从给定的目标模型中利用输出信息(主要是损失)。结果,在成员和非成员样本都产生类似小损失的实际情况下,这些方法自然无法区分它们。为了解决这一限制,在本文中,我们提出了一种称为\系统的新攻击方法,该方法可以利用目标模型的整个培训过程中的成员资格信息来改善攻击性能。要将攻击安装在共同的黑盒环境中,我们利用知识蒸馏,并通过在不同蒸馏时期的中间模型中评估的损失表示成员资格信息,即\ emph {蒸馏损失轨迹},以及损失来自给定的目标模型。对不同数据集和模型体系结构的实验结果证明了我们在不同指标方面的攻击优势。例如,在Cinic-10上,我们的攻击至少达到6 $ \ times $ $阳性的速率,低阳性率为0.1 \%的速率比现有方法高。进一步的分析表明,在更严格的情况下,我们攻击的总体有效性。
translated by 谷歌翻译
关于神经体系结构搜索(NAS)的现有研究主要集中于有效地搜索具有更好性能的网络体系结构。几乎没有取得进展,以系统地了解NAS搜索的架构是否对隐私攻击是强大的,而丰富的工作已经表明,人类设计的架构容易受到隐私攻击。在本文中,我们填补了这一空白,并系统地衡量了NAS体系结构的隐私风险。利用我们的测量研究中的见解,我们进一步探索了基于细胞的NAS架构的细胞模式,并评估细胞模式如何影响NAS搜索架构的隐私风险。通过广泛的实验,我们阐明了如何针对隐私攻击设计强大的NAS体系结构,还提供了一种通用方法,以了解NAS搜索的体系结构与其他隐私风险之间的隐藏相关性。
translated by 谷歌翻译
Machine learning models leak information about the datasets on which they are trained. An adversary can build an algorithm to trace the individual members of a model's training dataset. As a fundamental inference attack, he aims to distinguish between data points that were part of the model's training set and any other data points from the same distribution. This is known as the tracing (and also membership inference) attack. In this paper, we focus on such attacks against black-box models, where the adversary can only observe the output of the model, but not its parameters. This is the current setting of machine learning as a service in the Internet.We introduce a privacy mechanism to train machine learning models that provably achieve membership privacy: the model's predictions on its training data are indistinguishable from its predictions on other data points from the same distribution. We design a strategic mechanism where the privacy mechanism anticipates the membership inference attacks. The objective is to train a model such that not only does it have the minimum prediction error (high utility), but also it is the most robust model against its corresponding strongest inference attack (high privacy). We formalize this as a min-max game optimization problem, and design an adversarial training algorithm that minimizes the classification loss of the model as well as the maximum gain of the membership inference attack against it. This strategy, which guarantees membership privacy (as prediction indistinguishability), acts also as a strong regularizer and significantly generalizes the model.We evaluate our privacy mechanism on deep neural networks using different benchmark datasets. We show that our min-max strategy can mitigate the risk of membership inference attacks (close to the random guess) with a negligible cost in terms of the classification error.
translated by 谷歌翻译
会员推理攻击(MIA)在机器学习模型的培训数据上提出隐私风险。使用MIA,如果目标数据是训练数据集的成员,则攻击者猜测。对MIAS的最先进的防御,蒸馏为会员隐私(DMP),不仅需要私人数据来保护但是大量未标记的公共数据。但是,在某些隐私敏感域名,如医疗和财务,公共数据的可用性并不明显。此外,通过使用生成的对策网络生成公共数据的琐碎方法显着降低了DMP的作者报道的模型精度。为了克服这个问题,我们在不需要公共数据的情况下,使用知识蒸馏提出对米西亚的小说防御。我们的实验表明,我们防御的隐私保护和准确性与MIA研究中使用的基准表格数据集的DMP相媲美,我们的国防有更好的隐私式权限远非现有防御不使用图像数据集CIFAR10的公共数据。
translated by 谷歌翻译
In a membership inference attack, an attacker aims to infer whether a data sample is in a target classifier's training dataset or not. Specifically, given a black-box access to the target classifier, the attacker trains a binary classifier, which takes a data sample's confidence score vector predicted by the target classifier as an input and predicts the data sample to be a member or non-member of the target classifier's training dataset. Membership inference attacks pose severe privacy and security threats to the training dataset. Most existing defenses leverage differential privacy when training the target classifier or regularize the training process of the target classifier. These defenses suffer from two key limitations: 1) they do not have formal utility-loss guarantees of the confidence score vectors, and 2) they achieve suboptimal privacy-utility tradeoffs.In this work, we propose MemGuard, the first defense with formal utility-loss guarantees against black-box membership inference attacks. Instead of tampering the training process of the target classifier, MemGuard adds noise to each confidence score vector predicted by the target classifier. Our key observation is that attacker uses a classifier to predict member or non-member and classifier is vulnerable to adversarial examples. Based on the observation, we propose to add a carefully crafted noise vector to a confidence score vector to turn it into an adversarial example that misleads the attacker's classifier. Specifically, MemGuard works in two phases. In Phase I, MemGuard finds a carefully crafted noise vector that can turn a confidence score vector into an adversarial example, which is likely to mislead the attacker's classifier to make a random guessing at member or non-member. We find such carefully crafted noise vector via a new method that we design to incorporate the unique utility-loss constraints on the noise vector. In Phase II, Mem-Guard adds the noise vector to the confidence score vector with a certain probability, which is selected to satisfy a given utility-loss budget on the confidence score vector. Our experimental results on
translated by 谷歌翻译
嵌入式系统使用神经网络(NNS)的设备对数据处理数据(NNS)的处理,同时符合存储器,功率和计算约束,导致效率和准确性折衷。要将NNS带到边缘设备,通过修剪,量化和现成的架构进行了多种优化,例如具有高效设计的模型压缩,已被广泛采用。这些算法部署到现实世界敏感应用程序时,需要抵制推理攻击以保护用户培训数据的隐私。然而,对推理攻击的阻力不算用于为IOT设计NN模型。在这项工作中,我们分析了IOT设备NNS中的三维隐私 - 准确效率折衷,并提出了壁虎培训方法,在那里我们明确地将抵抗私人推广作为设计目标。我们优化嵌入式设备的推理时间内存,计算和功率约束作为设计NN体系结构的标准,同时还保留隐私。我们选择量化为高效和私人模型的设计选择。这种选择是由观察到的观察,压缩模型与基线模型相比泄漏更多信息,而现成的高效架构表明效率和隐私权衡差。我们展示使用壁虎方法训练的模型与在提供效率的准确性和隐私方面的对黑匣子成员攻击的事先防御。
translated by 谷歌翻译
Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge.We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing stateof-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
translated by 谷歌翻译