Vertical federated learning (VFL) is a trending solution for multi-party collaboration in training machine learning models. Industrial frameworks adopt secure multi-party computation methods such as homomorphic encryption to guarantee data security and privacy. However, a line of work has revealed that VFL still carries leakage risks. The leakage is caused by the correlation between the intermediate representations and the raw data. Owing to the powerful approximation ability of deep neural networks, an adversary can capture this correlation precisely and reconstruct the data. To counter the threat of data reconstruction attacks, we propose a hashing-based VFL framework, called \textit{HashVFL}, that cuts off the reversibility directly. The one-way nature of hashing allows our framework to block all attempts to recover data from hash codes. However, integrating hashing also brings challenges, e.g., information loss. This paper identifies and addresses three challenges in integrating hashing: learnability, bit balance, and consistency. Experimental results demonstrate that \textit{HashVFL} preserves the main task's performance while defending against data reconstruction attacks. Furthermore, we analyze its potential value in detecting abnormal inputs. In addition, we conduct extensive experiments to demonstrate \textit{HashVFL}'s generalization across various settings. In summary, \textit{HashVFL} provides a new perspective on protecting the data security and privacy of multiple parties in VFL. We hope our study attracts more researchers to expand the application domains of \textit{HashVFL}.
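To make the one-way argument concrete, here is a minimal Python sketch. It assumes the hash is a sign-based binarization of a party's intermediate representation, a common choice in deep hashing but not necessarily HashVFL's exact layer; the function names are hypothetical. Because the sign function discards magnitudes, many different embeddings collapse onto the same code. The same property is what makes learnability hard (sign has zero gradient almost everywhere), and bit balance asks each bit to be +1 for roughly half of the samples, i.e. a per-bit mean near zero.

```python
# Toy illustration (not HashVFL's actual architecture): binarize a bottom
# model's real-valued output into a {-1, +1} hash code with the sign function.

def hash_code(embedding):
    """Return a {-1, +1} code; zero is mapped to +1 by convention."""
    return [1 if v >= 0 else -1 for v in embedding]


def bit_balance(codes):
    """Per-bit mean over a batch of codes; 0.0 means the bit is perfectly balanced."""
    n = len(codes)
    return [sum(code[j] for code in codes) / n for j in range(len(codes[0]))]


def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Different embeddings collapse onto the same code, so the code alone
# does not determine the embedding, let alone the raw input.
e1 = [0.3, -2.0, 5.1, -0.4]
e2 = [0.9, -0.1, 0.2, -7.5]
print(hash_code(e1), hash_code(e2))  # both [1, -1, 1, -1]
print(bit_balance([hash_code(e1), [-1, -1, 1, 1]]))  # [0.0, -1.0, 1.0, 0.0]
```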
Vertical federated learning (VFL) is an emerging paradigm that enables collaborators to build machine learning models together in a distributed fashion. In general, these parties share a common set of users but own different features. Existing VFL frameworks use cryptographic techniques to provide data privacy and security guarantees, which has led to a line of work on computing efficiency and fast implementation. However, the security of VFL's model remains underexplored.
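In collaborative learning settings such as federated learning, curious parties may be honest but attempt to infer other parties' private data through inference attacks, while malicious parties may manipulate the learning process through backdoor attacks. However, most existing works only consider the federated learning scenario in which data are partitioned by samples (HFL). Feature-partitioned federated learning (VFL) can be another important scenario in many real-world applications. Attacks and defenses in this scenario are especially challenging when the attackers and defenders cannot access the features or model parameters of other participants. Previous works have only shown that private labels can be reconstructed from per-sample gradients. In this paper, we first show that private labels can be reconstructed even when only batch-averaged gradients are revealed, contrary to common presumption. In addition, we show that a passive party in VFL can even replace its corresponding labels in the active party with a target label through a gradient-replacement attack. To defend against the first attack, we introduce a novel technique termed confusional autoencoder (CoAE), based on an autoencoder and entropy regularization. We demonstrate that this technique can successfully block label inference attacks while hurting main-task accuracy less than existing methods. Our CoAE technique is also effective in defending against the gradient-replacement backdoor attack, making it a universal and practical defense strategy that requires no change to the original VFL protocol. We demonstrate the effectiveness of our approaches under both two-party and multi-party VFL settings. To the best of our knowledge, this is the first systematic study to deal with label inference and backdoor attacks in the feature-partitioned federated learning framework.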
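The per-sample leakage that prior work relied on can be sketched in a few lines of Python, assuming a softmax cross-entropy top model and an attacker who sees the gradient with respect to the logits of a single sample. The gradient is softmax(z) minus the one-hot label, so only the true class has a negative entry. This is an illustrative toy with hypothetical names; it does not reproduce the paper's batch-averaged attack or CoAE.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_gradient(logits, label):
    """Gradient of softmax cross-entropy w.r.t. the logits: softmax(z) - one_hot(y)."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def infer_label(grad):
    """Every entry is p_i > 0 except the true class, where it is p_y - 1 < 0."""
    return min(range(len(grad)), key=lambda i: grad[i])


g = logit_gradient([2.0, 0.5, -1.0], label=1)
print(infer_label(g))  # 1
```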
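Byzantine-robust federated learning (FL) aims to counter malicious clients and train an accurate global model while keeping the attack success rate extremely low. However, most existing systems are only robust in honest/semi-honest majority settings. FLTrust (NDSS '21) extends the context to a malicious majority of clients, but requires the server to hold an auxiliary dataset before training in order to filter malicious inputs. Private FLAME/FLGuard (USENIX '22) provides a solution that guarantees both robustness and update confidentiality in a semi-honest majority context. So far, it has not been possible to balance the trade-off among a malicious context, robustness, and update confidentiality. To address this problem, we propose a novel Byzantine-robust and privacy-preserving FL system, called BRIEF, that handles a malicious minority or majority on both the server and client sides. Specifically, based on the DBSCAN algorithm, we design a new method for clustering via pairwise adjusted cosine similarity to improve the accuracy of the clustering results. To thwart attacks from a malicious majority, we develop an algorithm called Model Segmentation, in which local updates within the same cluster are aggregated together and each aggregate is sent back only to the corresponding clients. We also leverage multiple cryptographic tools to perform the clustering tasks without sacrificing training correctness or update confidentiality. We present a detailed security proof and empirical evaluation, along with a brief convergence analysis. Experimental results show that the test accuracy of BRIEF is practically close to the FL baseline (a 0.8% gap on average). Meanwhile, the attack success rate is around 0%-5%. We further optimize our design so that the communication overhead and runtime are reduced by 67%-89.17% and 66.05%-68.75%, respectively.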
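BRIEF's exact similarity measure and its cryptographic evaluation are not specified here, so the following plaintext Python sketch makes two assumptions: "adjusted" cosine similarity means cosine similarity after subtracting the mean client update, and clustering uses a minimal hand-written DBSCAN on the distance 1 - similarity. The function names, `eps`, and `min_pts` are illustrative. Model Segmentation would then aggregate each resulting cluster separately.

```python
import math


def center(updates):
    """Subtract the coordinate-wise mean update (our assumed meaning of 'adjusted')."""
    n, d = len(updates), len(updates[0])
    mean = [sum(u[k] for u in updates) / n for k in range(d)]
    return [[u[k] - mean[k] for k in range(d)] for u in updates]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def distance_matrix(updates):
    c = center(updates)
    return [[1.0 - cosine(a, b) for b in c] for a in c]


def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join the cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


updates = [
    [1.0, 1.0, 0.0], [1.1, 0.9, 0.0], [0.9, 1.1, 0.0],  # benign-looking
    [-1.0, -1.0, 5.0], [-1.1, -0.9, 5.0],              # poisoned-looking
]
print(dbscan(distance_matrix(updates), eps=0.1, min_pts=2))  # [0, 0, 0, 1, 1]
```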
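Empirical attacks on collaborative learning show that the gradients of deep neural networks can not only disclose private latent attributes of the training data but also be used to reconstruct the original data. While prior works have tried to quantify the privacy risk stemming from gradients, these measures do not establish a theoretically grounded understanding of gradient leakage, do not generalize across attackers, and cannot fully explain what is observed through empirical attacks in practice. In this paper, we introduce theoretically motivated measures to quantify information leakage in both attack-dependent and attack-independent manners. Specifically, we present an adaptation of $\mathcal{V}$-information, which generalizes the empirical attack success rate and allows quantifying the amount of information that can leak to any chosen family of attack models. We then propose attack-independent measures, which only require the shared gradients, for quantifying both original and latent information leakage. Our empirical results, on six datasets and four popular models, reveal that the gradients of the first layers contain the highest amount of original information, while the (fully connected) classifier layers placed after the (convolutional) feature extractor layers contain the highest latent information. Further, we show how techniques such as gradient aggregation during training can mitigate information leakage. Our work paves the way for better defenses such as layer-based protection or strong aggregation.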
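Secure multi-party computation-based machine learning, referred to as MPL, has become an important technology for utilizing data from multiple parties with privacy preservation. While MPL provides rigorous security guarantees for the computation process, models trained by MPL are still vulnerable to attacks that rely solely on access to the models. Differential privacy can help defend against such attacks. However, the accuracy loss brought by differential privacy and the huge communication overhead of secure multi-party computation protocols make it highly challenging to balance the three-way trade-off among privacy, efficiency, and accuracy. In this paper, we address this problem by proposing a solution, referred to as PEA (Private, Efficient, Accurate), which consists of a secure DPSGD protocol and two optimization methods. First, we propose a secure DPSGD protocol to enforce DPSGD in secret sharing-based MPL frameworks. Second, to reduce the accuracy loss caused by differential privacy noise and the huge communication overhead of MPL, we propose two optimization methods for the MPL training process: (1) a data-independent feature extraction method, which aims to simplify the trained model structure; and (2) a local data-based global model initialization method, which aims to speed up the convergence of model training. We implement PEA in two open-source MPL frameworks: TF-Encrypted and Queqiao. Experimental results on various datasets demonstrate the efficiency and effectiveness of PEA. For example, when $\epsilon = 2$, we can train a differentially private classification model for CIFAR-10 with 88% accuracy within 7 minutes under the LAN setting. This result significantly outperforms CryptGPU, a state-of-the-art MPL framework, which needs more than 16 hours to train a non-private deep neural network model on CIFAR-10 to the same accuracy.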
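For reference, one step of standard DPSGD (per-sample clipping, summing, Gaussian noise, averaging) looks like this in plaintext Python. This is a sketch of the textbook algorithm, not PEA's protocol: PEA enforces the step inside a secret-sharing MPL framework, which is not modeled here, and the mapping from `noise_multiplier` to a concrete $\epsilon$ requires a privacy accountant that is also omitted. Parameter names are illustrative.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-sample gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(params, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One DPSGD update: clip each sample, sum, add Gaussian noise, average."""
    batch = len(per_sample_grads)
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm  # noise std is calibrated to the clipping norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.5]]  # the first gradient has norm 5 and is clipped to norm 1
print(dp_sgd_step([1.0, 1.0], grads, lr=0.1, max_norm=1.0,
                  noise_multiplier=1.1, rng=rng))
```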
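In federated learning (FL), data does not leave personal devices when they jointly train a machine learning model. Instead, these devices share gradients with a central party (e.g., a company). Because data never "leaves" personal devices, FL is presented as privacy-preserving. Yet it was recently shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct the data of individual users. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model's weights before users compute model gradients. We call the modified weights "trap weights". Our active attacker is able to recover user data perfectly and at near-zero cost: the attack requires no complex optimization objectives. Instead, it exploits the inherent data leakage from model gradients and amplifies this effect by maliciously altering the weights of the shared model. These specificities enable our attack to scale to models trained with large mini-batches of data. Where attackers from prior work require hours to recover a single data point, our method needs milliseconds to capture the full mini-batch of data from both fully connected and convolutional deep neural networks. Finally, we consider mitigations. We observe that current implementations of differential privacy (DP) in FL are flawed, as they explicitly trust the central party with the crucial task of adding DP noise, and thus provide no protection against a malicious central party. We also consider other defenses and explain why they are similarly inadequate. A significant redesign of FL is required for it to provide any meaningful form of data privacy to users.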
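The "inherent data leakage" mentioned above is easiest to see for a fully connected layer $y = Wx + b$: the weight gradient of row $i$ is the bias gradient of row $i$ times the input, so dividing one by the other returns the input whenever a single sample contributed to that row. The Python sketch below shows only this division step. Arranging that each neuron is activated by just one sample of the mini-batch is, roughly, what the malicious weights aim for, but the trap-weight construction itself is not shown; the upstream gradients are chosen by hand and the function names are hypothetical.

```python
def linear_layer_grads(xs, upstream):
    """Batch-averaged gradients of y = W x + b.

    xs[n] is the n-th input and upstream[n][i] is dL/dy_i for sample n, so
    dW[i][j] = mean_n upstream[n][i] * xs[n][j] and db[i] = mean_n upstream[n][i].
    """
    batch, out_dim, in_dim = len(xs), len(upstream[0]), len(xs[0])
    dW = [[sum(upstream[n][i] * xs[n][j] for n in range(batch)) / batch
           for j in range(in_dim)] for i in range(out_dim)]
    db = [sum(upstream[n][i] for n in range(batch)) / batch for i in range(out_dim)]
    return dW, db


def recover_inputs(dW, db, tol=1e-12):
    """Divide each weight-gradient row by its bias gradient.

    The result is an exact input only for rows that exactly one sample
    contributed to; otherwise it is a weighted mix of several inputs.
    """
    return [[w / db[i] for w in dW[i]] for i in range(len(db)) if abs(db[i]) > tol]


xs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# Row 0 receives gradient only from sample 0 and row 1 only from sample 1
# (e.g. because the other sample's ReLU is inactive for that neuron).
upstream = [[0.5, 0.0], [0.0, -2.0]]
print(recover_inputs(*linear_layer_grads(xs, upstream)))
# [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```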
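Federated learning enables multiple users to build a joint model by sharing their model updates (gradients), while their raw data remains local on their devices. Contrary to the common belief that this provides privacy benefits, we add to recent results on the privacy risks of sharing gradients. Specifically, we investigate Label Leakage from Gradients (LLG), a novel attack that extracts the labels of users' training data from their shared gradients. The attack exploits the direction and magnitude of gradients to determine the presence or absence of any label. LLG is simple yet effective, capable of leaking potentially sensitive information represented by labels, and scales to arbitrary batch sizes and multiple classes. We demonstrate the validity of the attack under different settings both mathematically and empirically. Moreover, empirical results show that LLG successfully extracts labels with high accuracy in the early stages of model training. We also discuss different defense mechanisms against such leakage. Our findings suggest that gradient compression is a practical technique to mitigate the attack.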
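A simplified version of the direction signal can be sketched in Python. Assuming a softmax cross-entropy output layer $\text{logits} = Wh$ with non-negative inputs $h$ (e.g. after ReLU), the gradient row for a class absent from the batch is a non-negative combination of the $h$ vectors, so a negative row sum is a heuristic hint that the class is present. The paper's attack also uses gradient magnitudes, which this sketch ignores; the code and its names are our illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer_grad(W, hidden, labels):
    """Batch-averaged dL/dW for logits = W h with softmax cross-entropy.

    Row c equals mean_n (p_{n,c} - y_{n,c}) * h_n.
    """
    n_classes, dim = len(W), len(W[0])
    grad = [[0.0] * dim for _ in range(n_classes)]
    for h, y in zip(hidden, labels):
        logits = [sum(w * x for w, x in zip(row, h)) for row in W]
        probs = softmax(logits)
        for c in range(n_classes):
            coeff = probs[c] - (1.0 if c == y else 0.0)
            for k in range(dim):
                grad[c][k] += coeff * h[k] / len(hidden)
    return grad


def infer_labels(grad):
    """Flag classes whose gradient row sums to a negative value (heuristic)."""
    return [c for c, row in enumerate(grad) if sum(row) < 0]


W = [[0.0, 0.0] for _ in range(3)]   # untrained output layer, 3 classes
hidden = [[1.0, 0.5], [0.2, 1.0]]    # non-negative penultimate activations
print(infer_labels(output_layer_grad(W, hidden, labels=[0, 2])))  # [0, 2]
```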
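Federated learning (FL) has emerged as a practical solution to the data silo problem without compromising user privacy. One of its variants, vertical federated learning (VFL), has recently gained increasing attention because it matches enterprises' demand to leverage more valuable features to build better machine learning models while preserving user privacy. Current work in VFL concentrates on developing a specific protection or attack mechanism for a particular VFL algorithm. In this work, we propose an evaluation framework that formulates the privacy-utility evaluation problem. We then use this framework as a guide to comprehensively evaluate a broad range of protection mechanisms against most of the state-of-the-art privacy attacks for three widely deployed VFL algorithms. These evaluations can help FL practitioners select appropriate protection mechanisms for specific requirements. Our evaluation results show that the model inversion attack and most label inference attacks can be thwarted by existing protection mechanisms, whereas the model completion (MC) attack is difficult to prevent and calls for more advanced MC-targeted protection mechanisms. Based on our evaluation results, we offer concrete advice on improving the privacy-preserving capability of VFL systems.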
Deep learning (DL) methods have been widely applied to anomaly-based network intrusion detection systems (NIDSs) to detect malicious traffic. To expand the usage scenarios of DL-based methods, the federated learning (FL) framework allows multiple users to train a global model while respecting individual data privacy. However, it has not yet been systematically evaluated how robust FL-based NIDSs are against existing privacy attacks under existing defenses. To address this issue, we propose two privacy evaluation metrics designed for FL-based NIDSs: (1) a privacy score that evaluates the similarity between the original and recovered traffic features using reconstruction attacks, and (2) the evasion rate against NIDSs of a Generative Adversarial Network-based adversarial attack built on the reconstructed benign traffic. Our experiments show that existing defenses provide so little protection that the corresponding adversarial traffic can even evade the SOTA NIDS Kitsune. To defend against such attacks and build a more robust FL-based NIDS, we further propose FedDef, a novel optimization-based input perturbation defense strategy with a theoretical guarantee. It achieves both high utility by minimizing the gradient distance and strong privacy protection by maximizing the input distance. We experimentally evaluate four existing defenses on four datasets and show that our defense outperforms all the baselines in terms of privacy protection, with up to 7 times higher privacy score, while keeping the model accuracy loss within 3% under the optimal parameter combination.
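Federated learning allows a set of users to train a deep neural network over their private training datasets. During the protocol, datasets never leave the devices of the respective users. This is achieved by requiring each user to send "only" model updates to a central server that, in turn, aggregates them to update the parameters of the deep neural network. However, it has been shown that each model update carries sensitive information about the user's dataset (e.g., gradient inversion attacks). State-of-the-art implementations of federated learning protect these model updates by leveraging secure aggregation: a secure multi-party computation protocol that securely computes the aggregate of the users' model updates. Secure aggregation is pivotal to protecting users' privacy, since it prevents the server from learning the individual model updates provided by users and their sources, thwarting inference and data attribution attacks. In this work, we show that a malicious server can easily elude secure aggregation, as if the latter were not in place. We devise two different attacks capable of inferring information on individual private training datasets, independently of the number of users participating in secure aggregation. This makes them concrete threats in large-scale, real-world federated learning applications. The attacks are generic and do not target any specific secure aggregation protocol. They are equally effective even if the secure aggregation protocol is replaced by its ideal functionality, which provides perfect security. Our work shows that, as currently combined with federated learning, secure aggregation offers only a "false sense of security".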
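The following Python sketch shows the cancellation at the heart of pairwise-masking secure aggregation, plus a schematic of why it stops helping once the server can make users train different models. Assumptions: masks come from a seeded RNG instead of key agreement, nobody drops out, and the crafted all-zero updates merely stand in for whatever the server's inconsistent models would induce. The paper's two concrete attacks are not reproduced, and the function names are hypothetical.

```python
import random


def mask_updates(updates, seed=0):
    """Pairwise additive masking: user i adds r_ij and user j subtracts it (i < j).

    Real protocols derive r_ij from key agreement; a shared seeded RNG stands in here.
    """
    rng = random.Random(seed)
    n, d = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            r = [rng.uniform(-100.0, 100.0) for _ in range(d)]
            for k in range(d):
                masked[i][k] += r[k]
                masked[j][k] -= r[k]
    return masked


def aggregate(vectors):
    """What the server learns: only the coordinate-wise sum."""
    return [sum(col) for col in zip(*vectors)]


updates = [[0.1, 0.2], [0.3, -0.4], [-0.2, 0.5]]
print(aggregate(mask_updates(updates)))  # ~[0.2, 0.3] up to float rounding

# Model inconsistency, schematically: if every non-target user receives a model
# that yields an all-zero update, the "secure" sum equals the target's update.
crafted = [[0.0, 0.0], [1.5, -2.0], [0.0, 0.0]]
print(aggregate(mask_updates(crafted)))  # ~[1.5, -2.0]
```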
Differentially private federated learning (DP-FL) has received increasing attention as a way to mitigate the privacy risk in federated learning. Although different schemes for DP-FL have been proposed, there is still a utility gap. Employing central Differential Privacy in FL (CDP-FL) can provide a good balance between privacy and model utility, but requires a trusted server. Using Local Differential Privacy for FL (LDP-FL) does not require a trusted server, but suffers from a poor privacy-utility trade-off. Recently proposed shuffle-DP-based FL has the potential to bridge the gap between CDP-FL and LDP-FL without a trusted server; however, a utility gap remains when the number of model parameters is large. In this work, we propose OLIVE, a system that combines the merits of CDP-FL and LDP-FL by leveraging a Trusted Execution Environment (TEE). Our main technical contributions are the analysis of, and countermeasures against, the vulnerability of the TEE in OLIVE. First, we theoretically analyze the memory access pattern leakage of OLIVE and find that there is a risk for sparsified gradients, which are common in FL. Second, we design an inference attack to understand how the memory access pattern could be linked to the training data. Third, we propose oblivious yet efficient algorithms to prevent memory access pattern leakage in OLIVE. Our experiments on real-world data demonstrate that OLIVE is efficient even when training a model with hundreds of thousands of parameters and effective against side-channel attacks on the TEE.
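Split learning (SL) enables data privacy preservation by allowing clients to collaboratively train a deep learning model without sharing raw data. However, SL still has limitations, such as potential data privacy leakage and high computation at clients. In this study, we propose to binarize the SL local layers for faster computation (17.5 times less forward-propagation time in both the training and inference phases on mobile devices) and reduced memory usage (up to 32 times less memory and bandwidth). More importantly, the binarized SL (B-SL) model can reduce privacy leakage from SL smashed data with only a small degradation in model accuracy. To further enhance privacy preservation, we also propose two novel approaches: 1) training with an additional local leak loss and 2) applying differential privacy, which can be integrated into the B-SL model separately or concurrently. Experimental results on different datasets affirm the advantages of the B-SL models over several benchmark models. The effectiveness of B-SL models against the feature-space hijacking attack (FSHA) is also illustrated. Our results demonstrate that B-SL models are promising for lightweight IoT/mobile applications with high privacy-preservation requirements, such as mobile healthcare applications.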
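The 32x memory figure follows from replacing 32-bit floats with 1-bit signs. Below is a Python sketch of one common weight-binarization scheme (XNOR-Net-style sign plus a per-layer scale equal to the mean absolute weight). The abstract does not say which scheme B-SL uses, whether activations are binarized as well, or how training handles the non-differentiable sign, so all of that is assumed; the function names are hypothetical.

```python
def binarize(weights):
    """XNOR-Net-style binarization: w is approximated by alpha * sign(w)."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # scale = mean |w|
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def pack_bits(signs):
    """Store one bit per weight (1 for +1, 0 for -1), eight weights per byte."""
    packed = bytearray((len(signs) + 7) // 8)
    for idx, s in enumerate(signs):
        if s > 0:
            packed[idx // 8] |= 1 << (idx % 8)
    return bytes(packed)


def dequantize(alpha, signs):
    """Reconstruct the approximate real-valued weights used in the forward pass."""
    return [alpha * s for s in signs]


alpha, signs = binarize([0.5, -1.5, 1.0, -1.0])
print(alpha, signs)  # 1.0 [1, -1, 1, -1]
print(len(pack_bits([1] * 32)), "bytes vs", 32 * 4, "bytes as float32")  # 4 vs 128
```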
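In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and exploiting the predictions obtained. To prevent model stealing, existing defenses focus on detecting malicious queries and truncating or distorting outputs, thus necessarily introducing a trade-off between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof-of-work before they can read the model's predictions. This deters attackers by greatly increasing (even up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof-of-work to each query, this introduces only a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.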
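A hashcash-style proof-of-work illustrates the mechanism: the client must find a nonce whose SHA-256 digest starts with a given number of zero bits, and each extra bit doubles the expected work. In the paper, the difficulty is calibrated from a differential-privacy-based measure of what a query reveals; here `info_leak` is a hypothetical placeholder score, and the mapping in `calibrated_difficulty` (base, scale, cap) is made up purely for illustration.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Find a nonce whose SHA-256 hash has >= difficulty leading zero bits (~2**difficulty tries)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap server-side check: a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def calibrated_difficulty(info_leak: float, base: int = 4, scale: float = 8.0, cap: int = 20) -> int:
    """Hypothetical calibration: queries that reveal more information cost more work."""
    return min(cap, base + int(scale * info_leak))


difficulty = calibrated_difficulty(0.5)  # 4 + int(4.0) = 8 leading zero bits
nonce = solve(b"query-123", difficulty)
print(difficulty, verify(b"query-123", nonce, difficulty))  # 8 True
```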
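In federated learning (FL), a set of participants share updates computed on their local data with an aggregator server that combines the updates into a global model. However, reconciling accuracy with privacy and security is a challenge for FL. On the one hand, good updates sent by honest participants may reveal their private local information, whereas poisoned updates sent by malicious participants may compromise the model's availability and/or integrity. On the other hand, enhancing privacy via update distortion damages accuracy, whereas doing so via update aggregation damages security, because it does not allow the server to filter out individual poisoned updates. To tackle the accuracy-privacy-security conflict, we propose fragmented federated learning (FFL), in which participants randomly exchange and mix fragments of their updates before sending them to the server. To achieve privacy, we design a lightweight protocol that allows participants to privately exchange and mix encrypted fragments of their updates, so that the server can neither obtain individual updates nor link them to their originators. To achieve security, we design a reputation-based defense tailored to FFL that builds trust in participants and their mixed updates based on the quality of the fragments they exchange and the mixed updates they send. Since the exchanged fragments' parameters keep their original coordinates and attackers can be neutralized, the server can correctly reconstruct a global model from the received mixed updates without accuracy loss. Experiments on four real datasets show that FFL can prevent semi-honest servers from mounting privacy attacks, can effectively counter poisoning attacks, and can maintain the accuracy of the global model.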
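Why mixing costs no accuracy can be sketched in Python under a simplifying assumption: each "fragment" is a single coordinate, and the exchange is a random permutation of which participant carries which value at each coordinate. Because every value keeps its coordinate, the per-coordinate sum, and hence the aggregate, is unchanged. The real protocol encrypts fragments and adds reputation-based filtering, both omitted here; `mix_fragments` is a hypothetical name.

```python
import random


def mix_fragments(updates, rng):
    """Coordinate-wise shuffle of which participant carries which value.

    Every value keeps its original coordinate, so the per-coordinate sum,
    and hence the aggregated global update, is unchanged.
    """
    n, d = len(updates), len(updates[0])
    mixed = [[0.0] * d for _ in range(n)]
    for k in range(d):
        owners = list(range(n))
        rng.shuffle(owners)
        for src, dst in enumerate(owners):
            mixed[dst][k] = updates[src][k]
    return mixed


updates = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
mixed = mix_fragments(updates, random.Random(7))
print(mixed)                              # each row now mixes several originators
print([sum(col) for col in zip(*mixed)])  # [111.0, 222.0, 333.0]
```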
In artificial intelligence, the traditional approach of training machine learning models centrally on a server has several security and privacy deficiencies. To address this limitation, federated learning (FL) has been proposed and is known for breaking down "data silos" and protecting users' privacy. However, FL has not yet gained popularity in industry, mainly because of its security and privacy issues and its high communication cost. To advance research in this field, build robust FL systems, and enable the wide application of FL, this paper systematically sorts out the possible attacks on current FL systems and the corresponding defenses. First, this paper briefly introduces the basic workflow of FL and background on attacks and defenses. It reviews a large body of recent research on privacy theft and malicious attacks. Most importantly, in view of the three current classification criteria, namely the three stages of machine learning, the three different roles in federated learning, and the CIA (Confidentiality, Integrity, and Availability) guidelines on privacy protection, we divide attack approaches into two categories according to the training stage and the prediction stage of machine learning. Furthermore, we identify the CIA property violated by each attack method and the potential attacking role. Various defense mechanisms are then analyzed separately at the privacy level and the security level. Finally, we summarize the possible challenges in applying FL from the perspective of attacks and defenses and discuss future development directions for FL systems. In this way, FL systems designed with these considerations can resist different attacks and be more secure and stable.
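Federated learning (FL) and split learning (SL) are two emerging collaborative learning methods that may greatly facilitate ubiquitous intelligence in the Internet of Things (IoT). Federated learning enables machine learning (ML) models trained locally on private data to be aggregated into a global model. Split learning allows different portions of an ML model to be collaboratively trained on different workers in a learning framework. Federated learning and split learning each have unique advantages and respective limitations, and may complement each other toward ubiquitous intelligence in IoT. Therefore, the combination of federated learning and split learning has recently become an active research area attracting extensive interest. In this article, we review the latest developments in federated learning and split learning and present a survey of state-of-the-art technologies for combining these two learning methods in edge computing-based IoT environments. We also identify some open problems and discuss possible directions for future research in this area, hoping to further arouse the research community's interest in this emerging field.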
Privacy-preserving inference via edge or encrypted computing paradigms encourages users of machine learning services to confidentially run a model on their personal data for a target task and share only the model's outputs with the service provider, e.g., to activate further services. Nevertheless, despite all confidentiality efforts, we show that a "vicious" service provider can approximately reconstruct its users' personal data by observing only the model's outputs, while keeping the target utility of the model very close to that of an "honest" service provider. We show the possibility of jointly training a target model (to be run on the users' side) and an attack model for data reconstruction (to be secretly used on the server's side). We introduce the "reconstruction risk": a new measure for assessing the quality of reconstructed data that better captures the privacy risk of such attacks. Experimental results on 6 benchmark datasets show that for low-complexity data types, or for tasks with a larger number of classes, a user's personal data can be approximately reconstructed from the outputs of a single target inference task. We propose a potential defense mechanism that helps to distinguish vicious from honest classifiers at inference time. We conclude this paper by discussing current challenges and open directions for future studies. We open-source our code and results as a benchmark for future work.
In recent years, mobile devices have been equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications, e.g., for medical purposes and in vehicular networks. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislation and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates, rather than raw data, to the server for aggregation. FL can serve as an enabling technology in mobile edge networks, since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, a large-scale and complex mobile edge network involves heterogeneous devices with varying constraints. This raises challenges of communication costs, resource allocation, and privacy and security in implementing FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.
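The server-side aggregation step described above is most often instantiated as Federated Averaging (FedAvg). The survey text does not name a specific rule, so this Python sketch assumes FedAvg's dataset-size weighting and omits local training and communication; `fedavg` is a hypothetical helper name.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighting each client by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[k] * size for params, size in zip(client_params, client_sizes)) / total
        for k in range(dim)
    ]


# Two clients: the second holds three times as much data, so it counts three times as much.
print(fedavg([[1.0, 2.0], [3.0, 6.0]], [1, 3]))  # [2.5, 5.0]
```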
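We consider vertical logistic regression (VLR) trained with mini-batch gradient descent, a setting that has attracted growing interest in industry and has proven useful in a wide range of applications, including finance and medical research. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source federated learning frameworks, whose protocols may differ from one another but share the same procedure for obtaining local gradients. We first consider the honest-but-curious threat model, in which the detailed implementation of the protocol is ignored and only the shared procedure is assumed, which we abstract as an oracle. We find that even in this general setting, single-dimension features and labels can still be recovered from the other party under suitable constraints on the batch size, demonstrating the potential vulnerability of all frameworks that follow the same philosophy. We then study a popular instantiation of the protocol based on homomorphic encryption (HE). We propose an active attack that significantly weakens the batch-size constraints of the previous analysis by generating and compressing auxiliary ciphertexts. To address the privacy leakage in the HE-based protocol, we develop a simple countermeasure based on differential privacy (DP) and provide both utility and privacy guarantees for the updated algorithm. Finally, we empirically verify the effectiveness of our attack and defense on benchmark datasets. Altogether, our findings suggest that all vertical federated learning frameworks that rely solely on HE may contain severe privacy risks, and that DP, which has already demonstrated its power in horizontal federated learning, can also play a crucial role in the vertical setting, especially when coupled with HE or secure multi-party computation (MPC) techniques.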
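The simplest instance of label recovery under the oracle abstraction can be sketched in Python. Assumptions: a single-dimension nonzero feature, batch size 1, no sigmoid saturation, and a passive party that sees the plaintext gradient of its own weight, $g = (\sigma(z) - y)\,x$. Since $\sigma(z) \in (0, 1)$, the sign of $g / x$ alone reveals $y$. The paper's analysis of larger batch sizes and its HE-specific active attack are not shown, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def passive_gradient(x, z, y):
    """Batch-size-1 gradient of the logistic loss w.r.t. the passive party's weight.

    z is the joint logit (both parties' contributions), unknown to the passive party.
    """
    return (sigmoid(z) - y) * x


def infer_label(x, grad):
    """sigmoid(z) lies in (0, 1), so grad / x < 0 exactly when y = 1."""
    return 1 if grad / x < 0 else 0


for x in (0.5, -2.0):
    for z in (-3.0, 0.0, 3.0):
        for y in (0, 1):
            assert infer_label(x, passive_gradient(x, z, y)) == y
print("labels recovered in all cases")
```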