We propose a framework in which multiple entities collaborate to build a machine learning model while preserving privacy of their data. The approach utilizes feature embeddings from shared/per-entity feature extractors transforming data into a feature space for cooperation between entities. We propose two specific methods and compare them with a baseline method. In Shared Feature Extractor (SFE) Learning, the entities use a shared feature extractor to compute feature embeddings of samples. In Locally Trained Feature Extractor (LTFE) Learning, each entity uses a separate feature extractor and models are trained using concatenated features from all entities. As a baseline, in Cooperatively Trained Feature Extractor (CTFE) Learning, the entities train models by sharing raw data. Secure multi-party algorithms are utilized to train models without revealing data or features in plain text. We investigate the trade-offs among SFE, LTFE, and CTFE in regard to performance, privacy leakage (using an off-the-shelf membership inference attack), and computational cost. LTFE provides the most privacy, followed by SFE, and then CTFE. Computational cost is lowest for SFE and the relative speed of CTFE and LTFE depends on network architecture. CTFE and LTFE provide the best accuracy. We use MNIST, a synthetic dataset, and a credit card fraud detection dataset for evaluations.
translated by 谷歌翻译
Differentially private federated learning (DP-FL) has received increasing attention to mitigate the privacy risk in federated learning. Although different schemes for DP-FL have been proposed, there is still a utility gap. Employing central Differential Privacy in FL (CDP-FL) can provide a good balance between the privacy and model utility, but requires a trusted server. Using Local Differential Privacy for FL (LDP-FL) does not require a trusted server, but suffers from lousy privacy-utility trade-off. Recently proposed shuffle DP based FL has the potential to bridge the gap between CDP-FL and LDP-FL without a trusted server; however, there is still a utility gap when the number of model parameters is large. In this work, we propose OLIVE, a system that combines the merits from CDP-FL and LDP-FL by leveraging Trusted Execution Environment (TEE). Our main technical contributions are the analysis and countermeasures against the vulnerability of TEE in OLIVE. Firstly, we theoretically analyze the memory access pattern leakage of OLIVE and find that there is a risk for sparsified gradients, which is common in FL. Secondly, we design an inference attack to understand how the memory access pattern could be linked to the training data. Thirdly, we propose oblivious yet efficient algorithms to prevent the memory access pattern leakage in OLIVE. Our experiments on real-world data demonstrate that OLIVE is efficient even when training a model with hundreds of thousands of parameters and effective against side-channel attacks on TEE.
translated by 谷歌翻译
联合学习(FL)和分裂学习(SL)是两种新兴的协作学习方法,可能会极大地促进物联网(IoT)中无处不在的智能。联合学习使机器学习(ML)模型在本地培训的模型使用私人数据汇总为全球模型。分裂学习使ML模型的不同部分可以在学习框架中对不同工人进行协作培训。联合学习和分裂学习,每个学习都有独特的优势和各自的局限性,可能会相互补充,在物联网中无处不在的智能。因此,联合学习和分裂学习的结合最近成为一个活跃的研究领域,引起了广泛的兴趣。在本文中,我们回顾了联合学习和拆分学习方面的最新发展,并介绍了有关最先进技术的调查,该技术用于将这两种学习方法组合在基于边缘计算的物联网环境中。我们还确定了一些开放问题,并讨论了该领域未来研究的可能方向,希望进一步引起研究界对这个新兴领域的兴趣。
translated by 谷歌翻译
Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge.We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing stateof-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
translated by 谷歌翻译
在联合学习(FL)中,数据不会在联合培训机器学习模型时留下个人设备。相反,这些设备与中央党(例如,公司)共享梯度。因为数据永远不会“离开”个人设备,因此FL作为隐私保留呈现。然而,最近显示这种保护是一个薄的外观,甚至是一种被动攻击者观察梯度可以重建各个用户的数据。在本文中,我们争辩说,事先工作仍然很大程度上低估了FL的脆弱性。这是因为事先努力专门考虑被动攻击者,这些攻击者是诚实但好奇的。相反,我们介绍了一个活跃和不诚实的攻击者,作为中央会,他们能够在用户计算模型渐变之前修改共享模型的权重。我们称之为修改的重量“陷阱重量”。我们的活跃攻击者能够完全恢复用户数据,并在接近零成本时:攻击不需要复杂的优化目标。相反,它利用了模型梯度的固有数据泄漏,并通过恶意改变共享模型的权重来放大这种效果。这些特异性使我们的攻击能够扩展到具有大型迷你批次数据的模型。如果来自现有工作的攻击者需要小时才能恢复单个数据点,我们的方法需要毫秒来捕获完全连接和卷积的深度神经网络的完整百分之批次数据。最后,我们考虑缓解。我们观察到,FL中的差异隐私(DP)的当前实现是有缺陷的,因为它们明确地信任中央会,并在增加DP噪音的关键任务,因此不提供对恶意中央党的保护。我们还考虑其他防御,并解释为什么它们类似地不足。它需要重新设计FL,为用户提供任何有意义的数据隐私。
translated by 谷歌翻译
Federated learning is a collaborative method that aims to preserve data privacy while creating AI models. Current approaches to federated learning tend to rely heavily on secure aggregation protocols to preserve data privacy. However, to some degree, such protocols assume that the entity orchestrating the federated learning process (i.e., the server) is not fully malicious or dishonest. We investigate vulnerabilities to secure aggregation that could arise if the server is fully malicious and attempts to obtain access to private, potentially sensitive data. Furthermore, we provide a method to further defend against such a malicious server, and demonstrate effectiveness against known attacks that reconstruct data in a federated learning setting.
translated by 谷歌翻译
已经提出了安全的多方计算(MPC),以允许多个相互不信任的数据所有者在其合并数据上共同训练机器学习(ML)模型。但是,通过设计,MPC协议忠实地计算了训练功能,对抗性ML社区已证明该功能泄漏了私人信息,并且可以在中毒攻击中篡改。在这项工作中,我们认为在我们的框架中实现的模型合奏是一种称为Safenet的框架,是MPC的高度无限方法,可以避免许多对抗性ML攻击。 MPC培训中所有者之间数据的自然分区允许这种方法在训练时间高度可扩展,可证明可保护免受中毒攻击的保护,并证明可以防御许多隐私攻击。我们展示了Safenet对在端到端和转移学习方案训练的几个机器学习数据集和模型上中毒的效率,准确性和韧性。例如,Safenet可显着降低后门攻击的成功,同时获得$ 39 \ times $ $的培训,$ 36 \ times $ $ $少于达尔斯科夫(Dalskov)等人的四方MPC框架。我们的实验表明,即使在许多非IID设置中,结合也能保留这些好处。结合的简单性,廉价的设置和鲁棒性属性使其成为MPC私下培训ML模型的强大首选。
translated by 谷歌翻译
联邦机器学习利用边缘计算来开发网络用户数据的模型,但联合学习的隐私仍然是一个重大挑战。已经提出了使用差异隐私的技术来解决这一点,但是带来了自己的挑战 - 许多人需要一个值得信赖的第三方,或者增加了太多的噪音来生产有用的模型。使用多方计算的\ EMPH {SERVE聚合}的最新进步消除了对第三方的需求,但是在计算上尤其在规模上昂贵。我们提出了一种新的联合学习协议,利用了一种基于与错误学习的技术的新颖差异私有的恶意安全聚合协议。我们的协议优于当前最先进的技术,并且经验结果表明它缩放到大量方面,具有任何差别私有联合学习方案的最佳精度。
translated by 谷歌翻译
隐私法规法(例如GDPR)将透明度和安全性作为数据处理算法的设计支柱。在这种情况下,联邦学习是保护隐私的分布式机器学习的最具影响力的框架之一,从而实现了许多自然语言处理和计算机视觉任务的惊人结果。一些联合学习框架采用差异隐私,以防止私人数据泄漏到未经授权的政党和恶意攻击者。但是,许多研究突出了标准联邦学习对中毒和推理的脆弱性,因此引起了人们对敏感数据潜在风险的担忧。为了解决此问题,我们提出了SGDE,这是一种生成数据交换协议,可改善跨索洛联合会中的用户安全性和机器学习性能。 SGDE的核心是共享具有强大差异隐私的数据生成器,保证了对私人数据培训的培训,而不是通信显式梯度信息。这些发电机合成了任意大量数据,这些数据保留了私人样品的独特特征,但有很大差异。我们展示了将SGDE纳入跨核心联合网络如何提高对联邦学习最有影响力的攻击的弹性。我们在图像和表格数据集上测试我们的方法,利用β变量自动编码器作为数据生成器,并突出了对非生成数据的本地和联合学习的公平性和绩效改进。
translated by 谷歌翻译
由于对隐私保护的关注不断增加,因此如何在不同数据源上建立机器学习(ML)模型具有安全保证,这越来越受欢迎。垂直联合学习(VFL)描述了这种情况,其中ML模型建立在不同参与方的私人数据上,该数据与同一集合相同的实例中拥有不相交的功能,这适合许多现实世界中的协作任务。但是,我们发现VFL现有的解决方案要么支持有限的输入功能,要么在联合执行过程中遭受潜在数据泄漏的损失。为此,本文旨在研究VFL方案中ML模式的功能和安全性。具体来说,我们介绍了BlindFL,这是VFL训练和推理的新型框架。首先,为了解决VFL模型的功能,我们建议联合源层团结不同各方的数据。联合源层可以有效地支持各种特征,包括密集,稀疏,数值和分类特征。其次,我们在联合执行期间仔细分析了安全性,并正式化了隐私要求。基于分析,我们设计了安全,准确的算法协议,并进一步证明了在理想真实的仿真范式下的安全保证。广泛的实验表明,BlindFL支持各种数据集和模型,同时获得强大的隐私保证。
translated by 谷歌翻译
联合学习允许一组用户在私人训练数据集中培训深度神经网络。在协议期间,数据集永远不会留下各个用户的设备。这是通过要求每个用户向中央服务器发送“仅”模型更新来实现,从而汇总它们以更新深神经网络的参数。然而,已经表明,每个模型更新都具有关于用户数据集的敏感信息(例如,梯度反转攻击)。联合学习的最先进的实现通过利用安全聚合来保护这些模型更新:安全监控协议,用于安全地计算用户的模型更新的聚合。安全聚合是关键,以保护用户的隐私,因为它会阻碍服务器学习用户提供的个人模型更新的源,防止推断和数据归因攻击。在这项工作中,我们表明恶意服务器可以轻松地阐明安全聚合,就像后者未到位一样。我们设计了两种不同的攻击,能够在参与安全聚合的用户数量上,独立于参与安全聚合的用户数。这使得它们在大规模现实世界联邦学习应用中的具体威胁。攻击是通用的,不瞄准任何特定的安全聚合协议。即使安全聚合协议被其理想功能替换为提供完美的安全性的理想功能,它们也同样有效。我们的工作表明,安全聚合与联合学习相结合,当前实施只提供了“虚假的安全感”。
translated by 谷歌翻译
Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training processes does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained model while ensuring the resulting model also has acceptable predictive accuracy. Existing federated learning approaches either use secure multiparty computation (SMC) which is vulnerable to inference or differential privacy which can lead to low accuracy given a large number of parties with relatively small amounts of data each. In this paper, we present an alternative approach that utilizes both differential privacy and SMC to balance these trade-offs. Combining differential privacy with secure multiparty computation enables us to reduce the growth of noise injection as the number of parties increases without sacrificing privacy while maintaining a pre-defined rate of trust. Our system is therefore a scalable approach that protects against inference threats and produces models with high accuracy. Additionally, our system can be used to train a variety of machine learning models, which we validate with experimental results on 3 different machine learning algorithms. Our experiments demonstrate that our approach out-performs state of the art solutions. CCS CONCEPTS• Security and privacy → Privacy-preserving protocols; Trust frameworks; • Computing methodologies → Learning settings.
translated by 谷歌翻译
Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with his own training dataset, to build a joint model by training locally and periodically exchanging model updates. We demonstrate that these updates leak unintended information about participants' training data and develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points-for example, specific locations-in others' training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, he can infer when a specific person first appears in the photos used to train a binary gender classifier. We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.
translated by 谷歌翻译
制药行业可以更好地利用其数据资产来通过协作机器学习平台虚拟化药物发现。另一方面,由于参与者的培训数据的意外泄漏,存在不可忽略的风险,因此,对于这样的平台,必须安全和隐私权。本文介绍了在药物发现的临床前阶段进行协作建模的隐私风险评估,以加快有前途的候选药物的选择。在最新推理攻击的简短分类法之后,我们采用并定制了几种基础情况。最后,我们用一些相关的隐私保护技术来描述和实验,以减轻此类攻击。
translated by 谷歌翻译
联邦学习的出现在维持隐私的同时,促进了机器学习模型之间的大规模数据交换。尽管历史悠久,但联邦学习正在迅速发展,以使更广泛的使用更加实用。该领域中最重要的进步之一是将转移学习纳入联邦学习,这克服了主要联合学习的基本限制,尤其是在安全方面。本章从安全的角度进行了有关联合和转移学习的交集的全面调查。这项研究的主要目标是发现可能损害使用联合和转移学习的系统的隐私和性能的潜在脆弱性和防御机制。
translated by 谷歌翻译
我们调查分裂学习的安全 - 一种新颖的协作机器学习框架,通过需要最小的资源消耗来实现峰值性能。在本文中,我们通过介绍客户私人培训集重建的一般攻击策略来揭示议定书的脆弱性并展示其固有的不安全。更突出地,我们表明恶意服务器可以积极地劫持分布式模型的学习过程,并将其纳入不安全状态,从而为客户端提供推动攻击。我们实施不同的攻击调整,并在各种数据集中测试它们以及现实的威胁方案。我们证明我们的攻击能够克服最近提出的防御技术,旨在提高分裂学习议定书的安全性。最后,我们还通过扩展以前设计的联合学习的攻击来说明协议对恶意客户的不安全性。要使我们的结果可重复,我们会在https://github.com/pasquini-dario/splitn_fsha提供的代码。
translated by 谷歌翻译
联合学习(FL)为培训机器学习模型打开了新的观点,同时将个人数据保存在用户场所上。具体而言,在FL中,在用户设备上训练了模型,并且仅将模型更新(即梯度)发送到中央服务器以进行聚合目的。但是,近年来发表的一系列推理攻击泄漏了私人数据,这强调了需要设计有效的保护机制来激励FL的大规模采用。尽管存在缓解服务器端的这些攻击的解决方案,但几乎没有采取任何措施来保护用户免受客户端执行的攻击。在这种情况下,在客户端使用受信任的执行环境(TEE)是最建议的解决方案之一。但是,现有的框架(例如,Darknetz)需要静态地将机器学习模型的很大一部分放入T恤中,以有效防止复杂的攻击或攻击组合。我们提出了GradSec,该解决方案允许在静态或动态上仅在机器学习模型的TEE上进行保护,因此将TCB的大小和整体训练时间降低了30%和56%,相比之下 - 艺术竞争者。
translated by 谷歌翻译
网络威胁情报(CTI)共享是减少攻击者和捍卫者之间信息不对称的重要活动。但是,由于数据共享和机密性之间的紧张关系,这项活动带来了挑战,这导致信息保留通常会导致自由骑士问题。因此,共享的信息仅代表冰山一角。当前的文献假设访问包含所有信息的集中数据库,但是由于上述张力,这并不总是可行的。这会导致不平衡或不完整的数据集,需要使用技术扩展它们。我们展示了这些技术如何导致结果和误导性能期望。我们提出了一个新颖的框架,用于从分布式数据中提取有关事件,漏洞和妥协指标的分布式数据,并与恶意软件信息共享平台(MISP)一起证明其在几种实际情况下的使用。提出和讨论了CTI共享的政策影响。拟议的系统依赖于隐私增强技术和联合处理的有效组合。这使组织能够控制其CTI,并最大程度地减少暴露或泄漏的风险,同时为共享的好处,更准确和代表性的结果以及更有效的预测性和预防性防御能力。
translated by 谷歌翻译
Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training.Massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it, nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extra-judicial surveillance. Many data owners-for example, medical institutions that may want to apply deep learning methods to clinical records-are prevented by privacy and confidentiality concerns from sharing the data and thus benefitting from large-scale deep learning.In this paper, we design, implement, and evaluate a practical system that enables multiple parties to jointly learn an accurate neuralnetwork model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants' models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs. We demonstrate the accuracy of our privacypreserving deep learning on benchmark datasets.
translated by 谷歌翻译
联合学习(FL)已成为协作分布式学习的隐私解决方案,客户直接在其设备上训练AI模型,而不是与集中式(潜在的对手)服务器共享数据。尽管FL在某种程度上保留了本地数据隐私,但已显示有关客户数据的信息仍然可以从模型更新中推断出来。近年来,已经制定了各种隐私计划来解决这种隐私泄漏。但是,它们通常以牺牲模型性能或系统效率为代价提供隐私,而在实施FL计划时,平衡这些权衡是一个至关重要的挑战。在本手稿中,我们提出了一个保护隐私的联合学习(PPFL)框架,该框架建立在控制理论中的矩阵加密和系统沉浸工具的协同作用上。这个想法是将学习算法(随机梯度体面(SGD))浸入更高维度的系统(所谓的目标系统)中,并设计目标系统的动力学,以便:浸入原始SGD的轨迹: /嵌入其轨迹中,并在加密数据上学习(在这里我们使用随机矩阵加密)。矩阵加密是在服务器上重新重新格式化的,作为将原始参数映射到更高维的参数空间的坐标的随机更改,并强制执行目标SGD收敛到原始SGD Optiral解决方案的加密版本。服务器使用浸入式地图的左侧逆汇总模型解密。我们表明,我们的算法提供与标准FL相同的准确性和收敛速度,而计算成本可忽略不计,同时却没有透露有关客户数据的信息。
translated by 谷歌翻译