Conventional matrix factorization relies on centralized collection of users' data for recommendation, which increases the risk of privacy leakage, especially when the recommender is untrusted. Existing differentially private matrix factorization methods either assume the recommender is trusted, or can only provide a uniform level of privacy protection for all users and items with an untrusted recommender. In this paper, we propose a novel Heterogeneous Differentially Private Matrix Factorization algorithm (denoted as HDPMF) for an untrusted recommender. To the best of our knowledge, we are the first to achieve heterogeneous differential privacy for decentralized matrix factorization in the untrusted recommender scenario. Specifically, our framework uses a modified stretching mechanism with an innovative rescaling scheme to achieve a better trade-off between privacy and accuracy. Meanwhile, by allocating the privacy budget properly, we can capture homogeneous privacy preferences within a user/item but heterogeneous privacy preferences across different users/items. Theoretical analysis confirms that HDPMF provides a rigorous privacy guarantee, and exhaustive experiments demonstrate its superiority, especially under strong privacy guarantees, high-dimensional models, and sparse datasets.
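Recommender systems have become prosperous nowadays, designed to predict users' potential interest in items by learning embeddings. Recent advances in graph neural networks (GNNs) also provide recommender systems with powerful backbones for learning embeddings from a user-item graph. However, leveraging only user-item interactions suffers from the cold-start problem because of the difficulty of data collection. Hence, current efforts propose fusing social information with user-item interactions to alleviate it, which is the social recommendation problem. Existing work uses GNNs to aggregate both social links and user-item interactions simultaneously. However, these methods all require centralized storage of users' social links and interactions, which leads to privacy concerns. Moreover, under the strict privacy protection of the General Data Protection Regulation, centralized data storage may not be feasible in the future, which calls for a decentralized framework for social recommendation. To this end, we devise a novel framework, Federated Social recommendation with Graph neural network (FeSoG). First, FeSoG adopts relational attention and aggregation to handle heterogeneity. Second, FeSoG infers user embeddings from local data to retain personalization. Last but not least, the proposed model employs a pseudo-labeling technique with item sampling to protect privacy and enhance training. Extensive experiments on three real-world datasets demonstrate the effectiveness of FeSoG in performing social recommendation and protecting privacy. To the best of our knowledge, this is the first work to propose a federated learning framework for social recommendation.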
Federated learning has recently been applied to recommendation systems to protect user privacy. In federated learning settings, recommendation systems can train recommendation models by collecting only the intermediate parameters instead of the real user data, which greatly enhances user privacy. Besides, federated recommendation systems can collaborate with other data platforms to improve recommendation model performance while meeting regulatory and privacy constraints. However, federated recommendation systems face many new challenges, such as privacy, security, heterogeneity, and communication costs. While significant research has been conducted in these areas, gaps in the survey literature still exist. In this survey, we (1) summarize some common privacy mechanisms used in federated recommendation systems and discuss the advantages and limitations of each mechanism; (2) review some robust aggregation strategies and several novel attacks against security; (3) summarize some approaches to address the heterogeneity and communication cost problems; (4) introduce some open-source platforms that can be used to build federated recommendation systems; and (5) present some prospective directions for future research. This survey can help researchers and practitioners understand the research progress in these areas.
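Next point-of-interest (POI) recommendation has become an indispensable functionality in location-based social networks (LBSNs) because of its effectiveness in helping people decide which POI to visit next. However, accurate recommendation requires a large amount of historical check-in data, which threatens user privacy because location-sensitive data must be handled by cloud servers. Although there are several on-device frameworks for privacy-preserving POI recommendation, they are still resource-intensive in terms of storage and computation, and show limited robustness to the high sparsity of user-POI interactions. On this basis, we propose a novel decentralized collaborative learning framework for POI recommendation (DCLR), which allows users to train their personalized models locally in a collaborative manner. DCLR significantly reduces the local models' dependence on the cloud for training, and can be used to extend arbitrary centralized recommendation models. To counteract the sparsity of on-device user data when learning each local model, we design two self-supervision signals to pretrain the POI representations on the server using the geographical and categorical correlations of POIs. To facilitate collaborative learning, we propose to incorporate knowledge from either geographically or semantically similar users into each local model with attentive aggregation and mutual information maximization. The collaborative learning process makes use of communication between devices while requiring only minor involvement from the central server to identify user groups, and is compatible with common privacy preservation mechanisms such as differential privacy. We evaluate DCLR on two real-world datasets; the results show that DCLR outperforms state-of-the-art on-device frameworks and yields competitive results compared with centralized counterparts.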
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection for a single ranking within a set of rankings, or for pairwise comparisons of a ranking, under $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm that generates synthetic rankings while satisfying $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in downstream tasks, including the inference attack and the personalized ranking task, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
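As a rough illustration of the Mallows model that the abstract above connects to $\epsilon$-ranking differential privacy, the following Python sketch samples a synthetic ranking around a reference ranking using the repeated insertion method. This is a generic Mallows sampler under the Kendall tau distance, not the paper's multistage algorithm. The function names and the dispersion parameter `phi` are assumptions made for illustration, and how `phi` would map to a privacy level $\epsilon$ is not shown here.

```python
import random

def sample_mallows(reference, phi, rng=random):
    """Draw one ranking from a Mallows model centred on `reference`.

    phi in [0, 1] is the dispersion (an illustrative parameter):
    phi = 0 always returns the reference, phi = 1 gives a uniform permutation.
    """
    ranking = []
    for i, item in enumerate(reference):
        # Inserting at 0-based position j puts `item` ahead of i - j items
        # that precede it in the reference, i.e. it creates i - j inversions.
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, item)
    return ranking

def kendall_tau(a, b):
    """Number of item pairs that rankings a and b order differently."""
    pos = {item: k for k, item in enumerate(b)}
    return sum(
        1
        for x in range(len(a))
        for y in range(x + 1, len(a))
        if pos[a[x]] > pos[a[y]]
    )
```

A smaller `phi` concentrates the samples near the reference ranking (less perturbation), while a larger `phi` spreads them out, which is the kind of privacy/utility dial the abstract describes.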
Non-negative matrix factorization is a popular unsupervised machine learning algorithm for extracting meaningful features from data that are inherently non-negative. However, such data sets often contain privacy-sensitive user data, so we may need to take steps to ensure the privacy of the users while analyzing the data. In this work, we focus on developing a non-negative matrix factorization algorithm in the privacy-preserving framework. More specifically, we propose a novel privacy-preserving algorithm for non-negative matrix factorization capable of operating on private data, while achieving results comparable to those of the non-private algorithm. We design the framework so that one can select the degree of privacy guarantee based on the utility gap. We evaluate the performance of the proposed framework on six real data sets. The experimental results show that the proposed method can achieve performance very close to that of the non-private algorithm under some parameter regimes, while ensuring strict privacy.
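Federated graph neural networks (GNNs) have recently attracted much attention because of their wide real-world applications that do not violate privacy regulations. Among all privacy-preserving techniques, differential privacy (DP) is the most promising because of its effectiveness and light computational overhead. However, DP-based federated GNNs have not been well studied, especially in the subgraph-level setting, such as the recommendation system scenario. The biggest challenge is how to guarantee privacy while also handling non-independent and identically distributed (non-IID) data in federated GNNs. In this paper, we propose DP-FEDREC, a DP-based federated GNN, to fill this gap. Private set intersection (PSI) is leveraged to extend the local graph of each client, which addresses the non-IID problem. Most importantly, DP is applied not only to the weights but also to the edges of the intersection graph obtained from PSI, in order to fully protect the privacy of clients. The evaluation shows that DP-FEDREC achieves better performance with the graph extension, and that DP introduces only a small computational overhead.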
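Recommender systems are proving to be an invaluable tool for extracting user-relevant content that helps users in their daily activities (e.g., finding relevant places to visit, content to consume, items to purchase). However, to be effective, these systems need to collect and analyze large volumes of personal data (e.g., location check-ins, movie ratings, click rates, etc.), which exposes users to numerous privacy threats. In this context, recommender systems based on federated learning (FL) appear to be a promising solution, since they compute accurate recommendations while keeping personal data on the users' devices. However, FL, and therefore FL-based recommender systems, rely on a central server that can experience scalability issues in addition to being vulnerable to attacks. To remedy this, we propose Pepper, a decentralized recommender system based on gossip learning principles. In Pepper, users gossip model updates and aggregate them asynchronously. At the heart of Pepper are two key components: a personalized peer-sampling protocol that keeps, in the neighborhood of each node, a proportion of nodes that have interests similar to that node's, and a simple yet effective model aggregation function that builds a model better suited to each user. Through experiments on three real datasets implementing two use cases, location check-in recommendation and movie recommendation, we demonstrate that our solution converges up to 42% faster than other decentralized solutions, and improves hit ratio and long-tail performance by up to 21% compared with decentralized competitors.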
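Performing low-rank matrix completion with sensitive user data calls for privacy-preserving approaches. In this work, we propose a novel noise addition mechanism for preserving differential privacy, in which the noise distribution is inspired by the Huber loss, a well-known loss function in robust statistics. The proposed mechanism is evaluated against existing differential privacy mechanisms while solving the matrix completion problem with the alternating least squares approach. We also propose using the iteratively re-weighted least squares algorithm to complete low-rank matrices, and study the performance of different noise mechanisms on both synthetic and real datasets. We prove that the proposed mechanism achieves $\epsilon$-differential privacy, similar to the Laplace mechanism. Furthermore, empirical results indicate that the Huber mechanism outperforms the Laplacian and Gaussian mechanisms in some cases and is comparable otherwise.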
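The recent success of deep neural networks (DNNs) hinges on the availability of large-scale datasets; however, training on such datasets often poses privacy risks for sensitive training information. In this paper, we aim to explore the power of generative models and gradient sparsity, and propose a scalable privacy-preserving generative model, DataLens. Compared with the standard PATE privacy-preserving framework, which allows teachers to vote on one-dimensional predictions, voting on high-dimensional gradient vectors is challenging in terms of privacy preservation. Because dimension reduction techniques are required, we need to navigate a delicate trade-off between (1) the improvement in privacy preservation and (2) the slowdown of SGD convergence. To tackle this, we take advantage of communication-efficient learning and propose a novel noise compression and aggregation approach, TopAgg, which combines top-k compression with a corresponding noise injection mechanism. We theoretically prove that the DataLens framework guarantees differential privacy for its generated data, and we provide an analysis of its convergence. To demonstrate the practical usage of DataLens, we conduct extensive experiments on diverse datasets, including MNIST, Fashion-MNIST, and the high-dimensional CelebA, and show that DataLens significantly outperforms other baseline DP generative models. In addition, we adapt the proposed TopAgg approach, one of the key building blocks of DataLens, to DP SGD training, and show that it achieves higher utility than the state-of-the-art DP SGD approach in most cases. Our code is publicly available at HTTPS://.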
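The combination of top-k compression with noise injection described in the abstract above can be sketched as follows. This is a simplified illustration, not the TopAgg algorithm itself: the sign-vote encoding, the Gaussian noise scale `sigma`, the agreement threshold `beta`, and all function names are assumptions chosen for the example.

```python
import random

def top_k_sign(grad, k):
    """Keep the k largest-magnitude coordinates, encoded as +1/-1 votes."""
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    votes = [0] * len(grad)
    for i in top:
        if grad[i] > 0:
            votes[i] = 1
        elif grad[i] < 0:
            votes[i] = -1
    return votes

def noisy_aggregate(teacher_grads, k, sigma, beta, rng=random):
    """Sum the compressed votes, add Gaussian noise, and keep only
    coordinates on which enough teachers agree (hypothetical threshold beta)."""
    dim = len(teacher_grads[0])
    total = [0.0] * dim
    for grad in teacher_grads:
        for i, v in enumerate(top_k_sign(grad, k)):
            total[i] += v
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    threshold = beta * len(teacher_grads)
    return [1 if n >= threshold else -1 if n <= -threshold else 0 for n in noisy]
```

Because each teacher contributes at most `k` nonzero entries of magnitude 1, a single teacher's vote vector has L2 norm at most `sqrt(k)`, which bounds how much one teacher can shift the sum and is what makes the noise scale easy to reason about in a sketch like this.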
Distributing machine learning predictors enables the collection of large-scale datasets while leaving sensitive raw data at trustworthy sites. We show that locally training support vector machines (SVMs) and computing their averages leads to a learning technique that is scalable to a large number of users, satisfies differential privacy, and is applicable to non-trivial tasks, such as CIFAR-10. For a large number of participants, communication cost is one of the main challenges. We achieve a low communication cost by requiring only a single invocation of an efficient secure multiparty summation protocol. By relying on state-of-the-art feature extractors (SimCLR), we are able to utilize differentially private convex learners for non-trivial tasks such as CIFAR-10. Our experimental results illustrate that for $1{,}000$ users with $50$ data points each, our scheme outperforms state-of-the-art scalable distributed learning methods (differentially private federated learning, DP-FL for short) while incurring around $500$ times lower communication cost: for CIFAR-10, we achieve a classification accuracy of $79.7\,\%$ for $\varepsilon = 0.59$, while DP-FL achieves $57.6\,\%$. More generally, we prove learnability properties for the average of such locally trained models: convergence and uniform stability. By only requiring strongly convex, smooth, and Lipschitz-continuous objective functions, locally trained via stochastic gradient descent (SGD), we achieve a strong utility-privacy tradeoff.
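A minimal sketch of the averaging idea in the abstract above, assuming linear SVMs trained locally by SGD on an L2-regularized hinge loss. The Gaussian noise added to the average merely stands in for a differentially private mechanism, and the secure summation protocol is replaced by a plain sum. The noise scale `sigma`, the hyperparameters, and all function names are illustrative assumptions, not the paper's calibration.

```python
import random

def train_local_svm(data, dim, lam=0.1, epochs=20, rng=random):
    """SGD on the L2-regularized hinge loss.

    data is a list of (x, y) pairs with y in {-1, +1}.
    """
    w = [0.0] * dim
    step = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffled pass over the data
            step += 1
            lr = 1.0 / (lam * step)  # decaying step size for a strongly convex loss
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for i in range(dim):
                # Regularizer gradient plus the hinge subgradient.
                g = lam * w[i] - (y * x[i] if margin < 1 else 0.0)
                w[i] -= lr * g
    return w

def average_models(models, sigma=0.0, rng=random):
    """Average local weight vectors and perturb the result with Gaussian noise.

    In a real deployment the sum would be computed by a secure summation
    protocol; here it is a plain sum (assumption for illustration).
    """
    n = len(models)
    return [sum(m[i] for m in models) / n + rng.gauss(0.0, sigma)
            for i in range(len(models[0]))]

def predict(w, x):
    """Sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Each user would call `train_local_svm` on their own data and contribute only the resulting weight vector, so a single aggregation round suffices, which matches the low communication cost the abstract emphasizes.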
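Differential privacy is becoming a gold standard for protecting the privacy of publicly shared data. It has been widely applied in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusions of the original data analysis, because privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we break this trade-off by developing a distribution-invariant privatization (DIP) method that reconciles high statistical accuracy with strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if the original data were used. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in two simulations and on three real-world benchmarks.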
Deep neural networks have a strong capability to memorize the underlying training data, which can be a serious privacy concern. An effective solution to this problem is to train models with differential privacy (DP), which provides rigorous privacy guarantees by injecting random noise into the gradients. This paper focuses on the scenario where sensitive data are distributed among multiple participants, who jointly train a model through federated learning (FL), using both secure multiparty computation (MPC) to ensure the confidentiality of each gradient update, and differential privacy to avoid data leakage in the resulting model. A major challenge in this setting is that common mechanisms for enforcing DP in deep learning, which inject real-valued noise, are fundamentally incompatible with MPC, which exchanges finite-field integers among the participants. Consequently, most existing DP mechanisms require rather high noise levels, leading to poor model utility. Motivated by this, we propose the Skellam mixture mechanism (SMM), an approach to enforce DP on models built via FL. Compared to existing methods, SMM eliminates the assumption that the input gradients must be integer-valued and thus reduces the amount of noise injected to preserve DP. Further, SMM allows tight privacy accounting due to the nice composition and sub-sampling properties of the Skellam distribution, which are key to accurate deep learning with DP. The theoretical analysis of SMM is highly non-trivial, especially considering (i) the complicated math of differentially private deep learning in general and (ii) the fact that the mixture of two Skellam distributions is rather complex and, to our knowledge, has not been studied in the DP literature. Extensive experiments on various practical settings demonstrate that SMM consistently and significantly outperforms existing solutions in terms of the utility of the resulting model.
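To illustrate why integer-valued noise fits secure aggregation, the sketch below maps real-valued gradient entries to integers, adds Skellam noise (the difference of two independent Poisson draws), and reduces the result into a finite ring. It is not the Skellam mixture mechanism itself, whose construction is not given in the abstract; stochastic rounding is swapped in here as a common way to obtain integers, and `scale`, `mu`, `modulus`, and the function names are assumptions for illustration.

```python
import math
import random

def sample_poisson(mu, rng=random):
    """Knuth's multiplication method; adequate for small mu only."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_skellam(mu, rng=random):
    """Symmetric Skellam noise: difference of two independent Poisson(mu) draws."""
    return sample_poisson(mu, rng) - sample_poisson(mu, rng)

def stochastic_round(x, rng=random):
    """Round x to a neighbouring integer so that the expected value equals x."""
    low = math.floor(x)
    return low + (1 if rng.random() < x - low else 0)

def privatize(grad, scale, mu, modulus, rng=random):
    """Scale, round, add Skellam noise, and reduce into the integers mod `modulus`."""
    return [(stochastic_round(g * scale, rng) + sample_skellam(mu, rng)) % modulus
            for g in grad]
```

The symmetric Skellam noise above has mean 0 and variance `2 * mu`, and the sum of independent Skellam variables is again Skellam, so noise added by different participants composes cleanly when their integer vectors are summed inside the secure protocol.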
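Federated learning (FL) is a form of collaborative machine learning in which participating clients process their data locally and share only updates to the collaborative model. This makes it possible to build privacy-aware distributed machine learning models, among other things. The goal is to optimize the parameters of a statistical model by minimizing a cost function over a collection of datasets stored locally by a set of clients. This process exposes the clients to two issues: leakage of private information and lack of personalization of the model. At the same time, with recent advances in data analysis techniques, concern about violations of the participating clients' privacy has surged. To mitigate this, differential privacy and its variants serve as a standard for providing formal privacy guarantees. Clients often represent very heterogeneous communities and hold very diverse data. Therefore, in line with the FL community's recent focus on building a framework of personalized models for users that represents their diversity, it is also of utmost importance to protect the clients' sensitive and personal information against potential threats. $d$-privacy, a generalization of geo-indistinguishability (the recently popularized paradigm of location privacy), uses a metric-based obfuscation technique that preserves the spatial distribution of the original data. To address the problem of protecting client privacy while allowing personalized model training that enhances the fairness and utility of the system, we propose a method that provides group privacy guarantees under the FL framework. We provide theoretical justification for its applicability, and experimental validation on real-world datasets to illustrate how the method works.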
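In recommender systems, a common challenge is the cold-start problem, where interactions are very limited for new users in the system. To address this challenge, many recent works introduce the idea of meta-optimization into recommendation scenarios, i.e., learning to learn user preferences from only a few past interacted items. The core idea is to learn globally shared meta-initialization parameters for all users and then rapidly adapt them into local parameters for each user. These works aim to derive general knowledge across the preference learning of various users, so as to adapt quickly to future new users with the learned prior and a small amount of training data. However, previous work has shown that recommender systems are generally vulnerable to bias and unfairness. Despite the success of meta-learning in improving cold-start recommendation performance, fairness issues have been largely overlooked. In this paper, we propose a comprehensive fair meta-learning framework, named Clover, to ensure the fairness of meta-learned recommendation models. We systematically study three kinds of fairness in recommender systems (individual fairness, counterfactual fairness, and group fairness) and propose to satisfy all three through a multi-task adversarial learning scheme. Our framework offers a generic training paradigm that is applicable to different meta-learned recommender systems. We demonstrate the effectiveness of Clover on a representative meta-learned user preference estimator with three real-world datasets. Empirical results show that Clover achieves comprehensive fairness without degrading overall cold-start recommendation performance.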
Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained model, while ensuring that the resulting model also has acceptable predictive accuracy. Existing federated learning approaches use either secure multiparty computation (SMC), which is vulnerable to inference, or differential privacy, which can lead to low accuracy given a large number of parties with relatively small amounts of data each. In this paper, we present an alternative approach that utilizes both differential privacy and SMC to balance these trade-offs. Combining differential privacy with secure multiparty computation enables us to reduce the growth of noise injection as the number of parties increases, without sacrificing privacy, while maintaining a pre-defined rate of trust. Our system is therefore a scalable approach that protects against inference threats and produces models with high accuracy. Additionally, our system can be used to train a variety of machine learning models, which we validate with experimental results on 3 different machine learning algorithms. Our experiments demonstrate that our approach outperforms state-of-the-art solutions.
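To make the noise-reduction argument above concrete, the sketch below compares the aggregate noise when every party adds full-strength Gaussian noise locally with the case where a secure sum hides individual contributions, so each party only needs to add a share of the noise. It assumes Gaussian noise and that at least `honest` parties do not collude; the variance split `sigma**2 / honest` and the function names are assumptions for illustration rather than the paper's exact calibration.

```python
import math

def local_dp_aggregate_std(sigma, n_parties):
    """Std of the summed noise when each party independently adds N(0, sigma^2)."""
    return sigma * math.sqrt(n_parties)

def per_party_std(sigma, honest):
    """Std each party adds so that `honest` parties together contribute variance sigma^2."""
    return sigma / math.sqrt(honest)

def smc_aggregate_std(sigma, n_parties, honest):
    """Std of the summed noise when all parties add the reduced per-party noise."""
    return per_party_std(sigma, honest) * math.sqrt(n_parties)
```

Under these assumptions, with 100 parties all treated as honest, the aggregate noise standard deviation stays at `sigma` instead of growing to `10 * sigma`, which is the scaling benefit the abstract attributes to combining the two techniques.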
