Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training. Prior works typically address this issue by using auxiliary information (e.g., public data) to boost the effectiveness of adaptive optimization. In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to better realize the benefits of adaptivity. Theoretically, we provide convergence guarantees for our method for both convex and non-convex problems, and analyze trade-offs between delay and privacy noise reduction. Empirically, we explore DP^2 across several real-world datasets, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines and match the performance of state-of-the-art optimization methods that require auxiliary data.
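The core idea of a delayed preconditioner can be sketched in a few lines. The following is a minimal illustration and not the authors' DP^2 algorithm: it assumes an RMSProp-style diagonal preconditioner, takes already privatized (clipped and noised) gradients as input, and refreshes the second-moment estimate only once every `delay` steps from the average of the buffered gradients. The class name, the initial value of 1.0, and the point at which preconditioning is applied are all assumptions made for illustration.

```python
import math


class DelayedPreconditioner:
    """RMSProp-style diagonal preconditioner refreshed once every `delay` steps."""

    def __init__(self, dim, delay, beta=0.9, eps=1e-8):
        self.delay = delay
        self.beta = beta
        self.eps = eps
        # Assumption: start at 1.0 so early steps behave like plain SGD.
        self.second_moment = [1.0] * dim
        self.buffer = [0.0] * dim
        self.count = 0

    def observe(self, private_grad):
        """Accumulate a privatized gradient; refresh when the buffer is full."""
        for i, g in enumerate(private_grad):
            self.buffer[i] += g
        self.count += 1
        if self.count == self.delay:
            for i in range(len(self.buffer)):
                # Averaging `delay` gradients with independent noise divides
                # the noise variance by `delay` before the value is squared.
                avg = self.buffer[i] / self.delay
                self.second_moment[i] = (
                    self.beta * self.second_moment[i] + (1.0 - self.beta) * avg * avg
                )
                self.buffer[i] = 0.0
            self.count = 0

    def scale(self, private_grad):
        """Precondition a gradient with the current (possibly stale) estimate."""
        return [
            g / (math.sqrt(v) + self.eps)
            for g, v in zip(private_grad, self.second_moment)
        ]
```

A larger `delay` yields a less noisy but staler estimate, which is the trade-off the abstract refers to.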
Translated by Google Translate
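Federated learning (FL) is a framework for machine learning across heterogeneous client devices in a privacy-preserving fashion. To date, most FL algorithms learn a "global" server model across multiple rounds. At each round, the same server model is broadcast to all participating clients, updated locally, and then aggregated across clients. In this work, we propose a more general procedure in which clients "select" what values are sent to them. Notably, this allows clients to operate on smaller, data-dependent slices. To make this practical, we outline a primitive, federated select, which enables client-specific selection in realistic FL systems. We discuss how to use federated select for model training and show that it can lead to drastic reductions in communication and client memory usage, potentially enabling the training of models too large to fit on-device. We also discuss the implications of federated select for privacy and trust, which in turn affect possible system constraints and design. Finally, we discuss open questions concerning model architectures, privacy-preserving technologies, and practical FL systems.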
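A toy illustration of client-specific selection, under stated assumptions: the server model is a keyed table (for example, one embedding row per vocabulary item), each client downloads only the rows for the keys it selects, and the server averages the per-key deltas it receives. The function names and the per-key averaging rule are hypothetical and are not taken from the paper.

```python
def federated_select(server_table, keys):
    """Return only the slices of the server model that one client asked for."""
    return {k: list(server_table[k]) for k in keys}


def apply_sparse_updates(server_table, client_updates):
    """Average per-key deltas over the clients that touched each key."""
    sums, counts = {}, {}
    for update in client_updates:
        for key, delta in update.items():
            if key not in sums:
                sums[key] = [0.0] * len(delta)
                counts[key] = 0
            sums[key] = [s + d for s, d in zip(sums[key], delta)]
            counts[key] += 1
    # Copy the table so that untouched keys keep their previous values.
    new_table = {k: list(v) for k, v in server_table.items()}
    for key, total in sums.items():
        new_table[key] = [w + t / counts[key] for w, t in zip(new_table[key], total)]
    return new_table
```

A client that selects a single key downloads one row instead of the whole table, which is where the communication and memory savings come from.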
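Motivated by recent applications requiring differential privacy over adaptive streams, we investigate the question of optimal instantiations of the matrix mechanism in this setting. We prove fundamental theoretical results on the applicability of matrix factorizations to adaptive streams, and provide a parameter-free fixed-point algorithm for computing optimal factorizations. We instantiate this framework with respect to concrete matrices that arise naturally in machine learning, and train user-level differentially private models with the resulting optimal mechanisms, yielding significant improvements on a notable problem in federated learning with user-level differential privacy.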
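For readers unfamiliar with the matrix mechanism, the sketch below shows its basic shape: to answer the linear queries A x privately, factor A = B C, add Gaussian noise to C x, and release B (C x + z). It is a minimal illustration and not the paper's fixed-point algorithm; it simply compares the two trivial factorizations of the prefix-sum matrix. The helper names and the choice to scale noise by the largest column L2 norm of C are assumptions of this sketch.

```python
import math
import random


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in mat]


def prefix_sum_matrix(n):
    """Lower-triangular matrix of ones: row i sums the first i + 1 inputs."""
    return [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def column_sensitivity(c):
    """Largest column L2 norm of C: how far one unit input can move C x."""
    n_cols = len(c[0])
    return max(math.sqrt(sum(row[j] ** 2 for row in c)) for j in range(n_cols))


def matrix_mechanism(b, c, x, noise_multiplier, rng):
    """Release B (C x + z), a noisy estimate of (B C) x."""
    sigma = noise_multiplier * column_sensitivity(c)
    noisy = [v + rng.gauss(0.0, sigma) for v in matvec(c, x)]
    return matvec(b, noisy)
```

With B = A and C = I, the noise is added to each input and accumulates across the prefix sum; with B = I and C = A, the noise is added to each output but must be scaled to a larger sensitivity. Factorizations between these extremes are what an optimal mechanism searches over.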
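We consider training models with differential privacy (DP) using mini-batch gradients. The existing state of the art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to meet in important practical scenarios, particularly federated learning (FL). We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification. The code is available at https://github.com/google-Research/federated/tree/master/dp_ftrl and https://github.com/google-reesearch/dp-ftrl.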
Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FEDAVG) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have had notable success in combating such issues. In this work, we propose federated versions of adaptive optimizers, including ADAGRAD, ADAM, and YOGI, and analyze their convergence in the presence of heterogeneous data for general nonconvex settings. Our results highlight the interplay between client heterogeneity and communication efficiency. We also perform extensive experiments on these methods and show that the use of adaptive optimizers can significantly improve the performance of federated learning.
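As a rough sketch of what a federated adaptive optimizer looks like on the server side, the function below treats the average client model delta as a pseudo-gradient and applies an Adam-style update to it. It is written from the general description in the abstract; the function name, default hyperparameters, and the absence of bias correction are assumptions, and client sampling and local training are omitted.

```python
import math


def fedadam_server_step(weights, client_deltas, m, v,
                        lr=0.1, beta1=0.9, beta2=0.99, tau=1e-3):
    """One Adam-style server update from a list of client model deltas."""
    dim = len(weights)
    # The average client delta plays the role of a (negative) gradient.
    delta = [sum(d[i] for d in client_deltas) / len(client_deltas)
             for i in range(dim)]
    # Exponential moving averages of the delta and of its square.
    m = [beta1 * m[i] + (1.0 - beta1) * delta[i] for i in range(dim)]
    v = [beta2 * v[i] + (1.0 - beta2) * delta[i] ** 2 for i in range(dim)]
    # tau bounds the per-coordinate step size when v is close to zero.
    weights = [weights[i] + lr * m[i] / (math.sqrt(v[i]) + tau)
               for i in range(dim)]
    return weights, m, v
```

The server keeps `m` and `v` across rounds, so clients stay stateless and only the server-side rule changes relative to plain averaging.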
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
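Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by clipping it to some constant value. However, there is no good *a priori* setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly various other parameters. We propose a method wherein, instead of a fixed clipping norm, one clips to a value at a specified quantile of the update norm distribution, where the value at the quantile is itself estimated online, with differential privacy. The method tracks the quantile closely, uses a negligible amount of privacy budget, is compatible with other federated learning technologies such as compression and secure aggregation, and has a straightforward joint DP analysis with DP-FedAvg. Experiments demonstrate that adaptive clipping to the median update norm works well across a range of realistic federated learning tasks, sometimes outperforming even the best fixed clip chosen in hindsight, and without the need to tune any clipping hyperparameter.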
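The online quantile estimate can be illustrated with a short simulation. In this sketch, which is a simplification under stated assumptions and not the paper's full procedure, each client contributes one bit saying whether its update norm is at or below the current clip, Gaussian noise is added to the count, and the clip is moved geometrically toward the target quantile. The function name, the learning rate, the noise level, and the log-normal norm distribution are all illustrative choices.

```python
import math
import random


def update_clip_norm(clip, norms, target_quantile, lr, noise_stddev, rng):
    """One geometric update of the clipping norm toward a target quantile."""
    # Each client reports one bit: was its update norm at or below the clip?
    below = sum(1 for n in norms if n <= clip)
    # Gaussian noise on the count stands in for the private estimate.
    noisy_fraction = (below + rng.gauss(0.0, noise_stddev)) / len(norms)
    # Shrink the clip if too many norms fall below it, grow it otherwise.
    return clip * math.exp(-lr * (noisy_fraction - target_quantile))


rng = random.Random(0)
clip = 0.1  # deliberately poor starting value
for _ in range(200):
    # Simulated update norms for one round; their true median is 1.0.
    norms = [rng.lognormvariate(0.0, 1.0) for _ in range(100)]
    clip = update_clip_norm(clip, norms, 0.5, 0.2, 1.0, rng)
```

Because only a single noised count is released per round, the estimate consumes little privacy budget compared with the model update itself.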
Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.
We demonstrate that it is possible to train large recurrent language models with user-level differential privacy guarantees with only a negligible cost in predictive accuracy. Our work builds on recent advances in the training of deep networks on user-partitioned data and privacy accounting for stochastic gradient descent. In particular, we add user-level privacy protection to the federated averaging algorithm, which makes "large step" updates from user-level data. Our work demonstrates that given a dataset with a sufficiently large number of users (a requirement easily met by even small internet-scale datasets), achieving differential privacy comes at the cost of increased computation, rather than in decreased utility as in most prior work. We find that our private LSTM language models are quantitatively and qualitatively similar to un-noised models when trained on a large dataset.
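The user-level protection described here rests on two steps: bounding each user's contribution and adding noise calibrated to that bound. The sketch below shows both in simplified form. It assumes every user participates in the round and omits user sampling and privacy accounting; the function names and the noise calibration (standard deviation equal to `noise_multiplier * clip_norm` on the sum) are assumptions of the sketch.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def private_average(updates, clip_norm, noise_multiplier, rng):
    """Average clipped per-user updates after adding Gaussian noise to the sum."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    dim = len(updates[0])
    total = [sum(u[i] for u in clipped) for i in range(dim)]
    # One user can change the clipped sum by at most clip_norm in L2 norm.
    stddev = noise_multiplier * clip_norm
    return [(t + rng.gauss(0.0, stddev)) / len(updates) for t in total]
```

Since the noise is added once to the sum, its effect on the average shrinks as the number of users grows, which matches the abstract's point that large user populations make the utility cost small.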
Federated Learning is a machine learning setting where the goal is to train a high-quality centralized model while training data remains distributed over a large number of clients each with unreliable and relatively slow network connections. We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model. The typical clients in this setting are mobile phones, and communication efficiency is of the utmost importance. In this paper, we propose two ways to reduce the uplink communication costs: structured updates, where we directly learn an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, where we learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling before sending it to the server. Experiments on both convolutional and recurrent networks show that the proposed methods can reduce the communication cost by two orders of magnitude. * Work performed while also affiliated with University of Edinburgh.
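One ingredient of a sketched update, quantization, can be illustrated with a stochastic 1-bit scheme: each coordinate is rounded to either the minimum or the maximum of the update, with probabilities chosen so that the result is unbiased. This is a minimal sketch of that single step; the function names are hypothetical, and random rotations and subsampling are left out.

```python
import random


def quantize_1bit(update, rng):
    """Encode each coordinate as one bit plus the shared (min, max) range."""
    lo, hi = min(update), max(update)
    if hi == lo:
        return lo, hi, [0] * len(update)
    # Round up to hi with probability proportional to the distance from lo,
    # so that the expected decoded value equals the original coordinate.
    bits = [1 if rng.random() < (x - lo) / (hi - lo) else 0 for x in update]
    return lo, hi, bits


def dequantize(lo, hi, bits):
    """Decode the bits back into values in {lo, hi}."""
    return [hi if b else lo for b in bits]
```

Each client then uploads one bit per coordinate plus two floats, and the quantization error averages out when the server aggregates many such updates.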