联邦学习最近在机器学习中迅速发展,引起了各种研究主题。流行的优化算法基于(随机)梯度下降方法的框架或乘数的交替方向方法。在本文中,我们部署了一种确切的惩罚方法来处理联合学习,并提出了一种算法Fedepm,该算法能够解决联合学习中的四个关键问题:沟通效率,计算复杂性,Stragglers的效果和数据隐私。此外,事实证明,它具有收敛性和作证为具有高数值性能。
translated by 谷歌翻译
One of the crucial issues in federated learning is how to develop efficient optimization algorithms. Most of the current ones require full device participation and/or impose strong assumptions for convergence. Different from the widely-used gradient descent-based algorithms, in this paper, we develop an inexact alternating direction method of multipliers (ADMM), which is both computation- and communication-efficient, capable of combating the stragglers' effect, and convergent under mild conditions. Furthermore, it has a high numerical performance compared with several state-of-the-art algorithms for federated learning.
translated by 谷歌翻译
Federated learning has shown its advances recently but is still facing many challenges, such as how algorithms save communication resources and reduce computational costs, and whether they converge. To address these critical issues, we propose a hybrid federated learning algorithm (FedGiA) that combines the gradient descent and the inexact alternating direction method of multipliers. The proposed algorithm is more communication- and computation-efficient than several state-of-the-art algorithms theoretically and numerically. Moreover, it also converges globally under mild conditions.
translated by 谷歌翻译
联邦学习(FL)已成为一个热门研究领域,以在拥有敏感本地数据的多个客户中对机器学习模型进行协作培训。然而,主要使用随机梯度下降(SGD)研究了不受约束的联邦优化,该梯度下降可能会缓慢收敛,并且限制了联邦优化的优化,这更具挑战性,迄今尚未研究。本文分别研究了基于样本和基于特征的联合优化,并考虑了每个人的无限制和约束非凸问题。首先,我们建议使用随机连续的凸近似(SSCA)和迷你批次技术提出FL算法。这些算法可以充分利用目标和约束函数的结构,并逐步利用样品。我们表明,所提出的FL算法分别收敛到固定点和相应不受约束和约束的非凸问题的固定点和Karush-Kuhn-Tucker(KKT)点。接下来,我们提供算法示例,每回合具有吸引人的计算复杂性和通信负载。我们表明,未约束的联邦优化算法示例与动量SGD相同,与FL算法相同,并在SSCA和动量SGD之间提供分析连接。最后,数值实验证明了在收敛速度,通信和计算成本以及模型规范中提出算法的固有优势。
translated by 谷歌翻译
我们展示了一个联合学习框架,旨在强大地提供具有异构数据的各个客户端的良好预测性能。所提出的方法对基于SuperQualile的学习目标铰接,捕获异构客户端的误差分布的尾统计。我们提出了一种随机训练算法,其与联合平均步骤交织差异私人客户重新重量步骤。该提出的算法支持有限时间收敛保证,保证覆盖凸和非凸面设置。关于联邦学习的基准数据集的实验结果表明,我们的方法在平均误差方面与古典误差竞争,并且在误差的尾统计方面优于它们。
translated by 谷歌翻译
Federated learning (FL) has emerged as an instance of distributed machine learning paradigm that avoids the transmission of data generated on the users' side. Although data are not transmitted, edge devices have to deal with limited communication bandwidths, data heterogeneity, and straggler effects due to the limited computational resources of users' devices. A prominent approach to overcome such difficulties is FedADMM, which is based on the classical two-operator consensus alternating direction method of multipliers (ADMM). The common assumption of FL algorithms, including FedADMM, is that they learn a global model using data only on the users' side and not on the edge server. However, in edge learning, the server is expected to be near the base station and have direct access to rich datasets. In this paper, we argue that leveraging the rich data on the edge server is much more beneficial than utilizing only user datasets. Specifically, we show that the mere application of FL with an additional virtual user node representing the data on the edge server is inefficient. We propose FedTOP-ADMM, which generalizes FedADMM and is based on a three-operator ADMM-type technique that exploits a smooth cost function on the edge server to learn a global model parallel to the edge devices. Our numerical experiments indicate that FedTOP-ADMM has substantial gain up to 33\% in communication efficiency to reach a desired test accuracy with respect to FedADMM, including a virtual user on the edge server.
translated by 谷歌翻译
为了在带宽洪泛环境(例如无线网络)中启用大规模的机器学习,最近在设计借助通信压缩的帮助下,最近在设计沟通效率的联合学习算法方面取得了重大进展。另一方面,隐私保护,尤其是在客户层面上,是另一个重要的避税,在存在高级通信压缩技术的情况下尚未同时解决。在本文中,我们提出了一个统一的框架,以通过沟通压缩提高私人联邦学习的沟通效率。利用通用压缩操作员和局部差异隐私,我们首先检查了一种简单的算法,该算法将压缩直接应用于差异私密的随机梯度下降,并确定其局限性。然后,我们为私人联合学习提出了一个统一的框架Soteriafl,该框架适应了一般的局部梯度估计剂家庭,包括流行的随机方差减少梯度方法和最先进的变化压缩方案。我们在隐私,公用事业和沟通复杂性方面提供了其性能权衡的全面表征,在这种情况下,Soterafl被证明可以在不牺牲隐私或实用性的情况下实现更好的沟通复杂性,而不是其他私人联合联盟学习算法而没有沟通压缩。
translated by 谷歌翻译
This study investigates clustered federated learning (FL), one of the formulations of FL with non-i.i.d. data, where the devices are partitioned into clusters and each cluster optimally fits its data with a localized model. We propose a novel clustered FL framework, which applies a nonconvex penalty to pairwise differences of parameters. This framework can automatically identify clusters without a priori knowledge of the number of clusters and the set of devices in each cluster. To implement the proposed framework, we develop a novel clustered FL method called FPFC. Advancing from the standard ADMM, our method is implemented in parallel, updates only a subset of devices at each communication round, and allows each participating device to perform a variable amount of work. This greatly reduces the communication cost while simultaneously preserving privacy, making it practical for FL. We also propose a new warmup strategy for hyperparameter tuning under FL settings and consider the asynchronous variant of FPFC (asyncFPFC). Theoretically, we provide convergence guarantees of FPFC for general nonconvex losses and establish the statistical convergence rate under a linear model with squared loss. Our extensive experiments demonstrate the advantages of FPFC over existing methods.
translated by 谷歌翻译
联合学习(FL)能够通过定期聚合培训的本地参数来在多个边缘用户执行大的分布式机器学习任务。为了解决在无线迷雾云系统上实现支持的关键挑战(例如,非IID数据,用户异质性),我们首先基于联合平均(称为FedFog)的高效流行算法来执行梯度参数的本地聚合在云端的FOG服务器和全球培训更新。接下来,我们通过调查新的网络知识的流动系统,在无线雾云系统中雇用FEDFog,这促使了全局损失和完成时间之间的平衡。然后开发了一种迭代算法以获得系统性能的精确测量,这有助于设计有效的停止标准以输出适当数量的全局轮次。为了缓解级体效果,我们提出了一种灵活的用户聚合策略,可以先培训快速用户在允许慢速用户加入全局培训更新之前获得一定程度的准确性。提供了使用若干现实世界流行任务的广泛数值结果来验证FEDFOG的理论融合。我们还表明,拟议的FL和通信的共同设计对于在实现学习模型的可比准确性的同时,基本上提高资源利用是必要的。
translated by 谷歌翻译
Federated learning (FL), as a type of distributed machine learning, is capable of significantly preserving clients' private data from being exposed to adversaries. Nevertheless, private information can still be divulged by analyzing uploaded parameters from clients, e.g., weights trained in deep neural networks. In this paper, to effectively prevent information leakage, we propose a novel framework based on the concept of differential privacy (DP), in which artificial noises are added to parameters at the clients' side before aggregating, namely, noising before model aggregation FL (NbAFL). First, we prove that the NbAFL can satisfy DP under distinct protection levels by properly adapting different variances of artificial noises. Then we develop a theoretical convergence bound of the loss function of the trained FL model in the NbAFL. Specifically, the theoretical bound reveals the following three key properties: 1) There is a tradeoff between a convergence performance and privacy protection levels, i.e., better convergence performance leads to a lower protection level; 2) Given a fixed privacy protection level, increasing the number N of overall clients participating in FL can improve the convergence performance; and 3) There is an optimal number aggregation times (communication rounds) in terms of convergence performance for a given protection level. Furthermore, we propose a K-client random scheduling strategy, where K (1 ≤ K < N ) clients are randomly selected from the N overall clients to participate in each aggregation. We also develop a corresponding convergence bound for the loss function in this case and the K-client random scheduling strategy also retains the above three properties. Moreover, we find that there is an optimal K that achieves the best convergence performance at a
translated by 谷歌翻译
牛顿型方法由于其快速收敛而在联合学习中很受欢迎。尽管如此,由于要求将Hessian信息从客户发送到参数服务器(PS),因此他们遭受了两个主要问题:沟通效率低下和较低的隐私性。在这项工作中,我们介绍了一个名为Fednew的新颖框架,其中无需将Hessian信息从客户传输到PS,因此解决了瓶颈以提高沟通效率。此外,与现有的最新技术相比,Fednew隐藏了梯度信息,并导致具有隐私的方法。 Fednew中的核心小说想法是引入两个级别的框架,并在仅使用一种交替的乘数方法(ADMM)步骤更新逆Hessian级别产品之间,然后使用Newton的方法执行全局模型更新。尽管在每次迭代中只使用一个ADMM通行证来近似逆Hessian梯度产品,但我们开发了一种新型的理论方法来显示Fednew在凸问题上的融合行为。此外,通过利用随机量化,可以显着减少通信开销。使用真实数据集的数值结果显示了与现有方法相比,在通信成本方面,Fednew的优越性。
translated by 谷歌翻译
数据异构联合学习(FL)系统遭受了两个重要的收敛误差来源:1)客户漂移错误是由于在客户端执行多个局部优化步骤而引起的,以及2)部分客户参与错误,这是一个事实,仅一小部分子集边缘客户参加每轮培训。我们发现其中,只有前者在文献中受到了极大的关注。为了解决这个问题,我们提出了FedVarp,这是在服务器上应用的一种新颖的差异算法,它消除了由于部分客户参与而导致的错误。为此,服务器只是将每个客户端的最新更新保持在内存中,并将其用作每回合中非参与客户的替代更新。此外,为了减轻服务器上的内存需求,我们提出了一种新颖的基于聚类的方差降低算法clusterfedvarp。与以前提出的方法不同,FedVarp和ClusterFedVarp均不需要在客户端上进行其他计算或其他优化参数的通信。通过广泛的实验,我们表明FedVarp优于最先进的方法,而ClusterFedVarp实现了与FedVarp相当的性能,并且记忆要求较少。
translated by 谷歌翻译
从经验上证明,在跨客户聚集之前应用多个本地更新的实践是克服联合学习(FL)中的通信瓶颈的成功方法。在这项工作中,我们提出了一种通用食谱,即FedShuffle,可以更好地利用FL中的本地更新,尤其是在异质性方面。与许多先前的作品不同,FedShuffle在每个设备的更新数量上没有任何统一性。我们的FedShuffle食谱包括四种简单的功能成分:1)数据的本地改组,2)调整本地学习率,3)更新加权,4)减少动量方差(Cutkosky and Orabona,2019年)。我们对FedShuffle进行了全面的理论分析,并表明从理论和经验上讲,我们的方法都不遭受FL方法中存在的目标功能不匹配的障碍,这些方法假设在异质FL设置中,例如FedAvg(McMahan等人,McMahan等, 2017)。此外,通过将上面的成分结合起来,FedShuffle在Fednova上改善(Wang等,2020),以前提议解决此不匹配。我们还表明,在Hessian相似性假设下,通过降低动量方差的FedShuffle可以改善非本地方法。最后,通过对合成和现实世界数据集的实验,我们说明了FedShuffle中使用的四种成分中的每种如何有助于改善FL中局部更新的使用。
translated by 谷歌翻译
As a novel distributed learning paradigm, federated learning (FL) faces serious challenges in dealing with massive clients with heterogeneous data distribution and computation and communication resources. Various client-variance-reduction schemes and client sampling strategies have been respectively introduced to improve the robustness of FL. Among others, primal-dual algorithms such as the alternating direction of method multipliers (ADMM) have been found being resilient to data distribution and outperform most of the primal-only FL algorithms. However, the reason behind remains a mystery still. In this paper, we firstly reveal the fact that the federated ADMM is essentially a client-variance-reduced algorithm. While this explains the inherent robustness of federated ADMM, the vanilla version of it lacks the ability to be adaptive to the degree of client heterogeneity. Besides, the global model at the server under client sampling is biased which slows down the practical convergence. To go beyond ADMM, we propose a novel primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and biasness of the global model. In addition, FedVRA unifies several representative FL algorithms in the sense that they are either special instances of FedVRA or are close to it. Extensions of FedVRA to semi/un-supervised learning are also presented. Experiments based on (semi-)supervised image classification tasks demonstrate superiority of FedVRA over the existing schemes in learning scenarios with massive heterogeneous clients and client sampling.
translated by 谷歌翻译
联合学习(FL)算法通常在每个圆数(部分参与)大并且服务器的通信带宽有限时对每个轮子(部分参与)进行分数。近期对FL的收敛分析的作品专注于无偏见的客户采样,例如,随机均匀地采样,由于高度的系统异质性和统计异质性而均匀地采样。本文旨在设计一种自适应客户采样算法,可以解决系统和统计异质性,以最小化壁时钟收敛时间。我们获得了具有任意客户端采样概率的流动算法的新的遗传融合。基于界限,我们分析了建立了总学习时间和采样概率之间的关系,这导致了用于训练时间最小化的非凸优化问题。我们设计一种高效的算法来学习收敛绑定中未知参数,并开发低复杂性算法以大致解决非凸面问题。硬件原型和仿真的实验结果表明,与几个基线采样方案相比,我们所提出的采样方案显着降低了收敛时间。值得注意的是,我们的硬件原型的方案比均匀的采样基线花费73%,以达到相同的目标损失。
translated by 谷歌翻译
众所周知,客户师沟通可能是联邦学习中的主要瓶颈。在这项工作中,我们通过一种新颖的客户端采样方案解决了这个问题,我们将允许的客户数量限制为将其更新传达给主节点的数量。在每个通信回合中,所有参与的客户都会计算他们的更新,但只有具有“重要”更新的客户可以与主人通信。我们表明,可以仅使用更新的规范来衡量重要性,并提供一个公式以最佳客户参与。此公式将所有客户参与的完整更新与我们有限的更新(参与客户数量受到限制)之间的距离最小化。此外,我们提供了一种简单的算法,该算法近似于客户参与的最佳公式,该公式仅需要安全的聚合,因此不会损害客户的隐私。我们在理论上和经验上都表明,对于分布式SGD(DSGD)和联合平均(FedAvg),我们的方法的性能可以接近完全参与,并且优于基线,在参与客户均匀地采样的基线。此外,我们的方法与现有的减少通信开销(例如本地方法和通信压缩方法)的现有方法兼容。
translated by 谷歌翻译
联合优化(FedOpt),在大量分布式客户端协作培训学习模型的目标是对联邦学习的重要性。 Fedopt的主要问题可归因于模型分歧和通信效率,这显着影响了性能。在本文中,我们提出了一种新方法,即Losac,更有效地从异构分布式数据中学习。它的关键算法洞察力是在{每个}常规本地模型更新之后本地更新全局全梯度的估计。因此,Losac可以使客户的信息以更紧凑的方式刷新。特别是,我们研究了Losac的收敛结果。此外,Losac的奖金是能够从最近的技术泄漏梯度(DLG)中捍卫信息泄漏。最后,实验已经验证了与最先进的FedOpt算法比较Losac的优越性。具体而言,Losac平均超过100美元的价格提高了通信效率,减轻了模型分歧问题,并配备了对抗DLG的防御能力。
translated by 谷歌翻译
我们考虑开放的联合学习(FL)系统,客户可以在FL过程中加入和/或离开系统。鉴于当前客户端数量的差异,在开放系统中不能保证与固定模型的收敛性。取而代之的是,我们求助于一个新的性能指标,该指标称我们的开放式FL系统的稳定性为量,该指标量化了开放系统中学习模型的幅度。在假设本地客户端的功能强烈凸出和平滑的假设下,我们从理论上量化了两种FL算法的稳定性半径,即本地SGD和本地ADAM。我们观察到此半径依赖于几个关键参数,包括功能条件号以及随机梯度的方差。通过对合成和现实世界基准数据集的数值模拟,我们的理论结果得到了进一步验证。
translated by 谷歌翻译
当上行链路和下行链路通信都有错误时联合学习(FL)工作吗?通信噪音可以处理多少,其对学习性能的影响是什么?这项工作致力于通过明确地纳入流水线中的上行链路和下行链路嘈杂的信道来回答这些实际重要的问题。我们在同时上行链路和下行链路嘈杂通信通道上提供了多种新的融合分析,其包括完整和部分客户端参与,直接模型和模型差分传输,以及非独立和相同分布的(IID)本地数据集。这些分析表征了嘈杂通道的流动条件,使其具有与无通信错误的理想情况相同的融合行为。更具体地,为了保持FEDAVG的O(1 / T)具有完美通信的O(1 / T)收敛速率,应控制用于直接模型传输的上行链路和下行链路信噪比(SNR),使得它们被缩放为O(t ^ 2)其中T是通信轮的索引,但可以保持常量的模型差分传输。这些理论结果的关键洞察力是“雷达下的飞行”原则 - 随机梯度下降(SGD)是一个固有的噪声过程,并且可以容忍上行链路/下行链路通信噪声,只要它们不占据时变的SGD噪声即可。我们举例说明了具有两种广泛采用的通信技术 - 传输功率控制和多样性组合的这些理论发现 - 并通过使用多个真实世界流动任务的广泛数值实验进一步通过标准方法验证它们的性能优势。
translated by 谷歌翻译
Federated learning (FL) has achieved great success as a privacy-preserving distributed training paradigm, where many edge devices collaboratively train a machine learning model by sharing the model updates instead of the raw data with a server. However, the heterogeneous computational and communication resources of edge devices give rise to stragglers that significantly decelerate the training process. To mitigate this issue, we propose a novel FL framework named stochastic coded federated learning (SCFL) that leverages coded computing techniques. In SCFL, before the training process starts, each edge device uploads a privacy-preserving coded dataset to the server, which is generated by adding Gaussian noise to the projected local dataset. During training, the server computes gradients on the global coded dataset to compensate for the missing model updates of the straggling devices. We design a gradient aggregation scheme to ensure that the aggregated model update is an unbiased estimate of the desired global update. Moreover, this aggregation scheme enables periodical model averaging to improve the training efficiency. We characterize the tradeoff between the convergence performance and privacy guarantee of SCFL. In particular, a more noisy coded dataset provides stronger privacy protection for edge devices but results in learning performance degradation. We further develop a contract-based incentive mechanism to coordinate such a conflict. The simulation results show that SCFL learns a better model within the given time and achieves a better privacy-performance tradeoff than the baseline methods. In addition, the proposed incentive mechanism grants better training performance than the conventional Stackelberg game approach.
translated by 谷歌翻译