Federated Learning (FL) allows training machine learning models in privacy-constrained scenarios by enabling the cooperation of edge devices without requiring local data sharing. This approach raises several challenges due to the different statistical distributions of the local datasets and the clients' computational heterogeneity. In particular, the presence of highly non-i.i.d. data severely impairs both the performance of the trained neural network and its convergence rate, increasing the number of communication rounds required to reach a performance comparable to that of the centralized scenario. As a solution, we propose FedSeq, a novel framework leveraging the sequential training of subgroups of heterogeneous clients, i.e. superclients, to emulate the centralized paradigm in a privacy-compliant way. Given a fixed budget of communication rounds, we show that FedSeq outperforms or matches several state-of-the-art federated algorithms in terms of final performance and speed of convergence. Finally, our method can be easily integrated with other approaches available in the literature. Empirical results show that combining existing algorithms with FedSeq further improves their final performance and convergence speed. We test our method on CIFAR-10 and CIFAR-100 and demonstrate its effectiveness in both i.i.d. and non-i.i.d. scenarios.
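A minimal sketch of the superclient idea, under simplifying assumptions of our own: every model is a single scalar fitted to a toy quadratic loss `(w - target)^2` per client, superclients are given as fixed lists of client targets (the strategy for grouping heterogeneous clients into superclients is not reproduced), and the server weights each superclient by its number of clients. The helper names `local_sgd`, `train_superclient` and `fedseq_round` are hypothetical.

```python
from typing import List


def local_sgd(w: float, target: float, lr: float = 0.1, steps: int = 5) -> float:
    """Run a few gradient steps on the toy client loss (w - target)^2."""
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)
    return w


def train_superclient(w: float, client_targets: List[float]) -> float:
    """Pass one model sequentially through every client of a superclient."""
    for target in client_targets:
        w = local_sgd(w, target)
    return w


def fedseq_round(w_global: float, superclients: List[List[float]]) -> float:
    """One round: sequential training inside each superclient, then a
    weighted average across superclients (client count stands in for data size)."""
    total = sum(len(sc) for sc in superclients)
    return sum(len(sc) * train_superclient(w_global, sc) for sc in superclients) / total


w = 0.0
for _ in range(20):
    w = fedseq_round(w, [[-1.0, 1.0], [-2.0, 2.0]])
```

In this toy setting the averaged model settles at a point biased toward the last client in each chain rather than at the mean of all targets, a reminder that sequential training is sensitive to client order.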
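Models trained in federated settings often suffer from degraded performance and fail to generalize, especially when facing heterogeneous scenarios. In this work, we investigate this behavior through the lens of the geometry of the loss and the Hessian eigenspectrum, linking the model's lack of generalization capacity to the sharpness of the solution. Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights (SWA) on the server side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods with uniformly low loss, the model converges towards flatter minima and its generalization improves significantly in both homogeneous and heterogeneous scenarios. Empirical results demonstrate the effectiveness of these optimizers across a variety of benchmark vision datasets (e.g. CIFAR10/100, Landmarks-User-160K, IDDA) and tasks (large-scale classification, semantic segmentation, domain generalization).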
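The two ingredients named above can be sketched on plain Python lists. The sketch assumes a caller-supplied gradient function, an arbitrary neighborhood radius `rho = 0.05`, and an SWA that simply averages every global model it is given; the adaptive variant (ASAM) and the cyclical learning-rate schedule that usually accompanies SWA are not shown. `sam_step` and `SWA` are hypothetical names.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(w: Vector, grad: Callable[[Vector], Vector],
             lr: float = 0.1, rho: float = 0.05) -> Vector:
    """One Sharpness-Aware Minimization step.

    1. Perturb the weights toward the locally worst-case direction:
       eps = rho * g / ||g||.
    2. Descend using the gradient evaluated at the perturbed point w + eps.
    """
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
    eps = [rho * x / norm for x in g]
    g_adv = grad([wi + ei for wi, ei in zip(w, eps)])
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


class SWA:
    """Server-side running average of the global models seen so far."""

    def __init__(self) -> None:
        self.avg: Vector = []
        self.n = 0

    def update(self, w: Vector) -> Vector:
        if self.n == 0:
            self.avg = list(w)
        else:
            self.avg = [(a * self.n + x) / (self.n + 1) for a, x in zip(self.avg, w)]
        self.n += 1
        return self.avg
```

Clients would call `sam_step` inside their local loop, while the server feeds each new global model to `SWA.update` and keeps the running average as the model it reports.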
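Federated learning (FL) is an emerging technique for collaboratively training a global machine learning model while keeping data localized on user devices. The main obstacle to FL implementation is the non-independent and identically distributed (non-IID) data distribution across users, which slows convergence and degrades performance. To tackle this fundamental issue, we propose a method (ComFed) that enhances the whole training process on both the client and server sides. The key idea of ComFed is to simultaneously utilize client-variance reduction techniques to facilitate server aggregation and global adaptive update techniques to accelerate learning. Our experiments on the CIFAR-10 classification task show that ComFed can improve on state-of-the-art algorithms dedicated to non-IID data.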
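Clustered federated learning (FL) has been shown to produce promising results by grouping clients into clusters. This is especially effective in scenarios where separate groups of clients have significant differences in the distributions of their local data. Existing clustered FL algorithms essentially try to group clients with similar distributions together so that clients in the same cluster can leverage each other's data to better perform federated learning. However, prior clustered FL algorithms attempt to learn these distribution similarities indirectly during training, which can be quite time-consuming, as many rounds of federated learning may be required until the formation of clusters stabilizes. In this paper, we propose a new approach to federated learning that directly aims to efficiently identify distribution similarities among clients by analyzing the principal angles between the client data subspaces. Each client applies a truncated singular value decomposition (SVD) step on its local data in a one-shot manner to derive a small set of principal vectors, which provides a signature that succinctly captures the main characteristics of the underlying distribution. This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among clients to form clusters. This is achieved by comparing the similarities of the principal angles between the client data subspaces spanned by those principal vectors. The approach provides a simple yet effective clustered FL framework that addresses a broad range of data heterogeneity issues beyond simpler forms of non-IIDness such as label skew. Our clustered FL approach also enables convergence guarantees for non-convex objectives. Our code is available at https://github.com/mmorafah/pacfl.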
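A simplified sketch of the one-shot signature and the principal-angle comparison, assuming each client keeps only p = 1 principal vector (approximated here by power iteration on `D D^T` instead of a full truncated SVD) and that the server groups clients greedily with an angle threshold. The method described above uses a small set of principal vectors and its own clustering procedure, so treat the function names and the greedy rule as hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = features, columns = samples


def top_principal_vector(data: Matrix, iters: int = 200, seed: int = 0) -> List[float]:
    """Approximate the leading left singular vector of `data` by power
    iteration on data @ data.T (a stand-in for truncated SVD with p = 1)."""
    d = len(data)
    gram = [[sum(a * b for a, b in zip(data[i], data[j])) for j in range(d)]
            for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        v = [sum(gram[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def principal_angle(u: List[float], v: List[float]) -> float:
    """Angle (radians) between the 1-D subspaces spanned by unit vectors u and v."""
    cos = abs(sum(a * b for a, b in zip(u, v)))
    return math.acos(min(1.0, cos))


def cluster_by_angle(signatures: List[List[float]], threshold: float) -> List[int]:
    """Greedy clustering: a client joins the first cluster whose founding
    client's subspace lies within `threshold` radians; otherwise it opens a new one."""
    reps: List[List[float]] = []
    labels: List[int] = []
    for s in signatures:
        for k, r in enumerate(reps):
            if principal_angle(s, r) < threshold:
                labels.append(k)
                break
        else:
            reps.append(s)
            labels.append(len(reps) - 1)
    return labels
```

Only the short signature vectors leave the clients, which is what lets the server cluster in a single shot instead of waiting for cluster assignments to stabilize over many rounds.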
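Federated learning is a recent approach for training statistical models on distributed datasets without violating privacy constraints. The data locality principle is preserved by sharing the model, instead of the data, between clients and the server. This brings many advantages but also poses new challenges. In this report, we explore this new research area and perform several experiments to deepen our understanding of these challenges and of how different problem settings affect the performance of the final model. Finally, we present a novel approach to one of these challenges and compare it with other methods from the literature.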
The statistical heterogeneity of non-independent and identically distributed (non-IID) data in local clients significantly limits the performance of federated learning. Previous attempts such as FedProx, SCAFFOLD, MOON, FedNova and FedDyn take an optimization perspective, which requires an auxiliary term or re-weights local updates to calibrate the learning bias or the objective inconsistency. However, beyond these explorations for improving federated averaging, our analysis shows that another critical bottleneck is the poorer optima of client models under more heterogeneous conditions. We thus introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models across devices. We provide a theoretical analysis of the possible benefit of FedSkip and conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency and competitive communication efficiency. Source code is available at: https://github.com/MediaBrain-SJTU/FedSkip.
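The server-side control flow of periodic skipping can be sketched as follows. The sketch assumes a uniform average at aggregation rounds and a uniformly random permutation when scattering; the period, the scattering rule, and the names `average` and `fedskip_server_step` are illustrative choices, not taken from the FedSkip code.

```python
import random
from typing import List

Model = List[float]


def average(models: List[Model]) -> Model:
    """Uniform coordinate-wise average of flat parameter vectors."""
    return [sum(ws) / len(models) for ws in zip(*models)]


def fedskip_server_step(models: List[Model], round_idx: int,
                        skip_period: int, rng: random.Random) -> List[Model]:
    """Return the models to send back to the clients for the next round.

    Every `skip_period` rounds the server performs federated averaging;
    in the other rounds it skips aggregation and scatters (randomly permutes)
    the local models so that each client continues training a peer's model.
    """
    if (round_idx + 1) % skip_period == 0:
        avg = average(models)
        return [list(avg) for _ in models]
    order = list(range(len(models)))
    rng.shuffle(order)
    return [list(models[i]) for i in order]
```

Scattering exposes each local model to another client's data without averaging it away, which is the intuition behind improving the client optima.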
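As a promising privacy-preserving machine learning method, Federated Learning (FL) enables global model training across clients without compromising their confidential local data. However, existing FL methods suffer from low inference performance on unevenly distributed data, since most of them rely on Federated Averaging (FedAvg)-based aggregation. By averaging model parameters in a coarse manner, FedAvg eclipses the individual characteristics of local models, which strongly limits the inference capability of FL. Worse still, in each round of FL training FedAvg dispatches the same initial local model to all clients, which can easily confine the search for the optimal global model to a local region. To address these issues, this paper proposes a novel and effective FL paradigm named FedMR (Federated Model Recombination). Unlike conventional FedAvg-based methods, the cloud server of FedMR shuffles each layer of the collected local models and recombines them into new models for local training on clients. Owing to the fine-grained model recombination and local training in each FL round, FedMR can quickly find a globally optimal model for all clients. Comprehensive experimental results show that, compared with state-of-the-art FL methods, FedMR can significantly improve inference accuracy without incurring extra communication overhead.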
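A sketch of layer-wise model recombination, assuming each model is represented as an ordered list of layers (each a flat list of floats) and that the server draws an independent random permutation per layer. `recombine` is a hypothetical name; the paper's exact shuffling schedule may differ.

```python
import random
from typing import List

Layer = List[float]
Model = List[Layer]  # a model is an ordered list of layers


def recombine(local_models: List[Model], rng: random.Random) -> List[Model]:
    """Layer-wise recombination: for every layer index, shuffle that layer
    across the collected local models, then stitch the shuffled layers back
    together into new models (one per client)."""
    n = len(local_models)
    num_layers = len(local_models[0])
    new_models: List[Model] = [[] for _ in range(n)]
    for layer in range(num_layers):
        order = list(range(n))
        rng.shuffle(order)  # independent permutation for this layer
        for dst, src in enumerate(order):
            new_models[dst].append(list(local_models[src][layer]))
    return new_models
```

Because every layer index is permuted independently, each client generally receives a model stitched from several peers, while the pool of layer parameters is left unchanged, so no averaging takes place.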
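Federated learning (FL) enables training a global model without sharing the decentralized raw data stored on multiple devices, thereby protecting data privacy. Due to the diverse capacities of devices, FL frameworks struggle to tackle the problems of the straggler effect and outdated models. In addition, data heterogeneity causes severe accuracy degradation of the global model during FL training. To address these issues, we propose a hierarchical synchronous FL framework, FedHiSyn. FedHiSyn first clusters all available devices into a small number of categories according to their computing capacity. After a certain interval of local training, the models trained in different categories are simultaneously uploaded to a central server. Within a single category, devices communicate their locally updated model weights to each other according to a ring topology. Since training in a ring topology is most efficient among devices with homogeneous resources, the capacity-based categorization mitigates the impact of the straggler effect. Moreover, the combination of synchronous updates across multiple categories and device communication within a single category helps address the data heterogeneity problem while achieving high accuracy. We evaluate the proposed framework on the MNIST, EMNIST, CIFAR10 and CIFAR100 datasets under different heterogeneous device settings. Experimental results show that FedHiSyn outperforms six baseline methods, such as FedAvg, SCAFFOLD and FedAT, in terms of training accuracy and efficiency.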
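The two structural ideas, capacity-based categories and ring passing inside a category, can be sketched as below. Assumptions: categories are equal-sized slices of devices sorted by capacity, a single model travels around each ring (in the framework described above every device may hold and forward its own model), and the server averages category models uniformly. `train` is a caller-supplied placeholder for local training, and all function names are hypothetical.

```python
from typing import Callable, Dict, List

TrainFn = Callable[[List[float], str], List[float]]


def group_by_capacity(capacities: Dict[str, float], num_categories: int) -> List[List[str]]:
    """Sort devices by computing capacity and split them into roughly
    equal-sized categories of similar speed."""
    ranked = sorted(capacities, key=capacities.get)
    size = -(-len(ranked) // num_categories)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def ring_pass(model: List[float], ring: List[str], train: TrainFn) -> List[float]:
    """Pass one model around a ring: each device trains it on its local
    data, then forwards it to the next device."""
    for device in ring:
        model = train(model, device)
    return model


def hisyn_round(global_model: List[float], categories: List[List[str]],
                train: TrainFn) -> List[float]:
    """Each category trains a copy of the global model along its ring; the
    server then synchronously averages the category models (uniform weights)."""
    results = [ring_pass(list(global_model), ring, train) for ring in categories]
    return [sum(ws) / len(results) for ws in zip(*results)]
```

Grouping devices of similar speed keeps each ring from waiting on a much slower member, which is the straggler argument made in the abstract.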
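With growing concern for data privacy and the rapidly increasing volume of data, Federated Learning (FL) has become an important learning paradigm. However, jointly learning a deep neural network model in an FL setting proves to be a non-trivial task because of the complexities associated with neural networks, such as varied architectures across clients, the permutation invariance of neurons, and the presence of non-linear transformations in each layer. This work introduces a novel Federated Heterogeneous Neural Networks (FedHeNN) framework that allows each client to build a personalized model without enforcing a common architecture across clients. This allows each client to optimize with respect to its local data and compute constraints while still benefiting from the learnings of other (potentially more powerful) clients. The key idea of FedHeNN is to use instance-level representations obtained from peer clients to guide the simultaneous training on each client. Extensive experimental results demonstrate that the FedHeNN framework is capable of learning better-performing models on clients in settings with both homogeneous and heterogeneous architectures across clients.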
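An often-cited challenge of federated learning is the presence of data heterogeneity: data at different clients may follow very different distributions. Several federated optimization methods have been proposed to address these challenges. In the literature, empirical evaluations usually start federated training from random initialization. However, in many practical applications of federated learning, the server has access to proxy data for the training task that can be used to pre-train a model before starting federated training. We empirically study the impact of starting from a pre-trained model in federated learning using four common federated learning benchmark datasets. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of more accurate models (by up to 40%) than is possible when starting from random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of data heterogeneity. Instead, when starting from a pre-trained model, using an adaptive optimizer at the server, such as FedAdam, consistently leads to the best accuracy. We recommend that future work proposing and evaluating federated optimization methods consider performance when starting from both random and pre-trained initializations. We also believe this study raises several questions for further understanding the role of heterogeneity in federated optimization.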
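For reference, a server-side adaptive optimizer of the FedAdam kind treats the average client change as a pseudo-gradient and applies an Adam-style update to the global model. The sketch below follows that formulation without bias correction; the hyperparameter values are arbitrary and `FedAdamServer` is a hypothetical name.

```python
import math
from typing import List


class FedAdamServer:
    """Server-side Adam applied to the averaged client update (pseudo-gradient)."""

    def __init__(self, params: List[float], lr: float = 0.01,
                 beta1: float = 0.9, beta2: float = 0.99, tau: float = 1e-3) -> None:
        self.x = list(params)
        self.lr, self.beta1, self.beta2, self.tau = lr, beta1, beta2, tau
        self.m = [0.0] * len(params)  # first-moment estimate
        self.v = [0.0] * len(params)  # second-moment estimate

    def step(self, client_params: List[List[float]]) -> List[float]:
        # Pseudo-gradient: average of (client model - current server model).
        delta = [sum(c[i] for c in client_params) / len(client_params) - self.x[i]
                 for i in range(len(self.x))]
        for i, d in enumerate(delta):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * d
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * d * d
            # tau keeps the step bounded when v is tiny.
            self.x[i] += self.lr * self.m[i] / (math.sqrt(self.v[i]) + self.tau)
        return self.x
```

Starting from a pre-trained model only changes the `params` passed to the constructor; the server update rule stays the same.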
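Federated learning (FL) is a distributed learning approach that offers medical institutions the prospect of collaborating on a global model while preserving patient privacy. Although most medical centers perform similar medical imaging tasks, their differences, such as specializations, number of patients, and devices, lead to distinctive data distributions. Data heterogeneity poses a challenge for FL and for the personalization of local models. In this work, we investigate an adaptive hierarchical clustering method for FL that produces intermediate semi-global models, so that clients with similar data distributions have the chance to form more specialized models. Our method forms several clusters consisting of the clients with the most similar data distributions; each cluster then continues to train separately. Within each cluster, we use meta-learning to improve the personalization of the participants' models. We compare the clustering approach with classical FedAvg and centralized training by evaluating our proposed method on the HAM10K dataset with an extremely heterogeneous data distribution. Our experiments show a significant gain in classification accuracy on heterogeneous distributions compared with standard FL methods. Moreover, we show that the models converge faster when applied in clusters and outperform centralized training while using only a small subset of the data.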
The heterogeneity of hardware and data is a well-known and studied problem in the Federated Learning (FL) community when running under heterogeneous settings. Recently, custom-size client models trained with Knowledge Distillation (KD) have emerged as a viable strategy for tackling the heterogeneity challenge. However, previous efforts in this direction are aimed at client model tuning rather than at their impact on the knowledge aggregation of the global model. Although the performance of global models is the primary objective of FL systems, client models have received more attention under heterogeneous settings. Here, we provide more insights into how the chosen approach for training custom client models impacts the global model, which is essential for any FL application. We show that the global model can fully leverage the strength of KD with heterogeneous data. Driven by empirical observations, we further propose a new approach that combines KD and Learning without Forgetting (LwoF) to produce improved personalised models. We bring heterogeneous FL on par with the mighty FedAvg of homogeneous FL, in realistic deployment scenarios with dropping clients.
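The soft-target loss at the core of knowledge distillation can be written in a few lines. This is the generic temperature-scaled formulation (KL divergence of the student's softened distribution from the teacher's, multiplied by T^2), not the paper's specific combination of KD and LwoF; the default temperature is an arbitrary choice.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-softened softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so that
    gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

Because the loss only compares output distributions, teacher and student can have different architectures, which is what makes KD attractive for custom-size client models.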
The growing interest in intelligent services and privacy protection for mobile devices has given rise to the widespread application of federated learning in Multi-access Edge Computing (MEC). Diverse user behaviors call for personalized services with heterogeneous Machine Learning (ML) models on different devices. Federated Multi-task Learning (FMTL) has been proposed to train related but personalized ML models for different devices, but previous works suffer from excessive communication overhead during training and neglect the model heterogeneity among devices in MEC. Introducing knowledge distillation into FMTL can simultaneously enable efficient communication and model heterogeneity among clients, but existing methods rely on a public dataset, which is impractical in reality. To tackle this dilemma, we propose Federated MultI-task Distillation for Multi-access Edge CompuTing (FedICT). FedICT keeps local and global knowledge apart during the bi-directional distillation processes between clients and the server, aiming to enable multi-task clients while alleviating the client drift that arises from the divergent optimization directions of client-side local models. Specifically, FedICT includes Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD reinforces the clients' fitting of local data by introducing prior knowledge of local data distributions. Moreover, LKA corrects the distillation loss of the server, making the transferred local knowledge better match the generalized representation. Experiments on three datasets show that FedICT significantly outperforms all compared benchmarks in various data-heterogeneous and model-architecture settings, achieving improved accuracy with less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT.
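The uneven distribution of local data across different edge devices (clients) results in slow model training and reduced accuracy in federated learning. Naive federated learning (FL) strategies and most alternative solutions attempt to achieve more fairness by weighting deep learning models across clients. This work introduces a novel non-IID type encountered in real-world datasets, namely cluster-skew, in which groups of clients have local data with similar distributions, causing the global model to converge to an over-fitted solution. To deal with non-IID data, particularly cluster-skewed data, we propose FedDRL, a novel FL model that employs deep reinforcement learning to adaptively determine each client's impact factor (which is used as its weight in the aggregation process). Extensive experiments on a suite of federated datasets confirm that the proposed FedDRL improves favorably over the FedAvg and FedProx methods, e.g., by up to 4.05% and 2.17% on average, respectively, on the CIFAR-100 dataset.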
Federated learning (FL) is a method to train a model with distributed data from numerous participants such as IoT devices. It inherently assumes a uniform capacity among participants. However, in practice participants have diverse computational resources due to different conditions, such as different energy budgets or executing parallel unrelated tasks. It is necessary to reduce the computation overhead for participants with insufficient computational resources; otherwise they would be unable to finish the full training process. To address this computation heterogeneity, in this paper we propose a strategy for estimating local models without computationally intensive iterations. Based on it, we propose Computationally Customized Federated Learning (CCFL), which allows each participant to decide in each round, based on its current computational resources, whether to perform conventional local training or model estimation. Both theoretical analysis and exhaustive experiments indicate that CCFL has the same convergence rate as FedAvg without resource constraints. Furthermore, CCFL can be viewed as a computation-efficient extension of FedAvg that retains model performance while considerably reducing computation overhead.
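Since Federated Learning (FL) was introduced as a decentralized learning technique with privacy preservation, the statistical heterogeneity of distributed data has been a major obstacle to achieving robust performance and stable convergence in FL applications. Model personalization methods have been studied to overcome this problem. However, existing approaches mainly assume fully labeled data, which is unrealistic in practice because labeling requires expertise. The main problem caused by the partially labeled condition is that clients with insufficient labeled data may suffer from unfair performance gains, because they lack adequate insight into their local distribution to customize the global model. To tackle this problem, 1) we propose a novel personalized semi-supervised learning paradigm that allows partially labeled or unlabeled clients to seek labeling assistance from data-related clients (helper agents), thereby enhancing their perception of local data; 2) based on this paradigm, we design an uncertainty-based data-relation metric to ensure that the selected helpers provide trustworthy pseudo labels rather than misleading the local training; 3) to mitigate the network overload introduced by helper searching, we further develop a helper selection protocol that achieves efficient communication with negligible performance sacrifice. Experiments show that our proposed method obtains superior performance and more stable convergence than other related works with partially labeled data, especially in highly heterogeneous settings.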
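Classical federated learning approaches suffer significant performance degradation when participants' data are non-IID. When the distribution of each local dataset is very different from the global one, the local objective of each client is inconsistent with the global optimum, which causes drift in the local updates. This phenomenon severely impacts the performance of clients, which is problematic because the primary incentive for clients to participate in federated learning is to obtain better personalized models. To address this issue, we present a new algorithm, FLIS, which groups the client population into clusters with jointly trainable data distributions by leveraging the inference similarity of clients' models. This framework captures settings in which different groups of users have their own objectives (learning tasks), and each group benefits from aggregating its data with others in the same cluster (same learning task) to perform more efficient and personalized federated learning. We present experimental results demonstrating the benefits of FLIS over state-of-the-art baselines on the CIFAR-100/10, SVHN, and FMNIST datasets. Our code is available at https://github.com/mmorafah/flis.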
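A simplified sketch of inference-similarity clustering, assuming the server holds a small unlabeled dataset, collects each client model's hard predictions on it, measures similarity as the fraction of agreeing predictions, and links clients above a threshold into connected components. FLIS's own similarity measure and clustering step may differ; `agreement` and `cluster_by_inference` are hypothetical names.

```python
from typing import Dict, List


def agreement(preds_a: List[int], preds_b: List[int]) -> float:
    """Fraction of server-side samples on which two client models predict the same label."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def cluster_by_inference(preds: List[List[int]], threshold: float) -> List[int]:
    """Link clients whose agreement is >= threshold; return connected-component labels."""
    n = len(preds)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if agreement(preds[i], preds[j]) >= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    roots: Dict[int, int] = {}
    labels: List[int] = []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

Comparing models by what they predict, rather than by their raw weights, sidesteps the permutation symmetries that make weight-space distances unreliable for neural networks.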
Federated Learning (FL) is extensively used to train AI/ML models in distributed and privacy-preserving settings. Participant edge devices in FL systems typically hold non-independent and identically distributed (Non-IID) private data and unevenly distributed computational resources. Preserving user data privacy while optimizing AI/ML models in a heterogeneous federated network requires us to address both data heterogeneity and system/resource heterogeneity. Hence, we propose Resource-aware Federated Learning (RaFL) to address these challenges. RaFL allocates resource-aware models to edge devices using Neural Architecture Search (NAS) and allows heterogeneous model architecture deployment through knowledge extraction and fusion. Integrating NAS into FL enables on-demand customized model deployment for resource-diverse edge devices. Furthermore, we propose a multi-model architecture fusion scheme that allows the aggregation of the distributed learning results. Results demonstrate RaFL's superior resource efficiency compared with state-of-the-art (SoTA) methods.
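Federated learning enables different parties to collaboratively build a global model under the orchestration of a server while keeping the training data on clients' devices. However, performance suffers when clients have heterogeneous data. To cope with this, we assume that despite data heterogeneity, there are groups of clients whose data distributions can be clustered. In previous approaches, to cluster clients, the server requires clients to send their parameters simultaneously. However, this can be problematic in a context with a significant number of participants that may have limited availability. To prevent such a bottleneck, we propose FLIC (Federated Learning with Incremental Clustering), in which the server exploits the updates sent by clients during federated training instead of asking them to send their parameters simultaneously. Hence, no additional communication between the server and the clients is necessary beyond what classical federated learning requires. We empirically demonstrate for various non-IID cases that our approach successfully splits clients into groups following the same data distributions. We also identify the limitations of FLIC by studying its ability to partition clients efficiently in the early stages of the federated learning process. We further address attacks on models as a form of data heterogeneity and empirically show that FLIC is a robust defense against poisoning attacks even when the proportion of malicious clients is higher than 50%.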
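Incremental clustering from updates that clients already send can be sketched as an online assignment rule. Assumptions: updates are flat vectors, similarity is cosine similarity to a running-mean centroid, and a fixed threshold decides whether an update opens a new cluster. `IncrementalClusters` is a hypothetical name, and this rule is an illustration rather than FLIC's actual procedure.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class IncrementalClusters:
    """Assign each client update to a cluster as it arrives: join the most
    similar centroid if its cosine similarity is at least `threshold`,
    otherwise open a new cluster."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def assign(self, update: List[float]) -> int:
        best, best_sim = -1, -1.0
        for k, c in enumerate(self.centroids):
            s = cosine(update, c)
            if s > best_sim:
                best, best_sim = k, s
        if best >= 0 and best_sim >= self.threshold:
            n = self.counts[best]
            # Update the running mean of the chosen cluster's centroid.
            self.centroids[best] = [(c * n + u) / (n + 1)
                                    for c, u in zip(self.centroids[best], update)]
            self.counts[best] += 1
            return best
        self.centroids.append(list(update))
        self.counts.append(1)
        return len(self.centroids) - 1
```

Because assignment happens one update at a time, clients never need to be online simultaneously; the price is that early assignments rest on few observations, consistent with the early-stage limitation noted above.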
An oft-cited open problem of federated learning is the existence of data heterogeneity at the clients. One pathway to understanding the drastic accuracy drop in federated learning is to scrutinize the behavior of the clients' deep models on data with different levels of "difficulty", which has been left unaddressed. In this paper, we investigate a different and rarely studied dimension of FL: ordered learning. Specifically, we aim to investigate how ordered learning principles can contribute to alleviating the heterogeneity effects in FL. We present theoretical analysis and conduct extensive empirical studies on the efficacy of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random curriculum. We find that curriculum learning largely alleviates non-IIDness. Interestingly, the more disparate the data distributions across clients, the more they benefit from ordered learning. We provide analysis explaining this phenomenon, specifically indicating how curriculum training appears to make the objective landscape progressively less convex, suggesting fast converging iterations at the beginning of the training procedure. We derive quantitative convergence results for both convex and nonconvex objectives by modeling the curriculum training on federated devices as local SGD with locally biased stochastic gradients. Also, inspired by ordered learning, we propose a novel client selection technique that benefits from the real-world disparity among the clients. Our proposed approach to client selection has a synergistic effect when applied together with ordered learning in FL.
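The three orderings compared above can be sketched as follows, assuming each sample already has a difficulty score (for example, its loss under some scoring model) and that a linear pacing function controls how much of the ordered data is exposed at each step. The scoring model, the pacing schedule, and the names `order_samples` and `paced_subset` are assumptions, not details taken from the paper.

```python
import random
from typing import List, Sequence


def order_samples(scores: Sequence[float], mode: str, seed: int = 0) -> List[int]:
    """Return sample indices ordered for training.

    curriculum:      easiest (lowest score) first
    anti-curriculum: hardest (highest score) first
    random:          a random permutation
    """
    idx = list(range(len(scores)))
    if mode == "curriculum":
        return sorted(idx, key=lambda i: scores[i])
    if mode == "anti-curriculum":
        return sorted(idx, key=lambda i: scores[i], reverse=True)
    if mode == "random":
        random.Random(seed).shuffle(idx)
        return idx
    raise ValueError(f"unknown mode: {mode}")


def paced_subset(order: List[int], step: int, total_steps: int,
                 start_frac: float = 0.2) -> List[int]:
    """Linear pacing function: the available fraction of the ordered data
    grows from `start_frac` to 1 over `total_steps`."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    k = max(1, int(round(frac * len(order))))
    return order[:k]
```

Each client would apply the same ordering and pacing to its own local data, so the federated rounds themselves are unchanged; only the order in which local SGD sees samples differs.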