Smart Paper Notes

TxAllo: Dynamic Transaction Allocation in Sharded Blockchain Systems

Yuanzhe Zhang , Shirui Pan , Jiangshan Yu

Category: Artificial Intelligence

2022-12-22

The scalability problem has been one of the most significant barriers limiting the adoption of blockchains. Blockchain sharding is a promising approach to this problem. However, the sharding mechanism introduces a significant number of cross-shard transactions, which are expensive to process. This paper focuses on the transaction allocation problem to reduce the number of cross-shard transactions for better scalability. In particular, we systematically formulate the transaction allocation problem and convert it to the community detection problem on a graph. A deterministic and fast allocation scheme TxAllo is proposed to dynamically infer the allocation of accounts and their associated transactions. It directly optimizes the system throughput, considering both the number of cross-shard transactions and the workload balance among shards. We evaluate the performance of TxAllo on an Ethereum dataset containing over 91 million transactions. Our evaluation results show that for a blockchain with 60 shards, TxAllo reduces the cross-shard transaction ratio from 98% (by using traditional hash-based allocation) to about 12%. In the meantime, the workload balance is well maintained. Compared with other methods, the execution time of TxAllo is almost negligible. For example, when updating the allocation every hour, the execution of TxAllo only takes 0.5 seconds on average, whereas other concurrent works, such as BrokerChain (INFOCOM'22) leveraging the classic METIS method, require 422 seconds.
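Note: the 98% baseline quoted above is close to what uniformly random placement predicts, since two accounts land in the same one of k shards with probability about 1/k, giving a cross-shard ratio of roughly 1 - 1/60 ≈ 0.983. The sketch below checks this on a synthetic workload. It is only an illustration under stated assumptions: the SHA-256 mapping, the account names, and the random transactions are hypothetical and are not the paper's Ethereum dataset or the TxAllo algorithm.

```python
import hashlib
import random


def shard_of(account: str, num_shards: int) -> int:
    """Hash-based allocation: map an account identifier to a shard index."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def cross_shard_ratio(transactions, num_shards: int) -> float:
    """Fraction of (sender, receiver) pairs whose accounts sit in different shards."""
    cross = sum(
        1
        for sender, receiver in transactions
        if shard_of(sender, num_shards) != shard_of(receiver, num_shards)
    )
    return cross / len(transactions)


# Synthetic workload (hypothetical): random transfers between distinct accounts.
rng = random.Random(0)
accounts = [f"acct-{i}" for i in range(5000)]
transactions = [tuple(rng.sample(accounts, 2)) for _ in range(20000)]

num_shards = 60
ratio = cross_shard_ratio(transactions, num_shards)
expected = 1 - 1 / num_shards  # prediction for uniformly random placement
print(f"measured={ratio:.3f} expected~{expected:.3f}")
```

Real transaction graphs are far from uniform, which is what community-detection-style allocation exploits: accounts that transact often are placed together, so fewer edges cross shard boundaries.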

More Recent Advances in (Hyper)Graph Partitioning

Ümit V. Çatalyürek , Karen D. Devine , Marcelo Fonseca Faraj , Lars Gottesbüren , Tobias Heuer , Henning Meyerhenke , Peter Sanders , Sebastian Schlag , Christian Schulz , Daniel Seemaier

Category: Machine Learning

2022-05-26

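In recent years, significant advances have been made in the design and evaluation of balanced (hyper)graph partitioning algorithms. We survey trends of the past decade in practical algorithms for balanced (hyper)graph partitioning, together with future research directions. Our work serves as an update to a previous survey on the topic. In particular, the survey extends the previous one by also covering hypergraph partitioning and streaming algorithms, with an additional focus on parallel algorithms.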

How Much Does It Cost to Train a Machine Learning Model over Distributed Data Sources?

Elia Guerra , Francesc Wilhelmi , Marco Miozzo , Paolo Dini

Category: Machine Learning

2022-09-15

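Federated learning (FL) is one of the most appealing alternatives to the standard centralized learning paradigm, allowing a heterogeneous set of devices to train a machine learning model without sharing their raw data. However, FL requires a central server to coordinate the learning process, which introduces potential scalability and security issues. In the literature, server-less FL approaches such as gossip federated learning (GFL) and blockchain-enabled federated learning (BFL) have been proposed to mitigate these issues. In this work, we present a complete overview of these three techniques and compare them according to an integral set of performance indicators, including model accuracy, time complexity, communication overhead, convergence time, and energy consumption. An extensive simulation campaign permits a quantitative analysis. In particular, GFL is able to save 18% of the training time, 68% of the energy, and 51% of the data to be shared with respect to the centralized (CFL) solution, but it is unable to reach the accuracy level of CFL. On the other hand, BFL represents a viable solution for implementing decentralized learning with a higher level of security, at the cost of extra energy usage and data sharing. Finally, we identify open issues in the two decentralized federated learning implementations and provide insights on potential extensions and possible research directions in this new research field.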

APPFLChain: A Privacy Protection Distributed Artificial-Intelligence Architecture Based on Federated Learning and Consortium Blockchain

Jun-Teng Yang , Wen-Yuan Chen , Che-Hua Li , Scott C. -H. Huang , Hsiao-Chun Wu

Category: Artificial Intelligence | Machine Learning

2022-06-26

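Recent research on the Internet of Things has been widely applied in industrial practice, driving exponential growth in data and connected devices. Consequently, data-driven AI models will be accessed by different parties through certain data-sharing policies. However, most current training procedures rely on a centralized data-collection strategy and a single computational server, and such a centralized scheme may lead to many issues. Customer data stored in a centralized database may be tampered with, so the provenance and authenticity of the data cannot be guaranteed. Once such security concerns arise, the credibility of the trained AI models becomes questionable, and unfavorable outcomes may even be produced at the test stage. Recently, blockchain and AI, two core technologies of Industry 4.0 and Web 3.0, have been explored to facilitate decentralized AI training strategies. To serve this purpose, we propose a new system architecture called APPFLChain, an integrated architecture of a Hyperledger Fabric-based blockchain and the federated-learning paradigm. Our proposed system allows different parties to jointly train AI models, with their customers or stakeholders connected by a consortium blockchain-based network. Because users do not need to share sensitive personal information with the server, the system can maintain a high degree of security and privacy. For numerical evaluation, we simulate a real-world scenario to illustrate the whole operational process of APPFLChain. Simulation results show that, by taking advantage of the characteristics of consortium blockchain and federated learning, APPFLChain can demonstrate favorable properties including tamper resistance, traceability, privacy protection, and reliable decision-making.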

Partition-Tolerant and Byzantine-Tolerant Decision-Making for Distributed Robotic Systems with IOTA and ROS 2

Farhad Keramat , Jorge Peña Queralta , Tomi Westerlund

Category: Robotics

2022-08-29

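With autonomous robotic solutions becoming increasingly ubiquitous, interest in their connectivity and in cooperation within multi-robot systems is rising. Two aspects of current research are robot security and secure multi-robot collaboration that is robust to Byzantine agents. Blockchain and other distributed ledger technologies (DLTs) have been proposed to address challenges in both domains. However, some key challenges remain, including scalability and deployment in real-world networks. This paper presents an approach to integrating IOTA and ROS 2 for more scalable DLT-based robotic systems while allowing for network-partition tolerance after deployment. To the best of our knowledge, this is the first implementation of IOTA smart contracts for robotic systems and the first integrated design with ROS 2, in contrast to the vast majority of the literature, which relies on Ethereum. We present a general IOTA+ROS 2 architecture leading to a partition-tolerant decision-making process that also inherits Byzantine-tolerance properties from the embedded blockchain structures. We demonstrate the effectiveness of the proposed framework for a cooperative mapping application in a system with intermittent network connectivity. We show superior performance with respect to Ethereum in the presence of network partitions, and a low impact in terms of computational resource utilization. These results open the path for wider integration of blockchain solutions in distributed robotic systems with less stringent connectivity and computing requirements.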

Learning, Computing, and Trustworthiness in Intelligent IoT Environments: Performance-Energy Tradeoffs

Beatriz Soret , Lam D. Nguyen , Jan Seeger , Arne Bröring , Chaouki Ben Issaid , Sumudu Samarakoon , Anis El Gabli , Vivek Kulkarni , Mehdi Bennis , Petar Popovski

Category: Artificial Intelligence

2021-10-04

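An Intelligent IoT Environment (iIoTe) is comprised of heterogeneous devices that can collaboratively execute semi-autonomous IoT applications, examples of which include highly automated manufacturing cells or autonomously interacting harvesting machines. Energy efficiency is key in such edge environments, since they are often based on an infrastructure that consists of wireless and battery-run devices, e.g., e-tractors, drones, Automated Guided Vehicles (AGVs), and robots. The total energy consumption draws contributions from multiple iIoTe technologies that enable edge computing and communication, distributed learning, as well as distributed ledgers and smart contracts. This paper provides a state-of-the-art overview of these technologies and illustrates their functionality and performance, with special attention to the tradeoff among resources, latency, privacy, and energy consumption. Finally, the paper provides a vision for integrating these enabling technologies in energy-efficient iIoTe and a roadmap to address the open research challenges.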

FAIR-BFL: Flexible and Incentive Redesign for Blockchain-based Federated Learning

Rongxin Xu , Shiva Raj Pokhrel , Qiujun Lan , Gang Li

Category: Artificial Intelligence

2022-06-26

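Vanilla federated learning (FL) relies on a centralized global aggregation mechanism and assumes that all clients are honest. This makes it a challenge for FL to alleviate the single point of failure and dishonest clients. These impending challenges in the design philosophy of FL call for blockchain-based federated learning (BFL), owing to the benefits of coupling FL and blockchain (e.g., democracy, incentive, and immutability). However, one problem in vanilla BFL is that its capabilities do not follow adopters' needs in a dynamic fashion. In addition, vanilla BFL relies on unverifiable self-reported client contributions, such as data size, because checking clients' raw data is not allowed in FL for privacy reasons. We design and evaluate a novel BFL framework with greater flexibility and an incentive mechanism, called FAIR-BFL, that resolves the challenges identified in vanilla BFL. In contrast to existing works, FAIR-BFL offers unprecedented flexibility via its modular design, allowing adopters to adjust its capabilities following business demands in a dynamic fashion. Our design accounts for BFL's ability to quantify each client's contribution to the global learning process. Such quantification provides a rational metric for distributing rewards among federated clients and helps discover malicious participants that may poison the global model.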

Blockchain-enabled Server-less Federated Learning

Francesc Wilhelmi , Lorenza Giupponi , Paolo Dini

Category: Machine Learning

2021-12-15

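Motivated by the heterogeneous nature of devices participating in large-scale federated learning (FL) optimization, we focus on an asynchronous server-less FL solution empowered by blockchain (BC) technology. In contrast to mostly adopted FL approaches, which assume synchronous operation, we advocate an asynchronous method whereby model aggregation is done as clients submit their local updates. The asynchronous setting fits well with the federated optimization idea in practical large-scale settings with heterogeneous clients. Thus, it potentially leads to higher efficiency in terms of communication overhead and idle periods. To evaluate the learning completion delay of BC-enabled FL, we provide an analytical model based on batch service queue theory. Furthermore, we provide simulation results to assess the performance of both synchronous and asynchronous mechanisms. Important aspects involved in BC-enabled FL optimization, such as the network size, link capacity, or user requirements, are put together and analyzed. As our results show, the synchronous setting leads to higher prediction accuracy than the asynchronous case. Nevertheless, asynchronous federated optimization provides much lower latency in many cases, thus becoming an appealing FL solution when dealing with large datasets, tough timing constraints (e.g., near-real-time applications), or highly varying training data.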

Edge-Native Intelligence for 6G Communications Driven by Federated Learning: A Survey of Trends and Challenges

Mohammad Al-Quraan , Lina Mohjazi , Lina Bariah , Anthony Centeno , Ahmed Zoha , Sami Muhaidat , Mérouane Debbah , Muhammad Ali Imran

Category: Artificial Intelligence

2021-11-14

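The unprecedented surge in data volume in wireless networks empowered with artificial intelligence (AI) opens up new horizons for providing ubiquitous data-driven intelligent services. Traditional cloud-centric machine learning (ML)-based services are implemented by collecting datasets and training models centrally. However, this conventional training technique encompasses two challenges: (i) high communication and energy costs due to increased data communication, and (ii) threatened data privacy by allowing untrusted parties to utilize this information. Recently, in light of these limitations, a new emerging technique, coined federated learning (FL), has arisen to bring ML to the edge of wireless networks. FL can extract the benefits of data silos by training a global model in a distributed manner, orchestrated by the FL server. FL exploits both the decentralized datasets and the computing resources of participating clients to develop a generalized ML model without compromising data privacy. In this article, we present a comprehensive survey of the fundamentals and enabling technologies of FL. Moreover, an extensive study is presented detailing various applications of FL in wireless networks and highlighting their challenges and limitations. The efficacy of FL is further explored with emerging prospective beyond-fifth-generation (B5G) and sixth-generation (6G) communication systems. The purpose of this survey is to provide an overview of the state of the art of FL applications in key wireless technologies, which will serve as a foundation for establishing a firm understanding of the topic. Lastly, we offer a road forward for future research directions.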

An Efficient and Reliable Asynchronous Federated Learning Scheme for Smart Public Transportation

Chenhao Xu , Youyang Qu , Tom H. Luan , Peter W. Eklund , Yong Xiang , Longxiang Gao

Category: Machine Learning

2022-08-15

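Machine learning (ML) is a distributed approach for training predictive models on the Internet of Vehicles (IoV) to enable smart public transportation. Since traffic conditions change over time, the ML models that predict traffic flows and passenger waiting times must be updated continuously and efficiently. Federated learning (FL) is a distributed machine learning scheme that allows vehicles to receive continuous model updates without having to upload raw data to the cloud and wait for models to be trained. However, FL in smart public transportation is vulnerable to poisoning or DDoS attacks, since vehicles travel in public. Besides, due to device heterogeneity and imbalanced data distributions, the synchronized aggregation strategy that collects local models from specific vehicles before aggregation is inefficient. Although asynchronous federated learning (AFL) schemes have been developed to improve efficiency by aggregating local models as soon as they are received, stale local models remain unreasonably weighted, resulting in poor learning performance. To enable smarter public transportation, this paper offers a blockchain-based asynchronous federated learning scheme with a dynamic scaling factor (DBAFL). Specifically, a novel committee-based consensus algorithm for blockchain improves reliability at the lowest possible time cost. Meanwhile, the devised dynamic scaling factor allows AFL to assign reasonable weights to stale local models. Extensive experiments conducted on heterogeneous devices validate the superior learning performance, efficiency, and reliability of DBAFL.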
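Note: the abstract does not give the form of DBAFL's dynamic scaling factor, so the sketch below only illustrates the general idea of down-weighting stale local models in asynchronous aggregation. The polynomial discount, the `base` and `decay` parameters, and the function names are all assumptions made for illustration, not the paper's method.

```python
def staleness_weight(staleness: int, base: float = 0.5, decay: float = 0.5) -> float:
    """Assumed polynomial discount: the older the local model, the smaller its weight."""
    return base / (1.0 + max(0, staleness)) ** decay


def async_update(global_model, local_model, global_round: int, local_round: int):
    """Mix one local model into the global model as soon as it arrives."""
    # Staleness = how many global rounds passed since the client fetched the model.
    alpha = staleness_weight(global_round - local_round)
    return [(1.0 - alpha) * g + alpha * l for g, l in zip(global_model, local_model)]


# A fresh update (staleness 0) moves the global model more than a stale one (staleness 3).
fresh = async_update([0.0, 0.0], [1.0, 2.0], global_round=5, local_round=5)
stale = async_update([0.0, 0.0], [1.0, 2.0], global_round=5, local_round=2)
print(fresh, stale)
```

With these assumed defaults the fresh update is mixed in with weight 0.5 and the three-round-old update with weight 0.25, so late arrivals still contribute without dominating the global model.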

Deep Learning-Driven Edge Video Analytics: A Survey

Renjie Xu , Saiedeh Razavi , Rong Zheng

Category: Computer Vision | Machine Learning

2022-11-28

Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.

A Comprehensive Survey on Distributed Training of Graph Neural Networks

Haiyang Lin , Mingyu Yan , Xiaochun Ye , Dongrui Fan , Shirui Pan , Wenguang Chen , Yuan Xie

Category: Machine Learning

2022-11-10

Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training which distributes the workload of training across multiple computing nodes. However, the workflows, computational patterns, communication patterns, and optimization techniques of distributed GNN training remain preliminarily understood. In this paper, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to their workflows. In addition, their computational patterns and communication patterns, as well as the optimization techniques proposed by recent work are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.

A Marketplace for Trading AI Models based on Blockchain and Incentives for IoT Data

Lam Duc Nguyen , Shashi Raj Pandey , Soret Beatriz , Arne Broering , Petar Popovski

Category: Machine Learning

2021-12-06

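As machine learning (ML) models are becoming more complex, one of the central challenges is their deployment at scale, such that companies and organizations can create value through artificial intelligence (AI). An emerging paradigm in ML is a federated approach in which the learning model is partially delivered to a group of heterogeneous agents, allowing the agents to train the model locally with their own data. However, the problem of model valuation, as well as the questions of incentives for collaborative training and trading of data/models, have received limited treatment in the literature. This paper proposes a new ecosystem for ML model trading over a trusted blockchain-based network. The buyer can acquire the model of interest from the ML market, and interested sellers spend local computation on their data to enhance that model's quality. In doing so, the proportional relation between the local data and the quality of the trained models is considered, and the valuation of the sellers' data used in training is estimated through the distributed Data Shapley Value (DSV). At the same time, the trustworthiness of the entire trading process is provided by distributed ledger technology (DLT). Extensive experimental evaluation of the proposed approach shows competitive run-time performance, with a 15% decrease reported with respect to the incentives for participants.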
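Note: the abstract names a distributed Data Shapley Value but does not describe its estimator. As background only, the sketch below computes the classical, exact Shapley value by enumerating all orderings of a toy set of sellers. The seller names, data sizes, and the square-root utility are hypothetical, and exact enumeration costs n! evaluations, so it is usable only for very small n.

```python
import math
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley value: average marginal contribution over all join orders."""
    values = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            values[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    return {p: v / len(orders) for p, v in values.items()}


# Hypothetical sellers and data sizes; utility has diminishing returns in total data.
data = {"a": 100, "b": 100, "c": 400}


def utility(coalition) -> float:
    return math.sqrt(sum(data[p] for p in coalition))


values = shapley_values(list(data), utility)
print(values)
```

Two properties make this a natural basis for splitting rewards: the values sum to the utility of the full coalition, and sellers with identical data receive identical shares.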

Federated Learning in Mobile Edge Networks: A Comprehensive Survey

Wei Yang Bryan Lim , Nguyen Cong Luong , Dinh Thai Hoang , Yutao Jiao , Ying-Chang Liang , Qiang Yang , Dusit Niyato , Chunyan Miao

Category:

2019-09-26

In recent years, mobile devices are equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications, e.g., for medical purposes and in vehicular networks. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislations and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates rather than raw data to the server for aggregation. FL can serve as an enabling technology in mobile edge networks since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, in a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved. This raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.
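Note: the aggregation step described above (devices send model updates, the server combines them) can be sketched as a sample-count-weighted average, the rule commonly known as federated averaging. The abstract does not fix a particular aggregation rule, so the weighting by sample count and the names below are assumptions for illustration.

```python
def federated_average(updates):
    """Combine client models; `updates` is a list of (num_samples, weights) pairs.

    Each client's weight vector counts in proportion to the number of local
    samples it was trained on. Raw data never leaves the clients.
    """
    total = sum(num_samples for num_samples, _ in updates)
    dim = len(updates[0][1])
    return [
        sum(num_samples * weights[i] for num_samples, weights in updates) / total
        for i in range(dim)
    ]


# Two hypothetical clients: the second holds three times as much data.
client_updates = [(10, [1.0, 2.0]), (30, [3.0, 6.0])]
global_weights = federated_average(client_updates)
print(global_weights)
```

Here the aggregated model lies three quarters of the way toward the second client's weights, reflecting its larger share of the data.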

Secure and Efficient Federated Learning Through Layering and Sharding Blockchain

Shuo Yuan , Bin Cao , Yao Sun , Mugen Peng

Category: Artificial Intelligence

2021-04-27

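Federated learning (FL) has emerged as an essential technique for digital twins in industrial Internet of Things (IIoT) networks. However, due to the master/slave structure of FL, it is very challenging to resist single-point failure of the master aggregator and attacks from malicious IIoT devices while guaranteeing model convergence speed and accuracy. Recently, blockchain has been brought into FL systems, transforming the paradigm into a decentralized manner and thus further improving system security and learning reliability. Unfortunately, because of huge resource consumption, limited transaction throughput, and high communication complexity, the traditional consensus mechanisms and architectures of blockchain systems can hardly handle large-scale FL tasks or run on IIoT devices. To address these issues, this paper proposes a two-layer blockchain-driven FL system, called ChainFL, which splits the IIoT network into multiple shards as the subchain layer to limit the scale of information exchange, and adopts a Directed Acyclic Graph (DAG)-based mainchain as the mainchain layer to achieve parallel and asynchronous cross-shard validation. Furthermore, the FL procedure is customized to integrate deeply with blockchain technology, and a modified DAG consensus mechanism is proposed to mitigate the distortion caused by abnormal models. To provide a proof-of-concept implementation and evaluation, multiple subchains based on Hyperledger Fabric and a self-developed DAG-based mainchain are deployed. Extensive experimental results show that the proposed ChainFL system outperforms existing main FL systems in terms of acceptable and fast training efficiency (by up to 14%) and stronger robustness (by up to 3 times).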

Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

Tianxu Li , Kun Zhu , Nguyen Cong Luong , Dusit Niyato , Qihui Wu , Yang Zhang , Bing Chen

Category: Artificial Intelligence | Machine Learning

2021-10-26

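The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet becomes heterogeneous and decentralized with a large number of involved network entities. Each entity may need to make its local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL) have recently been used to enable each network entity, as an agent, to learn an optimal decision-making policy adaptively through interacting with the unknown environment. However, such algorithms fail to model cooperation or competition among network entities and simply treat other entities as part of the environment, which may result in the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also other entities' policies. As a result, MARL can significantly improve the learning efficiency of the network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we thus review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. Then, we review a number of applications of MARL to solve emerging issues in the future Internet. These issues include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV networks, and network security.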

Beyond 5G Networks: Integration of Communication, Computing, Caching, and Control

Musbahu Mohammed Adam , Liqiang Zhao , Kezhi Wang , Zhu Han

Category: Machine Learning

2022-12-26

In recent years, the exponential proliferation of smart devices with their intelligent applications poses severe challenges on conventional cellular networks. Such challenges can be potentially overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends of both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resources integration. Then, we discuss integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond 5G networks, such as 6G.

Blockchain-based Recommender Systems: Applications, Challenges and Future Opportunities

Yassine Himeur , Aya Sayed , Abdullah Alsalemi , Faycal Bensaali , Abbes Amira , Iraklis Varlamis , Magdalini Eirinaki , Christos Sardianos , George Dimitrakopoulos

Category: Machine Learning

2021-11-22

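Recommender systems have been widely used in different application domains, including energy preservation, e-commerce, healthcare, social media, etc. Such applications require the analysis and mining of massive amounts of various types of user data, including demographics, preferences, social interactions, etc., in order to develop accurate and precise recommender systems. Such datasets often include sensitive information, yet most recommender systems focus on model accuracy and ignore issues related to security and user privacy. Despite efforts to overcome these problems using different risk-reduction techniques, none of them has been completely successful in ensuring cryptographic security and protection of users' private information. To bridge this gap, blockchain technology is presented as a promising strategy to promote security and privacy preservation in recommender systems, not only because of its salient security and privacy features, but also due to its resilience, adaptability, fault tolerance, and trust characteristics. This paper presents a holistic review of blockchain-based recommender systems covering challenges, open issues, and solutions. Accordingly, a well-designed taxonomy is introduced to describe the security and privacy challenges, overview existing frameworks, and discuss their applications and benefits when using blockchain, before indicating opportunities for future research.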

A Fast Blockchain-based Federated Learning Framework with Compressed Communications

Laizhong Cui , Xiaoxin Su , Yipeng Zhou

Category: Machine Learning

2022-08-12

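Recently, blockchain-based federated learning (BFL) has attracted intensive research attention because the training process is auditable and the architecture is serverless, avoiding the single point of failure of the parameter server in vanilla federated learning (VFL). Nevertheless, BFL tremendously escalates the communication traffic volume because all local model updates (i.e., changes of model parameters) obtained by BFL clients are transferred to all miners for verification and to all clients for aggregation. In contrast, the parameter server and clients in VFL only retain aggregated model updates. Consequently, the huge communication traffic in BFL will inevitably impair training efficiency and hinder the deployment of BFL in practice. To improve the practicality of BFL, we are among the first to propose a fast blockchain-based federated learning framework that compresses communications in BFL, called BCFL. Meanwhile, we derive the convergence rate of BCFL with non-convex loss. To maximize the final model accuracy, we further formulate the problem of minimizing the training loss of the convergence rate subject to a limited training time with respect to the compression rate and the block generation rate, which is a bi-convex optimization problem and can be solved efficiently. Finally, to demonstrate the efficiency of BCFL, we carry out extensive experiments with the standard CIFAR-10 and FEMNIST datasets. Our experimental results not only verify the correctness of our analysis but also show that BCFL can reduce communication traffic by 95-98% or shorten the training time by 90-95% compared with BFL.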

Tutel: Adaptive Mixture-of-Experts at Scale

Changho Hwang , Wei Cui , Yifan Xiong , Ziyue Yang , Ze Liu , Han Hu , Zilong Wang , Rafael Salas , Jithin Jose , Prabhat Ram

Category: Natural Language Processing | Computer Vision

2022-06-07

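In recent years, Mixture-of-Experts (MoE) has emerged as a promising deep learning technique that can scale model capacity to trillion-plus parameters while reducing computing cost via sparse computation. While MoE opens a new frontier of exceedingly large models, its implementation over thousands of GPUs has been limited due to the mismatch between the dynamic nature of MoE and the static parallelism/pipelining of the system. We present Tutel, a highly scalable stack design and implementation for MoE with dynamically adaptive parallelism and pipelining. Tutel delivers adaptive parallelism switching and adaptive pipelining at runtime, which achieve 1.74x and 2.00x single MoE layer speedup, respectively. We also propose a novel two-dimensional hierarchical algorithm for MoE communication speedup that outperforms the previous state of the art on 2,048 GPUs. Aggregating all techniques, Tutel finally delivers 4.96x and 5.75x speedup on 16 GPUs and 2,048 GPUs, respectively, over Fairseq, Meta's Facebook AI Research Sequence-to-Sequence Toolkit (Tutel is now partially adopted by Fairseq). Tutel source code is publicly available: https://github.com/microsoft/tutel. Our evaluation shows that Tutel efficiently and effectively runs a real-world MoE-based model named SwinV2-MoE, built upon Swin Transformer V2, a state-of-the-art computer vision architecture. On efficiency, Tutel accelerates SwinV2-MoE, achieving 1.55x and 2.11x speedup in training and inference over Fairseq, respectively. On effectiveness, the SwinV2-MoE model achieves superior accuracy in both pre-training and downstream computer vision tasks such as COCO object detection compared with the counterpart dense model, indicating the readiness of Tutel for end-to-end real-world model training and inference. SwinV2-MoE is open sourced at https://github.com/microsoft/swin-transformer.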
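Note: the "sparse computation" that makes MoE cheap comes from routing each token to only a few experts. The sketch below shows generic top-k gating in plain Python as background. It is not Tutel's implementation or API, and the toy scalar experts and router logits are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(token, router_logits, experts, k=2):
    """Run only the selected experts; the remaining experts cost nothing."""
    routes = top_k_route(router_logits, k)
    return sum(gate * experts[i](token) for i, gate in routes)


# Four toy scalar "experts" (hypothetical); only two are evaluated per token.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
routes = top_k_route([0.0, 2.0, 2.0, -1.0], k=2)
output = moe_forward(3.0, [0.0, 2.0, 2.0, -1.0], experts, k=2)
print(routes, output)
```

Because the set of selected experts changes per token and per step, the load on each expert is dynamic, which is the mismatch with static parallelism that the abstract describes.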
