To make machine learning (ML) sustainable and fit to run on the diverse devices where the relevant data resides, it is essential to compress ML models as needed, while still meeting the required learning quality and time performance. However, how much and when an ML model should be compressed, and {\em where} its training should be executed, are hard decisions to make, as they depend on the model itself, the resources of the available nodes, and the data such nodes own. Existing studies focus on each of those aspects individually; however, they do not account for how such decisions can be made jointly and adapted to one another. In this work, we model the network system focusing on the training of DNNs, formalize the above multi-dimensional problem, and, given its NP-hardness, formulate an approximate dynamic programming problem that we solve through the PACT algorithmic framework. Importantly, PACT leverages a time-expanded graph representing the learning process, and a data-driven and theoretical approach to predicting the loss evolution to be expected as a consequence of training decisions. We prove that PACT's solutions can get as close to the optimum as desired, at the cost of an increased time complexity, and that, in any case, such complexity is polynomial. Numerical results also show that, even under the most disadvantageous settings, PACT outperforms state-of-the-art alternatives and closely matches the optimal energy cost.
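To make the time-expanded-graph idea concrete, here is a minimal Python sketch (not the PACT framework itself): training is discretized into epochs, every (training location, compression level) pair becomes a vertex in each layer of the graph, and a dynamic program picks the cheapest per-epoch choices that reach a target loss. All node names, energy costs, and predicted loss reductions below are illustrative assumptions.

```python
# Minimal sketch of a DP over a time-expanded graph (illustrative values only).
energy = {("edge", "full"): 9.0, ("edge", "pruned"): 4.0,       # Joules per epoch (assumed)
          ("cloud", "full"): 6.0, ("cloud", "pruned"): 3.0}
loss_drop = {("edge", "full"): 0.30, ("edge", "pruned"): 0.18,  # predicted loss decrease (assumed)
             ("cloud", "full"): 0.30, ("cloud", "pruned"): 0.18}

def cheapest_plan(epochs, start_loss, target_loss):
    """One graph layer per epoch; a vertex per (node, compression) choice;
    edge weight = energy of that epoch. Returns the cheapest feasible plan."""
    frontier = {round(start_loss, 3): (0.0, [])}  # reached loss -> (energy, plan)
    for _ in range(epochs):
        nxt = {}
        for loss, (cost, plan) in frontier.items():
            for choice, e in energy.items():
                new_loss = round(max(loss - loss_drop[choice], 0.0), 3)
                cand = (cost + e, plan + [choice])
                if new_loss not in nxt or cand[0] < nxt[new_loss][0]:
                    nxt[new_loss] = cand
        frontier = nxt
    feasible = [v for loss, v in frontier.items() if loss <= target_loss]
    return min(feasible) if feasible else None

print(cheapest_plan(epochs=4, start_loss=1.0, target_loss=0.2))
```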
The traditional approach to distributed machine learning is to adapt learning algorithms to the network, e.g., by reducing updates to curb overhead. Networks based on intelligent edge, instead, make it possible to follow the opposite approach, i.e., to define the logical network topology around the learning task to be performed, so as to meet the desired learning performance. In this paper, we propose a system model that captures such aspects in the context of supervised machine learning, accounting for both learning nodes (that perform computations) and information nodes (that provide data). We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of iterations to perform, so as to minimize the learning cost while meeting the target prediction error and execution time. After proving important properties of the above problem, we devise an algorithm, named DoubleClimb, that can find a 1+1/|I|-competitive solution (with I being the set of information nodes) with cubic worst-case complexity. Our performance evaluation, leveraging a real-world network topology and considering both classification and regression tasks, also shows that DoubleClimb closely matches the optimum, outperforming state-of-the-art alternatives.
Combinatorial optimization is a well-established area of operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input due to their invariance to permutations and their awareness of input sparsity. This paper presents a conceptual review of recent key advancements in this emerging field, aimed at both optimization and machine learning researchers.
This paper surveys the recent attempts, both from the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art algorithms rely on handcrafted heuristics for making decisions that are otherwise too expensive to compute or mathematically not well defined. Thus, machine learning looks like a natural candidate to make such decisions in a more principled and optimized way. We advocate for pushing further the integration of machine learning and combinatorial optimization and detail a methodology to do so. A main point of the paper is seeing generic optimization problems as data points and inquiring what is the relevant distribution of problems to use for learning on a given task.
In recent years, deep learning (DL) models have demonstrated remarkable achievements on non-trivial tasks such as speech recognition and natural language understanding. One of the significant contributors to their success is the proliferation of end devices, which act as a catalyst by providing data for data-hungry DL models. However, the computation required for DL training and inference remains a major challenge. Usually, central cloud servers are used for the computation, but this opens up other significant challenges, such as high latency, increased communication costs, and privacy concerns. To mitigate these drawbacks, considerable efforts have been made to push the processing of DL models to edge servers. Moreover, the confluence of DL and the edge has given rise to edge intelligence (EI). This survey paper focuses primarily on the fifth level of EI, called the all in-edge level, where DL training and inference (deployment) are performed solely by edge servers. All in-edge is suitable when the end devices have low computing resources, e.g., Internet-of-Things devices, and when other requirements such as latency and communication cost are important, as in mission-critical applications, e.g., health care. Firstly, this paper presents all in-edge computing architectures, including centralized, decentralized, and distributed ones. Secondly, it presents enabling technologies, such as model parallelism and split learning, which facilitate DL training and deployment at edge servers. Thirdly, model adaptation techniques based on model compression and conditional computation are described, because standard cloud-based DL deployment cannot be applied directly at the all in-edge level due to its limited computational resources. Fourthly, the paper discusses eleven key performance metrics to efficiently evaluate the performance of DL at the all in-edge level. Finally, several open research challenges in the area of all in-edge are presented.
Solving constraint optimization problems (COPs) can be dramatically simplified by boundary estimation, that is, by providing tight bounds on the cost function. By feeding a supervised machine learning (ML) model with data composed of known bounds and features extracted from COPs, the model can be trained to estimate bounds for new COP instances. In this paper, we first give an overview of the existing body of knowledge on ML for constraint programming (CP), which learns from problem instances. Second, we introduce a boundary estimation framework that is applied as a tool to support a CP solver. Within this framework, different ML models are discussed and evaluated regarding their suitability for bound estimation, along with countermeasures to avoid infeasible estimates that would prevent the solver from finding an optimal solution. Third, we present an experimental study on seven COPs with different CP solvers. Our results show that near-optimal bounds can be learned for these COPs. These estimated bounds reduce the objective domain size by 60-88% and can help the solver find near-optimal solutions early during the search.
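As a rough illustration of the framework's idea (not the paper's actual models or features), one can train a regressor on instance features and known objective values, then hand a slightly relaxed prediction to the solver as a bound. In the sketch below, the synthetic features, the choice of regressor, and the safety margin are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Assumed instance features (e.g., #variables, #constraints, mean domain size)
# and the known optimal objective values of previously solved instances.
X_train = rng.uniform(10, 100, size=(200, 3))
y_train = 2.0 * X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(0, 5, 200)

model = GradientBoostingRegressor().fit(X_train, y_train)

def estimated_upper_bound(features, margin=0.10):
    """Predict the optimum and inflate it by a safety margin so that an
    underestimate does not cut off the true optimum (for minimization)."""
    return float(model.predict([features])[0]) * (1.0 + margin)

# The bound would then be posted to the CP solver, shrinking the objective domain.
print(estimated_upper_bound([50.0, 40.0, 12.0]))
```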
Deep learning has demonstrated remarkable effectiveness in a wide variety of tasks and holds the potential to advance a multitude of applications, including edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that, while the application of deep models often incurs substantial memory and computational costs, edge devices typically offer only very limited storage and computing capabilities that may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require only little storage, and incur only low computational overhead. This survey offers comprehensive coverage of techniques for the automated design of deep learning models for edge computing. It provides an overview and comparison of key metrics commonly used to quantify models in terms of effectiveness, lightness, and computational cost. The survey then covers three categories of state-of-the-art design automation techniques for deep models: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open issues and directions for future research.
In recent years, mobile devices have been equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications, e.g., for medical purposes and in vehicular networks. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislation and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates, rather than raw data, to the server for aggregation. FL can serve as an enabling technology in mobile edge networks since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, in a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved. This raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.
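The core FL loop described above, i.e., local training followed by aggregation of model updates rather than raw data, can be pictured with a minimal NumPy FedAvg round. This is a generic sketch, not the survey's code; the synthetic client data and the linear model are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(4)]
global_w = np.zeros(5)

def local_update(w, X, y, lr=0.01, epochs=5):
    """Client-side gradient descent on a linear model; only the resulting
    weights leave the device, never the raw data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

for _ in range(10):  # one FedAvg round: broadcast, local training, weighted average
    updates = [local_update(global_w, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    global_w = np.average(updates, axis=0, weights=sizes)

print(global_w)
```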
While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature, which can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques, as well as their potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs, and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
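As a toy illustration of two of the surveyed techniques, post-training quantization and magnitude pruning, the NumPy sketch below quantizes a weight matrix to 8-bit integers and zeroes out the smallest-magnitude weights. The sparsity level and bit width are illustrative assumptions, not recommendations from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)

def quantize_int8(w):
    """Symmetric per-tensor quantization to int8 with a single scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q * scale

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights until the target sparsity is reached."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

q, scale = quantize_int8(W)
W_pruned = magnitude_prune(W, sparsity=0.8)
print("max quantization error:", np.abs(W - q * scale).max())
print("remaining weights:", np.count_nonzero(W_pruned), "of", W.size)
```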
This digital book contains a practical and comprehensive introduction to everything related to deep learning in the context of physical simulations. As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks to quickly get started. Beyond standard supervised learning from data, we will look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, as well as reinforcement learning and uncertainty modeling. We live in exciting times: these methods have a huge potential to fundamentally change what computer simulations can achieve.
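The "differentiable simulation" idea can be condensed into a few lines: unroll a numerical solver, track the derivative of its output with respect to a physical parameter, and use that gradient to fit the parameter. The sketch below is a toy illustration written for this summary (not taken from the book); the decay equation, step sizes, and learning rate are assumptions.

```python
def simulate_with_grad(k, u0=1.0, dt=0.1, steps=10):
    """Explicit-Euler simulation of du/dt = -k*u that also propagates du/dk,
    i.e., forward-mode differentiation through the unrolled solver."""
    u, du_dk = u0, 0.0
    for _ in range(steps):
        du_dk = (1.0 - k * dt) * du_dk - dt * u  # chain rule through one step
        u = u * (1.0 - k * dt)
    return u, du_dk

# Assumed "measurement": the final state produced by the same solver with k = 1.5.
target, _ = simulate_with_grad(1.5)

k = 0.3  # initial guess for the unknown decay rate
for _ in range(200):  # gradient descent through the differentiable simulation
    u_end, du_dk = simulate_with_grad(k)
    k -= 0.5 * 2.0 * (u_end - target) * du_dk
print(round(k, 3))  # recovers the assumed rate of 1.5
```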
Convergence bounds are one of the main tools to obtain information on the performance of a distributed machine learning task, before running the task itself. In this work, we perform a set of experiments to assess to which extent, and in which way, such bounds can predict and improve the performance of real-world distributed (namely, federated) learning tasks. We find that, as can be expected given the way they are obtained, bounds are quite loose and their relative magnitude reflects the training rather than the testing loss. More unexpectedly, we find that some of the quantities appearing in the bounds turn out to be very useful to identify the clients that are most likely to contribute to the learning process, without requiring the disclosure of any information about the quality or size of their datasets. This suggests that further research is warranted on the ways -- often counter-intuitive -- in which convergence bounds can be exploited to improve the performance of real-world distributed learning tasks.
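The client-selection idea can be sketched generically: compute a per-client score from quantities that appear in the bound and pick the top-scoring clients for the next round, without inspecting their data. The scores below are random stand-ins; the actual bound-derived quantities are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed per-client scores standing in for the bound-derived quantities.
scores = {f"client_{i}": float(rng.gamma(2.0, 1.0)) for i in range(10)}

def select_clients(scores, fraction=0.3):
    """Pick the clients with the highest scores for the next round,
    without ever requiring access to their raw data."""
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(select_clients(scores))
```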
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
In this paper, a new methodology is proposed that allows for the low-complexity development of neural network (NN) based equalizers for the mitigation of impairments in high-speed coherent optical transmission systems. In this work, we provide a comprehensive description and comparison of various deep model compression approaches that have been applied to feed-forward and recurrent NN designs. Additionally, we evaluate the influence of these strategies on the performance of each NN equalizer. Quantization, weight clustering, pruning, and other cutting-edge strategies for model compression are taken into account. In this work, we propose and evaluate Bayesian-optimization-assisted compression, in which the compression hyperparameters are chosen to simultaneously reduce complexity and improve performance. Finally, the trade-off between the complexity of each compression approach and its performance is evaluated using both simulated and experimental data to complete the analysis. By utilizing the optimal compression approach, we show that it is possible to design an NN-based equalizer that performs better than the conventional digital back-propagation (DBP) equalizer with only one step. This is accomplished by reducing the number of multipliers used in the NN equalizer after applying the weight clustering and pruning algorithms. Furthermore, we demonstrate that an NN-based equalizer can also achieve superior performance while still maintaining the same complexity as the full electronic chromatic dispersion compensation block. We conclude our analysis by highlighting open questions and existing challenges, as well as future research directions.
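To illustrate the kind of search that Bayesian-optimization-assisted compression performs, here is a minimal sketch that scores (pruning ratio, number of weight clusters) pairs with a combined complexity-plus-error objective. The grid search stands in for the Bayesian optimizer, and the cost model, weights, and layer sizes are assumptions, not the paper's.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))  # stand-in for an NN equalizer layer

def compress(w, prune_ratio, n_clusters):
    """Magnitude pruning followed by crude weight clustering on the survivors."""
    w = np.where(np.abs(w) >= np.quantile(np.abs(w), prune_ratio), w, 0.0)
    nz = w[w != 0]
    centers = np.quantile(nz, np.linspace(0, 1, n_clusters))   # crude cluster centers
    idx = np.abs(nz[:, None] - centers[None, :]).argmin(axis=1)
    w[w != 0] = centers[idx]
    return w

def objective(prune_ratio, n_clusters, alpha=0.5):
    wc = compress(W, prune_ratio, n_clusters)
    error = np.linalg.norm(W - wc) / np.linalg.norm(W)          # proxy for equalization penalty
    complexity = (wc != 0).mean() * np.log2(n_clusters) / 32.0  # proxy for multiplier count
    return alpha * error + (1 - alpha) * complexity

# Stand-in for the Bayesian optimizer: exhaustive search over a small grid.
grid = itertools.product([0.3, 0.5, 0.7, 0.9], [4, 8, 16, 32])
best = min(grid, key=lambda cfg: objective(*cfg))
print("best (prune_ratio, n_clusters):", best)
```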
The classical problem of \textit{constrained path finding} is a well-studied but challenging topic, with applications in various areas such as communication and transportation. The Weight Constrained Shortest Path Problem (WCSPP), as the base form of constrained path finding with only one side constraint, aims to plan a cost-optimal path whose weight/resource usage is limited. Given the bi-criteria nature of the problem (i.e., dealing with the cost and weight of paths), methods addressing the WCSPP have some properties in common with bi-objective search. This paper leverages recent state-of-the-art A*-based techniques in both constrained path finding and bi-objective search and presents two exact solution approaches for the WCSPP, both capable of solving hard problem instances on very large graphs. We empirically evaluate the performance of our algorithms on new large and realistic problem instances and show their advantages over state-of-the-art algorithms in both time and space metrics. This paper also investigates the importance of priority queues in constrained search with A*. We show through extensive experiments on both realistic and randomized graphs how bucket-based queues without tie-breaking can effectively improve the algorithmic performance of exhaustive bi-criteria searches.
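A plain sketch of the WCSPP principle (not the paper's A*-based solvers) is a label-setting search: labels (cost, weight, node) are expanded in cost order, weight-infeasible extensions are discarded, and dominated labels are pruned. The toy graph and its edge costs/weights below are assumptions.

```python
import heapq

def wcspp(graph, source, target, weight_limit):
    """Exact label-setting search for the Weight Constrained Shortest Path Problem."""
    frontier = [(0, 0, source, [source])]  # (cost, weight, node, path)
    best_weight = {}                       # node -> lightest weight among expanded labels
    while frontier:
        cost, weight, node, path = heapq.heappop(frontier)
        if node == target:
            return cost, weight, path      # first feasible pop is cost-optimal
        if weight >= best_weight.get(node, float("inf")):
            continue                       # dominated: reached before at no greater cost and weight
        best_weight[node] = weight
        for nxt, edge_cost, edge_weight in graph.get(node, []):
            new_weight = weight + edge_weight
            if new_weight <= weight_limit:
                heapq.heappush(frontier, (cost + edge_cost, new_weight, nxt, path + [nxt]))
    return None

# Toy instance (assumed numbers): edges are (neighbor, cost, weight).
graph = {"s": [("a", 1, 5), ("b", 4, 1)], "a": [("t", 1, 5)], "b": [("t", 1, 1)]}
print(wcspp(graph, "s", "t", weight_limit=8))  # -> (5, 2, ['s', 'b', 't']); the cheaper s-a-t path is too heavy
```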
Training deep neural networks (DNNs) is becoming increasingly resource- and energy-intensive every year. Unfortunately, existing works primarily focus on optimizing DNN training for faster completion, often without considering the impact on energy efficiency. In this paper, we observe that common practices to improve training performance can often lead to inefficient energy usage. More importantly, we demonstrate that there is a trade-off between energy consumption and performance optimization. To this end, we propose Zeus, an optimization framework that navigates this trade-off by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs. Zeus uses an online exploration-exploitation approach in conjunction with just-in-time energy profiling, averting the need for expensive offline measurements while adapting to data drift over time. Our evaluation shows that Zeus can improve the energy efficiency of DNN training by 15.3%-75.8%.
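The trade-off navigated here can be pictured with a tiny sketch that scores candidate (batch size, GPU power limit) configurations by a weighted sum of measured energy and training time. The measurement numbers and the cost weighting below are invented for illustration; the real system explores these knobs online rather than from a fixed table.

```python
# Candidate job/GPU-level configurations: (batch_size, gpu_power_limit_W).
# Energy (J) and time (s) per epoch are assumed, illustrative measurements.
measurements = {
    (64, 250): {"energy": 5200.0, "time": 38.0},
    (64, 150): {"energy": 3900.0, "time": 55.0},
    (256, 250): {"energy": 4600.0, "time": 21.0},
    (256, 150): {"energy": 3500.0, "time": 33.0},
}

def cost(m, eta=0.5, max_power=250.0):
    """Weighted energy/time objective: eta favours energy, (1 - eta) favours speed.
    Time is converted to an energy-like scale via the maximum power limit."""
    return eta * m["energy"] + (1 - eta) * max_power * m["time"]

best_cfg = min(measurements, key=lambda c: cost(measurements[c]))
print("chosen (batch size, power limit):", best_cfg)
```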
Recently, there has been an explosive growth of mobile and embedded applications using convolutional neural networks (CNNs). To alleviate their excessive computational demands, developers have traditionally resorted to cloud offloading, inducing high infrastructure costs and a strong dependence on networking conditions. On the other hand, the advent of powerful SoCs is gradually enabling on-device execution. Nonetheless, low- and mid-tier platforms still struggle to run state-of-the-art CNNs sufficiently well. In this paper, we present DynO, a distributed inference framework that combines the best of both worlds to address several challenges, such as device heterogeneity, varying bandwidth, and multi-objective requirements. Key components that enable this are its novel CNN-specific data packing method, which exploits the variability in precision needs of different parts of the CNN when onloading computation, and its novel scheduler, which jointly tunes the partition point and the transferred data precision at run time to adapt inference to its execution environment. Quantitative evaluation shows that DynO outperforms the current state of the art, improving throughput by over an order of magnitude over competing CNN offloading systems, with up to 60x less data transferred.
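A back-of-the-envelope version of the scheduler's decision, jointly picking the CNN split point and the precision of the transferred activations, can be sketched as below. The per-layer compute times, activation sizes, bandwidth, and tolerated precisions are all invented for illustration and are not DynO's profiles.

```python
# Per-layer profile (assumed): on-device time, server time (s), activation size (MB at 32-bit),
# and the lowest transfer precision (bits) each split point tolerates.
layers = [
    {"device_t": 0.060, "server_t": 0.004, "act_mb": 6.0, "min_bits": 8},
    {"device_t": 0.050, "server_t": 0.003, "act_mb": 3.0, "min_bits": 8},
    {"device_t": 0.040, "server_t": 0.002, "act_mb": 1.5, "min_bits": 4},
    {"device_t": 0.050, "server_t": 0.002, "act_mb": 0.8, "min_bits": 4},
]

def end_to_end_latency(split, bandwidth_mbps=40.0):
    """Run layers [0, split) on-device, transfer the split activation at its
    tolerated precision, and finish the remaining layers on the server."""
    device = sum(l["device_t"] for l in layers[:split])
    server = sum(l["server_t"] for l in layers[split:])
    if split == len(layers):
        return device  # fully on-device, nothing transferred
    act = layers[split - 1] if split > 0 else {"act_mb": 8.0, "min_bits": 8}  # raw input (assumed size)
    transfer = act["act_mb"] * act["min_bits"] / 32.0 * 8.0 / bandwidth_mbps
    return device + transfer + server

best_split = min(range(len(layers) + 1), key=end_to_end_latency)
print("best partition point:", best_split, "latency:", round(end_to_end_latency(best_split), 4))
```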
Federated learning (FL) has recently attracted considerable attention due to its ability to use decentralized data while preserving privacy. However, it also raises additional challenges related to the heterogeneity of the participating devices, both in terms of their computational capabilities and contributed data. Meanwhile, neural architecture search (NAS) has been used successfully on centralized datasets, producing state-of-the-art results in both constrained (hardware-aware) and unconstrained settings. However, even the most recent work at the intersection of NAS and FL assumes a homogeneous compute environment with datacenter-grade hardware and does not address the problems of working with constrained, heterogeneous devices. As a result, the practical usage of NAS in a federated setting remains an open question, which we address in our work. We design our system, FedorAS, to discover and train promising architectures when dealing with devices of varying capabilities holding non-IID distributed data, and we provide empirical evidence of its effectiveness across different settings. Specifically, we evaluate FedorAS across datasets spanning three different modalities (vision, speech, text) and show its better performance compared to state-of-the-art federated solutions, while maintaining resource efficiency.
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. Whether we optimize for regret, sample complexity, state-space coverage, or model estimation, we need to strike a different exploration-exploitation trade-off. In this paper, we propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) an "objective-specific" algorithm that (adaptively) prescribes how many samples to collect and at which states, as if it had access to a generative model (i.e., a simulator of the environment); 2) an "objective-agnostic" sample-collection exploration strategy responsible for generating the prescribed samples as fast as possible. Building on recent methods for exploration in the stochastic shortest path problem, we first provide an algorithm that, given as input the number of samples $b(s,a)$ required at each state-action pair, requires $\tilde{O}(BD + D^{3/2} S^2 A)$ time steps to collect the $B = \sum_{s,a} b(s,a)$ desired samples, in an MDP with $S$ states, $A$ actions, and diameter $D$. We then show how this general-purpose exploration algorithm can be paired with "objective-specific" strategies that prescribe the sample requirements to tackle a variety of settings -- e.g., model estimation, sparse reward discovery, goal-free cost-free exploration in communicating MDPs -- for which we obtain improved or novel sample complexity guarantees.
Intelligent IoT environments (iIoTe) are comprised of heterogeneous devices that can collaboratively execute semi-autonomous IoT applications, examples of which include highly automated manufacturing cells or autonomously interacting harvesting machines. Energy efficiency is key in such edge environments, since they are often based on an infrastructure that consists of wireless and battery-operated devices, e.g., e-tractors, drones, automated guided vehicles (AGVs) and robots. The total energy consumption draws contributions from multiple technologies that enable edge computing and communication, distributed learning, as well as distributed ledgers and smart contracts. This paper provides a state-of-the-art overview of these technologies and illustrates their functionality and performance, with special attention to the trade-off among resources, latency, privacy and energy consumption. Finally, the paper provides a vision for integrating these enabling technologies in energy-efficient iIoTe and a roadmap to address the open research challenges.
In recent years, variational quantum algorithms, such as the Quantum Approximate Optimization Algorithm (QAOA), have gained popularity as they provide the hope of using NISQ devices to tackle hard combinatorial optimization problems. It is, however, known that at low depth, certain locality constraints of QAOA limit its performance. To go beyond these limitations, a non-local variant of QAOA, namely recursive QAOA (RQAOA), was proposed to improve the quality of approximate solutions. RQAOA has been studied comparatively less than QAOA, and it is less understood, for instance, for which instances it may fail to provide high-quality solutions. However, as we are tackling $\mathsf{NP}$-hard problems (specifically, the Ising spin model), it is expected that RQAOA does fail, raising the question of designing even better quantum algorithms for combinatorial optimization. In this spirit, we identify and analyze cases where RQAOA fails and, based on this, propose a reinforcement learning enhanced RQAOA variant (RL-RQAOA) that improves upon RQAOA. We show that the performance of RL-RQAOA improves over RQAOA: RL-RQAOA is strictly better on the identified instances where RQAOA underperforms, and performs equally well on instances where RQAOA is near-optimal. Our work exemplifies the potentially beneficial synergy between reinforcement learning and quantum (inspired) optimization in the design of new, even better heuristics for hard problems.