Deep reinforcement learning (DRL)-based neural schedulers have shown great potential for solving real-world resource allocation problems, as they have demonstrated remarkable performance gains in the cluster computing domain. In this paper, we investigate the feasibility of neural schedulers for the domain of system-on-chip (SoC) resource allocation through extensive experiments and comparison with non-neural, heuristic schedulers. The key findings are threefold. First, neural schedulers designed for the cluster computing domain do not work well for SoCs, because of i) the heterogeneity of SoC compute resources and ii) the variable action set caused by the randomness in incoming jobs. Second, our novel neural scheduler technique, Eclectic Interaction Matching (EIM), overcomes the above challenges and thus significantly improves existing neural schedulers; moreover, we rationalize the underlying reasons behind the performance gains of EIM-based neural schedulers. Third, we discover that the ratio of the average processing element (PE) switching delay to the average PE computation time also significantly affects the performance of neural SoC schedulers, even with EIM. Consequently, future neural SoC scheduler designs must take this metric into account, along with their implementation overhead, to be practical.
The exponential growth in demand for digital services drives massive datacenter energy consumption and negative environmental impacts. Promoting sustainable solutions to pressing energy and digital infrastructure challenges is crucial. Several hyperscale cloud providers have announced plans to power their datacenters using renewable energy. However, integrating renewables to power the datacenters is challenging because the power generation is intermittent, necessitating approaches to tackle power supply variability. Hand engineering domain-specific heuristics-based schedulers to meet specific objective functions in such complex dynamic green datacenter environments is time-consuming, expensive, and requires extensive tuning by domain experts. The green datacenters need smart systems and system software to employ multiple renewable energy sources (wind and solar) by intelligently adapting computing to renewable energy generation. We present RARE (Renewable energy Aware REsource management), a Deep Reinforcement Learning (DRL) job scheduler that automatically learns effective job scheduling policies while continually adapting to datacenters' complex dynamic environment. The resulting DRL scheduler performs better than heuristic scheduling policies with different workloads and adapts to the intermittent power supply from renewables. We demonstrate DRL scheduler system design parameters that, when tuned correctly, produce better performance. Finally, we demonstrate that the DRL scheduler can learn from and improve upon existing heuristic policies using Offline Learning.
Computer architectures and systems have long been optimized for the efficient execution of machine learning (ML) models. Now it is time to reconsider the relationship between ML and systems and let ML transform the way computer architectures and systems are designed. This has a twofold implication: improving designers' productivity and completing the virtuous cycle. In this paper, we present a comprehensive review of the work that applies ML to computer architecture and system design. First, we perform a high-level taxonomy of the typical roles that ML techniques take in architecture/system design, i.e., either for fast predictive modeling or as the design methodology. Then, we summarize common problems in computer architecture/system design that can be solved by ML techniques, together with the typical ML techniques used to resolve each of them. In addition to emphasizing computer architecture in a narrow sense, we adopt the concept that a data center can be regarded as a warehouse-scale computer; we also give a brief discussion of computer systems in a broad sense, such as code generation and compilers, and we pay attention to how ML techniques can help and transform design automation. We further provide a future vision of opportunities and potential directions, and we envision that applying ML to computer architecture and systems will thrive in the community.
Coflow is a recently proposed networking abstraction to help improve the communication performance of data-parallel computing jobs. In multi-stage jobs, each job consists of multiple coflows and is represented by a directed acyclic graph (DAG). Efficiently scheduling coflows is critical to improving the data-parallel computing performance in data centers. Compared with hand-tuned scheduling heuristics, the existing work DeepWeave [1] utilizes a reinforcement learning (RL) framework to automatically generate highly efficient coflow scheduling policies. It employs a graph neural network (GNN) to encode the job information in a set of embedding vectors and feeds a flat embedding vector containing the whole job information to the policy network. However, this method has poor scalability, as it cannot cope with jobs represented by DAGs of arbitrary sizes and shapes, which would require a large policy network for processing a high-dimensional embedding vector that is difficult to train. In this paper, we first utilize a directed acyclic graph neural network (DAGNN) to process the input and propose a novel Pipelined-DAGNN, which can effectively speed up the feature extraction process of the DAGNN. Next, instead of feeding a flat embedding of all coflows to the policy network, we feed an embedding sequence consisting of the schedulable coflows and output a sequence of priority scores, which makes the size of the policy network depend only on the dimension of the features rather than on the number of nodes in the job's DAG. Furthermore, to improve the accuracy of the priority-based scheduling policy, we incorporate a self-attention mechanism into the deep RL model to capture the interactions between different parts of the embedding sequence and make the output priority scores correlated. Based on this model, we develop a coflow scheduling algorithm for online multi-stage jobs.
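To make the scoring step concrete, here is a minimal Python sketch of self-attention over schedulable-coflow embeddings followed by a per-coflow scoring head; all dimensions and names are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

# Sketch: self-attention over the embeddings of schedulable coflows, followed
# by a per-coflow scoring head, so the policy network's size is independent of
# the number of nodes in the job DAG. Dimensions are illustrative assumptions.

class PriorityScorer(nn.Module):
    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, coflow_embeds: torch.Tensor) -> torch.Tensor:
        """coflow_embeds: [B, num_schedulable_coflows, embed_dim]."""
        ctx, _ = self.attn(coflow_embeds, coflow_embeds, coflow_embeds)
        return self.score(ctx).squeeze(-1)   # [B, num_schedulable_coflows]

# The priority scores are correlated across coflows because each score
# attends to every other schedulable coflow's embedding.
```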
The following article presents a memetic algorithm with applying deep reinforcement learning (DRL) for solving practically oriented dual resource constrained flexible job shop scheduling problems (DRC-FJSSP). In recent years, there has been extensive research on DRL techniques, but without considering realistic, flexible and human-centered shopfloors. A research gap can be identified in the context of make-to-order oriented discontinuous manufacturing as it is often represented in medium-size companies with high service levels. From practical industry projects in this domain, we recognize requirements to depict flexible machines, human workers and capabilities, setup and processing operations, material arrival times, complex job paths with parallel tasks for bill of material (BOM) manufacturing, sequence-dependent setup times and (partially) automated tasks. On the other hand, intensive research has been done on metaheuristics in the context of DRC-FJSSP. However, there is a lack of suitable and generic scheduling methods that can be holistically applied in sociotechnical production and assembly processes. In this paper, we first formulate an extended DRC-FJSSP induced by the practical requirements mentioned. Then we present our proposed hybrid framework with parallel computing for multicriteria optimization. Through numerical experiments with real-world data, we confirm that the framework generates feasible schedules efficiently and reliably. Utilizing DRL instead of random operations leads to better results and outperforms traditional approaches.
This paper studies a model for online job scheduling in green datacenters. In green datacenters, resource availability depends on the power supply from the renewables. Intermittent power supply from renewables leads to intermittent resource availability, inducing job delays (and associated costs). Green datacenter operators must intelligently manage their workloads and available power supply to extract maximum benefits. The scheduler's objective is to schedule jobs on a set of resources to maximize the total value (revenue) while minimizing the overall job delay. A trade-off exists between achieving high job value on the one hand and low expected delays on the other; hence, the aims of achieving high rewards and low costs are in opposition. In addition, datacenter operators often prioritize multiple objectives, including high system utilization and job completion. To accomplish the opposing goals of maximizing total job value and minimizing job delays, we apply the Proportional-Integral-Derivative (PID) Lagrangian methods in Deep Reinforcement Learning to the job scheduling problem in the green datacenter environment. Lagrangian methods are widely used algorithms for constrained optimization problems. We adopt a controls perspective to learn the Lagrange multiplier with proportional, integral, and derivative control, achieving favorable learning dynamics. Feedback control defines cost terms for the learning agent, monitors the cost limits during training, and continuously adjusts the learning parameters to achieve stable performance. Our experiments demonstrate improved performance compared to scheduling policies without the PID Lagrangian methods. Experimental results illustrate the effectiveness of the Constraint Controlled Reinforcement Learning (CoCoRL) scheduler that simultaneously satisfies multiple objectives.
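To illustrate the PID Lagrangian mechanism, the following is a minimal Python sketch of a PID-controlled Lagrange multiplier update; the gains, cost limit, and class names are illustrative assumptions rather than the paper's code:

```python
# Minimal sketch of a PID-controlled Lagrange multiplier for constrained RL.
# All names and constants (gains, cost_limit) are illustrative assumptions.

class PIDLagrangian:
    def __init__(self, kp: float, ki: float, kd: float, cost_limit: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0      # accumulated constraint violation
        self.prev_cost = 0.0     # cost from the previous update

    def update(self, episode_cost: float) -> float:
        """Return the multiplier used to weight the cost term in the loss."""
        error = episode_cost - self.cost_limit           # constraint violation
        self.integral = max(0.0, self.integral + error)  # anti-windup clamp
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # Keep the multiplier non-negative, as in standard Lagrangian methods.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)

# Usage sketch: lagrange = PIDLagrangian(kp=0.1, ki=0.01, kd=0.05, cost_limit=25.0)
# loss = -reward_objective + lagrange.update(mean_episode_cost) * cost_objective
```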
The future Internet involves several emerging technologies, such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms, such as single-agent reinforcement learning (RL) or deep reinforcement learning (DRL), have recently been used to enable each network entity, as an agent, to adaptively learn an optimal decision-making policy by interacting with an unknown environment. However, such algorithms fail to model the cooperation or competition among network entities and simply treat other entities as part of the environment, which may lead to the non-stationarity issue. Multi-agent reinforcement learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also the policies of other entities. As a result, MARL can significantly improve the learning efficiency of network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL as well as a comprehensive survey of the applications of MARL in the next-generation Internet: we first introduce single-agent RL and MARL, and then review a number of applications of MARL proposed to solve emerging issues in the future Internet. These issues include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.
The dynamic job-shop scheduling problem (DJSP) is a class of scheduling tasks that specifically consider the inherent uncertainties of realistic smart-manufacturing settings, such as changing order requirements and possible machine breakdowns. Because traditional methods cannot dynamically generate effective scheduling strategies in the face of environmental disturbances, we formulate the DJSP as a Markov decision process (MDP) to be addressed by reinforcement learning (RL). To this end, we propose a flexible hybrid framework that takes disjunctive graphs as states and a set of general dispatching rules as the action space, requiring minimal prior domain knowledge. An attention mechanism is used as the graph representation learning (GRL) module for feature extraction of states, and a double dueling deep Q-network with prioritized replay and noisy networks (D3QPN) is employed to map each state to the most appropriate dispatching rule. In addition, we present Gymjsp, a public benchmark based on the well-known OR-Library, to provide a standardized, off-the-shelf facility for the RL and DJSP research communities. Comprehensive experiments on various DJSP instances confirm that our proposed framework is superior to baseline algorithms, with smaller makespan across all instances, and provide empirical justification for the effectiveness of the various components of the hybrid architecture.
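As a rough illustration of the state-to-rule mapping, here is a minimal Python sketch of a dueling Q-head over dispatching rules; the dimensions and the rule set are assumptions, and the full D3QPN additionally uses double Q-learning, prioritized replay, and noisy layers, which are omitted here:

```python
import torch
import torch.nn as nn

# Sketch of a dueling Q-head that maps a graph-level state embedding to
# Q-values over a fixed set of dispatching rules (e.g., SPT, FIFO, MWKR).
# Dimensions and the rule set are illustrative assumptions.

class DuelingRuleHead(nn.Module):
    def __init__(self, embed_dim: int, num_rules: int, hidden: int = 128):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(embed_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(nn.Linear(embed_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, num_rules))

    def forward(self, state_embed: torch.Tensor) -> torch.Tensor:
        v = self.value(state_embed)                   # [B, 1] state value
        a = self.advantage(state_embed)               # [B, num_rules] advantages
        return v + a - a.mean(dim=-1, keepdim=True)   # dueling aggregation

# Greedy rule selection: rule = DuelingRuleHead(...)(state_embed).argmax(dim=-1)
```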
Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommendation systems, and gaming. Similarly, graph neural networks (GNNs) have demonstrated superior performance in supervised learning on graph-structured data. In recent times, the fusion of GNNs with DRL for graph-structured environments has attracted a lot of attention. This paper provides a comprehensive review of these hybrid works. These works can be classified into two categories: (1) algorithmic enhancement, where DRL and GNN complement each other for better utility; and (2) application-specific enhancement, where DRL and GNN support each other. This fusion effectively addresses various complex problems in engineering and the life sciences. Based on the review, we further analyze the applicability and benefits of fusing these two domains, especially in terms of increasing generalizability and reducing computational complexity. Finally, the key challenges in integrating DRL and GNN, along with potential future research directions, are highlighted, which will be of interest to the wider machine learning community.
Monte Carlo Tree Search (MCTS) is a powerful approach for designing game bots or solving sequential decision problems. The method relies on intelligent tree search that balances exploration and exploitation. MCTS performs random sampling in the form of simulations and stores the statistics of actions to make more educated choices in each subsequent iteration. The method has become a state-of-the-art technique for combinatorial games; however, in more complex games (such as those with a high branching factor or real-time ones), as well as in various practical domains (such as transportation, scheduling, or security), effective application of MCTS usually requires problem-specific modifications or integration with other techniques. Such domain-specific modifications and hybrid approaches are the main focus of this survey. The last major MCTS survey was published in 2012; contributions that have appeared since its release are of particular interest.
Deep reinforcement learning algorithms have succeeded in several challenging domains. Classic online RL job schedulers can learn efficient scheduling strategies but often take thousands of timesteps to explore the environment and adapt from a randomly initialized DNN policy. Existing RL schedulers overlook the importance of learning from historical data and improving upon custom heuristic policies. Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction. Following the recent success of data-driven learning, we explore two RL methods: 1) Behaviour Cloning and 2) Offline RL, which aim to learn policies from logged data without interacting with the environment. These methods address the challenges concerning the cost of data collection and safety, which are particularly pertinent to real-world applications of RL. Although the data-driven RL methods generate good results, we show that the performance is highly dependent on the quality of the historical datasets. Finally, we demonstrate that by effectively incorporating prior expert demonstrations to pre-train the agent, we short-circuit the random exploration phase and learn a reasonable policy with online training. We utilize Offline RL as a launchpad to learn effective scheduling policies from prior experience collected using Oracle or heuristic policies. Such a framework is effective for pre-training from historical datasets and is well suited to continuous improvement with online data collection.
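A minimal Python sketch of the behaviour-cloning pre-training step described above; the dataset format, shapes, and names are illustrative assumptions, not the paper's pipeline:

```python
import torch
import torch.nn as nn

# Minimal behaviour-cloning sketch: pre-train a policy on logged (state, action)
# pairs from a heuristic/Oracle scheduler before online RL fine-tuning.
# The dataset format and dimensions are illustrative assumptions.

def behaviour_clone(policy: nn.Module, dataset, epochs: int = 10, lr: float = 1e-3):
    """dataset yields (states, actions): float tensor [B, obs_dim], long tensor [B]."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, actions in dataset:
            logits = policy(states)            # [B, num_actions]
            loss = loss_fn(logits, actions)    # imitate the logged scheduler
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy

# Usage sketch: pre-train on heuristic traces, then hand the warm-started policy
# to the online RL loop instead of a randomly initialized one.
# policy = behaviour_clone(policy, logged_scheduler_traces)
```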
Graph mining tasks arise from many different application domains, ranging from social networks and transportation to e-commerce, and have been receiving great attention from the theoretical and algorithmic design communities in recent years; there has also been some pioneering work employing the research-rich Reinforcement Learning (RL) techniques to address graph data mining tasks. However, these graph mining methods and RL models are dispersed in different research areas, which makes it hard to compare them. In this survey, we provide a comprehensive overview of RL and graph mining methods and generalize these methods to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source codes, and benchmark datasets of GRL methods. Furthermore, we propose important directions and challenges to be solved in the future. As far as we know, this is the latest comprehensive survey of GRL; it provides a global view and a learning resource for scholars. In addition, we create an open-source online resource for both interested scholars who want to enter this rapidly developing domain and experts who would like to compare GRL methods.
Workflow scheduling is a long-studied problem in parallel and distributed computing (PDC), aiming to efficiently utilize compute resources to meet users' service requirements. Recently proposed scheduling methods leverage the low response times of edge computing platforms to optimize quality of service (QoS). However, scheduling workflow applications in mobile edge-cloud systems is challenging due to computational heterogeneity, the mobility of edge devices, and the volatility of workload resource requirements. To overcome these difficulties, it is essential, yet challenging, to develop a long-sighted optimization scheme that efficiently models the QoS objectives. In this work, we propose MCDS: Monte Carlo Learning using Deep Surrogate Models to efficiently schedule workflow applications in mobile edge-cloud computing systems. MCDS is an artificial intelligence (AI)-based scheduling approach that uses a tree-based search strategy and a deep-neural-network-based surrogate model to estimate the long-term QoS impact of immediate actions, enabling robust optimization of scheduling decisions. Experiments on physical and simulated edge-cloud testbeds show that MCDS can improve over the state-of-the-art methods in terms of energy consumption, response time, SLA violations, and cost by at least 6.13, 4.56, 45.09, and 30.71 percent, respectively.
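As a rough sketch of the idea, the snippet below scores candidate scheduling actions with a learned surrogate; it reduces the tree search to a one-step lookahead for brevity, and the interfaces are assumptions rather than the MCDS implementation:

```python
# Sketch of the core idea described above: a learned surrogate estimates the
# long-term QoS of candidate scheduling decisions, and the search picks the
# action with the best estimate. Reduced to a one-step lookahead for brevity;
# `surrogate` and `simulate` are illustrative, assumed interfaces.

def surrogate_guided_step(state, candidate_actions, surrogate, simulate):
    best_action, best_qos = None, float("-inf")
    for action in candidate_actions:
        next_state = simulate(state, action)    # one-step lookahead
        qos_estimate = surrogate(next_state)    # DNN predicts long-term QoS
        if qos_estimate > best_qos:
            best_action, best_qos = action, qos_estimate
    return best_action
```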
We develop a reinforcement learning (RL) framework for applications that deal with sequential decision making and exogenous uncertainty, such as resource allocation and inventory management. In these applications, the uncertainty is due only to exogenous variables, such as future demand. A popular approach is to use historical data to predict the exogenous variables and then plan against the predictions. However, this indirect approach requires a high-fidelity model of the exogenous process to ensure good downstream decisions, which can be impractical when the exogenous process is complex. In this work, we propose an alternative approach based on hindsight learning that sidesteps modeling the exogenous process. Our key insight is that, unlike sim2real RL, in these applications we can revisit past decisions in the historical data and derive the counterfactual consequences of other actions. Our framework uses hindsight-optimal actions as the policy training signal and has strong theoretical guarantees on decision-making performance. We use the framework to develop an algorithm that allocates compute resources for real-world Microsoft Azure workloads. The results show that our approach learns better policies than domain-specific heuristics and sim2real RL baselines.
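A minimal sketch of the hindsight training signal described above, assuming a hypothetical `hindsight_planner` that replays actions against the recorded exogenous trace; this is an illustration, not the authors' algorithm:

```python
# Sketch of the hindsight-learning training signal: with exogenous traces
# (e.g., realized demand) recorded in the log, the best action at each past
# state can be computed after the fact and used as a supervised target.
# `hindsight_planner` and the data layout are illustrative assumptions.

def build_hindsight_dataset(log, hindsight_planner):
    """log: list of (state, exogenous_trace) pairs from historical data."""
    dataset = []
    for state, exo_trace in log:
        # Because the uncertainty is purely exogenous, the counterfactual
        # outcome of any action can be replayed against the recorded trace.
        best_action = hindsight_planner(state, exo_trace)
        dataset.append((state, best_action))
    return dataset  # train the policy to imitate hindsight-optimal actions
```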
In recent years, significant progress has been made in the design and evaluation of balanced (hyper)graph partitioning algorithms. We survey trends of the past decade in practical algorithms for balanced (hyper)graph partitioning, together with future research directions. Our work serves as an update to a previous survey on the topic. In particular, this survey extends the previous one by also covering hypergraph partitioning and streaming algorithms, and it has an additional focus on parallel algorithms.
In recent years, the exponential proliferation of smart devices with their intelligent applications poses severe challenges to conventional cellular networks. Such challenges can be potentially overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends of both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resources integration. Then, we discuss integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we highlight open challenges and present future research directions for beyond 5G networks, such as 6G.
Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
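For readers unfamiliar with the core algorithm, the following is a generic UCT-style sketch in Python (not tied to any particular paper); the environment hooks `legal_actions`, `step`, `is_terminal`, and `reward` are assumptions:

```python
import math
import random

# Minimal UCT sketch: select with UCB1, expand one child, simulate with a
# random rollout, and backpropagate the result. The environment hooks
# (legal_actions, step, is_terminal, reward) are assumed interfaces.

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # action -> Node
        self.visits, self.value = 0, 0.0

def ucb1(child, parent_visits, c=1.41):
    if child.visits == 0:
        return float("inf")           # force unvisited children to be tried
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root, env, iterations=1000):
    for _ in range(iterations):
        node = root
        # 1) Selection: descend via UCB1 while the node is fully expanded.
        while node.children and len(node.children) == len(env.legal_actions(node.state)):
            node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
        # 2) Expansion: add one untried child.
        untried = [a for a in env.legal_actions(node.state) if a not in node.children]
        if untried and not env.is_terminal(node.state):
            a = random.choice(untried)
            node.children[a] = Node(env.step(node.state, a), parent=node)
            node = node.children[a]
        # 3) Simulation: random rollout to a terminal state.
        state = node.state
        while not env.is_terminal(state):
            state = env.step(state, random.choice(env.legal_actions(state)))
        reward = env.reward(state)
        # 4) Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```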
Event processing is the cornerstone of the dynamic and responsive Internet of Things (IoT). Recent approaches in this area are based on representational state transfer (REST) principles, which allow event processing tasks to be placed on any device that follows the same principles. However, the tasks should be properly distributed among edge devices to ensure fair resource utilization and guarantee seamless execution. This paper investigates the use of deep learning to fairly distribute the tasks. An attention-based neural network model is proposed to generate efficient load-balancing solutions under different scenarios. The proposed model is based on the Transformer and Pointer Network architectures and is trained by an advantage actor-critic reinforcement learning algorithm. The model is designed to scale to the number of event processing tasks and the number of edge devices, with no need for hyperparameter re-tuning or even retraining. Extensive experimental results show that the proposed model outperforms conventional heuristics on many key performance indicators. The generic design and the obtained results show that the proposed model is potentially applicable to several other load-balancing problem variations, which makes it an attractive option for use in real-world scenarios thanks to its scalability and efficiency.
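To make the pointer mechanism concrete, here is a minimal Python sketch of a task query attending over device embeddings to produce an assignment distribution; all dimensions and names are illustrative assumptions, not the paper's model:

```python
import torch
import torch.nn as nn

# Sketch of a pointer-style attention step for load balancing: a task query
# attends over device embeddings, and the attention distribution is used to
# pick a device. Scaling with task/device counts comes for free because only
# the embedding dimension is fixed. All dimensions are illustrative.

class DevicePointer(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, task_embed: torch.Tensor, device_embeds: torch.Tensor):
        """task_embed: [B, dim]; device_embeds: [B, num_devices, dim]."""
        q = self.q(task_embed).unsqueeze(1)            # [B, 1, dim]
        k = self.k(device_embeds)                      # [B, num_devices, dim]
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5   # scaled dot product
        return torch.softmax(scores, dim=-1)           # distribution over devices

# During actor-critic training the distribution is sampled; at inference,
# argmax picks the device, so the head works for any number of devices.
```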
Training deep neural networks (DNNs) is popular in both enterprise and cloud data centers. Existing DNN training schedulers treat GPUs as the dominant resource and allocate other resources, such as CPU and memory, proportionally to the number of GPUs requested by a job. Unfortunately, these schedulers do not consider the impact of a job's sensitivity to its CPU, memory, and storage allocations. In this work, we present Synergy, a resource-sensitive scheduler for shared GPU clusters. Synergy infers the sensitivity of DNNs to different resources using optimistic profiling; some jobs may benefit from more than the GPU-proportional allocation, while others may be unaffected by less than the GPU-proportional allocation. Synergy performs such multi-resource, workload-aware assignments for a set of jobs scheduled on a shared multi-tenant cluster using a new near-optimal online algorithm. Our experiments show that workload-aware CPU and memory allocation can improve average JCT by up to 3.4x compared with traditional GPU-proportional scheduling.
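As a rough illustration of the contrast with GPU-proportional allocation, the greedy sketch below reallocates CPUs according to profiled per-job demands; the data layout and the redistribution rule are assumptions, not Synergy's near-optimal online algorithm:

```python
# Sketch of the contrast drawn above between GPU-proportional and
# workload-aware CPU allocation. The per-job sensitivity profile
# ('ideal_cpus', from optimistic profiling) is an illustrative assumption.

def allocate_cpus(jobs, total_cpus):
    """jobs: list of dicts with 'gpus' and profiled 'ideal_cpus' per job."""
    total_gpus = sum(j["gpus"] for j in jobs)
    for j in jobs:
        # Baseline: CPUs proportional to the job's GPU share.
        j["proportional_cpus"] = total_cpus * j["gpus"] / total_gpus
    # Workload-aware: CPU-sensitive jobs get more than their proportional
    # share, funded by jobs whose profiled demand is below it (greedy).
    spare = sum(max(0, j["proportional_cpus"] - j["ideal_cpus"]) for j in jobs)
    for j in jobs:
        if j["ideal_cpus"] <= j["proportional_cpus"]:
            j["cpus"] = j["ideal_cpus"]
        else:
            extra = min(j["ideal_cpus"] - j["proportional_cpus"], spare)
            j["cpus"] = j["proportional_cpus"] + extra
            spare -= extra
    return jobs
```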
Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, from fault-tolerant factory automation to smart cities. When distributed learning is run over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network impact the performance of both distributed learning and the coexisting service. In this paper, we investigate a mixed service scenario where distributed AI workflow and ultra-reliable low latency communication (URLLC) services run concurrently over a network. Consequently, we propose a risk sensitivity-based formulation for device selection to minimize the AI training delays during its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and address it via a framework based on soft actor-critic algorithm. We evaluate our solution with a realistic and 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to the scenario where URLLC solely consumes all network resources.