Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload structure, since developing and tuning custom heuristics for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective, such as minimizing average job completion time. However, off-the-shelf RL techniques cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for job dependency graphs, design scalable RL models, and devise new RL training methods for continuous job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima outperforms several heuristics, including hand-tuned ones, by at least 21%. Further experiments on an industrial production workload trace demonstrate that Decima reduces the average job completion time on a large cluster by 17%.
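To make the Decima-style setup concrete, the sketch below scores the runnable stages of a toy job DAG with a tiny message-passing embedding and samples one to schedule. The feature set, the two-layer update, and all dimensions are illustrative assumptions, not Decima's actual architecture.

```python
# Illustrative sketch only: a toy graph embedding + softmax policy over a job DAG.
import numpy as np

rng = np.random.default_rng(0)

def embed(features, parents, W1, W2, steps=2):
    """Propagate per-stage features along DAG edges (each stage aggregates its parents)."""
    h = features @ W1
    for _ in range(steps):
        agg = np.zeros_like(h)
        for child, ps in parents.items():
            for p in ps:
                agg[child] += h[p]
        h = np.tanh(features @ W1 + agg @ W2)
    return h

def act(h, runnable, w_out):
    """Softmax policy over the runnable stages only."""
    scores = h[runnable] @ w_out
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return runnable[rng.choice(len(runnable), p=probs)], probs

# Toy job DAG: stage 2 depends on stages 0 and 1; stage 3 depends on 2.
features = rng.normal(size=(4, 3))          # e.g. remaining work, #tasks, duration (made up)
parents = {0: [], 1: [], 2: [0, 1], 3: [2]}
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 8))
w_out = rng.normal(size=8)

h = embed(features, parents, W1, W2)
stage, probs = act(h, runnable=np.array([0, 1]), w_out=w_out)
print("schedule stage", stage, "with probs", probs)
# In training, the log-prob of each chosen stage would be scaled by a reward such as
# negative average job completion time (a REINFORCE-style update).
```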
The design of good heuristics or approximation algorithms for NP-hard combinatorial optimization problems often requires significant specialized knowledge and trial-and-error. Can we automate this challenging, tedious process, and learn the algorithms instead? In many real-world applications, it is typically the case that the same optimization problem is solved again and again on a regular basis, maintaining the same problem structure but differing in the data. This provides an opportunity for learning heuristic algorithms that exploit the structure of such recurring problems. In this paper, we propose a unique combination of reinforcement learning and graph embedding to address this challenge. The learned greedy policy behaves like a meta-algorithm that incrementally constructs a solution, and the action is determined by the output of a graph embedding network capturing the current state of the solution. We show that our framework can be applied to a diverse range of optimization problems over graphs, and learns effective algorithms for the Minimum Vertex Cover, Maximum Cut and Traveling Salesman problems.
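As a rough illustration of the learned-greedy pattern described above, the following sketch repeatedly scores the remaining nodes of a graph and adds the best one to a partial vertex cover. The hand-set linear scorer stands in for the graph embedding network that the paper trains with Q-learning; it is an assumption for illustration only.

```python
# Sketch of a greedy meta-algorithm: score nodes, add the best one, repeat.
import numpy as np

def greedy_construct(adj, score_fn):
    n = len(adj)
    cover = set()
    uncovered = {(u, v) for u in range(n) for v in adj[u] if u < v}
    while uncovered:
        # Simple structural features per node; a learned embedding would go here.
        state = np.array([[len(adj[v]),
                           sum((min(v, u), max(v, u)) in uncovered for u in adj[v]),
                           float(v in cover)] for v in range(n)])
        scores = score_fn(state)
        scores[list(cover)] = -np.inf               # never pick a node twice
        v = int(np.argmax(scores))
        cover.add(v)
        uncovered = {e for e in uncovered if v not in e}
    return cover

theta = np.array([0.1, 1.0, 0.0])                   # stand-in "learned" weights (assumption)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(greedy_construct(adj, lambda s: s @ theta))   # a valid vertex cover, e.g. {0, 2}
```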
Evolution and learning are two fundamental mechanisms by which life adapts in order to survive and to transcend its limitations. These biological phenomena have inspired successful computational methods such as evolutionary algorithms and deep learning. Evolution relies on random mutation and random genetic recombination. Here we show that learning to evolve, i.e. learning to mutate and recombine better than at random, improves the outcome of evolution in terms of fitness increase per generation and even in terms of attainable fitness. We use deep reinforcement learning to learn a policy that dynamically adjusts the strategy of an evolutionary algorithm to varying circumstances. Our method outperforms classical evolutionary algorithms on combinatorial and continuous optimization problems.
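A toy rendering of the idea, under the assumption that a simple epsilon-greedy bandit (in place of the paper's deep RL agent) chooses the mutation strength each generation and is rewarded by the fitness improvement it produces:

```python
# Sketch: a bandit controller adapting the mutation strength of a simple EA.
import random

def fitness(x):
    return -sum(v * v for v in x)                  # maximize => minimize the sphere function

random.seed(0)
dim, pop_size = 5, 20
arms = [0.01, 0.1, 0.5]                            # candidate mutation strengths (assumed)
value, counts = [0.0] * len(arms), [0] * len(arms)
population = [[random.uniform(-3, 3) for _ in range(dim)] for _ in range(pop_size)]

best = max(map(fitness, population))
for gen in range(100):
    # Choose a mutation strength: explore with prob 0.1, otherwise exploit.
    a = random.randrange(len(arms)) if random.random() < 0.1 \
        else max(range(len(arms)), key=lambda i: value[i])
    sigma = arms[a]
    # One EA generation: truncation selection + Gaussian mutation.
    parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
    population = [[v + random.gauss(0, sigma) for v in random.choice(parents)]
                  for _ in range(pop_size)]
    new_best = max(map(fitness, population))
    reward = new_best - best                       # improvement credited to the chosen strength
    best = max(best, new_best)
    counts[a] += 1
    value[a] += (reward - value[a]) / counts[a]    # incremental mean update
print("best fitness:", best, "learned preferences:", value)
```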
The growing interest in automating machine learning and deep learning has inevitably led to the development of automated methods for neural architecture optimization. The choice of network architecture has proven to be critical, and many advances in deep learning stem directly from improvements to it. However, deep learning techniques are computationally intensive, and their application requires a high level of domain knowledge. Therefore, even partial automation of this process helps make deep learning more accessible to both researchers and practitioners. With this survey, we provide a formalism that unifies and categorizes the landscape of existing methods, and we compare and contrast the different approaches through a detailed analysis. We achieve this by discussing common architecture search spaces and architecture optimization algorithms based on principles of reinforcement learning and evolutionary algorithms, as well as approaches that incorporate surrogate and one-shot models. In addition, we discuss new research directions, including constrained and multi-objective architecture search as well as automated search for data augmentation, optimizers, and activation functions.
There is a trend towards using very large deep neural networks (DNNs) to improve the accuracy of complex machine learning tasks. However, the size of DNN models that can be explored today is limited by the amount of GPU device memory. This paper presents Tofu, a system for partitioning very large DNN models across multiple GPU devices. Tofu is designed for tensor-based dataflow systems: for each operator in the dataflow graph, it partitions the input/output tensors and parallelizes its execution. Tofu can automatically discover how each operator can be partitioned by analyzing its semantics, expressed in a simple specification language. Tofu uses a dynamic-programming-based search algorithm to determine the best partitioning strategy for every operator in the entire dataflow graph. Our experiments on an 8-GPU machine show that Tofu can train very large CNN and RNN models. It also achieves better performance than alternative approaches for training very large models on multiple GPUs.
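The dynamic-programming search can be illustrated on a simplified case: a chain of operators, each with a few candidate partition strategies, where the plan minimizes per-operator cost plus re-layout cost between neighbours. The strategy names and all costs below are invented for illustration; Tofu itself works on general dataflow graphs and derives strategies from operator semantics.

```python
# Sketch: DP over a chain of operators to pick one partition strategy per operator.
STRATEGIES = ["split_rows", "split_cols", "replicate"]

def best_partition(op_cost, layout_cost):
    """op_cost[i][s]: cost of operator i under strategy s.
       layout_cost[s1][s2]: cost of converting layouts between neighbouring operators."""
    n = len(op_cost)
    dp = [dict(op_cost[0])]          # dp[i][s] = best cost of the prefix ending with strategy s
    choice = [{}]
    for i in range(1, n):
        dp.append({})
        choice.append({})
        for s in STRATEGIES:
            prev = min(STRATEGIES, key=lambda p: dp[i - 1][p] + layout_cost[p][s])
            dp[i][s] = dp[i - 1][prev] + layout_cost[prev][s] + op_cost[i][s]
            choice[i][s] = prev
    # Backtrack the optimal strategy sequence.
    last = min(STRATEGIES, key=lambda s: dp[-1][s])
    plan = [last]
    for i in range(n - 1, 0, -1):
        plan.append(choice[i][plan[-1]])
    return list(reversed(plan)), dp[-1][last]

op_cost = [{"split_rows": 4, "split_cols": 6, "replicate": 10},
           {"split_rows": 7, "split_cols": 3, "replicate": 9},
           {"split_rows": 5, "split_cols": 5, "replicate": 8}]
layout_cost = {a: {b: (0 if a == b else 2) for b in STRATEGIES} for a in STRATEGIES}
print(best_partition(op_cost, layout_cost))
```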
For problem solving, making reactive decisions based on the problem description alone is fast but inaccurate, whereas search-based planning with heuristics can provide better solutions but may be exponentially slow. In this paper, we propose a new approach that improves an existing solution by iteratively selecting and rewriting its local components until convergence. The rewriting policy is a neural network trained with reinforcement learning. We evaluate our approach in two domains: job scheduling and expression simplification. Compared to common effective heuristics, baseline deep models, and search algorithms, our approach efficiently delivers higher-quality solutions.
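A minimal sketch of the rewrite loop on the job-scheduling objective (sum of completion times): start from a feasible schedule and keep applying the best local rewrite, here an adjacent swap, until none improves the objective. The greedy scorer stands in for the neural rewriting policy trained with RL.

```python
# Sketch: iterative local rewriting of a single-machine schedule.
def total_completion_time(schedule, durations):
    t, total = 0, 0
    for job in schedule:
        t += durations[job]
        total += t
    return total

def rewrite_until_convergence(schedule, durations):
    schedule = list(schedule)
    improved = True
    while improved:
        improved = False
        base = total_completion_time(schedule, durations)
        # Score every candidate rewrite (adjacent swap) by the improvement it yields.
        best_gain, best_i = 0, None
        for i in range(len(schedule) - 1):
            cand = schedule[:i] + [schedule[i + 1], schedule[i]] + schedule[i + 2:]
            gain = base - total_completion_time(cand, durations)
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is not None:
            schedule[best_i], schedule[best_i + 1] = schedule[best_i + 1], schedule[best_i]
            improved = True
    return schedule

durations = {"a": 5, "b": 1, "c": 3, "d": 2}
print(rewrite_until_convergence(["a", "b", "c", "d"], durations))  # converges to shortest-first order
```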
The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.
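The training signal can be illustrated with a toy REINFORCE loop: sample a placement, measure (here, simulate) its execution time, and use the negative runtime as the reward. The per-op categorical policy is a deliberate simplification of the paper's sequence-to-sequence model, and the cost model is invented.

```python
# Sketch: REINFORCE over device placements with simulated execution time as reward.
import numpy as np

rng = np.random.default_rng(1)
n_ops, n_devices = 6, 2
logits = np.zeros((n_ops, n_devices))            # trainable placement policy (simplified)
op_time = rng.uniform(1.0, 3.0, size=n_ops)      # assumed per-op compute time

def execution_time(placement):
    # Simulated runtime: the busiest device dominates, plus a penalty for
    # cross-device edges in a simple chain graph (stand-in for real measurement).
    per_device = [op_time[placement == d].sum() for d in range(n_devices)]
    comm = 0.5 * np.sum(placement[1:] != placement[:-1])
    return max(per_device) + comm

baseline = 0.0
for step in range(300):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    placement = np.array([rng.choice(n_devices, p=p) for p in probs])
    reward = -execution_time(placement)
    baseline = 0.9 * baseline + 0.1 * reward     # running-average baseline
    # REINFORCE: push up the log-prob of sampled devices in proportion to the advantage.
    grad = np.zeros_like(logits)
    grad[np.arange(n_ops), placement] = 1.0
    grad -= probs
    logits += 0.1 * (reward - baseline) * grad
print("learned placement:", np.exp(logits).argmax(axis=1))
```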
Deep neural networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge, with techniques ranging from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications for parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on these approaches, we extrapolate potential directions for parallelism in deep learning.
Machine learning has become a vital part of many aspects of our daily life. However, building well-performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build machine learning applications automatically, without extensive knowledge of statistics and machine learning. In this survey, we summarize recent developments in AutoML from both academia and industry. First, we introduce a holistic problem formulation. Next, we present approaches for solving the various subproblems of AutoML. Finally, we provide an extensive empirical evaluation of the presented approaches on synthetic and real data.
Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods require a function approximator to represent the value function. However, using function approximators requires manually making crucial representational decisions. This paper investigates evolutionary function approximation, a novel approach to automatically selecting function approximator representations that enable efficient individual learning. This method evolves individuals that are better able to learn. We present a fully implemented instantiation of evolutionary function approximation which combines NEAT, a neuroevolutionary optimization technique, with Q-learning, a popular TD method. The resulting NEAT+Q algorithm automatically discovers effective representations for neural network function approximators. This paper also presents on-line evolutionary computation, which improves the on-line performance of evolutionary computation by borrowing selection mechanisms used in TD methods to choose individual actions and using them in evolutionary computation to select policies for evaluation. We evaluate these contributions with extended empirical studies in two domains: 1) the mountain car task, a standard reinforcement learning benchmark on which neural network function approximators have previously performed poorly and 2) server job scheduling, a large probabilistic domain drawn from the field of autonomic computing. The results demonstrate that evolutionary function approximation can significantly improve the performance of TD methods and on-line evolutionary computation can significantly improve evolutionary methods. This paper also presents additional tests that offer insight into what factors can make neural network function approximation difficult in practice.
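The following compact sketch illustrates the evolutionary-function-approximation loop: each individual is a value-function representation (here a plain linear weight vector rather than a NEAT-evolved network), it is refined by Q-learning during its lifetime, and its post-learning return serves as its fitness. The chain MDP and all hyperparameters are illustrative assumptions.

```python
# Sketch: evolve value-function representations, refine each by Q-learning, select by fitness.
import random

ACTIONS = [0, 1]                      # 0 = stay, 1 = advance
N_STATES = 5                          # chain MDP; reaching the last state pays +1

def features(s, a):
    return [1.0 if (s, a) == (i, b) else 0.0 for i in range(N_STATES) for b in ACTIONS]

def q(w, s, a):
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))

def lifetime(w, episodes=30, alpha=0.2, gamma=0.95, eps=0.1):
    """Q-learning with the given initial weights; returns (trained weights, fitness)."""
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = random.choice(ACTIONS) if random.random() < eps \
                else max(ACTIONS, key=lambda b: q(w, s, b))
            s2 = min(s + 1, N_STATES - 1) if a == 1 else s
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            td = r + gamma * max(q(w, s2, b) for b in ACTIONS) - q(w, s, a)
            w = [wi + alpha * td * xi for wi, xi in zip(w, features(s, a))]
            total += r
            s = s2
            if r:
                break
    return w, total

random.seed(0)
population = [[random.gauss(0, 0.1) for _ in range(N_STATES * len(ACTIONS))] for _ in range(8)]
for gen in range(5):
    scored = sorted((lifetime(w) for w in population), key=lambda t: -t[1])
    print("generation", gen, "best fitness", scored[0][1])
    parents = [w for w, _ in scored[:4]]          # keep the trained (Lamarckian) weights
    population = [[wi + random.gauss(0, 0.05) for wi in random.choice(parents)] for _ in range(8)]
```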
Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text, and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking of user preferences, ad placement, and more. Competing frameworks for building these networks, such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet, and Theano, explore different tradeoffs between usability and expressiveness, research or production orientation, and supported hardware. They operate on a DAG of computational operators, wrapping high-performance libraries such as CUDNN (for NVIDIA GPUs) or NNPACK (for various CPUs), and automate memory allocation, synchronization, and distribution. Custom operators are needed wherever the computation does not fit existing high-performance library calls, usually at a high engineering cost. This is frequently required when researchers invent new operators: such operators suffer a severe performance penalty, which limits the pace of innovation. Moreover, even when a framework can use an existing runtime call, it often does not deliver optimal performance for the user's particular network architecture and dataset, missing optimizations between operators as well as optimizations that could be applied knowing the sizes and shapes of the data. Our contributions include (1) a language close to the mathematics of deep learning called Tensor Comprehensions, (2) a polyhedral just-in-time compiler that converts the mathematical description of a deep learning DAG into CUDA kernels with delegated memory management and synchronization, while providing optimizations such as operator fusion and specialization for specific sizes, and (3) a compilation cache populated by an autotuner. [Abstract truncated]
This paper surveys recent attempts from the machine learning and operations research communities to leverage machine learning for solving combinatorial optimization problems. Given the hardness of these problems, state-of-the-art methods involve algorithmic decisions that either require too much computation time or are not mathematically well defined. Machine learning therefore looks like a promising candidate for handling those decisions effectively. We advocate pushing further the integration of machine learning and combinatorial optimization and detail the methodology. A main point of this paper is to view generic optimization problems as data points and to ask what the relevant distribution of problems is for learning on a given task.
We address the problem of learning structured policies for continuous control. In traditional reinforcement learning, policies of agents are learned by multi-layer perceptrons (MLPs) which take the concatenation of all observations from the environment as input for predicting actions. In this work, we propose NerveNet to explicitly model the structure of an agent, which naturally takes the form of a graph. Specifically, serving as the agent's policy network, NerveNet first propagates information over the structure of the agent and then predicts actions for different parts of the agent. In the experiments, we first show that our NerveNet is comparable to state-of-the-art methods on standard MuJoCo environments. We further propose our customized reinforcement learning environments for benchmarking two types of structure transfer learning tasks, i.e., size and disability transfer, as well as multi-task learning. We demonstrate that policies learned by NerveNet are significantly more transferable and generalizable than policies learned by other models and are able to transfer even in a zero-shot setting.
The identification of performance-optimizing parameter settings is an important part of the development and application of algorithms. We describe an automatic framework for this algorithm configuration problem. More formally, we provide methods for optimizing a target algorithm's performance on a given class of problem instances by varying a set of ordinal and/or categorical parameters. We review a family of local-search-based algorithm configuration procedures and present novel techniques for accelerating them by adaptively limiting the time spent for evaluating individual configurations. We describe the results of a comprehensive experimental evaluation of our methods, based on the configuration of prominent complete and incomplete algorithms for SAT. We also present what is, to our knowledge, the first published work on automatically configuring the CPLEX mixed integer programming solver. All the algorithms we considered had default parameter settings that were manually identified with considerable effort. Nevertheless, using our automated algorithm configuration procedures, we achieved substantial and consistent performance improvements.
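A small sketch of the core loop under simplifying assumptions: local search over a tiny made-up configuration space, where each candidate is run only up to the incumbent's time (adaptive capping), since it cannot win beyond that point. The parameter space and the simulated runtime function are invented; this is not the paper's actual ParamILS/CPLEX setup.

```python
# Sketch: one-exchange local search over configurations with adaptive capping.
import random

PARAMS = {"restart": [10, 50, 100], "heuristic": ["a", "b"], "noise": [0.0, 0.1]}

def simulated_runtime(cfg):
    # Stand-in for actually running a solver on an instance set.
    base = {"a": 3.0, "b": 2.0}[cfg["heuristic"]]
    return base + 0.01 * cfg["restart"] + (5.0 if cfg["noise"] > 0 else 0.0)

def run_with_cap(cfg, cap):
    t = simulated_runtime(cfg)
    return t if t <= cap else None            # None = censored (capped) run

def one_exchange_neighbours(cfg):
    for key, values in PARAMS.items():
        for v in values:
            if v != cfg[key]:
                yield {**cfg, key: v}

random.seed(0)
incumbent = {k: random.choice(v) for k, v in PARAMS.items()}
incumbent_time = simulated_runtime(incumbent)
for _ in range(20):                            # simple iterated local search
    for cand in one_exchange_neighbours(incumbent):
        t = run_with_cap(cand, cap=incumbent_time)   # adaptive cap at the incumbent's time
        if t is not None and t < incumbent_time:
            incumbent, incumbent_time = cand, t
print("best configuration:", incumbent, "runtime:", incumbent_time)
```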
Keywords: response surface models; highly parameterized algorithms; propositional satisfiability; mixed integer programming; travelling salesperson problem.
Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm's runtime as a function of problem-specific instance features. Such models have important applications to algorithm analysis, portfolio-based algorithm selection, and the automatic configuration of parameterized algorithms. Over the past decade, a wide variety of techniques have been studied for building such models. Here, we describe extensions and improvements of existing models, new families of models, and, perhaps most importantly, a much more thorough treatment of algorithm parameters as model inputs. We also comprehensively describe new and existing features for predicting algorithm runtime for propositional satisfiability (SAT), travelling salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate these innovations through the largest empirical analysis of its kind, comparing to a wide range of runtime modelling techniques from the literature. Our experiments consider 11 algorithms and 35 instance distributions; they also span a very wide range of SAT, MIP, and TSP instances, with the least structured having been generated uniformly at random and the most structured having emerged from real industrial applications. Overall, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.
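As a minimal illustration of empirical runtime prediction, the sketch below fits a regression model of log runtime as a function of two instance features and evaluates it on held-out instances. The features and the synthetic data generator are assumptions; the paper studies far richer feature sets and model families such as random forests.

```python
# Sketch: predict log runtime from instance features with a simple least-squares model.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "instances": features = [#variables, clause/variable ratio] (assumed).
n = 200
X = np.column_stack([rng.uniform(50, 500, n), rng.uniform(3.0, 5.0, n)])
log_runtime = 0.01 * X[:, 0] + 2.0 * (X[:, 1] - 4.2) ** 2 + rng.normal(0, 0.1, n)

def design(X):
    # Quadratic feature expansion: [1, x1, x2, x1^2, x2^2, x1*x2].
    return np.column_stack([np.ones(len(X)), X, X ** 2, X[:, [0]] * X[:, [1]]])

w, *_ = np.linalg.lstsq(design(X[:150]), log_runtime[:150], rcond=None)
pred = design(X[150:]) @ w
rmse = np.sqrt(np.mean((pred - log_runtime[150:]) ** 2))
print("held-out RMSE in log runtime:", round(float(rmse), 3))
```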
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it must be able to plan effectively. Prior work has typically combined non-specific planning algorithms (e.g., tree search) with explicit models of the environment. More recently, a new family of methods has been proposed that learn how to plan, by providing structure as an inductive bias in the function approximator (e.g., tree-structured neural networks), trained end-to-end by a model-free RL algorithm. In this paper, we go one step further and demonstrate empirically that an entirely model-free approach, with no special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with model-based planners. We measure our agent's effectiveness at planning in terms of its ability to generalize across combinatorial and irreversible state spaces, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics one might expect to find in a planning algorithm. Furthermore, it exceeds the state of the art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
Hyper-heuristics comprise a set of approaches that are motivated (at least in part) by the goal of automating the design of heuristic methods to solve hard computational search problems. An underlying strategic research challenge is to develop more generally applicable search methodologies. The term hyper-heuristic is relatively new; it was first used in 2000 to describe heuristics to choose heuristics in the context of combinatorial optimisation. However, the idea of automating the design of heuristics is not new; it can be traced back to the 1960s. The definition of hyper-heuristics has been recently extended to refer to a search method or learning mechanism for selecting or generating heuristics to solve computational search problems. Two main hyper-heuristic categories can be considered: heuristic selection and heuristic generation. The distinguishing feature of hyper-heuristics is that they operate on a search space of heuristics (or heuristic components) rather than directly on the search space of solutions to the underlying problem that is being addressed. This paper presents a critical discussion of the scientific literature on hyper-heuristics including their origin and intellectual roots, a detailed account of the main types of approaches, and an overview of some related areas. Current research trends and directions for future research are also discussed.
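A small sketch of a selection hyper-heuristic under illustrative assumptions: the search operates over a set of low-level heuristics rather than directly over solutions, and keeps favouring whichever move has the best recent track record on a toy permutation objective.

```python
# Sketch: a selection hyper-heuristic choosing among three low-level moves.
import random

def cost(perm):
    # Toy objective: number of adjacent out-of-order pairs.
    return sum(1 for a, b in zip(perm, perm[1:]) if a > b)

def swap_adjacent(p):
    i = random.randrange(len(p) - 1)
    p = p[:]
    p[i], p[i + 1] = p[i + 1], p[i]
    return p

def reverse_segment(p):
    i, j = sorted(random.sample(range(len(p)), 2))
    return p[:i] + p[i:j + 1][::-1] + p[j + 1:]

def reinsert(p):
    i, j = random.sample(range(len(p)), 2)
    p = p[:]
    p.insert(j, p.pop(i))
    return p

random.seed(0)
heuristics = [swap_adjacent, reverse_segment, reinsert]
score = [1.0] * len(heuristics)                    # running credit per low-level heuristic
solution = random.sample(range(20), 20)
for step in range(500):
    h = max(range(len(heuristics)), key=lambda i: score[i]) if random.random() > 0.2 \
        else random.randrange(len(heuristics))
    candidate = heuristics[h](solution)
    delta = cost(solution) - cost(candidate)
    score[h] = 0.9 * score[h] + 0.1 * max(delta, 0)  # reward improving moves
    if delta >= 0:                                   # accept non-worsening moves
        solution = candidate
print("final cost:", cost(solution), "heuristic credits:", [round(s, 2) for s in score])
```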
Reinforcement learning approaches have long appealed to the data management community because of their ability to learn to control dynamic behavior from raw system performance. Recent successes in combining deep neural networks with reinforcement learning have sparked significant interest in this area. However, practical solutions remain elusive due to large training data requirements, algorithmic instability, and a lack of standard tools. In this work, we introduce LIFT, an end-to-end software stack for applying deep reinforcement learning to data management tasks. While prior work has often explored applications in simulation, LIFT primarily leverages human expertise by learning from demonstrations, thereby reducing online training time. We further introduce TensorForce, a TensorFlow library for applying deep reinforcement learning that provides a unified declarative interface to common RL algorithms, and thus serves as LIFT's backend. We demonstrate the utility of LIFT in two case studies: database compound indexing and resource management in stream processing. Results show that LIFT controllers initialized from demonstrations can outperform human baselines and heuristics on latency metrics and space usage by up to 70%.