We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance.
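As a rough illustration of the portfolio idea (a simplified sketch, not the authors' implementation; the parameter `eta` and the function names are illustrative), a Hedge-style selection over acquisition functions can be written as:

```python
import math
import random

def hedge_select(gains, eta=1.0, rng=random):
    # Sample an acquisition-function index with probability
    # proportional to exp(eta * cumulative gain).
    m = max(gains)  # shift gains for numerical stability
    weights = [math.exp(eta * (g - m)) for g in gains]
    r = rng.random() * sum(weights)
    acc = 0.0
    for j, w in enumerate(weights):
        acc += w
        if r <= acc:
            return j
    return len(weights) - 1

def hedge_update(gains, rewards):
    # In GP-Hedge, arm j's reward is the GP posterior mean at the
    # point that acquisition function j nominated this round.
    return [g + r for g, r in zip(gains, rewards)]
```

Each round, every acquisition function nominates a candidate point, one nominee is sampled and evaluated on the objective, and all cumulative gains are then updated from the refreshed posterior.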
Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible on a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the term "big data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge of the policy structure (e.g., dynamic movement primitives), of the policy parameters (e.g., demonstrations), or of the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamics (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and the prior knowledge. The current scientific challenges mainly revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments: active user modelling with preferences, and hierarchical reinforcement learning. We close with a discussion of the pros and cons of Bayesian optimization based on our experiences.
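The loop described above (prior plus evidence gives a posterior; a utility function selects the next observation) fits in a few dozen lines. Below is a minimal sketch with a fixed-hyperparameter Gaussian process and expected improvement maximized over a 1-D grid; the kernel length-scale, noise level, and test function are illustrative choices, not taken from the tutorial:

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel between two 1-D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Zero-mean GP regression: posterior mean and variance at Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.diag(rbf(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks))
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    # EI(x) = (mu - best) * Phi(z) + s * phi(z), with z = (mu - best) / s.
    s = np.sqrt(var)
    z = (mu - best) / s
    Phi = 0.5 * (1.0 + np.array([erf(v / sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (mu - best) * Phi + s * phi

def bayes_opt(f, n_iter=12):
    grid = np.linspace(0.0, 1.0, 201)
    X = np.array([0.0, 1.0])              # two initial observations
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(expected_improvement(mu, var, y.max()))]
        X, y = np.append(X, x_next), np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()

x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2)
```

Early iterations favor high-variance regions (exploration); as the posterior sharpens, EI concentrates near the incumbent maximum (exploitation), which is the trade-off the tutorial describes.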
We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities to information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similarly in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem.
Designing gaits and corresponding control policies is a key challenge in robot locomotion. Even with a viable controller parameterization, finding near-optimal parameters can be daunting. Typically, this kind of parameter optimization requires specific expert knowledge and extensive robot experiments. Automatic black-box gait optimization methods greatly reduce the need for human expertise and time-consuming design processes. Many different approaches for automatic gait optimization have been suggested to date, such as grid search and evolutionary algorithms. In this article, we thoroughly discuss several of these optimization methods in the context of automatic gait optimization. Moreover, we extensively evaluate Bayesian optimization, a model-based approach to black-box optimization under uncertainty, on both simulated problems and real robots. This evaluation demonstrates that Bayesian optimization is particularly suited for robotic applications, where it is crucial to find a good set of gait parameters in a small number of experiments.
Reward functions are an essential component of many robot learning methods. Defining such functions, however, remains hard in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. We introduce a framework wherein the robot simultaneously learns an action policy and a model of the reward function by actively querying a human expert for ratings. We represent the reward model using a Gaussian process and evaluate several classical acquisition functions from the Bayesian optimization literature in this context. Furthermore, we present a novel acquisition function, expected policy divergence. We demonstrate results of our method for a robot grasping task and show that the learned reward function generalizes to a similar task. Additionally, we evaluate the proposed novel acquisition function on a real robot pendulum swing-up task.
Fig. 1: The Robot-Grasping Task. While grasping is one of the most researched robotic tasks, finding a good reward function still proves difficult.
Bayesian optimization (BO) refers to a suite of techniques for global optimization of expensive black-box functions, which use introspective Bayesian models of the function to efficiently find the optimum. While BO has been successfully applied in many applications, modern optimization tasks usher in new challenges where conventional methods fail. In this work, we present Dragonfly, an open-source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real-world settings; these include better methods for handling higher-dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimizing over structured combinatorial spaces such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimizing over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimization and demonstrate that when the above methods are integrated, they enable significant improvements in the performance of BO. The Dragonfly library is available at dragonfly.github.io.
Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other hand, bandit-based configuration evaluation approaches based on random search lack guidance and do not converge to the best configurations quickly. Here, we propose to combine the benefits of both Bayesian optimization and bandit-based methods, in order to achieve the best of both worlds: strong anytime performance and fast convergence to optimal configurations. We propose a new, practical, state-of-the-art hyperparameter optimization method, which consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types, including high-dimensional toy functions, support vector machines, feed-forward neural networks, Bayesian neural networks, deep reinforcement learning, and convolutional neural networks. Our method is robust and versatile, while at the same time being conceptually simple and easy to implement.
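The bandit-based component referred to above is Hyperband, whose building block is successive halving: evaluate many configurations on a small budget, keep the best fraction, and re-evaluate the survivors on a larger budget. A heavily simplified sketch (the parameter `eta` and function names are illustrative, not the paper's code):

```python
def successive_halving(configs, evaluate, budget=1.0, eta=3):
    # Evaluate every configuration on the current budget, keep the
    # top 1/eta fraction, multiply the budget by eta, and repeat
    # until a single configuration survives.
    while len(configs) > 1:
        scores = [evaluate(c, budget) for c in configs]
        k = max(1, len(configs) // eta)
        ranked = sorted(range(len(configs)),
                        key=lambda i: scores[i], reverse=True)
        configs = [configs[i] for i in ranked[:k]]
        budget *= eta
    return configs[0]
```

The combined method replaces the random sampling of new configurations in this scheme with proposals from a model fitted to previous evaluations, which is what yields both guidance and anytime performance.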
Bayesian optimization is proposed for automatically learning optimal controller parameters from experimental data. A probabilistic description (a Gaussian process) is used to model the unknown function from controller parameters to a user-defined cost. The probabilistic model is updated with data obtained by testing a set of parameters on the physical system and evaluating the cost. In order to learn fast, the Bayesian optimization algorithm selects the next parameters to evaluate in a systematic way, for example, by maximizing information gain about the optimum. The algorithm thus requires only few experiments to find globally optimal parameters. The method is demonstrated on throttle valve control as a representative industrial control example, where the proposed automatic tuning approach outperforms manual calibration: with only a small number of experiments, it consistently achieves better performance. The proposed auto-tuning framework is flexible and can handle different control structures and objectives.
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.
Bayesian optimization is a sample-efficient method for black-box global optimization. However, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach, using Gaussian process models, does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation. Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach.
The growing interest in the automation of both machine learning and deep learning has inevitably led to the development of automated methods for neural architecture optimization. The choice of network architecture has proven to be critical, and many advances in deep learning spring from its immediate improvements. However, deep learning techniques are computationally intensive, and their application requires a high level of domain knowledge. Therefore, even partial automation of this process helps to make deep learning more accessible to both researchers and practitioners. With this survey, we provide a formalism which unifies and categorizes the landscape of existing methods, along with a detailed analysis that compares and contrasts the different approaches. We achieve this by discussing common architecture search spaces and architecture optimization algorithms based on principles of reinforcement learning and evolutionary algorithms, as well as approaches that incorporate surrogate and one-shot models. Additionally, we address new research directions, which include constrained and multi-objective architecture search as well as automated data augmentation, optimizer and activation function search.
How can we efficiently gather information to optimize an unknown function, when presented with multiple, mutually dependent information sources with different costs? For example, when optimizing a robotic system, intelligently trading off computer simulations and real robot tests can lead to significant savings. Existing methods, such as multi-fidelity GP-UCB or Entropy Search-based approaches, either make simplistic assumptions on the interaction among different fidelities, or use simple heuristics that lack theoretical guarantees. In this paper, we study multi-fidelity Bayesian optimization with complex structural dependencies among multiple outputs, and propose MF-MI-Greedy, a principled algorithmic framework for addressing this problem. In particular, we model the different fidelities using additive Gaussian processes based on shared latent structures with the target function. We then use cost-sensitive mutual information gain for efficient Bayesian global optimization. We propose a simple notion of regret which incorporates the cost of different fidelities, and prove that MF-MI-Greedy achieves low regret. We demonstrate the strong empirical performance of our algorithm on both synthetic and real-world datasets.
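The cost-sensitive idea can be illustrated with a hypothetical, heavily simplified selection rule (not the paper's MF-MI-Greedy criterion): among the available fidelities, greedily query the one whose estimated information gain per unit cost is largest.

```python
def pick_fidelity(info_gains, costs):
    # Greedy stand-in for a cost-sensitive information criterion:
    # choose the fidelity with the largest gain-per-cost ratio.
    ratios = [g / c for g, c in zip(info_gains, costs)]
    return ratios.index(max(ratios))
```

Under this rule, a cheap simulator with modest expected gain can be preferred over an expensive hardware test, which captures the simulation-versus-robot trade-off motivating the abstract.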
In this paper we present a fully automated approach to (approximate) optimal control of non-linear systems. Our algorithm jointly learns a non-parametric model of the system dynamics, based on Gaussian Process Regression (GPR), and performs receding horizon control using an adapted iterative LQR formulation. This results in an extremely data-efficient learning algorithm that can operate under real-time constraints. When combined with an exploration strategy based on GPR variance, our algorithm successfully learns to control two benchmark problems in simulation (two-link manipulator, cart-pole) as well as to swing up and balance a real cart-pole system. For all considered problems, learning from scratch, that is, without prior knowledge provided by an expert, succeeds in fewer than 10 episodes of interaction with the system.
Bayesian optimization and Lipschitz optimization have been developed as alternative techniques for optimizing black-box functions. They each exploit a different form of prior about the function. In this work, we explore strategies for combining these techniques for better global optimization. In particular, we propose ways of using the Lipschitz continuity assumption within traditional BO algorithms, which we call Lipschitz Bayesian optimization (LBO). This approach does not increase the asymptotic runtime and in some cases drastically improves the performance (while in the worst case the performance is similar). Indeed, in a particular setting, we prove that using the Lipschitz information yields the same or a better bound on the regret compared to using Bayesian optimization alone. Moreover, we propose a simple heuristic to estimate the Lipschitz constant, and prove that a growing estimate of the Lipschitz constant is in some sense "harmless". Our experiments on 15 datasets with 4 acquisition functions show that, in the worst case, LBO performs similarly to the underlying BO method, while in some cases it performs substantially better. Thompson sampling in particular typically saw drastic improvements (as the Lipschitz information corrected its over-exploration phenomenon), and its LBO variant often outperformed the other acquisition functions.
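The Lipschitz information exploited here comes from the classic bound |f(a) − f(b)| ≤ L·|a − b|. A minimal 1-D sketch (illustrative names, not the paper's algorithm) of the bounds this implies at a candidate point, together with a growing slope-based estimate of L:

```python
def lipschitz_bounds(x, xs, ys, L):
    # Tightest upper/lower bounds on f(x) implied by the observations
    # (xs, ys) under the assumption |f(a) - f(b)| <= L * |a - b|.
    upper = min(y + L * abs(x - xi) for xi, y in zip(xs, ys))
    lower = max(y - L * abs(x - xi) for xi, y in zip(xs, ys))
    return lower, upper

def estimate_lipschitz(xs, ys):
    # Growing estimate: the largest slope observed between sample pairs.
    return max(abs(ys[i] - ys[j]) / abs(xs[i] - xs[j])
               for i in range(len(xs)) for j in range(i + 1, len(xs)))
```

Such bounds can be used to rule out candidate points whose GP-based acquisition value is inconsistent with the Lipschitz envelope, which is the kind of combination the abstract describes.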
Recently, there has been rising interest in Bayesian optimization, the optimization of an unknown function with assumptions usually expressed by a Gaussian process (GP) prior. We study an optimization strategy that directly uses an estimate of the argmax of the function. This strategy offers both practical and theoretical advantages: no tradeoff parameter needs to be selected, and, moreover, we establish close connections to the popular GP-UCB and GP-PI strategies. Our approach can be understood as automatically and adaptively trading off exploration and exploitation in GP-UCB and GP-PI. We illustrate the effect of this adaptive tuning via bounds on the regret as well as an extensive empirical evaluation on robotics and vision tasks, demonstrating the robustness of this strategy for a range of performance criteria.
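For reference, the GP-UCB rule that this strategy adaptively tunes scores each candidate by its posterior mean plus a scaled posterior standard deviation; the tradeoff parameter β below is exactly the hand-picked quantity the abstract's approach avoids (the values are illustrative):

```python
import numpy as np

def gp_ucb_scores(mu, sigma, beta=2.0):
    # Upper confidence bound acquisition: optimism in the face of
    # uncertainty, with exploration weighted by sqrt(beta).
    return mu + np.sqrt(beta) * sigma

mu = np.array([0.1, 0.5, 0.3])      # posterior means (illustrative)
sigma = np.array([0.9, 0.1, 0.4])   # posterior std devs (illustrative)
next_idx = int(np.argmax(gp_ucb_scores(mu, sigma)))
```

With these numbers the uncertain first candidate wins despite its low mean; a larger β shifts the choice further toward exploration, and shrinking β toward zero recovers pure exploitation of the posterior mean.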
Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed FABOLAS, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that FABOLAS often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.
Meta-learning, or learning to learn, is the science of systematically observing how different machine learning approaches perform on a wide range of learning tasks, and then learning from this experience, or meta-data, to learn new tasks much faster than otherwise possible. Not only does this dramatically speed up and improve the design of machine learning pipelines or neural architectures, it also allows us to replace hand-engineered algorithms with novel approaches learned in a data-driven way. In this chapter, we provide an overview of the state of the art in this fascinating and continuously evolving field.