Entropy Search (ES) and Predictive Entropy Search (PES) are popular and empirically successful Bayesian optimization techniques. Both rely on a compelling information-theoretic motivation and maximize the information gained about the arg max of the unknown function; yet both are plagued by the expensive computation required to estimate entropies. We propose a new criterion, Max-value Entropy Search (MES), that instead uses information about the maximum function value. We show relations of MES to other Bayesian optimization methods and establish a regret bound. We observe that MES maintains or improves the good empirical performance of ES/PES while tremendously lightening the computational burden. In particular, MES is much more robust to the number of samples used for computing the entropy, and hence more efficient for higher-dimensional problems.
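For concreteness, below is a minimal Python sketch of the per-candidate MES score under the closed form reported in the paper, assuming Gaussian posterior marginals mu(x), sigma(x) and a set of sampled maxima y* (e.g. drawn via the Gumbel approximation the authors describe); all names are illustrative, not the authors' implementation:

```python
import numpy as np
from scipy.stats import norm

def mes_acquisition(mu, sigma, y_star_samples):
    """Average entropy reduction over sampled max-values y*.

    mu, sigma:      posterior mean/std at candidate points, shape (n,)
    y_star_samples: sampled global maxima, shape (k,)
    """
    sigma = np.maximum(sigma, 1e-10)  # numerical floor on the std
    gamma = (y_star_samples[:, None] - mu[None, :]) / sigma[None, :]
    # Closed-form entropy difference for a Gaussian truncated at y*:
    #   gamma * phi(gamma) / (2 * Phi(gamma)) - log Phi(gamma)
    vals = gamma * norm.pdf(gamma) / (2 * norm.cdf(gamma)) - norm.logcdf(gamma)
    return vals.mean(axis=0)  # average over the k max-value samples
```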
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance.
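A minimal sketch of one Hedge-style portfolio step in the spirit of GP-Hedge, assuming each acquisition function nominates one candidate and each arm is rewarded with the GP posterior mean at its own nominee; `acquisitions`, `optimize_acq`, and `posterior_mean` are illustrative placeholders:

```python
import numpy as np

def gp_hedge_step(acquisitions, gains, optimize_acq, posterior_mean,
                  eta=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Softmax over cumulative gains gives the arm-selection distribution.
    p = np.exp(eta * (gains - gains.max()))
    p /= p.sum()
    nominees = [optimize_acq(a) for a in acquisitions]  # one candidate per arm
    choice = rng.choice(len(acquisitions), p=p)
    # After evaluating nominees[choice] and refitting the GP, every arm is
    # credited with the posterior mean at its own nominee.
    new_gains = gains + np.array([posterior_mean(x) for x in nominees])
    return nominees[choice], new_gains
```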
Recently there has been growing interest in Bayesian optimization: the optimization of an unknown function with assumptions usually expressed by a Gaussian process (GP) prior. We study an optimization strategy that directly uses an estimate of the argmax of the function. This strategy offers both practical and theoretical advantages: no trade-off parameter needs to be selected, and, moreover, we establish close connections to the popular GP-UCB and GP-PI strategies. Our approach can be understood as automatically and adaptively trading off exploration and exploitation as in GP-UCB and GP-PI. We illustrate the effect of this adaptive tuning through regret bounds and an extensive empirical evaluation on robotics and vision tasks, demonstrating the robustness of this strategy across a range of performance criteria.
Bayesian optimization (BO) based on Gaussian process models is a powerful paradigm for optimizing black-box functions that are expensive to evaluate. While several BO algorithms provably converge to the global optimum of the unknown function, they assume that the hyperparameters of the kernel are known. This is not the case in practice, and misspecification often causes these algorithms to converge to poor local optima. In this paper, we present the first BO algorithm that is provably no-regret and converges to the optimum without knowledge of the hyperparameters. We slowly adapt the hyperparameters of a fixed kernel, thereby expanding the associated function class over time and letting the BO algorithm consider more complex function candidates. Based on the theoretical insights, we propose several practical algorithms that achieve the empirical data efficiency of BO with online hyperparameter estimation while retaining theoretical convergence guarantees. We evaluate our method on several benchmark problems.
Bandit methods for black-box optimisation, such as Bayesian optimisation, are used in a variety of applications including hyper-parameter tuning and experiment design. Recently, multi-fidelity methods have garnered considerable attention since function evaluations have become increasingly expensive in such applications. Multi-fidelity methods use cheap approximations to the function of interest to speed up the overall optimisation process. However, most multi-fidelity methods assume only a finite number of approximations. In many practical applications, however, a continuous spectrum of approximations might be available. For instance, when tuning an expensive neural network, one might choose to approximate the cross-validation performance using less data N and/or fewer training iterations T. Here, the approximations are best viewed as arising out of a continuous two-dimensional space (N, T). In this work, we develop a Bayesian optimisation method, BOCA, for this setting. We characterise its theoretical properties and show that it achieves better regret than strategies which ignore the approximations. BOCA outperforms several other baselines in synthetic and real experiments.
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
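As a concrete example of the utility-based selection described above, here is a minimal sketch of the standard expected-improvement acquisition, one of the utilities commonly used in this setting, assuming a Gaussian posterior with mean mu and standard deviation sigma and incumbent best observation f_best:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI for maximization; xi trades off exploration vs. exploitation."""
    sigma = np.maximum(sigma, 1e-10)        # avoid division by zero
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```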
Bayesian optimization is known to be difficult to scale to high dimensions, because the acquisition step requires solving a non-convex optimization problem in the same search space. To scale the method while retaining its benefits, we propose an algorithm (LineBO) that restricts the problem to a sequence of iteratively chosen one-dimensional subproblems. We show that our algorithm converges globally and obtains fast local rates when the function is strongly convex. Further, if the objective has an invariant subspace, our method automatically adapts to the effective dimension without changing the algorithm. Our method scales well to high dimensions while using a global Gaussian process model. When combined with the SafeOpt algorithm to solve the subproblems, we obtain the first safe Bayesian optimization algorithm with theoretical guarantees applicable in high-dimensional settings. We evaluate our method on multiple synthetic benchmarks, where we obtain competitive performance. Further, we deploy our algorithm to optimize the beam intensity of the Swiss Free Electron Laser with up to 40 parameters while satisfying safe operation constraints.
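A minimal sketch of the one-dimensional restriction at the heart of LineBO-style methods: pick a direction through an anchor point (here a random direction, one of the variants) and optimize the acquisition only along that line. `acquisition` is an illustrative placeholder, not the paper's implementation:

```python
import numpy as np

def line_step(acquisition, anchor, lo, hi, n_grid=1000, rng=None):
    """Maximize the acquisition over a random line through `anchor`."""
    if rng is None:
        rng = np.random.default_rng()
    direction = rng.standard_normal(anchor.shape[0])
    direction /= np.linalg.norm(direction)
    # Parameterize the line x(t) = anchor + t * direction, clipped to the box.
    span = np.linalg.norm(hi - lo)
    t = np.linspace(-span, span, n_grid)
    candidates = np.clip(anchor[None, :] + t[:, None] * direction[None, :],
                         lo, hi)
    values = np.array([acquisition(x) for x in candidates])
    return candidates[values.argmax()]
```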
Bayesian optimisation has gained great popularity as a tool for optimising the parameters of machine learning algorithms and models. Somewhat ironically, setting up the hyper-parameters of Bayesian optimisation methods is notoriously hard. While reasonable practical solutions have been advanced, they can often fail to find the best optima. Surprisingly, there is little theoretical analysis of this crucial problem in the literature. To address this, we derive a cumulative regret bound for Bayesian optimisation with Gaussian processes and unknown kernel hyper-parameters in the stochastic setting. The bound, which applies to the expected improvement acquisition function and sub-Gaussian observation noise, provides us with guidelines on how to design hyper-parameter estimation methods. A simple simulation demonstrates the importance of following these guidelines.
In many scientific and engineering applications, we are tasked with evaluating an expensive black-box function $f$. Traditional settings for this problem assume just the availability of this single function. In many cases, however, cheap approximations to $f$ may be obtainable. For example, the expensive real-world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low-function-value regions cheaply and use the expensive evaluations of $f$ in a small promising region to quickly identify the optimum. We formalize this task as a \emph{multi-fidelity} bandit problem, where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper-confidence-bound techniques. In our theoretical analysis, we demonstrate that it exhibits precisely the above behaviour and achieves better regret than strategies which ignore multi-fidelity information. Empirically, MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments.
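A rough Python sketch of the two-part MF-GP-UCB selection rule as we understand it from the paper's setup, assuming one GP per fidelity with known bias bounds zeta[m] to the target function and uncertainty thresholds gammas[m]; the interface is an illustrative assumption:

```python
import numpy as np

def mf_gp_ucb_step(mus, sigmas, zeta, gammas, beta):
    """mus, sigmas: shape (M, n) for M fidelities and n candidate points."""
    # Each fidelity yields a valid upper bound on f after adding its bias
    # bound zeta[m]; the tightest bound is their minimum.
    ucb = np.min(mus + np.sqrt(beta) * sigmas + zeta[:, None], axis=0)
    x_idx = ucb.argmax()
    # Query the cheapest fidelity that is still uncertain at the chosen point.
    for m in range(len(zeta)):
        if np.sqrt(beta) * sigmas[m, x_idx] > gammas[m]:
            return x_idx, m
    return x_idx, len(zeta) - 1  # otherwise query the highest fidelity
```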
Bayesian Optimisation (BO) is a technique used in optimising a $D$-dimensional function which is typically expensive to evaluate. While there have been many successes for BO in low dimensions, scaling it to high dimensions has been notoriously difficult. Existing literature on the topic operates under very restrictive settings. In this paper, we identify two key challenges in this endeavour. We tackle these challenges by assuming an additive structure for the function. This setting is substantially more expressive and contains a richer class of functions than previous work. We prove that, for additive functions, the regret has only linear dependence on $D$ even though the function depends on all $D$ dimensions. We also demonstrate several other statistical and computational benefits in our framework. Via synthetic examples, a scientific simulation and a face detection problem we demonstrate that our method outperforms naive BO on additive functions and on several examples where the function is not additive.
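A minimal sketch of why additivity helps: with f(x) = sum_j f_j(x_{A_j}) over disjoint groups A_j, a UCB-style acquisition decomposes across groups, so each low-dimensional block can be maximized independently. The per-group posterior interface below is an illustrative assumption:

```python
import numpy as np

def additive_ucb_step(groups, group_posteriors, grids, beta):
    """groups:           list of index arrays partitioning the D dimensions
       group_posteriors: per-group callables returning (mu, sigma) on a grid
       grids:            per-group candidate grids, shape (n_j, |A_j|)"""
    D = sum(len(g) for g in groups)
    x = np.empty(D)
    for idx, post, grid in zip(groups, group_posteriors, grids):
        mu, sigma = post(grid)
        # Maximize each group's low-dimensional UCB independently.
        x[idx] = grid[(mu + np.sqrt(beta) * sigma).argmax()]
    return x
```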
Bayesian optimization usually assumes that a Bayesian prior is given. However, the strong theoretical guarantees in Bayesian optimization are often compromised in practice because of unknown parameters in the prior. In this paper, we adopt a variant of empirical Bayes and show that, by estimating the Gaussian process prior from offline data sampled from the same prior and constructing unbiased estimators of the posterior, variants of GP-UCB and probability of improvement achieve a near-zero regret bound that decreases to a constant proportional to the observational noise as the number of offline data points and online evaluations increases. Empirically, we have verified our approach on challenging simulated robotics problems featuring task and motion planning.
Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple, has important invariance properties, and applies to domains with both categorical and continuous variables. We present a thorough theoretical analysis of REMBO. Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.
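A minimal sketch of the random-embedding idea, under illustrative dimensions: draw a random matrix A once, run any low-dimensional BO routine over y, and evaluate the original objective at the projected, clipped point A @ y:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 4                     # ambient and embedding dimensions (example)
A = rng.standard_normal((D, d))   # fixed random embedding, drawn once

def rembo_evaluate(f, y, lo=-1.0, hi=1.0):
    """Map a low-dimensional BO candidate y into the D-dimensional box."""
    x = np.clip(A @ y, lo, hi)    # project and clip to the original domain
    return f(x)
```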
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective, quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information-source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
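For reference, the Gaussian process regression surrogate this tutorial builds on has a standard closed-form posterior. With kernel $k$, observed inputs with Gram matrix $K_n = [k(x_i, x_j)]$, cross-covariance vector $k_n(x) = [k(x, x_1), \dots, k(x, x_n)]^\top$, observations $y_n$, and noise variance $\sigma^2$:

\[
\mu_n(x) = k_n(x)^\top \big(K_n + \sigma^2 I\big)^{-1} y_n,
\qquad
\sigma_n^2(x) = k(x, x) - k_n(x)^\top \big(K_n + \sigma^2 I\big)^{-1} k_n(x).
\]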
Bayesian optimization is a sample-efficient method for black-box global optimization. However, the performance of a Bayesian optimization method depends very much on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information-theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. Not only is ESP able to offer performance as good as the best (but unknown) acquisition function; surprisingly, it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
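The reformulation described above can be written compactly. With $\mathcal{D}$ the observations so far, $x_\star$ the global maximizer, and $H[\cdot]$ the differential entropy, the PES acquisition is the expected reduction in predictive entropy from conditioning on $x_\star$:

\[
\alpha_{\mathrm{PES}}(x) \;=\; H\big[p(y \mid \mathcal{D}, x)\big]
\;-\; \mathbb{E}_{p(x_\star \mid \mathcal{D})}\Big[\, H\big[p(y \mid \mathcal{D}, x, x_\star)\big] \Big],
\]

where the second term is approximated by sampling candidate maximizers $x_\star$ from the posterior.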
We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While ES and MRS perform similarly empirically in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem.
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low norm in a reproducing kernel Hilbert space. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze an intuitive Gaussian process upper confidence bound (GP-UCB) algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristic GP optimization approaches.
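The GP-UCB selection rule and the flavor of the resulting bound, in standard notation: with posterior mean $\mu_{t-1}$, posterior standard deviation $\sigma_{t-1}$, confidence parameter $\beta_t$, and maximal information gain $\gamma_T$,

\[
x_t = \arg\max_{x \in \mathcal{X}} \; \mu_{t-1}(x) + \beta_t^{1/2}\,\sigma_{t-1}(x),
\qquad
R_T = \mathcal{O}^{*}\!\big(\sqrt{T\,\beta_T\,\gamma_T}\big),
\]

where $\mathcal{O}^{*}$ suppresses logarithmic factors.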
In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes. The upper bounds we derive on the cumulative regret for this generic algorithm improve by an exponential factor the previously known bounds for algorithms like GP-UCB. We also introduce the novel Gaussian Process Mutual Information algorithm (GP-MI), which further improves these upper bounds on the cumulative regret. We confirm the efficiency of this algorithm on synthetic and real tasks against the natural competitor, GP-UCB, and also the Expected Improvement heuristic.
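As we recall it, the GP-MI selection rule replaces the UCB exploration bonus with an increment of accumulated information (notation approximate; see the paper for the exact form):

\[
x_t = \arg\max_{x} \; \mu_{t-1}(x) + \sqrt{\alpha}\,\Big(\sqrt{\sigma_{t-1}^2(x) + \widehat{\gamma}_{t-1}} - \sqrt{\widehat{\gamma}_{t-1}}\Big),
\qquad
\widehat{\gamma}_t = \widehat{\gamma}_{t-1} + \sigma_{t-1}^2(x_t),
\]

so the exploration bonus shrinks as information $\widehat{\gamma}_t$ accumulates, which is the source of the tighter regret bounds.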
Bayesian optimization has emerged as a powerful candidate tool for the global optimization of functions with expensive evaluation costs. However, given the fast pace of research on Bayesian methods and developments in computing technology, using Bayesian optimization in a parallel computing environment remains challenging for non-experts. In this report, I review the state of the art in parallel and scalable Bayesian optimization methods. In addition, I present practical ways to avoid some pitfalls of Bayesian optimization, such as oversampling of marginal parameters and over-exploitation of high-performing parameters. Finally, I provide relatively simple heuristics, along with their open-source software implementations, that can be immediately and easily deployed in any computing environment.
Bayesian optimization is a sample-efficient method for finding the global optimum of an expensive-to-evaluate black-box function. The global solution is found by accumulating pairs of query points and corresponding function values, repeating two procedures: (i) learning a surrogate model of the objective function using the data observed so far; and (ii) maximizing an acquisition function to determine where next to query the objective function. Convergence guarantees are only valid when the global optimizer of the acquisition function is found and taken as the next query point. In practice, however, local optimizers of the acquisition function are also used, since searching for the exact optimizer of the acquisition function is often a non-trivial or time-consuming task. In this paper, we analyze the behavior of local optimizers of acquisition functions in terms of their simple regret relative to the global optimizer. We also provide a performance analysis for the case where multi-start local optimizers are used to find the maximum of the acquisition function. Numerical experiments confirm the validity of the theoretical analysis.
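A minimal sketch of the multi-start local optimization this analysis studies: run a gradient-based local optimizer from several random restarts and keep the best local optimum of the acquisition. `acquisition` is an illustrative placeholder:

```python
import numpy as np
from scipy.optimize import minimize

def multistart_maximize(acquisition, lo, hi, n_starts=10, rng=None):
    """Approximate the acquisition's global maximizer via random restarts."""
    if rng is None:
        rng = np.random.default_rng()
    best_x, best_val = None, -np.inf
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)
        # L-BFGS-B minimizes, so negate the acquisition to maximize it.
        res = minimize(lambda x: -acquisition(x), x0, method="L-BFGS-B",
                       bounds=list(zip(lo, hi)))
        if -res.fun > best_val:
            best_x, best_val = res.x, -res.fun
    return best_x, best_val
```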