We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similarly in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem and for a simulated multi-task robotic control problem.
Entropy Search (ES) and Predictive Entropy Search (PES) are popular and empirically successful Bayesian optimization techniques. Both rely on a compelling information-theoretic motivation, and maximize the information gained about the arg max of the unknown function; yet both are plagued by the expensive computation required to estimate entropies. We propose a new criterion, Max-value Entropy Search (MES), that instead uses the information about the maximum function value. We show relations of MES to other Bayesian optimization methods, and establish a regret bound. We observe that MES maintains or improves the good empirical performance of ES/PES, while tremendously lightening the computational burden. In particular, MES is much more robust to the number of samples used for computing the entropy, and hence more efficient for higher-dimensional problems.
We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well-known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real-world applications, including problems in machine learning, rocket science and robotics.
We consider parallel global optimization of derivative-free expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm proposed by Ginsbourger et al. (2007). At the heart of this algorithm is maximizing the information criterion called the "multi-points expected improvement", or the q-EI. To accomplish this, we use infinitesimal perturbation analysis (IPA) to construct a stochastic gradient estimator and show that this estimator is unbiased. We also show that the stochastic gradient ascent algorithm using the constructed gradient estimator converges to a stationary point of the q-EI surface, and therefore, as the number of multiple starts of the gradient ascent algorithm and the number of steps for each start grow large, the one-step Bayes optimal set of points is recovered. We show in numerical experiments that our method for maximizing the q-EI is faster than methods based on closed-form evaluation using high-dimensional integration when considering many parallel function evaluations, and is comparable in speed when considering few. We also show that the resulting one-step Bayes optimal algorithm for parallel global optimization finds high-quality solutions with fewer evaluations than a heuristic based on approximately maximizing the q-EI. A high-quality open source implementation of this algorithm is available in the open source Metrics
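To make the q-EI criterion concrete, the sketch below estimates its value by plain Monte Carlo, taking q-EI as the expected improvement of the best point in a batch over the incumbent, E[max(max_i f(x_i) - f*, 0)], under a joint Gaussian posterior. It is an illustration only: it estimates the value, not the IPA stochastic gradient described in the abstract, and the function names, the posterior mean and covariance inputs, and the sample count are all assumptions made for this example.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def q_ei_monte_carlo(mean, cov, best, n_samples, rng):
    """Monte Carlo estimate of E[max(max_i f_i - best, 0)] for f ~ N(mean, cov).

    `mean` and `cov` stand in for a GP posterior over the q batch points
    (hypothetical inputs); `best` is the incumbent value (maximization).
    """
    low = cholesky(cov)
    q = len(mean)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Correlated posterior draw: f = mean + L z.
        f = [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(q)]
        total += max(max(f) - best, 0.0)
    return total / n_samples


estimate = q_ei_monte_carlo([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], 0.0, 5000, random.Random(0))
```

For q = 1 the estimate should approach the closed-form single-point expected improvement as the number of samples grows.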
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and more efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
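Recently, there has been rising interest in Bayesian optimization: the optimization of an unknown function under assumptions usually expressed by a Gaussian process (GP) prior. We study an optimization strategy that directly uses an estimate of the argmax of the function. This strategy offers both practical and theoretical advantages: no tradeoff parameter needs to be selected and, moreover, we establish close connections to the popular GP-UCB and GP-PI strategies. Our approach can be understood as automatically and adaptively trading off exploration and exploitation in GP-UCB and GP-PI. We illustrate the effects of this adaptive tuning via bounds on the regret as well as an extensive empirical evaluation on robotics and vision tasks, demonstrating the robustness of this strategy across a range of performance criteria.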
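We consider Bayesian inference when only a limited number of noisy log-likelihood evaluations can be obtained. This occurs, for example, when complex simulator-based statistical models are fitted to data and synthetic likelihood (SL) is used to form noisy log-likelihood estimates from computationally costly forward simulations. We frame the inference task as a Bayesian sequential design problem, in which the log-likelihood function is modelled with a hierarchical Gaussian process (GP) surrogate model that is used to efficiently select additional log-likelihood evaluation locations. Motivated by recent progress in batch Bayesian optimization, we develop various batch-sequential strategies in which multiple simulations are adaptively selected to minimize either the expected or the median loss function measuring the uncertainty in the resulting posterior. We analyze the properties of the resulting method theoretically and empirically. Experiments with toy problems and three simulation models suggest that our method is robust, highly parallelizable, and sample-efficient.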
How can we take advantage of opportunities for experimental parallelization in exploration-exploitation tradeoffs? In many experimental scenarios, it is often desirable to execute experiments simultaneously or in batches, rather than only performing one at a time. Additionally, observations may be both noisy and expensive. We introduce Gaussian Process Batch Upper Confidence Bound (GP-BUCB), an upper confidence bound-based algorithm, which models the reward function as a sample from a Gaussian process and which can select batches of experiments to run in parallel. We prove a general regret bound for GP-BUCB, as well as the surprising result that for some common kernels, the asymptotic average regret can be made independent of the batch size. The GP-BUCB algorithm is also applicable in the related case of a delay between initiation of an experiment and observation of its results, for which the same regret bounds hold. We also introduce Gaussian Process Adaptive Upper Confidence Bound (GP-AUCB), a variant of GP-BUCB which can exploit parallelism in an adaptive manner. We evaluate GP-BUCB and GP-AUCB on several simulated and real data sets. These experiments show that GP-BUCB and GP-AUCB are competitive with state-of-the-art heuristics.
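Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for efficiently optimizing multiple continuous parameters, but existing approaches degrade in performance when the noise level is high, limiting their applicability to many randomized experiments. We derive an expression for expected improvement under batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be efficiently optimized. Simulations with synthetic functions show that optimization performance on noisy, constrained problems exceeds that of existing methods. We further demonstrate the effectiveness of the method with two real-world experiments conducted at Facebook: optimizing a ranking system, and optimizing server compiler flags.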
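Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited for optimization over continuous domains of fewer than 20 dimensions, and it tolerates stochastic noise in function evaluations. It builds a surrogate for the objective, quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, in contrast to previous ad hoc modifications.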
Bayesian optimization is a sample-efficient method for black-box global optimization. However, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information-theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
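One common utility that balances the exploration and exploitation described above is expected improvement. The sketch below is an illustration, not the tutorial's own code: it evaluates the standard closed form for maximization, assuming the posterior at a candidate point is Gaussian with mean `mu` and standard deviation `sigma`, and that `best` is the current best observation. All three names are chosen for this example.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form expected improvement E[max(f - best, 0)] for f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a high mean (exploitation), second a high variance (exploration).
    return (mu - best) * cdf + sigma * pdf
```

A point whose mean equals the incumbent still has positive expected improvement whenever `sigma > 0`, which is what drives sampling in uncertain regions.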
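Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other GP optimization approaches.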
Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to find good solutions with fewer objective function evaluations. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (d-KG), which is one-step Bayes-optimal, asymptotically consistent, and provides greater one-step value of information than in the derivative-free setting. d-KG accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low norm in a reproducing kernel Hilbert space. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze an intuitive Gaussian process upper confidence bound (GP-UCB) algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristic GP optimization approaches.
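The upper confidence bound selection rule can be sketched in a few lines. This is a minimal illustration, assuming the posterior means and standard deviations have already been computed on a finite candidate set and that the confidence parameter `beta` is supplied by the caller. The function name and inputs are hypothetical, and the schedule for `beta` that the regret analysis relies on is not reproduced here.

```python
import math


def gp_ucb_select(means, stds, beta):
    """Return the index maximizing mean + sqrt(beta) * std, plus all scores."""
    scores = [m + math.sqrt(beta) * s for m, s in zip(means, stds)]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Three candidate points: a larger beta favors the most uncertain one.
index, scores = gp_ucb_select([0.5, 0.4, 0.1], [0.1, 0.3, 0.5], beta=4.0)
```

With `beta = 0` the rule reduces to pure exploitation of the posterior mean.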
In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural networks in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm, the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms, both on synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.
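Bayesian optimization is a sample-efficient method for finding a global optimum of an expensive-to-evaluate black-box function. A global solution is found by accumulating pairs of query points and corresponding function values, repeating two procedures: (i) learning a surrogate model of the objective function from the data observed so far; (ii) maximizing an acquisition function to determine where to query the objective function next. Convergence guarantees are only valid when the global optimizer of the acquisition function is found and selected as the next query point. In practice, however, local optimizers of the acquisition function are also used, since searching for its exact optimizer is often a non-trivial or time-consuming task. In this paper, we analyze the behavior of local optimizers of acquisition functions in terms of simple regret relative to global optimizers. We also provide a performance analysis for the case in which multi-started local optimizers are used to find the maximum of the acquisition function. Numerical experiments confirm the validity of the theoretical analysis.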
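The multi-started local optimization of an acquisition function discussed in this abstract can be illustrated as follows. This is a sketch under stated assumptions: a one-dimensional domain, a simple derivative-free local ascent with a shrinking step, and a toy bimodal function standing in for an acquisition function. None of these choices come from the paper, and all names are hypothetical.

```python
import math
import random


def local_ascent(acq, x0, lower, upper, step=0.1, tol=1e-6):
    """Derivative-free local ascent: move to a better neighbor, else halve the step."""
    x, fx = x0, acq(x0)
    while step > tol:
        improved = False
        for cand in (x - step, x + step):
            cand = min(max(cand, lower), upper)  # stay inside the domain
            fc = acq(cand)
            if fc > fx:
                x, fx, improved = cand, fc, True
        if not improved:
            step *= 0.5
    return x, fx


def multi_start_maximize(acq, lower, upper, n_starts, rng):
    """Run local ascent from random starts and keep the best local optimizer."""
    best_x, best_f = None, -math.inf
    for _ in range(n_starts):
        x, fx = local_ascent(acq, rng.uniform(lower, upper), lower, upper)
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def toy_acquisition(x):
    """Toy stand-in: local maximum near x = 1, global maximum near x = 4."""
    return math.exp(-(x - 1.0) ** 2) + 2.0 * math.exp(-4.0 * (x - 4.0) ** 2)


best_x, best_f = multi_start_maximize(toy_acquisition, 0.0, 5.0, 20, random.Random(0))
```

A single start placed in the wrong basin returns only the local maximizer near x = 1. Adding starts raises the chance that at least one run reaches the global maximizer.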
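This paper presents Acquisition Thompson Sampling (ATS), a batch Bayesian optimization (BO) algorithm based on the idea of sampling multiple acquisition functions from a stochastic process. We define this process through the dependency of the acquisition functions on a set of model parameters. ATS is conceptually simple, straightforward to implement and, unlike other batch BO methods, can be used to parallelize any sequential acquisition function. To improve performance on multi-modal tasks, we show that ATS can be combined with existing techniques in order to realize different explore-exploit trade-offs and to take pending function evaluations into account. We present experiments on a variety of benchmark functions and on the hyperparameter optimization of a popular gradient boosting tree algorithm. These demonstrate the competitiveness of our algorithm with two state-of-the-art batch BO methods, and its advantages over classical parallel Thompson sampling for BO.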
Recent work on Bayesian optimization has shown its effectiveness in global optimization of difficult black-box objective functions. Many real-world optimization problems of interest also have constraints which are unknown a priori. In this paper, we study Bayesian optimization for constrained problems in the general case that noise may be present in the constraint functions, and the objective and constraints may be evaluated independently. We provide motivating practical examples, and present a general framework to solve such problems. We demonstrate the effectiveness of our approach on optimizing the performance of online latent Dirichlet allocation subject to topic sparsity constraints, tuning a neural network given test-time memory constraints, and optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed time, subject to passing standard convergence diagnostics.
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance.
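A portfolio governed by a bandit strategy can be sketched with the classical Hedge rule: each acquisition function keeps a cumulative gain, and one is chosen with probability proportional to the exponential of its scaled gain. The code below is an illustration in that spirit, not the paper's implementation. The learning rate `eta`, the reward values, and all function names are assumptions made for this example.

```python
import math
import random


def hedge_probabilities(gains, eta):
    """Softmax of eta * gains, shifted by the maximum for numerical stability."""
    top = max(gains)
    weights = [math.exp(eta * (g - top)) for g in gains]
    total = sum(weights)
    return [w / total for w in weights]


def hedge_choose(gains, eta, rng):
    """Sample the index of one portfolio member according to the Hedge probabilities."""
    probs = hedge_probabilities(gains, eta)
    return rng.choices(range(len(gains)), weights=probs, k=1)[0]


def update_gains(gains, rewards):
    """Add this round's reward for each member (e.g. a value at its nominated point)."""
    return [g + r for g, r in zip(gains, rewards)]


gains = [0.0, 0.0, 0.0]  # three hypothetical acquisition functions
rng = random.Random(0)
chosen = hedge_choose(gains, eta=1.0, rng=rng)
gains = update_gains(gains, [0.2, 0.9, 0.1])  # made-up rewards for illustration
```

After the update, the second member has the largest gain and therefore the highest selection probability in the next round.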