We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well-known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real-world applications, including problems in machine learning, rocket science and robotics.
We present Acquisition Thompson Sampling (ATS), a batch Bayesian optimization (BO) algorithm based on the idea of sampling multiple acquisition functions from a stochastic process. We define this process through the dependence of the acquisition functions on a set of model parameters. ATS is conceptually simple and straightforward to implement and, unlike other batch BO methods, can be used to parallelize any sequential acquisition function. To improve performance on multi-modal tasks, we show that ATS can be combined with existing techniques to realize different exploration-exploitation trade-offs and to account for pending function evaluations. We present experiments on a variety of benchmark functions and on the hyperparameter optimization of a popular gradient boosting tree algorithm. These demonstrate the competitiveness of our algorithm with two state-of-the-art batch BO methods, and its advantages over classical parallel Thompson sampling for BO.
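A rough sketch of the batch-selection mechanism described above, under illustrative assumptions: the acquisition is treated as random through the surrogate's hyperparameters, one hyperparameter sample is drawn per batch slot, and the resulting acquisition is maximized over a candidate set. The GP surrogate, log-uniform lengthscale prior, and expected-improvement acquisition used here are stand-ins, not the authors' exact choices.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(X_cand, gp, y_best):
    # standard EI under a GP posterior
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

def propose_batch(X, y, bounds, q=4, seed=0):
    """Select q points, each under its own hyperparameter draw (illustrative)."""
    rng = np.random.default_rng(seed)
    X_cand = rng.uniform(bounds[0], bounds[1], size=(2000, X.shape[1]))
    batch = []
    for _ in range(q):
        # one hyperparameter sample per batch slot (log-uniform lengthscale prior)
        lengthscale = np.exp(rng.uniform(np.log(0.1), np.log(2.0)))
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=lengthscale),
                                      optimizer=None, normalize_y=True)
        gp.fit(X, y)
        batch.append(X_cand[np.argmax(expected_improvement(X_cand, gp, y.max()))])
    return np.array(batch)
```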
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural networks in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm, the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.
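For intuition, the one-step "value of a batch" that the parallel knowledge gradient maximizes can be approximated by naive Monte Carlo: fantasize outcomes at the batch from the current posterior, refit, and measure the gain in the best posterior mean. The sketch below is purely conceptual and far less efficient than the estimator proposed in the paper; the GP model and sample counts are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def batch_value(gp, X, y, X_batch, X_grid, n_sim=32, seed=0):
    """Naive Monte Carlo estimate of the one-step gain from evaluating X_batch."""
    rng = np.random.default_rng(seed)
    best_now = gp.predict(X_grid).max()          # current best posterior mean
    gains = []
    for _ in range(n_sim):
        # fantasize observations at the batch from the current GP posterior
        y_sim = gp.sample_y(X_batch, random_state=int(rng.integers(1 << 31))).ravel()
        gp_new = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None)
        gp_new.fit(np.vstack([X, X_batch]), np.concatenate([y, y_sim]))
        gains.append(gp_new.predict(X_grid).max() - best_now)
    return float(np.mean(gains))
```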
Bayesian optimization has emerged as a powerful candidate tool for the global optimization of functions with expensive evaluation costs. However, owing to the fast-moving nature of research on Bayesian methods and the evolution of computing technology, using Bayesian optimization in a parallel computing environment remains a challenge for non-experts. In this report, I review the state of the art in parallel and scalable Bayesian optimization methods. In addition, I propose practical ways to avoid several pitfalls of Bayesian optimization, such as oversampling of boundary parameters and over-exploitation of high-performing parameters. Finally, I provide relatively simple heuristics, along with their open-source software implementations, that can be deployed immediately and easily in any computing environment.
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited for optimization over continuous domains of fewer than 20 dimensions, and it tolerates stochastic noise in function evaluations. It builds a surrogate for the objective, quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information-source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material, we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, in contrast to previous ad hoc modifications.
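As a concrete illustration of the loop this tutorial describes (a GP surrogate plus an acquisition function deciding where to sample next), here is a minimal sketch using expected improvement on a toy one-dimensional objective. The objective, kernel, and candidate grid are illustrative choices, not taken from the tutorial.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # toy 1-D objective standing in for an expensive black box
    return -np.sin(3 * x) - x**2 + 0.7 * x

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=(3, 1))          # small initial design
y = objective(X).ravel()

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                                  # surrogate of the objective
    X_cand = np.linspace(-1.0, 2.0, 1000).reshape(-1, 1)
    x_next = X_cand[np.argmax(expected_improvement(X_cand, gp, y.max()))]
    X = np.vstack([X, x_next])                    # evaluate where EI is largest
    y = np.append(y, objective(x_next).item())

print("best x:", X[np.argmax(y)], "best value:", y.max())
```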
Bayesian optimization (BO) and its batch extensions have been successful in optimizing expensive black-box functions. However, these traditional BO approaches are not yet ideal for optimizing less expensive functions, where the computational cost of BO can dominate the cost of evaluating the black box. Examples of such less expensive functions are cheap machine learning models, inexpensive physical experiments run through simulators, and acquisition function optimization within Bayesian optimization itself. In this paper, we consider a batch BO setting for situations where function evaluations are less expensive. Our model is based on a new exploration strategy that uses geometric distance to provide an alternative way of exploring, selecting points far from the observed locations. Using this intuition, we propose to use Sobol sequences to guide exploration, which removes the need to run multiple global optimization steps as used in previous work. Based on the proposed distance exploration, we present an efficient batch BO approach. We demonstrate that our approach outperforms other baselines and global optimization methods when function evaluations are less expensive.
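A minimal sketch of the distance-based exploration idea, assuming the domain is a box: draw a Sobol sequence over the domain and keep the candidates farthest from the already-evaluated points. How these exploration points are mixed with an exploitation acquisition is left out of this sketch.

```python
import numpy as np
from scipy.stats import qmc
from scipy.spatial.distance import cdist

def sobol_exploration_batch(X_observed, bounds, q=4, n_sobol=256, seed=0):
    """Pick q Sobol points with the largest distance to any observed point.

    bounds: array of shape (2, d) holding lower and upper box bounds.
    """
    sampler = qmc.Sobol(d=bounds.shape[1], scramble=True, seed=seed)
    cand = qmc.scale(sampler.random(n_sobol), bounds[0], bounds[1])
    # distance of each candidate to its nearest observed point
    min_dist = cdist(cand, X_observed).min(axis=1)
    return cand[np.argsort(-min_dist)[:q]]
```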
Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for efficiently optimizing multiple continuous parameters, but existing approaches degrade in performance when the noise level is high, limiting their applicability to many randomized experiments. We derive an expression for expected improvement for batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be optimized efficiently. Simulations with synthetic functions show that our optimization performance on noisy, constrained problems exceeds that of existing methods. We further demonstrate the effectiveness of the method with two real-world experiments conducted at Facebook: optimizing a ranking system and optimizing server compiler flags.
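The quasi-Monte Carlo flavor of the computation can be sketched as follows, under simplifying assumptions: draw joint GP posterior samples at the observed points and a candidate using scrambled Sobol base samples, and average the candidate's improvement over the sampled incumbent. This is an illustration of the idea only, not the paper's exact batch, constrained formulation.

```python
import numpy as np
from scipy.stats import qmc, norm

def noisy_ei_qmc(gp, X_obs, x_cand, n_qmc=128):
    """QMC estimate of expected improvement when observations are noisy."""
    X_all = np.vstack([X_obs, x_cand.reshape(1, -1)])
    mu, cov = gp.predict(X_all, return_cov=True)
    L = np.linalg.cholesky(cov + 1e-8 * np.eye(len(mu)))
    # scrambled Sobol points mapped to standard normal base samples
    u = qmc.Sobol(d=len(mu), scramble=True, seed=0).random(n_qmc)
    z = norm.ppf(np.clip(u, 1e-6, 1 - 1e-6))
    f = mu + z @ L.T                      # joint posterior draws, shape (n_qmc, n+1)
    improvement = np.maximum(f[:, -1] - f[:, :-1].max(axis=1), 0.0)
    return improvement.mean()
```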
Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time.
We present K-Means Batch Bayesian Optimization (KMBBO), a novel batch sampling algorithm for Bayesian optimization (BO). KMBBO uses unsupervised learning to efficiently estimate the peaks of the model's acquisition function. We show in empirical experiments that our method outperforms the current state-of-the-art batch allocation algorithms on a variety of test problems, including tuning of algorithm hyperparameters and a challenging drug discovery problem. To accommodate real-world problems with high-dimensional data, we propose a modification of KMBBO that combines it with compressed sensing to project the optimization into a lower-dimensional subspace. We empirically demonstrate that this two-step method outperforms algorithms in which no dimensionality reduction is performed.
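A hedged sketch of how unsupervised clustering can pick a batch near distinct acquisition peaks: sample candidates in proportion to the acquisition value and take the k-means centroids as the batch. The acquisition-weighted resampling used here is an illustrative stand-in for the paper's sampling scheme.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_batch(acquisition, bounds, q=4, n_samples=5000, seed=0):
    """Cluster acquisition-weighted samples and return q centroids as the batch.

    acquisition: callable mapping an (n, d) array to n acquisition values.
    bounds: array of shape (2, d) with lower and upper box bounds.
    """
    rng = np.random.default_rng(seed)
    cand = rng.uniform(bounds[0], bounds[1], size=(n_samples, bounds.shape[1]))
    acq = np.clip(acquisition(cand) - acquisition(cand).min(), 1e-12, None)
    # resample candidates with probability proportional to acquisition value
    idx = rng.choice(n_samples, size=n_samples, p=acq / acq.sum())
    km = KMeans(n_clusters=q, n_init=10, random_state=seed).fit(cand[idx])
    return km.cluster_centers_
```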
Recent work on Bayesian optimization has shown its effectiveness in global optimization of difficult black-box objective functions. Many real-world optimization problems of interest also have constraints which are unknown a priori. In this paper, we study Bayesian optimization for constrained problems in the general case that noise may be present in the constraint functions, and the objective and constraints may be evaluated independently. We provide motivating practical examples, and present a general framework to solve such problems. We demonstrate the effectiveness of our approach on optimizing the performance of online latent Dirichlet allocation subject to topic sparsity constraints, tuning a neural network given test-time memory constraints, and optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed time, subject to passing standard convergence diagnostics.
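For reference, the standard acquisition used for problems of this kind weights expected improvement of the objective by the probability that each independently modelled, noisy constraint is satisfied. The sketch below assumes constraints of the form c(x) <= 0 and GP models exposing a predict(..., return_std=True) interface; it is not necessarily the paper's exact formulation.

```python
import numpy as np
from scipy.stats import norm

def constrained_ei(X_cand, gp_obj, gp_constraints, y_best):
    """Expected improvement times probability of feasibility (illustrative)."""
    mu, sigma = gp_obj.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best) / sigma
    ei = (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)
    p_feasible = np.ones(len(X_cand))
    for gp_c in gp_constraints:            # each constraint modelled as c(x) <= 0
        mu_c, sigma_c = gp_c.predict(X_cand, return_std=True)
        p_feasible *= norm.cdf(-mu_c / np.maximum(sigma_c, 1e-9))
    return ei * p_feasible
```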
Surrogate-assisted optimization algorithms are a promising approach for solving expensive multi-objective optimization problems. However, most existing surrogate-assisted multi-objective optimization algorithms have three drawbacks: 1) they cannot handle problems with high-dimensional decision spaces well, 2) they cannot incorporate available gradient information, and 3) they do not support batch optimization. These drawbacks prevent their use on many real-world large-scale optimization problems. This paper proposes a batched, scalable multi-objective Bayesian optimization algorithm to address these issues. The algorithm uses a Bayesian neural network as a scalable surrogate model. Powered by Monte Carlo dropout and Sobolev training, the model is easy to train and can incorporate available gradient information. We also propose a novel batch hypervolume upper confidence bound acquisition function to support batch optimization. Experimental results on various benchmark problems and a real-world application demonstrate the efficiency of the proposed algorithm.
Bayesian optimization (BO) refers to a suite of techniques for global optimization of expensive black-box functions that uses introspective Bayesian models of the function to efficiently find the optimum. While BO has been successfully applied in many applications, modern optimization tasks usher in new challenges where conventional methods fail. In this work, we present Dragonfly, an open-source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real-world settings; these include better methods for handling higher-dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimizing over structured combinatorial spaces such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimizing over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimization and demonstrate that when the above methods are integrated, they enable significant improvements in the performance of BO. The Dragonfly library is available at dragonfly.github.io.
Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach, which uses Gaussian process models, does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation. Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach.
Large multilayer neural networks trained with backpropagation have recently achieved state-of-the-art results in a wide range of problems. However, learning neural networks with backpropagation still has some disadvantages, for example, the need to tune a large number of hyperparameters on the data, the lack of calibrated probabilistic predictions, and a tendency to overfit the training data. In principle, the Bayesian approach to learning neural networks does not have these problems. However, existing Bayesian techniques lack scalability to large datasets and network sizes. In this work, we present a new scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP). Similar to classical backpropagation, PBP works by computing a forward propagation of probabilities through the network, followed by a backward computation of gradients. A series of experiments on ten real-world datasets shows that PBP is significantly faster than other techniques while offering competitive predictive ability. Our experiments also show that PBP provides accurate estimates of the posterior variance of the network weights.
Bayesian optimization is a sample-efficient method for black-box global optimization. However, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
Bayesian optimization and Lipschitz optimization have developed alternative techniques for optimizing black-box functions. Each exploits a different form of prior about the function. In this work, we explore strategies to combine these techniques for better global optimization. In particular, we propose ways to use the Lipschitz continuity assumption within traditional BO algorithms, which we call Lipschitz Bayesian optimization (LBO). This approach does not increase the asymptotic runtime and in some cases drastically improves performance (while in the worst case the performance is similar). Indeed, in one particular setting, we prove that using the Lipschitz information yields the same or better regret bounds than using Bayesian optimization alone. Moreover, we propose a simple heuristic to estimate the Lipschitz constant, and prove that a growing estimate of the Lipschitz constant is, in some sense, "harmless". Our experiments on 15 datasets with 4 acquisition functions show that, in the worst case, LBO performs similarly to the underlying BO method, while in some cases it performs substantially better. Thompson sampling in particular typically sees drastic improvements (as the Lipschitz information corrects for its well-known "over-exploration" phenomenon), and its LBO variant often outperforms the other acquisition functions.
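One simple way to mix Lipschitz information into a BO acquisition, in the spirit described above, is to estimate a Lipschitz constant from the observed data, build the implied upper bound on the function, and zero out the acquisition wherever that bound already rules out improving on the incumbent. Both the estimator and the pruning rule below are illustrative assumptions, not the paper's exact LBO construction.

```python
import numpy as np
from scipy.spatial.distance import cdist

def estimate_lipschitz(X, y):
    """Largest observed slope between pairs of evaluated points."""
    d = cdist(X, X)
    np.fill_diagonal(d, np.inf)                  # ignore zero-distance pairs
    return np.max(np.abs(y[:, None] - y[None, :]) / d)

def lipschitz_filtered_acquisition(acq_values, X_cand, X, y, L):
    """Zero the acquisition wherever the Lipschitz upper bound cannot beat y.max()."""
    upper = np.min(y[None, :] + L * cdist(X_cand, X), axis=1)   # Lipschitz upper bound
    return np.where(upper >= y.max(), acq_values, 0.0)
```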
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments: active user modelling with preferences, and hierarchical reinforcement learning. We conclude with a discussion of the pros and cons of Bayesian optimization based on our experiences.
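The exploration/exploitation trade-off mentioned above is easiest to see in the Gaussian process upper confidence bound acquisition, where the posterior mean term exploits and the posterior standard-deviation term explores; the brief sketch below uses an arbitrary trade-off parameter kappa and assumes a fitted GP exposing predict(..., return_std=True).

```python
import numpy as np

def gp_ucb(X_cand, gp, kappa=2.0):
    """Upper confidence bound: exploitation (mu) plus exploration (kappa * sigma)."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    return mu + kappa * sigma
```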
Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulation from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimization (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimization allows one to intelligently decide where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to quantify the uncertainty in the ABC posterior density that is due to the lack of simulations available to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location so as to minimize the expected loss. Experiments show that the proposed method often produces the most accurate approximations compared with common BO strategies.
Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to find good solutions with fewer objective function evaluations. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (d-KG), which is one-step Bayes-optimal, asymptotically consistent, and provides greater one-step value of information than in the derivative-free setting. d-KG accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.