Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to find good solutions with fewer objective function evaluations. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (d-KG), which is one-step Bayes-optimal, asymptotically consistent, and provides greater one-step value of information than in the derivative-free setting. d-KG accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.
translated by 谷歌翻译
贝叶斯优化在优化耗时的黑盒目标方面很受欢迎。尽管如此,对于深度神经网络中的超参数调整,即使是一些超参数设置评估验证错误所需的时间仍然是瓶颈。多保真优化有望减少对这些目标使用更便宜的代理 - 例如,使用训练点的子集训练网络的验证错误或者收敛所需的迭代次数更少。我们提出了一种高度灵活和实用的多保真贝叶斯优化方法,重点是有效地优化迭代训练的监督学习模型的超参数。我们引入了一种新的采集功能,即跟踪感知知识梯度,它有效地利用了多个连续保真度控制和跟踪观察---保真序列中物镜的值,当使用训练迭代改变保真度时可用。我们提供了可用于优化我们的采集功能的可变方法,并展示了它为超神经网络和大规模内核学习的超参数调整提供了最先进的替代方案。
translated by 谷歌翻译
贝叶斯优化是一种优化目标函数的方法,需要花费很长时间(几分钟或几小时)来评估。它最适合于在小于20维的连续域上进行优化,并且在功能评估中容忍随机噪声。它构建了目标的替代品,并使用贝叶斯机器学习技术,高斯过程回归量化该替代品中的不确定性,然后使用从该代理定义的获取函数来决定在何处进行抽样。在本教程中,我们描述了贝叶斯优化的工作原理,包括高斯过程回归和三种常见的采集功能:预期改进,熵搜索和知识梯度。然后,我们讨论了更先进的技术,包括在并行,多保真和多信息源优化,昂贵的评估约束,随机环境条件,多任务贝叶斯优化以及包含衍生信息的情况下运行多功能评估。最后,我们讨论了贝叶斯优化软件和该领域未来的研究方向。在我们的教程材料中,我们提供了对噪声评估的预期改进的时间化,超出了无噪声设置,在更常用的情况下。这种概括通过正式的决策理论论证来证明,与先前的临时修改形成鲜明对比。
translated by 谷歌翻译
We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration , PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real world applications , including problems in machine learning, rocket science and robotics.
translated by 谷歌翻译
We consider Bayesian methods for multi-information source optimization (MISO), in which we seek to optimize an expensive-to-evaluate black-box objective function while also accessing cheaper but biased and noisy approximations ("information sources"). We present a novel algorithm that outperforms the state of the art for this problem by using a Gaussian process covariance kernel better suited to MISO than those used by previous approaches, and an acquisition function based on a one-step optimality analysis supported by efficient parallelization. We also provide a novel technique to guarantee the asymptotic quality of the solution provided by this algorithm. Experimental evaluations demonstrate that this algorithm consistently finds designs of higher value at less cost than previous approaches.
translated by 谷歌翻译
We consider parallel global optimization of derivative-free expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm proposed by Ginsbourger et al. (2007). At the heart of this algorithm is maximizing the information criterion called the "multi-points expected improvement", or the q-EI. To accomplish this, we use infinitessimal perturbation analysis (IPA) to construct a stochastic gradient estimator and show that this estimator is unbiased. We also show that the stochastic gradient ascent algorithm using the constructed gradient estimator converges to a stationary point of the q-EI surface, and therefore, as the number of multiple starts of the gradient ascent algorithm and the number of steps for each start grow large, the one-step Bayes optimal set of points is recovered. We show in numerical experiments that our method for maximizing the q-EI is faster than methods based on closed-form evaluation using high-dimensional integration, when considering many parallel function evaluations, and is comparable in speed when considering few. We also show that the resulting one-step Bayes optimal algorithm for parallel global optimization finds high-quality solutions with fewer evaluations than a heuristic based on approximately maximizing the q-EI. A high-quality open source implementation of this algorithm is available in the open source Metrics
translated by 谷歌翻译
机器学习算法经常需要仔细调整模型超参数,正则化项和优化参数。不幸的是,这种调整通常是一种“黑色艺术”,需要专家经验,不成文的经验法则,或者有时需要强力搜索。更具吸引力的是开发自动方法的想法,该方法可以优化给定学习算法的性能以适应手头的任务。在这项工作中,我们考虑了贝叶斯优化框架内的自动调整问题,其中学习算法的泛化性能被建模为来自高斯过程(GP)的样本。由GP诱导的可处理的后部分布导致有效使用先前实验收集的信息,从而能够对接下来尝试的参数进行最佳选择。在这里,我们展示了高斯过程先验和相关推理过程的效果如何对贝叶斯优化的成功或失败产生很大影响。我们表明,在思考机器学习算法中,思考能够导致超出专家级性能的结果。我们还描述了新的算法,它们考虑了学习实验的可变成本(持续时间),并且可以利用多个核心的存在进行并行实验。我们证明了这些提出的算法改进了以前的自动程序,并且可以在包括潜在Dirichlet分配,结构化SVM和卷积神经网络在内的各种当代算法上进行或超越人类专家级优化。
translated by 谷歌翻译
Chemical space is so large that brute force searches for new interesting molecules are in-feasible. High-throughput virtual screening via computer cluster simulations can speed up the discovery process by collecting very large amounts of data in parallel, e.g., up to hundreds or thousands of parallel measurements. Bayesian optimization (BO) can produce additional acceleration by sequentially identifying the most useful simulations or experiments to be performed next. However, current BO methods cannot scale to the large numbers of parallel measurements and the massive libraries of molecules currently used in high-throughput screening. Here, we propose a scalable solution based on a parallel and distributed implementation of Thompson sampling (PDTS). We show that, in small scale problems, PDTS performs similarly as parallel expected improvement (EI), a batch version of the most widely used BO heuristic. Additionally , in settings where parallel EI does not scale, PDTS outperforms other scalable baselines such as a greedy search,-greedy approaches and a random search method. These results show that PDTS is a successful solution for large-scale parallel BO.
translated by 谷歌翻译
本文提出了采集汤普森采样(ATS),这是一种基于随机过程采样多采集函数的思想的批量贝叶斯优化算法(BO)。我们通过采集函数对一组模型参数的依赖来定义该过程。 ATS在概念上简单,直接实现,与其他批处理BO方法不同,它可以用于并行化任何顺序采集功能。为了提高多模态任务的性能,我们表明ATS可以与现有技术结合以实现不同探索 - 利用交易,并考虑未决的功能评估。我们在各种基准函数和流行的梯度增强树算法的超参数优化上进行了实验。这些证明了我们的算法与两个最先进的批量BO方法的竞争力,以及它对经典并行Thompson采样BO的优势。
translated by 谷歌翻译
贝叶斯优化和Lipschitz优化已经开发出用于优化黑盒功能的替代技术。它们各自利用关于函数的不同形式的先验。在这项工作中,我们探索了这些技术的策略,以便更好地进行全局优化。特别是,我们提出了在传统BO算法中使用Lipschitz连续性假设的方法,我们称之为Lipschitz贝叶斯优化(LBO)。这种方法不会增加渐近运行时间,并且在某些情况下会大大提高性能(而在最坏的情况下,性能类似)。实际上,在一个特定的环境中,我们证明使用Lipschitz信息产生与后悔相同或更好的界限,而不是单独使用贝叶斯优化。此外,我们提出了一个简单的启发式方法来估计Lipschitz常数,并证明Lipschitz常数的增长估计在某种意义上是“无害的”。我们对具有4个采集函数的15个数据集进行的实验表明,在最坏的情况下,LBO的表现类似于底层BO方法,而在某些情况下,它的表现要好得多。特别是汤普森采样通常看到了极大的改进(因为Lipschitz信息已经得到了很好的修正) - 探索“现象”及其LBO变体通常优于其他采集功能。
translated by 谷歌翻译
Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, en-tropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyper-parameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
translated by 谷歌翻译
贝叶斯优化(BO)是指用于对昂贵的黑盒函数进行全局优化的一套技术,它使用函数的内省贝叶斯模型来有效地找到最优值。虽然BO已经在许多应用中成功应用,但现代优化任务迎来了传统方法失败的新挑战。在这项工作中,我们展示了Dragonfly,这是一个开源Python库,用于可扩展和强大的BO.Dragonfly包含多个最近开发的方法,允许BO应用于具有挑战性的现实世界环境;这些包括更好的处理更高维域的方法,当昂贵函数的廉价近似可用时处理多保真评估的方法,优化结构化组合空间的方法,例如神经网络架构的空间,以及处理并行评估的方法。此外,我们在BO中开发了新的方法改进,用于选择贝叶斯模型,选择采集函数,以及优化具有不同变量类型和附加约束的过复杂域。我们将Dragonfly与一套用于全局优化的其他软件包和算法进行比较,并证明当上述方法集成时,它们可以显着改善BO的性能。 Dragonfly图书馆可在dragonfly.github.io上找到。
translated by 谷歌翻译
We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as en-tropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similar in most of the cases, MRS produces fewer out-liers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem.
translated by 谷歌翻译
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore , PES can easily perform a fully Bayesian treatment of the model hy-perparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
translated by 谷歌翻译
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance .
translated by 谷歌翻译
Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success , for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization , we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed FABOLAS, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that FABOLAS often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.
translated by 谷歌翻译
We develop a framework for warm-starting Bayesian optimization, that reduces the solution time required to solve an optimization problem that is one in a sequence of related problems. This is useful when optimizing the output of a stochastic simulator that fails to provide derivative information, for which Bayesian optimization methods are well-suited. Solving sequences of related optimization problems arises when making several business decisions using one optimization model and input data collected over different time periods or markets. While many gradient-based methods can be warm started by initiating optimization at the solution to the previous problem, this warm start approach does not apply to Bayesian optimization methods, which carry a full metamodel of the objective function from iteration to iteration. Our approach builds a joint statistical model of the entire collection of related objective functions, and uses a value of information calculation to recommend points to evaluate.
translated by 谷歌翻译
Bayesian optimization is a sample-efficient method for black-box global optimization. However , the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function , and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally , over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
translated by 谷歌翻译
Recent work on Bayesian optimization has shown its effectiveness in global optimization of difficult black-box objective functions. Many real-world optimization problems of interest also have constraints which are unknown a priori. In this paper, we study Bayesian optimization for constrained problems in the general case that noise may be present in the constraint functions, and the objective and constraints may be evaluated independently. We provide motivating practical examples, and present a general framework to solve such problems. We demonstrate the effectiveness of our approach on optimizing the performance of online latent Dirichlet allocation subject to topic sparsity constraints, tuning a neural network given test-time memory constraints, and optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed time, subject to passing standard convergence diagnostics.
translated by 谷歌翻译
Entropy Search (ES) and Predictive Entropy Search (PES) are popular and empirically successful Bayesian Optimization techniques. Both rely on a compelling information-theoretic motivation , and maximize the information gained about the arg max of the unknown function; yet, both are plagued by the expensive computation for estimating entropies. We propose a new criterion , Max-value Entropy Search (MES), that instead uses the information about the maximum function value. We show relations of MES to other Bayesian optimization methods, and establish a regret bound. We observe that MES maintains or improves the good empirical performance of ES/PES, while tremendously lightening the computational burden. In particular, MES is much more robust to the number of samples used for computing the entropy, and hence more efficient for higher dimensional problems.
translated by 谷歌翻译