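Online field experiments are the gold-standard method for evaluating changes to real-world interactive machine learning systems. However, our ability to explore complex, multi-dimensional policy spaces, such as those found in recommendation and ranking problems, is often constrained by the limited number of experiments that can be run simultaneously. To alleviate these constraints, we augment online experiments with an offline simulator and apply multi-task Bayesian optimization to tune live machine learning systems. We describe practical issues that arise in these types of applications, including the biases introduced by using a simulator and the assumptions behind the multi-task kernel. We measure empirical learning curves, which show substantial gains from including data from biased offline experiments, and show how these learning curves are consistent with theoretical results for multi-task Gaussian process generalization. We find that improved kernel inference is a significant driver of multi-task generalization. Finally, we show several examples of Bayesian optimization efficiently tuning a live machine learning system by combining offline and online experiments.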
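Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited to optimization over continuous domains of fewer than 20 dimensions, and it tolerates stochastic noise in function evaluations. It builds a surrogate for the objective, quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information-source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material, we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting in which it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, in contrast to previous ad hoc modifications.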
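As a small illustration of one of the acquisition functions named above, the following sketch evaluates the standard closed-form, noise-free expected improvement for a maximization problem. It assumes the surrogate already supplies a posterior mean and standard deviation at each candidate; the candidate values and the helper names (`expected_improvement`, `candidates`, `best_so_far`) are hypothetical and are not taken from the tutorial.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Noise-free expected improvement for maximization.

    mu, sigma: posterior mean and standard deviation at a candidate point.
    best: largest objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical posterior summaries (mean, standard deviation) at three candidates.
candidates = {"a": (0.9, 0.05), "b": (0.5, 0.60), "c": (1.0, 0.00)}
best_so_far = 1.0
scores = {name: expected_improvement(m, s, best_so_far)
          for name, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
print(next_point, scores)
```

In this toy setting the uncertain candidate is preferred over the one whose mean is closer to the incumbent, which is the exploration and exploitation trade-off that the acquisition function encodes.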
Recent work on Bayesian optimization has shown its effectiveness in global optimization of difficult black-box objective functions. Many real-world optimization problems of interest also have constraints which are unknown a priori. In this paper, we study Bayesian optimization for constrained problems in the general case that noise may be present in the constraint functions, and the objective and constraints may be evaluated independently. We provide motivating practical examples, and present a general framework to solve such problems. We demonstrate the effectiveness of our approach on optimizing the performance of online latent Dirichlet allocation subject to topic sparsity constraints, tuning a neural network given test-time memory constraints, and optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed time, subject to passing standard convergence diagnostics.
Bayesian optimization is a sample-efficient method for black-box global optimization. However, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
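Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimization (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimization enables one to decide intelligently where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which arises from the lack of simulations needed to estimate this quantity accurately, and we define a loss function that measures this uncertainty. We then propose to select the next evaluation location so as to minimize the expected loss. Experiments show that the proposed method often produces the most accurate approximations compared with common BO strategies.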
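We consider Bayesian inference when only a limited number of noisy log-likelihood evaluations can be obtained. This occurs, for example, when complex simulator-based statistical models are fitted to data and synthetic likelihood (SL) is used to form noisy log-likelihood estimates from computationally costly forward simulations. We frame the inference task as a Bayesian sequential design problem, in which the log-likelihood function is modelled with a hierarchical Gaussian process (GP) surrogate model that is used to efficiently select additional log-likelihood evaluation locations. Building on recent progress in batch Bayesian optimization, we develop various sequential strategies in which multiple simulations are adaptively selected to minimize either the expected or the median loss function measuring the uncertainty in the resulting posterior. We analyze the properties of the resulting method theoretically and empirically. Experiments with toy problems and three simulation models suggest that our method is robust, highly parallelizable, and sample-efficient.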
We consider parallel global optimization of derivative-free expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm proposed by Ginsbourger et al. (2007). At the heart of this algorithm is maximizing the information criterion called the "multi-points expected improvement", or the q-EI. To accomplish this, we use infinitesimal perturbation analysis (IPA) to construct a stochastic gradient estimator and show that this estimator is unbiased. We also show that the stochastic gradient ascent algorithm using the constructed gradient estimator converges to a stationary point of the q-EI surface, and therefore, as the number of multiple starts of the gradient ascent algorithm and the number of steps for each start grow large, the one-step Bayes optimal set of points is recovered. We show in numerical experiments that our method for maximizing the q-EI is faster than methods based on closed-form evaluation using high-dimensional integration, when considering many parallel function evaluations, and is comparable in speed when considering few. We also show that the resulting one-step Bayes optimal algorithm for parallel global optimization finds high-quality solutions with fewer evaluations than a heuristic based on approximately maximizing the q-EI. A high-quality open source implementation of this algorithm is available in the open source Metrics
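To make the q-EI criterion concrete, here is a minimal sketch that estimates its value by plain Monte Carlo sampling from a joint Gaussian posterior over q candidate points (maximization convention). It only evaluates the criterion; it does not implement the IPA-based stochastic gradient estimator described in the abstract. The function names, the sample count, and the toy mean and covariance are assumptions made for illustration.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def q_ei_monte_carlo(mean, cov, best, n_samples=20000, seed=0):
    """Monte Carlo estimate of q-EI = E[max(max_i f(x_i) - best, 0)]."""
    rng = random.Random(seed)
    low = cholesky(cov)
    q = len(mean)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Correlated posterior draw: f = mean + L z.
        f = [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
             for i in range(q)]
        total += max(max(f) - best, 0.0)
    return total / n_samples


# Toy posterior (hypothetical): standard normal marginals, no correlation.
single = q_ei_monte_carlo([0.0], [[1.0]], best=0.0)
pair = q_ei_monte_carlo([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], best=0.0)
print(single, pair)
```

For q = 1 the estimate should approach the closed-form expected improvement, and adding a second independent point can only increase the expected best improvement, which gives a quick sanity check on the estimator.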
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real world applications, including problems in machine learning, rocket science and robotics.
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
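We consider the problem of learning the level set on which a noisy black-box function exceeds a given threshold. To reconstruct the level set efficiently, we investigate Gaussian process (GP) metamodels. Our focus is on strongly stochastic samplers, in particular those with heavy-tailed simulation noise and a low signal-to-noise ratio. To guard against noise misspecification, we assess the performance of three variants: (i) GPs with Student-$t$ observations; (ii) Student-$t$ processes (TPs); and (iii) classification GPs that model the sign of the response. As a fourth extension, we study GP surrogates with monotonicity constraints, which are relevant when the level set is known to be connected. In conjunction with these metamodels, we analyze several acquisition functions for guiding sequential experimental design, extending existing stepwise uncertainty reduction criteria to the stochastic contour-finding setting. This also motivates our development of (approximate) updating formulas for computing such acquisition functions efficiently. Our schemes are benchmarked using a variety of synthetic experiments in 1 to 6 dimensions. We also consider an application of level set estimation to determining the optimal exercise policy and valuation of Bermudan options in finance.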
Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
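One common way to build a multi-task Gaussian process is a product kernel that multiplies a task-by-task covariance matrix with an ordinary kernel over inputs (the intrinsic coregionalization form). The sketch below shows that construction under the assumption of a squared-exponential input kernel and a hand-picked task covariance; the abstract does not specify these details, and in practice the task covariance would be inferred from data.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on input vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq / lengthscale ** 2)


def multitask_kernel(x, task_x, y, task_y, task_cov, lengthscale=1.0):
    """Product kernel: task covariance times input covariance."""
    return task_cov[task_x][task_y] * rbf(x, y, lengthscale)


# Hypothetical task covariance: tasks 0 and 1 are strongly correlated.
task_cov = [[1.0, 0.8], [0.8, 1.0]]
same_task = multitask_kernel([0.2], 0, [0.2], 0, task_cov)
cross_task = multitask_kernel([0.2], 0, [0.2], 1, task_cov)
print(same_task, cross_task)
```

A large off-diagonal entry in `task_cov` means that an observation on one task, such as a small dataset, sharply reduces posterior uncertainty at the same hyperparameter setting on the other task.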
Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) to guide its search process. Fully maximizing acquisition functions produces the Bayes' decision rule, but this ideal is difficult to achieve since these functions are frequently non-trivial to optimize. This statement is especially true when evaluating queries in parallel, where acquisition functions are routinely non-convex, high-dimensional, and intractable. We first show that acquisition functions estimated via Monte Carlo integration are consistently amenable to gradient-based optimization. Subsequently, we identify a common family of acquisition functions, including EI and UCB, whose properties not only facilitate but justify use of greedy approaches for their maximization.
In many global optimization problems motivated by engineering applications, the number of function evaluations is severely limited by time or cost. To ensure that each evaluation contributes to the localization of good candidates for the role of global minimizer, a sequential choice of evaluation points is usually carried out. In particular, when Kriging is used to interpolate past evaluations, the uncertainty associated with the lack of information on the function can be expressed and used to compute a number of criteria accounting for the interest of an additional evaluation at any given point. This paper introduces minimizer entropy as a new Kriging-based criterion for the sequential choice of points at which the function should be evaluated. Based on stepwise uncertainty reduction, it accounts for the informational gain on the minimizer expected from a new evaluation. The criterion is approximated using conditional simulations of the Gaussian process model behind Kriging, and then inserted into an algorithm similar in spirit to the Efficient Global Optimization (EGO) algorithm. An empirical comparison is carried out between our criterion and expected improvement, one of the reference criteria in the literature. Experimental results indicate major evaluation savings over EGO. Finally, the method, which we call IAGO (for Informational Approach to Global Optimization) is extended to robust optimization problems, where both the factors to be tuned and the function evaluations are corrupted by noise.
Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed FABOLAS, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that FABOLAS often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.
Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high-dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple, has important invariance properties, and applies to domains with both categorical and continuous variables. We present a thorough theoretical analysis of REMBO. Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.
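The random embedding idea can be sketched in a few lines: draw a random Gaussian matrix A once, let the optimizer search over a low-dimensional vector y, and evaluate the objective at x = A y clipped to the original box. The code below is only an illustration under assumed settings (a 1000-dimensional box [-1, 1]^D, a 4-dimensional embedding, and a made-up objective that depends on two coordinates); it omits the Bayesian optimization loop itself, and all names are hypothetical.

```python
import random


def make_embedding(high_dim, low_dim, seed=0):
    """Random Gaussian matrix A with shape (high_dim, low_dim)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(low_dim)]
            for _ in range(high_dim)]


def embed(a, y, lower=-1.0, upper=1.0):
    """Map low-dimensional y to x = A y, clipped to the box [lower, upper]^D."""
    return [min(max(sum(a_ij * y_j for a_ij, y_j in zip(row, y)), lower), upper)
            for row in a]


def objective(x):
    """Hypothetical objective that depends on only two of the coordinates."""
    return -((x[3] - 0.5) ** 2) - (x[7] + 0.25) ** 2


a = make_embedding(high_dim=1000, low_dim=4)
y = [0.1, -0.3, 0.2, 0.05]  # point proposed by a low-dimensional optimizer
x = embed(a, y)
print(len(x), objective(x))
```

The optimizer only ever models the 4-dimensional function y -> objective(embed(a, y)), which is why the approach depends on the objective having low intrinsic dimensionality.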
Chemical space is so large that brute force searches for new interesting molecules are infeasible. High-throughput virtual screening via computer cluster simulations can speed up the discovery process by collecting very large amounts of data in parallel, e.g., up to hundreds or thousands of parallel measurements. Bayesian optimization (BO) can produce additional acceleration by sequentially identifying the most useful simulations or experiments to be performed next. However, current BO methods cannot scale to the large numbers of parallel measurements and the massive libraries of molecules currently used in high-throughput screening. Here, we propose a scalable solution based on a parallel and distributed implementation of Thompson sampling (PDTS). We show that, in small scale problems, PDTS performs similarly to parallel expected improvement (EI), a batch version of the most widely used BO heuristic. Additionally, in settings where parallel EI does not scale, PDTS outperforms other scalable baselines such as a greedy search, ε-greedy approaches and a random search method. These results show that PDTS is a successful solution for large-scale parallel BO.
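The core of batch Thompson sampling over a finite library can be sketched as follows: each parallel worker draws its own sample from the posterior over candidate scores and selects the best not-yet-chosen candidate under that sample. This sketch assumes independent Gaussian posteriors per candidate, which is a simplification chosen for brevity and not the model used in the abstract; the function name and toy numbers are hypothetical.

```python
import random


def thompson_batch(means, stds, batch_size, seed=0):
    """Select a batch of distinct candidate indices by Thompson sampling."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(batch_size):
        # Each worker uses an independent posterior draw over the
        # candidates that have not been assigned yet.
        draws = {i: rng.gauss(means[i], stds[i])
                 for i in range(len(means)) if i not in chosen}
        chosen.append(max(draws, key=draws.get))
    return chosen


# Toy library of five molecules with hypothetical posterior summaries.
means = [0.1, 0.9, 0.5, 0.7, 0.3]
stds = [0.5, 0.1, 0.4, 0.2, 0.6]
batch = thompson_batch(means, stds, batch_size=3)
print(batch)
```

Because every worker samples independently, the batch is diversified by posterior uncertainty alone, with no joint optimization over the batch, which is what makes this style of selection easy to distribute.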
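An ongoing aim of research in multi-objective Bayesian optimization is to extend its applicability to a large number of objectives. When coping with a limited evaluation budget, recovering the set of optimal compromise solutions generally requires numerous observations and is less interpretable, since this set tends to grow larger as the number of objectives increases. We therefore propose to focus on a specific solution originating from game theory, the Kalai-Smorodinsky solution, which possesses attractive properties. In particular, it ensures equal marginal gains over all objectives. We further make it insensitive to monotone transformations of the objectives by considering the objectives in copula space. A novel tailored algorithm is proposed to search for the solution, in the form of a Bayesian optimization algorithm: sequential sampling decisions are made based on acquisition functions derived from an instrumental Gaussian process prior. Our approach is tested on three problems with four, six, and ten objectives, respectively. The method is available in the GPGame package on CRAN: http://cran.r-project.org/package=GPGame.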
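Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises relief by using cheaper proxies for such objectives, for example the validation error of a network trained on a subset of the training points or with fewer iterations than are required for convergence. We propose a highly flexible and practical approach to multi-fidelity Bayesian optimization, focused on efficiently optimizing the hyperparameters of iteratively trained supervised learning models. We introduce a new acquisition function, the trace-aware knowledge gradient, which efficiently leverages both multiple continuous fidelity controls and trace observations (values of the objective at a sequence of fidelities, available when fidelity is varied through training iterations). We provide a provably convergent method for optimizing our acquisition function and show that it outperforms state-of-the-art alternatives for hyperparameter tuning of deep neural networks and large-scale kernel learning.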
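Bayesian optimization using Gaussian processes is a popular approach for optimizing expensive black-box functions. However, because classic Gaussian processes assume a priori that the covariance is stationary, this method may not be well suited to the non-stationary functions involved in some optimization problems. To overcome this issue, a new Bayesian optimization approach is proposed. It is based on deep Gaussian processes as surrogate models instead of classic Gaussian processes. This modeling technique increases the power of representation to capture non-stationarity by simply considering a functional composition of stationary Gaussian processes, which provides a multi-layer structure. This paper proposes a new global optimization algorithm that couples deep Gaussian processes with Bayesian optimization. The specificities of this optimization method are discussed and highlighted with academic test cases. The performance of the proposed algorithm is assessed on analytical test cases and an aerospace design optimization problem, and compared with state-of-the-art stationary and non-stationary Bayesian optimization approaches.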