Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
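To make the transfer mechanism concrete, below is a minimal sketch of a multi-task covariance in the intrinsic-coregionalization style, assuming a squared-exponential base kernel over hyperparameter settings; the function names and the toy task-similarity matrix are illustrative, not the paper's exact construction.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel over hyperparameter settings."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def multitask_kernel(X1, t1, X2, t2, B, lengthscale=1.0):
    """Intrinsic coregionalization: K((x,t),(x',t')) = B[t,t'] * k(x,x').

    B is a positive semi-definite task-similarity matrix; its off-diagonal
    entries control how much information transfers between tasks.
    """
    return B[np.ix_(t1, t2)] * rbf(X1, X2, lengthscale)

# Toy usage: 5 points on task 0 (previous problem), 2 on task 1 (new one).
X = np.random.rand(7, 3)                # 7 hyperparameter settings in 3-D
t = np.array([0, 0, 0, 0, 0, 1, 1])     # task labels
L = np.array([[1.0, 0.0], [0.8, 0.6]])  # Cholesky factor, so B = L @ L.T is PSD
B = L @ L.T
K = multitask_kernel(X, t, X, t, B)     # joint covariance across both tasks
```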
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited for optimization over continuous domains of fewer than 20 dimensions, and it tolerates stochastic noise in function evaluations. It builds a surrogate for the objective, quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information-source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material, we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
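As a concrete instance of the simplest acquisition function named above, the following sketch implements the textbook closed form of noise-free expected improvement for minimization (the noisy generalization the tutorial derives is more involved); `mu` and `sigma` denote the GP posterior mean and standard deviation at candidate points, and `best` is the incumbent value.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.0):
    """Noise-free EI for minimization:
    EI(x) = E[max(best - f(x) - xi, 0)]
          = (best - mu - xi) * Phi(z) + sigma * phi(z),
    with z = (best - mu - xi) / sigma.
    """
    sigma = np.maximum(sigma, 1e-12)   # guard against zero posterior variance
    z = (best - mu - xi) / sigma
    return (best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```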
Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for efficiently optimizing multiple continuous parameters, but existing approaches degrade in performance when the noise level is high, limiting their applicability to many randomized experiments. We derive an expression for expected improvement under batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be optimized efficiently. Simulations with synthetic functions show that optimization performance on noisy, constrained problems outperforms existing methods. We further demonstrate the effectiveness of the method with two real-world experiments conducted at Facebook: optimizing a ranking system and optimizing server compiler flags.
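A rough sketch of the underlying Monte Carlo idea, with scrambled Sobol' points standing in for the paper's quasi-Monte Carlo scheme: because observations are noisy, the incumbent's value is itself uncertain, so improvement is averaged over joint posterior samples. Constraints are omitted for brevity, and the interface is hypothetical.

```python
import numpy as np
from scipy.stats import qmc, norm

def noisy_ei_mc(mean, cov, incumbent_idx, n_samples=256, seed=0):
    """Quasi-Monte Carlo estimate of expected improvement when the incumbent
    value is itself uncertain (noisy observations).

    mean/cov: joint GP posterior over [observed points + one candidate];
    the candidate is the last entry. incumbent_idx indexes observed points.
    """
    d = len(mean)
    sobol = qmc.Sobol(d, scramble=True, seed=seed)
    u = sobol.random(n_samples)                      # uniform QMC points
    z = norm.ppf(np.clip(u, 1e-6, 1 - 1e-6))         # map to standard normals
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(d))
    f = mean + z @ L.T                               # joint posterior samples
    best_observed = f[:, incumbent_idx].min(axis=1)  # sampled incumbent value
    improvement = np.maximum(best_observed - f[:, -1], 0.0)
    return improvement.mean()
```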
Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches that can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the Bayesian optimization framework, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about which parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We demonstrate that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs, and convolutional neural networks.
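One of the cost-aware ideas, expected improvement per second, can be sketched in a few lines; here a second GP over log evaluation time is assumed (not shown), with `log_cost_mu` its predictive mean at the candidates.

```python
import numpy as np
from scipy.stats import norm

def ei_per_second(mu, sigma, best, log_cost_mu):
    """Cost-aware acquisition in the spirit of 'EI per second': divide the
    expected improvement by the predicted evaluation duration, so cheap
    points are favored when two candidates promise similar improvement.
    """
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return ei / np.exp(log_cost_mu)   # predicted duration from a cost GP
```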
Bayesian optimization is a sample-efficient method for black-box global optimization. However, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed FABOLAS, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that FABOLAS often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.
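A sketch of the modeling trick: a product kernel over hyperparameters x and dataset fraction s, with a small parametric basis in s encoding the prior that loss improves with more data. The basis below is one plausible choice for illustration, not necessarily the exact one used by FABOLAS.

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def fabolas_style_kernel(X1, s1, X2, s2, W, ls=1.0):
    """Product kernel over (hyperparameters x, dataset fraction s in (0,1]):
    k((x,s),(x',s')) = k_rbf(x,x') * phi(s)^T W phi(s').

    phi(s) = [1, (1-s)^2] lets the loss decrease smoothly as more data is
    used; W (2x2, PSD) is learned. The basis is illustrative; see the
    FABOLAS paper for the exact degenerate kernel it uses.
    """
    phi1 = np.stack([np.ones_like(s1), (1 - s1) ** 2], axis=1)
    phi2 = np.stack([np.ones_like(s2), (1 - s2) ** 2], axis=1)
    return rbf(X1, X2, ls) * (phi1 @ W @ phi2.T)
```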
Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.
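The core of the approach, adaptive basis function regression, amounts to Bayesian linear regression on the network's last-layer features. A minimal sketch, with assumed precision hyperparameters `alpha` and `beta`:

```python
import numpy as np

def bayesian_linear_regression(Phi, y, alpha=1.0, beta=25.0):
    """Bayesian linear regression on neural-network basis functions.

    Phi: (n, d) activations of the last hidden layer for n observed configs;
    alpha: prior precision on the weights; beta: observation noise precision.
    The heavy step is O(d^3) in the feature dimension d, so the cost grows
    linearly, not cubically, with the number of observations n.
    """
    d = Phi.shape[1]
    A = alpha * np.eye(d) + beta * Phi.T @ Phi   # posterior weight precision
    A_inv = np.linalg.inv(A)
    m = beta * A_inv @ Phi.T @ y                 # posterior weight mean

    def predict(phi_new):
        """Predictive mean and variance at new feature rows phi_new (m, d)."""
        mu = phi_new @ m
        var = 1.0 / beta + np.einsum('nd,de,ne->n', phi_new, A_inv, phi_new)
        return mu, var

    return predict
```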
We present an adaptive approach to constructing Gaussian process surrogates for Bayesian inference with expensive-to-evaluate forward models. Our method relies on a fully Bayesian approach to training Gaussian process models and exploits the expected improvement idea from Bayesian global optimization. We adaptively construct the training design by maximizing the expected improvement in the fit of the Gaussian process model to the noisy observational data. Numerical experiments on model problems with synthetic data demonstrate the effectiveness of the resulting adaptive designs: compared with fixed, non-adaptive designs, they yield accurate posterior estimates at a fraction of the cost of inference with the forward model.
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
Efficient Global Optimization (EGO) is widely used for the optimization of computationally expensive black-box functions. It uses a surrogate modeling technique based on Gaussian processes (kriging). However, due to its use of a stationary covariance, kriging is not well suited to approximating non-stationary functions. This paper explores the integration of deep Gaussian processes (DGPs) into the EGO framework to deal with non-stationary problems, and investigates the induced challenges and opportunities. Numerical experiments are performed on analytical problems to highlight the different aspects of DGPs and EGO.
Approximate Bayesian computation (ABC) is a method of Bayesian inference for when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimization (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimization enables one to intelligently decide where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimize the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies.
Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to find good solutions with fewer objective function evaluations. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (d-KG), which is one-step Bayes-optimal, asymptotically consistent, and provides greater one-step value of information than in the derivative-free setting. d-KG accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.
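The modeling ingredient that lets a GP consume gradient observations is a covariance extended with kernel derivatives; for a one-dimensional squared-exponential kernel the required blocks are as in this sketch (the full d-KG acquisition is considerably more involved):

```python
import numpy as np

def joint_kernel_with_grads(x1, x2, ls=1.0):
    """Joint covariance over [f(x1); f'(x1)] vs. [f(x2); f'(x2)] for a 1-D
    RBF kernel: differentiating the kernel gives the cross-covariances
    between function values and (possibly noisy) derivative observations.
    """
    r = x1[:, None] - x2[None, :]
    k = np.exp(-0.5 * r**2 / ls**2)
    k_df1 = -r / ls**2 * k                     # cov(f'(x1), f(x2))
    k_df2 = r / ls**2 * k                      # cov(f(x1), f'(x2))
    k_dd = (1.0 / ls**2 - r**2 / ls**4) * k    # cov(f'(x1), f'(x2))
    top = np.hstack([k, k_df2])
    bottom = np.hstack([k_df1, k_dd])
    return np.vstack([top, bottom])
```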
Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to the gradient or a large number of likelihood evaluations. We introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). VBMC combines variational inference with Gaussian-process-based, active-sampling Bayesian quadrature, using the latter to efficiently approximate the intractable integral in the variational objective. Our method produces both a nonparametric approximation of the posterior distribution and an approximate lower bound of the model evidence, useful for model selection. We demonstrate VBMC on several synthetic likelihoods and on a neuronal model with data from real neurons. Across all tested problems and dimensions (up to $D = 10$), VBMC consistently reconstructs the posterior and model evidence with a limited budget of likelihood evaluations, unlike other methods that work only in very low dimensions. Our framework thus serves as a novel tool for posterior and model inference with expensive, black-box likelihoods.
An ongoing aim of research in multiobjective Bayesian optimization is to extend its applicability to a large number of objectives. When coping with a limited budget of evaluations, recovering the set of optimal compromise solutions generally requires numerous observations and is less interpretable, since this set tends to grow larger as the number of objectives increases. We therefore propose to focus on a specific solution originating from game theory, the Kalai-Smorodinsky solution, which possesses attractive properties. In particular, it ensures equal marginal gains over all objectives. We further make it insensitive to monotonic transformations of the objectives by considering the objectives in copula space. A novel tailored algorithm is proposed to search for the solution, in the form of a Bayesian optimization algorithm: sequential sampling decisions are made based on acquisition functions derived from an instrumental Gaussian process prior. Our approach is tested on three problems with four, six, and ten objectives, respectively. The method is available in the GPGame package on CRAN: http://cran.r-project.org/package=GPGame.
Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises to reduce this cost by using cheaper proxies for such objectives, for example, the validation error of a network trained on a subset of the training points or for fewer iterations than needed for convergence. We propose a highly flexible and practical approach to multi-fidelity Bayesian optimization, focused on efficiently optimizing the hyperparameters of iteratively trained supervised learning models. We introduce a new acquisition function, the trace-aware knowledge gradient, which efficiently leverages multiple continuous fidelity controls and trace observations, that is, the values of the objective at a sequence of fidelities, which become available when fidelity is varied via the number of training iterations. We provide a method for optimizing our acquisition function and show that it offers a state-of-the-art alternative for hyperparameter tuning of deep neural networks and large-scale kernel learning.
Bayesian optimization (BO) is a popular algorithm for solving challenging optimization tasks. It is designed for problems where the objective function is expensive to evaluate, perhaps not available in exact form, lacks gradient information, and possibly returns noisy values. Different versions of the algorithm vary in the choice of the acquisition function, which recommends the point at which to query the objective next. Initially, researchers focused on improvement-based acquisitions, while more recently attention has shifted to computationally expensive information-theoretic measures. In this paper we present two main contributions to the literature. First, we propose a new improvement-based acquisition function that recommends query points where improvement is expected with high confidence. The proposed algorithm is evaluated on a large set of benchmark functions from the global optimization literature, where it performs at least as well as current state-of-the-art acquisition functions, and often better, suggesting that it is a strong default choice for BO. The novel policy is then compared against widely used global optimization solvers, confirming that BO methods reduce the computational cost of optimization by keeping the number of function evaluations small. The second main contribution is an application to precision medicine, where the interest lies in estimating the parameters of a partial differential equation model of the human pulmonary blood circulation system. Once inferred, these parameters can help clinicians diagnose patients with pulmonary hypertension without going through the standard invasive procedure of right heart catheterization, which can lead to side effects and complications (e.g. severe pain, internal bleeding, thrombosis).
Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach (using Gaussian process models) does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation. Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach.
In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural networks in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm: the parallel knowledge gradient method. By construction, this method provides the one-step Bayes optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.
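A simplified, discretization-based Monte Carlo sketch of the one-step batch knowledge gradient for minimization (the paper's estimator is more efficient and also yields gradients for optimizing the batch); all inputs are posterior quantities from a GP fitted elsewhere, and the interface is hypothetical:

```python
import numpy as np

def batch_kg_mc(mu_disc, K_disc_z, mu_z, K_zz, noise_var, n_mc=128, seed=0):
    """Monte Carlo estimate of the one-step (parallel) knowledge gradient
    for a candidate batch z, on a discretized input set (minimization).

    mu_disc:  current posterior mean on the discretization points
    K_disc_z: posterior covariance between discretization points and batch
    mu_z, K_zz: posterior mean / covariance of f at the batch points
    """
    rng = np.random.default_rng(seed)
    q = len(mu_z)
    S = K_zz + noise_var * np.eye(q)
    L = np.linalg.cholesky(S)
    gain = 0.0
    for _ in range(n_mc):
        y = mu_z + L @ rng.standard_normal(q)   # fantasize batch outcomes
        # Posterior mean update given the fantasized observations y:
        mu_new = mu_disc + K_disc_z @ np.linalg.solve(S, y - mu_z)
        gain += mu_disc.min() - mu_new.min()    # expected value of information
    return gain / n_mc
```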
Bayesian optimization has proven invaluable for black-box optimization of expensive functions. Its main limitation is its exponential complexity with respect to the dimensionality of the search space using typical kernels. Luckily, many objective functions can be decomposed into additive sub-problems, which can be optimized independently. We investigate how to automatically discover such (typically unknown) additive structure while simultaneously exploiting it through Bayesian optimization. We propose an efficient algorithm based on Metropolis-Hastings sampling and demonstrate its efficacy empirically on synthetic and real-world data sets. Throughout all our experiments we reliably discover hidden additive structure whenever it exists and exploit it to yield significantly faster convergence.
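Once a decomposition is known (discovering it is the paper's contribution), the corresponding additive kernel is straightforward; a minimal sketch with hypothetical groups:

```python
import numpy as np

def additive_rbf(X1, X2, groups, ls=1.0):
    """Additive kernel: k(x, x') = sum_g k_g(x[g], x'[g]), where each group g
    is a subset of the input dimensions. Under such structure, each additive
    sub-problem can be modeled and optimized (near-)independently.
    """
    K = np.zeros((len(X1), len(X2)))
    for g in groups:
        d2 = ((X1[:, None, g] - X2[None, :, g]) ** 2).sum(-1)
        K += np.exp(-0.5 * d2 / ls**2)
    return K

# e.g. a 5-D objective that decomposes as f(x) = f1(x0, x2) + f2(x1, x3, x4):
groups = [[0, 2], [1, 3, 4]]
```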
We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similarly in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem.