Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
Translated by Google Translate
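The expected improvement criterion named in the abstract above has a closed form when the model's prediction at a candidate point is Gaussian. The following is a minimal sketch under that assumption, for a minimization problem; the function name `expected_improvement` and its arguments (`mu`, `sigma`, `best`) are illustrative and are not taken from the paper.

```python
import math

def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` (lower is better) for a Gaussian
    prediction with mean `mu` and standard deviation `sigma`.

    Assumption: the surrogate's predictive distribution is N(mu, sigma^2).
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (best - mu) * cdf + sigma * pdf

# Two hypothetical candidates with the same predicted mean:
# the more uncertain one has the larger expected improvement.
confident = expected_improvement(mu=0.5, sigma=0.1, best=0.4)
uncertain = expected_improvement(mu=0.5, sigma=0.5, best=0.4)
```

A sequential optimizer would evaluate next the candidate with the highest value, which rewards both a low predicted mean and high uncertainty.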
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
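The observation that only a few hyper-parameters matter suggests a simple way to see why a grid wastes trials. The sketch below is an illustration under stated assumptions, not the paper's experiment: two hyper-parameters on the unit square, a budget of nine trials, and a hypothetical situation in which only the first hyper-parameter affects the result. The helper names are invented for this example.

```python
import random

def grid_points(n_per_axis):
    """Evenly spaced grid on the unit square with n_per_axis values per axis."""
    axis = [(i + 0.5) / n_per_axis for i in range(n_per_axis)]
    return [(x, y) for x in axis for y in axis]

def random_points(n, seed=0):
    """n points drawn uniformly at random from the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

def distinct_values(points, dim):
    """Number of different values tried along one hyper-parameter."""
    return len({p[dim] for p in points})

grid = grid_points(3)    # 3 x 3 = 9 trials
rand = random_points(9)  # 9 trials, same budget

# If only the first hyper-parameter matters, what counts is how many
# different values of it each strategy actually tries.
grid_coverage = distinct_values(grid, 0)
rand_coverage = distinct_values(rand, 0)
```

With the same budget, the grid tries three distinct values of the important hyper-parameter, while the random sample tries a different value on every trial.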
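Most machine learning algorithms are configured by one or more hyperparameters that must be chosen carefully and that often affect performance. To avoid a time-consuming and irreproducible manual trial-and-error process for finding well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods can be employed, for example, methods based on resampling error estimation for supervised machine learning. After introducing HPO, this paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband, and racing. It gives practical recommendations on the important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization. This work is accompanied by an appendix that contains information on specific software packages in R and Python, as well as information and recommended hyperparameter search spaces for specific learning algorithms. We also provide notebooks that demonstrate the concepts of this work as supplementary files.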
Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
Many computer vision algorithms depend on configuration settings that are typically hand-tuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to realizing a method's full potential. Compounding matters, these parameters often must be re-tuned when the algorithm is applied to a new problem domain, and the tuning process itself often depends on personal experience and intuition in ways that are hard to quantify or describe. Since the performance of a given technique depends on both the fundamental quality of the algorithm and the details of its tuning, it is sometimes difficult to know whether a given technique is genuinely better, or simply better tuned.
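Automated hyperparameter optimization (HPO) has gained great popularity and is an important ingredient of most automated machine learning frameworks. However, the process of designing HPO algorithms is still an unsystematic and manual one: limitations of prior work are identified, and the improvements proposed are, even though guided by expert knowledge, still somewhat arbitrary. This rarely allows for a holistic understanding of which algorithmic components drive performance, and it carries the risk of overlooking good algorithmic design choices. We present a principled approach to automated benchmark-driven algorithm design applied to multifidelity HPO (MF-HPO): first, we formalize a rich space of MF-HPO candidates that includes, but is not limited to, common HPO algorithms, and then present a configurable framework covering this space. To find the best candidate automatically and systematically, we follow a programming-by-optimization approach and search over the space of algorithm candidates via Bayesian optimization. By performing an ablation analysis, we challenge whether the design choices found are necessary or could be replaced by more naive and simpler ones. We observe that a relatively simple configuration, in some ways simpler than established methods, performs very well as long as certain critical configuration parameters have the right values.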
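Owing to its data efficiency, Bayesian optimization has emerged at the forefront of expensive black-box optimization. Recent years have seen a proliferation of studies on the development of new Bayesian optimization algorithms and their applications. This paper therefore attempts to provide a comprehensive and updated survey of recent advances in Bayesian optimization and to identify interesting open problems. We categorize the existing work on Bayesian optimization into nine main groups according to the motivations and focus of the proposed algorithms. For each category, we present the main advances in the construction of surrogate models and the adaptation of acquisition functions. Finally, we discuss open questions and suggest promising future research directions, in particular with regard to heterogeneity, privacy preservation, and fairness in distributed and federated optimization systems.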
Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other hand, bandit-based configuration evaluation approaches based on random search lack guidance and do not converge to the best configurations as quickly. Here, we propose to combine the benefits of both Bayesian optimization and bandit-based methods, in order to achieve the best of both worlds: strong anytime performance and fast convergence to optimal configurations. We propose a new practical state-of-the-art hyperparameter optimization method, which consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types, including high-dimensional toy functions, support vector machines, feed-forward neural networks, Bayesian neural networks, deep reinforcement learning, and convolutional neural networks. Our method is robust and versatile, while at the same time being conceptually simple and easy to implement.
Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
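Algorithm configuration (AC) is concerned with the automated search for the most suitable parameter configuration of a parametrized algorithm. A wide variety of AC problem variants and methods are currently proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and the features of configuration methods, respectively. We review the existing AC literature through the lens of these taxonomies, outline the relevant design choices of configuration methods, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with future research directions in the field of AC.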
Many different machine learning algorithms exist; taking into account each algorithm's hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that addresses these issues in isolation. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA, spanning 2 ensemble methods, 10 meta-methods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR-10, we show classification performance often much better than using standard selection/hyperparameter optimization methods. We hope that our approach will help non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.
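Bayesian optimization (BO) algorithms have shown remarkable success in applications involving expensive black-box functions. Traditionally, BO has been set up as a sequential decision-making process that estimates the utility of query points via an acquisition function and a prior over functions, such as a Gaussian process. Recently, however, a reformulation of BO via density-ratio estimation (BORE) allowed the acquisition function to be reinterpreted as a probabilistic binary classifier, removing the need for an explicit prior over functions and improving scalability. In this paper, we present a theoretical analysis of BORE's regret and an extension of the algorithm with improved uncertainty estimates. We also show that BORE can be naturally extended to the batch optimization setting by recasting the problem as approximate Bayesian inference. The resulting algorithms come with theoretical performance guarantees and are evaluated against other batch baselines in a series of experiments.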
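The acquisition function is a critical component of Bayesian optimization (BO) and can often be written as the expectation of a utility function under a surrogate model. However, to ensure that the acquisition function is tractable to optimize, restrictions must be placed on the surrogate model and the utility function. To extend BO to a broader class of models and utilities, we propose likelihood-free BO (LFBO), an approach based on likelihood-free inference. LFBO models the acquisition function directly, without having to perform inference separately with a probabilistic surrogate model. We show that computing the acquisition function in LFBO can be reduced to optimizing a weighted classification problem, where the weights correspond to the chosen utility. By choosing the utility function for expected improvement, LFBO outperforms various state-of-the-art black-box optimization methods on several real-world optimization problems. LFBO can also effectively leverage composite structure in the objective function, which further improves its regret.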
We present the GPry algorithm for fast Bayesian inference of general (non-Gaussian) posteriors with a moderate number of parameters. GPry does not need any pre-training or special hardware such as GPUs, and is intended as a drop-in replacement for traditional Monte Carlo methods for Bayesian inference. Our algorithm is based on generating a Gaussian Process surrogate model of the log-posterior, aided by a Support Vector Machine classifier that excludes extreme or non-finite values. An active learning scheme allows us to reduce the number of required posterior evaluations by two orders of magnitude compared to traditional Monte Carlo inference. Our algorithm allows for parallel evaluations of the posterior at optimal locations, further reducing wall-clock times. We significantly improve performance using properties of the posterior in our active learning scheme and for the definition of the GP prior. In particular we account for the expected dynamical range of the posterior in different dimensionalities. We test our model against a number of synthetic and cosmological examples. GPry outperforms traditional Monte Carlo methods when the evaluation time of the likelihood (or the calculation of theoretical observables) is of the order of seconds; for evaluation times of over a minute it can perform inference in days that would take months using traditional methods. GPry is distributed as an open source Python package (pip install gpry) and can also be found at https://github.com/jonaselgammal/GPry.
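Finding optimal parameter configurations for tunable GPU kernels is a non-trivial search-space exercise, even when automated. It poses an optimization task over a non-convex search space, using an expensive-to-evaluate function with unknown derivative. These characteristics make the problem a good candidate for Bayesian optimization, which has not previously been applied to it. However, applying Bayesian optimization to this problem is challenging. We demonstrate how to deal with rough, discrete, constrained search spaces that contain invalid configurations. We introduce a novel contextual variance exploration factor, as well as new acquisition functions with improved scalability, combined with an informed acquisition function selection mechanism. By comparing the performance of our Bayesian optimization implementation on various test cases with the existing search strategies in Kernel Tuner, as well as with other Bayesian optimization implementations, we demonstrate that our search strategies generalize well and consistently outperform other search strategies by a wide margin.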
Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.
Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration nonstochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Hyperband, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems.
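The idea of allocating a resource adaptively to randomly sampled configurations and stopping poor ones early can be illustrated with successive halving, the early-stopping routine that Hyperband is commonly described as building on. The sketch below is not the full Hyperband algorithm and is not taken from the paper: the function `successive_halving`, the reduction factor `eta`, and the toy objective `toy_loss` are all assumptions made for illustration.

```python
import math
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Repeatedly keep the best 1/eta of the configurations and give the
    survivors eta times more budget, until one configuration remains.

    `evaluate(config, budget)` must return a loss (lower is better).
    Returns the surviving configuration and a log of (n_configs, budget) rounds.
    """
    survivors = list(configs)
    budget = min_budget
    rounds = []
    while len(survivors) > 1:
        rounds.append((len(survivors), budget))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0], rounds

def toy_loss(lr, budget):
    """Hypothetical objective: best at lr = 0.1, improving with more budget."""
    return (math.log10(lr) + 1.0) ** 2 + 1.0 / budget

# Randomly sampled learning rates, log-uniform in [1e-4, 1].
rng = random.Random(0)
candidates = [10 ** rng.uniform(-4, 0) for _ in range(27)]
best_lr, rounds = successive_halving(candidates, toy_loss)
```

Starting from 27 configurations with `eta=3`, the routine evaluates 27 configurations at budget 1, then 9 at budget 3, then 3 at budget 9, so most of the total budget is spent on the configurations that looked best early on.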
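The optimal setting of several hyperparameters in machine learning algorithms is key to making the most of the available data. To this end, several methods have been proposed, such as evolutionary strategies, random search, Bayesian optimization, and heuristic rules of thumb. In reinforcement learning (RL), the information content of the data gathered by the learning agent while interacting with its environment depends heavily on the setting of many hyperparameters. The user of an RL algorithm therefore has to rely on search-based optimization methods, such as grid search or the Nelder-Mead simplex algorithm, which are very inefficient for most RL tasks, slow down the learning curve significantly, and leave the user with the burden of purposefully biasing data gathering. In this work, in order to make RL algorithms more user-independent, a novel approach for autonomous hyperparameter setting using Bayesian optimization is proposed. Data from past episodes and different hyperparameter values are used at a meta-learning level by performing behavioral cloning, which helps improve the effectiveness of a reinforcement learning variant of maximizing the acquisition function. In addition, by tightly integrating Bayesian optimization into the design of the reinforcement learning agent, the number of state transitions needed to converge to the optimal policy for a given task is reduced. Computational experiments show promising results compared with other manual tuning and optimization-based approaches, which highlights the benefit of changing the algorithm's hyperparameters to increase the information content of the generated data.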
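Many expensive black-box optimization problems are sensitive to their inputs. In these problems it makes more sense to locate a region of good designs than a single, possibly fragile, optimal design. Expensive black-box functions can be optimized effectively with Bayesian optimization, where a Gaussian process is a popular choice of prior over the expensive function. We propose a method for robust optimization using Bayesian optimization that finds a region of design space in which the expensive function's performance is relatively insensitive to the inputs while retaining good quality. This is achieved by sampling realizations from the Gaussian process that models the expensive function and evaluating the improvement for each realization. The expectation of these improvements can be optimized cheaply with an evolutionary algorithm to determine the next location at which to evaluate the expensive function. We describe an efficient process for locating the optimum expected improvement. We show empirically that evaluating the expensive function at the location in the candidate uncertain region about which the model is most uncertain, or at a random location, yields the best convergence in contrast to exploitative schemes. We illustrate our method on six test functions in two, five, and ten dimensions, and demonstrate that it is able to outperform two state-of-the-art approaches from the literature. We also demonstrate our method on two real-world problems in four and eight dimensions, which involve training robot arms to push objects onto targets.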
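Bayesian optimization (BO) is an approach to the global optimization of black-box objective functions that are expensive to evaluate. BO-powered experimental design has found wide application in materials science, chemistry, experimental physics, drug development, and other fields. This work aims to draw attention to the benefits of applying BO in designing experiments and to provide a BO manual, covering both methodology and software, for the convenience of anyone who wants to apply or learn BO. In particular, we briefly explain the BO technique, review all applications of BO in additive manufacturing, compare and exemplify the features of different open BO libraries, and unlock new potential applications of BO to other types of data (e.g., preferential output). This article is aimed at readers with some understanding of Bayesian methods, but not necessarily with knowledge of additive manufacturing; the software performance overview and implementation instructions are instrumental for any experimental-design practitioner. Moreover, our review of the additive manufacturing field highlights the current knowledge and technological trends of BO. This article has supplementary material online.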