Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other hand, bandit-based configuration evaluation approaches based on random search lack guidance and do not converge to the best configurations quickly. Here, we propose to combine the benefits of both Bayesian optimization and bandit-based methods, in order to achieve the best of both worlds: strong anytime performance and fast convergence to optimal configurations. We propose a new, practical, state-of-the-art hyperparameter optimization method that consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types, including high-dimensional toy functions, support vector machines, feed-forward neural networks, Bayesian neural networks, deep reinforcement learning, and convolutional neural networks. Our method is robust and versatile, while at the same time being conceptually simple and easy to implement.
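The bandit backbone that the abstract above combines with Bayesian optimization can be illustrated by a plain successive-halving bracket. Below is a minimal, model-free sketch: `objective` and `sample_config` are placeholder callables, and the model-guided sampling of configurations that distinguishes the proposed method is deliberately omitted.

```python
def successive_halving(objective, sample_config, n_configs=27, min_budget=1, eta=3):
    """One Hyperband-style bracket: evaluate many configurations on a small
    budget, repeatedly keep the best 1/eta of them and multiply their budget
    by eta, until a single configuration survives."""
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        # Rank by validation loss on the current budget (lower is better).
        ranked = sorted(configs, key=lambda c: objective(c, budget))
        configs = ranked[: max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]
```

Replacing the random `sample_config` with draws from a density model fitted to the best configurations observed so far is, in essence, how model-based guidance enters this kind of bracket.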
Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of promising configurations on small subsets by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed FABOLAS, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that FABOLAS often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.
Different neural network architectures, hyperparameters and training protocols lead to different performances as a function of time. Human experts routinely inspect the resulting learning curves to quickly terminate runs with poor hyperparameter settings and thereby considerably speed up manual hyperparameter optimization. The same information can be exploited in automatic hyperparameter optimization by means of a probabilistic model of learning curves across hyperparameter settings. Here, we study the use of Bayesian neural networks for this purpose and improve their performance by a specialized learning curve layer.
Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.
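The "adaptive basis function regression" described above reduces, once the network is trained, to closed-form Bayesian linear regression on the features of the last hidden layer. A minimal sketch of that final step follows; the network that produces the basis matrix `Phi` is omitted, and `alpha` (prior precision) and `beta` (noise precision) are assumed hyperparameters.

```python
import numpy as np

def bayesian_linear_regression(Phi, y, alpha=1.0, beta=25.0):
    """Closed-form Bayesian linear regression on fixed basis functions Phi
    (n x D). Cost is cubic only in the number of basis functions D and
    linear in the number of observations n."""
    D = Phi.shape[1]
    S_inv = alpha * np.eye(D) + beta * Phi.T @ Phi   # posterior precision
    S = np.linalg.inv(S_inv)                         # posterior covariance
    m = beta * S @ Phi.T @ y                         # posterior mean
    return m, S

def predict(phi_x, m, S, beta=25.0):
    """Predictive mean and variance at a single feature vector phi_x."""
    mean = phi_x @ m
    var = 1.0 / beta + phi_x @ S @ phi_x
    return mean, var
```

Because only `Phi.T @ Phi` and a D x D inverse are needed, adding observations is cheap, which is what makes the massive parallelism mentioned in the abstract tractable.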
Deep Gaussian processes (DGPs) are hierarchical generalizations of Gaussian processes that combine well-calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, variational inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior, and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to sample from it directly. To efficiently optimize the hyperparameters, we introduce the Moving Window MCEM algorithm. This leads to significantly better predictions at a lower computational cost than its VI counterpart. Thus our method establishes a new state of the art for inference in DGPs.
Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to the gradient or a large number of likelihood evaluations. We introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). VBMC combines variational inference with Gaussian-process-based, active-sampling Bayesian quadrature, using the latter to efficiently approximate the intractable integral in the variational objective. Our method produces both a nonparametric approximation of the posterior distribution and an approximate lower bound of the model evidence, useful for model selection. We demonstrate VBMC on several synthetic likelihoods and on a neuronal model with data from real neurons. Across all tested problems and dimensions (up to $D = 10$), VBMC consistently reconstructs the posterior and the model evidence with a limited budget of likelihood evaluations, unlike other methods that work only in very low dimensions. Our framework thus serves as a novel tool for posterior and model inference with expensive, black-box likelihoods.
Uncertainty computation in deep learning is essential for designing robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but it requires more effort to implement and execute than maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce this effort for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be obtained cheaply from the vector that adapts the learning rate. Compared to existing VI methods, this requires lower memory, computation, and implementation effort while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight perturbation in our algorithms is useful for exploration in reinforcement learning and stochastic optimization.
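A stripped-down sketch of the weight-perturbation idea follows, using a plain-SGD update in place of the paper's Adam-based algorithm; the exact scalings here are simplifications, not the authors' method. The weights are perturbed with noise whose scale is derived from a running squared-gradient vector `s`, and that same vector doubles as the cheap uncertainty estimate.

```python
import numpy as np

def vprop_step(mu, s, grad_fn, lr=0.1, beta=0.9, prior_prec=1.0, n_data=1, rng=None):
    """One simplified natural-gradient VI step for a mean-field Gaussian:
    perturb the weights, evaluate the gradient at the noisy weights, and
    update both the mean mu and the squared-gradient vector s."""
    rng = rng or np.random.default_rng()
    sigma = 1.0 / np.sqrt(n_data * s + prior_prec)     # posterior std estimate
    w = mu + sigma * rng.standard_normal(mu.shape)     # weight perturbation
    g = grad_fn(w)                                     # gradient at noisy weights
    s = beta * s + (1 - beta) * g * g                  # running squared gradient
    mu = mu - lr * (g + prior_prec * mu / n_data) / (s + prior_prec / n_data)
    return mu, s
```

Note how no extra state beyond Adam-like quantities (`mu`, `s`) is kept: the uncertainty comes "for free" from the learning-rate-adaptation vector.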
This paper makes two contributions to Bayesian machine learning algorithms. Firstly, we propose stochastic natural gradient expectation propagation (SNEP), a novel alternative to expectation propagation (EP), a popular variational inference algorithm. SNEP is a black box variational algorithm, in that it does not require any simplifying assumptions on the distribution of interest, beyond the existence of some Monte Carlo sampler for estimating the moments of the EP tilted distributions. Further, as opposed to EP which has no guarantee of convergence, SNEP can be shown to be convergent, even when using Monte Carlo moment estimates. Secondly, we propose a novel architecture for distributed Bayesian learning which we call the posterior server. The posterior server allows scalable and robust Bayesian learning in cases where a data set is stored in a distributed manner across a cluster, with each compute node containing a disjoint subset of data. An independent Monte Carlo sampler is run on each compute node, with direct access only to the local data subset, but which targets an approximation to the global posterior distribution given all data across the whole cluster. This is achieved by using a distributed asynchronous implementation of SNEP to pass messages across the cluster. We demonstrate SNEP and the posterior server on distributed Bayesian learning of logistic regression and neural networks.
Many practical applications of machine learning require data-efficient black-box function optimization, e.g., to identify hyperparameters or process settings. However, readily available algorithms are typically designed as universal optimizers and are therefore often suboptimal for specific tasks. We therefore propose a method for learning optimizers that are automatically adapted to a given class of objective functions, e.g., in the context of sim-to-real applications. Instead of learning optimization from scratch, the proposed approach builds on the well-known Bayesian optimization framework. Only the acquisition function (AF) is replaced by a learned neural network, so the resulting algorithm can still exploit the proven generalization capabilities of Gaussian processes. We run experiments on several simulated as well as simulation-to-real transfer tasks. The results show that the learned optimizers (1) consistently perform better than, or on par with, known AFs on general function classes, and (2) can automatically identify structural properties of a function class using cheap simulations and transfer this knowledge to adapt rapidly to real hardware tasks, thereby significantly outperforming existing problem-agnostic AFs.
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions and does not require detailed knowledge of the conditional likelihood, only needing its evaluation as a black-box function. Using a Gaussian mixture as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved through the use of an augmented prior with inducing variables. The method supports sparse GP approximations, as well as parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively on small, medium-scale, and large datasets, showing its competitiveness under different likelihood models and degrees of sparsity. In large-scale experiments involving the prediction of airline delays and the classification of handwritten digits, we show that our method is on par with state-of-the-art hard-coded approaches for scalable GP regression and classification.
Recent work on Bayesian optimization has shown its effectiveness in global optimization of difficult black-box objective functions. Many real-world optimization problems of interest also have constraints which are unknown a priori. In this paper, we study Bayesian optimization for constrained problems in the general case that noise may be present in the constraint functions, and the objective and constraints may be evaluated independently. We provide motivating practical examples, and present a general framework to solve such problems. We demonstrate the effectiveness of our approach on optimizing the performance of online latent Dirichlet allocation subject to topic sparsity constraints, tuning a neural network given test-time memory constraints, and optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed time, subject to passing standard convergence diagnostics.
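A common acquisition function for this setting, which the general framework described above builds on, weights expected improvement by the probability that an independently modelled constraint is satisfied. A minimal sketch follows; the posterior means and standard deviations for the objective (`mu`, `sigma`) and the constraint (`mu_c`, `sigma_c`) are assumed to come from separate surrogate models.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def constrained_ei(mu, sigma, best, mu_c, sigma_c):
    """Expected improvement over the best feasible value, weighted by the
    probability that the (independently modelled, noisy) constraint
    posterior is non-positive, i.e. satisfied."""
    z = (best - mu) / sigma
    ei = sigma * (z * norm_cdf(z) + norm_pdf(z))   # standard EI (minimization)
    p_feasible = norm_cdf(-mu_c / sigma_c)         # P(constraint <= 0)
    return ei * p_feasible
```

Because the objective and constraint are modelled independently, they can also be evaluated independently, which matches the decoupled-evaluation setting the abstract describes.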
Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
Bayesian optimization (BO) refers to a suite of techniques for the global optimization of expensive black-box functions, which use introspective Bayesian models of the function to efficiently find the optimum. While BO has been successfully applied in many settings, modern optimization tasks usher in new challenges where conventional methods fail. In this work, we present Dragonfly, an open-source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real-world settings; these include better methods for handling higher-dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimizing over structured combinatorial spaces, such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimizing over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimization and demonstrate that when the above methods are integrated, they significantly improve the performance of BO. The Dragonfly library is available at dragonfly.github.io.
Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but it is possible to simulate from the model. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimization (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimization enables one to intelligently decide where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to the lack of simulations needed to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location so as to minimize the expected loss. Experiments show that the proposed method often produces the most accurate approximations compared to common BO strategies.
We propose SWA-Gaussian (SWAG), a simple, scalable, general-purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates under a modified learning-rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low-rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of computer vision tasks, including out-of-sample detection, calibration, and transfer learning, compared to many popular alternatives including MC dropout, KFAC Laplace, and temperature scaling.
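The moment collection described above is straightforward to sketch for the diagonal case; the low-rank covariance component of the full method is omitted here, and `collect` would be called on the flattened weight vector at the end of each epoch.

```python
import numpy as np

class DiagonalSWAG:
    """Running first and second moments of SGD iterates (diagonal SWAG).
    Sampling from the resulting Gaussian gives weights for Bayesian
    model averaging."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)      # running first moment (SWA solution)
        self.sq_mean = np.zeros(dim)   # running second moment

    def collect(self, w):
        self.n += 1
        self.mean += (w - self.mean) / self.n
        self.sq_mean += (w * w - self.sq_mean) / self.n

    def sample(self, rng):
        var = np.maximum(self.sq_mean - self.mean ** 2, 1e-12)
        return self.mean + np.sqrt(var) * rng.standard_normal(self.mean.shape)
```

At prediction time one would average the network's outputs over several `sample` draws, which is the Bayesian model averaging step the abstract mentions.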
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.
Chemical space is so large that brute force searches for new interesting molecules are infeasible. High-throughput virtual screening via computer cluster simulations can speed up the discovery process by collecting very large amounts of data in parallel, e.g., up to hundreds or thousands of parallel measurements. Bayesian optimization (BO) can produce additional acceleration by sequentially identifying the most useful simulations or experiments to be performed next. However, current BO methods cannot scale to the large numbers of parallel measurements and the massive libraries of molecules currently used in high-throughput screening. Here, we propose a scalable solution based on a parallel and distributed implementation of Thompson sampling (PDTS). We show that, in small scale problems, PDTS performs similarly to parallel expected improvement (EI), a batch version of the most widely used BO heuristic. Additionally, in settings where parallel EI does not scale, PDTS outperforms other scalable baselines such as a greedy search, ε-greedy approaches, and a random search method. These results show that PDTS is a successful solution for large-scale parallel BO.
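The core of Thompson sampling for batch selection is that each worker acts on its own posterior draw. A minimal sketch over a discrete candidate set follows; the joint posterior `mu`/`cov` over candidate values is assumed to come from some surrogate model, and the distributed implementation that gives the method its scalability is not shown.

```python
import numpy as np

def thompson_batch(mu, cov, batch_size, rng):
    """Select a batch of candidate indices: each (conceptual) worker draws
    one sample of the objective from the surrogate's joint posterior over
    candidate values and greedily takes the argmax of its own draw."""
    chosen = []
    for _ in range(batch_size):
        f = rng.multivariate_normal(mu, cov)   # one posterior draw
        f[chosen] = -np.inf                    # avoid duplicate picks
        chosen.append(int(np.argmax(f)))
    return chosen
```

Because each draw is independent, the loop parallelizes trivially across workers, which is what lets this heuristic scale where batch EI does not.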
Gaussian process (GP) regression has been widely used in robotics due to its generality, simplicity of use, and the utility of Bayesian predictions. In particular, the dominant implementation of GP regression is kernel-based, as it enables the fitting of arbitrary nonlinear functions by leveraging kernel functions as infinite-dimensional features. While incorporating prior information has the potential to drastically improve the data efficiency of kernel-based GP regression, expressing complex priors through the choice of kernel function and associated hyperparameters is often challenging and unintuitive. Furthermore, the computational complexity of kernel-based GP regression scales poorly with the number of samples, limiting its application in scenarios where large amounts of data are available. In this work, we propose ALPaCA, an efficient Bayesian regression algorithm that addresses these issues. ALPaCA uses a dataset of sample functions to learn a domain-specific, finite-dimensional feature encoding, together with a prior over the associated weights, such that Bayesian linear regression in this feature space yields accurate online predictions of the posterior density. The features are neural networks, trained via a meta-learning approach. ALPaCA extracts all prior information from the dataset, rather than relying on the choice of arbitrary, restrictive kernel hyperparameters. Furthermore, it substantially reduces sample complexity and allows scaling to large systems. We investigate the performance of ALPaCA on two simple regression problems, two simulated robotic systems, and a lane-change driving task performed by a human. We find that our method outperforms kernel-based GP regression as well as state-of-the-art meta-learning approaches, thereby providing a promising plug-in tool for many regression tasks in robotics where scalability and data efficiency are important.
We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similarly in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem.
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has become increasingly popular for simulating posterior samples in large-scale Bayesian models. However, existing SG-MCMC schemes are not tailored to any specific probabilistic model, and even a simple modification of the underlying dynamical system requires significant physical intuition. This paper introduces the first learning algorithm that allows the automated design of the underlying continuous dynamics of an SG-MCMC sampler. The learned sampler generalizes Hamiltonian dynamics with state-dependent drift and diffusion, enabling fast traversal and efficient exploration of neural network energy landscapes. Experiments validate the proposed approach on Bayesian fully connected neural network and Bayesian recurrent neural network tasks, showing that the learned sampler outperforms generic, hand-designed SG-MCMC algorithms and generalizes to different datasets and larger architectures.
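For context, the kind of hand-designed SG-MCMC update that the learned samplers above generalize can be as simple as vanilla stochastic gradient Langevin dynamics. A minimal sketch follows (a full-data gradient stands in for the minibatch stochastic gradient here):

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One step of vanilla stochastic gradient Langevin dynamics: a gradient
    ascent step on the log posterior plus Gaussian noise whose variance is
    matched to the step size, so the chain samples rather than optimizes."""
    noise = rng.standard_normal(theta.shape) * np.sqrt(2 * step_size)
    return theta + step_size * grad_log_post(theta) + noise
```

The learned samplers replace this fixed drift-plus-isotropic-noise form with state-dependent drift and diffusion functions, which is what the abstract means by generalizing Hamiltonian dynamics.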