Bayesian optimization using Gaussian processes is a popular approach to deal with the optimization of expensive black-box functions. However, because of the assumption of stationarity in the prior covariance of classical Gaussian processes, this method may not be suited to the non-stationary functions involved in many optimization problems. To overcome this issue, a new Bayesian optimization approach is proposed. It is based on deep Gaussian processes as surrogate models instead of classical Gaussian processes. This modeling technique increases the power of representation to capture non-stationarity by simply considering a functional composition of stationary Gaussian processes, providing a multi-layer structure. This paper proposes a new algorithm for global optimization by coupling deep Gaussian processes and Bayesian optimization. The specificities of this optimization method are discussed and highlighted with academic test cases. The performance of the proposed algorithm is assessed on analytical test cases and an aerospace design optimization problem, and compared to state-of-the-art stationary and non-stationary Bayesian optimization approaches.
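For reference, the layered construction behind a deep Gaussian process surrogate can be written as a composition of GP layers (a standard textbook formulation, not necessarily this paper's notation):

```latex
f(x) = \left(f_L \circ f_{L-1} \circ \cdots \circ f_1\right)(x),
\qquad f_\ell \sim \mathcal{GP}\!\left(0,\, k_\ell(\cdot,\cdot)\right), \quad \ell = 1, \dots, L.
```

Even when each layer's kernel $k_\ell$ is stationary, the composed function is in general non-stationary, which is what makes this surrogate suitable for the problems described above.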
We present an adaptive approach to the construction of Gaussian process surrogates for Bayesian inference with expensive-to-evaluate forward models. Our method relies on a fully Bayesian approach to training Gaussian process models and exploits the expected improvement idea from Bayesian global optimization. We adaptively construct the training design by maximizing the expected improvement in the fit of the Gaussian process model to the noisy observed data. Numerical experiments on model problems with synthetic data demonstrate the effectiveness of the obtained adaptive designs, compared to fixed non-adaptive designs, in achieving accurate posterior estimates at a reduced cost of forward-model evaluations.
Recent work on Bayesian optimization has shown its effectiveness in global optimization of difficult black-box objective functions. Many real-world optimization problems of interest also have constraints which are unknown a priori. In this paper, we study Bayesian optimization for constrained problems in the general case that noise may be present in the constraint functions, and the objective and constraints may be evaluated independently. We provide motivating practical examples, and present a general framework to solve such problems. We demonstrate the effectiveness of our approach on optimizing the performance of online latent Dirichlet allocation subject to topic sparsity constraints, tuning a neural network given test-time memory constraints, and optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed time, subject to passing standard convergence diagnostics.
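A common way to realize this idea, hedged here as one standard formulation rather than the exact method of the paper, is to weight the expected improvement by the posterior probability that each independently modeled constraint is satisfied. The sketch below assumes GP posteriors summarized by predictive means and standard deviations; `mu_f`, `sigma_f`, `mu_c`, and `sigma_c` are illustrative names:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu_f, sigma_f, f_best):
    """Closed-form EI for a Gaussian predictive distribution (minimization)."""
    sigma_f = np.maximum(sigma_f, 1e-12)  # guard against zero variance
    z = (f_best - mu_f) / sigma_f
    return (f_best - mu_f) * norm.cdf(z) + sigma_f * norm.pdf(z)

def constrained_ei(mu_f, sigma_f, f_best, mu_c, sigma_c):
    """EI weighted by the probability that each constraint c_k(x) <= 0 holds.

    mu_c, sigma_c: arrays of shape (n_constraints,) holding the posterior
    mean and standard deviation of each independently modeled constraint.
    """
    mu_c, sigma_c = np.asarray(mu_c), np.maximum(np.asarray(sigma_c), 1e-12)
    prob_feasible = np.prod(norm.cdf(-mu_c / sigma_c))
    return expected_improvement(mu_f, sigma_f, f_best) * prob_feasible
```

Because the objective and constraint models are independent, the feasibility probability factorizes across constraints, which is what makes decoupled evaluations natural in this setting.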
In many global optimization problems motivated by engineering applications, the number of function evaluations is severely limited by time or cost. To ensure that each evaluation contributes to the localization of good candidates for the role of global minimizer, a sequential choice of evaluation points is usually carried out. In particular, when Kriging is used to interpolate past evaluations, the uncertainty associated with the lack of information on the function can be expressed and used to compute a number of criteria accounting for the interest of an additional evaluation at any given point. This paper introduces minimizer entropy as a new Kriging-based criterion for the sequential choice of points at which the function should be evaluated. Based on stepwise uncertainty reduction, it accounts for the informational gain on the minimizer expected from a new evaluation. The criterion is approximated using conditional simulations of the Gaussian process model behind Kriging, and then inserted into an algorithm similar in spirit to the Efficient Global Optimization (EGO) algorithm. An empirical comparison is carried out between our criterion and expected improvement, one of the reference criteria in the literature. Experimental results indicate major evaluation savings over EGO. Finally, the method, which we call IAGO (for Informational Approach to Global Optimization) is extended to robust optimization problems, where both the factors to be tuned and the function evaluations are corrupted by noise.
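For reference, the expected improvement criterion against which IAGO is compared has a well-known closed form under a Gaussian predictive distribution with mean $\mu(x)$ and standard deviation $\sigma(x)$, where $f_{\min}$ is the best value observed so far:

```latex
\mathrm{EI}(x) = \left(f_{\min} - \mu(x)\right)\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{f_{\min} - \mu(x)}{\sigma(x)},
```

with $\Phi$ and $\varphi$ the standard normal CDF and PDF. Minimizer entropy, by contrast, has no such closed form, which is why the paper resorts to conditional simulations of the Gaussian process.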
This paper proposes a new method that extends the Efficient Global Optimization to address stochastic black-box systems. The method is based on a kriging meta-model that provides a global prediction of the objective values and a measure of prediction uncertainty at every point. The criterion for the infill sample selection is an augmented Expected Improvement function with desirable properties for stochastic responses. The method is empirically compared with the Revised Simplex Search, the Simultaneous Perturbation Stochastic Approximation, and the DIRECT methods using six test problems from the literature. An application case study on an inventory system is also documented. The results suggest that the proposed method has excellent consistency and efficiency in finding global optimal solutions, and is particularly useful for expensive systems.
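The augmented expected improvement criterion, as it is usually stated in the literature (reproduced here from memory, so the exact form in the paper may differ), multiplies an EI term computed against an effective best solution by a factor that discounts points whose predictions are dominated by noise:

```latex
\mathrm{AEI}(x) = \mathbb{E}\!\left[\max\!\left(y^{**} - Y(x),\, 0\right)\right]
\left(1 - \frac{\sigma_\varepsilon}{\sqrt{s^2(x) + \sigma_\varepsilon^2}}\right),
```

where $y^{**}$ is the kriging prediction at the effective best solution, $s^2(x)$ is the kriging prediction variance, and $\sigma_\varepsilon$ is the standard deviation of the observation noise. The multiplicative factor vanishes as $s^2(x) \to 0$, discouraging resampling of already well-resolved points.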
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions and does not require detailed knowledge of the conditional likelihood, only that it can be evaluated as a black-box function. Using a Gaussian mixture as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved through an augmented prior using inducing variables. The method supports sparse GP approximations, as well as parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively on small, medium-scale, and large datasets, showing its competitiveness under different likelihood models and sparsity levels. On large-scale experiments involving airline-delay prediction and handwritten-digit classification, we show that our method is on par with state-of-the-art hand-crafted approaches for scalable GP regression and classification.
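The quantity estimated by sampling here is the standard evidence lower bound, which decomposes into an expected log-likelihood term (estimable with black-box likelihood evaluations) and a KL term (generic notation, not the paper's):

```latex
\mathcal{L}(\lambda) = \mathbb{E}_{q_\lambda(\mathbf{f})}\!\left[\log p(\mathbf{y} \mid \mathbf{f})\right]
- \mathrm{KL}\!\left(q_\lambda(\mathbf{f}) \,\|\, p(\mathbf{f})\right).
```

The expectation is approximated with Monte Carlo samples drawn from the variational distribution, so the likelihood only ever needs to be evaluated pointwise, never differentiated.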
Deep Gaussian processes (DGPs) are hierarchical generalizations of Gaussian processes that combine well-calibrated uncertainty estimates with the high flexibility of multi-layer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art reference method, variational inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation to what is often a multimodal posterior. In this work, we provide evidence of the non-Gaussian nature of the posterior, and we apply the stochastic gradient Hamiltonian Monte Carlo method to sample from it directly. To efficiently optimize the hyperparameters, we introduce the moving-window MCEM algorithm. This results in significantly better predictions at a lower computational cost than its VI counterpart. Our method thus establishes a new state of the art for inference in DGPs.
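For context, the stochastic gradient HMC updates used for sampling take the usual form from the SGHMC literature (standard discretization, not specific to the DGP setting); with stochastic gradient estimate $\nabla \tilde{U}(\theta)$, step size $\epsilon$, friction $\alpha$, and noise estimate $\hat{\beta}$:

```latex
\theta_{t+1} = \theta_t + v_t, \qquad
v_{t+1} = v_t - \epsilon \nabla \tilde{U}(\theta_t) - \alpha v_t
+ \mathcal{N}\!\left(0,\, 2(\alpha - \hat{\beta})\,\epsilon\right).
```

The friction term compensates for the noise injected by the stochastic gradient, so the chain still targets the correct posterior over the sampled variables $\theta$.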
Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to gradients or a large number of likelihood evaluations. Here we introduce a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). VBMC combines variational inference with Gaussian-process-based, active-sampling Bayesian quadrature, using the latter to efficiently approximate the intractable integral in the variational objective. Our method produces both a nonparametric approximation of the posterior distribution and an approximate lower bound on the model evidence, useful for model selection. We demonstrate VBMC on several synthetic likelihoods and on a neuronal model with data from real neurons. Across all tested problems and dimensions (up to $D = 10$), VBMC consistently reconstructs the posterior and model evidence with a limited budget of likelihood evaluations, unlike other methods that work only in very low dimensions. Our framework serves as a novel tool for posterior and model inference with expensive, black-box likelihoods.
We develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters, which otherwise grow linearly in proportion to the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables handling datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.
Acquiring information about noisy expensive black-box functions (computer simulations or physical experiments) is a tremendously challenging problem. Limited computational and financial resources restrict the application of traditional methods for design of experiments. When the quantity of interest (QoI) in a problem depends on an expensive black-box function, the problem is further compounded by obstacles such as numerical error and stochastic approximation error. Bayesian optimal design of experiments has been reasonably successful in guiding the designer towards the QoI for such problems. This is usually achieved by sequentially querying the function at designs selected by an infill-sampling criterion compatible with utility theory. However, most current methods are semantically designed to work only on optimizing or inferring the black-box function itself. Our objective is to construct a heuristic that explicitly deals with the above issues, regardless of the QoI. This paper applies such a heuristic to infer a specific QoI, namely the expectation (expected value) of the function. The Kullback-Leibler (KL) divergence is fairly conspicuous among the techniques used to quantify information gain. In this paper, we derive an expression for the expected KL divergence to sequentially infer our QoI. The analytical tractability provided by the Karhunen-Loève expansion around the Gaussian process (GP) representation of the black-box function allows us to circumvent numerical issues associated with sample averaging. The proposed methodology can be extended to any QoI under reasonable assumptions. The proposed method is verified and validated on three synthetic functions with different levels of complexity and dimensionality. We then demonstrate our methodology on a steel-wire manufacturing problem.
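The Karhunen-Loève representation referred to above takes the standard truncated form (generic notation, not the paper's):

```latex
f(x, \omega) \approx \mu(x) + \sum_{i=1}^{M} \sqrt{\lambda_i}\, \phi_i(x)\, \xi_i(\omega),
```

where $(\lambda_i, \phi_i)$ are eigenpairs of the GP covariance operator and the $\xi_i$ are i.i.d. standard normal random variables. Truncating at $M$ terms turns functionals of the GP into finite-dimensional integrals over the $\xi_i$, which is the source of the analytical tractability mentioned above.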
The computational effort of evaluations based on numerical simulations, e.g. the finite element method, is high. Metamodels can be used to create low-cost alternatives. However, the number of samples required to create an adequate metamodel should be kept low, which can be achieved by using adaptive sampling techniques. In this master's thesis, adaptive sampling techniques are investigated for the creation of metamodels using the Kriging technique, which interpolates values by means of a Gaussian process governed by prior covariances. A Kriging framework extended to multi-fidelity problems is presented and used to compare adaptive sampling techniques proposed in the literature on benchmark problems as well as on an application from contact mechanics. This thesis presents, for the first time, a comprehensive comparison of a large range of adaptive techniques for the Kriging framework. Furthermore, the flexibility of adaptive techniques is carried over to multi-fidelity Kriging as well as to a Kriging model with reduced hyperparameter dimension, known as partial least squares Kriging. In addition, an innovative adaptive scheme for binary classification is presented and demonstrated on the identification of chaotic motion of a Duffing-type oscillator.
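The interpolation at the core of this framework has a standard closed form; for a zero-mean (simple Kriging) model with covariance matrix $K$ between design points, cross-covariance vector $k(x)$, and observations $\mathbf{y}$ (generic notation, not the thesis's):

```latex
\hat{\mu}(x) = k(x)^\top K^{-1} \mathbf{y}, \qquad
s^2(x) = k(x, x) - k(x)^\top K^{-1} k(x).
```

The predictive variance $s^2(x)$ is what adaptive sampling criteria exploit when deciding where the next sample would be most informative.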
Bayesian optimization (BO) is a powerful approach for seeking the global optimum of expensive black-box functions and has proven successful for fine-tuning hyperparameters of machine learning models. A Bayesian optimization routine involves learning a response surface and maximizing a score to select the most valuable inputs to query at the next iteration. These key steps are subject to the curse of dimensionality, so that Bayesian optimization does not scale beyond 10-20 parameters. In this work, we address this issue and propose a high-dimensional BO method that learns a nonlinear low-dimensional manifold of the input space. We achieve this with a multi-layer neural network embedded in the covariance function of a Gaussian process. This approach applies unsupervised dimensionality reduction as a byproduct of a supervised regression solution. It also allows exploiting the data efficiency of Gaussian process models in a Bayesian framework. We further introduce a nonlinear mapping from the manifold to the high-dimensional space based on multi-output Gaussian processes, and jointly train it end-to-end via marginal likelihood maximization. We show that this intrinsically low-dimensional optimization outperforms recent baselines from the high-dimensional BO literature on a set of benchmark functions in 60 dimensions.
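The covariance-embedded network described above amounts to what is often called a deep or manifold kernel: a base kernel applied to learned features. Below is a minimal sketch under that reading (illustrative names, plain NumPy, fixed weights standing in for the jointly trained network):

```python
import numpy as np

def mlp_features(X, W1, b1, W2, b2):
    """A toy two-layer network g(x) mapping inputs to a low-dim manifold."""
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2  # shape (n, d_low)

def manifold_rbf_kernel(X1, X2, params, lengthscale=1.0, variance=1.0):
    """RBF kernel evaluated on network features: k(x, x') = k_rbf(g(x), g(x'))."""
    Z1, Z2 = mlp_features(X1, *params), mlp_features(X2, *params)
    sq_dists = np.sum((Z1[:, None, :] - Z2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)
```

In the method described above, the network weights are treated as kernel hyperparameters and optimized jointly with the GP via the marginal likelihood, rather than fixed as in this sketch.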
Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.
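The variational bound used by such a scheme keeps the layers coupled by propagating samples through them; in the usual sparse-inducing-point notation it reads (standard form from the doubly stochastic VI literature):

```latex
\mathcal{L} = \sum_{n=1}^{N} \mathbb{E}_{q(f_n^{L})}\!\left[\log p\!\left(y_n \mid f_n^{L}\right)\right]
- \sum_{\ell=1}^{L} \mathrm{KL}\!\left(q(\mathbf{u}^{\ell}) \,\|\, p(\mathbf{u}^{\ell})\right),
```

where $q(f_n^{L})$ is obtained by sampling each layer's output conditioned on the previous layer's sample (the first source of stochasticity) and the sum over $n$ is minibatched (the second).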
Entropy Search (ES) and Predictive Entropy Search (PES) are popular and empirically successful Bayesian optimization techniques. Both rely on a compelling information-theoretic motivation, and maximize the information gained about the arg max of the unknown function; yet, both are plagued by the expensive computation for estimating entropies. We propose a new criterion, Max-value Entropy Search (MES), that instead uses the information about the maximum function value. We show relations of MES to other Bayesian optimization methods, and establish a regret bound. We observe that MES maintains or improves the good empirical performance of ES/PES, while tremendously lightening the computational burden. In particular, MES is much more robust to the number of samples used for computing the entropy, and hence more efficient for higher dimensional problems.
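For reference, the MES acquisition is typically approximated with a small set $Y_*$ of $K$ sampled maximum values $y_*$ (standard form from the MES literature, with $\Phi$ and $\varphi$ the standard normal CDF and PDF):

```latex
\alpha(x) \approx \frac{1}{K} \sum_{y_* \in Y_*}
\left[ \frac{\gamma_{y_*}(x)\, \varphi\!\left(\gamma_{y_*}(x)\right)}{2\, \Phi\!\left(\gamma_{y_*}(x)\right)}
- \log \Phi\!\left(\gamma_{y_*}(x)\right) \right],
\qquad \gamma_{y_*}(x) = \frac{y_* - \mu(x)}{\sigma(x)}.
```

Because the entropy is over the one-dimensional maximum value rather than the high-dimensional arg max, the sum converges with very few samples, which is the source of the robustness noted above.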
We develop an efficient, Bayesian Uncertainty Quantification framework using a novel treed Gaussian process model. The tree is adaptively constructed using information conveyed by the observed data about the length scales of the underlying process. On each leaf of the tree, we utilize Bayesian Experimental Design techniques in order to learn a multi-output Gaussian process. The constructed surrogate can provide analytical point estimates, as well as error bars, for the statistics of interest. We numerically demonstrate the effectiveness of the suggested framework in identifying discontinuities, local features and unimportant dimensions in the solution of stochastic differential equations.
Conditional density estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity, and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GPs) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery to apply it to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows sharing learned structure between conditions. We illustrate the effectiveness and wide-reaching applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on Omniglot images.
Surrogate assisted global optimization is gaining popularity. Similarly, modern advances in computing power increasingly rely on parallelization rather than faster processors. This paper examines some of the methods used to take advantage of parallelization in surrogate based global optimization. A key issue focused on in this review is how different algorithms balance exploration and exploitation. Most of the papers surveyed describe adaptive samplers that employ Gaussian process or Kriging surrogates. These allow sophisticated approaches for balancing exploration and exploitation, and even allow the development of algorithms with a calculable rate of convergence as a function of the number of parallel processors. In addition to optimization based on adaptive sampling, surrogate assisted parallel evolutionary algorithms are also surveyed. Beyond a review of the present state of the art, the paper also argues that methods that provide easy parallelization, like multiple parallel runs, or methods that rely on a population of designs for diversity, deserve more attention.
The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
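To make the loop described above concrete, here is a minimal sketch of Bayesian optimization with a GP prior and expected improvement, using scikit-learn for the surrogate and a simple candidate grid (a toy illustration under these assumptions, not the tutorial's own code):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    return -np.sin(3 * x) - x**2 + 0.7 * x  # toy function to maximize

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))           # initial design
y = objective(X).ravel()
candidates = np.linspace(-2, 2, 500).reshape(-1, 1)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(15):
    gp.fit(X, y)                              # posterior given evidence so far
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-12)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # EI, maximization form
    x_next = candidates[np.argmax(ei)].reshape(1, -1)     # utility-based choice
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best observed:", X[np.argmax(y)].item(), y.max())
```

The EI acquisition trades off the two forces named above: its first term rewards high predicted mean (exploitation) and its second rewards high predictive uncertainty (exploration).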