Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low norm in a reproducing kernel Hilbert space. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze an intuitive Gaussian process upper confidence bound (-algorithm , and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data,-compares favorably with other heuristical GP optimization approaches.
translated by 谷歌翻译
How can we take advantage of opportunities for experimental parallelization in exploration-exploitation tradeoffs? In many experimental scenarios, it is often desirable to execute experiments simultaneously or in batches, rather than only performing one at a time. Additionally , observations may be both noisy and expensive. We introduce Gaussian Process Batch Upper Confidence Bound (GP-BUCB), an upper confidence bound-based algorithm, which models the reward function as a sample from a Gaussian process and which can select batches of experiments to run in parallel. We prove a general regret bound for GP-BUCB, as well as the surprising result that for some common kernels, the asymptotic average regret can be made independent of the batch size. The GP-BUCB algorithm is also applicable in the related case of a delay between initiation of an experiment and observation of its results , for which the same regret bounds hold. We also introduce Gaussian Process Adaptive Upper Confidence Bound (GP-AUCB), a variant of GP-BUCB which can exploit parallelism in an adaptive manner. We evaluate GP-BUCB and GP-AUCB on several simulated and real data sets. These experiments show that GP-BUCB and GP-AUCB are competitive with state-of-the-art heuristics. 1
translated by 谷歌翻译
In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes. The upper bounds we derive on the cumulative regret for this generic algorithm improve by an exponential factor the previously known bounds for algorithms like GP-UCB. We also introduce the novel Gaussian Process Mutual Information algorithm (GP-MI), which significantly improves further these upper bounds for the cumulative regret. We confirm the efficiency of this algorithm on synthetic and real tasks against the natural competitor, GP-UCB, and also the Expected Improvement heuristic.
translated by 谷歌翻译
How should we design experiments to maximize performance of a complex system, taking into account uncontrollable environmental conditions? How should we select relevant documents (ads) to display, given information about the user? These tasks can be formalized as contextual bandit problems, where at each round, we receive context (about the experimental conditions, the query), and have to choose an action (parameters, documents). The key challenge is to trade off exploration by gathering data for estimating the mean payoff function over the context-action space, and to exploit by choosing an action deemed optimal based on the gathered data. We model the payoff function as a sample from a Gaussian process defined over the joint context-action space, and develop CGP-UCB, an intuitive upper-confidence style algorithm. We show that by mixing and matching kernels for contexts and actions, CGP-UCB can handle a variety of practical applications. We further provide generic tools for deriving regret bounds when using such composite kernel functions. Lastly, we evaluate our algorithm on two case studies, in the context of automated vaccine design and sensor management. We show that context-sensitive optimization outperforms no or naive use of context.
translated by 谷歌翻译
Bandit methods for black-box optimisation, such as Bayesian optimisation, are used in a variety of applications including hyper-parameter tuning and experiment design. Recently, multi-fidelity methods have garnered considerable attention since function evaluations have become increasingly expensive in such applications. Multi-fidelity methods use cheap approximations to the function of interest to speed up the overall opti-misation process. However, most multi-fidelity methods assume only a finite number of approximations. In many practical applications however, a continuous spectrum of approximations might be available. For instance, when tuning an expensive neural network, one might choose to approximate the cross validation performance using less data N and/or few training iterations T. Here, the approximations are best viewed as arising out of a continuous two dimensional space (N, T). In this work, we develop a Bayesian optimisa-tion method, BOCA, for this setting. We char-acterise its theoretical properties and show that it achieves better regret than than strategies which ignore the approximations. BOCA outperforms several other baselines in synthetic and real experiments .
translated by 谷歌翻译
In this paper, we consider the challenge of maximizing an unknown function f for which evaluations are noisy and are acquired with high cost. An iterative procedure uses the previous measures to actively select the next estimation of f which is predicted to be the most useful. We focus on the case where the function can be evaluated in parallel with batches of fixed size and analyze the benefit compared to the purely sequential procedure in terms of cumulative regret. We introduce the Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations. We prove theoretical upper bounds on the regret with batches of size K for this procedure which show the improvement of the order of ? K for fixed iteration cost over purely sequential versions. Moreover, the mul-tiplicative constants involved have the property of being dimension-free. We also confirm empirically the efficiency of GP-UCB-PE on real and synthetic problems compared to state-of-the-art competitors.
translated by 谷歌翻译
基于高斯过程模型的贝叶斯优化(BO)是优化评估成本昂贵的黑盒函数的有力范例。虽然几个BO算法可证明地收敛到未知函数的全局最优,但他们认为内核的超参数是已知的。在实践中情况并非如此,并且错误指定经常导致这些算法收敛到较差的局部最优。在本文中,我们提出了第一个BO算法,它可以证明是无后悔的,并且在不参考超参数的情况下收敛到最优。我们慢慢地调整了固定核的超参数,从而扩展了相关的函数类超时,使BO算法考虑了更复杂的函数候选。基于理论上的见解,我们提出了几种实用的算法,通过在线超参数估计来实现BO的经验数据效率,但是保留理论收敛保证。我们评估了几个基准问题的方法。
translated by 谷歌翻译
Bayesian optimisation has gained great popularity as a tool for optimising the parameters of machine learning algorithms and models. Somewhat ironically, setting up the hyper-parameters of Bayesian optimisation methods is notoriously hard. While reasonable practical solutions have been advanced, they can often fail to find the best optima. Surprisingly, there is little theoretical analysis of this crucial problem in the literature. To address this, we derive a cumulative regret bound for Bayesian optimisation with Gaussian processes and unknown kernel hyper-parameters in the stochastic setting. The bound, which applies to the expected improvement acquisition function and sub-Gaussian observation noise, provides us with guidelines on how to design hyper-parameter estimation methods. A simple simulation demonstrates the importance of following these guidelines.
translated by 谷歌翻译
最近,人们越来越关注贝叶斯优化 - 一种未知函数的优化,其假设通常由高斯过程(GP)先前表示。我们研究了一种直接使用函数argmax估计的优化策略。该策略提供了实践和理论上的优势:不需要选择权衡参数,而且,我们建立与流行的GP-UCB和GP-PI策略的紧密联系。我们的方法可以被理解为自动和自适应地在GP-UCB和GP-PI中进行勘探和利用。我们通过对遗憾的界限以及对机器人和视觉任务的广泛经验评估来说明这种自适应调整的效果,展示了该策略对一系列性能标准的稳健性。
translated by 谷歌翻译
Bayesian Optimisation (BO) is a technique used in optimising a$D$-dimensional function which is typically expensive to evaluate. While therehave been many successes for BO in low dimensions, scaling it to highdimensions has been notoriously difficult. Existing literature on the topic areunder very restrictive settings. In this paper, we identify two key challengesin this endeavour. We tackle these challenges by assuming an additive structurefor the function. This setting is substantially more expressive and contains aricher class of functions than previous work. We prove that, for additivefunctions the regret has only linear dependence on $D$ even though the functiondepends on all $D$ dimensions. We also demonstrate several other statisticaland computational benefits in our framework. Via synthetic examples, ascientific simulation and a face detection problem we demonstrate that ourmethod outperforms naive BO on additive functions and on several examples wherethe function is not additive.
translated by 谷歌翻译
本文考虑在学习优化行动时,如多臂强盗问题,在探索和开发之间使用简单的后验抽样算法。该算法也称为ThompsonSampling,与流行的上置信界(UCB)方法相比具有明显的优势,并且可以应用于具有有限或无限动作空间的问题以及动作奖励之间的复杂关系。我们做出了两个理论贡献。第一个建立了后验采样和UCB算法之间的联系。这个结果让我们将为UCB算法开发的后悔边界转换为贝叶斯后悔边界进行后验采样。我们的第二个理论贡献是贝叶斯后悔结合后悔,广泛适用于许多模型类。这个界限取决于我们称之为洗脱维度的新概念,它衡量行动奖励之间的依赖程度。与UCB算法相比,贝叶斯遗嘱对特定模型类别的界限,我们的一般界限匹配线性模型的最佳可用性,并且比最适用于广义线性模型。此外,我们的分析提供了对后验采样的性能优势的深入了解,后者通过仿真结果突出显示了性能超过最近提出的UCB算法。
translated by 谷歌翻译
Many applications in machine learning require optimizing unknown functions defined over a high-dimensional space from noisy samples that are expensive to obtain. We address this notoriously hard challenge, under the assumptions that the function varies only along some low-dimensional subspace and is smooth (i.e., it has a low norm in a Reproducible Kernel Hilbert Space). In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function. We carefully calibrate the exploration-exploitation tradeoff by allocating the sampling budget to subspace estimation and function optimization, and obtain the first subexponential cumulative regret bounds and convergence rates for Bayesian optimization in high-dimensions under noisy observations. Numerical results demonstrate the effectiveness of our approach in difficult scenarios.
translated by 谷歌翻译
We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
translated by 谷歌翻译
Entropy Search (ES) and Predictive Entropy Search (PES) are popular and empirically successful Bayesian Optimization techniques. Both rely on a compelling information-theoretic motivation , and maximize the information gained about the arg max of the unknown function; yet, both are plagued by the expensive computation for estimating entropies. We propose a new criterion , Max-value Entropy Search (MES), that instead uses the information about the maximum function value. We show relations of MES to other Bayesian optimization methods, and establish a regret bound. We observe that MES maintains or improves the good empirical performance of ES/PES, while tremendously lightening the computational burden. In particular, MES is much more robust to the number of samples used for computing the entropy, and hence more efficient for higher dimensional problems.
translated by 谷歌翻译
背景强盗文献传统上专注于解决探索 - 利用权衡的算法。特别地,在没有任何探索的情况下利用当前估计的贪婪算法在一般情况下可能是次优的。然而,无需探索的贪婪算法在实际设置中是可取的,其中探索可能是昂贵的或不道德的(例如,临床试验)。令人惊讶的是,我们发现如果有足够的随机性,简单的贪婪算法可以是速率最优的(实现渐近最优遗憾)。在观察到的背景中(协变量)。我们证明,对于满足声音的一般上下文分布类别的atwo武装强盗总是如此,我们称之为$ \ textit {covariate diversity} $。此外,即使没有这个条件,我们也表明贪婪算法可以在具有正概率的情况下进行速率最优化。因此,标准强盗算法可能会进行不必要的探索。通过这些结果,我们引入了一种新算法Greedy-First,该算法仅使用观察到的上下文和奖励来确定是遵循贪婪算法还是探索。我们证明了这种算法是速率最优的,没有对上下文分布或武器数量的任何额外假设。广泛的模拟表明,Greedy-First成功地减少了探索并且优于现有的(基于探索的)上下文频带算法,例如Thompson采样或上置信界( UCB)。
translated by 谷歌翻译
Contextual bandits are widely used in Internet services from news recommendation to advertising , and to Web search. Generalized linear models (logistical regression in particular) have demonstrated stronger performance than linear models in many applications where rewards are binary. However, most theoretical analyses on contextual bandits so far are on linear bandits. In this work, we propose an upper confidence bound based algorithm for generalized linear contex-tual bandits, which achieves añ O(√ dT) regret over T rounds with d dimensional feature vectors. This regret matches the minimax lower bound, up to logarithmic terms, and improves on the best previous result by a √ d factor, assuming the number of arms is fixed. A key component in our analysis is to establish a new, sharp finite-sample confidence bound for maximum-likelihood estimates in generalized linear models , which may be of independent interest. We also analyze a simpler upper confidence bound algorithm, which is useful in practice, and prove it to have optimal regret for certain cases.
translated by 谷歌翻译
Efficient global optimization is the problem of minimizing an unknownfunction f, using as few evaluations f(x) as possible. It can be considered asa continuum-armed bandit problem, with noiseless data and simple regret.Expected improvement is perhaps the most popular method for solving thisproblem; the algorithm performs well in experiments, but little is known aboutits theoretical properties. Implementing expected improvement requires a choiceof Gaussian process prior, which determines an associated space of functions,its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expectedimprovement is known to converge on the minimum of any function in the RKHS. Webegin by providing convergence rates for this procedure. The rates are optimalfor functions of low smoothness, and we modify the algorithm to attain optimalrates for smoother functions. For practitioners, however, these results aresomewhat misleading. Priors are typically not held fixed, but depend onparameters estimated from the data. For standard estimators, we show thisprocedure may never discover the minimum of f. We then propose alternativeestimators, chosen to minimize the constants in the rate of convergence, andshow these estimators retain the convergence rates of a fixed prior.
translated by 谷歌翻译
This paper analyzes the problem of Gaus-sian process (GP) bandits with determin-istic observations. The analysis uses a branch and bound algorithm that is related to the UCB algorithm of (Srinivas et al., 2010). For GPs with Gaussian observation noise, with variance strictly greater than zero, (Srinivas et al., 2010) proved that the regret vanishes at the approximate rate of O 1 √ t , where t is the number of observations. To complement their result, we attack the deterministic case and attain a much faster exponential convergence rate. Under some regularity assumptions, we show that the regret decreases asymp-totically according to O e − τ t (ln t) d/4 with high probability. Here, d is the dimension of the search space and τ is a constant that depends on the behaviour of the objective function near its global maximum.
translated by 谷歌翻译
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance .
translated by 谷歌翻译
We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ nonparametric Gaussian process priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the associated Reproducing Kernel Hilbert Spaces of functions induced by symmetric psd kernels (frequentist setting), we show that the algorithms enjoy sublinear regret bounds. The bounds are in terms of explicit structural parameters of the kernels, namely a novel generalization of the information gain metric from kernelized bandit, and highlight the influence of transition and reward function structure on the learning performance. Our results are applicable to multi-dimensional state and action spaces with composite kernel structures, and generalize results from the literature on kernelized bandits, and the adaptive control of parametric linear dynamical systems with quadratic costs.
translated by 谷歌翻译