We present methods for online linear optimization that take advantage of benign (as opposed to worst-case) sequences. Specifically, if the sequence encountered by the learner is well described by a known "predictable process", the algorithms presented enjoy tighter bounds than the typical worst-case bounds; if the sequence is not benign, the methods still achieve the usual worst-case regret bounds. Our approach can be seen as a way of incorporating prior knowledge about the sequence into the paradigm of online learning. The setting is shown to encompass partial and side information. Variance and path-length bounds (Hazan and Kale, 2010; Chiang et al., 2012) can be seen as particular examples of online learning with simple predictable sequences. We further extend our methods to competing with a set of possible predictable processes (models), that is, "learning" the predictable process itself concurrently with using it to obtain better regret guarantees. We show that such model selection is possible under various assumptions on the available feedback.
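To make the idea concrete, here is a minimal Euclidean sketch of optimistic online gradient descent with hints, in the spirit of the abstract above (this is an illustrative simplification, not the paper's exact algorithm; `optimistic_gd`, `eta`, and `radius` are names chosen here). The player uses a prediction M_t of the upcoming loss vector before playing; accurate hints tighten the regret.

```python
import numpy as np

def optimistic_gd(losses, hints, eta=0.1, radius=1.0):
    """Optimistic online gradient descent on an L2 ball (a minimal sketch).

    losses: list of loss vectors g_t, revealed only after playing.
    hints:  list of predictions M_t of g_t, available before playing.
    Regret scales with the prediction error sum ||g_t - M_t||^2,
    so a benign (well-predicted) sequence yields a tighter bound.
    """
    def project(v):
        n = np.linalg.norm(v)
        return v if n <= radius else v * (radius / n)

    d = len(losses[0])
    w_half = np.zeros(d)                    # "secondary" iterate
    total_loss = 0.0
    for g, m in zip(losses, hints):
        w = project(w_half - eta * m)       # play using the hint
        total_loss += float(w @ g)
        w_half = project(w_half - eta * g)  # update with the true gradient
    return total_loss
```

With perfect hints (hints equal to the true losses) the cumulative loss is no worse, and typically strictly better, than with uninformative zero hints.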
We consider the problem of minimizing a smooth convex function by reducing the optimization to computing the Nash equilibrium of a particular zero-sum convex-concave game. Zero-sum games can be solved using online learning dynamics, where a classical technique involves simulating two no-regret algorithms that play against each other and, after T rounds, the average iterate is guaranteed to solve the original optimization problem with error decaying as O(log T / T). In this paper we show that the technique can be enhanced to a rate of O(1/T^2) by extending recent work [25, 28] that leverages optimistic learning to speed up equilibrium computation. The resulting optimization algorithm derived from this analysis coincides exactly with the well-known NESTEROVACCELERATION [19] method, and indeed the same story allows us to recover several variants of Nesterov's algorithm via small tweaks. We are also able to establish the accelerated linear rate for a function which is both strongly convex and smooth. This methodology unifies a number of different iterative optimization methods: we show that the HEAVYBALL algorithm is precisely the non-optimistic variant of NESTEROVACCELERATION, and recent prior work already established a similar perspective on FRANKWOLFE [2, 1].
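For reference, the accelerated method that the above analysis recovers can be sketched as the standard textbook variant of Nesterov's scheme (a generic implementation, not the game-dynamics derivation itself): it achieves O(1/T^2) suboptimality for an L-smooth convex function, versus O(1/T) for plain gradient descent.

```python
import numpy as np

def nesterov(grad, x0, L, T):
    """Nesterov's accelerated gradient method (standard textbook variant).

    grad: gradient oracle of an L-smooth convex function.
    Returns y_T with f(y_T) - f* = O(L ||x0 - x*||^2 / T^2).
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    lam = 1.0
    for _ in range(T):
        lam_next = (1 + np.sqrt(1 + 4 * lam ** 2)) / 2
        gamma = (lam - 1) / lam_next        # momentum weight
        y_next = x - grad(x) / L            # gradient step
        x = y_next + gamma * (y_next - y)   # momentum extrapolation
        y = y_next
        lam = lam_next
    return y
```

Dropping the extrapolation (setting gamma so the new point reuses the last gradient step directly) recovers a heavy-ball-style update, mirroring the abstract's "non-optimistic variant" observation.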
We consider variational inequalities coming from monotone operators, a setting that includes convex minimization and convex-concave saddle-point problems. We assume access to potentially noisy unbiased values of the monotone operator, and assess convergence through a compatible gap function that corresponds to the standard optimality criteria in the aforementioned special cases. We present a universal algorithm for these inequalities based on the Mirror-Prox algorithm. Concretely, our algorithm simultaneously achieves the optimal rates for the smooth/non-smooth and noisy/noiseless settings. This is done without any prior knowledge of these properties, and in the general setup of arbitrary norms and compatible Bregman divergences. For convex minimization and convex-concave saddle-point problems, this leads to new adaptive algorithms. Our method relies on a novel yet simple adaptive choice of the step size, which can be seen as an appropriate extension of AdaGrad to handle constrained problems.
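As a minimal illustration of the underlying Mirror-Prox scheme, here is the Euclidean special case (the extragradient method) for a monotone operator F; the adaptive step-size rule that the abstract emphasizes is omitted, and `eta` is kept fixed in this sketch.

```python
import numpy as np

def extragradient(F, z0, eta, T):
    """Euclidean Mirror-Prox (extragradient) for a monotone operator F.

    Takes a half-step at the current point, then a corrected full step
    using the operator evaluated at the half-step. Returns the ergodic
    average, which is the iterate covered by the convergence guarantees.
    """
    z = np.asarray(z0, dtype=float)
    avg = np.zeros_like(z)
    for t in range(1, T + 1):
        z_half = z - eta * F(z)      # extrapolation step
        z = z - eta * F(z_half)      # correction step
        avg += (z - avg) / t         # running ergodic average
    return avg
```

On the bilinear saddle point min_x max_y x*y, whose operator is F(x, y) = (y, -x), plain gradient descent-ascent spirals away, while the extragradient correction converges toward the equilibrium (0, 0).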
We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving, several previous results as immediate corollaries. Moreover, using our tools, we develop an algorithm that provides a regret bound of O(U √(T log(U √T log² T + 1))), where U is the L2 norm of an arbitrary comparator and both T and U are unknown to the player. This bound is optimal up to √(log log T) terms. When T is known, we derive an algorithm with an optimal regret bound (up to constant factors). For both the known and unknown T case, a Normal approximation to the conditional value of the game proves to be the key analysis tool.
We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal-form games. When each player in a game uses an algorithm from our class, their individual regret decays at O(T^{-3/4}), while the sum of utilities converges to an approximate optimum at O(T^{-1}), an improvement upon the worst-case O(T^{-1/2}) rates. We show a black-box reduction for any algorithm in the class to achieve Õ(T^{-1/2}) rates against an adversary, while maintaining the faster rates against algorithms in the class. Our results extend those of Rakhlin and Sridharan [17] and Daskalakis et al. [4], who only analyzed two-player zero-sum games for specific algorithms.
We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al. eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions and of Hazan et al. for strongly convex functions, achieving intermediate rates between √T and log T. Furthermore, we show strong optimality of the algorithm. Finally, we provide an extension of our results to general norms.
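The flavor of self-tuning step sizes can be illustrated with a generic sketch (this is not the exact Adaptive Online Gradient Descent update from the abstract, just a standard adaptive variant of online gradient descent): the step size is set from the gradients observed so far, so no curvature bound needs to be known in advance.

```python
import numpy as np

def adaptive_ogd(grads, radius=1.0):
    """Online gradient descent with self-tuning step sizes (generic sketch).

    Uses eta_t = radius / sqrt(sum of squared gradient norms so far),
    which requires no a-priori knowledge of the loss curvature and yields
    O(sqrt(T)) regret for convex losses on an L2 ball of the given radius.
    Returns the list of played points w_1, ..., w_T.
    """
    w = np.zeros_like(np.asarray(grads[0], dtype=float))
    sq_sum = 0.0
    played = []
    for g in grads:
        played.append(w.copy())              # play, then observe g
        sq_sum += float(np.dot(g, g))
        eta = radius / np.sqrt(sq_sum)
        w = w - eta * g
        n = np.linalg.norm(w)
        if n > radius:                       # project back onto the ball
            w *= radius / n
    return played
```

Against a constant loss direction, the played points move to the boundary of the feasible ball, the best fixed action in hindsight.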
We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee.
First-order methods play a central role in large-scale machine learning. Even though many variations exist, each suited to a particular problem, almost all such methods fundamentally rely on two types of algorithmic steps: gradient descent, which yields primal progress, and mirror descent, which yields dual progress. We observe that the performances of gradient and mirror descent are complementary, so that faster algorithms can be designed by LINEARLY COUPLING the two. We show how to reconstruct Nesterov's accelerated gradient methods using linear coupling, which gives a cleaner interpretation than Nesterov's original proofs. We also discuss the power of linear coupling by extending it to many other settings that Nesterov's methods cannot apply to.
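The coupling idea can be sketched in the Euclidean case as follows (a minimal sketch of the linear-coupling construction; the step-size schedule below is one standard choice and a Euclidean mirror map is assumed): each round takes one gradient step for primal progress and one mirror step for dual progress, and plays their convex combination.

```python
import numpy as np

def linear_coupling(grad, x0, L, T):
    """Accelerated gradient method via linear coupling (Euclidean sketch).

    y tracks the gradient-descent sequence (primal progress), z tracks the
    mirror-descent sequence (dual progress), and the query point x is a
    linear combination of the two. Achieves the O(1/T^2) accelerated rate
    for L-smooth convex functions.
    """
    y = np.asarray(x0, dtype=float)
    z = y.copy()
    for k in range(T):
        alpha = (k + 2) / (2.0 * L)
        tau = 1.0 / (alpha * L)        # coupling weight, = 2/(k+2)
        x = tau * z + (1 - tau) * y
        g = grad(x)
        y = x - g / L                  # gradient step: primal progress
        z = z - alpha * g              # mirror step (Euclidean): dual progress
    return y
```

Replacing the Euclidean z-update with a Bregman projection gives the non-Euclidean variants the abstract alludes to.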
This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by Nesterov's seminal book and Nemirovski's lecture notes, includes the analysis of cutting plane methods, as well as (accelerated) gradient descent schemes. We also pay special attention to non-Euclidean settings (relevant algorithms include Frank-Wolfe, mirror descent, and dual averaging) and discuss their relevance in machine learning. We provide a gentle introduction to structural optimization with FISTA (to optimize a sum of a smooth and a simple non-smooth term), saddle-point mirror prox (Nemirovski's alternative to Nesterov's smoothing), and a concise description of interior point methods. In stochastic optimization we discuss stochastic gradient descent, mini-batches, random coordinate descent, and sublinear algorithms. We also briefly touch upon convex relaxation of combinatorial problems and the use of randomness to round solutions, as well as random walks based methods.
We study an online saddle-point problem in which, at each iteration, a pair of actions must be chosen without knowledge of the future (convex-concave) payoff functions. The goal is to minimize the gap between the cumulative payoff and the saddle-point value of the aggregate payoff function, measured by a metric called "SP-regret". The problem generalizes the online convex optimization framework and can be interpreted as finding the Nash equilibrium for a collection of a sequence of two-player zero-sum games. We propose an algorithm that achieves $\tilde{O}(\sqrt{T})$ SP-regret in the general case, and $O(\log T)$ SP-regret for the strongly convex-concave case. We then consider a constrained online convex optimization problem motivated by a variety of applications in dynamic pricing, auctions, and crowdsourcing. We reduce this problem to an online saddle-point problem and establish $O(\sqrt{T})$ regret using a primal-dual algorithm.
In this paper, we examine the long-run behavior of regret-minimizing agents in time-varying games with continuous action spaces. In its most basic form, (external) regret minimization guarantees that an agent's cumulative payoff is no worse in the long run than that of the agent's best fixed action in hindsight. Going beyond this worst-case guarantee, we consider a dynamic regret variant that compares the agent's accumulated rewards to those of any sequence of play. Focusing on a family of no-regret strategies based on mirror descent, we derive explicit regret minimization rates relying only on imperfect gradient observations. We then leverage these results to show that players are able to stay close to Nash equilibrium in time-varying monotone games, and even converge to Nash equilibrium if the sequence of stage games admits a limit.
We consider the Frank-Wolfe (FW) method for constrained convex optimization, and we show that this classical technique can be interpreted from a different perspective: FW emerges as the computation of an equilibrium (saddle point) of a special convex-concave zero-sum game. This saddle-point trick relies on no-regret online learning both to generate the sequence of iterates and to provide a proof of convergence through vanishing regret. We show that our stated equivalence has several nice properties, as it exhibits a modularity that gives rise to various old and new algorithms. We explore a few such resulting methods, and provide experimental results to demonstrate correctness and efficiency.
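For context, here is the classical FW iteration in its standard form (a generic sketch, not the game-theoretic derivation from the abstract): in the saddle-point view, the linear minimization oracle plays the role of one player's best response to the other player's linearized loss.

```python
import numpy as np

def lmo_l1(g):
    """Linear minimization oracle over the unit L1 ball: returns the vertex
    minimizing <g, s>."""
    i = int(np.argmax(np.abs(g)))
    s = np.zeros_like(g)
    s[i] = -np.sign(g[i])
    return s

def frank_wolfe(grad, lmo, x0, T):
    """Classical Frank-Wolfe with the standard 2/(k+2) step size.

    Each iterate is a convex combination of the current point and the
    oracle's vertex, so feasibility is maintained without projections.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(T):
        s = lmo(grad(x))          # best response to the linearized loss
        gamma = 2.0 / (k + 2)     # standard step-size schedule
        x = (1 - gamma) * x + gamma * s
    return x
```

The modularity the abstract mentions corresponds to swapping out the no-regret strategy behind either player (for example, replacing the oracle player's follow-the-leader step).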
We show for the first time, to our knowledge, that it is possible to reconcile two seemingly contradictory objectives in online learning in zero-sum games: vanishing time-average regret and non-vanishing step sizes. This phenomenon, which we coin "fast and furious" learning in games, sets a new benchmark for what is possible both in max-min optimization as well as in multi-agent systems. Our analysis does not depend on introducing a carefully tailored dynamic. Instead, we focus on the most well-studied online dynamic, gradient descent, and on the simplest textbook class of games, two-agent two-strategy zero-sum games such as Matching Pennies. Even for this simplest of benchmarks, the best known bound on total regret prior to our work was the trivial one of O(T), which is immediately applicable even to a non-learning agent. Based on a tight understanding of the geometry of the non-equilibrating trajectories in the dual space, we prove a regret bound of Θ(√T), matching the well-known optimal bound for adaptive step sizes in the online setting. This guarantee holds for all fixed step sizes, without having to know the time horizon in advance and tune the fixed step size accordingly. As a corollary, we establish that, even with fixed learning rates, the time-average of the mixed strategies and utilities converges to the exact Nash equilibrium values.
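The time-average phenomenon is easy to observe numerically. Below is a small simulation of projected gradient descent/ascent in Matching Pennies with a fixed step size (an illustrative sketch under the standard parametrization of mixed strategies by the probability of the first action; the function and parameter names are chosen here): the iterates cycle and do not converge, yet the time-averaged strategies approach the Nash equilibrium (1/2, 1/2).

```python
def matching_pennies_gda(p0, q0, eta, T):
    """Projected gradient descent/ascent in Matching Pennies, fixed step eta.

    With mixed strategies (p, 1-p) and (q, 1-q), the payoff to the minimizer
    is (2p - 1)(2q - 1). The minimizer descends in p, the maximizer ascends
    in q, and both are clipped to [0, 1]. Returns the time-averaged (p, q).
    """
    p, q = p0, q0
    p_sum = q_sum = 0.0
    for _ in range(T):
        p_sum += p
        q_sum += q
        gp = 2.0 * (2.0 * q - 1.0)              # d(payoff)/dp
        gq = 2.0 * (2.0 * p - 1.0)              # d(payoff)/dq
        p = min(1.0, max(0.0, p - eta * gp))    # minimizer descends
        q = min(1.0, max(0.0, q + eta * gq))    # maximizer ascends
    return p_sum / T, q_sum / T
```

The last iterates orbit the equilibrium on a cycle, so only the averages equilibrate, which is exactly the distinction the abstract draws.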
Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate both the theoretical and practical performance improvement of first-order methods (FOMs) for solving extensive-form games through better design of the dilated entropy function, a class of distance-generating functions related to the domains associated with extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has only a logarithmic dependence on the branching factor of the player. This result improves the overall convergence rate of several first-order methods working with the dilated entropy function by a factor of Ω(b^d d), where b is the branching factor of the player and d is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than first-order methods despite their theoretically inferior convergence rates. Using our new weighting scheme and a practical parameter tuning procedure we show that, for the first time, the excessive gap technique, a classical first-order method, can be made faster than the counterfactual regret minimization algorithm in practice for large games, and that the aggressive stepsize scheme of CFR+ is the only reason that the algorithm is faster in practice.
We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation, since it can be viewed as the gap in Jensen's inequality for a concave functional (the minimizer over the player's actions of expected loss) defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary.
This technical note discusses proofs of convergence for first-order methods based on simple potential-function arguments. We cover methods like gradient descent (for both smooth and non-smooth settings), mirror descent, and some accelerated variants. We hope the structure and presentation of these amortized-analysis proofs will be useful as a guiding principle in learning and using these proofs.
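One such potential-function argument can be checked numerically. For gradient descent with step size 1/L on an L-smooth convex function, a standard amortized analysis uses the potential Φ_k = k·(f(x_k) - f*) + (L/2)·||x_k - x*||², which is non-increasing and hence yields the O(1/T) rate; the sketch below (illustrative names, one standard choice of potential) traces it along the iterates.

```python
import numpy as np

def gd_potential_trace(grad, f, fstar, xstar, x0, L, T):
    """Trace the potential Phi_k = k*(f(x_k) - f*) + (L/2)*||x_k - x*||^2
    along gradient descent with step size 1/L.

    The amortized analyses covered in the note show Phi_k is non-increasing
    for L-smooth convex f, which immediately gives f(x_T) - f* = O(1/T).
    """
    x = np.asarray(x0, dtype=float)
    phis = []
    for k in range(T + 1):
        dist_sq = float(np.dot(x - xstar, x - xstar))
        phis.append(k * (f(x) - fstar) + 0.5 * L * dist_sq)
        x = x - grad(x) / L          # gradient descent step
    return phis
```

Since Φ_T ≤ Φ_0 = (L/2)||x_0 - x*||² and Φ_T ≥ T·(f(x_T) - f*), the monotonicity of the trace is exactly the convergence proof in amortized form.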
We formulate an affine invariant implementation of the accelerated first-order algorithm in [Nesterov, 1983]. Its complexity bound is proportional to an affine invariant regularity constant defined with respect to the Minkowski gauge of the feasible set. We extend these results to more general problems, optimizing Hölder smooth functions using p-uniformly convex prox terms, and derive an algorithm whose complexity better fits the geometry of the feasible set and adapts to both the best Hölder smoothness parameter and the best gradient Lipschitz constant. Finally, we detail matching complexity lower bounds when the feasible set is an ℓp ball. In this setting, our upper bounds on iteration complexity for the algorithm in [Nesterov, 1983] are thus optimal in terms of target precision, smoothness and problem dimension.
We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the minimal loss she would have achieved by picking, in hindsight, the best possible action. Our goal is to understand the magnitude of the best possible (minimax) regret. We study the problem under three different assumptions for the feedback the decision maker receives: full information, and the partial information models of the so-called "semi-bandit" and "bandit" problems. In the full information case we show that the standard exponentially weighted average forecaster is a provably suboptimal strategy. For the semi-bandit model, by combining the Mirror Descent algorithm and the INF (Implicitly Normalized Forecaster) strategy, we are able to prove the first optimal bounds. Finally, in the bandit case we discuss existing results in light of a new lower bound, and suggest a conjecture on the optimal regret in that case.
We introduce a new family of margin-based regret guarantees for adversarial contextual bandit learning. Our results are based on multiclass surrogate losses. Using the ramp loss, we derive a universal margin-based regret bound for a benchmark class of real-valued regression functions. The new margin bound serves as a complete contextual bandit analogue of the classical margin bound from statistical learning. The result applies to large nonparametric classes, improving on the best known results for Lipschitz contextual bandits (Cesa-Bianchi et al., 2017) and, as a special case, generalizing the dimension-independent Banditron regret bound (Kakade et al., 2008) to arbitrary linear classes with smooth norms. On the algorithmic side, we use the hinge loss to derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regression functions. This provides the first hinge-loss-based solution to the open problem of Abernethy and Rakhlin (2009). With an additional i.i.d. assumption, we give a simple oracle-efficient algorithm whose regret matches our generic metric-entropy-based bound for sufficiently complex nonparametric classes. Under realizability assumptions, our results also yield classical regret guarantees.