We propose a two-timescale algorithm for finding local Nash equilibria in two-player zero-sum games. We first show that previous gradient-based algorithms cannot guarantee convergence to local Nash equilibria due to the existence of non-Nash stationary points. By exploiting the differential structure of the game, we construct an algorithm for which local Nash equilibria are the only attracting fixed points. We also show that the algorithm exhibits no oscillatory behavior in a neighborhood of an equilibrium, and that it has the same per-iteration complexity as other recently proposed algorithms. We conclude by validating the algorithm on two numerical examples: a toy example with multiple Nash equilibria and a non-Nash equilibrium, and the training of a small generative adversarial network (GAN).
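To make the failure mode concrete, the following sketch (a constructed quadratic example, not one taken from the paper) runs simultaneous gradient descent-ascent on a zero-sum game in which the origin attracts the coupled dynamics even though it is not a local Nash equilibrium, because the minimizing player sits at a local maximum there.

```python
import numpy as np

# Zero-sum game: x minimizes f, y maximizes f.  The function is chosen
# (illustrative example, not from the paper) so that (0, 0) attracts
# gradient descent-ascent yet is NOT a local Nash equilibrium.
def grad_x(x, y):
    return -x + 2.0 * y          # df/dx for f(x,y) = -0.5*x^2 + 2*x*y - 1.5*y^2

def grad_y(x, y):
    return 2.0 * x - 3.0 * y     # df/dy

eta = 0.1
x, y = 1.0, 1.0
for _ in range(500):
    x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)   # simultaneous GDA

print(f"GDA limit point: ({x:.2e}, {y:.2e})")   # converges to (0, 0)
# But (0, 0) is not a local Nash equilibrium: f(x, 0) = -0.5*x**2, so the
# minimizing player sits at a local MAXIMUM of its own objective there.
```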
In optimization, the negative gradient of a function denotes the direction of steepest descent. Furthermore, traveling in any direction orthogonal to the gradient maintains the value of the function. In this work, we show that these orthogonal directions, which are ignored by gradient descent, can be critical in equilibrium problems. Equilibrium problems have drawn heightened attention in machine learning due to the emergence of generative adversarial networks (GANs). We use the variational inequality framework to analyze popular training algorithms for a fundamental GAN variant: the Wasserstein linear-quadratic GAN. We show that the steepest descent direction causes divergence from the equilibrium, and that convergence to the equilibrium is guaranteed by following a particular orthogonal direction. We call this successful technique Crossing-the-Curl, named for its mathematical derivation as well as for its intuition: identify the game's axis of rotation and "cross" space in the direction of smaller "curl".
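As a hedged illustration of this idea only (the paper's general construction and scaling may differ), the sketch below uses the bilinear game f(x, y) = x*y, where the simultaneous-gradient field is a pure rotation around the equilibrium; a direction built from the antisymmetric part of the game Jacobian is orthogonal to that field and points straight at the equilibrium.

```python
import numpy as np

# Bilinear zero-sum game f(x, y) = x*y: x minimizes, y maximizes.
# Simultaneous-gradient field v = (df/dx, -df/dy) = (y, -x) rotates around (0, 0).
def v(w):
    x, y = w
    return np.array([y, -x])

# The game Jacobian of v is constant here and purely antisymmetric.
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
J_anti = 0.5 * (J - J.T)

w = np.array([1.0, 1.0])
eta = 0.05
for _ in range(400):
    direction = -J_anti.T @ v(w)            # orthogonal to v(w); equals -(x, y) here
    assert abs(direction @ v(w)) < 1e-12    # orthogonality check
    w = w + eta * direction

print("iterate after following the orthogonal direction:", w)   # approaches (0, 0)
```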
Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that, in the more realistic case where the distributions are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, which lead us to a new explanation of the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even when the generator and data distributions lie on lower-dimensional manifolds. We find that these penalties work well in practice and use them to learn high-resolution generative image models on a variety of datasets with little hyperparameter tuning.
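A minimal sketch in the spirit of this counterexample and remedy, assuming the one-dimensional "Dirac-GAN" setup (true data a point mass at 0, generator a point mass at theta, linear discriminator logit D(x) = psi*x): unregularized simultaneous updates circle the equilibrium and slowly drift away, while a zero-centered gradient penalty on the discriminator pulls the iterates in. The penalty weight and step size are illustrative, not the paper's settings.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Dirac-GAN: data is a point mass at 0, the generator a point mass at theta,
# and the discriminator is linear with logit D(x) = psi * x.
# GAN value function (up to a constant): V(theta, psi) = -log(1 + exp(psi*theta)).
def simulate(gamma, steps=2000, eta=0.05):
    theta, psi = 1.0, 1.0
    for _ in range(steps):
        s = sigmoid(psi * theta)
        d_theta = s * psi                  # generator: descent direction on V
        d_psi = -s * theta - gamma * psi   # discriminator: ascent on V minus R1-style penalty
        theta, psi = theta + eta * d_theta, psi + eta * d_psi
    return np.hypot(theta, psi)

print("no penalty (gamma=0):          distance to equilibrium =", simulate(0.0))
print("zero-centered penalty (gamma=1): distance to equilibrium =", simulate(1.0))
```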
In this paper, we analyze the numerics of common algorithms for training generative adversarial networks (GANs). Using the formalism of smooth two-player games, we analyze the gradient vector field associated with the GAN training objective. Our findings suggest that the convergence of current algorithms suffers from two factors: i) the presence of eigenvalues of the Jacobian of the gradient vector field with zero real part, and ii) eigenvalues with large imaginary part. Using these findings, we design a new algorithm that overcomes these limitations and has better convergence properties. Experimentally, we demonstrate its superiority in training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train.
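The stabilized update this line of work arrives at is consensus optimization, which adds the gradient of ½‖v(w)‖² (v being the simultaneous-gradient field) to the update. Below is a minimal sketch on a bilinear toy game with an illustrative regularization weight; it is not the paper's implementation.

```python
import numpy as np

# Bilinear zero-sum game f(x, y) = x*y.  Simultaneous-gradient field:
#   v(x, y) = (df/dx, -df/dy) = (y, -x),  with 0.5*||v||^2 = 0.5*(x^2 + y^2).
def v(w):
    x, y = w
    return np.array([y, -x])

def grad_consensus(w):
    # gradient of 0.5 * ||v(w)||^2; in this toy game it is simply (x, y)
    return w.copy()

def run(gamma, eta=0.1, steps=300):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - eta * (v(w) + gamma * grad_consensus(w))
    return np.linalg.norm(w)

print("plain simultaneous gradient steps: |w| =", run(gamma=0.0))   # spirals outward
print("consensus optimization (gamma=1):  |w| =", run(gamma=1.0))   # contracts to 0
```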
Generative adversarial networks (GANs) are a novel approach to generative modeling whose goal is to learn the distribution of real data points. They have often proven difficult to train: GANs differ from many techniques in machine learning in that they are best described as a two-player game between a discriminator and a generator. This has produced unreliability in the training process, and a general lack of understanding of how, and to what, GANs converge. The purpose of this paper is to provide a theory of GANs accessible to mathematicians, highlighting both positive and negative results. This includes identifying the problems that arise in GANs, and how the topology and game theory of GANs have, in recent years, contributed to our understanding and to improving our techniques.
Despite the growing prominence of generative adversarial networks (GANs), optimization in GANs is still a poorly understood topic. In this paper, we analyze the "gradient descent" form of GAN optimization, i.e., the natural setting where we simultaneously take small gradient steps in both generator and discriminator parameters. We show that even though GAN optimization does not correspond to a convex-concave game (even for simple parameterizations), under proper conditions, equilibrium points of this optimization procedure are still \emph{locally asymptotically stable} for the traditional GAN formulation. On the other hand, we show that the recently proposed Wasserstein GAN can have non-convergent limit cycles near equilibrium. Motivated by this stability analysis, we propose an additional regularization term for gradient descent GAN updates, which \emph{is} able to guarantee local stability for both the WGAN and the traditional GAN, and also shows practical promise in speeding up convergence and addressing mode collapse.
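A generic numerical check in the spirit of this kind of stability analysis (not the paper's argument or its proposed regularizer): linearize the simultaneous gradient field at a candidate equilibrium and inspect the eigenvalues of its Jacobian; strictly negative real parts imply local asymptotic stability of the continuous-time dynamics. The quadratic game below is purely illustrative.

```python
import numpy as np

def jacobian(field, w, eps=1e-6):
    """Finite-difference Jacobian of a vector field at w."""
    n = len(w)
    J = np.zeros((n, n))
    for i in range(n):
        dw = np.zeros(n)
        dw[i] = eps
        J[:, i] = (field(w + dw) - field(w - dw)) / (2 * eps)
    return J

# Example field: gradient descent-ascent on f(x, y) = x*y + 0.1*x**2 - 0.1*y**2,
# written as w_dot = field(w).  (Illustrative quadratic game, not from the paper.)
def field(w):
    x, y = w
    return np.array([-(y + 0.2 * x),     # x-player: descent on f
                     (x - 0.2 * y)])     # y-player: ascent on f

eigs = np.linalg.eigvals(jacobian(field, np.zeros(2)))
print("eigenvalues:", eigs)
print("locally asymptotically stable:", bool(np.all(eigs.real < 0)))
```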
Motivated by applications in optimization, game theory, and the training of generative adversarial networks, the convergence of first-order methods in min-max problems has been studied extensively. It has been recognized that they may cycle, and that when they do not, their limit points are not well understood. When they do converge, do they converge to local min-max solutions? We characterize the limit points of two basic first-order methods, gradient descent/ascent (GDA) and optimistic gradient descent/ascent (OGDA). We show that both dynamics avoid unstable critical points for almost all initializations. Moreover, for small step sizes and under mild assumptions, the set of OGDA-stable critical points is a superset of the GDA-stable critical points, which in turn is a superset of the local min-max solutions (strictly, in some cases). The connecting thread is that the behavior of these dynamics can be studied from a dynamical-systems perspective.
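To make the two update rules concrete, here is a sketch comparing GDA and OGDA on the bilinear game f(x, y) = x*y, where GDA is known to spiral away from the saddle at the origin while OGDA converges to it; the step size is illustrative.

```python
import numpy as np

# f(x, y) = x*y: the x-player minimizes, the y-player maximizes.
# Joint update field F(w) = (df/dx, -df/dy) = (y, -x); the saddle is at (0, 0).
def F(w):
    x, y = w
    return np.array([y, -x])

eta, steps = 0.1, 500

# Gradient Descent/Ascent (GDA): w_{t+1} = w_t - eta * F(w_t)
w = np.array([1.0, 1.0])
for _ in range(steps):
    w = w - eta * F(w)
print("GDA  |w_T| =", np.linalg.norm(w))    # grows: GDA spirals outward

# Optimistic GDA (OGDA): w_{t+1} = w_t - 2*eta*F(w_t) + eta*F(w_{t-1})
w, w_prev = np.array([1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(steps):
    w, w_prev = w - 2 * eta * F(w) + eta * F(w_prev), w
print("OGDA |w_T| =", np.linalg.norm(w))    # shrinks: OGDA converges to (0, 0)
```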
Gradient-based optimization methods are the most popular choice for finding local optima of classical minimization and saddle point problems. Here, we highlight a systemic issue of the gradient dynamics that arise for saddle point problems, namely the presence of undesired stable stationary points that are not local optima. We propose a novel optimization approach that exploits curvature information in order to escape from these undesired stationary points. We prove that different optimization methods equipped with curvature exploitation, including the gradient method and Adagrad, can escape non-optimal stationary points. We also provide empirical results on common saddle point problems which confirm the advantage of curvature exploitation.
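A simplified sketch of the curvature-exploitation idea under illustrative assumptions (not the paper's exact algorithm or sign rule): whenever the minimizing player's Hessian has a negative eigenvalue, take an extra step along the corresponding eigendirection, so the dynamics cannot settle at the undesired stationary point. The quadratic game is the same constructed example used earlier, for which (0, 0) attracts plain gradient descent-ascent but is not a local optimum for the minimizer.

```python
import numpy as np

# Quadratic zero-sum game f(x,y) = -0.5*x^2 + 2*x*y - 1.5*y^2: (0, 0) attracts
# plain gradient descent-ascent, yet the x-player's curvature there is negative.
def grads(x, y):
    return (-x + 2.0 * y, 2.0 * x - 3.0 * y)    # (df/dx, df/dy)

hess_xx = -1.0      # d^2f/dx^2 is constant and negative for this game

eta, eps = 0.1, 0.05
x, y = 1.0, 1.0
for _ in range(500):
    gx, gy = grads(x, y)
    x, y = x - eta * gx, y + eta * gy           # plain GDA step
    if hess_xx < 0:                             # curvature exploitation (simplified):
        x += eps * (1.0 if x >= 0 else -1.0)    #   extra step along the negative-
                                                #   curvature eigendirection
print("final iterate with curvature step:", (round(x, 3), round(y, 3)))
# The iterates no longer settle at the spurious stationary point (0, 0).
```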
Motivated by the pursuit of a systematic computational and algorithmic understanding of Generative Adversarial Networks (GANs), we present a simple yet unified non-asymptotic local convergence theory for smooth two-player games, which subsumes several discrete-time gradient-based saddle point dynamics. The analysis reveals the surprising nature of the off-diagonal interaction term as both a blessing and a curse. On the one hand, this interaction term explains the origin of the slowdown effect in the convergence of Simultaneous Gradient Ascent (SGA) to stable Nash equilibria. On the other hand, for the unstable equilibria, exponential convergence can be proved thanks to the interaction term, for four modified dynamics proposed to stabilize GAN training: Optimistic Mirror Descent (OMD), Consensus Optimization (CO), Implicit Updates (IU) and Predictive Method (PM). The analysis uncovers the intimate connections among these stabilizing techniques, and provides detailed characterization on the choice of learning rate. As a by-product, we present a new analysis for OMD proposed in Daskalakis, Ilyas, Syrgkanis, and Zeng [2017] with improved rates.
Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions so that gradient descent converges to saddle points where ∇²f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [12]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.
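A toy illustration of the statement (not of the proof technique): for f(x, y) = x² − y², the origin is a strict saddle, and gradient descent escapes it from any initialization with y ≠ 0, i.e. from all but a measure-zero set of starting points.

```python
import numpy as np

# f(x, y) = x**2 - y**2 has a strict saddle at the origin: its Hessian
# diag(2, -2) has a strictly negative eigenvalue along y.
def grad(w):
    x, y = w
    return np.array([2.0 * x, -2.0 * y])

eta = 0.1
rng = np.random.default_rng(0)
w = 1e-3 * rng.normal(size=2)       # generic initialization near the saddle
for _ in range(100):
    w = w - eta * grad(w)

print("final iterate:", w)          # |y| has grown: gradient descent escapes
# Only initializations with y exactly 0 (a measure-zero set) converge to the saddle.
```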
Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them "winner" and "loser". If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games such as rock-paper-scissors can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource-allocation games and find that PSRO_rN consistently outperforms the existing alternatives.
We propose a family of optimization methods that achieve linear convergence using first-order gradient information, for a class of convex functions much larger than the smooth and strongly convex ones. This larger class includes functions whose second derivative may be singular or unbounded at the minimum. Our methods are discretizations of conformal Hamiltonian dynamics, which generalize the classical momentum method to model the motion of a particle with a non-standard kinetic energy, exposed to an additional force and to the gradient field of the function of interest. They are first-order in the sense that they require only gradient computation. Crucially, however, the kinetic gradient map can be designed to incorporate information about the convex conjugate in a way that allows linear convergence on convex functions that may be non-smooth or not strongly convex. We study one implicit and two explicit methods. For one explicit method, we provide conditions for convergence to stationary points of non-convex functions. For all of them, we provide conditions on the pairing of convex function and kinetic energy that guarantee linear convergence, and show that these conditions can be satisfied by functions with power growth. In sum, these methods expand the class of convex functions on which linear convergence is possible with first-order computation.
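For intuition only, here is a sketch of the special case with standard kinetic energy k(p) = ½‖p‖², where a simple explicit discretization of the conformal Hamiltonian dynamics ẋ = ∇k(p), ṗ = −∇f(x) − γp recovers a classical momentum method. The paper's methods instead use non-standard kinetic maps matched to f, which this sketch does not reproduce; the test function and constants are illustrative.

```python
import numpy as np

# Conformal Hamiltonian dynamics for minimizing f:
#   x_dot = grad_k(p),   p_dot = -grad_f(x) - gamma * p
# With standard kinetic energy k(p) = 0.5*||p||^2, grad_k(p) = p, and a simple
# explicit discretization gives a classical momentum ("heavy ball") method.
def grad_f(x):
    return 2.0 * x            # f(x) = ||x||^2, used purely for illustration

def grad_k(p):
    return p                  # standard kinetic energy; the paper generalizes this

gamma, eps = 1.0, 0.1
x = np.array([3.0, -2.0])
p = np.zeros_like(x)
for _ in range(200):
    p = p + eps * (-grad_f(x) - gamma * p)   # momentum / friction update
    x = x + eps * grad_k(p)                  # position update
print("minimizer estimate:", x)              # approaches the minimum at 0
```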
Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notoriously hard to train. A common way of tackling this issue has been to propose new formulations of the GAN objective. Surprisingly, however, few studies have considered optimization methods designed for this kind of adversarial training. In this work, we cast the GAN optimization problem in the general variational inequality framework. Drawing on the mathematical programming literature, we counter some common misconceptions about the difficulty of saddle point optimization, and propose extending techniques designed for variational inequalities to the training of GANs. We apply averaging, extrapolation, and a novel, computationally cheaper variant that we call extrapolation from the past to the stochastic gradient method (SGD) and to Adam.
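A hedged sketch of the two extrapolation variants on the bilinear game with operator F(w) = (y, −x), where plain simultaneous gradient steps diverge; the stochastic and Adam variants discussed in the abstract are not reproduced, and the step size is illustrative.

```python
import numpy as np

# Bilinear zero-sum game f(x, y) = x*y; variational-inequality operator:
def F(w):
    x, y = w
    return np.array([y, -x])

eta, steps = 0.1, 500

# Extrapolation (extragradient): look ahead with F(w_t), then update from w_t
# using the operator evaluated at the extrapolated point.
w = np.array([1.0, 1.0])
for _ in range(steps):
    w_half = w - eta * F(w)
    w = w - eta * F(w_half)
print("extragradient           |w_T| =", np.linalg.norm(w))

# Extrapolation from the past: reuse the stored F(w_{t-1/2}) for the look-ahead,
# so only one fresh operator evaluation is needed per iteration.
w = np.array([1.0, 1.0])
F_prev = np.zeros(2)
for _ in range(steps):
    w_half = w - eta * F_prev
    F_prev = F(w_half)
    w = w - eta * F_prev
print("extrapolation from past |w_T| =", np.linalg.norm(w))
```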
Except in some special cases, current training methods for Generative Adversarial Networks (GANs) are at best guaranteed to converge to a "local Nash equilibrium" (LNE). Such an LNE, however, can be arbitrarily far from an actual Nash equilibrium (NE), which implies that there are no guarantees on the quality of the found generator or classifier. This paper proposes modeling GANs explicitly as finite games in mixed strategies, thereby ensuring that every LNE is an NE. With this formulation, we propose a solution method that is proven to converge to a resource-bounded Nash equilibrium (RB-NE): by increasing computational resources we can find better solutions. We empirically demonstrate that our method is less prone to typical GAN problems such as mode collapse, and produces solutions that are less exploitable than those produced by GANs and MGANs, and that closely resemble the theoretical predictions about NEs.
The models surveyed include generalized Pólya urns, reinforced random walks, interacting urn models, and continuous reinforced processes. Emphasis is on methods and results, with sketches provided of some proofs. Applications are discussed in statistics, biology, economics and a number of other areas.
Motivated by applications in game theory, optimization, and generative adversarial networks, recent work of Daskalakis et al.~\cite{DISZ17} and the follow-up work of Liang and Stokes~\cite{LiangS18} established that a variant of the widely used gradient descent/ascent procedure, called "Optimistic Gradient Descent/Ascent (OGDA)", exhibits last-iterate convergence to saddle points in {\em unconstrained} convex-concave min-max optimization problems. We show that the same holds true for the more general problem of {\em constrained} min-max optimization under a variant of the no-regret Multiplicative-Weights-Update method called "Optimistic Multiplicative-Weights Update (OMWU)". This answers an open question of Syrgkanis et al.~\cite{SALS15}. The proof of our result requires fundamentally different techniques from those that exist in the no-regret learning literature and in the aforementioned papers. We show that OMWU monotonically improves the Kullback-Leibler divergence of the current iterate to the (appropriately normalized) min-max solution until it enters a neighborhood of the solution. Inside that neighborhood, we show that OMWU becomes a contracting map converging to the exact solution. We believe that our techniques will be useful in the analysis of the last iterate of other learning algorithms.
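A hedged sketch of an OMWU update for a two-player zero-sum matrix game min_x max_y xᵀAy on the probability simplex, using the common optimistic correction of "twice the current gradient minus the previous one"; the normalization details and the step-size conditions of the paper are not reproduced, and the game and learning rate are illustrative.

```python
import numpy as np

def normalize(p):
    return p / p.sum()

# Zero-sum matrix game: the row player x minimizes x^T A y over the simplex,
# the column player y maximizes it.  This game (Matching Pennies) has a unique
# Nash equilibrium at x = y = (1/2, 1/2).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

eta, steps = 0.05, 5000
x = np.array([0.9, 0.1])
y = np.array([0.2, 0.8])
gx_prev, gy_prev = A @ y, A.T @ x          # gradients of the previous round

for _ in range(steps):
    gx, gy = A @ y, A.T @ x                # current-round gradients
    # optimistic correction: use twice the current gradient minus the previous one
    x = normalize(x * np.exp(-eta * (2 * gx - gx_prev)))
    y = normalize(y * np.exp(+eta * (2 * gy - gy_prev)))
    gx_prev, gy_prev = gx, gy

print("last iterate x:", np.round(x, 3))   # approaches (0.5, 0.5)
print("last iterate y:", np.round(y, 3))
```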
We show, for the first time to our knowledge, that it is possible to reconcile two seemingly contradictory objectives in online learning in zero-sum games: vanishing time-average regret and non-vanishing step sizes. This phenomenon, which we coin "fast and furious" learning in games, sets a new benchmark for what is possible both in min-max optimization and in multi-agent systems. Our analysis does not rely on introducing a carefully tailored dynamic. Instead, we focus on the most well-studied online dynamic, gradient descent. Likewise, we focus on the simplest textbook class of games, two-agent two-strategy zero-sum games such as Matching Pennies. Even for this simplest benchmark, the best known bound on total regret prior to our work was the trivial O(T), which applies immediately even to a non-learning agent. Based on a tight understanding of the geometry of the non-equilibrating trajectories in the dual space, we prove a regret bound of Θ(√T), matching the well-known optimal bound for adaptive step sizes in the online setting; this guarantee holds for all fixed step sizes, without having to know the time horizon in advance and tune the fixed step size accordingly. As a corollary, we establish that even with fixed learning rates, the time averages of the mixed strategies and of the utilities converge to their exact Nash equilibrium values.
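A small sketch of the corollary on Matching Pennies: with a fixed step size, projected online gradient ascent/descent produces strategies that keep rotating, yet their time averages approach the Nash equilibrium (1/2, 1/2). The regret analysis itself is not reproduced, and the step size and horizon below are illustrative.

```python
import numpy as np

# Matching Pennies with mixed strategies x, y in [0, 1] (probability of "heads").
# Expected payoff to the row player: u(x, y) = (2x - 1)*(2y - 1); the row player
# runs online gradient ascent on u, the column player gradient descent.
def clip01(z):
    return min(1.0, max(0.0, z))

eta, steps = 0.05, 20000
x, y = 0.9, 0.2
x_sum = y_sum = 0.0
for _ in range(steps):
    gx = 2.0 * (2.0 * y - 1.0)          # du/dx
    gy = 2.0 * (2.0 * x - 1.0)          # du/dy
    x, y = clip01(x + eta * gx), clip01(y - eta * gy)   # fixed step size
    x_sum += x
    y_sum += y

print("last iterates:    ", (round(x, 3), round(y, 3)))             # keep rotating
print("time-average play:", (round(x_sum / steps, 3), round(y_sum / steps, 3)))
# The time averages approach the Nash equilibrium (0.5, 0.5).
```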
The focus of this thesis is on solving a sequence of optimization problems that change over time in a structured manner. This type of problem naturally arises in contexts as diverse as channel estimation, target tracking, sequential machine learning, and repeated games. Due to the time-varying nature of these problems, it is necessary to determine new solutions as the problems change in order to ensure good solution quality. However, since the problems change over time in a structured manner, it is beneficial to exploit solutions to the previous optimization problems in order to efficiently solve the current optimization problem. The first problem considered is sequentially solving minimization problems that change slowly, in the sense that the gap between successive minimizers is bounded in norm. The minimization problems are solved by sequentially applying a selected optimization algorithm, such as stochastic gradient descent (SGD), based on drawing a number of samples in order to carry out a desired number of iterations. Two tracking criteria are introduced to evaluate approximate minimizer quality: one based on being accurate with respect to the mean trajectory, and the other based on being accurate in high probability (IHP). Knowledge of the bound on how the minimizers change, combined with properties of the chosen optimization algorithm, is used to select the number of samples needed to meet the desired tracking criterion. Next, it is not assumed that the bound on how the minimizers change is known. A technique to estimate the change in minimizers is provided along with analysis to show that eventually the estimate upper bounds the change in minimizers. This estimate of the change in minimizers is combined with the previous analysis to provide sample size selection rules to ensure that the mean or IHP tracking criterion is met. Simulations are used to confirm that the estimation approach provides the desired tracking accuracy in practice.
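A toy illustration of the setting only, not of the thesis's tracking criteria or sample-size selection rules: SGD re-applied at each time step, warm-started from the previous solution, tracks the minimizer of a slowly drifting quadratic objective. The drift, noise level, and iteration counts are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Time-varying problem: minimize f_t(w) = 0.5*||w - theta_t||^2, where the
# minimizer theta_t drifts slowly (bounded change between successive problems).
dim, drift, noise = 5, 0.01, 0.1
theta = np.zeros(dim)
w = np.zeros(dim)

def noisy_grad(w, theta):
    # stochastic gradient of f_t at w: (w - theta) plus observation noise
    return (w - theta) + noise * rng.normal(size=dim)

errors = []
for t in range(2000):
    theta = theta + drift * rng.normal(size=dim)     # the problem changes slowly
    for _ in range(5):                               # a few SGD iterations per step,
        w = w - 0.2 * noisy_grad(w, theta)           #   warm-started at the previous w
    errors.append(np.linalg.norm(w - theta))

print("mean tracking error over the last 500 steps:", np.mean(errors[-500:]))
```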
We study whether a depth two neural network can learn another depth two network using gradient descent. Assuming a linear output node, we show that the question of whether gradient descent converges to the target function is equivalent to the following question in electrodynamics: Given k fixed protons in R^d, and k electrons, each moving due to the attractive force from the protons and repulsive force from the remaining electrons, whether at equilibrium all the electrons will be matched up with the protons, up to a permutation. Under the standard electrical force, this follows from the classic Earnshaw's theorem. In our setting, the force is determined by the activation function and the input distribution. Building on this equivalence, we prove the existence of an activation function such that gradient descent learns at least one of the hidden nodes in the target network. Iterating, we show that gradient descent can be used to learn the entire network one node at a time.
This paper examines the convergence of payoffs and strategies in Erev and Roth's model of reinforcement learning. When all players use this rule it eliminates iteratively dominated strategies and in two-person constant-sum games average payoffs converge to the value of the game. Strategies converge in constant-sum games with unique equilibria if they are pure or if they are mixed and the game is 2 × 2. The long-run behaviour of the learning rule is governed by equations related to Maynard Smith's version of the replicator dynamic. Properties of the learning rule against general opponents are also studied.
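A minimal sketch of a cumulative-reinforcement rule in the spirit of Erev and Roth, applied to Matching Pennies shifted so that payoffs are positive; the paper's exact specification (including experimentation and cutoff parameters) is not reproduced, and the payoff shift and horizon are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matching Pennies payoffs for the row player, shifted to be positive
# (cumulative reinforcement rules of this type require nonnegative payoffs).
A = np.array([[2.0, 0.0],
              [0.0, 2.0]])
B = 2.0 - A             # column player's payoff (constant-sum game, value 1 each)

q_row = np.ones(2)      # propensities: play an action with probability proportional to these
q_col = np.ones(2)
row_payoffs = []

for t in range(20000):
    i = rng.choice(2, p=q_row / q_row.sum())
    j = rng.choice(2, p=q_col / q_col.sum())
    q_row[i] += A[i, j]             # reinforce the chosen action by its realized payoff
    q_col[j] += B[i, j]
    row_payoffs.append(A[i, j])

print("row player's mixed strategy:", np.round(q_row / q_row.sum(), 3))  # drifts toward (0.5, 0.5)
print("average row payoff:", np.round(np.mean(row_payoffs), 3))          # near the game value of 1
```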