We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry in a search space whose total space is equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining region, where the factorized objective has a large gradient. To the best of our knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers such that the factorized objective is geodesically convex. To prove our results, we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions are also based on a result of independent interest stating that the geodesic ball centered at Y with a radius of 1/3 of the least singular value of Y is a geodesically convex set under the Riemannian quotient geometry; as a corollary, this also implies a quantitative bound on the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.
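To make the factorized setting concrete, here is a minimal sketch of vanilla gradient descent under the Burer-Monteiro factorization f(Y) = phi(Y Y^T), taking phi to be a toy least-squares objective; the choice of phi, the step size, and the dimensions are our own illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3

# Ground-truth rank-r PSD matrix M_star; illustrative choice of phi:
# phi(M) = 0.5 * ||M - M_star||_F^2.
Y_star = rng.standard_normal((n, r))
M_star = Y_star @ Y_star.T

def grad_f(Y):
    # f(Y) = phi(Y Y^T); the chain rule gives grad f(Y) = 2 * grad phi(Y Y^T) @ Y.
    return 2.0 * (Y @ Y.T - M_star) @ Y

Y = rng.standard_normal((n, r))          # random initialization
eta = 0.1 / np.linalg.norm(M_star, 2)    # conservative step size
for _ in range(2000):
    Y -= eta * grad_f(Y)

print("relative error:", np.linalg.norm(Y @ Y.T - M_star) / np.linalg.norm(M_star))
```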
In this paper, we consider the geometric landscape connection between the widely studied manifold and factorization formulations in low-rank positive semidefinite (PSD) and general matrix optimization. We establish a sandwich relation on the spectra of the Riemannian and Euclidean Hessians at first-order stationary points (FOSPs). As a consequence, we obtain an equivalence between the sets of FOSPs, second-order stationary points (SOSPs), and strict saddles of the manifold and factorization formulations. In addition, we show the sandwich relation can be used to transfer more quantitative geometric properties from one formulation to another. Similarities and differences in the landscape connection under the PSD case and the general case are discussed. To the best of our knowledge, this is the first geometric landscape connection between the manifold and factorization formulations for handling rank constraints, and it provides a geometric explanation for the similar empirical performance of factorization and manifold approaches in low-rank matrix optimization observed in the literature. For general low-rank matrix optimization, we also provide the landscape connection between the two factorization formulations (unregularized and regularized). By applying these geometric landscape connections, in particular the sandwich relation, we are able to solve unanswered questions in the literature and establish stronger results in applications: the geometric analysis of phase retrieval, well-conditioned low-rank matrix optimization, and the role of regularization in factorization formulations arising in machine learning and signal processing.
In this paper, we propose the {\it \underline{R}ecursive} {\it \underline{I}mportance} {\it \underline{S}ketching} algorithm for {\it \underline{R}ank}-constrained least squares {\it \underline{O}ptimization} (RISRO). The key step of RISRO is recursive importance sketching, a new sketching framework based on deterministically designed recursive projections, which significantly differs from the randomized sketching in the literature \citep{Mahoney2011randomized, Woodruff2014sketching}. Several existing algorithms in the literature can be reinterpreted under this new sketching framework, and RISRO offers clear advantages over them. RISRO is easy to implement and computationally efficient, and the core procedure in each iteration is to solve a dimension-reduced least squares problem. We establish local quadratic-linear and quadratic rates of convergence for RISRO under some mild conditions. We also discover a deep connection between RISRO and the Riemannian Gauss-Newton algorithm on fixed-rank matrices. The effectiveness of RISRO is demonstrated in two applications in machine learning and statistics: low-rank matrix trace regression and phase retrieval. Simulation studies demonstrate the superior numerical performance of RISRO.
We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates with a low Tucker rank parameter tensor/matrix without prior knowledge of its intrinsic rank. We propose the Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of the unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantees for general tensor-on-tensor regression by showing that RGD and RGN converge linearly and quadratically, respectively, to a statistically optimal estimate in both the rank correctly-parameterized and over-parameterized settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without modifications to their implementation. We also give the first rigorous evidence for the statistical-computational gap in scalar-on-tensor regression under the low-degree polynomials framework. Our theory demonstrates a "blessing of the statistical-computational gap" phenomenon: in a wide range of scenarios in tensor-on-tensor regression for tensors of order three or higher, the computationally required sample size under moderate rank over-parameterization matches the one required with exact rank knowledge, while there is no such benefit in the matrix setting. This shows that moderate rank over-parameterization is essentially "cost-free" in terms of the sample size in tensor-on-tensor regression of order three or higher. Finally, we conduct simulation studies to show the advantages of our proposed methods and corroborate our theoretical findings.
In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We consider an efficient Riemannian Gauss-Newton (RGN) method for low Tucker rank tensor estimation. In contrast to the generic (super)linear convergence guarantees for RGN in the literature, we prove the first local quadratic convergence guarantee of RGN for low-rank tensor estimation in the noisy setting under some regularity conditions, and we provide the corresponding estimation error upper bounds. A deterministic estimation error lower bound, which matches the upper bound, is also provided, demonstrating the statistical optimality of RGN. The merit of RGN is illustrated through two machine learning applications: tensor regression and tensor SVD. Finally, we provide simulation results to corroborate our theoretical findings.
The problem of finding a unique low-dimensional decomposition of a given matrix is fundamental and recurrent in many areas. In this paper, we study the problem of seeking a unique decomposition of a low-rank matrix $Y \in \mathbb{R}^{p \times n}$ that admits a sparse representation. Specifically, we consider $Y = AX \in \mathbb{R}^{p \times n}$, where the matrix $A \in \mathbb{R}^{p \times r}$ has full column rank with $r < \min\{n, p\}$, and the matrix $X \in \mathbb{R}^{r \times n}$ is element-wise sparse. We prove that this sparse decomposition of $Y$ can be uniquely identified, up to some intrinsic signed permutation. Our approach relies on solving a nonconvex optimization problem constrained over the unit sphere. Our geometric analysis of the nonconvex optimization landscape shows that any {\em strict} local solution is close to the ground-truth solution and can be recovered via a simple data-driven initialization followed by any second-order algorithm. Finally, we corroborate these theoretical results with numerical experiments.
In non-smooth stochastic optimization, we establish the non-convergence of stochastic subgradient descent (SGD) to the critical points recently called active strict saddles by Davis and Drusvyatskiy. Such points lie on a manifold $M$ where the function $f$ has a direction of second-order negative curvature. Off this manifold, the norm of the Clarke subdifferential of $f$ is lower-bounded. We require two conditions on $f$. The first assumption is a Verdier stratification condition, which is a refinement of the popular Whitney stratification. It allows us to establish a reinforced version of the projection formula of Bolte \emph{et al.} for Whitney stratifiable functions, which is of independent interest. The second assumption, termed the angle condition, allows us to control the distance of the iterates to $M$. When $f$ is weakly convex, our assumptions are generic. Consequently, generically in the class of definable weakly convex functions, SGD converges to a local minimizer.
We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze the non-asymptotic convergence rate of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting in which RFW can attain a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian "linear" oracle required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here, too, the Riemannian "linear" oracle can be solved in closed form; we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.
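As a rough structural illustration of the Frank-Wolfe template the abstract builds on, the sketch below instantiates the generic loop in the Euclidean special case (straight-line "geodesics", a box constraint set, and a coordinate-wise linear oracle); the toy problem and all names are our own assumptions. The Riemannian version would swap in the manifold's geodesic and the closed-form Riemannian "linear" oracle described in the abstract.

```python
import numpy as np

def frank_wolfe(grad, linear_oracle, geodesic, x0, iters=200):
    """Generic Frank-Wolfe template: the Riemannian version replaces the
    straight-line step with a geodesic and the linear minimization with a
    Riemannian 'linear' oracle over the constraint set."""
    x = x0
    for k in range(iters):
        z = linear_oracle(grad(x), x)   # direction-finding subproblem
        step = 2.0 / (k + 2)            # classic diminishing step size
        x = geodesic(x, z, step)        # move from x toward z
    return x

# Euclidean instantiation: minimize 0.5 * ||x - b||^2 over the box [lo, hi]^n.
rng = np.random.default_rng(1)
n = 20
b = rng.standard_normal(n)
lo, hi = -0.5, 0.5

grad = lambda x: x - b
oracle = lambda g, x: np.where(g > 0, lo, hi)   # minimize <g, z> coordinate-wise
geo = lambda x, z, t: x + t * (z - x)           # straight-line 'geodesic'

x = frank_wolfe(grad, oracle, geo, x0=np.zeros(n))
print("distance to clipped target:", np.linalg.norm(x - np.clip(b, lo, hi)))
```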
We consider minimizing $f(X) = \phi(XX^{T})$ with gradient descent over an $n \times r$ factor matrix $X$, where $\phi$ is an underlying smooth convex cost function defined over $n \times n$ matrices. While only a second-order stationary point $X$ can be found in reasonable time, if $X$ is additionally rank-deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be over-parameterized with respect to the rank $r^{\star}$ of the global minimizer. Unfortunately, over-parameterization significantly slows down the convergence of gradient descent, from a linear rate with $r = r^{\star}$ to a sublinear rate when $r > r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the over-parameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.
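A minimal sketch of one such right-preconditioned update, assuming a preconditioner of the form (X^T X + eps*I)^{-1} with the damping eps tied to the residual; this is our reading of this line of work, and the paper's exact construction may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r_star, r = 40, 2, 6                 # search rank r over-parameterizes r_star

Xs = rng.standard_normal((n, r_star))
M_star = Xs @ Xs.T                      # low-rank global minimizer of phi

def grad_f(X):
    # f(X) = 0.5 * ||X X^T - M_star||_F^2  =>  grad f(X) = 2 (X X^T - M_star) X
    return 2.0 * (X @ X.T - M_star) @ X

X = 0.1 * rng.standard_normal((n, r))
eta = 0.2                               # constant step; the preconditioner rescales
for _ in range(1500):
    G = grad_f(X)
    eps = np.linalg.norm(X @ X.T - M_star)          # damping tied to the residual
    P = np.linalg.inv(X.T @ X + eps * np.eye(r))    # inexpensive r x r inverse
    X -= eta * G @ P                                # preconditioned gradient step

print("residual:", np.linalg.norm(X @ X.T - M_star))
```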
We show that, on the manifold of fixed-rank symmetric positive semidefinite matrices, the Riemannian gradient descent algorithm almost surely escapes some spurious critical points on the boundary of the manifold. Our result is the first to partially overcome the incompleteness of the low-rank matrix manifold without changing the vanilla Riemannian gradient descent algorithm. The spurious critical points are some rank-deficient matrices that capture only part of the eigen components of the ground truth. Unlike classical strict saddle points, they exhibit very singular behavior. We show that using the dynamical low-rank approximation and a rescaled gradient flow, some of the spurious critical points can be converted into classical strict saddle points in the parameterized domain, which leads to the desired result. Numerical experiments are provided to support our theoretical findings.
Motivated by the emerging role of interpolating machines in signal processing and machine learning, this work considers the computational aspects of over-parametrized matrix factorization. In this context, the optimization landscape may contain spurious stationary points (SSPs), which are shown to be full-rank matrices. The presence of these SSPs means that it is impossible to hope for any global guarantees in over-parametrized matrix factorization. For example, when initialized at an SSP, gradient flow will remain trapped there forever. Nevertheless, despite these SSPs, we establish in this work that the gradient flow of the corresponding merit function converges to a global minimizer, provided that its initialization is rank-deficient and sufficiently close to the feasible set of the optimization problem. We numerically observe that a heuristic discretization of the proposed gradient flow, inspired by primal-dual algorithms, is successful when initialized randomly. Our result is in sharp contrast with local refinement methods, which require an initialization close to the optimal set of the optimization problem. More specifically, we successfully avoid the traps set by the SSPs because the gradient flow remains rank-deficient at all times, not because there are no SSPs nearby; the latter is the case for local refinement methods. Moreover, the widely used restricted isometry property plays no role in our main result.
Many recent problems in signal processing and machine learning, such as compressed sensing, image restoration, matrix/tensor recovery, and non-negative matrix factorization, can be cast as constrained optimization. Projected gradient descent is a simple yet efficient method for solving such constrained optimization problems. Local convergence analysis furthers our understanding of its asymptotic behavior near the solution, offering sharper bounds on the convergence rate compared to global convergence analysis. However, local guarantees often appear scattered in problem-specific areas of machine learning and signal processing. This manuscript presents a unified framework for the local convergence analysis of projected gradient descent in the context of constrained least squares. The proposed analysis offers insights into pivotal local convergence properties, such as the conditions for linear convergence, the region of convergence, the exact asymptotic rate of convergence, and the bound on the number of iterations needed to reach a certain level of accuracy. To demonstrate the applicability of the proposed approach, we present a recipe for the convergence analysis of PGD and demonstrate it via a beginning-to-end application on four fundamental problems: linearly constrained least squares, sparse recovery, least squares with the unit norm constraint, and matrix completion.
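As a small worked instance of one of the four problems mentioned (least squares with the unit norm constraint), the following sketch runs projected gradient descent with the standard 1/L step size; the problem sizes and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 100, 30
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
x_true /= np.linalg.norm(x_true)          # ground truth on the unit sphere
b = A @ x_true                            # noiseless measurements

def project_unit_norm(x):
    return x / np.linalg.norm(x)          # projection onto the unit sphere

eta = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L step size for 0.5 * ||Ax - b||^2
x = project_unit_norm(rng.standard_normal(n))
for _ in range(500):
    x = project_unit_norm(x - eta * A.T @ (A @ x - b))

print("recovery error:", np.linalg.norm(x - x_true))
```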
Riemannian optimization is a principled framework for solving optimization problems in which the desired optimum is constrained to a smooth manifold $\mathcal{M}$. Algorithms designed in this framework usually require a geometric description of the manifold, which typically includes tangent spaces, retractions, and gradients of the cost function. However, in many cases only a subset of these elements can be accessed (or none at all), due to lack of information or intractability. In this paper, we propose a novel approach that can perform approximate Riemannian optimization in such cases, where the constraining manifold is a submanifold of $\mathbb{R}^{d}$. At the bare minimum, our method requires only a noiseless sample set of the cost function $(x_{i}, y_{i}) \in \mathcal{M} \times \mathbb{R}$ and the intrinsic dimension of the manifold $\mathcal{M}$. Using these samples, and leveraging the Manifold-MLS framework (Sober and Levin 2020), we construct approximations of the missing components that enjoy provable guarantees, and we analyze their computational cost. If some of the components are given analytically (e.g., if the cost function and its gradient are given explicitly, or if the tangent spaces can be computed), the algorithm can easily be adapted to use the exact expressions instead of the approximations. We analyze the global convergence of Riemannian gradient-based methods using our approach, and we empirically demonstrate the strength of this method, together with a conjugate-gradients-type method based on similar principles.
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.
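A minimal sketch of the perturbation mechanism in the spirit described (jitter the iterate when the gradient is small and no perturbation occurred recently); all constants and the toy saddle function are our own illustrative choices, not the paper's.

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.05, g_thresh=1e-3, radius=1e-2,
                 t_gap=50, iters=5000, seed=0):
    """Gradient descent with occasional small perturbations when the gradient
    is small; constants are illustrative."""
    rng = np.random.default_rng(seed)
    x, last_perturb = np.asarray(x0, dtype=float).copy(), -(t_gap + 1)
    for t in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh and t - last_perturb > t_gap:
            xi = rng.standard_normal(x.shape)
            x = x + radius * xi / np.linalg.norm(xi)   # jitter near a possible saddle
            last_perturb = t
        else:
            x = x - eta * g
    return x

# f(x, y) = (x^2 - 1)^2 + y^2 has a strict saddle at the origin, minima at (+/-1, 0).
grad = lambda z: np.array([4 * z[0] * (z[0] ** 2 - 1), 2 * z[1]])
print("converged to:", perturbed_gd(grad, np.zeros(2)))
```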
We consider minimizing a twice-differentiable, $L$-smooth, and $\mu$-strongly convex objective $\phi$ over an $n \times n$ positive semidefinite matrix $M \succeq 0$, under the assumption that the minimizer $M^{\star}$ has low rank $r^{\star} \ll n$. Following the Burer-Monteiro approach, we instead minimize the nonconvex objective $f(X) = \phi(XX^{T})$ over a factor matrix $X$ of size $n \times r$. This substantially reduces the number of variables from $O(n^{2})$ to as few as $O(n)$ and enforces positive semidefiniteness for free, but at the cost of giving up the convexity of the original problem. In this paper, we prove that if the search rank $r \ge r^{\star}$ is over-parameterized by a constant factor with respect to the true rank $r^{\star}$, namely if $r > \frac{1}{4}(L/\mu - 1)^{2} r^{\star}$, then despite nonconvexity, local optimization is guaranteed to converge globally from any initial point to the global optimum. This significantly improves upon a previous rank over-parameterization threshold of $r \ge n$, which is known to be sharp if $\phi$ is allowed to be nonsmooth and/or non-strongly convex, but which would increase the number of variables back up to $O(n^{2})$. Conversely, without rank over-parameterization, we prove that such a global guarantee is possible only if $\phi$ is almost perfectly conditioned, with a condition number $L/\mu < 3$. Therefore, we conclude that a small amount of over-parameterization can lead to large improvements in the theoretical guarantees for the nonconvex Burer-Monteiro factorization.
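To see the claimed behavior in the best-conditioned case, the sketch below over-parameterizes the search rank and runs plain gradient descent from several random initial points. Here phi(M) = 0.5 * ||M - M_star||_F^2 has L = mu = 1, so the stated threshold holds for any r >= r_star; the dimensions and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r_star = 30, 2
r = 2 * r_star                      # modest constant-factor over-parameterization

Xs = rng.standard_normal((n, r_star))
M_star = Xs @ Xs.T

def grad_f(X):
    # f(X) = phi(X X^T) with phi(M) = 0.5 * ||M - M_star||_F^2 (L/mu = 1).
    return 2.0 * (X @ X.T - M_star) @ X

errs = []
for trial in range(5):              # several arbitrary initial points
    X = rng.standard_normal((n, r))
    eta = 0.1 / np.linalg.norm(M_star, 2)
    for _ in range(5000):
        X -= eta * grad_f(X)
    errs.append(np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star))

print("relative errors across random starts:", np.round(errs, 6))
```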
From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast as min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] recently showed that geodesic convex-concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean convex-concave algorithms is necessary. We answer this question in the negative: we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic and non-smooth cases, where RCEG and Riemannian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on the curvature of the manifold.
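For intuition about the extragradient mechanics that RCEG transplants to manifolds (the Riemannian version adds curvature corrections via the exponential map and its inverse), here is the Euclidean extragradient method on a strongly-convex-strongly-concave toy saddle problem; the problem and constants are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
A = rng.standard_normal((n, n))

# Saddle problem: min_x max_y 0.5*||x||^2 + x^T A y - 0.5*||y||^2 (saddle at 0).
Fx = lambda x, y: x + A @ y          # gradient in x (descent direction)
Fy = lambda x, y: A.T @ x - y        # gradient in y (ascent direction)

x, y = rng.standard_normal(n), rng.standard_normal(n)
eta = 0.5 / (1 + np.linalg.norm(A, 2))
for _ in range(300):
    # Extrapolation half-step, then a full step evaluated at the midpoint.
    xh, yh = x - eta * Fx(x, y), y + eta * Fy(x, y)
    x, y = x - eta * Fx(xh, yh), y + eta * Fy(xh, yh)

print("distance to the saddle at the origin:", np.linalg.norm(np.r_[x, y]))
```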
In this paper, we propose a simple acceleration scheme for Riemannian gradient methods by extrapolating iterates on manifolds. We show that when the iterates are generated by the Riemannian gradient descent method, the accelerated scheme achieves the optimal convergence rate asymptotically and is computationally more favorable than the recently proposed Riemannian Nesterov accelerated gradient method. Our experiments verify the practical benefit of the novel acceleration strategy.
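A minimal sketch of one way to extrapolate iterates on an embedded manifold: Riemannian gradient descent on the unit sphere with a Nesterov-style extrapolation followed by retraction. The paper's precise extrapolation scheme may differ, and the constants and toy problem here are our own.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # symmetric; minimize f(x) = -x^T A x on sphere

def rgrad(x):
    g = -2 * A @ x                      # Euclidean gradient of f
    return g - (g @ x) * x              # project onto the tangent space at x

retract = lambda x: x / np.linalg.norm(x)   # metric-projection retraction

eta = 0.5 / np.linalg.norm(A, 2)
x_prev = retract(rng.standard_normal(n))
x = retract(x_prev - eta * rgrad(x_prev))
for k in range(500):
    y = retract(x + (k / (k + 3)) * (x - x_prev))   # extrapolate, then retract
    x_prev, x = x, retract(y - eta * rgrad(y))

lam_max = np.linalg.eigvalsh(A)[-1]
print("Rayleigh quotient gap:", lam_max - x @ A @ x)
```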
We investigate the problem of recovering a partially observed high-rank matrix whose columns obey a nonlinear structure, such as lying in a union of subspaces or an algebraic variety, or being grouped in clusters. The recovery problem is formulated as the rank minimization of a nonlinear feature map applied to the original matrix, which is then further approximated by a constrained non-convex optimization problem involving the Grassmann manifold. We propose two sets of algorithms, one arising from Riemannian optimization and the other as an alternating minimization scheme, both of which include first- and second-order variants. Both sets of algorithms have theoretical guarantees. In particular, for the alternating minimization, we establish global convergence and worst-case complexity bounds. Additionally, using the Kurdyka-Lojasiewicz property, we show that the alternating minimization converges to a unique limit point. We provide extensive numerical results for the recovery of union of subspaces and clustering under entry sampling and dense Gaussian sampling. Our methods are competitive with existing approaches and, in particular, high accuracy is achieved in the recovery using Riemannian second-order methods.
This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of the Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear rate of convergence, or even a quadratic one under the Lipschitz continuity of the Riemannian Jacobian of the input-output map. We then prove that the Riemannian Jacobian stability condition is satisfied by a two-layer fully connected neural network with batch normalization, provided the width of the network is sufficiently large. This demonstrates the practical relevance of our convergence-rate results. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.
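In the Euclidean base case that the method extends, a natural gradient step preconditions the gradient with a damped Fisher information matrix. The sketch below does this for ridge-regularized logistic regression, where the Fisher matrix has a standard closed form; the data, damping, and unit step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
m, d = 500, 5
Z = rng.standard_normal((m, d))
w_true = rng.standard_normal(d)
ylab = (Z @ w_true + 0.1 * rng.standard_normal(m) > 0).astype(float)

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
lam = 0.1                                # small ridge term keeps the problem well-posed

def grad(w):
    p = sigmoid(Z @ w)
    return Z.T @ (p - ylab) / m + lam * w        # gradient of regularized mean NLL

def fisher(w):
    p = sigmoid(Z @ w)
    # Fisher information for logistic regression: E[p(1-p) z z^T], plus damping.
    return (Z * (p * (1 - p))[:, None]).T @ Z / m + lam * np.eye(d)

w = np.zeros(d)
for _ in range(30):
    w -= np.linalg.solve(fisher(w), grad(w))     # natural gradient step (unit step)

print("gradient norm at solution:", np.linalg.norm(grad(w)))
```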
We propose a Langevin diffusion-based algorithm for non-convex optimization and sampling on a product manifold of spheres. Under a logarithmic Sobolev inequality, we establish a finite-iteration convergence guarantee to the Gibbs distribution in terms of the Kullback-Leibler divergence. We show that with an appropriate choice of temperature, the suboptimality gap to the global minimum is guaranteed to be arbitrarily small with high probability. As an application, we consider the Burer-Monteiro approach for solving semidefinite programs (SDPs) with diagonal constraints, and analyze the proposed Langevin algorithm for optimizing the non-convex objective. In particular, we establish a logarithmic Sobolev inequality for the Burer-Monteiro problem when there are no spurious local minima but saddle points are present. Combining these results, we provide a global optimality guarantee for the SDP and the Max-Cut problem. More precisely, we show that the Langevin algorithm achieves $\epsilon$-accuracy with high probability in $\widetilde{\Omega}(\epsilon^{-5})$ iterations.
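A minimal sketch of such a Langevin step on a product of spheres, applied to the Burer-Monteiro form of the Max-Cut objective (minimize the sum of W_ij * <s_i, s_j> over unit vectors s_i): the gradient and noise are projected onto the tangent spaces and the rows are retracted back to their spheres. The weights, temperature, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 30, 4                          # n unit vectors in R^k (product of spheres)
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)                # symmetric edge weights, zero diagonal

def energy(S):
    return np.sum(W * (S @ S.T))      # Burer-Monteiro Max-Cut objective (minimize)

def riem_grad(S):
    G = 2 * W @ S                     # Euclidean gradient
    return G - np.sum(G * S, axis=1, keepdims=True) * S   # rowwise tangent projection

def langevin_step(S, eta, beta):
    xi = rng.standard_normal(S.shape)
    xi -= np.sum(xi * S, axis=1, keepdims=True) * S       # tangential noise
    S = S - eta * riem_grad(S) + np.sqrt(2 * eta / beta) * xi
    return S / np.linalg.norm(S, axis=1, keepdims=True)   # retract rows to spheres

S = rng.standard_normal((n, k))
S /= np.linalg.norm(S, axis=1, keepdims=True)
for t in range(2000):
    S = langevin_step(S, eta=1e-3, beta=50.0)

print("final energy (lower means a better relaxed cut):", energy(S))
```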