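In this paper, we propose the {\it \underline{R}ecursive} {\it \underline{I}mportance} {\it \underline{S}ketching} algorithm for {\it \underline{R}ank} constrained least squares {\it \underline{O}ptimization} (RISRO). The key step of RISRO is recursive importance sketching, a new sketching framework based on deterministically designed recursive projections, which differs significantly from the randomized sketching in the literature \citep{mahoney2011randomized,woodruff2014sketching}. Several existing algorithms in the literature can be reinterpreted under this new sketching framework, and RISRO offers clear advantages over them. RISRO is easy to implement and computationally efficient: the core procedure in each iteration is to solve a dimension-reduced least squares problem. We establish local quadratic-linear and quadratic rates of convergence for RISRO under some mild conditions. We also discover a connection between RISRO and the Riemannian Gauss-Newton algorithm on fixed-rank matrices. The effectiveness of RISRO is demonstrated in two applications in machine learning and statistics: low-rank matrix trace regression and phase retrieval. Simulation studies demonstrate the superior numerical performance of RISRO.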
Translated by Google Translate
In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We consider an efficient Riemannian Gauss-Newton (RGN) method for low Tucker rank tensor estimation. In contrast to the generic (super)linear convergence guarantee of RGN in the literature, we prove the first local quadratic convergence guarantee of RGN for low-rank tensor estimation in the noisy setting under some regularity conditions and provide the corresponding estimation error upper bounds. We also provide a deterministic estimation error lower bound that matches the upper bound, which demonstrates the statistical optimality of RGN. The merit of RGN is illustrated through two machine learning applications: tensor regression and tensor SVD. Finally, we provide simulation results to corroborate our theoretical findings.
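We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates through a low Tucker rank parameter tensor/matrix, without prior knowledge of its intrinsic rank. We propose Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantee for general tensor-on-tensor regression by showing that RGD and RGN converge linearly and quadratically, respectively, to a statistically optimal estimate in both the rank correctly-parameterized and over-parameterized settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without modifications to their implementation. We also give the first rigorous evidence for the statistical-computational gap in scalar-on-tensor regression under the low-degree polynomials framework. Our theory demonstrates a ``blessing of statistical-computational gap'' phenomenon: in a wide range of scenarios in tensor-on-tensor regression for tensors of order three or higher, the sample size required for computation matches what is needed by moderate rank over-parameterization when considering computationally feasible estimators, while there are no such benefits in the matrix settings. This shows that moderate rank over-parameterization is essentially ``cost-free'' in terms of sample size in tensor-on-tensor regression of order three or higher. Finally, we conduct simulation studies to show the advantages of our proposed methods and to corroborate our theoretical findings.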
In this paper, we consider the geometric landscape connection of the widely studied manifold and factorization formulations in low-rank positive semidefinite (PSD) and general matrix optimization. We establish a sandwich relation on the spectrum of Riemannian and Euclidean Hessians at first-order stationary points (FOSPs). As a result of that, we obtain an equivalence on the set of FOSPs, second-order stationary points (SOSPs) and strict saddles between the manifold and the factorization formulations. In addition, we show the sandwich relation can be used to transfer more quantitative geometric properties from one formulation to another. Similarities and differences in the landscape connection under the PSD case and the general case are discussed. To the best of our knowledge, this is the first geometric landscape connection between the manifold and the factorization formulations for handling rank constraints, and it provides a geometric explanation for the similar empirical performance of factorization and manifold approaches in low-rank matrix optimization observed in the literature. In the general low-rank matrix optimization, the landscape connection of two factorization formulations (unregularized and regularized ones) is also provided. By applying these geometric landscape connections, in particular, the sandwich relation, we are able to solve unanswered questions in literature and establish stronger results in the applications on geometric analysis of phase retrieval, well-conditioned low-rank matrix optimization, and the role of regularization in factorization arising from machine learning and signal processing.
We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry in a search space that has a total space equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining regions, where the factorized objective has a large gradient. To our best knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers such that the factorized objective is geodesically convex. To prove our results we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions are also based on a result of independent interest stating that the geodesic ball centered at Y with a radius 1/3 of the least singular value of Y is a geodesically convex set under the Riemannian quotient geometry, which as a corollary, also implies a quantitative bound of the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.
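CP decomposition for high-dimensional non-orthogonal spiked tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantees typically assume restrictive incoherence conditions on the basis vectors of the CP components. In this paper, we propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition, with theoretical guarantees under mild incoherence conditions. Composite PCA applies principal component or singular value decomposition twice: first to a matrix unfolding of the tensor data to obtain singular vectors, and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization scheme for tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections onto the orthogonal complements of the spaces generated by the other CP components in the other modes. It is designed to improve the alternating least squares estimator and other forms of high-order orthogonal iteration for tensors with low or moderately high CP rank, and it is guaranteed to converge rapidly when the error of any given initial estimator is bounded by a small constant. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Our implementations on synthetic data demonstrate the significant practical superiority of our approach over existing methods.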
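Sketch-and-project is a framework that unifies many known iterative methods for solving linear systems and their variants, as well as further extensions to non-linear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we obtain sharp guarantees for the convergence rate of sketch-and-project methods via new tight spectral bounds for the expected sketched projection matrix. Our estimates reveal a connection between the sketch-and-project convergence rate and the approximation error of another well-known but seemingly unrelated family of algorithms, which use sketching to accelerate popular matrix factorizations such as QR and SVD. This connection brings us closer to precisely quantifying how the performance of sketch-and-project solvers depends on their sketch size. Our analysis covers not only Gaussian and sub-Gaussian sketching matrices, but also a family of efficient sparse sketching methods known as LESS embeddings. Our experiments back up the theory and demonstrate that even extremely sparse sketches show the same convergence properties in practice.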
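As an illustration of the simplest method this abstract names, the following is a minimal sketch of randomized Kaczmarz for a consistent linear system, which corresponds to sketch-and-project with a single-row sketch. It is not the paper's implementation: the function name `randomized_kaczmarz`, the toy system, and the choice of sampling rows with probability proportional to their squared norms are assumptions made here for illustration.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximate a solution of a consistent system A x = b.

    Each step picks one row at random and projects the current
    iterate onto the hyperplane defined by that row's equation.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    # Squared row norms; used both as sampling weights (an assumed,
    # common choice) and as the normaliser of the projection step.
    sq_norms = [sum(a * a for a in row) for row in A]
    rows = range(len(A))
    for _ in range(iters):
        i = rng.choices(rows, weights=sq_norms)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / sq_norms[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x

# Toy consistent system (hypothetical data): b is built from a known x_true.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

The sketch assumes every row of `A` is nonzero; a zero row would need to be dropped before sampling.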
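Finding the unique low-dimensional decomposition of a given matrix is a fundamental and recurrent problem in many areas. In this paper, we study the problem of seeking a unique decomposition of a low-rank matrix $Y \in \mathbb{R}^{p \times n}$ that admits a sparse representation. Specifically, we consider $Y = AX \in \mathbb{R}^{p \times n}$, where the matrix $A \in \mathbb{R}^{p \times r}$ has full column rank, with $r < \min\{n, p\}$, and the matrix $X \in \mathbb{R}^{r \times n}$ is element-wise sparse. We prove that this sparse decomposition of $Y$ can be uniquely identified, up to some intrinsic signed permutation. Our approach relies on solving a nonconvex optimization problem constrained over the unit sphere. Our geometric analysis of the nonconvex optimization landscape shows that any {\em strict} local solution is close to the ground truth solution and can be recovered by a simple data-driven initialization followed by any second-order descent algorithm. Finally, we corroborate these theoretical results with numerical experiments.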
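Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields of science and engineering. A fundamental task is to faithfully recover a tensor from highly incomplete measurements in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm that directly recovers the tensor factors with tailored spectral initializations, and shows that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems (tensor completion and tensor regression) as soon as the sample size is above the order of $n^{3/2}$, ignoring other parameter dependencies, where $n$ is the dimension of the tensor. This leads to an extremely scalable approach to low-rank tensor estimation compared with prior art, which suffers from at least one of the following drawbacks: extreme sensitivity to ill-conditioning, high per-iteration costs in terms of memory and computation, or poor sample complexity guarantees. To the best of our knowledge, ScaledGD is the first algorithm that achieves near-optimal statistical and computational complexities simultaneously for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.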
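In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To gain a better understanding of the role played by implicit regularization without excess technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the step size is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In addition, our experimental results support our theoretical findings and demonstrate that our methods empirically outperform classical methods with explicit regularization in terms of both the $\ell_2$-statistical rate and variable selection consistency.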
We investigate the problem of recovering a partially observed high-rank matrix whose columns obey a nonlinear structure such as a union of subspaces, an algebraic variety or grouped in clusters. The recovery problem is formulated as the rank minimization of a nonlinear feature map applied to the original matrix, which is then further approximated by a constrained non-convex optimization problem involving the Grassmann manifold. We propose two sets of algorithms, one arising from Riemannian optimization and the other as an alternating minimization scheme, both of which include first- and second-order variants. Both sets of algorithms have theoretical guarantees. In particular, for the alternating minimization, we establish global convergence and worst-case complexity bounds. Additionally, using the Kurdyka-Lojasiewicz property, we show that the alternating minimization converges to a unique limit point. We provide extensive numerical results for the recovery of union of subspaces and clustering under entry sampling and dense Gaussian sampling. Our methods are competitive with existing approaches and, in particular, high accuracy is achieved in the recovery using Riemannian second-order methods.
In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite the wide applicability, their design is still obtained via heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics on spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. By doing so, we are able to optimize the design of the spectral method, and combine it with a simple linear estimator, in order to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval display the advantage enabled by our analysis over existing designs of spectral methods.
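This paper studies low-rank matrix completion in the presence of heavy-tailed and possibly asymmetric noise, where we aim to estimate an underlying low-rank matrix given a set of highly incomplete noisy entries. Although the matrix completion problem has attracted much attention in the past decade, there is still a lack of theoretical understanding when the observations are contaminated by heavy-tailed noise. Prior theory falls short of explaining the empirical results and is unable to capture the optimal dependence of the estimation error on the noise level. In this paper, we adopt an adaptive Huber loss to accommodate heavy-tailed noise; it is robust against large and possibly asymmetric errors when the parameter in the loss function is carefully designed to balance the Huberization bias and robustness to outliers. We then propose an efficient nonconvex algorithm via a balanced low-rank Burer-Monteiro matrix factorization and gradient descent with robust spectral initialization. We prove that, under merely a bounded second moment condition on the error distributions rather than the sub-Gaussian assumption, the Euclidean error of the iterates generated by the proposed algorithm decreases geometrically fast until reaching a minimax-optimal statistical estimation error, which has the same order as in the sub-Gaussian case. The key technique behind this significant advancement is a powerful leave-one-out analysis framework. The theoretical results are corroborated by our simulation studies.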
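For readers unfamiliar with the Huber loss mentioned in this abstract, the following is a minimal sketch of the standard scalar Huber loss and its derivative. The threshold name `tau` and the sample values are assumptions for illustration; the abstract does not state how the paper chooses its adaptive parameter, so no particular tuning rule is implied here.

```python
def huber_loss(r, tau):
    """Standard Huber loss: quadratic for |r| <= tau, linear beyond it."""
    if abs(r) <= tau:
        return 0.5 * r * r
    return tau * abs(r) - 0.5 * tau * tau

def huber_grad(r, tau):
    """Derivative of the Huber loss: the residual clipped to [-tau, tau]."""
    return max(-tau, min(tau, r))

# Hypothetical residuals: the last one plays the role of an outlier.
residuals = [0.3, -1.0, 25.0]
tau = 2.0  # assumed threshold, chosen only for this example
losses = [huber_loss(r, tau) for r in residuals]
grads = [huber_grad(r, tau) for r in residuals]
```

Because the derivative is clipped, a single large residual contributes a bounded gradient, which is the source of the robustness to heavy-tailed noise. A larger `tau` reduces the bias relative to squared loss but weakens that protection.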
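We consider using gradient descent to minimize the nonconvex function $f(X) = \phi(XX^{T})$ over an $n \times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n \times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r = r^{\star}$ to a sublinear rate when $r > r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.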
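Many recent problems in signal processing and machine learning, such as compressed sensing, image restoration, matrix/tensor recovery, and non-negative matrix factorization, can be cast as constrained optimization. Projected gradient descent (PGD) is a simple yet efficient method for solving such constrained optimization problems. Local convergence analysis furthers our understanding of its asymptotic behavior near the solution, offering sharper bounds on the convergence rate than global convergence analysis. However, local guarantees often appear scattered across problem-specific areas of machine learning and signal processing. This manuscript presents a unified framework for the local convergence analysis of projected gradient descent in the context of constrained least squares. The proposed analysis offers insights into pivotal local convergence properties such as the conditions for linear convergence, the region of convergence, the exact asymptotic rate of convergence, and a bound on the number of iterations needed to reach a given level of accuracy. To demonstrate the applicability of the proposed approach, we present a recipe for the convergence analysis of PGD and demonstrate it via a beginning-to-end application of the recipe to four fundamental problems, namely, linearly constrained least squares, sparse recovery, least squares with a unit norm constraint, and matrix completion.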
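One of the four problems listed in this abstract, least squares with a unit norm constraint, gives a compact illustration of projected gradient descent. The following is a minimal sketch, not the manuscript's code: the function names, the toy data, and the fixed step size are assumptions chosen here so that the example runs.

```python
import math

def project_unit_sphere(x):
    """Euclidean projection of a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def pgd_unit_norm(A, b, x0, step, iters=200):
    """Projected gradient descent for min 0.5*||A x - b||^2 s.t. ||x|| = 1."""
    x = project_unit_sphere(x0)
    for _ in range(iters):
        # Residual A x - b and gradient A^T (A x - b).
        r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]
        # Gradient step followed by projection back onto the constraint set.
        x = project_unit_sphere([xj - step * g for xj, g in zip(x, grad)])
    return x

# Hypothetical problem: b is generated from a unit-norm x_star, so x_star
# attains zero residual and is a global minimizer on the sphere.
A = [[2.0, 0.0], [0.0, 1.0]]
x_star = [0.6, 0.8]
b = [sum(a * t for a, t in zip(row, x_star)) for row in A]
x_hat = pgd_unit_norm(A, b, x0=[1.0, 0.0], step=0.2)
```

The step size 0.2 is below the reciprocal of the largest eigenvalue of $A^{T}A$ (which is 4 for this toy matrix); for other data, the step and the starting point would need to be chosen with the convergence region in mind, and the projection is undefined if an iterate is exactly zero.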
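Recently there has been significant theoretical progress in understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization, and in particular the critical role of small random initialization, are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase, where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column spaces of the iterates and of the underlying low-rank matrix are sufficiently aligned; (II) a saddle avoidance/refinement phase, where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points; and (III) a local refinement phase, where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction.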
The estimation of cumulative distribution functions (CDFs) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional regression of contextual CDFs where each data point is sampled from a linear combination of context dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde{O}(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases. We also derive matching information theoretic lower bounds, establishing minimax optimality for CDF functional regression. Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatched error, and show that the estimators are well-behaved under model mismatch. Finally, to complete our study, we formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish self-normalized estimation error upper bounds for this setting.
We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by graph operators depending on the underlying graph topology. We address the nonlinearity of the problem by embedding it into a higher-dimensional space of suitable block-Hankel matrices, where it becomes a low-rank matrix completion problem, even if $\mathbf{A}$ is of full rank. For both a uniform and an adaptive random space-time sampling model, we quantify the recoverability of the transition operator via suitable measures of incoherence of these block-Hankel embedding matrices. For graph transition operators these measures of incoherence depend on the interplay between the dynamics and the graph topology. We develop a suitable non-convex iterative reweighted least squares (IRLS) algorithm, establish its quadratic local convergence, and show that, in optimal scenarios, no more than $\mathcal{O}(rn \log(nT))$ space-time samples are sufficient to ensure accurate recovery of a rank-$r$ operator $\mathbf{A}$ of size $n \times n$. This establishes that spatial samples can be substituted by a comparable number of space-time samples. We provide an efficient implementation of the proposed IRLS algorithm with space complexity of order $O(r n T)$ and per-iteration time complexity linear in $n$. Numerical experiments for transition operators based on several graph models confirm that the theoretical findings accurately track empirical phase transitions, and illustrate the applicability and scalability of the proposed algorithm.
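As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model that not only avoids the drawbacks of the VARMA model but also inherits its favorable temporal patterns. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters determining the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved without sacrificing any temporal information. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding non-asymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.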
Sparse principal component analysis (SPCA) has been widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Although there have been many methodological and theoretical developments in the past two decades, the theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie & Tibshirani (2006), which is based on the elastic net, are still unknown. We aim to close this important theoretical gap in this paper. We first revisit the SPCA algorithm of Zou et al. (2006) and present our implementation. We also study a computationally more efficient variant of the SPCA algorithm in Zou et al. (2006) that can be considered as the limiting case of SPCA. We provide guarantees of convergence to a stationary point for both algorithms. We prove that, under a sparse spiked covariance model, both algorithms can recover the principal subspace consistently under mild regularity conditions. We show that their estimation error bounds match the best available bounds of existing works or the minimax rates up to some logarithmic factors. Moreover, we demonstrate the numerical performance of both algorithms in simulation studies.