The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard, because it contains vector cardinality minimization as a special case.In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is Ω(r(m + n) log mn), where m, n are the dimensions of the matrix, and r is its rank.The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this pre-existing concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
translated by 谷歌翻译
We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M . Can we complete the matrix and recover the entries that we have not seen?We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys m ≥ C n 1.2 r log n for some positive numerical constant C, then with very high probability, most n × n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
translated by 谷歌翻译
This paper is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the 1 norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
translated by 谷歌翻译
We investigate the problem of recovering a partially observed high-rank matrix whose columns obey a nonlinear structure such as a union of subspaces, an algebraic variety or grouped in clusters. The recovery problem is formulated as the rank minimization of a nonlinear feature map applied to the original matrix, which is then further approximated by a constrained non-convex optimization problem involving the Grassmann manifold. We propose two sets of algorithms, one arising from Riemannian optimization and the other as an alternating minimization scheme, both of which include first- and second-order variants. Both sets of algorithms have theoretical guarantees. In particular, for the alternating minimization, we establish global convergence and worst-case complexity bounds. Additionally, using the Kurdyka-Lojasiewicz property, we show that the alternating minimization converges to a unique limit point. We provide extensive numerical results for the recovery of union of subspaces and clustering under entry sampling and dense Gaussian sampling. Our methods are competitive with existing approaches and, in particular, high accuracy is achieved in the recovery using Riemannian second-order methods.
translated by 谷歌翻译
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets.This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed-either explicitly or implicitly-to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
translated by 谷歌翻译
许多基本的低级优化问题,例如矩阵完成,相位同步/检索,功率系统状态估计和鲁棒PCA,可以作为矩阵传感问题提出。求解基质传感的两种主要方法是基于半决赛编程(SDP)和Burer-Monteiro(B-M)分解的。 SDP方法患有高计算和空间复杂性,而B-M方法可能由于问题的非跨性别而返回伪造解决方案。这些方法成功的现有理论保证导致了类似的保守条件,这可能错误地表明这些方法具有可比性的性能。在本文中,我们阐明了这两种方法之间的一些主要差异。首先,我们提出一类结构化矩阵完成问题,而B-M方法则以压倒性的概率失败,而SDP方法正常工作。其次,我们确定了B-M方法工作和SDP方法失败的一类高度稀疏矩阵完成问题。第三,我们证明,尽管B-M方法与未知解决方案的等级无关,但SDP方法的成功与解决方案的等级相关,并随着等级的增加而提高。与现有的文献主要集中在SDP和B-M工作的矩阵传感实例上,本文为每种方法的独特优点提供了与替代方法的唯一优点。
translated by 谷歌翻译
我们提出了一种凸锥程序,可推断随机点产品图(RDPG)的潜在概率矩阵。优化问题最大化Bernoulli最大似然函数,增加核规范正则化术语。双重问题具有特别良好的形式,与众所周知的SemideFinite程序放松MaxCut问题有关。使用原始双功率条件,我们绑定了原始和双解决方案的条目和等级。此外,我们在轻微的技术假设下绑定了最佳目标值并证明了略微修改模型的概率估计的渐近一致性。我们对合成RDPG的实验不仅恢复了自然集群,而且还揭示了原始数据的下面的低维几何形状。我们还证明该方法在空手道俱乐部图表和合成美国参议图中恢复潜在结构,并且可以扩展到最多几百个节点的图表。
translated by 谷歌翻译
We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by graph operators depending on the underlying graph topology. We address the nonlinearity of the problem by embedding it into a higher-dimensional space of suitable block-Hankel matrices, where it becomes a low-rank matrix completion problem, even if $\mathbf{A}$ is of full rank. For both a uniform and an adaptive random space-time sampling model, we quantify the recoverability of the transition operator via suitable measures of incoherence of these block-Hankel embedding matrices. For graph transition operators these measures of incoherence depend on the interplay between the dynamics and the graph topology. We develop a suitable non-convex iterative reweighted least squares (IRLS) algorithm, establish its quadratic local convergence, and show that, in optimal scenarios, no more than $\mathcal{O}(rn \log(nT))$ space-time samples are sufficient to ensure accurate recovery of a rank-$r$ operator $\mathbf{A}$ of size $n \times n$. This establishes that spatial samples can be substituted by a comparable number of space-time samples. We provide an efficient implementation of the proposed IRLS algorithm with space complexity of order $O(r n T)$ and per-iteration time complexity linear in $n$. Numerical experiments for transition operators based on several graph models confirm that the theoretical findings accurately track empirical phase transitions, and illustrate the applicability and scalability of the proposed algorithm.
translated by 谷歌翻译
本文提出了弗兰克 - 沃尔夫(FW)的新变种​​,称为$ k $ fw。标准FW遭受缓慢的收敛性:迭代通常是Zig-zag作为更新方向振荡约束集的极端点。新变种,$ k $ fw,通过在每次迭代中使用两个更强的子问题oracelles克服了这个问题。第一个是$ k $线性优化Oracle($ k $ loo),计算$ k $最新的更新方向(而不是一个)。第二个是$ k $方向搜索($ k $ ds),最大限度地减少由$ k $最新更新方向和之前迭代表示的约束组的目标。当问题解决方案承认稀疏表示时,奥克斯都易于计算,而且$ k $ FW会迅速收敛,以便平滑凸起目标和几个有趣的约束集:$ k $ fw实现有限$ \ frac {4l_f ^ 3d ^} { \ Gamma \ Delta ^ 2} $融合在多台和集团规范球上,以及光谱和核规范球上的线性收敛。数值实验验证了$ k $ fw的有效性,并展示了现有方法的数量级加速。
translated by 谷歌翻译
我们考虑估计与I.I.D的排名$ 1 $矩阵因素的问题。高斯,排名$ 1 $的测量值,这些测量值非线性转化和损坏。考虑到非线性的两种典型选择,我们研究了从随机初始化开始的此非convex优化问题的天然交流更新规则的收敛性能。我们通过得出确定性递归,即使在高维问题中也是准确的,我们显示出算法的样本分割版本的敏锐收敛保证。值得注意的是,虽然无限样本的种群更新是非信息性的,并提示单个步骤中的精确恢复,但算法 - 我们的确定性预测 - 从随机初始化中迅速地收敛。我们尖锐的非反应分析也暴露了此问题的其他几种细粒度,包括非线性和噪声水平如何影响收敛行为。从技术层面上讲,我们的结果可以通过证明我们的确定性递归可以通过我们的确定性顺序来预测我们的确定性序列,而当每次迭代都以$ n $观测来运行时,我们的确定性顺序可以通过$ n^{ - 1/2} $的波动。我们的技术利用了源自有关高维$ m $估计文献的遗留工具,并为通过随机数据的其他高维优化问题的随机初始化而彻底地分析了高阶迭代算法的途径。
translated by 谷歌翻译
我们考虑使用梯度下降来最大程度地减少$ f(x)= \ phi(xx^{t})$在$ n \ times r $因件矩阵$ x $上,其中$ \ phi是一种基础平稳凸成本函数定义了$ n \ times n $矩阵。虽然只能在合理的时间内发现只有二阶固定点$ x $,但如果$ x $的排名不足,则其排名不足证明其是全球最佳的。这种认证全球最优性的方式必然需要当前迭代$ x $的搜索等级$ r $,以相对于级别$ r^{\ star} $过度参数化。不幸的是,过度参数显着减慢了梯度下降的收敛性,从$ r = r = r = r^{\ star} $的线性速率到$ r> r> r> r> r^{\ star} $,即使$ \ phi $是$ \ phi $强烈凸。在本文中,我们提出了一项廉价的预处理,该预处理恢复了过度参数化的情况下梯度下降回到线性的收敛速率,同时也使在全局最小化器$ x^{\ star} $中可能不良条件变得不可知。
translated by 谷歌翻译
素描和项目是一个框架,它统一了许多已知的迭代方法来求解线性系统及其变体,并进一步扩展了非线性优化问题。它包括流行的方法,例如随机kaczmarz,坐标下降,凸优化的牛顿方法的变体等。在本文中,我们通过新的紧密频谱边界为预期的草图投影矩阵获得了素描和项目的收敛速率的敏锐保证。我们的估计值揭示了素描和项目的收敛率与另一个众所周知但看似无关的算法家族的近似误差之间的联系,这些算法使用草图加速了流行的矩阵因子化,例如QR和SVD。这种连接使我们更接近准确量化草图和项目求解器的性能如何取决于其草图大小。我们的分析不仅涵盖了高斯和次高斯的素描矩阵,还涵盖了一个有效的稀疏素描方法,称为较少的嵌入方法。我们的实验备份了理论,并证明即使极稀疏的草图在实践中也显示出相同的收敛属性。
translated by 谷歌翻译
诸如压缩感测,图像恢复,矩阵/张恢复和非负矩阵分子等信号处理和机器学习中的许多近期问题可以作为约束优化。预计的梯度下降是一种解决如此约束优化问题的简单且有效的方法。本地收敛分析将我们对解决方案附近的渐近行为的理解,与全球收敛分析相比,收敛率的较小界限提供了较小的界限。然而,本地保证通常出现在机器学习和信号处理的特定问题领域。此稿件在约束最小二乘范围内,对投影梯度下降的局部收敛性分析提供了统一的框架。该建议的分析提供了枢转局部收敛性的见解,例如线性收敛的条件,收敛区域,精确的渐近收敛速率,以及达到一定程度的准确度所需的迭代次数的界限。为了证明所提出的方法的适用性,我们介绍了PGD的收敛分析的配方,并通过在四个基本问题上的配方的开始延迟应用来证明它,即线性约束最小二乘,稀疏恢复,最小二乘法使用单位规范约束和矩阵完成。
translated by 谷歌翻译
我们考虑最大程度地减少两次不同的可差异,$ l $ -smooth和$ \ mu $ -stronglongly凸面目标$ \ phi $ phi $ a $ n \ times n $ n $阳性阳性半finite $ m \ succeq0 $,在假设是最小化的假设$ m^{\ star} $具有低等级$ r^{\ star} \ ll n $。遵循burer- monteiro方法,我们相反,在因子矩阵$ x $ size $ n \ times r $的因素矩阵$ x $上最小化nonconvex objection $ f(x)= \ phi(xx^{t})$。这实际上将变量的数量从$ o(n^{2})$减少到$ O(n)$的少量,并且免费实施正面的半弱点,但要付出原始问题的均匀性。在本文中,我们证明,如果搜索等级$ r \ ge r^{\ star} $被相对于真等级$ r^{\ star} $的常数因子过度参数化,则如$ r> \ in frac {1} {4}(l/\ mu-1)^{2} r^{\ star} $,尽管非概念性,但保证本地优化可以从任何初始点转换为全局最佳。这显着改善了先前的$ r \ ge n $的过度参数化阈值,如果允许$ \ phi $是非平滑和/或非额外凸的,众所周知,这将是尖锐的,但会增加变量的数量到$ o(n^{2})$。相反,没有排名过度参数化,我们证明只有$ \ phi $几乎完美地条件,并且条件数量为$ l/\ mu <3 $,我们才能证明这种全局保证是可能的。因此,我们得出的结论是,少量的过度参数化可能会导致非凸室的理论保证得到很大的改善 - 蒙蒂罗分解。
translated by 谷歌翻译
监督字典学习(SDL)是一种经典的机器学习方法,同时寻求特征提取和分类任务,不一定是先验的目标。 SDL的目的是学习类歧视性词典,这是一组潜在特征向量,可以很好地解释特征以及观察到的数据的标签。在本文中,我们提供了SDL的系统研究,包括SDL的理论,算法和应用。首先,我们提供了一个新颖的框架,该框架将“提升” SDL作为组合因子空间中的凸问题,并提出了一种低级别的投影梯度下降算法,该算法将指数成倍收敛于目标的全局最小化器。我们还制定了SDL的生成模型,并根据高参数制度提供真实参数的全局估计保证。其次,我们被视为一个非convex约束优化问题,我们为SDL提供了有效的块坐标下降算法,该算法可以保证在$ O(\ varepsilon^{ - 1}(\ log)中找到$ \ varepsilon $ - 定位点(\ varepsilon \ varepsilon^{ - 1})^{2})$ iterations。对于相应的生成模型,我们为受约束和正则化的最大似然估计问题建立了一种新型的非反应局部一致性结果,这可能是独立的。第三,我们将SDL应用于监督主题建模和胸部X射线图像中的肺炎检测中,以进行不平衡的文档分类。我们还提供了模拟研究,以证明当最佳的重建性和最佳判别词典之间存在差异时,SDL变得更加有效。
translated by 谷歌翻译
随机奇异值分解(RSVD)是用于计算大型数据矩阵截断的SVD的一类计算算法。给定A $ n \ times n $对称矩阵$ \ mathbf {m} $,原型RSVD算法输出通过计算$ \ mathbf {m mathbf {m} $的$ k $引导singular vectors的近似m}^{g} \ mathbf {g} $;这里$ g \ geq 1 $是一个整数,$ \ mathbf {g} \ in \ mathbb {r}^{n \ times k} $是一个随机的高斯素描矩阵。在本文中,我们研究了一般的“信号加上噪声”框架下的RSVD的统计特性,即,观察到的矩阵$ \ hat {\ mathbf {m}} $被认为是某种真实但未知的加法扰动信号矩阵$ \ mathbf {m} $。我们首先得出$ \ ell_2 $(频谱规范)和$ \ ell_ {2 \ to \ infty} $(最大行行列$ \ ell_2 $ norm)$ \ hat {\ hat {\ Mathbf {M}} $和信号矩阵$ \ Mathbf {M} $的真实单数向量。这些上限取决于信噪比(SNR)和功率迭代$ g $的数量。观察到一个相变现象,其中较小的SNR需要较大的$ g $值以保证$ \ ell_2 $和$ \ ell_ {2 \ to \ fo \ infty} $ distances的收敛。我们还表明,每当噪声矩阵满足一定的痕量生长条件时,这些相变发生的$ g $的阈值都会很清晰。最后,我们得出了近似奇异向量的行波和近似矩阵的进入波动的正常近似。我们通过将RSVD的几乎最佳性能保证在应用于三个统计推断问题的情况下,即社区检测,矩阵完成和主要的组件分析,并使用缺失的数据来说明我们的理论结果。
translated by 谷歌翻译
We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions.On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
translated by 谷歌翻译
低秩矩阵恢复的现有结果在很大程度上专注于二次损失,这享有有利的性质,例如限制强的强凸/平滑度(RSC / RSM)以及在所有低等级矩阵上的良好调节。然而,许多有趣的问题涉及更一般,非二次损失,这不满足这些属性。对于这些问题,标准的非耦合方法,例如秩约为秩约为预定的梯度下降(A.K.A.迭代硬阈值)和毛刺蒙特罗分解可能具有差的经验性能,并且没有令人满意的理论保证了这些算法的全球和快速收敛。在本文中,我们表明,具有非二次损失的可证实低级恢复中的关键组成部分是规律性投影oracle。该Oracle限制在适当的界限集中迭代到低级矩阵,损耗功能在其上表现良好并且满足一组近似RSC / RSM条件。因此,我们分析配备有这样的甲骨文的(平均)投影的梯度方法,并证明它在全球和线性地收敛。我们的结果适用于广泛的非二次低级估计问题,包括一个比特矩阵感测/完成,个性化排名聚集,以及具有等级约束的更广泛的广义线性模型。
translated by 谷歌翻译
我们提出了一个算法框架,用于近距离矩阵上的量子启发的经典算法,概括了Tang的突破性量子启发算法开始的一系列结果,用于推荐系统[STOC'19]。由量子线性代数算法和gily \'en,su,low和wiebe [stoc'19]的量子奇异值转换(SVT)框架[SVT)的动机[STOC'19],我们开发了SVT的经典算法合适的量子启发的采样假设。我们的结果提供了令人信服的证据,表明在相应的QRAM数据结构输入模型中,量子SVT不会产生指数量子加速。由于量子SVT框架基本上概括了量子线性代数的所有已知技术,因此我们的结果与先前工作的采样引理相结合,足以概括所有有关取消量子机器学习算法的最新结果。特别是,我们的经典SVT框架恢复并经常改善推荐系统,主成分分析,监督聚类,支持向量机器,低秩回归和半决赛程序解决方案的取消结果。我们还为汉密尔顿低级模拟和判别分析提供了其他取消化结果。我们的改进来自识别量子启发的输入模型的关键功能,该模型是所有先前量子启发的结果的核心:$ \ ell^2 $ -Norm采样可以及时近似于其尺寸近似矩阵产品。我们将所有主要结果减少到这一事实,使我们的简洁,独立和直观。
translated by 谷歌翻译
我们研究基于Krylov子空间的迭代方法,用于在任何Schatten $ p $ Norm中的低级别近似值。在这里,通过矩阵向量产品访问矩阵$ a $ $如此$ \ | a(i -zz^\ top)\ | _ {s_p} \ leq(1+ \ epsilon)\ min_ {u^\ top u = i_k} } $,其中$ \ | m \ | _ {s_p} $表示$ m $的单数值的$ \ ell_p $ norm。对于$ p = 2 $(frobenius norm)和$ p = \ infty $(频谱规范)的特殊情况,musco and Musco(Neurips 2015)获得了基于Krylov方法的算法,该方法使用$ \ tilde {o}(k)(k /\ sqrt {\ epsilon})$ matrix-vector产品,改进na \“ ive $ \ tilde {o}(k/\ epsilon)$依赖性,可以通过功率方法获得,其中$ \ tilde {o} $抑制均可抑制poly $(\ log(dk/\ epsilon))$。我们的主要结果是仅使用$ \ tilde {o}(kp^{1/6}/\ epsilon^{1/3} {1/3})$ matrix $ matrix的算法 - 矢量产品,并为所有$ p \ geq 1 $。为$ p = 2 $工作,我们的限制改进了先前的$ \ tilde {o}(k/\ epsilon^{1/2})$绑定到$ \ tilde {o}(k/\ epsilon^{1/3})$。由于schatten- $ p $和schatten-$ \ infty $ norms在$(1+ \ epsilon)$ pers $ p时相同\ geq(\ log d)/\ epsilon $,我们的界限恢复了Musco和Musco的结果,以$ p = \ infty $。此外,我们证明了矩阵矢量查询$ \ omega的下限(1/\ epsilon^ {1/3})$对于任何固定常数$ p \ geq 1 $,表明令人惊讶的$ \ tilde {\ theta}(1/\ epsilon^{ 1/3})$是常数〜$ k $的最佳复杂性。为了获得我们的结果,我们介绍了几种新技术,包括同时对多个Krylov子空间进行优化,以及针对分区操作员的不平等现象。我们在[1,2] $中以$ p \的限制使用了Araki-lieb-thirring Trace不平等,而对于$ p> 2 $,我们呼吁对安装分区操作员的规范压缩不平等。
translated by 谷歌翻译