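In this paper, we pose color image inpainting as a pure quaternion matrix completion problem. In the literature, theoretical guarantees for quaternion matrix completion are not well established. Our main aim is to propose a new minimization problem whose objective combines the nuclear norm with a quadratic loss weighted among the three channels. To fill the theoretical gap, we obtain error bounds in both the clean and the corrupted regimes, which rely on some new results on quaternion matrices. General Gaussian noise is considered in robust completion, where all observations are corrupted. Motivated by the error bound, we propose to handle unbalanced or correlated noise via a cross-channel weight in the quadratic loss, with the main purpose of rebalancing the noise level or removing the noise correlation. Extensive experimental results on synthetic and color image data are presented to confirm and demonstrate our theoretical findings.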
translated by Google Translate
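In this paper, we propose a uniformly dithered one-bit quantization scheme for high-dimensional statistical estimation. The scheme contains truncation, dithering, and quantization as typical steps. As canonical examples, the quantization scheme is applied to three estimation problems: sparse covariance matrix estimation, sparse linear regression, and matrix completion. We study both sub-Gaussian and heavy-tailed regimes, where the underlying distribution of the heavy-tailed data is assumed to have a bounded second or fourth moment. For each model, we propose new estimators based on the one-bit quantized data. In the sub-Gaussian regime, our estimators achieve the optimal minimax rates up to logarithmic factors, which indicates that our quantization scheme incurs almost no additional cost. In the heavy-tailed regime, while the rates of our estimators become essentially slower, these results are either the first ones in such a one-bit quantized and heavy-tailed setting, or exhibit significant improvements over existing comparable results. Furthermore, we contribute considerably to the problems of one-bit compressed sensing and one-bit matrix completion. Specifically, we extend one-bit compressed sensing to sub-Gaussian or even heavy-tailed sensing vectors via convex programming. For one-bit matrix completion, our method is essentially different from the standard likelihood approach and can handle pre-quantization random noise with unknown distribution. Experimental results on synthetic data are presented to support our theoretical analysis.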
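The three steps named in this abstract (truncation, dithering, one-bit quantization) can be illustrated with a minimal sketch. It is not the authors' implementation: the helper name `one_bit_quantize`, the single parameter `threshold`, and the choice of a dither that is uniform on `[-threshold, threshold]` are assumptions made for illustration. Under that choice, `threshold * bit` is an unbiased estimate of the truncated value, which is the standard property of uniform dithering.

```python
import numpy as np


def one_bit_quantize(x, threshold, rng):
    """Truncate, dither, and keep only the sign (hypothetical helper).

    Assumption: the dither is uniform on [-threshold, threshold], so that
    threshold * bit has expectation equal to the truncated value.
    """
    x_trunc = np.clip(x, -threshold, threshold)                      # truncation
    dither = rng.uniform(-threshold, threshold, size=x_trunc.shape)  # dithering
    return np.where(x_trunc + dither >= 0, 1.0, -1.0)                # one bit per entry


rng = np.random.default_rng(0)
x = np.full(200_000, 0.3)        # the same value observed many times
bits = one_bit_quantize(x, threshold=1.0, rng=rng)
estimate = 1.0 * bits.mean()     # threshold * average bit, close to 0.3
```

Averaging the rescaled bits recovers the underlying value up to sampling error, which is why estimators can be built from the one-bit data alone.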
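In this paper, we study the estimation performance of empirical $\ell_2$ risk minimization (ERM) in noisy (standard) phase retrieval (NPR), given by $y_k = |\alpha_k^* x_0|^2 + \eta_k$, or noisy generalized phase retrieval (NGPR), formulated as $y_k = x_0^* A_k x_0 + \eta_k$, where $x_0 \in \mathbb{K}^d$ is the desired signal, $n$ is the sample size, and $\eta = (\eta_1, ..., \eta_n)^\top$ is the noise vector. We establish new error bounds under different noise patterns, and our proofs are valid for both $\mathbb{K} = \mathbb{R}$ and $\mathbb{K} = \mathbb{C}$. In NPR under an arbitrary noise vector $\eta$, we derive a new error bound $O\big(\|\eta\|_\infty \sqrt{\frac{d}{n}} + \frac{|\mathbf{1}^\top \eta|}{n}\big)$, which is tighter than the currently known bound $O\big(\frac{\|\eta\|}{\sqrt{n}}\big)$ in many cases. In NGPR, we show $O\big(\|\eta\| \frac{\sqrt{d}}{n}\big)$ for arbitrary $\eta$. In both problems, the bounds for arbitrary noise immediately give rise to $\tilde{O}\big(\sqrt{\frac{d}{n}}\big)$ for sub-Gaussian or sub-exponential random noise, with some conventional but inessential assumptions (e.g., independence or a zero-mean condition) removed or weakened. In addition, we make a first attempt at ERM under heavy-tailed random noise assumed to have a bounded $l$-th moment. To achieve a trade-off between bias and variance, we truncate the responses and propose a corresponding robust ERM estimator, which is shown to possess the guarantee $\tilde{O}\big(\big[\sqrt{\frac{d}{n}}\big]^{1-1/l}\big)$ in both NPR and NGPR. All the error bounds extend straightforwardly to the more general problem of rank-$r$ matrix recovery, and these results lead to the conclusion that the full-rank frame $\{A_k\}_{k=1}^n$ in NGPR is more robust than the rank-1 frame $\{\alpha_k \alpha_k^*\}_{k=1}^n$ in NPR. Extensive experimental results are presented to illustrate our theoretical findings.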
This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems, where the underlying distributions have bounded moments of some order. We propose to truncate and properly dither the data prior to a uniform quantization. Our major standpoint is that (near) minimax rates of estimation error are achievable merely from the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing, and matrix completion, all agreeing that the quantization only slightly worsens the multiplicative factor. Besides, we study compressed sensing where both covariate (i.e., sensing vector) and response are quantized. Under covariate quantization, although our recovery program is non-convex because the covariance matrix estimator lacks positive semi-definiteness, all local minimizers are proved to enjoy near optimal error bound. Moreover, by the concentration inequality of product process and covering argument, we establish near minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise.
In this paper, we study the trace regression when a matrix of parameters B* is estimated via the convex relaxation of a rank-regularized regression or via regularized non-convex optimization. It is known that these estimators satisfy near-optimal error bounds under assumptions on the rank, coherence, and spikiness of B*. We start by introducing a general notion of spikiness for B* that provides a generic recipe to prove the restricted strong convexity of the sampling operator of the trace regression and obtain near-optimal and non-asymptotic error bounds for the estimation error. Similar to the existing literature, these results require the regularization parameter to be above a certain theory-inspired threshold that depends on observation noise that may be unknown in practice. Next, we extend the error bounds to cases where the regularization parameter is chosen via cross-validation. This result is significant in that existing theoretical results on cross-validated estimators (Kale et al., 2011; Kumar et al., 2013; Abou-Moustafa and Szepesvari, 2017) do not apply to our setting since the estimators we study are not known to satisfy their required notion of stability. Finally, using simulations on synthetic and real data, we show that the cross-validated estimator selects a near-optimal penalty parameter and outperforms the theory-inspired approach of selecting the parameter.
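We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $\alpha$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for the optimal sample size $n \gtrsim (k \log d)/\alpha^2$ and the optimal error rate $O(\sqrt{(k \log d)/(n \cdot \alpha^2)})$, where $n$ is the number of observations, $d$ is the number of dimensions, and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1/\log \log n)$, even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise have only been shown in the dense setting (i.e., general linear regression) very recently by d'Orsi et al. [dNS21]. In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumption that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$). To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the $\ell_1$ norm or the nuclear norm, and extend the approach of d'Orsi et al. [dNS21] in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems.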
This paper is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the $\ell_1$ norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
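For concreteness, the convex program described in this abstract can be written as follows. The symbols are assumptions introduced here, since the abstract names none: $M$ is the observed data matrix, $L$ the low-rank component, $S$ the sparse component, and $\lambda > 0$ the weight between the two norms.

$$\min_{L,\,S} \ \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad L + S = M,$$

where $\|L\|_*$ is the nuclear norm (sum of singular values of $L$) and $\|S\|_1$ is the entrywise $\ell_1$ norm (sum of absolute values of the entries of $S$).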
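Existing results for low-rank matrix recovery largely focus on the quadratic loss, which enjoys favorable properties such as restricted strong convexity/smoothness (RSC/RSM) and good conditioning over all low-rank matrices. However, many interesting problems involve more general, non-quadratic losses, which do not satisfy such properties. For these problems, standard nonconvex approaches such as rank-constrained projected gradient descent (a.k.a. iterative hard thresholding) and Burer-Monteiro factorization can have poor empirical performance, and there is no satisfactory theory guaranteeing global and fast convergence for these algorithms. In this paper, we show that a critical component in provable low-rank recovery with non-quadratic loss is a regularity projection oracle. This oracle restricts the iterates to low-rank matrices within an appropriate bounded set, over which the loss function is well behaved and satisfies a set of approximate RSC/RSM conditions. Accordingly, we analyze an (averaged) projected gradient method equipped with such an oracle, and prove that it converges globally and linearly. Our results apply to a wide range of non-quadratic low-rank estimation problems, including one-bit matrix sensing/completion, individualized rank aggregation, and, more broadly, generalized linear models with rank constraints.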
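Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given an $n \times n$ symmetric matrix $\mathbf{M}$, the prototypical RSVD algorithm outputs an approximation of the $k$ leading singular vectors of $\mathbf{M}$ by computing the SVD of $\mathbf{M}^{g} \mathbf{G}$; here $g \geq 1$ is an integer and $\mathbf{G} \in \mathbb{R}^{n \times k}$ is a random Gaussian sketching matrix. In this paper, we study the statistical properties of RSVD under a general "signal-plus-noise" framework, i.e., the observed matrix $\hat{\mathbf{M}}$ is assumed to be an additive perturbation of some true but unknown signal matrix $\mathbf{M}$. We first derive upper bounds for the $\ell_2$ (spectral norm) and $\ell_{2 \to \infty}$ (maximum row-wise $\ell_2$ norm) distances between the approximate singular vectors of $\hat{\mathbf{M}}$ and the true singular vectors of the signal matrix $\mathbf{M}$. These upper bounds depend on the signal-to-noise ratio (SNR) and the number of power iterations $g$. A phase transition phenomenon is observed, in which a smaller SNR requires larger values of $g$ to guarantee convergence of the $\ell_2$ and $\ell_{2 \to \infty}$ distances. We also show that the thresholds for $g$ at which these phase transitions occur are sharp whenever the noise matrices satisfy a certain trace growth condition. Finally, we derive normal approximations for the row-wise fluctuations of the approximate singular vectors and the entrywise fluctuations of the approximate matrix. We illustrate our theoretical results by deriving nearly optimal performance guarantees for RSVD when applied to three statistical inference problems, namely community detection, matrix completion, and principal component analysis with missing data.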
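The prototypical RSVD step described above (sketch with a Gaussian matrix, apply $g$ power iterations, take an SVD) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code: the function name `rsvd_leading_vectors` and its arguments are hypothetical, and the plain power iteration is written without the re-orthogonalization that practical implementations often add for numerical stability when $g$ is large.

```python
import numpy as np


def rsvd_leading_vectors(M, k, g=1, rng=None):
    """Approximate the k leading singular vectors of a symmetric matrix M.

    Hypothetical helper: forms Y = M^g G with a Gaussian sketch G and
    returns the left singular vectors of Y.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = M.shape[0]
    Y = rng.standard_normal((n, k))   # Gaussian sketching matrix G
    for _ in range(g):                # g power iterations: Y = M^g G
        Y = M @ Y
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U                          # n x k matrix with orthonormal columns


# Small demonstration on an exactly rank-2 symmetric matrix.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((50, 2)))
M = V @ np.diag([10.0, 5.0]) @ V.T
U = rsvd_leading_vectors(M, k=2, g=2, rng=rng)
subspace_error = np.linalg.norm(U @ U.T - V @ V.T, 2)  # spectral-norm distance between projectors
```

When $\mathbf{M}$ has exact rank $k$, the column space of $\mathbf{M}^{g}\mathbf{G}$ coincides with the leading singular subspace almost surely, so the subspace error above is at the level of floating-point rounding. With additive noise, larger $g$ suppresses the noise directions, which is the regime the abstract analyzes.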
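In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To better understand the role of implicit regularization without excessive technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the step size is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In addition, our experimental results support our theoretical findings and demonstrate that our methods empirically outperform classical methods with explicit regularization in terms of both the $\ell_2$-statistical rate and variable selection consistency.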
We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys $m \ge C n^{1.2} r \log n$ for some positive numerical constant C, then with very high probability, most n × n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
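The convex program mentioned in this abstract, which finds the matrix of minimum nuclear norm that fits the observed entries, can be written as follows. The notation is an assumption introduced here: $\Omega$ denotes the set of index pairs of the $m$ observed entries and $X$ is the optimization variable.

$$\min_{X} \ \|X\|_* \quad \text{subject to} \quad X_{ij} = M_{ij} \ \text{ for all } (i,j) \in \Omega,$$

where $\|X\|_*$ is the nuclear norm, i.e., the sum of the singular values of $X$.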
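Finding the unique low-dimensional decomposition of a given matrix is a fundamental and recurring problem in many areas. In this paper, we study the problem of seeking a unique decomposition of a low-rank matrix $Y \in \mathbb{R}^{p \times n}$. Specifically, we consider $Y = AX \in \mathbb{R}^{p \times n}$, where the matrix $A \in \mathbb{R}^{p \times r}$ has full column rank, with $r < \min\{n, p\}$, and the matrix $X \in \mathbb{R}^{r \times n}$ is element-wise sparse. We prove that this sparse decomposition of $Y$ can be uniquely identified, up to some intrinsic signed permutation. Our approach relies on solving a nonconvex optimization problem constrained over the unit sphere. Our geometric analysis of the nonconvex optimization landscape shows that any strict local solution is close to the ground truth solution, and can be recovered by a simple data-driven initialization followed by any second-order descent algorithm. Finally, we corroborate these theoretical results with numerical experiments.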
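Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if the actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in $d$ dimensions and $N$ hidden neurons. We assume that both the sample size $n$ and the dimension $d$ are large, and that they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime $Nd \gg n$. This characterization implies, as a corollary, that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as $Nd \gg n$, and therefore the network can exactly interpolate arbitrary labels in the same regime. Our second main result is a characterization of the generalization error of NT ridge regression including, as a special case, min-$\ell_2$ norm interpolation. We prove that, as soon as $Nd \gg n$, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is in turn well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a "self-induced" term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular on $\log n / \log d$).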
The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard, because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is Ω(r(m + n) log mn), where m, n are the dimensions of the matrix, and r is its rank. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this pre-existing concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
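This paper formulates a general cross-validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as trend filtering and dyadic CART. The resulting cross-validated versions are shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analysis of cross-validated versions of trend filtering or dyadic CART existed. To illustrate the generality of the framework, we also propose and study cross-validated versions of two fundamental estimators: the lasso for high-dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas of Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods that use tuning parameters.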
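In this paper, we tackle a significant challenge in PCA: heterogeneity. When data are collected from different sources with heterogeneous trends while still sharing some congruency, it is critical to extract shared knowledge while retaining the unique features of each source. To this end, we propose personalized PCA (PerPCA), which uses mutually orthogonal global and local principal components to encode both unique and shared features. We show that, under mild conditions, both unique and shared features can be identified and recovered by a constrained optimization problem, even if the covariance matrices are immensely different. In addition, we design a fully federated algorithm inspired by distributed Stiefel gradient descent to solve the problem. The algorithm introduces a new group of operations called generalized retractions to handle orthogonality constraints, and only requires the global PCs to be shared across sources. We prove the linear convergence of the algorithm under suitable assumptions. Comprehensive numerical experiments highlight PerPCA's superior performance in feature extraction and prediction from heterogeneous datasets. As a systematic approach to decoupling shared and unique features from heterogeneous datasets, PerPCA finds applications in several tasks, including video segmentation, topic extraction, and distributed clustering.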
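We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm and is near minimax optimal. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates. In our analysis, we introduce a natural linear algebraic condition between the in-sample and out-of-sample covariates, which allows us to avoid distributional assumptions. Our simulations illustrate the importance of this condition for generalization, even under covariate shift. As a byproduct, our results also lead to novel results for the synthetic controls literature, a leading approach for policy evaluation. In particular, our minimax results suggest the attractiveness of PCR-based methods among the numerous variants. To the best of our knowledge, prediction guarantees for the fixed design setting have been elusive in both the high-dimensional error-in-variables and synthetic controls literatures.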
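We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum $\ell_1$-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.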
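We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove that its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-Cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-Cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions.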
We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by graph operators depending on the underlying graph topology. We address the nonlinearity of the problem by embedding it into a higher-dimensional space of suitable block-Hankel matrices, where it becomes a low-rank matrix completion problem, even if $\mathbf{A}$ is of full rank. For both a uniform and an adaptive random space-time sampling model, we quantify the recoverability of the transition operator via suitable measures of incoherence of these block-Hankel embedding matrices. For graph transition operators these measures of incoherence depend on the interplay between the dynamics and the graph topology. We develop a suitable non-convex iterative reweighted least squares (IRLS) algorithm, establish its quadratic local convergence, and show that, in optimal scenarios, no more than $\mathcal{O}(rn \log(nT))$ space-time samples are sufficient to ensure accurate recovery of a rank-$r$ operator $\mathbf{A}$ of size $n \times n$. This establishes that spatial samples can be substituted by a comparable number of space-time samples. We provide an efficient implementation of the proposed IRLS algorithm with space complexity of order $O(r n T)$ and per-iteration time complexity linear in $n$. Numerical experiments for transition operators based on several graph models confirm that the theoretical findings accurately track empirical phase transitions, and illustrate the applicability and scalability of the proposed algorithm.