Smart Paper Notes

Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay

Yuetian Luo , Anru R. Zhang

Categories: Machine Learning | Machine Learning (Statistics)

2022-06-17

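We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates through a low Tucker rank parameter tensor/matrix, without prior knowledge of its intrinsic rank. We propose Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantee for general tensor-on-tensor regression by showing that RGD and RGN converge linearly and quadratically, respectively, to a statistically optimal estimate in both the correctly-parameterized and over-parameterized rank settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without any modification to their implementation. We also give the first rigorous evidence for the statistical-computational gap in scalar-on-tensor regression under the low-degree polynomials framework. Our theory demonstrates a ``blessing of statistical-computational gap'' phenomenon: in a wide range of scenarios in tensor-on-tensor regression for tensors of order three or higher, the sample size required for computation matches the sample size needed under moderate rank over-parameterization when considering computationally feasible estimators, while there is no such benefit in the matrix setting. This shows that moderate rank over-parameterization is essentially ``cost-free'' in terms of sample size for tensor-on-tensor regression of order three or higher. Finally, we conduct simulation studies to show the advantages of our proposed methods and to corroborate our theoretical findings.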


Low-rank Tensor Estimation via Riemannian Gauss-Newton: Statistical Optimality and Second-Order Convergence

Yuetian Luo , Anru R. Zhang

Categories: Machine Learning (Statistics) | Machine Learning

2021-04-24

In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We consider an efficient Riemannian Gauss-Newton (RGN) method for low Tucker rank tensor estimation. Different from the generic (super)linear convergence guarantee of RGN in the literature, we prove the first local quadratic convergence guarantee of RGN for low-rank tensor estimation in the noisy setting under some regularity conditions and provide the corresponding estimation error upper bounds. A deterministic estimation error lower bound, which matches the upper bound, is provided that demonstrates the statistical optimality of RGN. The merit of RGN is illustrated through two machine learning applications: tensor regression and tensor SVD. Finally, we provide the simulation results to corroborate our theoretical findings.


Recursive Importance Sketching for Rank Constrained Least Squares: Algorithms and High-order Convergence

Yuetian Luo , Wen Huang , Xudong Li , Anru R. Zhang

Categories: Machine Learning | Machine Learning (Statistics)

2020-11-17

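In this paper, we propose the Recursive Importance Sketching algorithm for Rank constrained least squares Optimization (RISRO). The key step of RISRO is recursive importance sketching, a new sketching framework based on deterministically designed recursive projections, which differs significantly from the randomized sketching in the literature \citep{Mahoney2011randomized, Woodruff2014sketching}. Several existing algorithms in the literature can be reinterpreted under this new sketching framework, and RISRO offers clear advantages over them. RISRO is easy to implement and computationally efficient: the core procedure in each iteration is to solve a dimension-reduced least squares problem. We establish local quadratic-linear and quadratic rates of convergence for RISRO under some mild conditions. We also discover a connection between RISRO and the Riemannian Gauss-Newton algorithm on fixed-rank matrices. The effectiveness of RISRO is demonstrated in two applications in machine learning and statistics: low-rank matrix trace regression and phase retrieval. Simulation studies demonstrate the superior numerical performance of RISRO.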


Tensor Principal Component Analysis in High Dimensional CP Models

Yuefeng Han , Cun-Hui Zhang

Categories: Machine Learning (Statistics) | Machine Learning

2021-08-10

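The CP decomposition of high-dimensional non-orthogonal spiked tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantees typically assume restrictive incoherence conditions on the basis vectors of the CP components. In this paper, we propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition, with theoretical guarantees under mild incoherence conditions. The composite PCA applies principal component or singular value decomposition twice: first to a matrix unfolding of the tensor data to obtain singular vectors, and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization scheme for tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections onto the orthogonal complements of the spaces generated by the other CP components in the other modes. It is designed to improve the alternating least squares estimator and other forms of high-order orthogonal iteration for tensors with low or moderately high CP rank, and it is guaranteed to converge rapidly when the error of any given initial estimator is bounded by a small constant. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Our implementations on synthetic data demonstrate significant practical advantages of our approach over existing methods.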


Nonconvex Factorization and Manifold Formulations are Almost Equivalent in Low-rank Matrix Optimization

Yuetian Luo , Xudong Li , Anru R. Zhang

Categories: Machine Learning | Machine Learning (Statistics)

2021-08-03

In this paper, we consider the geometric landscape connection of the widely studied manifold and factorization formulations in low-rank positive semidefinite (PSD) and general matrix optimization. We establish a sandwich relation on the spectrum of Riemannian and Euclidean Hessians at first-order stationary points (FOSPs). As a result of that, we obtain an equivalence on the set of FOSPs, second-order stationary points (SOSPs) and strict saddles between the manifold and the factorization formulations. In addition, we show the sandwich relation can be used to transfer more quantitative geometric properties from one formulation to another. Similarities and differences in the landscape connection under the PSD case and the general case are discussed. To the best of our knowledge, this is the first geometric landscape connection between the manifold and the factorization formulations for handling rank constraints, and it provides a geometric explanation for the similar empirical performance of factorization and manifold approaches in low-rank matrix optimization observed in the literature. In the general low-rank matrix optimization, the landscape connection of two factorization formulations (unregularized and regularized ones) is also provided. By applying these geometric landscape connections, in particular, the sandwich relation, we are able to solve unanswered questions in literature and establish stronger results in the applications on geometric analysis of phase retrieval, well-conditioned low-rank matrix optimization, and the role of regularization in factorization arising from machine learning and signal processing.


Nonconvex Matrix Factorization is Geodesically Convex: Global Landscape Analysis for Fixed-rank Matrix Optimization From a Riemannian Perspective

Yuetian Luo , Nicolas Garcia Trillos

Categories: Machine Learning

2022-09-29

We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry in a search space that has a total space equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining regions, where the factorized objective has a large gradient. To our best knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers such that the factorized objective is geodesically convex. To prove our results we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions are also based on a result of independent interest stating that the geodesic ball centered at Y with a radius 1/3 of the least singular value of Y is a geodesically convex set under the Riemannian quotient geometry, which as a corollary, also implies a quantitative bound of the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.
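As a rough illustration of what "vanilla gradient descent under the Burer-Monteiro factorization" can look like, the sketch below minimizes the least squares objective $\frac{1}{4}\|YY^\top - M\|_F^2$ over a factor $Y$. It is not the authors' code: the function name `bm_gradient_descent`, the step size rule, the small random initialization, and the problem sizes are all assumptions chosen for illustration.

```python
import numpy as np

def bm_gradient_descent(M, r, num_iters=5000, seed=0):
    """Plain gradient descent on f(Y) = 0.25 * ||Y Y^T - M||_F^2.

    M is a symmetric PSD matrix and r is the assumed rank of the factor Y.
    The step size and initialization scale are illustrative choices only.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    # Conservative step size relative to the spectral norm of M (assumption).
    step = 0.1 / np.linalg.norm(M, 2)
    # Small random initialization near the origin (assumption).
    Y = 1e-2 * rng.standard_normal((n, r))
    for _ in range(num_iters):
        # Euclidean gradient of f with respect to Y (M symmetric).
        grad = (Y @ Y.T - M) @ Y
        Y -= step * grad
    return Y

# Build a rank-2 PSD target and try to recover it through its factor.
rng = np.random.default_rng(1)
Y_star = rng.standard_normal((20, 2))
M = Y_star @ Y_star.T
Y_hat = bm_gradient_descent(M, r=2)
rel_err = np.linalg.norm(Y_hat @ Y_hat.T - M) / np.linalg.norm(M)
```

Note that `Y_hat` can only match `Y_star` up to an orthogonal transformation, which is why the error is measured on the product `Y_hat @ Y_hat.T`; this invariance is what the quotient geometry in the abstract accounts for.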


An Interpretable and Efficient Infinite-Order Vector Autoregressive Model for High-Dimensional Time Series

Yao Zheng , Shibo Li

Categories: Machine Learning (Statistics)

2022-09-02

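As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model that not only avoids the drawbacks of the VARMA model but also inherits its favorable temporal patterns. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters that determine the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved without sacrificing any temporal information. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding non-asymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.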


Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models

Yihan Zhang , Marco Mondelli , Ramji Venkataramanan

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-21

In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite the wide applicability, their design is still obtained via heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics on spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. By doing so, we are able to optimize the design of the spectral method, and combine it with a simple linear estimator, in order to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval display the advantage enabled by our analysis over existing designs of spectral methods.


Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements

Tian Tong , Cong Ma , Ashley Prater-Bennette , Erin Tripp , Yuejie Chi

Categories: Machine Learning | Machine Learning (Statistics)

2021-04-29

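Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields of science and engineering. A fundamental task is to faithfully recover a tensor from highly incomplete measurements in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm that directly recovers the tensor factors with tailored spectral initializations, and shows that it converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems, tensor completion and tensor regression, as soon as the sample size is above the order of $n^{3/2}$ ignoring other parameter dependencies, where $n$ is the dimension of the tensor. This leads to an extremely scalable approach to low-rank tensor estimation compared with prior art, which suffers from at least one of the following drawbacks: extreme sensitivity to ill-conditioning, high per-iteration costs in terms of memory and computation, or poor sample complexity guarantees. To the best of our knowledge, ScaledGD is the first algorithm that simultaneously achieves near-optimal statistical and computational complexities for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.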


Functional Linear Regression of Cumulative Distribution Functions

Qian Zhang , Anuran Makur , Kamyar Azizzadenesheli

Categories: Machine Learning

2022-05-28

The estimation of cumulative distribution functions (CDFs) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional regression of contextual CDFs where each data point is sampled from a linear combination of context dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde{O}(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases. We also derive matching information theoretic lower bounds, establishing minimax optimality for CDF functional regression. Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatched error, and show that the estimators are well-behaved under model mismatch. Finally, to complete our study, we formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish self-normalized estimation error upper bounds for this setting.


Sharp Analysis of Sketch-and-Project Methods via a Connection to Randomized Singular Value Decomposition

Michał Dereziński , Elizaveta Rebrova

Categories: Machine Learning (Statistics)

2022-08-20

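Sketch-and-project is a framework that unifies many known iterative methods for solving linear systems and their variants, with further extensions to nonlinear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we obtain sharp guarantees for the convergence rate of sketch-and-project methods via new tight spectral bounds for the expected sketched projection matrix. Our estimates reveal a connection between the sketch-and-project convergence rate and the approximation error of another well-known but seemingly unrelated family of algorithms, which use sketching to accelerate popular matrix factorizations such as QR and SVD. This connection brings us closer to precisely quantifying how the performance of sketch-and-project solvers depends on their sketch size. Our analysis covers not only Gaussian and sub-Gaussian sketching matrices, but also a family of efficient sparse sketching methods known as LESS embeddings. Our experiments back up the theory and demonstrate that even extremely sparse sketches show the same convergence properties in practice.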
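Randomized Kaczmarz, one of the methods the abstract lists as an instance of sketch-and-project, is simple enough to sketch in a few lines. The code below is a minimal illustration rather than anything taken from the paper: the function name `randomized_kaczmarz`, the iteration count, and the test problem are assumptions, and rows are sampled with probability proportional to their squared norms, which is one common variant.

```python
import numpy as np

def randomized_kaczmarz(A, b, num_iters=5000, seed=0):
    """Approximately solve a consistent linear system A x = b.

    Each step samples one row (a sketch of size one) and projects the
    current iterate onto the hyperplane defined by that single equation.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A * A, axis=1)
    # Sample rows with probability proportional to their squared norm.
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.choice(m, p=probs)
        residual = b[i] - A[i] @ x
        # Orthogonal projection onto {x : A[i] @ x = b[i]}.
        x += (residual / row_norms_sq[i]) * A[i]
    return x

# Small consistent overdetermined system for illustration.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b)
```

Replacing the single sampled row with a random sketching matrix applied to several rows at once gives the general sketch-and-project step; the sketch size is then the quantity whose effect on convergence the abstract discusses.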


Debiased Machine Learning of Set-Identified Linear Models

Vira Semenova

Categories: Machine Learning (Statistics) | Machine Learning

2017-12-28

This paper provides estimation and inference methods for an identified set's boundary (i.e., support function) where the selection among a very large number of covariates is based on modern regularized tools. I characterize the boundary using a semiparametric moment equation. Combining Neyman-orthogonality and sample splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the boundary and propose a multiplier bootstrap procedure to conduct inference. I apply this result to the partially linear model, the partially linear IV model and the average partial derivative with an interval-valued outcome.


Stochastic Subgradient Descent Escapes Active Strict Saddles

Pascal Bianchi , Walid Hachem , Sholom Schechtman

Categories: Machine Learning (Statistics)

2021-08-04

In non-smooth stochastic optimization, we establish the non-convergence of the stochastic subgradient descent (SGD) to the critical points recently called active strict saddles by Davis and Drusvyatskiy. Such points lie on a manifold $M$ where the function $f$ has a direction of second-order negative curvature. Off this manifold, the norm of the Clarke subdifferential of $f$ is lower-bounded. We require two conditions on $f$. The first assumption is a Verdier stratification condition, which is a refinement of the popular Whitney stratification. It allows us to establish a reinforced version of the projection formula of Bolte \emph{et al.} for Whitney stratifiable functions, which is of independent interest. The second assumption, termed the angle condition, allows us to control the distance of the iterates to $M$. When $f$ is weakly convex, our assumptions are generic. Consequently, generically in the class of definable weakly convex functions, the SGD converges to a local minimizer.


Understanding Implicit Regularization in Over-Parameterized Single Index Model

Jianqing Fan , Zhuoran Yang , Mengxin Yu

Categories: Machine Learning (Statistics) | Machine Learning

2020-07-16

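In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To better understand the role of implicit regularization without excessive technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the step size is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In addition, our experimental results support our theoretical findings and demonstrate that our methods empirically outperform classical methods with explicit regularization in terms of both the $\ell_2$-statistical rate and variable selection consistency.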


On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization

Fei Jiang , Yeqing Zhou , Jianxuan Liu , Yanyuan Ma

Categories: Machine Learning

2022-12-31

We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Treating the high dimensional issue further leads us to augment the target function with an amenable penalty term. We propose to estimate the regression parameter by minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members of the subset. We examine the finite sample performance of the proposed tests by extensive simulation. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.


Unique sparse decomposition of low rank matrices

Dian Jin , Xin Bing , Yuqian Zhang

Categories: Machine Learning

2021-06-14

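Finding the unique low-dimensional decomposition of a given matrix is a fundamental and recurring problem in many areas. In this paper, we study the problem of seeking a unique decomposition of a low-rank matrix $Y \in \mathbb{R}^{p \times n}$. Specifically, we consider $Y = AX \in \mathbb{R}^{p \times n}$, where the matrix $A \in \mathbb{R}^{p \times r}$ has full column rank, with $r < \min\{n, p\}$, and the matrix $X \in \mathbb{R}^{r \times n}$ is element-wise sparse. We prove that this sparse decomposition of $Y$ can be uniquely identified, up to some intrinsic signed permutation. Our approach relies on solving a nonconvex optimization problem constrained over the unit sphere. Our geometric analysis of the nonconvex optimization landscape shows that any {\em strict} local solution is close to the ground truth solution, and can be recovered by a simple data-driven initialization followed by any second-order descent algorithm. Finally, we corroborate these theoretical results with numerical experiments.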


Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics

Wenhao Yang , Liangyu Zhang , Zhihua Zhang

Categories: Machine Learning (Statistics) | Machine Learning

2021-05-09

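In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov decision processes (MDPs), where the optimal robust policy and value function are solved only from a generative model. While prior work on the non-asymptotic performance of robust MDPs is restricted to the setting of the KL uncertainty set and the $(s,a)$-rectangular assumption, we improve those results and also consider other uncertainty sets, including $L_1$ and $\chi^2$ balls. Our results show that when we assume $(s,a)$-rectangularity on the uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2\rho^2(1-\gamma)^4}\right)$. In addition, we extend our results from the $(s,a)$-rectangular assumption to the $s$-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty set and is generally larger than under the $(s,a)$-rectangular assumption. Moreover, we show, from both theoretical and empirical perspectives, that the optimal robust value function is asymptotically normal with a typical rate of $\sqrt{n}$ under both the $(s,a)$-rectangular and $s$-rectangular assumptions.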


Optimal high-dimensional and nonparametric distributed testing under communication constraints

Botond Szabó , Lasse Vuursteen , Harry van Zanten

Categories: Machine Learning (Statistics)

2022-02-02

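We derive minimax testing errors in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to $b$ bits. We investigate both the $d$-dimensional and the infinite-dimensional signal detection problem under Gaussian white noise. We also derive distributed testing algorithms that attain the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed estimation. Among our findings, we show that testing protocols with access to shared randomness can perform strictly better in some regimes than those without. We also observe that consistent nonparametric distributed testing is always possible, even with as little as $1$ bit of communication, and that the corresponding test outperforms the best local test that uses only the information available at a single local machine. Furthermore, we derive adaptive nonparametric distributed testing strategies and the corresponding theoretical lower bounds.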


Near optimal sample complexity for matrix and tensor normal models via geodesic convexity

Cole Franks , Rafael Oliveira , Akshay Ramachandran , Michael Walter

Categories: Machine Learning

2021-10-14

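The matrix normal model, the family of Gaussian matrix-variate distributions whose covariance matrix is the Kronecker product of two lower-dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor normal models. We show non-asymptotic bounds for the error achieved by the maximum likelihood estimator (MLE) in several natural metrics. In contrast to existing bounds, our results do not rely on the factors being well-conditioned or sparse. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and the overall covariance matrix are minimax optimal up to constant factors, provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that an iterative procedure for computing the MLE, known as the flip-flop algorithm, converges linearly with high probability. Our main tool is geodesic strong convexity in the geometry of positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels. We also provide numerical evidence that combining the flip-flop algorithm with a simple shrinkage estimator can improve performance in the undersampled regime.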


Generalization Bounds for Inductive Matrix Completion in Low-noise Settings

Antoine Ledent , Rodrigo Alves , Yunwen Lei , Yann Guermeur , Marius Kloft

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-16

We study inductive matrix completion (matrix completion with side information) under an i.i.d. subgaussian noise assumption at a low noise regime, with uniform sampling of the entries. We obtain for the first time generalization bounds with the following three properties: (1) they scale like the standard deviation of the noise and in particular approach zero in the exact recovery case; (2) even in the presence of noise, they converge to zero when the sample size approaches infinity; and (3) for a fixed dimension of the side information, they only have a logarithmic dependence on the size of the matrix. Differently from many works in approximate recovery, we present results both for bounded Lipschitz losses and for the absolute loss, with the latter relying on Talagrand-type inequalities. The proofs create a bridge between two approaches to the theoretical analysis of matrix completion, since they consist in a combination of techniques from both the exact recovery literature and the approximate recovery literature.
