This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions of higher-order tensors (i.e., N-way arrays with N ≥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal component analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2, as well as nonnegative variants of all of the above. The N-way Toolbox, Tensor Toolbox, and Multilinear Engine are examples of software packages for working with tensors.

1. Introduction. A tensor is a multidimensional array. More formally, an N-way or Nth-order tensor is an element of the tensor product of N vector spaces, each of which has its own coordinate system. This notion of tensors is not to be confused with tensors in physics and engineering (such as stress tensors) [175], which are generally referred to as tensor fields in mathematics [69]. A third-order tensor has three indices, as shown in Figure 1.1. A first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three or higher are called higher-order tensors. The goal of this survey is to provide an overview of higher-order tensors and their decompositions. Though there has been active research on tensor decompositions and models (i.e., decompositions applied to data arrays for extracting and explaining their properties) for the past four decades, very little of this work has been published in applied mathematics journals.
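To make the CP model concrete, the following is a minimal NumPy sketch (not code from the survey or from the toolboxes it mentions) of a rank-R CP fit by alternating least squares for a third-order tensor; the helper names, the unfolding convention, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def khatri_rao(U, V):
    """Columnwise Kronecker product: column r is kron(U[:, r], V[:, r])."""
    return (U[:, None, :] * V[None, :, :]).reshape(U.shape[0] * V.shape[0], -1)

def unfold(T, mode):
    """Mode-n unfolding: rows indexed by `mode`, remaining modes flattened (C order)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_als(T, R, n_iter=200, seed=0):
    """Fit T ~ sum_r a_r (outer) b_r (outer) c_r (rank-R CP) by alternating least squares."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((T.shape[0], R))
    B = rng.standard_normal((T.shape[1], R))
    C = rng.standard_normal((T.shape[2], R))
    for _ in range(n_iter):
        # Each subproblem is linear least squares with a Khatri-Rao coefficient matrix.
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C)).T
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C)).T
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

A practical implementation would typically solve the subproblems through the small R x R Gram matrices rather than a pseudoinverse, and renormalize the factor columns after each sweep.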
The widespread use of multi-sensor technology and the emergence of big datasets have highlighted the limitations of standard flat-view matrix models and the necessity to move towards more versatile data analysis tools. We show that higher-order tensors (i.e., multiway arrays) enable such a fundamental paradigm shift towards models that are essentially polynomial and whose uniqueness, unlike the matrix methods, is guaranteed under very mild and natural conditions. Benefiting from the power of multilinear algebra as their mathematical backbone, data analysis techniques using tensor decompositions are shown to have great flexibility in the choice of constraints that match data properties, and to find more general latent components in the data than matrix-based methods. A comprehensive introduction to tensor decompositions is provided from a signal processing perspective, starting from the algebraic foundations, via basic Canonical Polyadic and Tucker models, through to advanced cause-effect and multi-view data analysis schemes. We show that tensor decompositions enable natural generalizations of some commonly used signal processing paradigms, such as canonical correlation and subspace techniques, signal separation, linear regression, feature extraction and classification. We also cover computational aspects, and point out how ideas from compressed sensing and scientific computing may be used for addressing the otherwise unmanageable storage and manipulation problems associated with big datasets. The concepts are supported by illustrative real-world case studies illuminating the benefits of the tensor framework, as efficient and promising tools for modern signal processing, data analysis and machine learning applications; these benefits also extend to vector/matrix data through tensorization.
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research showing that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. The paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.
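As an illustration of the proto-algorithm described here (random sampling of the range, followed by deterministic factorization of the compressed matrix), below is a hedged NumPy sketch of a basic randomized SVD; the oversampling amount and the absence of power iterations are simplifying assumptions.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Basic randomized SVD: sample the range of A with a Gaussian test matrix,
    orthonormalize the sample, then factor the small compressed matrix."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)             # orthonormal basis capturing the action of A
    B = Q.T @ A                                # compress A to the identified subspace
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]   # truncate to the target rank
```

For matrices whose spectrum decays slowly, a few power iterations (multiplying the sample by A and its transpose a couple of extra times before the QR step) noticeably sharpen the approximation.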
We propose a general algorithmic framework for constrained matrix and tensor factorization, which is widely used in signal processing and machine learning. The new framework is a hybrid between alternating optimization (AO) and the alternating direction method of multipliers (ADMM): each matrix factor is updated in turn, using ADMM, hence the name AO-ADMM. This combination can naturally accommodate a great variety of constraints on the factor matrices, and almost all possible loss measures for the fitting. Computation caching and warm start strategies are used to ensure that each update is evaluated efficiently, while the outer AO framework exploits recent developments in block coordinate descent (BCD)-type methods which help ensure that every limit point is a stationary point, as well as faster and more robust convergence in practice. Three special cases are studied in detail: non-negative matrix/tensor factorization, constrained matrix/tensor completion, and dictionary learning. Extensive simulations and experiments with real data are used to showcase the effectiveness and broad applicability of the proposed framework.
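The following is a minimal sketch of the AO-ADMM idea for its simplest special case, nonnegative matrix factorization: each factor is updated by a few warm-started ADMM iterations on a nonnegativity-constrained least-squares subproblem, with the Cholesky factor cached across the inner iterations. The step-size heuristic and iteration counts are my own illustrative choices, not the paper's reference settings.

```python
import numpy as np

def admm_nnls(X, W, Hbar, U, n_admm=5):
    """A few ADMM steps for min_H 0.5*||X - W H||_F^2 s.t. H >= 0,
    warm-started at (Hbar, U); returns the updated nonnegative iterate and dual."""
    G = W.T @ W
    rho = np.trace(G) / G.shape[0]                        # common step-size heuristic
    L = np.linalg.cholesky(G + rho * np.eye(G.shape[0]))  # cached across inner iterations
    WtX = W.T @ X
    for _ in range(n_admm):
        H = np.linalg.solve(L.T, np.linalg.solve(L, WtX + rho * (Hbar - U)))
        Hbar = np.maximum(H + U, 0.0)                     # projection onto H >= 0
        U = U + H - Hbar                                  # scaled dual update
    return Hbar, U

def ao_admm_nmf(X, r, n_outer=100, seed=0):
    """Alternating optimization: cycle over the two factors, each updated by ADMM."""
    rng = np.random.default_rng(seed)
    W = np.abs(rng.standard_normal((X.shape[0], r)))
    H = np.abs(rng.standard_normal((r, X.shape[1])))
    UH, UW = np.zeros_like(H), np.zeros_like(W)
    for _ in range(n_outer):
        H, UH = admm_nnls(X, W, H, UH)                    # update H with W fixed
        Wt, UWt = admm_nnls(X.T, H.T, W.T, UW.T)          # update W with H fixed
        W, UW = Wt.T, UWt.T
    return W, H
```

Swapping the projection step for a different proximal operator is how the framework accommodates other constraints (e.g., simplex or sparsity) without changing the rest of the update.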
Fourier PCA is Principal Component Analysis of a matrix obtained from higher order derivatives of the logarithm of the Fourier transform of a distribution. We make this method algorithmic by developing a tensor decomposition method for a pair of tensors sharing the same vectors in rank-1 decompositions. Our main application is the first provably polynomial-time algorithm for underdetermined ICA, i.e., learning an n × m matrix A from observations y = Ax where x is drawn from an unknown product distribution with arbitrary non-Gaussian components. The number of component distributions m can be arbitrarily higher than the dimension n and the columns of A only need to satisfy a natural and efficiently verifiable nondegeneracy condition. As a second application, we give an alternative algorithm for learning mixtures of spherical Gaussians with linearly independent means. These results also hold in the presence of Gaussian noise.
Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps somewhat surprisingly, the understanding of model identifiability, the major reason behind interpretability in many applications such as topic mining and hyperspectral imaging, had been rather limited until recent years. Starting from around 2010, research on NMF identifiability has made considerable progress: many interesting and important results have been discovered by the signal processing (SP) and machine learning (ML) communities. NMF identifiability has a great impact on many aspects of practice, such as avoiding ill-posed formulations and designing algorithms with performance guarantees. On the other hand, no tutorial paper introduces NMF from an identifiability viewpoint. In this paper, we aim to fill this gap by offering a comprehensive and deep tutorial on the model identifiability of NMF, together with its connections to algorithms and applications. This tutorial will help researchers and graduate students grasp the essence and insights of NMF, thereby avoiding the typical 'pitfalls' that are often caused by unidentifiable NMF formulations. The paper will also help practitioners pick or design suitable factorization tools for their own problems.
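To see why identifiability needs extra conditions, here is a tiny self-contained numerical illustration (mine, not the paper's): two genuinely different nonnegative factorizations of the same strictly positive matrix, obtained by inserting an invertible matrix Q that is neither a permutation nor a diagonal scaling. The matrix sizes and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(0.5, 1.5, size=(6, 2))      # strictly positive ground-truth factors
H = rng.uniform(0.5, 1.5, size=(2, 5))
X = W @ H

Q = np.array([[1.0, 0.2],                   # invertible, not a scaled permutation
              [0.2, 1.0]])
W2, H2 = W @ np.linalg.inv(Q), Q @ H        # an alternative factor pair

print(np.allclose(W2 @ H2, X))              # True: same data matrix
print((W2 >= 0).all() and (H2 >= 0).all())  # True: also a valid NMF of X
```

Both checks print True, so the plain NMF model cannot distinguish (W, H) from (W2, H2); conditions such as separability or the sufficiently scattered condition discussed in the tutorial are what rule out such ambiguities up to permutation and scaling.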
Substantial progress has been made recently in developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms because of their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footing, however, has been established only recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, consisting of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including, but not limited to, matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special attention is paid to illustrating the key technical insights underlying their analyses. This article serves as a testament that integrated thinking about optimization and statistics leads to fruitful research findings.
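As a toy instance of the two-stage recipe reviewed here, the sketch below applies spectral initialization followed by gradient descent on the factored objective to low-rank matrix completion; the step size, scaling, and iteration count are illustrative placeholders rather than the tuned choices analyzed in the literature.

```python
import numpy as np

def complete(M_obs, mask, r, p, n_iter=500, eta=0.01):
    """Two-stage nonconvex matrix completion.
    M_obs: zero-filled observed matrix; mask: 0/1 observation pattern;
    p: observation probability; r: target rank."""
    # Stage 1: spectral initialization. Rescaling by 1/p makes the zero-filled
    # matrix an unbiased estimate of the full low-rank matrix.
    U0, s, V0t = np.linalg.svd(M_obs / p, full_matrices=False)
    U = U0[:, :r] * np.sqrt(s[:r])
    V = V0t[:r, :].T * np.sqrt(s[:r])
    # Stage 2: gradient descent on f(U, V) = ||P_Omega(U V^T - M)||_F^2 / (2p).
    for _ in range(n_iter):
        Res = mask * (U @ V.T - M_obs) / p          # residual on observed entries only
        U, V = U - eta * (Res @ V), V - eta * (Res.T @ U)
    return U, V
```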
This survey highlights recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby, given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties. Much of the expensive computation can then be performed on the smaller matrix, thereby accelerating the solution of the original problem. In this survey we consider least-squares and robust regression problems, low-rank approximation, and graph sparsification. We also discuss several variants of these problems. Finally, we discuss the limitations of sketching methods.
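A hedged example of the simplest use of linear sketching mentioned here is sketch-and-solve least squares: compress the tall problem with a random sketching matrix and solve the small problem exactly. The Gaussian sketch and the sketch size below are placeholder choices; in practice, structured sketches (e.g., CountSketch or subsampled randomized Hadamard transforms) are used for speed.

```python
import numpy as np

def sketched_lstsq(A, b, sketch_size, seed=0):
    """Sketch-and-solve for min_x ||A x - b||_2 with a tall A (m >> n)."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((sketch_size, A.shape[0])) / np.sqrt(sketch_size)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)   # solve the compressed problem
    return x
```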
This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and robustness; numerical optimization including sparse Newton methods, linearly convergent approximations, updating and recursive methods; gauge (datum) invariance; and quality control. The theory is developed for general robust cost functions rather than restricting attention to traditional nonlinear least squares.
Estimating the joint probability mass function (PMF) of a set of random variables lies at the heart of statistical learning and signal processing. Without structural assumptions, such as modeling the variables as a Markov chain, a tree, or another graphical model, joint PMF estimation is often considered mission impossible, since the number of unknowns grows exponentially with the number of variables. But who gives us the structural model? Is there a generic, "nonparametric" way to control the complexity of a joint PMF without relying on a priori structural assumptions about the underlying probability model? Is it possible to discover the operational structure without biasing the analysis up front? Can we still estimate the joint PMF of all the variables if we only observe random subsets of them? This paper shows, perhaps surprisingly, that if the joint PMF of any three variables can be estimated, then the joint PMF of all the variables can be provably recovered under relatively mild conditions. The result is reminiscent of Kolmogorov's extension theorem: a consistent specification of lower-dimensional distributions induces a unique probability measure for the entire process. The difference is that for processes of limited complexity (the rank of the high-dimensional PMF), a complete characterization can be obtained from the three-dimensional distributions alone. In fact, not all three-dimensional PMFs are needed, and under more stringent conditions even two-dimensional ones suffice. Exploiting multilinear algebra, the paper proves that such high-dimensional PMF completion can be guaranteed, and derives several pertinent identifiability results. It also provides a practical and efficient algorithm to perform the recovery task. Judiciously designed simulations and real-data experiments on movie recommendation and data classification are presented to showcase the effectiveness of the approach.
Iterative constant modulus algorithms such as Godard and CMA have been used to blindly separate a superposition of co-channel constant modulus (CM) signals impinging on an antenna array. These algorithms have certain deficiencies in the context of convergence to local minima and the retrieval of all individual CM signals that are present in the channel. In this paper, we show that the underlying constant modulus factorization problem is, in fact, a generalized eigenvalue problem, and may be solved via a simultaneous diagonalization of a set of matrices. With this new, analytical approach, it is possible to detect the number of CM signals present in the channel, and to retrieve all of them exactly, rejecting other, non-CM signals. Only a modest number of samples is required. The algorithm is robust in the presence of noise, and is tested on measured data collected from an experimental setup.
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, which exploits a certain tensor structure in their low-order observable moments (typically, of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the matrix singular value decomposition. Although tensor decompositions are generally intractable to compute, the decompositions of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iteration and maximization approaches (similar to the matrix case). An analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
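Below is a minimal NumPy sketch of the tensor power method discussed here, specialized to a symmetric third-order tensor that is (approximately) orthogonally decomposable: power iterations with random restarts extract one (eigenvalue, eigenvector) pair, which is then deflated. The restart and iteration counts are illustrative choices, not the settings from the paper's analysis.

```python
import numpy as np

def tensor_apply(T, u):
    """Contract a symmetric 3-way tensor along two modes: T(I, u, u)."""
    return np.einsum('ijk,j,k->i', T, u, u)

def tensor_power_method(T, n_components, n_iter=100, n_restarts=10, seed=0):
    """Recover (eigenvalue, eigenvector) pairs of an (approximately) orthogonally
    decomposable symmetric tensor by power iteration with deflation."""
    T = T.copy()
    d = T.shape[0]
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n_components):
        best_u, best_val = None, -np.inf
        for _ in range(n_restarts):                    # random restarts guard against bad starts
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            for _ in range(n_iter):
                u = tensor_apply(T, u)
                u /= np.linalg.norm(u)
            val = np.einsum('ijk,i,j,k->', T, u, u, u)
            if val > best_val:
                best_u, best_val = u, val
        pairs.append((best_val, best_u))
        # Deflate: subtract the recovered rank-one component lambda * u(outer)u(outer)u.
        T -= best_val * np.einsum('i,j,k->ijk', best_u, best_u, best_u)
    return pairs
```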
In this paper we proposed quasi-Newton and limited memory quasi-Newton methods for objective functions defined on Grassmannians or a product of Grassmannians. Specifically we defined BFGS and L-BFGS updates in local and global coordinates on Grassmannians or a product of these. We proved that, when local coordinates are used, our BFGS updates on Grassmannians share the same optimality property as the usual BFGS updates on Euclidean spaces. When applied to the best multilinear rank approximation problem for general and symmetric tensors, our approach yields fast, robust, and accurate algorithms that exploit the special Grassmannian structure of the respective problems, and which work on tensors of large dimensions and arbitrarily high order. Extensive numerical experiments are included to substantiate our claims.
In this paper, the differential geometry of the novel hierarchical Tucker format for tensors is derived. The set H_{T,k} of tensors with fixed tree T and hierarchical rank k is shown to be a smooth quotient manifold, namely the set of orbits of a Lie group action corresponding to the non-unique basis representation of these hierarchical tensors. Explicit characterizations of the quotient manifold, its tangent space, and the tangent space of H_{T,k} are derived, suitable for high-dimensional problems. The usefulness of a complete geometric description is demonstrated by two typical applications. First, new convergence results for the nonlinear Gauss-Seidel method on H_{T,k} are given. Notably, and in contrast to earlier works on this subject, the task of minimizing the Rayleigh quotient is also addressed. Second, evolution equations for dynamic tensor approximation are formulated in terms of an explicit projection operator onto the tangent space of H_{T,k}. In addition, a numerical comparison is made between this dynamical approach and the standard one based on truncated singular value decompositions.
Hierarchical tensors can be regarded as a generalisation, preserving many crucial features, of the singular value decomposition to higher-order tensors. For a given tensor product space, a recursive decomposition of the set of coordinates into a dimension tree gives a hierarchy of nested subspaces and corresponding nested bases. The dimensions of these subspaces yield a notion of multilinear rank. This rank tuple, as well as quasi-optimal low-rank approximations by rank truncation, can be obtained by a hierarchical singular value decomposition. For fixed multilinear ranks, the storage and operation complexity of these hierarchical representations scale only linearly in the order of the tensor. As in the matrix case, the set of hierarchical tensors of a given multilinear rank is not a convex set, but forms an open smooth manifold. A number of techniques for the computation of hierarchical low-rank approximations have been developed, including local optimisation techniques on Riemannian manifolds as well as truncated iteration methods, which can be applied for solving high-dimensional partial differential equations. This article gives a survey of these developments. We also discuss applications to problems in uncertainty quantification, to the solution of the electronic Schrödinger equation in the strongly correlated regime, and to the computation of metastable states in molecular dynamics.
Matrix factorization is a popular approach for large-scale matrix completion. The optimization formulation based on matrix factorization can be solved very efficiently by standard algorithms in practice. However, due to the non-convexity caused by the factorization model, there is a limited theoretical understanding of this formulation. In this paper, we establish a theoretical guarantee for the factorization formulation to correctly recover the underlying low-rank matrix. In particular, we show that under similar conditions to those in previous works, many standard optimization algorithms converge to the global optima of a factorization formulation, and recover the true low-rank matrix. We study the local geometry of a properly regularized factorization formulation and prove that any stationary point in a certain local region is globally optimal. A major difference of our work from the existing results is that we do not need resampling in either the algorithm or its analysis. Compared to other works on nonconvex optimization, one extra difficulty lies in analyzing nonconvex constrained optimization when the constraint (or the corresponding regularizer) is not "consistent" with the gradient direction. One technical contribution is the perturbation analysis for non-symmetric matrix factorization.
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
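To fix ideas, here is a minimal minibatch SGD example for binary logistic regression, one of the simplest instances of the SG method discussed in the paper; the constant learning rate and batch size are placeholder choices, and a full treatment would also consider diminishing or adaptive stepsizes.

```python
import numpy as np

def sgd_logistic(X, y, n_epochs=20, batch_size=32, lr=0.1, seed=0):
    """Minibatch SGD for binary logistic regression with labels y in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        perm = rng.permutation(n)                        # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-X[idx] @ w))        # predicted probabilities
            grad = X[idx].T @ (p - y[idx]) / len(idx)    # stochastic gradient estimate
            w -= lr * grad
    return w
```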
This work was originally motivated by a classification of tensors proposed by Richard Harshman. In particular, we focus on simple and multiple "bottlenecks", and on "swamps". Existing theoretical results are surveyed, some numerical algorithms are described in detail, and their numerical complexity is calculated. In particular, the interest in using the ELS (enhanced line search) enhancement in these algorithms is discussed. Computer simulations inform this discussion.
The low-tubal-rank tensor model has been recently proposed for real-world multidimensional data. In this paper, we study the low-tubal-rank tensor completion problem, i.e., to recover a third-order tensor by observing a subset of its elements selected uniformly at random. We propose a fast iterative algorithm, called Tubal-Alt-Min, that is inspired by a similar approach for low-rank matrix completion. The unknown low-tubal-rank tensor is represented as the product of two much smaller tensors, with the low-tubal-rank property being automatically incorporated, and Tubal-Alt-Min alternates between estimating those two tensors using tensor least-squares minimization. First, we note that tensor least-squares minimization is different from its matrix counterpart and nontrivial, as the circular convolution operator of the low-tubal-rank tensor model is intertwined with the sub-sampling operator. Second, the theoretical performance guarantee is challenging since Tubal-Alt-Min is iterative and nonconvex in nature. We prove that 1) Tubal-Alt-Min guarantees exponential convergence to the global optimum, and 2) for an n × n × k tensor with tubal-rank r ≪ n, the required sampling complexity is O(n r^2 k log^3 n) and the computational complexity is O(n^2 r k^2 log^2 n). Third, on both synthetic data and real-world video data, evaluation results show that, compared with tensor nuclear norm minimization (TNN-ADMM), Tubal-Alt-Min improves the recovery error dramatically (by orders of magnitude). It is estimated that Tubal-Alt-Min converges at an exponential rate 10^(−0.4423 · Iter), where Iter denotes the number of iterations, which is much faster than TNN-ADMM's 10^(−0.0332 · Iter), and the running time can be accelerated by more than 5 times for a 200 × 200 × 20 tensor.
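For readers unfamiliar with the low-tubal-rank model, the following sketch (my own illustration, not the paper's code) shows the tensor-tensor product (t-product) that underlies it: a circular convolution along the third mode, computed as independent matrix products in the Fourier domain. A tensor of tubal-rank at most r is then one that factors, under the t-product, into two thin tensors of sizes n1 × r × k and r × n2 × k; this is the representation over which Tubal-Alt-Min alternates.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x r x k) and B (r x n2 x k): circular convolution along
    the third mode, computed as per-slice matrix products in the Fourier domain."""
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Cf = np.einsum('ijt,jkt->ikt', Af, Bf)       # one matrix product per Fourier slice
    return np.real(np.fft.ifft(Cf, axis=2))

rng = np.random.default_rng(0)
U = rng.standard_normal((30, 3, 8))              # thin factor tensors
V = rng.standard_normal((3, 30, 8))
X = t_product(U, V)                              # a 30 x 30 x 8 tensor of tubal-rank <= 3
```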