 This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions of higher-order tensors (i.e., N-way arrays with N ≥ 3) have applications in psycho-metrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal component analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox, Tensor Toolbox, and Multilinear Engine are examples of software packages for working with tensors. 1. Introduction. A tensor is a multidimensional array. More formally, an N-way or N th-order tensor is an element of the tensor product of N vector spaces, each of which has its own coordinate system. This notion of tensors is not to be confused with tensors in physics and engineering (such as stress tensors) , which are generally referred to as tensor fields in mathematics . A third-order tensor has three indices, as shown in Figure 1.1. A first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three or higher are called higher-order tensors. The goal of this survey is to provide an overview of higher-order tensors and their decompositions. Though there has been active research on tensor decompositions and models (i.e., decompositions applied to data arrays for extracting and explaining their properties) for the past four decades, very little of this work has been published
translated by 谷歌翻译 The widespread use of multi-sensor technology and the emergence of big datasets has highlighted the limitations of standard flat-view matrix models and the necessity to move towards more versatile data analysis tools. We show that higher-order tensors (i.e., multiway arrays) enable such a fundamental paradigm shift towards models that are essentially polynomial and whose uniqueness, unlike the matrix methods, is guaranteed under very mild and natural conditions. Benefiting from the power of multilinear algebra as their mathematical backbone, data analysis techniques using tensor decompo-sitions are shown to have great flexibility in the choice of constraints that match data properties, and to find more general latent components in the data than matrix-based methods. A comprehensive introduction to tensor decompositions is provided from a signal processing perspective, starting from the algebraic foundations, via basic Canonical Polyadic and Tucker models, through to advanced cause-effect and multi-view data analysis schemes. We show that tensor decompositions enable natural generalizations of some commonly used signal processing paradigms, such as canonical correlation and subspace techniques, signal separation, linear regression, feature extraction and classification. We also cover computational aspects, and point out how ideas from compressed sensing and scientific computing may be used for addressing the otherwise unmanageable storage and manipulation problems associated with big datasets. The concepts are supported by illustrative real world case studies illuminating the benefits of the tensor framework, as efficient and promising tools for modern signal processing, data analysis and machine learning applications; these benefits also extend to vector/matrix data through tensorization.
translated by 谷歌翻译 translated by 谷歌翻译

2009-09-22 translated by 谷歌翻译 We propose a general algorithmic framework for constrained matrix and tensor factorization, which is widely used in signal processing and machine learning. The new framework is a hybrid between alternating optimization (AO) and the alternating direction method of multipliers (ADMM): each matrix factor is updated in turn, using ADMM, hence the name AO-ADMM. This combination can naturally accommodate a great variety of constraints on the factor matrices, and almost all possible loss measures for the fitting. Computation caching and warm start strategies are used to ensure that each update is evaluated efficiently, while the outer AO framework exploits recent developments in block coordinate descent (BCD)-type methods which help ensure that every limit point is a stationary point, as well as faster and more robust convergence in practice. Three special cases are studied in detail: non-negative matrix/tensor factorization, constrained matrix/tensor completion, and dictionary learning. Extensive simulations and experiments with real data are used to showcase the effectiveness and broad applicability of the proposed framework.
translated by 谷歌翻译 Fourier PCA is Principal Component Analysis of a matrix obtained from higher order derivatives of the logarithm of the Fourier transform of a distribution. We make this method algorithmic by developing a tensor decomposition method for a pair of tensors sharing the same vectors in rank-1 decompositions. Our main application is the first provably polynomial-time algorithm for underdetermined ICA, i.e., learning an n × m matrix A from observations y = Ax where x is drawn from an unknown product distribution with arbitrary non-Gaussian components. The number of component distributions m can be arbitrarily higher than the dimension n and the columns of A only need to satisfy a natural and efficiently verifiable nondegeneracy condition. As a second application, we give an alternative algorithm for learning mixtures of spherical Gaussians with linearly independent means. These results also hold in the presence of Gaussian noise. *
translated by 谷歌翻译

2018-03-03 translated by 谷歌翻译

2018-09-25 translated by 谷歌翻译 translated by 谷歌翻译 This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and robustness; numerical optimization including sparse Newton methods, linearly convergent approximations, updating and recursive methods; gauge (datum) invariance; and quality control. The theory is developed for general robust cost functions rather than restricting attention to traditional nonlinear least squares.
translated by 谷歌翻译

2017-12-01 translated by 谷歌翻译 Iterative constant modulus algorithms such as Godard and CMA have been used to blindly separate a superposition of co-channel constant modulus (CM) signals impinging on an antenna array. These algorithms have certain deficiencies in the context of convergence to local minima and the retrieval of all individual CM signals that are present in the channel. In this paper, we show that the underlying constant modulus fac-torization problem is, in fact, a generalized eigenvalue problem, and may be solved via a simultaneous diagonalization of a set of matrices. With this new, analytical approach, it is possible to detect the number of CM signals present in the channel, and to retrieve all of them exactly, rejecting other, non-CM signals. Only a modest amount of samples are required. The algorithm is robust in the presence of noise, and is tested on measured data, collected from an experimental setup .
translated by 谷歌翻译

2012-10-29 translated by 谷歌翻译 In this paper we proposed quasi-Newton and limited memory quasi-Newton methods for objective functions defined on Grassmannians or a product of Grassmannians. Specifically we defined bfgs and l-bfgs updates in local and global coordinates on Grassmannians or a product of these. We proved that, when local coordinates are used, our bfgs updates on Grassmannians share the same optimality property as the usual bfgs updates on Euclidean spaces. When applied to the best multilinear rank approximation problem for general and symmetric tensors, our approach yields fast, robust, and accurate algorithms that exploit the special Grassmannian structure of the respective problems, and which work on tensors of large dimensions and arbitrarily high order. Extensive numerical experiments are included to substantiate our claims.
translated by 谷歌翻译 AMS classification: 15A69 18F15 57N35 41A46 65K10 15A18 65L05 In this paper, the differential geometry of the novel hierarchical Tucker format for tensors is derived. The set H T,k of tensors with fixed tree T and hierarchical rank k is shown to be a smooth quotient manifold, namely the set of orbits of a Lie group action corresponding to the non-unique basis representation of these hierarchical tensors. Explicit characterizations of the quotient manifold, its tangent space and the tangent space of H T,k are derived, suitable for high-dimensional problems. The usefulness of a complete geometric description is demonstrated by two typical applications. First, new convergence results for the nonlinear Gauss-Seidel method on H T,k are given. Notably and in contrast to earlier works on this subject , the task of minimizing the Rayleigh quotient is also addressed. Second, evolution equations for dynamic tensor approximation are formulated in terms of an explicit projection operator onto the tangent space of H T,k. In addition, a numerical comparison is made between this dynamical approach and the standard one based on truncated singular value decompositions.
translated by 谷歌翻译 Hierarchical tensors can be regarded as a generalisation, preserving many crucial features, of the singular value decomposition to higher-order tensors. For a given tensor product space, a recursive decomposition of the set of coordinates into a dimension tree gives a hierarchy of nested subspaces and corresponding nested bases. The dimensions of these subspaces yield a notion of multilinear rank. This rank tuple, as well as quasi-optimal low-rank approximations by rank truncation, can be obtained by a hierarchical singular value decomposition. For fixed multilinear ranks, the storage and operation complexity of these hierarchical representations scale only linearly in the order of the tensor. As in the matrix case, the set of hierarchical tensors of a given multilinear rank is not a convex set, but forms an open smooth manifold. A number of techniques for the computation of hierarchical low-rank approximations have been developed, including local optimisation techniques on Riemannian manifolds as well as truncated iteration methods, which can be applied for solving high-dimensional partial differential equations. This article gives a survey of these developments. We also discuss applications to problems in uncertainty quantification, to the solution of the electronic Schrödinger equation in the strongly correlated regime, and to the computation of metastable states in molecular dynamics.
translated by 谷歌翻译 Matrix factorization is a popular approach for large-scale matrix completion. The optimization formulation based on matrix factorization can be solved very efficiently by standard algorithms in practice. However, due to the non-convexity caused by the factorization model, there is a limited theoretical understanding of this formulation. In this paper, we establish a theoretical guarantee for the factorization formulation to correctly recover the underlying low-rank matrix. In particular, we show that under similar conditions to those in previous works, many standard optimization algorithms converge to the global optima of a factorization formulation, and recover the true low-rank matrix. We study the local geometry of a properly regularized factorization formulation and prove that any stationary point in a certain local region is globally optimal. A major difference of our work from the existing results is that we do not need resampling in either the algorithm or its analysis. Compared to other works on nonconvex optimization, one extra difficulty lies in analyzing nonconvex constrained optimization when the constraint (or the corresponding regularizer) is not "consistent" with the gradient direction. One technical contribution is the perturbation analysis for non-symmetric matrix factorization.
translated by 谷歌翻译 This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
translated by 谷歌翻译 This work was originally motivated by a classification of tensors proposed by Richard Harshman. In particular, we focus on simple and multiple "bot-tlenecks", and on "swamps". Existing theoretical results are surveyed, some numerical algorithms are described in details, and their numerical complexity is calculated. In particular, the interest in using the ELS enhancement in these algorithms is discussed. Computer simulations feed this discussion.
translated by 谷歌翻译 The low-tubal-rank tensor model has been recently proposed for real-world multidimensional data. In this paper, we study the low-tubal-rank tensor completion problem, i.e., to recover a third-order tensor by observing a subset of its elements selected uniformly at random. We propose a fast iterative algorithm, called Tubal-Alt-Min, that is inspired by a similar approach for low-rank matrix completion. The unknown low-tubal-rank tensor is represented as the product of two much smaller tensors with the low-tubal-rank property being automatically incorporated, and Tubal-Alt-Min alternates between estimating those two tensors using tensor least squares minimization. First, we note that tensor least squares minimization is different from its matrix counterpart and nontrivial as the circular convolution operator of the low-tubal-rank tensor model is intertwined with the sub-sampling operator. Second, the theoretical performance guarantee is challenging since Tubal-Alt-Min is iterative and nonconvex in nature. We prove that 1) Tubal-Alt-Min guarantees exponential convergence to the global optima, and 2) for an n × n × k tensor with tubal-rank r n, the required sampling complexity is O(nr 2 k log 3 n) and the computational complexity is O(n 2 rk 2 log 2 n). Third, on both synthetic data and real-world video data, evaluation results show that compared with tensor-nuclear norm minimization (TNN-ADMM), Tubal-Alt-Min improves the recovery error dramatically (by orders of magnitude). It is estimated that Tubal-Alt-Min converges at an exponential rate 10 −0.4423Iter where Iter denotes the number of iterations, which is much faster than TNN-ADMM's 10 −0.0332Iter , and the running time can be accelerated by more than 5 times for a 200 × 200 × 20 tensor.
translated by 谷歌翻译
\${authors}

\${pubdate} \${abstract_cn}
translated by 谷歌翻译