This is a survey of nonlinear approximation, especially that part of the subject which is important in numerical computation. Nonlinear approximation means that the approximants do not come from linear spaces but rather from nonlinear manifolds. The central question to be studied is what, if any, are the advantages of nonlinear approximation over the simpler, more established, linear methods. This question is answered by studying the rate of approximation, which is the decrease in error versus the number of parameters in the approximant. The number of parameters usually correlates well with computational effort. It is shown that in many settings the rate of nonlinear approximation can be characterized by certain smoothness conditions which are significantly weaker than those required in the linear theory. Emphasis in the survey is placed on approximation by piecewise polynomials and wavelets as well as their numerical implementation. Results on highly nonlinear methods such as optimal basis selection and greedy algorithms (adaptive pursuit) are also given. Applications to image processing, statistical estimation, regularity for PDEs, and adaptive algorithms are discussed.
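To make the central comparison concrete, here is a small Python sketch (my own illustration, not taken from the survey) contrasting linear n-term approximation, which keeps a fixed set of coarse-scale coefficients, with nonlinear n-term approximation, which keeps the n largest coefficients wherever they occur; the Haar basis and the test signal are arbitrary choices.

```python
# Sketch: linear vs. nonlinear n-term approximation in an orthonormal Haar basis.
import numpy as np

def haar_forward(x):
    """Full orthonormal Haar decomposition of a length-2^J array."""
    x = np.asarray(x, dtype=float).copy()
    details = []
    while x.size > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)
        det = (x[0::2] - x[1::2]) / np.sqrt(2)
        details.append(det)          # detail coefficients, fine scale first
        x = avg
    return details, x                # x is the single coarse scaling coefficient

def haar_inverse(details, scaling):
    x = scaling.copy()
    for det in reversed(details):    # rebuild from coarse to fine
        up = np.empty(2 * x.size)
        up[0::2] = (x + det) / np.sqrt(2)
        up[1::2] = (x - det) / np.sqrt(2)
        x = up
    return x

t = np.linspace(0, 1, 1024, endpoint=False)
f = np.sin(2 * np.pi * t) + (t > 0.37)          # smooth plus one jump

details, scaling = haar_forward(f)
flat = np.concatenate(details + [scaling])      # fine -> coarse -> scaling
n = 32

lin = np.zeros_like(flat)
lin[-n:] = flat[-n:]                            # linear: the n coarsest coefficients
nonlin = np.zeros_like(flat)
keep = np.argsort(np.abs(flat))[-n:]
nonlin[keep] = flat[keep]                       # nonlinear: the n largest coefficients

def unflatten(v):
    out, pos = [], 0
    for d in details:
        out.append(v[pos:pos + d.size])
        pos += d.size
    return out, v[pos:]

for name, v in [("linear", lin), ("nonlinear", nonlin)]:
    d, s = unflatten(v)
    err = np.linalg.norm(f - haar_inverse(d, s))
    print(f"{name:9s} n-term error: {err:.4f}")
```

For a signal with an isolated jump, the freely chosen coefficients concentrate near the singularity, which is the effect the survey quantifies through weaker smoothness conditions.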
This paper describes two digital implementations of a new mathematical transform, namely, the second generation curvelet transform [12, 10] in two and three dimensions. The first digital transformation is based on unequally-spaced fast Fourier transforms (USFFT), while the second is based on the wrapping of specially selected Fourier samples. The two implementations essentially differ by the choice of spatial grid used to translate curvelets at each scale and angle. Both digital transformations return a table of digital curvelet coefficients indexed by a scale parameter, an orientation parameter, and a spatial location parameter. Both implementations are fast in the sense that they run in O(n² log n) flops for n by n Cartesian arrays; in addition, they are invertible, with rapid inversion algorithms of about the same complexity. Our digital transformations improve upon earlier implementations (based upon the first generation of curvelets) in the sense that they are conceptually simpler, faster, and far less redundant. The software CurveLab, which implements both transforms presented in this paper, is available at http://www.curvelet.org.
The coming century is surely the century of data. A combination of blind faith and serious purpose makes our society invest massively in the collection and processing of data of all kinds, on scales unimaginable until recently. Hyperspectral imagery, Internet portals, financial tick-by-tick data, and DNA microarrays are just a few of the better-known sources, feeding data in torrential streams into scientific and business databases worldwide. In traditional statistical data analysis, we think of observations of instances of particular phenomena (e.g. instance ↔ human being), these observations being a vector of values we measured on several variables (e.g. blood pressure, weight, height, ...). In traditional statistical methodology, we assumed many observations and a few, well-chosen variables. The trend today is towards more observations but even more so, to radically larger numbers of variables: voracious, automatic, systematic collection of hyper-informative detail about each observed instance. We are seeing examples where the observations gathered on individual instances are curves, or spectra, or images, or even movies, so that a single observation has dimensions in the thousands or billions, while there are only tens or hundreds of instances available for study. Classical methods are simply not designed to cope with this kind of explosive growth of dimensionality of the observation vector. We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet. Mathematicians are ideally prepared for appreciating the abstract issues involved in finding patterns in such high-dimensional data. Two of the most influential principles in the coming century will be principles originally discovered and cultivated by mathematicians: the blessings of dimensionality and the curse of dimensionality. The curse of dimensionality is a phrase used by several subfields in the mathematical sciences; I use it here to refer to the apparent intractability of systematically searching through a high-dimensional space, the apparent intractability of accurately approximating a general high-dimensional function, and the apparent intractability of integrating a high-dimensional function. The blessings of dimensionality are less widely noted, but they include the concentration of measure phenomenon (so-called in the geometry of Banach spaces), which means that certain random fluctuations are very well controlled in high dimensions, and the success of asymptotic methods, used widely in mathematical statistics and statistical physics, which suggest that statements about very high-dimensional settings may be made where moderate dimensions would be too complicated. There is a large body of interesting work going on in the mathematical sciences, both to attack the curse of dimensionality in specific ways and to exploit the blessings of dimensionality in others.
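As a toy illustration of the concentration-of-measure "blessing" mentioned above (my own numerical sketch, not part of the text), the relative spread of pairwise distances between random points shrinks as the dimension grows:

```python
# Sketch: relative spread of pairwise distances between random points vs. dimension.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
for d in [2, 10, 100, 1000, 10000]:
    X = rng.standard_normal((100, d))     # 100 i.i.d. Gaussian points in dimension d
    dist = pdist(X)                       # all pairwise Euclidean distances
    print(f"d = {d:6d}   std/mean of distances = {dist.std() / dist.mean():.4f}")
```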
This paper presents an account of the current state of sampling, 50 years after Shannon's formulation of the sampling theorem. The emphasis is on regular sampling, where the grid is uniform. This topic has benefited from a strong research revival during the past few years, thanks in part to the mathematical connections that were made with wavelet theory. To introduce the reader to the modern, Hilbert-space formulation, we reinterpret Shannon's sampling procedure as an orthogonal projection onto the subspace of band-limited functions. We then extend the standard sampling paradigm for a representation of functions in the more general class of "shift-invariant" function spaces, including splines and wavelets. Practically, this allows for simpler (and possibly more realistic) interpolation models, which can be used in conjunction with a much wider class of (anti-aliasing) prefilters that are not necessarily ideal low-pass. We summarize and discuss the results available for the determination of the approximation error and of the sampling rate when the input of the system is essentially arbitrary, e.g., nonbandlimited. We also review variations of sampling that can be understood from the same unifying perspective. These include wavelets, multiwavelets, Papoulis generalized sampling, finite elements, and frames. Irregular sampling and radial basis functions are briefly mentioned.
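A minimal sketch of the two reconstruction models discussed above, written under simplifying assumptions of my own (unit grid spacing, an arbitrary non-bandlimited test signal): classical sinc interpolation of uniform samples versus a shift-invariant cubic-spline model.

```python
# Sketch: sinc (band-limited) vs. cubic-spline reconstruction from uniform samples.
import numpy as np
from scipy.interpolate import CubicSpline

def f(t):                                   # test signal with a kink: not band-limited
    return np.exp(-0.1 * t) * np.abs(np.sin(t))

k = np.arange(0, 40)                        # uniform sampling grid, spacing T = 1
samples = f(k)
t = np.linspace(5, 35, 2000)                # evaluate away from the boundaries

# Shannon reconstruction: f_rec(t) = sum_k f(k) * sinc(t - k)
sinc_rec = samples @ np.sinc(t[None, :] - k[:, None])

# Shift-invariant spline model: cubic-spline interpolation of the same samples
spline_rec = CubicSpline(k, samples)(t)

for name, rec in [("sinc", sinc_rec), ("cubic spline", spline_rec)]:
    err = np.sqrt(np.mean((f(t) - rec) ** 2))
    print(f"{name:12s} RMS reconstruction error: {err:.5f}")
```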
We study what we call "dyadic CART", a method of nonparametric regression which constructs a recursive partition by optimizing a complexity-penalized sum of squares, where the optimization is over all recursive partitions arising from midpoint splits. We show that the method is adaptive to unknown degrees of anisotropic smoothness. Specifically, consider the anisotropic smoothness classes of Nikol'skii, consisting of bivariate functions f(x_1, x_2) whose finite difference of distance h in direction i is bounded in L_p norm by C h^(δ_i), i = 1, 2. We show that dyadic CART, with an appropriate complexity penalty parameter λ ≈ Const · σ² · log n, is within logarithmic terms of minimax over every anisotropic smoothness class 0 < C < ∞, 0 < δ_1, δ_2 ≤ 1. The proof shows that dyadic CART is identical to a certain adaptive best-ortho-basis algorithm based on the library of all anisotropic Haar bases. It then applies the empirical basis selection ideas of Donoho and Johnstone. The basis empirically selected by dyadic CART is shown to be nearly as good as a basis ideally adapted to the underlying f. The risk of estimation in an ideally adapted anisotropic Haar basis is shown to be comparable to the minimax risk over anisotropic smoothness classes. Underlying the success of this argument is harmonic analysis of anisotropic smoothness classes. We show that, for each anisotropic smoothness class, there is an anisotropic Haar basis which is a best orthogonal basis for representing that smoothness class; the basis is optimal not just within the library of anisotropic Haar bases, but among all orthogonal bases of L²([0, 1]²).
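The optimization over all recursive dyadic partitions can be carried out exactly by dynamic programming. The following schematic Python sketch (my own simplification of the idea, not the authors' code, with an arbitrary penalty value in place of the calibrated Const · σ² · log n) minimizes the sum of within-cell squared errors plus a penalty per cell over all partitions generated by midpoint splits of a 2-D array, splitting in either direction.

```python
# Sketch: complexity-penalized partitioning over recursive midpoint splits of a 2-D array.
import numpy as np
from functools import lru_cache

rng = np.random.default_rng(0)
n = 32
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
truth = (x + 0.3 * y > 0.6).astype(float)          # an anisotropic "edge"
data = truth + 0.2 * rng.standard_normal((n, n))
lam = 0.5                                          # penalty paid for each cell

@lru_cache(maxsize=None)
def best(r0, r1, c0, c1):
    """Optimal (penalized cost, number of cells) for the rectangle data[r0:r1, c0:c1]."""
    block = data[r0:r1, c0:c1]
    cost = ((block - block.mean()) ** 2).sum() + lam   # option 1: keep as a single cell
    cells = 1
    if r1 - r0 > 1:                                    # option 2: midpoint split, direction 1
        m = (r0 + r1) // 2
        ca, ka = best(r0, m, c0, c1)
        cb, kb = best(m, r1, c0, c1)
        if ca + cb < cost:
            cost, cells = ca + cb, ka + kb
    if c1 - c0 > 1:                                    # option 3: midpoint split, direction 2
        m = (c0 + c1) // 2
        ca, ka = best(r0, r1, c0, m)
        cb, kb = best(r0, r1, m, c1)
        if ca + cb < cost:
            cost, cells = ca + cb, ka + kb
    return cost, cells

cost, cells = best(0, n, 0, n)
print(f"optimal penalized cost {cost:.2f} using {cells} dyadic cells")
```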
Compressed sensing (CS) is an emerging field that has attracted considerable research interest over the past few years. Previous review articles in CS limit their scope to standard discrete-to-discrete measurement architectures using matrices of randomized nature and signal models based on standard sparsity. In recent years, CS has worked its way into several new application areas. This, in turn, necessitates a fresh look at many of the basics of CS. The random matrix measurement operator must be replaced by more structured sensing architectures that correspond to the characteristics of feasible acquisition hardware. The standard sparsity prior has to be extended to include a much richer class of signals and to encode broader data models, including continuous-time signals. In our overview, the theme is exploiting signal and measurement structure in compressive sensing. The prime focus is bridging theory and practice; that is, to pinpoint the potential of structured CS strategies to emerge from the math to the hardware. Our summary highlights new directions as well as relations to more traditional CS, with the hope of serving both as a review for practitioners wanting to join this emerging field, and as a reference that attempts to put some of the existing ideas into the perspective of practical applications.
This paper introduces new tight frames of curvelets to address the problem of finding optimally sparse representations of objects with discontinuities along C² edges. Conceptually, the curvelet transform is a multiscale pyramid with many directions and positions at each length scale, and needle-shaped elements at fine scales. These elements have many useful geometric multiscale features that set them apart from classical multiscale representations such as wavelets. For instance, curvelets obey a parabolic scaling relation which says that at scale 2^(−j), each element has an envelope which is aligned along a 'ridge' of length 2^(−j/2) and width 2^(−j). We prove that curvelets provide an essentially optimal representation of typical objects f which are C² except for discontinuities along C² curves. Such representations are nearly as sparse as if f were not singular and turn out to be far sparser than the wavelet decomposition of the object. For instance, the n-term partial reconstruction f_n^C obtained by selecting the n largest terms in the curvelet series obeys ‖f − f_n^C‖²_L² ≤ C · n^(−2) · (log n)³ as n → ∞. This rate of convergence holds uniformly over a class of functions which are C² except for discontinuities along C² curves and is essentially optimal. In comparison, the squared error of n-term wavelet approximations only converges as n^(−1) as n → ∞, which is considerably worse than the optimal behavior.
Tree approximation is a new form of nonlinear approximation which appears naturally in some applications such as image processing and adaptive numerical methods. It is somewhat more restrictive than the usual n-term approximation. We show that the restrictions of tree approximation cost little in terms of rates of approximation. We then use that result to design encoders for compression. These encoders are universal (they apply to general functions) and progressive (increasing accuracy is obtained by sending bit stream increments). We show optimality of the encoders in the sense that they provide upper estimates for the Kolmogorov entropy of Besov balls. 
Wideband analog signals push contemporary analog-to-digital conversion systems to their performance limits. In many applications, however, sampling at the Nyquist rate is inefficient because the signals of interest contain only a small number of significant frequencies relative to the bandlimit, although the locations of the frequencies may not be known a priori. For this type of sparse signal, other sampling strategies are possible. This paper describes a new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components. Let K denote the total number of frequencies in the signal, and let W denote its bandlimit in Hz. Simulations suggest that the random demodulator requires just O(K log(W/K)) samples per second to stably reconstruct the signal. This sampling rate is exponentially lower than the Nyquist rate of W Hz. In contrast with Nyquist sampling, one must use nonlinear methods, such as convex programming, to recover the signal from the samples taken by the random demodulator. This paper provides a detailed theoretical analysis of the system's performance that supports the empirical observations.
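A toy discrete simulation (my own, with arbitrary sizes) of the acquisition chain described above: a frequency-sparse signal is multiplied by a random ±1 chipping sequence, integrated over short blocks, and sampled well below the bandlimit W; recovery here uses a bare-bones orthogonal matching pursuit in place of the convex program analyzed in the paper.

```python
# Sketch: discrete random demodulator, then orthogonal matching pursuit recovery.
import numpy as np

rng = np.random.default_rng(3)
W, R, K = 512, 64, 5                        # bandlimit, sampling rate, number of tones

# K-sparse spectrum and the corresponding length-W time-domain signal
support = rng.choice(W, size=K, replace=False)
F = np.exp(2j * np.pi * np.outer(np.arange(W), np.arange(W)) / W)   # tones as columns
a = np.zeros(W, dtype=complex)
a[support] = rng.standard_normal(K) + 1j * rng.standard_normal(K)
x = F @ a

# Acquisition: chip with random +/-1 signs, then sum ("integrate and dump") each block
eps = rng.choice([-1.0, 1.0], size=W)
y = (eps * x).reshape(R, W // R).sum(axis=1)              # R low-rate samples

# Effective measurement matrix acting on the sparse spectrum a (so that y = Phi @ a)
Phi = (eps[:, None] * F).reshape(R, W // R, W).sum(axis=1)

# Orthogonal matching pursuit: greedily pick K columns, refit by least squares
resid, S = y.copy(), []
for _ in range(K):
    S.append(int(np.argmax(np.abs(Phi.conj().T @ resid))))
    coef, *_ = np.linalg.lstsq(Phi[:, S], y, rcond=None)
    resid = y - Phi[:, S] @ coef

print("true support     :", sorted(support.tolist()))
print("recovered support:", sorted(S))
```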
Diffusion wavelets
Our goal in this paper is to show that many of the tools of signal processing, adapted Fourier and wavelet analysis, can be naturally lifted to the setting of digital data clouds, graphs, and manifolds. We use diffusion as a smoothing and scaling tool to enable coarse graining and multiscale analysis. Given a diffusion operator T on a manifold or a graph, with large powers of low rank, we present a general multiresolution construction for efficiently computing, representing, and compressing T^t. This allows a direct multiscale computation, to high precision, of functions of the operator, notably the associated Green's function, in compressed form, and their fast application. Classes of operators for which these computations are fast include certain diffusion-like operators, in any dimension, on manifolds, graphs, and in non-homogeneous media. We use ideas related to the Fast Multipole Methods and to the wavelet analysis of Calderón-Zygmund and pseudo-differential operators to numerically enforce the emergence of a natural hierarchical coarse graining of a manifold, graph, or data set. For example, for a body of text documents the construction leads to a directory structure at different levels of generalization. The dyadic powers of an operator can be used to induce a multiresolution analysis, as in classical Littlewood-Paley and wavelet theory: we construct, with efficient and stable algorithms, bases of orthonormal scaling functions and wavelets associated to this multiresolution analysis, together with the corresponding downsampling operators, and use them to compress the corresponding powers of the operator. While most of our discussion deals with symmetric operators and relates to localization to spectral bands, the symmetry of the operators and their spectral theory need not be considered, as the main assumption is the reduction of numerical rank as we take powers of the operator.
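A schematic numpy sketch (mine, not the authors' construction) of the phenomenon the method exploits: dyadic powers T^(2^k) of a diffusion operator, computed by repeated squaring, rapidly lose numerical rank and can therefore be stored and applied in compressed form.

```python
# Sketch: numerical rank of dyadic powers of a diffusion operator on a point cloud.
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((300, 2))                             # a small point-cloud "graph"
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
A = np.exp(-d2 / 0.01)                                 # Gaussian affinities
T = A / A.sum(axis=1, keepdims=True)                   # row-stochastic diffusion operator

prec = 1e-8
Tk = T.copy()
for k in range(1, 7):
    Tk = Tk @ Tk                                       # T^(2^k) by repeated squaring
    s = np.linalg.svd(Tk, compute_uv=False)
    rank = int((s > prec * s[0]).sum())                # numerical rank at precision prec
    print(f"T^(2^{k}): numerical rank {rank} of {T.shape[0]}")
```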
We study a simple "horizon model" for the problem of recovering an image from noisy data; in this model the image has an edge with α-Hölder regularity. Adopting the viewpoint of computational harmonic analysis, we develop an overcomplete collection of atoms called wedgelets, dyadically organized indicator functions with a variety of locations, scales, and orientations. The wedgelet representation provides nearly optimal representations of objects in the horizon model, as measured by minimax description length. We show how to rapidly compute a wedgelet approximation to noisy data by finding a special edgelet-decorated recursive partition which minimizes a complexity-penalized sum of squares. This estimate, using sufficient subpixel resolution, achieves nearly the minimax mean-squared error in the horizon model. In fact, the method is adaptive in the sense that it achieves nearly the minimax risk for any value of the unknown degree of regularity of the horizon, 1 ≤ α ≤ 2. Wedgelet analysis and denoising may be used successfully outside the horizon model. We study images modelled as indicators of star-shaped sets with smooth boundaries and show that complexity-penalized wedgelet partitioning achieves nearly the minimax risk in that setting also.
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play an important role in data analysis and scientific computing. This work surveys and extends recent research showing that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open up the possibility of dealing with truly massive data sets. The paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.
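The basic scheme surveyed in the paper, random sampling to capture the range of the matrix followed by deterministic post-processing, can be sketched in a few lines of Python; the oversampling and power-iteration parameters below follow common practice and are not the paper's tuned choices.

```python
# Sketch: randomized range finder followed by a small deterministic SVD.
import numpy as np

def randomized_svd(A, k, oversample=10, power_iters=2, seed=None):
    """Approximate rank-k SVD of A via random sampling of its range."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))   # random test matrix
    Y = A @ Omega                                      # sample the range of A
    for _ in range(power_iters):                       # optional: sharpen the spectrum
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                             # orthonormal basis for the sample
    B = Q.T @ A                                        # compress A into that subspace
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # small deterministic SVD
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# quick check on a noisy, approximately rank-60 matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 60)) @ rng.standard_normal((60, 400))
A += 0.01 * rng.standard_normal(A.shape)
U, s, Vt = randomized_svd(A, k=60, seed=1)
print("relative approximation error:", np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```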
Compressed sensing
Suppose x is an unknown vector in R^m (depending on context, a digital image or signal); we plan to acquire data and then reconstruct. Nominally this 'should' require m samples. But suppose we know a priori that x is compressible by transform coding with a known transform, and we are allowed to acquire data about x by measuring n general linear functionals rather than the usual pixels. If the collection of linear functionals is well-chosen, and we allow for a degree of reconstruction error, the size of n can be dramatically smaller than the size m usually considered necessary. Thus, certain natural classes of images with m pixels need only n = O(m^(1/4) log^(5/2)(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. Our approach is abstract and general. We suppose that the object has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), meaning that the coefficients belong to an ℓ_p ball for 0 < p ≤ 1. This implies that the N most important coefficients in the expansion allow a reconstruction with ℓ_2 error O(N^(1/2−1/p)). It is possible to design n = O(N log(m)) nonadaptive measurements which contain the information necessary to reconstruct any such object with accuracy comparable to that which would be possible if the N most important coefficients of that object were directly observable. Moreover, a good approximation to those N important coefficients may be extracted from the n measurements by solving a convenient linear program, called by the name Basis Pursuit in the signal processing literature. The nonadaptive measurements have the character of 'random' linear combinations of basis/frame elements. These results are developed in a theoretical framework based on the theory of optimal recovery, the theory of n-widths, and information-based complexity. Our basic results concern properties of ℓ_p balls in high-dimensional Euclidean space in the case 0 < p ≤ 1. We estimate the Gel'fand n-widths of such balls, give a criterion for near-optimal subspaces for Gel'fand n-widths, show that 'most' subspaces are near-optimal, and show that convex optimization can be used for processing information derived from these near-optimal subspaces. The techniques for deriving near-optimal subspaces include the use of almost-spherical sections in Banach space theory.
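A small end-to-end sketch (my construction, not code from the paper) of the recovery step: a vector that is sparse in the identity basis is measured with n random Gaussian functionals and reconstructed by Basis Pursuit, i.e. minimum-ℓ1 recovery written as a linear program.

```python
# Sketch: Basis Pursuit recovery from random Gaussian measurements via linear programming.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 256, 80, 8                             # ambient dimension, measurements, sparsity

x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((n, m)) / np.sqrt(n)     # n random linear functionals
y = A @ x_true

# min sum(u)  subject to  -u <= x <= u  and  A x = y,  with variables z = [x, u]
c = np.concatenate([np.zeros(m), np.ones(m)])
I = np.eye(m)
A_ub = np.vstack([np.hstack([I, -I]), np.hstack([-I, -I])])
b_ub = np.zeros(2 * m)
A_eq = np.hstack([A, np.zeros((n, m))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * (2 * m), method="highs")
x_hat = res.x[:m]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```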
We consider a model problem of recovering a function f(x_1, x_2) from noisy Radon data. The function f to be recovered is assumed smooth apart from a discontinuity along a C² curve, that is, an edge. We use the continuum white-noise model, with noise level ε. Traditional linear methods for solving such inverse problems behave poorly in the presence of edges. Qualitatively, the reconstructions are blurred near the edges; quantitatively, they give in our model mean squared errors (MSEs) that tend to zero with noise level ε only as O(ε^(1/2)) as ε → 0. A recent innovation, nonlinear shrinkage in the wavelet domain, visually improves edge sharpness and improves MSE convergence to O(ε^(2/3)). However, as we show here, this rate is not optimal. In fact, essentially optimal performance is obtained by deploying the recently-introduced tight frames of curvelets in this setting. Curvelets are smooth, highly anisotropic elements ideally suited for detecting and synthesizing curved edges. To deploy them in the Radon setting, we construct a curvelet-based biorthogonal decomposition of the Radon operator and build "curvelet shrinkage" estimators based on thresholding of the noisy curvelet coefficients. In effect, the estimator detects edges at certain locations and orientations in the Radon domain and automatically synthesizes edges at corresponding locations and directions in the original domain. We prove that the curvelet shrinkage can be tuned so that the estimator will attain, within logarithmic factors, the MSE O(ε^(4/5)) as noise level ε → 0. This rate of convergence holds uniformly over a class of functions which are C² except for discontinuities along C² curves, and (except for log terms) is the minimax rate for that class. Our approach is an instance of a general strategy which should apply in other inverse problems; we sketch a deconvolution example.
A unified view of the area of sparse signal processing is presented in tutorial form by bringing together various fields in which the property of sparsity has been successfully exploited. For each of these fields, various algorithms and techniques, which have been developed to leverage sparsity, are described succinctly. The common potential benefits of significant reduction in sampling rate and processing manipulations through sparse signal processing are revealed. The key application domains of sparse signal processing are sampling, coding, spectral estimation, array processing, component analysis, and multipath channel estimation. In terms of the sampling process and reconstruction algorithms, linkages are made with random sampling, compressed sensing, and rate of innovation. The redundancy introduced by channel coding in finite and real Galois fields is then related to over-sampling with similar reconstruction algorithms. The error locator polynomial (ELP) and iterative methods are shown to work quite effectively for both sampling and coding applications. The methods of Prony, Pisarenko, and MUltiple SIgnal Classification (MUSIC) are next shown to be targeted at analyzing signals with sparse frequency domain representations. Specifically, the relations of the approach of Prony to an annihilating filter in rate of innovation and ELP in coding are emphasized; the Pisarenko and MUSIC methods are further improvements of the Prony method under noisy environments. The iterative methods developed for sampling and coding applications are shown to be powerful tools in spectral estimation. Such narrowband spectral estimation is then related to multi-source location and direction of arrival estimation in array processing. Sparsity in unobservable source signals is also shown to facilitate source separation in sparse component analysis; the algorithms developed in this area such as linear programming and matching pursuit are also widely used in compressed sensing. Finally, the multipath channel estimation problem is shown to have a sparse formulation; algorithms similar to sampling and coding are used to estimate typical multicarrier communication channels.
We attempt to recover an unknown function from noisy, sampled data. Using orthonormal bases of compactly supported wavelets, we develop a nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients. The shrinkage can be tuned to be nearly minimax over any member of a wide range of Triebel- and Besov-type smoothness constraints and asymptotically minimax over Besov bodies with p ≤ q. Linear estimates cannot achieve even the minimax rates over Triebel and Besov classes with p < 2, so the method can significantly outperform every linear method (e.g., kernel, smoothing spline, sieve) in a minimax sense. Variants of our method based on simple threshold nonlinear estimators are nearly minimax. Our method possesses the interpretation of spatial adaptivity; it reconstructs using a kernel which may vary in shape and bandwidth from point to point, depending on the data. Least favorable distributions for certain of the Triebel and Besov scales generate objects with sparse wavelet transforms. Many real objects have similarly sparse transforms, which suggests that these minimax results are relevant for practical problems. Sequels to this paper, which was first drafted in November 1990, discuss practical implementation, spatial adaptation properties, universal near minimaxity, and applications to inverse problems.
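A minimal sketch of the shrinkage procedure described above, using the PyWavelets package (an assumption; any orthonormal wavelet transform would do) and the universal threshold σ√(2 log n); the test signal and noise level are arbitrary choices.

```python
# Sketch: wavelet shrinkage of noisy samples with the universal threshold.
import numpy as np
import pywt

rng = np.random.default_rng(0)
n = 2048
t = np.linspace(0, 1, n)
truth = np.piecewise(t, [t < 0.3, (t >= 0.3) & (t < 0.7), t >= 0.7],
                     [lambda u: 4 * u, lambda u: np.sin(12 * np.pi * u), 1.0])
data = truth + 0.2 * rng.standard_normal(n)

coeffs = pywt.wavedec(data, 'db4')                 # empirical wavelet coefficients
sigma = np.median(np.abs(coeffs[-1])) / 0.6745     # noise level from the finest scale
thresh = sigma * np.sqrt(2 * np.log(n))            # universal threshold

# soft-threshold every detail level; keep the coarse scaling coefficients untouched
shrunk = [coeffs[0]] + [pywt.threshold(c, thresh, mode='soft') for c in coeffs[1:]]
estimate = pywt.waverec(shrunk, 'db4')[:n]

print("RMSE of noisy data:", np.sqrt(np.mean((data - truth) ** 2)))
print("RMSE after shrinkage:", np.sqrt(np.mean((estimate - truth) ** 2)))
```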
Performance bounds for criteria for model selection are developed using recent theory for sieves. The model selection criteria are based on an empirical loss or contrast function with an added penalty term motivated by empirical process theory and roughly proportional to the number of parameters needed to describe the model divided by the number of observations. Most of our examples involve density or regression estimation settings and we focus on the problem of estimating the unknown density or regression function. We show that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve. This accuracy index quantifies the trade-off among the candidate models between the approximation error and parameter dimension relative to sample size. If we choose a list of models which exhibit good approximation properties with respect to different classes of smoothness, the estimator can be simultaneously minimax rate optimal in each of those classes. This is what is usually called adaptation. The type of classes of smoothness in which one gets adaptation depends heavily on the list of models. If too many models are involved in order to get accurate approximation of many wide classes of functions simultaneously, it may happen that the estimator is only approximately adaptive (typically up to a slowly varying function of the sample size). We shall provide various illustrations of our method such as penalized maximum likelihood, projection or least squares estimation. The models will involve commonly used finite dimensional expansions such as piecewise polynomials with fixed or variable knots, trigonometric polynomials, wavelets, neural nets, and related nonlinear expansions defined by superposition of ridge functions.
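A toy illustration (my own) of the penalized model-selection principle: among histogram-type regression models of increasing dimension D, choose the one minimizing the empirical squared loss plus a penalty proportional to D divided by the sample size. The penalty constant below is an arbitrary choice, not the paper's calibrated one.

```python
# Sketch: pick the dimension of a histogram regression model by penalized empirical loss.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = np.sort(rng.random(n))
truth = np.where(x < 0.4, 0.0, 1.0) + 0.5 * np.sin(6 * x)
y = truth + 0.3 * rng.standard_normal(n)

def fit_histogram(D):
    """Least-squares fit of a piecewise-constant model on D equal-width bins."""
    bins = np.minimum((x * D).astype(int), D - 1)
    means = np.array([y[bins == b].mean() if np.any(bins == b) else 0.0
                      for b in range(D)])
    return means[bins]

sigma2 = 0.09                                    # noise variance, assumed known here
best = None
for D in range(1, 60):
    rss = np.mean((y - fit_histogram(D)) ** 2)   # empirical contrast
    crit = rss + 2.0 * sigma2 * D / n            # penalty proportional to dimension / n
    if best is None or crit < best[0]:
        best = (crit, D)
print("selected model dimension D =", best[1])
```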
Suppose we are given a vector f in a class F ⊂ R^N, e.g. a class of digital signals or digital images. How many linear measurements do we need to make about f to be able to recover f to within precision ε in the Euclidean (ℓ_2) metric? This paper shows that if the objects of interest are sparse in a fixed basis or compressible, then it is possible to reconstruct f to within very high accuracy from a small number of random measurements by solving a simple linear program. More precisely, suppose that the nth largest entry of the vector |f| (or of its coefficients in a fixed basis) obeys |f|_(n) ≤ R · n^(−1/p), where R > 0 and p > 0. Suppose that we take measurements y_k = ⟨f, X_k⟩, k = 1, ..., K, where the X_k are N-dimensional Gaussian vectors with independent standard normal entries. Then for each f obeying the decay estimate above for some 0 < p < 1 and with overwhelming probability, our reconstruction f^♯, defined as the solution to the constraints y_k = ⟨f^♯, X_k⟩ with minimal ℓ_1 norm, obeys ‖f − f^♯‖_ℓ_2 ≤ C_p · R · (K/log N)^(−r), r = 1/p − 1/2. There is a sense in which this result is optimal; it is generally impossible to obtain a higher accuracy from any set of K measurements whatsoever. The methodology extends to various other random measurement ensembles; for example, we show that similar results hold if one observes a few randomly sampled Fourier coefficients of f. In fact, the results are quite general and require only two hypotheses on the measurement ensemble, which are detailed.
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
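A bare-bones stochastic gradient sketch (mine, not the paper's pseudocode) on a synthetic logistic-regression problem of the kind the theory covers: one randomly sampled example per step, a fixed step size, no frills.

```python
# Sketch: plain stochastic gradient descent for logistic regression.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 20
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = (X @ w_true + 0.5 * rng.standard_normal(n) > 0).astype(float)   # labels in {0, 1}

def full_loss(w):
    """Numerically stable average logistic loss over the whole data set."""
    z = X @ w
    return np.mean(np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0) - y * z)

w = np.zeros(d)
alpha = 0.1                                    # fixed step size (a tuning choice)
for step in range(1, 50001):
    i = rng.integers(n)                        # sample one example uniformly at random
    p = 1.0 / (1.0 + np.exp(-X[i] @ w))        # model probability for example i
    w -= alpha * (p - y[i]) * X[i]             # stochastic gradient step
    if step % 10000 == 0:
        print(f"step {step:6d}   training loss {full_loss(w):.4f}")
```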