Statistical Inverse Problems in Hilbert Scales
Abstract.
In this paper, we study the Tikhonov regularization scheme in Hilbert scales for the nonlinear statistical inverse problem with a general noise. The regularizing norm in this scheme is stronger than the norm in Hilbert space. We focus on developing a theoretical analysis for this scheme based on the conditional stability estimates. We utilize the concept of the distance function to establish the high probability estimates of the direct and reconstruction error in Reproducing kernel Hilbert space setting. Further, the explicit rates of convergence in terms of sample size are established for the oversmoothing case and the regular case over the regularity class defined through appropriate source condition. Our results improve and generalize previous results obtained in related settings.
Key words and phrases:
Statistical inverse problem; Tikhonov regularization; Hilbert Scales; Reproducing kernel Hilbert space; Minimax convergence rates.2010 Mathematics Subject Classification:
Primary: 62G20; Secondary: 62G08, 65J15, 65J20, 65J22.1. Introduction
In this paper, we study the nonlinear operator equation
(1) |
with the infinite dimensional separable real Hilbert spaces and with the inner products and , respectively. Here, is the space of functions between a Polish space and a real separable Hilbert space . We observe the noisy values of the function at the inputs :
(2) |
for . Here, is the number of observations which is called the sample size. In contrary to the direct learning scheme where we estimate the function , here we aim to estimate the function directly from the observations.
A common approach to stably approximate the solution of equation (1) is the Tikhonov regularization scheme. Sometimes, we have the information about the true solution, e.g., the true solution may be differentiable. To incorporate this information, we employ the Tikhonov regularization in Hilbert scales. This scheme consists of a functional which is a linear combination of a fidelity term measuring the fitness of the data and a penalty term in a stronger norm forcing the smoothness in the approximated solution. To define this scheme, we introduce a densely defined, unbounded, closed, linear, self-adjoint, strictly positive operator such that for some ,
(3) |
Here, we observe that is a bounded operator from the strict positivity of the operator .
The Tikhonov functional for the considered nonlinear inverse problem with sample is given by
where is an initial guess. The regularization parameter has to balance both terms appropriately. Then, the Tikhonov regularization scheme in Hilbert scales can be defined as
(4) |
For the continuous and weakly sequentially closed operator , there exists a global solution of the regularization scheme in (4). But it is not necessarily unique, since is nonlinear (see [14, Section 4.1.1]).
We consider the Hilbert scales generated by the operator . Here the spaces are Hilbert spaces equipped with the inner product . For the Hilbert scales, we have the well-known interpolation inequality
(5) |
which holds for any .
The regularization schemes in Hilbert scales have been well-studied and analysed under the different assumptions in the classical inverse problems [2, 5, 7, 14]. In learning theory, the general regularization in Hilbert scales is introduced for the linear inverse problems and established the rates of convergence [13]. Nicole et al. [10] studied the Stochastic Gradient Descent in Hilbert scales and the author provided different examples of Hilbert scales in learning. Further, the authors discussed the error estimates for the Stochastic Gradient Descent scheme for the direct learning problem. In the paper [11], the rates of convergence are established for nonlinear statistical inverse learning problems in the RKHS setting. The authors considered some assumptions on the nonlinearity of the operator such as Fréchet differentiability of the operator, Lipschitz continuity of the Fréchet derivative, and the link condition to transfer the smoothness in terms of the operator to the covariance operator.
Here, we consider the nonlinear inverse learning problem in Hilbert scales satisfying conditional stability estimates characterized by general concave index functions. We use the Tikhonov regularization schemes to obtain the stable approximate solution in the RKHS framework. Werner and Hofmann [16] illustrated the validity of the conditional stability estimates in different models and real-world situations. The authors showed that the derivative of is not always necessary for this condition.
For the regularization schemes in the RKHSs, generally, we consider the smoothness by the source condition in terms of the covariance operator which implies the rates of convergence. The covariance operator depends on the considered kernel and unknown probability measure. Therefore, the source condition cannot be verified practically. Moreover, the misspecified kernel affects the source condition and consequently, the rates of convergence for the regularization schemes. Here, we consider the smoothness of the true solution in terms of the known operator which can be checked in practice. We divide the smoothness into two cases: the regular case (i.e., ) and the oversmoothing case (i.e., ). The oversmoothing case is very delicate. We consider that the regularized solution in Hilbert scales (4) belongs to . But the true solution does not belong to in the oversmoothing case. The analysis is also tricky for nonlinear inverse problems since the Tikhonov regularization in Hilbert Scales does not have an explicit solution. The analysis starts with the step . But is not well-defined in the oversmoothing case (since ). We will utilize the concept of distance functions to overcome this problem.
The main results of our paper can be summarized as follows:
-
We discuss the rates of convergence for the Tikhonov regularization in Hilbert Scales under a conditional stability assumption for the inverse problem.
-
We obtain the error estimates in the absence of the widely-considered source condition. We will use the concept of the distance functions for this.
-
We establish the error bounds in both the regular case and the oversmoothing case for the appropriate benchmark smoothness.
The manuscript is organized as follows: In Section 2, we present the basic definitions, notation, and assumptions required in our analysis. In Section 3, we state and prove our main results. Here, we discuss the rates of convergence for Tikhonov regularization in Hilbert scales in the probabilistic sense. In Section 4, we present the explicit rates in terms of sample size by bounding the distance functions. In Appendix, we state the probabilistic estimate of perturbation inequalities.
2. Notation and Assumptions
Let the input space be a Polish space and the output space be a real separable Hilbert space. We consider the joint probability measure on the sample space . We denote the marginal distribution on by and the conditional distribution of given by . Therefore, the measure can be split as .
For the probability measure on , we assume that
(6) |
For the considered model with centred noise we find provided that the conditional expectation w.r.t. of given exists (a.s.). This holds true under Assumption (6). This fact together with the operator equation (1) motivates us to consider the following assumption.
Assumption 1 (The true solution).
The conditional expectation w.r.t. of given exists (a.s.), and there exists unique such that
Here, is the true solution of equation (1) which we aim at estimating. Here, we want to mention that the function is also the minimizer of the expected risk considered in [11].
We consider a Bernstein-type assumption for the noise :
Assumption 2 (Noise condition).
There exist some constants such that for almost all ,
We want to utilize the properties of the Reproducing kernel Hilbert spaces (RKHSs) in our analysis. Therefore, we assume that is contained in a vector-valued Reproducing kernel Hilbert space (RKHSvv). The RKHSvv arises from the operator-valued positive semi-definite kernel [9]. Here, is the Banach space of bounded linear operators.
Assumption 3 (Vector valued reproducing kernel Hilbert space ).
Suppose is an RKHSvv of functions corresponding to the kernel such that
-
For all , is a Hilbert-Schmidt operator, and
-
The real-valued function , defined by , is measurable .
This assumption implies that . We denote the canonical injection map to by and the corresponding covariance operator is . From the above assumption, we see that the covariance operator is positive and trace class. The covariance operator is very important in our convergence analysis. We will need some regularity assumptions in terms of the covariance operator on the marginal probability measure to achieve the uniform convergence rates for the regularized solution (4).
The error estimates studied in our analysis are based on the smoothness of the true solution and the behaviour of the effective dimension. The error estimates and the optimal parameter choice depend on the effective dimension for the regularization methods in reproducing kernel Hilbert spaces [4, 3, 12]. To achieve the fast convergence rates, we introduce the concept of the effective dimension [17]:
The effective dimension is a continuous, decreasing function of . The effective dimension is finite, since the operator is a trace class, and we get
The different behaviours of the eigenvalues of the covariance operator lead to different decay rates of the effective dimension [8]. Under the different scenarios of the effective dimension, we will get the explicit convergence rates in the next section.
In order to establish the error estimate, we introduce the discrete operators for the samples. For the ordered set , we define the Sampling Operator
We define the inner product space with the inner product for and for . Then, we get the expression of its adjoint as
It can be easily checked that under Assumption 3, .
We need to make some assumptions about the nonlinear structure of operator . Following the work of Werner and Hofmann [16], we consider the following assumption on , , and . To introduce this assumption, we define the closed balls in with center and radius and their intersections with the domain of , . For the simplicity, we will denote and and by and .
Assumption 4.
-
The domain of is a convex and closed subset of .
-
The operator is weak-to-weak sequentially continuous111i.e., with , , and implies ..
-
The operator is Lipschitz continuous with Lipschitz constant in a sufficiently large ball ,
-
There exist constants , , , , and such that
holds for all , where the constant may depend on , , and .
3. Convergence analysis
The assertions about the convergence of Tikhonov-regularized solution to the true solution are formulated in this section. First of all, we introduce some standard quantities required to establish the error estimates. We denote
(7) | ||||
(8) |
The probabilistic estimates of the above quantities are given in Appendix A. We will use the following standard assumption on the sample size and the regularization parameter for our probabilistic estimates:
(9) |
Now, we introduce the concept of the distance function (also known as ‘approximate source conditions’) which can be used in the absence of the source condition for [1, 15]. It measures the violation of a benchmark smoothness of the true solution. It becomes very important in the ‘oversmoothing case’ for regularization in Hilbert Scales.
Definition 3.1 (Approximate source condition).
For given , we define the distance function by
(10) |
Here defines the benchmark smoothness. Let be the minimizing element of the above problem. Here, we also denote the quantities and .
Here, we note that when the true solution is of the form with , then the distance function and the minimizer .
The error analysis starts using the fact that is the minimizer of the Tikhonov functional (4). We get the deterministic expressions (17), (18) for the quantities and after some rearrangement, and using Cauchy-Schwarz inequality, Young’s inequality. After the simplification and using the probabilistic estimates from Proposition A.1 we get the error estimates in terms of the sample size , the regularization parameter , and distance function by . The distance function can be measured using the source condition for (see Section 4). Consequently, we get the explicit dependency . The estimates depend on the effective dimension which will be explicitly expressed in terms of by using the different decay conditions on the effective dimension. Then, the bounds can be expressed explicitly in terms of and for the given smoothness of the solution . In Section 4, the a-priori choice of regularization parameter will be obtained by balancing the terms in the error bounds.
Theorem 3.2.
Proof.
By the definition of as the solution to the minimization problem in (4), we have
We re-express the above inequality as follows,
which implies
Then we have,
(11) | ||||
Using the interpolation inequality (5), the definition of distance function (10) and for under Assumption 4, we obtain
(12) | ||||
We have
which implies
where .
Now we apply Young’s inequality ( for ) with , , and in (12), and this implies
where . Now, using (4) we get,
(13) | ||||
where .
To estimate the last two terms in (11) we consider the inequality
By taking , and and using (3), Assumption 4 (iii) we get,
(14) | ||||
where , and , , , are defined in Assumption 4 (iii), (3), Assumption 4 (iii), (7), (8), respectively.
Now, using the inequality we get,
which implies
Now by rearranging the terms we obtain,
where .
Hence we get,
(15) | ||||
and
(16) | ||||
In case , for some fixed , we get explicit bounds from (15) and (16) in terms of and using (22) and (23).
For and , we optimize the bounds by balancing the terms in and . Let solves the equation . The function is a non-vanishing decreasing function, and hence the inverse exists, and it is decreasing. With this, the error bounds can be expressed as
(17) | ||||
and
(18) | ||||
where .
Now, using (22) and (23) in (15), (16) we obtain with probability ,
(19) |
and
(20) |
where depends on , , , , , , , , .
Taking the mean using the inequality (5) we get,
Hence the proof completes. ∎
4. Explicit rates under source condition
Here, we consider the smoothness of by the source condition in terms of the operator to get the explicit rates in terms of and . The smoothness parameter influences the rates of convergence, the larger (Smoother ) will lead to the faster convergence rates.
Assumption 5 (General source condition).
The true solution satisfy the condition:
The rates in Theorem 3.2 can be further simplified in two cases based on the behaviour of the distance function .
In case , we get the explicit error bounds in terms of and from Theorem 3.2. We get the function when and for some , i.e., . Consequently, this also implies . So, the rates of convergence in the reconstruction norm and prediction norm can be given as:
By balancing the error terms, we choose the regularization parameter in terms of the sample size . Consequently, we get the explicit rates of convergence in terms of the sample size.
Corollary 4.1.
In case , we have to estimate the function explicitly. We utilize the result of [6, Theorem 5.9] to estimate the distance function using the source condition. For the benchmark smoothness and the given smoothness , we assume that and . Then, under Assumption 5, we get the bound
Following the analysis in [6, Theorem 5.9] we also obtain the bounds for the distance function:
(21) |
To bound the distance function , we assume the following assumption in addition to Assumption 4 (iv) with the same parameters:
Assumption 6.
There exists a constant such that
holds for all .
Now, according to Theorem 3.2 we have to solve the following equation in order to estimate in terms of .
Here, we get the estimate of from Assumption 6 and the bound (21). By ignoring the multiplicative constant in Assumption 6 we get the following identity from the above equation:
This yields
We get the explicit error bound from Theorem 3.2 in terms of the sample size and using the above dependency .
Now, we get the following error estimates using the identity and the estimates of distance functions in it.
By balancing the error terms, we choose the regularization parameter in terms of the sample size . Consequently, we get the explicit rates of convergence in terms of the sample size.
Corollary 4.2.
The effective dimension exhibits different behaviour under the different choices kernel and unknown probability measures [8]. We consider the following decay conditions on it.
Assumption 7 (Polynomial decay condition).
Assume that for some there exists some positive constant such that
Assumption 8 (Logarithmic decay condition).
Assume that there exists some positive constant such that
Corollary 4.3.
Corollary 4.4.
Now, we summarize the above results with conditions. We presented the rates of convergence under the different decay conditions on the effective dimension in Corollaries 4.3, 4.4. In both the corollaries, first, we discuss the case when the actual smoothness is higher than the benchmark smoothness of the true solution. In this case, we get the rates of convergence corresponding to the benchmark smoothness for , . Although, the actual smoothness is higher. Second, we discuss the case when the actual smoothness is lesser than the benchmark smoothness. Here, we get the error estimates corresponding to the actual smoothness for , . So, the rates are the same as what we would get by directly using the smoothness information of the true solution. At the intersection point, when , then both rates coincide. So, this analysis suggests that if we consider the benchmark smoothness in the appropriate range, then we would get the best rates of convergence. We emphasize that our analysis covers the oversmoothing case, i.e., .
Acknowledgements
This research has been partially funded by Deutsche Forschungsgemeinschaft (DFG) under The Berlin Mathematics Research Center MATH+ (EXC-2046/1 - 390685689).
The author is grateful for fruitful discussions with Peter Mathé about regularization in Hilbert Scales.
Appendix A Probabilistic bounds
Here, we present the standard perturbation bounds measured under the random sampling which can be obtained in [12].
References
- [1] Johann Baumeister. Stable solution of inverse problems. Advanced Lectures in Mathematics, Friedrich Vieweg & Sohn, Braunschweig, 1987.
- [2] Nicolai Bissantz, Thorsten Hohage, and Axel Munk. Consistency and rates of convergence of nonlinear Tikhonov regularization with random noise. Inverse Problems, 20(6):1773–1789, 2004.
- [3] Gilles Blanchard and Nicole Mücke. Optimal rates for regularization of statistical inverse learning problems. Foundations of Computational Mathematics, 18(4):971–1013, 2018.
- [4] Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331–368, 2007.
- [5] Heinz W. Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems, volume 375. Math. Appl., Kluwer Academic Publishers Group, Dordrecht, The Netherlands, 1996.
- [6] Bernd Hofmann and Peter Mathé. Analysis of profile functions for general linear regularization methods. SIAM Journal on Numerical Analysis, 45(3):1122–1141, 2007.
- [7] Thorsten Hohage and Mihaela Pricop. Nonlinear Tikhonov regularization in Hilbert scales for inverse boundary value problems with random noise. Inverse Problems and Imaging, 2:271–290, 2008.
- [8] Shuai Lu, Peter Mathé, and Sergei V. Pereverzev. Balancing principle in supervised learning for a general regularization scheme. Applied and Computational Harmonic Analysis, 48(1):123–148, 2020.
- [9] Charles A. Micchelli and Massimiliano Pontil. On learning vector-valued functions. Neural Computation, 17(1):177–204, 2005.
- [10] Nicole Mücke and Enrico Reiss. Stochastic Gradient Descent in Hilbert Scales: Smoothness, Preconditioning and Earlier Stopping. arXiv preprint arXiv:2006.10840, 2020.
- [11] Abhishake Rastogi. Tikhonov regularization with oversmoothing penalty for nonlinear statistical inverse problems. Communications on Pure and Applied Analysis, 19(8):4111, 2020.
- [12] Abhishake Rastogi, Gilles Blanchard, and Peter Mathé. Convergence analysis of Tikhonov regularization for non-linear statistical inverse problems. Electronic Journal of Statistics, 14(2):2798–2841, 2020.
- [13] Abhishake Rastogi and Peter Mathé. Inverse learning in Hilbert scales. arXiv preprint arXiv:2002.10208, 2020.
- [14] Thomas Schuster, Barbara Kaltenbacher, Bernd Hofmann, and Kamil S. Kazimierski. Regularization methods in Banach spaces, volume 10 of Radon Series on Computational and Applied Mathematics. Walter de Gruyter GmbH & Co. KG, Berlin, 2012.
- [15] Steve Smale and Ding-Xuan Zhou. Estimating the approximation error in learning theory. Analysis and Applications, 01(01):17–41, 2003.
- [16] Frank Werner and Bernd Hofmann. Convergence analysis of (statistical) inverse problems under conditional stability estimates. Inverse Problems, 36(1):015004, 2019.
- [17] Tong Zhang. Effective dimension and generalization of kernel learning. In Proceedings of the 15th International Conference on Neural Information Processing Systems, pages 454–461, MIT Press, Cambridge, MA, 2002.