Statistical Inverse Problems in Hilbert Scales

Abhishake Institute of Mathematics, Technical University of Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany abhishake@tu-berlin.de

Abstract.

In this paper, we study the Tikhonov regularization scheme in Hilbert scales for the nonlinear statistical inverse problem with general noise. The regularizing norm in this scheme is stronger than the norm of the underlying Hilbert space. We focus on developing a theoretical analysis for this scheme based on conditional stability estimates. We utilize the concept of the distance function to establish high-probability estimates of the direct and reconstruction errors in the reproducing kernel Hilbert space setting. Further, explicit rates of convergence in terms of the sample size are established for the oversmoothing case and the regular case over the regularity class defined through an appropriate source condition. Our results improve and generalize previous results obtained in related settings.

Key words and phrases:

Statistical inverse problem; Tikhonov regularization; Hilbert Scales; Reproducing kernel Hilbert space; Minimax convergence rates.

2010 Mathematics Subject Classification:

Primary: 62G20; Secondary: 62G08, 65J15, 65J20, 65J22.

1. Introduction

In this paper, we study the nonlinear operator equation

(1)

A (^f) =^g

with the infinite dimensional separable real Hilbert spaces $H$ and $H^{'}$ with the inner products $⟨ \cdot, \cdot ⟩_{H}$ and $⟨ \cdot, \cdot ⟩_{H^{'}}$ , respectively. Here, $H^{'}$ is the space of functions between a Polish space $X$ and a real separable Hilbert space $Y$ . We observe the noisy values of the function $^g$ at the inputs $x_{i}$ :

(2)

y_{i} =^g (x_{i}) + ε_{i}

for $1 \leq i \leq m$ . Here, $m$ is the number of observations, called the sample size. In contrast to the direct learning scheme, where we estimate the function $^g$ , here we aim to estimate the function $^f$ directly from the observations.

A common approach to stably approximate the solution of equation (1) is the Tikhonov regularization scheme. Sometimes, we have the information about the true solution, e.g., the true solution may be differentiable. To incorporate this information, we employ the Tikhonov regularization in Hilbert scales. This scheme consists of a functional which is a linear combination of a fidelity term measuring the fitness of the data and a penalty term in a stronger norm forcing the smoothness in the approximated solution. To define this scheme, we introduce a densely defined, unbounded, closed, linear, self-adjoint, strictly positive operator $L : D (L) \subset H \to H$ such that for some $ℓ_{L} > 0$ ,

(3)

ℓ_{L} {∥ f ∥}_{H} \leq {∥ L f ∥}_{H} \forall f \in D (L) .

Here, we observe that $L^{- 1} : H \to H$ is a bounded operator from the strict positivity of the operator $L$ .

The Tikhonov functional for the considered nonlinear inverse problem with sample $z = {(x_{i}, y_{i})}_{i = 1}^{m}$ is given by

E_{z, λ} (f) = \frac{1}{m} \sum_{i = 1}^{m} {∥ A (f) (x_{i}) - y_{i} ∥}_{Y}^{2} + λ {∥ L (f - ¯ f) ∥}_{H}^{2},

where $¯ f \in D (A) \cap D (L)$ is an initial guess. The regularization parameter $λ > 0$ has to balance both terms appropriately. Then, the Tikhonov regularization scheme in Hilbert scales can be defined as

(4)

f_{z, λ} = \arg\min_{f \in D (A) \cap D (L)} E_{z, λ} (f) .

For the continuous and weakly sequentially closed operator $A$ , there exists a global solution of the regularization scheme in (4). But it is not necessarily unique, since $A$ is nonlinear (see [14, Section 4.1.1]).
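As a minimal sketch of how the Tikhonov functional is assembled, the snippet below evaluates the fidelity term plus the penalty term in a finite-dimensional toy model. The diagonal operator `l_diag`, the forward map `forward`, and all numerical values are hypothetical choices made for illustration only; they are not taken from the paper.

```python
from typing import Callable, Sequence


def tikhonov_functional(
    f: Sequence[float],
    xs: Sequence[float],
    ys: Sequence[float],
    forward: Callable[[Sequence[float], float], float],
    l_diag: Sequence[float],
    f_bar: Sequence[float],
    lam: float,
) -> float:
    """Evaluate (1/m) sum_i |A(f)(x_i) - y_i|^2 + lam * ||L (f - f_bar)||^2."""
    m = len(xs)
    # Fidelity term: empirical squared residual of the forward map.
    fidelity = sum((forward(f, x) - y) ** 2 for x, y in zip(xs, ys)) / m
    # Penalty term: L is assumed diagonal, so L(f - f_bar) acts coordinate-wise.
    penalty = sum((l * (fi - bi)) ** 2 for l, fi, bi in zip(l_diag, f, f_bar))
    return fidelity + lam * penalty


def forward(f, x):
    # Hypothetical nonlinear forward map: A(f)(x) = (f_0 + f_1 * x)^2.
    return (f[0] + f[1] * x) ** 2


value = tikhonov_functional(
    f=[1.0, 1.0], xs=[0.0, 1.0], ys=[1.0, 4.0],
    forward=forward, l_diag=[1.0, 2.0], f_bar=[0.0, 0.0], lam=0.5,
)
print(value)  # the data are fitted exactly, so only the penalty 0.5 * (1 + 4) remains
```

In this toy setting the candidate fits the data exactly, so the value of the functional is entirely due to the penalty in the stronger norm.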

We consider the Hilbert scales $H_{a}$ generated by the operator $L$ . Here the spaces $H_{a} := D (L^{a})$ are Hilbert spaces equipped with the inner product ${⟨ f, g ⟩}_{H_{a}} = {⟨ L^{a} f, L^{a} g ⟩}_{H}, f, g \in H_{a}$ . For the Hilbert scales, we have the well-known interpolation inequality

(5)

{∥ f ∥}_{H_{b}} \leq {∥ f ∥}_{H_{a}}^{\frac{c - b}{c - a}} {∥ f ∥}_{H_{c}}^{\frac{b - a}{c - a}}, f \in H_{c}

which holds for any $a < b < c$ .
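The interpolation inequality (5) can be checked numerically in a simple model. The sketch below assumes that $L$ is diagonal with eigenvalues `l_diag` (a hypothetical choice), so that the norm in $H_{a}$ is a weighted Euclidean norm; it is an illustration, not part of the analysis.

```python
import math


def scale_norm(f, a, l_diag):
    """Norm in H_a for a diagonal L: sqrt(sum_k l_k^(2a) f_k^2)."""
    return math.sqrt(sum((l ** (2 * a)) * fk * fk for l, fk in zip(l_diag, f)))


def interpolation_gap(f, a, b, c, l_diag):
    """Right-hand side minus left-hand side of the interpolation inequality (5)."""
    theta = (c - b) / (c - a)          # exponent of the H_a norm
    lhs = scale_norm(f, b, l_diag)
    rhs = scale_norm(f, a, l_diag) ** theta * scale_norm(f, c, l_diag) ** (1 - theta)
    return rhs - lhs


l_diag = [float(k) for k in range(1, 51)]        # assumed eigenvalues of L, all >= 1
f = [(-1) ** k / k ** 2 for k in range(1, 51)]   # coefficients of a test element
gap = interpolation_gap(f, a=-1.0, b=0.0, c=1.0, l_diag=l_diag)
print(gap >= 0.0)
```

The choice $a = - p$ , $b = 0$ , $c = 1$ is the one used later to pass from bounds in $H_{- p}$ and $H_{1}$ to a bound in $H$ .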

Regularization schemes in Hilbert scales have been well studied under various assumptions for classical inverse problems [2, 5, 7, 14]. In learning theory, general regularization in Hilbert scales was introduced for linear inverse problems, and rates of convergence were established [13]. Nicole et al. [10] studied Stochastic Gradient Descent in Hilbert scales and provided several examples of Hilbert scales in learning. The authors also discussed error estimates for the Stochastic Gradient Descent scheme for the direct learning problem. In [11], rates of convergence are established for nonlinear statistical inverse learning problems in the RKHS setting. The authors impose assumptions on the nonlinearity of the operator $A$ such as Fréchet differentiability, Lipschitz continuity of the Fréchet derivative, and a link condition that transfers smoothness in terms of the operator $L$ to the covariance operator.

Here, we consider the nonlinear inverse learning problem in Hilbert scales satisfying conditional stability estimates characterized by general concave index functions. We use the Tikhonov regularization schemes to obtain the stable approximate solution in the RKHS framework. Werner and Hofmann [16] illustrated the validity of the conditional stability estimates in different models and real-world situations. The authors showed that the derivative of $A$ is not always necessary for this condition.

For the regularization schemes in the RKHSs, generally, we consider the smoothness by the source condition in terms of the covariance operator which implies the rates of convergence. The covariance operator depends on the considered kernel and unknown probability measure. Therefore, the source condition cannot be verified practically. Moreover, the misspecified kernel affects the source condition and consequently, the rates of convergence for the regularization schemes. Here, we consider the smoothness of the true solution in terms of the known operator $L$ which can be checked in practice. We divide the smoothness into two cases: the regular case (i.e., $^f \in D (L)$ ) and the oversmoothing case (i.e., $^f \notin D (L)$ ). The oversmoothing case is very delicate. We consider that the regularized solution in Hilbert scales (4) belongs to $D (L)$ . But the true solution does not belong to $D (L)$ in the oversmoothing case. The analysis is also tricky for nonlinear inverse problems since the Tikhonov regularization in Hilbert Scales does not have an explicit solution. The analysis starts with the step $E_{z, λ} (f_{z, λ}) \leq E_{z, λ} (^f)$ . But $E_{z, λ} (^f)$ is not well-defined in the oversmoothing case (since $^f \notin D (L)$ ). We will utilize the concept of distance functions to overcome this problem.

The main results of our paper can be summarized as follows:

We discuss the rates of convergence for the Tikhonov regularization in Hilbert Scales under a conditional stability assumption for the inverse problem.
We obtain the error estimates in the absence of the widely-considered source condition. We will use the concept of the distance functions for this.
We establish the error bounds in both the regular case and the oversmoothing case for the appropriate benchmark smoothness.

The manuscript is organized as follows: In Section 2, we present the basic definitions, notation, and assumptions required in our analysis. In Section 3, we state and prove our main results. Here, we discuss the rates of convergence for Tikhonov regularization in Hilbert scales in the probabilistic sense. In Section 4, we present the explicit rates in terms of the sample size by bounding the distance functions. In the Appendix, we state the probabilistic estimates of the perturbation inequalities.

2. Notation and Assumptions

Let the input space $X$ be a Polish space and the output space $(Y, {⟨ \cdot, \cdot ⟩}_{Y})$ be a real separable Hilbert space. We consider the joint probability measure $ρ$ on the sample space $Z = X \times Y$ . We denote the marginal distribution on $X$ by $ν$ and the conditional distribution of $y$ given $x$ by $ρ (y | x)$ . Therefore, the measure $ρ$ can be split as $ρ (x, y) = ρ (y | x) ν (x)$ .

For the probability measure $ρ$ on $X \times Y$ , we assume that

(6)

\int_{Z} {∥ y ∥}_{Y}^{2} d ρ (x, y) < \infty .

For the considered model $y =^g (x) + ε$ with centred noise $ε$ we find $\int_{Y} y d ρ (y | x) =^g (x)$ provided that the conditional expectation w.r.t. $ρ$ of $y$ given $x$ exists (a.s.). This holds true under Assumption (6). This fact together with the operator equation (1) motivates us to consider the following assumption.

Assumption 1 (The true solution).

The conditional expectation w.r.t. $ρ$ of $y$ given $x$ exists (a.s.), and there exists a unique $^f \in int (D (A)) \subset H$ such that

\int_{Y} y d ρ (y | x) = A (^f) (x), for all x \in X .

Here, $^f$ is the true solution of equation (1) which we aim at estimating. Here, we want to mention that the function $^f$ is also the minimizer of the expected risk considered in [11].

We consider a Bernstein-type assumption for the noise $ε = y - A (^f) (x)$ :

Assumption 2 (Noise condition).

There exist some constants $M, Σ$ such that for almost all $x \in X$ ,

\int_{Y} (e^{{∥ ε ∥}_{Y} / M} - \frac{{∥ ε ∥}_{Y}}{M} - 1) d ρ (y | x) \leq \frac{Σ^{2}}{2 M^{2}} .

We want to utilize the properties of the Reproducing kernel Hilbert spaces (RKHSs) in our analysis. Therefore, we assume that $Ran (A)$ is contained in a vector-valued Reproducing kernel Hilbert space (RKHSvv). The RKHSvv $H_{K}$ arises from the operator-valued positive semi-definite kernel $K : X \times X \to L (Y)$ [9]. Here, $L (Y)$ is the Banach space of bounded linear operators.

Assumption 3 (Vector valued reproducing kernel Hilbert space $H^{'}$ ).

Suppose $H^{'}$ is an RKHSvv of functions $g : X \to Y$ corresponding to the kernel $K : X \times X \to L (Y)$ such that

(i) For all $x \in X$ , $K_{x} : Y \to H^{'}$ is a Hilbert-Schmidt operator, and

$κ^{2} := \sup_{x \in X} {∥ K_{x} ∥}_{HS}^{2} = \sup_{x \in X} tr (K_{x}^{*} K_{x}) < \infty .$
(ii) The real-valued function $ς : X \times X \to R$ , defined by $ς (x, t) = {⟨ K_{t} v, K_{x} w ⟩}_{H^{'}}$ , is measurable $\forall v, w \in Y$ .

This assumption implies that $H^{'} \subset L^{2} (X, ν; Y)$ . We denote the canonical injection map $H^{'}$ to $L^{2} (X, ν; Y)$ by $I_{ν}$ and the corresponding covariance operator is $T_{ν} := I_{ν}^{*} I_{ν}$ . From the above assumption, we see that the covariance operator is positive and trace class. The covariance operator is very important in our convergence analysis. We will need some regularity assumptions in terms of the covariance operator on the marginal probability measure $ν$ to achieve the uniform convergence rates for the regularized solution (4).

The error estimates studied in our analysis are based on the smoothness of the true solution and the behaviour of the effective dimension. The error estimates and the optimal parameter choice depend on the effective dimension for the regularization methods in reproducing kernel Hilbert spaces [4, 3, 12]. To achieve the fast convergence rates, we introduce the concept of the effective dimension $N (λ)$ [17]:

N (λ) := Tr ((T_{ν} + λ I)^{- 1} T_{ν}), for λ > 0.

The effective dimension is a continuous, decreasing function of $λ$ . It is finite, since the operator $T_{ν}$ is trace class, and we get

N (λ) \leq {∥ (T_{ν} + λ I)^{- 1} ∥}_{L (H^{'})} Tr (T_{ν}) \leq \frac{κ^{2}}{λ} .

The different behaviours of the eigenvalues of the covariance operator lead to different decay rates of the effective dimension [8]. Under the different scenarios of the effective dimension, we will get the explicit convergence rates in the next section.
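To make the effective dimension concrete, the following sketch computes $N (λ)$ from a list of eigenvalues of the covariance operator. The polynomial decay $t_{k} = k^{- 2}$ is an assumption made for illustration; it also compares $N (λ)$ with the bound given by the trace divided by $λ$ .

```python
def effective_dimension(eigs, lam):
    """N(lam) = Tr((T + lam I)^{-1} T) = sum_k t_k / (t_k + lam)."""
    return sum(t / (t + lam) for t in eigs)


eigs = [k ** -2.0 for k in range(1, 1001)]   # assumed eigenvalue decay t_k = k^{-2}
trace = sum(eigs)                            # plays the role of Tr(T_nu)

for lam in (1.0, 0.1, 0.01):
    n_lam = effective_dimension(eigs, lam)
    # N(lam) grows as lam decreases and never exceeds trace / lam.
    print(lam, n_lam, trace / lam)
```

For this decay the effective dimension grows much more slowly than $1 / λ$ as $λ \to 0$ , which is what makes rates faster than the worst case possible.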

In order to establish the error estimates, we introduce the discrete operators for the samples. For the ordered set $x = (x_{i})_{i = 1}^{m}$ , we define the sampling operator $S_{x} : H^{'} \to Y^{m}$ by

(S_{x} (g))_{i} = g (x_{i}), 1 \leq i \leq m .

We equip $Y^{m}$ with the inner product ${⟨ y, y^{'} ⟩}_{m} = \frac{1}{m} \sum_{i = 1}^{m} {⟨ y_{i}, y_{i}^{'} ⟩}_{Y}$ for $y = (y_{i})_{i = 1}^{m}$ and $y^{'} = (y_{i}^{'})_{i = 1}^{m}$ . Then, the adjoint $S_{x}^{*} : Y^{m} \to H^{'}$ is given by

S_{x}^{*} y = \frac{1}{m} \sum_{i = 1}^{m} K_{x_{i}} y_{i}, \forall y \in Y^{m} .

It can be easily checked that under Assumption 3, ${∥ S_{x} ∥}_{H^{'} \to Y^{m}} \leq κ$ .
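The adjoint relation and the norm bound for the sampling operator can be illustrated in the scalar-valued case. The sketch below assumes $Y = R$ , a Gaussian kernel (so that $κ = 1$ ), and the $\frac{1}{m}$ -weighted inner product on $Y^{m}$ , which is the weighting consistent with the stated form of $S_{x}^{*}$ . All sample points and coefficients are hypothetical.

```python
import math


def kernel(s, t):
    """Gaussian kernel on the real line (scalar-valued case, an assumption)."""
    return math.exp(-0.5 * (s - t) ** 2)


def evaluate(coeffs, centers, x):
    """Evaluate g = sum_j c_j K(., t_j) at the point x."""
    return sum(c * kernel(x, t) for c, t in zip(coeffs, centers))


def rkhs_inner(c1, t1, c2, t2):
    """Inner product of two kernel expansions: sum_{j,k} c1_j c2_k K(t1_j, t2_k)."""
    return sum(a * b * kernel(s, t) for a, s in zip(c1, t1) for b, t in zip(c2, t2))


xs = [0.0, 0.5, 1.0, 1.5]                      # sample inputs x_i
ys = [1.0, -2.0, 0.5, 3.0]                     # an arbitrary vector y in Y^m
coeffs, centers = [2.0, -1.0], [0.25, 1.25]    # g = 2 K(., 0.25) - K(., 1.25)
m = len(xs)

# <S_x g, y>_m with the 1/m-weighted inner product on Y^m.
lhs = sum(evaluate(coeffs, centers, x) * y for x, y in zip(xs, ys)) / m
# <g, S_x^* y>_{H'} with S_x^* y = (1/m) sum_i y_i K(., x_i).
rhs = rkhs_inner(coeffs, centers, [y / m for y in ys], xs)

# Norm bound ||S_x g||_m <= kappa ||g||_{H'} with kappa = 1 for this kernel.
sx_norm = math.sqrt(sum(evaluate(coeffs, centers, x) ** 2 for x in xs) / m)
g_norm = math.sqrt(rkhs_inner(coeffs, centers, coeffs, centers))
print(math.isclose(lhs, rhs), sx_norm <= g_norm)
```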

We need some assumptions on the nonlinear structure of the operator $A$ . Following the work of Werner and Hofmann [16], we consider the following assumption on $A$ , $D (A)$ , and $^f$ . To introduce it, we define the closed balls $B_{μ}^{u} (^f) = \{ f \in H_{u} : {∥ f -^f ∥}_{H_{u}} \leq μ \}$ in $H_{u}$ $(u \in R)$ with center $^f \in H_{u}$ and radius $μ$ $(0 < μ \leq 1)$ , and their intersections with the domain of $A$ , $D_{μ}^{u} (^f) := B_{μ}^{u} (^f) \cap D (A)$ . For simplicity, we denote $B_{μ}^{0} (^f)$ and $D_{μ}^{0} (^f)$ by $B_{μ} (^f)$ and $D_{μ} (^f)$ , respectively.

Assumption 4.

(i) The domain $D (A)$ of $A$ is a convex and closed subset of $H$ .
(ii) The operator $A : D (A) \to H^{'}$ is weak-to-weak sequentially continuous, i.e., $f_{n} ⇀^f \in H$ with $f_{n} \in D (A)$ , $n \in N$ , and $^f \in D (A)$ implies $A (f_{n}) ⇀ A (^f) \in H^{'}$ .
(iii) The operator $A$ is Lipschitz continuous with Lipschitz constant $ℓ_{A} < \infty$ in a sufficiently large ball $B_{d} (^f)$ ,

${∥ A (f) - A (~ f) ∥}_{H^{'}} \leq ℓ_{A} {∥ f - ~ f ∥}_{H} \forall f, ~ f \in B_{d} (^f) \cap D (A) \subset H .$
(iv) There exist constants $p \geq 0$ , $s > 0$ , $α > 0$ , $d > 0$ , $θ \geq 0$ and $Q \subset D_{d}^{θ} (^f) \cap D (A)$ such that

${∥ f -^f ∥}_{H_{- p}} \leq α {∥ I_{ν} [A (f) - A (^f)] ∥}_{L^{2} (X, ν; Y)}^{s}$

holds for all $f \in Q$ , where the constant $α$ may depend on $p$ , $s$ , and $Q$ .

Assumption 4 (iv) is called the conditional stability estimate which helps us to characterize the degree of ill-posedness of inverse problems. Here, we note that operator $A$ may not be differentiable, (see the examples in [16]).

3. Convergence analysis

The assertions about the convergence of Tikhonov-regularized solution $f_{z, λ}$ to the true solution $^f$ are formulated in this section. First of all, we introduce some standard quantities required to establish the error estimates. We denote

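(7)

Θ_{z} := {∥ (T_{ν} + λ I)^{- 1 / 2} S_{x}^{*} ε ∥}_{H^{'}} for ε = S_{x} [A (^f)] - y,

(8)

Ψ_{x} := {∥ (T_{ν} + λ I)^{- 1 / 2} (T_{ν} - T_{x}) ∥}_{L_{2} (H^{'})} .

Here, $T_{x} := S_{x}^{*} S_{x}$ denotes the empirical counterpart of the covariance operator $T_{ν}$ , and ${∥ \cdot ∥}_{L_{2} (H^{'})}$ is the Hilbert-Schmidt norm.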

The probabilistic estimates of the above quantities are given in Appendix A. We will use the following standard assumption on the sample size $m$ and the regularization parameter $λ$ for our probabilistic estimates:

(9)

N (λ) \leq m λ and 0 < λ \leq 1.

Now, we introduce the concept of the distance function (also known as ‘approximate source conditions’) which can be used in the absence of the source condition for $^f$ [1, 15]. It measures the violation of a benchmark smoothness of the true solution. It becomes very important in the ‘oversmoothing case’ $^f \notin D (L)$ for regularization in Hilbert Scales.

Definition 3.1 (Approximate source condition).

For given $q$ , we define the distance function $d : [0, \infty) \to [0, \infty)$ by

(10)

d (R) = \inf \{ {∥ f -^f ∥}_{H} : f - ¯ f = L^{- q} v and {∥ v ∥}_{H} \leq R \}, R > 0.

Here $q$ defines the benchmark smoothness. Let ${^f}^{R}$ be the minimizing element of the above problem. We also denote the quantities $d_{A} (R) = {∥ I_{ν} [A ({^f}^{R}) - A (^f)] ∥}_{L^{2} (X, ν; Y)}$ and $d^{p} (R) = {∥ {^f}^{R} -^f ∥}_{H_{- p}}$ .

We note that when the true solution is of the form $^f - ¯ f = L^{- q} u$ with ${∥ u ∥}_{H} \leq ¯ R$ , the distance function satisfies $d (¯ R) = 0$ and the minimizer is ${^f}^{¯ R} =^f$ .

The error analysis starts from the fact that $f_{z, λ}$ is the minimizer of the Tikhonov functional in (4). After some rearrangement, and using the Cauchy-Schwarz inequality and Young's inequality, we get the deterministic bounds (17), (18) for the quantities ${∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}$ and ${∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}$ . After simplification and using the probabilistic estimates from Proposition A.1, we get the error estimates in terms of the sample size $m$ , the regularization parameter $λ$ , and the distance function through $R (λ)$ . The distance function can be bounded using the source condition for $^f$ (see Section 4). Consequently, we get the explicit dependency $λ \to R (λ)$ . The estimates depend on the effective dimension, which will be expressed explicitly in terms of $λ$ under different decay conditions. Then, the bounds can be expressed explicitly in terms of $λ$ and $m$ for the given smoothness of the solution $^f$ . In Section 4, the a-priori choice of the regularization parameter is obtained by balancing the terms in the error bounds.

Theorem 3.2.

Let Assumptions 1–4, and condition (9) hold true. Let $1 \leq q \leq 2 + p$ , $q (s - 1) \leq p + s$ and $f_{z, λ}, {^f}^{R} \in Q$ (for sufficiently large sample size $m$ ) for $p$ , $s$ , $Q$ , $q$ defined in Assumption 4 (iv), (10). Then, for all $0 < η < 1$ , the following bounds hold with the confidence $1 - η$ :

{∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)} \leq ˜ C λ^{\frac{1}{2}} \{ \sqrt{\frac{N (λ)}{m λ}} + R (λ)^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{s (q - 1)}{2 (p + q) - 2 s (q - 1)}} \} \log (\frac{4}{η}),
{∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H} \leq ˜ C \{ \sqrt{\frac{N (λ)}{m λ}} + R (λ)^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{s (q - 1)}{2 (p + q) - 2 s (q - 1)}} \} \log (\frac{4}{η}),
{∥ f_{z, λ} - {^f}^{R} ∥}_{H} \leq ˜ C λ^{\frac{s}{2 (p + 1)}} {\{ \sqrt{\frac{N (λ)}{m λ}} + R (λ)^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{s (q - 1)}{2 (p + q) - 2 s (q - 1)}} \}}^{\frac{p + s}{p + 1}} \log (\frac{4}{η}) .

Here, $R (λ)$ is the solution of the equation $d_{A} (R) R^{- \frac{p + 1}{(p + q) - s (q - 1)}} = λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}}$ for $d_{A} (R) \neq 0$ and $R (λ)$ is a fixed constant for $d_{A} (R) = 0$ .

Proof.

By the definition of $f_{z, λ}$ as the solution to the minimization problem in (4), we have

\frac{1}{m} \sum_{i = 1}^{m} {∥ [A (f_{z, λ})] (x_{i}) - y_{i} ∥}_{Y}^{2} + λ {∥ L (f_{z, λ} - ¯ f) ∥}_{H}^{2} \leq \frac{1}{m} \sum_{i = 1}^{m} {∥ [A ({^f}^{R})] (x_{i}) - y_{i} ∥}_{Y}^{2} + λ {∥ L ({^f}^{R} - ¯ f) ∥}_{H}^{2} .

We re-express the above inequality as follows,

{∥ S_{x} [A (f_{z, λ})] - y ∥}_{m}^{2} + λ {∥ L (f_{z, λ} - ¯ f) ∥}_{H}^{2} \leq {∥ S_{x} [A ({^f}^{R})] - y ∥}_{m}^{2} + λ {∥ L ({^f}^{R} - ¯ f) ∥}_{H}^{2}

which implies

{∥ S_{x} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{m}^{2} + 2 {⟨ S_{x} [A (f_{z, λ}) - A ({^f}^{R})], S_{x} [A ({^f}^{R})] - y ⟩}_{m} + λ {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2}
\leq 2 λ {⟨ L (f_{z, λ} - {^f}^{R}), L (¯ f - {^f}^{R}) ⟩}_{H} .

Then we have,

(11)

{∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{2} + λ {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2}
\leq 2 λ {⟨ L (f_{z, λ} - {^f}^{R}), L (¯ f - {^f}^{R}) ⟩}_{H} + 2 {⟨ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})], I_{ν} [A ({^f}^{R}) - A (^f)] ⟩}_{L^{2} (X, ν; Y)}
+ 2 {⟨ A (f_{z, λ}) - A ({^f}^{R}), (T_{ν} - T_{x}) [A ({^f}^{R}) - A (^f)] + S_{x}^{*} ε ⟩}_{H^{'}} .

Using the interpolation inequality (5), the definition of distance function (10) and for $f_{z, λ} \in Q$ under Assumption 4, we obtain

(12)

{⟨ L (f_{z, λ} - {^f}^{R}), L (¯ f - {^f}^{R}) ⟩}_{H} \leq {∥ {^f}^{R} - ¯ f ∥}_{H_{q}} {∥ f_{z, λ} - {^f}^{R} ∥}_{H_{2 - q}}
\leq R {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{\frac{p - q + 2}{p + 1}} {∥ f_{z, λ} - {^f}^{R} ∥}_{H_{- p}}^{\frac{q - 1}{p + 1}} .

We have

{∥ f_{z, λ} - {^f}^{R} ∥}_{H_{- p}} \leq {∥ f_{z, λ} -^f ∥}_{H_{- p}} + {∥^f - {^f}^{R} ∥}_{H_{- p}}
\leq α {∥ I_{ν} (A (f_{z, λ}) - A (^f)) ∥}_{L^{2} (X, ν; Y)}^{s} + α {∥ I_{ν} (A (^f) - A ({^f}^{R})) ∥}_{L^{2} (X, ν; Y)}^{s}
\leq α {∥ I_{ν} (A (f_{z, λ}) - A ({^f}^{R})) ∥}_{L^{2} (X, ν; Y)}^{s} + 2 α {∥ I_{ν} (A (^f) - A ({^f}^{R})) ∥}_{L^{2} (X, ν; Y)}^{s}

which implies

{∥ f_{z, λ} - {^f}^{R} ∥}_{H_{- p}}^{\frac{2 (q - 1)}{p + q}} \leq 2 α^{\frac{2 (q - 1)}{p + q}} {∥ I_{ν} (A (f_{z, λ}) - A ({^f}^{R})) ∥}_{L^{2} (X, ν; Y)}^{\frac{2 s (q - 1)}{p + q}} + 4 α^{\frac{2 (q - 1)}{p + q}} d_{A} (R)^{\frac{2 s (q - 1)}{p + q}},

where $d_{A} (R) = {∥ I_{ν} [A ({^f}^{R}) - A (^f)] ∥}_{L^{2} (X, ν; Y)}$ .

Now we apply Young’s inequality ( $a b \leq \frac{a^{u}}{u} + \frac{b^{v}}{v}$ for $\frac{1}{u} + \frac{1}{v} = 1$ ) with $a = {(\frac{u}{4})}^{\frac{1}{u}} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{\frac{p - q + 2}{p + 1}}$ , $b = {(\frac{4}{u})}^{\frac{1}{u}} R {∥ f_{z, λ} - {^f}^{R} ∥}_{H_{- p}}^{\frac{q - 1}{p + 1}}$ , $u = \frac{2 p + 2}{p - q + 2}$ and $v = \frac{2 p + 2}{p + q}$ in (12), and this implies

{⟨ L ({^f}^{R} - f_{z, λ}), L ({^f}^{R} - ¯ f) ⟩}_{H} \leq \frac{1}{4} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2} + C R^{\frac{2 p + 2}{p + q}} {∥ f_{z, λ} - {^f}^{R} ∥}_{H_{- p}}^{\frac{2 (q - 1)}{p + q}},

where $C = \frac{1}{v} {(\frac{4}{u})}^{\frac{v}{u}}$ . Now, using the preceding bound on ${∥ f_{z, λ} - {^f}^{R} ∥}_{H_{- p}}$ we get,

(13)

{⟨ L ({^f}^{R} - f_{z, λ}), L ({^f}^{R} - ¯ f) ⟩}_{H}
\leq \frac{1}{4} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2} + C^{' 2} R^{\frac{2 p + 2}{p + q}} {∥ I_{ν} (A (f_{z, λ}) - A ({^f}^{R})) ∥}_{L^{2} (X, ν; Y)}^{\frac{2 s (q - 1)}{p + q}} + 2 C^{' 2} R^{\frac{2 p + 2}{p + q}} d_{A} (R)^{\frac{2 s (q - 1)}{p + q}},

where $C^{' 2} = 2 C α^{\frac{2 (q - 1)}{p + q}}$ .

To estimate the last two terms in (11) we consider the inequality

{⟨ f, g ⟩}_{H^{'}} = {⟨ (T_{ν} + λ I)^{1 / 2} f, (T_{ν} + λ I)^{- 1 / 2} g ⟩}_{H^{'}}
\leq \{ \sqrt{λ} {∥ f ∥}_{H^{'}} + {∥ I_{ν} f ∥}_{L^{2} (X, ν; Y)} \} {∥ (T_{ν} + λ I)^{- 1 / 2} g ∥}_{H^{'}} .

By taking $f = A (f_{z, λ}) - A ({^f}^{R})$ , and $g = (T_{ν} - T_{x}) [A (f_{z, λ}) - 2 A (^f) + A ({^f}^{R})] + 2 S_{x}^{*} ε$ and using (3), Assumption 4 (iii) we get,

(14)

{⟨ A (f_{z, λ}) - A ({^f}^{R}), (T_{ν} - T_{x}) [A (f_{z, λ}) - 2 A (^f) + A ({^f}^{R})] + 2 S_{x}^{*} ε ⟩}_{H^{'}}
\leq \{ ℓ_{1} Ψ_{x} + 2 Θ_{z} \} \{ \sqrt{λ} {∥ A (f_{z, λ}) - A ({^f}^{R}) ∥}_{H^{'}} + {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)} \}
\leq \{ ℓ_{1} Ψ_{x} + 2 Θ_{z} \} \{ ℓ_{A} \sqrt{λ} {∥ f_{z, λ} - {^f}^{R} ∥}_{H} + {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)} \}
\leq \{ ℓ_{1} Ψ_{x} + 2 Θ_{z} \} \{ ℓ \sqrt{λ} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H} + {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)} \},

where $ℓ_{1} = {∥ A (f_{z, λ}) - 2 A (^f) + A ({^f}^{R}) ∥}_{H^{'}}$ , $ℓ = \frac{ℓ_{A}}{ℓ_{L}}$ , and $ℓ_{A}$ , $ℓ_{L}$ , $Θ_{z}$ , $Ψ_{x}$ are defined in Assumption 4 (iii), (3), (7), (8), respectively.

Using the above estimates (13), (14) in (11) we obtain,

{∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{2} + \frac{λ}{2} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2}
\leq 2 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{\frac{2 s (q - 1)}{p + q}} + 4 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} d_{A} (R)^{\frac{2 s (q - 1)}{p + q}}
+ 2 {∥ I_{ν} [A ({^f}^{R}) - A (^f)] ∥}_{L^{2} (X, ν; Y)} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}
+ (ℓ_{1} Ψ_{x} + 2 Θ_{z}) {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}
+ ℓ \sqrt{λ} (ℓ_{1} Ψ_{x} + 2 Θ_{z}) {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H} .

Now, using the inequality $a b \leq a^{2} + \frac{b^{2}}{4}$ we get,

{∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{2} + \frac{λ}{2} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2}
\leq 2 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{\frac{2 s (q - 1)}{p + q}} + 4 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} d_{A} (R)^{\frac{2 s (q - 1)}{p + q}}
+ 4 {∥ I_{ν} [A ({^f}^{R}) - A (^f)] ∥}_{L^{2} (X, ν; Y)}^{2} + \frac{1}{4} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{2}
+ (ℓ_{1} Ψ_{x} + 2 Θ_{z})^{2} + \frac{1}{4} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{2}
+ ℓ^{2} (ℓ_{1} Ψ_{x} + 2 Θ_{z})^{2} + \frac{λ}{4} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2},

which implies

\frac{1}{2} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{2} + \frac{λ}{4} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2}
\leq 2 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{\frac{2 s (q - 1)}{p + q}} + 4 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} d_{A} (R)^{\frac{2 s (q - 1)}{p + q}}
+ 4 d_{A} (R)^{2} + (ℓ^{2} + 1) (ℓ_{1} Ψ_{x} + 2 Θ_{z})^{2} .

Now by rearranging the terms we obtain,

{∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{2} + λ {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{2}
\leq ( 8 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{\frac{2 s (q - 1)}{p + q}} - {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{2} )
+ 16 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} d_{A} (R)^{\frac{2 s (q - 1)}{p + q}} + 16 d_{A} (R)^{2} + 4 (ℓ^{2} + 1) (ℓ_{1} Ψ_{x} + 2 Θ_{z})^{2}
\leq \sup_{τ \geq 0} (8 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} τ^{\frac{2 s (q - 1)}{p + q}} - τ^{2}) + 16 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} d_{A} (R)^{\frac{2 s (q - 1)}{p + q}}
+ 16 d_{A} (R)^{2} + 4 (ℓ^{2} + 1) (ℓ_{1} Ψ_{x} + 2 Θ_{z})^{2}
= C^{'' 2} R^{\frac{2 p + 2}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{(p + q) - s (q - 1)}} + 16 C^{' 2} λ R^{\frac{2 p + 2}{p + q}} d_{A} (R)^{\frac{2 s (q - 1)}{p + q}}
+ 16 d_{A} (R)^{2} + 4 (ℓ^{2} + 1) (ℓ_{1} Ψ_{x} + 2 Θ_{z})^{2},

where $C^{''} = {(\frac{C^{' 2} 8 s (q - 1)}{p + q})}^{\frac{(p + q)}{2 (p + q) - 2 s (q - 1)}} {(\frac{(p + q) - s (q - 1)}{s (q - 1)})}^{1 / 2}$ .
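The supremum over $τ$ used in this step has a closed form that is easy to sanity-check numerically. The sketch below compares the closed-form maximum of $K τ^{t} - τ^{2}$ (for $0 < t < 2$ ) with a brute-force grid search, and with the expression $C^{'' 2} R^{\frac{2 p + 2}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{(p + q) - s (q - 1)}}$ . The numerical values of $p$ , $q$ , $s$ , $C^{' 2}$ , $λ$ , $R$ are illustrative assumptions.

```python
def sup_closed_form(k_const, t):
    """Maximum over tau >= 0 of k_const * tau**t - tau**2, valid for 0 < t < 2."""
    tau_star = (k_const * t / 2.0) ** (1.0 / (2.0 - t))   # stationary point
    return k_const * tau_star ** t * (1.0 - t / 2.0)


def sup_on_grid(k_const, t, tau_max=10.0, steps=100000):
    """Brute-force maximum on a uniform grid, used only as a sanity check."""
    taus = (i * tau_max / steps for i in range(steps + 1))
    return max(k_const * tau ** t - tau ** 2 for tau in taus)


def rate_form(c_prime_sq, lam, big_r, p, q, s):
    """C''^2 * R^{(2p+2)/D} * lam^{(p+q)/D} with D = (p+q) - s(q-1)."""
    d = (p + q) - s * (q - 1)
    c2 = (8 * c_prime_sq * s * (q - 1) / (p + q)) ** ((p + q) / d) * d / (s * (q - 1))
    return c2 * big_r ** ((2 * p + 2) / d) * lam ** ((p + q) / d)


# Illustrative values only (not taken from the paper).
p, q, s, c_prime_sq, lam, big_r = 1.0, 2.0, 1.0, 0.7, 0.1, 3.0
k_const = 8 * c_prime_sq * lam * big_r ** ((2 * p + 2) / (p + q))
t = 2 * s * (q - 1) / (p + q)

closed = sup_closed_form(k_const, t)
grid = sup_on_grid(k_const, t)
rate = rate_form(c_prime_sq, lam, big_r, p, q, s)
print(closed, grid, rate)   # the three numbers should agree up to grid resolution
```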

Hence we get,

(15)

{∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}
\leq 2 (ℓ + 1) (ℓ_{1} Ψ_{x} + 2 Θ_{z}) + C^{''} R^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}} + 4 C^{'} \sqrt{λ} R^{\frac{p + 1}{p + q}} d_{A} (R)^{\frac{s (q - 1)}{p + q}} + 4 d_{A} (R)

and

(16)

{∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}
\leq \frac{1}{\sqrt{λ}} \{ 2 (ℓ + 1) (ℓ_{1} Ψ_{x} + 2 Θ_{z}) + C^{''} R^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}} + 4 C^{'} \sqrt{λ} R^{\frac{p + 1}{p + q}} d_{A} (R)^{\frac{s (q - 1)}{p + q}} + 4 d_{A} (R) \} .

In case $d_{A} (R) = 0$ , for some fixed $¯ R$ , we get explicit bounds from (15) and (16) in terms of $m$ and $λ$ using (22) and (23).

For $d_{A} (R) \neq 0$ and $λ > 0$ , we optimize the bounds by balancing the terms in $R$ and $λ$ . Let $R = R (λ)$ solve the equation $Γ (R) := d_{A} (R) R^{- \frac{p + 1}{(p + q) - s (q - 1)}} = λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}}$ . The function $Γ (R)$ is a non-vanishing decreasing function, and hence the inverse $Γ^{- 1}$ exists and is decreasing. With this, the error bounds can be expressed as

(17)

{∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}
\leq 2 (ℓ + 1) (ℓ_{1} Ψ_{x} + 2 Θ_{z}) + C^{'''} R (λ)^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}}

and

(18)

{∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}
\leq \frac{1}{\sqrt{λ}} \{ 2 (ℓ + 1) (ℓ_{1} Ψ_{x} + 2 Θ_{z}) + C^{'''} R (λ)^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}} \},

where $C^{'''} = C^{''} + 4 C^{'} + 4$ .

Now, using (22) and (23) in (15), (16) we obtain with probability $1 - η$ ,

(19)

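{∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)} \leq ˜ C \{ \sqrt{\frac{N (λ)}{m}} + R (λ)^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}} \} \log (\frac{4}{η})

and

(20)

{∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H} \leq \frac{˜ C}{\sqrt{λ}} \{ \sqrt{\frac{N (λ)}{m}} + R (λ)^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}} \} \log (\frac{4}{η}),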

where $˜ C$ depends on $ℓ$ , $ℓ_{1}$ , $p$ , $q$ , $s$ , $κ$ , $M$ , $Σ$ , $α$ .

Using the interpolation inequality (5) we get,

{∥ f_{z, λ} - {^f}^{R} ∥}_{H}
\leq {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{\frac{p}{p + 1}} {∥ f_{z, λ} - {^f}^{R} ∥}_{H_{- p}}^{\frac{1}{p + 1}}
\leq α^{\frac{1}{p + 1}} {∥ L (f_{z, λ} - {^f}^{R}) ∥}_{H}^{\frac{p}{p + 1}} {∥ I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] ∥}_{L^{2} (X, ν; Y)}^{\frac{s}{p + 1}}
\leq ˜ C λ^{- \frac{p}{2 (p + 1)}} {\{ \sqrt{\frac{N (λ)}{m}} + R (λ)^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}} \}}^{\frac{p + s}{p + 1}} \log (\frac{4}{η}) .

This completes the proof. ∎

4. Explicit rates under source condition

Here, we describe the smoothness of $^f$ by a source condition in terms of the operator $L^{- 1}$ to get explicit rates in terms of $m$ and $λ$ . The smoothness parameter $r$ influences the rates of convergence: a larger $r$ (a smoother $^f$ ) leads to faster convergence rates.

Assumption 5 (General source condition).

The true solution $^f$ satisfies the condition:

^f - ¯ f = L^{- r} v and {∥ v ∥}_{H} \leq R^{†} .

The rates in Theorem 3.2 can be further simplified in two cases based on the behaviour of the distance function $d_{A} (R)$ .

In case $d_{A} (R) = 0$ , we get the explicit error bounds in terms of $λ$ and $m$ from Theorem 3.2. We get the function $d_{A} (¯ R) = 0$ when $^f - ¯ f = L^{- q} v$ and $∥ v ∥ \leq ¯ R$ for some $¯ R$ , i.e., $r \geq q$ . Consequently, this also implies ${^f}^{¯ R} =^f$ . So, the rates of convergence in the reconstruction norm and prediction norm can be given as:

P \{ {∥ I_{ν} [A (f_{z, λ}) - A (^f)] ∥}_{L^{2} (X, ν; Y)} \leq ˜ C λ^{\frac{1}{2}} ( \sqrt{\frac{N (λ)}{m λ}} + {¯ R}^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{s (q - 1)}{2 (p + q) - 2 s (q - 1)}} ) \log (\frac{4}{η}) \} \geq 1 - η,
P \{ {∥ f_{z, λ} -^f ∥}_{H} \leq ˜ C λ^{\frac{s}{2 (p + 1)}} {( \sqrt{\frac{N (λ)}{m λ}} + {¯ R}^{\frac{p + 1}{(p + q) - s (q - 1)}} λ^{\frac{s (q - 1)}{2 (p + q) - 2 s (q - 1)}} )}^{\frac{p + s}{p + 1}} \log (\frac{4}{η}) \} \geq 1 - η .

By balancing the error terms, we choose the regularization parameter $λ$ in terms of the sample size $m$ . Consequently, we get the explicit rates of convergence in terms of the sample size.

Corollary 4.1.

Under the assumptions of Theorem 3.2 and Assumption 5 with $r \geq q$ , and with the a-priori choice of the regularization parameter $λ^{*} = Θ_{N, u}^{- 1} (\frac{1}{\sqrt{m}})$ for $Θ_{N, u} (t) = \frac{t^{u}}{\sqrt{N (t)}}$ and $u = \frac{p + q}{2 (p + q) - 2 s (q - 1)}$ , for all $0 < η < 1$ , the following error estimates hold with confidence $1 - η$ :

{∥ I_{ν} [A (f_{z, λ}) - A (^f)] ∥}_{L^{2} (X, ν; Y)} \leq ¯ C {(λ^{*})}^{u} \log (\frac{4}{η})

and

{∥ f_{z, λ} -^f ∥}_{H} \leq ¯ C {(λ^{*})}^{\frac{2 u (s + p) - p}{2 (p + 1)}} \log (\frac{4}{η}) = ¯ C {(λ^{*})}^{\frac{s q}{2 (p + q) - 2 s (q - 1)}} \log (\frac{4}{η}),

where $¯ C$ depends on $ℓ$ , $ℓ_{1}$ , $p$ , $q$ , $s$ , $κ$ , $M$ , $Σ$ , $α$ , $¯ R$ .
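The a-priori parameter choice $λ^{*} = Θ_{N, u}^{- 1} (\frac{1}{\sqrt{m}})$ can be computed numerically once the eigenvalues of the covariance operator are known. The sketch below uses bisection on a logarithmic scale; the eigenvalue decay $t_{k} = k^{- 2}$ , the bracket `[lo, hi]`, and the values of $p$ , $q$ , $s$ are assumptions for illustration. In practice the covariance operator depends on the unknown measure, so this is a conceptual sketch rather than a usable recipe.

```python
import math


def effective_dimension(eigs, lam):
    """N(lam) = sum_k t_k / (t_k + lam)."""
    return sum(t / (t + lam) for t in eigs)


def theta(lam, u, eigs):
    """Theta_{N,u}(lam) = lam^u / sqrt(N(lam)); increasing in lam."""
    return lam ** u / math.sqrt(effective_dimension(eigs, lam))


def choose_lambda(m, u, eigs, lo=1e-12, hi=1.0, iters=200):
    """Solve Theta_{N,u}(lam) = 1/sqrt(m) by bisection on a logarithmic scale.

    Assumes theta(lo) < 1/sqrt(m) < theta(hi), which holds for the values below.
    """
    target = 1.0 / math.sqrt(m)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)          # geometric midpoint
        if theta(mid, u, eigs) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


eigs = [k ** -2.0 for k in range(1, 201)]        # assumed decay t_k = k^{-2}
p, q, s = 1.0, 2.0, 1.0                          # illustrative parameters
u = (p + q) / (2 * (p + q) - 2 * s * (q - 1))    # exponent from Corollary 4.1

lam_100 = choose_lambda(100, u, eigs)
lam_10000 = choose_lambda(10000, u, eigs)
print(lam_100, lam_10000)   # the chosen parameter shrinks as the sample size grows
```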

In case $d_{A} (R) \neq 0$ , we have to estimate the function $d_{A} (R)$ explicitly. We utilize the result of [6, Theorem 5.9] to estimate the distance function using the source condition. For the benchmark smoothness $q$ and the given smoothness $r$ , we assume that $q \geq r$ and $2 q \geq p + r$ . Then, under Assumption 5, we get the bound

d (R) \leq \frac{{(R^{†})}^{\frac{q}{q - r}}}{R^{\frac{r}{q - r}}}, R > 0.
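The bound on $d (R)$ can be illustrated in a diagonal model by exhibiting one feasible candidate. The sketch below assumes $L$ is diagonal with eigenvalues $l_{k} = k$ and $¯ f = 0$ (both hypothetical), takes $^f = L^{- r} v_{0}$ with ${∥ v_{0} ∥}_{H} = R^{†}$ , and builds a spectral cut-off candidate $L^{- q} v$ with ${∥ v ∥}_{H} \leq R$ . Its error is an upper bound for $d (R)$ ; the true minimizer may do better.

```python
import math


def cutoff_candidate_error(v0, l_diag, r, q, big_r, r_dagger):
    """Spectral cut-off candidate f^R = L^{-q} v for f = L^{-r} v0 (f_bar = 0).

    Returns (||v||, ||f^R - f||) in the diagonal model.
    """
    cut = (big_r / r_dagger) ** (1.0 / (q - r))   # keep modes with l_k <= cut
    v_sq = sum((l ** (q - r) * c) ** 2 for l, c in zip(l_diag, v0) if l <= cut)
    err_sq = sum((l ** (-r) * c) ** 2 for l, c in zip(l_diag, v0) if l > cut)
    return math.sqrt(v_sq), math.sqrt(err_sq)


def stated_bound(big_r, r_dagger, r, q):
    """(R^dagger)^{q/(q-r)} / R^{r/(q-r)}."""
    return r_dagger ** (q / (q - r)) / big_r ** (r / (q - r))


l_diag = [float(k) for k in range(1, 201)]     # assumed eigenvalues of L
raw = [1.0 / k for k in range(1, 201)]
norm = math.sqrt(sum(c * c for c in raw))
r_dagger = 2.0
v0 = [r_dagger * c / norm for c in raw]        # source element with ||v0|| = R^dagger
r, q = 0.5, 1.5                                # given and benchmark smoothness

results = []
for big_r in (4.0, 16.0, 64.0):
    v_norm, err = cutoff_candidate_error(v0, l_diag, r, q, big_r, r_dagger)
    results.append((big_r, v_norm, err, stated_bound(big_r, r_dagger, r, q)))
    print(big_r, v_norm, err, results[-1][3])
```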

Following the analysis in [6, Theorem 5.9] we also obtain the bounds for the distance function:

(21)

d^{p} (R) \leq \frac{{(R^{†})}^{\frac{q + p}{q - r}}}{R^{\frac{r + p}{q - r}}}, R > 0.

To bound the distance function $d_{A} (R)$ , we assume the following assumption in addition to Assumption 4 (iv) with the same parameters:

Assumption 6.

There exists a constant $β > 0$ such that

α {∥ I_{ν} [A (f) - A (^f)] ∥}_{L^{2} (X, ν; Y)}^{s} \leq β {∥ f -^f ∥}_{H_{- p}}

holds for all $f \in Q$ .

Now, according to Theorem 3.2 we have to solve the following equation in order to estimate $R$ in terms of $λ$ .

d_{A} (R) R^{- \frac{p + 1}{(p + q) - s (q - 1)}} = λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}} .

Here, we get the estimate of $d_{A} (R)$ from Assumption 6 and the bound (21). Ignoring the multiplicative constants in Assumption 6, the above equation becomes:

\frac{{(R^{†})}^{\frac{q + p}{s (q - r)}}}{R^{\frac{r + p}{s (q - r)}}} R^{- \frac{p + 1}{(p + q) - s (q - 1)}} = λ^{\frac{p + q}{2 (p + q) - 2 s (q - 1)}} .

This yields

R (λ) = {(R^{†})}^{\frac{(p + q) - s (q - 1)}{(p + r) - s (r - 1)}} λ^{\frac{s (r - q)}{2 (p + r) - 2 s (r - 1)}}, R > 0.
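The closed form for $R (λ)$ can be verified by substituting it back into the balancing equation. The sketch below does this numerically for one illustrative parameter set (the values of $p$ , $q$ , $r$ , $s$ , $R^{†}$ , $λ$ are assumptions chosen to satisfy $1 \leq q \leq 2 + p$ , $r \leq q$ and $r + p \leq 2 q$ ), and also checks that the resulting bias term carries the exponent $\frac{s (r - 1)}{2 (p + r) - 2 s (r - 1)}$ in $λ$ .

```python
import math


def r_of_lambda(lam, r_dagger, p, q, r, s):
    """Closed form R(lam) = (R^dagger)^{D_q / D_r} * lam^{s (r - q) / (2 D_r)}."""
    d_q = (p + q) - s * (q - 1)
    d_r = (p + r) - s * (r - 1)
    return r_dagger ** (d_q / d_r) * lam ** (s * (r - q) / (2 * d_r))


def balance_residual(lam, r_dagger, p, q, r, s):
    """Left side minus right side of the balancing equation (constants ignored)."""
    d_q = (p + q) - s * (q - 1)
    big_r = r_of_lambda(lam, r_dagger, p, q, r, s)
    lhs = (r_dagger ** ((q + p) / (s * (q - r)))
           / big_r ** ((r + p) / (s * (q - r)))
           * big_r ** (-(p + 1) / d_q))
    rhs = lam ** ((p + q) / (2 * d_q))
    return lhs - rhs


# Illustrative values only (not taken from the paper).
p, q, r, s, r_dagger, lam = 1.0, 2.0, 1.5, 1.0, 2.0, 0.01
d_q = (p + q) - s * (q - 1)
d_r = (p + r) - s * (r - 1)

residual = balance_residual(lam, r_dagger, p, q, r, s)

# Bias term of Theorem 3.2 written in terms of q, and the same term in terms of r.
big_r = r_of_lambda(lam, r_dagger, p, q, r, s)
term_q = big_r ** ((p + 1) / d_q) * lam ** (s * (q - 1) / (2 * d_q))
term_r = r_dagger ** ((p + 1) / d_r) * lam ** (s * (r - 1) / (2 * d_r))
print(residual, term_q, term_r)
```

The agreement of `term_q` and `term_r` reflects that, after balancing, the benchmark smoothness $q$ drops out of the bias term and only the given smoothness $r$ remains.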

We get the explicit error bound from Theorem 3.2 in terms of the sample size $m$ and $λ$ using the above dependency $λ \to R (λ)$ .

P \{ \| I_{ν} [A (f_{z, λ}) - A ({^f}^{R})] \|_{L^{2}} \leq \tilde{C} λ^{\frac{1}{2}} \left( \sqrt{\frac{N (λ)}{m λ}} + {(R^{†})}^{\frac{p + 1}{(p + r) - s (r - 1)}} λ^{\frac{s (r - 1)}{2 (p + r) - 2 s (r - 1)}} \right) \log (\frac{4}{η}) \} \geq 1 - η,

P \{ \| f_{z, λ} - {^f}^{R} \|_{H} \leq \tilde{C} λ^{\frac{s}{2 (p + 1)}} {\left( \sqrt{\frac{N (λ)}{m λ}} + {(R^{†})}^{\frac{p + 1}{(p + r) - s (r - 1)}} λ^{\frac{s (r - 1)}{2 (p + r) - 2 s (r - 1)}} \right)}^{\frac{p + s}{p + 1}} \log (\frac{4}{η}) \} \geq 1 - η .

Now, using the decomposition $f_{z, λ} -^f = (f_{z, λ} - {^f}^{R}) + ({^f}^{R} -^f)$ together with the estimates of the distance functions, we get the following error estimates.

P \{ \| I_{ν} [A (f_{z, λ}) - A (^f)] \|_{L^{2}} \leq \tilde{C} λ^{\frac{1}{2}} \left( \sqrt{\frac{N (λ)}{m λ}} + {(R^{†})}^{\frac{p + 1}{(p + r) - s (r - 1)}} λ^{\frac{s (r - 1)}{2 (p + r) - 2 s (r - 1)}} \right) \log (\frac{4}{η}) \} \geq 1 - η,

P \{ \| f_{z, λ} -^f \|_{H} \leq \tilde{C} λ^{\frac{s}{2 (p + 1)}} {\left( \sqrt{\frac{N (λ)}{m λ}} + {(R^{†})}^{\frac{p + 1}{(p + r) - s (r - 1)}} λ^{\frac{s (r - 1)}{2 (p + r) - 2 s (r - 1)}} \right)}^{\frac{p + s}{p + 1}} \log (\frac{4}{η}) \} \geq 1 - η .

By balancing the error terms, we choose the regularization parameter $λ$ in terms of the sample size $m$ . Consequently, we get the explicit rates of convergence in terms of the sample size.

Corollary 4.2.

Under the same assumptions as in Theorem 3.2 and Assumption 5 with $r \leq q$ , $r + p \leq 2 q$ and the a-priori choice of the regularization parameter $λ^{*} = Θ_{N, u}^{- 1} (\frac{1}{\sqrt{m}})$ for $Θ_{N, u} (t) = \frac{t^{u}}{\sqrt{N (t)}}$ and $u = \frac{p + r}{2 (p + r) - 2 s (r - 1)}$ , for all $0 < η < 1$ , the following error estimates hold with confidence $1 - η$ :

\| I_{ν} [A (f_{z, λ}) - A (^f)] \|_{L} \leq \overline{C} {(λ^{*})}^{u} \log (\frac{4}{η})

and

\| f_{z, λ} -^f \|_{H} \leq \overline{C} {(λ^{*})}^{\frac{2 u (s + p) - p}{2 (p + 1)}} \log (\frac{4}{η}) = \overline{C} {(λ^{*})}^{\frac{s r}{2 (p + r) - 2 s (r - 1)}} \log (\frac{4}{η}),

where $\overline{C}$ depends on $ℓ$ , $ℓ_{1}$ , $p$ , $q$ , $s$ , $κ$ , $M$ , $Σ$ , $α$ , $R^{†}$ .
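The equality of the two exponents in the reconstruction error bound is a purely algebraic identity in $p$ , $s$ and the smoothness index. A minimal Python sketch with exact rational arithmetic can confirm it for sample values. The function names and the parameter values below are illustrative assumptions, not part of the analysis.

```python
from fractions import Fraction


def direct_exponent(p, s, t):
    """Exponent u for smoothness t (t = q in Corollary 4.1, t = r in 4.2)."""
    return (p + t) / (2 * (p + t) - 2 * s * (t - 1))


def reconstruction_exponent(p, s, u):
    """Exponent of lambda* in the bound on the reconstruction error."""
    return (2 * u * (s + p) - p) / (2 * (p + 1))


def closed_form(p, s, t):
    """Simplified exponent s*t / (2(p + t) - 2s(t - 1))."""
    return (s * t) / (2 * (p + t) - 2 * s * (t - 1))


# Illustrative rational parameters (assumptions, not values from the paper).
p, s, r = Fraction(1), Fraction(1, 2), Fraction(3, 2)
u = direct_exponent(p, s, r)
print(u, reconstruction_exponent(p, s, u), closed_form(p, s, r))
```

Because `Fraction` arithmetic is exact, agreement of the last two printed values is not affected by rounding.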

The effective dimension exhibits different behaviour under different choices of the kernel and of the unknown probability measure [8]. We consider the following decay conditions on it.

Assumption 7 (Polynomial decay condition).

Assume that for some $0 < b < 1$ there exists some positive constant $C > 0$ such that

N (λ) := \operatorname{Tr} ((T_{ν} + λ I)^{- 1} T_{ν}) \leq C λ^{- b}, \quad \forall λ > 0.

Assumption 8 (Logarithmic decay condition).

Assume that there exists some positive constant $C > 0$ such that

N (λ) \leq C log (\frac{1}{λ}), \forall λ > 0.
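To make the polynomial decay condition concrete, the effective dimension can be computed from the eigenvalues of $T_{ν}$ as a sum of $σ_{i} / (σ_{i} + λ)$ . The Python sketch below uses an assumed spectrum $σ_{i} = i^{- 1 / b}$ , truncated to finitely many terms, which is a commonly used scenario leading to Assumption 7. The spectrum, the truncation level and the function name are assumptions made only for illustration.

```python
import math


def effective_dimension(eigenvalues, lam):
    """N(lam) = Tr((T + lam I)^{-1} T) for an operator with the given eigenvalues."""
    return sum(sigma / (sigma + lam) for sigma in eigenvalues)


# Assumed spectrum: sigma_i = i^(-1/b) with b = 1/2, truncated at 10,000 terms.
b = 0.5
spectrum = [i ** (-1.0 / b) for i in range(1, 10_001)]

for lam in (1e-1, 1e-2, 1e-3):
    n_lam = effective_dimension(spectrum, lam)
    # The ratio N(lam) / lam^(-b) stays bounded, as in Assumption 7.
    print(lam, n_lam, n_lam * lam ** b)
```

For this spectrum the sum is dominated by the integral of $1 / (1 + λ x^{2})$ over $x > 0$ , so the printed ratio stays below $π / 2$ .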

Corollary 4.3.

Under the same assumptions as in Theorem 3.2 and Assumptions 5, 6, 7 with the a-priori choice of the regularization parameter $λ^{*} = m^{- \frac{1}{2 u + b}}$ , for all $0 < η < 1$ , the following error estimates hold with confidence $1 - η$ :

\| f_{z, λ} -^f \|_{H} \leq \tilde{C} {(λ^{*})}^{\frac{2 u (s + p) - p}{2 (p + 1)}} \log (\frac{4}{η}), \quad u = \frac{p + q}{2 (p + q) - 2 s (q - 1)}, \quad \text{for } r \geq q,

\| f_{z, λ} -^f \|_{H} \leq \overline{C} {(λ^{*})}^{\frac{2 u (s + p) - p}{2 (p + 1)}} \log (\frac{4}{η}), \quad u = \frac{p + r}{2 (p + r) - 2 s (r - 1)}, \quad \text{for } r \leq q, \; r + p \leq 2 q .
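The parameter choice $λ^{*} = m^{- \frac{1}{2 u + b}}$ equalizes the sample term $\sqrt{N (λ) / m}$ and the approximation term $λ^{u}$ when $N (λ)$ behaves like $λ^{- b}$ . The following Python sketch illustrates this balancing, taking the constant in Assumption 7 to be 1. The function name and the numerical values are assumptions chosen only for illustration.

```python
import math


def polynomial_decay_choice(m, p, s, b, t):
    """Return (lambda*, u, exponent) for smoothness t under N(lam) <= lam^(-b).

    t = q in the case r >= q, and t = r in the case r <= q, r + p <= 2q.
    """
    u = (p + t) / (2 * (p + t) - 2 * s * (t - 1))
    lam_star = m ** (-1.0 / (2 * u + b))
    exponent = (2 * u * (s + p) - p) / (2 * (p + 1))
    return lam_star, u, exponent


# Illustrative values (assumptions): p = 1, s = 0.5, b = 0.5, t = 1.5.
b = 0.5
for m in (10**3, 10**4, 10**5):
    lam_star, u, exponent = polynomial_decay_choice(m, p=1.0, s=0.5, b=b, t=1.5)
    sample_term = math.sqrt(lam_star ** (-b) / m)  # sqrt(N(lam)/m) with C = 1
    print(m, lam_star, sample_term, lam_star ** u, lam_star ** exponent)
```

The third and fourth printed columns coincide, and the last column is the resulting bound on the reconstruction error up to the constant and the logarithmic factor.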

Corollary 4.4.

Under the same assumptions as in Theorem 3.2 and Assumptions 5, 6, 8 with the a-priori choice of the regularization parameter $λ^{*} = {(\frac{\log m}{m})}^{\frac{1}{2 r + 1}}$ , for all $0 < η < 1$ , we have the following convergence rates with confidence $1 - η$ :

\| f_{z, λ} -^f \|_{H} \leq \tilde{C} {(λ^{*})}^{\frac{2 u (s + p) - p}{2 (p + 1)}} \log (\frac{4}{η}), \quad u = \frac{p + q}{2 (p + q) - 2 s (q - 1)}, \quad \text{for } r \geq q,

\| f_{z, λ} -^f \|_{H} \leq \overline{C} {(λ^{*})}^{\frac{2 u (s + p) - p}{2 (p + 1)}} \log (\frac{4}{η}), \quad u = \frac{p + r}{2 (p + r) - 2 s (r - 1)}, \quad \text{for } r \leq q, \; r + p \leq 2 q .

We now summarize the above results and the conditions under which they hold. Corollaries 4.3 and 4.4 present the rates of convergence under the two decay conditions on the effective dimension. In both corollaries, we first discuss the case when the actual smoothness of the true solution is higher than the benchmark smoothness. In this case, we get the rates of convergence corresponding to the benchmark smoothness $q$ for $1 \leq q \leq \min (r, 2 + p)$ , $0 < s \leq 1$ , even though the actual smoothness is higher. Second, we discuss the case when the actual smoothness is lower than the benchmark smoothness. Here, we get the error estimates corresponding to the actual smoothness $r$ for $\max (1, p, r) \leq q \leq 2 + p$ , $0 < s \leq 1$ , so the rates are the same as those we would get by directly using the smoothness information of the true solution. At the intersection point $q = r$ , both rates coincide. This analysis therefore suggests that choosing the benchmark smoothness in the appropriate range yields the best rates of convergence. We emphasize that our analysis covers the oversmoothing case, i.e., $r \leq 1$ .

Acknowledgements

This research has been partially funded by Deutsche Forschungsgemeinschaft (DFG) under The Berlin Mathematics Research Center MATH+ (EXC-2046/1 - 390685689).
The author is grateful for fruitful discussions with Peter Mathé about regularization in Hilbert Scales.

Appendix A Probabilistic bounds

Here, we present the standard perturbation bounds under random sampling, which can be found in [12].

Proposition A.1.

Suppose Assumptions 1–3 hold. Then, for $m \in \mathbb{N}$ and $0 < η < 1$ , each of the following estimates holds with confidence $1 - η$ :

Θ_{z} := \| (T_{ν} + λ I)^{- 1 / 2} S_{x}^{*} ε \|_{H^{'}}^{- 1 / 2} \leq 2 \left( \frac{κ M}{m \sqrt{λ}} + \sqrt{\frac{Σ^{2} N (λ)}{m}} \right) \log (\frac{2}{η}),

and

Ψ_{x} := \| (T_{ν} + λ I)^{- 1 / 2} (T_{x} - T_{ν}) \|_{L_{2} (H^{'})}^{- 1 / 2} \leq 2 \left( \frac{κ^{2}}{m \sqrt{λ}} + \sqrt{\frac{κ^{2} N (λ)}{m}} \right) \log (\frac{2}{η}) .

Since $N (λ)$ is a decreasing function of $λ$ and $λ \leq 1$ , condition (9) yields

N (1) \leq N (λ) \leq m λ,

which implies that

\frac{1}{m \sqrt{λ}} \leq \frac{1}{N (1)} \sqrt{\frac{N (λ)}{m}} .

Now, using this bound in Proposition A.1, we get with probability $1 - η$ ,

(22)

Θ_{z} \leq 2 (\frac{κ M}{N (1)} + Σ) \sqrt{\frac{N (λ)}{m}} log (\frac{4}{η})

and

(23)

Ψ_{x} \leq 2 (\frac{κ^{2}}{N (1)} + κ) \sqrt{\frac{N (λ)}{m}} log (\frac{4}{η}) .
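The passage from Proposition A.1 to the bound (22) can be checked numerically: whenever $N (1) \leq N (λ) \leq m λ$ , the right-hand side of the first estimate in Proposition A.1 is dominated by the right-hand side of (22). The Python sketch below compares the two expressions. The function names and all numerical values are hypothetical and serve only as an illustration.

```python
import math


def prop_a1_noise_bound(kappa, M, Sigma, N_lam, m, lam, eta):
    """Right-hand side of the first estimate in Proposition A.1."""
    return 2 * (kappa * M / (m * math.sqrt(lam))
                + math.sqrt(Sigma**2 * N_lam / m)) * math.log(2 / eta)


def simplified_noise_bound(kappa, M, Sigma, N_lam, N_one, m, eta):
    """Right-hand side of (22)."""
    return 2 * (kappa * M / N_one + Sigma) * math.sqrt(N_lam / m) * math.log(4 / eta)


# Hypothetical values satisfying N(1) <= N(lam) <= m * lam.
m, lam, N_lam, N_one = 1000, 1e-2, 5.0, 0.8
a1 = prop_a1_noise_bound(kappa=1.0, M=1.0, Sigma=0.5, N_lam=N_lam, m=m, lam=lam, eta=0.05)
b22 = simplified_noise_bound(kappa=1.0, M=1.0, Sigma=0.5, N_lam=N_lam, N_one=N_one, m=m, eta=0.05)
print(a1 <= b22)
```

The change from $\log (2 / η)$ to $\log (4 / η)$ only enlarges the bound, so the comparison holds for every admissible choice of the inputs.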

References

[1] Johann Baumeister. Stable solution of inverse problems. Advanced Lectures in Mathematics, Friedrich Vieweg & Sohn, Braunschweig, 1987.
[2] Nicolai Bissantz, Thorsten Hohage, and Axel Munk. Consistency and rates of convergence of nonlinear Tikhonov regularization with random noise. Inverse Problems, 20(6):1773–1789, 2004.
[3] Gilles Blanchard and Nicole Mücke. Optimal rates for regularization of statistical inverse learning problems. Foundations of Computational Mathematics, 18(4):971–1013, 2018.
[4] Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331–368, 2007.
[5] Heinz W. Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems, volume 375. Math. Appl., Kluwer Academic Publishers Group, Dordrecht, The Netherlands, 1996.
[6] Bernd Hofmann and Peter Mathé. Analysis of profile functions for general linear regularization methods. SIAM Journal on Numerical Analysis, 45(3):1122–1141, 2007.
[7] Thorsten Hohage and Mihaela Pricop. Nonlinear Tikhonov regularization in Hilbert scales for inverse boundary value problems with random noise. Inverse Problems and Imaging, 2:271–290, 2008.
[8] Shuai Lu, Peter Mathé, and Sergei V. Pereverzev. Balancing principle in supervised learning for a general regularization scheme. Applied and Computational Harmonic Analysis, 48(1):123–148, 2020.
[9] Charles A. Micchelli and Massimiliano Pontil. On learning vector-valued functions. Neural Computation, 17(1):177–204, 2005.
[10] Nicole Mücke and Enrico Reiss. Stochastic Gradient Descent in Hilbert Scales: Smoothness, Preconditioning and Earlier Stopping. arXiv preprint arXiv:2006.10840, 2020.
[11] Abhishake Rastogi. Tikhonov regularization with oversmoothing penalty for nonlinear statistical inverse problems. Communications on Pure and Applied Analysis, 19(8):4111, 2020.
[12] Abhishake Rastogi, Gilles Blanchard, and Peter Mathé. Convergence analysis of Tikhonov regularization for non-linear statistical inverse problems. Electronic Journal of Statistics, 14(2):2798–2841, 2020.
[13] Abhishake Rastogi and Peter Mathé. Inverse learning in Hilbert scales. arXiv preprint arXiv:2002.10208, 2020.
[14] Thomas Schuster, Barbara Kaltenbacher, Bernd Hofmann, and Kamil S. Kazimierski. Regularization methods in Banach spaces, volume 10 of Radon Series on Computational and Applied Mathematics. Walter de Gruyter GmbH & Co. KG, Berlin, 2012.
[15] Steve Smale and Ding-Xuan Zhou. Estimating the approximation error in learning theory. Analysis and Applications, 01(01):17–41, 2003.
[16] Frank Werner and Bernd Hofmann. Convergence analysis of (statistical) inverse problems under conditional stability estimates. Inverse Problems, 36(1):015004, 2019.
[17] Tong Zhang. Effective dimension and generalization of kernel learning. In Proceedings of the 15th International Conference on Neural Information Processing Systems, pages 454–461, MIT Press, Cambridge, MA, 2002.
