在非参数回归设置中,我们构建了一个估计器,该估计器是一个连续的函数,以高概率插值数据点,同时在H \ h \'较大级别的平均平方风险下达到最小的最佳速率,以适应未知的平滑度。
translated by 谷歌翻译
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$ from the observations contaminated by random noise. Our estimator $\boldsymbol{z}_n$ of the minimizer $\boldsymbol{x}^*$ is based on a version of the projected gradient descent with the gradient estimated by a regularized local polynomial algorithm. Next, we propose a two-stage procedure for estimation of the minimum value $f^*$ of regression function $f$. At the first stage, we construct an accurate enough estimator of $\boldsymbol{x}^*$, which can be, for example, $\boldsymbol{z}_n$. At the second stage, we estimate the function value at the point obtained in the first stage using a rate optimal nonparametric procedure. We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$. We establish minimax lower bounds showing that, under certain choice of parameters, the proposed algorithms achieve the minimax optimal rates of convergence on the class of smooth and strongly convex functions.
translated by 谷歌翻译
We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a H\"older class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincar\'e inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and H\"older transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.
translated by 谷歌翻译
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
translated by 谷歌翻译
在因果推理和强盗文献中,基于观察数据的线性功能估算线性功能的问题是规范的。我们分析了首先估计治疗效果函数的广泛的两阶段程序,然后使用该数量来估计线性功能。我们证明了此类过程的均方误差上的非反应性上限:这些边界表明,为了获得非反应性最佳程序,应在特定加权$ l^2 $中最大程度地估算治疗效果的误差。 -规范。我们根据该加权规范的约束回归分析了两阶段的程序,并通过匹配非轴突局部局部最小值下限,在有限样品中建立了实例依赖性最优性。这些结果表明,除了取决于渐近效率方差之外,最佳的非质子风险除了取决于样本量支持的最富有函数类别的真实结果函数与其近似类别之间的加权规范距离。
translated by 谷歌翻译
对于高维和非参数统计模型,速率最优估计器平衡平方偏差和方差是一种常见的现象。虽然这种平衡被广泛观察到,但很少知道是否存在可以避免偏差和方差之间的权衡的方法。我们提出了一般的策略,以获得对任何估计方差的下限,偏差小于预先限定的界限。这表明偏差差异折衷的程度是不可避免的,并且允许量化不服从其的方法的性能损失。该方法基于许多抽象的下限,用于涉及关于不同概率措施的预期变化以及诸如Kullback-Leibler或Chi-Sque-diversence的信息措施的变化。其中一些不平等依赖于信息矩阵的新概念。在该物品的第二部分中,将抽象的下限应用于几种统计模型,包括高斯白噪声模型,边界估计问题,高斯序列模型和高维线性回归模型。对于这些特定的统计应用,发生不同类型的偏差差异发生,其实力变化很大。对于高斯白噪声模型中集成平方偏置和集成方差之间的权衡,我们将较低界限的一般策略与减少技术相结合。这允许我们将原始问题与估计的估计器中的偏差折衷联动,以更简单的统计模型中具有额外的对称性属性。在高斯序列模型中,发生偏差差异的不同相位转换。虽然偏差和方差之间存在非平凡的相互作用,但是平方偏差的速率和方差不必平衡以实现最小估计速率。
translated by 谷歌翻译
This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function and the difference between these two functions furnishes a low bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
translated by 谷歌翻译
We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.
translated by 谷歌翻译
本文研究了基于Laplacian Eigenmaps(Le)的基于Laplacian EIGENMAPS(PCR-LE)的主要成分回归的统计性质,这是基于Laplacian Eigenmaps(Le)的非参数回归的方法。 PCR-LE通过投影观察到的响应的向量$ {\ bf y} =(y_1,\ ldots,y_n)$ to to changbood图表拉普拉斯的某些特征向量跨越的子空间。我们表明PCR-Le通过SoboLev空格实现了随机设计回归的最小收敛速率。在设计密度$ P $的足够平滑条件下,PCR-le达到估计的最佳速率(其中已知平方$ l ^ 2 $ norm的最佳速率为$ n ^ { - 2s /(2s + d) )} $)和健美的测试($ n ^ { - 4s /(4s + d)$)。我们还表明PCR-LE是\ EMPH {歧管Adaptive}:即,我们考虑在小型内在维度$ M $的歧管上支持设计的情况,并为PCR-LE提供更快的界限Minimax估计($ n ^ { - 2s /(2s + m)$)和测试($ n ^ { - 4s /(4s + m)$)收敛率。有趣的是,这些利率几乎总是比图形拉普拉斯特征向量的已知收敛率更快;换句话说,对于这个问题的回归估计的特征似乎更容易,统计上讲,而不是估计特征本身。我们通过经验证据支持这些理论结果。
translated by 谷歌翻译
在负面的感知问题中,我们给出了$ n $数据点$({\ boldsymbol x} _i,y_i)$,其中$ {\ boldsymbol x} _i $是$ d $ -densional vector和$ y_i \ in \ { + 1,-1 \} $是二进制标签。数据不是线性可分离的,因此我们满足自己的内容,以找到最大的线性分类器,具有最大的\ emph {否定}余量。换句话说,我们想找到一个单位常规矢量$ {\ boldsymbol \ theta} $,最大化$ \ min_ {i \ le n} y_i \ langle {\ boldsymbol \ theta},{\ boldsymbol x} _i \ rangle $ 。这是一个非凸优化问题(它相当于在Polytope中找到最大标准矢量),我们在两个随机模型下研究其典型属性。我们考虑比例渐近,其中$ n,d \ to \ idty $以$ n / d \ to \ delta $,并在最大边缘$ \ kappa _ {\ text {s}}(\ delta)上证明了上限和下限)$或 - 等效 - 在其逆函数$ \ delta _ {\ text {s}}(\ kappa)$。换句话说,$ \ delta _ {\ text {s}}(\ kappa)$是overparametization阈值:以$ n / d \ le \ delta _ {\ text {s}}(\ kappa) - \ varepsilon $一个分类器实现了消失的训练错误,具有高概率,而以$ n / d \ ge \ delta _ {\ text {s}}(\ kappa)+ \ varepsilon $。我们在$ \ delta _ {\ text {s}}(\ kappa)$匹配,以$ \ kappa \ to - \ idty $匹配。然后,我们分析了线性编程算法来查找解决方案,并表征相应的阈值$ \ delta _ {\ text {lin}}(\ kappa)$。我们观察插值阈值$ \ delta _ {\ text {s}}(\ kappa)$和线性编程阈值$ \ delta _ {\ text {lin {lin}}(\ kappa)$之间的差距,提出了行为的问题其他算法。
translated by 谷歌翻译
我们为基于Kaplan-Meier的最近的邻居和内核存活率估计值建立了第一个非矩形误差界限,其中特征向量位于度量空间中。我们的边界意味着这些非参数估计器的强度速率,并且最多可与对数因子匹配有条件的CDF估计的现有下限。我们的证明策略还为纳尔逊 - 阿伦累积危害估计量的最近的邻居和内核变体提供了非矩形保证。我们在四个数据集上实验比较这些方法。我们发现,对于内核存活率估计量,核心的一个不错的选择是使用随机生存森林学习的。
translated by 谷歌翻译
本文为信号去噪提供了一般交叉验证框架。然后将一般框架应用于非参数回归方法,例如趋势过滤和二元推车。然后显示所得到的交叉验证版本以获得最佳调谐的类似物所熟知的几乎相同的收敛速度。没有任何先前的趋势过滤或二元推车的理论分析。为了说明框架的一般性,我们还提出并研究了两个基本估算器的交叉验证版本;套索用于高维线性回归和矩阵估计的奇异值阈值阈值。我们的一般框架是由Chatterjee和Jafarov(2015)的想法的启发,并且可能适用于使用调整参数的广泛估算方法。
translated by 谷歌翻译
我们研究了随机近似程序,以便基于观察来自ergodic Markov链的长度$ n $的轨迹来求近求解$ d -dimension的线性固定点方程。我们首先表现出$ t _ {\ mathrm {mix}} \ tfrac {n}} \ tfrac {n}} \ tfrac {d}} \ tfrac {d} {n} $的非渐近性界限。$ t _ {\ mathrm {mix $是混合时间。然后,我们证明了一种在适当平均迭代序列上的非渐近实例依赖性,具有匹配局部渐近最小的限制的领先术语,包括对参数$的敏锐依赖(d,t _ {\ mathrm {mix}}) $以高阶术语。我们将这些上限与非渐近Minimax的下限补充,该下限是建立平均SA估计器的实例 - 最优性。我们通过Markov噪声的政策评估导出了这些结果的推导 - 覆盖了所有$ \ lambda \中的TD($ \ lambda $)算法,以便[0,1)$ - 和线性自回归模型。我们的实例依赖性表征为HyperParameter调整的细粒度模型选择程序的设计开放了门(例如,在运行TD($ \ Lambda $)算法时选择$ \ lambda $的值)。
translated by 谷歌翻译
量化概率分布之间的异化的统计分歧(SDS)是统计推理和机器学习的基本组成部分。用于估计这些分歧的现代方法依赖于通过神经网络(NN)进行参数化经验变化形式并优化参数空间。这种神经估算器在实践中大量使用,但相应的性能保证是部分的,并呼吁进一步探索。特别是,涉及的两个错误源之间存在基本的权衡:近似和经验估计。虽然前者需要NN课程富有富有表现力,但后者依赖于控制复杂性。我们通过非渐近误差界限基于浅NN的基于浅NN的估计的估算权,重点关注四个流行的$ \ mathsf {f} $ - 分离 - kullback-leibler,chi squared,squared hellinger,以及总变异。我们分析依赖于实证过程理论的非渐近功能近似定理和工具。界限揭示了NN尺寸和样品数量之间的张力,并使能够表征其缩放速率,以确保一致性。对于紧凑型支持的分布,我们进一步表明,上述上三次分歧的神经估算器以适当的NN生长速率接近Minimax率 - 最佳,实现了对数因子的参数速率。
translated by 谷歌翻译
尽管有许多有吸引力的财产,但内核方法受到维度的诅咒受到严重影响。例如,在$ \ mathbb {r} ^ d $的内部产品内核的情况下,再现内核希尔伯特空间(RKHS)规范对于依赖于小方向子集(RIDGE函数)的功能往往非常大。相应地,使用内核方法难以学习这样的功能。这种观察结果有动力研究内核方法的概括,由此rkhs规范 - 它等同于加权$ \ ell_2 $ norm - 被加权函数$ \ ell_p $ norm替换,我们将其称为$ \ mathcal {f} _p $ norm。不幸的是,这些方法的陶油是不清楚的。内核技巧不可用,最大限度地减少这些规范要求解决无限维凸面问题。我们将随机特征近似于这些规范,表明,对于$ p> 1 $,近似于原始学习问题所需的随机功能的数量是由样本大小的多项式的上限。因此,使用$ \ mathcal {f} _p $ norms在这些情况下是易行的。我们介绍了一种基于双重均匀浓度的证明技术,这可以对超分子化模型的研究更广泛。对于$ p = 1 $,我们对随机功能的保证近似分解。我们证明了使用$ \ mathcal {f} _1 $ norm的学习是在随机减少的$ \ mathsf {np} $ - 基于噪音的半个空间问题的问题。
translated by 谷歌翻译
当回归函数属于标准的平滑类时,由衍生物的单变量函数组成,衍生物到达$(\ gamma + 1)$ th由Action Anclople或Ae界定的常见常数,众所周知,最小的收敛速率均值平均错误(MSE)是$ \左(\ FRAC {\ SIGMA ^ {2}} {n} \右)^ {\ frac {2 \ gamma + 2} {2 \ gamma + 3}} $ \伽玛$是有限的,样本尺寸$ n \ lightarrow \ idty $。从一个不可思议的观点来看,考虑有限$ N $,本文显示:对于旧的H \“较旧的和SoboLev类,最低限度最佳速率是$ \ frac {\ sigma ^ {2} \ left(\ gamma \ vee1 \右)$ \ frac {n} {\ sigma ^ {2}} \ precsim \ left(\ gamma \ vee1 \右)^ {2 \ gamma + 3} $和$ \ left(\ frac {\ sigma ^ {2}} {n} \右)^ {\ frac {2 \ gamma + 2} $ \ r \ frac {n} {\ sigma ^ {2}}} \ succsim \ left(\ gamma \ vee1 \右)^ {2 \ gamma + 3} $。为了建立这些结果,我们在覆盖和覆盖号码上获得上下界限,以获得$ k的广义H \“较旧的班级$ th($ k = 0,...,\ gamma $)衍生物由上面的参数$ r_ {k} $和$ \ gamma $ th衍生物是$ r _ {\ gamma + 1} - $ lipschitz (以及广义椭圆形的平滑功能)。我们的界限锐化了标准类的古典度量熵结果,并赋予$ \ gamma $和$ r_ {k} $的一般依赖。通过在$ r_ {k} = 1 $以下派生MIMIMAX最佳MSE率,$ r_ {k} \ LEQ \ left(k-1 \右)!$和$ r_ {k} = k!$(与后两个在我们的介绍中有动机的情况)在我们的新熵界的帮助下,我们展示了一些有趣的结果,无法在文献中的现有熵界显示。对于H \“较旧的$ D-$变化函数,我们的结果表明,归一渐近率$ \左(\ frac {\ sigma ^ {2}} {n}右)^ {\ frac {2 \ Gamma + 2} {2 \ Gamma + 2 + D}} $可能是有限样本中的MSE低估。
translated by 谷歌翻译
We consider the problem of estimating s 2 when s belongs to some separable Hilbert space and one observes the Gaussian process Y t = s t + σL t , for all t ∈ , where L is some Gaussian isonormal process. This framework allows us in particular to consider the classical "Gaussian sequence model" for which = l 2 * and L t = λ≥1 t λ ε λ , where ε λ λ≥1 is a sequence of i.i.d. standard normal variables. Our approach consists in considering some at most countable families of finite-dimensional linear subspaces of (the models) and then using model selection via some conveniently penalized least squares criterion to build new estimators of s 2 . We prove a general nonasymptotic risk bound which allows us to show that such penalized estimators are adaptive on a variety of collections of sets for the parameter s, depending on the family of models from which they are built. In particular, in the context of the Gaussian sequence model, a convenient choice of the family of models allows defining estimators which are adaptive over collections of hyperrectangles, ellipsoids, l p -bodies or Besov bodies. We take special care to describe the conditions under which the penalized estimator is efficient when the level of noise σ tends to zero. Our construction is an alternative to the one by Efroïmovich and Low for hyperrectangles and provides new results otherwise.
translated by 谷歌翻译
现代神经网络通常以强烈的过度构造状态运行:它们包含许多参数,即使实际标签被纯粹随机的标签代替,它们也可以插入训练集。尽管如此,他们在看不见的数据上达到了良好的预测错误:插值训练集并不会导致巨大的概括错误。此外,过度散色化似乎是有益的,因为它简化了优化景观。在这里,我们在神经切线(NT)制度中的两层神经网络的背景下研究这些现象。我们考虑了一个简单的数据模型,以及各向同性协变量的矢量,$ d $尺寸和$ n $隐藏的神经元。我们假设样本量$ n $和尺寸$ d $都很大,并且它们在多项式上相关。我们的第一个主要结果是对过份术的经验NT内核的特征结构的特征。这种表征意味着必然的表明,经验NT内核的最低特征值在$ ND \ gg n $后立即从零界限,因此网络可以在同一制度中精确插值任意标签。我们的第二个主要结果是对NT Ridge回归的概括误差的表征,包括特殊情况,最小值-ULL_2 $ NORD插值。我们证明,一旦$ nd \ gg n $,测试误差就会被内核岭回归之一相对于无限宽度内核而近似。多项式脊回归的误差依次近似后者,从而通过与激活函数的高度组件相关的“自我诱导的”项增加了正则化参数。多项式程度取决于样本量和尺寸(尤其是$ \ log n/\ log d $)。
translated by 谷歌翻译
我们考虑了一个通用的非线性模型,其中信号是未知(可能增加的,可能增加的特征数量)的有限混合物,该特征是由由真实非线性参数参数化的连续字典发出的。在连续或离散设置中使用高斯(可能相关)噪声观察信号。我们提出了一种网格优化方法,即一种不使用参数空间上任何离散化方案的方法来估计特征的非线性参数和混合物的线性参数。我们使用有关离网方法的几何形状的最新结果,在真实的基础非线性参数上给出最小的分离,以便可以构建插值证书函数。还使用尾部界限,用于高斯过程的上流,我们将预测误差限制为高概率。假设可以构建证书函数,我们的预测误差绑定到日志 - 因线性回归模型中LASSO预测器所达到的速率类似。我们还建立了收敛速率,以高概率量化线性和非线性参数的估计质量。
translated by 谷歌翻译
我们研究了称为“乐观速率”(Panchenko 2002; Srebro等,2010)的统一收敛概念,用于与高斯数据的线性回归。我们的精致分析避免了现有结果中的隐藏常量和对数因子,这已知在高维设置中至关重要,特别是用于了解插值学习。作为一个特殊情况,我们的分析恢复了Koehler等人的保证。(2021年),在良性过度的过度条件下,严格地表征了低规范内插器的人口风险。但是,我们的乐观速度绑定还分析了具有任意训练错误的预测因子。这使我们能够在随机设计下恢复脊和套索回归的一些经典统计保障,并有助于我们在过度参数化制度中获得精确了解近端器的过度风险。
translated by 谷歌翻译