Smart Paper Notes

Bless and curse of smoothness and phase transitions in nonparametric regressions: a nonasymptotic perspective

Ying Zhu

Category: Machine Learning

2021-12-07

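When the regression function belongs to the standard smooth classes consisting of univariate functions with derivatives up to the $(\gamma+1)$th order bounded by a common constant everywhere or a.e., it is well known that the minimax optimal rate of convergence in mean squared error (MSE) is $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\gamma$ is finite and the sample size $n \rightarrow \infty$. From a nonasymptotic viewpoint that considers finite $n$, this paper shows that, for the standard H\"older and Sobolev classes, the minimax optimal rate is $\frac{\sigma^{2}\left(\gamma \vee 1\right)}{n}$ when $\frac{n}{\sigma^{2}} \precsim \left(\gamma \vee 1\right)^{2\gamma+3}$ and $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\frac{n}{\sigma^{2}} \succsim \left(\gamma \vee 1\right)^{2\gamma+3}$. To establish these results, we derive upper and lower bounds on the covering and packing numbers for the generalized H\"older class where the $k$th ($k = 0, \ldots, \gamma$) derivative is bounded from above by a parameter $R_{k}$ and the $\gamma$th derivative is $R_{\gamma+1}$-Lipschitz (and also for the generalized ellipsoid class of smooth functions). Our bounds sharpen the classical metric entropy results for the standard classes, and give the general dependence on $\gamma$ and $R_{k}$. By deriving the minimax optimal MSE rates under $R_{k} = 1$, $R_{k} \leq \left(k-1\right)!$ and $R_{k} = k!$ (with the latter two cases motivated in our introduction) with the help of our new entropy bounds, we show a couple of interesting results that cannot be shown with the existing entropy bounds in the literature. For the H\"older class of $d$-variate functions, our result suggests that the classical asymptotic rate $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+2+d}}$ could be an underestimate of the MSE in finite samples.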
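As a quick sanity check on the two regimes (a worked calculation added here, not taken from the paper), one can verify that the two rates agree at the threshold sample size. The shorthand $m = \gamma \vee 1$ and $N = n/\sigma^{2}$ is introduced only for this check.

```latex
% Shorthand (assumed for this check only): m = \gamma \vee 1, N = n / \sigma^2.
% Small-sample regime:  rate = m / N          when N \precsim m^{2\gamma+3}
% Large-sample regime:  rate = N^{-(2\gamma+2)/(2\gamma+3)}  when N \succsim m^{2\gamma+3}
\[
\left.\frac{m}{N}\right|_{N = m^{2\gamma+3}}
  = m^{\,1-(2\gamma+3)}
  = m^{-(2\gamma+2)},
\qquad
\left. N^{-\frac{2\gamma+2}{2\gamma+3}} \right|_{N = m^{2\gamma+3}}
  = m^{-(2\gamma+3)\cdot\frac{2\gamma+2}{2\gamma+3}}
  = m^{-(2\gamma+2)} .
\]
```

Both expressions equal $(\gamma \vee 1)^{-(2\gamma+2)}$ at $n/\sigma^{2} = (\gamma \vee 1)^{2\gamma+3}$, so the rate is continuous across the phase transition, up to constants.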

Translated by Google Translate

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

Category: Machine Learning (Statistics)

2022-09-26

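The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.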

Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

Yaqi Duan , Martin J. Wainwright

Category: Machine Learning (Statistics) | Machine Learning

2022-11-07

We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference (TD) estimates, including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the TD$(\lambda)$ family for $\lambda \in [0,1)$ as special cases. Our bounds capture its dependence on Bellman fluctuations, mixing time of the Markov chain, any mis-specification in the model, as well as the choice of weight function defining the estimator itself, and reveal some delicate interactions between mixing time and model mis-specification. For a given TD method applied to a well-specified model, its statistical error under trajectory data is similar to that of i.i.d. sample transition pairs, whereas under mis-specification, temporal dependence in data inflates the statistical error. However, any such deterioration can be mitigated by increased look-ahead. We complement our upper bounds by proving minimax lower bounds that establish optimality of TD-based methods with appropriately chosen look-ahead and weighting, and reveal some fundamental differences between value function estimation and ordinary non-parametric regression.
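To make the $K$-step look-ahead idea concrete, here is a minimal sketch that computes $K$-step TD targets along a single trajectory. It assumes a finite (tabular) state space rather than the kernel-based estimators the paper analyzes, and the function name and arguments are hypothetical.

```python
import numpy as np

def k_step_td_targets(states, rewards, values, gamma, K):
    """K-step look-ahead targets along one trajectory (tabular simplification).

    states  : visited state indices s_0, ..., s_T (length T + 1)
    rewards : rewards r_0, ..., r_{T-1} (length T)
    values  : current value estimates, one entry per state
    Returns the targets G_t for t = 0, ..., T - K, where
    G_t = sum_{k<K} gamma^k r_{t+k} + gamma^K V(s_{t+K}).
    """
    states = np.asarray(states)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    discounts = gamma ** np.arange(K)
    targets = np.empty(T - K + 1)
    for t in range(T - K + 1):
        reward_part = discounts @ rewards[t:t + K]       # discounted rewards over K steps
        bootstrap = gamma ** K * values[states[t + K]]   # value estimate K steps ahead
        targets[t] = reward_part + bootstrap
    return targets

# Example: five states visited in order, unit rewards, discount 0.5, K = 2.
targets = k_step_td_targets([0, 1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 0, 0, 4], 0.5, 2)
```

Larger `K` leans more on observed rewards and less on the bootstrapped value estimate, which is the lever the abstract refers to when it says increased look-ahead mitigates the effect of mis-specification.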

On lower bounds for the bias-variance trade-off

Alexis Derumigny , Johannes Schmidt-Hieber

Category: Machine Learning (Statistics)

2020-05-30

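It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows us to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we combine the general strategy for lower bounds with a reduction technique. This allows us to link the original problem to the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.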

Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

Alden Green , Sivaraman Balakrishnan , Ryan J. Tibshirani

Category: Machine Learning (Statistics)

2021-11-14

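In this paper we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${\bf y} = (y_1, \ldots, y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also show that PCR-LE is \emph{manifold adaptive}: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always much faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem regression with estimated features appears to be much easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.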
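The projection step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: it uses an $\varepsilon$-ball neighborhood graph with 0/1 weights and the unnormalized Laplacian $L = D - W$, which may differ from the construction analyzed in the paper, and the function name `pcr_le` and its parameters are hypothetical.

```python
import numpy as np

def pcr_le(X, y, radius, num_eigvecs):
    """Project responses y onto the bottom eigenvectors of a graph Laplacian.

    X           : (n, d) array of design points
    y           : (n,) array of responses
    radius      : connect two points when their distance is at most `radius`
    num_eigvecs : number of Laplacian eigenvectors spanning the subspace
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise Euclidean distances and 0/1 neighborhood-graph weights.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (dists <= radius).astype(float)
    np.fill_diagonal(W, 0.0)
    # Unnormalized graph Laplacian L = D - W (an assumption of this sketch).
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order, so the first columns
    # are the smoothest eigenvectors on the graph.
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, :num_eigvecs]
    # Orthogonal projection of y onto span(V): fitted values at the design points.
    return V @ (V.T @ y)
```

With `num_eigvecs` equal to the sample size the projection returns `y` unchanged, and with a single eigenvector on a connected graph it returns the sample mean, so the number of eigenvectors plays the role of the smoothing parameter.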

How do noise tails impact on deep ReLU networks?

Jianqing Fan , Yihong Gu , Wen-Xin Zhou

Category: Machine Learning (Statistics)

2022-03-20

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite $p$-th moment. We unveil how the optimal rate of convergence depends on $p$, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: we construct a deep ReLU network estimator that has a better empirical loss than the true function, and the difference between these two functions furnishes a lower bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
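For readers unfamiliar with the Huber loss mentioned above, the following minimal sketch shows its standard definition: quadratic for small residuals and linear for large ones. The threshold `tau` is left to the caller; how it should adapt to the sample size, smoothness, and moment parameters is the subject of the paper and is not reproduced here.

```python
import numpy as np

def huber_loss(residual, tau):
    """Standard Huber loss with threshold tau, applied elementwise.

    Equals 0.5 * r^2 when |r| <= tau, and tau * |r| - 0.5 * tau^2 otherwise,
    so large (heavy-tailed) residuals are penalized only linearly.
    """
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = tau * r - 0.5 * tau ** 2
    return np.where(r <= tau, quadratic, linear)

# Residuals inside the threshold are treated as in least squares;
# residuals outside it grow linearly.
losses = huber_loss([0.5, 2.0, -3.0], tau=1.0)
```

As `tau` grows the loss approaches the squared-error loss, and as it shrinks it behaves like a scaled absolute-error loss, which is why the choice of `tau` governs the trade-off between robustness to heavy tails and Huberization bias.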

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

Masatoshi Uehara , Masaaki Imaizumi , Nan Jiang , Nathan Kallus , Wen Sun , Tengyang Xie

Category: Machine Learning | Machine Learning (Statistics)

2021-02-05

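We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods. Under various combinations of realizability and completeness assumptions, we show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions, characterized by the critical inequality \citep{bartlett2005}. Based on this result, we analyze convergence rates for OPE. In particular, we introduce novel alternative completeness conditions under which OPE is feasible, and we present the first finite-sample result with first-order efficiency in non-tabular environments, i.e., having the minimal coefficient in the leading term.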

Neural Estimation of Statistical Divergences

Sreejith Sreekumar , Ziv Goldfeld

Category: Machine Learning (Statistics)

2021-10-07

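Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating those divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over parameter space. Such neural estimators are abundantly used in practice, but corresponding performance guarantees are partial and call for further exploration. In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation. While the former needs the NN class to be rich and expressive, the latter relies on controlling complexity. We explore this tradeoff for an estimator based on a shallow NN by means of non-asymptotic error bounds, focusing on four popular $\mathsf{f}$-divergences: Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. The bounds reveal the tension between the NN size and the number of samples, and make it possible to characterize scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above, with an appropriate NN growth rate, are near minimax rate-optimal, achieving the parametric rate up to logarithmic factors.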

Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach

Nathan Kallus , Xiaojie Mao , Masatoshi Uehara

Category: Machine Learning (Statistics) | Machine Learning

2021-03-25

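We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available. Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions. In this paper, we tackle the primary challenge to causal inference using negative controls: the identification and estimation of these bridge functions. Previous work has relied on completeness conditions on these functions to identify the causal parameters, has required uniqueness assumptions in estimation, and has focused on parametric estimation of bridge functions. Instead, we provide a new identification strategy that avoids the completeness condition. We also provide new estimators for these functions based on minimax learning formulations. These estimators accommodate general function classes such as reproducing kernel Hilbert spaces and neural networks. We study finite-sample convergence results both for estimating the bridge functions themselves and for the final estimation of the causal parameter under a variety of combinations of assumptions. We avoid uniqueness conditions on the bridge functions as much as possible.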

On the Estimation of Derivatives Using Plug-in KRR Estimators

Zejian Liu , Meng Li

Category: Machine Learning (Statistics) | Machine Learning

2020-06-02

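We study the problem of estimating the derivatives of a regression function, which has a wide range of applications as a key nonparametric functional of unknown functions. Standard analysis may be tailored to specific derivative orders, and parameter tuning remains a daunting challenge, particularly for high-order derivatives. In this article, we propose a simple plug-in kernel ridge regression (KRR) estimator in nonparametric regression with random design that is broadly applicable for multi-dimensional support and arbitrary mixed-partial derivatives. We provide a non-asymptotic analysis to study the behavior of the proposed estimator in a unified manner that encompasses the regression function and its derivatives, leading to two error bounds for a general class of kernels under the strong $L_\infty$ norm. In a concrete example specialized to kernels with polynomially decaying eigenvalues, the proposed estimator recovers the minimax optimal rate up to a logarithmic factor for estimating derivatives of functions in the H\"older class. Moreover, the same choice of tuning parameter works for any order of derivative. Hence, the proposed estimator enjoys a \textit{plug-in property} for derivatives in that it automatically adapts to the order of derivatives to be estimated, enabling easy tuning in practice. Our simulation studies show favorable finite-sample performance of the proposed method relative to several existing methods and corroborate the theoretical findings on its minimax optimality.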
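The plug-in idea can be illustrated in one dimension: fit KRR once, then differentiate the fitted function analytically instead of refitting for the derivative. The sketch below assumes a Gaussian kernel and a fixed bandwidth and ridge parameter, none of which are taken from the paper; all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 h^2))."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))

def fit_krr(x, y, bandwidth, lam):
    """Solve (K + n * lam * I) alpha = y for the KRR coefficients."""
    n = len(x)
    K = gaussian_kernel(x, x, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(x_new, x, alpha, bandwidth):
    """Fitted regression function: sum_i alpha_i k(x_new, x_i)."""
    return gaussian_kernel(x_new, x, bandwidth) @ alpha

def krr_derivative(x_new, x, alpha, bandwidth):
    """Plug-in derivative: differentiate the fitted function in closed form.

    d/dx k(x, x_i) = -(x - x_i) / h^2 * k(x, x_i) for the Gaussian kernel.
    """
    diff = x_new[:, None] - x[None, :]
    dK = -diff / bandwidth ** 2 * gaussian_kernel(x_new, x, bandwidth)
    return dK @ alpha

# Example: noiseless samples of sin(x); the same alpha serves both the
# function estimate and the derivative estimate.
x = np.linspace(0.0, 2.0 * np.pi, 60)
alpha = fit_krr(x, np.sin(x), bandwidth=0.5, lam=1e-3)
grid = np.linspace(1.0, 5.0, 9)
deriv = krr_derivative(grid, x, alpha, bandwidth=0.5)
```

Because the coefficients `alpha` are computed once and reused, no extra tuning is introduced for the derivative, which mirrors the plug-in property the abstract describes.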

Multivariate Trend Filtering for Lattice Data

Veeranjaneyulu Sadhanala , Yu-Xiang Wang , Addison J. Hu , Ryan J. Tibshirani

Category: Machine Learning (Statistics) | Machine Learning

2021-12-29

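We study a multivariate version of trend filtering, called Kronecker trend filtering or KTF, for the case in which the design points form a lattice in $d$ dimensions. KTF is a natural extension of univariate trend filtering (Steidl et al., 2006; Kim et al., 2009; Tibshirani, 2014), and is defined by minimizing a penalized least squares problem whose penalty term sums the absolute (higher-order) differences of the parameter to be estimated along each of the coordinate directions. The corresponding penalty operator can be written in terms of Kronecker products of univariate trend filtering penalty operators, hence the name Kronecker trend filtering. Equivalently, one can view KTF in terms of an $\ell_1$-penalized basis regression problem where the basis functions are tensor products of falling factorial functions, a piecewise polynomial (discrete spline) basis that underlies univariate trend filtering. This paper is a unification and extension of the results in Sadhanala et al. (2016, 2017). We develop a complete set of theoretical results that describe the behavior of $k^{\mathrm{th}}$ order Kronecker trend filtering in $d$ dimensions, for every $k \geq 0$ and $d \geq 1$. This reveals a number of interesting phenomena, including the dominance of KTF over linear smoothers in estimating heterogeneously smooth functions, and a phase transition at $d = 2(k+1)$, a boundary past which (on the high dimension-to-smoothness side) linear smoothers fail to be consistent entirely. We also leverage recent results on discrete splines from Tibshirani (2020), in particular discrete spline interpolation results that enable us to extend the KTF estimate to any off-lattice location in constant time (independent of the size of the lattice).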
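The Kronecker structure of the penalty can be made concrete for a square two-dimensional lattice. The sketch below builds the penalty operator from a univariate difference matrix and evaluates the $\ell_1$ penalty. It assumes $d = 2$, a square grid stored in row-major order, and plain forward differences of order $k+1$; the function names are hypothetical.

```python
import numpy as np

def difference_operator(n_points, order):
    """Matrix of `order`-th forward differences on `n_points` grid points."""
    return np.diff(np.eye(n_points), n=order, axis=0)

def ktf_penalty_operator_2d(n_side, k):
    """Stacked Kronecker-product penalty operator for an n_side x n_side grid.

    For a row-major vectorized grid, kron(D, I) differences along axis 0
    and kron(I, D) differences along axis 1.
    """
    D = difference_operator(n_side, k + 1)
    I = np.eye(n_side)
    along_axis0 = np.kron(D, I)
    along_axis1 = np.kron(I, D)
    return np.vstack([along_axis0, along_axis1])

def ktf_penalty(theta_grid, k):
    """Sum of absolute (k+1)-th differences along each coordinate direction."""
    n_side = theta_grid.shape[0]
    Delta = ktf_penalty_operator_2d(n_side, k)
    return np.abs(Delta @ theta_grid.ravel()).sum()

# Example on a 5 x 5 lattice with k = 1 (second differences).
i, j = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
penalty_bilinear = ktf_penalty(2 * i + 3 * j + i * j, k=1)   # no curvature along either axis
penalty_quadratic = ktf_penalty(i ** 2, k=1)                  # curvature along axis 0 only
```

A surface that is linear along each coordinate direction incurs no penalty for $k = 1$, while curvature along an axis is charged in absolute value, which is what lets the estimator adapt to heterogeneous smoothness.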

Estimating divergence functionals and the likelihood ratio by convex risk minimization

XuanLong Nguyen , Martin J. Wainwright , Michael I. Jordan

Category:

2008-09-04

We develop and analyze $M$-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a non-asymptotic variational characterization of $f$-divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these estimators. Given conditions only on the ratios of densities, we show that our estimators can achieve optimal minimax rates for the likelihood ratio and the divergence functionals in certain regimes. We derive an efficient optimization algorithm for computing our estimates, and illustrate their convergence behavior and practical viability by simulations.

Optimal variance-reduced stochastic approximation in Banach spaces

Wenlong Mou , Koulik Khamaru , Martin J. Wainwright , Peter L. Bartlett , Michael I. Jordan

Category: Machine Learning | Machine Learning (Statistics)

2022-01-21

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory can be applied to problems such as average-reward policy evaluation in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes.

A Cross Validation framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond

Anamitra Chaudhuri , Sabyasachi Chatterjee

Category: Machine Learning (Statistics)

2022-01-07

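This paper formulates a general cross-validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross-validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analysis of cross-validated versions of Trend Filtering or Dyadic CART existed. To illustrate the generality of the framework, we also propose and study cross-validated versions of two fundamental estimators: the lasso for high-dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods that use tuning parameters.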

MARS via LASSO

Dohyeong Ki , Billy Fang , Adityanand Guntuboyina

Category: Machine Learning (Statistics)

2021-11-23

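MARS is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural LASSO variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation-based complexity constraint. We show that our estimator can be computed via finite-dimensional convex optimization and that it is naturally connected to nonparametric function estimation techniques based on smoothness constraints. Under a simple design assumption, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We implement our method with a cross-validation scheme for the selection of the involved tuning parameter and show that it has favorable performance compared to the usual MARS method in simulation and real data settings.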

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou , Ashwin Pananjady , Martin J. Wainwright , Peter L. Bartlett

Category: Machine Learning | Machine Learning (Statistics)

2021-12-23

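We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise (covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$) and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).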

Optimal transport map estimation in general function spaces

Vincent Divol , Jonathan Niles-Weed , Aram-Alexandre Pooladian

Category: Machine Learning (Statistics)

2022-12-07

We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a H\"older class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincar\'e inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and H\"older transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.

Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models

Christoph Breunig , Xiaohong Chen

Category: Machine Learning (Statistics)

2020-06-17

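We propose a new adaptive hypothesis test for polyhedral cone (e.g., monotonicity, convexity) and equality (e.g., parametric, semiparametric) restrictions on a structural function in a nonparametric instrumental variables (NPIV) model. Our test statistic is based on a modified leave-one-out sample analog of a quadratic distance between the restricted and unrestricted sieve NPIV estimators. We provide computationally simple, data-driven choices of sieve tuning parameters and adjusted chi-squared critical values. Our test adapts to the unknown smoothness of alternative functions in the presence of unknown degree of endogeneity and unknown strength of the instruments. It attains the adaptive minimax rate of testing in $L^2$. That is, the sum of its type I error uniformly over the composite null and its type II error uniformly over nonparametric alternative models cannot be improved by any other hypothesis test for NPIV models of unknown regularities. Data-driven confidence sets in $L^2$ are obtained by inverting the adaptive test. Simulations confirm that our adaptive test controls size and that its finite-sample power greatly exceeds that of existing non-adaptive tests for monotonicity and parametric restrictions in NPIV models. Empirical applications to testing shape restrictions on differentiated products demand and on Engel curves are presented.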

Debiased Inference on Identified Linear Functionals of Underidentified Nuisances via Penalized Minimax Estimation

Nathan Kallus , Xiaojie Mao

Category: Machine Learning (Statistics)

2022-08-17

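We study generic inference on identified linear functionals of nonunique nuisances defined as solutions to underidentified conditional moment restrictions. This problem appears in a variety of applications, including nonparametric instrumental variable models, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables. Although the linear functionals of interest, such as the average treatment effect, are identifiable under suitable conditions, nonuniqueness of nuisances poses serious challenges to statistical inference, since in this setting common nuisance estimators can be unstable and lack fixed limits. In this paper, we propose penalized minimax estimators for the nuisance functions and show that they enable valid inference in this challenging setting. The proposed nuisance estimators can accommodate flexible function classes, and importantly, they can converge to fixed limits determined by the penalization, regardless of whether the nuisances are unique or not. We use the penalized nuisance estimators to form a debiased estimator for the linear functional of interest and prove its asymptotic normality under generic high-level conditions, which provides for asymptotically valid confidence intervals.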

Nonparametric regression using deep neural networks with ReLU activation function

Johannes Schmidt-Hieber

Category:

2017-08-22

Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential network parameters exceeding the sample size. The analysis gives some insights into why multilayer feedforward neural networks perform well in practice. Interestingly, for ReLU activation function the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural. It is also shown that under the composition assumption wavelet estimators can only achieve suboptimal rates.
