Smart Paper Notes

One-Nearest-Neighbor Search is All You Need for Minimax Optimal Regression and Classification

J. Jon Ryu , Young-Han Kim

Categories: Machine Learning

2022-02-05

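Recently, Qiao, Duan, and Cheng (2019) proposed a distributed nearest-neighbor classification method, in which a massive dataset is split into smaller groups, each group is processed with a $k$-nearest-neighbor classifier, and the final class label is predicted by a majority vote among these group-level labels. This paper shows that, under some regularity conditions, the distributed algorithm with $k = 1$ over a sufficiently large number of groups attains a minimax optimal error rate, up to a multiplicative logarithmic factor, for both regression and classification problems. Roughly speaking, the distributed 1-nearest-neighbor rule with $M$ groups performs comparably to the standard $\Theta(M)$-nearest-neighbor rule. In the analysis, alternative rules with a refined aggregation method are proposed and shown to attain the exact minimax optimal rates.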
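
As a rough illustration of the distributed rule summarized in this abstract, the sketch below splits the data into groups, runs a 1-nearest-neighbor rule in each group, and takes a majority vote. It is a minimal sketch under assumptions that are not from the paper: the function names, the random near-equal split, the Euclidean distance, and the tie-breaking are illustrative choices, and the refined aggregation method mentioned in the abstract is not implemented.

```python
import random
from collections import Counter
from math import dist


def one_nn_label(train, x):
    """Return the label of the training point closest to x (Euclidean)."""
    _, label = min(train, key=lambda pair: dist(pair[0], x))
    return label


def distributed_one_nn(data, x, num_groups, seed=0):
    """Classify x by a majority vote over per-group 1-NN predictions.

    data is a list of (point, label) pairs; the random split into
    num_groups groups is an illustrative assumption.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Deal the shuffled points into num_groups groups of near-equal size.
    groups = [shuffled[i::num_groups] for i in range(num_groups)]
    # Each non-empty group casts one vote: the label of its nearest point.
    votes = Counter(one_nn_label(g, x) for g in groups if g)
    # Ties are broken by first occurrence, which is arbitrary here.
    return votes.most_common(1)[0][0]
```

With well-separated classes, a call such as `distributed_one_nn(data, query, num_groups=5)` returns the label of the class near `query` as long as most groups contain at least one point from that class.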

translated by Google Translate

Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates

George H. Chen

Categories: Machine Learning (Statistics) | Machine Learning

2019-05-13

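We establish the first nonasymptotic error bounds for Kaplan-Meier-based nearest neighbor and kernel survival probability estimators where feature vectors reside in metric spaces. Our bounds imply rates of strong consistency for these nonparametric estimators and, up to a log factor, match an existing lower bound for conditional CDF estimation. Our proof strategy also yields nonasymptotic guarantees for nearest neighbor and kernel variants of the Nelson-Aalen cumulative hazards estimator. We experimentally compare these methods on four datasets. We find that for the kernel survival estimator, a good choice of kernel is one learned using random survival forests.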
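
To make the nearest-neighbor survival estimator in this abstract concrete, the sketch below applies the standard Kaplan-Meier product formula to the $k$ nearest neighbors of a query point. The function name, the argument layout, and the use of Euclidean distance are assumptions made for illustration; the paper works in general metric spaces and this is not its code.

```python
from math import dist


def knn_kaplan_meier(features, times, events, x, k, t):
    """Estimate the survival probability S(t | x) from the k nearest neighbors.

    events[i] is 1 if a death was observed at times[i], 0 if censored.
    """
    # Indices of the k training points closest to x.
    order = sorted(range(len(features)), key=lambda i: dist(features[i], x))
    neighbors = order[:k]
    # Distinct observed death times among the neighbors, up to time t.
    death_times = sorted(
        {times[i] for i in neighbors if events[i] == 1 and times[i] <= t}
    )
    survival = 1.0
    for tj in death_times:
        at_risk = sum(1 for i in neighbors if times[i] >= tj)
        deaths = sum(1 for i in neighbors if times[i] == tj and events[i] == 1)
        # Kaplan-Meier factor: fraction of at-risk neighbors surviving past tj.
        survival *= 1.0 - deaths / at_risk
    return survival
```

Censored neighbors never contribute a death, but they stay in the at-risk count until their censoring time, which is what distinguishes this from a plain empirical survival curve.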

On Error and Compression Rates for Prototype Rules

Omer Kerem , Roi Weiss

Categories: Machine Learning | Machine Learning (Statistics)

2022-06-16

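We study the close interplay between error and compression in the nonparametric multiclass classification setting in terms of prototype learning rules. We focus in particular on a close variant of a recently proposed compression-based learning rule, OptiNet. Beyond its computational merits, this rule has recently been shown to be universally consistent in any metric instance space that admits a universally consistent rule, making it the first learning algorithm known to enjoy this property. However, its error and compression rates have been left open. Here we derive such rates in the case where instances reside in Euclidean space, under smoothness and tail conditions on the data distribution. We first show that OptiNet achieves non-trivial compression rates while enjoying near minimax-optimal error rates. We then proceed to study a novel general compression scheme for further compressing prototype rules that locally adapts to the noise level without sacrificing accuracy. Applying it to OptiNet, we show that under a geometric margin condition a further gain in the compression rate is achieved. Experimental results comparing the performance of the various methods are presented.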

Consistent Non-Parametric Methods for Maximizing Robustness

Robi Bhattacharjee , Kamalika Chaudhuri

Categories: Machine Learning

2021-02-18

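Learning classifiers that are robust to adversarial examples has received a great deal of recent attention. A major drawback of the standard robust learning framework is the artificial robustness radius $r$ that applies to all inputs. This ignores the fact that data may be highly heterogeneous, in which case it is plausible that robustness regions should be larger in some regions of the data and smaller in others. In this paper, we address this limitation by proposing a new limit classifier, called the neighborhood optimal classifier, that extends the Bayes optimal classifier outside its support by using the label of the closest in-support point. We then argue that this classifier maximizes the size of its robustness regions subject to the constraint of having accuracy equal to the Bayes optimal. We then present sufficient conditions under which general nonparametric methods that can be represented as weight functions converge towards this limit, and show that both nearest neighbor and kernel classifiers satisfy them under certain conditions.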

Optimal Nonparametric Inference with Two-Scale Distributional Nearest Neighbors

Emre Demirkaya , Yingying Fan , Lan Gao , Jinchi Lv , Patrick Vossler , Jingbo Wang

Categories: Machine Learning (Statistics) | Machine Learning

2018-08-25

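The weighted nearest neighbors (WNN) estimator is commonly used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically assigned to the nearest neighbors; we name the resulting estimator the distributional nearest neighbors (DNN) for easy reference. Yet there is a lack of distributional results for such an estimator, which limits its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent WNN representation with weights admitting explicit forms, some of them negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under a fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator for the two-scale DNN using the jackknife and bootstrap techniques. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several numerical examples.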
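
The bagged estimator in this abstract can be written as a weighted nearest-neighbor rule, and the sketch below shows one way to compute it. The DNN weights follow from a counting argument: with subsamples of size $s$ drawn without replacement from $n$ points, the $i$-th nearest neighbor is the closest point in a subsample with probability $\binom{n-i}{s-1} / \binom{n}{s}$. The two-scale weights are an assumption of this sketch rather than something stated in the abstract: they are chosen so that $w_1 + w_2 = 1$ and a leading bias term of order $s^{-2/d}$ cancels, where $d$ is the feature dimension. All function names are hypothetical.

```python
from math import comb, dist


def dnn_weights(n, s):
    """Weight on the i-th nearest neighbor, i = 1..n, for subsample size s."""
    total = comb(n, s)
    # comb(n - i, s - 1) is 0 once fewer than s - 1 farther points remain.
    return [comb(n - i, s - 1) / total for i in range(1, n + 1)]


def dnn_estimate(X, y, x, s):
    """Bagged 1-NN (DNN) estimate of the regression function at x."""
    order = sorted(range(len(X)), key=lambda i: dist(X[i], x))
    weights = dnn_weights(len(X), s)
    return sum(w * y[i] for w, i in zip(weights, order))


def two_scale_weights(s1, s2, d):
    """Assumed weights solving w1 + w2 = 1 and
    w1 * s1**(-2/d) + w2 * s2**(-2/d) = 0 (requires s1 != s2)."""
    r = (s1 / s2) ** (-2.0 / d)
    w1 = 1.0 / (1.0 - r)
    return w1, 1.0 - w1


def tdnn_estimate(X, y, x, s1, s2, d):
    """Linear combination of two DNN estimates at different scales."""
    w1, w2 = two_scale_weights(s1, s2, d)
    return w1 * dnn_estimate(X, y, x, s1) + w2 * dnn_estimate(X, y, x, s2)
```

For `s1 < s2` the first weight is negative, which matches the abstract's remark that some weights in the equivalent WNN representation are negative.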

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

Categories: Machine Learning (Statistics)

2022-09-26

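The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.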

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager , Susan Athey

Categories:

2015-10-14

Many scientific and engineering challenges, ranging from personalized medicine to customized marketing recommendations, require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.

Label Ranking through Nonparametric Regression

Dimitris Fotakis , Alkis Kalavasis , Eleni Psaroudaki

Categories: Machine Learning

2021-11-04

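Label Ranking (LR) corresponds to the problem of learning a hypothesis that maps features to rankings over a finite set of labels. We adopt a nonparametric regression approach to LR and obtain theoretical performance guarantees for this fundamental practical problem. We introduce a generative model for Label Ranking in noiseless and noisy nonparametric regression settings, and provide sample complexity bounds for learning algorithms in both cases. In the noiseless setting, we study the LR problem with full rankings and provide computationally efficient algorithms using decision trees and random forests in the high-dimensional regime. In the noisy setting, we consider the more general cases of LR with incomplete and partial rankings from a statistical viewpoint and obtain sample complexity bounds using the One-Versus-One approach of multiclass classification. Finally, we complement our theoretical contributions with experiments, aiming to understand how the input regression noise affects the observed output.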

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu , Alden Green , Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.

Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

Categories: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
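
For the one-sample mean testing problem mentioned in this abstract, the combination of sample splitting and self-normalization can be sketched as follows. This is a hedged illustration, not the authors' code: the function name, the even split, and the exact studentized form $\sqrt{m}\,\bar{h} / \hat{\sigma}_h$ (with $h_i$ the inner product of a first-half point with the second-half mean) are assumptions of this sketch.

```python
from math import sqrt


def cross_mean_statistic(data):
    """Studentized cross statistic for testing H0: mean = 0 (illustrative).

    data is a list of equal-length tuples; the first half supplies the
    summands and the second half supplies the projection direction.
    """
    n = len(data)
    m = n // 2
    first, second = data[:m], data[m:]
    dim = len(data[0])
    # Mean vector of the held-out second half.
    mean_second = [sum(x[j] for x in second) / len(second) for j in range(dim)]
    # Inner product of each first-half point with the second-half mean.
    h = [sum(a * b for a, b in zip(x, mean_second)) for x in first]
    h_bar = sum(h) / m
    # Self-normalization by the empirical standard deviation of h;
    # this raises ZeroDivisionError if all h values coincide.
    sd = sqrt(sum((v - h_bar) ** 2 for v in h) / m)
    return sqrt(m) * h_bar / sd
```

Because only pairs drawn from different halves enter the statistic, the result would be compared against a standard normal quantile under the Gaussian limit the abstract describes, whatever the relation between $d$ and $n$.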

Neural Estimation of Statistical Divergences

Sreejith Sreekumar , Ziv Goldfeld

Categories: Machine Learning (Statistics)

2021-10-07

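Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating these divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over the parameter space. Such neural estimators are widely used in practice, but the corresponding performance guarantees are partial and call for further exploration. In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation. While the former needs the NN class to be rich and expressive, the latter relies on controlling complexity. We explore this tradeoff for an estimator based on a shallow NN by means of non-asymptotic error bounds, focusing on four popular $\mathsf{f}$-divergences: Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. The bounds reveal the tension between the NN size and the number of samples, and make it possible to characterize scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above, with an appropriate NN growth rate, are near minimax rate-optimal, achieving the parametric rate up to logarithmic factors.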

Adaptive Clustering Using Kernel Density Estimators

Ingo Steinwart , Bharath K. Sriperumbudur , Philipp Thomann

Categories: Machine Learning (Statistics)

2017-08-17

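We derive and analyze a generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters. We further investigate statistical properties of this generic clustering algorithm when it receives level set estimates from a kernel density estimator. In particular, we derive finite sample guarantees, consistency, rates of convergence, and an adaptive data-driven strategy for choosing the kernel bandwidth. For these results we do not need continuity assumptions on the density such as Hölder continuity, but only require intuitive geometric assumptions of a nonparametric nature.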

A Cross Validation framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond

Anamitra Chaudhuri , Sabyasachi Chatterjee

Categories: Machine Learning (Statistics)

2022-01-07

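This paper formulates a general cross validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analyses of cross validated versions of Trend Filtering or Dyadic CART existed. To illustrate the generality of the framework, we also propose and study cross validated versions of two fundamental estimators: the lasso for high-dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods that use tuning parameters.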

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

Yifei Min , Tianhao Wang , Dongruo Zhou , Quanquan Gu

Categories: Machine Learning | Machine Learning (Statistics)

2021-06-22

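We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on offline data collected by a behavior policy. We propose to incorporate the variance information of the value function to improve the sample efficiency of OPE. More specifically, for time-inhomogeneous episodic linear Markov decision processes (MDPs), we propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration. We show that our algorithm achieves a tighter error bound than the best-known result. We also provide a fine-grained characterization of the distribution shift between the behavior policy and the target policy. Extensive numerical experiments corroborate our theory.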

Early-stopped neural networks are consistent

Ziwei Ji , Justin D. Li , Matus Telgarsky

Categories: Machine Learning | Machine Learning (Statistics)

2021-06-10

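This work studies the behavior of shallow ReLU networks trained via gradient descent on binary classification data where the underlying data distribution is general and the (optimal) Bayes risk is not necessarily zero. In this setting, it is shown that gradient descent with early stopping achieves population risk arbitrarily close to optimal in terms of not just the logistic and misclassification losses, but also in terms of calibration, meaning the sigmoid mapping of its outputs approximates the true underlying conditional distribution arbitrarily finely. Moreover, the necessary iteration, sample, and architectural complexities of this analysis all scale naturally with a certain complexity measure of the true conditional model. Lastly, while it is not shown that early stopping is necessary, it is shown that any univariate classifier satisfying a local interpolation property is inconsistent.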

How do noise tails impact on deep ReLU networks?

Jianqing Fan , Yihong Gu , Wen-Xin Zhou

Categories: Machine Learning (Statistics)

2022-03-20

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function, so that the difference between these two functions furnishes a lower bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
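
For readers unfamiliar with the Huber loss that this abstract relies on, the sketch below gives its standard definition: quadratic for residuals up to a threshold and linear beyond it, which limits the influence of heavy-tailed noise. The parameter name `tau` is an assumption of this sketch, and the adaptive rule the paper uses to pick it from the sample size, smoothness, and moment parameters is not reproduced here.

```python
def huber_loss(residual, tau):
    """Standard Huber loss with threshold tau > 0.

    Quadratic for |residual| <= tau, linear with slope tau beyond that,
    so large residuals are penalized less than under squared error.
    """
    a = abs(residual)
    if a <= tau:
        return 0.5 * a * a
    return tau * a - 0.5 * tau * tau
```

As `tau` grows the loss approaches half the squared error, and as it shrinks the loss behaves like a scaled absolute error, so the threshold trades bias against robustness.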

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou , Ashwin Pananjady , Martin J. Wainwright , Peter L. Bartlett

Categories: Machine Learning | Machine Learning (Statistics)

2021-12-23

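We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise, covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$, and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).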

On the Statistical Complexity of Sample Amplification

Brian Axelrod , Shivam Garg , Yanjun Han , Vatsal Sharan , Gregory Valiant

Categories: Machine Learning

2022-01-12

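Given $n$ i.i.d. samples drawn from an unknown distribution $P$, when is it possible to produce a larger set of $n + m$ samples which cannot be distinguished from $n + m$ i.i.d. samples drawn from $P$? Axelrod et al. (2019) formalized this question as the sample amplification problem, and gave optimal amplification procedures for discrete distributions and Gaussian location models. However, these procedures and the associated lower bounds are tailored to the specific distribution classes, and a general statistical understanding of sample amplification is still largely missing. In this work, we place the sample amplification problem on a firm statistical foundation by deriving generally applicable amplification procedures, lower bound techniques, and connections to existing statistical notions. Our techniques apply to a large class of distributions including the exponential family, and establish a rigorous connection between sample amplification and distribution learning.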

Stability and generalization

Categories:

We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms such as regularization based algorithms. In particular we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVM for regression and classification. (Footnote 1: For a qualitative discussion about sensitivity analysis with links to other resources, see e.g. http://sensitivity-analysis.jrc.cec.eu.int/)

Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

Vishal Gupta , Michael Huang , Paat Rusmevichientong

Categories: Machine Learning | Machine Learning (Statistics)

2021-07-26

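Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data, and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, uses all data when training, and hence is well suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove that our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.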
