Smart Paper Notes

Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates

George H. Chen

Categories: Machine Learning (Statistics) | Machine Learning

2019-05-13

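We establish the first nonasymptotic error bounds for Kaplan-Meier-based nearest neighbor and kernel survival probability estimators where feature vectors reside in metric spaces. Our bounds imply rates of strong consistency for these nonparametric estimators and, up to a log factor, match an existing lower bound for conditional CDF estimation. Our proof strategy also yields nonasymptotic guarantees for nearest neighbor and kernel variants of the Nelson-Aalen cumulative hazards estimator. We experimentally compare these methods on four datasets. We find that for the kernel survival estimator, a good choice of kernel is one learned using random survival forests.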
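
As a rough illustration of the kind of estimator this abstract refers to, the sketch below computes a kernel-weighted Kaplan-Meier survival curve at a single query point. The function name `kernel_kaplan_meier`, the box kernel, the Euclidean metric, and the `bandwidth` parameter are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def kernel_kaplan_meier(x_train, times, events, x_query, t_grid, bandwidth):
    """Kernel-weighted Kaplan-Meier estimate of S(t | x_query) on t_grid.

    Assumptions for this sketch: Euclidean distance and a box kernel, so a
    training point gets weight 1 if it lies within `bandwidth` of the query
    and weight 0 otherwise. `events` is 1 for an observed event, 0 if censored.
    """
    x_train = np.asarray(x_train, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)

    dists = np.linalg.norm(x_train - np.asarray(x_query, dtype=float), axis=1)
    weights = (dists <= bandwidth).astype(float)

    # Distinct times at which a positively weighted, uncensored event occurred.
    event_times = np.unique(times[(events == 1) & (weights > 0)])

    survival = []
    for t in t_grid:
        s = 1.0
        for u in event_times[event_times <= t]:
            at_risk = weights[times >= u].sum()                # weighted risk set
            died = weights[(times == u) & (events == 1)].sum() # weighted events at u
            s *= 1.0 - died / at_risk
        survival.append(s)
    return np.array(survival)
```

With a bandwidth large enough to include every training point, this reduces to the ordinary Kaplan-Meier estimator; a nearest neighbor variant would instead give weight 1 to the k closest training points.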

Translated by Google Translate

Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee

George H. Chen

Categories: Machine Learning | Machine Learning (Statistics)

2022-06-21

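Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting, which we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On three standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive with the best of the baselines tested in terms of concordance index. Our code is available at: https://github.com/georgehc/survival-kernets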


The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu, Alden Green, Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.


Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

Categories: Machine Learning (Statistics)

2022-09-26

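The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.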


One-Nearest-Neighbor Search is All You Need for Minimax Optimal Regression and Classification

J. Jon Ryu, Young-Han Kim

Categories: Machine Learning

2022-02-05

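Recently, Qiao, Duan, and Cheng (2019) proposed a distributed nearest neighbor classification method, in which a massive dataset is split into smaller groups, each processed with a $k$-nearest-neighbor classifier, and the final class label is predicted by a majority vote among these groupwise class labels. This paper shows that the distributed algorithm with $k=1$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor under some regularity conditions, for both regression and classification problems. Roughly speaking, the distributed 1-nearest-neighbor rule with $M$ groups has performance comparable to the standard $\Theta(M)$-nearest-neighbor rule. In the analysis, alternative rules with a refined aggregation method are proposed and shown to attain exact minimax optimal rates.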
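
The distributed 1-nearest-neighbor rule described in this abstract can be sketched as follows. This is a minimal illustration only: the function name `distributed_1nn_classify`, the random equal-size split, the Euclidean distance, and the tie-breaking behavior of `np.argmax` are assumptions of the example rather than details from the paper, and the refined aggregation rules mentioned in the abstract are not shown.

```python
import numpy as np

def distributed_1nn_classify(x_train, y_train, x_query, num_groups, seed=0):
    """Majority vote over per-group 1-nearest-neighbor labels.

    Assumptions for this sketch: data are split uniformly at random into
    `num_groups` nearly equal groups, and distances are Euclidean.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x_train))

    votes = []
    for idx in np.array_split(perm, num_groups):
        # 1-NN search restricted to this group only.
        dists = np.linalg.norm(x_train[idx] - x_query, axis=1)
        votes.append(y_train[idx[np.argmin(dists)]])

    # Aggregate the groupwise labels by majority vote.
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```

Each group needs only a single nearest neighbor search, so the groups can be processed in parallel; for regression, the vote would be replaced by an average of the groupwise 1-NN responses.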


Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

Vishal Gupta, Michael Huang, Paat Rusmevichientong

Categories: Machine Learning | Machine Learning (Statistics)

2021-07-26

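Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data, and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, uses all data when training, and hence is well suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove that our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.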


A Cross Validation framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond

Anamitra Chaudhuri, Sabyasachi Chatterjee

Categories: Machine Learning (Statistics)

2022-01-07

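This paper formulates a general cross validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analyses of cross validated versions of Trend Filtering or Dyadic CART existed. To illustrate the generality of the framework, we also propose and study cross validated versions of two fundamental estimators: the lasso for high dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods which use tuning parameters.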


Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

Categories: Machine Learning | Machine Learning (Statistics)

2021-12-23

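We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise (covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$) and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).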


Benign overfitting and adaptive nonparametric regression

Julien Chhor, Suzanne Sigalla, Alexandre B. Tsybakov

Categories: Machine Learning | Machine Learning (Statistics)

2022-06-27

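In the nonparametric regression setting, we construct an estimator which is a continuous function interpolating the data points with high probability, while attaining minimax optimal rates under mean squared risk on the scale of Hölder classes, adaptively to the unknown smoothness.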


Optimal Nonparametric Inference with Two-Scale Distributional Nearest Neighbors

Emre Demirkaya, Yingying Fan, Lan Gao, Jinchi Lv, Patrick Vossler, Jingbo Wang

Categories: Machine Learning (Statistics) | Machine Learning

2018-08-25

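The weighted nearest neighbors (WNN) estimator has been popularly used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically generated for the nearest neighbors; we name the resulting estimator the distributional nearest neighbors (DNN) for easy reference. Yet there is a lack of distributional results for such an estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent WNN representation with weights admitting explicit forms, some of which are negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator using the jackknife and bootstrap techniques for the two-scale DNN. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several numerical examples.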
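
To make the weighted-nearest-neighbor view concrete, the sketch below builds DNN weights and a two-scale combination. It rests on two assumptions that the abstract does not spell out: that the bagged 1-NN weight on the $i$-th nearest neighbor is $\binom{n-i}{s-1}/\binom{n}{s}$ for subsampling scale $s$, and that the leading bias term scales as $s^{-2/d}$, so the two combination coefficients are chosen to sum to one and cancel that term. The function names `dnn_weights` and `tdnn_predict` are hypothetical.

```python
from math import comb
import numpy as np

def dnn_weights(n, s):
    """Assumed DNN weight on the i-th nearest neighbor, i = 1..n: the chance
    that it is the nearest neighbor within a random size-s subsample."""
    return np.array([comb(n - i, s - 1) / comb(n, s) for i in range(1, n + 1)])

def tdnn_predict(x_train, y_train, x_query, s1, s2):
    """Two-scale combination of DNN estimators with scales s1 != s2.

    The coefficients solve w1 + w2 = 1 and
    w1 * s1**(-2/d) + w2 * s2**(-2/d) = 0 (an assumption of this sketch).
    Returns the prediction and the combined weight vector.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    n, d = x_train.shape

    # Sort responses by distance of their features to the query point.
    order = np.argsort(np.linalg.norm(x_train - x_query, axis=1))
    y_sorted = y_train[order]

    ratio = (s1 / s2) ** (-2.0 / d)
    w1 = 1.0 / (1.0 - ratio)
    w2 = -ratio / (1.0 - ratio)
    combined = w1 * dnn_weights(n, s1) + w2 * dnn_weights(n, s2)
    return float(combined @ y_sorted), combined
```

Because one of the two coefficients is negative, some entries of the combined weight vector are negative as well, which matches the abstract's remark about negative weights.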


The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg, Ilmun Kim, Rajen D. Shah, Richard J. Samworth

Categories: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.


Adaptive Clustering Using Kernel Density Estimators

Ingo Steinwart, Bharath K. Sriperumbudur, Philipp Thomann

Categories: Machine Learning (Statistics)

2017-08-17

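We derive and analyze a generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters. We further investigate statistical properties of this generic clustering algorithm when it receives level set estimates from a kernel density estimator. In particular, we derive finite sample guarantees, consistency, rates of convergence, and an adaptive data-driven strategy for choosing the kernel bandwidth. For these results we do not need continuity assumptions on the density such as Hölder continuity, but only require intuitive geometric assumptions of a nonparametric nature.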


Uniform Consistency in Nonparametric Mixture Models

Bryon Aragam, Ruiyi Yang

Categories: Machine Learning (Statistics)

2021-08-31

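We study uniform consistency in nonparametric mixture models as well as closely related mixture of regression (also known as mixed regression) models, where the regression functions are allowed to be nonparametric and the error distributions are assumed to be convolutions of a Gaussian density. We construct uniformly consistent estimators under general conditions while simultaneously highlighting several pain points in extending existing pointwise consistency results to uniform results. The resulting analysis turns out to be nontrivial, and several novel technical tools are developed along the way. In the case of mixed regression, we prove $L^1$ convergence of the regression functions while allowing the component regression functions to intersect arbitrarily often, which presents additional technical challenges. We also consider generalizations to general (i.e., non-convolutional) nonparametric mixtures.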


How do noise tails impact on deep ReLU networks?

Jianqing Fan, Yihong Gu, Wen-Xin Zhou

Categories: Machine Learning (Statistics)

2022-03-20

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function and the difference between these two functions furnishes a lower bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
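
For readers unfamiliar with the loss this abstract relies on, here is a minimal sketch of the standard Huber loss with robustification parameter `tau`. The abstract says the parameter should adapt to the sample size, smoothness, and moment parameters but does not give the rule, so `tau` is simply left as a user-supplied input here; the function name `huber_loss` is chosen for this example.

```python
import numpy as np

def huber_loss(residual, tau):
    """Standard Huber loss: quadratic for |r| <= tau, linear beyond tau."""
    r = np.abs(np.asarray(residual, dtype=float))
    return np.where(r <= tau, 0.5 * r**2, tau * r - 0.5 * tau**2)
```

Large residuals contribute only linearly, which is what limits the influence of heavy-tailed noise relative to ordinary least squares; as `tau` grows, the loss approaches the squared-error loss.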


Asymptotic Distributions and Rates of Convergence for Random Forests via Generalized U-statistics

Wei Peng, Tim Coleman, Lucas Mentch

Categories: Machine Learning (Statistics) | Machine Learning

2019-05-25

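Random forests remain among the most popular off-the-shelf supervised learning algorithms. Despite their well-documented empirical success, however, until recently few theoretical results were available to describe their performance and behavior. In this work we push beyond recent work on consistency and asymptotic normality by establishing rates of convergence for random forests and other supervised learning ensembles. We develop the notion of generalized U-statistics and show that within this framework, random forest predictions can potentially remain asymptotically normal for larger subsample sizes than previously established. We also provide Berry-Esseen bounds in order to quantify the rate at which this convergence occurs, making explicit the roles of the subsample size and the number of trees in determining the distribution of random forest predictions.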


Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

Alden Green, Sivaraman Balakrishnan, Ryan J. Tibshirani

Categories: Machine Learning (Statistics)

2021-11-14

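In this paper we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${\bf y} = (y_1,\ldots,y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s + d)}$) and goodness-of-fit testing ($n^{-4s/(4s + d)}$). We also show that PCR-LE is \emph{manifold adaptive}: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s + m)}$) and testing ($n^{-4s/(4s + m)}$) rates of convergence. Interestingly, these rates are almost always faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem, regression with estimated features appears to be easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.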
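
The projection step described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions: an unweighted epsilon-neighborhood graph with Euclidean distances, the unnormalized Laplacian $L = D - W$, and the eigenvectors with the smallest eigenvalues as the projection basis. The function name `pcr_le` and the parameters `radius` and `num_eigvecs` are hypothetical, and the paper's exact graph construction may differ.

```python
import numpy as np

def pcr_le(x, y, radius, num_eigvecs):
    """Project responses y onto the span of the bottom Laplacian eigenvectors.

    Assumptions for this sketch: points within `radius` of each other are
    joined by an edge of weight 1, and the unnormalized Laplacian is used.
    Returns the fitted values at the n design points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Pairwise distances and epsilon-neighborhood adjacency (no self-loops).
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    adjacency = (dists <= radius).astype(float)
    np.fill_diagonal(adjacency, 0.0)

    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(laplacian)
    basis = eigvecs[:, :num_eigvecs]
    return basis @ (basis.T @ y)
```

With `num_eigvecs=1` on a connected graph the fit is the sample mean, and with `num_eigvecs` equal to the sample size it reproduces `y` exactly, so the number of eigenvectors plays the role of a smoothing parameter.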


Sequential Estimation of Convex Functionals and Divergences

Tudor Manole, Aaditya Ramdas

Categories: Machine Learning (Statistics)

2021-03-16

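We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $\varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. This is achieved by observing that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be complementary and powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid tests or confidence intervals at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (like dimension or alphabet size, if applicable). These results are also applicable to more general convex functionals, like the negative differential entropy, suprema of empirical processes, and V-statistics.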


Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

Yaqi Duan, Martin J. Wainwright

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-07

We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference (TD) estimates, including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the TD$(\lambda)$ family for $\lambda \in [0,1)$ as special cases. Our bounds capture its dependence on Bellman fluctuations, mixing time of the Markov chain, any mis-specification in the model, as well as the choice of weight function defining the estimator itself, and reveal some delicate interactions between mixing time and model mis-specification. For a given TD method applied to a well-specified model, its statistical error under trajectory data is similar to that of i.i.d. sample transition pairs, whereas under mis-specification, temporal dependence in data inflates the statistical error. However, any such deterioration can be mitigated by increased look-ahead. We complement our upper bounds by proving minimax lower bounds that establish optimality of TD-based methods with appropriately chosen look-ahead and weighting, and reveal some fundamental differences between value function estimation and ordinary non-parametric regression.
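
The $K$-step look-ahead TD update mentioned in this abstract can be illustrated in the simplest tabular setting. The paper studies kernel-based estimates; the tabular value table, the constant `step_size`, and the function name `k_step_td` below are simplifying assumptions made only for this sketch.

```python
import numpy as np

def k_step_td(states, rewards, num_states, gamma, k, step_size):
    """Tabular K-step look-ahead TD along a single trajectory.

    `states` has one more entry than `rewards`; rewards[t] is the reward
    received when leaving states[t]. (Tabular simplification, not the
    kernel-based estimator analyzed in the paper.)
    """
    values = np.zeros(num_states)
    horizon = len(rewards)
    for t in range(horizon - k + 1):
        # K discounted rewards, then bootstrap from the state K steps ahead.
        target = sum(gamma**j * rewards[t + j] for j in range(k))
        target += gamma**k * values[states[t + k]]
        values[states[t]] += step_size * (target - values[states[t]])
    return values
```

Larger `k` places more weight on observed rewards and less on the bootstrapped value, which is the look-ahead mechanism the abstract says can mitigate the effect of model mis-specification.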


Nonparametric plug-in classifier for multiclass classification of S.D.E. paths

Christophe Denis, Charlotte Dion-Blanc, Eddy Ella Mintsa, Viet-Chi Tran

Categories: Machine Learning (Statistics)

2022-12-20

We study the multiclass classification problem where the features come from a mixture of time-homogeneous diffusions. Specifically, the classes are discriminated by their drift functions while the diffusion coefficient is common to all classes and unknown. In this framework, we build a plug-in classifier which relies on nonparametric estimators of the drift and diffusion functions. We first establish the consistency of our classification procedure under mild assumptions and then provide rates of convergence under different sets of assumptions. Finally, a numerical study supports our theoretical findings.


Tractability from overparametrization: The example of the negative perceptron

Andrea Montanari, Yiqiao Zhong, Kangjie Zhou

Categories: Machine Learning

2021-10-28

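In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i, y_i)$ where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i \in \{+1,-1\}$ is a binary label. The data are not linearly separable and hence we content ourselves with finding a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i\le n} y_i \langle {\boldsymbol \theta}, {\boldsymbol x}_i \rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n, d \to \infty$ with $n/d \to \delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or, equivalently, on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d \le \delta_{\text{s}}(\kappa) - \varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d \ge \delta_{\text{s}}(\kappa) + \varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match as $\kappa \to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.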
