Smart Paper Notes

Off-the-grid prediction and testing for mixtures of translated features

Cristina Butucea , Jean-François Delmas , Anne Dutfoy , Clément Hardy

Category: Machine Learning (Statistics)

2022-12-02

We consider a model where a signal (discrete or continuous) is observed with an additive Gaussian noise process. The signal is issued from a linear combination of a finite but increasing number of translated features. The features are continuously parameterized by their location and depend on some scale parameter. First, we extend previous prediction results for off-the-grid estimators by taking into account here that the scale parameter may vary. The prediction bounds are analogous, but we improve the minimal distance between two consecutive feature locations in order to achieve these bounds. Next, we propose a goodness-of-fit test for the model and give non-asymptotic upper bounds of the testing risk and of the minimax separation rate between two distinguishable signals. In particular, our test encompasses the signal detection framework. We deduce upper bounds on the minimal energy, expressed as the 2-norm of the linear coefficients, to successfully detect a signal in the presence of noise. The general model considered in this paper is a non-linear extension of the classical high-dimensional regression model. It turns out that, in this framework, our upper bound on the minimax separation rate matches (up to a logarithmic factor) the lower bound on the minimax separation rate for signal detection in the high-dimensional linear model associated to a fixed dictionary of features. We also propose a procedure to test whether the features of the observed signal belong to a given finite collection under the assumption that the linear coefficients may vary, but do not change to opposite signs under the null hypothesis. A non-asymptotic upper bound on the testing risk is given. We illustrate our results on the spikes deconvolution model with Gaussian features on the real line and with the Dirichlet kernel, frequently used in the compressed sensing literature, on the torus.


Off-the-grid learning of sparse mixtures from a continuous dictionary

Cristina Butucea , Jean-François Delmas , Anne Dutfoy , Clément Hardy

Category: Machine Learning (Statistics) | Machine Learning

2022-06-29

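We consider a general non-linear model where the signal is a finite mixture of an unknown, possibly increasing, number of features issued from a continuous dictionary parameterized by a real non-linear parameter. The signal is observed with Gaussian (possibly correlated) noise in either a continuous or a discrete setup. We propose an off-the-grid optimization method, that is, a method which does not use any discretization scheme on the parameter space, to estimate both the non-linear parameters of the features and the linear parameters of the mixture. We use recent results on the geometry of off-the-grid methods to give a minimal separation on the true underlying non-linear parameters such that interpolating certificate functions can be constructed. Using also tail bounds for suprema of Gaussian processes, we bound the prediction error with high probability. Assuming that the certificate functions can be constructed, our prediction error bound is, up to log-factors, similar to the rates attained by the Lasso predictor in the linear regression model. We also establish convergence rates that quantify with high probability the quality of estimation for both the linear and the non-linear parameters.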


On lower bounds for the bias-variance trade-off

Alexis Derumigny , Johannes Schmidt-Hieber

Category: Machine Learning (Statistics)

2020-05-30

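It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows us to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we combine the general strategy for lower bounds with a reduction technique. This allows us to link the original problem to the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.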


Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

Category: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
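The sample-splitting and self-normalization recipe described above can be illustrated for the one-sample mean problem. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `cross_u_mean_stat`, the even split into two halves, and the simulated data are all hypothetical choices made for illustration. It assumes the bilinear kernel $h(x, y) = x^\top y$, so that the cross-block average reduces to projecting each first-half observation onto the second-half sample mean.

```python
import numpy as np

def cross_u_mean_stat(x):
    """Studentized cross U-statistic for H0: E[X] = 0 (illustrative sketch).

    Assumes the kernel h(x, y) = x^T y and an even split of the sample;
    both are illustrative choices, not taken from the abstract.
    """
    x = np.asarray(x, dtype=float)
    m = x.shape[0] // 2
    first, second = x[:m], x[m:]
    # Keep only the off-diagonal block: each first-half point is paired
    # with the mean of the second half, never with points of its own half.
    f = first @ second.mean(axis=0)
    # Self-normalize by the sample standard deviation of the projections.
    return np.sqrt(m) * f.mean() / f.std(ddof=1)

# Hypothetical data: 100 samples in 20 dimensions, as in the abstract's example.
rng = np.random.default_rng(0)
null_sample = rng.standard_normal((100, 20))   # mean zero
shifted_sample = null_sample + 1.0             # mean shifted in every coordinate
t_null = cross_u_mean_stat(null_sample)
t_alt = cross_u_mean_stat(shifted_sample)
```

Under the null, the statistic would be compared with a one-sided standard normal quantile (for example 1.645 at level 0.05) without choosing between an $n \gg d$ and a proportional calibration.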


Three rates of convergence or separation via U-statistics in a dependent framework

Quentin Duchemin , Yohann De Castro , Claire Lacour

Category: Machine Learning (Statistics)

2021-06-24

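Despite the ubiquity of U-statistics in modern probability and statistics, their non-asymptotic analysis in a dependent framework may have been overlooked. In a recent work, a new concentration inequality for U-statistics of order two for uniformly ergodic Markov chains has been proved. In this paper, we put this theoretical breakthrough into action by pushing further the current state of knowledge in three different active fields of research. First, we establish a new exponential inequality for the estimation of spectra of trace class integral operators with MCMC methods. The novelty is that this result holds for kernels with positive and negative eigenvalues, which is new as far as we know. In addition, we investigate the generalization performance of online algorithms working with pairwise loss functions and Markov chain samples. We provide an online-to-batch conversion result by showing how to extract a low-risk hypothesis from the sequence of hypotheses generated by any online learner. We finally give a non-asymptotic analysis of a goodness-of-fit test on the density of the invariant measure of a Markov chain. We identify some classes of alternatives over which our test based on the $L_2$ distance has a prescribed power.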


The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

Category: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.


Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

Alden Green , Sivaraman Balakrishnan , Ryan J. Tibshirani

Category: Machine Learning (Statistics)

2021-11-14

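In this paper we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${\bf y} = (y_1, \ldots, y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s + d)}$) and goodness-of-fit testing ($n^{-4s/(4s + d)}$). We also show that PCR-LE is manifold adaptive: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s + m)}$) and testing ($n^{-4s/(4s + m)}$) rates of convergence. Interestingly, these rates are almost always much faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem regression with estimated features appears to be much easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.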


The Lasso with general Gaussian designs with applications to hypothesis testing

Michael Celentano , Andrea Montanari , Yuting Wei

Category: Machine Learning | Machine Learning (Statistics)

2020-07-27

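The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order as or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ the regularized risk is non-smooth; $(2)$ the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates; here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler "fixed-design" model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.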


Adaptive estimation of a quadratic functional by model selection

Category:

We consider the problem of estimating $\|s\|^2$ when $s$ belongs to some separable Hilbert space $\mathbb{H}$ and one observes the Gaussian process $Y(t) = \langle s, t \rangle + \sigma L(t)$, for all $t \in \mathbb{H}$, where $L$ is some Gaussian isonormal process. This framework allows us in particular to consider the classical "Gaussian sequence model" for which $\mathbb{H} = l_2(\mathbb{N}^*)$ and $L(t) = \sum_{\lambda \geq 1} t_\lambda \varepsilon_\lambda$, where $(\varepsilon_\lambda)_{\lambda \geq 1}$ is a sequence of i.i.d. standard normal variables. Our approach consists in considering some at most countable families of finite-dimensional linear subspaces of $\mathbb{H}$ (the models) and then using model selection via some conveniently penalized least squares criterion to build new estimators of $\|s\|^2$. We prove a general nonasymptotic risk bound which allows us to show that such penalized estimators are adaptive on a variety of collections of sets for the parameter $s$, depending on the family of models from which they are built. In particular, in the context of the Gaussian sequence model, a convenient choice of the family of models allows defining estimators which are adaptive over collections of hyperrectangles, ellipsoids, $l_p$-bodies or Besov bodies. We take special care to describe the conditions under which the penalized estimator is efficient when the level of noise $\sigma$ tends to zero. Our construction is an alternative to the one by Efroïmovich and Low for hyperrectangles and provides new results otherwise.


MARS via LASSO

Dohyeong Ki , Billy Fang , Adityanand Guntuboyina

Category: Machine Learning (Statistics)

2021-11-23

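MARS is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural LASSO variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation-based complexity constraint. We show that our estimator can be computed via finite-dimensional convex optimization and that it is naturally connected to nonparametric function estimation techniques based on smoothness constraints. Under a simple design assumption, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We implement our method with a cross-validation scheme for the selection of the involved tuning parameter and show that it has favorable performance compared to the usual MARS method in simulation and real data settings.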


The Sketched Wasserstein Distance for mixture distributions

Xin Bing , Florentina Bunea , Jonathan Niles-Weed

Category: Machine Learning (Statistics)

2022-06-26

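The Sketched Wasserstein Distance ($W^S$) is a new probability distance specifically tailored to finite mixture distributions. Given any metric $d$ defined on a set $\mathcal{A}$ of probability distributions, $W^S$ is defined to be the most discriminative convex extension of this metric to the space $\mathcal{S} = \textrm{conv}(\mathcal{A})$ of mixtures of elements of $\mathcal{A}$. Our representation theorem shows that the space $(\mathcal{S}, W^S)$ constructed in this way is isomorphic to a Wasserstein space over $\mathcal{X} = (\mathcal{A}, d)$. This result establishes a universality property for the Wasserstein distances, revealing them to be uniquely characterized by their discriminative power for finite mixtures. We exploit this representation theorem to propose an estimation methodology based on Kantorovich--Rubenstein duality, and prove a general theorem showing that its estimation error can be bounded by the sum of the errors of estimating the mixture weights and the mixture components, for any estimators of these quantities. We derive sharp statistical properties for the estimated $W^S$ in the case of $p$-dimensional discrete $k$-mixtures, which we show can be estimated at a rate proportional to $\sqrt{k/n}$, up to logarithmic factors. We complement these bounds with a minimax lower bound on the risk of estimating the Wasserstein distance between distributions on a $k$-point metric space, which matches our upper bound up to logarithmic factors. This result is the first nearly tight minimax lower bound for estimating the Wasserstein distance between discrete distributions. Furthermore, we construct $\sqrt{n}$ asymptotically normal estimators of the mixture weights, and derive a $\sqrt{n}$ distributional limit of our estimator of $W^S$ as a consequence. Simulation studies and a data analysis provide strong support for the applicability of the new Sketched Wasserstein Distance.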


Estimating the minimizer and the minimum value of a regression function under passive design

Arya Akhavan , Davit Gogolashvili , Alexandre B. Tsybakov

Category: Machine Learning (Statistics)

2022-11-29

We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$ from the observations contaminated by random noise. Our estimator $\boldsymbol{z}_n$ of the minimizer $\boldsymbol{x}^*$ is based on a version of the projected gradient descent with the gradient estimated by a regularized local polynomial algorithm. Next, we propose a two-stage procedure for estimation of the minimum value $f^*$ of regression function $f$. At the first stage, we construct an accurate enough estimator of $\boldsymbol{x}^*$, which can be, for example, $\boldsymbol{z}_n$. At the second stage, we estimate the function value at the point obtained in the first stage using a rate optimal nonparametric procedure. We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$. We establish minimax lower bounds showing that, under certain choice of parameters, the proposed algorithms achieve the minimax optimal rates of convergence on the class of smooth and strongly convex functions.


Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

Category: Machine Learning (Statistics)

2022-09-26

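The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.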


Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models

Christoph Breunig , Xiaohong Chen

Category: Machine Learning (Statistics)

2020-06-17

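We propose a new adaptive hypothesis test for polyhedral cone (e.g., monotonicity, convexity) and equality (e.g., parametric, semiparametric) restrictions on a structural function in a nonparametric instrumental variables (NPIV) model. Our test statistic is based on a modified leave-one-out sample analog of a quadratic distance between the restricted and unrestricted sieve estimators. We provide computationally simple, data-driven choices of sieve tuning parameters and adjusted chi-squared critical values. Our test adapts to the unknown smoothness of alternative functions in the presence of unknown degree of endogeneity and unknown strength of the instruments. It attains the adaptive minimax rate of testing in $L^2$. That is, the sum of its type I error uniformly over the composite null and its type II error uniformly over nonparametric alternative models cannot be improved by any other hypothesis test for NPIV models of unknown regularities. Data-driven confidence sets in $L^2$ are obtained by inverting the adaptive test. Simulations confirm that our adaptive test controls size and that its finite-sample power greatly exceeds that of existing non-adaptive tests for monotonicity and parametric restrictions in NPIV models. Empirical applications to testing shape restrictions of differentiated products demand and of Engel curves are presented.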


Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression

Lijia Zhou , Frederic Koehler , Danica J. Sutherland , Nathan Srebro

Category: Machine Learning (Statistics) | Machine Learning

2021-12-08

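We study a notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data. Our refined analysis avoids the hidden constant and logarithmic factor in existing results, which are known to be crucial in high-dimensional settings, especially for understanding interpolation learning. As a special case, our analysis recovers the guarantee from Koehler et al. (2021), which tightly characterizes the population risk of low-norm interpolators under the benign overfitting conditions. Our optimistic rate bound, though, also analyzes predictors with arbitrary training error. This allows us to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps us obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.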


A general framework for the analysis of kernel-based tests

Tamara Fernández , Nicolás Rivera

Category: Machine Learning (Statistics)

2022-08-31

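Kernel-based tests provide a simple yet effective framework that uses the theory of reproducing kernel Hilbert spaces to design non-parametric testing procedures. In this paper we propose new theoretical tools that can be used to study the asymptotic behaviour of kernel-based tests in several data scenarios, and in many different testing problems. Unlike current approaches, our methods avoid using the lengthy $U$ and $V$ statistics expansions and limit theorems that commonly appear in the literature, and work directly with random functionals on Hilbert spaces. Therefore, our framework leads to a much simpler and cleaner analysis of kernel tests, only requiring mild regularity conditions. Furthermore, we show that, in general, our analysis cannot be improved by proving that the regularity conditions required by our methods are both sufficient and necessary. To illustrate the effectiveness of our approach we present a new kernel test for the conditional independence testing problem, as well as new analyses for already known kernel-based tests.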


The Performance of Wasserstein Distributionally Robust M-Estimators in High Dimensions

Liviu Aolaritei , Soroosh Shafieezadeh-Abadeh , Florian Dörfler

Category: Machine Learning (Statistics) | Machine Learning

2022-06-27

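Wasserstein distributionally robust optimization has recently emerged as a powerful framework for robust estimation, enjoying good out-of-sample performance guarantees, well-understood regularization effects, and computationally tractable dual reformulations. In such a framework, the estimator is obtained by minimizing the worst-case expected loss over all probability distributions which are close, in a Wasserstein sense, to the empirical distribution. In this paper, we propose a Wasserstein distributionally robust M-estimation framework to estimate an unknown parameter from noisy linear measurements, and we focus on the important and challenging task of analyzing the squared error performance of such estimators. Our study is carried out in the modern high-dimensional proportional regime, where both the ambient dimension and the number of samples go to infinity at a proportional rate which encodes the under/over-parametrization of the problem. Under an isotropic Gaussian features assumption, we show that the squared error can be recovered as the solution of a convex-concave optimization problem which, surprisingly, involves at most four scalar variables. To the best of our knowledge, this is the first work to study this problem in the context of Wasserstein distributionally robust M-estimation.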


Neural Estimation of Statistical Divergences

Sreejith Sreekumar , Ziv Goldfeld

Category: Machine Learning (Statistics)

2021-10-07

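Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating those divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over the parameter space. Such neural estimators are abundantly used in practice, but corresponding performance guarantees are partial and call for further exploration. In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation. While the former needs the NN class to be rich and expressive, the latter relies on controlling complexity. We explore this tradeoff for an estimator based on a shallow NN by means of non-asymptotic error bounds, focusing on four popular $\mathsf{f}$-divergences: Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. The bounds reveal the tension between the NN size and the number of samples, and make it possible to characterize scaling rates thereof that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above with appropriate NN growth rate are near minimax rate-optimal, achieving the parametric rate up to logarithmic factors.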


Sequential Estimation of Convex Functionals and Divergences

Tudor Manole , Aaditya Ramdas

Category: Machine Learning (Statistics)

2021-03-16

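We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $\varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. This is achieved by observing that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be complementary and powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid tests or confidence intervals at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (like dimension or alphabet size if applicable). These results are also applicable to more general convex functionals, like the negative differential entropy, suprema of empirical processes, and V-statistics.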


Benign overfitting and adaptive nonparametric regression

Julien Chhor , Suzanne Sigalla , Alexandre B. Tsybakov

Category: Machine Learning | Machine Learning (Statistics)

2022-06-27

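In the nonparametric regression setting, we construct an estimator which is a continuous function interpolating the data points with high probability, while attaining minimax optimal rates under mean squared risk on the scale of Hölder classes adaptively to the unknown smoothness.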
