Smart Paper Notes

Optimal high-dimensional and nonparametric distributed testing under communication constraints

Botond Szabó , Lasse Vuursteen , Harry van Zanten

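Categories: Machine Learning (Statistics)

2022-02-02

We derive minimax testing errors in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to $b$ bits. We investigate both the $d$-dimensional and the infinite-dimensional signal detection problem under Gaussian white noise. We also derive distributed testing algorithms that reach the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed estimation. Among our findings, we show that testing protocols with access to shared randomness can perform strictly better in some regimes than those without. We also observe that consistent nonparametric distributed testing is always possible, even with as little as $1$ bit of communication, and that the corresponding test outperforms the best local test that uses only the information available at a single local machine. Furthermore, we derive adaptive nonparametric distributed testing strategies and the corresponding theoretical lower bounds.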

translated by Google Translate

Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms

Milad Sefidgaran , Amin Gohari , Gaël Richard , Umut Şimşekli

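Categories: Machine Learning (Statistics) | Machine Learning

2022-03-04

Understanding generalization in modern machine learning settings has been one of the major challenges in statistical learning theory. In this context, recent years have witnessed the development of various generalization bounds suggesting different complexity notions, such as the mutual information between the data sample and the algorithm output, the compressibility of the hypothesis space, and the fractal dimension of the hypothesis space. While these bounds have illuminated the problem at hand from different angles, their suggested complexity notions appear seemingly unrelated, which restricts their high-level impact. In this study, we prove novel generalization bounds through the lens of rate-distortion theory, and explicitly relate the concepts of mutual information, compressibility, and fractal dimension. Our approach consists of (i) defining a generalized notion of compressibility by using source coding concepts, and (ii) showing that the "compression error rate" can be linked to the generalization error both in expectation and with high probability. We show that in the "lossless compression" setting we recover and improve existing mutual-information-based bounds, whereas a "lossy compression" scheme allows us to link generalization to the rate-distortion dimension, a particular notion of fractal dimension. Our results bring a more unified perspective on generalization and open up several future research directions.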


Functional Linear Regression of Cumulative Distribution Functions

Qian Zhang , Anuran Makur , Kamyar Azizzadenesheli

Categories: Machine Learning

2022-05-28

The estimation of cumulative distribution functions (CDFs) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional regression of contextual CDFs where each data point is sampled from a linear combination of context dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde{O}(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases. We also derive matching information theoretic lower bounds, establishing minimax optimality for CDF functional regression. Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatched error, and show that the estimators are well-behaved under model mismatch. Finally, to complete our study, we formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish self-normalized estimation error upper bounds for this setting.


On lower bounds for the bias-variance trade-off

Alexis Derumigny , Johannes Schmidt-Hieber

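Categories: Machine Learning (Statistics)

2020-05-30

It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows us to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance, involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models, including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model, and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur, and they vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we combine the general strategy for lower bounds with a reduction technique. This allows us to link the original problem to the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.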


Stochastic Subgradient Descent Escapes Active Strict Saddles

Pascal Bianchi , Walid Hachem , Sholom Schechtman

Categories: Machine Learning (Statistics)

2021-08-04

In non-smooth stochastic optimization, we establish the non-convergence of the stochastic subgradient descent (SGD) to the critical points recently called active strict saddles by Davis and Drusvyatskiy. Such points lie on a manifold $M$ where the function $f$ has a direction of second-order negative curvature. Off this manifold, the norm of the Clarke subdifferential of $f$ is lower-bounded. We require two conditions on $f$. The first assumption is a Verdier stratification condition, which is a refinement of the popular Whitney stratification. It allows us to establish a reinforced version of the projection formula of Bolte et al. for Whitney stratifiable functions, which is of independent interest. The second assumption, termed the angle condition, allows us to control the distance of the iterates to $M$. When $f$ is weakly convex, our assumptions are generic. Consequently, generically in the class of definable weakly convex functions, the SGD converges to a local minimizer.
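
As a rough illustration of the behavior described in this abstract, the Python sketch below runs stochastic subgradient descent on a toy function chosen for this note, f(x, y) = |x| + (y^2 - 1)^2 / 4. The function is not taken from the paper. The origin lies on the manifold {x = 0}, along which f has negative curvature, while off that manifold every Clarke subgradient has norm at least 1. The step-size schedule, the noise level, and all function names are assumptions made for illustration.

```python
import numpy as np

def subgradient(z):
    """Return one element of the Clarke subdifferential of
    f(x, y) = |x| + (y**2 - 1)**2 / 4 (a toy example, not from the paper)."""
    x, y = z
    # np.sign(0) == 0, which is a valid subgradient of |x| at x = 0.
    return np.array([np.sign(x), y**3 - y])

def stochastic_subgradient_descent(z0, n_steps=20000, noise=0.1, seed=0):
    """Run noisy subgradient steps with a decaying step size (assumed schedule)."""
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for k in range(n_steps):
        step = 0.1 / np.sqrt(k + 1)
        noisy_gradient = subgradient(z) + noise * rng.normal(size=2)
        z = z - step * noisy_gradient
    return z

# Start on the line y = 0, which the noiseless flow would follow to the saddle at the origin.
final_point = stochastic_subgradient_descent([1.0, 0.0])
print(final_point)
```

In this toy run the gradient noise pushes the iterate off the unstable direction y = 0, so it settles near one of the two local minimizers (0, 1) or (0, -1) rather than at the saddle.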


On the Statistical Complexity of Sample Amplification

Brian Axelrod , Shivam Garg , Yanjun Han , Vatsal Sharan , Gregory Valiant

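Categories: Machine Learning

2022-01-12

Given $n$ i.i.d. samples drawn from an unknown distribution $P$, when is it possible to produce a larger set of $n + m$ samples that cannot be distinguished from $n + m$ i.i.d. samples drawn from $P$? Axelrod et al. (2019) formalized this question as the sample amplification problem, and gave optimal amplification procedures for discrete distributions and Gaussian location models. However, these procedures and the associated lower bounds are tailored to the specific distribution classes, and a general statistical understanding of sample amplification is still largely missing. In this work, we place the sample amplification problem on a firm statistical foundation by deriving generally applicable amplification procedures, lower bound techniques, and connections to existing statistical notions. Our techniques apply to a large class of distributions, including exponential families, and establish a rigorous connection between sample amplification and distribution learning.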


Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models

Yihan Zhang , Marco Mondelli , Ramji Venkataramanan

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-21

In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite the wide applicability, their design is still obtained via heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics on spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. By doing so, we are able to optimize the design of the spectral method, and combine it with a simple linear estimator, in order to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval display the advantage enabled by our analysis over existing designs of spectral methods.
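
To make the phrase "top two eigenvectors of a suitable data-dependent matrix" concrete, here is a minimal Python sketch of a generic spectral estimator. It forms the matrix D = (1/n) * sum_i T(y_i) x_i x_i^T and returns its two leading eigenvectors. The preprocessing function T(y) = y^2 and the function name `spectral_estimate` are assumptions for illustration. The paper optimizes the preprocessing and combines the result with a linear estimator, neither of which is attempted here.

```python
import numpy as np

def spectral_estimate(x, y, preprocess=np.square):
    """Return the top two eigenvectors of D = (1/n) * sum_i T(y_i) x_i x_i^T.

    x: (n, d) array of covariates; y: (n,) array of observations.
    preprocess: the function T applied to y (y**2 is an assumed default).
    """
    n = x.shape[0]
    weights = preprocess(y)
    d_matrix = (x * weights[:, None]).T @ x / n
    # eigh returns eigenvalues in ascending order, so the last two columns
    # are the eigenvectors of the two largest eigenvalues.
    _, eigenvectors = np.linalg.eigh(d_matrix)
    return eigenvectors[:, -2:]

# Usage on simulated mixed linear regression with two orthogonal unit signals.
rng = np.random.default_rng(0)
n, d = 50000, 5
beta_1, beta_2 = np.eye(d)[0], np.eye(d)[1]
x = rng.normal(size=(n, d))
labels = rng.integers(0, 2, size=n)            # unobserved: which signal generated each sample
y = np.where(labels == 0, x @ beta_1, x @ beta_2) + 0.1 * rng.normal(size=n)
basis = spectral_estimate(x, y)
print(np.linalg.norm(basis.T @ beta_1), np.linalg.norm(basis.T @ beta_2))
```

The printed numbers are the lengths of the projections of each true signal onto the estimated two-dimensional subspace. Values close to 1 mean the subspace nearly contains that signal.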


Double Robust Bayesian Inference on Average Treatment Effects

Christoph Breunig , Ruixuan Liu , Zhengfei Yu

Categories: Machine Learning (Statistics)

2022-11-29

We study a double robust Bayesian inference procedure on the average treatment effect (ATE) under unconfoundedness. Our Bayesian approach involves a correction term for prior distributions adjusted by the propensity score. We prove asymptotic equivalence of our Bayesian estimator and efficient frequentist estimators by establishing a new semiparametric Bernstein-von Mises theorem under double robustness; i.e., the lack of smoothness of conditional mean functions can be compensated by high regularity of the propensity score and vice versa. Consequently, the resulting Bayesian point estimator internalizes the bias correction as the frequentist-type doubly robust estimator, and the Bayesian credible sets form confidence intervals with asymptotically exact coverage probability. In simulations, we find that this corrected Bayesian procedure leads to significant bias reduction of point estimation and accurate coverage of confidence intervals, especially when the dimensionality of covariates is large relative to the sample size and the underlying functions become complex. We illustrate our method in an application to the National Supported Work Demonstration.


Private Stochastic Optimization in the Presence of Outliers: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses

Andrew Lowy , Meisam Razaviyayn

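Categories: Machine Learning | Machine Learning (Statistics)

2022-09-15

We study differentially private (DP) stochastic optimization (SO) with data containing outliers and loss functions that are not Lipschitz continuous. To date, the vast majority of work on DP SO assumes that the loss is Lipschitz (i.e., stochastic gradients are uniformly bounded), and the resulting error bounds scale with the Lipschitz parameter of the loss. While this assumption is convenient, it is often unrealistic: in many practical problems where privacy is required, data may contain outliers or be unbounded, causing some stochastic gradients to have large norm. In such cases, the Lipschitz parameter may be prohibitively large, leading to vacuous excess risk bounds. Thus, building on recent work [WXDX20, KLZ22], we make the weaker assumption that stochastic gradients have bounded $k$-th moments for some $k \geq 2$. Compared with works on DP Lipschitz SO, our excess risk scales with the $k$-th moment bound instead of the Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). Moreover, in contrast to the prior works [WXDX20, KLZ22], our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm that runs in linear time and yields improved (compared to prior work) and nearly optimal excess risk for smooth losses. Additionally, our work is the first to address non-convex non-Lipschitz loss functions satisfying the Proximal-PL inequality. This covers some classes of neural nets, among other practical models. Our Proximal-PL algorithm has nearly optimal excess risk that almost matches the strongly convex lower bound. Lastly, we provide shuffle DP variations of our algorithms, which do not require a trusted curator (e.g., for distributed learning).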


Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

Categories: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
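
The combination of sample splitting and self-normalization mentioned above can be sketched for the one-sample mean problem, testing whether E[X] = 0. The Python code below is one plausible instance written for this note and is not necessarily the exact statistic used in the paper. The function name `cross_mean_test`, the equal split, and the one-sided rejection rule are assumptions. Each observation in the first half is projected onto the sample mean of the second half, and the resulting scalars are studentized, so the statistic is compared against a standard normal quantile whatever the dimension.

```python
import math
from statistics import NormalDist

import numpy as np

def cross_mean_test(x, alpha=0.05):
    """Test H0: E[X] = 0 with a split-sample, self-normalized statistic.

    x: (n, d) array. Returns (statistic, reject) where reject is a bool.
    """
    x = np.asarray(x, dtype=float)
    half = x.shape[0] // 2
    first, second = x[:half], x[half:]
    direction = second.mean(axis=0)      # estimated from the second half only
    scores = first @ direction           # one scalar per first-half observation
    # Studentize: sqrt(m) * mean / standard deviation of the scores.
    statistic = math.sqrt(half) * scores.mean() / scores.std(ddof=1)
    threshold = NormalDist().inv_cdf(1 - alpha)
    return float(statistic), bool(statistic > threshold)

# Usage: 200 samples in 20 dimensions with a mean shift of 1 in every coordinate.
rng = np.random.default_rng(0)
shifted = rng.normal(size=(200, 20)) + 1.0
print(cross_mean_test(shifted))
```

Because the direction and the scores come from disjoint halves, the scores are conditionally i.i.d. given the second half. That is what makes a Gaussian calibration plausible without fixing how $d$ scales with $n$.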


On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization

Fei Jiang , Yeqing Zhou , Jianxuan Liu , Yanyuan Ma

Categories: Machine Learning

2022-12-31

We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Treating the high dimensional issue further leads us to add an amenable penalty term to the target function. We propose to estimate the regression parameter by minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members of the subset. We examine the finite sample performance of the proposed tests by extensive simulation. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.


The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

Categories: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.


Optimal transport map estimation in general function spaces

Vincent Divol , Jonathan Niles-Weed , Aram-Alexandre Pooladian

Categories: Machine Learning (Statistics)

2022-12-07

We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a Hölder class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincaré inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and Hölder transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.


Time-uniform central limit theory, asymptotic confidence sequences, and anytime-valid causal inference

Ian Waudby-Smith , David Arbour , Ritwik Sinha , Edward H. Kennedy , Aaditya Ramdas

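Categories: Machine Learning (Statistics)

2021-03-11

Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions, and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. To elaborate, our methods take the form of confidence sequences (CSs): sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times and incur no penalty for "peeking" at the data, unlike classical confidence intervals, which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, and hence do not enjoy the aforementioned broad applicability of asymptotic confidence intervals. Our work bridges the gap by giving a definition of "asymptotic CSs" and deriving a universal asymptotic CS that requires only weak CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1970s work of Komlos, Major, and Tusnady) to uniformly approximate the entire sample average process by an implicit Gaussian process. We demonstrate their utility by deriving nonparametric asymptotic CSs based on doubly robust estimators in observational studies, for which no nonasymptotic methods can exist even in the fixed-time regime (due to confounding bias). These enable doubly robust causal inference that can be continuously monitored and adaptively stopped.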


Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

Alden Green , Sivaraman Balakrishnan , Ryan J. Tibshirani

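Categories: Machine Learning (Statistics)

2021-11-14

This paper studies the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${\bf y} = (y_1, \ldots, y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $P$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s + d)}$) and goodness-of-fit testing ($n^{-4s/(4s + d)}$). We also show that PCR-LE is manifold adaptive: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s + m)}$) and testing ($n^{-4s/(4s + m)}$) rates of convergence. Interestingly, these rates are almost always much faster than the known rates of convergence of graph Laplacian eigenvectors. In other words, for this problem, regression with estimated features appears to be much easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.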
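
The projection step described in this abstract can be sketched in a few lines of Python. The code below is a minimal sketch under stated assumptions: it uses an unweighted epsilon-neighborhood graph and the unnormalized Laplacian L = D - W, and it keeps the eigenvectors with the k smallest eigenvalues. The function name `pcr_le`, the graph construction, and the parameters `radius` and `k` are illustrative choices, and the paper's exact construction and tuning may differ.

```python
import numpy as np

def pcr_le(x, y, radius, k):
    """Project responses y onto the span of the first k eigenvectors of a
    neighborhood graph Laplacian built from the design points x.

    x: (n, d) array of design points; y: (n,) array of responses.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Epsilon-neighborhood graph: connect points closer than `radius`.
    squared_distances = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    weights = (squared_distances <= radius**2).astype(float)
    np.fill_diagonal(weights, 0.0)
    laplacian = np.diag(weights.sum(axis=1)) - weights
    # eigh returns orthonormal eigenvectors sorted by ascending eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    basis = eigenvectors[:, :k]
    return basis @ (basis.T @ y)

# Usage: noisy observations of a smooth function on [0, 1].
rng = np.random.default_rng(0)
design = rng.uniform(size=(100, 1))
responses = np.sin(2 * np.pi * design[:, 0]) + 0.3 * rng.normal(size=100)
fitted = pcr_le(design, responses, radius=0.1, k=5)
print(fitted[:5])
```

The fitted values are defined only at the design points. Choosing k trades bias against variance, much like the number of components in ordinary principal components regression.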


An Interpretable and Efficient Infinite-Order Vector Autoregressive Model for High-Dimensional Time Series

Yao Zheng , Shibo Li

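Categories: Machine Learning (Statistics)

2022-09-02

As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model which not only avoids the drawbacks of the VARMA model but also inherits its favorable temporal patterns. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters determining the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved without sacrificing any temporal information. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding nonasymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.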


Debiased Machine Learning of Set-Identified Linear Models

Vira Semenova

Categories: Machine Learning (Statistics) | Machine Learning

2017-12-28

This paper provides estimation and inference methods for an identified set's boundary (i.e., support function) where the selection among a very large number of covariates is based on modern regularized tools. I characterize the boundary using a semiparametric moment equation. Combining Neyman-orthogonality and sample splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the boundary and propose a multiplier bootstrap procedure to conduct inference. I apply this result to the partially linear model, the partially linear IV model and the average partial derivative with an interval-valued outcome.


Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

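Categories: Machine Learning (Statistics)

2022-09-26

The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.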


Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay

Yuetian Luo , Anru R. Zhang

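Categories: Machine Learning | Machine Learning (Statistics)

2022-06-17

We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates with a low Tucker rank parameter tensor/matrix, without prior knowledge of its intrinsic rank. We propose the Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantee for general tensor-on-tensor regression by showing that RGD and RGN respectively converge linearly and quadratically to a statistically optimal estimate in both the rank correctly-parameterized and over-parameterized settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without modifications to their implementation. We also give the first rigorous evidence for the statistical-computational gap in scalar-on-tensor regression under the low-degree polynomials framework. Our theory demonstrates a "blessing of statistical-computational gap" phenomenon: in a wide range of scenarios in tensor-on-tensor regression for tensors of order three or higher, the sample size required for computation matches what is needed by moderate rank over-parameterization when considering computationally feasible estimators, while there are no such benefits in the matrix settings. This shows that moderate rank over-parameterization is essentially "cost-free" in terms of sample size in tensor-on-tensor regression of order three or higher. Finally, we conduct simulation studies to show the advantages of our proposed methods and to corroborate our theoretical findings.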


A Permutation-Free Kernel Independence Test

Shubhanshu Shekhar , Ilmun Kim , Aaditya Ramdas

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-18

In nonparametric independence testing, we observe i.i.d. data $\{(X_i,Y_i)\}_{i=1}^n$, where $X \in \mathcal{X}, Y \in \mathcal{Y}$ lie in any general spaces, and we wish to test the null that $X$ is independent of $Y$. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Thus, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. This paper provides a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced "cross" HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. This requires building on the newly developed theory of cross U-statistics by Kim and Ramdas (2020), and in particular developing several nontrivial extensions of the theory in Shekhar et al. (2022), which developed an analogous permutation-free kernel two-sample test. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the full dCov or HSIC, our variants have the same power up to a $\sqrt{2}$ factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.
