许多用于反问题和数据同化的现代算法都依赖于集成Kalman更新,以将先前的预测与观察到的数据融为一体。合奏Kalman方法通常具有小的合奏尺寸,这在生成每个粒子成本高昂的应用中至关重要。本文对合奏Kalman更新进行了非反应分析,该分析严格地解释了为什么由于快速衰减或近似稀疏性而导致先前的协方差具有适度的有效尺寸,那么小合奏的大小就足够了。我们在统一的框架中介绍了我们的理论,比较了使用扰动观测值,平方根滤波和本地化的集合卡尔曼更新的几个实现。作为我们分析的一部分,我们为可能具有独立感兴趣的大约稀疏矩阵开发了新的无维度协方差估计界限。
translated by 谷歌翻译
套索是一种高维回归的方法,当时,当协变量$ p $的订单数量或大于观测值$ n $时,通常使用它。由于两个基本原因,经典的渐近态性理论不适用于该模型:$(1)$正规风险是非平滑的; $(2)$估算器$ \ wideHat {\ boldsymbol {\ theta}} $与true参数vector $ \ boldsymbol {\ theta}^*$无法忽略。结果,标准的扰动论点是渐近正态性的传统基础。另一方面,套索估计器可以精确地以$ n $和$ p $大,$ n/p $的订单为一。这种表征首先是在使用I.I.D的高斯设计的情况下获得的。协变量:在这里,我们将其推广到具有非偏差协方差结构的高斯相关设计。这是根据更简单的``固定设计''模型表示的。我们在两个模型中各种数量的分布之间的距离上建立了非反应界限,它们在合适的稀疏类别中均匀地固定在信号上$ \ boldsymbol {\ theta}^*$。作为应用程序,我们研究了借助拉索的分布,并表明需要校正程度对于计算有效的置信区间是必要的。
translated by 谷歌翻译
我们考虑与高斯数据的高维线性回归中的插值学习,并在类高斯宽度方面证明了任意假设类别中的内插器的泛化误差。将通用绑定到欧几里德常规球恢复了Bartlett等人的一致性结果。(2020)对于最小规范内插器,并确认周等人的预测。(2020)在高斯数据的特殊情况下,对于近乎最小常态的内插器。我们通过将其应用于单位来证明所界限的一般性,从而获得最小L1-NORM Interpoolator(基础追踪)的新型一致性结果。我们的结果表明,基于规范的泛化界限如何解释并用于分析良性过度装备,至少在某些设置中。
translated by 谷歌翻译
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
translated by 谷歌翻译
我们研究了随机近似程序,以便基于观察来自ergodic Markov链的长度$ n $的轨迹来求近求解$ d -dimension的线性固定点方程。我们首先表现出$ t _ {\ mathrm {mix}} \ tfrac {n}} \ tfrac {n}} \ tfrac {d}} \ tfrac {d} {n} $的非渐近性界限。$ t _ {\ mathrm {mix $是混合时间。然后,我们证明了一种在适当平均迭代序列上的非渐近实例依赖性,具有匹配局部渐近最小的限制的领先术语,包括对参数$的敏锐依赖(d,t _ {\ mathrm {mix}}) $以高阶术语。我们将这些上限与非渐近Minimax的下限补充,该下限是建立平均SA估计器的实例 - 最优性。我们通过Markov噪声的政策评估导出了这些结果的推导 - 覆盖了所有$ \ lambda \中的TD($ \ lambda $)算法,以便[0,1)$ - 和线性自回归模型。我们的实例依赖性表征为HyperParameter调整的细粒度模型选择程序的设计开放了门(例如,在运行TD($ \ Lambda $)算法时选择$ \ lambda $的值)。
translated by 谷歌翻译
矩阵正常模型,高斯矩阵变化分布的系列,其协方差矩阵是两个较低尺寸因子的Kronecker乘积,经常用于模拟矩阵变化数据。张量正常模型将该家庭推广到三个或更多因素的Kronecker产品。我们研究了矩阵和张量模型中协方差矩阵的Kronecker因子的估计。我们向几个自然度量中的最大似然估计器(MLE)实现的误差显示了非因素界限。与现有范围相比,我们的结果不依赖于条件良好或稀疏的因素。对于矩阵正常模型,我们所有的所有界限都是最佳的对数因子最佳,对于张量正常模型,我们对最大因数和整体协方差矩阵的绑定是最佳的,所以提供足够的样品以获得足够的样品以获得足够的样品常量Frobenius错误。在与我们的样本复杂性范围相同的制度中,我们表明迭代程序计算称为触发器算法称为触发器算法的MLE的线性地收敛,具有高概率。我们的主要工具是Fisher信息度量诱导的正面矩阵的几何中的测地强凸性。这种强大的凸起由某些随机量子通道的扩展来决定。我们还提供了数值证据,使得将触发器算法与简单的收缩估计器组合可以提高缺乏采样制度的性能。
translated by 谷歌翻译
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
translated by 谷歌翻译
我们考虑估计与I.I.D的排名$ 1 $矩阵因素的问题。高斯,排名$ 1 $的测量值,这些测量值非线性转化和损坏。考虑到非线性的两种典型选择,我们研究了从随机初始化开始的此非convex优化问题的天然交流更新规则的收敛性能。我们通过得出确定性递归,即使在高维问题中也是准确的,我们显示出算法的样本分割版本的敏锐收敛保证。值得注意的是,虽然无限样本的种群更新是非信息性的,并提示单个步骤中的精确恢复,但算法 - 我们的确定性预测 - 从随机初始化中迅速地收敛。我们尖锐的非反应分析也暴露了此问题的其他几种细粒度,包括非线性和噪声水平如何影响收敛行为。从技术层面上讲,我们的结果可以通过证明我们的确定性递归可以通过我们的确定性顺序来预测我们的确定性序列,而当每次迭代都以$ n $观测来运行时,我们的确定性顺序可以通过$ n^{ - 1/2} $的波动。我们的技术利用了源自有关高维$ m $估计文献的遗留工具,并为通过随机数据的其他高维优化问题的随机初始化而彻底地分析了高阶迭代算法的途径。
translated by 谷歌翻译
本文为信号去噪提供了一般交叉验证框架。然后将一般框架应用于非参数回归方法,例如趋势过滤和二元推车。然后显示所得到的交叉验证版本以获得最佳调谐的类似物所熟知的几乎相同的收敛速度。没有任何先前的趋势过滤或二元推车的理论分析。为了说明框架的一般性,我们还提出并研究了两个基本估算器的交叉验证版本;套索用于高维线性回归和矩阵估计的奇异值阈值阈值。我们的一般框架是由Chatterjee和Jafarov(2015)的想法的启发,并且可能适用于使用调整参数的广泛估算方法。
translated by 谷歌翻译
Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
translated by 谷歌翻译
切成薄片的相互信息(SMI)定义为在随机变量的一维随机投影之间的平均值(MI)项。它是对经典MI依赖的替代度量,该量子保留了许多特性,但更可扩展到高维度。但是,对SMI本身和其估计率的定量表征取决于环境维度,这对于理解可伸缩性至关重要,仍然晦涩难懂。这项工作将原始的SMI定义扩展到$ K $ -SMI,该定义将预测视为$ k $维二维子空间,并提供了有关其依赖性尺寸的多方面帐户。在2-Wasserstein指标中使用差分熵连续性的新结果,我们对Monte Carlo(MC)基于$ K $ -SMI的估计的错误得出了尖锐的界限,并明确依赖于$ K $和环境维度,揭示了他们与样品数量的相互作用。然后,我们将MC Integrator与神经估计框架相结合,以提供端到端$ K $ -SMI估算器,为此建立了最佳的收敛率。随着尺寸的增长,我们还探索了人口$ k $ -smi的渐近学,从而为高斯近似结果提供了在适当的力矩范围下衰减的残差。我们的理论通过数值实验验证,并适用于切片Infogan,该切片完全提供了$ k $ -smi的可伸缩性问题的全面定量说明,包括SMI作为特殊情况,当$ k = 1 $。
translated by 谷歌翻译
近似消息传递(AMP)是解决高维统计问题的有效迭代范式。但是,当迭代次数超过$ o \ big(\ frac {\ log n} {\ log log \ log \ log n} \时big)$(带有$ n $问题维度)。为了解决这一不足,本文开发了一个非吸附框架,用于理解峰值矩阵估计中的AMP。基于AMP更新的新分解和可控的残差项,我们布置了一个分析配方,以表征在存在独立初始化的情况下AMP的有限样本行为,该过程被进一步概括以进行光谱初始化。作为提出的分析配方的两个具体后果:(i)求解$ \ mathbb {z} _2 $同步时,我们预测了频谱初始化AMP的行为,最高为$ o \ big(\ frac {n} {\ mathrm {\ mathrm { poly} \ log n} \ big)$迭代,表明该算法成功而无需随后的细化阶段(如最近由\ citet {celentano2021local}推测); (ii)我们表征了稀疏PCA中AMP的非反应性行为(在尖刺的Wigner模型中),以广泛的信噪比。
translated by 谷歌翻译
现代神经网络通常以强烈的过度构造状态运行:它们包含许多参数,即使实际标签被纯粹随机的标签代替,它们也可以插入训练集。尽管如此,他们在看不见的数据上达到了良好的预测错误:插值训练集并不会导致巨大的概括错误。此外,过度散色化似乎是有益的,因为它简化了优化景观。在这里,我们在神经切线(NT)制度中的两层神经网络的背景下研究这些现象。我们考虑了一个简单的数据模型,以及各向同性协变量的矢量,$ d $尺寸和$ n $隐藏的神经元。我们假设样本量$ n $和尺寸$ d $都很大,并且它们在多项式上相关。我们的第一个主要结果是对过份术的经验NT内核的特征结构的特征。这种表征意味着必然的表明,经验NT内核的最低特征值在$ ND \ gg n $后立即从零界限,因此网络可以在同一制度中精确插值任意标签。我们的第二个主要结果是对NT Ridge回归的概括误差的表征,包括特殊情况,最小值-ULL_2 $ NORD插值。我们证明,一旦$ nd \ gg n $,测试误差就会被内核岭回归之一相对于无限宽度内核而近似。多项式脊回归的误差依次近似后者,从而通过与激活函数的高度组件相关的“自我诱导的”项增加了正则化参数。多项式程度取决于样本量和尺寸(尤其是$ \ log n/\ log d $)。
translated by 谷歌翻译
随机奇异值分解(RSVD)是用于计算大型数据矩阵截断的SVD的一类计算算法。给定A $ n \ times n $对称矩阵$ \ mathbf {m} $,原型RSVD算法输出通过计算$ \ mathbf {m mathbf {m} $的$ k $引导singular vectors的近似m}^{g} \ mathbf {g} $;这里$ g \ geq 1 $是一个整数,$ \ mathbf {g} \ in \ mathbb {r}^{n \ times k} $是一个随机的高斯素描矩阵。在本文中,我们研究了一般的“信号加上噪声”框架下的RSVD的统计特性,即,观察到的矩阵$ \ hat {\ mathbf {m}} $被认为是某种真实但未知的加法扰动信号矩阵$ \ mathbf {m} $。我们首先得出$ \ ell_2 $(频谱规范)和$ \ ell_ {2 \ to \ infty} $(最大行行列$ \ ell_2 $ norm)$ \ hat {\ hat {\ Mathbf {M}} $和信号矩阵$ \ Mathbf {M} $的真实单数向量。这些上限取决于信噪比(SNR)和功率迭代$ g $的数量。观察到一个相变现象,其中较小的SNR需要较大的$ g $值以保证$ \ ell_2 $和$ \ ell_ {2 \ to \ fo \ infty} $ distances的收敛。我们还表明,每当噪声矩阵满足一定的痕量生长条件时,这些相变发生的$ g $的阈值都会很清晰。最后,我们得出了近似奇异向量的行波和近似矩阵的进入波动的正常近似。我们通过将RSVD的几乎最佳性能保证在应用于三个统计推断问题的情况下,即社区检测,矩阵完成和主要的组件分析,并使用缺失的数据来说明我们的理论结果。
translated by 谷歌翻译
We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.
translated by 谷歌翻译
We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a H\"older class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincar\'e inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and H\"older transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.
translated by 谷歌翻译
在本文中,我们提出了一种均匀抖动的一位量化方案,以进行高维统计估计。该方案包含截断,抖动和量化,作为典型步骤。作为规范示例,量化方案应用于三个估计问题:稀疏协方差矩阵估计,稀疏线性回归和矩阵完成。我们研究了高斯和重尾政权,假定重尾数据的基本分布具有有限的第二或第四刻。对于每个模型,我们根据一位量化的数据提出新的估计器。在高斯次级政权中,我们的估计器达到了对数因素的最佳最小速率,这表明我们的量化方案几乎没有额外的成本。在重尾状态下,虽然我们的估计量基本上变慢,但这些结果是在这种单位量化和重型尾部设置中的第一个结果,或者比现有可比结果表现出显着改善。此外,我们为一位压缩传感和一位矩阵完成的问题做出了巨大贡献。具体而言,我们通过凸面编程将一位压缩感传感扩展到次高斯甚至是重尾传感向量。对于一位矩阵完成,我们的方法与标准似然方法基本不同,并且可以处理具有未知分布的预量化随机噪声。提出了有关合成数据的实验结果,以支持我们的理论分析。
translated by 谷歌翻译
在因果推理和强盗文献中,基于观察数据的线性功能估算线性功能的问题是规范的。我们分析了首先估计治疗效果函数的广泛的两阶段程序,然后使用该数量来估计线性功能。我们证明了此类过程的均方误差上的非反应性上限:这些边界表明,为了获得非反应性最佳程序,应在特定加权$ l^2 $中最大程度地估算治疗效果的误差。 -规范。我们根据该加权规范的约束回归分析了两阶段的程序,并通过匹配非轴突局部局部最小值下限,在有限样品中建立了实例依赖性最优性。这些结果表明,除了取决于渐近效率方差之外,最佳的非质子风险除了取决于样本量支持的最富有函数类别的真实结果函数与其近似类别之间的加权规范距离。
translated by 谷歌翻译
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory can be applied to problems like average reward policy evaluation problem in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes.
translated by 谷歌翻译
我们研究了称为“乐观速率”(Panchenko 2002; Srebro等,2010)的统一收敛概念,用于与高斯数据的线性回归。我们的精致分析避免了现有结果中的隐藏常量和对数因子,这已知在高维设置中至关重要,特别是用于了解插值学习。作为一个特殊情况,我们的分析恢复了Koehler等人的保证。(2021年),在良性过度的过度条件下,严格地表征了低规范内插器的人口风险。但是,我们的乐观速度绑定还分析了具有任意训练错误的预测因子。这使我们能够在随机设计下恢复脊和套索回归的一些经典统计保障,并有助于我们在过度参数化制度中获得精确了解近端器的过度风险。
translated by 谷歌翻译