智能论文笔记

Cross-validation for change-point regression: pitfalls and solutions

Florian Pein , Rajen D. Shah

分类： (统计)机器学习

2021-12-06

交叉验证是在许多非参数回归问题中调整参数选择的标准方法。然而，它在变化点回归中的使用不太常见，也许由于其预测误差的标准可能似乎允许小的虚假变化，因此不太适合估计变化点的数量和位置。我们表明，实际上，具有平方误差损失的交叉验证问题更严重，可以导致系统的减少或过度估计变化点的数量，以及在更改的简单设置中的平均功能的高度次优估计很容易检测到。我们提出了两种简单的方法来解决这些问题，第一个涉及使用绝对误差而不是平方误差损失，以及第二个涉及修改所用的熔断集。对于后者，我们提供了允许一致估计一般变更点估计程序的变化点数的条件。我们显示这些条件对于使用新结果的最佳分区满足其在提供错误数量的更改点时的性能。数值实验表明，特别是当错误分布良好的调整参数选择时，特别是使用经典调谐参数选择的绝对误差方法竞争，但可以在错过的模型中显着优于这些。 CRAN上的R包CrossValidationCP中提供了我们的方法。

translated by 谷歌翻译

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

分类： (统计)机器学习

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

translated by 谷歌翻译

A Cross Validation framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond

Anamitra Chaudhuri , Sabyasachi Chatterjee

分类： (统计)机器学习

2022-01-07

本文为信号去噪提供了一般交叉验证框架。然后将一般框架应用于非参数回归方法，例如趋势过滤和二元推车。然后显示所得到的交叉验证版本以获得最佳调谐的类似物所熟知的几乎相同的收敛速度。没有任何先前的趋势过滤或二元推车的理论分析。为了说明框架的一般性，我们还提出并研究了两个基本估算器的交叉验证版本;套索用于高维线性回归和矩阵估计的奇异值阈值阈值。我们的一般框架是由Chatterjee和Jafarov（2015）的想法的启发，并且可能适用于使用调整参数的广泛估算方法。

translated by 谷歌翻译

Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

Vishal Gupta , Michael Huang , Paat Rusmevichientong

分类：机器学习 | (统计)机器学习

2021-07-26

由于在数据稀缺的设置中，交叉验证的性能不佳，我们提出了一个新颖的估计器，以估计数据驱动的优化策略的样本外部性能。我们的方法利用优化问题的灵敏度分析来估计梯度关于数据中噪声量的最佳客观值，并利用估计的梯度将策略的样本中的表现为依据。与交叉验证技术不同，我们的方法避免了为测试集牺牲数据，在训练和因此非常适合数据稀缺的设置时使用所有数据。我们证明了我们估计量的偏见和方差范围，这些问题与不确定的线性目标优化问题，但已知的，可能是非凸的，可行的区域。对于更专业的优化问题，从某种意义上说，可行区域“弱耦合”，我们证明结果更强。具体而言，我们在估算器的错误上提供明确的高概率界限，该估计器在策略类别上均匀地保持，并取决于问题的维度和策略类的复杂性。我们的边界表明，在轻度条件下，随着优化问题的尺寸的增长，我们的估计器的误差也会消失，即使可用数据的量仍然很小且恒定。说不同的是，我们证明我们的估计量在小型数据中的大规模政权中表现良好。最后，我们通过数值将我们提出的方法与最先进的方法进行比较，通过使用真实数据调度紧急医疗响应服务的案例研究。我们的方法提供了更准确的样本外部性能估计，并学习了表现更好的政策。

translated by 谷歌翻译

Modelling High-Dimensional Categorical Data Using Nonconvex Fusion Penalties

Benjamin G. Stokell , Rajen D. Shah , Ryan J. Tibshirani

分类： (统计)机器学习

2020-02-28

我们提出了一种估计具有标称分类数据的高维线性模型的方法。我们的估算器，称为范围，通过使其相应的系数完全相等来融合水平。这是通过对分类变量的系数的阶数统计之间的差异之间的差异来实现这一点，从而聚类系数。我们提供了一种算法，用于精确和有效地计算在具有潜在许多级别的单个变量的情况下的总体上的最小值的全局最小值，并且在多变量情况下在块坐标血管下降过程中使用它。我们表明，利用未知级别融合的Oracle最小二乘解决方案是具有高概率的坐标血缘的极限点，只要真正的级别具有一定的最小分离;已知这些条件在单变量案例中最小。我们展示了在一系列实际和模拟数据集中的范围的有利性能。 R包的R包Catreg实现线性模型的范围，也可以在CRAN上提供逻辑回归的版本。

translated by 谷歌翻译

Jump Interval-Learning for Individualized Decision Making

Hengrui Cai , Chengchun Shi , Rui Song , Wenbin Lu

分类：机器学习 | (统计)机器学习

2021-11-17

个性化决定规则（IDR）是一个决定函数，可根据他/她观察到的特征分配给定的治疗。文献中的大多数现有工作考虑使用二进制或有限的许多治疗方案的设置。在本文中，我们专注于连续治疗设定，并提出跳跃间隔 - 学习，开发一个最大化预期结果的个性化间隔值决定规则（I2DR）。与推荐单一治疗的IDRS不同，所提出的I2DR为每个人产生了一系列治疗方案，使其在实践中实施更加灵活。为了获得最佳I2DR，我们的跳跃间隔学习方法估计通过跳转惩罚回归给予治疗和协变量的结果的条件平均值，并基于估计的结果回归函数来衍生相应的最佳I2DR。允许回归线是用于清晰的解释或深神经网络的线性，以模拟复杂的处理 - 协调会相互作用。为了实现跳跃间隔学习，我们开发了一种基于动态编程的搜索算法，其有效计算结果回归函数。当结果回归函数是处理空间的分段或连续功能时，建立所得I2DR的统计特性。我们进一步制定了一个程序，以推断（估计）最佳政策下的平均结果。进行广泛的模拟和对华法林研究的真实数据应用，以证明所提出的I2DR的经验有效性。

translated by 谷歌翻译

On LASSO for High Dimensional Predictive Regression

Ziwei Mei , Zhentao Shi

分类： (统计)机器学习

2022-12-14

In a high dimensional linear predictive regression where the number of potential predictors can be larger than the sample size, we consider using LASSO, a popular L1-penalized regression method, to estimate the sparse coefficients when many unit root regressors are present. Consistency of LASSO relies on two building blocks: the deviation bound of the cross product of the regressors and the error term, and the restricted eigenvalue of the Gram matrix of the regressors. In our setting where unit root regressors are driven by temporal dependent non-Gaussian innovations, we establish original probabilistic bounds for these two building blocks. The bounds imply that the rates of convergence of LASSO are different from those in the familiar cross sectional case. In practical applications given a mixture of stationary and nonstationary predictors, asymptotic guarantee of LASSO is preserved if all predictors are scale-standardized. In an empirical example of forecasting the unemployment rate with many macroeconomic time series, strong performance is delivered by LASSO when the initial specification is guided by macroeconomic domain expertise.

translated by 谷歌翻译

Perturbation Analysis of Randomized SVD and its Applications to High-dimensional Statistics

Yichi Zhang , Minh Tang

分类： (统计)机器学习

2022-03-19

随机奇异值分解（RSVD）是用于计算大型数据矩阵截断的SVD的一类计算算法。给定A $ n \ times n $对称矩阵$ \ mathbf {m} $，原型RSVD算法输出通过计算$ \ mathbf {m mathbf {m} $的$ k $引导singular vectors的近似m}^{g} \ mathbf {g} $;这里$ g \ geq 1 $是一个整数，$ \ mathbf {g} \ in \ mathbb {r}^{n \ times k} $是一个随机的高斯素描矩阵。在本文中，我们研究了一般的“信号加上噪声”框架下的RSVD的统计特性，即，观察到的矩阵$ \ hat {\ mathbf {m}} $被认为是某种真实但未知的加法扰动信号矩阵$ \ mathbf {m} $。我们首先得出$ \ ell_2 $（频谱规范）和$ \ ell_ {2 \ to \ infty} $（最大行行列$ \ ell_2 $ norm）$ \ hat {\ hat {\ Mathbf {M}} $和信号矩阵$ \ Mathbf {M} $的真实单数向量。这些上限取决于信噪比（SNR）和功率迭代$ g $的数量。观察到一个相变现象，其中较小的SNR需要较大的$ g $值以保证$ \ ell_2 $和$ \ ell_ {2 \ to \ fo \ infty} $ distances的收敛。我们还表明，每当噪声矩阵满足一定的痕量生长条件时，这些相变发生的$ g $的阈值都会很清晰。最后，我们得出了近似奇异向量的行波和近似矩阵的进入波动的正常近似。我们通过将RSVD的几乎最佳性能保证在应用于三个统计推断问题的情况下，即社区检测，矩阵完成和主要的组件分析，并使用缺失的数据来说明我们的理论结果。

translated by 谷歌翻译

Tractability from overparametrization: The example of the negative perceptron

Andrea Montanari , Yiqiao Zhong , Kangjie Zhou

分类：机器学习

2021-10-28

在负面的感知问题中，我们给出了$ n $数据点$（{\ boldsymbol x} _i，y_i）$，其中$ {\ boldsymbol x} _i $是$ d $ -densional vector和$ y_i \ in \ { + 1，-1 \} $是二进制标签。数据不是线性可分离的，因此我们满足自己的内容，以找到最大的线性分类器，具有最大的\ emph {否定}余量。换句话说，我们想找到一个单位常规矢量$ {\ boldsymbol \ theta} $，最大化$ \ min_ {i \ le n} y_i \ langle {\ boldsymbol \ theta}，{\ boldsymbol x} _i \ rangle $ 。这是一个非凸优化问题（它相当于在Polytope中找到最大标准矢量），我们在两个随机模型下研究其典型属性。我们考虑比例渐近，其中$ n，d \ to \ idty $以$ n / d \ to \ delta $，并在最大边缘$ \ kappa _ {\ text {s}}（\ delta）上证明了上限和下限）$或 - 等效 - 在其逆函数$ \ delta _ {\ text {s}}（\ kappa）$。换句话说，$ \ delta _ {\ text {s}}（\ kappa）$是overparametization阈值：以$ n / d \ le \ delta _ {\ text {s}}（\ kappa） - \ varepsilon $一个分类器实现了消失的训练错误，具有高概率，而以$ n / d \ ge \ delta _ {\ text {s}}（\ kappa）+ \ varepsilon $。我们在$ \ delta _ {\ text {s}}（\ kappa）$匹配，以$ \ kappa \ to - \ idty $匹配。然后，我们分析了线性编程算法来查找解决方案，并表征相应的阈值$ \ delta _ {\ text {lin}}（\ kappa）$。我们观察插值阈值$ \ delta _ {\ text {s}}（\ kappa）$和线性编程阈值$ \ delta _ {\ text {lin {lin}}（\ kappa）$之间的差距，提出了行为的问题其他算法。

translated by 谷歌翻译

On Low-rank Trace Regression under General Sampling Distribution

Nima Hamidi , Mohsen Bayati

分类：机器学习 | (统计)机器学习

2019-04-18

In this paper, we study the trace regression when a matrix of parameters B* is estimated via the convex relaxation of a rank-regularized regression or via regularized non-convex optimization. It is known that these estimators satisfy near-optimal error bounds under assumptions on the rank, coherence, and spikiness of B*. We start by introducing a general notion of spikiness for B* that provides a generic recipe to prove the restricted strong convexity of the sampling operator of the trace regression and obtain near-optimal and non-asymptotic error bounds for the estimation error. Similar to the existing literature, these results require the regularization parameter to be above a certain theory-inspired threshold that depends on observation noise that may be unknown in practice. Next, we extend the error bounds to cases where the regularization parameter is chosen via cross-validation. This result is significant in that existing theoretical results on cross-validated estimators (Kale et al., 2011; Kumar et al., 2013; Abou-Moustafa and Szepesvari, 2017) do not apply to our setting since the estimators we study are not known to satisfy their required notion of stability. Finally, using simulations on synthetic and real data, we show that the cross-validated estimator selects a near-optimal penalty parameter and outperforms the theory-inspired approach of selecting the parameter.

translated by 谷歌翻译

Robust Batch Policy Learning in Markov Decision Processes

Zhengling Qi , Peng Liao

分类：机器学习 | (统计)机器学习

2020-11-09

我们研究马尔可夫决策过程（MDP）框架中的离线数据驱动的顺序决策问题。为了提高学习政策的概括性和适应性，我们建议通过一套关于在政策诱导的固定分配所在的分发的一套平均奖励来评估每项政策。给定由某些行为策略生成的多个轨迹的预收集数据集，我们的目标是在预先指定的策略类中学习一个强大的策略，可以最大化此集的最小值。利用半参数统计的理论，我们开发了一种统计上有效的策略学习方法，用于估算DE NED强大的最佳政策。在数据集中的总决策点方面建立了达到对数因子的速率最佳遗憾。

translated by 谷歌翻译

On lower bounds for the bias-variance trade-off

Alexis Derumigny , Johannes Schmidt-Hieber

分类： (统计)机器学习

2020-05-30

对于高维和非参数统计模型，速率最优估计器平衡平方偏差和方差是一种常见的现象。虽然这种平衡被广泛观察到，但很少知道是否存在可以避免偏差和方差之间的权衡的方法。我们提出了一般的策略，以获得对任何估计方差的下限，偏差小于预先限定的界限。这表明偏差差异折衷的程度是不可避免的，并且允许量化不服从其的方法的性能损失。该方法基于许多抽象的下限，用于涉及关于不同概率措施的预期变化以及诸如Kullback-Leibler或Chi-Sque-diversence的信息措施的变化。其中一些不平等依赖于信息矩阵的新概念。在该物品的第二部分中，将抽象的下限应用于几种统计模型，包括高斯白噪声模型，边界估计问题，高斯序列模型和高维线性回归模型。对于这些特定的统计应用，发生不同类型的偏差差异发生，其实力变化很大。对于高斯白噪声模型中集成平方偏置和集成方差之间的权衡，我们将较低界限的一般策略与减少技术相结合。这允许我们将原始问题与估计的估计器中的偏差折衷联动，以更简单的统计模型中具有额外的对称性属性。在高斯序列模型中，发生偏差差异的不同相位转换。虽然偏差和方差之间存在非平凡的相互作用，但是平方偏差的速率和方差不必平衡以实现最小估计速率。

translated by 谷歌翻译

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

分类： (统计)机器学习

2022-09-26

在因果推理和强盗文献中，基于观察数据的线性功能估算线性功能的问题是规范的。我们分析了首先估计治疗效果函数的广泛的两阶段程序，然后使用该数量来估计线性功能。我们证明了此类过程的均方误差上的非反应性上限：这些边界表明，为了获得非反应性最佳程序，应在特定加权$ l^2 $中最大程度地估算治疗效果的误差。 -规范。我们根据该加权规范的约束回归分析了两阶段的程序，并通过匹配非轴突局部局部最小值下限，在有限样品中建立了实例依赖性最优性。这些结果表明，除了取决于渐近效率方差之外，最佳的非质子风险除了取决于样本量支持的最富有函数类别的真实结果函数与其近似类别之间的加权规范距离。

translated by 谷歌翻译

How do noise tails impact on deep ReLU networks?

Jianqing Fan , Yihong Gu , Wen-Xin Zhou

分类： (统计)机器学习

2022-03-20

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function and the difference between these two functions furnishes a low bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.

translated by 谷歌翻译

A Non-Asymptotic Framework for Approximate Message Passing in Spiked Models

Gen Li , Yuting Wei

分类：机器学习 | (统计)机器学习

2022-08-05

近似消息传递（AMP）是解决高维统计问题的有效迭代范式。但是，当迭代次数超过$ o \ big（\ frac {\ log n} {\ log log \ log \ log n} \时big）$（带有$ n $问题维度）。为了解决这一不足，本文开发了一个非吸附框架，用于理解峰值矩阵估计中的AMP。基于AMP更新的新分解和可控的残差项，我们布置了一个分析配方，以表征在存在独立初始化的情况下AMP的有限样本行为，该过程被进一步概括以进行光谱初始化。作为提出的分析配方的两个具体后果：（i）求解$ \ mathbb {z} _2 $同步时，我们预测了频谱初始化AMP的行为，最高为$ o \ big（\ frac {n} {\ mathrm {\ mathrm { poly} \ log n} \ big）$迭代，表明该算法成功而无需随后的细化阶段（如最近由\ citet {celentano2021local}推测）; （ii）我们表征了稀疏PCA中AMP的非反应性行为（在尖刺的Wigner模型中），以广泛的信噪比。

translated by 谷歌翻译

Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

分类： (统计)机器学习

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.

translated by 谷歌翻译

High-dimensional Inference for Dynamic Treatment Effects

Jelena Bradic , Weijie Ji , Yuqian Zhang

分类：机器学习 | (统计)机器学习

2021-10-10

本文提出了在多阶段实验的背景下的异质治疗效应的置信区间结构，以$ N $样品和高维，$ D $，混淆。我们的重点是$ d \ gg n $的情况，但获得的结果也适用于低维病例。我们展示了正则化估计的偏差，在高维变焦空间中不可避免，具有简单的双重稳固分数。通过这种方式，不需要额外的偏差，并且我们获得root $ N $推理结果，同时允许治疗和协变量的多级相互依赖性。记忆财产也没有假设;治疗可能取决于所有先前的治疗作业以及以前的所有多阶段混淆。我们的结果依赖于潜在依赖的某些稀疏假设。我们发现具有动态处理的强大推理所需的新产品率条件。

translated by 谷歌翻译

Nearly Optimal Latent State Decoding in Block MDPs

Yassir Jedra , Junghyun Lee , Alexandre Proutière , Se-Young Yun

分类：机器学习 | (统计)机器学习

2022-08-17

我们研究了情节块MDP中模型估计和无奖励学习的问题。在这些MDP中，决策者可以访问少数潜在状态产生的丰富观察或上下文。我们首先对基于固定行为策略生成的数据估算潜在状态解码功能（从观测到潜在状态的映射）感兴趣。我们在估计此功能的错误率上得出了信息理论的下限，并提出了接近此基本限制的算法。反过来，我们的算法还提供了MDP的所有组件的估计值。然后，我们研究在无奖励框架中学习近乎最佳政策的问题。根据我们有效的模型估计算法，我们表明我们可以以最佳的速度推断出策略（随着收集样品的数量增长大）的最佳策略。有趣的是，我们的分析提供了必要和充分的条件，在这些条件下，利用块结构可以改善样本复杂性，以识别近乎最佳的策略。当满足这些条件时，Minimax无奖励设置中的样本复杂性将通过乘法因子$ n $提高，其中$ n $是可能的上下文数量。

translated by 谷歌翻译

The Lasso with general Gaussian designs with applications to hypothesis testing

Michael Celentano , Andrea Montanari , Yuting Wei

分类：机器学习 | (统计)机器学习

2020-07-27

套索是一种高维回归的方法，当时，当协变量$ p $的订单数量或大于观测值$ n $时，通常使用它。由于两个基本原因，经典的渐近态性理论不适用于该模型：$（1）$正规风险是非平滑的； $（2）$估算器$ \ wideHat {\ boldsymbol {\ theta}} $与true参数vector $ \ boldsymbol {\ theta}^*$无法忽略。结果，标准的扰动论点是渐近正态性的传统基础。另一方面，套索估计器可以精确地以$ n $和$ p $大，$ n/p $的订单为一。这种表征首先是在使用I.I.D的高斯设计的情况下获得的。协变量：在这里，我们将其推广到具有非偏差协方差结构的高斯相关设计。这是根据更简单的``固定设计''模型表示的。我们在两个模型中各种数量的分布之间的距离上建立了非反应界限，它们在合适的稀疏类别中均匀地固定在信号上$ \ boldsymbol {\ theta}^*$。作为应用程序，我们研究了借助拉索的分布，并表明需要校正程度对于计算有效的置信区间是必要的。

translated by 谷歌翻译

Error analysis for denoising smooth modulo signals on a graph

Hemant Tyagi

分类： (统计)机器学习

2020-09-10

在许多应用中，我们获得了流畅的函数的嘈杂模态样本的访问，其目标是鲁棒地解开样本，即估计该功能的原始样本。在最近的工作中，Cucuringu和Tyagi通过首先将它们代表在单元复杂圆上，然后解决平滑度规则化最小二乘问题 - Laplacian的平滑度适用的Proximity Graph的平滑度$ G $ - ON单位圆的产品歧管。这个问题是二次受约束的二次程序（QCQP），其是非凸显的，因此提出解决其球形放松导致信任区域子问题（TRS）。就理论担保而言，派生$ \ ell_2 $错误界限（trs）。然而，这些界限通常弱，并且没有真正证明由（TRS）进行的去噪。在这项工作中，我们分析（TRS）以及（QCQP）的不受约束的放松。对于这些估算器，我们在高斯噪声的设置中提供了一种精致的分析，并导出了噪音制度，其中他们可否证明模数观察W.R.T $ \ ell_2 $常规。分析在$ G $是任何连接的图形中的常规设置中进行。

translated by 谷歌翻译