Smart Paper Notes

Feature Selection using e-values

Subhabrata Majumdar , Snigdhansu Chatterjee

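Categories: Machine Learning (Statistics) | Machine Learning

2022-06-11

In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the parameter estimates in a model trained on a subset of features to those of the model trained on all features (i.e. the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. E-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values, and provide consistency results. For a $p$-dimensional feature space, this procedure requires fitting only the full model and evaluating $p+1$ models, as opposed to the traditional requirement of fitting and evaluating $2^p$ models. Through experiments across several model settings and on synthetic and real datasets, we establish that the e-values method is a promising general alternative to existing model-specific methods of feature selection.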

translated by Google Translate

The Lasso with general Gaussian designs with applications to hypothesis testing

Michael Celentano , Andrea Montanari , Yuting Wei

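Categories: Machine Learning | Machine Learning (Statistics)

2020-07-27

The Lasso is a method for high-dimensional regression that is now commonly used when the number of covariates $p$ is of the same order as, or larger than, the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: (1) the regularized risk is non-smooth; (2) the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, the standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained for Gaussian designs with i.i.d. covariates; here we generalize it to correlated Gaussian designs with non-singular covariance structure. The characterization is expressed in terms of a simpler "fixed-design" model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.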
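For readers who want a concrete handle on the estimator this abstract analyzes, here is a minimal sketch of the Lasso computed by proximal gradient descent (ISTA). It is a generic textbook solver, not the paper's analysis; the function names, the simulated design with i.i.d. Gaussian covariates, and the value of `lam` are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient descent."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n       # gradient of the least-squares term
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Hypothetical example in the regime p > n with a sparse signal.
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:5] = 3.0
y = X @ theta_star + 0.5 * rng.standard_normal(n)
theta_hat = lasso_ista(X, y, lam=0.2)
```

The soft-thresholding step is what produces exact zeros in `theta_hat`, and it is also the source of the non-smoothness that the abstract cites as an obstacle to classical asymptotic theory.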

Supervised Multivariate Learning with Simultaneous Feature Auto-grouping and Dimension Reduction

Yiyuan She , Jiahui Shen , Chao Zhang

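Categories: Machine Learning (Statistics) | Machine Learning

2021-12-17

Modern high-dimensional methods often adopt the "bet on sparsity" principle, while in supervised multivariate learning statisticians may face "dense" problems with a large number of nonzero coefficients. This paper proposes a novel clustered reduced-rank learning (CRL) framework that imposes two joint matrix regularizations to automatically group the features when constructing predictive factors. CRL is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection. New information-theoretic limits are presented that reveal the intrinsic cost of seeking clusters, as well as the blessing of dimensionality in multivariate learning. Moreover, an efficient optimization algorithm is developed that performs subspace learning and clustering with guaranteed convergence. The resulting fixed-point estimators, though not necessarily globally optimal, enjoy the desired statistical accuracy beyond the standard likelihood setup under some regularity conditions. In addition, a new kind of information criterion, together with its scale-free form, is proposed for cluster and rank selection; it has rigorous theoretical support without assuming an infinite sample size. Extensive simulations and real-data experiments demonstrate the statistical accuracy and interpretability of the proposed method.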

Modelling High-Dimensional Categorical Data Using Nonconvex Fusion Penalties

Benjamin G. Stokell , Rajen D. Shah , Ryan J. Tibshirani

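Categories: Machine Learning (Statistics)

2020-02-28

We propose a method for estimation in high-dimensional linear models with nominal categorical data. Our estimator, called SCOPE, fuses levels together by making their corresponding coefficients exactly equal. This is achieved by applying a nonconvex penalty to the differences between the order statistics of the coefficients for a categorical variable, thereby clustering the coefficients. We provide an algorithm for exact and efficient computation of the global minimum of the resulting objective in the case of a single variable with potentially many levels, and use it within a block coordinate descent procedure in the multivariate case. We show that an oracle least squares solution that exploits the unknown level fusions is a limit point of the coordinate descent with high probability, provided the true levels have a certain minimum separation; these conditions are known to be minimal in the univariate case. We demonstrate the favourable performance of SCOPE across a range of real and simulated datasets. An R package, CatReg, implementing SCOPE for linear models, along with a version for logistic regression, is available on CRAN.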

Data-Driven Sample Average Approximation with Covariate Information

Rohit Kannan , Güzin Bayraksan , James R. Luedtke

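Categories: Machine Learning (Statistics)

2022-07-27

We study optimization for data-driven decision-making when we have observations of the uncertain parameters within the optimization model together with concurrent observations of covariates. Given a new covariate observation, the goal is to choose a decision that minimizes the expected cost conditioned on this observation. We investigate three data-driven frameworks that integrate a machine learning prediction model within a stochastic programming sample average approximation (SAA) to approximate the solution to this problem. Two of the SAA frameworks are new and use out-of-sample residuals of leave-one-out prediction models for scenario generation. The frameworks we investigate are flexible and accommodate parametric, nonparametric, and semiparametric regression techniques. We derive conditions on the data generation process, the prediction model, and the stochastic program under which solutions of these data-driven SAAs are consistent and asymptotically optimal, and we also derive convergence rates and finite sample guarantees. Computational experiments validate our theoretical results, demonstrate the potential advantages of our data-driven formulations over existing approaches (even when the prediction model is misspecified), and illustrate the benefits of our new data-driven formulations in the limited data regime.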
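The residual-based scenario generation described above can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the prediction model is ordinary least squares, the stochastic program is a single-product newsvendor problem, and the function names and cost values are hypothetical. Scenarios are formed as the point prediction at the new covariate plus leave-one-out residuals, in the spirit of the leave-one-out frameworks the abstract mentions.

```python
import numpy as np

def loo_residuals(X, y):
    """Leave-one-out residuals of OLS via the identity e_i / (1 - h_ii)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    resid = y - H @ y
    return resid / (1.0 - np.diag(H))

def saa_newsvendor(scenarios, cost_over, cost_under):
    """Minimize the sample-average newsvendor cost over the scenario set.

    The objective is convex and piecewise linear with kinks at the scenarios,
    so some scenario value is always optimal.
    """
    def avg_cost(z):
        return np.mean(cost_over * np.maximum(z - scenarios, 0.0)
                       + cost_under * np.maximum(scenarios - z, 0.0))
    costs = [avg_cost(z) for z in scenarios]
    return scenarios[int(np.argmin(costs))]

# Hypothetical data: demand depends linearly on one covariate.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 5.0 + 2.0 * x + rng.standard_normal(n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
x_new = np.array([1.0, 4.0])                       # intercept and new covariate value
scenarios = x_new @ beta + loo_residuals(X, y)     # prediction plus out-of-sample residuals
z_star = saa_newsvendor(scenarios, cost_over=1.0, cost_under=3.0)
```

Because under-stocking is costlier than over-stocking in this example, the chosen order quantity `z_star` sits above the point prediction, near the 0.75 empirical quantile of the scenarios.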

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

Categories: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

Best Subset Selection in Reduced Rank Regression

Canhong Wen , Ruipeng Dong , Xueqin Wang , Weiyu Li , Heping Zhang

Categories: Machine Learning

2022-11-29

Sparse reduced rank regression is an essential statistical learning method. In the contemporary literature, estimation is typically formulated as a nonconvex optimization that often yields a local optimum in numerical computation. Yet the accompanying theoretical analysis is always centered on the global optimum, resulting in a discrepancy between the statistical guarantee and the numerical computation. In this research, we offer a new algorithm to address the problem and establish an almost optimal rate for the algorithmic solution. We also demonstrate that the algorithm achieves the estimation with a polynomial number of iterations. In addition, we present a generalized information criterion to simultaneously ensure the consistency of support set recovery and rank estimation. Under the proposed criterion, we show that our algorithm can achieve the oracle reduced rank estimation with a significant probability. The numerical studies and an application to ovarian cancer genetic data demonstrate the effectiveness and scalability of our approach.

Sparse Generalized Yule-Walker Estimation for Large Spatio-temporal Autoregressions with an Application to NO2 Satellite Data

Hanno Reuvers , Etienne Wijler

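Categories: Machine Learning (Statistics)

2021-08-05

We consider a high-dimensional model in which variables are observed over time and space. The model consists of a spatio-temporal regression containing a time lag and a spatial lag of the dependent variable. Unlike classical spatial autoregressive models, we do not rely on a predetermined spatial interaction matrix, but infer all spatial interactions from the data. Assuming sparsity, we estimate the spatial and temporal dependence in a fully data-driven way by penalizing a set of Yule-Walker equations. This regularization can be left unstructured, but we also propose customized shrinkage procedures for observations that originate from spatial grids (e.g. satellite images). Finite sample error bounds are derived, and estimation consistency is established in an asymptotic framework in which the sample size and the number of spatial units diverge jointly. Exogenous variables can be included as well. A simulation exercise shows strong finite sample performance compared to competing procedures. As an empirical application, we model satellite-measured NO2 concentrations in London. Our approach delivers forecast improvements over a competitive benchmark, and we find evidence of strong spatial interactions.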
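To make the phrase "a set of Yule-Walker equations" concrete, the sketch below estimates the coefficient matrix of a plain VAR(1) model, $y_t = A y_{t-1} + \varepsilon_t$, from the moment equation $\Gamma_1 = A \Gamma_0$. This is a deliberately simplified, unpenalized stand-in: the spatial lag and the sparsity penalty described in the abstract are omitted, and the function name and simulated data are assumptions.

```python
import numpy as np

def yule_walker_var1(Y):
    """Estimate A in y_t = A y_{t-1} + e_t from a (T, N) array via A = G1 @ inv(G0)."""
    Y = Y - Y.mean(axis=0)
    past, present = Y[:-1], Y[1:]
    G0 = past.T @ past / len(past)         # lag-0 autocovariance
    G1 = present.T @ past / len(past)      # lag-1 autocovariance E[y_t y_{t-1}^T]
    return np.linalg.solve(G0.T, G1.T).T   # solves A @ G0 = G1 without forming an inverse

# Hypothetical sparse dynamics: own lag plus one neighbouring unit.
rng = np.random.default_rng(2)
N, T = 5, 5000
A_true = 0.5 * np.eye(N) + 0.2 * np.eye(N, k=1)
Y = np.zeros((T, N))
for t in range(1, T):
    Y[t] = A_true @ Y[t - 1] + rng.standard_normal(N)
A_hat = yule_walker_var1(Y)
```

When the number of spatial units is large relative to the sample size, this unpenalized solve becomes unstable, which is the situation that motivates penalizing the equations.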

On LASSO for High Dimensional Predictive Regression

Ziwei Mei , Zhentao Shi

Categories: Machine Learning (Statistics)

2022-12-14

In a high dimensional linear predictive regression where the number of potential predictors can be larger than the sample size, we consider using LASSO, a popular L1-penalized regression method, to estimate the sparse coefficients when many unit root regressors are present. Consistency of LASSO relies on two building blocks: the deviation bound of the cross product of the regressors and the error term, and the restricted eigenvalue of the Gram matrix of the regressors. In our setting where unit root regressors are driven by temporal dependent non-Gaussian innovations, we establish original probabilistic bounds for these two building blocks. The bounds imply that the rates of convergence of LASSO are different from those in the familiar cross sectional case. In practical applications given a mixture of stationary and nonstationary predictors, asymptotic guarantee of LASSO is preserved if all predictors are scale-standardized. In an empirical example of forecasting the unemployment rate with many macroeconomic time series, strong performance is delivered by LASSO when the initial specification is guided by macroeconomic domain expertise.

Supervised Homogeneity Fusion: a Combinatorial Approach

Wen Wang , Shihao Wu , Ziwei Zhu , Ling Zhou , Peter X. -K. Song

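Categories: Machine Learning (Statistics) | Machine Learning

2022-01-04

Fusing regression coefficients into homogeneous groups can unveil those coefficients that share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and unleashes sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called $L_0$-Fusion that is amenable to mixed integer optimization (MIO). On the statistical side, we identify a fundamental quantity called grouping sensitivity that underpins the difficulty of recovering the true groups. We show that $L_0$-Fusion achieves grouping consistency under the weakest possible requirement on the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification fails to converge to zero. Moreover, we show that in the high-dimensional regime, one can apply $L_0$-Fusion coupled with a sure screening set of features without any essential loss of statistical efficiency, while substantially reducing the computational cost. On the algorithmic side, we provide an MIO formulation for $L_0$-Fusion along with a warm start strategy. Simulation and real data analysis demonstrate that $L_0$-Fusion is superior to its competitors in terms of grouping accuracy.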

Retire: Robust Expectile Regression in High Dimensions

Rebeka Man , Kean Ming Tan , Zian Wang , Wen-Xin Zhou

Categories: Machine Learning (Statistics)

2022-12-11

High-dimensional data can often display heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data. The former is computationally challenging due to the non-smooth nature of the check loss, and the latter is sensitive to heavy-tailed error distributions. In this paper, we propose and study (penalized) robust expectile regression (retire), with a focus on iteratively reweighted $\ell_1$-penalization which reduces the estimation bias from $\ell_1$-penalization and leads to oracle properties. Theoretically, we establish the statistical properties of the retire estimator under two regimes: (i) low-dimensional regime in which $d \ll n$; (ii) high-dimensional regime in which $s\ll n\ll d$ with $s$ denoting the number of significant predictors. In the high-dimensional setting, we carefully characterize the solution path of the iteratively reweighted $\ell_1$-penalized retire estimation, adapted from the local linear approximation algorithm for folded-concave regularization. Under a mild minimum signal strength condition, we show that after as many as $\log(\log d)$ iterations the final iterate enjoys the oracle convergence rate. At each iteration, the weighted $\ell_1$-penalized convex program can be efficiently solved by a semismooth Newton coordinate descent algorithm. Numerical studies demonstrate the competitive performance of the proposed procedure compared with either non-robust or quantile regression based alternatives.
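As background for the expectile regression discussed above, the following sketch computes a sample expectile, the minimizer of the asymmetric squared loss, by iteratively reweighted averaging. It covers only the plain, unpenalized, non-robust expectile of a single sample; the robustified loss and the reweighted $\ell_1$ penalty of the paper are not implemented, and the function names are illustrative assumptions.

```python
import numpy as np

def expectile_loss(u, tau):
    """Asymmetric squared loss |tau - 1(u < 0)| * u^2 / 2."""
    return np.abs(tau - (u < 0).astype(float)) * u ** 2 / 2.0

def sample_expectile(y, tau, n_iter=100):
    """tau-expectile of a sample via iteratively reweighted averaging."""
    m = float(np.mean(y))
    for _ in range(n_iter):
        w = np.where(y > m, tau, 1.0 - tau)   # heavier weight above m when tau > 0.5
        m_new = float(np.sum(w * y) / np.sum(w))
        converged = abs(m_new - m) < 1e-12
        m = m_new
        if converged:
            break
    return m

# Hypothetical sample with one large observation.
sample = np.array([0.0, 1.0, 2.0, 10.0])
upper = sample_expectile(sample, tau=0.9)
```

At `tau = 0.5` the weights are equal and the expectile reduces to the sample mean. The squared loss is what makes the expectile smooth to compute but sensitive to heavy tails, which is the trade-off the abstract describes.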

Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

Categories: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
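The combination of sample splitting and self-normalization can be sketched for the one-sample mean problem. The statistic below averages inner products between observations in the first half and the mean of the second half, which keeps only the off-diagonal block of the usual U-statistic, and then studentizes. This is an illustrative reading of the abstract under the assumption of an even split and a one-sided test; the function name is hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def cross_u_mean_test(X):
    """One-sample test of H0: E[X] = 0 using sample splitting and self-normalization.

    Returns the studentized statistic and a one-sided p-value from the N(0, 1) limit.
    """
    n = X.shape[0]
    half = n // 2
    X1, X2 = X[:half], X[half:]
    f = X1 @ X2.mean(axis=0)                 # inner products across the two halves only
    stat = sqrt(half) * f.mean() / f.std(ddof=1)
    p_value = 0.5 * erfc(stat / sqrt(2))     # P(Z > stat) for standard normal Z
    return float(stat), float(p_value)

# The abstract's example size: 100 samples in 20 dimensions.
rng = np.random.default_rng(6)
X_null = rng.standard_normal((100, 20))
X_alt = 1.0 + rng.standard_normal((100, 20))
stat_null, p_null = cross_u_mean_test(X_null)
stat_alt, p_alt = cross_u_mean_test(X_alt)
```

The same calibration against a standard normal is used whatever the ratio of dimension to sample size, which is the practical appeal of the dimension-agnostic approach.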

Understanding Implicit Regularization in Over-Parameterized Single Index Model

Jianqing Fan , Zhuoran Yang , Mengxin Yu

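Categories: Machine Learning (Statistics) | Machine Learning

2020-07-16

In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To better understand the role played by implicit regularization without excess technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the step size is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In addition, our experimental results support our theoretical findings and demonstrate that our methods empirically outperform classical methods with explicit regularization in terms of both the $\ell_2$-statistical rate and variable selection consistency.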
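The implicit regularization effect can be seen in a much simpler setting than the one the abstract treats. The sketch below assumes a noiseless linear model with identity link (no score function transform and no truncation step), writes the coefficient vector as $\beta = w \odot w - v \odot v$, and runs plain gradient descent from a small initialization with no penalty term. The parameterization, step size, and problem sizes are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def overparam_gd(X, y, alpha=1e-3, step=0.05, n_iter=1000):
    """Gradient descent on (1/(2n)) * ||y - X (w*w - v*v)||^2 with no penalty term."""
    n, p = X.shape
    w = np.full(p, alpha)   # small initialization near the origin
    v = np.full(p, alpha)
    for _ in range(n_iter):
        beta = w * w - v * v
        g = -X.T @ (y - X @ beta) / n   # gradient with respect to beta
        # Chain rule: dL/dw = 2 * w * g and dL/dv = -2 * v * g.
        w, v = w - step * 2.0 * w * g, v + step * 2.0 * v * g
    return w * w - v * v

# Hypothetical sparse problem with more features than observations.
rng = np.random.default_rng(3)
n, p = 200, 400
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:3] = 1.0
y = X @ beta_star
beta_hat = overparam_gd(X, y)
```

Coordinates grow multiplicatively from the small initial value, so those aligned with the signal grow quickly while the rest stay near zero; this is why a small initialization and a small step size matter in the abstract's statement.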

Approximate Post-Selective Inference for Regression with the Group LASSO

Snigdha Panigrahi , Peter W. MacDonald , Daniel Kessler

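Categories: Machine Learning (Statistics)

2020-12-31

After selection with the Group LASSO (or generalized variants such as the overlapping, sparse, or standardized Group LASSO), inference for the selected parameters is unreliable in the absence of adjustments for selection bias. In the penalized Gaussian regression setup, existing approaches provide adjustments for selection events that can be expressed as linear inequalities in the data variables. Such a representation, however, fails to hold for selection with the Group LASSO and substantially restricts the scope of subsequent post-selective inference. Key questions of inferential interest, for example inference for the effects of selected variables on the outcome, remain unanswered. In the present paper, we develop a consistent, post-selective, Bayesian method to address the existing gaps by deriving a likelihood adjustment factor, and an approximation thereof, that eliminates bias from the selection of groups. Experiments on simulated data and on data from the Human Connectome Project demonstrate that our method recovers the effects of parameters within the selected groups while paying only a small price for bias adjustment.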

Estimation and Inference on Heterogeneous Treatment Effects in High-Dimensional Dynamic Panels under Weak Dependence

Vira Semenova , Matt Goldman , Victor Chernozhukov , Matt Taddy

Categories: Machine Learning (Statistics)

2017-12-28

This paper provides estimation and inference methods for conditional average treatment effects (CATE) characterized by a high-dimensional parameter in both homogeneous cross-sectional and unit-heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization, where we partial out the controls and unit effects from the outcome and the base treatment and take the cross-fitted residuals. This step uses a novel generic cross-fitting method we design for weakly dependent time series and panel data. This method "leaves out the neighbors" when fitting nuisance components, and we theoretically power it by using Strassen's coupling. As a result, we can rely on any modern machine learning method in the first step, provided it learns the residuals well enough. Second, we construct an orthogonal (or residual) learner of CATE -- the Lasso CATE -- that regresses the outcome residual on the vector of interactions of the residualized treatment with explanatory variables. If the complexity of the CATE function is lower than that of the first-stage regression, the orthogonal learner converges faster than the single-stage regression-based learner. Third, we perform simultaneous inference on parameters of the CATE function using debiasing. We can also use ordinary least squares in the last two steps when CATE is low-dimensional. In heterogeneous panel data settings, we model the unobserved unit heterogeneity as a weakly sparse deviation from Mundlak (1978)'s model of correlated unit effects as a linear function of time-invariant covariates and make use of L1-penalization to estimate these models. We demonstrate our methods by estimating price elasticities of groceries based on scanner data. We note that our results are new even for the cross-sectional (i.i.d.) case.
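The first two steps above, cross-fitted partialling out followed by a residual-on-residual regression, can be sketched as follows. The sketch makes several simplifying assumptions that are not in the abstract: the nuisance fits are ordinary least squares, the treatment effect is a single homogeneous coefficient rather than a high-dimensional CATE, the folds are contiguous blocks, and "leaving out the neighbors" is interpreted as dropping the blocks adjacent to the held-out block. All function names are hypothetical.

```python
import numpy as np

def neighbor_excluding_folds(T, n_blocks):
    """Yield (train_idx, test_idx); training drops the test block and its adjacent blocks."""
    blocks = np.array_split(np.arange(T), n_blocks)
    for k, test_idx in enumerate(blocks):
        keep = [b for j, b in enumerate(blocks) if abs(j - k) > 1]
        yield np.concatenate(keep), test_idx

def cross_fitted_residuals(Z, target, n_blocks=10):
    """Out-of-fold residuals of `target` after a linear fit on the controls Z."""
    resid = np.empty(len(target), dtype=float)
    for train_idx, test_idx in neighbor_excluding_folds(len(target), n_blocks):
        coef = np.linalg.lstsq(Z[train_idx], target[train_idx], rcond=None)[0]
        resid[test_idx] = target[test_idx] - Z[test_idx] @ coef
    return resid

# Hypothetical data with a true treatment effect of 2.0.
rng = np.random.default_rng(5)
T = 2000
Z = np.column_stack([np.ones(T), rng.standard_normal((T, 3))])
D = Z @ np.array([0.0, 1.0, -1.0, 0.5]) + rng.standard_normal(T)
Y = 2.0 * D + Z @ np.array([1.0, 0.5, 0.5, -1.0]) + rng.standard_normal(T)

D_res = cross_fitted_residuals(Z, D)        # residualized treatment
Y_res = cross_fitted_residuals(Z, Y)        # residualized outcome
effect = (D_res @ Y_res) / (D_res @ D_res)  # residual-on-residual least squares
```

With dependent data, dropping the neighboring blocks keeps the training observations approximately independent of the held-out block; the simulated data here are independent, so the exclusion is shown only for its mechanics.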

Group structure estimation for panel data -- a general approach

Lu Yu , Jiaying Gu , Stanislav Volgushev

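Categories: Machine Learning (Statistics)

2022-01-05

Consider a panel data setting where repeated observations on individuals are available. It is often reasonable to assume that there exist groups of individuals that share similar effects of observed characteristics, but the grouping is typically unknown in advance. We propose a novel approach to estimate such unobserved groupings for general panel data models. Our method explicitly accounts for the uncertainty in individual parameter estimates and remains computationally feasible with a large number of individuals and/or repeated measurements on each individual. The developed ideas can be applied even when individual-level data are not available and the researcher is given only parameter estimates together with some quantification of uncertainty.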

Optimal Nonparametric Inference with Two-Scale Distributional Nearest Neighbors

Emre Demirkaya , Yingying Fan , Lan Gao , Jinchi Lv , Patrick Vossler , Jingbo Wang

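Categories: Machine Learning (Statistics) | Machine Learning

2018-08-25

The weighted nearest neighbors (WNN) estimator is widely used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically assigned to the nearest neighbors; we call the resulting estimator the distributional nearest neighbors (DNN) estimator for ease of reference. Yet there is a lack of distributional results for such an estimator, which limits its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of a bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach that linearly combines two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent WNN representation with weights that admit explicit forms, some of which are negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under a fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator for the two-scale DNN using the jackknife and bootstrap techniques. These estimators can be used to construct valid confidence intervals for nonparametric inference on the regression function. The theoretical results and the appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several numerical examples.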
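The explicit weights mentioned above can be sketched as follows. Averaging the 1-nearest-neighbor estimate over all subsamples of size $s$ puts weight $\binom{n-i}{s-1}/\binom{n}{s}$ on the $i$-th nearest neighbor, which follows from counting the subsamples in which that point is the closest one. For the two-scale combination, the sketch assumes the combining weights solve $w_1 + w_2 = 1$ and $w_1 s_1^{-2/d} + w_2 s_2^{-2/d} = 0$, so that a leading bias term of order $s^{-2/d}$ cancels; this form, like the function names, is an assumption not stated in the abstract.

```python
import numpy as np
from math import comb

def dnn_weights(n, s):
    """Weight of the i-th nearest neighbour (i = 1..n) in the bagged 1-NN estimator."""
    return np.array([comb(n - i, s - 1) / comb(n, s) for i in range(1, n + 1)])

def dnn_predict(X, y, x0, s):
    """DNN estimate at x0 with subsampling scale s."""
    order = np.argsort(np.linalg.norm(X - x0, axis=1))   # neighbours sorted by distance
    return float(dnn_weights(len(y), s) @ y[order])

def tdnn_predict(X, y, x0, s1, s2):
    """Two-scale DNN: linear combination of two DNN estimates (assumed weight form)."""
    d = X.shape[1]
    ratio = (s1 / s2) ** (-2.0 / d)
    w1 = 1.0 / (1.0 - ratio)    # negative when s1 < s2
    w2 = 1.0 - w1
    return w1 * dnn_predict(X, y, x0, s1) + w2 * dnn_predict(X, y, x0, s2)

# Hypothetical regression problem in two dimensions.
rng = np.random.default_rng(7)
n, d = 500, 2
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.cos(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
estimate = tdnn_predict(X, y, np.zeros(d), s1=20, s2=40)
```

With `s1 < s2` the first combining weight is negative, so some nearest-neighbor weights in the equivalent WNN representation are negative as well, matching the abstract's remark.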

A Splicing Approach to Best Subset of Groups Selection

Yanhang Zhang , Junxian Zhu , Jin Zhu , Xueqin Wang

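Categories: Machine Learning | Machine Learning (Statistics)

2021-04-23

Best subset of groups selection (BSGS) is the process of selecting a small number of non-overlapping groups that best explain the response variable. It has attracted increasing attention and has far-reaching applications in practice. However, because BSGS is computationally intractable in high-dimensional settings, developing efficient algorithms for solving it remains an active research topic. In this paper, we propose a group-splicing algorithm that iteratively detects the relevant groups and excludes the irrelevant ones. Moreover, coupled with a novel group information criterion, we develop an adaptive algorithm to determine the optimal model size. Under mild conditions, it is certifiable that our algorithm identifies the optimal subset of groups in polynomial time with high probability. Finally, we demonstrate the efficiency and accuracy of our methods by comparing them with several state-of-the-art algorithms on both synthetic and real-world datasets.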

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

Frederic Koehler , Lijia Zhou , Danica J. Sutherland , Nathan Srebro

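Categories: Machine Learning (Statistics) | Machine Learning

2021-06-17

We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum l1-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.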
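For concreteness, the minimum Euclidean norm interpolator referred to above can be computed with the Moore-Penrose pseudoinverse when there are more features than observations. The sketch below also builds a second interpolator by adding a null-space direction, to show that every other interpolator has a larger norm. The simulated Gaussian data and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 500
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

X_pinv = np.linalg.pinv(X)
w_min = X_pinv @ y                  # minimum l2-norm solution of X w = y

# Any other interpolator differs from w_min by a direction in the null space of X.
z = rng.standard_normal(p)
null_dir = z - X_pinv @ (X @ z)     # projection of z onto null(X)
w_other = w_min + null_dir

train_error_min = np.max(np.abs(X @ w_min - y))
train_error_other = np.max(np.abs(X @ w_other - y))
```

Because `w_min` lies in the row space of `X` and `null_dir` is orthogonal to it, the squared norm of `w_other` is the sum of the two squared norms, so `w_min` is the smallest interpolator.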

Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression

Lijia Zhou , Frederic Koehler , Danica J. Sutherland , Nathan Srebro

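Categories: Machine Learning (Statistics) | Machine Learning

2021-12-08

We study a notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data. Our refined analysis avoids the hidden constants and logarithmic factors in existing results, which are known to be crucial in high-dimensional settings, especially for understanding interpolation learning. As a special case, our analysis recovers the guarantee of Koehler et al. (2021), which tightly characterizes the population risk of low-norm interpolators under the benign overfitting conditions. Our optimistic rate bound, however, also analyzes predictors with arbitrary training error. This allows us to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps us obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.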
