Smart Paper Notes

Variable selection via nonconcave penalized likelihood and its oracle properties

Category:

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use.Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=astata.Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact

On the instrumental variable estimation with many weak and invalid instruments

Yiqi Lin , Frank Windmeijer , Xinyuan Song , Qingliang Fan

Category: Machine Learning (Statistics)

2022-07-07

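We discuss the fundamental problem of identification in linear instrumental variable (IV) models with unknown IV validity. We revisit the popular majority and plurality rules and show that, in general, no identification condition can be "if and only if". Assuming the "sparsest rule", which is equivalent to the plurality rule but becomes operational in computational algorithms, we investigate and prove the advantages of nonconvex penalized approaches over other IV estimators based on two-step selection, in terms of selection consistency and accommodation of individually weak IVs. Furthermore, we propose a surrogate sparsest penalty that aligns with the identification condition and simultaneously provides the oracle sparse structure. Desirable theoretical properties are derived for the proposed estimator under weaker IV strength conditions than in the previous literature. Finite-sample properties are demonstrated by simulation, and the selection and estimation method is applied to an empirical study of the effect of trade on economic growth.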

Sparse Horseshoe Estimation via Expectation-Maximisation

Shu Yu Tew , Daniel F. Schmidt , Enes Makalic

Category: Machine Learning (Statistics) | Machine Learning

2022-11-07

The horseshoe prior is known to possess many desirable properties for Bayesian estimation of sparse parameter vectors, yet its density function lacks an analytic form. As such, it is challenging to find a closed-form solution for the posterior mode. Conventional horseshoe estimators use the posterior mean to estimate the parameters, but these estimates are not sparse. We propose a novel expectation-maximisation (EM) procedure for computing the MAP estimates of the parameters in the case of the standard linear model. A particular strength of our approach is that the M-step depends only on the form of the prior and is independent of the form of the likelihood. We introduce several simple modifications of this EM procedure that allow for straightforward extension to generalised linear models. In experiments performed on simulated and real data, our approach performs comparably to, or better than, state-of-the-art sparse estimation methods in terms of statistical performance and computational cost.

Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

Rahul Mazumder , Peter Radchenko , Antoine Dedieu

Category: Machine Learning (Statistics)

2017-08-10

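We study a seemingly unexpected and relatively less understood overfitting aspect of a fundamental tool in sparse linear modeling, best subset selection, which minimizes the residual sum of squares subject to a constraint on the number of nonzero coefficients. While the best subset selection procedure is often perceived as the "gold standard" in sparse learning when the signal-to-noise ratio (SNR) is high, its predictive performance deteriorates when the SNR is low. In particular, it is outperformed by continuous shrinkage methods such as ridge regression and the Lasso. We investigate the behavior of best subset selection in the high-noise regime and propose an alternative approach based on a regularized version of the least-squares criterion. Our proposed estimators (a) largely mitigate the poor predictive performance of best subset selection in the high-noise regime; and (b) perform favorably relative to the best predictive models obtained via ridge regression and the Lasso, while generally delivering substantially sparser models. We conduct an extensive theoretical analysis of the predictive properties of the proposed approach and provide justification for its superior predictive performance relative to best subset selection when the noise level is high. Our estimators can be expressed as solutions to mixed integer second order conic optimization problems and, hence, are amenable to modern computational tools from mathematical optimization.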
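
A minimal brute-force sketch of a ridge-regularized best subset criterion, $\min_\beta \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2$ subject to $\|\beta\|_0 \le k$, assumed here as one instance of the "regularized version of the least-squares criterion" above. Exhaustive enumeration only works for a handful of predictors and stands in for the mixed integer second order cone solver; the function names and toy data are hypothetical.

```python
import itertools

import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def best_subset_ridge(X, y, k, lam):
    """Minimise ||y - X b||^2 + lam * ||b||^2 subject to ||b||_0 <= k by enumeration.

    Only supports of size exactly k are scanned: a smaller support is a special case
    of a larger one (extra coefficients set to zero), so it cannot do better.
    """
    p = X.shape[1]
    best_obj, best_beta = np.inf, None
    for support in itertools.combinations(range(p), k):
        cols = list(support)
        coef = ridge_fit(X[:, cols], y, lam)
        resid = y - X[:, cols] @ coef
        obj = resid @ resid + lam * coef @ coef
        if obj < best_obj:
            best_beta = np.zeros(p)
            best_beta[cols] = coef
            best_obj = obj
    return best_beta, best_obj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 8))
    y = X @ np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0]) + 0.5 * rng.standard_normal(100)
    beta_hat, _ = best_subset_ridge(X, y, k=2, lam=1.0)
    print(np.flatnonzero(beta_hat), beta_hat[np.flatnonzero(beta_hat)])
```

Increasing `lam` shrinks the retained coefficients toward zero, while `k` still caps the number of nonzeros; combining the two is what distinguishes this criterion from plain best subset selection.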

Mining the Factor Zoo: Estimation of Latent Factor Models with Sufficient Proxies

Runzhe Wan , Yingying Li , Wenbin Lu , Rui Song

Category: Machine Learning

2022-12-25

Latent factor model estimation typically relies on either using domain knowledge to manually pick several observed covariates as factor proxies, or purely conducting multivariate analysis such as principal component analysis. However, the former approach may suffer from bias, while the latter cannot incorporate additional information. We propose to bridge these two approaches while allowing the number of factor proxies to diverge, and hence make the latent factor model estimation robust, flexible, and statistically more accurate. As a bonus, the number of factors is also allowed to grow. At the heart of our method is a penalized reduced rank regression to combine information. To further deal with heavy-tailed data, a computationally attractive penalized robust reduced rank regression method is proposed. We establish faster rates of convergence compared with the benchmark. Extensive simulations and real examples are used to illustrate the advantages.

Retire: Robust Expectile Regression in High Dimensions

Rebeka Man , Kean Ming Tan , Zian Wang , Wen-Xin Zhou

Category: Machine Learning (Statistics)

2022-12-11

High-dimensional data can often display heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data. The former is computationally challenging due to the non-smooth nature of the check loss, and the latter is sensitive to heavy-tailed error distributions. In this paper, we propose and study (penalized) robust expectile regression (retire), with a focus on iteratively reweighted $\ell_1$-penalization which reduces the estimation bias from $\ell_1$-penalization and leads to oracle properties. Theoretically, we establish the statistical properties of the retire estimator under two regimes: (i) low-dimensional regime in which $d \ll n$; (ii) high-dimensional regime in which $s\ll n\ll d$ with $s$ denoting the number of significant predictors. In the high-dimensional setting, we carefully characterize the solution path of the iteratively reweighted $\ell_1$-penalized retire estimation, adapted from the local linear approximation algorithm for folded-concave regularization. Under a mild minimum signal strength condition, we show that after as many as $\log(\log d)$ iterations the final iterate enjoys the oracle convergence rate. At each iteration, the weighted $\ell_1$-penalized convex program can be efficiently solved by a semismooth Newton coordinate descent algorithm. Numerical studies demonstrate the competitive performance of the proposed procedure compared with either non-robust or quantile regression based alternatives.
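
The sketch below assumes the retire loss is an asymmetric Huber loss $|\tau - 1(u<0)|\,h_\gamma(u)$ and shows it together with the local linear approximation (LLA) weights for the next reweighted $\ell_1$ step, here taken from the SCAD penalty of Fan and Li (2001). The SCAD choice with $a = 3.7$ is an assumption, and the semismooth Newton coordinate descent solver is not shown.

```python
import numpy as np


def retire_loss(u, tau, gamma):
    """Asymmetric Huber loss |tau - 1(u < 0)| * huber_gamma(u) (assumed form)."""
    u = np.asarray(u, dtype=float)
    huber = np.where(np.abs(u) <= gamma, 0.5 * u**2, gamma * np.abs(u) - 0.5 * gamma**2)
    weight = np.where(u < 0, 1.0 - tau, tau)  # expectile-style asymmetric weight
    return weight * huber


def scad_derivative(t, lam, a=3.7):
    """Derivative p'_lam(t) of the SCAD penalty, evaluated at |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))


def lla_weights(beta_prev, lam, a=3.7):
    """Per-coefficient l1 weights for the next reweighted step (local linear approximation)."""
    return scad_derivative(beta_prev, lam, a)
```

Coefficients that were large in the previous iterate (beyond $a\lambda$) receive weight zero and are left unpenalized, which is how the reweighting removes the bias of a plain $\ell_1$ penalty.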

Distribution-Free Predictive Inference For Regression

Jing Lei , Max G'Sell , Alessandro Rinaldo , Ryan J. Tibshirani , Larry Wasserman

Category:

2016-04-14

We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
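
A minimal sketch of split conformal inference, using ordinary least squares with an intercept as the (arbitrary) regression estimator. The function name, the 50/50 split, and the OLS choice are assumptions; this is not the `conformalInference` package API.

```python
import numpy as np


def split_conformal(X, y, X_new, alpha=0.1, seed=0):
    """Split conformal prediction band around an OLS fit.

    Half the data fit the regression; the other half give absolute residuals whose
    ceil((n_cal + 1)(1 - alpha))-th smallest value sets the band half-width.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    train, cal = idx[: n // 2], idx[n // 2 :]

    def design(A):
        # Intercept column plus covariates; any regression estimator could replace OLS.
        return np.column_stack([np.ones(len(A)), A])

    coef, *_ = np.linalg.lstsq(design(X[train]), y[train], rcond=None)
    resid = np.abs(y[cal] - design(X[cal]) @ coef)

    n_cal = len(cal)
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))
    # With too few calibration points the only valid band is infinite.
    q = np.inf if k > n_cal else np.sort(resid)[k - 1]
    pred = design(X_new) @ coef
    return pred - q, pred + q
```

When $\lceil (n_{cal}+1)(1-\alpha)\rceil$ exceeds the calibration size the band is infinite, which is the price of the finite-sample marginal guarantee with very few calibration points.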

Adaptive LASSO estimation for functional hidden dynamic geostatistical model

Paolo Maranzano , Philipp Otto , Alessandro Fassò

Category: Machine Learning (Statistics)

2022-08-10

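We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (F-HDGM). These models employ a classic mixed-effect regression structure with embedded spatiotemporal dynamics to model georeferenced data observed in a functional domain. Thus, the parameters of interest are functions across this domain. The algorithm simultaneously selects the relevant spline basis functions and regressors that are used to model the fixed-effects relationship between the response variable and the covariates. In this way, it automatically shrinks to zero irrelevant parts of the functional coefficients or the entire effect of irrelevant regressors. The algorithm is based on iterative optimization and uses an adaptive least absolute shrinkage and selection operator (LASSO) penalty function, in which the weights are obtained from the unpenalized F-HDGM maximum likelihood estimators. The computational burden of maximization is drastically reduced by a local quadratic approximation of the likelihood. Through a Monte Carlo simulation study, we analyze the performance of the algorithm under different scenarios, including strong correlations among the regressors. We show that the penalized estimator outperforms the unpenalized estimator in all the cases we considered. We apply the algorithm to a real case study in which hourly nitrogen dioxide concentration records in the Lombardy region of Italy are modeled as a functional process with several weather and land cover covariates.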
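
A minimal sketch of the adaptive LASSO idea in a plain linear regression, standing in for the F-HDGM likelihood (whose local quadratic approximation plays the role of the least-squares loss here): weights $w_j = 1/|\hat\beta_j|^\gamma$ come from an unpenalized fit, and the weighted LASSO is solved by coordinate descent. Function names, $\gamma$, and the iteration count are illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_lasso_cd(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()  # resid = y - X beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Weights from an unpenalized (OLS) pilot fit: w_j = 1 / |b_j|^gamma."""
    pilot, *_ = np.linalg.lstsq(X, y, rcond=None)
    weights = 1.0 / (np.abs(pilot) ** gamma + 1e-12)
    return weighted_lasso_cd(X, y, lam, weights)
```

Coefficients with a small pilot estimate get a large weight and are pushed to exactly zero, while strong effects are only lightly penalized.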

A Data-Driven Line Search Rule for Support Recovery in High-dimensional Data Analysis

Peili Li , Yuling Jiao , Xiliang Lu , Lican Kang

Category: Machine Learning (Statistics) | Machine Learning

2021-11-21

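In this work, we consider algorithms for (nonlinear) regression problems with an $\ell_0$ penalty. Existing algorithms for $\ell_0$-based optimization problems usually use a fixed step size, and choosing an appropriate step size depends on the restricted strong convexity and smoothness of the loss function, which are difficult to compute in practice. Inspired by the ideas of support detection and root finding [HJK2020], we propose a novel and efficient data-driven line search rule to determine an appropriate step size adaptively. We prove an $\ell_2$ error bound for the proposed algorithm without strong restrictions on the cost function. Extensive numerical comparisons with state-of-the-art algorithms on linear and logistic regression problems show the stability, effectiveness, and superiority of the proposed algorithm.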
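
For contrast with the data-driven rule, a minimal sketch of the kind of fixed-step baseline the abstract refers to: proximal gradient (iterative hard thresholding) for $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_0$ with step $1/L$, where $L$ is the largest eigenvalue of $X^\top X/n$. The paper's line search rule is not reproduced; names and defaults are hypothetical.

```python
import numpy as np


def iht_l0(X, y, lam, n_iter=500):
    """Iterative hard thresholding for (1/2n)||y - X b||^2 + lam * ||b||_0, fixed step 1/L."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n     # Lipschitz constant of the gradient
    step = 1.0 / L
    # The proximal map of step * lam * ||.||_0 keeps z_j only if |z_j| > sqrt(2 * lam * step).
    thresh = np.sqrt(2.0 * lam * step)
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * X.T @ (X @ beta - y) / n
        beta = np.where(np.abs(z) > thresh, z, 0.0)
    return beta
```

The step $1/L$ requires a global smoothness constant; choosing the step without knowing such constants is the problem the proposed line search targets.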

Prediction Errors for Penalized Regressions based on Generalized Approximate Message Passing

Ayaka Sakata

Category: Machine Learning (Statistics) | Machine Learning

2022-06-26

We discuss the prediction accuracy of assumed statistical models in terms of prediction errors for the generalized linear model and penalized maximum likelihood methods. We derive the forms of estimators for the prediction errors, namely the $C_p$ criterion, information criteria, and the leave-one-out cross-validation (LOOCV) error, using the generalized approximate message passing (GAMP) algorithm and the replica method. These estimators coincide with each other when the number of model parameters is sufficiently small; however, they differ, in particular in the overparametrized region where the number of model parameters is larger than the data dimension. In this paper, we review the prediction errors and the corresponding estimators, and discuss their differences. In the framework of GAMP, we show that the information criteria can be expressed using the variance of the estimates. Further, we demonstrate how to approach the LOOCV error from the information criteria by utilizing the expression provided by GAMP.
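
The GAMP derivation is not reproduced here. As a simpler, exactly solvable special case, ridge-penalized linear regression admits the classical leave-one-out identity $e_{(i)} = (y_i - \hat y_i)/(1 - H_{ii})$ with $H = X(X^\top X + \lambda I)^{-1}X^\top$. The sketch compares it with brute-force refitting; the identity also holds in the overparametrized case $p > n$. Function names are hypothetical.

```python
import numpy as np


def ridge_loocv_shortcut(X, y, lam):
    """LOOCV error of ridge regression from a single fit via the hat matrix."""
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)


def ridge_loocv_bruteforce(X, y, lam):
    """LOOCV error of ridge regression by refitting n times."""
    n, p = X.shape
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        beta = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(p), X[keep].T @ y[keep])
        errs.append((y[i] - X[i] @ beta) ** 2)
    return np.mean(errs)
```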

Maximum Likelihood from Incomplete Data Via the EM Algorithm

Category:

A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression

Youngseok Kim , Wei Wang , Peter Carbonetto , Matthew Stephens

Category: Machine Learning (Statistics)

2022-08-23

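We introduce a new empirical Bayes approach for large-scale multiple linear regression. Our approach combines two key ideas: (i) the use of flexible "adaptive shrinkage" priors, which approximate the nonparametric family of scale mixtures of normal distributions by a finite mixture of normal distributions; and (ii) the use of variational approximations to efficiently estimate prior hyperparameters and compute approximate posteriors. Combining these two ideas results in fast and flexible methods, with computational speed comparable to fast penalized regression methods such as the Lasso, and with excellent prediction accuracy across a wide range of scenarios. Furthermore, we show that the posterior mean from our method can be interpreted as solving a penalized regression problem, with the precise form of the penalty function learned from the data by directly solving an optimization problem (rather than being tuned by cross-validation). Our methods are implemented in an R package, mr.ash.alpha, available at https://github.com/stephenslab/mr.ash.alpha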
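
A minimal sketch of the adaptive shrinkage idea in the simplest setting, a single normal mean: if $z \sim N(b, s^2)$ and the prior is $b \sim \sum_k \pi_k N(0, \sigma_k^2)$, the posterior mean is a data-adaptive shrinkage of $z$. The mixture weights and variance grid are fixed by hand here (an assumption); in the method above they are estimated by variational empirical Bayes across many coefficients. The function name is hypothetical.

```python
import numpy as np


def ash_posterior_mean(z, s2, pi, sigma2):
    """Posterior mean of b given z ~ N(b, s2) and prior b ~ sum_k pi_k N(0, sigma2_k).

    sigma2_k = 0 encodes a point mass at zero; all pi_k are assumed positive.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    pi = np.asarray(pi, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    marg_var = sigma2 + s2                       # Var(z) under component k
    log_w = (np.log(pi)[None, :]
             - 0.5 * np.log(2 * np.pi * marg_var)[None, :]
             - 0.5 * z[:, None] ** 2 / marg_var[None, :])
    log_w -= log_w.max(axis=1, keepdims=True)   # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)            # posterior component probabilities
    shrink = sigma2 / marg_var                   # per-component shrinkage factor
    return (w * shrink[None, :]).sum(axis=1) * z
```

Small observations are shrunk hard toward zero while large ones are barely touched, which is the kind of shrinkage operator the penalized-regression interpretation of the posterior mean refers to.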

Modelling High-Dimensional Categorical Data Using Nonconvex Fusion Penalties

Benjamin G. Stokell , Rajen D. Shah , Ryan J. Tibshirani

Category: Machine Learning (Statistics)

2020-02-28

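We propose a method for estimating high-dimensional linear models with nominal categorical data. Our estimator, called SCOPE, fuses levels together by making their corresponding coefficients exactly equal. This is achieved by penalizing the differences between the order statistics of the coefficients of a categorical variable, thereby clustering the coefficients. We provide an algorithm for exact and efficient computation of the global minimum of the resulting nonconvex objective in the case of a single variable with potentially many levels, and use it within a block coordinate descent procedure in the multivariate case. We show that an oracle least squares solution that exploits the unknown level fusions is a limit point of the coordinate descent with high probability, provided the true levels have a certain minimum separation; these conditions are known to be minimal in the univariate case. We demonstrate the favorable performance of SCOPE across a range of real and simulated datasets. An R package, CatReg, implementing SCOPE for linear models, along with a version for logistic regression, is available on CRAN.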

On the Sparse DAG Structure Learning Based on Adaptive Lasso

Danru Xu , Erdun Gao , Wei Huang , Mingming Gong

Category: Machine Learning (Statistics) | Machine Learning

2022-09-07

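Learning the underlying causal structure of events, represented by directed acyclic graphs (DAGs), from fully observational data is a crucial part of causal reasoning, but it is challenging because of the combinatorial and large search space. A recent line of work recasts this combinatorial problem as a continuous optimization problem by leveraging an algebraic equality characterization of acyclicity. However, these methods rely on a fixed-threshold step after optimization, which is not a flexible or systematic way to rule out cycle-inducing edges or falsely discovered edges whose small values are caused by numerical precision. In this paper, we develop a data-driven DAG structure learning method without a predefined threshold, called adaptive NOTEARS [30], which applies adaptive penalty levels to each parameter in the regularization term. We show that adaptive NOTEARS enjoys the oracle properties under some specific conditions. Furthermore, simulation results validate the effectiveness of our method without requiring any gap in the edge weights.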
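
The algebraic acyclicity characterization most commonly used in this line of work, assumed here to be the NOTEARS function $h(W) = \operatorname{tr}(e^{W \circ W}) - d$, is zero exactly when the weighted adjacency matrix $W$ encodes a DAG. The sketch computes it via eigenvalues, since $\operatorname{tr}(e^{A}) = \sum_i e^{\lambda_i(A)}$, and also shows the fixed post-hoc thresholding step that adaptive penalties are meant to replace; the threshold value 0.3 is hypothetical.

```python
import numpy as np


def acyclicity(W):
    """h(W) = tr(exp(W * W)) - d; zero exactly when the graph of W has no cycles."""
    A = W * W                           # elementwise square keeps entries nonnegative
    eig = np.linalg.eigvals(A)
    # tr(exp(A)) equals the sum of exp(eigenvalues); complex pairs cancel to a real sum.
    return float(np.real(np.exp(eig).sum())) - W.shape[0]


def threshold_edges(W, omega=0.3):
    """The fixed post-hoc threshold: drop every edge with |weight| below omega."""
    return np.where(np.abs(W) >= omega, W, 0.0)
```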

Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data

Emadaldin Mozafari-Majd , Visa Koivunen

Category: Machine Learning (Statistics) | Machine Learning

2022-08-17

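In this paper, we address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers. The high volume and dimensionality of the data require distributed processing and storage solutions. We propose a two-stage distributed and robust statistical inference procedure that copes with high-dimensional models by promoting sparsity. In the first stage, known as model selection, relevant predictors are selected locally by applying robust Lasso estimators to distinct subsets of the data. The variable selections from each computation node are then fused by a voting scheme to find the sparse basis for the complete data set, which identifies the relevant variables in a robust manner. In the second stage, the developed statistically robust and computationally efficient bootstrap methods are employed. The actual inference constructs confidence intervals, finds parameter estimates, and quantifies standard deviations. As in stage 1, the results of local inference are communicated to the fusion center and combined there. Using analytical methods, we establish the favorable statistical properties of the robust and computationally efficient bootstrap methods, including consistency for a fixed number of predictors and robustness. The proposed two-stage robust and distributed inference procedure demonstrates reliable performance and robustness in variable selection, in finding confidence intervals, and in bootstrap approximations of standard deviations, even when the data are high-dimensional and contaminated by outliers.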
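
A minimal sketch of the fusion step of stage one, assuming a simple majority vote over the per-node selections (the exact voting rule is not specified above). The robust local Lasso fits are not shown, and the function name is hypothetical.

```python
import numpy as np


def fuse_supports(local_supports, min_fraction=0.5):
    """Fuse per-node variable selections by voting.

    local_supports: boolean array (n_nodes, p); row k marks variables selected at node k.
    A variable enters the global support if strictly more than min_fraction of nodes select it.
    """
    votes = np.asarray(local_supports, dtype=bool).mean(axis=0)
    return np.flatnonzero(votes > min_fraction)


if __name__ == "__main__":
    # Three nodes, four candidate predictors.
    print(fuse_supports([[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]]))
```

Raising `min_fraction` makes the fused support more conservative, so a single node whose local fit is distorted by outliers cannot add a variable on its own.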

Understanding Implicit Regularization in Over-Parameterized Single Index Model

Jianqing Fan , Zhuoran Yang , Mengxin Yu

Category: Machine Learning (Statistics) | Machine Learning

2020-07-16

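In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models in which the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To gain a better understanding of the role played by implicit regularization without excessive technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the step size is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In addition, our experimental results support our theoretical findings and show that our methods empirically outperform classical methods with explicit regularization in terms of both the $\ell_2$ statistical rate and variable selection consistency.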
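
A minimal sketch of the implicit regularization mechanism in the simplest case, a sparse linear model with Gaussian design rather than a single index model (so the score transform and truncation step are omitted): writing $\beta = u \circ u - v \circ v$ and running plain gradient descent from a small initialization keeps irrelevant coordinates near zero without any explicit penalty. The step size, initialization scale, iteration count, and function name are illustrative assumptions.

```python
import numpy as np


def implicit_sparse_gd(X, y, alpha=1e-3, eta=0.02, n_iter=1000):
    """Plain gradient descent on (1/2n)||y - X(u*u - v*v)||^2, with no penalty term.

    The tiny initialisation alpha is what keeps irrelevant coordinates near zero.
    """
    n, p = X.shape
    u = np.full(p, alpha)
    v = np.full(p, alpha)
    for _ in range(n_iter):
        beta = u * u - v * v
        grad_beta = X.T @ (X @ beta - y) / n        # gradient w.r.t. beta
        # Chain rule: d beta / du = 2u and d beta / dv = -2v.
        u, v = u - eta * 2 * u * grad_beta, v + eta * 2 * v * grad_beta
    return u * u - v * v
```

Running far longer lets noise coordinates grow slowly, so in this sketch the initialization scale and iteration count play the role that a penalty level plays in explicit regularization.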

Scalable Gaussian-process regression and variable selection using Vecchia approximations

Jian Cao , Joseph Guinness , Marc G. Genton , Matthias Katzfuss

Category: Machine Learning (Statistics)

2022-02-25

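Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, both the number of responses and the number of covariates are large, and the goal is to select the covariates that are related to the response. For this setting, we propose a novel scalable algorithm, called VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratically constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure scales to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate improved scalability and accuracy relative to existing methods.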
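
A minimal sketch of the Vecchia approximation on 1-D inputs with an exponential covariance and a zero mean: each response is conditioned only on its $m$ nearest predecessors in the ordering, $\log p(y) \approx \sum_i \log p(y_i \mid y_{c(i)})$. The kernel, nugget, and nearest-neighbor conditioning sets are assumptions, and the penalized covariate selection of VGPR is not shown.

```python
import numpy as np


def exp_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Exponential covariance between 1-D location vectors a and b."""
    return variance * np.exp(-np.abs(a[:, None] - b[None, :]) / lengthscale)


def vecchia_loglik(x, y, m, nugget=1e-6, **kern):
    """Sum of log p(y_i | y_c(i)), where c(i) holds the m nearest earlier points."""
    ll = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        c = prev[np.argsort(np.abs(x[prev] - x[i]))[:m]]   # conditioning set
        k_ii = exp_kernel(x[[i]], x[[i]], **kern)[0, 0] + nugget
        if len(c) == 0:
            mean, var = 0.0, k_ii
        else:
            K_cc = exp_kernel(x[c], x[c], **kern) + nugget * np.eye(len(c))
            k_ic = exp_kernel(x[[i]], x[c], **kern)[0]
            w = np.linalg.solve(K_cc, k_ic)                # kriging weights
            mean, var = w @ y[c], k_ii - w @ k_ic
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll


def exact_loglik(x, y, nugget=1e-6, **kern):
    """Exact zero-mean Gaussian log-likelihood, for comparison."""
    K = exp_kernel(x, x, **kern) + nugget * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(K, y))
```

With $m = n - 1$ the approximation reproduces the exact log-likelihood; for the exponential kernel on sorted 1-D inputs without a nugget, $m = 1$ is already exact because that process is Markov. Small $m$ is what makes the precision matrix's Cholesky factor sparse.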

Gaining Outlier Resistance with Progressive Quantiles: Fast Algorithms and Theoretical Studies

Yiyuan She , Zhifeng Wang , Jiahui Shen

Category: Machine Learning (Statistics)

2021-12-15

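Outliers occur widely in big-data applications and may severely affect statistical estimation and inference. In this paper, a framework of outlier-resistant estimation is introduced to robustify an arbitrarily given loss function. It has a close connection to the method of trimming and includes explicit outlyingness parameters for all samples, which in turn facilitates computation, theory, and parameter tuning. To tackle the issues of nonconvexity and nonsmoothness, we develop scalable algorithms that are easy to implement and have guaranteed fast convergence. In particular, a new technique is proposed to alleviate the requirement on the starting point, so that on regular datasets the number of data resamplings can be substantially reduced. Based on combined statistical and computational treatments, we are able to perform nonasymptotic analysis beyond M-estimation. The obtained resistant estimators, though not necessarily globally or even locally optimal, enjoy minimax rate optimality in both low and high dimensions. Experiments in regression, classification, and neural networks show excellent performance of the proposed methodology in the presence of gross outliers.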

Data blurring: sample splitting a single sample

James Leiner , Boyan Duan , Larry Wasserman , Aaditya Ramdas

Category: Machine Learning (Statistics)

2021-12-21

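Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X), g(X))$ is tractable? As one example, if $X = (X_1, \dots, X_n)$ and $P$ is a product distribution, then for any $m < n$, we can split the sample to define $f(X) = (X_1, \dots, X_m)$ and $g(X) = (X_{m+1}, \dots, X_n)$. Rasines and Young (2021) offer an alternative route to accomplishing this task through randomization of $X$ with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian-distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data blurring, as an alternative to data splitting, data carving, and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.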
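
A minimal sketch of the Gaussian additive-noise split, for $X \sim N(\mu, \sigma^2 I)$ with known $\sigma$: with external noise $Z \sim N(0, \sigma^2 I)$, the parts $f(X) = X + \tau Z$ and $g(X) = X - Z/\tau$ have covariance $\sigma^2 - \sigma^2 = 0$ and are therefore independent; neither reveals $X$ alone, yet together they recover it. This illustrates the Gaussian case only, not the paper's general construction; function names and $\tau$ are hypothetical.

```python
import numpy as np


def gaussian_blur_split(x, sigma, tau=1.0, rng=None):
    """Split X ~ N(mu, sigma^2 I) into independent parts using external noise Z.

    f(X) = X + tau * Z   ~ N(mu, (1 + tau^2) sigma^2 I)
    g(X) = X - Z / tau   ~ N(mu, (1 + 1/tau^2) sigma^2 I)
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(0.0, sigma, size=np.shape(x))
    return x + tau * z, x - z / tau


def reconstruct(f, g, tau=1.0):
    """Together the two parts recover X exactly: f + tau^2 g = (1 + tau^2) X."""
    return (f + tau**2 * g) / (1 + tau**2)
```

A small $\tau$ leaves $f(X)$ close to $X$ (more information for selection), and a large $\tau$ leaves more information in $g(X)$ for inference, so $\tau$ acts like the split proportion in ordinary sample splitting.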

A non-asymptotic approach for model selection via penalization in high-dimensional mixture of experts models

TrungTin Nguyen , Hien Duy Nguyen , Faicel Chamroukhi , Florence Forbes

Category: Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2021-04-06

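Mixture of experts (MoE) is a popular class of statistical and machine learning models that has attracted attention over the years because of its flexibility and efficiency. In this work, we consider Gaussian-gated localized MoE (GLoME) and block-diagonal covariance localized MoE (BLoME) regression models to represent nonlinear relationships in heterogeneous data with potential hidden graph-structured interactions between high-dimensional predictors. These models pose difficult statistical estimation and model selection questions, from both a computational and a theoretical perspective. This paper is devoted to the problem of model selection among a collection of GLoME or BLoME models, characterized by the number of mixture components, the complexity of the Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices, in a penalized maximum likelihood estimation framework. In particular, we establish non-asymptotic risk bounds in the form of weak oracle inequalities, provided that lower bounds on the penalties hold. The good empirical behavior of our models is then demonstrated on synthetic and real datasets.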
