智能论文笔记

Machine Learning Panel Data Regressions with Heavy-tailed Dependent Data: Theory and Application

Andrii Babii , Ryan T. Ball , Eric Ghysels , Jonas Striaukas

分类： (统计)机器学习

2020-08-08

本文介绍了用于在不同频率下采样的重尾依赖面板数据的结构化机器学习回归。我们专注于稀疏组的套索正规化。这种类型的正则化可以利用混合频率序列面板数据结构并提高估计的质量。我们获得了汇集和固定效果的Oracle不等式稀疏组套索面板数据估算器认识到财务和经济数据可能具有脂肪尾。为此，我们利用了由盗版$ \ Tai $ -Mixing流程组成的面板数据的新Fuk-Nagaev集中不等式。

translated by 谷歌翻译

Estimation and Inference on Heterogeneous Treatment Effects in High-Dimensional Dynamic Panels under Weak Dependence

Vira Semenova , Matt Goldman , Victor Chernozhukov , Matt Taddy

分类： (统计)机器学习

2017-12-28

This paper provides estimation and inference methods for a conditional average treatment effects (CATE) characterized by a high-dimensional parameter in both homogeneous cross-sectional and unit-heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization, where we partial out the controls and unit effects from the outcome and the base treatment and take the cross-fitted residuals. This step uses a novel generic cross-fitting method we design for weakly dependent time series and panel data. This method "leaves out the neighbors" when fitting nuisance components, and we theoretically power it by using Strassen's coupling. As a result, we can rely on any modern machine learning method in the first step, provided it learns the residuals well enough. Second, we construct an orthogonal (or residual) learner of CATE -- the Lasso CATE -- that regresses the outcome residual on the vector of interactions of the residualized treatment with explanatory variables. If the complexity of CATE function is simpler than that of the first-stage regression, the orthogonal learner converges faster than the single-stage regression-based learner. Third, we perform simultaneous inference on parameters of the CATE function using debiasing. We also can use ordinary least squares in the last two steps when CATE is low-dimensional. In heterogeneous panel data settings, we model the unobserved unit heterogeneity as a weakly sparse deviation from Mundlak (1978)'s model of correlated unit effects as a linear function of time-invariant covariates and make use of L1-penalization to estimate these models. We demonstrate our methods by estimating price elasticities of groceries based on scanner data. We note that our results are new even for the cross-sectional (i.i.d) case.

translated by 谷歌翻译

On LASSO for High Dimensional Predictive Regression

Ziwei Mei , Zhentao Shi

分类： (统计)机器学习

2022-12-14

In a high dimensional linear predictive regression where the number of potential predictors can be larger than the sample size, we consider using LASSO, a popular L1-penalized regression method, to estimate the sparse coefficients when many unit root regressors are present. Consistency of LASSO relies on two building blocks: the deviation bound of the cross product of the regressors and the error term, and the restricted eigenvalue of the Gram matrix of the regressors. In our setting where unit root regressors are driven by temporal dependent non-Gaussian innovations, we establish original probabilistic bounds for these two building blocks. The bounds imply that the rates of convergence of LASSO are different from those in the familiar cross sectional case. In practical applications given a mixture of stationary and nonstationary predictors, asymptotic guarantee of LASSO is preserved if all predictors are scale-standardized. In an empirical example of forecasting the unemployment rate with many macroeconomic time series, strong performance is delivered by LASSO when the initial specification is guided by macroeconomic domain expertise.

translated by 谷歌翻译

Sparse Generalized Yule-Walker Estimation for Large Spatio-temporal Autoregressions with an Application to NO2 Satellite Data

Hanno Reuvers , Etienne Wijler

分类： (统计)机器学习

2021-08-05

我们考虑一个高维模型，其中观察到时间和空间的变量。该模型由包含时间滞后的时空回归和因变量的空间滞后组成。与古典空间自回归模型不同，我们不依赖于预定的空间交互矩阵，但从数据中推断所有空间交互。假设稀疏性，我们通过惩罚一组Yule-Walker方程来估计完全数据驱动的空间和时间依赖。这种正则化可以留下非结构化，但我们还提出了当观察结果源自空间网格（例如卫星图像）时定制的收缩程序。推导有限的样本误差界限，并且在渐近框架中建立估计一致性，其中样本大小和空间单元的数量共同偏离。外源性变量也可以包括在内。与竞争程序相比，仿真练习表现出强大的有限样本性能。作为一个实证应用，我们模型卫星测量了伦敦的No2浓度。我们的方法通过竞争力的基准提供预测，我们发现了强烈的空间互动的证据。

translated by 谷歌翻译

Forecast Evaluation in Large Cross-Sections of Realized Volatility

Christis Katsouris

分类： (统计)机器学习 | 机器学习

2021-12-09

在本文中，我们考虑了使用相同的预测精度测试程序在横截面依赖下实现了实现波动率测量的预测评估。在预测实现挥发性时，我们根据增强横截面评估模型的预测精度。在相等预测精度的零假设下，所采用的基准模型是标准的HAR模型，而在非相同的预测精度的替代方案下，预测模型是通过套索缩收估计的增强的HAR模型。我们通过结合测量误差校正以及横截面跳转分量测量来研究预报对模型规范的敏感性。使用数值实现评估模型的样本外预测评估。

translated by 谷歌翻译

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

分类： (统计)机器学习

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

translated by 谷歌翻译

The Lasso with general Gaussian designs with applications to hypothesis testing

Michael Celentano , Andrea Montanari , Yuting Wei

分类：机器学习 | (统计)机器学习

2020-07-27

套索是一种高维回归的方法，当时，当协变量$ p $的订单数量或大于观测值$ n $时，通常使用它。由于两个基本原因，经典的渐近态性理论不适用于该模型：$（1）$正规风险是非平滑的； $（2）$估算器$ \ wideHat {\ boldsymbol {\ theta}} $与true参数vector $ \ boldsymbol {\ theta}^*$无法忽略。结果，标准的扰动论点是渐近正态性的传统基础。另一方面，套索估计器可以精确地以$ n $和$ p $大，$ n/p $的订单为一。这种表征首先是在使用I.I.D的高斯设计的情况下获得的。协变量：在这里，我们将其推广到具有非偏差协方差结构的高斯相关设计。这是根据更简单的``固定设计''模型表示的。我们在两个模型中各种数量的分布之间的距离上建立了非反应界限，它们在合适的稀疏类别中均匀地固定在信号上$ \ boldsymbol {\ theta}^*$。作为应用程序，我们研究了借助拉索的分布，并表明需要校正程度对于计算有效的置信区间是必要的。

translated by 谷歌翻译

Invariant Inference via Residual Randomization

Panos Toulis

分类： (统计)机器学习

2019-08-12

统计推断中的主要范式取决于I.I.D.的结构。来自假设的无限人群的数据。尽管它取得了成功，但在复杂的数据结构下，即使在清楚无限人口所代表的内容的情况下，该框架在复杂的数据结构下仍然不灵活。在本文中，我们探讨了一个替代框架，在该框架中，推断只是对模型误差的不变性假设，例如交换性或符号对称性。作为解决这个不变推理问题的一般方法，我们提出了一个基于随机的过程。我们证明了该过程的渐近有效性的一般条件，并在许多数据结构中说明了，包括单向和双向布局中的群集误差。我们发现，通过残差随机化的不变推断具有三个吸引人的属性：（1）在弱且可解释的条件下是有效的，可以解决重型数据，有限聚类甚至一些高维设置的问题。（2）它在有限样品中是可靠的，因为它不依赖经典渐近学所需的规律性条件。（3）它以适应数据结构的统一方式解决了推断问题。另一方面，诸如OLS或Bootstrap之类的经典程序以I.I.D.为前提。结构，只要实际问题结构不同，就需要修改。经典框架中的这种不匹配导致了多种可靠的误差技术和自举变体，这些变体经常混淆应用研究。我们通过广泛的经验评估证实了这些发现。残留随机化对许多替代方案的表现有利，包括可靠的误差方法，自举变体和分层模型。

translated by 谷歌翻译

Data-Driven Sample Average Approximation with Covariate Information

Rohit Kannan , Güzin Bayraksan , James R. Luedtke

分类： (统计)机器学习

2022-07-27

当我们对优化模型中的不确定参数进行观察以及对协变量的同时观察时，我们研究了数据驱动决策的优化。鉴于新的协变量观察，目标是选择一个决定以此观察为条件的预期成本的决定。我们研究了三个数据驱动的框架，这些框架将机器学习预测模型集成在随机编程样本平均值近似（SAA）中，以近似解决该问题的解决方案。 SAA框架中的两个是新的，并使用了场景生成的剩余预测模型的样本外残差。我们研究的框架是灵活的，并且可以容纳参数，非参数和半参数回归技术。我们在数据生成过程，预测模型和随机程序中得出条件，在这些程序下，这些数据驱动的SaaS的解决方案是一致且渐近最佳的，并且还得出了收敛速率和有限的样本保证。计算实验验证了我们的理论结果，证明了我们数据驱动的公式比现有方法的潜在优势（即使预测模型被误解了），并说明了我们在有限的数据制度中新的数据驱动配方的好处。

translated by 谷歌翻译

Statistical Inference for Maximin Effects: Identifying Stable Associations across Multiple Studies

Zijian Guo

分类： (统计)机器学习

2020-11-15

Integrative analysis of data from multiple sources is critical to making generalizable discoveries. Associations that are consistently observed across multiple source populations are more likely to be generalized to target populations with possible distributional shifts. In this paper, we model the heterogeneous multi-source data with multiple high-dimensional regressions and make inferences for the maximin effect (Meinshausen, B{\"u}hlmann, AoS, 43(4), 1801--1830). The maximin effect provides a measure of stable associations across multi-source data. A significant maximin effect indicates that a variable has commonly shared effects across multiple source populations, and these shared effects may be generalized to a broader set of target populations. There are challenges associated with inferring maximin effects because its point estimator can have a non-standard limiting distribution. We devise a novel sampling method to construct valid confidence intervals for maximin effects. The proposed confidence interval attains a parametric length. This sampling procedure and the related theoretical analysis are of independent interest for solving other non-standard inference problems. Using genetic data on yeast growth in multiple environments, we demonstrate that the genetic variants with significant maximin effects have generalizable effects under new environments.

translated by 谷歌翻译

On the instrumental variable estimation with many weak and invalid instruments

Yiqi Lin , Frank Windmeijer , Xinyuan Song , Qingliang Fan

分类： (统计)机器学习

2022-07-07

我们讨论了具有未知IV有效性的线性仪器变量（IV）模型中识别的基本问题。我们重新审视了流行的多数和多元化规则，并表明通常没有识别条件是“且仅在总体上”。假设“最稀少的规则”，该规则等同于多数规则，但在计算算法中变得运作，我们研究并证明了基于两步选择的其他IV估计器的非convex惩罚方法的优势，就两步选择而言选择一致性和单独弱IV的适应性。此外，我们提出了一种与识别条件保持一致的替代较低的惩罚，并同时提供甲骨文稀疏结构。与先前的文献相比，针对静脉强度较弱的估计仪得出了理想的理论特性。使用模拟证明了有限样本特性，并且选择和估计方法应用于有关贸易对经济增长的影响的经验研究。

translated by 谷歌翻译

Distribution-Free Predictive Inference For Regression

Jing Lei , Max G'Sell , Alessandro Rinaldo , Ryan J. Tibshirani , Larry Wasserman

分类：

2016-04-14

We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedascity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.

translated by 谷歌翻译

Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice

Andrii Babii , Xi Chen , Eric Ghysels , Rohit Kumar

分类： (统计)机器学习

2020-10-16

我们在具有不对称损耗功能的数据丰富的环境中研究了二元选择问题。经济学学文献涵盖非参数二元选择问题，但在富含数据的环境中没有提供计算上有吸引力的解决方案。机器学习文献具有许多算法，但主要集中在独立于协变量的损耗功能上。我们表明，通过基于损失的损失的重量或最先进的机器学习技术，可以通过非常简单的损失的重量来实现关于与一般损失函数的二元成果的理论上有效决策。我们将我们的分析应用于审前拘留中的种族正义。

translated by 谷歌翻译

Algorithm is Experiment: Machine Learning, Market Design, and Policy Eligibility Rules

Yusuke Narita , Kohei Yata

分类：机器学习 | (统计)机器学习

2021-04-26

算法在政策和业务中产生越来越多的决策和建议。这种算法决策是自然实验（可条件准随机分配的仪器），因为该算法仅基于可观察输入变量的决定。我们使用该观察来为一类随机和确定性决策算法开发治疗效果估算器。我们的估算器被证明对于明确的因果效应，它们是一致的和渐近正常的。我们估算器的一个关键特例是多维回归不连续性设计。我们应用估算员以评估冠状病毒援助，救济和经济安全（关心）法案的效果，其中数十亿美元的资金通过算法规则分配给医院。我们的估计表明，救济资金对Covid-19相关的医院活动水平影响不大。天真的OLS和IV估计表现出实质性的选择偏差。

translated by 谷歌翻译

Concentration analysis of multivariate elliptic diffusion processes

Cathrine Aeckerle-Willems , Claudia Strauch , Lukas Trottner

分类： (统计)机器学习

2022-06-07

我们证明了连续和离散时间添加功能的浓度不平等和相关的PAC界限，用于可能是多元，不可逆扩散过程的无界函数。我们的分析依赖于通过泊松方程的方法，使我们能够考虑一系列非常广泛的指数性千古过程。这些结果增加了现有的浓度不平等，用于扩散过程的加性功能，这些功能仅适用于有界函数或从明显较小的类别中的过程的无限函数。我们通过两个截然不同的区域的例子来证明这些指数不平等的力量。考虑到在稀疏性约束下可能具有高维参数非线性漂移模型，我们应用连续的时间浓度结果来验证套索估计的受限特征值条件，这对于甲骨文不平等的推导至关重要。离散添加功能的结果用于研究未经调整的Langevin MCMC算法，用于采样中等重尾密度$ \ pi $。特别是，我们为多项式增长功能$ f $的样品蒙特卡洛估计量$ \ pi（f）提供PAC边界，以量化足够的样本和阶梯尺寸，以在规定的边距内近似具有很高的可能性。

translated by 谷歌翻译

High-dimensional Inference for Dynamic Treatment Effects

Jelena Bradic , Weijie Ji , Yuqian Zhang

分类：机器学习 | (统计)机器学习

2021-10-10

本文提出了在多阶段实验的背景下的异质治疗效应的置信区间结构，以$ N $样品和高维，$ D $，混淆。我们的重点是$ d \ gg n $的情况，但获得的结果也适用于低维病例。我们展示了正则化估计的偏差，在高维变焦空间中不可避免，具有简单的双重稳固分数。通过这种方式，不需要额外的偏差，并且我们获得root $ N $推理结果，同时允许治疗和协变量的多级相互依赖性。记忆财产也没有假设;治疗可能取决于所有先前的治疗作业以及以前的所有多阶段混淆。我们的结果依赖于潜在依赖的某些稀疏假设。我们发现具有动态处理的强大推理所需的新产品率条件。

translated by 谷歌翻译

A penalized two-pass regression to predict stock returns with time-varying risk premia

Gaetan Bakalli , Stéphane Guerrier , Olivier Scaillet

分类： (统计)机器学习

2022-08-01

我们通过随时间变化的因素负载开发了受惩罚的两次通用回归。第一遍中的惩罚对时间变化驱动因素强加了稀疏性，同时还通过正规化适当的系数组来维持与无契约限制的兼容性。第二次通过提供了风险溢价估计，以预测股权超额回报。我们的蒙特卡洛结果以及我们对大量横断面数据集的个人股票集的经验结果表明，如果不进行分组的惩罚可能会屈服于几乎所有估计的时变模型，违反了无标准限制。此外，我们的结果表明，与惩罚方法相比，所提出的方法在没有适当分组或时间不变的因子模型的情况下减少了预测错误。

translated by 谷歌翻译

Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

分类： (统计)机器学习

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.

translated by 谷歌翻译

On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization

Fei Jiang , Yeqing Zhou , Jianxuan Liu , Yanyuan Ma

分类：机器学习

2022-12-31

We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Treating the high dimensional issue further leads us to augment an amenable penalty term to the target function. We propose to estimate the regression parameter through minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slow. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members if the subset. We examine the finite sample performance of the proposed tests by extensive simulation. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.

translated by 谷歌翻译

Is completeness necessary? Estimation in nonidentified linear models

Andrii Babii , Jean-Pierre Florens

分类： (统计)机器学习

2017-09-11

我们显示基于光谱正则化的估计变换到一类非识别线性不良逆模型中的结构参数的最佳近似。重要的是，这种融合在均匀和希尔伯特空间规范中保持。当最佳近似与结构参数重合时，我们描述了几种情况，或者至少合理地近似，并且讨论我们的结果在部分识别设置中是如何有用的。最后，我们记录了识别失败对正规化估计器的线性功能的渐近分布具有重要意义，该估算器可以具有加权Chi平方组分。该理论被示出了各种高维和非参数IV回归。

translated by 谷歌翻译