Optimal pricing, i.e., determining the price level that maximizes profit or revenue of a given product, is a vital task for the retail industry. To select such a quantity, one first estimates the price elasticity from the product demand. Regression methods usually fail to recover such elasticities due to confounding effects and price endogeneity, so randomized experiments are typically required. However, elasticities can be highly heterogeneous, depending, for instance, on the location of stores. As the randomization frequently occurs at the municipal level, standard difference-in-differences methods may also fail. Possible solutions are based on methodologies to measure the effects of treatments on a single (or just a few) treated unit(s) based on counterfactuals constructed from artificial controls. For example, for each city in the treatment group, a counterfactual may be constructed from the untreated locations. In this paper, we apply a novel high-dimensional statistical method to measure the effects of price changes on daily sales from a major retailer in Brazil. The proposed methodology combines principal components (factors) and sparse regressions, resulting in a method called factor-adjusted regularized method for treatment (\texttt{FarmTreat}) evaluation. The data consist of daily sales and prices of five different products over more than 400 municipalities. The products considered belong to the \emph{sweet and candies} category and the experiments were carried out in 2016 and 2017. Our results confirm the hypothesis of a high degree of heterogeneity, yielding distinct pricing strategies over distinct municipalities.
Understanding the effect of a particular treatment or policy is relevant in many areas of interest, from political economics and marketing to health care. In this paper, we develop a nonparametric algorithm for detecting the effects of a treatment over time in the context of synthetic controls. The method is based on counterfactual predictions from many algorithms without necessarily assuming that any of the algorithms correctly captures the model. We introduce an inferential procedure for detecting treatment effects and show that the testing procedure is asymptotically valid for stationary, beta-mixing processes without imposing any restrictions on the set of base algorithms under consideration. We discuss consistency guarantees for average treatment effect estimates and provide regret bounds for the proposed methodology. The class of algorithms may include random forests, the Lasso, or any other machine-learning estimator. Numerical studies and an application illustrate the advantages of the method.
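Both abstracts above build on the synthetic-control construction of a counterfactual for a treated unit from untreated donors. For reference, here is a minimal numpy sketch of the classic constrained least-squares version; the function names and the projected-gradient solver are our choices, not either paper's implementation:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(Y0_pre, y1_pre, n_iter=5000, lr=0.01):
    """Donor weights w (w >= 0, sum(w) = 1) minimizing the pre-treatment fit
    ||y1_pre - Y0_pre @ w||^2, found by projected gradient descent."""
    n_pre, n_donors = Y0_pre.shape
    w = np.full(n_donors, 1.0 / n_donors)
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y1_pre) / n_pre
        w = project_to_simplex(w - lr * grad)
    return w
```

Given post-treatment donor outcomes `Y0_post`, the estimated effect path is `y1_post - Y0_post @ w`; the papers above replace this single fit with, respectively, a factor-plus-sparse regression and an ensemble of prediction algorithms.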
In this paper, we propose a method for nonparametric estimation and inference on heterogeneous bounds for causal effect parameters in general sample selection models, where the initial treatment can affect whether a post-intervention outcome is observed. Observable covariates may confound the treatment selection, and both observables and unobservables may confound the outcome. The method provides conditional effect bounds as functions of policy-relevant pre-treatment variables, and allows for valid statistical inference on the unidentified conditional effect curves. We use a flexible semiparametric debiased machine learning approach that can accommodate flexible functional forms and high-dimensional confounding variables between treatment, selection, and outcome processes. Easy-to-verify high-level conditions for estimation and misspecification-robust inference guarantees are also provided.
Synthetic control methods often rely on matching pre-treatment characteristics (called predictors) of the treated unit. The choice of predictors and how they are weighted plays a key role in the performance and interpretability of synthetic control estimators. This paper proposes the use of a sparse synthetic control procedure that penalizes the number of predictors used in generating the counterfactual to select the most important predictors. We derive, in a linear factor model framework, a new model selection consistency result and show that the penalized procedure has a faster mean squared error convergence rate. Through a simulation study, we then show that the sparse synthetic control achieves lower bias and has better post-treatment performance than the un-penalized synthetic control. Finally, we apply the method to revisit the study of the passage of Proposition 99 in California in an augmented setting with a large number of predictors available.
This paper provides estimation and inference methods for a conditional average treatment effects (CATE) characterized by a high-dimensional parameter in both homogeneous cross-sectional and unit-heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization, where we partial out the controls and unit effects from the outcome and the base treatment and take the cross-fitted residuals. This step uses a novel generic cross-fitting method we design for weakly dependent time series and panel data. This method "leaves out the neighbors" when fitting nuisance components, and we theoretically power it by using Strassen's coupling. As a result, we can rely on any modern machine learning method in the first step, provided it learns the residuals well enough. Second, we construct an orthogonal (or residual) learner of CATE -- the Lasso CATE -- that regresses the outcome residual on the vector of interactions of the residualized treatment with explanatory variables. If the complexity of CATE function is simpler than that of the first-stage regression, the orthogonal learner converges faster than the single-stage regression-based learner. Third, we perform simultaneous inference on parameters of the CATE function using debiasing. We also can use ordinary least squares in the last two steps when CATE is low-dimensional. In heterogeneous panel data settings, we model the unobserved unit heterogeneity as a weakly sparse deviation from Mundlak (1978)'s model of correlated unit effects as a linear function of time-invariant covariates and make use of L1-penalization to estimate these models. We demonstrate our methods by estimating price elasticities of groceries based on scanner data. We note that our results are new even for the cross-sectional (i.i.d) case.
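The three-step procedure described above (cross-fitted residualization of the outcome and the base treatment, then a residual-on-residual regression for the CATE) can be sketched in the simple i.i.d., low-dimensional case where OLS suffices in both stages. The paper's "leave out the neighbors" cross-fitting for dependent data is replaced here by plain random folds, and all names are ours:

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def orthogonal_cate(Y, D, X, Z, n_folds=2, seed=0):
    """Orthogonal (residual-on-residual) learner of a linear CATE tau(z) = Z @ beta.
    Step 1: cross-fit residuals of outcome Y and base treatment D on controls X.
    Step 2: regress the outcome residual on interactions of the residualized
    treatment with effect modifiers Z (plain OLS, i.e. the low-dimensional case)."""
    n = len(Y)
    folds = np.random.default_rng(seed).integers(0, n_folds, size=n)
    Xc = np.column_stack([np.ones(n), X])            # linear nuisances with intercept
    Y_res, D_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        Y_res[te] = Y[te] - Xc[te] @ ols(Xc[tr], Y[tr])
        D_res[te] = D[te] - Xc[te] @ ols(Xc[tr], D[tr])
    return ols(D_res[:, None] * Z, Y_res)            # beta in tau(z) = Z @ beta
```

In the paper's setting the OLS nuisances would be replaced by any sufficiently accurate machine learner, and the final stage by a Lasso when Z is high-dimensional.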
In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications. Yet, most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore the individualized information. To fill the gap, in this paper, we initiate the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-sided and two-sided synthetic learners, namely H1SL and H2SL, by leveraging the state-of-the-art HTE estimator for non-panel data and generalizing the synthetic control method that allows flexible data generating processes. We establish the convergence rates of the proposed estimators. The superior performance of the proposed methods over existing ones is demonstrated by extensive numerical studies.
We study the optimal design of experimental studies with pre-treatment outcome data available. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units. A number of commonly used approaches fit this formulation, including the difference-in-means estimator and a variety of synthetic-control techniques. We propose several methods for choosing the set of treated units in conjunction with the weights. Observing the NP-hardness of the problem, we introduce a mixed-integer programming formulation that selects both the treatment and control sets and the unit weights. We prove that these proposed approaches lead to qualitatively different experimental units being selected for treatment. Using simulations based on publicly available data from the US Bureau of Labor Statistics, we show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials.
Policy evaluation based on A/B testing has attracted considerable interest in digital marketing, but such evaluation in ride-sharing platforms (e.g., Uber and Didi) has not been well studied, largely due to the complex structure of their temporally and/or spatially dependent experiments. Motivated by policy evaluation in ride-sharing platforms, this paper aims to establish causal relationships between a platform's policies and outcomes of interest under a switchback design. We propose a novel potential-outcome framework based on a temporal varying-coefficient decision process (VCDP) model to capture the dynamic treatment effects in temporally dependent experiments. We further characterize the average treatment effect by decomposing it into the sum of a direct effect (DE) and an indirect effect (IE), and develop estimation and inference procedures for both DE and IE. Furthermore, we propose a spatio-temporal VCDP to handle spatio-temporally dependent experiments. For both VCDP models, we establish the statistical properties (e.g., weak convergence and asymptotic power) of the estimation and inference procedures. We conduct extensive simulations to investigate the finite-sample performance of the proposed procedures and examine how the VCDP models can help improve policy evaluation for various dispatching and dispositioning policies in Didi.
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
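The paper's statistic combines a projection estimated on one half of the data with an estimated expected conditional covariance on the other. A simpler member of the same family, a generalised-covariance-measure-style statistic with linear working regressions, illustrates the residual-product idea; this is a hedged sketch of the general recipe, not the paper's exact procedure, and all names are ours:

```python
import math
import numpy as np

def linear_fit(A, b):
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

def gcm_test(x, y, Z):
    """Test of conditional mean independence via residual products: regress y on Z
    and x on Z (linear working regressions here; any ML regression could be used),
    then test whether the mean of the residual products is zero."""
    Zc = np.column_stack([np.ones(len(y)), Z])
    r_y = y - Zc @ linear_fit(Zc, y)
    r_x = x - Zc @ linear_fit(Zc, x)
    R = r_x * r_y
    T = math.sqrt(len(R)) * R.mean() / R.std()          # approx. N(0,1) under the null
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(T) / math.sqrt(2))))
    return T, p
```

With flexible regressions in place of `linear_fit`, statistics of this form retain Type I error control under weak conditions on the regression errors.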
In this paper, we study the problem of designing experiments that are conducted on a set of units, such as users or groups of users in an online marketplace, over multiple time periods, such as weeks or months. These experiments are particularly useful to study treatments that have a causal effect on both current and future outcomes (instantaneous and lagged effects). The design problem involves selecting a treatment time for each unit, before or during the experiment, in order to most precisely estimate the instantaneous and lagged effects post-experiment. This optimization of treatment decisions can directly minimize the opportunity cost of the experiment by reducing its sample-size requirements. The optimization is an NP-hard integer program for which we provide a near-optimal solution when design decisions are made at the beginning (fixed-sample-size designs). Next, we study sequential experiments that allow adaptive decisions during the experiment and that can also potentially stop the experiment early, further reducing its cost. However, the sequential nature of these experiments complicates both the design phase and the estimation phase. We propose a new algorithm, PGAE, that addresses these challenges by adaptively making treatment decisions, estimating treatment effects, and drawing valid post-experiment inference. PGAE combines ideas from Bayesian statistics, dynamic programming, and sample splitting. Using synthetic experiments on real data sets from multiple domains, we demonstrate that our proposed solutions for fixed-sample-size and sequential experiments reduce the opportunity cost of the experiments by over 50% and 70%, respectively, compared to benchmarks.
We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedascity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
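The split conformal variant discussed above is short enough to state in full. A minimal numpy sketch with an example least-squares base estimator (function names are ours; the `conformalInference` R package accompanying the paper implements the full framework):

```python
import numpy as np

def split_conformal(X, y, X_new, fit, alpha=0.1, seed=0):
    """Split conformal prediction band: fit any regression on one half of the data,
    calibrate absolute residuals on the other half, and widen the point prediction
    by the finite-sample conformal quantile of the calibration scores."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    train, calib = idx[: n // 2], idx[n // 2 :]
    predict = fit(X[train], y[train])
    scores = np.sort(np.abs(y[calib] - predict(X[calib])))
    k = int(np.ceil((len(calib) + 1) * (1 - alpha)))   # conformal quantile index
    q = scores[k - 1] if k <= len(calib) else np.inf
    mu = predict(X_new)
    return mu - q, mu + q

def ls_fit(X, y):
    """Example base estimator: least squares, returned as a predictor closure."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return lambda A: np.column_stack([np.ones(len(A)), A]) @ beta
```

Marginal coverage of at least 1 - alpha holds for any choice of `fit`, which is the distribution-free guarantee; the cost relative to full conformal is the sample splitting.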
To further develop the problem of statistical inference for heterogeneous treatment effects, this paper builds on Breiman's (2001) random forest trees (RFT) and the causal trees of Wager et al. (2018), using the excellent statistical properties of classical OLS and covariate-quantile-based partitioning into local linear intervals to parameterize the nonparametric problem, while retaining the advantages of random forest trees, such as constructible confidence intervals and asymptotic normality properties [Athey and Imbens (2016), Efron (2014), Wager et al. (2014) \citep{wager2014asymptotic}]. We propose a decision tree that partitions by quantiles according to a fixed rule, combined with polynomial estimation on local samples, which we call the quantile local linear causal tree (QLPRT) and forest (QLPRF).
The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ the regularized risk is non-smooth; $(2)$ the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, the standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates: here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler ``fixed design'' model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
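Because the regularized risk is non-smooth, the Lasso is typically computed by proximal methods rather than plain gradient descent. A minimal ISTA sketch of the estimator under study (the solver choice and names are ours; the paper analyzes the estimator's distribution, not its computation):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=3000):
    """Proximal gradient (ISTA) for the Lasso objective
        (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1,
    i.e. the non-smooth regularized risk discussed in the abstract."""
    n, p = X.shape
    L = np.linalg.norm(X, ord=2) ** 2 / n     # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        b = soft_threshold(b - (X.T @ (X @ b - y) / n) / L, lam / L)
    return b
```

The abstract's point is that in the proportional regime $n/p = O(1)$, naive plug-in inference on the output of such a solver is invalid without the degrees-of-freedom correction to the debiasing step.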
There are a number of available methods for whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing treatment prioritization rules on a level playing field. RATEs are agnostic as to how the prioritization rules were derived, and only assess them based on how well they identify the units that benefit the most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in a wide variety of randomized and observational study settings. We provide justification for the use of bootstrapped confidence intervals, as well as a framework for testing hypotheses about heterogeneity in treatment effects correlated with a prioritization rule. Our definition of RATE nests a number of existing metrics, including the Qini coefficient, and our analysis directly yields inference methods for these metrics. We demonstrate our approach in examples drawn from both personalized medicine and marketing. In the medical setting, using data from the SPRINT and ACCORD-BP randomized control trials, we find no clear evidence of heterogeneous treatment effects. On the other hand, in a large marketing trial, we find strong evidence of heterogeneity in the treatment effects of some digital advertising campaigns and demonstrate how RATEs can be used to compare targeting rules that prioritize by estimated risk with those that prioritize by estimated treatment benefit.
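For a randomized trial with known treatment probability, one member of the RATE family (an AUTOC-style average of top-q effects) can be estimated directly from IPW scores. This is a hedged sketch of the idea under those simplifying assumptions, not the authors' estimator; names and weighting details are ours:

```python
import numpy as np

def rate_autoc(priority, y, w, p=0.5):
    """Rank-weighted average treatment effect, AUTOC flavour: average over all
    top-fractions q of [ATE among the top-q units ranked by `priority`] minus
    the overall ATE. Per-unit effect scores are simple IPW scores, unbiased in
    a randomized trial with known treatment probability p."""
    scores = y * (w / p - (1 - w) / (1 - p))     # unbiased per-unit effect scores
    s = scores[np.argsort(-priority)]            # highest priority first
    top_means = np.cumsum(s) / np.arange(1, len(s) + 1)
    return float(np.mean(top_means - s.mean()))
```

A prioritization rule uncorrelated with the treatment effect yields a RATE near zero, while a rule that ranks high-benefit units first yields a positive RATE; this is the level playing field the abstract describes.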
The electricity industry is heavily implementing smart grid technologies to improve reliability, availability, security, and efficiency. This implementation needs technological advancements, the development of standards and regulations, as well as testing and planning. Smart grid load forecasting and management are critical for reducing demand volatility and improving the market mechanisms that connect generators, distributors, and retailers. During policy implementations or external interventions, it is necessary to analyse the uncertainty of their impact on the electricity demand to enable a more accurate response of the system to fluctuating demand. This paper analyses the uncertainty of external interventions' impact on electricity demand. It implements a framework that combines probabilistic and global forecasting models with a deep learning approach to estimate the causal impact distribution of an intervention. The causal effect is assessed by predicting the counterfactual distribution outcome for the affected instances and then contrasting it against the real outcomes. We consider the impact of the COVID-19 lockdowns on energy usage as a case study to evaluate the uneven effect of this intervention on the electricity demand distribution. We could show that during the initial lockdowns in Australia and some European countries, the troughs usually dropped more than the peaks, while the mean remained almost unaffected.
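The contrast-against-a-counterfactual logic above is independent of the forecaster. A deliberately simple sketch where the counterfactual is a seasonal hour-of-day profile fit on the pre-intervention window (the paper uses probabilistic global deep-learning forecasters instead; the toy forecaster and all names are ours):

```python
import numpy as np

def intervention_effect(pre, post, period=24):
    """Estimate an intervention's effect on demand by contrasting actual
    post-intervention observations with a counterfactual forecast. The forecaster
    here is just the average within-day profile of the pre-intervention window;
    `pre` and `post` must have lengths divisible by `period`."""
    profile = pre.reshape(-1, period).mean(axis=0)        # average daily shape
    counterfactual = np.tile(profile, len(post) // period)
    return post - counterfactual, counterfactual
```

Averaging the effect series by hour of day then reveals unevenness of the kind the abstract reports, e.g. overnight troughs dropping while peaks are unchanged.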
In this paper, we consider the forecast evaluation of realized volatility measures under cross-section dependence using equal predictive accuracy testing procedures. We evaluate the predictive accuracy of models based on an augmented cross-section when forecasting realized volatility. Under the null hypothesis of equal predictive accuracy, the benchmark model employed is the standard HAR model, while under the alternative of non-equal predictive accuracy, the forecast model is an augmented HAR model estimated via Lasso shrinkage. We study the sensitivity of the forecasts to model specification by incorporating a measurement error correction as well as cross-sectional jump component measures. The out-of-sample forecast evaluation of the models is assessed with numerical implementations.
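The benchmark HAR model referenced above regresses next-day realized volatility on its daily value and its weekly (5-day) and monthly (22-day) rolling means. A minimal numpy sketch of that benchmark only (the augmented, Lasso-estimated alternative is not shown; names are ours):

```python
import numpy as np

def har_design(rv):
    """HAR-RV regressors: daily value plus weekly (5-day) and monthly (22-day)
    rolling means of realized volatility, aligned to predict rv[t + 1]."""
    rows, target = [], []
    for t in range(21, len(rv) - 1):
        rows.append([1.0, rv[t], rv[t - 4 : t + 1].mean(), rv[t - 21 : t + 1].mean()])
        target.append(rv[t + 1])
    return np.array(rows), np.array(target)

def fit_har(rv):
    """OLS fit of the benchmark HAR model: rv[t+1] ~ daily + weekly + monthly."""
    X, y = har_design(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta        # [intercept, b_daily, b_weekly, b_monthly]
```

Equal-predictive-accuracy tests then compare out-of-sample losses of this benchmark against the augmented model on the same forecast dates.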
The dominant paradigm in statistical inference relies on an i.i.d. structure of the data from a hypothetical infinite population. Despite its success, this framework remains inflexible under complex data structures, even in settings where it is clear what the infinite population represents. In this paper, we explore an alternative framework in which the basis for inference is simply an invariance assumption on the model errors, such as exchangeability or sign symmetry. As a general solution to this invariant inference problem, we propose a randomization-based procedure. We prove general conditions for the asymptotic validity of this procedure and illustrate it on many data structures, including clustered errors in one-way and two-way layouts. We find that invariant inference via residual randomization has three appealing properties: (1) It is valid under weak and interpretable conditions, and can address problems with heavy-tailed data, limited clusters, and even some high-dimensional settings. (2) It is robust in finite samples because it does not rely on the regularity conditions required by classical asymptotics. (3) It solves the inference problem in a unified way that adapts to the data structure. In contrast, classical procedures such as OLS or the bootstrap presume the i.i.d. structure and need to be modified whenever the actual problem structure differs. This mismatch within the classical framework has led to a plethora of robust-error techniques and bootstrap variants, which frequently confuse applied research. We corroborate these findings with extensive empirical evaluations. Residual randomization compares favorably to many alternatives, including robust-error methods, bootstrap variants, and hierarchical models.
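One concrete instance of invariant inference is a sign-symmetry (sign-flip) residual randomization test for a single regression coefficient. A minimal sketch under the error sign-symmetry assumption; the names are ours and the paper covers many more invariance structures (e.g. within-cluster exchangeability):

```python
import numpy as np

def residual_randomization_pvalue(X, y, j, beta0=0.0, n_draws=2000, seed=0):
    """Residual randomization test of H0: beta_j = beta0, assuming the regression
    errors are sign-symmetric. Refit under H0, randomly flip residual signs to
    generate synthetic data sets, and recompute the test statistic on each."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Restricted fit: impose beta_j = beta0 by offsetting and dropping column j.
    X_rest = np.delete(X, j, axis=1)
    y_off = y - beta0 * X[:, j]
    b_rest, *_ = np.linalg.lstsq(X_rest, y_off, rcond=None)
    resid = y_off - X_rest @ b_rest

    def stat(e):
        y_syn = X_rest @ b_rest + beta0 * X[:, j] + e
        bb, *_ = np.linalg.lstsq(X, y_syn, rcond=None)
        return abs(bb[j] - beta0)

    t_obs = stat(resid)                                   # reproduces the actual data
    draws = [stat(rng.choice([-1.0, 1.0], size=n) * resid) for _ in range(n_draws)]
    return (1 + sum(d >= t_obs for d in draws)) / (1 + n_draws)
```

No moment conditions or homoskedasticity enter the validity argument; only the invariance of the errors under sign flips is used, which is what makes the approach robust to heavy tails.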
We discuss the fundamental issue of identification in linear instrumental variables (IV) models with unknown IV validity. We revisit the popular majority and plurality rules and show that no identification condition can be "if and only if" in general. Assuming the "sparsest rule", which is equivalent to the majority rule but becomes operational in computational algorithms, we investigate and prove the advantages of non-convex penalized approaches over other IV estimators based on two-step selections, in terms of selection consistency and accommodation for individually weak IVs. Furthermore, we propose a surrogate sparsest penalty that aligns with the identification condition and delivers oracle sparse structure simultaneously. Desirable theoretical properties are derived for the estimator with weak IVs, compared to the previous literature. Finite-sample properties are demonstrated using simulations, and the selection and estimation methods are applied to an empirical study concerning the effect of trade on economic growth.
We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Treating the high dimensional issue further leads us to augment an amenable penalty term to the target function. We propose to estimate the regression parameter through minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members of the subset. We examine the finite sample performance of the proposed tests by extensive simulation. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.
In a high dimensional linear predictive regression where the number of potential predictors can be larger than the sample size, we consider using LASSO, a popular L1-penalized regression method, to estimate the sparse coefficients when many unit root regressors are present. Consistency of LASSO relies on two building blocks: the deviation bound of the cross product of the regressors and the error term, and the restricted eigenvalue of the Gram matrix of the regressors. In our setting where unit root regressors are driven by temporal dependent non-Gaussian innovations, we establish original probabilistic bounds for these two building blocks. The bounds imply that the rates of convergence of LASSO are different from those in the familiar cross sectional case. In practical applications given a mixture of stationary and nonstationary predictors, asymptotic guarantee of LASSO is preserved if all predictors are scale-standardized. In an empirical example of forecasting the unemployment rate with many macroeconomic time series, strong performance is delivered by LASSO when the initial specification is guided by macroeconomic domain expertise.