智能论文笔记

Finite-Sample Guarantees for High-Dimensional DML

Victor Quintas-Martinez

分类：机器学习 | (统计)机器学习

2022-06-15

DECIASED机器学习（DML）提供了一种有吸引力的方法来估计观察环境中的治疗效果，在这种情况下，因果参数的识别需要有条件的独立性或不符的假设，因为它可以灵活地控制大量的协变量。本文提供了新的有限样本保证，可保证对高维DML的关节推断，从而界定了估计量的有限样本分布与其渐近高斯近似相距多远。这些保证对应用研究人员很有用，因为它们可以提供距离标称级别的联合置信带覆盖范围的距离。在许多情况下，高维因果参数可能引起人们的关注，例如许多治疗概况的吃量，或者在许多结果上进行治疗的食品。我们还涵盖了无限维度参数，例如对潜在结果的整个边际分布的影响。本文中的有限样本保证补充了DML估计量的一致性和渐近正态性的现有结果，DML估计量是渐近的，或仅处理一维情况。

translated by 谷歌翻译

Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

Nathan Kallus , Xiaojie Mao , Masatoshi Uehara

分类： (统计)机器学习 | 机器学习

2019-12-30

我们考虑在估计涉及依赖参数的高维滋扰的估计方程中估计一个低维参数。一个中心示例是因果推理中（局部）分位数处理效应（（L）QTE）的有效估计方程，涉及在分位数以估计的分位数评估的协方差累积分布函数。借记机学习（DML）是一种使用灵活的机器学习方法估算高维滋扰的数据分解方法，但是将其应用于参数依赖性滋扰的问题是不切实际的。对于（L）QTE，DML要求我们学习整个协变量累积分布函数。相反，我们提出了局部偏见的机器学习（LDML），该学习避免了这一繁重的步骤，并且只需要对参数进行一次初始粗糙猜测而估算烦恼。对于（L）QTE，LDML仅涉及学习两个回归功能，这是机器学习方法的标准任务。我们证明，在松弛速率条件下，我们的估计量与使用未知的真实滋扰的不可行的估计器具有相同的有利渐近行为。因此，LDML值得注意的是，当我们必须控制许多协变量和/或灵活的关系时，如（l）QTES在（（l）QTES）中，实际上可以有效地估算重要数量，例如（l）QTES。

translated by 谷歌翻译

Orthogonal Series Estimation for the Ratio of Conditional Expectation Functions

Kazuhiko Shinoda , Takahiro Hoshino

分类： (统计)机器学习

2022-12-26

In various fields of data science, researchers are often interested in estimating the ratio of conditional expectation functions (CEFR). Specifically in causal inference problems, it is sometimes natural to consider ratio-based treatment effects, such as odds ratios and hazard ratios, and even difference-based treatment effects are identified as CEFR in some empirically relevant settings. This chapter develops the general framework for estimation and inference on CEFR, which allows the use of flexible machine learning for infinite-dimensional nuisance parameters. In the first stage of the framework, the orthogonal signals are constructed using debiased machine learning techniques to mitigate the negative impacts of the regularization bias in the nuisance estimates on the target estimates. The signals are then combined with a novel series estimator tailored for CEFR. We derive the pointwise and uniform asymptotic results for estimation and inference on CEFR, including the validity of the Gaussian bootstrap, and provide low-level sufficient conditions to apply the proposed framework to some specific examples. We demonstrate the finite-sample performance of the series estimator constructed under the proposed framework by numerical simulations. Finally, we apply the proposed method to estimate the causal effect of the 401(k) program on household assets.

translated by 谷歌翻译

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

分类： (统计)机器学习

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

translated by 谷歌翻译

Estimation and Inference on Heterogeneous Treatment Effects in High-Dimensional Dynamic Panels under Weak Dependence

Vira Semenova , Matt Goldman , Victor Chernozhukov , Matt Taddy

分类： (统计)机器学习

2017-12-28

This paper provides estimation and inference methods for a conditional average treatment effects (CATE) characterized by a high-dimensional parameter in both homogeneous cross-sectional and unit-heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization, where we partial out the controls and unit effects from the outcome and the base treatment and take the cross-fitted residuals. This step uses a novel generic cross-fitting method we design for weakly dependent time series and panel data. This method "leaves out the neighbors" when fitting nuisance components, and we theoretically power it by using Strassen's coupling. As a result, we can rely on any modern machine learning method in the first step, provided it learns the residuals well enough. Second, we construct an orthogonal (or residual) learner of CATE -- the Lasso CATE -- that regresses the outcome residual on the vector of interactions of the residualized treatment with explanatory variables. If the complexity of CATE function is simpler than that of the first-stage regression, the orthogonal learner converges faster than the single-stage regression-based learner. Third, we perform simultaneous inference on parameters of the CATE function using debiasing. We also can use ordinary least squares in the last two steps when CATE is low-dimensional. In heterogeneous panel data settings, we model the unobserved unit heterogeneity as a weakly sparse deviation from Mundlak (1978)'s model of correlated unit effects as a linear function of time-invariant covariates and make use of L1-penalization to estimate these models. We demonstrate our methods by estimating price elasticities of groceries based on scanner data. We note that our results are new even for the cross-sectional (i.i.d) case.

translated by 谷歌翻译

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

分类： (统计)机器学习

2022-09-26

在因果推理和强盗文献中，基于观察数据的线性功能估算线性功能的问题是规范的。我们分析了首先估计治疗效果函数的广泛的两阶段程序，然后使用该数量来估计线性功能。我们证明了此类过程的均方误差上的非反应性上限：这些边界表明，为了获得非反应性最佳程序，应在特定加权$ l^2 $中最大程度地估算治疗效果的误差。 -规范。我们根据该加权规范的约束回归分析了两阶段的程序，并通过匹配非轴突局部局部最小值下限，在有限样品中建立了实例依赖性最优性。这些结果表明，除了取决于渐近效率方差之外，最佳的非质子风险除了取决于样本量支持的最富有函数类别的真实结果函数与其近似类别之间的加权规范距离。

translated by 谷歌翻译

Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach

Nathan Kallus , Xiaojie Mao , Masatoshi Uehara

分类： (统计)机器学习 | 机器学习

2021-03-25

当并非观察到所有混杂因子并获得负面对照时，我们研究因果参数的估计。最近的工作表明，这些方法如何通过两个所谓的桥梁函数来实现识别和有效估计。在本文中，我们使用阴性对照来应对因果推断的主要挑战：这些桥梁功能的识别和估计。先前的工作依赖于这些功能的完整性条件，以识别因果参数并在估计中需要进行独特性假设，并且还集中于桥梁函数的参数估计。相反，我们提供了一种新的识别策略，以避免完整性条件。而且，我们根据最小学习公式为这些功能提供新的估计量。这些估计值适合通用功能类别，例如重现Hilbert空间和神经网络。我们研究了有限样本收敛的结果，既可以估计桥梁功能本身，又要在各种假设组合下对因果参数进行最终估计。我们尽可能避免桥梁上的独特条件。

translated by 谷歌翻译

High-dimensional Inference for Dynamic Treatment Effects

Jelena Bradic , Weijie Ji , Yuqian Zhang

分类：机器学习 | (统计)机器学习

2021-10-10

本文提出了在多阶段实验的背景下的异质治疗效应的置信区间结构，以$ N $样品和高维，$ D $，混淆。我们的重点是$ d \ gg n $的情况，但获得的结果也适用于低维病例。我们展示了正则化估计的偏差，在高维变焦空间中不可避免，具有简单的双重稳固分数。通过这种方式，不需要额外的偏差，并且我们获得root $ N $推理结果，同时允许治疗和协变量的多级相互依赖性。记忆财产也没有假设;治疗可能取决于所有先前的治疗作业以及以前的所有多阶段混淆。我们的结果依赖于潜在依赖的某些稀疏假设。我们发现具有动态处理的强大推理所需的新产品率条件。

translated by 谷歌翻译

Risk bounds when learning infinitely many response functions by ordinary linear regression

Vincent Plassier , François Portier , Johan Segers

分类： (统计)机器学习 | 机器学习

2020-06-16

考虑基于相同的输入变量的同时学习大量响应函数的问题。训练数据包括从共同分布绘制的输入变量的单个独立随机样本以及相关的响应。将输入变量映射到称为特征空间的高维线性空间，并且响应函数被建模为映射特征的线性功能，通过普通最小二乘校准系数。我们通过在响应函数均匀地控制过度风险的收敛速度来提供最坏情况过度预测风险的收敛保证。允许特征图的尺寸倾向于与样本大小无穷大。响应功能的集合虽然可能是无限的，但应该具有有限的VAPNIK-Chervonenkis维度。在合理的计算时间内构建多个代理模型时，可以应用所派生的界限。

translated by 谷歌翻译

Adversarial Estimators

Jonas Metzger

分类：机器学习 | (统计)机器学习

2022-04-22

我们开发了对对抗估计量（“ A-估计器”）的渐近理论。它们将最大样品型估计量（“ M-估计器”）推广为平均目标，以通过某些参数最大化，而其他参数则最小化。该课程涵盖了瞬间的瞬间通用方法，生成的对抗网络以及机器学习和计量经济学方面的最新建议。在这些示例中，研究人员指出，原则上可以使用哪些方面进行估计，并且对手学习如何最佳地强调它们。我们在重点和部分识别下得出A估计剂的收敛速率，以及其参数功能的正态性。未知功能可以通过筛子（例如深神经网络）近似，我们为此提供简化的低级条件。作为推论，我们获得了神经网络估计剂的正态性，克服了文献先前确定的技术问题。我们的理论产生了有关各种A估计器的新成果，为它们在最近的应用中的成功提供了直觉和正式的理由。

translated by 谷歌翻译

Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

分类： (统计)机器学习

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.

translated by 谷歌翻译

Debiased Inference on Identified Linear Functionals of Underidentified Nuisances via Penalized Minimax Estimation

Nathan Kallus , Xiaojie Mao

分类： (统计)机器学习

2022-08-17

我们研究了对识别的非唯一麻烦的线性功能的通用推断，该功能定义为未识别条件矩限制的解决方案。这个问题出现在各种应用中，包括非参数仪器变量模型，未衡量的混杂性下的近端因果推断以及带有阴影变量的丢失 - 与随机数据。尽管感兴趣的线性功能（例如平均治疗效应）在适当的条件下是可以识别出的，但令人讨厌的非独家性对统计推断构成了严重的挑战，因为在这种情况下，常见的滋扰估计器可能是不稳定的，并且缺乏固定限制。在本文中，我们提出了对滋扰功能的受惩罚的最小估计器，并表明它们在这种挑战性的环境中有效推断。提出的滋扰估计器可以适应灵活的功能类别，重要的是，无论滋扰是否是唯一的，它们都可以融合到由惩罚确定的固定限制。我们使用受惩罚的滋扰估计器来形成有关感兴趣的线性功能的依据估计量，并在通用高级条件下证明其渐近正态性，这提供了渐近有效的置信区间。

translated by 谷歌翻译

Debiased Machine Learning of Set-Identified Linear Models

Vira Semenova

分类： (统计)机器学习 | 机器学习

2017-12-28

This paper provides estimation and inference methods for an identified set's boundary (i.e., support function) where the selection among a very large number of covariates is based on modern regularized tools. I characterize the boundary using a semiparametric moment equation. Combining Neyman-orthogonality and sample splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the boundary and propose a multiplier bootstrap procedure to conduct inference. I apply this result to the partially linear model, the partially linear IV model and the average partial derivative with an interval-valued outcome.

translated by 谷歌翻译

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

Masatoshi Uehara , Masaaki Imaizumi , Nan Jiang , Nathan Kallus , Wen Sun , Tengyang Xie

分类：机器学习 | (统计)机器学习

2021-02-05

我们在使用函数近似的情况下，在使用最小的Minimax方法估算这些功能时，使用功能近似来实现函数近似和$ q $ functions的理论表征。在各种可靠性和完整性假设的组合下，我们表明Minimax方法使我们能够实现重量和质量功能的快速收敛速度，其特征在于关键的不平等\ citep {bartlett2005}。基于此结果，我们分析了OPE的收敛速率。特别是，我们引入了新型的替代完整性条件，在该条件下，OPE是可行的，我们在非尾部环境中以一阶效率提出了第一个有限样本结果，即在领先期限中具有最小的系数。

translated by 谷歌翻译

A General Framework for Treatment Effect Estimation in Semi-Supervised and High Dimensional Settings

Abhishek Chakrabortty , Guorong Dai , Eric Tchetgen Tchetgen

分类： (统计)机器学习

2022-01-03

在本文中，我们的目标是提供对半监督（SS）因果推理的一般性和完全理解治疗效果。具体而言，我们考虑两个这样的估计值：（a）平均治疗效果和（b）定量处理效果，作为原型案例，在SS设置中，其特征在于两个可用的数据集：（i）标记的数据集大小$ N $，为响应和一组高维协变量以及二元治疗指标提供观察。（ii）一个未标记的数据集，大小超过$ n $，但未观察到的响应。使用这两个数据集，我们开发了一个SS估计系列，该系列是：（1）更强大，并且（2）比其监督对应力更高的基于标记的数据集。除了通过监督方法可以实现的“标准”双重稳健结果（在一致性方面），我们还在正确指定模型中的倾向得分，我们进一步建立了我们SS估计的根本-N一致性和渐近常态。没有需要涉及的特定形式的滋扰职能。这种改善的鲁棒性来自使用大规模未标记的数据，因此通常不能在纯粹监督的环境中获得。此外，只要正确指定所有滋扰函数，我们的估计值都显示为半参数效率。此外，作为滋扰估计器的说明，我们考虑逆概率加权型核平滑估计，涉及未知的协变量转换机制，并在高维情景新颖的情况下建立其统一的收敛速率，这应该是独立的兴趣。两种模拟和实际数据的数值结果验证了我们对其监督对应物的优势，了解鲁棒性和效率。

translated by 谷歌翻译

Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding

Jacob Dorn , Kevin Guo , Nathan Kallus

分类：机器学习 | (统计)机器学习

2021-12-21

在TAN（2006）边缘敏感模型下，在不观察到的混淆存在下构建平均处理效应的界限问题。结合涉及对冲倾向分数的现有表征具有对问题的新的分布稳健特征，我们提出了我们称之为“双重有效/双重尖锐”（DVD）估计的这些界限的新颖估算器。双重清晰度对应于DVD估计始终估计灵敏度模型所暗示的最有可能（即，夏普）的界限，即使当所有滋扰参数都适当一致时，即使在两个滋扰参数中的一个被击败并实现半污染参数之一。双倍有效性是部分识别的全新财产：DVD估计仍然提供有效，但即使在大多数滋扰参数都被遗漏时，仍然没有锐利。实际上，即使在DVDS点估计无法渐近正常的情况下，标准沃尔德置信区间也可能保持有效。在二进制结果的情况下，DVD估计是特别方便的并且在结果回归和倾向评分方面具有闭合形式的表达。我们展示了模拟研究中的DVD估计，以及对右心导管插入的案例研究。

translated by 谷歌翻译

Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

Vishal Gupta , Michael Huang , Paat Rusmevichientong

分类：机器学习 | (统计)机器学习

2021-07-26

由于在数据稀缺的设置中，交叉验证的性能不佳，我们提出了一个新颖的估计器，以估计数据驱动的优化策略的样本外部性能。我们的方法利用优化问题的灵敏度分析来估计梯度关于数据中噪声量的最佳客观值，并利用估计的梯度将策略的样本中的表现为依据。与交叉验证技术不同，我们的方法避免了为测试集牺牲数据，在训练和因此非常适合数据稀缺的设置时使用所有数据。我们证明了我们估计量的偏见和方差范围，这些问题与不确定的线性目标优化问题，但已知的，可能是非凸的，可行的区域。对于更专业的优化问题，从某种意义上说，可行区域“弱耦合”，我们证明结果更强。具体而言，我们在估算器的错误上提供明确的高概率界限，该估计器在策略类别上均匀地保持，并取决于问题的维度和策略类的复杂性。我们的边界表明，在轻度条件下，随着优化问题的尺寸的增长，我们的估计器的误差也会消失，即使可用数据的量仍然很小且恒定。说不同的是，我们证明我们的估计量在小型数据中的大规模政权中表现良好。最后，我们通过数值将我们提出的方法与最先进的方法进行比较，通过使用真实数据调度紧急医疗响应服务的案例研究。我们的方法提供了更准确的样本外部性能估计，并学习了表现更好的政策。

translated by 谷歌翻译

Neighborhood Adaptive Estimators for Causal Inference under Network Interference

Alexandre Belloni , Fei Fang , Alexander Volfovsky

分类： (统计)机器学习 | 机器学习

2022-12-07

Estimating causal effects has become an integral part of most applied fields. Solving these modern causal questions requires tackling violations of many classical causal assumptions. In this work we consider the violation of the classical no-interference assumption, meaning that the treatment of one individuals might affect the outcomes of another. To make interference tractable, we consider a known network that describes how interference may travel. However, unlike previous work in this area, the radius (and intensity) of the interference experienced by a unit is unknown and can depend on different sub-networks of those treated and untreated that are connected to this unit. We study estimators for the average direct treatment effect on the treated in such a setting. The proposed estimator builds upon a Lepski-like procedure that searches over the possible relevant radii and treatment assignment patterns. In contrast to previous work, the proposed procedure aims to approximate the relevant network interference patterns. We establish oracle inequalities and corresponding adaptive rates for the estimation of the interference function. We leverage such estimates to propose and analyze two estimators for the average direct treatment effect on the treated. We address several challenges steaming from the data-driven creation of the patterns (i.e. feature engineering) and the network dependence. In addition to rates of convergence, under mild regularity conditions, we show that one of the proposed estimators is asymptotically normal and unbiased.

translated by 谷歌翻译

Dimension-Free Average Treatment Effect Inference with Deep Neural Networks

Xinze Du , Yingying Fan , Jinchi Lv , Tianshu Sun , Patrick Vossler

分类： (统计)机器学习 | 机器学习

2021-12-02

本文研究了在潜在的结果框架中使用深神经网络（DNN）的平均治疗效果（ATE）的估计和推理。在一些规则性条件下，观察到的响应可以作为与混杂变量和治疗指标作为自变量的平均回归问题的响应。使用这种配方，我们研究了通过使用特定网络架构的DNN回归基于估计平均回归函数的两种尝试估计和推断方法。我们表明ATE的两个DNN估计在底层真正的均值回归模型上的一些假设下与无维一致性率一致。我们的模型假设可容纳观察到的协变量的潜在复杂的依赖结构，包括治疗指标和混淆变量之间的潜在因子和非线性相互作用。我们还基于采样分裂的思想，确保精确推理和不确定量化，建立了我们估计的渐近常态。仿真研究和实际数据应用证明了我们的理论调查结果，支持我们的DNN估计和推理方法。

translated by 谷歌翻译

Estimating Heterogeneous Bounds for Treatment Effects under Sample Selection and Non-response

Phillip Heiler

分类： (统计)机器学习

2022-09-09

在本文中，我们提出了一种非参数估计的方法，并推断了一般样本选择模型中因果效应参数的异质界限，初始治疗可能会影响干预后结果是否观察到。可观察到的协变量可能会混淆治疗选择，而观察结果和不可观察的结果可能会混淆。该方法提供条件效应界限作为策略相关的预处理变量的功能。它允许对身份不明的条件效应曲线进行有效的统计推断。我们使用灵活的半参数脱偏机学习方法，该方法可以适应柔性功能形式和治疗，选择和结果过程之间的高维混杂变量。还提供了易于验证的高级条件，以进行估计和错误指定的鲁棒推理保证。

translated by 谷歌翻译