Finding features relevant to the difference in treatment effects is essential to unveil the underlying causal mechanisms. Existing methods seek such features by measuring how greatly the feature attributes affect the degree of the {\it conditional average treatment effect} (CATE). However, these methods may overlook important features because CATE, a measure of the average treatment effect, cannot detect differences in distribution parameters other than the mean (e.g., variance). To resolve this weakness of existing methods, we propose a feature selection framework for discovering {\it distributional treatment effect modifiers}. We first formulate a feature importance measure that quantifies how strongly the feature attributes influence the discrepancy between potential outcome distributions. Then we derive its computationally efficient estimator and develop a feature selection algorithm that can control the type I error rate at the desired level. Experimental results show that our framework successfully discovers important features and outperforms the existing mean-based methods.
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
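The quadratic-time statistic mentioned above is straightforward to sketch. Below is a minimal NumPy implementation of the unbiased squared-MMD estimator with a Gaussian kernel; the bandwidth value is an arbitrary placeholder, not a recommendation from the paper:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    # Gaussian (RBF) kernel matrix between the rows of X and Y.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    # Unbiased quadratic-time estimate of squared MMD:
    # drop the diagonal (i == j) terms of the within-sample sums.
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()
```

The estimate is near zero when both samples come from the same distribution and strictly positive in expectation otherwise, which is what the tests above threshold against.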
When monitoring machine learning systems, two-sample tests of homogeneity form the foundation upon which existing approaches to drift detection are built. They are used to test for evidence that the distribution underlying recent deployment data differs from that underlying the historical reference data. Often, however, various factors such as time-induced correlation mean that batches of recent deployment data are not expected to form an i.i.d. sample from the historical data distribution. Instead we may wish to test for differences in the distributions conditional on \textit{context} that is permitted to change. To facilitate this we borrow machinery from the causal inference domain to develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects. We recommend a particular instantiation of the framework based on maximum conditional mean discrepancies. We then provide an empirical study demonstrating its effectiveness for various drift detection problems of practical interest, such as detecting drift in the distributions underlying subpopulations of data in a manner that is insensitive to their respective prevalences. The study additionally demonstrates applicability to ImageNet-scale vision problems.
There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic as to how the prioritization rules were derived, and only assess them based on how well they identify the units that benefit most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in a wide variety of randomized and observational study settings. We provide justification for the use of bootstrapped confidence intervals, as well as a framework for testing hypotheses about heterogeneity in treatment effects correlated with a prioritization rule. Our definition of the RATE nests a number of existing metrics, including the Qini coefficient, and our analysis directly yields inference methods for these metrics. We demonstrate our approach in examples drawn from both personalized medicine and marketing. In the medical setting, using data from the SPRINT and ACCORD-BP randomized control trials, we find no clear evidence of heterogeneous treatment effects. On the other hand, in a large marketing trial, we find strong evidence of heterogeneity in the treatment effects of some digital advertising campaigns, and demonstrate how RATEs can be used to compare targeting rules that prioritize on estimated risk against those that prioritize on estimated treatment benefit.
Causal inference grows increasingly complex as the number of confounders increases. Given treatments $X$, confounders $Z$ and outcomes $Y$, we develop a non-parametric method to test the \textit{do-null} hypothesis $H_0:\; p(y|\text{\it do}(X=x))=p(y)$ against the general alternative. Building on the Hilbert Schmidt Independence Criterion (HSIC) for marginal independence testing, we propose backdoor-HSIC (bd-HSIC) and demonstrate that it is calibrated and has power for both binary and continuous treatments under a large number of confounders. Additionally, we establish convergence properties of the estimators of covariance operators used in bd-HSIC. We investigate the advantages and disadvantages of bd-HSIC against parametric tests, as well as the importance of using the do-null testing in contrast to marginal independence testing or conditional independence testing. A complete implementation is available at \hyperlink{https://github.com/mrhuff/kgformula}{\texttt{https://github.com/mrhuff/kgformula}}.
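The marginal-independence building block that bd-HSIC extends is easy to write down. Here is a minimal NumPy sketch of the biased (V-statistic) HSIC estimate with Gaussian kernels; it is not the authors' implementation, which additionally reweights samples to adjust for confounding:

```python
import numpy as np

def rbf_kernel(A, bandwidth=1.0):
    # Gaussian kernel matrix on the rows of A.
    sq = np.sum(A**2, 1)[:, None] + np.sum(A**2, 1)[None, :] - 2 * A @ A.T
    return np.exp(-sq / (2 * bandwidth**2))

def hsic(X, Y, bandwidth=1.0):
    # Biased V-statistic estimate: HSIC = trace(K H L H) / n^2,
    # where H centres each kernel matrix.
    n = len(X)
    K = rbf_kernel(X, bandwidth)
    L = rbf_kernel(Y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2
```

Under independence the statistic concentrates near zero at rate $O(1/n)$; dependence drives it away from zero.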
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
There is intense interest in applying machine learning to problems of causal inference in fields such as healthcare, economics and education. In particular, individual-level causal inference has important applications such as precision medicine. We give a new theoretical analysis and family of algorithms for predicting individual treatment effect (ITE) from observational data, under the assumption known as strong ignorability. The algorithms learn a "balanced" representation such that the induced treated and control distributions look similar. We give a novel, simple and intuitive generalization-error bound showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization-error of that representation and the distance between the treated and control distributions induced by the representation. We use Integral Probability Metrics to measure distances between distributions, deriving explicit bounds for the Wasserstein and Maximum Mean Discrepancy (MMD) distances. Experiments on real and simulated data show the new algorithms match or outperform the state-of-the-art.
In this article, we study the problem of high-dimensional conditional independence testing, a key building block in statistics and machine learning. We propose an inferential procedure based on double generative adversarial networks (GANs). Specifically, we first introduce a double GANs framework to learn two generators of the conditional distributions. We then integrate the two generators to construct a test statistic, which takes the form of the maximum of generalized covariance measures of multiple transformation functions. We also employ data-splitting and cross-fitting to minimize the conditions on the generators to achieve the desired asymptotic properties, and employ the multiplier bootstrap to obtain the corresponding $p$-value. We show that the constructed test statistic is doubly robust, and the resulting test both controls the type-I error and has the power approaching one asymptotically. Notably, we establish these theoretical guarantees under much weaker and practically more feasible conditions compared to the existing tests, and our proposal gives a concrete example of how to utilize some state-of-the-art deep learning tools, such as GANs, to help address a classical but challenging statistical problem. We demonstrate the efficacy of our test through both simulations and an application to an anti-cancer drug dataset. A Python implementation of the proposed procedure is available at https://github.com/tianlinxu312/dgcit.
We propose a novel nonparametric two-sample test based on the Maximum Mean Discrepancy (MMD), which is constructed by aggregating tests with different kernel bandwidths. This aggregation procedure, called MMDAgg, ensures that test power is maximised over the collection of kernels used, without requiring held-out data for kernel selection (which results in a loss of test power), or arbitrary kernel choices such as the median heuristic. We work in the non-asymptotic framework, and prove that our aggregated test is minimax adaptive over Sobolev balls. Our guarantees are not restricted to a specific kernel, but hold for any product of one-dimensional translation invariant characteristic kernels which are absolutely integrable. Moreover, our results apply to popular numerical procedures to determine the test threshold, namely permutations and the wild bootstrap. Through numerical experiments on both synthetic and real-world datasets, we demonstrate that MMDAgg outperforms alternative state-of-the-art approaches to MMD kernel adaptation for two-sample testing.
We propose and analyse a novel statistical procedure, coined AgraSSt, to assess the quality of graph generators that may not be available in explicit form. In particular, AgraSSt can be used to determine whether a learnt graph generating process is capable of generating graphs that resemble a given input graph. Inspired by Stein operators for random graphs, the key idea of AgraSSt is the construction of a kernel discrepancy based on an operator obtained from the graph generator. AgraSSt can provide interpretable criticisms for graph generator training procedures and help identify reliable sample batches for downstream tasks. Using Stein's method, we give theoretical guarantees for a broad class of random graph models. We provide empirical results on both synthetic input graphs with known graph generation procedures, and real-world input graphs that state-of-the-art (deep) generative models for graphs are trained on.
Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$, having observed covariates $z$. CRTs require fitting a large number of predictive models, which is often computationally intractable. Existing solutions to reduce the cost of CRTs typically split the dataset into a train and a test portion, or rely on heuristics for interactions, both of which lead to a loss in power. We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues by leveraging marginal independence statistics to test conditional independence relationships. DIET tests the marginal independence of two random variables: $F(x \mid z)$ and $F(y \mid z)$, where $F(\cdot \mid z)$ is a conditional cumulative distribution function (CDF). These variables are termed "information residuals". We give sufficient conditions for DIET to achieve finite-sample type-1 error control and power greater than the type-1 error rate. We then prove that when using the mutual information between the information residuals as a test statistic, DIET yields the most powerful conditionally valid test. Finally, we show DIET achieves higher power than other tractable CRTs on several synthetic and real benchmarks.
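The core idea, replacing a conditional test by a marginal test on information residuals, can be sketched under a linear-Gaussian working model. This is a hypothetical toy setup for illustration only; the paper's estimator learns the conditional CDFs flexibly rather than assuming linearity:

```python
import math
import numpy as np

def gaussian_cdf_residual(v, z):
    # Fit a working model v = a*z + b + Gaussian noise, then return the
    # information residual F(v | z) = Phi((v - a*z - b) / sigma).
    a, b = np.polyfit(z, v, 1)
    resid = v - (a * z + b)
    sigma = resid.std()
    phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2))))
    return phi(resid / sigma)

rng = np.random.default_rng(2)
z = rng.normal(size=2000)
x = z + 0.5 * rng.normal(size=2000)   # x depends on z
y = z + 0.5 * rng.normal(size=2000)   # y depends on z, but x indep. of y given z
raw = np.corrcoef(x, y)[0, 1]         # strong marginal dependence through z
rx, ry = gaussian_cdf_residual(x, z), gaussian_cdf_residual(y, z)
residual = np.corrcoef(rx, ry)[0, 1]  # near zero once z is accounted for
```

The raw correlation between $x$ and $y$ is large, but the correlation between their information residuals is close to zero, reflecting the conditional independence that DIET tests for.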
Although understanding and characterizing causal effects have become essential in observational studies, it is challenging when the confounders are high-dimensional. In this article, we develop a general framework $\textit{CausalEGM}$ for estimating causal effects by encoding generative modeling, which can be applied in both binary and continuous treatment settings. Under the potential outcome framework with unconfoundedness, we establish a bidirectional transformation between the high-dimensional confounders space and a low-dimensional latent space where the density is known (e.g., multivariate normal distribution). Through this, CausalEGM simultaneously decouples the dependencies of confounders on both treatment and outcome and maps the confounders to the low-dimensional latent space. By conditioning on the low-dimensional latent features, CausalEGM can estimate the causal effect for each individual or the average causal effect within a population. Our theoretical analysis shows that the excess risk for CausalEGM can be bounded through empirical process theory. Under an assumption on encoder-decoder networks, the consistency of the estimate can be guaranteed. In a series of experiments, CausalEGM demonstrates superior performance over existing methods for both binary and continuous treatments. Specifically, we find CausalEGM to be substantially more powerful than competing methods in the presence of large sample sizes and high dimensional confounders. The software of CausalEGM is freely available at https://github.com/SUwonglab/CausalEGM.
We propose a series of computationally efficient, nonparametric tests for the two-sample, independence and goodness-of-fit problems, using the Maximum Mean Discrepancy (MMD), Hilbert Schmidt Independence Criterion (HSIC), and Kernel Stein Discrepancy (KSD), respectively. Our test statistics are incomplete $U$-statistics, with a computational cost that interpolates between linear time in the number of samples, and quadratic time, as associated with classical $U$-statistic tests. The three proposed tests aggregate over several kernel bandwidths to detect departures from the null on various scales: we call the resulting tests MMDAggInc, HSICAggInc and KSDAggInc. For the test thresholds, we derive a quantile bound for wild bootstrapped incomplete $U$-statistics, which is of independent interest. We derive uniform separation rates for MMDAggInc and HSICAggInc, and quantify exactly the trade-off between computational efficiency and the attainable rates: to our knowledge, this result is the first of its kind for tests based on incomplete $U$-statistics. We further show that in the quadratic-time case, the wild bootstrap incurs no penalty to test power over the more widespread permutation-based approach, since both attain the same minimax optimal rates (which in turn match the rates that use oracle quantiles). We support our claims with numerical experiments on the trade-off between computational efficiency and test power. In the three testing frameworks, we observe that our proposed linear-time aggregated tests obtain higher power than current state-of-the-art linear-time kernel tests.
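The cheapest point on that linear-to-quadratic interpolation is the classical linear-time MMD estimate over disjoint sample pairs, which each sample contributes to exactly once. The sketch below is of that classical estimator only, not the aggregated incomplete-$U$-statistic tests proposed here:

```python
import numpy as np

def k(a, b, bandwidth=1.0):
    # Gaussian kernel evaluated between paired rows of a and b.
    return np.exp(-np.sum((a - b) ** 2, 1) / (2 * bandwidth**2))

def mmd2_linear(X, Y, bandwidth=1.0):
    # Linear-time MMD^2: average the core function h over disjoint pairs
    # (x_{2i}, y_{2i}, x_{2i+1}, y_{2i+1}).
    m = (min(len(X), len(Y)) // 2) * 2
    x1, x2 = X[0:m:2], X[1:m:2]
    y1, y2 = Y[0:m:2], Y[1:m:2]
    h = (k(x1, x2, bandwidth) + k(y1, y2, bandwidth)
         - k(x1, y2, bandwidth) - k(x2, y1, bandwidth))
    return h.mean()
```

The price of linear time is variance: each pair is used only once, so more samples are needed for the same power, which is exactly the trade-off the incomplete-$U$-statistic framework quantifies.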
The kernel Maximum Mean Discrepancy~(MMD) is a popular multivariate distance metric between distributions that has found utility in two-sample testing. The usual kernel-MMD test statistic is a degenerate U-statistic under the null, and thus it has an intractable limiting distribution. Hence, to design a level-$\alpha$ test, one usually selects the rejection threshold as the $(1-\alpha)$-quantile of the permutation distribution. The resulting nonparametric test has finite-sample validity but suffers from large computational cost, since every permutation takes quadratic time. We propose the cross-MMD, a new quadratic-time MMD test statistic based on sample-splitting and studentization. We prove that under mild assumptions, the cross-MMD has a limiting standard Gaussian distribution under the null. Importantly, we also show that the resulting test is consistent against any fixed alternative, and when using the Gaussian kernel, it has minimax rate-optimal power against local alternatives. For large sample sizes, our new cross-MMD provides a significant speedup over the MMD, for only a slight loss in power.
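A minimal sketch of the sample-splitting-plus-studentization idea, for equal sample sizes, is given below. This is an illustrative reading of the construction rather than the authors' code; the bandwidth is a placeholder:

```python
import numpy as np

def rbf(A, B, bandwidth=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bandwidth**2))

def cross_mmd(X, Y, bandwidth=1.0):
    # Split each sample in half; evaluate the witness built from the
    # second halves at the first-half points, then studentize.
    n = min(len(X), len(Y)) // 2
    X1, X2, Y1, Y2 = X[:n], X[n:2 * n], Y[:n], Y[n:2 * n]
    # u_i = g(x_i) - g(y_i), where g(z) = mean_j k(z, x2_j) - mean_j k(z, y2_j)
    u = (rbf(X1, X2, bandwidth).mean(1) - rbf(X1, Y2, bandwidth).mean(1)
         - rbf(Y1, X2, bandwidth).mean(1) + rbf(Y1, Y2, bandwidth).mean(1))
    # Approximately standard Gaussian under the null.
    return np.sqrt(n) * u.mean() / u.std()
```

Because the statistic is asymptotically standard Gaussian under the null, the rejection threshold is a fixed Gaussian quantile, avoiding the quadratic-time permutations the abstract describes.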
Causal inference is the process of using assumptions, study designs, and estimation strategies to draw conclusions about the causal relationships between variables based on data. This allows researchers to better understand the underlying mechanisms at work in complex systems and make more informed decisions. In many settings, we may not fully observe all the confounders that affect both the treatment and outcome variables, complicating the estimation of causal effects. To address this problem, a growing literature in both causal inference and machine learning proposes to use Instrumental Variables (IV). This paper serves as the first effort to systematically and comprehensively introduce and discuss the IV methods and their applications in both causal inference and machine learning. First, we provide the formal definition of IVs and discuss the identification problem of IV regression methods under different assumptions. Second, we categorize the existing work on IV methods into three streams according to the focus of the proposed methods, including two-stage least squares with IVs, control function with IVs, and evaluation of IVs. For each stream, we present both the classical causal inference methods, and recent developments in the machine learning literature. Then, we introduce a variety of applications of IV methods in real-world scenarios and provide a summary of the available datasets and algorithms. Finally, we summarize the literature, discuss the open problems and suggest promising future research directions for IV methods and their applications. We also develop a toolkit of the IV methods reviewed in this survey at https://github.com/causal-machine-learning-lab/mliv.
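The first of those streams, two-stage least squares, can be sketched on simulated confounded data. The coefficients below are made up for illustration and unrelated to the surveyed toolkit:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
u = rng.normal(size=n)   # unobserved confounder
w = rng.normal(size=n)   # instrument: affects x, but y only through x
x = 1.0 * w + 1.0 * u + rng.normal(size=n)
y = 2.0 * x + 1.0 * u + rng.normal(size=n)   # true causal effect of x on y is 2

# Naive OLS of y on x is biased because u drives both x and y.
beta_ols = np.cov(x, y)[0, 1] / np.var(x)

# Stage 1: regress x on the instrument w; Stage 2: regress y on fitted x.
x_hat = w * (np.cov(w, x)[0, 1] / np.var(w))
beta_2sls = np.cov(x_hat, y)[0, 1] / np.var(x_hat)
```

On this simulation, `beta_ols` is pulled away from the true effect by the confounder, while `beta_2sls` recovers it, which is precisely the identification argument the survey formalizes.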
We propose a new measure of conditional dependence and a statistical test for conditional independence. The measure is based on the difference between analytic kernel embeddings of two well-suited distributions evaluated at a finite set of locations. We obtain its asymptotic distribution under the null hypothesis of conditional independence and design a consistent statistical test from it. We conduct a series of experiments showing that our new test outperforms state-of-the-art methods in terms of both type-I and type-II errors, even in the high-dimensional setting.
As causal inference becomes more widespread the importance of having good tools to test for causal effects increases. In this work we focus on the problem of testing for causal effects that manifest in a difference in distribution for treatment and control. We build on work applying kernel methods to causality, considering the previously introduced Counterfactual Mean Embedding framework (\textsc{CfME}). We improve on this by proposing the \emph{Doubly Robust Counterfactual Mean Embedding} (\textsc{DR-CfME}), which has better theoretical properties than its predecessor by leveraging semiparametric theory. This leads us to propose new kernel based test statistics for distributional effects which are based upon doubly robust estimators of treatment effects. We propose two test statistics, one which is a direct improvement on previous work and one which can be applied even when the support of the treatment arm is a subset of that of the control arm. We demonstrate the validity of our methods on simulated and real-world data, as well as giving an application in off-policy evaluation.
Randomized Controlled Trials (RCTs) represent the gold standard when developing policy guidelines. However, RCTs are often narrow, and lack data on broader populations of interest. Causal effects in these populations are often estimated using observational datasets, which may suffer from unobserved confounding and selection bias. Given a set of observational estimates (e.g., from multiple studies), we propose a meta-algorithm that attempts to reject observational estimates that are biased. We do so using validation effects, causal effects that can be inferred from both RCT and observational data. After rejecting estimators that do not pass this test, we generate conservative confidence intervals on the extrapolated causal effects for subgroups not observed in the RCT. Under the assumption that at least one observational estimator is asymptotically normal and consistent for both the validation and extrapolated effects, we provide guarantees on the coverage probability of the intervals output by our algorithm. To facilitate hypothesis testing in settings where causal effects must be transported across datasets, we give conditions under which a doubly-robust estimator of group average treatment effects is asymptotically normal, even when flexible machine learning methods are used for estimation of nuisance parameters. We illustrate the properties of our approach on semi-synthetic and real-world datasets, and show that it compares favorably to standard meta-analysis techniques.
What is the ideal regression (if any) for estimating average causal effects? We study this question in the setting of discrete covariates, deriving expressions for the finite-sample variance of various stratification estimators. This approach clarifies the fundamental statistical phenomena underlying many widely cited results. Our exposition combines insights from three distinct methodological traditions for studying causal effect estimation: potential outcomes, causal diagrams, and structural models with additive errors.
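For a discrete covariate, the simplest stratification estimator averages the within-stratum treated-minus-control differences in means, weighted by stratum size. The sketch below uses made-up data and illustrates standard stratification only, not the paper's variance derivations:

```python
import numpy as np

def stratified_ate(y, t, s):
    # Average, over strata s, of the treated-minus-control difference
    # in mean outcomes y, weighted by stratum share; t is the 0/1
    # treatment indicator.
    total = 0.0
    for level in np.unique(s):
        m = s == level
        diff = y[m & (t == 1)].mean() - y[m & (t == 0)].mean()
        total += m.mean() * diff
    return total
```

When the treatment effect is constant across strata, the estimator recovers it exactly regardless of how outcome baselines differ between strata.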
We propose kernel ridge regression estimators for mediation analysis and dynamic treatment effects. We allow treatments, covariates, and mediators to be discrete or continuous, and low, high, or infinite dimensional. We propose estimators of means, increments, and distributions of counterfactual outcomes with closed-form solutions in terms of kernel matrix operations. For the continuous treatment case, we prove uniform consistency with finite sample rates. For the discrete treatment case, we prove root-n consistency, Gaussian approximation, and semiparametric efficiency. We conduct simulations, then estimate mediated and dynamic treatment effects of the US Job Corps program for disadvantaged youth.