We introduce the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional independence for multivariate continuous-valued variables. CIRCE applies as a regularizer in settings where we wish to learn neural features $\varphi(X)$ of data $X$ to estimate a target $Y$, while being conditionally independent of a distractor $Z$ given $Y$. Both $Z$ and $Y$ are assumed to be continuous-valued but relatively low dimensional, whereas $X$ and its features may be complex and high dimensional. Relevant settings include domain-invariant learning, fairness, and causal learning. The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance. It is then only necessary to enforce independence of $\varphi(X)$ from residuals of this regression, which is possible with attractive estimation properties and consistency guarantees. By contrast, earlier measures of conditional feature dependence require multiple regressions for each step of feature learning, resulting in more severe bias and variance, and greater computational cost. When sufficiently rich features are used, we establish that CIRCE is zero if and only if $\varphi(X) \perp \!\!\! \perp Z \mid Y$. In experiments, we show superior performance to previous methods on challenging benchmarks, including learning conditionally invariant image features.
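A schematic numpy sketch of the two-step recipe described above (this is not the authors' released implementation; the kernel choices, regularization, toy data, and all names are illustrative): first ridge-regress kernel features of $Z$ on $Y$, once, then penalize an HSIC-style cross-covariance between candidate features and the regression residuals.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
n, lam = 400, 1e-2
Y = rng.normal(size=(n, 1))
Z = Y + 0.3 * rng.normal(size=(n, 1))

# Step 1 (done once, before any feature learning): kernel ridge regression
# from Y onto kernel features of Z. S maps Z-features to their fitted
# values, so R = I - S produces the regression residuals.
Ky, Kz = rbf(Y, Y), rbf(Z, Z)
S = Ky @ np.linalg.solve(Ky + n * lam * np.eye(n), np.eye(n))
R = np.eye(n) - S
Kz_res = R @ Kz @ R.T        # kernel matrix of the Z-feature residuals

def circe_stat(feats):
    # HSIC-style squared cross-covariance between candidate features of X
    # and the Z residuals; it should be small when feats are conditionally
    # independent of Z given Y.
    H = np.eye(n) - np.ones((n, n)) / n
    Kf = rbf(feats, feats)
    return np.trace(Kf @ H @ Kz_res @ H) / n**2

# Features that depend on Z only through Y, versus features that leak
# information about Z beyond Y.
good = circe_stat(Y + 0.3 * rng.normal(size=(n, 1)))
bad = circe_stat(Z + 0.1 * rng.normal(size=(n, 1)))
```

The penalty for the conditionally independent features should be close to zero, while features that leak $Z$ beyond $Y$ incur a larger penalty.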
translated by Google Translate
We propose a method for learning predictors that are invariant under counterfactual changes of certain covariates. This method is useful when the prediction target is affected by covariates that should not influence the predictor's output. For instance, an object recognition model may be affected by the position, orientation, or scale of the object itself. We address the problem of training predictors that are explicitly counterfactually invariant to changes of such covariates. We propose a model-agnostic regularization term based on conditional kernel mean embeddings to enforce counterfactual invariance during training. We prove the soundness of our approach, which can handle mixed categorical and continuous multivariate attributes. Empirical results on synthetic and real-world data demonstrate the efficacy of our method in a variety of settings.
We address the problem of causal effect estimation in the presence of unobserved confounding, where proxies for the latent confounder are observed. In this setting, we propose two kernel-based methods for nonlinear causal effect estimation: (a) a two-stage regression approach, and (b) a maximum moment restriction approach. We focus on the proximal causal learning setting, but our methods can be used to solve a wider class of inverse problems characterised by a Fredholm integral equation. In particular, we provide a unifying view of two-stage and moment restriction approaches for solving this problem in a nonlinear setting. We provide consistency guarantees for each algorithm, and demonstrate that these approaches achieve competitive results on synthetic data and on data simulating a real-world task. In particular, our approach outperforms earlier methods that are not suited to leveraging proxy variables.
Causal inference grows increasingly complex as the number of confounders increases. Given a treatment $X$, confounders $Z$, and an outcome $Y$, we develop a nonparametric method to test the \textit{do-null} hypothesis $H_0: p(y \mid \text{do}(X=x)) = p(y)$ against the general alternative. Building on the Hilbert-Schmidt Independence Criterion (HSIC) for marginal independence testing, we propose backdoor-HSIC (bd-HSIC) and demonstrate that it is calibrated and has power under a large number of confounders, for both binary and continuous treatments. Additionally, we establish convergence properties of the estimators of the covariance operators used in bd-HSIC. We investigate the advantages and disadvantages of bd-HSIC relative to parametric tests, as well as the importance of testing the do-null in contrast to marginal or conditional independence testing. A complete implementation is available at \hyperlink{https://github.com/mrhuff/kgformula}{\texttt{https://github.com/mrhuff/kgformula}}.
We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of $Y$ given $X$ into a target reproducing kernel Hilbert space $\mathcal{H}_Y$. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal $O(\log n / n)$ rate without assuming $\mathcal{H}_Y$ to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.
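To make the object concrete, a minimal sketch of the empirical CME estimator via kernel ridge regression, used here to take a conditional expectation of a function of $Y$ (the toy data, kernel, and names are illustrative, and the interpolation-space subtleties analyzed in the paper are ignored):

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(2)
n, lam = 500, 1e-3
X = rng.uniform(-3, 3, size=(n, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# The empirical CME at a point x is sum_i beta_i(x) phi(y_i), with
# beta(x) = (K_XX + n*lam*I)^{-1} k_X(x).
K = rbf(X, X)
alpha = np.linalg.solve(K + n * lam * np.eye(n), np.eye(n))

def cond_expect(f_vals, x):
    # Estimate E[f(Y) | X = x] by pairing f(y_i) with the CME weights.
    beta = alpha @ rbf(X, np.atleast_2d(x))[:, 0]
    return f_vals @ beta

est = cond_expect(Y, [1.0])   # should be close to sin(1) ≈ 0.84
```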
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
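As a concrete reference point, a minimal sketch of the biased quadratic-time estimate of the squared MMD with a Gaussian kernel (the kernel choice, bandwidth, and toy data are illustrative):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased quadratic-time estimate of squared MMD between samples X and Y.
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
# Near zero for two samples from the same distribution; much larger
# when the distributions differ.
same = mmd2_biased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_biased(rng.normal(size=(200, 2)),
                   rng.normal(3.0, 1.0, size=(200, 2)))
```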
Negative controls are a strategy for learning the causal relation between treatment and outcome in the presence of unmeasured confounding. The treatment effect can nonetheless be identified if two auxiliary variables are available: a negative control treatment (which has no effect on the actual outcome), and a negative control outcome (which is not affected by the actual treatment). These auxiliary variables can also be viewed as proxies for a traditional set of control variables, and they bear a resemblance to instrumental variables. I propose a family of algorithms based on kernel ridge regression for learning nonparametric treatment effects with negative controls. Examples include dose response curves, dose response curves under distribution shift, and heterogeneous treatment effects. Data may be discrete or continuous, and low, high, or infinite dimensional. I prove uniform consistency and provide finite sample rates of convergence. Using a data set of singleton births in Pennsylvania between 1989 and 1991, I estimate the dose response curve of smoking on infant birth weight, adjusting for unobserved confounding.
We propose estimators based on kernel ridge regression for nonparametric structural functions (also known as dose response curves) and semiparametric treatment effects. Treatments and covariates may be discrete or continuous, and low, high, or infinite dimensional. Unlike other machine learning paradigms, causal estimation and inference reduce to combinations of kernel ridge regressions, which have closed form solutions and are easily computed by matrix operations. This computational simplicity allows us to extend the framework in two directions: from means to increments and distributions of counterfactual outcomes; and from parameters of the full population to parameters of subpopulations and alternative populations. For structural functions, we prove uniform consistency with finite sample rates. For treatment effects, we prove $\sqrt{n}$ consistency, Gaussian approximation, and semiparametric efficiency via a new double spectral robustness property. We conduct simulations and estimate average, heterogeneous, and incremental structural functions of the US Job Corps training program.
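A minimal illustration of the computational pattern emphasized above (causal quantities as compositions of closed-form kernel ridge regressions) is the average dose response $\theta(d) = \frac{1}{n}\sum_i \hat{f}(d, x_i)$, where $\hat{f}$ is a kernel ridge regression of the outcome on treatment and covariates. This is a toy sketch under a simple confounded linear design, not the paper's full estimator; all names are illustrative.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    # Gaussian kernel matrix between 1-d arrays a and b.
    sq = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(3)
n, lam = 500, 1e-3
X = rng.normal(size=n)                  # covariate
D = 0.5 * X + rng.normal(size=n)        # treatment, confounded by X
Y = 2.0 * D + X + 0.1 * rng.normal(size=n)

# Product kernel on (treatment, covariate); ridge weights in closed form.
K = rbf(D, D) * rbf(X, X)
alpha = np.linalg.solve(K + n * lam * np.eye(n), Y)

def dose_response(d):
    # theta(d): average the fitted surface f(d, x_i) over the empirical
    # covariate distribution.
    Kd = rbf(np.full(n, d), D) * rbf(X, X)   # rows: (d, x_i) vs training pairs
    return np.mean(Kd @ alpha)
```

Here the true dose response is $\theta(d) = 2d$, so `dose_response(1.0)` should be close to 2 despite the confounding.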
I propose kernel ridge regression estimators for long term causal inference, where a short term experimental data set containing randomized treatment and short term surrogates is fused with a long term observational data set containing short term surrogates and long term outcomes. I propose estimators of treatment effects, dose responses, and counterfactual distributions with closed form solutions in terms of kernel matrix operations. I allow covariates, treatment, and surrogates to be discrete or continuous, and low, high, or infinite dimensional. For long term treatment effects, I prove $\sqrt{n}$ consistency, Gaussian approximation, and semiparametric efficiency. For long term dose responses, I prove uniform consistency with finite sample rates. For long term counterfactual distributions, I prove convergence in distribution.
Recent years have witnessed growing interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work, we present a novel quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed method by establishing minimax optimal contraction rates in $L_2$ and Sobolev norms, and by discussing the frequentist validity of credible balls. We further derive a scalable inference algorithm that can be extended to work with wide neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.
We study a nonparametric approach to Bayesian computation via feature means, in which the expectation of prior features is updated to yield expected posterior features, based on regression from learned neural net or kernel features of the observations. All quantities involved in the Bayesian update are learned from observed data, making the method entirely model-free. The resulting algorithm is a novel instance of the kernel Bayes' rule (KBR), based on importance weighting. This results in superior numerical stability over the original approach to KBR, which requires operator inversion. We show convergence of the estimator using a novel consistency analysis of importance weighted estimators in the infinity norm. We evaluate KBR on challenging synthetic benchmarks, including a filtering problem with a state-space model involving high dimensional image observations. The importance weighted KBR yields uniformly better empirical performance than the original KBR, and competitive performance with other methods.
We introduce a new framework for nonlinear sufficient dimension reduction in which both the predictor and the response are distributional data, modeled as members of a metric space. Our key step in achieving nonlinear sufficient dimension reduction is to build universal kernels on the metric spaces, which yields reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we construct the universal kernel using the well-known quantile representation of the Wasserstein distance. For multivariate distributions, we resort to the recently developed sliced Wasserstein distance for this purpose. Since the sliced Wasserstein distance can be computed via quantile representations of univariate Wasserstein distances, the computation of the multivariate Wasserstein distance remains at a manageable level. The method is applied to several data sets, including fertility and mortality distribution data and Calgary temperature data.
Many problems in causal inference and economics can be formulated in the framework of conditional moment models, which characterize the target function through a collection of conditional moment restrictions. For nonparametric conditional moment models, efficient estimation often relies on preimposed conditions on various measures of ill-posedness of the hypothesis space, which are hard to validate when flexible models are used. In this work, we address this issue by proposing a procedure that automatically learns representations with controlled measures of ill-posedness. Our method approximates a linear representation defined by the spectral decomposition of a conditional expectation operator, which can be used for kernelized estimators and is known to facilitate minimax optimal estimation in certain settings. We show this representation can be efficiently estimated from data, and establish L2 consistency for the resulting estimator. We evaluate the proposed method on proximal causal inference tasks, exhibiting promising performance on high-dimensional, semi-synthetic data.
We propose kernel ridge regression estimators for mediation analysis and dynamic treatment effects. We allow treatments, covariates, and mediators to be discrete or continuous, and low, high, or infinite dimensional. We propose estimators of means, increments, and distributions with closed form solutions in terms of kernel matrix operations. For the continuous treatment case, we prove uniform consistency with finite sample rates. For the discrete treatment case, we prove root-n consistency, Gaussian approximation, and semiparametric efficiency. We conduct simulations, then estimate mediated and dynamic treatment effects of the US Job Corps program for disadvantaged youth.
Many applications of representation learning, such as privacy preservation, algorithmic fairness, and domain adaptation, desire explicit control over semantic information being discarded. This goal is formulated as satisfying two objectives: maximizing utility for predicting a target attribute while simultaneously being invariant (independent) to a known semantic attribute. Solutions to invariant representation learning (IRepL) problems lead to a trade-off between utility and invariance when they are competing. While existing works study bounds on this trade-off, two questions remain outstanding: 1) What is the exact trade-off between utility and invariance? and 2) What are the encoders (mapping the data to a representation) that achieve the trade-off, and how can we estimate it from training data? This paper addresses these questions for IRepLs in reproducing kernel Hilbert spaces (RKHS)s. Under the assumption that the distribution of a low-dimensional projection of high-dimensional data is approximately normal, we derive a closed-form solution for the global optima of the underlying optimization problem for encoders in RKHSs. This yields closed formulae for a near-optimal trade-off, corresponding optimal representation dimensionality, and the corresponding encoder(s). We also numerically quantify the trade-off on representative problems and compare them to those achieved by baseline IRepL algorithms.
In nonparametric independence testing, we observe i.i.d.\ data $\{(X_i,Y_i)\}_{i=1}^n$, where $X \in \mathcal{X}, Y \in \mathcal{Y}$ lie in any general spaces, and we wish to test the null that $X$ is independent of $Y$. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Thus, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. This paper provides a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced ``cross'' HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. This requires building on the newly developed theory of cross U-statistics by Kim and Ramdas (2020), and in particular developing several nontrivial extensions of the theory in Shekhar et al. (2022), which developed an analogous permutation-free kernel two-sample test. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the full dCov or HSIC, our variants have the same power up to a $\sqrt 2$ factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.
I propose kernel ridge regression estimators for nonparametric dose response curves and semiparametric treatment effects in the setting where an analyst has access to a selected sample rather than a random sample; outcomes are observed only for selected observations. I assume selection is as good as random conditional on treatment and on a sufficiently rich set of observed covariates, where the covariates are allowed to cause treatment or be caused by treatment, an extension of missingness-at-random (MAR). I propose estimators of means, increments, and distributions with closed form solutions in terms of kernel matrix operations, allowing treatment and covariates to be discrete or continuous, and low, high, or infinite dimensional. For the continuous treatment case, I prove uniform consistency with finite sample rates. For the discrete treatment case, I prove root-n consistency, Gaussian approximation, and semiparametric efficiency.
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) that do not require labeled data, in order to learn useful semantic representations. These pretext tasks are created solely from the input features, such as predicting a missing image patch, recovering the color channels of an image from context, or predicting missing words in text; yet predicting this \textit{known} information helps in learning representations effective for downstream prediction tasks. We posit a mechanism exploiting the statistical connections between certain {\em reconstruction-based} pretext tasks that guarantees learning a good representation. Formally, we quantify how the approximate independence between the components of the pretext task (conditional on the label and latent variables) allows us to learn representations that can solve the downstream task by simply training a linear layer on top of the learned representation. We prove that the linear layer yields small approximation error even for complex ground truth function classes, and will drastically reduce labeled sample complexity. Next, we show a simple modification of our method that leads to nonlinear CCA, analogous to the popular SimSiam algorithm, and show similar guarantees for nonlinear CCA.
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
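A deliberately simplified, single-split caricature of this recipe (using kernel ridge regressions in place of the flexible regression methods the paper allows; the studentization and the spline analysis are omitted, and all names and toy data are illustrative): one half of the data estimates a regression of $Y$ on $(X, Z)$, and the other half estimates the expected conditional covariance between this fit and $Y$, after projecting both on $Z$.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def krr_fit(K, y, lam):
    # Kernel ridge regression coefficients: (K + n*lam*I)^{-1} y.
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def projection_stat(X, Z, Y, lam=1e-2):
    # Sample-split statistic for conditional mean independence.
    n = len(Y) // 2
    XA, ZA, YA = X[:n], Z[:n], Y[:n]
    XB, ZB, YB = X[n:], Z[n:], Y[n:]
    # Half A: estimate f(x, z) ~ E[Y | X, Z] by kernel ridge regression.
    WA, WB = np.hstack([XA, ZA]), np.hstack([XB, ZB])
    a = krr_fit(rbf(WA, WA), YA, lam)
    fB = rbf(WB, WA) @ a
    # Half B: project both f and Y on Z, then take the mean product of
    # residuals, estimating the expected conditional covariance.
    KzB = rbf(ZB, ZB)
    f_res = fB - KzB @ krr_fit(KzB, fB, lam)
    y_res = YB - KzB @ krr_fit(KzB, YB, lam)
    return np.mean(f_res * y_res)

rng = np.random.default_rng(4)
n = 600
Z = rng.normal(size=(n, 1))
X = rng.normal(size=(n, 1))
# Under the null, Y depends on Z only; under the alternative, also on X.
null = projection_stat(X, Z, Z[:, 0]**2 + 0.2 * rng.normal(size=n))
alt = projection_stat(X, Z, Z[:, 0]**2 + X[:, 0] + 0.2 * rng.normal(size=n))
```

The statistic concentrates near zero under the null and is bounded away from zero under the alternative.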
Important problems in causal inference, economics, and more generally in machine learning can be expressed as conditional moment restrictions, but estimation becomes challenging as it requires solving a continuum of unconditional moment restrictions. Previous works addressed this problem by extending the generalized method of moments (GMM) to continuum moment restrictions. In contrast, generalized empirical likelihood (GEL) provides a more general framework, and has been shown to enjoy favorable small-sample properties compared to GMM-based estimators. To benefit from recent developments in machine learning, we provide a functional reformulation of GEL in which arbitrary models can be leveraged. Motivated by a dual formulation of the resulting infinite dimensional optimization problem, we devise a practical method and explore its asymptotic properties. Finally, we provide kernel- and neural network-based implementations of the estimator, which achieve state-of-the-art empirical performance on two conditional moment restriction problems.