Smart Paper Notes

Double Robust Bayesian Inference on Average Treatment Effects

Christoph Breunig , Ruixuan Liu , Zhengfei Yu

Categories: Machine Learning (Statistics)

2022-11-29

We study a double robust Bayesian inference procedure on the average treatment effect (ATE) under unconfoundedness. Our Bayesian approach involves a correction term for prior distributions adjusted by the propensity score. We prove asymptotic equivalence of our Bayesian estimator and efficient frequentist estimators by establishing a new semiparametric Bernstein-von Mises theorem under double robustness; i.e., the lack of smoothness of conditional mean functions can be compensated by high regularity of the propensity score and vice versa. Consequently, the resulting Bayesian point estimator internalizes the bias correction as the frequentist-type doubly robust estimator, and the Bayesian credible sets form confidence intervals with asymptotically exact coverage probability. In simulations, we find that this corrected Bayesian procedure leads to significant bias reduction of point estimation and accurate coverage of confidence intervals, especially when the dimensionality of covariates is large relative to the sample size and the underlying functions become complex. We illustrate our method in an application to the National Supported Work Demonstration.
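
For orientation, the frequentist-type doubly robust estimator that the abstract compares against can be sketched as follows. This is a minimal illustration of the standard augmented inverse-probability-weighted (AIPW) form, not the paper's Bayesian procedure; the function name `aipw_ate` and the simulated data are assumptions made for this example.

```python
import numpy as np

def aipw_ate(y, d, m1, m0, e):
    """Doubly robust (AIPW) estimate of the ATE and its standard error.

    y: outcomes; d: binary treatment (0/1);
    m1, m0: fitted conditional means E[Y | X, D=1] and E[Y | X, D=0];
    e: fitted propensity scores P(D=1 | X).
    """
    y, d, m1, m0, e = map(np.asarray, (y, d, m1, m0, e))
    # Regression prediction plus a propensity-weighted residual correction.
    psi = m1 - m0 + d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi))

# Hypothetical simulated data with a true ATE of 2.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-x))
d = rng.binomial(1, e_true)
y = 2.0 * d + x + rng.normal(size=n)

# Deliberately misspecified outcome model (ignores x), correct propensity score.
m1_bad = np.full(n, y[d == 1].mean())
m0_bad = np.full(n, y[d == 0].mean())
ate, se = aipw_ate(y, d, m1_bad, m0_bad, e_true)
print(f"AIPW estimate: {ate:.3f} (s.e. {se:.3f})")
```

The example pairs a wrong outcome model with the correct propensity score to show the "one of two" robustness: the correction term compensates for the confounded group means.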

Debiased Machine Learning of Set-Identified Linear Models

Vira Semenova

Categories: Machine Learning (Statistics) | Machine Learning

2017-12-28

This paper provides estimation and inference methods for an identified set's boundary (i.e., support function) where the selection among a very large number of covariates is based on modern regularized tools. I characterize the boundary using a semiparametric moment equation. Combining Neyman-orthogonality and sample splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the boundary and propose a multiplier bootstrap procedure to conduct inference. I apply this result to the partially linear model, the partially linear IV model and the average partial derivative with an interval-valued outcome.

The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks

Takuo Matsubara , Chris J. Oates , François-Xavier Briol

Categories: Machine Learning (Statistics) | Machine Learning

2020-10-16

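Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of the uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic, with finite sample-size error bounds provided. This establishes that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof of concept, where we demonstrate that the ridgelet prior can outperform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.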

Time-uniform central limit theory, asymptotic confidence sequences, and anytime-valid causal inference

Ian Waudby-Smith , David Arbour , Ritwik Sinha , Edward H. Kennedy , Aaditya Ramdas

Categories: Machine Learning (Statistics)

2021-03-11

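Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions, and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. To elaborate, our methods take the form of confidence sequences (CSs), i.e., sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times and, unlike classical confidence intervals, which require the sample size to be fixed in advance, incur no penalty for "peeking" at the data. Existing CSs in the literature are nonasymptotic, and hence do not enjoy the aforementioned broad applicability of asymptotic confidence intervals. Our work bridges the gap by giving a definition of "asymptotic CSs" and deriving a universal asymptotic CS that requires only CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1970s work of Komlós, Major, and Tusnády) to uniformly approximate the entire sample average process by an implicit Gaussian process. We demonstrate their utility by deriving nonparametric asymptotic CSs based on doubly robust estimators in observational studies, for which no nonasymptotic methods can exist even in the fixed-time regime (due to confounding bias). These enable doubly robust causal inference that can be continuously monitored and adaptively stopped.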

Quasi-Bayesian Dual Instrumental Variable Regression

Ziyu Wang , Yuhao Zhou , Tongzheng Ren , Jun Zhu

Categories: Machine Learning (Statistics) | Machine Learning

2021-06-16

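Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work we present a novel quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed method by establishing minimax optimal contraction rates in $L_2$ and Sobolev norms, and by discussing the frequentist validity of credible balls. We further derive a scalable inference algorithm which can be extended to work with wide neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.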

Optimal high-dimensional and nonparametric distributed testing under communication constraints

Botond Szabó , Lasse Vuursteen , Harry van Zanten

Categories: Machine Learning (Statistics)

2022-02-02

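We derive minimax testing errors in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to $b$ bits. We investigate both the $d$-dimensional and the infinite-dimensional signal detection problem under Gaussian white noise. We also derive distributed testing algorithms reaching the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed estimation. Among our findings, we show that testing protocols that have access to shared randomness can perform strictly better in some regimes than those that do not. We also observe that consistent nonparametric distributed testing is always possible, even with as little as $1$ bit of communication, and that the corresponding test outperforms the best local test using only the information available at a single local machine. Furthermore, we derive adaptive nonparametric distributed testing strategies and the corresponding theoretical lower bounds.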

On Variance Estimation of Random Forests

Tianning Xu , Ruoqing Zhu , Xiaofeng Shao

Categories: Machine Learning (Statistics) | Machine Learning

2022-02-18

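Ensemble methods, such as random forests, are popular in applications due to their high predictive accuracy. Existing literature views a random forest prediction as an infinite-order incomplete U-statistic to quantify its uncertainty. However, these methods focus on a small subsampling size for each tree, which is theoretically valid but practically limited. This paper develops an unbiased variance estimator based on incomplete U-statistics, which allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate coverage rates without additional computational cost. We also propose a local smoothing procedure to reduce the variation of our estimator, which shows improved numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of our proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.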

Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models

Yihan Zhang , Marco Mondelli , Ramji Venkataramanan

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-21

In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite the wide applicability, their design is still obtained via heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics on spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. By doing so, we are able to optimize the design of the spectral method, and combine it with a simple linear estimator, in order to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval display the advantage enabled by our analysis over existing designs of spectral methods.

Distribution-free Prediction Sets Adaptive to Unknown Covariate Shift

Hongxiang Qiu , Edgar Dobriban , Eric Tchetgen Tchetgen

Categories: Machine Learning (Statistics)

2022-03-11

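Predicting sets of outcomes, instead of unique outcomes, is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift (a prevalent issue in practice) poses a serious unsolved challenge. In this paper, we show that prediction sets with a finite-sample coverage guarantee are uninformative, and propose a novel flexible distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is asymptotically probably approximately correct, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and in a data set concerning HIV risk prediction in a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.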

Functional Linear Regression of Cumulative Distribution Functions

Qian Zhang , Anuran Makur , Kamyar Azizzadenesheli

Categories: Machine Learning

2022-05-28

The estimation of cumulative distribution functions (CDFs) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional regression of contextual CDFs where each data point is sampled from a linear combination of context dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde{O}(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases. We also derive matching information theoretic lower bounds, establishing minimax optimality for CDF functional regression. Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatched error, and show that the estimators are well-behaved under model mismatch. Finally, to complete our study, we formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish self-normalized estimation error upper bounds for this setting.
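
The ridge-regression idea in the abstract can be illustrated on a grid. The sketch below is one reading of the setup, not the authors' estimator: it assumes each sample is drawn from a mixture of context-dependent logistic basis CDFs, and it regresses the one-sample empirical CDF `1{y_j <= t}` on the basis values with a squared-error loss integrated over `t`. The function name `fit_cdf_ridge`, the logistic basis, the grid, and the penalty `lam` are all choices made for this example.

```python
import numpy as np

def fit_cdf_ridge(y, centers, grid, lam=1e-3):
    """Ridge estimate of mixture weights theta from (context, sample) pairs.

    y: (n,) samples; centers: (n, d) locations of the d logistic basis CDFs
    for each context; grid: (T,) equally spaced evaluation points.
    """
    dt = grid[1] - grid[0]
    # phi[j, t, i]: i-th basis CDF of sample j evaluated at grid[t].
    phi = 1.0 / (1.0 + np.exp(-(grid[None, :, None] - centers[:, None, :])))
    # One-sample empirical CDFs: indicator 1{y_j <= t}.
    ind = (y[:, None] <= grid[None, :]).astype(float)
    # Normal equations of the integrated squared loss (Riemann sum over t).
    gram = np.einsum("jti,jtk->ik", phi, phi) * dt
    rhs = np.einsum("jti,jt->i", phi, ind) * dt
    return np.linalg.solve(gram + lam * np.eye(centers.shape[1]), rhs)

# Hypothetical data: d = 3 basis CDFs, true weights on the simplex.
rng = np.random.default_rng(0)
n, d = 4000, 3
theta_true = np.array([0.5, 0.3, 0.2])
centers = rng.uniform(-3, 3, size=(n, d))
component = rng.choice(d, size=n, p=theta_true)
y = rng.logistic(loc=centers[np.arange(n), component], scale=1.0)

grid = np.linspace(-15, 15, 301)
theta_hat = fit_cdf_ridge(y, centers, grid)
print(np.round(theta_hat, 3))
```

Because the expected indicator equals the mixture CDF at every `t`, this is an ordinary least-squares problem in `theta`, which is why a closed-form ridge solve suffices here.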

Robust Generalised Bayesian Inference for Intractable Likelihoods

Takuo Matsubara , Jeremias Knoblauch , François-Xavier Briol , Chris J. Oates

Categories: Machine Learning (Statistics)

2021-04-15

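Generalised Bayesian inference updates prior beliefs using a loss function, rather than a likelihood, and can therefore be used to confer robustness against possible misspecification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as the loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In this context, the Stein discrepancy circumvents evaluation of the normalisation constant and produces generalised posteriors that are either available in closed form or accessible using standard Markov chain Monte Carlo. On a theoretical level, we show consistency, asymptotic normality, and bias-robustness, highlighting how these properties are affected by the choice of Stein discrepancy. Then, we provide numerical experiments on a range of intractable distributions, including applications to kernel-based exponential family models and non-Gaussian graphical models.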

On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization

Fei Jiang , Yeqing Zhou , Jianxuan Liu , Yanyuan Ma

Categories: Machine Learning

2022-12-31

We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Treating the high dimensional issue further leads us to augment the target function with an amenable penalty term. We propose to estimate the regression parameter by minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members of the subset. We examine the finite sample performance of the proposed tests by extensive simulation. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.

Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

Nathan Kallus , Xiaojie Mao , Masatoshi Uehara

Categories: Machine Learning (Statistics) | Machine Learning

2019-12-30

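We consider estimating a low-dimensional parameter in an estimating equation involving high-dimensional nuisances that depend on the parameter. A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated. Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances using flexible machine learning methods, but applying it to problems with parameter-dependent nuisances is impractical. For (L)QTE, DML requires that we learn the whole covariate-conditional cumulative distribution function. We instead propose localized debiased machine learning (LDML), which avoids this burdensome step and only needs to estimate nuisances at a single initial rough guess for the parameter. For (L)QTE, LDML involves learning just two regression functions, a standard task for machine learning methods. We prove that under lax rate conditions our estimator has the same favorable asymptotic behavior as the infeasible estimator that uses the unknown true nuisances. Thus, LDML notably enables practically feasible and theoretically grounded efficient estimation of important quantities in causal inference, such as (L)QTEs, when we must control for many covariates and/or flexible relationships.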

Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding

Jacob Dorn , Kevin Guo , Nathan Kallus

Categories: Machine Learning | Machine Learning (Statistics)

2021-12-21

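We study the problem of constructing bounds on the average treatment effect in the presence of unobserved confounding under the marginal sensitivity model of Tan (2006). Combining an existing characterization involving adversarial propensity scores with a new distributionally robust characterization of the problem, we propose novel estimators of these bounds that we call "doubly-valid/doubly-sharp" (DVDS) estimators. Double sharpness corresponds to the fact that DVDS estimators consistently estimate the tightest possible (i.e., sharp) bounds implied by the sensitivity model even when one of two nuisance parameters is misspecified, and achieve semiparametric efficiency when all nuisance parameters are suitably consistent. Double validity is an entirely new property for partial identification: DVDS estimators still provide valid, though not sharp, bounds even when most nuisance parameters are misspecified. In fact, even in cases where DVDS point estimates fail to be asymptotically normal, standard Wald confidence intervals may remain valid. In the case of binary outcomes, the DVDS estimators are particularly convenient and possess a closed-form expression in terms of the outcome regression and propensity score. We demonstrate the DVDS estimators in a simulation study as well as in a case study of right heart catheterization.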

Incremental Intervention Effects in Studies with Dropout and Many Timepoints

Kwangho Kim , Edward H. Kennedy , Ashley I. Naimi

Categories: Machine Learning (Statistics)

2019-07-09

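Modern longitudinal studies collect feature data at many timepoints, often of the same order as the sample size. Such studies are typically affected by dropout and positivity violations. We tackle these problems by generalizing the effects of recent incremental interventions (which shift propensity scores rather than set treatment values) to accommodate multiple outcomes and subject dropout. We give an identifying expression for incremental intervention effects when dropout is conditionally ignorable (without requiring treatment positivity), and derive the nonparametric efficiency bound for estimating such effects. We then present efficient nonparametric estimators, showing that they converge at fast parametric rates and yield uniform inferential guarantees, even when nuisance functions are estimated flexibly at slower rates. We also study the variance ratio of incremental intervention effects relative to more conventional deterministic effects in a novel infinite time horizon setting, where the number of timepoints can grow with the sample size, and show that incremental intervention effects yield near-exponential gains in statistical precision in this setup. Finally, we conclude with simulations and apply our methods in a study of the effect of low-dose aspirin on pregnancy outcomes.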

Uncertainty Quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball

Hamed Hamze Bajgiran , Pau Batlle Franch , Houman Owhadi , Mostafa Samir , Clint Scovel , Mahdy Shirdel , Michael Stanley , Peyman Tavallali

Categories: Machine Learning

2021-08-24

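There are essentially three kinds of approaches to uncertainty quantification (UQ): (A) robust optimization, (B) Bayesian, and (C) decision theory. Although (A) is robust, it is unfavorable with respect to accuracy and data assimilation. (B) requires a prior, is generally brittle, and posterior estimation can be slow. Although (C) leads to the identification of an optimal prior, its approximation suffers from the curse of dimensionality, and the notion of risk is one that is averaged with respect to the distribution of the data. We introduce a fourth kind, which is a hybrid between (A), (B), (C), and hypothesis testing. It can be summarized as, after observing a sample $x$, (1) defining a likelihood region through the relative likelihood and (2) playing a minmax game in that region to define optimal estimators and their risk. The resulting method has several desirable properties: (a) an optimal prior is identified after measuring the data, and the notion of risk is a posterior one; (b) the determination of the optimal estimate and its risk can be reduced to computing the minimum enclosing ball of the image of the likelihood region under the quantity-of-interest map (which is fast and not subject to the curse of dimensionality). The method is characterized by a parameter in $[0,1]$ acting as an assumed lower bound on the rarity of the observed data (the relative likelihood). When that parameter is near $1$, the method produces a posterior distribution concentrated around a maximum likelihood estimate, with low-confidence UQ estimates. When that parameter is near $0$, the method produces a maximal-risk posterior distribution with high-confidence UQ estimates. In addition to navigating the accuracy-uncertainty tradeoff, the proposed method addresses the brittleness of Bayesian inference by navigating the robustness-accuracy tradeoff associated with data assimilation.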

A General Framework for Treatment Effect Estimation in Semi-Supervised and High Dimensional Settings

Abhishek Chakrabortty , Guorong Dai , Eric Tchetgen Tchetgen

Categories: Machine Learning (Statistics)

2022-01-03

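In this article, we aim to provide a general and complete understanding of semi-supervised (SS) causal inference for treatment effects. Specifically, we consider two such estimands, (a) the average treatment effect and (b) the quantile treatment effect, as prototype cases, in an SS setting characterized by two available data sets: (i) a labeled data set of size $n$, providing observations for a response and a set of high dimensional covariates, as well as a binary treatment indicator; and (ii) an unlabeled data set of size much larger than $n$, but without the response observed. Using these two data sets, we develop a family of SS estimators which are (1) more robust and (2) more efficient than their supervised counterparts based on the labeled data set only. Beyond the "standard" double robustness results (in terms of consistency) that can be achieved by supervised methods as well, we further establish root-$n$ consistency and asymptotic normality of our SS estimators whenever the propensity score in the model is correctly specified, without requiring specific forms of the nuisance functions involved. Such an improvement in robustness arises from the use of the massive unlabeled data, so it is generally not attainable in a purely supervised setting. In addition, our estimators are shown to be semiparametrically efficient as long as all the nuisance functions are correctly specified. Moreover, as an illustration of the nuisance estimators, we consider inverse-probability-weighting type kernel smoothing estimators involving unknown covariate transformation mechanisms, and establish novel results on their uniform convergence rates in high dimensional scenarios, which should be of independent interest. Numerical results on both simulated and real data validate the advantage of our methods over their supervised counterparts with respect to both robustness and efficiency.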

Risk and optimal policies in bandit experiments

Karun Adusumilli

Categories: Machine Learning

2021-12-13

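This paper provides a decision-theoretic analysis of bandit experiments. The bandit setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define a suitable notion of asymptotic Bayes risk for bandit settings. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit-of-experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further describes the state variables to which it is asymptotically sufficient to restrict attention, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit setting with a PDE which can be efficiently solved using sparse matrix routines. We derive near-optimal policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations of the bandit problem, such as time discounting and pure exploration motives.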

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

Categories: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
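
The two-stage procedure described above (estimate a projection on one half of the data, then estimate a conditional covariance on the other half) can be sketched in a heavily simplified form. This is not the authors' Projected Covariance Measure: it omits their variance weighting, uses polynomial least squares in place of flexible learners, and the names `poly_features`, `ols_fit` and `split_covariance_test` are hypothetical.

```python
import numpy as np

def poly_features(*cols, degree=3):
    """Intercept plus powers 1..degree of each column (no cross terms)."""
    feats = [np.ones(len(cols[0]))]
    for c in cols:
        feats += [c ** k for k in range(1, degree + 1)]
    return np.column_stack(feats)

def ols_fit(F, y):
    """Least-squares coefficients for design matrix F."""
    return np.linalg.lstsq(F, y, rcond=None)[0]

def split_covariance_test(x, y, z, rng):
    """Studentised covariance statistic; large positive values suggest X matters."""
    n = len(y)
    idx = rng.permutation(n)
    a, b = idx[: n // 2], idx[n // 2:]
    # Half A: estimate the projection f(x, z), an approximation of E[Y | X, Z].
    beta = ols_fit(poly_features(x[a], z[a]), y[a])
    f_b = poly_features(x[b], z[b]) @ beta
    # Half B: residualise both Y and the projection on Z alone.
    Fz = poly_features(z[b])
    r_y = y[b] - Fz @ ols_fit(Fz, y[b])
    r_f = f_b - Fz @ ols_fit(Fz, f_b)
    # Average product of residuals estimates E[Cov(Y, f | Z)].
    prod = r_y * r_f
    return np.sqrt(len(b)) * prod.mean() / prod.std(ddof=1)

# Hypothetical data: X is correlated with Z; Y depends on X only under the alternative.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y_null = z + rng.normal(size=n)
y_alt = z + 0.5 * x ** 2 + rng.normal(size=n)

t_null = split_covariance_test(x, y_null, z, rng)
t_alt = split_covariance_test(x, y_alt, z, rng)
print(f"null: {t_null:.2f}, alternative: {t_alt:.2f}")
```

In this sketch the statistic would be compared with an upper standard normal quantile; a linear-model test on the coefficient of `x` would have little power against the quadratic alternative used here.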

Optimal Binary Classification Beyond Accuracy

Shashank Singh , Justin Khim

Categories: Machine Learning | Machine Learning (Statistics)

2021-07-05

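The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most prominently in imbalanced binary classification, where data are dominated by samples from one of two classes. The first part of this paper derives a novel generalization of the Bayes-optimal classifier from accuracy to any performance metric computed from the confusion matrix. Specifically, this result (a) demonstrates that stochastic classifiers sometimes outperform the best possible deterministic classifier and (b) removes an empirically unverifiable absolute continuity assumption that is poorly understood but pervades existing results. We then demonstrate how to use this generalized Bayes classifier to obtain regret bounds in terms of the error of estimating regression functions under uniform loss. Finally, we use these results to develop some of the first finite-sample statistical guarantees specific to imbalanced binary classification. Specifically, we demonstrate that optimal classification performance depends on properties of class imbalance, such as a novel notion called Uniform Class Imbalance, that have not previously been formalized. We further illustrate these contributions numerically in the case of $k$-nearest neighbor classification.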
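
A small made-up example shows why accuracy can mislead under class imbalance while another confusion-matrix metric (here F1) does not. The data and the two classifiers are invented for illustration and do not come from the paper.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn, tn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
# Finds 4 of the 5 positives at the cost of 6 false alarms.
detector = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

for name, pred in [("always negative", always_negative), ("detector", detector)]:
    cm = confusion(y_true, pred)
    print(f"{name}: accuracy={accuracy(*cm):.2f}, F1={f1(*cm):.3f}")
```

Accuracy ranks the trivial always-negative rule above the detector, whereas F1 ranks them the other way round, which is the kind of mismatch that motivates optimizing general confusion-matrix metrics.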
