Prediction of future observations is an important and challenging problem. The two mainstream approaches to quantifying prediction uncertainty use prediction regions and predictive distributions, respectively, with the latter believed to be more informative because it can perform other prediction-related tasks. The standard notion of validity, which we refer to here as Type-1 validity, focuses on coverage of prediction regions, while a notion of validity relevant to those other prediction-related tasks performed by predictive distributions has been lacking. Here we present a new notion, called Type-2 validity, relevant to these other prediction tasks. We establish connections between Type-2 validity and coherence properties, and show that imprecise-probabilistic considerations are required in order to achieve it. We go on to show that both types of prediction validity can be achieved by interpreting the conformal prediction output as the contour function of a consonant plausibility measure. We also offer an alternative characterization of conformal prediction, based on a new nonparametric inferential model construction in which consonance arises naturally, and prove its validity.
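As a hedged illustration of the construction described in the abstract (our notation, not the authors'), the conformal output can be read as the contour of a consonant plausibility measure:

```latex
% Sketch (our notation): let \pi_n(\tilde{y}) denote the conformal transducer /
% p-value computed from data Z_1,\dots,Z_n and a candidate value \tilde{y}.
% Reading \pi_n as a contour function gives a consonant plausibility measure
\[
  \Pi_n(A) \;=\; \sup_{\tilde{y} \in A} \pi_n(\tilde{y}), \qquad A \subseteq \mathcal{Y},
\]
% whose level sets recover the usual conformal prediction regions:
\[
  C_\alpha \;=\; \{\tilde{y} : \pi_n(\tilde{y}) > \alpha\},
  \qquad \Pi_n\!\left(C_\alpha^{\mathrm{c}}\right) \;\le\; \alpha .
\]
```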
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
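One common way to make the aleatoric/epistemic distinction concrete, stated here only as an illustrative formalization rather than the survey's own definition, is the entropy decomposition of a Bayesian or ensemble predictor:

```latex
% Standard entropy-based formalization for a classifier with a distribution
% over parameters \theta; not necessarily the paper's preferred formalization.
\[
  \underbrace{\mathbb{H}\!\left[\,\mathbb{E}_{\theta}\, p(y \mid x, \theta)\,\right]}_{\text{total uncertainty}}
  \;=\;
  \underbrace{\mathbb{E}_{\theta}\, \mathbb{H}\!\left[\, p(y \mid x, \theta)\,\right]}_{\text{aleatoric}}
  \;+\;
  \underbrace{\mathbb{I}\!\left[\, y \,;\, \theta \mid x \,\right]}_{\text{epistemic}}
\]
```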
A flexible method is developed to construct a confidence interval for the frequency of a queried object in a very large data set, based on a much smaller sketch of the data. The approach requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals for random queries using a conformal inference approach. After achieving marginal coverage for random queries under the assumption of data exchangeability, the proposed method is extended to provide stronger inferences accounting for possibly heterogeneous frequencies of different random queries, redundant queries, and distribution shifts. While the presented methods are broadly applicable, this paper focuses on use cases involving the count-min sketch algorithm and a non-linear variation thereof, to facilitate comparison to prior work. In particular, the developed methods are compared empirically to frequentist and Bayesian alternatives, through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature.
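For readers unfamiliar with the sketching side, a minimal count-min sketch (the base algorithm in the paper's main use case) looks as follows; this is a generic textbook implementation, not the paper's code:

```python
import numpy as np

class CountMinSketch:
    """Minimal count-min sketch: d hash rows of width w; point queries are
    upward-biased due to collisions, which is why one may want to wrap them
    in confidence intervals."""

    def __init__(self, width=2048, depth=4, seed=0):
        rng = np.random.default_rng(seed)
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.int64)
        # Independent salts stand in for d different hash functions.
        self.salts = rng.integers(0, 2**31 - 1, size=depth)

    def _index(self, item, row):
        return hash((int(self.salts[row]), item)) % self.width

    def update(self, item, count=1):
        for r in range(self.depth):
            self.table[r, self._index(item, r)] += count

    def query(self, item):
        # Each row over-counts; the minimum across rows is the tightest bound.
        return int(min(self.table[r, self._index(item, r)] for r in range(self.depth)))

cms = CountMinSketch()
for token in ["a", "b", "a", "c", "a"]:
    cms.update(token)
print(cms.query("a"))  # >= 3, typically exactly 3 at this small scale
```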
In this work we provide a review of the basic ideas and novel developments of conformal prediction — an innovative, distribution-free, non-parametric forecasting method based on minimal assumptions — which is able to yield, in a very straightforward way, prediction sets that are valid in a statistical sense also in the finite-sample case. The in-depth discussion provided in the paper covers the theoretical underpinnings of conformal prediction and then proceeds to list the more advanced developments and adaptations of the original idea.
Black-box machine learning models are now routinely used in high-risk settings, such as medical diagnostics, which require uncertainty quantification to avoid consequential model failures. Distribution-free uncertainty quantification (distribution-free UQ) is a user-friendly paradigm for creating statistically rigorous confidence intervals/sets for such predictions. Critically, the intervals/sets are valid without distributional or model assumptions, with explicit guarantees even for finitely many data points. Moreover, they adapt to the difficulty of the input: when an input example is difficult, the uncertainty interval/set is large, signaling that the model may be wrong. With little work and without retraining, distribution-free methods can be used on top of any underlying algorithm, such as a neural network, to produce confidence sets guaranteed to contain the truth with a user-specified probability, such as 90%. Indeed, the methods are easy to understand and general, applying to many modern prediction problems arising in computer vision, natural language processing, deep reinforcement learning, and other fields. This hands-on introduction is aimed at readers interested in the practical implementation of distribution-free UQ who are not necessarily statisticians. We lead the reader through the practical theory and applications of distribution-free UQ, beginning with conformal prediction and culminating with distribution-free control of any risk, such as the false discovery rate, the false positive rate in out-of-distribution detection, and so on. We include many explanatory illustrations, examples, and code samples in Python, with PyTorch syntax. The goal is to provide the reader with a working understanding of distribution-free UQ, allowing them to put confidence intervals on their algorithms, in one self-contained document.
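To give a flavor of the kind of Python code samples the introduction mentions, here is a minimal split-conformal routine for classification (a sketch under the usual exchangeability assumption; the function and variable names are ours, not the document's):

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction sets for classification.

    cal_probs:  (n, K) softmax scores on a held-out calibration set
    cal_labels: (n,)   integer true labels for the calibration set
    test_probs: (m, K) softmax scores for new inputs
    Returns a boolean (m, K) matrix whose rows are prediction sets containing
    the true label with probability >= 1 - alpha under exchangeability.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the softmax score of the true class.
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of the calibration scores.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    qhat = np.quantile(cal_scores, min(q_level, 1.0), method="higher")
    # Include every label whose nonconformity score is small enough.
    return (1.0 - test_probs) <= qhat
```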
Virtually all machine learning tasks are characterized using some form of loss function, and "good performance" is typically stated in terms of a sufficiently small average loss, taken over the random draw of test data. While optimizing for performance on average is intuitive, convenient to analyze in theory, and easy to implement in practice, such a choice brings about trade-offs. In this work, we survey and introduce a wide variety of non-traditional criteria used to design and evaluate machine learning algorithms, place the classical paradigm within the proper historical context, and propose a view of learning problems which emphasizes the question of "what makes for a desirable loss distribution?" in place of tacit use of the expected loss.
Machine learning typically presupposes classical probability theory, which implies that aggregation is built upon expectation. There are now multiple reasons to motivate looking at richer alternatives to classical probability theory as a mathematical foundation for machine learning. We systematically examine a powerful and rich class of such alternatives, known variously as spectral risk measures, Choquet integrals, or Lorentz norms. We present a range of characterization results and demonstrate what makes this spectral family so special. In doing so, we establish a natural stratification of all coherent risk measures in terms of the upper probabilities they induce, by exploiting results from the theory of rearrangement-invariant Banach spaces. We empirically demonstrate how this new approach to uncertainty helps tackle practical machine learning problems.
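To fix ideas, a spectral risk measure replaces the expectation by a weighted average of quantiles; the following is the standard definition in our notation, not a result specific to the paper:

```latex
% Sketch: a spectral risk measure with spectrum \sigma, i.e. a nonnegative,
% nondecreasing weight function on (0,1) with \int_0^1 \sigma(u)\,du = 1,
% applied to the quantile function F_X^{-1} of the loss X:
\[
  \rho_\sigma(X) \;=\; \int_0^1 \sigma(u)\, F_X^{-1}(u)\, du .
\]
% \sigma \equiv 1 recovers the expectation, while
% \sigma(u) = \frac{1}{1-\alpha}\,\mathbf{1}\{u > \alpha\} recovers CVaR at level \alpha.
```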
Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability ε, together with a method that makes a prediction ŷ of a label y, it produces a set of labels, typically containing ŷ, that also contains y with probability 1 − ε. Conformal prediction can be applied to any method for producing ŷ: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an on-line setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1 − ε of the time, even though they are based on an accumulating dataset rather than on independent datasets. In addition to the model under which successive examples are sampled independently, other on-line compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. This tutorial presents a self-contained account of the theory of conformal prediction and works through several numerical examples. A more comprehensive treatment of the topic is provided in Algorithmic Learning in a Random World, by Vladimir Vovk, Alex Gammerman, and Glenn Shafer (Springer, 2005).
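The central formula of the protocol can be summarized as follows (a sketch with generic nonconformity scores, in our notation):

```latex
% For examples z_1,\dots,z_n and a candidate label y for the new object, set
% z_{n+1} = (x_{n+1}, y), compute nonconformity scores \alpha_1,\dots,\alpha_{n+1},
% and form the conformal p-value and the corresponding prediction set:
\[
  p(y) \;=\; \frac{\#\{\, i \le n+1 \;:\; \alpha_i \ge \alpha_{n+1} \,\}}{n+1},
  \qquad
  \Gamma^{\varepsilon} \;=\; \{\, y : p(y) > \varepsilon \,\},
\]
% so that \Gamma^{\varepsilon} contains the true label with probability at least
% 1 - \varepsilon under exchangeability.
```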
Over the last few decades, various methods have been proposed for estimating prediction intervals in regression settings, including Bayesian methods, ensemble methods, direct interval estimation methods, and conformal prediction methods. An important issue is the calibration of these methods: the generated prediction intervals should have a predefined coverage level without being overly conservative. In this work, we review the above four classes of methods from both a conceptual and an experimental point of view. Results on benchmark data sets from various domains highlight large fluctuations in performance from one data set to another. These observations can be attributed to violations of certain assumptions that are inherent to some classes of methods. We illustrate how conformal prediction can be used as a general calibration procedure for methods that do not come with such a calibration step.
We develop the theory of hypothesis testing based on e-values, a notion of evidence that, unlike p-values, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study may depend on previous outcomes. Tests based on e-values are safe, i.e., they preserve Type-I error guarantees under such optional continuation. We define growth-rate optimality (GRO) as an analogue of power in the optional continuation context, and we show how to construct GRO e-variables for general testing problems with composite null and alternative, emphasizing models with nuisance parameters. GRO e-values take the form of Bayes factors with special priors. We illustrate the theory using several classic examples, including the one-sample safe t-test (in which the right Haar prior turns out to be GRO) and the 2x2 contingency table (in which the GRO prior differs from standard priors). Sharing Fisherian, Neymanian, and Jeffreys-Bayesian interpretations, e-values and the corresponding tests may provide a methodology acceptable to adherents of all three schools.
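For concreteness, the mechanics behind safety under optional continuation can be sketched in two displays (a generic illustration, not the paper's GRO construction):

```latex
% An e-variable E for a null H_0 is a nonnegative statistic with
% \mathbb{E}_P[E] \le 1 for all P \in H_0.  Markov's inequality gives the
% safety of the threshold test:
\[
  \Pr_P\!\left(E \ge 1/\alpha\right) \;\le\; \alpha \,\mathbb{E}_P[E] \;\le\; \alpha .
\]
% If study k produces e-value E_k, possibly designed after seeing E_1,\dots,E_{k-1},
% the running product \prod_{k=1}^{K} E_k is again an e-variable, so the Type-I
% guarantee survives optional continuation.
```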
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
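A rough sketch of the cross U-statistic for the one-sample mean problem, in simplified notation that may differ from the paper in minor conventions:

```latex
% Split the sample into halves D_1 and D_2, use D_2 only through its mean
% \bar{X}_2, and project the first half onto that direction:
\[
  f(X_i) \;=\; X_i^{\top} \bar{X}_2, \qquad i \in D_1,
  \qquad
  T \;=\; \frac{\sqrt{n_1}\,\bar{f}}{\widehat{\sigma}_f},
\]
% where \bar{f} and \widehat{\sigma}_f are the sample mean and standard deviation
% of f(X_i) over D_1.  Under H_0 : \mu = 0, T is asymptotically N(0,1) regardless
% of how d scales with n.
```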
Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (for instance, removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is often useful for many downstream applications — for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, and thus we can only attempt to establish these properties through an empirical exploration of the algorithm's behavior on various data sets. In this work, we lay out a formal statistical framework for this kind of "black-box testing" without any assumptions on the algorithm or the data distribution, and establish fundamental bounds on the ability of any black-box test to identify algorithmic stability.
Predicting sets of outcomes — instead of unique outcomes — is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift — a prevalent issue in practice — poses a serious unsolved challenge. In this paper, we show that prediction sets with finite-sample coverage guarantees are uninformative, and we propose a novel flexible distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is asymptotically probably approximately correct, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and on a data set concerning HIV risk prediction from a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.
We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
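Split conformal inference, one of the two main variants compared in the paper, is simple enough to sketch in a few lines of Python with a generic black-box estimator (our illustration only; the authors' reference implementation is the accompanying R package conformalInference):

```python
import numpy as np

def split_conformal_interval(fit, predict, X, y, X_new, alpha=0.1, seed=0):
    """Split conformal prediction band around any regression estimator.

    fit(X, y) -> model and predict(model, X) -> predictions are user-supplied.
    Returns (lo, hi) arrays such that y_new lies in [lo, hi] with probability
    >= 1 - alpha (marginally, under exchangeability), whatever the estimator.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    train, cal = idx[: n // 2], idx[n // 2 :]

    model = fit(X[train], y[train])
    # Absolute residuals on the held-out half serve as conformity scores.
    resid = np.abs(y[cal] - predict(model, X[cal]))
    k = int(np.ceil((len(cal) + 1) * (1 - alpha)))
    q = np.sort(resid)[min(k, len(cal)) - 1]

    mu_new = predict(model, X_new)
    return mu_new - q, mu_new + q
```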
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
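A heavily simplified sketch of the sample-splitting recipe described above, substituting off-the-shelf random forests for the regressions (names and details are ours and make no claim to match the authors' estimator or its guarantees):

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

def cond_mean_independence_test(X, Z, Y, alpha=0.05, seed=0):
    """Simplified sample-splitting test of H0: E[Y | X, Z] does not depend on X.

    X, Z: 2-D feature arrays; Y: 1-D responses.  Not the authors' estimator."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    idx = rng.permutation(n)
    A, B = idx[: n // 2], idx[n // 2 :]
    XZ = np.column_stack([X, Z])

    # Half A: learn a "projection" of Y on (X, Z).
    f_hat = RandomForestRegressor(random_state=0).fit(XZ[A], Y[A])

    # Half B: regress both Y and the projection on Z alone, then take the
    # empirical covariance of the two residuals as the test statistic.
    fB = f_hat.predict(XZ[B])
    m_hat = RandomForestRegressor(random_state=1).fit(Z[B], fB)
    h_hat = RandomForestRegressor(random_state=2).fit(Z[B], Y[B])
    L = (Y[B] - h_hat.predict(Z[B])) * (fB - m_hat.predict(Z[B]))

    T = np.sqrt(len(B)) * L.mean() / L.std(ddof=1)
    return T, T > norm.ppf(1 - alpha)   # one-sided rejection decision
```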
This paper derives confidence intervals (CIs) and time-uniform confidence sequences (CSs) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds that can be seen as a generalization (and improvement) of the celebrated Chernoff method. At its heart, it is based on deriving a new class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance and empirically vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations. In short, we establish a new state of the art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
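To convey the betting idea in code, here is a crude grid-based capital process for observations bounded in [0, 1], with a fixed rather than adaptive stake; it is valid by Ville's inequality but much weaker than the variance-adaptive strategies developed in the paper:

```python
import numpy as np

def betting_confidence_sequence(xs, alpha=0.05, lam=0.5, grid=None):
    """Time-uniform confidence sequence for the mean of i.i.d. X in [0, 1].

    For each candidate mean m we run two nonnegative martingales (betting for
    and against m) with fixed stake `lam`; m stays in the confidence set while
    the hedged capital is below 1/alpha.  A crude sketch of the betting
    approach, not the paper's adaptive strategy.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 1001)
    cap_plus = np.ones_like(grid)   # bets that the true mean exceeds m
    cap_minus = np.ones_like(grid)  # bets that the true mean is below m
    lower, upper = [], []
    for x in xs:
        cap_plus *= 1.0 + lam * (x - grid)
        cap_minus *= 1.0 - lam * (x - grid)
        hedged = 0.5 * cap_plus + 0.5 * cap_minus
        kept = grid[hedged < 1.0 / alpha]
        lower.append(kept.min() if kept.size else np.nan)
        upper.append(kept.max() if kept.size else np.nan)
    return np.array(lower), np.array(upper)

rng = np.random.default_rng(1)
lo, hi = betting_confidence_sequence(rng.beta(2, 5, size=500))
print(lo[-1], hi[-1])   # should bracket the true mean 2/7 ≈ 0.286
```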
Uncertainty is prevalent in engineering design, statistical learning, and decision making broadly. Due to inherent risk-averseness and ambiguity about assumptions, it is common to address uncertainty by formulating and solving conservative optimization models expressed using measures of risk and related concepts. We survey the rapid development of risk measures over the last quarter century. From their beginnings in financial engineering, we recount their spread to nearly all areas of engineering and applied mathematics. Solidly rooted in convex analysis, risk measures furnish a general framework for handling uncertainty with significant computational and theoretical advantages. We describe the key facts, list several concrete algorithms, and provide an extensive list of references for further reading. The survey recalls connections with utility theory and distributionally robust optimization, points to emerging application areas such as fair machine learning, and defines measures of reliability.
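As one concrete anchor point, the workhorse risk measure from financial engineering, conditional value-at-risk (the superquantile), admits the well-known minimization formula (standard material, stated in our notation):

```latex
% For a loss X, level \alpha \in (0,1), and (t)_+ = \max(t, 0):
\[
  \mathrm{CVaR}_\alpha(X)
  \;=\; \min_{c \in \mathbb{R}} \left\{ c + \tfrac{1}{1-\alpha}\, \mathbb{E}\!\left[(X - c)_+\right] \right\}
  \;=\; \frac{1}{1-\alpha} \int_\alpha^1 F_X^{-1}(u)\, du,
\]
% with the minimizing c equal to the value-at-risk (the \alpha-quantile of X).
```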
Recent years have witnessed growing interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work, we propose a novel quasi-Bayesian procedure for IV regression, building on the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed method by establishing minimax optimal contraction rates in $L_2$ and Sobolev norms and by discussing the frequentist validity of credible balls. We further derive a scalable inference algorithm that can be extended to work with wide neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.
Decision-making systems based on AI and machine learning have been used throughout a wide range of real-world scenarios, including healthcare, law enforcement, education, and finance. It is no longer far-fetched to envision a future where autonomous systems will drive entire business decisions and, more broadly, support large-scale decision-making infrastructure to solve society's most challenging problems. Issues of unfairness and discrimination are pervasive when decisions are made by humans, and remain (or are potentially amplified) when decisions are made using machines with little transparency, accountability, and fairness. In this paper, we introduce the framework of Causal Fairness Analysis, with the intent of filling in this gap, i.e., understanding, modeling, and possibly solving issues of fairness in decision-making settings. The main insight of our approach is to link the quantification of the disparities present in the observed data with the underlying, and often unobserved, collection of causal mechanisms that generate the disparities in the first place, a challenge we call the Fundamental Problem of Causal Fairness Analysis (FPCFA). In order to solve the FPCFA, we study the problem of decomposing variations and empirical measures of fairness, attributing such variations to structural mechanisms and different units of the population. Our effort culminates in the Fairness Map, the first systematic attempt to organize and explain the relationships between the different criteria found in the literature. Finally, we study which causal assumptions are minimally needed for performing causal fairness analysis and propose a Fairness Cookbook, which allows data scientists to assess the existence of disparate impact and disparate treatment.
When training probabilistic classifiers and calibrating them, the so-called grouping loss component of the calibration loss can easily be overlooked. Grouping loss refers to the gap between the observable information and the information actually exploited in the calibration exercise. We investigate the relation between grouping loss and the concept of sufficiency, identifying comonotonicity as a useful criterion for sufficiency. We revisit the probing reduction approach of Langford & Zadrozny (2005) and find that it yields estimates of probabilistic classifiers with reduced grouping loss. Finally, we discuss Brier curves as tools to support the training and calibration of "sufficient" probabilistic classifiers.
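To indicate where the grouping loss sits, the following standard squared-error decomposition is often used; we state it as a hedged illustration in our own notation, not as the paper's formulation:

```latex
% Let S = S(X) be the classifier score, C = \mathbb{E}[Y \mid S] its perfectly
% calibrated version, and Q = \mathbb{E}[Y \mid X] the full-information posterior.
% For squared (Brier) loss,
\[
  \mathbb{E}\!\left[(S - Y)^2\right]
  \;=\; \underbrace{\mathbb{E}\!\left[(S - C)^2\right]}_{\text{calibration loss}}
  \;+\; \underbrace{\mathbb{E}\!\left[(C - Q)^2\right]}_{\text{grouping loss}}
  \;+\; \underbrace{\mathbb{E}\!\left[(Q - Y)^2\right]}_{\text{irreducible loss}},
\]
% the middle term being the gap between what X could reveal and what the score S retains.
```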