A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in very large data sets, based on a much smaller sketch of those data. The approach is data-adaptive and requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals under the sole assumption of data exchangeability. Although our solution is broadly applicable, this paper focuses on applications involving the count-min sketch algorithm and a non-linear variation thereof. The performance is compared to that of frequentist and Bayesian alternatives through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature.
translated by 谷歌翻译
A flexible method is developed to construct a confidence interval for the frequency of a queried object in a very large data set, based on a much smaller sketch of the data. The approach requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals for random queries using a conformal inference approach. After achieving marginal coverage for random queries under the assumption of data exchangeability, the proposed method is extended to provide stronger inferences accounting for possibly heterogeneous frequencies of different random queries, redundant queries, and distribution shifts. While the presented methods are broadly applicable, this paper focuses on use cases involving the count-min sketch algorithm and a non-linear variation thereof, to facilitate comparison to prior work. In particular, the developed methods are compared empirically to frequentist and Bayesian alternatives, through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature.
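This paper develops novel conformal methods to test whether a new observation was sampled from the same distribution as a reference set. Blending inductive and transductive conformal inference in an innovative way, the described methods can re-weight standard conformal p-values in a principled way, based on dependent side information from known out-of-distribution data, and can automatically take advantage of the most powerful model from any collection of one-class and binary classifiers. The solution can be implemented either through sample splitting or via a novel transductive cross-validation+ scheme, which may also be useful in other applications of conformal inference because it offers tighter guarantees than existing cross-validation approaches. After studying false discovery rate control and power within a multiple testing framework with several possible outliers, the proposed solution is shown to outperform standard conformal p-values through simulations as well as applications to image recognition and tabular data.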
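The estimation of coverage probabilities, and in particular of the missing mass, is a classical statistical problem with applications in numerous scientific fields. In this paper, we study this problem in relation to randomized data compression, or sketching. This is a novel but practically relevant perspective, and it refers to situations in which coverage probabilities must be estimated based on a compressed and imperfect summary, or sketch, of the true data, because neither the full data nor the empirical frequencies of distinct symbols can be observed directly. Our contribution is a Bayesian nonparametric methodology to estimate coverage probabilities from data sketched through random hashing, which also solves the challenging problems of recovering the number of distinct counts in the true data and the number of distinct counts with a specified empirical frequency of interest. The proposed Bayesian estimators are shown to be easily applicable to large-scale analyses in combination with a Dirichlet process prior, although they involve some open computational challenges under the more general Pitman-Yor process prior. The empirical effectiveness of our methodology is demonstrated through numerical experiments and applications to real data sets of Covid DNA sequences, classic English literature, and IP addresses.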
Deep neural networks are powerful tools to detect hidden patterns in data and leverage them to make predictions, but they are not designed to understand uncertainty and estimate reliable probabilities. In particular, they tend to be overconfident. We begin to address this problem in the context of multi-class classification by developing a novel training algorithm producing models with more dependable uncertainty estimates, without sacrificing predictive power. The idea is to mitigate overconfidence by minimizing a loss function, inspired by advances in conformal inference, that quantifies model uncertainty by carefully leveraging hold-out data. Experiments with synthetic and real data demonstrate this method can lead to smaller conformal prediction sets with higher conditional coverage, after exact calibration with hold-out data, compared to state-of-the-art alternatives.
We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
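As a rough illustration of the split conformal idea mentioned in this abstract, the following minimal Python sketch fits a regression estimator on one half of the data and calibrates a constant-width band on the other half. It is not the conformalInference R package; the function name `split_conformal_interval`, the straight-line estimator, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np


def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal band around a least-squares line (illustrative only)."""
    # Fit a regression estimator on the training split. A straight line is an
    # assumption here; in principle any estimator could be plugged in.
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return slope * np.asarray(x) + intercept

    # Conformity scores: absolute residuals on the held-out calibration split.
    scores = np.sort(np.abs(np.asarray(y_cal) - predict(x_cal)))
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    half_width = scores[k - 1] if k <= n else np.inf
    center = predict(x_new)
    return center - half_width, center + half_width


# Hypothetical simulated data: a noisy linear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
y = 2.0 * x + rng.normal(scale=0.5, size=2000)

lower, upper = split_conformal_interval(
    x[:500], y[:500], x[500:1000], y[500:1000], x[1000:], alpha=0.1
)
# Empirical coverage on the remaining points; it should be close to 1 - alpha.
coverage = np.mean((y[1000:] >= lower) & (y[1000:] <= upper))
```

The band has constant width in this sketch; the locally varying bands described in the abstract would need the residuals to be rescaled by an estimate of their spread, which is not shown here.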
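Black-box machine learning models are now routinely used in high-risk settings, such as medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Distribution-free uncertainty quantification (distribution-free UQ) is a user-friendly paradigm for creating statistically rigorous confidence intervals/sets for such predictions. Critically, the intervals/sets are valid without distributional assumptions or model assumptions, possessing explicit guarantees even with finitely many datapoints. Moreover, they adapt to the difficulty of the input; when the input example is difficult, the uncertainty intervals/sets are large, signaling that the model might be wrong. Without much work and without retraining, one can use distribution-free methods on any underlying algorithm, such as a neural network, to produce confidence sets guaranteed to contain the ground truth with a user-specified probability, such as 90%. Indeed, the methods are easy to understand and general, applying to many modern prediction problems arising in the fields of computer vision, natural language processing, deep reinforcement learning, and so on. This hands-on introduction is aimed at a reader interested in the practical implementation of distribution-free UQ who is not necessarily a statistician. We lead the reader through the practical theory and applications of distribution-free UQ, beginning with conformal prediction and culminating with distribution-free control of any risk, such as the false discovery rate, the false positive rate of out-of-distribution detection, and so on. We will include many explanatory illustrations, examples, and code samples in Python, with PyTorch syntax. The goal is to provide the reader with a working understanding of distribution-free UQ, allowing them to put confidence intervals on their algorithms, with one self-contained document.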
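Prediction of future observations is an important and challenging problem. The two mainstream approaches for quantifying prediction uncertainty use prediction regions and predictive distributions, respectively, with the latter believed to be more informative because it can perform other prediction-related tasks. The standard notion of validity, which we refer to here as Type-1 validity, focuses on the coverage probability of prediction regions, while a notion of validity relevant to the other prediction-related tasks performed by predictive distributions is lacking. Here we present a new notion, called Type-2 validity, relevant to these other prediction tasks. We establish connections between Type-2 validity and coherence properties, and show that imprecise probability considerations are required in order to achieve it. We go on to show that both types of prediction validity can be achieved by interpreting the conformal prediction output as the contour function of a consonant plausibility measure. We also offer an alternative characterization of conformal prediction, based on a new nonparametric inferential model construction, wherein the appearance of consonance is natural, and prove its validity.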
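Over the last few decades, various methods have been proposed for estimating prediction intervals in regression settings, including Bayesian methods, ensemble methods, direct interval estimation methods, and conformal prediction methods. An important issue is the calibration of these methods: the generated prediction intervals should have a predefined coverage level, without being overly conservative. In this work, we review the above four classes of methods from a conceptual and experimental point of view. Results on benchmark data sets from various domains highlight large fluctuations in performance from one data set to another. These observations can be attributed to the violation of certain assumptions that are inherent to some classes of methods. We illustrate how conformal prediction can be used as a general calibration procedure for methods that deliver poor results without a calibration step.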
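We develop a framework for constructing uncertainty sets with a valid coverage guarantee in an online setting, in which the underlying data distribution can shift drastically, and even adversarially, over time. The technique we propose is highly flexible, as it can be integrated with any online learning algorithm, requiring minimal implementation effort and computational cost. A key advantage of our method over existing alternatives (which also build on conformal inference) is that we do not need to split the data into training and holdout calibration sets. This allows us to fit the predictive model in a fully online manner, utilizing the most recent observation for constructing calibrated uncertainty sets. Consequently, and in contrast with existing techniques, (i) the sets we build can quickly adapt to new changes in the distribution; and (ii) our procedure does not require refitting the model at each time step. Using synthetic and real-world benchmark data sets, we demonstrate the validity of our theory and the improved performance of our proposal over existing techniques. To demonstrate the greater flexibility of the proposed method, we show how to construct valid intervals for a multiple-output regression problem that previous sequential calibration methods cannot handle due to impractical computational and memory requirements.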
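In this work, we provide a review of basic ideas and novel developments in conformal prediction, an innovative, distribution-free, nonparametric forecasting method based on minimal assumptions, which is able to yield, in a very straightforward way, prediction sets that are valid in a statistical sense even in the finite-sample case. The in-depth discussion provided in the paper covers the theoretical underpinnings of conformal prediction, and then proceeds to list the more advanced developments and adaptations of the original idea.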
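We propose confidence sequences (sequences of confidence intervals that are valid uniformly over time) for quantiles of any distribution over a complete, fully ordered set, based on a stream of i.i.d. observations. We give methods both for tracking a fixed quantile and for tracking all quantiles simultaneously. Specifically, we provide explicit expressions with small constants for intervals whose widths shrink at the fastest possible $\sqrt{t^{-1} \log \log t}$ rate, along with a non-asymptotic concentration inequality for the empirical distribution function that holds uniformly over time at the same rate. The latter strengthens Smirnov's empirical process law of the iterated logarithm and extends the Dvoretzky-Kiefer-Wolfowitz inequality to hold uniformly over time. We give a new algorithm and sample complexity bound for selecting an arm with an approximately best quantile in a multi-armed bandit framework. In simulations, our method needs fewer samples than existing methods by a factor of five to fifty.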
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
We present a new distribution-free conformal prediction algorithm for sequential data (e.g., time series), called the \textit{sequential predictive conformal inference} (\texttt{SPCI}). We specifically account for the fact that time series data are non-exchangeable, and thus many existing conformal prediction algorithms based on temporal residuals are not applicable. The main idea is to exploit the temporal dependence of conformity scores; thus, the past conformity scores contain information about future ones. Then we cast the problem of constructing a conformal prediction interval as predicting the quantile of a future residual, given a prediction algorithm. Theoretically, we establish asymptotically valid conditional coverage by extending consistency analyses in quantile regression. Using simulation and real-data experiments, we demonstrate a significant reduction in interval width of \texttt{SPCI} compared to other existing methods under the desired empirical coverage.
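This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds, which can be seen as a generalization (and improvement) of the celebrated Chernoff method. At its heart, it is based on deriving a new class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations. In short, we establish a new state of the art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.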
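Quantile regression is a fundamental problem in statistical learning, motivated by a need to quantify uncertainty in predictions or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for this problem over many years of research in econometrics, statistics, and machine learning. Rather than proposing yet another (new) algorithm for quantile regression, we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models, in order to improve accuracy and robustness. We consider weighted ensembles where weights may vary not only over individual models, but also over quantile levels and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post-sorting or post-isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.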
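Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g., removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is often useful for many downstream applications; for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, and thus we can only attempt to establish these properties through an empirical exploration of the algorithm's behavior on various data sets. In this work, we lay out a formal statistical framework for this kind of "black box testing" without any assumptions on the algorithm or the data distribution, and establish fundamental bounds on the ability of any black box test to identify algorithmic stability.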
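The count-min sketch (CMS) is a time- and memory-efficient randomized data structure that provides estimates of tokens' frequencies in a data stream of tokens (i.e., point queries), based on randomly hashed data. Cai, Mitzenmacher and Adams (\textit{NeurIPS} 2018) proposed a learning-augmented version of the CMS, referred to as CMS-DP, which relies on Bayesian nonparametric (BNP) modeling of the data stream of tokens via a Dirichlet process (DP) prior, with estimates of a point query obtained as suitable mean functionals of the posterior distribution of the point query, given the hashed data. While the CMS-DP has been shown to improve on some aspects of the CMS, it has the major drawback of arising from a "constructive" proof that builds upon arguments tailored to the DP prior, namely arguments that are not usable for other nonparametric priors. In this paper, we present a "Bayesian" proof of the CMS-DP, whose main advantage is that it builds upon arguments that are usable, in principle, within a broad class of nonparametric priors arising from normalized completely random measures. This result leads to the development of a novel learning-augmented CMS under power-law data streams, referred to as CMS-PYP, which relies on BNP modeling of the data stream of tokens via a Pitman-Yor process (PYP) prior. Under this more general framework, we apply the arguments of the "Bayesian" proof of the CMS-DP, suitably adapted to the PYP prior, in order to compute the posterior distribution of a point query, given the hashed data. Applications to synthetic data and real textual data show that the CMS-PYP outperforms the CMS and the CMS-DP in estimating low-frequency tokens, which are known to be of critical interest in textual data, and that it is competitive with a variation of the CMS designed for low-frequency tokens. An extension of our BNP approach to more general queries is also discussed.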
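For readers unfamiliar with the basic data structure, the following minimal Python sketch shows a plain count-min sketch answering point queries. It illustrates only the classical CMS, not the CMS-DP or CMS-PYP estimators; the class name `CountMinSketch`, the use of salted BLAKE2b as the hash family, and the toy stream are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Plain count-min sketch: `depth` hash rows, each with `width` counters."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, token, row):
        # One hash function per row, obtained by salting BLAKE2b with the row
        # index (an illustrative choice of hash family, not a prescribed one).
        digest = hashlib.blake2b(
            token.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, token, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(token, row)] += count

    def query(self, token):
        # Hash collisions can only inflate a counter, so the minimum over rows
        # is an upper bound on the true frequency of the token.
        return min(
            self.table[row][self._bucket(token, row)]
            for row in range(self.depth)
        )


# Hypothetical toy stream of tokens.
stream = ["a"] * 5 + ["b"] * 3 + ["c"]
sketch = CountMinSketch(width=16, depth=4)
for token in stream:
    sketch.add(token)

estimate_a = sketch.query("a")  # at least 5, never an underestimate
```

The returned value never falls below the true count; the learning-augmented variants discussed in the abstract aim to correct the upward bias that collisions introduce, which this sketch does not attempt.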
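We introduce Learn then Test, a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees regardless of the underlying model and the (unknown) data-generating distribution. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. To accomplish this, we solve a key technical challenge: the control of arbitrary risks that are not necessarily monotonic. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks, with detailed worked examples in computer vision.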
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.