共形预测(CP)是一种多功能的非参数框架,用于量化预测问题中的不确定性。在这项工作中,我们通过首次提出可以应用于时间不断发展的表面,将这种方法扩展到在双变量域上定义的时间序列函数的情况。为了获得有意义有效的预测区域,CP必须与准确的预测算法结合使用,因此,我们扩展了希尔伯特空间中自回旋过程的理论理论,以允许具有双变量域的功能。考虑到该主题的新颖性,我们提出了功能自回旋模型(FAR)的估计技术。实施了仿真研究,以研究不同的点预测因子如何影响所得的预测频段。最后,我们探索了真正数据集中拟议方法的利益和限制,在过去的二十年中,每天都会观察到黑海的海平面异常。
translated by 谷歌翻译
在这项工作中,我们对基本思想和新颖的发展进行了综述的综述,这是基于最小的假设的一种无创新的,无分配的,非参数预测的方法 - 能够以非常简单的方式预测集屈服在有限样本案例中,在统计意义上也有效。论文中提供的深入讨论涵盖了共形预测的理论基础,然后继续列出原始想法的更高级的发展和改编。
translated by 谷歌翻译
我们介绍了一类小说的预计方法,对实际线上的概率分布数据集进行统计分析,具有2-Wassersein指标。我们特别关注主成分分析(PCA)和回归。为了定义这些模型,我们通过将数据映射到合适的线性空间并使用度量投影运算符来限制Wassersein空间中的结果来利用与其弱利米结构密切相关的Wasserstein空间的表示。通过仔细选择切线,我们能够推出快速的经验方法,利用受约束的B样条近似。作为我们方法的副产品,我们还能够为PCA的PCA进行更快的例程来获得分布。通过仿真研究,我们将我们的方法与先前提出的方法进行比较,表明我们预计的PCA具有类似的性能,即使在拼盘下也是极其灵活的。研究了模型的若干理论性质,并证明了渐近一致性。讨论了两个真实世界应用于美国和风速预测的Covid-19死亡率。
translated by 谷歌翻译
预测组合在预测社区中蓬勃发展,近年来,已经成为预测研究和活动主流的一部分。现在,由单个(目标)系列产生的多个预测组合通过整合来自不同来源收集的信息,从而提高准确性,从而减轻了识别单个“最佳”预测的风险。组合方案已从没有估计的简单组合方法演变为涉及时间变化的权重,非线性组合,组件之间的相关性和交叉学习的复杂方法。它们包括结合点预测和结合概率预测。本文提供了有关预测组合的广泛文献的最新评论,并参考可用的开源软件实施。我们讨论了各种方法的潜在和局限性,并突出了这些思想如何随着时间的推移而发展。还调查了有关预测组合实用性的一些重要问题。最后,我们以当前的研究差距和未来研究的潜在见解得出结论。
translated by 谷歌翻译
We present a new distribution-free conformal prediction algorithm for sequential data (e.g., time series), called the \textit{sequential predictive conformal inference} (\texttt{SPCI}). We specifically account for the nature that the time series data are non-exchangeable, and thus many existing conformal prediction algorithms based on temporal residuals are not applicable. The main idea is to exploit the temporal dependence of conformity scores; thus, the past conformity scores contain information about future ones. Then we cast the problem of conformal prediction interval as predicting the quantile of a future residual, given a prediction algorithm. Theoretically, we establish asymptotic valid conditional coverage upon extending consistency analyses in quantile regression. Using simulation and real-data experiments, we demonstrate a significant reduction in interval width of \texttt{SPCI} compared to other existing methods under the desired empirical coverage.
translated by 谷歌翻译
在过去几十年中,已经提出了各种方法,用于估计回归设置中的预测间隔,包括贝叶斯方法,集合方法,直接间隔估计方法和保形预测方法。重要问题是这些方法的校准:生成的预测间隔应该具有预定义的覆盖水平,而不会过于保守。在这项工作中,我们从概念和实验的角度审查上述四类方法。结果来自各个域的基准数据集突出显示从一个数据集中的性能的大波动。这些观察可能归因于违反某些类别的某些方法所固有的某些假设。我们说明了如何将共形预测用作提供不具有校准步骤的方法的方法的一般校准程序。
translated by 谷歌翻译
We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedascity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
translated by 谷歌翻译
我们考虑在离散观察点上测量的功能数据。通常通过额外的噪声测量这种数据。我们在本文中探讨了这种类型数据的因子结构。我们表明潜伏信号可以归因于相应因子模型的公共组件,并且可以通过来自因子模型文献的方法借用方法来估计。我们还表明,在采取这种多变量而不是“功能”的角度之后,可以准确地估计在功能数据分析中发挥关键作用的主成分。除了估计问题之外,我们还解决了对IID噪声的零假设的测试。虽然这个假设在很大程度上在文献中主要是普遍存在的,但我们认为它通常不切实际,并且不受残留分析的支持。
translated by 谷歌翻译
Accurate uncertainty measurement is a key step to building robust and reliable machine learning systems. Conformal prediction is a distribution-free uncertainty quantification algorithm popular for its ease of implementation, statistical coverage guarantees, and versatility for underlying forecasters. However, existing conformal prediction algorithms for time series are limited to single-step prediction without considering the temporal dependency. In this paper we propose a Copula Conformal Prediction algorithm for multivariate, multi-step Time Series forecasting, CopulaCPTS. On several synthetic and real-world multivariate time series datasets, we show that CopulaCPTS produces more calibrated and sharp confidence intervals for multi-step prediction tasks than existing techniques.
translated by 谷歌翻译
Network-based analyses of dynamical systems have become increasingly popular in climate science. Here we address network construction from a statistical perspective and highlight the often ignored fact that the calculated correlation values are only empirical estimates. To measure spurious behaviour as deviation from a ground truth network, we simulate time-dependent isotropic random fields on the sphere and apply common network construction techniques. We find several ways in which the uncertainty stemming from the estimation procedure has major impact on network characteristics. When the data has locally coherent correlation structure, spurious link bundle teleconnections and spurious high-degree clusters have to be expected. Anisotropic estimation variance can also induce severe biases into empirical networks. We validate our findings with ERA5 reanalysis data. Moreover we explain why commonly applied resampling procedures are inappropriate for significance evaluation and propose a statistically more meaningful ensemble construction framework. By communicating which difficulties arise in estimation from scarce data and by presenting which design decisions increase robustness, we hope to contribute to more reliable climate network construction in the future.
translated by 谷歌翻译
了解特定待遇或政策与许多感兴趣领域有关的影响,从政治经济学,营销到医疗保健。在本文中,我们开发了一种非参数算法,用于在合成控制的背景下检测随着时间的流逝的治疗作用。该方法基于许多算法的反事实预测,而不必假设该算法正确捕获模型。我们介绍了一种推论程序来检测治疗效果,并表明测试程序对于固定,β混合过程渐近有效,而无需对所考虑的一组基础算法施加任何限制。我们讨论了平均治疗效果估计的一致性保证,并为提出的方法提供了遗憾的界限。算法类别可能包括随机森林,套索或任何其他机器学习估计器。数值研究和应用说明了该方法的优势。
translated by 谷歌翻译
在本文中,我们考虑了使用相同的预测精度测试程序在横截面依赖下实现了实现波动率测量的预测评估。在预测实现挥发性时,我们根据增强横截面评估模型的预测精度。在相等预测精度的零假设下,所采用的基准模型是标准的HAR模型,而在非相同的预测精度的替代方案下,预测模型是通过套索缩收估计的增强的HAR模型。我们通过结合测量误差校正以及横截面跳转分量测量来研究预报对模型规范的敏感性。使用数值实现评估模型的样本外预测评估。
translated by 谷歌翻译
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
translated by 谷歌翻译
预测一组结果 - 而不是独特的结果 - 是统计学习中不确定性定量的有前途的解决方案。尽管有关于构建具有统计保证的预测集的丰富文献,但适应未知的协变量转变(实践中普遍存在的问题)还是一个严重的未解决的挑战。在本文中,我们表明具有有限样本覆盖范围保证的预测集是非信息性的,并提出了一种新型的无灵活分配方法PredSet-1Step,以有效地构建了在未知协方差转移下具有渐近覆盖范围保证的预测集。我们正式表明我们的方法是\ textIt {渐近上可能是近似正确},对大型样本的置信度有很好的覆盖误差。我们说明,在南非队列研究中,它在许多实验和有关HIV风险预测的数据集中实现了名义覆盖范围。我们的理论取决于基于一般渐近线性估计器的WALD置信区间覆盖范围的融合率的新结合。
translated by 谷歌翻译
提出了用于基于合奏的估计和模拟高维动力系统(例如海洋或大气流)的方法学框架。为此,动态系统嵌入了一个由动力学驱动的内核功能的繁殖核Hilbert空间的家族中。这个家庭因其吸引人的财产而被昵称为仙境。在梦游仙境中,Koopman和Perron-Frobenius操作员是统一且均匀的。该属性保证它们可以在一系列可对角线的无限发电机中表达。访问Lyapunov指数和切线线性动力学的精确集合表达式也可以直接可用。仙境使我们能够根据轨迹样本的恒定时间线性组合来设计出惊人的简单集合数据同化方法。通过几个基本定理的完全合理的叠加原则,使这种令人尴尬的简单策略成为可能。
translated by 谷歌翻译
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
translated by 谷歌翻译
本文提出了删除 - $ D $ jackknife的概括,以解决时间序列的HyperParameter选择问题。我称之为人工删除 - $ D $ jackknife强调,这种方法用虚拟删除替代经典的去除步骤,其中观察到的数据点被人工缺失值替换。这样做保留了数据订单完好无损,并允许与时间序列的简单兼容性。此稿件显示了一种简单的例证,其中应用于调节高维弹性净矢量自动增加移动平均(Varma)模型。
translated by 谷歌翻译
对未来观察的预测是一个重要且具有挑战性的问题。分别量化预测不确定性使用预测区域和预测分布的两种主流方法,后者认为更具信息性,因为它可以执行其他与预测相关的任务。有效性的标准概念(我们在这里称为1型有效性)着重于预测区域的覆盖范围,而与预测分布执行的其他与预测相关的任务相关的有效性概念则缺乏。在这里,我们提出了一个新概念,称为2型有效性,与这些其他预测任务有关。我们建立了2型有效性和相干性能之间的联系,并表明为实现它而需要不精确的概率考虑因素。我们继续表明,可以通过将共形预测输出作为辅音合理性度量的轮廓函数来实现两种类型的预测有效性。我们还基于新的非参数推论模型构建提供了保​​形预测的替代表征,其中辅音的出现是自然的,并证明了其有效性。
translated by 谷歌翻译
基于签名的技术使数学洞察力洞悉不断发展的数据的复杂流之间的相互作用。这些见解可以自然地转化为理解流数据的数值方法,也许是由于它们的数学精度,已被证明在数据不规则而不是固定的情况下分析流的数据以及数据和数据的尺寸很有用样本量均为中等。了解流的多模式数据是指数的:$ d $ d $的字母中的$ n $字母中的一个单词可以是$ d^n $消息之一。签名消除了通过采样不规则性引起的指数级噪声,但仍然存在指数量的信息。这项调查旨在留在可以直接管理指数缩放的域中。在许多问题中,可伸缩性问题是一个重要的挑战,但需要另一篇调查文章和进一步的想法。这项调查描述了一系列环境集足够小以消除大规模机器学习的可能性,并且可以有效地使用一小部分免费上下文和原则性功能。工具的数学性质可以使他们对非数学家的使用恐吓。本文中介绍的示例旨在弥合此通信差距,并提供从机器学习环境中绘制的可进行的工作示例。笔记本可以在线提供这些示例中的一些。这项调查是基于伊利亚·雪佛兰(Ilya Chevryev)和安德烈·科米利津(Andrey Kormilitzin)的早期论文,它们在这种机械开发的较早时刻大致相似。本文说明了签名提供的理论见解是如何在对应用程序数据的分析中简单地实现的,这种方式在很大程度上对数据类型不可知。
translated by 谷歌翻译
Conformal prediction constructs a confidence set for an unobserved response of a feature vector based on previous identically distributed and exchangeable observations of responses and features. It has a coverage guarantee at any nominal level without additional assumptions on their distribution. Its computation deplorably requires a refitting procedure for all replacement candidates of the target response. In regression settings, this corresponds to an infinite number of model fits. Apart from relatively simple estimators that can be written as pieces of linear function of the response, efficiently computing such sets is difficult, and is still considered as an open problem. We exploit the fact that, \emph{often}, conformal prediction sets are intervals whose boundaries can be efficiently approximated by classical root-finding algorithms. We investigate how this approach can overcome many limitations of formerly used strategies; we discuss its complexity and drawbacks.
translated by 谷歌翻译