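Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captures additional information related to the filtration. We derive empirical estimators for the associated higher order maximum mean discrepancies (MMDs) and prove consistency. We then construct a filtration-sensitive kernel two-sample test able to pick up information that is missed by the standard MMD test. In addition, leveraging our higher order MMDs, we construct a family of universal kernels on stochastic processes that allows one to solve real-world calibration and optimal stopping problems (such as the pricing of American options) via classical kernel-based regression methods. Finally, adapting existing tests for conditional independence to the case of stochastic processes, we design a causal-discovery algorithm to recover the causal graph of structural dependencies among interacting bodies solely from observations of their multidimensional trajectories.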
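The sequence of moments of a vector-valued random variable can characterize its law. We study the analogous problem for path-valued random variables, that is, stochastic processes, by using so-called robust signature moments. This allows us to derive a metric of maximum mean discrepancy type for laws of stochastic processes and to study the topology it induces on the space of laws of stochastic processes. This metric can be kernelized using the signature kernel, which allows it to be computed efficiently. As an application, we provide a non-parametric two-sample hypothesis test for laws of stochastic processes.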
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
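A minimal sketch of the quadratic-time statistic described above, assuming a Gaussian kernel with a fixed bandwidth. The kernel choice, the bandwidth, and the function names `gaussian_gram` and `mmd2_unbiased` are illustrative assumptions rather than details taken from the paper. It computes an unbiased estimate of the squared MMD from two samples:

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased quadratic-time estimate of MMD^2 for samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = gaussian_gram(x, x, bandwidth)
    k_yy = gaussian_gram(y, y, bandwidth)
    k_xy = gaussian_gram(x, y, bandwidth)
    # Drop diagonal terms so the within-sample averages are U-statistics.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution: the estimate should be close to zero.
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
# Second sample shifted by 1 in every coordinate: the estimate should be clearly positive.
shifted = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(loc=1.0, size=(200, 2)))
```

Because the estimate is unbiased, it can be slightly negative when the two distributions coincide. Turning it into a test still requires a threshold, for example from one of the bounds or the asymptotic distribution the abstract refers to.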
We consider autocovariance operators of a stationary stochastic process on a Polish space that is embedded into a reproducing kernel Hilbert space. We investigate how empirical estimates of these operators converge along realizations of the process under various conditions. In particular, we examine ergodic and strongly mixing processes and obtain several asymptotic results as well as finite sample error bounds. We provide applications of our theory in terms of consistency results for kernel PCA with dependent data and the conditional mean embedding of transition probabilities. Finally, we use our approach to examine the nonparametric estimation of Markov transition operators and highlight how our theory can give a consistency analysis for a large family of spectral analysis methods including kernel-based dynamic mode decomposition.
We introduce the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional independence for multivariate continuous-valued variables. CIRCE applies as a regularizer in settings where we wish to learn neural features $\varphi(X)$ of data $X$ to estimate a target $Y$, while being conditionally independent of a distractor $Z$ given $Y$. Both $Z$ and $Y$ are assumed to be continuous-valued but relatively low dimensional, whereas $X$ and its features may be complex and high dimensional. Relevant settings include domain-invariant learning, fairness, and causal learning. The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance. It is then only necessary to enforce independence of $\varphi(X)$ from residuals of this regression, which is possible with attractive estimation properties and consistency guarantees. By contrast, earlier measures of conditional feature dependence require multiple regressions for each step of feature learning, resulting in more severe bias and variance, and greater computational cost. When sufficiently rich features are used, we establish that CIRCE is zero if and only if $\varphi(X) \perp \!\!\! \perp Z \mid Y$. In experiments, we show superior performance to previous methods on challenging benchmarks, including learning conditionally invariant image features.
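We merge computational mechanics' definition of causal states (predictively equivalent histories) with reproducing-kernel Hilbert space (RKHS) representation inference. The result is a widely applicable method that infers causal structure directly from observations of a system's behaviors, whether they are over discrete or continuous events or time. A structural representation, a finite- or infinite-state kernel $\epsilon$-machine, is extracted by a reduced-dimension transform that gives an efficient representation of causal states and their topology. In this way, the system dynamics are represented by a stochastic (ordinary or partial) differential equation that acts on causal states. We introduce an algorithm to estimate the associated evolution operator. Paralleling the Fokker-Planck equation, it efficiently evolves causal-state distributions and makes predictions in the original data space via an RKHS functional mapping. We demonstrate these techniques, together with their predictive abilities, on discrete-time, discrete-value infinite Markov-order processes generated by finite-state hidden Markov models with (i) finite or (ii) uncountably infinite causal states, and (iii) continuous-time, continuous-value processes generated by thermally driven chaotic flows. The method robustly estimates causal structure in the presence of varying external and measurement noise levels and for very high dimensional data.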
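Evaluating whether data streams are drawn from the same distribution is at the heart of various machine learning problems. This is particularly relevant for data generated by dynamical systems, since such systems are essential for many real-world processes in biomedical, economic, or engineering systems. While kernel two-sample tests are powerful for comparing independent and identically distributed random variables, no established method exists for comparing dynamical systems. The main problem is the inherently violated independence assumption. We propose a two-sample test for dynamical systems by addressing three core challenges: we (i) introduce a novel notion of mixing that captures autocorrelations in a relevant metric, (ii) propose an efficient way to estimate the speed of mixing relying purely on data, and (iii) integrate these into established kernel two-sample tests. The result is a data-driven method that is straightforward to use in practice and comes with sound theoretical guarantees. In an example application to anomaly detection from human walking data, we show that the test is readily applicable without any human expert knowledge or feature engineering.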
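We address the problem of causal effect estimation in the presence of unobserved confounding, where proxies for the latent confounders are observed. We propose two kernel-based methods for nonlinear causal effect estimation in this setting: (a) a two-stage regression approach, and (b) a maximum moment restriction approach. We focus on the proximal causal learning setting, but our methods can be used to solve a wider class of inverse problems characterized by a Fredholm integral equation. In particular, we provide a unifying view of two-stage and moment restriction approaches for solving this problem in a nonlinear setting. We provide consistency guarantees for each algorithm, and we demonstrate that these approaches achieve competitive results on synthetic data and on data simulating a real-world task. In particular, our approach outperforms earlier methods that are not suited to leveraging proxy variables.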
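Causal inference grows increasingly complex as the number of confounders increases. Given treatments $X$, confounders $Z$ and outcomes $Y$, we develop a non-parametric method to test the \textit{do-null} hypothesis $H_0:\; p(y \mid \text{do}(X=x)) = p(y)$ against the general alternative. Building on the Hilbert-Schmidt Independence Criterion (HSIC) for marginal independence testing, we propose backdoor-HSIC (bd-HSIC) and demonstrate that it is calibrated and has power for both binary and continuous treatments under a large number of confounders. Additionally, we establish convergence properties of the estimators of the covariance operators used in bd-HSIC. We investigate the advantages and disadvantages of bd-HSIC against parametric tests, as well as the importance of using do-null testing in contrast to marginal independence testing or conditional independence testing. A complete implementation can be found at https://github.com/mrhuff/kgformula.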
Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models or analyses with a downstream application such that error quantification plays a key role. However, by entirely ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of a widely-applied theorem for conditioning GPs on a finite number of direct observations to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows one to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate and the capability to incorporate uncertain model parameters and observations. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models.
Independence testing is a fundamental and classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) allow stopping earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. It is well known that classical batch tests are not tailored for streaming data settings, since valid inference after data peeking requires correcting for multiple testing, but such corrections generally result in low power. In this paper, we design sequential kernelized independence tests (SKITs) that overcome such shortcomings based on the principle of testing by betting. We exemplify our broad framework using bets inspired by kernelized dependence measures such as the Hilbert-Schmidt independence criterion (HSIC) and the constrained-covariance criterion (COCO). Importantly, we also generalize the framework to non-i.i.d. time-varying settings, for which there exist no batch tests. We demonstrate the power of our approaches on both simulated and real data.
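Signature-based techniques give mathematical insight into the interactions between complex streams of evolving data. These insights can be quite naturally translated into numerical approaches to understanding streamed data and, perhaps because of their mathematical precision, have proved useful in analysing streamed data in situations where the data is irregular and not stationary, and where the dimension of the data and the sample sizes are both moderate. Understanding streamed multi-modal data is exponential: a word in $n$ letters from an alphabet of size $d$ can be any one of $d^n$ messages. Signatures remove the exponential amount of noise that arises from sampling irregularity, but an exponential amount of information still remains. This survey aims to stay in the domain where that exponential scaling can be managed directly. Scalability issues are an important challenge in many problems but would require another survey article and further ideas. This survey describes a range of contexts where the data sets are small enough to remove the possibility of massive machine learning, and where small sets of context-free and principled features can be used effectively. The mathematical nature of the tools can make their use intimidating to non-mathematicians. The examples presented in this article are intended to bridge this gap and provide tractable working examples drawn from the machine learning context. Notebooks are available online for several of these examples. This survey builds on the earlier paper of Ilya Chevyrev and Andrey Kormilitzin, which had broadly similar aims at an earlier point in the development of this machinery. This article illustrates how the theoretical insights offered by signatures are simply realised in the analysis of application data in a way that is largely agnostic to the data type.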
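To make the exponential scaling concrete: level $n$ of the signature of a $d$-dimensional path has $d^n$ coordinates, one per word of length $n$. The sketch below computes only the first two levels ($d$ and $d^2$ coordinates) for a piecewise-linear path, using Chen's identity to append one linear segment at a time. The function name `signature_depth2` and the example path are illustrative assumptions and are not taken from the survey or its notebooks.

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path given as a (T, d) array."""
    increments = np.diff(path, axis=0)
    d = path.shape[1]
    level1 = np.zeros(d)        # d coordinates: the total increment
    level2 = np.zeros((d, d))   # d^2 coordinates: iterated integrals over pairs of channels
    for delta in increments:
        # Chen's identity for concatenating the path so far with one linear segment.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2

# An L-shaped path in the plane: one unit right, then one unit up.
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
level1, level2 = signature_depth2(path)
# The antisymmetric part of level 2 is the Levy area enclosed by the path and its chord.
levy_area = 0.5 * (level2[0, 1] - level2[1, 0])
```

The symmetric part of level 2 carries no new information, since it always equals half the outer product of level 1 with itself. The order in which the channels move is recorded only in the antisymmetric part.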
We study a class of dynamical systems modelled as Markov chains that admit an invariant distribution via the corresponding transfer, or Koopman, operator. While data-driven algorithms to reconstruct such operators are well known, their relationship with statistical learning is largely unexplored. We formalize a framework to learn the Koopman operator from finite data trajectories of the dynamical system. We consider the restriction of this operator to a reproducing kernel Hilbert space and introduce a notion of risk, from which different estimators naturally arise. We link the risk with the estimation of the spectral decomposition of the Koopman operator. These observations motivate a reduced-rank operator regression (RRR) estimator. We derive learning bounds for the proposed estimator, holding both in i.i.d. and non i.i.d. settings, the latter in terms of mixing coefficients. Our results suggest RRR might be beneficial over other widely used estimators as confirmed in numerical experiments both for forecasting and mode decomposition.
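As a rough illustration of learning a transfer operator from a single trajectory, the sketch below fits a matrix by ordinary least squares over a fixed feature map. This is a plain full-rank least-squares estimator, not the reduced-rank regression (RRR) estimator proposed in the abstract. The function name `fit_transfer_matrix`, the identity feature map, and the toy linear system are assumptions made for illustration.

```python
import numpy as np

def fit_transfer_matrix(trajectory, features):
    """Least-squares fit of K such that features(x_{t+1}) is approximately features(x_t) @ K."""
    phi = features(trajectory)              # shape (T, p)
    phi_now, phi_next = phi[:-1], phi[1:]
    k_matrix, *_ = np.linalg.lstsq(phi_now, phi_next, rcond=None)
    return k_matrix

# Toy deterministic linear system x_{t+1} = A x_t with known eigenvalues 0.9 and 0.5.
a_true = np.array([[0.9, 0.1], [0.0, 0.5]])
states = [np.array([1.0, 1.0])]
for _ in range(20):
    states.append(a_true @ states[-1])
trajectory = np.stack(states)

# With identity features the fitted matrix is the transpose of A, so the spectra agree.
k_hat = fit_transfer_matrix(trajectory, features=lambda x: x)
eigenvalues = np.sort(np.linalg.eigvals(k_hat).real)
```

On noisy or nonlinear data the feature map would be replaced by a richer dictionary or a kernel. The quality of the estimated spectrum is then the kind of question the learning bounds in the abstract address.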
Several problems in stochastic analysis are defined through their geometry, and preserving that geometric structure is essential to generating meaningful predictions. Nevertheless, how to design principled deep learning (DL) models capable of encoding these geometric structures remains largely unknown. We address this open problem by introducing a universal causal geometric DL framework in which the user specifies a suitable pair of geometries $\mathscr{X}$ and $\mathscr{Y}$ and our framework returns a DL model capable of causally approximating any ``regular'' map sending time series in $\mathscr{X}^{\mathbb{Z}}$ to time series in $\mathscr{Y}^{\mathbb{Z}}$ while respecting their forward flow of information throughout time. Suitable geometries on $\mathscr{Y}$ include various (adapted) Wasserstein spaces arising in optimal stopping problems, a variety of statistical manifolds describing the conditional distribution of continuous-time finite state Markov chains, and all Fr\'echet spaces admitting a Schauder basis, e.g. as in classical finance. Suitable $\mathscr{X}$ include any compact subset of any Euclidean space. Our results all quantitatively express the number of parameters needed for our DL model to achieve a given approximation error as a function of the target map's regularity and the geometric structure both of $\mathscr{X}$ and of $\mathscr{Y}$. Even when omitting any temporal structure, our universal approximation theorems are the first guarantees that H\"older functions defined between such $\mathscr{X}$ and $\mathscr{Y}$ can be approximated by DL models.
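Kernel Stein discrepancy (KSD) is a widely used kernel-based non-parametric measure of discrepancy between probability measures. It is often employed in the scenario where a user has a collection of samples from a candidate probability measure and wishes to compare them against a specified target probability measure. A useful property of KSD is that it may be calculated with samples from only the candidate measure and without knowledge of the normalizing constant of the target measure. KSD has been employed in a range of settings including goodness-of-fit testing, parametric inference, MCMC output assessment and generative modelling. Two main issues with current KSD methodology are (i) the lack of applicability beyond the finite-dimensional Euclidean setting and (ii) a lack of clarity on what influences KSD performance. This paper provides a novel spectral representation of KSD which remedies both of these, making KSD applicable to Hilbert-valued data and revealing the impact of kernel and Stein operator choice on the KSD. We demonstrate the efficacy of the proposed methodology by performing goodness-of-fit tests for various Gaussian and non-Gaussian functional models in a number of synthetic data experiments.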
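We propose a series of computationally efficient, nonparametric tests for the two-sample, independence and goodness-of-fit problems, using the Maximum Mean Discrepancy (MMD), Hilbert-Schmidt Independence Criterion (HSIC), and Kernel Stein Discrepancy (KSD), respectively. Our test statistics are incomplete $U$-statistics, with a computational cost that interpolates between linear time in the number of samples and quadratic time, as associated with classical $U$-statistic tests. The three proposed tests aggregate over several kernel bandwidths to detect departures from the null on various scales: we call the resulting tests MMDAggInc, HSICAggInc and KSDAggInc. For the test thresholds, we derive a quantile bound for wild bootstrapped incomplete $U$-statistics, which is of independent interest. We derive uniform separation rates for MMDAggInc and HSICAggInc, and quantify exactly the trade-off between computational efficiency and the attainable rates: to our knowledge, this result is novel for tests based on incomplete $U$-statistics. We further show that in the quadratic-time case, the wild bootstrap incurs no penalty to test power over the more widespread permutation-based approaches, since both attain the same minimax optimal rates (which in turn match the rates that use oracle quantiles). We support our claims with numerical experiments on the trade-off between computational efficiency and test power. In the three testing frameworks, we observe that our proposed linear-time aggregated tests obtain higher power than current state-of-the-art linear-time kernel tests.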
In nonparametric independence testing, we observe i.i.d.\ data $\{(X_i,Y_i)\}_{i=1}^n$, where $X \in \mathcal{X}, Y \in \mathcal{Y}$ lie in any general spaces, and we wish to test the null that $X$ is independent of $Y$. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Thus, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. This paper provides a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced ``cross'' HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. This requires building on the newly developed theory of cross U-statistics by Kim and Ramdas (2020), and in particular developing several nontrivial extensions of the theory in Shekhar et al. (2022), which developed an analogous permutation-free kernel two-sample test. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the full dCov or HSIC, our variants have the same power up to a $\sqrt 2$ factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.
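For context, the permutation baseline that the abstract aims to avoid can be sketched as follows. This is the standard biased HSIC estimate with a permutation p-value, not the xHSIC statistic, and it assumes Gaussian kernels with unit bandwidth. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def gaussian_gram(z, bandwidth=1.0):
    """Gaussian kernel matrix for the rows of z, an (n, d) array."""
    sq = np.sum(z**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def hsic_biased(k_centered, l):
    """Biased HSIC estimate trace(K H L H) / n^2, given k_centered = H K H and symmetric l."""
    n = l.shape[0]
    return np.sum(k_centered * l) / n**2

def hsic_permutation_pvalue(x, y, num_permutations=200, seed=0):
    """Permutation p-value for independence of x and y; each permutation costs O(n^2)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    h = np.eye(n) - np.ones((n, n)) / n
    k_centered = h @ gaussian_gram(x) @ h
    l = gaussian_gram(y)
    observed = hsic_biased(k_centered, l)
    exceed = 0
    for _ in range(num_permutations):
        perm = rng.permutation(n)
        # Permuting rows and columns of l breaks the pairing between x and y.
        if hsic_biased(k_centered, l[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + num_permutations)

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 1))
y_dependent = x + 0.1 * rng.normal(size=(100, 1))
p_dependent = hsic_permutation_pvalue(x, y_dependent)
```

The loop over permutations is the repeated quadratic-time recomputation the abstract mentions. A statistic with a Gaussian null limit replaces the loop with a single comparison against a normal quantile.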
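Mathematical models are fundamental building blocks in the design of dynamical control systems. As control systems become increasingly complex and networked, approaches for obtaining such models based on first principles reach their limits. Data-driven methods provide an alternative. However, without structural knowledge, these methods are prone to finding spurious correlations in the training data, which can hamper the generalization capabilities of the obtained models. This can significantly lower control and prediction performance when the system is exposed to unknown situations. A preceding causal identification can prevent this pitfall. In this paper, we propose a method that identifies the causal structure of control systems. We design experiments based on the concept of controllability, which provides a systematic way to compute input trajectories that steer the system to specific regions in its state space. We then analyze the resulting data, leveraging powerful techniques from causal inference and extending them to control systems. Further, we derive conditions that guarantee the discovery of the true causal structure of the system. Experiments on a robot arm demonstrate reliable causal identification from real-world data and enhanced generalization capabilities.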
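The problem of learning functions over spaces of probabilities, or distribution regression, is gaining significant interest in the machine learning community. A key challenge behind this problem is to identify a suitable representation capturing all relevant properties of the underlying functional mapping. A principled approach to distribution regression is provided by kernel mean embeddings, which lift the kernel-induced similarity on the input domain to the probability level. This strategy effectively tackles the two-stage sampling nature of the problem, enabling one to derive estimators with strong statistical guarantees, such as universal consistency and excess risk bounds. However, kernel mean embeddings implicitly hinge on the maximum mean discrepancy (MMD), a metric on probabilities which may fail to capture key geometrical relations between distributions. In contrast, optimal transport (OT) metrics are potentially more appealing. In this work, we propose an OT-based estimator for distribution regression. We build on the Sliced Wasserstein distance to obtain an OT-based representation. We study the theoretical properties of a kernel ridge regression estimator based on this representation, for which we prove universal consistency and excess risk bounds. Preliminary experiments complement our theoretical findings by showing the effectiveness of the proposed approach and comparing it with MMD-based estimators.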
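The Sliced Wasserstein distance underlying this representation can be approximated by projecting both samples onto random directions and comparing sorted projections. The sketch below is a Monte Carlo estimate under the simplifying assumption that both samples have the same size. The function name `sliced_wasserstein2`, the number of projections, and the test data are illustrative assumptions, and the code does not implement the regression estimator itself.

```python
import numpy as np

def sliced_wasserstein2(x, y, num_projections=100, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    Assumes x and y are (n, d) arrays with the same number of samples n.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(num_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # In one dimension, optimal transport between equal-size samples matches order statistics.
    proj_x = np.sort(x @ directions.T, axis=0)
    proj_y = np.sort(y @ directions.T, axis=0)
    return np.mean((proj_x - proj_y) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
shift = np.array([1.0, 0.0, 0.0])
d_same = sliced_wasserstein2(x, x)
d_shift = sliced_wasserstein2(x, x + shift)
```

For a pure translation, every projection moves by the inner product of the direction with `shift`, so `d_shift` can never exceed the squared norm of `shift`. A kernel for distribution regression could then be built on such distances, for example a Gaussian-type kernel of the sliced distance, which is again an assumption here rather than a detail stated in the abstract.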
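We propose a new conditional dependence measure and a statistical test for conditional independence. The measure is based on the difference between analytic kernel embeddings of two well-suited distributions evaluated at a finite set of locations. We obtain its asymptotic distribution under the null hypothesis of conditional independence and design a consistent statistical test from it. We conduct a series of experiments showing that our new test outperforms state-of-the-art methods in terms of both type-I and type-II errors, even in the high-dimensional setting.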