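This paper shows that gradient boosting based on symmetric decision trees can be equivalently reformulated as a kernel method that converges to the solution of a certain kernel ridgeless regression problem. Thus, for low-rank kernels, we obtain convergence to the posterior mean of a Gaussian process, which in turn allows us to easily transform gradient boosting into a sampler from the posterior and to provide better knowledge uncertainty estimates through Monte Carlo estimation of the posterior variance. We show that the proposed sampler yields better knowledge uncertainty estimates, leading to improved out-of-domain detection.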
Translated by Google Translate
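The Monte Carlo step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes only that some posterior sampler is available as a function returning one model per call; `monte_carlo_uncertainty` and `sample_linear_model` are hypothetical names, and the toy linear sampler is a stand-in, not the boosting-based sampler the paper proposes.

```python
import random
import statistics


def monte_carlo_uncertainty(sample_model, inputs, num_samples=200, seed=0):
    """Estimate the predictive mean and variance at each input by sampling models."""
    rng = random.Random(seed)
    models = [sample_model(rng) for _ in range(num_samples)]
    means, variances = [], []
    for x in inputs:
        preds = [model(x) for model in models]
        means.append(statistics.fmean(preds))
        # The sample variance across models plays the role of knowledge uncertainty.
        variances.append(statistics.variance(preds))
    return means, variances


def sample_linear_model(rng):
    """Hypothetical stand-in sampler: f(x) = w * x with w ~ N(1, 0.5^2)."""
    w = rng.gauss(1.0, 0.5)
    return lambda x: w * x


means, variances = monte_carlo_uncertainty(sample_linear_model, [0.0, 1.0, 2.0])
```

In this toy setting the estimated variance grows as the input moves away from zero, which is the kind of signal an out-of-domain detector would threshold.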
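Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work, we present a novel quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed method by establishing minimax optimal contraction rates in $L_2$ and Sobolev norms, and by discussing the frequentist validity of credible balls. We further derive a scalable inference algorithm that can be extended to work with wide neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.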
Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models or analyses with a downstream application such that error quantification plays a key role. However, by entirely ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of a widely applied theorem for conditioning GPs on a finite number of direct observations to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows one to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate and the capability to incorporate uncertain model parameters and observations. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models.
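As a point of reference, the finite-dimensional analogue of conditioning on linear observations can be written out explicitly. The symbols below ($u$, $A$, $\Lambda$, $G$) are illustrative assumptions and do not come from the abstract, whose theorem concerns bounded linear operators acting on GPs; the sketch assumes $\varepsilon$ is independent of $u$ and that $G$ is invertible.

```latex
u \sim \mathcal{N}(\mu, \Sigma), \qquad
y = A u + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}(0, \Lambda)
\;\Longrightarrow\;
u \mid y \sim \mathcal{N}\!\left(
  \mu + \Sigma A^\top G^{-1} (y - A\mu),\;
  \Sigma - \Sigma A^\top G^{-1} A \Sigma
\right),
\qquad G = A \Sigma A^\top + \Lambda .
```

Direct observations correspond to $A$ selecting entries of $u$; in the PDE setting one would loosely think of $A$ as a discretized differential operator, and $\Lambda$ as measurement noise.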
Interacting particle or agent systems that display a rich variety of swarming behaviours are ubiquitous in science and engineering. A fundamental and challenging goal is to understand the link between individual interaction rules and swarming. In this paper, we study the data-driven discovery of a second-order particle swarming model that describes the evolution of $N$ particles in $\mathbb{R}^d$ under radial interactions. We propose a learning approach that models the latent radial interaction function as Gaussian processes, which can simultaneously fulfill two inference goals: one is the nonparametric inference of the interaction function with pointwise uncertainty quantification, and the other one is the inference of unknown scalar parameters in the non-collective friction forces of the system. We formulate the learning problem as a statistical inverse problem and provide a detailed analysis of recoverability conditions, establishing that a coercivity condition is sufficient for recoverability. Given data collected from $M$ i.i.d. trajectories with independent Gaussian observational noise, we provide a finite-sample analysis, showing that our posterior mean estimator converges in a reproducing kernel Hilbert space norm, at an optimal rate in $M$ equal to the one in the classical 1-dimensional kernel ridge regression. As a byproduct, we show we can obtain a parametric learning rate in $M$ for the posterior marginal variance using $L^{\infty}$ norm, and the rate could also involve $N$ and $L$ (the number of observation time instances for each trajectory), depending on the condition number of the inverse problem. Numerical results on systems that exhibit different swarming behaviors demonstrate efficient learning of our approach from scarce noisy trajectory data.
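Comparing probability distributions is at the crux of many machine learning algorithms. Maximum Mean Discrepancies (MMD) and Optimal Transport (OT) distances are two classes of distances between probability measures that have attracted abundant attention in past years. This paper establishes some conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by the compressive statistical learning (CSL) theory, a general framework for resource-efficient large-scale learning in which the training data is summarized in a single vector (called a sketch) that captures the information relevant to the considered learning task. Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property (Hölder LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is, when some task-specific metric between probability distributions can be bounded by a Wasserstein distance.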
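For readers less familiar with the MMD, a minimal sketch of its empirical estimate may help. The Gaussian kernel, the bandwidth, and the names `gaussian_kernel` and `mmd_squared` are assumptions made for illustration; the abstract does not fix a particular kernel.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
sample_p = [(rng.gauss(0.0, 1.0),) for _ in range(200)]
sample_q = [(rng.gauss(0.0, 1.0),) for _ in range(200)]
sample_r = [(rng.gauss(2.0, 1.0),) for _ in range(200)]

mmd_same = mmd_squared(sample_p, sample_q)   # both samples drawn from N(0, 1)
mmd_shift = mmd_squared(sample_p, sample_r)  # N(0, 1) versus N(2, 1)
```

The estimate for the shifted pair should come out noticeably larger than for the two samples from the same distribution. The biased estimator is used here because it is nonnegative up to rounding, which keeps the sketch simple.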
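Bayesian optimisation (BO) algorithms have shown remarkable success in applications involving expensive black-box functions. Traditionally, BO has been set up as a sequential decision-making process that estimates the utility of query points via an acquisition function and a prior over functions, such as a Gaussian process. Recently, however, a reformulation of BO via density-ratio estimation (BORE) allowed the acquisition function to be reinterpreted as a probabilistic binary classifier, removing the need for an explicit prior over functions and increasing scalability. In this paper, we present a theoretical analysis of BORE's regret and an extension of the algorithm with improved uncertainty estimates. We also show that BORE can be naturally extended to a batch optimisation setting by recasting the problem as approximate Bayesian inference. The resulting algorithm comes equipped with theoretical performance guarantees and is assessed against other batch BO baselines in a series of experiments.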
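Obtaining reliable, adaptive confidence sets for prediction functions (hypotheses) is a central challenge in sequential decision-making tasks, such as bandits and model-based reinforcement learning. These confidence sets typically rely on prior assumptions on the hypothesis space, e.g., the known kernel of a reproducing kernel Hilbert space (RKHS). Hand-designing such kernels is error-prone, and misspecification may lead to poor or unsafe performance. In this work, we propose to meta-learn a kernel from offline data (Meta-KeL). For the case where the unknown kernel is a combination of known base kernels, we develop an estimator based on structured sparsity. Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets that, with increasing amounts of offline data, become as tight as those given the true unknown kernel. We demonstrate our approach on the kernelized bandit problem (a.k.a. Bayesian optimization), where we establish regret bounds competitive with those given the true kernel. We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.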
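In this paper, we consider coefficient-based regularized distribution regression, which aims to regress from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS), where the regularization is put on the coefficients and the kernels are assumed to be indefinite. The algorithm involves two stages of sampling: the first-stage sample consists of distributions, and the second-stage sample is obtained from these distributions. The asymptotic behavior of the algorithm in different regularity ranges of the regression function is comprehensively studied, and learning rates are derived via integral operator techniques. We obtain the optimal rates under some mild conditions, which match the one-stage sampled minimax optimal rate. Compared with the kernel methods for distribution regression in the literature, the algorithm under consideration does not require the kernel to be symmetric and positive semi-definite, and hence provides a simple paradigm for designing indefinite kernel methods, which enriches the theme of distribution regression. To the best of our knowledge, this is the first result for distribution regression with indefinite kernels, and our algorithm can improve the saturation effect.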
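We study the learning properties of nonparametric ridgeless least squares. In particular, we consider the common case of estimators defined by scale-dependent kernels, and focus on the role of the scale. These estimators interpolate the data, and the scale can be shown to control their stability through the condition number. Our analysis shows that there are different regimes depending on the interplay between the sample size, its dimensions, and the smoothness of the problem. Indeed, when the sample size is less than exponential in the data dimension, the scale can be chosen so that the learning error decreases. As the sample size becomes larger, the overall error stops decreasing, but interestingly the scale can be chosen in such a way that the variance due to noise remains bounded. Our analysis combines probabilistic results with a number of analytic techniques from interpolation theory.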
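This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. We study the analytic properties of the RGM distance and derive that, under mild conditions, RGM equals the classic Gromov-Wasserstein distance. Curiously, drawing a connection to Brenier's polar factorization, we show that the RGM sampler induces bias towards strong isomorphism with proper choices of $c_{\mathcal{X}}$ and $c_{\mathcal{Y}}$. Statistical rates of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.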
We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data-dependent subspaces spanned by a random subset of the data, recovering as a special case Nyström approaches for kernel methods. Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have been recently explored for the least squares loss and self-concordant loss functions, such as the logistic loss. Here, we work to extend these results to convex Lipschitz loss functions that might not be smooth, such as the hinge loss used in support vector machines. This unified analysis requires developing new proofs that use different technical tools, such as sub-gaussian inputs, to achieve fast rates. Our main results show the existence of different settings, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance.
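Truncated linear regression is a classical challenge in statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of Tobin (1958) and Amemiya (1973). When the distribution of the error is normal with known variance, recent work of Daskalakis et al. (2019) provides computationally and statistically efficient estimators of the linear model, $w$. In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of projected stochastic gradient descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.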
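We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of $Y$ given $X$ into a target reproducing kernel Hilbert space $\mathcal{H}_Y$. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$ to $\mathcal{H}_Y$. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal $O(\log n / n)$ rates without assuming $\mathcal{H}_Y$ to be finite-dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.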
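Tree ensemble methods such as random forests [Breiman, 2001] are very popular for handling high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making problems, settling for the best predictive procedure may not be reasonable, since enlightened decisions require an in-depth comprehension of the algorithm's prediction process. Unfortunately, random forests are not intrinsically interpretable, since their prediction results from averaging several hundred decision trees. A classic approach to gaining knowledge about this so-called black-box algorithm is to compute variable importances, which are employed to assess the predictive impact of each input variable. Variable importances are then used to rank or select variables and thus play a great role in data analysis. Nevertheless, there is no justification for using random forest variable importances in this way: we do not even know what these quantities estimate. In this paper, we analyze one of the two well-known random forest variable importances, the Mean Decrease Impurity (MDI). We prove that if input variables are independent and in the absence of interactions, MDI provides a variance decomposition of the output, where the contribution of each variable is clearly identified. We also study models exhibiting dependence between input variables or interactions, for which the variable importance is intrinsically ill-defined. Our analysis shows that there may be some benefits to using a forest compared to a single tree.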
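Gaussian processes have become a promising tool for various safety-critical settings, since the posterior variance can be used to directly estimate the model error and quantify risk. However, state-of-the-art techniques for safety-critical settings hinge on the assumption that the kernel hyperparameters are known, which does not apply in general. To mitigate this, we introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters. Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error of a Gaussian process with arbitrary hyperparameters. We do not require any bounds on the hyperparameters to be known a priori, which is an assumption commonly found in related work. Instead, we are able to derive bounds from data in an intuitive fashion. We additionally employ the proposed technique to derive performance guarantees for a class of learning-based control problems. Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.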
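Covariance estimation is ubiquitous in functional data analysis. Yet, the case of functional observations over multidimensional domains introduces computational and statistical challenges, rendering the standard methods effectively inapplicable. To address this problem, we introduce Covariance Networks (CovNet) as a modeling and estimation tool. The CovNet model is universal: it can be used to approximate any covariance up to a desired precision. Moreover, the model can be fitted efficiently to the data, and its neural network architecture allows us to employ modern computational tools in the implementation. The CovNet model also admits a closed-form eigendecomposition, which can be computed efficiently without constructing the covariance itself. This facilitates easy storage and subsequent manipulation of a covariance in the context of the CovNet. We establish consistency of the proposed estimator and derive its rate of convergence. The usefulness of the proposed method is demonstrated by means of an extensive simulation study and an application to resting-state fMRI data.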
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.
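The upper-confidence rule described above can be sketched on a finite grid as follows. This is a minimal illustration, not the authors' implementation: the RBF kernel, its length scale, the noise level, and the exploration schedule `beta` are all assumptions chosen for the example, and `rbf`, `solve`, `posterior`, and `gp_ucb` are hypothetical helper names.

```python
import math
import random


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel on scalars (an assumed choice of covariance)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def posterior(xs, ys, query, noise_var):
    """GP posterior mean and variance at `query` given noisy observations."""
    if not xs:
        return 0.0, rbf(query, query)  # zero-mean prior
    gram = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_q = [rbf(a, query) for a in xs]
    weights = solve(gram, k_q)  # (K + noise_var * I)^{-1} k_q
    mean = sum(w * y for w, y in zip(weights, ys))
    var = rbf(query, query) - sum(w * k for w, k in zip(weights, k_q))
    return mean, max(var, 0.0)


def gp_ucb(objective, grid, rounds, noise_std=0.1, delta=0.1, seed=0):
    """Query the grid point maximising mean + sqrt(beta_t) * std at every round."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(1, rounds + 1):
        # Assumed exploration schedule for a finite decision set.
        beta = 2.0 * math.log(len(grid) * t ** 2 * math.pi ** 2 / (6.0 * delta))

        def ucb(x):
            mean, var = posterior(xs, ys, x, noise_std ** 2)
            return mean + math.sqrt(beta * var)

        x_next = max(grid, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next) + rng.gauss(0.0, noise_std))
    return xs, ys


grid = [i / 40 for i in range(41)]
queried, observed = gp_ucb(lambda x: math.sin(3.0 * x), grid, rounds=10)
```

Recomputing the posterior from scratch at every grid point keeps the sketch short; a practical implementation would factorise the Gram matrix once per round and update it incrementally.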
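We characterize the power-law asymptotics of learning curves for Gaussian process regression (GPR) under the assumption that the eigenspectrum of the prior and the eigenexpansion coefficients of the target function follow a power law. Under similar assumptions, we leverage the equivalence between GPR and kernel ridge regression (KRR) to show the generalization error of KRR. Infinitely wide neural networks can be related to GPR with respect to the neural network GP kernel and the neural tangent kernel, which in several cases are known to have a power-law spectrum. Hence, our methods can be applied to study the generalization error of infinitely wide neural networks. We present toy experiments demonstrating the theory.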
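The GPR and KRR equivalence invoked above can be checked numerically on a toy problem. The sketch below assumes an RBF kernel and the unnormalised KRR objective $\sum_i (f(x_i) - y_i)^2 + \lambda \|f\|_{\mathcal{H}}^2$ with $\lambda$ set to the GP noise variance; the helper names are hypothetical. It computes the GP posterior-mean weights and verifies that the gradient of the KRR objective vanishes at those weights.

```python
import math


def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel on scalars (an assumed choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior_weights(xs, ys, noise_var):
    """Weights alpha such that the GP posterior mean is sum_i alpha_i k(x_i, x)."""
    gram = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    return solve(gram, ys)


def krr_gradient(xs, ys, alpha, ridge):
    """Gradient in alpha of ||K alpha - y||^2 + ridge * alpha^T K alpha."""
    n = len(xs)
    gram = [[rbf(a, b) for b in xs] for a in xs]
    fitted = [sum(gram[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    # The gradient equals 2 K (K alpha - y + ridge * alpha).
    inner = [fitted[i] - ys[i] + ridge * alpha[i] for i in range(n)]
    return [2.0 * sum(gram[i][j] * inner[j] for j in range(n)) for i in range(n)]


xs = [0.0, 0.4, 1.1, 1.5, 2.3]
ys = [math.sin(x) for x in xs]
noise_var = 0.1

alpha = gp_posterior_weights(xs, ys, noise_var)
grad = krr_gradient(xs, ys, alpha, ridge=noise_var)
```

Because the KRR objective is convex in `alpha`, a vanishing gradient means the GP posterior-mean weights also minimise it, so the two predictors coincide on this toy data. If the squared loss were averaged over the $n$ points instead, the matching ridge parameter would be the noise variance divided by $n$.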
We introduce and study a novel model-selection strategy for Bayesian learning, based on optimal transport, along with its associated predictive posterior law: the Wasserstein population barycenter of the posterior law over models. We first show how this estimator, termed Bayesian Wasserstein barycenter (BWB), arises naturally in a general, parameter-free Bayesian model-selection framework, when the considered Bayesian risk is the Wasserstein distance. Examples are given, illustrating how the BWB extends some classic parametric and non-parametric selection strategies. Furthermore, we also provide explicit conditions granting the existence and statistical consistency of the BWB, and discuss some of its general and specific properties, providing insights into its advantages compared to usual choices, such as the model average estimator. Finally, we illustrate how this estimator can be computed using the stochastic gradient descent (SGD) algorithm in Wasserstein space introduced in a companion paper arXiv:2201.04232v2 [math.OC], and provide a numerical example for experimental validation of the proposed method.
We reformulate the unsupervised dimension reduction problem (UDR) in the language of tempered distributions, i.e. as a problem of approximating an empirical probability density function by another tempered distribution, supported in a $k$-dimensional subspace. We show that this task is connected with another classical problem of data science -- the sufficient dimension reduction problem (SDR). In fact, an algorithm for the first problem induces an algorithm for the second and vice versa. In order to reduce an optimization problem over distributions to an optimization problem over ordinary functions we introduce a nonnegative penalty function that ``forces'' the support of the model distribution to be $k$-dimensional. Then we present an algorithm for the minimization of the penalized objective, based on infinite-dimensional low-rank optimization, which we call the alternating scheme. Also, we design an efficient approximate algorithm for a special case of the problem, where the distance between the empirical distribution and the model distribution is measured by the Maximum Mean Discrepancy defined by a Mercer kernel of a certain type. We test our methods on four examples (three UDR and one SDR) using synthetic data and standard datasets.