Smart Paper Notes

Relaxed Gaussian process interpolation: a goal-oriented approach to Bayesian optimization

Sébastien Petit, Julien Bect, Emmanuel Vazquez

Categories: Machine Learning (Statistics)

2022-06-07

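This work presents a new procedure for obtaining predictive distributions in the context of Gaussian process (GP) modeling, with a relaxation of the interpolation constraints outside some ranges of interest: the mean of the predictive distribution no longer necessarily interpolates the observed values when these lie outside the ranges of interest, but is only constrained to remain outside them. This method, called relaxed Gaussian process (reGP) interpolation, provides better predictive distributions in the ranges of interest, especially when a stationarity assumption for the GP model is not appropriate. It can be viewed as a goal-oriented method and becomes particularly interesting in Bayesian optimization, for example for the minimization of an objective function, where good predictive distributions for low function values are important. When the expected improvement criterion and reGP are used to sequentially choose evaluation points, the convergence of the resulting optimization algorithm is theoretically guaranteed (under suitable assumptions). Experiments show that using reGP instead of stationary GP models in Bayesian optimization is beneficial.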

translated by Google Translate
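
The expected improvement (EI) criterion mentioned above has a closed form when the predictive distribution at a point is Gaussian with mean $m$ and standard deviation $s$. For minimization with current best observed value $f_{\min}$ and $u = (f_{\min} - m)/s$, $\mathrm{EI} = (f_{\min} - m)\,\Phi(u) + s\,\varphi(u)$, where $\Phi$ and $\varphi$ are the standard normal CDF and density. The minimal Python sketch below only evaluates this formula from given predictive moments; it does not implement the reGP relaxation, and the function name and candidate values are hypothetical.

```python
import math

def expected_improvement(mean: float, std: float, f_min: float) -> float:
    """EI below f_min for a Gaussian predictive N(mean, std^2) (minimization)."""
    if std <= 0.0:
        # Degenerate case (e.g. an already observed, noise-free point).
        return max(f_min - mean, 0.0)
    u = (f_min - mean) / std
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # standard normal density
    return (f_min - mean) * cdf + std * pdf

# Hypothetical predictive (mean, std) at three candidate points.
candidates = {"x1": (0.2, 0.05), "x2": (0.5, 0.40), "x3": (1.0, 0.10)}
f_min = 0.3
scores = {name: expected_improvement(m, s, f_min) for name, (m, s) in candidates.items()}
best = max(scores, key=scores.get)  # next evaluation point under EI
```

In the procedure described above, the predictive moments would come from the reGP predictive distribution rather than from a standard stationary GP.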

Quasi-Bayesian Dual Instrumental Variable Regression

Ziyu Wang, Yuhao Zhou, Tongzheng Ren, Jun Zhu

Categories: Machine Learning (Statistics) | Machine Learning

2021-06-16

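Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work, we present a novel quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models and the dual/minimax formulation of IV regression. We analyze the frequentist behavior of the proposed method by establishing minimax optimal contraction rates in the $L_2$ and Sobolev norms, and by discussing the frequentist validity of credible balls. We further derive a scalable inference algorithm that can be extended to work with wide neural network models. Empirical evaluation shows that our method produces informative uncertainty estimates on complex high-dimensional problems.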

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias Seeger

Categories:

2009-12-21

Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.
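
GP-UCB, as analysed above, queries at each round the point maximizing $\mu_{t-1}(x) + \beta_t^{1/2}\sigma_{t-1}(x)$, the GP posterior mean plus a scaled posterior standard deviation. The pure-Python sketch below runs this rule on a finite candidate set with a squared-exponential kernel. Assumptions not taken from the abstract: a constant $\beta$ (the paper's regret bounds rely on a specific increasing schedule $\beta_t$), a zero-mean, unit-variance prior with lengthscale 0.3, noise-free toy observations, and hypothetical function names.

```python
import math
import random

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential covariance with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def gp_ucb(f, candidates, n_iter, beta=4.0, noise_sd=0.0, seed=0):
    """GP-UCB on a finite candidate set; returns queried points and observations."""
    rng = random.Random(seed)
    noise_var = max(noise_sd ** 2, 1e-6)  # small jitter keeps K positive definite
    X, y = [], []
    for _ in range(n_iter):
        if X:
            K = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(X)]
                 for i, a in enumerate(X)]
            L = cholesky(K)
            w = forward_sub(L, y)  # L^{-1} y

        def ucb(x):
            if not X:
                return math.sqrt(beta)  # prior: mean 0, standard deviation 1
            v = forward_sub(L, [rbf(a, x) for a in X])  # L^{-1} k(X, x)
            mean = sum(vi * wi for vi, wi in zip(v, w))  # k^T K^{-1} y
            var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # 1 - k^T K^{-1} k
            return mean + math.sqrt(beta * var)

        x_next = max(candidates, key=ucb)
        X.append(x_next)
        y.append(f(x_next) + rng.gauss(0.0, noise_sd))
    return X, y

# Toy objective with its maximum at x = 0.7, on a grid over [0, 1].
X, y = gp_ucb(lambda x: -4.0 * (x - 0.7) ** 2, [i / 50 for i in range(51)], n_iter=25)
```

The sketch refactors the kernel matrix every round, which costs $O(t^3)$; longer runs would normally update the Cholesky factor incrementally.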

Handling Hard Affine SDP Shape Constraints in RKHSs

Pierre-Cyril Aubin-Frankowski, Zoltan Szabo

Categories: Machine Learning (Statistics) | Machine Learning

2021-01-05

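Shape constraints, such as non-negativity, monotonicity, convexity or supermodularity, play a key role in various applications of machine learning and statistics. However, incorporating this side information into predictive models in a hard way (for example, at all points of an interval) is a notoriously challenging problem. We propose a unified and modular convex optimization framework, relying on second-order cone (SOC) tightening, to encode hard affine SDP constraints on function derivatives for models belonging to vector-valued reproducing kernel Hilbert spaces (vRKHSs). The modular nature of the proposed approach allows multiple shape constraints to be handled simultaneously, and an infinite number of constraints to be tightened into finitely many. We prove the convergence of the proposed scheme and of its adaptive variant, leveraging geometric properties of vRKHSs. Due to the covering-based construction of the tightening, the method is particularly well suited to tasks with small to moderate input dimensions. The efficiency of the approach is illustrated in the context of shape optimization, robotics and econometrics.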

Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications

Alexandre Capone, Armin Lederer, Sandra Hirche

Categories: Machine Learning | Robotics

2021-09-06

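Gaussian processes have become a promising tool for various safety-critical settings, since the posterior variance can be used to directly estimate the model error and quantify risk. However, state-of-the-art techniques for safety-critical settings hinge on the assumption that the kernel hyperparameters are known, which does not hold in general. To mitigate this, we introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters. Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error of a Gaussian process with arbitrary hyperparameters. We do not require any a priori bounds on the hyperparameters, an assumption commonly found in related work. Instead, we are able to derive bounds from data in an intuitive fashion. We additionally employ the proposed technique to derive performance guarantees for a class of learning-based control problems. Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.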

State-space deep Gaussian processes with applications

Zheng Zhao

Categories: Machine Learning (Statistics)

2021-11-24

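This thesis is mainly concerned with state-space approaches for solving deep (temporal) Gaussian process (DGP) regression problems. More specifically, we represent DGPs as hierarchically composed systems of stochastic differential equations (SDEs), and we solve the DGP regression problem by using state-space filtering and smoothing methods. The resulting state-space DGP (SS-DGP) models generate a rich class of priors compatible with modelling a number of irregular signals/functions. Moreover, due to their Markovian structure, SS-DGP regression problems can be solved efficiently by using Bayesian filtering and smoothing methods. The second contribution of this thesis is that we solve continuous-discrete Gaussian filtering and smoothing problems by using the Taylor moment expansion (TME) method. This induces a class of filters and smoothers that can be asymptotically exact in predicting the mean and covariance of SDE solutions. Moreover, the TME method and the TME filters and smoothers are compatible with simulating SS-DGPs and solving their regression problems. Lastly, this thesis features a number of applications of state-space (deep) GPs. These applications mainly include (i) estimation of unknown drift functions of SDEs from partially observed trajectories and (ii) estimation of spectro-temporal features of signals.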
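
The Taylor moment expansion approximates a conditional expectation of an SDE solution by a truncated series in the SDE's infinitesimal generator $\mathcal{A}$: $\mathbb{E}[\phi(X_{t+\Delta}) \mid X_t = x] \approx \sum_{r=0}^{M} \frac{\Delta^r}{r!} (\mathcal{A}^r \phi)(x)$. The sketch below applies this to a toy Ornstein-Uhlenbeck SDE $dX = -\lambda X\,dt + \sigma\,dW$, chosen by us (not by the thesis) because its generator maps polynomials to polynomials, so $\mathcal{A}^r\phi$ can be computed exactly on coefficient lists and compared with the known Gaussian transition moments. Function names and parameter values are hypothetical.

```python
import math

def generator(c, lam, sigma):
    """Apply A = -lam * x * d/dx + (sigma**2 / 2) * d^2/dx^2 (generator of
    dX = -lam X dt + sigma dW) to the polynomial sum_i c[i] * x**i."""
    n = len(c)
    out = []
    for i in range(n):
        drift = -lam * i * c[i]  # contribution of -lam * x * p'(x)
        diffusion = 0.5 * sigma ** 2 * (i + 2) * (i + 1) * c[i + 2] if i + 2 < n else 0.0
        out.append(drift + diffusion)
    return out

def tme_expectation(phi, x0, dt, lam, sigma, order):
    """TME approximation of E[phi(X_{t+dt}) | X_t = x0], truncated at `order`."""
    total, term = 0.0, list(phi)
    for r in range(order + 1):
        value = sum(ci * x0 ** i for i, ci in enumerate(term))  # (A^r phi)(x0)
        total += dt ** r / math.factorial(r) * value
        term = generator(term, lam, sigma)  # next power A^{r+1} phi
    return total

lam, sigma, x0, dt = 1.0, 0.5, 1.3, 0.1
mean = tme_expectation([0.0, 1.0], x0, dt, lam, sigma, order=6)         # phi(x) = x
second = tme_expectation([0.0, 0.0, 1.0], x0, dt, lam, sigma, order=6)  # phi(x) = x^2
var = second - mean ** 2

# Exact Gaussian transition moments of the OU process, for comparison.
exact_mean = x0 * math.exp(-lam * dt)
exact_var = sigma ** 2 / (2 * lam) * (1 - math.exp(-2 * lam * dt))
```

For nonlinear drift or diffusion the generator no longer preserves polynomials, and the iterated terms $\mathcal{A}^r\phi$ typically have to be derived symbolically or by automatic differentiation.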

Robust Generalised Bayesian Inference for Intractable Likelihoods

Takuo Matsubara, Jeremias Knoblauch, François-Xavier Briol, Chris J. Oates

Categories: Machine Learning (Statistics)

2021-04-15

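Generalised Bayesian inference updates prior beliefs using a loss function rather than a likelihood, and can therefore be used to confer robustness against possible misspecification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as the loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In this context, the Stein discrepancy circumvents evaluation of the normalisation constant and produces generalised posteriors that are either available in closed form or accessible using standard Markov chain Monte Carlo. On a theoretical level, we show consistency, asymptotic normality, and bias-robustness of the generalised posterior, highlighting how these properties are affected by the choice of Stein discrepancy. We then provide numerical experiments on a range of intractable distributions, including applications to kernel-based exponential family models and non-Gaussian graphical models.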
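
Roughly, the generalised posterior described above replaces the negative log-likelihood with a weighted Stein discrepancy between the model and the empirical distribution of the data. The key point is that a Stein discrepancy depends on the model only through its score $\nabla_x \log p_\theta(x)$, from which the normalising constant cancels. The sketch below is our own one-dimensional toy, not an experiment from the paper: it uses a Gaussian RBF base kernel and an unnormalised Gaussian location model $p_\theta(x) \propto \exp(-(x-\theta)^2/2)$, and computes a V-statistic estimate of the squared kernel Stein discrepancy on a grid of $\theta$ values. The kernel, lengthscale, function names and data are assumptions.

```python
import math
import random

def score(x, theta):
    # d/dx log p_theta(x) for p_theta(x) proportional to exp(-(x - theta)^2 / 2);
    # the unknown normalising constant drops out of the log-derivative.
    return -(x - theta)

def stein_kernel(x, y, theta, ell=1.0):
    # Langevin Stein kernel built from k(x, y) = exp(-(x - y)^2 / (2 ell^2)).
    d = x - y
    k = math.exp(-d * d / (2.0 * ell ** 2))
    dk_dx = -d / ell ** 2 * k
    dk_dy = d / ell ** 2 * k
    d2k_dxdy = (1.0 / ell ** 2 - d * d / ell ** 4) * k
    sx, sy = score(x, theta), score(y, theta)
    return sx * sy * k + sx * dk_dy + sy * dk_dx + d2k_dxdy

def ksd_squared(samples, theta, ell=1.0):
    # V-statistic estimate of the squared kernel Stein discrepancy.
    n = len(samples)
    return sum(stein_kernel(a, b, theta, ell) for a in samples for b in samples) / n ** 2

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)]  # synthetic data, true theta = 0
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
losses = {theta: ksd_squared(data, theta) for theta in grid}
```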

BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization

Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, Eytan Bakshy

Categories:

2019-10-14

Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, engineering, physics, and experimental design. We introduce BOTORCH, a modern programming framework for Bayesian optimization that combines Monte-Carlo (MC) acquisition functions, a novel sample average approximation optimization approach, autodifferentiation, and variance reduction techniques. BOTORCH's modular design facilitates flexible specification and optimization of probabilistic models written in PyTorch, simplifying implementation of new acquisition functions. Our approach is backed by novel theoretical convergence results and made practical by a distinctive algorithmic foundation that leverages fast predictive distributions, hardware acceleration, and deterministic optimization. We also propose a novel "one-shot" formulation of the Knowledge Gradient, enabled by a combination of our theoretical and software contributions. In experiments, we demonstrate the improved sample efficiency of BOTORCH relative to other popular libraries.

34th Conference on Neural Information Processing Systems (NeurIPS 2020).
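
Monte Carlo acquisition functions of the kind described above estimate an expectation over the joint posterior of a batch of $q$ points by sampling it through the reparameterization $\mathbf{y} = \boldsymbol{\mu} + L\mathbf{z}$, with $LL^\top = \Sigma$ and $\mathbf{z} \sim \mathcal{N}(0, I)$. The framework-free sketch below estimates batch expected improvement for maximization in plain Python; it does not use or mirror BoTorch's API, and the function names and inputs are hypothetical. Holding the base samples $\mathbf{z}$ fixed (here via the seed) makes the estimate a deterministic function of $(\boldsymbol{\mu}, \Sigma)$, which, as we understand it, is the idea that lets sample average approximation rely on deterministic optimizers.

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def mc_batch_ei(mu, cov, best_f, n_samples=20000, seed=0):
    """MC estimate of E[max(max_j Y_j - best_f, 0)] for Y ~ N(mu, cov) (maximization)."""
    rng = random.Random(seed)  # fixed seed = fixed base samples
    L = cholesky(cov)
    q = len(mu)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]  # base samples z ~ N(0, I)
        y = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(q)]  # y = mu + L z
        total += max(max(y) - best_f, 0.0)
    return total / n_samples

single = mc_batch_ei([0.0], [[1.0]], best_f=0.0)  # exact value: 1 / sqrt(2 pi)
pair = mc_batch_ei([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], best_f=0.0)
```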

Robust Uncertainty Bounds in Reproducing Kernel Hilbert Spaces: A Convex Optimization Approach

Paul Scharnhorst, Emilio T. Maddalena, Yuning Jiang, Colin N. Jones

Categories: Machine Learning

2021-04-19

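The problem of establishing out-of-sample bounds for the values of an unknown ground-truth function is considered. Kernels and their associated Hilbert spaces are the main formalism employed in this work, along with an observation model in which outputs are corrupted by bounded measurement noise. The noise can originate from any compactly supported distribution, and no independence assumptions are made on the available data. In this setting, we show that computing tight, finite-sample uncertainty bounds amounts to solving parametric quadratically constrained linear programs. Next, properties of our approach are established and its relationship with another method is studied. Numerical experiments are presented to illustrate how the theory can be applied in a number of scenarios and to contrast it with other closed-form alternatives.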

On the representation and learning of monotone triangular transport maps

Ricardo Baptista, Youssef Marzouk, Olivier Zahm

Categories: Machine Learning (Statistics) | Machine Learning

2020-09-22

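Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps, which approximate the Knothe–Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show, for target distributions satisfying certain tail conditions, that the unique global minimizer corresponds to the KR map. Given samples from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.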

Sparse Continuous Distributions and Fenchel-Young Losses

André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2021-08-04

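Exponential families are widely used in machine learning and include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax) has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions. First, we define $\Omega$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain "deformed exponential families", which include $\alpha$-entmax and sparsemax ($\alpha = 2$) as particular cases. For quadratic energy functions, the resulting densities are $\beta$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $\Omega$ is a total variation or Sobolev regularizer, we obtain a continuous version of fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.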
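
For finite domains, sparsemax (the $\alpha = 2$ member of $\alpha$-entmax) returns the Euclidean projection of a score vector onto the probability simplex, so unlike softmax it can assign exactly zero probability to low-scoring entries. The pure-Python sketch below implements the standard sort-and-threshold algorithm for this finite case only; the paper's continuous $\beta$-Gaussian and attention constructions are not shown, and the function names are our own.

```python
import math

def sparsemax(z):
    """Euclidean projection of a score vector z onto the probability simplex."""
    zs = sorted(z, reverse=True)
    cumsum, k, cumsum_k = 0.0, 0, 0.0
    for j, zj in enumerate(zs, start=1):
        cumsum += zj
        # Support size k(z): largest j with 1 + j * z_(j) > sum of the j largest scores.
        if 1.0 + j * zj > cumsum:
            k, cumsum_k = j, cumsum
    tau = (cumsum_k - 1.0) / k  # threshold chosen so that the output sums to one
    return [max(zi - tau, 0.0) for zi in z]

def softmax(z):
    """Dense baseline: every entry receives strictly positive probability."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

p_sparse = sparsemax([1.0, 2.0, 3.0])  # [0.0, 0.0, 1.0]
p_dense = softmax([1.0, 2.0, 3.0])
```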

Smooth Nested Simulation: Bridging Cubic and Square Root Convergence Rates in High Dimensions

Wenjia Wang, Yanyuan Wang, Xiaowei Zhang

Categories: Machine Learning (Statistics)

2022-01-09

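Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression that exploits the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that the proposed method can effectively alleviate the curse of dimensionality on the convergence rate as the simulation budget increases, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (i.e., the optimal rate for standard nested simulation) and the square root convergence rate (i.e., the canonical rate for standard Monte Carlo simulation). We demonstrate the performance of the proposed method via numerical examples from portfolio risk management and input uncertainty quantification.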

Efficient MCMC Sampling with Dimension-Free Convergence Rate using ADMM-type Splitting

Maxime Vono, Daniel Paulin, Arnaud Doucet

Categories: Machine Learning (Statistics)

2019-05-23

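Performing exact Bayesian inference for complex models is computationally intractable. Markov chain Monte Carlo (MCMC) algorithms can provide reliable approximations of the posterior distribution but are expensive for large datasets and high-dimensional models. A standard approach to mitigate this complexity consists in using subsampling techniques or distributing the data across a cluster. However, these approaches are typically unreliable in high-dimensional scenarios. We focus here on a recent alternative class of MCMC schemes exploiting a splitting strategy akin to the one used by the celebrated alternating direction method of multipliers (ADMM) optimization algorithm. These methods appear to provide empirically state-of-the-art performance, but their theoretical behavior in high dimension is currently unknown. In this paper, we propose a detailed theoretical study of one of these algorithms, known as the split Gibbs sampler. Under regularity conditions, we establish explicit convergence rates for this scheme using Ricci curvature and coupling ideas. We support our theory with numerical illustrations.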
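
The split Gibbs sampler targets a density of the form $\pi(x) \propto \exp(-f(x) - g(x))$ through the augmented density $\pi_\rho(x, z) \propto \exp\big(-f(x) - g(z) - \|x - z\|^2/(2\rho^2)\big)$ and alternates exact draws from $x \mid z$ and $z \mid x$; the coupling term is loosely analogous to the ADMM penalty. The one-dimensional Gaussian toy below is our own illustration, not an experiment from the paper: with quadratic $f(x) = (x-a)^2/(2s_1^2)$ and $g(z) = (z-b)^2/(2s_2^2)$, both conditionals are Gaussian. Integrating out $z$ shows that the $x$-marginal of $\pi_\rho$ is Gaussian with mean $\big(a/s_1^2 + b/(s_2^2+\rho^2)\big)/\big(1/s_1^2 + 1/(s_2^2+\rho^2)\big)$, so $\rho > 0$ introduces a bias that vanishes as $\rho \to 0$. Names and parameter values are hypothetical.

```python
import random

def split_gibbs(a, s1, b, s2, rho, n_iter, burn_in=1000, seed=0):
    """Split Gibbs sampler for pi(x) ∝ exp(-f(x) - g(x)) with
    f(x) = (x - a)^2 / (2 s1^2) and g(z) = (z - b)^2 / (2 s2^2).
    Returns the post-burn-in draws of x."""
    rng = random.Random(seed)
    prec_x = 1.0 / s1 ** 2 + 1.0 / rho ** 2  # precision of x | z
    prec_z = 1.0 / s2 ** 2 + 1.0 / rho ** 2  # precision of z | x
    x = z = 0.0
    xs = []
    for t in range(n_iter):
        # Draw x | z, a Gaussian combining f and the coupling term.
        x = rng.gauss((a / s1 ** 2 + z / rho ** 2) / prec_x, prec_x ** -0.5)
        # Draw z | x, a Gaussian combining g and the coupling term.
        z = rng.gauss((b / s2 ** 2 + x / rho ** 2) / prec_z, prec_z ** -0.5)
        if t >= burn_in:
            xs.append(x)
    return xs

xs = split_gibbs(a=0.0, s1=1.0, b=2.0, s2=1.0, rho=0.5, n_iter=21000)
sample_mean = sum(xs) / len(xs)
# x-marginal mean of pi_rho here: (0 + 2/1.25) / (1 + 1/1.25) = 8/9; the exact target mean is 1.0.
```

Smaller $\rho$ reduces this bias but couples $x$ and $z$ more tightly, so the chain mixes more slowly; in this toy the lag-one autocorrelation of the $x$-chain is $(1/\rho^2)^2/(\text{prec}_x\,\text{prec}_z) = 0.64$ for $\rho = 0.5$ and approaches 1 as $\rho \to 0$.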

Breaking the Curse of Dimensionality with Convex Neural Networks

Francis Bach

Categories:

2014-12-30

We consider neural networks with a single hidden layer and non-decreasing positively homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, they lead to a convex optimization problem and we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace. Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations. However, solving this convex optimization problem in infinite dimensions is only possible if the non-convex subproblem of addition of a new unit can be solved efficiently. We provide a simple geometric interpretation for our choice of activation functions and describe simple conditions for convex relaxations of the finite-dimensional non-convex subproblem to achieve the same generalization error bounds, even when constant-factor approximations cannot be found. We were not able to find strong enough convex relaxations to obtain provably polynomial-time algorithms and leave open the existence or non-existence of such tractable algorithms with non-exponential sample complexities.

Bayesian multi-objective optimization for stochastic simulators: an extension of the Pareto Active Learning method

Bruno Barracosa, Julien Bect, Héloïse Dutrieux Baraffe, Juliette Morin, Josselin Fournel, Emmanuel Vazquez

Categories: Machine Learning (Statistics)

2022-07-08

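This paper focuses on the multi-objective optimization of stochastic simulators with high output variance, where the input space is finite and the objective functions are expensive to evaluate. We rely on Bayesian optimization algorithms, which use probabilistic models to make predictions about the functions to be optimized. The proposed approach is an extension of the Pareto Active Learning (PAL) algorithm for the estimation of Pareto-optimal solutions that makes it suitable for the stochastic setting. We name it Pareto Active Learning for Stochastic Simulators (PALS). The performance of PALS is assessed through numerical experiments over a set of bi-dimensional, bi-objective test problems. PALS exhibits superior performance when compared to other scalarization-based and random-search approaches.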
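
PAL and PALS aim to identify the Pareto-optimal set, whose basic building block is Pareto dominance. A minimal sketch of a dominance filter for noise-free objective vectors follows, assuming every objective is minimized; PALS itself reasons with GP-based uncertainty about noisy objectives, which this sketch does not model, and the function names and values are hypothetical.

```python
def dominates(u, v):
    """True if u Pareto-dominates v when every objective is minimized."""
    return all(ui <= vi for ui, vi in zip(u, v)) and any(ui < vi for ui, vi in zip(u, v))

def pareto_front(points):
    """Non-dominated subset of a list of objective vectors (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical bi-objective values for five candidate inputs.
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
front = pareto_front(points)  # (3, 4) is dominated by (2, 3); (5, 5) by (1, 5)
```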

Sequential- and Parallel- Constrained Max-value Entropy Search via Information Lower Bound

Shion Takeno, Tomoyuki Tamura, Kazuki Shitara, Masayuki Karasuyama

Categories: Machine Learning

2021-02-19

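Max-value entropy search (MES) is one of the state-of-the-art approaches in Bayesian optimization (BO). In this paper, we propose a novel variant of MES for constrained problems, called Constrained MES via Information lower BOund (CMES-IBO), which is based on a Monte Carlo (MC) estimator of a lower bound of the mutual information (MI). We first define the MI of the maximum value so that it can incorporate uncertainty with respect to feasibility. Then, we derive a lower bound of the MI that guarantees non-negativity, whereas a constrained counterpart of conventional MES can be negative. We further provide a theoretical analysis that assures the low variability of our estimator, which has never been investigated for any existing information-theoretic BO method. Moreover, using the conditional MI, we extend CMES-IBO to the parallel setting while maintaining the desirable properties. We demonstrate the effectiveness of CMES-IBO on several benchmark functions and real-world problems.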

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg, Ilmun Kim, Rajen D. Shah, Richard J. Samworth

Categories: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

Batch Bayesian optimisation via density-ratio estimation with guarantees

Rafael Oliveira, Louis Tiao, Fabio Ramos

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-09-22

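Bayesian optimisation (BO) algorithms have shown remarkable success in applications involving expensive black-box functions. Traditionally, BO has been set up as a sequential decision-making process that estimates the utility of query points via an acquisition function and a prior over functions, such as a Gaussian process. Recently, however, a reformulation of BO via density-ratio estimation (BORE) allowed the acquisition function to be reinterpreted as a probabilistic binary classifier, removing the need for an explicit prior over functions and increasing scalability. In this paper, we present a theoretical analysis of BORE's regret and an extension of the algorithm with improved uncertainty estimates. We also show that BORE can be naturally extended to a batch optimisation setting by recasting the problem as approximate Bayesian inference. The resulting algorithms come equipped with theoretical performance guarantees and are evaluated against other batch baselines in a series of experiments.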

Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers

Marvin Pförtner, Ingo Steinwart, Philipp Hennig, Jonathan Wenger

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-23

Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models or analyses with a downstream application such that error quantification plays a key role. However, by entirely ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of a widely-applied theorem for conditioning GPs on a finite number of direct observations to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows us to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate and the capability to incorporate uncertain model parameters and observations. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models.

Generalised Bayesian Inference for Discrete Intractable Likelihood

Takuo Matsubara, Jeremias Knoblauch, François-Xavier Briol, Chris J. Oates

Categories: Machine Learning (Statistics)

2022-06-16

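Discrete state spaces represent a major computational challenge to statistical inference, since the computation of normalisation constants requires summation over large or possibly infinite sets, which can be impractical. This paper addresses this computational challenge through the development of a novel generalised Bayesian inference procedure suitable for discrete intractable likelihoods. Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence, in lieu of the problematic intractable likelihood. The result is a generalised posterior that can be sampled from using standard computational tools, such as Markov chain Monte Carlo, circumventing the intractable normalising constant. The statistical properties of the generalised posterior are analysed, with sufficient conditions for posterior consistency and asymptotic normality established. In addition, a novel and general approach to calibration of generalised posteriors is proposed. Applications are presented on lattice models for discrete spatial data and on multivariate models for count data, where in each case the methodology facilitates generalised Bayesian inference at low computational cost.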
