Smart Paper Notes

Pareto Smoothed Importance Sampling

Aki Vehtari , Daniel Simpson , Andrew Gelman , Yuling Yao , Jonah Gabry

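Category: Machine Learning (Statistics)

2015-07-09

Importance weighting is a general way to adjust Monte Carlo integration to account for draws from the wrong distribution, but the resulting estimate can be highly variable when the importance ratios have a heavy right tail. This occurs when some aspects of the target distribution are not captured by the approximating distribution, in which case more stable estimates can be obtained by modifying the extreme importance ratios. We present a new method that stabilizes importance weights using a generalized Pareto distribution fit to the upper tail of the simulated importance ratios. The method empirically performs better than existing methods for stabilizing importance sampling estimates, and includes stabilized effective sample size estimates, Monte Carlo error estimates, and convergence diagnostics. The presented Pareto $\hat{k}$ finite-sample convergence rate diagnostic is useful for any Monte Carlo estimator.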

translated by Google Translate
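
The instability described in the abstract above can be illustrated with a minimal sketch. It is not the Pareto smoothing method itself: it only computes self-normalized importance weights and the standard Kish effective sample size for a target N(0, 1), once with a wide proposal and once with a proposal that is too narrow, so that the importance ratios have a heavy right tail. The function names and the choice of target and proposals are assumptions made for illustration.

```python
import math
import random


def normalized_weights(log_ratios):
    """Self-normalize importance ratios given on the log scale."""
    m = max(log_ratios)
    w = [math.exp(r - m) for r in log_ratios]  # subtract the max for stability
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """Kish effective sample size, 1 / sum(w_i^2), for normalized weights."""
    return 1.0 / sum(x * x for x in weights)


def log_ratio(x, proposal_sd):
    """log p(x) / q(x) for target N(0, 1) and proposal N(0, proposal_sd^2).

    The sqrt(2*pi) constants cancel in the ratio.
    """
    log_p = -0.5 * x * x
    log_q = -0.5 * (x / proposal_sd) ** 2 - math.log(proposal_sd)
    return log_p - log_q


rng = random.Random(0)
n = 2000
for sd in (2.0, 0.5):  # wide proposal vs. too-narrow proposal
    draws = [rng.gauss(0.0, sd) for _ in range(n)]
    w = normalized_weights([log_ratio(x, sd) for x in draws])
    print(sd, round(effective_sample_size(w), 1))
```

With the narrow proposal a few draws in the tails carry most of the weight, so the effective sample size is expected to be far below the number of draws. Extreme ratios of this kind are the ones that a smoothing scheme would modify.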

Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors

Yuling Yao , Aki Vehtari , Andrew Gelman

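Category: Machine Learning (Statistics)

2020-06-22

When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. Moreover, even if the most important modes can be found, it is difficult to evaluate their relative weights in the posterior. Here we propose an approach that uses parallel runs of MCMC, variational, or mode-based inference to hit as many modes or separated regions as possible, and then combines these using Bayesian stacking, a scalable method for constructing a weighted average of distributions. The result of stacking samples from the multimodal posterior distribution, minimizes cross-validation prediction error, and represents the posterior uncertainty better than variational inference, but it is not necessarily equivalent, even asymptotically, to fully Bayesian inference. We present theoretical consistency with an example in which the stacked inference approximates the true data-generating process from a misspecified model and a non-mixing sampler, and in which the predictive performance is better than that of full Bayesian inference, so multimodality can be considered a blessing rather than a curse under model misspecification. We demonstrate practical implementation in several model families: latent Dirichlet allocation, Gaussian process regression, hierarchical regression, horseshoe variable selection, and neural networks.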

Robust leave-one-out cross-validation for high-dimensional Bayesian models

Luca Silva , Giacomo Zanella

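Category: Machine Learning (Statistics)

2022-09-19

Leave-one-out cross-validation (LOO-CV) is a popular method for estimating out-of-sample predictive accuracy. However, computing LOO-CV criteria can be computationally expensive because the model must be fit multiple times. In the Bayesian context, importance sampling provides a possible solution, but classical approaches can easily produce estimators whose variance is infinite, making them potentially unreliable. Here we propose and analyze a novel mixture estimator to compute Bayesian LOO-CV criteria. Our method retains the simplicity and computational convenience of classical approaches, while guaranteeing finite variance of the resulting estimators. Both theoretical and numerical results are provided to illustrate the improved robustness and efficiency. The computational benefits are particularly significant in high-dimensional problems, making it possible to perform Bayesian LOO-CV for a broader range of models. The proposed methodology is easily implementable in standard probabilistic programming software and has a computational cost roughly equivalent to fitting the original model once.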
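
For orientation, the classical importance sampling estimator that the abstract contrasts with can be sketched as follows. This is not the proposed mixture estimator. With draws from the full-data posterior, the importance ratios for leaving out observation i are proportional to 1 / p(y_i | theta_s), which gives a harmonic-mean form. The function names are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def is_loo_elpd(log_lik_draws):
    """Classical IS estimate of log p(y_i | y_{-i}).

    log_lik_draws[s] = log p(y_i | theta_s), with theta_s drawn from the
    full-data posterior. The estimate is the log of the harmonic mean of
    the likelihood values p(y_i | theta_s).
    """
    return -log_mean_exp([-v for v in log_lik_draws])


# Two posterior draws with likelihood values 1.0 and 0.5 for observation i.
print(is_loo_elpd([math.log(1.0), math.log(0.5)]))  # log of harmonic mean 2/3
```

Because the estimate averages reciprocals of likelihood values, a single draw with a very small likelihood can dominate it, which is one way to see why its variance can be infinite.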

Improving the Accuracy of Marginal Approximations in Likelihood-Free Inference via Localisation

Christopher Drovandi , David J Nott , David T Frazier

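Category: Machine Learning (Statistics)

2022-07-14

Likelihood-free methods are an essential tool for performing inference for implicit models that can be simulated from, but for which the corresponding likelihood is intractable. However, common likelihood-free methods do not scale well to a large number of model parameters. A promising approach to likelihood-free inference involves estimating low-dimensional marginal posteriors by conditioning only on summary statistics believed to be informative for the low-dimensional component, and then combining the low-dimensional approximations in some way. In this paper, we demonstrate that such low-dimensional approximations can be poor in practice for seemingly intuitive choices of summary statistics. We describe an idealized low-dimensional summary statistic that is, in principle, suitable for marginal estimation. However, a direct approximation of the idealized choice is difficult in practice. We therefore propose an alternative approach to marginal estimation that is easier to implement and automate. Given an initial choice of low-dimensional summary statistic that might only be informative about the location of the marginal posterior, the new method improves performance by first using all the summary statistics to ensure global identifiability, and then using the low-dimensional summary statistic to obtain an accurate low-dimensional approximation. We show that the posterior targeted by this approach can be represented as a logarithmic pool of posterior distributions based on the low-dimensional and full summary statistics, respectively. The good performance of our method is illustrated in several examples.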

Approximate Bayesian Computation via Classification

Yuexi Wang , Tetsuya Kaji , Veronika Ročková

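Category: Machine Learning (Statistics)

2021-11-22

Approximate Bayesian computation (ABC) enables statistical inference in complex models whose likelihoods are difficult to calculate but easy to simulate from. ABC constructs a kernel-type approximation to the posterior distribution through an accept/reject mechanism that compares summary statistics of real and simulated data. To obviate the need for summary statistics, we directly compare empirical distributions using a Kullback-Leibler (KL) divergence estimator obtained via classification. In particular, we blend flexible machine learning classifiers within ABC to automate fake/real data comparisons. We consider the traditional accept/reject kernel as well as an exponential weighting scheme that does not require an ABC acceptance threshold. Our theoretical results show that the rate at which our ABC posterior distributions concentrate around the true parameter depends on the estimation error of the classifier. We derive limiting posterior shape results and find that, with a properly scaled exponential kernel, asymptotic normality holds. We demonstrate the usefulness of our approach on simulated examples as well as on real data in the context of stock volatility estimation.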
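
The traditional accept/reject mechanism mentioned in the abstract can be sketched as below. This is plain summary-statistic ABC, not the classifier-based variant the paper proposes. The toy model (a normal mean with a uniform prior), the threshold `eps`, and all function names are assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_draw, summary, eps, n_trials, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_trials):
        theta = prior_draw(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
# Toy data: 50 observations from N(2, 1); the parameter of interest is the mean.
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
posterior_draws = abc_rejection(
    observed,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    prior_draw=lambda r: r.uniform(-5.0, 5.0),
    summary=statistics.fmean,  # sample mean as the summary statistic
    eps=0.1,
    n_trials=5000,
    rng=rng,
)
print(len(posterior_draws), statistics.fmean(posterior_draws))
```

The accepted draws approximate the posterior given the summary statistic; a smaller `eps` tightens the approximation at the cost of fewer acceptances.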

Marginal likelihood computation for model selection and hypothesis testing: an extensive review

Fernando Llorente , Luca Martino , David Delgado , Javier Lopez-Santiago

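Category: Machine Learning

2020-05-17

This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing the normalizing constants of probability models (or ratios of constants) is a fundamental problem in many applications in statistics, applied mathematics, signal processing, and machine learning. This article provides a comprehensive study of the topic. We highlight limitations, benefits, connections, and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.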
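
As a small worked example of what "computing a normalizing constant" means here, the sketch below estimates the marginal likelihood of a single observation by averaging the likelihood over prior draws, and compares it with the closed form. The conjugate toy model (y | theta ~ N(theta, 1), theta ~ N(0, 1), so y ~ N(0, 2)) is an assumption chosen so that the exact answer is known. The naive prior-sampling estimator is only the simplest baseline, not a method recommended by the review.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def naive_mc_evidence(y, n_draws, rng):
    """Average the likelihood N(y | theta, 1) over prior draws theta ~ N(0, 1)."""
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)
        total += normal_pdf(y, theta, 1.0)
    return total / n_draws


y = 0.5
estimate = naive_mc_evidence(y, 20000, random.Random(0))
exact = normal_pdf(y, 0.0, 2.0)  # marginal of y is N(0, 1 + 1)
print(estimate, exact)
```

This estimator works in the toy case because prior and posterior overlap well; it degrades quickly when the likelihood is concentrated relative to the prior.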

Deep importance sampling using tensor-trains with application to a priori and a posteriori rare event estimation

Tiangang Cui , Sergey Dolgov , Robert Scheichl

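Category: Machine Learning (Statistics) | Machine Learning

2022-09-05

We propose a deep importance sampling method that is suitable for estimating rare event probabilities in high-dimensional problems. We approximate the optimal importance distribution in a general importance sampling problem as the pushforward of a reference distribution under a composition of order-preserving transformations, in which each transformation is formed by a squared tensor-train decomposition. The squared tensor-train decomposition provides a scalable ansatz for building order-preserving high-dimensional transformations via density approximations. Using a composition of maps that moves along a sequence of bridging densities alleviates the difficulty of directly approximating concentrated density functions. To compute expectations over unnormalized probability distributions, we design a ratio estimator that estimates the normalizing constant using a separate importance distribution, again constructed via a composition of transformations in tensor-train format. This offers better theoretical variance reduction than self-normalized importance sampling, and thus opens the door to efficient computation of rare event probabilities in Bayesian inference problems. Numerical experiments on problems constrained by differential equations show little to no increase in computational complexity as the event probability goes to zero, and make it possible to compute hitherto unattainable estimates of rare event probabilities for complex, high-dimensional posterior densities.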

Bayesian score calibration for approximate models

Joshua J Bon , David J Warne , David J Nott , Christopher Drovandi

Category: Machine Learning (Statistics)

2022-11-10

Scientists continue to develop increasingly complex mechanistic models to reflect their knowledge more realistically. Statistical inference using these models can be highly challenging, since the corresponding likelihood function is often intractable, and model simulation may be computationally burdensome or infeasible. Fortunately, in many of these situations, it is possible to adopt a surrogate model or approximate likelihood function. It may be convenient to base Bayesian inference directly on the surrogate, but this can result in bias and poor uncertainty quantification. In this paper we propose a new method for adjusting approximate posterior samples to reduce bias and produce more accurate uncertainty quantification. We do this by optimising a transform of the approximate posterior that minimises a scoring rule. Our approach requires only a (fixed) small number of complex model simulations and is numerically stable. We demonstrate good performance of the new method on several examples of increasing complexity.

Adjusted chi-square test for degree-corrected block models

Linfan Zhang , Arash A. Amini

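Category: Machine Learning (Statistics)

2020-12-30

We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality among groups of $n$ multinomial distributions with $d_1, \dots, d_n$ observations. In the context of network models, the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, which corresponds to the degree of node $i$, so the setting deviates from classical asymptotics. We show that the adjusted statistic converges in distribution under the null as long as the harmonic mean of $\{d_i\}$ grows to infinity. When applied sequentially, the test can also be used to determine the number of communities. The test operates on a compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the rows based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing, and it can be used to test simultaneously against a wide range of alternatives outside the DCSBM family. In particular, we prove that the test is consistent against a general family of latent-variable network models with community structure.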

Distributed Computation for Marginal Likelihood based Model Choice

Alexander Buchholz , Daniel Ahfock , Sylvia Richardson

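Category: Machine Learning (Statistics)

2019-10-10

We propose a general method for distributed Bayesian model choice using the marginal likelihood, where the data set is split into non-overlapping subsets. These subsets are accessed only locally by individual workers, and no data is shared between the workers. We approximate the model evidence for the full data set through Monte Carlo sampling from the posterior on every subset, generating a model evidence per subset. The results are combined using a novel approach that corrects for the splitting using summary statistics of the generated samples. Our divide-and-conquer approach enables Bayesian model choice in the large-data setting, exploiting all available information while limiting communication between workers. We derive theoretical error bounds that quantify the resulting trade-off between computational gain and loss of precision. As our real-world experiments show, the embarrassingly parallel nature of the method yields important speed-ups on massive data sets. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple feature combinations within one run.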

Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

Category: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
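
The combination of sample splitting and self-normalization described above can be sketched for one-sample mean testing (H0: E[X] = 0). The construction below is an illustrative reading of the abstract, not a verified reproduction of the paper's statistic: it projects each first-half observation onto the second-half sample mean (an off-diagonal block only) and studentizes the result. The function name and the one-sided threshold 1.645 are assumptions.

```python
import math
import random
import statistics


def cross_mean_statistic(xs):
    """Studentized cross statistic for H0: E[X] = 0; xs is a list of d-vectors."""
    half = len(xs) // 2
    first, second = xs[:half], xs[half:]
    d = len(xs[0])
    # Mean of the second half only, so each h_i uses an off-diagonal block.
    mean_second = [statistics.fmean([x[j] for x in second]) for j in range(d)]
    h = [sum(a * b for a, b in zip(x, mean_second)) for x in first]
    return math.sqrt(half) * statistics.fmean(h) / statistics.stdev(h)


rng = random.Random(0)
null_data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
shifted_data = [[rng.gauss(1.0, 1.0) for _ in range(5)] for _ in range(200)]
t_null = cross_mean_statistic(null_data)
t_shifted = cross_mean_statistic(shifted_data)
print(t_null, t_shifted)
```

Conditional on the second half, the projected values are independent with mean zero under the null, which is why a Gaussian reference (for example, rejecting when the statistic exceeds 1.645 at the 5% level) is plausible without assumptions relating the dimension to the sample size.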

A kernel two-sample test

Category:

We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
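
A minimal sketch of the quadratic-time computation mentioned above: the biased (V-statistic) estimate of the squared MMD between two one-dimensional samples. The Gaussian kernel and its bandwidth are assumptions; the abstract does not fix a kernel.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of MMD^2: mean k(x, x') + mean k(y, y') - 2 mean k(x, y)."""

    def mean_k(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


same = mmd2_biased([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
far = mmd2_biased([0.0], [10.0])
print(same, far)
```

Every pair of points is visited once per kernel mean, so the cost grows as the product of the sample sizes. Turning the statistic into a test still requires a rejection threshold, which this sketch does not compute.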

Fast and robust Bayesian Inference using Gaussian Processes with GPry

Jonas El Gammal , Nils Schöneberg , Jesús Torrado , Christian Fidler

Category: Machine Learning (Statistics)

2022-11-03

We present the GPry algorithm for fast Bayesian inference of general (non-Gaussian) posteriors with a moderate number of parameters. GPry does not need any pre-training or special hardware such as GPUs, and is intended as a drop-in replacement for traditional Monte Carlo methods for Bayesian inference. Our algorithm is based on generating a Gaussian Process surrogate model of the log-posterior, aided by a Support Vector Machine classifier that excludes extreme or non-finite values. An active learning scheme allows us to reduce the number of required posterior evaluations by two orders of magnitude compared to traditional Monte Carlo inference. Our algorithm allows for parallel evaluations of the posterior at optimal locations, further reducing wall-clock times. We significantly improve performance using properties of the posterior in our active learning scheme and for the definition of the GP prior. In particular we account for the expected dynamical range of the posterior in different dimensionalities. We test our model against a number of synthetic and cosmological examples. GPry outperforms traditional Monte Carlo methods when the evaluation time of the likelihood (or the calculation of theoretical observables) is of the order of seconds; for evaluation times of over a minute it can perform inference in days that would take months using traditional methods. GPry is distributed as an open source Python package (pip install gpry) and can also be found at https://github.com/jonaselgammal/GPry.

Optimal Thinning of MCMC Output

Marina Riabiz , Wilson Chen , Jon Cockayne , Pawel Swietach , Steven A. Niederer , Lester Mackey , Chris. J. Oates

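Category: Machine Learning (Statistics)

2020-05-08

The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically, a number of the initial states are attributed to "burn-in" and removed, while the remainder of the chain is "thinned" if compression is also required. In this paper, we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method, and its effectiveness is demonstrated in the concrete context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R, and MATLAB.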
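
The greedy selection idea can be sketched in one dimension. The sketch assumes a standard normal target (score function -x) and a Gaussian base kernel, both chosen for simplicity. It does not use the Stein Thinning package, and the function names are hypothetical. At each step it picks the state that minimises half its own Stein kernel value plus its accumulated Stein kernel values against the states already chosen.

```python
import math


def stein_kernel(x, y, h=1.0):
    """Langevin Stein kernel for target N(0, 1) with a Gaussian base kernel."""
    diff = x - y
    k = math.exp(-diff * diff / (2.0 * h * h))
    dk_dx = -diff / (h * h) * k
    dk_dy = diff / (h * h) * k
    d2k = (1.0 / (h * h) - diff * diff / h ** 4) * k  # d^2 k / (dx dy)
    sx, sy = -x, -y  # score of N(0, 1): d/dx log p(x) = -x
    return d2k + dk_dx * sy + dk_dy * sx + k * sx * sy


def stein_thin(samples, m, h=1.0):
    """Greedily choose m indices (repeats allowed) to reduce the kernel Stein discrepancy."""
    running = [0.5 * stein_kernel(x, x, h) for x in samples]
    chosen = []
    for _ in range(m):
        best = min(range(len(samples)), key=running.__getitem__)
        chosen.append(best)
        # Add the new point's interaction with every candidate.
        for i, x in enumerate(samples):
            running[i] += stein_kernel(samples[best], x, h)
    return chosen


samples = [3.0, -0.1, 1.5, -2.0, 0.8, -1.2]
print(stein_thin(samples, 3))
```

With this kernel the diagonal term equals 1/h^2 + x^2, so the first state chosen is the one closest to the mode at zero; later choices trade closeness to the target against redundancy with states already selected.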

Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis

Paolo Braca , Leonardo M. Millefiori , Augusto Aubry , Stefano Marano , Antonio De Maio , Peter Willett

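Category: Machine Learning (Statistics) | Artificial Intelligence | Machine Learning

2022-07-22

We study the performance of machine learning (ML) classification techniques, specifically the rate at which the error probability converges to zero. Leveraging the theory of large deviations, we provide the mathematical conditions for an ML classifier to exhibit error probabilities that vanish exponentially, say $\sim \exp\left(-n\,I + o(n)\right)$, where $n$ is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and $I$ is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the data-driven decision function (D3F, that is, what is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and the related error rate $I$ depend on the given training set, which is assumed to be of finite size. Interestingly, these conditions can be verified and tested numerically using the available dataset, or a synthetic dataset generated according to the available information on the underlying statistical model. In other words, the convergence of the classification error probability to zero, and its rate, can be computed on a portion of the dataset available for training. Consistent with large deviations theory, we can also establish the convergence, for $n$ large enough, of the normalized D3F statistic to a Gaussian distribution. This property is exploited to set a desired asymptotic false alarm probability, which empirically turns out to be accurate even for quite realistic values of $n$. Furthermore, approximate error probability curves $\sim \zeta_n \exp\left(-n\,I\right)$ are provided, thanks to a refined asymptotic derivation (often referred to as exact asymptotics), where $\zeta_n$ represents the most representative sub-exponential terms of the error probabilities.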
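
The Fenchel-Legendre transform named above, $\sup_t [t x - \Lambda(t)]$ for a cumulant-generating function $\Lambda$, can be evaluated numerically. The sketch below is an assumption-laden illustration rather than the paper's procedure: it uses a Gaussian statistic, for which $\Lambda(t) = \mu t + \sigma^2 t^2 / 2$ and the transform has the closed form $(x - \mu)^2 / (2\sigma^2)$, and it locates the supremum by ternary search on a fixed interval.

```python
def legendre_transform(cgf, x, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup_t [t * x - cgf(t)] over [lo, hi] by ternary search.

    The objective is concave in t whenever cgf is convex, so ternary search
    converges to the maximiser if it lies inside [lo, hi].
    """

    def objective(t):
        return t * x - cgf(t)

    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    return objective(0.5 * (lo + hi))


mu, sigma = 1.0, 2.0


def gaussian_cgf(t):
    """Cumulant-generating function of N(mu, sigma^2)."""
    return mu * t + 0.5 * sigma ** 2 * t ** 2


rate = legendre_transform(gaussian_cgf, 3.0)
print(rate, (3.0 - mu) ** 2 / (2.0 * sigma ** 2))
```

In practice the cumulant-generating function of a learned decision statistic would have to be estimated from data, which this sketch does not attempt.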

Distilling Importance Sampling for Likelihood Free Inference

Dennis Prangle , Cecilia Viscardi

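Category: Machine Learning (Statistics)

2019-10-08

Likelihood-free inference involves inferring parameter values given observed data and a simulator model. The simulator is computer code that takes parameters, performs stochastic calculations, and outputs simulated data. In this work, we view the simulator as a function whose inputs are (1) the parameters and (2) a vector of pseudo-random draws. We attempt to infer all of these inputs conditional on the observations. This is challenging because the resulting posterior can be high-dimensional and involve strong dependence. We approximate the posterior using normalizing flows, a flexible parametric family of densities. Training data is generated by likelihood-free importance sampling with a large bandwidth value epsilon, which makes the target similar to the prior. The training data is "distilled" by using it to train an updated normalizing flow. The process is iterated, using the updated flow as the importance sampling proposal and slowly reducing epsilon so that the target becomes closer to the posterior. Unlike most other likelihood-free methods, we avoid the need to reduce data to low-dimensional summary statistics, and hence can achieve more accurate results. We illustrate our method in two challenging examples, on queueing and epidemiology.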

Gaussian Process Behaviour in Wide Deep Neural Networks

Alexander G. de G. Matthews , Mark Rowland , Jiri Hron , Richard E. Turner , Zoubin Ghahramani

Category:

2018-04-30

Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks. To evaluate convergence rates empirically, we use maximum mean discrepancy. We then compare finite Bayesian deep networks from the literature to Gaussian processes in terms of the key predictive quantities of interest, finding that in some cases the agreement can be very close. We discuss the desirability of Gaussian process behaviour and review non-Gaussian alternative models from the literature.

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

Jiri Hron , Roman Novak , Jeffrey Pennington , Jascha Sohl-Dickstein

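Category: Machine Learning (Statistics) | Machine Learning

2022-06-15

We introduce repriorisation, a data-dependent reparameterisation that transforms a Bayesian neural network (BNN) posterior into a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. The repriorisation map acts directly on parameters, and its analytic simplicity complements the known neural network Gaussian process (NNGP) behaviour of wide BNNs in function space. Exploiting repriorisation, we develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm that mixes faster the wider the BNN. This contrasts with the typically poor performance of MCMC in high dimensions. For both fully connected and residual networks, we observe up to 50 times higher effective sample size relative to no reparameterisation. Improvements are achieved at all widths, with the margin between reparameterised and standard BNNs growing with layer width.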

Smooth Nested Simulation: Bridging Cubic and Square Root Convergence Rates in High Dimensions

Wenjia Wang , Yanyuan Wang , Xiaowei Zhang

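Category: Machine Learning (Statistics)

2022-01-09

Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression that exploits the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that, as the simulation budget increases, the proposed method can effectively alleviate the curse of dimensionality on the convergence rate, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (the optimal rate for standard nested simulation) and the square root convergence rate (the canonical rate for standard Monte Carlo simulation). We demonstrate the performance of the proposed method via numerical examples from portfolio risk management and input uncertainty quantification.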

Distribution-free Prediction Sets Adaptive to Unknown Covariate Shift

Hongxiang Qiu , Edgar Dobriban , Eric Tchetgen Tchetgen

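Category: Machine Learning (Statistics)

2022-03-11

Predicting sets of outcomes, instead of unique outcomes, is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift (a prevalent issue in practice) remains a serious unsolved challenge. In this paper, we show that prediction sets with a finite-sample coverage guarantee are uninformative, and we propose a novel flexible distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is \textit{asymptotically probably approximately correct}, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and in a data set concerning HIV risk prediction in a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.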
