Smart Paper Notes

Generalized Kernel Thinning

Raaz Dwivedi , Lester Mackey

Categories: Machine Learning (Statistics) | Machine Learning

2021-10-04

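The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel, any distribution, and any fixed function in the RKHS. Second, we show that, for analytic kernels like Gaussian, inverse multiquadric, and sinc, target KT admits maximum mean discrepancy (MMD) guarantees comparable to those of square-root KT without making explicit use of a square-root kernel. Third, we prove that KT with a fractional power kernel yields better-than-Monte-Carlo MMD guarantees for non-smooth kernels, like Laplace and Matérn, that do not have square roots. Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual-function guarantees of target KT. In our experiments with target KT and KT+, we witness significant improvements in integration error even in 100 dimensions and when compressing challenging differential equation posteriors.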

Kernel Thinning

Raaz Dwivedi , Lester Mackey

Categories: Machine Learning (Statistics) | Machine Learning

2021-05-12

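We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-1/2}(\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d = 2$ through $100$.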
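
Kernel thinning is judged by the maximum mean discrepancy (MMD), i.e., the worst-case integration error over the unit ball of the RKHS, between the $n$-point input and the $\sqrt{n}$-point output. The sketch below computes that quantity for the standard-thinning baseline the abstract compares against; it does not implement kernel thinning itself. The Gaussian kernel with bandwidth 1, the synthetic Gaussian input, and all function names are illustrative assumptions; a kernel-thinning routine could be swapped in for `standard_thin` to compare the two.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd(points_p, points_q, kernel=gaussian_kernel):
    """MMD between the empirical distributions of two point sets (V-statistic form)."""
    def mean_kernel(xs, ys):
        return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

    sq = (mean_kernel(points_p, points_p) + mean_kernel(points_q, points_q)
          - 2.0 * mean_kernel(points_p, points_q))
    return math.sqrt(max(sq, 0.0))  # clip tiny negative round-off


def standard_thin(points, m):
    """Baseline standard thinning: keep every (n // m)-th point, m points in total."""
    step = len(points) // m
    return points[::step][:m]


# Hypothetical input: n i.i.d. draws from a standard Gaussian in d = 2 dimensions.
rng = random.Random(0)
n, d = 256, 2
sample = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
m = math.isqrt(n)  # sqrt(n) = 16 output points
coreset = standard_thin(sample, m)
print(f"MMD(input, {m}-point standard thinning) = {mmd(sample, coreset):.4f}")
```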

Distribution Compression in Near-Linear Time

Abhishek Shetty , Raaz Dwivedi , Lester Mackey

Categories: Machine Learning (Statistics) | Machine Learning

2021-11-15

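In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime in the sample size $n$. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor of $4$ in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n / n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $\mathcal{O}(n \log^3 n)$ time and $\mathcal{O}(\sqrt{n} \log^2 n)$ space. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.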

Controlling Moments with Kernel Stein Discrepancies

Heishiro Kanagawa , Arthur Gretton , Lester Mackey

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-10

Quantifying the deviation of a probability distribution is challenging when the target distribution is defined by a density with an intractable normalizing constant. The kernel Stein discrepancy (KSD) was proposed to address this problem and has been applied to various tasks including diagnosing approximate MCMC samplers and goodness-of-fit testing for unnormalized statistical models. This article investigates a convergence control property of the diffusion kernel Stein discrepancy (DKSD), an instance of the KSD proposed by Barp et al. (2019). We extend the result of Gorham and Mackey (2017), which showed that the KSD controls the bounded-Lipschitz metric, to functions of polynomial growth. Specifically, we prove that the DKSD controls the integral probability metric defined by a class of pseudo-Lipschitz functions, a polynomial generalization of Lipschitz functions. We also provide practical sufficient conditions on the reproducing kernel for the stated property to hold. In particular, we show that the DKSD detects non-convergence in moments with an appropriate kernel.
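
The diffusion KSD generalizes the Langevin KSD, which is the special case with an identity diffusion matrix. A minimal 1-D sketch of that special case follows, assuming a standard normal target and an inverse multiquadric (IMQ) base kernel with c = 1 and beta = -1/2; these are illustrative choices, not the kernel conditions derived in the paper, and the function names are hypothetical. Only the score d/dx log p is used, so the normalizing constant never enters.

```python
import math
import random


def langevin_stein_kernel(x, y, score, c=1.0, beta=-0.5):
    """1-D Langevin Stein kernel built from the IMQ base kernel
    k(x, y) = (c^2 + (x - y)^2)^beta:

        k_p(x, y) = s(x) s(y) k + s(x) dk/dy + s(y) dk/dx + d^2k/dxdy,

    where s = score = d/dx log p.
    """
    r = x - y
    u = c * c + r * r
    k = u ** beta
    dk_dx = 2.0 * beta * r * u ** (beta - 1.0)
    dk_dy = -dk_dx
    d2k_dxdy = (-2.0 * beta * u ** (beta - 1.0)
                - 4.0 * beta * (beta - 1.0) * r * r * u ** (beta - 2.0))
    sx, sy = score(x), score(y)
    return sx * sy * k + sx * dk_dy + sy * dk_dx + d2k_dxdy


def ksd(points, score):
    """KSD between the empirical distribution of `points` and p (V-statistic)."""
    n = len(points)
    total = sum(langevin_stein_kernel(x, y, score) for x in points for y in points)
    return math.sqrt(max(total / (n * n), 0.0))


def std_normal_score(x):
    # d/dx log N(x; 0, 1) = -x; the normalizing constant drops out.
    return -x


rng = random.Random(1)
good = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(200)]  # wrong mean
print(f"KSD, N(0,1) sample: {ksd(good, std_normal_score):.3f}")
print(f"KSD, N(1,1) sample: {ksd(shifted, std_normal_score):.3f}")
```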

Targeted Separation and Convergence with Kernel Discrepancies

Alessandro Barp , Carl-Johann Simon-Gabriel , Mark Girolami , Lester Mackey

Categories: Machine Learning (Statistics) | Machine Learning

2022-09-26

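Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.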

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou , Ashwin Pananjady , Martin J. Wainwright , Peter L. Bartlett

Categories: Machine Learning | Machine Learning (Statistics)

2021-12-23

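We study stochastic approximation (SA) procedures for approximately solving a $d$-dimensional linear fixed-point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of order $t_{\mathrm{mix}} \tfrac{d}{n}$, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise, covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$, and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).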
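
Tabular TD(0) is one instance of the linear stochastic approximation studied above: it solves V = r + gamma P V from a single Markov trajectory. The sketch below assumes a hypothetical two-state chain, a constant step size, and one-hot (tabular) features rather than general linear features. It tracks both the last iterate and the Polyak-Ruppert running average of the iterates, a simple form of the iterate averaging the instance-dependent bound refers to.

```python
import random


def averaged_td0(P, rewards, gamma, n_steps, step_size, seed=0):
    """Tabular TD(0) along one Markov trajectory, with Polyak-Ruppert averaging.

    TD(0) is a linear stochastic-approximation scheme for the fixed-point
    equation V = r + gamma * P V. Returns (last_iterate, averaged_iterate).
    """
    rng = random.Random(seed)
    n_states = len(rewards)
    theta = [0.0] * n_states
    avg = [0.0] * n_states
    s = 0
    for t in range(1, n_steps + 1):
        # Next state of the (hypothetical) ergodic chain, drawn from row s of P.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        td_error = rewards[s] + gamma * theta[s_next] - theta[s]
        theta[s] += step_size * td_error
        # Running mean of the iterates theta_1, ..., theta_t.
        for i in range(n_states):
            avg[i] += (theta[i] - avg[i]) / t
        s = s_next
    return theta, avg


# Two-state example; for this chain V = (I - gamma P)^{-1} r = [0.6, 0.1] / 0.325.
P = [[0.9, 0.1], [0.2, 0.8]]
rewards = [1.0, 0.0]
gamma = 0.5
last, averaged = averaged_td0(P, rewards, gamma, n_steps=200_000, step_size=0.005)
exact = [0.6 / 0.325, 0.1 / 0.325]
print("last iterate:", last)
print("averaged    :", averaged)
print("exact       :", exact)
```

With a constant step size the last iterate keeps fluctuating at a level set by the step size; averaging damps those fluctuations.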

Robust Generalised Bayesian Inference for Intractable Likelihoods

Takuo Matsubara , Jeremias Knoblauch , François-Xavier Briol , Chris. J. Oates

Categories: Machine Learning (Statistics)

2021-04-15

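Generalised Bayesian inference updates prior beliefs using a loss function, rather than a likelihood, and can therefore be used to confer robustness against possible mis-specification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as a loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In this context, the Stein discrepancy circumvents evaluation of the normalisation constant and produces generalised posteriors that are either closed form or accessible using standard Markov chain Monte Carlo. On a theoretical level, we show consistency, asymptotic normality, and bias-robustness of the generalised posterior, highlighting how these properties are impacted by the choice of Stein discrepancy. Then, we provide numerical experiments on a range of intractable distributions, including applications to kernel-based exponential family models and non-Gaussian graphical models.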

Sequential Estimation of Convex Functionals and Divergences

Tudor Manole , Aaditya Ramdas

Categories: Machine Learning (Statistics)

2021-03-16

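We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $\varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. This is achieved by observing that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be complementary and powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid tests or confidence intervals at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (like dimension or alphabet size, if applicable). These results are also applicable to more general convex functionals, like the negative differential entropy, suprema of empirical processes, and V-statistics.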

Counterfactual inference for sequential experiments

Raaz Dwivedi , Katherine Tian , Sabina Tomkins , Predrag Klasnja , Susan Murphy , Devavrat Shah

Categories: Machine Learning (Statistics) | Machine Learning

2022-02-14

We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$.

Controlling Wasserstein distances by Kernel norms with application to Compressive Statistical Learning

Titouan Vayer , Rémi Gribonval

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-01

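Comparing probability distributions is at the crux of many machine learning algorithms. Maximum mean discrepancies (MMD) and optimal transport distances (OT) are two classes of distances between probability measures that have attracted abundant attention in past years. This paper establishes some conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by the compressive statistical learning (CSL) theory, a general framework for resource-efficient large-scale learning in which the training data is summarized in a single vector (called a sketch) that captures the information relevant to the considered learning task. Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property (Hölder LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is, when some task-specific metric between probability distributions can be bounded by a Wasserstein distance.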

Optimal Thinning of MCMC Output

Marina Riabiz , Wilson Chen , Jon Cockayne , Pawel Swietach , Steven A. Niederer , Lester Mackey , Chris. J. Oates

Categories: Machine Learning (Statistics)

2020-05-08

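The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically, a number of the initial states are attributed to "burn-in" and removed, while the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method, and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R, and MATLAB.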
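
Greedy KSD minimization can be sketched as follows: at each step, add the candidate that minimizes the KSD of the selected multiset, which amounts to minimizing k_p(x, x) / 2 plus the sum of k_p(x, x_l) over the points x_l already selected. This standalone 1-D sketch assumes a standard normal target, an IMQ base kernel, and over-dispersed i.i.d. draws as a stand-in for MCMC output. It does not use the Stein Thinning package's API, and all names are hypothetical.

```python
import math
import random


def stein_kernel(x, y, c=1.0, beta=-0.5):
    """1-D Langevin Stein kernel for a standard normal target (score s(x) = -x),
    built from the IMQ base kernel k(x, y) = (c^2 + (x - y)^2)^beta."""
    r = x - y
    u = c * c + r * r
    k = u ** beta
    dk_dx = 2.0 * beta * r * u ** (beta - 1.0)  # and dk/dy = -dk_dx
    d2k_dxdy = (-2.0 * beta * u ** (beta - 1.0)
                - 4.0 * beta * (beta - 1.0) * r * r * u ** (beta - 2.0))
    sx, sy = -x, -y
    return sx * sy * k - sx * dk_dx + sy * dk_dx + d2k_dxdy


def ksd(points):
    """KSD of the empirical distribution of `points` (V-statistic)."""
    n = len(points)
    total = sum(stein_kernel(x, y) for x in points for y in points)
    return math.sqrt(max(total / (n * n), 0.0))


def stein_thin(candidates, m):
    """Greedily select m indices (repeats allowed). Each step picks
    argmin_i  k_p(x_i, x_i) / 2 + sum over already-selected l of k_p(x_i, x_l)."""
    diag = [stein_kernel(x, x) for x in candidates]
    running = [0.0] * len(candidates)
    selected = []
    for _ in range(m):
        best = min(range(len(candidates)), key=lambda i: diag[i] / 2.0 + running[i])
        selected.append(best)
        for i, x in enumerate(candidates):
            running[i] += stein_kernel(x, candidates[best])
    return selected


# Hypothetical stand-in for MCMC output: over-dispersed draws, N(0, 2^2) instead of N(0, 1).
rng = random.Random(3)
candidates = [rng.gauss(0.0, 2.0) for _ in range(400)]
idx = stein_thin(candidates, 20)
thinned = [candidates[i] for i in idx]
print(f"KSD of 20 Stein-thinned points: {ksd(thinned):.3f}")
print(f"KSD of first 20 points:         {ksd(candidates[:20]):.3f}")
```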

PSD Representations for Effective Probability Models

Alessandro Rudi , Carlo Ciliberto

Categories: Machine Learning | Machine Learning (Statistics)

2021-06-30

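Finding a good way to model probability densities is key to probabilistic inference. An ideal model should be able to concisely approximate any probability while also being compatible with two main operations: multiplication of two models (product rule) and marginalization with respect to a subset of the random variables (sum rule). In this work, we show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end. In particular, we characterize both the approximation and the generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees. Moreover, we show that we can efficiently perform both the sum and the product rule in closed form via matrix operations, enjoying the same versatility as mixture models. Our results open the way to applications of PSD models to density estimation, decision theory, and inference.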
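
Assuming the usual form of these PSD models, f(x) = phi(x)^T A phi(x) with kernel features phi and a positive semi-definite matrix A, non-negativity holds by construction even when individual entries of A are negative, because f(x) = ||B^T phi(x)||^2 whenever A = B B^T. The sketch checks this numerically with Gaussian features on hypothetical anchor points and a random B; the closed-form sum and product rules are not shown.

```python
import math
import random


def features(x, anchors, bandwidth=0.5):
    """Gaussian-kernel features phi(x)_j = exp(-(x - a_j)^2 / (2 * bandwidth^2))."""
    return [math.exp(-((x - a) ** 2) / (2.0 * bandwidth ** 2)) for a in anchors]


def psd_model(x, A, anchors):
    """f(x) = phi(x)^T A phi(x); non-negative everywhere when A is PSD."""
    phi = features(x, anchors)
    k = len(phi)
    return sum(phi[i] * A[i][j] * phi[j] for i in range(k) for j in range(k))


rng = random.Random(2)
anchors = [-1.0, 0.0, 1.0, 2.0]  # hypothetical anchor points
rank = 2
# A = B B^T is PSD by construction, although individual entries of A may be negative.
B = [[rng.uniform(-1.0, 1.0) for _ in range(rank)] for _ in anchors]
A = [[sum(B[i][r] * B[j][r] for r in range(rank)) for j in range(len(anchors))]
     for i in range(len(anchors))]
grid = [-3.0 + 0.01 * t for t in range(701)]
f_min = min(psd_model(x, A, anchors) for x in grid)
print(f"min of f on [-3, 4]: {f_min:.3e}")
```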

Smooth $p$-Wasserstein Distance: Structure, Empirical Approximation, and Statistical Applications

Sloan Nietert , Ziv Goldfeld , Kengo Kato

Categories: Machine Learning (Statistics)

2021-01-11

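Discrepancy measures between probability distributions, often termed statistical distances, are ubiquitous in probability theory, statistics, and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of this framework to high dimensions, we investigate the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(\sigma)}$, for arbitrary $p \geq 1$. After establishing basic metric and topological properties of $\mathsf{W}_p^{(\sigma)}$, we explore the asymptotic statistical behavior of $\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n, \mu)$, where $\hat{\mu}_n$ is the empirical distribution of $n$ independent observations from $\mu$. We prove that $\mathsf{W}_p^{(\sigma)}$ enjoys a parametric empirical convergence rate of $n^{-1/2}$, which contrasts with the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. Our proof relies on controlling $\mathsf{W}_p^{(\sigma)}$ by a $p$th-order smooth Sobolev distance $\mathsf{d}_p^{(\sigma)}$ and deriving the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n, \mu)$, for all dimensions $d$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation using $\mathsf{W}_p^{(\sigma)}$, with experiments for $p = 2$ using $\mathsf{d}_2^{(\sigma)}$.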

Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

Yaqi Duan , Martin J. Wainwright

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-07

We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference (TD) estimates, including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the TD$(\lambda)$ family for $\lambda \in [0,1)$ as special cases. Our bounds capture its dependence on Bellman fluctuations, mixing time of the Markov chain, any mis-specification in the model, as well as the choice of weight function defining the estimator itself, and reveal some delicate interactions between mixing time and model mis-specification. For a given TD method applied to a well-specified model, its statistical error under trajectory data is similar to that of i.i.d. sample transition pairs, whereas under mis-specification, temporal dependence in data inflates the statistical error. However, any such deterioration can be mitigated by increased look-ahead. We complement our upper bounds by proving minimax lower bounds that establish optimality of TD-based methods with appropriately chosen look-ahead and weighting, and reveal some fundamental differences between value function estimation and ordinary non-parametric regression.
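
TD($\lambda$) for $\lambda \in [0,1)$ is one of the special cases covered above. A minimal tabular sketch with accumulating eligibility traces, a constant step size, and iterate averaging follows. The two-state chain and all parameters are hypothetical, and this is not the kernel-based estimator analysed in the paper. In the tabular (well-specified) case every lambda targets the same fixed point, the true value function.

```python
import random


def td_lambda(P, rewards, gamma, lam, n_steps, step_size, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces along one trajectory.

    Returns the running (Polyak-Ruppert) average of the value iterates.
    """
    rng = random.Random(seed)
    n_states = len(rewards)
    V = [0.0] * n_states
    trace = [0.0] * n_states
    avg = [0.0] * n_states
    s = 0
    for t in range(1, n_steps + 1):
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        delta = rewards[s] + gamma * V[s_next] - V[s]  # one-step TD error
        for i in range(n_states):
            trace[i] *= gamma * lam  # decay every trace ...
        trace[s] += 1.0              # ... then credit the current state
        for i in range(n_states):
            V[i] += step_size * delta * trace[i]
            avg[i] += (V[i] - avg[i]) / t
        s = s_next
    return avg


P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical two-state chain
rewards = [1.0, 0.0]
gamma = 0.5
exact = [0.6 / 0.325, 0.1 / 0.325]  # (I - gamma P)^{-1} r for this chain
estimates = {lam: td_lambda(P, rewards, gamma, lam, n_steps=150_000, step_size=0.005)
             for lam in (0.0, 0.5, 0.9)}
for lam, est in estimates.items():
    print(f"lambda={lam}: {[round(v, 3) for v in est]}  exact: {[round(v, 3) for v in exact]}")
```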

Spectral Regularized Kernel Two-Sample Tests

Omar Hagrass , Bharath K. Sriperumbudur , Bing Li

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-19

Over the last decade, an approach that has gained a lot of popularity to tackle non-parametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of probability distributions. The main goal of our work is to understand the optimality of two-sample tests constructed based on this approach. First, we show that the popular MMD (maximum mean discrepancy) two-sample test is not optimal in terms of the separation boundary measured in Hellinger distance. Second, we propose a modification to the MMD test based on spectral regularization by taking into account the covariance information (which is not captured by the MMD test) and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Third, we propose an adaptive version of the above test which involves a data-driven strategy to choose the regularization parameter and show the adaptive test to be almost minimax optimal up to a logarithmic factor. Moreover, our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples. Through numerical experiments on synthetic and real-world data, we demonstrate the superior performance of the proposed test in comparison to the MMD test.

Reversible Gromov-Monge Sampler for Simulation-Based Inference

YoonHaeng Hur , Wenxuan Guo , Tengyuan Liang

Categories: Machine Learning | Machine Learning (Statistics)

2021-09-28

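This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the notions of distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. We study the analytic properties of the RGM distance and derive that, under mild conditions, RGM equals the classic Gromov-Wasserstein distance. Curiously, drawing a connection to Brenier's polar factorization, we show that the RGM sampler induces bias towards strong isomorphism with proper choices of $c_{\mathcal{X}}$ and $c_{\mathcal{Y}}$. Statistical rates of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.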

Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

Alden Green , Sivaraman Balakrishnan , Ryan J. Tibshirani

Categories: Machine Learning (Statistics)

2021-11-14

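In this paper we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for non-parametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${\bf Y} = (Y_1, \ldots, Y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also show that PCR-LE is manifold adaptive: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always much faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem, regression with estimated features appears to be much easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.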

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

Categories: Machine Learning (Statistics)

2022-09-26

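The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted-norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.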

A kernel two-sample test

Categories:

We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
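
The quadratic-time statistic can be sketched as the unbiased MMD^2 estimate with a Gaussian kernel. To keep the example short, the rejection threshold below comes from a permutation test rather than from the large-deviation bounds or the asymptotic distribution described in the abstract, so this is a swapped-in calibration, not one of the paper's three tests. The bandwidth (1.0), sample sizes, and function names are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased (quadratic-time) MMD^2 estimate from a precomputed Gram matrix K."""
    m, n = len(idx_x), len(idx_y)
    kxx = sum(K[i][j] for i in idx_x for j in idx_x if i != j) / (m * (m - 1))
    kyy = sum(K[i][j] for i in idx_y for j in idx_y if i != j) / (n * (n - 1))
    kxy = sum(K[i][j] for i in idx_x for j in idx_y) / (m * n)
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(xs, ys, n_permutations=200, seed=0):
    """Return (MMD^2 estimate, permutation p-value) for H0: xs and ys share a distribution."""
    pooled = list(xs) + list(ys)
    K = [[gaussian_kernel(a, b) for b in pooled] for a in pooled]  # computed once
    m = len(xs)
    idx = list(range(len(pooled)))
    observed = mmd2_unbiased(K, idx[:m], idx[m:])
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(idx)  # random relabeling of the pooled sample
        if mmd2_unbiased(K, idx[:m], idx[m:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


rng = random.Random(4)
xs = [rng.gauss(0.0, 1.0) for _ in range(40)]
ys_same = [rng.gauss(0.0, 1.0) for _ in range(40)]
ys_shifted = [rng.gauss(1.5, 1.0) for _ in range(40)]
print("same distribution:", mmd_permutation_test(xs, ys_same))
print("shifted mean     :", mmd_permutation_test(xs, ys_shifted))
```

Precomputing the pooled Gram matrix once keeps each permutation to index lookups, so the cost per permutation stays quadratic in the pooled sample size.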

Is Monte Carlo a bad sampling strategy for learning smooth functions in high dimensions?

Ben Adcock , Simone Brugiapaglia

Categories: Machine Learning

2022-08-18

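This paper concerns the approximation of smooth, high-dimensional functions from limited samples using polynomials. This task lies at the heart of many applications in computational science and engineering, notably those arising from parametric modelling and uncertainty quantification. It is common to use Monte Carlo (MC) sampling in such applications, so as not to succumb to the curse of dimensionality. However, it is well known that this strategy is theoretically suboptimal. There are many polynomial spaces of dimension $n$ for which the sample complexity scales log-quadratically in $n$. This well-documented phenomenon has led to a concerted effort to design improved, in fact near-optimal, strategies whose sample complexities scale log-linearly, or even linearly, in $n$. Paradoxically, in this work we show that MC is actually a perfectly good strategy in high dimensions. We first document this phenomenon via several numerical examples. Next, we present a theoretical analysis that resolves this paradox for holomorphic functions of infinitely many variables. We show that there is a least-squares scheme based on $m$ MC samples whose error decays in $m/\log(m)$ at the same rate as the best $n$-term polynomial approximation. This result is non-constructive, since it assumes knowledge of a suitable polynomial space in which to perform the approximation. We next present a compressed sensing-based scheme that achieves the same rate, except for a larger polylogarithmic factor. This scheme is practical, and numerically it performs as well as or better than well-known adaptive least-squares schemes. Overall, our findings demonstrate that MC sampling is eminently suitable for smooth function approximation when the dimension is sufficiently high. Hence, the benefits of improved sampling strategies are generically limited to lower-dimensional settings.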
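
A least-squares scheme based on $m$ Monte Carlo samples can be illustrated in one dimension: draw $m$ uniform points on $[-1, 1]$, then fit coefficients in an orthonormal Legendre basis through the normal equations. This toy sketch assumes a known polynomial space (degree at most 6) and uniform sampling; it reproduces neither the high-dimensional analysis nor the compressed-sensing scheme, and all function names are hypothetical.

```python
import math
import random


def legendre(x, degree):
    """Legendre polynomials P_0..P_degree at x, scaled to be orthonormal
    w.r.t. the uniform probability measure on [-1, 1]."""
    p = [1.0, x]
    for k in range(1, degree):
        # Three-term recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}.
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt(2 * k + 1) * p[k] for k in range(degree + 1)]


def solve(A, b):
    """Solve the small square system A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][j] * c[j] for j in range(r + 1, n))) / M[r][r]
    return c


def mc_least_squares(f, degree, m, seed=0):
    """Fit a polynomial of the given degree to f by least squares on m uniform MC samples."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    rows = [legendre(x, degree) for x in xs]
    ys = [f(x) for x in xs]
    k = degree + 1
    # Normal equations (G^T G / m) c = G^T y / m for the m-by-k design matrix G.
    gram = [[sum(row[i] * row[j] for row in rows) / m for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(rows, ys)) / m for i in range(k)]
    coeffs = solve(gram, rhs)
    return lambda x: sum(ci * pi for ci, pi in zip(coeffs, legendre(x, degree)))


approx = mc_least_squares(math.exp, degree=6, m=200)
grid = [-1.0 + 0.01 * t for t in range(201)]
print(f"max |exp - approx| on [-1, 1]: {max(abs(math.exp(x) - approx(x)) for x in grid):.2e}")
```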
