Smart Paper Notes

Smooth $p$-Wasserstein Distance: Structure, Empirical Approximation, and Statistical Applications

Sloan Nietert , Ziv Goldfeld , Kengo Kato

Categories: Machine Learning (Statistics)

2021-01-11

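Discrepancy measures between probability distributions, often termed statistical distances, are ubiquitous in probability theory, statistics, and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of this framework to high dimensions, we investigate the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(\sigma)}$, for arbitrary $p \geq 1$. After establishing basic metric and topological properties of $\mathsf{W}_p^{(\sigma)}$, we explore $\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n, \mu)$, where $\hat{\mu}_n$ is the empirical distribution of $n$ independent observations from $\mu$. We prove that $\mathsf{W}_p^{(\sigma)}$ enjoys a parametric empirical convergence rate of $n^{-1/2}$, which contrasts with the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. Our proof relies on controlling $\mathsf{W}_p^{(\sigma)}$ by a $p$th-order smooth Sobolev distance $\mathsf{d}_p^{(\sigma)}$ and deriving the limit of $\sqrt{n}\,\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n, \mu)$, for all dimensions $d$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation using $\mathsf{W}_p^{(\sigma)}$, with experiments for $p = 2$ using $\mathsf{d}_2^{(\sigma)}$.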

translated by Google Translate
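To make the Gaussian-smoothing idea in the abstract above concrete, here is a minimal one-dimensional sketch. It is not the authors' estimator: the function names `wasserstein_1d` and `smooth_wasserstein_1d` are hypothetical, both samples are assumed to have the same size, and the convolution with a Gaussian kernel is approximated by adding one independent $N(0, \sigma^2)$ draw to each sample point.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """Empirical p-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal coupling matches order statistics,
    so it is enough to sort both samples and compare them pairwise.
    """
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

def smooth_wasserstein_1d(x, y, sigma=1.0, p=2, seed=0):
    """Monte Carlo approximation of the Gaussian-smoothed distance.

    Assumption: adding independent N(0, sigma^2) noise to each point
    stands in for convolving each empirical measure with a Gaussian.
    """
    rng = np.random.default_rng(seed)
    xs = x + sigma * rng.standard_normal(x.shape)
    ys = y + sigma * rng.standard_normal(y.shape)
    return wasserstein_1d(xs, ys, p)
```

With `sigma=0.0` the sketch reduces to the unsmoothed empirical distance; larger `sigma` blurs local irregularities in both samples before they are compared.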

Controlling Wasserstein distances by Kernel norms with application to Compressive Statistical Learning

Titouan Vayer , Rémi Gribonval

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-01

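Comparing probability distributions is at the crux of many machine learning algorithms. Maximum mean discrepancies (MMD) and optimal transport (OT) distances are two classes of distances between probability measures that have attracted abundant attention in the past few years. This paper establishes some conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by compressive statistical learning (CSL) theory, a general framework for resource-efficient large-scale learning in which the training data is summarized in a single vector (called a sketch) that captures the information relevant to the considered learning task. Inspired by existing results in CSL, we introduce the H\"older Lower Restricted Isometric Property (H\"older LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is, when some task-specific metric between probability distributions can be bounded by a Wasserstein distance.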

Statistical and Topological Properties of Sliced Probability Divergences

Kimia Nadjahi , Alain Durmus , Lénaïc Chizat , Soheil Kolouri , Shahin Shahrampour , Umut Şimşekli

Categories: Machine Learning (Statistics) | Machine Learning

2020-03-12

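The idea of slicing divergences has proven successful when comparing two probability measures in various machine learning applications, including generative modeling, and consists in computing the expected value of a "base divergence" between one-dimensional random projections of the two measures. However, the topological, statistical, and computational consequences of this technique have not yet been well established. In this paper, we aim to bridge this gap and derive various theoretical properties of sliced probability divergences. First, we show that slicing preserves the metric axioms and the weak continuity of the divergence, implying that the sliced divergence will share similar topological properties. We then refine these results in the case where the base divergence belongs to the class of integral probability metrics. On the other hand, we establish that, under mild conditions, the sample complexity of a sliced divergence does not depend on the problem dimension. We finally apply our general results to several base divergences, and illustrate our theory with experiments on both synthetic and real data.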
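The slicing construction described above can be sketched as follows, taking the Wasserstein distance as the base divergence. This is an illustrative Monte Carlo approximation rather than code from the paper: the function name `sliced_wasserstein`, the default number of projections, and the equal-sample-size requirement are all assumptions made here for simplicity.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=0):
    """Monte Carlo sliced p-Wasserstein distance between two samples.

    X, Y: arrays of shape (n, d) with the same number of rows n.
    The expectation over projection directions is replaced by an
    average over n_projections directions drawn uniformly on the sphere.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto every direction, then sort each projection:
    # in 1D the optimal coupling matches order statistics.
    Xp = np.sort(X @ theta.T, axis=0)
    Yp = np.sort(Y @ theta.T, axis=0)
    # Average |difference|^p over both sample points and directions.
    return float(np.mean(np.abs(Xp - Yp) ** p) ** (1.0 / p))
```

Each projection costs only a sort, which is why slicing is attractive in high dimensions; the price is the extra Monte Carlo error from using finitely many directions.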

Outlier-Robust Optimal Transport: Duality, Structure, and Statistical Applications

Sloan Nietert , Rachel Cummings , Ziv Goldfeld

Categories: Machine Learning (Statistics) | Machine Learning

2021-11-02

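The Wasserstein distance, rooted in optimal transport (OT) theory, is a popular discrepancy measure between probability distributions with various applications in statistics and machine learning. Despite their rich structure and demonstrated utility, Wasserstein distances are sensitive to outliers in the considered distributions, which hinders their applicability in practice. Inspired by the Huber contamination model, we propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows $\varepsilon$ outlier mass to be removed from each contaminated distribution. Compared with previously considered frameworks, our formulation amounts to a highly regular optimization problem that lends itself better to analysis. Leveraging this, we conduct a thorough theoretical study of $\mathsf{W}_p^\varepsilon$, encompassing characterization of optimal perturbations, regularity, duality, and statistical estimation and robustness results. In particular, by decoupling the optimization variables, we arrive at a simple dual form for $\mathsf{W}_p^\varepsilon$ that can be implemented via an elementary modification to standard, duality-based OT solvers. We illustrate the benefits of our framework via applications to generative modeling with contaminated datasets.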

Neural Estimation of Statistical Divergences

Sreejith Sreekumar , Ziv Goldfeld

Categories: Machine Learning (Statistics)

2021-10-07

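Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating those divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over the parameter space. Such neural estimators are abundantly used in practice, but the corresponding performance guarantees are partial and call for further exploration. In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation. While the former needs the NN class to be rich and expressive, the latter relies on controlling complexity. We explore this tradeoff for an estimator based on a shallow NN by means of non-asymptotic error bounds, focusing on four popular $\mathsf{f}$-divergences: Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. The bounds reveal the tension between the NN size and the number of samples, and make it possible to characterize scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above, with an appropriate NN growth rate, are near minimax rate-optimal, achieving the parametric rate up to logarithmic factors.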

Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates

Nicholas J. Irons , Meyer Scetbon , Soumik Pal , Zaid Harchaoui

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-31

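Triangular flows, also known as Kn\"{o}the-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flow models such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical models. In particular, we establish the statistical consistency and the finite-sample convergence rates of the Kullback-Leibler estimator of the Kn\"{o}the-Rosenblatt measure coupling using tools from empirical process theory. Our results highlight the anisotropic geometry of the function classes at play in triangular flows, shed light on optimal coordinate ordering, and lead to statistical guarantees for Jacobian flows. We conduct numerical experiments on synthetic data to illustrate the practical implications of our theoretical findings.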

Sequential Estimation of Convex Functionals and Divergences

Tudor Manole , Aaditya Ramdas

Categories: Machine Learning (Statistics)

2021-03-16

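We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $\varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. This is achieved by observing that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be complementary and powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid tests or confidence intervals at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (such as dimension or alphabet size, if applicable). These results also apply to more general convex functionals, such as the negative differential entropy, suprema of empirical processes, and V-statistics.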

Bayesian Learning with Wasserstein Barycenters

Julio Backhoff-Veraguas , Joaquin Fontbona , Gonzalo Rios , Felipe Tobar

Categories: Machine Learning (Statistics) | Machine Learning

2018-05-28

We introduce and study a novel model-selection strategy for Bayesian learning, based on optimal transport, along with its associated predictive posterior law: the Wasserstein population barycenter of the posterior law over models. We first show how this estimator, termed Bayesian Wasserstein barycenter (BWB), arises naturally in a general, parameter-free Bayesian model-selection framework, when the considered Bayesian risk is the Wasserstein distance. Examples are given, illustrating how the BWB extends some classic parametric and non-parametric selection strategies. Furthermore, we also provide explicit conditions granting the existence and statistical consistency of the BWB, and discuss some of its general and specific properties, providing insights into its advantages compared to usual choices, such as the model average estimator. Finally, we illustrate how this estimator can be computed using the stochastic gradient descent (SGD) algorithm in Wasserstein space introduced in a companion paper arXiv:2201.04232v2 [math.OC], and provide a numerical example for experimental validation of the proposed method.

Optimal transport map estimation in general function spaces

Vincent Divol , Jonathan Niles-Weed , Aram-Alexandre Pooladian

Categories: Machine Learning (Statistics)

2022-12-07

We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a H\"older class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincar\'e inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and H\"older transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.

Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks

Andrea Montanari , Kangjie Zhou

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-14

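Given a cloud of $n$ data points in $\mathbb{R}^d$, consider the distributions of the points projected onto $m$-dimensional subspaces of $\mathbb{R}^d$. What does this collection of probability distributions look like as $n, d$ grow? We consider this question under the null model in which the points are i.i.d. standard Gaussian vectors, focusing on the asymptotic regime in which $n, d \to \infty$, with $n/d \to \alpha \in (0, \infty)$, while $m$ is fixed. Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$. In particular, we characterize the Wasserstein radius of $\mathscr{F}_{m,\alpha}$ up to logarithmic factors, and determine it exactly for $m = 1$. We also prove sharp bounds in terms of the Kullback-Leibler divergence and the R\'{e}nyi information dimension. The previous question has applications to unsupervised learning methods, such as projection pursuit and independent component analysis. We introduce a version of the same problem that is relevant for supervised learning, and prove a sharp Wasserstein radius bound. As an application, we establish an upper bound on the interpolation threshold of two-layer neural networks with $m$ hidden neurons.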

Dimension-agnostic inference using cross U-statistics

Ilmun Kim , Aaditya Ramdas

Categories: Machine Learning (Statistics)

2020-11-10

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.

Signature moments to characterize laws of stochastic processes

Ilya Chevyrev , Harald Oberhauser

Categories: Machine Learning (Statistics)

2018-10-25

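The sequence of moments of a vector-valued random variable can characterize its law. We study the analogous problem for path-valued random variables, that is, stochastic processes, by using so-called robust signature moments. This allows us to derive a metric of maximum mean discrepancy type for laws of stochastic processes and to study the topology it induces on the space of laws of stochastic processes. This metric can be kernelized using the signature kernel, which allows it to be computed efficiently. As an application, we provide a non-parametric two-sample hypothesis test for laws of stochastic processes.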

Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

Alden Green , Sivaraman Balakrishnan , Ryan J. Tibshirani

Categories: Machine Learning (Statistics)

2021-11-14

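In this paper we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${\bf y} = (y_1, \ldots, y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also show that PCR-LE is \emph{manifold adaptive}: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem, regression with estimated features appears to be easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.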

Quantifying the Effects of Data Augmentation

Kevin H. Huang , Peter Orbanz , Morgane Austern

Categories: Machine Learning | Machine Learning (Statistics)

2022-02-18

We provide results that exactly quantify how data augmentation affects the convergence rate and variance of estimates. They lead to some unexpected findings: Contrary to common intuition, data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables. The pathological behavior we identify is not a consequence of complex models, but can occur even in the simplest settings -- one of our examples is a ridge regressor with two parameters. On the other hand, our results also show that data augmentation can have real, quantifiable benefits.

Reversible Gromov-Monge Sampler for Simulation-Based Inference

YoonHaeng Hur , Wenxuan Guo , Tengyuan Liang

Categories: Machine Learning | Machine Learning (Statistics)

2021-09-28

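This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by notions of distance and isomorphism, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. We study the analytic properties of the RGM distance and derive that, under mild conditions, RGM equals the classic Gromov-Wasserstein distance. Curiously, drawing a connection to Brenier's polar factorization, we show that the RGM sampler induces a bias towards strong isomorphism with proper choices of $c_{\mathcal{X}}$ and $c_{\mathcal{Y}}$. Statistical rates of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.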

The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg , Ilmun Kim , Rajen D. Shah , Richard J. Samworth

Categories: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.

k-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension

Ziv Goldfeld , Kristjan Greenewald , Theshani Nuradha , Galen Reeves

Categories: Machine Learning (Statistics)

2022-06-17

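Sliced mutual information (SMI) is defined as an average of mutual information (MI) terms between one-dimensional random projections of the random variables. It serves as a surrogate measure of dependence to classic MI that preserves many of its properties but is more scalable to high dimensions. However, a quantitative characterization of how SMI itself and its estimation rates depend on the ambient dimension, which is crucial to understanding scalability, remains obscure. This work extends the original SMI definition to $k$-SMI, which considers projections onto $k$-dimensional subspaces, and provides a multifaceted account of its dependence on dimension. Using a new result on the continuity of differential entropy in the 2-Wasserstein metric, we derive sharp bounds on the error of Monte Carlo (MC)-based estimates of $k$-SMI, with explicit dependence on $k$ and the ambient dimension, revealing their interplay with the number of samples. We then combine the MC integrator with the neural estimation framework to provide an end-to-end $k$-SMI estimator, for which optimal convergence rates are established. We also explore the asymptotics of the population $k$-SMI as the dimension grows, providing Gaussian approximation results with a residual that decays under appropriate moment bounds. Our theory is validated with numerical experiments and applied to sliced InfoGAN, which altogether provide a comprehensive quantitative account of the scalability question of $k$-SMI, including SMI as a special case when $k = 1$.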

A kernel two-sample test

Categories:

We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
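As a rough illustration of the quadratic-time statistic mentioned above, the following sketch computes the standard unbiased estimate of the squared MMD. The Gaussian kernel, the fixed `bandwidth` parameter, and the function names `gaussian_kernel` and `mmd2_unbiased` are assumptions chosen for this example; the abstract does not prescribe a kernel, and the test thresholds it refers to are not implemented here.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased quadratic-time estimate of the squared MMD between two samples.

    X has shape (m, d) and Y has shape (n, d). Diagonal terms k(x_i, x_i)
    and k(y_j, y_j) are excluded from the within-sample averages.
    """
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return float(term_x + term_y - 2.0 * Kxy.mean())
```

Because the estimate is unbiased, it can be slightly negative when both samples come from the same distribution. A two-sample test would compare it against a threshold, for example one obtained by permuting the pooled sample.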

Cycle Consistent Probability Divergences Across Different Spaces

Zhengxin Zhang , Youssef Mroueh , Ziv Goldfeld , Bharath K. Sriperumbudur

Categories: Machine Learning | Machine Learning (Statistics)

2021-11-22

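Discrepancy measures between probability distributions are at the core of statistical inference and machine learning. In many applications, the distributions of interest are supported on different spaces, and yet a meaningful correspondence between data points is desired. Motivated to explicitly encode consistent bidirectional maps into the discrepancy measure, this work proposes a novel unbalanced Monge optimal transport formulation for matching, up to isometries, distributions on different spaces. Our formulation arises as a principled relaxation of the Gromov-Hausdorff distance between metric spaces, and employs two cycle-consistent maps that push each distribution forward onto the other. We study structural properties of the proposed discrepancy and, in particular, show that it captures the popular cycle-consistent generative adversarial network (GAN) framework as a special case, thereby providing theory to explain it. Motivated by computational efficiency, we then kernelize the discrepancy and restrict the mappings to parametric function classes. The resulting kernelized version is coined the generalized maximum mean discrepancy (GMMD). Convergence rates for empirical estimation of GMMD are studied, and experiments supporting our theory are provided.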

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu , Alden Green , Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.
