Smart Paper Notes

Sinkhorn distances: Lightspeed computation of optimal transport

Category:

Optimal transport distances are a fundamental family of distances for probability measures and histograms of features. Despite their appealing theoretical properties, excellent performance in retrieval tasks and intuitive formulation, their computation involves the resolution of a linear program whose cost can quickly become prohibitive whenever the size of the support of these measures or the histograms' dimension exceeds a few hundred. We propose in this work a new family of optimal transport distances that look at transport problems from a maximum-entropy perspective. We smooth the classic optimal transport problem with an entropic regularization term, and show that the resulting optimum is also a distance which can be computed through Sinkhorn's matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transport solvers. We also show that this regularized distance improves upon classic optimal transport distances on the MNIST classification problem.
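
The matrix scaling iteration mentioned in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `sinkhorn`, the regularization value `reg`, the stopping rule and the toy histograms are all assumptions made for the example.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.5, n_iters=1000, tol=1e-9):
    """Entropy-regularized OT between histograms a and b with cost matrix C.

    Hypothetical helper for illustration; reg, n_iters and tol are arbitrary.
    """
    K = np.exp(-C / reg)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows to match a
        v = b / (K.T @ u)             # rescale columns to match b
        row_marginal = u * (K @ v)    # rows of the current plan
        if np.abs(row_marginal - a).max() < tol:
            break
    P = u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)
    return P, float(np.sum(P * C))    # plan and its transport cost

# Toy data (assumed): two histograms on 5 points of [0, 1], squared-distance cost.
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
P, cost = sinkhorn(a, b, C)
```

Smaller values of `reg` bring the plan closer to the unregularized optimum but slow the iteration and can underflow in `np.exp`; a log-domain variant is the usual remedy.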

translated by Google Translate

On Unbalanced Optimal Transport: Gradient Methods, Sparsity and Approximation Error

Quang Minh Nguyen , Hoang H. Nguyen , Yi Zhou , Lam M. Nguyen

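Category: Machine Learning

2022-02-08

We study unbalanced optimal transport (UOT) between two measures of possibly different masses with at most $n$ components, where the marginal constraints of standard optimal transport (OT) are relaxed via the Kullback-Leibler divergence with regularization factor $\tau$. Although the only solvers analyzed in the literature achieve an error $\varepsilon$ with complexity $O\big(\tfrac{\tau n^2 \log(n)}{\varepsilon} \log\big(\tfrac{\log(n)}{\varepsilon}\big)\big)$, they are incompatible with some deep learning models and output dense transport plans, which strongly hinders their practicality. Gradient methods, while widely used as heuristics for computing UOT in modern deep learning applications and successful in sparse OT, have not been formally studied for UOT. To fill this gap, we propose a novel algorithm based on the gradient extrapolation method (GEM-UOT) that finds an $\varepsilon$-approximate solution to the UOT problem in $O\big(\kappa n^2 \log\big(\frac{\tau n}{\varepsilon}\big)\big)$, where $\kappa$ is a condition number depending on the two input measures. Our algorithm is designed by optimizing a new dual formulation of the squared $\ell_2$-norm UOT objective, which fills the gap in the sparse UOT literature. Finally, we establish a novel characterization of the approximation error between UOT and OT in terms of both the transport plan and the transport distance. This result sheds light on a new major bottleneck neglected by the robust OT literature: although relaxing OT into UOT brings robustness to outliers, the computed UOT distance deviates far from the original OT distance. We address this limitation via a principled approach for retrieving OT from UOT, based on GEM-UOT with a fine-tuned $\tau$ and a post-processing projection step. Experiments on synthetic and real datasets validate our theory and demonstrate the favorable performance of our methods.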
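
For orientation, the KL-relaxed problem described in this abstract is commonly written as below. This is the standard textbook form, given here as a sketch: the symbols $C$ (cost matrix), $X$ (transport plan), and $a$, $b$ (input measures) are notation assumed for the example and may differ from the paper's.

```latex
\mathrm{UOT}_{\tau}(a, b)
  = \min_{X \in \mathbb{R}_{+}^{n \times n}}
    \langle C, X \rangle
    + \tau \, \mathrm{KL}(X \mathbf{1}_n \,\|\, a)
    + \tau \, \mathrm{KL}(X^{\top} \mathbf{1}_n \,\|\, b),
\qquad
\mathrm{KL}(x \,\|\, y) = \sum_{i} \Big( x_i \log \frac{x_i}{y_i} - x_i + y_i \Big).
```

When $a$ and $b$ have equal mass, letting $\tau \to \infty$ forces the marginals of $X$ to match $a$ and $b$, which recovers the standard OT problem.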

Generative Adversarial Learning of Sinkhorn Algorithm Initializations

Jonathan Geuter , Vaios Laschos

Category: Machine Learning | Machine Learning (Statistics)

2022-11-30

The Sinkhorn algorithm (arXiv:1306.0895) is the state of the art for computing approximations of optimal transport distances between discrete probability distributions, making use of an entropically regularized formulation of the problem. The algorithm is guaranteed to converge, no matter its initialization. This has led to little attention being paid to initializing it, and simple starting vectors like the n-dimensional one-vector are common choices. We train a neural network to compute initializations for the algorithm, which significantly outperform standard initializations. The network predicts a potential of the optimal transport dual problem, where training is conducted in an adversarial fashion using a second, generating network. The network is universal in the sense that it is able to generalize to any pair of distributions of fixed dimension. Furthermore, we show that for certain applications the network can be used independently.

Efficient Approximation of Gromov-Wasserstein Distance using Importance Sparsification

Mengyu Li , Jun Yu , Hongteng Xu , Cheng Meng

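Category: Machine Learning | Machine Learning (Statistics)

2022-05-26

As a valid metric on metric-measure spaces, the Gromov-Wasserstein (GW) distance has shown potential for matching problems on structured data such as point clouds and graphs. However, its application in practice is limited by its high computational complexity. To overcome this challenge, we propose a novel importance sparsification method, called Spar-GW, to approximate the GW distance efficiently. In particular, instead of considering a dense coupling matrix, our method leverages a simple but effective sampling strategy to construct a sparse coupling matrix and update it with few computations. We demonstrate that the proposed Spar-GW method is applicable to the GW distance with arbitrary ground cost, and that it reduces the complexity from $\mathcal{O}(n^4)$ to $\mathcal{O}(n^{2+\delta})$ for an arbitrarily small $\delta > 0$. In addition, the method can be extended to approximate variants of the GW distance, including the entropic GW distance, the fused GW distance, and the unbalanced GW distance. Experiments show the superiority of Spar-GW over state-of-the-art methods in both synthetic and real-world tasks.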

Project and Forget: Solving Large-Scale Metric Constrained Problems

Rishi Sonthalia , Anna C. Gilbert

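Category: Machine Learning | Machine Learning (Statistics)

2020-05-08

Given a set of dissimilarity measurements among data points, determining which metric representation is most "consistent" with the input measurements, or which metric best captures the relevant geometric features of the data, is a key step in many machine learning algorithms. Existing methods are restricted to specific kinds of metrics or small problem sizes because of the large number of metric constraints in such problems. In this paper, we provide an active set algorithm, Project and Forget, that uses Bregman projections to solve metric constrained problems with many (possibly exponentially many) inequality constraints. We provide a theoretical analysis of \textsc{Project and Forget} and prove that our algorithm converges to the global optimal solution and that the $L_2$ distance of the current iterate to the optimal solution decays asymptotically at an exponential rate. We demonstrate that our method can solve large instances of three types of metric constrained problems: general weighted correlation clustering, metric nearness, and metric learning; in each case it outperforms state-of-the-art methods with respect to CPU time and problem size.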

Stochastic Saddle-Point Optimization for Wasserstein Barycenters

Daniil Tiapkin , Alexander Gasnikov , Pavel Dvurechensky

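Category: Machine Learning | Machine Learning (Statistics)

2020-06-11

We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem in which the objective is the expectation of a function that is itself given as the solution to a random optimization problem. We exploit the structure of the problem and obtain a convex-concave stochastic saddle-point reformulation of it. In the setting where the distribution of the random probability measures is discrete, we propose a stochastic optimization algorithm and estimate its complexity. A second result, based on kernel methods, extends the first to arbitrary distributions of random probability measures. Moreover, in many cases this new algorithm has a better total complexity than the stochastic approximation approach combined with the Sinkhorn algorithm. We also illustrate our developments with a series of numerical experiments.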

Projection Robust Wasserstein Distance and Riemannian Optimization

Tianyi Lin , Chenyou Fan , Nhat Ho , Marco Cuturi , Michael I. Jordan

Category: Machine Learning | Machine Learning (Statistics)

2020-06-12

Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance. Recent work suggests that this quantity is more robust than the standard Wasserstein distance, in particular when comparing probability measures in high dimensions. However, it is ruled out for practical application because the optimization model is essentially non-convex and non-smooth, which makes the computation intractable. Our contribution in this paper is to revisit the original motivation behind WPP/PRW, but take the hard route of showing that, despite its non-convexity and non-smoothness, and even despite some hardness results proved by~\citet{Niles-2019-Estimation} in a minimax sense, the original formulation for PRW/WPP \textit{can} be efficiently computed in practice using Riemannian optimization, yielding in relevant cases better behavior than its convex relaxation. More specifically, we provide three simple algorithms with solid theoretical guarantees on their complexity bounds (one in the appendix), and demonstrate their effectiveness and efficiency by conducting extensive experiments on synthetic and real data. This paper provides a first step into a computational theory of the PRW distance and provides links between optimal transport and Riemannian optimization.

Rethinking Initialization of the Sinkhorn Algorithm

James Thornton , Marco Cuturi

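Category: Machine Learning (Statistics) | Machine Learning

2022-06-15

Computing an optimal transport (OT) coupling between distributions plays an increasingly important role in machine learning. While OT problems can be solved as linear programs, adding an entropic smoothing term is known to result in solvers that are faster, more robust to outliers, differentiable and easier to parallelize. The Sinkhorn fixed-point algorithm is the cornerstone of these approaches and, as a result, multiple attempts have been made to shorten its runtime using, for instance, annealing, momentum or acceleration. The premise of this paper is that \textit{initialization} of the Sinkhorn algorithm has received comparatively little attention, possibly due to two preconceptions: first, since the regularized OT problem is convex, it may not be worth crafting a tailored initialization, as \textit{any} is guaranteed to work; second, because the Sinkhorn algorithm is often differentiated in end-to-end pipelines, data-dependent initializations could bias gradient estimates obtained by unrolling iterations. We challenge this conventional wisdom and show that carefully chosen initializations can result in dramatic speed-ups, and do not bias gradients that are computed with implicit differentiation. We detail how initializations can be recovered from closed-form or approximate OT solutions, using known results in the 1D or Gaussian settings. We show empirically that these initializations can be used off the shelf, with little to no tuning, and result in consistent speed-ups for a wide variety of OT problems.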

Approximating Optimal Transport via Low-rank and Sparse Factorization

Weijie Liu , Chao Zhang , Nenggan Zheng , Hui Qian

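Category: Machine Learning

2021-11-12

Optimal transport (OT) naturally arises in a wide range of machine learning applications but may often become the computational bottleneck. Recently, one line of work has proposed solving OT approximately by searching for the \emph{transport plan} in a low-rank subspace. However, the optimal transport plan is often not low-rank, which tends to yield large approximation errors. For example, when Monge's \emph{transport map} exists, the transport plan is full rank. This paper concerns the computation of the OT distance with adequate accuracy and efficiency. A novel approximation for OT is proposed, in which the transport plan is decomposed into the sum of a low-rank matrix and a sparse one. We theoretically analyze the approximation error. An augmented Lagrangian method is then designed to compute the transport plan efficiently.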

Robust computation of optimal transport by $β$-potential regularization

Shintaro Nakamura , Han Bao , Masashi Sugiyama

Category: Machine Learning | Artificial Intelligence

2022-12-26

Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions. For instance, OT is a popular loss function that quantifies the discrepancy between an empirical distribution and a parametric model. Recently, an entropic penalty term and the celebrated Sinkhorn algorithm have been commonly used to approximate the original OT in a computationally efficient way. However, since the Sinkhorn algorithm runs a projection associated with the Kullback-Leibler divergence, it is often vulnerable to outliers. To overcome this problem, we propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence, which was developed in robust statistics. Our theoretical analysis reveals that the $\beta$-potential can prevent the mass from being transported to outliers. We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers. In addition, our proposed method can successfully detect outliers from a contaminated dataset.

Semi-relaxed Gromov-Wasserstein divergence with applications on graphs

Cédric Vincent-Cuaz , Rémi Flamary , Marco Corneli , Titouan Vayer , Nicolas Courty

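Category: Machine Learning

2021-10-06

Comparing structured objects such as graphs is a fundamental operation involved in many learning tasks. To this end, the Gromov-Wasserstein (GW) distance, based on optimal transport (OT), has proven successful in handling the specific nature of the associated objects. More specifically, through the nodes' connectivity relations, GW operates on graphs seen as probability measures over specific spaces. At the core of OT is the idea of conservation of mass, which imposes a coupling between all the nodes of the two graphs considered. We argue in this paper that this property can be detrimental for tasks such as graph dictionary or partition learning, and we relax it by proposing a new semi-relaxed Gromov-Wasserstein divergence. Aside from immediate computational benefits, we discuss its properties and show that it can lead to an efficient graph dictionary learning algorithm. We empirically demonstrate its relevance for complex tasks on graphs such as partitioning, clustering and completion.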

$k$FW: A Frank-Wolfe style algorithm with stronger subproblem oracles

Lijun Ding , Jicong Fan , Madeleine Udell

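Category: Machine Learning

2020-06-29

This paper proposes a new variant of Frank-Wolfe (FW), called $k$FW. Standard FW suffers from slow convergence: iterates often zig-zag as update directions oscillate around extreme points of the constraint set. The new variant, $k$FW, overcomes this problem by using two stronger subproblem oracles in each iteration. The first is a $k$ linear optimization oracle ($k$LOO) that computes the $k$ best update directions (rather than just one). The second is a $k$ direction search ($k$DS) that minimizes the objective over a constraint set represented by the $k$ best update directions and the previous iterate. When the problem solution admits a sparse representation, both oracles are easy to compute, and $k$FW converges quickly for smooth convex objectives and several interesting constraint sets: $k$FW achieves finite $\frac{4L_f^3 D^4}{\gamma \delta^2}$ convergence on polytopes and group norm balls, and linear convergence on spectrahedra and nuclear norm balls. Numerical experiments validate the effectiveness of $k$FW and demonstrate an order-of-magnitude speedup over existing approaches.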
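
As background for this abstract, the standard Frank-Wolfe baseline (not $k$FW itself) can be sketched as follows on the probability simplex, where the linear optimization oracle simply picks the vertex with the smallest gradient coordinate. The function name `frank_wolfe_simplex`, the step-size rule $2/(t+2)$ and the toy quadratic objective are assumptions chosen for illustration.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=200):
    """Standard Frank-Wolfe on the probability simplex (illustrative sketch)."""
    x = x0.copy()
    for t in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0            # linear oracle: best vertex of the simplex
        step = 2.0 / (t + 2.0)           # classic open-loop step size
        x = (1.0 - step) * x + step * s  # convex combination stays feasible
    return x

# Toy objective (assumed): f(x) = 0.5 * ||x - target||^2, so grad f(x) = x - target.
target = np.array([0.7, 0.2, 0.1])
x_fw = frank_wolfe_simplex(lambda x: x - target, np.array([0.0, 0.0, 1.0]))
```

Each iterate moves toward a single vertex, which is the source of the zig-zagging the abstract describes; $k$FW replaces the single-vertex oracle with the two stronger oracles named above.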

Measure Estimation in the Barycentric Coding Model

Matthew Werenski , Ruijie Jiang , Abiy Tasissa , Shuchin Aeron , James M. Murphy

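Category: Machine Learning (Statistics) | Machine Learning

2022-01-28

This paper considers the problem of measure estimation under the barycentric coding model (BCM), in which an unknown measure is assumed to belong to the set of Wasserstein-2 barycenters of a finite set of known measures. Estimating a measure under this model is equivalent to estimating the unknown barycentric coordinates. We provide novel geometrical, statistical, and computational insights for measure estimation under the BCM, consisting of three main results. Our first main result leverages the Riemannian geometry of Wasserstein-2 space to provide a procedure for recovering the barycentric coordinates as the solution to a quadratic optimization problem, assuming access to the true reference measures. The essential geometric insight is that the parameters of this quadratic problem are determined by inner products between the optimal displacement maps from the given measure to the reference measures defining the BCM. Our second main result then establishes an algorithm for solving for the coordinates in the BCM when all the measures are observed empirically via i.i.d. samples. We prove precise rates of convergence for this algorithm, determined by the smoothness of the underlying measures and their dimensionality, thereby guaranteeing its statistical consistency. Finally, we demonstrate the utility of the BCM and the associated estimation procedures in three application areas: (i) covariance estimation for Gaussian measures; (ii) image processing; and (iii) natural language processing.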

Clustering in Hilbert simplex geometry

Frank Nielsen , Ke Sun

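Category: Machine Learning | Computer Vision

2017-04-03

Clustering categorical distributions in the finite-dimensional probability simplex is a fundamental task in many applications dealing with normalized histograms. Traditionally, the differential-geometric structure of the probability simplex has been used either by (i) setting the Riemannian metric tensor to the Fisher information matrix of the categorical distributions, or (ii) defining the dualistic information-geometric structure induced by a smooth dissimilarity measure, the Kullback-Leibler divergence. In this work, we introduce for clustering tasks a novel computationally friendly framework for modeling the probability simplex geometrically: the {\em Hilbert simplex geometry}. In the Hilbert simplex geometry, the distance is the non-separable Hilbert metric distance, which satisfies the property of information monotonicity, with distance level set functions described by polytope boundaries. We show that the Aitchison and Hilbert simplex distances are both norm distances on normalized logarithmic representations, with respect to the $\ell_2$ and variation norms, respectively. We discuss the pros and cons of these different statistical modelings, and benchmark these geometries experimentally for center-based $k$-means and $k$-center clustering. Furthermore, since a canonical Hilbert distance can be defined on any bounded convex subset of Euclidean space, we also consider Hilbert's geometry of the elliptope of correlation matrices and study its clustering performance compared to the Frobenius and log-det divergences.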
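
The two norm-distance characterizations stated in this abstract can be illustrated with a small NumPy sketch. It assumes strictly positive probability vectors and uses the usual closed forms: the Hilbert simplex distance as the range (variation) of the log-ratio vector, and the Aitchison distance as the Euclidean distance between centered log-ratio representations. The function names are hypothetical.

```python
import numpy as np

def hilbert_simplex_distance(p, q):
    """Hilbert metric on the open simplex: log of max ratio over min ratio."""
    r = np.log(p) - np.log(q)           # log-ratio vector
    return float(r.max() - r.min())     # variation (range) of the log-ratios

def aitchison_distance(p, q):
    """Euclidean distance between centered log-ratio (clr) representations."""
    clr_p = np.log(p) - np.log(p).mean()
    clr_q = np.log(q) - np.log(q).mean()
    return float(np.linalg.norm(clr_p - clr_q))

# Toy histograms (assumed), strictly positive so that the logarithms are finite.
p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.5, 0.25])
d_hilbert = hilbert_simplex_distance(p, q)
d_aitchison = aitchison_distance(p, q)
```

Both functions are unchanged when `p` or `q` is rescaled by a positive constant, which is why they are well defined on normalized histograms.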

Sliced Optimal Partial Transport

Yikun Bai , Bernard Schmitzer , Mathew Thorpe , Soheil Kolouri

Category: Machine Learning | Machine Learning (Statistics)

2022-12-15

Optimal transport (OT) has become exceedingly popular in machine learning, data science, and computer vision. The core assumption in the OT problem is the equal total amount of mass in source and target measures, which limits its application. Optimal Partial Transport (OPT) is a recently proposed solution to this limitation. Similar to the OT problem, the computation of OPT relies on solving a linear programming problem (often in high dimensions), which can become computationally prohibitive. In this paper, we propose an efficient algorithm for calculating the OPT problem between two non-negative measures in one dimension. Next, following the idea of sliced OT distances, we utilize slicing to define the sliced OPT distance. Finally, we demonstrate the computational and accuracy benefits of the sliced OPT-based method in various numerical experiments. In particular, we show an application of our proposed Sliced-OPT in noisy point cloud registration.
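
The slicing idea this abstract builds on can be sketched for the ordinary balanced case, which is simpler than the partial transport problem the paper addresses. The sketch assumes two point clouds with the same number of points and uniform weights, so that one-dimensional OT reduces to sorting; the function name `sliced_wasserstein2`, the number of projections and the random data are assumptions for illustration only.

```python
import numpy as np

def sliced_wasserstein2(X, Y, n_projections=100, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size point clouds."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random directions on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project, then sort each 1D projection: sorted matching is optimal in 1D.
    X_proj = np.sort(X @ theta.T, axis=0)
    Y_proj = np.sort(Y @ theta.T, axis=0)
    return float(np.sqrt(np.mean((X_proj - Y_proj) ** 2)))

# Toy data (assumed): a Gaussian cloud and a shifted copy of it.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Y = X + 1.0
sw_same = sliced_wasserstein2(X, X)
sw_shifted = sliced_wasserstein2(X, Y)
```

Each slice costs only a sort, which is what makes sliced distances attractive; the sliced OPT distance in the abstract replaces the sorting step with the authors' one-dimensional partial transport solver.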

Low-rank Optimal Transport: Approximation, Statistics and Debiasing

Meyer Scetbon , Marco Cuturi

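Category: Machine Learning (Statistics) | Machine Learning

2022-05-24

The matching principles behind optimal transport (OT) play an increasingly important role in machine learning, a trend that can be observed when OT is used to disambiguate datasets in applications (e.g., single-cell genomics) or to improve more complex methods (e.g., balanced attention in transformers or self-supervised learning). To scale to more challenging problems, there is a growing consensus that OT requires solvers that can operate on millions, not thousands, of points. The low-rank optimal transport (LOT) approach advocated in \cite{scetbon2021lowrank} holds several promises in that regard, and was shown to complement more established entropic regularization approaches, being able to insert itself into more complex pipelines, such as quadratic OT. LOT restricts the search for low-cost couplings to those that have a low nonnegative rank, yielding linear-time algorithms in cases of interest. However, these promises can only be fulfilled if the LOT approach is seen as a legitimate contender to entropic regularization when compared on properties of interest, where the scorecard typically includes theoretical properties (statistical complexity and relation to other methods) or practical aspects (debiasing, hyperparameter tuning, initialization). We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.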

Unsupervised Ground Metric Learning using Wasserstein Singular Vectors

Geert-Jan Huizing , Laura Cantini , Gabriel Peyré

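Category: Machine Learning (Statistics) | Machine Learning

2021-02-11

Defining meaningful distances between samples in a dataset is a fundamental problem in machine learning. Optimal transport (OT) lifts a distance between features (the "ground metric") to a geometrically meaningful distance between samples. However, there is usually no straightforward choice of ground metric. Supervised ground metric learning approaches exist but require labeled data. In the absence of labels, only ad hoc ground metrics remain. Unsupervised ground metric learning is thus a fundamental problem for enabling data-driven applications of OT. In this paper, we propose for the first time a canonical answer by simultaneously computing an OT distance between the samples and between the features of a dataset. These distance matrices emerge naturally as positive singular vectors of the function mapping ground metrics to OT distances. We provide criteria to ensure the existence and uniqueness of these singular vectors. We then introduce scalable computational methods to approximate them in high-dimensional settings, using stochastic approximation and entropic regularization. Finally, we showcase Wasserstein singular vectors on a single-cell RNA-sequencing dataset.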

Fast and Provably Convergent Algorithms for Gromov-Wasserstein in Graph Data

Jiajin Li , Jianheng Tang , Lemin Kong , Huikang Liu , Jia Li , Anthony Man-Cho So , Jose Blanchet

Category: Machine Learning

2022-05-17

In this paper, we study the design and analysis of a class of efficient algorithms for computing the Gromov-Wasserstein (GW) distance tailored to large-scale graph learning tasks. Armed with the Luo-Tseng error bound condition~\citep{luo1992error}, two proposed algorithms, called Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG) enjoy the convergence guarantees. Upon task-specific properties, our analysis further provides novel theoretical insights to guide how to select the best-fit method. As a result, we are able to provide comprehensive experiments to validate the effectiveness of our methods on a host of tasks, including graph alignment, graph partition, and shape matching. In terms of both wall-clock time and modeling performance, the proposed methods achieve state-of-the-art results.

Factored couplings in multi-marginal optimal transport via difference of convex programming

Quang Huy Tran , Hicham Janati , Ievgen Redko , Rémi Flamary , Nicolas Courty

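Category: Machine Learning (Statistics) | Machine Learning

2021-10-01

Optimal transport (OT) theory underlies many emerging machine learning (ML) methods that now solve a wide range of tasks such as generative modeling, transfer learning and information retrieval. These works, however, usually build upon the traditional OT setup with two distributions, leaving the more general multi-marginal OT formulation somewhat unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference of convex (DC) programming problem, which allows us to solve it numerically. Despite the high computational cost of the latter procedure, the solutions provided by DC optimization are usually as good in quality as those obtained using currently employed optimization schemes.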

Reversible Gromov-Monge Sampler for Simulation-Based Inference

YoonHaeng Hur , Wenxuan Guo , Tengyuan Liang

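Category: Machine Learning | Machine Learning (Statistics)

2021-09-28

This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical datasets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. We study the analytic properties of the RGM distance and derive that, under mild conditions, RGM equals the classic Gromov-Wasserstein distance. Curiously, drawing a connection to Brenier's polar factorization, we show that the RGM sampler induces a bias towards strong isomorphism with proper choices of $c_{\mathcal{X}}$ and $c_{\mathcal{Y}}$. Statistical rates of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.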
