Smart Paper Notes

Small Transformers Compute Universal Metric Embeddings

Anastasis Kratsios , Valentin Debarnot , Ivan Dokmanić

Categories: Machine Learning | Neural and Evolutionary Computing | Machine Learning (Statistics)

2022-09-14

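We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux 2020). We derive guarantees for feature maps implemented by small neural networks called \emph{probabilistic transformers}. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about $n\log(n)$ and width about $n^2$ can bi-H\"{o}lder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees, which trade off the amount of distortion against the probability that a randomly chosen pair of points embeds with that distortion. If the geometry of $\mathcal{X}$ is sufficiently regular, we obtain stronger bi-Lipschitz guarantees for all points in the dataset. As applications, we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs.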
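
A minimal Python sketch of the transport metric on univariate Gaussian mixtures, assuming (for simplicity, not as in the paper) that both mixtures have the same number of equally weighted components, so that an optimal coupling of components can be taken to be a permutation. It uses the standard closed form $W_2^2 = (m_1 - m_2)^2 + (s_1 - s_2)^2$ between two univariate Gaussians and measures the distortion of a toy embedding. All function names are hypothetical.

```python
import itertools
import math


def w2_gauss_1d(m1, s1, m2, s2):
    """Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def mw2_equal_weights(mix_a, mix_b):
    """Transport distance between two equally weighted univariate Gaussian mixtures.

    mix_a, mix_b: lists of (mean, std) pairs with the same length K.
    With uniform weights and equal K, some optimal coupling of the components
    is a permutation (Birkhoff-von Neumann), so brute force is exact for small K.
    """
    if len(mix_a) != len(mix_b):
        raise ValueError("this sketch assumes the same number of components")
    k = len(mix_a)
    best = min(
        sum(w2_gauss_1d(*mix_a[i], *mix_b[perm[i]]) ** 2 for i in range(k))
        for perm in itertools.permutations(range(k))
    )
    return math.sqrt(best / k)


def distortion(points, embed, d_in, d_out):
    """max/min over distinct pairs of d_out(embed(x), embed(y)) / d_in(x, y).

    A value of 1 means the embedding is an isometry up to a global scale.
    """
    ratios = [
        d_out(embed(x), embed(y)) / d_in(x, y)
        for x, y in itertools.combinations(points, 2)
    ]
    return max(ratios) / min(ratios)


# Components listed in a different order describe the same mixture.
print(mw2_equal_weights([(0.0, 1.0), (5.0, 1.0)], [(5.0, 1.0), (0.0, 1.0)]))  # 0.0

# A toy embedding of points on the real line into two-component mixtures.
toy_embed = lambda x: [(x, 1.0), (x + 1.0, 1.0)]
print(distortion([0.0, 0.5, 1.0, 2.0], toy_embed, lambda x, y: abs(x - y), mw2_equal_weights))
```

Brute force over permutations costs $K!$ evaluations, so it only suits a handful of components; a general implementation would solve the discrete optimal transport problem between the component weights instead.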

Universal Regular Conditional Distributions

Anastasis Kratsios

Categories: Machine Learning | Neural and Evolutionary Computing | Machine Learning (Statistics)

2021-05-17

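We introduce a deep learning model that can universally approximate regular conditional distributions (RCDs). The proposed model operates in three phases: first, it linearizes inputs from a given metric space $\mathcal{X}$ to $\mathbb{R}^d$ via a feature map; then these linearized features are processed by a deep feedforward neural network; finally, the network's outputs are transformed to the $1$-Wasserstein space $\mathcal{P}_1(\mathbb{R}^d)$ via a probabilistic extension of the attention mechanism introduced by Bahdanau et al. (2014). We find that the models built using our framework can approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^d)$ uniformly on compact sets. We identify two methods for avoiding the curse of dimensionality when approximating $\mathcal{P}_1(\mathbb{R}^d)$-valued functions. The first strategy describes functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^d))$ that can be efficiently approximated on any compact subset of $\mathbb{R}^d$. The second approach describes compact subsets of $\mathbb{R}^d$ on which most functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^d))$ can be efficiently approximated. The results are verified experimentally.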
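
A minimal sketch of the final stage, under the simplifying assumption (ours) that the probabilistic attention layer turns a score vector $z$ into the discrete measure $\sum_i \mathrm{softmax}(z)_i \delta_{y_i}$ over fixed atoms $y_i$, here on the real line ($d = 1$). On $\mathbb{R}$, the $1$-Wasserstein distance between two such measures equals the integral of the absolute difference of their CDFs. The helper names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # shift by the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def probabilistic_attention(scores, atoms):
    """Hypothetical layer: scores -> sum_i softmax(scores)_i * delta_{atoms[i]}.

    Returns the measure as a list of (atom, weight) pairs on the real line.
    """
    return list(zip(atoms, softmax(scores)))


def w1_discrete_1d(mu, nu):
    """W1 between discrete measures on R, using W1 = integral of |F_mu - F_nu|."""
    support = sorted({x for x, _ in mu} | {x for x, _ in nu})
    total = 0.0
    for left, right in zip(support, support[1:]):
        # Both CDFs are constant on [left, right).
        f_mu = sum(w for x, w in mu if x <= left)
        f_nu = sum(w for x, w in nu if x <= left)
        total += abs(f_mu - f_nu) * (right - left)
    return total


# Equal scores put mass 1/2 on each atom; its W1 distance to a point mass at 1.0 is 1.0.
mu = probabilistic_attention([0.0, 0.0], [0.0, 2.0])
print(w1_discrete_1d(mu, [(1.0, 1.0)]))  # 1.0
```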

Designing Universal Causal Deep Learning Models: The Geometric (Hyper)Transformer

Beatrice Acciaio , Anastasis Kratsios , Gudmund Pammer

Categories: Machine Learning | Neural and Evolutionary Computing

2022-01-31

Several problems in stochastic analysis are defined through their geometry, and preserving that geometric structure is essential to generating meaningful predictions. Nevertheless, how to design principled deep learning (DL) models capable of encoding these geometric structures remains largely unknown. We address this open problem by introducing a universal causal geometric DL framework in which the user specifies a suitable pair of geometries $\mathscr{X}$ and $\mathscr{Y}$ and our framework returns a DL model capable of causally approximating any ``regular'' map sending time series in $\mathscr{X}^{\mathbb{Z}}$ to time series in $\mathscr{Y}^{\mathbb{Z}}$ while respecting their forward flow of information throughout time. Suitable geometries on $\mathscr{Y}$ include various (adapted) Wasserstein spaces arising in optimal stopping problems, a variety of statistical manifolds describing the conditional distribution of continuous-time finite state Markov chains, and all Fr\'echet spaces admitting a Schauder basis, e.g. as in classical finance. Suitable choices of $\mathscr{X}$ include any compact subset of any Euclidean space. Our results all quantitatively express the number of parameters needed for our DL model to achieve a given approximation error as a function of the target map's regularity and the geometric structure both of $\mathscr{X}$ and of $\mathscr{Y}$. Even when omitting any temporal structure, our universal approximation theorems are the first guarantees that H\"older functions defined between such $\mathscr{X}$ and $\mathscr{Y}$ can be approximated by DL models.

Universal Approximation Theorems for Differentiable Geometric Deep Learning

Anastasis Kratsios , Leonie Papon

Categories: Machine Learning

2021-01-13

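This paper addresses the growing need to process non-Euclidean data by introducing a geometric deep learning (GDL) framework for building universal feedforward-type models compatible with differentiable manifold geometries. We show that our GDL models can approximate any continuous target function uniformly on compact sets of a controlled maximum diameter. We obtain curvature-dependent lower bounds on this maximum diameter and upper bounds on the depth of our approximating GDL models. Conversely, we find that between any two non-degenerate compact manifolds there is always a continuous function that no ``locally defined'' GDL model can uniformly approximate. Our last main result identifies data-dependent conditions guaranteeing that the GDL model implementing our approximation breaks ``the curse of dimensionality''. We find that any ``real-world'' (i.e. finite) dataset always satisfies our condition and, conversely, that any dataset satisfies our requirement if the target function is smooth. As applications, we confirm the universal approximation capabilities of the following GDL models: the hyperbolic feedforward networks of Ganea et al. (2018), the architecture implementing the deep Kalman filter of Krishnan et al. (2015), and deep softmax classifiers. We build universal extensions/variants of the SPD-matrix regressor of Meyer et al. (2011) and the Procrustean regressor of Fletcher (2003). In the Euclidean setting, our results imply a quantitative version of the approximation theorem of Kidger and Lyons (2020) and a data-dependent version of the uncursed approximation rates of Yarotsky and Zhevnerchuk (2019).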
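
A minimal sketch of a feedforward-type model between manifolds, assuming the construction reads inputs through a Riemannian logarithm, applies a Euclidean network in the tangent space, and maps back with the exponential map. Both manifolds here are the unit sphere $S^2 \subset \mathbb{R}^3$ with its standard exp/log maps; `tangent_net` is a hypothetical placeholder for a trained network, and the model is only defined away from the antipode of the input base point.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sphere_exp(p, v):
    """Exponential map of the unit sphere at p, applied to a tangent vector v."""
    n = math.sqrt(dot(v, v))
    if n < 1e-15:
        return list(p)
    return [math.cos(n) * a + math.sin(n) * b / n for a, b in zip(p, v)]


def sphere_log(p, q):
    """Logarithm map of the unit sphere at p (q must not be antipodal to p)."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    w = [b - c * a for a, b in zip(p, q)]  # component of q orthogonal to p
    nw = math.sqrt(dot(w, w))
    if nw < 1e-15:
        return [0.0] * len(p)
    return [theta * b / nw for b in w]


def tangent_net(v):
    """Hypothetical stand-in for a trained feedforward network on T_north S^2."""
    h1 = max(0.0, 0.5 * v[0] - 0.2 * v[1])
    h2 = max(0.0, 0.3 * v[0] + 0.9 * v[1])
    return [h1, h2, 0.0]  # zero third coordinate keeps the output tangent at the north pole


def gdl_model(x, p_in, p_out, net):
    """Exp_{p_out} o net o Log_{p_in}: a sphere-to-sphere model."""
    return sphere_exp(p_out, net(sphere_log(p_in, x)))


north = [0.0, 0.0, 1.0]
y = gdl_model([0.6, 0.0, 0.8], north, north, tangent_net)
print(y, math.sqrt(dot(y, y)))  # the output stays on the unit sphere
```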

Do ReLU Networks Have An Edge When Approximating Compactly-Supported Functions?

Anastasis Kratsios , Behnoosh Zamanlooy

Categories: Machine Learning | Artificial Intelligence | Neural and Evolutionary Computing

2022-04-24

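We study the problem of approximating compactly supported integrable functions while implementing their support set using feedforward neural networks. Our first main result transcribes this ``structured'' approximation problem into a universality problem. We do this by constructing a refinement of the usual topology on the space $L^1_{\operatorname{loc}}(\mathbb{R}^d,\mathbb{R}^d)$ of locally integrable functions, in which compactly supported functions can only be approximated in $L^1$-norm by functions with matching discretized support. We establish the universality of ReLU feedforward networks with bilinear pooling layers in this refined topology. Consequently, we find that ReLU feedforward networks with bilinear pooling can approximate compactly supported functions while implementing their discretized support. We derive a quantitative, uniform version of our universal approximation theorem on the dense subclass of compactly supported Lipschitz functions. This quantitative result expresses the depth, width, and number of bilinear pooling layers required to construct this ReLU network in terms of the target function's regularity, the measure and diameter of its essential support, and the dimensions of the input and output spaces. Conversely, we show that polynomial regressors and analytic feedforward networks are not universal in this space.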
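
One elementary way ReLU networks implement compact support exactly: the combination $\mathrm{ReLU}(x+1) - 2\,\mathrm{ReLU}(x) + \mathrm{ReLU}(x-1)$ equals $1 - |x|$ on $[-1, 1]$ and vanishes elsewhere. The sketch multiplies two such bumps, using a plain product as a stand-in for a bilinear pooling layer (an assumption for illustration, not the paper's layer definition), which gives a function on $\mathbb{R}^2$ supported exactly on $[-1, 1]^2$.

```python
def relu(t):
    return max(0.0, t)


def hat(x):
    """One-hidden-layer ReLU combination equal to 1 - |x| on [-1, 1] and 0 elsewhere."""
    return relu(x + 1.0) - 2.0 * relu(x) + relu(x - 1.0)


def bilinear_pool(a, b):
    """Product of two scalar feature outputs: a minimal stand-in for bilinear pooling."""
    return a * b


def bump_2d(x, y):
    """Supported exactly on the square [-1, 1]^2."""
    return bilinear_pool(hat(x), hat(y))


for point in [(0.0, 0.0), (0.5, 0.5), (1.5, 0.0), (0.3, -2.0)]:
    print(point, bump_2d(*point))
```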

Learning Sub-Patterns in Piecewise Continuous Functions

Anastasis Kratsios , Behnoosh Zamanlooy

Categories: Neural and Evolutionary Computing | Machine Learning | Machine Learning (Statistics)

2020-10-29

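Most stochastic gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters; however, this implies that the neural network's activation function must exhibit a degree of continuity, which limits the uniform approximation capacity of the neural network model to continuous functions. This paper focuses on the case where the discontinuities arise from distinct sub-patterns, each defined on a different part of the input space. We propose a new discontinuous deep neural network model, trained via a decoupled two-step procedure that avoids passing gradient updates through the network's only, strategically placed, discontinuous unit. We provide approximation guarantees for our architecture in the space of piecewise continuous functions, which we introduce herein. We present a novel semi-supervised two-step training procedure tailored to the structure of our model, and we provide theoretical support for its effectiveness. The performance of our model, trained with the proposed procedure, is evaluated experimentally on both real-world financial datasets and synthetic datasets.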

Clustering in Hilbert simplex geometry

Frank Nielsen , Ke Sun

Categories: Machine Learning | Computer Vision

2017-04-03

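Clustering categorical distributions in the finite-dimensional probability simplex is a fundamental task in many applications dealing with normalized histograms. Traditionally, the differential-geometric structures of the probability simplex have been used either by (i) setting the Riemannian metric tensor to the Fisher information matrix of the categorical distributions, or (ii) defining the dualistic information-geometric structure induced by a smooth dissimilarity measure, the Kullback-Leibler divergence. In this work, we introduce for clustering tasks a novel, computationally friendly framework for modeling the probability simplex geometrically: {\em Hilbert simplex geometry}. In Hilbert simplex geometry, the distance is the non-separable Hilbert metric distance, which satisfies the property of information monotonicity, with distance level set functions described by polytope boundaries. We show that the Aitchison and Hilbert simplex distances are norm distances on normalized logarithmic representations with respect to the $\ell_2$ and variation norms, respectively. We discuss the pros and cons of these different statistical modelings, and benchmark these different geometries experimentally for center-based $k$-means and $k$-center clustering. Furthermore, since a canonical Hilbert distance can be defined on any bounded convex subset of Euclidean space, we also consider the Hilbert geometry of the elliptope of correlation matrices and study its clustering performance compared to the Frobenius and log-det divergences.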
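
The norm characterization in the abstract can be written out directly: with $u = \log p - \log q$ taken coordinate-wise, the Hilbert simplex distance is the variation norm $\max_i u_i - \min_i u_i$, and the Aitchison distance is the $\ell_2$ norm of $u$ after centering. The sketch pairs the Hilbert distance with farthest-first traversal, a generic $k$-center heuristic chosen here for illustration; the paper's clustering procedures may differ, and the function names are ours.

```python
import math


def log_ratio(p, q):
    return [math.log(a) - math.log(b) for a, b in zip(p, q)]


def hilbert_simplex_distance(p, q):
    """Hilbert metric on the open simplex: variation norm max(u) - min(u), u = log p - log q."""
    u = log_ratio(p, q)
    return max(u) - min(u)


def aitchison_distance(p, q):
    """Aitchison distance: l2 norm of the centered log-ratio difference."""
    u = log_ratio(p, q)
    mean = sum(u) / len(u)
    return math.sqrt(sum((v - mean) ** 2 for v in u))


def k_center_greedy(points, k, dist):
    """Farthest-first traversal (Gonzalez): a 2-approximation for k-center in any metric."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda x: min(dist(x, c) for c in centers))
        centers.append(farthest)
    return centers


histograms = [
    (0.80, 0.10, 0.10), (0.70, 0.20, 0.10),  # mass near the first vertex
    (0.10, 0.80, 0.10), (0.15, 0.70, 0.15),  # mass near the second vertex
]
print(k_center_greedy(histograms, 2, hilbert_simplex_distance))
```

Because the Hilbert distance depends only on the extreme log-ratios, it is cheap to evaluate and unchanged if $p$ or $q$ is rescaled, which is one way to read the "computationally friendly" claim.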

Controlling Wasserstein distances by Kernel norms with application to Compressive Statistical Learning

Titouan Vayer , Rémi Gribonval

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-01

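Comparing probability distributions is at the crux of many machine learning algorithms. Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between probability measures that have attracted abundant attention in past years. This paper establishes some conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by the compressive statistical learning (CSL) theory, a general framework for resource-efficient large-scale learning in which the training data is summarized in a single vector (called a sketch) that captures the information relevant to the considered learning task. Inspired by existing results in CSL, we introduce the H\"older Lower Restricted Isometric Property (H\"older LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is, when some task-specific metric between probability distributions can be bounded by a Wasserstein distance.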
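
To make the notion of a sketch concrete, a common choice in compressive learning is the dataset average of random Fourier features. Assuming frequencies $\omega \sim \mathcal{N}(0, \sigma^{-2})$ (our choice for illustration), the squared distance between two such sketches is a Monte Carlo estimate of the squared MMD for the Gaussian kernel $k(x, y) = \exp(-(x - y)^2 / (2\sigma^2))$. The code below, for 1-D data with hypothetical helper names, computes both quantities.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd2_biased(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD with a Gaussian kernel."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def sketch(xs, freqs):
    """Dataset average of random Fourier features (one cos and one sin entry per frequency)."""
    scale = 1.0 / (len(xs) * math.sqrt(len(freqs)))
    out = []
    for w in freqs:
        out.append(scale * sum(math.cos(w * x) for x in xs))
        out.append(scale * sum(math.sin(w * x) for x in xs))
    return out


def sketch_distance2(xs, ys, freqs):
    """Squared Euclidean distance between the two sketches."""
    sx, sy = sketch(xs, freqs), sketch(ys, freqs)
    return sum((a - b) ** 2 for a, b in zip(sx, sy))


rng = random.Random(0)
sigma = 1.0
freqs = [rng.gauss(0.0, 1.0 / sigma) for _ in range(500)]  # omega ~ N(0, sigma^-2)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [rng.gauss(1.0, 1.0) for _ in range(200)]
print(mmd2_biased(xs, ys, sigma), sketch_distance2(xs, ys, freqs))
```

Only the sketches need to be stored: each dataset is compressed to $2m$ numbers for $m$ frequencies, independent of the number of samples.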

Identifying the latent space geometry of network models through analysis of curvature

Shane Lubold , Arun G. Chandrasekhar , Tyler H. McCormick

Categories: Machine Learning (Statistics)

2020-12-19

A common approach to modeling networks assigns each node to a position on a low-dimensional manifold where distance is inversely proportional to connection likelihood. More positive manifold curvature encourages more and tighter communities; negative curvature induces repulsion. We consistently estimate manifold type, dimension, and curvature from simply connected, complete Riemannian manifolds of constant curvature. We represent the graph as a noisy distance matrix based on the ties between cliques, then develop hypothesis tests to determine whether the observed distances could plausibly be embedded isometrically in each of the candidate geometries. We apply our approach to data-sets from economics and neuroscience.
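
For the Euclidean candidate geometry there is a classical noiseless check (Schoenberg's criterion): a distance matrix embeds isometrically in some Euclidean space if and only if the matrix $G_{ij} = (D_{0i}^2 + D_{0j}^2 - D_{ij}^2)/2$, built relative to point $0$, is positive semidefinite. This sketch is only that deterministic test, not the authors' hypothesis tests, and it does not address the spherical or hyperbolic candidates.

```python
def gram_from_distances(d):
    """Gram matrix of points 1..n-1 with point 0 as origin (Schoenberg's construction)."""
    n = len(d)
    return [[(d[0][i] ** 2 + d[0][j] ** 2 - d[i][j] ** 2) / 2.0 for j in range(1, n)]
            for i in range(1, n)]


def is_psd(a, tol=1e-9):
    """Positive semidefiniteness test via symmetric Gaussian elimination (LDL^T, no pivoting)."""
    a = [row[:] for row in a]
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot < -tol:
            return False
        if pivot <= tol:
            # In a PSD matrix a zero diagonal entry forces its whole row to vanish.
            if any(abs(a[k][j]) > tol for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return True


def is_euclidean(d, tol=1e-9):
    """True if the distance matrix d embeds isometrically in some Euclidean space (up to tol)."""
    return is_psd(gram_from_distances(d), tol)


line = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # three collinear points
star = [[0, 1, 1, 1], [1, 0, 2, 2], [1, 2, 0, 2], [1, 2, 2, 0]]  # hub plus three leaves
print(is_euclidean(line), is_euclidean(star))  # True False
```

The star is a tree metric: each pair of leaves would need the hub as its midpoint, which no three distinct points in Euclidean space allow, so it is a natural candidate for a negatively curved geometry instead.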

On the Whitney extension problem for near isometries and beyond

Steven B. Damelin

Categories: Computer Vision | Machine Learning

2021-03-17

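In this memoir, we develop a general framework that allows the simultaneous study of labeled and unlabeled near-alignment data problems in $\mathbb{R}^D$ and of the Whitney near-isometry extension problem for discrete and non-discrete subsets of $\mathbb{R}^D$ with certain geometries. In addition, we survey related work of ours on clustering, dimension reduction, manifold learning, and vision, as well as on minimal energy partitions, discrepancy, and min-max optimization. Numerous open problems in harmonic analysis, computer vision, manifold learning, and signal processing connected to our work are given. A significant portion of the work in this memoir is based on joint research with Charles Fefferman in the papers [48], [49], [50], [51].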

Instance-Dependent Generalization Bounds via Optimal Transport

Songyan Hou , Parnian Kassraie , Anastasis Kratsios , Jonas Rothfuss , Andreas Krause

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-02

Existing generalization bounds fail to explain crucial factors that drive generalization of modern neural networks. Since such bounds often hold uniformly over all parameters, they suffer from over-parametrization, and fail to account for the strong inductive bias of initialization and stochastic gradient descent. As an alternative, we propose a novel optimal transport interpretation of the generalization problem. This allows us to derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. Therefore, our bounds are agnostic to the parametrization of the model and work well when the number of training samples is much smaller than the number of parameters. With small modifications, our approach yields accelerated rates for data on low-dimensional manifolds, and guarantees under distribution shifts. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
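
The optimal-transport view rests on Kantorovich-Rubinstein duality: for any $L$-Lipschitz $g$, $|\mathbb{E}_\mu g - \mathbb{E}_\nu g| \le L \, W_1(\mu, \nu)$. The sketch checks this inequality for two equal-size empirical measures on $\mathbb{R}$, where $W_1$ is the mean absolute difference of the sorted samples. The test function and the samples are arbitrary; the paper's bounds use local, data-dependent Lipschitz regularity rather than this global constant.

```python
import math
import random


def w1_empirical_1d(xs, ys):
    """W1 between two equal-size empirical measures on R (sorted matching is optimal)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean_gap(g, xs, ys):
    """|E_mu g - E_nu g| for the two empirical measures."""
    return abs(sum(map(g, xs)) / len(xs) - sum(map(g, ys)) / len(ys))


rng = random.Random(1)
train = [rng.gauss(0.0, 1.0) for _ in range(100)]
test = [rng.gauss(0.3, 1.2) for _ in range(100)]
g = math.sin  # 1-Lipschitz stand-in for a loss composed with a predictor
lhs = mean_gap(g, train, test)
rhs = 1.0 * w1_empirical_1d(train, test)
print(lhs <= rhs)  # True, as guaranteed by Kantorovich-Rubinstein duality
```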

Large sample spectral analysis of graph-based multi-manifold clustering

Nicolas Garcia Trillos , Pengfei He , Chenghui Li

Categories: Machine Learning | Machine Learning (Statistics)

2021-07-28

In this work we study statistical properties of graph-based algorithms for multi-manifold clustering (MMC). In MMC the goal is to retrieve the multi-manifold structure underlying a given Euclidean data set when this data set is assumed to be obtained by sampling a distribution on a union of manifolds $\mathcal{M} = \mathcal{M}_1 \cup\dots \cup \mathcal{M}_N$ that may intersect with each other and that may have different dimensions. We investigate sufficient conditions that similarity graphs on data sets must satisfy in order for their corresponding graph Laplacians to capture the right geometric information to solve the MMC problem. Precisely, we provide high probability error bounds for the spectral approximation of a tensorized Laplacian on $\mathcal{M}$ with a suitable graph Laplacian built from the observations; the recovered tensorized Laplacian contains all geometric information of all the individual underlying manifolds. We provide an example of a family of similarity graphs, which we call annular proximity graphs with angle constraints, satisfying these sufficient conditions. We contrast our family of graphs with other constructions in the literature based on the alignment of tangent planes. Extensive numerical experiments expand the insights that our theory provides on the MMC problem.
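
Why graph Laplacians carry cluster information, in the simplest case: for the unnormalized Laplacian $L = D - W$, the indicator vector of each connected component of the graph lies in the kernel of $L$. The sketch builds a plain $\varepsilon$-proximity graph (a simpler stand-in, not the annular proximity graphs with angle constraints studied in the paper) on two separated point sets and checks this property. The point sets and $\varepsilon$ are illustrative.

```python
import math


def epsilon_graph(points, eps):
    """Symmetric 0/1 adjacency matrix: connect distinct points closer than eps."""
    n = len(points)
    return [[1.0 if i != j and math.dist(points[i], points[j]) < eps else 0.0
             for j in range(n)] for i in range(n)]


def laplacian(w):
    """Unnormalized graph Laplacian L = D - W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def components(w):
    """Connected components by iterative depth-first search."""
    n, seen, comps = len(w), set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in range(n):
                if w[u][v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


# Two separated "manifolds": a short segment and a small circle centred at (5, 0).
seg = [(0.1 * k, 0.0) for k in range(10)]
circ = [(5 + math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12)) for k in range(12)]
pts = seg + circ
W = epsilon_graph(pts, eps=0.6)
L = laplacian(W)
comps = components(W)
print(sorted(len(c) for c in comps))
```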

Optimal transport map estimation in general function spaces

Vincent Divol , Jonathan Niles-Weed , Aram-Alexandre Pooladian

Categories: Machine Learning (Statistics)

2022-12-07

We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a H\"older class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincar\'e inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and H\"older transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.
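
In one dimension the optimal map is explicit, $T = F_Q^{-1} \circ F_P$; it is nondecreasing and hence the derivative of a convex function, the 1-D case of the gradient-of-convex structure assumed in the abstract. A textbook plug-in estimator with $P = \mathcal{N}(0, 1)$ known and $Q$ observed only through samples replaces $F_Q^{-1}$ by the empirical quantile function. This baseline is shown for illustration only; it is not the estimator analyzed in the paper.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(sorted_samples, u):
    """Generalized inverse of the empirical CDF: smallest sample x with F_n(x) >= u."""
    n = len(sorted_samples)
    idx = min(n - 1, max(0, math.ceil(u * n) - 1))
    return sorted_samples[idx]


def plug_in_ot_map(q_samples):
    """Estimate T = F_Q^{-1} o F_P for the source P = N(0, 1) from samples of Q."""
    source = NormalDist(0.0, 1.0)
    ordered = sorted(q_samples)
    return lambda x: empirical_quantile(ordered, source.cdf(x))


rng = random.Random(2)
q = [rng.gauss(2.0, 0.5) for _ in range(2000)]
t_hat = plug_in_ot_map(q)
# For Q = N(2, 0.5^2) the true map is T(x) = 2 + 0.5 x.
for x in (-1.0, 0.0, 1.0):
    print(x, round(t_hat(x), 3), 2.0 + 0.5 * x)
```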

Linear Classifiers in Product Space Forms

Puoya Tabaghi , Chao Pan , Eli Chien , Jianhao Peng , Olgica Milenkovic

Categories: Machine Learning | Machine Learning (Statistics)

2021-02-19

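Embedding methods for product spaces are powerful techniques for low-distortion and low-dimensional representation of complex data structures. Here, we address the new problem of linear classification in product space forms, i.e., products of Euclidean, spherical, and hyperbolic spaces. First, we describe novel formulations of linear classifiers on a Riemannian manifold using geodesics and Riemannian metrics, which generalize straight lines and inner products in vector spaces. Second, we prove that linear classifiers in $d$-dimensional space forms of any curvature have the same expressive power, i.e., they can shatter exactly $d+1$ points. Third, we formalize linear classifiers in product space forms, describe the first known perceptron and support vector machine classifiers for such spaces, and establish rigorous convergence results for perceptrons. Moreover, we prove that the Vapnik-Chervonenkis dimension of linear classifiers in a product space form of dimension $d$ is \emph{at least} $d+1$. We support our theoretical findings with simulations on several datasets, including synthetic data, image data, and single-cell RNA sequencing (scRNA-seq) data. The results show that classification of scRNA-seq data in low-dimensional product space forms offers a $\sim 15\%$ performance improvement compared to classification in Euclidean spaces of the same dimension.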

The Separation Capacity of Random Neural Networks

Sjoerd Dirksen , Martin Genzel , Laurent Jacques , Alexander Stollenwerk

Categories: Machine Learning

2021-07-31

Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. In the present article, we enhance the theoretical understanding of random neural networks by addressing the following data separation problem: under what conditions can a random neural network make two classes $\mathcal{X}^-, \mathcal{X}^+ \subset \mathbb{R}^d$ (with positive distance) linearly separable? We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. Crucially, the number of required neurons is explicitly linked to geometric properties of the underlying sets $\mathcal{X}^-, \mathcal{X}^+$ and their mutual arrangement. This instance-specific viewpoint allows us to overcome the usual curse of dimensionality (exponential width of the layers) in non-pathological situations where the data carries low-complexity structure. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity (based on a localized version of Gaussian mean width), which leads to sound and informative separation guarantees. We connect our result with related lines of work on approximation, memorization, and generalization.
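
A minimal sketch of the setting: map two classes through one random ReLU layer with standard Gaussian weights and uniform biases, then look for a linear separator of the features with the classical perceptron. The bias range $[-\lambda, \lambda]$ with $\lambda = 2$, the width, and the two concentric rings (not linearly separable in $\mathbb{R}^2$) are assumptions for illustration; whether the perceptron reaches zero mistakes depends on these choices and on the random draw.

```python
import math
import random


def random_relu_layer(width, dim, lam, rng):
    """Random ReLU features with N(0, 1) weights and biases uniform on [-lam, lam]."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    biases = [rng.uniform(-lam, lam) for _ in range(width)]

    def features(x):
        return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
                for row, b in zip(weights, biases)]

    return features


def perceptron(samples, labels, epochs=100):
    """Classical perceptron; returns (weights, bias, mistakes made in the last epoch)."""
    w, b = [0.0] * len(samples[0]), 0.0
    mistakes = 0
    for _ in range(epochs):
        mistakes = 0
        for z, y in zip(samples, labels):
            if y * (sum(wi * zi for wi, zi in zip(w, z)) + b) <= 0:
                w = [wi + y * zi for wi, zi in zip(w, z)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break  # every sample is strictly on the correct side
    return w, b, mistakes


rng = random.Random(0)
angles = [2 * math.pi * k / 30 for k in range(30)]
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]  # class -1
outer = [(1.5 * math.cos(t), 1.5 * math.sin(t)) for t in angles]  # class +1
phi = random_relu_layer(width=200, dim=2, lam=2.0, rng=rng)
feats = [phi(x) for x in inner + outer]
labels = [-1] * 30 + [1] * 30
w, b, mistakes = perceptron(feats, labels)
print("perceptron mistakes in final epoch:", mistakes)
```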

Matrix factorisation and the interpretation of geodesic distance

Nick Whiteley , Annie Gray , Patrick Rubin-Delanchy

Categories: Machine Learning (Statistics) | Machine Learning

2021-06-02

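Given a graph or similarity matrix, we consider the problem of recovering a notion of true distance between the nodes, and so their true positions. We show that this can be accomplished in two steps: matrix factorisation, followed by nonlinear dimension reduction. This combination is effective because the point cloud obtained in the first step lives on a manifold in which latent distance is encoded as geodesic distance. Hence, a nonlinear dimension reduction tool that approximates geodesic distance can recover the latent positions, up to a simple transformation. We give a detailed account of the case where spectral embedding is followed by Isomap, and provide encouraging experimental evidence for other combinations of techniques.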
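
The geodesic-distance step of Isomap can be sketched with a $k$-nearest-neighbour graph and all-pairs shortest paths. Points here are sampled on a unit semicircle, so the graph distance between its endpoints should approach the arc length $\pi$ rather than the chord length $2$. The spectral-embedding stage and Isomap's final multidimensional scaling step are omitted, and all parameters are illustrative.

```python
import math


def knn_graph(points, k):
    """Symmetric k-NN graph with Euclidean edge weights (math.inf means no edge)."""
    n = len(points)
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
        nearest = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))[1:k + 1]
        for j in nearest:
            d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def floyd_warshall(d):
    """All-pairs shortest-path lengths, used as estimates of geodesic distances."""
    n = len(d)
    d = [row[:] for row in d]
    for m in range(n):
        for i in range(n):
            dim = d[i][m]
            if dim == math.inf:
                continue
            for j in range(n):
                if dim + d[m][j] < d[i][j]:
                    d[i][j] = dim + d[m][j]
    return d


n = 50
arc = [(math.cos(math.pi * t / (n - 1)), math.sin(math.pi * t / (n - 1))) for t in range(n)]
geo = floyd_warshall(knn_graph(arc, k=2))
print(round(geo[0][-1], 3), round(math.dist(arc[0], arc[-1]), 3))
```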

Tangent Space and Dimension Estimation with the Wasserstein Distance

Uzu Lim , Harald Oberhauser , Vidit Nanda

Categories: Machine Learning

2021-10-12

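We provide explicit bounds on the number of sample points required to estimate tangent spaces and intrinsic dimensions of (smooth, compact) Euclidean submanifolds via local principal component analysis. Our approach directly estimates local covariance matrices, which simultaneously allows estimating both the tangent spaces and the intrinsic dimension of a manifold. The key arguments involve a matrix concentration inequality, a Wasserstein bound for flattening a manifold, and a Lipschitz relation for the covariance matrix with respect to the Wasserstein distance.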
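
A minimal sketch of local PCA at a single base point: sample the unit sphere near the north pole, form the local covariance matrix, count the eigenvalues above a fraction of the largest (the ratio $0.1$ is our arbitrary threshold) as the intrinsic dimension, and take the eigenvector of the smallest eigenvalue as the normal direction. The $3 \times 3$ eigendecomposition uses the classical cyclic Jacobi method so the example has no dependencies.

```python
import math
import random


def jacobi_eigh(a, sweeps=50, tol=1e-20):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column i of V is the eigenvector for eigenvalue i.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def local_covariance(points):
    n, dim = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(dim)] for i in range(dim)]


def local_pca(points, ratio=0.1):
    """Return (estimated intrinsic dimension, eigenvalues, eigenvector matrix)."""
    vals, vecs = jacobi_eigh(local_covariance(points))
    dim = sum(1 for lam in vals if lam > ratio * max(vals))
    return dim, vals, vecs


rng = random.Random(3)
r = 0.3  # neighbourhood radius around the north pole (0, 0, 1)
patch = []
while len(patch) < 200:
    x, y = rng.uniform(-r, r), rng.uniform(-r, r)
    if x * x + y * y <= r * r:
        patch.append((x, y, math.sqrt(1.0 - x * x - y * y)))
dim, vals, vecs = local_pca(patch)
normal = [vecs[k][vals.index(min(vals))] for k in range(3)]
print(dim, [round(v, 5) for v in sorted(vals)], normal)
```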

Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power

Binghui Li , Jikai Jin , Han Zhong , John E. Hopcroft , Liwei Wang

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-05-27

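It is well known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be brought close to zero by some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of the expressive power of deep neural networks. Specifically, for binary classification data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension $d$. Even if the data is linearly separable, which means that achieving low clean generalization error is easy, we can still prove an $\exp(\Omega(d))$ lower bound for robust generalization. More generally, our exponential lower bounds also hold for a variety of neural network families and other function classes, as long as their VC dimension is at most polynomial in the number of parameters. Moreover, we establish an improved upper bound of $\exp(\mathcal{O}(k))$ on the network size needed to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also have a lower bound that grows exponentially with respect to $k$: the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network sizes needed to achieve low robust training and low robust generalization error, our results reveal that the hardness of robust generalization may stem from the expressive power of practical models.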

Geometric Scattering on Measure Spaces

Joyce Chew , Matthew Hirn , Smita Krishnaswamy , Deanna Needell , Michael Perlmutter , Holly Steach , Siddharth Viswanath , Hau-Tieng Wu

Categories: Machine Learning (Statistics) | Machine Learning

2022-08-17

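The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to datasets with non-Euclidean structure, such as graphs and manifolds, leading to the emerging field of geometric deep learning. To improve our understanding of the architectures used in this new field, several papers have proposed generalizations of the scattering transform to non-Euclidean data structures such as undirected graphs and compact Riemannian manifolds. In this paper, we introduce a general, unified model for geometric scattering on measure spaces. Our proposed framework includes previous work on geometric scattering as special cases but also applies to more general settings such as directed graphs, signed graphs, and manifolds with boundary. We propose a new criterion that identifies to which groups a useful representation should be invariant, and show that this criterion is sufficient to guarantee that the scattering transform has desirable stability and invariance properties. Additionally, we consider finite measure spaces obtained by randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold. Moreover, we use a diffusion-maps based approach to prove quantitative estimates of the rate of convergence of one of these approximations as the number of sample points tends to infinity. Finally, we showcase the utility of our method on spherical images, directed graphs, and high-dimensional single-cell data.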
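
For the special case of an undirected graph, one standard geometric-scattering construction uses the lazy random walk $P = \tfrac{1}{2}(I + W D^{-1})$ and the diffusion wavelets $\Psi_0 = I - P$, $\Psi_j = P^{2^{j-1}} - P^{2^j}$. The sketch computes first-order moments $\sum_v |\Psi_j x(v)|$ on a cycle graph and checks the telescoping identity $\sum_{j=0}^{J} \Psi_j x + P^{2^J} x = x$. It covers only this graph case, not the measure-space framework of the paper.

```python
def lazy_walk(adj, x):
    """Apply P = (I + W D^{-1}) / 2 to a signal x on an unweighted undirected graph."""
    deg = [len(nbrs) for nbrs in adj]
    return [0.5 * x[i] + 0.5 * sum(x[j] / deg[j] for j in adj[i]) for i in range(len(adj))]


def walk_power(adj, x, steps):
    """Apply P repeatedly: returns P^steps x."""
    for _ in range(steps):
        x = lazy_walk(adj, x)
    return x


def diffusion_wavelets(adj, x, J):
    """Return ([Psi_0 x, ..., Psi_J x], P^{2^J} x)."""
    prev = lazy_walk(adj, x)  # P^{2^0} x
    waves = [[a - b for a, b in zip(x, prev)]]  # Psi_0 x = x - P x
    for j in range(1, J + 1):
        cur = walk_power(adj, prev, 2 ** (j - 1))  # P^{2^j} x = P^{2^{j-1}} (P^{2^{j-1}} x)
        waves.append([a - b for a, b in zip(prev, cur)])  # Psi_j x
        prev = cur
    return waves, prev


def first_order_moments(adj, x, J, q=1):
    """Scattering moments sum_v |Psi_j x(v)|^q for j = 0..J."""
    waves, _ = diffusion_wavelets(adj, x, J)
    return [sum(abs(v) ** q for v in w) for w in waves]


n = 16
cycle = [[(i - 1) % n, (i + 1) % n] for i in range(n)]  # adjacency lists of a cycle graph
signal = [1.0 if i < 4 else 0.0 for i in range(n)]
print(first_order_moments(cycle, signal, J=3))
```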

Learning Low Bending and Low Distortion Manifold Embeddings: Theory and Applications

Juliane Braunsmann , Marko Rajković , Martin Rumpf , Benedikt Wirth

Categories: Computer Vision | Machine Learning

2022-08-22

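Autoencoders, which consist of an encoder and a decoder, are widely used in machine learning for dimension reduction of high-dimensional data. The encoder embeds the input data manifold into a lower-dimensional latent space, while the decoder represents the inverse map, providing a parametrization of the data manifold by the manifold in latent space. Good regularity and structure of the embedded manifold can substantially simplify further data processing tasks such as cluster analysis or data interpolation. We propose and analyze a novel regularization for learning the encoder component of an autoencoder: a loss functional that prefers isometric, extrinsically flat embeddings and allows the encoder to be trained on its own. To perform the training, it is assumed that for pairs of nearby points on the input manifold their local Riemannian distance and their local Riemannian average can be evaluated. The loss functional is computed via Monte Carlo integration with different sampling strategies for pairs of points on the input manifold. Our main theorem identifies a geometric loss functional of the embedding map as the $\Gamma$-limit of the sampling-dependent loss functionals. Numerical tests, using image data that encodes different explicitly given data manifolds, show that smooth manifold embeddings into latent space are obtained. Due to the promotion of extrinsic flatness, these embeddings are regular enough that linear interpolation in latent space can serve as one possible post-processing step.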
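
A minimal sketch of a sampled loss in this spirit: for random nearby pairs $(x, y)$ on the input manifold with Riemannian distance $d$ and Riemannian midpoint $m$, penalize $(\|\phi(x) - \phi(y)\| - d)^2 / d^2$ for isometry and $\|\phi(m) - (\phi(x) + \phi(y))/2\|^2 / d^4$ for extrinsic bending. These weights and normalizations are our assumptions, not the paper's functional. The input manifold is an arc of the unit circle parametrized by angle, so $d$ and $m$ are explicit.

```python
import math
import random


def sampled_loss(embed, n_pairs=2000, max_gap=0.1, seed=0):
    """Monte Carlo estimate of an isometry + bending penalty on an arc of the unit circle.

    Points are angles in [0, pi]; the Riemannian distance between angles a < b
    is b - a and their Riemannian midpoint is (a + b) / 2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = rng.uniform(0.0, math.pi - max_gap)
        b = a + rng.uniform(0.01, max_gap)  # nearby pair, never closer than 0.01
        d, m = b - a, 0.5 * (a + b)
        ea, eb, em = embed(a), embed(b), embed(m)
        isometry = (math.dist(ea, eb) - d) ** 2 / d ** 2
        chord_mid = [(u + v) / 2.0 for u, v in zip(ea, eb)]
        bending = sum((u - v) ** 2 for u, v in zip(em, chord_mid)) / d ** 4
        total += isometry + bending
    return total / n_pairs


flat = lambda t: (t,)                          # unrolls the arc onto a line: isometric and flat
curved = lambda t: (math.cos(t), math.sin(t))  # the arc's own embedding in R^2: bent
print(sampled_loss(flat), sampled_loss(curved))
```

Unrolling the arc onto a line drives both terms to zero, while the arc's own curved embedding pays a bending penalty of roughly $1/64$ per pair, which reflects the preference for isometric, extrinsically flat embeddings described above.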