Smart Paper Notes

From Kernel Methods to Neural Networks: A Unifying Variational Formulation

Michael Unser

Category: Machine Learning

2022-06-29

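The minimization of a data-fidelity term and an additive regularization functional gives rise to a powerful framework for supervised learning. In this paper, we present a unifying regularization functional that depends on an operator and on a generic Radon-domain norm. We establish the existence of a minimizer and give the parametric form of the solution(s) under very mild assumptions. When the norm is Hilbertian, the proposed formulation yields a solution that involves radial-basis functions and is compatible with the classical methods of machine learning. By contrast, for the total-variation norm, the solution takes the form of a two-layer neural network with an activation function that is determined by the regularization operator. In particular, we retrieve the popular ReLU networks by letting the operator be the Laplacian. We also characterize the solution for the intermediate regularization norms $\|\cdot\| = \|\cdot\|_{L_p}$ with $p \in (1,2]$. Our framework offers guarantees of universal approximation for a broad family of regularization operators or, equivalently, for a wide variety of shallow neural networks, including the cases (such as ReLU) where the activation function is increasing polynomially. It also explains the favorable role of bias and skip connections in neural architectures.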
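
As a rough illustration of the kind of model the abstract above refers to, the sketch below evaluates a two-layer ReLU network with biases and an affine skip connection. The exact parametric form is an assumption made here for illustration (the abstract does not spell it out), and the names `shallow_relu_net`, `W`, `b`, `v`, `c`, and `c0` are hypothetical.

```python
import numpy as np

def relu(t):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(t, 0.0)

def shallow_relu_net(x, W, b, v, c, c0):
    """Assumed form: f(x) = sum_k v[k] * relu(W[k] @ x - b[k]) + c @ x + c0.

    W: (K, d) input weights, b: (K,) biases, v: (K,) output weights,
    c: (d,) and c0: scalar form the affine skip connection.
    """
    x = np.atleast_2d(x)            # shape (n, d)
    hidden = relu(x @ W.T - b)      # shape (n, K)
    return hidden @ v + x @ c + c0  # shape (n,)

# Small hand-checkable example with K = 2 neurons in dimension d = 2.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0])
v = np.array([1.0, -2.0])
c = np.array([0.5, 0.5])
c0 = 1.0
X = np.array([[1.0, 2.0], [-1.0, -1.0]])
out = shallow_relu_net(X, W, b, v, c, c0)
```

For the second input both neurons are inactive, so the output comes entirely from the affine skip term.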

translated by Google Translate

Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition

Prem Talwai , David Simchi-Levi

Category: Machine Learning (Statistics)

2022-04-16

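We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions. Our analysis does not require the regression function to be contained in the hypothesis class and, most notably, does not employ the conventional \textit{a priori} assumptions. Using the theory of interpolation, we demonstrate that the spectrum of the Mercer operator can be inferred in the presence of "tight" $L^{\infty}$ embeddings of suitable Hilbert scales. Our analysis utilizes a new Fourier capacity condition, which characterizes the optimal Lorentz range space of a modified Mercer operator in certain parameter regimes.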


Signature moments to characterize laws of stochastic processes

Ilya Chevyrev , Harald Oberhauser

Category: Machine Learning (Statistics)

2018-10-25

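The sequence of moments of a vector-valued random variable can characterize its law. We study the analogous problem for path-valued random variables, that is, stochastic processes, by using so-called robust signature moments. This allows us to derive a metric of maximum mean discrepancy type for laws of stochastic processes and to study the topology it induces on the space of laws of stochastic processes. This metric can be kernelized using the signature kernel, which allows it to be computed efficiently. As an application, we provide a non-parametric two-sample hypothesis test for laws of stochastic processes.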


Breaking the Curse of Dimensionality with Convex Neural Networks

Francis Bach

Category:

2014-12-30

We consider neural networks with a single hidden layer and non-decreasing positively homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, they lead to a convex optimization problem and we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace. Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations. However, solving this convex optimization problem in infinite dimensions is only possible if the non-convex subproblem of addition of a new unit can be solved efficiently. We provide a simple geometric interpretation for our choice of activation functions and describe simple conditions for convex relaxations of the finite-dimensional non-convex subproblem to achieve the same generalization error bounds, even when constant-factor approximations cannot be found. We were not able to find strong enough convex relaxations to obtain provably polynomial-time algorithms and leave open the existence or non-existence of such tractable algorithms with non-exponential sample complexities.


Integral representations of shallow neural network with Rectified Power Unit activation function

Ahmed Abdeljawad , Philipp Grohs

Category: Neural and Evolutionary Computing | Machine Learning

2021-12-20

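In this work, we derive a formula for the integral representation of a shallow neural network with the Rectified Power Unit (RePU) activation function. Our first result mainly concerns the univariate case of the representation capability of shallow RePU networks. The multidimensional result in this paper characterizes the set of functions that can be represented with bounded norm and possibly unbounded width.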


Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

Stephan Wojtowytsch

Category: Machine Learning (Statistics) | Machine Learning

2022-09-02

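In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite-neuron, infinite-data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.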



Universal Approximation Theorems for Differentiable Geometric Deep Learning

Anastasis Kratsios , Leonie Papon

Category: Machine Learning

2021-01-13

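This paper addresses the need to process non-Euclidean data by introducing a geometric deep learning (GDL) framework for building universal feedforward-type models compatible with differentiable manifold geometries. We show that our GDL models can approximate any continuous target function uniformly on compact sets of a controlled maximum diameter. We obtain curvature-dependent lower bounds on this maximum diameter and upper bounds on the depth of our approximating GDL models. Conversely, we find that there is always a continuous function between any two non-degenerate compact manifolds that any "locally defined" GDL model cannot uniformly approximate. Our last main result identifies data-dependent conditions guaranteeing that the GDL model implementing our approximation breaks "the curse of dimensionality." We find that any "real-world" (i.e., finite) dataset always satisfies our condition and, conversely, any dataset satisfies our requirement if the target function is smooth. As applications, we confirm the universal approximation capabilities of the following GDL models: the hyperbolic feedforward networks of Ganea et al. (2018), the architecture implementing the deep Kalman filter of Krishnan et al. (2015), and deep softmax classifiers. We build universal extensions/variants of the SPD-matrix regressor of Meyer et al. (2011) and the Procrustean regressor of Fletcher (2003). In the Euclidean setting, our results imply a quantitative version of the approximation theorem of Kidger and Lyons (2020) and a data-dependent version of the uncursed approximation rates of Yarotsky and Zhevnerchuk (2019).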


Tighter Sparse Approximation Bounds for ReLU Neural Networks

Carles Domingo-Enrich , Youssef Mroueh

Category: Machine Learning (Statistics) | Machine Learning

2021-10-07

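A well-known line of work (Barron, 1993; Breiman, 1993; Klusowski & Barron, 2018) provides bounds on the width $n$ of a ReLU two-layer neural network needed to approximate a function $f$ over the ball $\mathcal{B}_R(\mathbb{R}^d)$ up to error $\epsilon$, when the Fourier-based quantity $C_f = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \|\xi\|^2 |\hat{f}(\xi)| \, d\xi$ is finite. More recently, Ongie et al. (2019) used the Radon transform as a tool for the analysis of infinite-width ReLU two-layer networks. In particular, they introduce the concept of Radon-based $\mathcal{R}$-norms and show that a function defined on $\mathbb{R}^d$ can be represented as an infinite-width two-layer neural network if and only if its $\mathcal{R}$-norm is finite. In this work, we extend the framework of Ongie et al. (2019) and define similar Radon-based semi-norms ($\mathcal{R}, \mathcal{U}$-norms) such that a function admits an infinite-width neural network representation on a bounded open set $\mathcal{U} \subseteq \mathbb{R}^d$ when its $\mathcal{R}, \mathcal{U}$-norm is finite. Building on this, we derive sparse (finite-width) neural network approximation bounds that refine those of Breiman (1993) and Klusowski & Barron (2018). Finally, we show that infinite-width neural network representations on bounded open sets are not unique and study their structure, providing a functional view of mode connectivity.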


Targeted Separation and Convergence with Kernel Discrepancies

Alessandro Barp , Carl-Johann Simon-Gabriel , Mark Girolami , Lester Mackey

Category: Machine Learning (Statistics) | Machine Learning

2022-09-26

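Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article, we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner-embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.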


Sparsest Univariate Learning Models Under Lipschitz Constraint

Shayan Aziznejad , Thomas Debarre , Michael Unser

Category: Machine Learning

2021-12-27

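Besides the minimization of the prediction error, two of the most desirable properties of a regression scheme are stability and interpretability. Driven by these principles, we propose continuous-domain formulations for one-dimensional regression problems. In our first approach, we use the Lipschitz constant as a regularizer, which results in an implicit tuning of the overall robustness of the learned mapping. In our second approach, we control the Lipschitz constant explicitly using a user-defined upper bound and make use of a sparsity-promoting regularizer to favor simpler (and, hence, more interpretable) solutions. The theoretical study of the latter formulation is motivated in part by its proven equivalence with the training of a Lipschitz-constrained two-layer univariate neural network with rectified linear unit (ReLU) activations and weight decay. By proving representer theorems, we show that both problems admit global minimizers that are continuous and piecewise-linear (CPWL) functions. Moreover, we propose efficient algorithms that find the sparsest solution of each problem: the CPWL mapping with the least number of linear regions. Finally, we illustrate numerically the outcome of our formulations.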
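
To make the link between univariate ReLU networks and CPWL functions concrete, the following minimal sketch computes the knots, the slope on each linear region, and the Lipschitz constant (the largest absolute slope) of a small two-layer network. It is not the authors' algorithm; the helper `cpwl_summary` and the parameter names `w`, `b`, `v`, `a` are hypothetical, and the code assumes nonzero input weights and distinct knots.

```python
import numpy as np

def cpwl_summary(w, b, v, a=0.0):
    """Knots, per-region slopes and Lipschitz constant of
    f(x) = sum_k v[k] * relu(w[k] * x + b[k]) + a * x.

    Assumes every w[k] is nonzero and all knots are distinct.
    """
    w, b, v = (np.asarray(t, dtype=float) for t in (w, b, v))
    knots = np.sort(-b / w)  # unit k switches on/off at x = -b[k] / w[k]
    # One probe point strictly inside each linear region.
    probes = np.concatenate(([knots[0] - 1.0],
                             (knots[:-1] + knots[1:]) / 2.0,
                             [knots[-1] + 1.0]))
    # Slope in a region: skip slope plus v[k] * w[k] summed over active units.
    slopes = np.array([a + np.sum(v * w * (w * x + b > 0)) for x in probes])
    # For a CPWL function the Lipschitz constant is the largest |slope|.
    return knots, slopes, float(np.max(np.abs(slopes)))

knots, slopes, lipschitz = cpwl_summary(w=[1.0, 1.0, -1.0],
                                        b=[0.0, -1.0, 2.0],
                                        v=[1.0, -2.0, 0.5])
```

With three hidden units the function has at most four linear regions, so `len(slopes)` gives an upper bound on the region count that the sparsity-promoting formulation aims to keep small.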


Controlling Wasserstein distances by Kernel norms with application to Compressive Statistical Learning

Titouan Vayer , Rémi Gribonval

Category: Machine Learning (Statistics) | Machine Learning

2021-12-01

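Comparing probability distributions is at the crux of many machine learning algorithms. Maximum Mean Discrepancies (MMD) and Optimal Transport (OT) distances are two classes of distances between probability measures that have attracted abundant attention in past years. This paper establishes some conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by the compressive statistical learning (CSL) theory, a general framework for resource-efficient large-scale learning in which the training data is summarized in a single vector (called a sketch) that captures the information relevant to the considered learning task. Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property (Hölder LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is, when some task-specific metric between probability distributions can be bounded by a Wasserstein distance.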
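
For readers who want to see the two distances side by side, here is a minimal sketch that estimates a squared MMD and a 1-Wasserstein distance between two one-dimensional samples. The Gaussian kernel, the bandwidth, the restriction to equal-size 1D samples, and the function names are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def gaussian_gram(x, y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) of a Gaussian kernel on 1D samples."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD."""
    return (gaussian_gram(x, x, bandwidth).mean()
            + gaussian_gram(y, y, bandwidth).mean()
            - 2.0 * gaussian_gram(x, y, bandwidth).mean())

def wasserstein1_1d(x, y):
    """W1 between two equal-size 1D empirical measures:
    mean absolute difference of the sorted samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

# Deterministic example: y is x shifted by one unit.
x = np.linspace(-1.0, 1.0, 50)
y = x + 1.0
mmd2_same = mmd2_biased(x, x)
mmd2_shift = mmd2_biased(x, y)
w1_shift = wasserstein1_1d(x, y)
```

A pure shift by one unit gives a 1-Wasserstein distance of one, while the MMD value depends on the chosen kernel and bandwidth.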


The universal approximation theorem for complex-valued neural networks

Felix Voigtlaender

Category: Machine Learning | Machine Learning (Statistics)

2020-12-06

We generalize the classical universal approximation theorem for neural networks to the case of complex-valued neural networks. Precisely, we consider feedforward networks with a complex activation function $\sigma : \mathbb{C} \to \mathbb{C}$ in which each neuron performs the operation $\mathbb{C}^N \to \mathbb{C}, z \mapsto \sigma(b + w^T z)$ with weights $w \in \mathbb{C}^N$ and a bias $b \in \mathbb{C}$, and with $\sigma$ applied componentwise. We completely characterize those activation functions $\sigma$ for which the associated complex networks have the universal approximation property, meaning that they can uniformly approximate any continuous function on any compact subset of $\mathbb{C}^d$ arbitrarily well. Unlike the classical case of real networks, the set of "good activation functions" which give rise to networks with the universal approximation property differs significantly depending on whether one considers deep networks or shallow networks: For deep networks with at least two hidden layers, the universal approximation property holds as long as $\sigma$ is neither a polynomial, a holomorphic function, nor an antiholomorphic function. Shallow networks, on the other hand, are universal if and only if the real part or the imaginary part of $\sigma$ is not a polyharmonic function.


Interpolation with the polynomial kernels

Giacomo Elefante , Wolfgang Erb , Francesco Marchetti , Emma Perracchione , Davide Poggiali , Gabriele Santin

Category: Machine Learning

2022-12-15

The polynomial kernels are widely used in machine learning and they are one of the default choices to develop kernel-based classification and regression models. However, they are rarely used and considered in numerical analysis due to their lack of strict positive definiteness. In particular they do not enjoy the usual property of unisolvency for arbitrary point sets, which is one of the key properties used to build kernel-based interpolation methods. This paper is devoted to establishing some initial results for the study of these kernels, and their related interpolation algorithms, in the context of approximation theory. We will first prove necessary and sufficient conditions on point sets which guarantee the existence and uniqueness of an interpolant. We will then study the Reproducing Kernel Hilbert Spaces (or native spaces) of these kernels and their norms, and provide inclusion relations between spaces corresponding to different kernel parameters. With these spaces at hand, it will be further possible to derive generic error estimates which apply to sufficiently smooth functions, thus escaping the native space. Finally, we will show how to apply an efficient stable algorithm to these kernels to obtain accurate interpolants, and we will test them in some numerical experiments. After this analysis several computational and theoretical aspects remain open, and we will outline possible further research directions in a concluding section. This work builds some bridges between kernel and polynomial interpolation, two topics to which the authors, to different extents, have been introduced under the supervision or through the work of Stefano De Marchi. For this reason, they wish to dedicate this work to him on the occasion of his 60th birthday.
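
The lack of strict positive definiteness mentioned above is easy to observe numerically. The sketch below is an illustration only, not the paper's algorithm: it uses the one-dimensional kernel $k(x, y) = (1 + xy)^2$, whose feature space is spanned by $1$, $x$, and $x^2$, so its Gram matrix has rank at most 3 no matter how many points are used. The function name `polynomial_kernel` and its parameters are hypothetical.

```python
import numpy as np

def polynomial_kernel(x, y, degree=2, a=1.0):
    """Gram matrix of k(x, y) = (a + x * y) ** degree for 1D point sets."""
    return (a + np.outer(x, y)) ** degree

# Three distinct points: the Gram matrix is invertible (unisolvent set).
x3 = np.array([-1.0, 0.0, 1.0])
rank3 = np.linalg.matrix_rank(polynomial_kernel(x3, x3), tol=1e-8)

# Four distinct points: one more point than the feature space has
# dimensions, so the Gram matrix is singular and interpolation is not unique.
x4 = np.array([-1.0, 0.0, 1.0, 2.0])
rank4 = np.linalg.matrix_rank(polynomial_kernel(x4, x4), tol=1e-8)
```

Both ranks equal 3: the three-point system can be solved uniquely, whereas the four-point Gram matrix is rank-deficient, which is the kind of point-set condition the paper studies.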


Characteristic kernels on Hilbert spaces, Banach spaces, and on sets of measures

Johanna Ziegel , David Ginsbourger , Lutz Dümbgen

Category: Machine Learning (Statistics) | Machine Learning

2022-06-15

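We present new classes of positive definite kernels on non-standard spaces that are integrally strictly positive definite or characteristic. In particular, we discuss radial kernels on separable Hilbert spaces, and introduce broad classes of kernels on Banach spaces and on metric spaces of strong negative type. The general results are used to give explicit classes of kernels on separable $L^p$ spaces and on sets of measures.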


On the Stability Properties and the Optimization Landscape of Training Problems with Squared Loss for Neural Networks and General Nonlinear Conic Approximation Schemes

Constantin Christof

Category: Machine Learning

2020-11-06

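We study the optimization landscape and the stability properties of training problems with squared loss for neural networks and general nonlinear conic approximation schemes. It is demonstrated that, if a nonlinear conic approximation scheme is considered that is (in an appropriately defined sense) more expressive than a classical linear approximation approach and if there exist unrealizable label vectors, then a training problem with squared loss is necessarily unstable in the sense that its solution set depends discontinuously on the label vector in the training data. We further prove that the same effects that are responsible for these instability properties are also the reason for the emergence of saddle points and spurious local minima, which may be arbitrarily far away from global solutions, and that neither the instability of the training problem nor the existence of spurious local minima can, in general, be overcome by adding a regularization term to the objective function that penalizes the size of the parameters in the approximation scheme. The latter results are shown to be true regardless of whether the assumption of realizability is satisfied or not. We demonstrate that our analysis applies in particular to training problems for free-knot interpolation schemes and for deep and shallow neural networks with variable widths that involve an arbitrary mixture of various activation functions (e.g., binary, sigmoid, tanh, arctan, soft-sign, ISRU, soft-clip, SQNL, ReLU, leaky ReLU, soft-plus, bent identity, SILU, ISRLU, and ELU). In summary, the findings of this paper illustrate that the improved approximation properties of neural networks and general nonlinear conic approximation instruments are linked in a direct and quantifiable way to undesirable properties of the optimization problems that have to be solved in order to train them.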


Ensemble forecasts in reproducing kernel Hilbert space family: dynamical systems in Wonderland

Bérenger Hug , Etienne Memin , Gilles Tissot

Category: Machine Learning

2022-07-29

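A methodological framework for ensemble-based estimation and simulation of high-dimensional dynamical systems, such as oceanic or atmospheric flows, is proposed. To that end, the dynamical system is embedded in a family of reproducing kernel Hilbert spaces with kernel functions driven by the dynamics. This family is nicknamed Wonderland for its appealing properties. In Wonderland, the Koopman and Perron-Frobenius operators are unitary and uniformly continuous. This property guarantees that they can be expressed as exponential series of diagonalizable infinitesimal generators. Lyapunov exponents and exact ensemble-based expressions of the tangent linear dynamics are directly available as well. Wonderland enables us to devise strikingly simple ensemble data assimilation methods in terms of constant-in-time linear combinations of trajectory samples. Such an embarrassingly simple strategy is made possible by a fully justified superposition principle that follows from several fundamental theorems.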


Sharp Bounds on the Approximation Rates, Metric Entropy, and $n$-widths of Shallow Neural Networks

Jonathan W. Siegel , Jinchao Xu

Category: Machine Learning (Statistics) | Machine Learning

2021-01-29

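In this article, we study the approximation properties of the variation spaces corresponding to shallow neural networks with a variety of activation functions. We introduce two main tools for estimating the metric entropy, approximation rates, and $n$-widths of these spaces. First, we introduce the notion of a smoothly parameterized dictionary and give upper bounds on the non-linear approximation rates, metric entropy, and $n$-widths. The upper bounds depend on the order of smoothness of the parameterization. This result is applied to dictionaries of ridge functions corresponding to shallow neural networks, and it improves upon existing results in many cases. Next, we provide a method for lower bounding the metric entropy and $n$-widths of variation spaces which contain certain classes of ridge functions. This result gives sharp lower bounds on the $L^2$-approximation rates, metric entropy, and $n$-widths for variation spaces associated with sigmoidal activation functions with bounded variation.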


Handling Hard Affine SDP Shape Constraints in RKHSs

Pierre-Cyril Aubin-Frankowski , Zoltan Szabo

Category: Machine Learning (Statistics) | Machine Learning

2021-01-05

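Shape constraints, such as non-negativity, monotonicity, convexity, or supermodularity, play a key role in various applications of machine learning and statistics. However, incorporating this side information into predictive models in a hard way (for example, at all points of an interval) is a notoriously challenging problem. We propose a unified and modular convex optimization framework, relying on second-order cone (SOC) tightening, to encode hard affine SDP constraints on function derivatives for models belonging to vector-valued reproducing kernel Hilbert spaces (vRKHSs). The modular nature of the proposed approach makes it possible to handle multiple shape constraints simultaneously and to tighten an infinite number of constraints into finitely many. We prove the convergence of the proposed scheme and that of its adaptive variant, leveraging geometric properties of vRKHSs. Due to the covering-based construction of the tightening, the method is particularly well suited to tasks with small to moderate input dimensions. The efficiency of the approach is illustrated in the context of shape optimization, robotics, and econometrics.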


Deep learning architectures for nonlinear operator functions and nonlinear inverse problems

Maarten V. de Hoop , Matti Lassas , Christopher A. Wong

Category: Machine Learning

2019-12-23

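We develop a theoretical analysis for special neural network architectures, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors, and thus they do not effectively capture the multiplicative structure associated with the linear operators that correspond to the data in such inverse problems. We therefore introduce a new family that resembles a standard neural network architecture, but where the input data acts multiplicatively on vectors. Motivated by compact operators appearing in boundary control and in the analysis of inverse boundary value problems for the wave equation, we promote structure and sparsity in selected weight matrices in the network. After describing this architecture, we study its representation properties as well as its approximation properties. We furthermore show that an explicit regularization can be introduced that can be derived from the mathematical analysis of the mentioned inverse problems, and which leads to certain guarantees on the generalization properties. We observe that the sparsity of the weight matrices improves the generalization estimates. Lastly, we discuss how operator recurrent networks can be viewed as a deep learning analogue to deterministic algorithms such as boundary control for reconstructing the unknown wavespeed in the acoustic wave equation from boundary measurements.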


Optimal transport map estimation in general function spaces

Vincent Divol , Jonathan Niles-Weed , Aram-Alexandre Pooladian

Category: Machine Learning (Statistics)

2022-12-07

We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a Hölder class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincaré inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and Hölder transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.
