Smart Paper Notes

Multivariate Trend Filtering for Lattice Data

Veeranjaneyulu Sadhanala, Yu-Xiang Wang, Addison J. Hu, Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-29

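We study a multivariate version of trend filtering, called Kronecker trend filtering or KTF, for the case in which the design points form a lattice in $d$ dimensions. KTF is a natural extension of univariate trend filtering (Steidl et al., 2006; Kim et al., 2009; Tibshirani, 2014), and is defined by minimizing a penalized least squares problem whose penalty term sums the absolute (higher-order) differences of the parameter to be estimated along each of the coordinate directions. The corresponding penalty operator can be written in terms of Kronecker products of univariate trend filtering penalty operators, hence the name Kronecker trend filtering. Equivalently, one can view KTF in terms of an $\ell_1$-penalized basis regression problem where the basis functions are tensor products of falling factorial functions, a piecewise polynomial (discrete spline) basis that underlies univariate trend filtering. This paper is a unification and extension of the results in Sadhanala et al. (2016, 2017). We develop a complete set of theoretical results that describe the behavior of $k^{\mathrm{th}}$ order Kronecker trend filtering in $d$ dimensions, for every $k \geq 0$ and $d \geq 1$. This reveals a number of interesting phenomena, including the dominance of KTF over linear smoothers in estimating heterogeneously smooth functions, and a phase transition at $d = 2(k+1)$, a boundary past which (on the high dimension-to-smoothness side) linear smoothers fail to be consistent entirely. We also leverage recent results on discrete splines from Tibshirani (2020), in particular discrete spline interpolation results that enable us to extend the KTF estimate to any off-lattice location in constant time (independent of the size of the lattice $n$).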
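The Kronecker-product structure of the penalty operator mentioned in this abstract can be illustrated with a small sketch. The code below is not taken from the paper: it assumes a square two-dimensional lattice with `n_side` points per side, parameters flattened in row-major order, and hypothetical helper names (`difference_operator`, `ktf_penalty_2d`). It stacks the order-$(k+1)$ difference operator applied along each coordinate direction.

```python
import numpy as np


def difference_operator(n_points, order):
    """Discrete difference operator of the given order.

    Returns a matrix of shape (n_points - order, n_points) obtained by
    repeatedly differencing the rows of the identity matrix.
    """
    D = np.eye(n_points)
    for _ in range(order):
        D = np.diff(D, axis=0)
    return D


def ktf_penalty_2d(n_side, k):
    """Illustrative KTF-style penalty matrix on an n_side x n_side lattice.

    Assumption: the lattice values are flattened in row-major order, so
    kron(D, I) differences along the first axis and kron(I, D) along the
    second axis.
    """
    D = difference_operator(n_side, k + 1)
    I = np.eye(n_side)
    return np.vstack([np.kron(D, I), np.kron(I, D)])


def ktf_penalty_value(theta_grid, k):
    """Sum of absolute (k+1)-th order differences along both directions."""
    n_side = theta_grid.shape[0]
    P = ktf_penalty_2d(n_side, k)
    return np.abs(P @ theta_grid.ravel()).sum()
```

Under these assumptions, a lattice function that is linear in each coordinate has zero penalty for $k = 1$, which matches the intuition that the penalty only charges for changes in the higher-order differences.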

translated by Google Translate

The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu, Alden Green, Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.


Minimax Optimal Regression over Sobolev Spaces via Laplacian Eigenmaps on Neighborhood Graphs

Alden Green, Sivaraman Balakrishnan, Ryan J. Tibshirani

Categories: Machine Learning (Statistics)

2021-11-14

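In this paper we study the statistical properties of Principal Components Regression with Laplacian Eigenmaps (PCR-LE), a method for nonparametric regression based on Laplacian Eigenmaps (LE). PCR-LE works by projecting a vector of observed responses ${\bf y} = (y_1, \ldots, y_n)$ onto a subspace spanned by certain eigenvectors of a neighborhood graph Laplacian. We show that PCR-LE achieves minimax rates of convergence for random design regression over Sobolev spaces. Under sufficient smoothness conditions on the design density $p$, PCR-LE achieves the optimal rates for both estimation (where the optimal rate in squared $L^2$ norm is known to be $n^{-2s/(2s+d)}$) and goodness-of-fit testing ($n^{-4s/(4s+d)}$). We also show that PCR-LE is manifold adaptive: that is, we consider the situation where the design is supported on a manifold of small intrinsic dimension $m$, and give upper bounds establishing that PCR-LE achieves the faster minimax estimation ($n^{-2s/(2s+m)}$) and testing ($n^{-4s/(4s+m)}$) rates of convergence. Interestingly, these rates are almost always much faster than the known rates of convergence of graph Laplacian eigenvectors; in other words, for this problem, regression with estimated features appears to be much easier, statistically speaking, than estimating the features themselves. We support these theoretical results with empirical evidence.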
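The projection step described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes an unweighted neighborhood graph that connects points within a fixed `radius`, the unnormalized Laplacian, and projection onto the eigenvectors with the smallest eigenvalues. The function name `pcr_le` and its parameters are hypothetical.

```python
import numpy as np


def pcr_le(X, y, radius, n_eigvecs):
    """Project responses onto low-frequency graph Laplacian eigenvectors.

    X: (n, d) array of design points; y: (n,) array of responses.
    Assumptions (for illustration only): an unweighted radius-neighborhood
    graph and the unnormalized Laplacian L = D - W.
    """
    # Pairwise Euclidean distances between design points.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    # Adjacency matrix of the neighborhood graph (no self-loops).
    W = (dist <= radius).astype(float)
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian.
    L = np.diag(W.sum(axis=1)) - W

    # eigh returns eigenvalues in ascending order with orthonormal
    # eigenvectors, so the first columns are the smoothest on the graph.
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, :n_eigvecs]

    # Orthogonal projection of y onto span(V): fitted values at the design points.
    return V @ (V.T @ y)
```

In this sketch, `radius` and `n_eigvecs` play the role of tuning parameters: with a connected graph and a single eigenvector the fit collapses to the sample mean, and with all $n$ eigenvectors it reproduces the responses exactly.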


A Cross Validation framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond

Anamitra Chaudhuri, Sabyasachi Chatterjee

Categories: Machine Learning (Statistics)

2022-01-07

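This paper formulates a general cross validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analysis of cross validated versions of Trend Filtering or Dyadic CART existed. To illustrate the generality of the framework, we also propose and study cross validated versions of two fundamental estimators: the lasso for high dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods that use tuning parameters.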


MARS via LASSO

Dohyeong Ki, Billy Fang, Adityanand Guntuboyina

Categories: Machine Learning (Statistics)

2021-11-23

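MARS is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural LASSO variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation-based complexity constraint. We show that our estimator can be computed via finite-dimensional convex optimization and that it is naturally connected to nonparametric function estimation techniques based on smoothness constraints. Under a simple design assumption, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We implement our method with a cross-validation scheme for the selection of the involved tuning parameter and show that it has favorable performance compared to the usual MARS method in simulation and real data settings.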


On lower bounds for the bias-variance trade-off

Alexis Derumigny, Johannes Schmidt-Hieber

Categories: Machine Learning (Statistics)

2020-05-30

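It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows us to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we combine the general strategy for lower bounds with a reduction technique. This allows us to reduce the original problem to a lower bound on the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.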


Bless and curse of smoothness and phase transitions in nonparametric regressions: a nonasymptotic perspective

Ying Zhu

Categories: Machine Learning

2021-12-07

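When the regression function belongs to the standard smooth classes consisting of univariate functions with derivatives up to the $(\gamma+1)$th order bounded by a common constant everywhere or a.e., it is well known that the minimax optimal rate of convergence in mean squared error (MSE) is $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\gamma$ is finite and the sample size $n \rightarrow \infty$. From a nonasymptotic viewpoint that considers finite $n$, this paper shows that, for the standard Hölder and Sobolev classes, the minimax optimal rate is $\frac{\sigma^{2}\left(\gamma \vee 1\right)}{n}$ when $\frac{n}{\sigma^{2}} \precsim \left(\gamma \vee 1\right)^{2\gamma+3}$ and $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\frac{n}{\sigma^{2}} \succsim \left(\gamma \vee 1\right)^{2\gamma+3}$. To establish these results, we derive upper and lower bounds on the covering and packing numbers for the generalized Hölder class where the $k$th ($k = 0, \ldots, \gamma$) derivative is bounded from above by a parameter $r_{k}$ and the $\gamma$th derivative is $r_{\gamma+1}$-Lipschitz (and also for the generalized ellipsoid class of smooth functions). Our bounds sharpen the classical metric entropy results for the standard classes, and give the general dependence on $\gamma$ and $r_{k}$. By deriving the minimax optimal MSE rates under $r_{k} = 1$, $r_{k} \leq \left(k-1\right)!$ and $r_{k} = k!$ (with the latter two cases motivated in our introduction) with the help of our new entropy bounds, we show a couple of interesting results that cannot be shown with the existing entropy bounds in the literature. For the Hölder class of $d$-variate functions, our result suggests that the classical asymptotic rate $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+2+d}}$ could be an underestimate of the MSE in finite samples.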


The Lasso with general Gaussian designs with applications to hypothesis testing

Michael Celentano, Andrea Montanari, Yuting Wei

Categories: Machine Learning | Machine Learning (Statistics)

2020-07-27

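The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order as or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ the regularized risk is non-smooth; $(2)$ the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates; here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler "fixed-design" model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.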


The Projected Covariance Measure for assumption-lean variable significance testing

Anton Rask Lundborg, Ilmun Kim, Rajen D. Shah, Richard J. Samworth

Categories: Machine Learning (Statistics)

2022-11-03

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.


Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

Categories: Machine Learning (Statistics)

2022-09-26

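The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.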


Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

Categories: Machine Learning | Machine Learning (Statistics)

2021-12-23

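We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise (covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$) and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).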


Sharp Bounds on the Approximation Rates, Metric Entropy, and $n$-widths of Shallow Neural Networks

Jonathan W. Siegel, Jinchao Xu

Categories: Machine Learning (Statistics) | Machine Learning

2021-01-29

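In this article, we study approximation properties of the variation spaces corresponding to shallow neural networks with a variety of activation functions. We introduce two main tools for estimating the metric entropy, approximation rates, and $n$-widths of these spaces. First, we introduce the notion of a smoothly parameterized dictionary and give upper bounds on the nonlinear approximation rates, metric entropy, and $n$-widths. The upper bounds depend on the order of smoothness of the parameterization. This result is applied to dictionaries of ridge functions corresponding to shallow neural networks, and the resulting bounds improve upon existing results in many cases. Next, we provide a method for lower bounding the metric entropy and $n$-widths of variation spaces that contain certain classes of ridge functions. This result gives sharp lower bounds on the $L^2$-approximation rates, metric entropy, and $n$-widths for variation spaces corresponding to sigmoidal activation functions with bounded variation.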


Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp

Categories:

2009-09-22

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
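The two-stage idea in this abstract (random sampling to find a subspace, then deterministic work on the compressed matrix) can be sketched in a few lines of NumPy. This is a basic illustrative variant with a Gaussian test matrix and a small oversampling parameter, both of which are assumptions here; it does not use the structured random matrices needed for the O(mn log(k)) flop count, and the function name `randomized_svd` is hypothetical.

```python
import numpy as np


def randomized_svd(A, rank, oversample=5, seed=0):
    """Approximate rank-`rank` SVD of A via random range sampling.

    Stage A: sample the range of A with a Gaussian test matrix and
    orthonormalize the samples. Stage B: compress A to that subspace and
    take a small deterministic SVD.
    """
    rng = np.random.default_rng(seed)
    n_cols = A.shape[1]

    # Stage A: Y = A @ Omega captures most of the action of A.
    Omega = rng.standard_normal((n_cols, rank + oversample))
    Y = A @ Omega
    Q, _ = np.linalg.qr(Y)  # orthonormal basis for the sampled range

    # Stage B: compress A to the subspace and factor the small matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    # Keep only the leading `rank` components.
    return U[:, :rank], s[:rank], Vt[:rank, :]
```

For a matrix whose numerical rank is at most `rank`, this sketch recovers the factorization up to floating-point error; for matrices with slowly decaying singular values, accuracy depends on the oversampling and is better addressed by refinements such as power iterations.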


Error analysis for denoising smooth modulo signals on a graph

Hemant Tyagi

Categories: Machine Learning (Statistics)

2020-09-10

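In many applications, we are given access to noisy modulo samples of a smooth function, with the goal being to robustly unwrap the samples, i.e., to estimate the original samples of the function. In a recent work, Cucuringu and Tyagi proposed denoising the modulo samples by first representing them on the unit complex circle and then solving a smoothness-regularized least squares problem on the product manifold of unit circles, with smoothness measured with respect to the Laplacian of a suitable proximity graph $G$. This problem is a quadratically constrained quadratic program (QCQP), which is nonconvex; hence they proposed solving its sphere relaxation, leading to a trust region subproblem (TRS). In terms of theoretical guarantees, $\ell_2$ error bounds were derived for (TRS). These bounds are, however, weak in general and do not really demonstrate the denoising performed by (TRS). In this work, we analyze (TRS) as well as an unconstrained relaxation of (QCQP). For both these estimators, we provide a refined analysis in the setting of Gaussian noise and derive noise regimes where they provably denoise the modulo observations with respect to the $\ell_2$ norm. The analysis is performed in a general setting where $G$ is any connected graph.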


Is Monte Carlo a bad sampling strategy for learning smooth functions in high dimensions?

Ben Adcock, Simone Brugiapaglia

Categories: Machine Learning

2022-08-18

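This paper concerns the approximation of smooth, high-dimensional functions from limited samples using polynomials. This task lies at the heart of many applications in computational science and engineering, notably those arising from parametric modelling and uncertainty quantification. It is common to use Monte Carlo (MC) sampling in such applications, so as not to succumb to the curse of dimensionality. However, it is well known that this strategy is theoretically suboptimal. There are many polynomial spaces of dimension $n$ for which the sample complexity scales log-quadratically in $n$. This well-documented phenomenon has led to a concerted effort to design improved, in fact near-optimal, strategies whose sample complexities scale log-linearly, or even linearly, in $n$. Paradoxically, in this work we show that MC is actually a perfectly good strategy in high dimensions. We first document this phenomenon via several numerical examples. Next, we present a theoretical analysis that resolves this paradox for holomorphic functions of infinitely many variables. We show that there is a least-squares scheme based on $m$ MC samples whose error decays algebraically fast in $m/\log(m)$, with a rate that is the same as that of the best $n$-term polynomial approximation. This result is non-constructive, since it assumes knowledge of a suitable polynomial space in which to perform the approximation. We next present a compressed sensing-based scheme that achieves the same rate, except for a larger polylogarithmic factor. This scheme is practical, and numerically it performs as well as or better than well-known adaptive least-squares schemes. Overall, our findings demonstrate that MC sampling is eminently suitable for smooth function approximation when the dimension is sufficiently high. Hence the benefits of improved sampling strategies are generally limited to lower-dimensional settings.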


Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?

Kaiqi Zhang, Yu-Xiang Wang

Categories: Machine Learning | Machine Learning (Statistics)

2022-04-20

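We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness, a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0 < p < 1$) of the coefficient vector of an end-to-end learned function basis, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.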


Deep learning architectures for nonlinear operator functions and nonlinear inverse problems

Maarten V. de Hoop, Matti Lassas, Christopher A. Wong

Categories: Machine Learning

2019-12-23

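We develop a theoretical analysis for special neural network architectures, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors, and thus they do not effectively capture the multiplicative structure associated with the linear operators that correspond to the data in such inverse problems. We therefore introduce a new family that resembles a standard neural network architecture, but where the input data acts multiplicatively on vectors. Motivated by compact operators appearing in boundary control and the analysis of inverse boundary value problems for the wave equation, we promote structure and sparsity in selected weight matrices in the network. After describing this architecture, we study its representation properties as well as its approximation properties. We furthermore show that an explicit regularization can be introduced that can be derived from the mathematical analysis of the mentioned inverse problems, and which leads to certain guarantees on the generalization properties. We observe that the sparsity of the weight matrices improves the generalization estimates. Lastly, we discuss how operator recurrent networks can be viewed as a deep learning analogue to deterministic algorithms such as boundary control for reconstructing the unknown wavespeed in the acoustic wave equation from boundary measurements.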


Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates

Nicholas J. Irons, Meyer Scetbon, Soumik Pal, Zaid Harchaoui

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-31

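Triangular flows, also known as Knöthe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flow models such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical models. In particular, we establish the statistical consistency and the finite sample convergence rates of the Kullback-Leibler estimator of the Knöthe-Rosenblatt measure coupling using tools from empirical process theory. Our results highlight the anisotropic geometry of function classes at play in triangular flows, shed light on optimal coordinate ordering, and lead to statistical guarantees for Jacobian flows. We conduct numerical experiments on synthetic data to illustrate the practical implications of our theoretical findings.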


Nonparametric regression using deep neural networks with ReLU activation function

Johannes Schmidt-Hieber

Categories:

2017-08-22

Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential network parameters exceeding the sample size. The analysis gives some insights into why multilayer feedforward neural networks perform well in practice. Interestingly, for ReLU activation function the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural. It is also shown that under the composition assumption wavelet estimators can only achieve suboptimal rates.


High Dimensional Optimization through the Lens of Machine Learning

Felix Benning

Categories: Machine Learning (Statistics)

2021-12-31

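This thesis reviews numerical optimization methods with machine learning problems in mind. Since machine learning models are highly parametrized, we focus on methods suited for high dimensional optimization. We build intuition on quadratic models to figure out which methods are suited for non-convex optimization, and develop convergence proofs on convex functions for this selection of methods. With this theoretical foundation for stochastic gradient descent and momentum methods, we try to explain why the methods commonly used in the machine learning field are so successful. Besides explaining successful heuristics, the last chapter also provides a more extensive review of more theoretical methods, which are not quite as popular in practice. So in some sense this work attempts to answer the question: why are the default TensorFlow optimizers the defaults?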
