Smart Paper Notes

Online Estimation and Optimization of Utility-Based Shortfall Risk

Arvind S. Menon , Prashanth L. A. , Krishna Jagannathan

Categories: Machine Learning (Statistics) | Machine Learning

2021-11-16

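Utility-based shortfall risk (UBSR) is a risk metric that is increasingly popular in financial applications, owing to certain desirable properties that it enjoys. We consider the problem of estimating UBSR in a recursive setting, where samples from the underlying loss distribution are available one at a time. We cast the UBSR estimation problem as a root-finding problem and propose stochastic approximation-based estimation schemes. We derive non-asymptotic bounds on the estimation error in terms of the number of samples. We also consider the problem of UBSR optimization within a parameterized class of random variables. We propose a stochastic gradient descent algorithm for UBSR optimization and derive non-asymptotic bounds on its convergence.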
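
A minimal sketch of the root-finding view, assuming the common definition SR_λ(X) = inf{t : E[ℓ(X − t)] ≤ λ} for a loss X and an increasing convex loss function ℓ. It runs a projected Robbins-Monro iteration on t; the exponential loss, the step sizes c/(k + k0), and the projection interval are illustrative choices, not the authors' exact scheme. With ℓ(z) = exp(βz) and Gaussian X the UBSR has a closed form, which serves as a check.

```python
import math
import random


def ubsr_sa(sample, loss, lam, n_iter, t0=0.0, c=1.0, k0=10, bounds=(-20.0, 20.0)):
    """Projected Robbins-Monro estimate of the root t* of E[loss(X - t)] = lam.

    sample() draws one loss X at a time; loss must be increasing and convex.
    Step sizes c / (k + k0) and the projection bounds are illustrative choices.
    """
    lo, hi = bounds
    t = t0
    for k in range(1, n_iter + 1):
        x = sample()
        # E[loss(X - t)] decreases in t, so t moves up while the observed
        # loss exceeds lam and down otherwise.
        t += c / (k + k0) * (loss(x - t) - lam)
        t = min(max(t, lo), hi)  # keep early, noisy iterates bounded
    return t


# Check: exponential loss exp(beta * z) with X ~ N(mu, sigma^2) gives
# SR = mu + beta * sigma^2 / 2 - log(lam) / beta in closed form.
random.seed(0)
mu, sigma, beta, lam = 0.0, 1.0, 1.0, 1.0
estimate = ubsr_sa(lambda: random.gauss(mu, sigma),
                   lambda z: math.exp(beta * z), lam, n_iter=200_000)
exact = mu + beta * sigma ** 2 / 2 - math.log(lam) / beta
print(f"SA estimate {estimate:.3f}, closed form {exact:.3f}")
```

The projection step is a design choice: without it, a single large early sample can push t far above t*, and the slowly shrinking steps then take a very long time to bring it back.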

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

Akash Mondal , Prashanth L. A. , Shalabh Bhatnagar

Category: Machine Learning

2022-07-30

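In this paper, we present a stochastic gradient algorithm for minimizing an objective that is an expectation over noisy cost samples, where only these noisy samples are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using a truncated Cauchy distribution on the unit sphere. We analyze the bias and variance of the proposed gradient estimator. Our algorithm is found to be particularly useful when the objective function is non-convex and the parameter dimension is high. From an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function, and we obtain its asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm. In particular, we establish a non-asymptotic bound for finding an $\epsilon$-stationary point of a non-convex objective function. Finally, we demonstrate numerically through simulations that our algorithm outperforms GSF, SPSA, and RDSA on several non-convex settings, and we further validate its performance on convex (noisy) objectives.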
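
The abstract does not spell out the truncated-Cauchy estimator, so the sketch below instead shows the SPSA baseline it is compared against: a two-point simultaneous-perturbation gradient estimate with Rademacher (±1) perturbations. The gain sequences a/(k+1)^0.602 and c/(k+1)^0.101 are common textbook defaults, and the noisy quadratic is a hypothetical test objective.

```python
import random


def spsa_minimize(f, theta0, n_iter, a=0.1, c=0.1, seed=0):
    """SPSA: two noisy evaluations of f per step, regardless of dimension."""
    rng = random.Random(seed)
    theta = list(theta0)
    d = len(theta)
    for k in range(n_iter):
        a_k = a / (k + 1) ** 0.602  # step size
        c_k = c / (k + 1) ** 0.101  # perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        plus = f([t + c_k * di for t, di in zip(theta, delta)])
        minus = f([t - c_k * di for t, di in zip(theta, delta)])
        diff = (plus - minus) / (2.0 * c_k)
        # Gradient estimate: component i is diff / delta_i.
        theta = [t - a_k * diff / di for t, di in zip(theta, delta)]
    return theta


noise = random.Random(1)


def noisy_quadratic(x):
    """Hypothetical objective ||x||^2 observed with Gaussian noise."""
    return sum(v * v for v in x) + noise.gauss(0.0, 0.01)


theta = spsa_minimize(noisy_quadratic, [1.0] * 5, n_iter=2000)
print([round(v, 3) for v in theta])
```

A GSF- or truncated-Cauchy-style method would change how `delta` is drawn and how the finite difference is scaled into a gradient estimate; the two-evaluations-per-step structure stays the same.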

Optimal variance-reduced stochastic approximation in Banach spaces

Wenlong Mou , Koulik Khamaru , Martin J. Wainwright , Peter L. Bartlett , Michael I. Jordan

Categories: Machine Learning | Machine Learning (Statistics)

2022-01-21

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory applies to problems such as average-reward policy evaluation in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes.
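
As an illustration of the general recentering idea (not necessarily the paper's exact algorithm or step sizes), the sketch below runs variance-reduced stochastic approximation on a hypothetical scalar contraction T(θ) = γaθ + b observed through noisy samples (a_k, b_k). Each epoch estimates T at a reference point θ̄ from a large batch, then runs SA on the recentered operator T_k(θ) − T_k(θ̄) + T̃(θ̄), using the same sample at θ and θ̄ so that most of the noise cancels. The constants, batch sizes, and the step size 1/(1 + (1 − γ)k) are assumptions.

```python
import random

GAMMA, A, B = 0.9, 0.5, 1.0  # hypothetical operator T(theta) = GAMMA*A*theta + B


def sample_operator(rng):
    """One noisy query: returns theta -> GAMMA * a_k * theta + b_k."""
    a_k = A + rng.gauss(0.0, 0.5)
    b_k = B + rng.gauss(0.0, 1.0)
    return lambda theta: GAMMA * a_k * theta + b_k


def vr_sa(n_epochs, n_recenter, n_inner, seed=0):
    rng = random.Random(seed)
    theta_bar = 0.0
    for _ in range(n_epochs):
        # Monte Carlo estimate of T(theta_bar) from a fresh batch.
        t_bar = sum(sample_operator(rng)(theta_bar)
                    for _ in range(n_recenter)) / n_recenter
        theta = theta_bar
        for k in range(1, n_inner + 1):
            op = sample_operator(rng)  # same sample at theta and theta_bar
            target = op(theta) - op(theta_bar) + t_bar
            step = 1.0 / (1.0 + (1.0 - GAMMA) * k)
            theta = (1.0 - step) * theta + step * target
        theta_bar = theta  # recenter at the last iterate
    return theta_bar


estimate = vr_sa(n_epochs=3, n_recenter=20_000, n_inner=1_000)
exact = B / (1.0 - GAMMA * A)
print(f"VR-SA estimate {estimate:.3f}, fixed point {exact:.3f}")
```

Because `op(theta) - op(theta_bar)` cancels the additive noise b_k, the inner loop only sees noise proportional to θ − θ̄, which shrinks as the reference point approaches the fixed point.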

Risk-sensitive Reinforcement Learning via Distortion Risk Measures

Nithia Vijayan , Prashanth L. A.

Category: Machine Learning

2021-07-09

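We consider the problem of control in the setting of risk-sensitive reinforcement learning (RL) via distortion risk measures (DRM). We propose policy gradient algorithms that maximize the DRM of the cumulative reward in an episodic Markov decision process, in both on-policy and off-policy RL settings. We take two different approaches to designing policy gradient algorithms. In the first approach, we derive a variant of the policy gradient theorem that caters to the DRM objective, and use this theorem in conjunction with a likelihood-ratio-based gradient estimation scheme. In the second approach, we estimate the DRM from the empirical distribution of the cumulative rewards, and use this estimation scheme together with a smoothed-functional-based gradient estimation scheme. For policy gradient algorithms that employ either approach, we derive non-asymptotic bounds that establish convergence to an approximate stationary point of the DRM objective.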
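
For the second approach, a DRM can be estimated from the empirical distribution of sampled cumulative rewards as an L-statistic. The sketch assumes the Choquet-integral convention ρ_g(X) = ∫ g(P(X > x)) dx with a non-decreasing distortion g on [0, 1] satisfying g(0) = 0 and g(1) = 1; the paper's exact convention and choice of g may differ. With g(u) = u the estimate reduces to the sample mean.

```python
import math


def empirical_drm(samples, g):
    """Plug-in DRM estimate: sum_i x_(i) * [g((n-i+1)/n) - g((n-i)/n)].

    x_(1) <= ... <= x_(n) are the sorted samples; g is a distortion function
    on [0, 1] with g(0) = 0 and g(1) = 1. Convention: rho_g(X) = integral of
    g(P(X > x)) dx, evaluated under the empirical distribution.
    """
    xs = sorted(samples)
    n = len(xs)
    return sum(x * (g((n - i + 1) / n) - g((n - i) / n))
               for i, x in enumerate(xs, start=1))


rewards = [1.0, 2.0, 3.0, 4.0]
print(empirical_drm(rewards, lambda u: u))  # identity distortion: the mean, 2.5
print(empirical_drm(rewards, math.sqrt))    # concave g shifts weight to larger rewards
```

The weights g((n−i+1)/n) − g((n−i)/n) always sum to g(1) − g(0) = 1, so the estimate is a reweighted average of the order statistics.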

Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert

Yoonhyung Lee , Sungdong Lee , Joong-Ho Won

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-25

The implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature due to its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds for both proxRM and proxPR iterates and their limiting distributions, and propose online estimators of their asymptotic covariance matrices that require only a single run of ISGD. The latter estimators are used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that has limited preceding analyses, and employs feasible procedures. Our online covariance matrix estimators appear to be the first of this kind in the ISGD literature.
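
For least-squares regression, the implicit update θ_n = θ_{n−1} + γ_n (y_n − x_nᵀθ_n) x_n can be solved in closed form, which gives a compact sketch of both modes: the last iterate (proxRM) and its running average (proxPR). The step schedule γ_n = γ_1 n^(−0.6) and the simulated data are illustrative assumptions; the paper's covariance estimators are not shown.

```python
import random


def isgd_least_squares(stream, dim, gamma1=0.5, alpha=0.6):
    """Implicit SGD for squared loss; returns (last iterate, Polyak-Ruppert average)."""
    theta = [0.0] * dim
    avg = [0.0] * dim
    for n, (x, y) in enumerate(stream, start=1):
        gamma = gamma1 * n ** (-alpha)
        sq_norm = sum(v * v for v in x)
        resid = y - sum(t * v for t, v in zip(theta, x))
        # Solving theta_n = theta_{n-1} + gamma*(y - x.theta_n)*x for theta_n
        # shrinks the explicit step by 1 / (1 + gamma * ||x||^2).
        scale = gamma * resid / (1.0 + gamma * sq_norm)
        theta = [t + scale * v for t, v in zip(theta, x)]
        avg = [a + (t - a) / n for a, t in zip(avg, theta)]  # running mean
    return theta, avg


def simulate(theta_true, n, noise=0.1, seed=0):
    """Hypothetical data: Gaussian features, linear response plus noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_true]
        y = sum(t * v for t, v in zip(theta_true, x)) + rng.gauss(0.0, noise)
        yield x, y


theta_true = [2.0, -1.0]
last, averaged = isgd_least_squares(simulate(theta_true, 20_000), dim=2)
print("proxRM:", [round(v, 3) for v in last])
print("proxPR:", [round(v, 3) for v in averaged])
```

The 1/(1 + γ‖x‖²) factor is what gives ISGD its stability: even a large γ cannot overshoot the residual on the current sample.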

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou , Ashwin Pananjady , Martin J. Wainwright , Peter L. Bartlett

Categories: Machine Learning | Machine Learning (Statistics)

2021-12-23

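We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed-point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise (covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$) and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).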

Statistical Inference for Polyak-Ruppert Averaged Zeroth-order Stochastic Gradient Algorithm

Yanhao Jin , Tesi Xiao , Krishnakumar Balasubramanian

Categories: Machine Learning (Statistics) | Machine Learning

2021-02-10

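Statistical machine learning models trained with stochastic gradient algorithms are increasingly being deployed in critical scientific applications. However, computing the stochastic gradient in several such applications is highly expensive or even impossible at times. In such cases, derivative-free or zeroth-order algorithms are used. An important question that has thus far not been addressed sufficiently in the statistical machine learning literature is how to equip stochastic zeroth-order algorithms with practical yet rigorous inferential capabilities, so that we not only have point estimates or predictions but also quantify the associated uncertainty via confidence intervals or sets. Towards this, in this work we first establish a central limit theorem for the Polyak-Ruppert averaged stochastic zeroth-order gradient algorithm. We then provide online estimators of the asymptotic covariance matrix appearing in the central limit theorem, thereby providing a practical procedure for constructing asymptotically valid confidence sets (or intervals) for parameter estimation (or prediction) in the zeroth-order setting.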
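
A minimal sketch of the setting: a two-point Gaussian-smoothing gradient estimate [F(θ + νu; ξ) − F(θ; ξ)]/ν · u with u ~ N(0, I), where the same noisy sample ξ is reused for both evaluations, plugged into SGD with Polyak-Ruppert averaging. The estimator form, step sizes, and the noisy quadratic objective are assumptions for illustration; the paper's online covariance estimators are not reproduced.

```python
import random


def zo_sgd_averaged(F, sample_xi, theta0, n_iter, eta1=0.1, alpha=0.6, nu=0.01, seed=0):
    """Zeroth-order SGD with Polyak-Ruppert averaging (two-point Gaussian smoothing)."""
    rng = random.Random(seed)
    theta = list(theta0)
    avg = list(theta0)
    for k in range(1, n_iter + 1):
        xi = sample_xi(rng)  # one noisy sample, reused for both evaluations
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        shifted = [t + nu * ui for t, ui in zip(theta, u)]
        coef = (F(shifted, xi) - F(theta, xi)) / nu  # directional finite difference
        eta = eta1 * k ** (-alpha)
        theta = [t - eta * coef * ui for t, ui in zip(theta, u)]
        avg = [a + (t - a) / (k + 1) for a, t in zip(avg, theta)]  # running mean
    return theta, avg


theta_star = [1.0, -1.0, 0.5]  # hypothetical minimizer


def F(theta, xi):
    """Hypothetical stochastic objective; its expectation is minimized at theta_star."""
    return 0.5 * sum((t - x) ** 2 for t, x in zip(theta, xi))


def sample_xi(rng):
    return [m + rng.gauss(0.0, 1.0) for m in theta_star]


last, averaged = zo_sgd_averaged(F, sample_xi, [0.0, 0.0, 0.0], n_iter=20_000)
print([round(v, 3) for v in averaged])
```

Reusing ξ in both evaluations matters: with independent noise in the two function values, the finite difference would carry noise of order 1/ν.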

Stochastic optimization under distributional drift

Joshua Cutler , Dmitriy Drusvyatskiy , Zaid Harchaoui

Category: Machine Learning

2021-08-16

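We consider the problem of minimizing a convex function that evolves according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we show that, when equipped with a step-decay schedule, the tracking efficiency of the proximal stochastic gradient method depends only logarithmically on the quality of the initialization. Numerical experiments illustrate our results.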

Stochastic Halpern Iteration with Variance Reduction for Stochastic Monotone Inclusions

Xufeng Cai , Chaobing Song , Cristóbal Guzmán , Jelena Diakonikolas

Category: Machine Learning

2022-03-17

We study stochastic monotone inclusion problems, which widely appear in machine learning applications, including robust regression and adversarial learning. We propose novel variants of stochastic Halpern iteration with recursive variance reduction. In the cocoercive -- and more generally Lipschitz-monotone -- setup, our algorithm attains $\epsilon$ norm of the operator with $\mathcal{O}(\frac{1}{\epsilon^3})$ stochastic operator evaluations, which significantly improves over state of the art $\mathcal{O}(\frac{1}{\epsilon^4})$ stochastic operator evaluations required for existing monotone inclusion solvers applied to the same problem classes. We further show how to couple one of the proposed variants of stochastic Halpern iteration with a scheduled restart scheme to solve stochastic monotone inclusion problems with ${\mathcal{O}}(\frac{\log(1/\epsilon)}{\epsilon^2})$ stochastic operator evaluations under additional sharpness or strong monotonicity assumptions.
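
For background, the deterministic Halpern iteration that the stochastic variants build on anchors every step to the starting point: x_{k+1} = x_0/(k + 2) + (1 − 1/(k + 2)) T(x_k), where T = I − (2/L)F is nonexpansive when F is (1/L)-cocoercive. The sketch uses exact operator evaluations on a hypothetical linear operator and omits the stochastic estimates, recursive variance reduction, and restarts described above.

```python
def halpern(F, x0, L, n_iter):
    """Halpern iteration on the nonexpansive map T = I - (2/L) F, anchored at x0."""
    x = list(x0)
    for k in range(n_iter):
        Tx = [xi - (2.0 / L) * fi for xi, fi in zip(x, F(x))]
        lam = 1.0 / (k + 2)  # anchoring weight decays like 1/k
        x = [lam * a + (1.0 - lam) * t for a, t in zip(x0, Tx)]
    return x


# Hypothetical example: F(x) = A x - b with A = diag(1, 2), so L = 2 and x* = (1, 1).
A, b = (1.0, 2.0), (1.0, 2.0)


def F(x):
    return [a * xi - bi for a, xi, bi in zip(A, x, b)]


x = halpern(F, [0.0, 0.0], L=2.0, n_iter=2000)
residual = sum(v * v for v in F(x)) ** 0.5
print(x, residual)  # the operator norm ||F(x_k)|| decays at an O(1/k) rate
```

In this example the second coordinate of T is a reflection (T(x) = 2 − x), so plain fixed-point iteration x_{k+1} = T(x_k) would oscillate forever; the decaying anchor term is what makes Halpern converge.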

Online Statistical Inference for Stochastic Optimization via Gradient-free Kiefer-Wolfowitz Methods

Xi Chen , Zehua Lai , He Li , Yichen Zhang

Category: Machine Learning (Statistics)

2021-02-05

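This paper investigates the problem of statistical inference for model parameters in stochastic optimization via the Kiefer-Wolfowitz algorithm with random search directions. We first present the asymptotic distribution of the Polyak-Ruppert-averaging type Kiefer-Wolfowitz (AKW) estimators, whose asymptotic covariance matrices depend on the function query complexity and the distribution of search directions. The distributional result reflects the trade-off between statistical efficiency and function query complexity. We further analyze the choice of random search directions that minimizes the asymptotic covariance matrix, and conclude that the optimal search direction depends on the optimality criterion with respect to different summary statistics of the Fisher information matrix. Based on the asymptotic distribution result, we conduct online statistical inference by providing two procedures for constructing valid confidence intervals. We provide numerical experiments that verify our theoretical results and demonstrate the practical effectiveness of the procedures.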

Convergence and Complexity of Stochastic Block Majorization-Minimization

Hanbaek Lyu

Categories: Machine Learning | Machine Learning (Statistics)

2022-01-05

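Stochastic majorization-minimization (SMM) is an online extension of the classical principle of majorization-minimization, which consists of sampling i.i.d. data points from a fixed data distribution and minimizing a recursively defined majorizing surrogate of an objective function. In this paper, we introduce stochastic block majorization-minimization, where the surrogates need only be block multi-convex and a single block is optimized at a time within a diminishing radius. By relaxing the standard strong convexity requirement on surrogates in SMM, our framework offers wider applicability, including online CANDECOMP/PARAFAC (CP) dictionary learning, and yields greater computational efficiency, especially when the problem dimension is large. We provide an extensive convergence analysis of the proposed algorithm, which we derive under possibly dependent data streams, relaxing the standard i.i.d. assumption on data samples. We show that the proposed algorithm converges almost surely to the set of stationary points of a nonconvex objective under constraints at a rate $O((\log n)^{1+\epsilon}/n^{1/2})$ for the empirical loss function and $O((\log n)^{1+\epsilon}/n^{1/4})$ for the expected loss function, where $n$ denotes the number of data samples processed. Under some additional assumptions, the latter convergence rate can be improved to $O((\log n)^{1+\epsilon}/n^{1/2})$. Our results provide the first convergence rate bounds for various online matrix and tensor decomposition algorithms under a general Markovian data setting.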

From inexact optimization to learning via gradient concentration

Bernhard Stankewitz , Nicole Mücke , Lorenzo Rosasco

Categories: Machine Learning (Statistics) | Machine Learning

2021-06-09

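Optimization in machine learning typically deals with the minimization of empirical objectives defined by training data. However, the ultimate goal of learning is to minimize the error on future data (test error), for which the training data provide only partial information. In this view, the optimization problems that are practically feasible are based on inexact quantities that are stochastic in nature. In this paper, we show how probabilistic results, specifically gradient concentration, can be combined with results from inexact optimization to derive sharp test error guarantees. By considering unconstrained objectives, we highlight the implicit regularization properties of optimization for learning.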

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

Mingyi Hong , Hoi-To Wai , Zhaoran Wang , Zhuoran Yang

Category: Machine Learning

2020-07-10

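This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems with a two-level structure: the goal is to minimize an outer objective function over variables that are constrained to be an optimal solution of an (inner) optimization problem. We consider the case where the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such bilevel problems. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates of the TTSA algorithm under various settings: when the outer problem is strongly convex (resp. weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp. $\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total number of iterations. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a globally optimal policy.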
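
A minimal sketch of the two-timescale structure on a hypothetical bilevel problem: inner objective g(x, y) = (y − x)²/2, so y*(x) = x, and outer objective f(x, y) = x²/2 + (y − 1)²/2, whose bilevel solution is x = y = 0.5. Because dy*/dx = 1 is known here, the hypergradient ∇ₓf + (dy*/dx)∇ᵧf can be written down exactly, with the tracked y in place of y*(x); in general it needs second-order information from the inner problem. The noise level, projection interval, and step sizes (outer ~ 1/k, inner ~ k^(−2/3)) are illustrative choices, not the paper's.

```python
import random


def ttsa(n_iter, noise=0.1, bounds=(-2.0, 2.0), seed=0):
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    lo, hi = bounds
    for k in range(1, n_iter + 1):
        alpha = 1.0 / (k + 10)            # outer step (slower timescale)
        beta = 1.0 / (k + 10) ** (2 / 3)  # inner step (faster timescale)
        # Inner: stochastic gradient step on g(x, y) = 0.5 * (y - x)^2.
        y -= beta * ((y - x) + rng.gauss(0.0, noise))
        # Outer: projected step on a noisy hypergradient x + (dy*/dx) * (y - 1),
        # with dy*/dx = 1 for this toy inner problem.
        hypergrad = x + (y - 1.0) + rng.gauss(0.0, noise)
        x = min(max(x - alpha * hypergrad, lo), hi)
    return x, y


x, y = ttsa(20_000)
print(f"x = {x:.3f}, y = {y:.3f} (bilevel solution x = y = 0.5)")
```

The ratio alpha/beta → 0 is the two-timescale design choice: y moves fast enough to track y*(x) while x changes slowly.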

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Wenlong Mou , Martin J. Wainwright , Peter L. Bartlett

Category: Machine Learning (Statistics)

2022-09-26

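The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that, in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted-norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.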

A Non-Asymptotic Framework for Approximate Message Passing in Spiked Models

Gen Li , Yuting Wei

Categories: Machine Learning | Machine Learning (Statistics)

2022-08-05

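Approximate message passing (AMP) has emerged as an effective iterative paradigm for solving high-dimensional statistical problems. However, prior AMP theory falls short of predicting AMP dynamics when the number of iterations exceeds $o\big(\frac{\log n}{\log\log n}\big)$ (with $n$ the problem dimension). To address this inadequacy, this paper develops a non-asymptotic framework for understanding AMP in spiked matrix estimation. Building on a new decomposition of AMP updates and controllable residual terms, we lay out an analysis recipe to characterize the finite-sample behavior of AMP in the presence of an independent initialization, which is further generalized to allow for spectral initialization. As two concrete consequences of the proposed analysis recipe: (i) when solving $\mathbb{Z}_2$ synchronization, we predict the behavior of spectrally initialized AMP for up to $O\big(\frac{n}{\mathrm{poly}\log n}\big)$ iterations, showing that the algorithm succeeds without the need for a subsequent refinement stage (as recently conjectured by \citet{celentano2021local}); (ii) we characterize the non-asymptotic behavior of AMP in sparse PCA (in the spiked Wigner model) for a broad range of signal-to-noise ratios.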

Estimating the minimizer and the minimum value of a regression function under passive design

Arya Akhavan , Davit Gogolashvili , Alexandre B. Tsybakov

Category: Machine Learning (Statistics)

2022-11-29

We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$ from observations contaminated by random noise. Our estimator $\boldsymbol{z}_n$ of the minimizer $\boldsymbol{x}^*$ is based on a version of projected gradient descent with the gradient estimated by a regularized local polynomial algorithm. Next, we propose a two-stage procedure for estimating the minimum value $f^*$ of the regression function $f$. At the first stage, we construct a sufficiently accurate estimator of $\boldsymbol{x}^*$, which can be, for example, $\boldsymbol{z}_n$. At the second stage, we estimate the function value at the point obtained in the first stage using a rate-optimal nonparametric procedure. We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$. We establish minimax lower bounds showing that, under a certain choice of parameters, the proposed algorithms achieve the minimax optimal rates of convergence on the class of smooth and strongly convex functions.

A Unified Convergence Theorem for Stochastic Optimization Methods

Xiao Li , Andre Milzarek

Category: Machine Learning

2022-06-08

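In this work, we provide a fundamental unified convergence theorem for deriving expected and almost-sure convergence results for a series of stochastic optimization methods. Our unified theorem only requires verifying several representative conditions and is not tailored to any specific algorithm. As a direct application, we recover expected convergence results for the stochastic gradient method (SGD) and random reshuffling (RR) under more general settings. Moreover, we establish new expected and almost-sure convergence results for the stochastic proximal gradient method (prox-SGD) and stochastic model-based methods (SMM) for nonsmooth nonconvex optimization problems. These applications show that our unified theorem provides a plug-in type convergence analysis and strong convergence guarantees for a wide class of stochastic optimization methods.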

High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise

Daniela A. Parletta , Andrea Paudice , Massimiliano Pontil , Saverio Salzo

Category: Machine Learning (Statistics)

2022-08-17

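In this work, we study high-probability bounds for stochastic subgradient methods under heavy-tailed noise. In this setting the noise is only assumed to have finite variance, as opposed to a sub-Gaussian distribution, for which standard subgradient methods are known to enjoy high-probability bounds. We analyze a clipped version of the projected stochastic subgradient method, in which subgradient estimates are truncated whenever they have large norms. We show that this clipping strategy leads to both any-time and finite-horizon bounds for many classical averaging schemes. Preliminary experiments are shown to support the validity of the method.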
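
A minimal sketch of the clipping idea: a projected stochastic subgradient method whose subgradient estimates are rescaled to norm τ whenever they exceed it, followed by uniform averaging of the iterates. The objective ‖x − x*‖₁, the Student-t noise (heavy-tailed but with finite variance), the box projection, the threshold τ, and the step size c/√k are hypothetical choices for illustration.

```python
import math
import random


def student_t(rng, df=3):
    """Student-t sample: heavy-tailed, with finite variance when df > 2."""
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def clipped_subgradient(x_star, n_iter, tau=5.0, c=0.5, box=5.0, seed=0):
    rng = random.Random(seed)
    x = [0.0] * len(x_star)
    avg = [0.0] * len(x_star)
    for k in range(1, n_iter + 1):
        # Noisy subgradient of f(x) = ||x - x_star||_1.
        g = [math.copysign(1.0, xi - si) + student_t(rng) for xi, si in zip(x, x_star)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > tau:  # clip: truncate large subgradient estimates
            g = [v * tau / norm for v in g]
        step = c / math.sqrt(k)
        # Subgradient step, then projection onto the box [-box, box]^d.
        x = [min(max(xi - step * gi, -box), box) for xi, gi in zip(x, g)]
        avg = [a + (xi - a) / k for a, xi in zip(avg, x)]  # uniform average
    return avg


x_star = [1.0, -2.0]
avg = clipped_subgradient(x_star, n_iter=20_000)
print([round(v, 3) for v in avg])
```

Clipping bounds the size of any single update, so a rare extreme noise draw cannot throw the iterate far away; the price is a small bias in the clipped estimates.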

Tractability from overparametrization: The example of the negative perceptron

Andrea Montanari , Yiqiao Zhong , Kangjie Zhou

Category: Machine Learning

2021-10-28

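In the negative perceptron problem, we are given $n$ data points $({\boldsymbol x}_i, y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$ is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit-norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i \le n} y_i \langle {\boldsymbol \theta}, {\boldsymbol x}_i \rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models. We consider the proportional asymptotics in which $n, d \to \infty$ with $n/d \to \delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or, equivalently, on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d \le \delta_{\text{s}}(\kappa) - \varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d \ge \delta_{\text{s}}(\kappa) + \varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to leading order as $\kappa \to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.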

Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

Sen Na , Michał Dereziński , Michael W. Mahoney

Categories: Machine Learning | Machine Learning (Statistics)

2022-04-20

We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given an oracle access to a stochastic estimate of the Hessian matrix. The oracle model includes popular algorithms such as Subsampled Newton and Newton Sketch. Despite using second-order information, these existing methods do not exhibit superlinear convergence, unless the stochastic noise is gradually reduced to zero during the iteration, which would lead to a computational blow-up in the per-iteration cost. We propose to address this limitation with Hessian averaging: instead of using the most recent Hessian estimate, our algorithm maintains an average of all the past estimates. This reduces the stochastic noise while avoiding the computational blow-up. We show that this scheme exhibits local $Q$-superlinear convergence with a non-asymptotic rate of $(\Upsilon\sqrt{\log (t)/t}\,)^{t}$, where $\Upsilon$ is proportional to the level of stochastic noise in the Hessian oracle. A potential drawback of this (uniform averaging) approach is that the averaged estimates contain Hessian information from the global phase of the method, i.e., before the iterates converge to a local neighborhood. This leads to a distortion that may substantially delay the superlinear convergence until long after the local neighborhood is reached. To address this drawback, we study a number of weighted averaging schemes that assign larger weights to recent Hessians, so that the superlinear convergence arises sooner, albeit with a slightly slower rate. Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still exhibits a superlinear convergence rate nearly (up to a logarithmic factor) matching that of uniform Hessian averaging.
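
A minimal sketch of uniform Hessian averaging: each step queries a noisy Hessian oracle, folds the result into a running average with weight 1/(t + 1), and takes a Newton step using the exact gradient and the averaged Hessian. The 2-D objective Σᵢ [xᵢ²/2 + log(1 + exp(xᵢ − cᵢ))], the symmetric Gaussian oracle noise, and the starting point are hypothetical; the weighted schemes mentioned above would replace the uniform weight.

```python
import math
import random

C = (1.0, -2.0)  # hypothetical objective parameters


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad(x):
    return [xi + sigmoid(xi - ci) for xi, ci in zip(x, C)]


def hessian(x):
    d = [1.0 + sigmoid(xi - ci) * (1.0 - sigmoid(xi - ci)) for xi, ci in zip(x, C)]
    return [[d[0], 0.0], [0.0, d[1]]]


def noisy_hessian(x, rng, noise=0.1):
    """Hypothetical oracle: true Hessian plus symmetric Gaussian noise."""
    h = hessian(x)
    off = rng.gauss(0.0, noise)
    return [[h[0][0] + rng.gauss(0.0, noise), h[0][1] + off],
            [h[1][0] + off, h[1][1] + rng.gauss(0.0, noise)]]


def solve_2x2(h, g):
    """Solve h @ s = g for a 2x2 matrix h via the explicit inverse."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
            (h[0][0] * g[1] - h[1][0] * g[0]) / det]


def newton_hessian_averaging(x0, n_iter, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    h_avg = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(n_iter):
        h_new = noisy_hessian(x, rng)
        w = 1.0 / (t + 1)  # uniform weights over all past estimates
        h_avg = [[(1 - w) * h_avg[i][j] + w * h_new[i][j] for j in range(2)]
                 for i in range(2)]
        step = solve_2x2(h_avg, grad(x))
        x = [xi - si for xi, si in zip(x, step)]
    return x


x = newton_hessian_averaging([3.0, 3.0], n_iter=100)
print([round(v, 6) for v in x], grad(x))
```

The averaged Hessian still contains estimates from early iterates far from the solution, which is the distortion the abstract describes; on this well-conditioned toy problem the effect is mild.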
