Min-max problems, also known as saddle-point problems, are a class of optimization problems in which we simultaneously minimize over one subset of variables and maximize over another. Such problems can be used to formulate a wide range of signal processing and communication (SPCOM) problems. Despite its generality, the existing theory for this class has mainly been developed for problems with certain special convex-concave structure. It therefore cannot be used to guide algorithm design for many interesting SPCOM problems in which some form of nonconvexity often arises. In this work, we consider a general block-wise one-sided nonconvex min-max problem, in which the minimization problem consists of multiple blocks and is nonconvex, while the maximization problem is (strongly) concave. We propose a class of simple algorithms, named Hybrid Block Successive Approximation (HiBSA), which alternately performs gradient-descent-type steps for the minimization blocks and one gradient-ascent-type step for the maximization problem. A key element of the algorithm is the introduction of appropriately designed regularization and penalty terms, which stabilize the algorithm and ensure convergence. We show, for the first time, that such simple alternating min-max algorithms converge to first-order stationary solutions with quantifiable global rates. To validate the efficiency of the proposed algorithms, we conduct numerical tests on a number of information processing and wireless communication problems, including the robust learning problem, the nonconvex min-utility maximization problem, and certain wireless jamming problems arising in interference channels.
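A minimal sketch of the alternating pattern just described follows, on a toy convex-concave quadratic; the block structure, step sizes, and the damping weight gamma are illustrative placeholders, not the paper's exact update rules.

```python
import numpy as np

def hibsa_style_minmax(grad_x_blocks, grad_y, x_blocks, y,
                       iters=500, lr_x=0.05, lr_y=0.05, gamma=0.1):
    """Alternate descent steps over min blocks with one regularized ascent step.

    grad_x_blocks[i](x_blocks, y) returns the gradient w.r.t. block i;
    grad_y(x_blocks, y) returns the gradient w.r.t. the max variable.
    gamma weights an extra damping term on the ascent step, mimicking the
    stabilizing regularization that HiBSA-type methods rely on.
    """
    for _ in range(iters):
        for i, g in enumerate(grad_x_blocks):   # descent, one block at a time
            x_blocks[i] = x_blocks[i] - lr_x * g(x_blocks, y)
        y = y + lr_y * (grad_y(x_blocks, y) - gamma * y)  # damped ascent step
    return x_blocks, y

# Toy problem: f(x1, x2, y) = 0.5*(x1^2 + x2^2) + y*(x1 + x2) - 0.5*y^2,
# whose unique saddle point is the origin.
gx = [lambda xb, y: xb[0] + y, lambda xb, y: xb[1] + y]
gy = lambda xb, y: xb[0] + xb[1] - y
x_star, y_star = hibsa_style_minmax(gx, gy, [2.0, -1.0], 1.5)
print(x_star, y_star)   # all near zero, the unique saddle point
```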
We study the global convergence of generative adversarial imitation learning for linear quadratic regulators, which is posed as minimax optimization. To address the challenges arising from non-convex-concave geometry, we analyze the alternating gradient algorithm and establish its Q-linear rate of convergence to a unique saddle point, which simultaneously recovers the globally optimal policy and reward function. We hope our results may serve as a small step toward understanding and taming the instability in imitation learning, as well as in more general non-convex-concave alternating minimax optimization arising from reinforcement learning and generative adversarial learning.
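To make the alternating-gradient behavior concrete, here is a toy sanity check on a strongly-convex-strongly-concave quadratic saddle problem (not the LQR/GAIL setting); the near-constant ratio of successive distances to the saddle point is what Q-linear convergence looks like numerically.

```python
import numpy as np

# f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 has its unique saddle point at (0, 0).
x, y = 5.0, -3.0
lr = 0.1
dists = []
for _ in range(200):
    x = x - lr * (x + y)        # descent step on x (uses current y)
    y = y + lr * (x - y)        # ascent step on y (uses updated x)
    dists.append(np.hypot(x, y))

# Successive ratios hovering around a constant < 1 indicate Q-linear decay.
print([round(dists[k + 1] / dists[k], 3) for k in range(0, 50, 10)])
```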
Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to the complex interactions among agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, we study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy. In this paper, we propose a double averaging scheme, in which each agent iteratively performs averaging over both space and time to incorporate neighboring gradient information and local reward information, respectively. We prove that the proposed algorithm converges to the optimal solution at a global geometric rate. In particular, the algorithm is built upon a primal-dual reformulation of the mean squared projected Bellman error minimization problem, which gives rise to a decentralized convex-concave saddle-point problem. To the best of our knowledge, the proposed double averaging primal-dual optimization algorithm is the first to achieve fast finite-time convergence on decentralized convex-concave saddle-point problems.
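A hedged sketch of the double-averaging idea, in the spirit of distributed dual averaging: agents mix accumulated gradient information with their neighbors (space) and average it over iterations (time). The quadratic local objectives and the uniform mixing matrix are illustrative stand-ins, not the paper's policy-evaluation setup.

```python
import numpy as np

n_agents, dim = 4, 3
W = np.full((n_agents, n_agents), 1.0 / n_agents)  # doubly stochastic mixing
targets = np.arange(n_agents)[:, None] * np.ones((1, dim))  # per-agent data
theta = np.zeros((n_agents, dim))   # per-agent primal iterates
z = np.zeros((n_agents, dim))       # accumulated gradient estimates

for t in range(1, 501):
    grads = theta - targets         # local gradients of 0.5*||theta - t_i||^2
    z = W @ z + grads               # spatial mixing + temporal accumulation
    theta = -0.5 * z / np.sqrt(t)   # dual-averaging-style primal update

# All agents approach the minimizer of the sum: the mean of the targets.
print(theta.round(2), targets.mean(axis=0).round(2))
```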
For the past couple of decades, numerical optimization has played a central role in addressing wireless resource management problems such as power control and beamformer design. However, optimization algorithms often entail considerable complexity, which creates a serious gap between theoretical design/analysis and real-time processing. To address this challenge, we propose a new learning-based approach. The key idea is to treat the input and output of a resource allocation algorithm as an unknown non-linear mapping and use a deep neural network (DNN) to approximate it. If the non-linear mapping can be learned accurately by a DNN of moderate size, then resource allocation can be done in almost real time -- since passing the input through a DNN only requires a small number of simple operations. In this work, we address both the theoretical and practical aspects of DNN-based algorithm approximation with applications to wireless resource management. We first pin down a class of optimization algorithms that are "learnable" in theory by a fully connected DNN. Then, we focus on DNN-based approximation to a popular power allocation algorithm named WMMSE (Shi et al., 2011). We show that using a DNN to approximate WMMSE can be fairly accurate -- the approximation error $\epsilon$ depends mildly [in the order of $\log(1/\epsilon)$] on the numbers of neurons and layers of the DNN. On the implementation side, we use extensive numerical simulations to demonstrate that DNNs can achieve orders of magnitude speedup in computational time compared to state-of-the-art power allocation algorithms based on optimization.
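The following toy sketch illustrates the "learn the input-output mapping" idea with a stand-in allocation rule in place of WMMSE; the network size, the synthetic channel data, and the stand_in_algorithm function are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
K = 10                                      # number of channels/users
H = rng.uniform(0.1, 2.0, size=(5000, K))   # channel gains as network input

def stand_in_algorithm(h, budget=1.0):
    # Placeholder allocation rule (illustrative only, NOT the WMMSE algorithm):
    # allocate the power budget proportionally to squared channel gains.
    p = h ** 2
    return budget * p / p.sum(axis=-1, keepdims=True)

P = stand_in_algorithm(H)                   # "algorithm output" training labels

# Fit a fully connected DNN to approximate the mapping H -> P.
dnn = MLPRegressor(hidden_layer_sizes=(200, 80), max_iter=500, random_state=0)
dnn.fit(H[:4000], P[:4000])
mse = np.mean((dnn.predict(H[4000:]) - P[4000:]) ** 2)
print(f"held-out MSE of the DNN approximation: {mse:.2e}")
```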
Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility. However, by exploring the modeling structures, we find these "sacrifices" do not always require more computational effort. To shed light on such a "free-lunch" phenomenon, we study the square-root-Lasso (SQRT-Lasso) type regression problem. Specifically, we show that the nonsmooth loss functions of SQRT-Lasso type regression ease the tuning effort and gain adaptivity to inhomogeneous noise, but are not necessarily more challenging than Lasso in computation. We can directly apply proximal algorithms (e.g., proximal gradient descent, proximal Newton, and proximal quasi-Newton algorithms) without worrying about the nonsmoothness of the loss function. Theoretically, we prove that these proximal algorithms enjoy fast local convergence with high probability. Our numerical experiments also show that, when further combined with the pathwise optimization scheme, the proximal algorithms significantly outperform other competing algorithms.
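A minimal proximal gradient sketch for the SQRT-Lasso objective ||y - Xb||_2/sqrt(n) + lam*||b||_1; the step size, lam, and synthetic data are illustrative choices, and no pathwise scheme is included.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sqrt_lasso_pgd(X, y, lam=0.1, step=0.05, iters=3000):
    n, d = X.shape
    b = np.zeros(d)
    for _ in range(iters):
        r = y - X @ b
        norm_r = np.linalg.norm(r)
        if norm_r < 1e-12:              # interpolation: loss already zero
            break
        # Gradient of ||y - Xb||_2 / sqrt(n), valid while the residual != 0.
        grad = -(X.T @ r) / (np.sqrt(n) * norm_r)
        b = soft_threshold(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
b_true = np.zeros(20); b_true[:3] = [2.0, -1.5, 1.0]
y = X @ b_true + 0.1 * rng.standard_normal(100)
print(sqrt_lasso_pgd(X, y).round(2))    # recovers the sparse support
```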
The alternating direction method of multipliers (ADMM) is widely used to solve large-scale linearly constrained optimization problems, convex or nonconvex, in many engineering fields. However, there is a general lack of theoretical understanding of the algorithm when the objective function is nonconvex. In this paper we analyze the convergence of the ADMM for solving certain nonconvex consensus and sharing problems, and show that the classical ADMM converges to the set of stationary solutions, provided that the penalty parameter in the augmented Lagrangian is chosen to be sufficiently large. For the sharing problems, we show that the ADMM is convergent regardless of the number of variable blocks. Our analysis does not impose any assumptions on the iterates generated by the algorithm, and is broadly applicable to many ADMM variants involving proximal update rules and various flexible block selection rules.
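As a concrete reference point, here is a textbook consensus ADMM iteration on a toy problem with quadratic local objectives; it shows the x-update, z-update, and dual update pattern, but the objectives are placeholders and the analysis setting above is not reproduced. The penalty parameter rho plays exactly the role discussed in the abstract.

```python
import numpy as np

# Consensus problem: min sum_i 0.5*||x_i - a_i||^2  s.t.  x_i = z for all i.
n_blocks, dim, rho = 5, 3, 1.0
a = np.arange(n_blocks * dim, dtype=float).reshape(n_blocks, dim)
x = np.zeros((n_blocks, dim))
z = np.zeros(dim)
u = np.zeros((n_blocks, dim))          # scaled dual variables

for _ in range(100):
    # x-update: argmin 0.5*||x_i - a_i||^2 + (rho/2)*||x_i - z + u_i||^2
    x = (a + rho * (z - u)) / (1.0 + rho)
    # z-update: average of x_i + u_i enforces consensus
    z = (x + u).mean(axis=0)
    # dual ascent on the consensus constraint residual
    u = u + (x - z)

print(z.round(3), a.mean(axis=0).round(3))   # z converges to the mean of a_i
```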
The block coordinate descent (BCD) method is widely used for minimizing a continuous function f of several block variables. At each iteration of this method, a single block of variables is optimized, while the remaining variables are held fixed. To ensure the convergence of the BCD method, the subproblem to be optimized in each iteration needs to be solved exactly to its unique optimal solution. Unfortunately, these requirements are often too restrictive for many practical scenarios. In this paper, we study an alternative inexact BCD approach which updates the variable blocks by successively minimizing a sequence of approximations of f which are either locally tight upper bounds of f or strictly convex local approximations of f. We focus on characterizing the convergence properties for a fairly wide class of such methods, especially for the cases where the objective functions are either non-differentiable or nonconvex. Our results unify and extend the existing convergence results for many classical algorithms such as the BCD method, the difference of convex functions (DC) method, the expectation maximization (EM) algorithm, as well as the alternating proximal minimization algorithm.
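A small sketch of the inexact BCD idea: each block is updated by minimizing a locally tight quadratic upper bound of f (a prox-linear surrogate) rather than f itself. The coupled quadratic objective and the curvature constant L below are illustrative.

```python
import numpy as np

# Illustrative smooth coupled objective: f(x) = 0.5*x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x - b

x = np.array([5.0, -5.0])
L = np.linalg.eigvalsh(A).max()         # curvature constant for the bound
for _ in range(100):
    for i in range(2):                  # one block (coordinate) at a time
        # Minimize u(t) = f(x) + g_i*(t - x_i) + (L/2)*(t - x_i)^2, a locally
        # tight upper bound of f along block i; its minimizer is a
        # prox-linear step, not an exact subproblem solution.
        x[i] = x[i] - grad(x)[i] / L

print(x.round(4), np.linalg.solve(A, b).round(4))  # matches the true minimizer
```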
Convolutional neural networks (CNNs) have become the state of the art for many vision tasks, including depth estimation. However, memory and computing power requirements remain challenges to be addressed in these models. Monocular depth estimation is significant in robotics and virtual reality and requires deployment on low-end devices. Training a small model from scratch leads to a significant drop in accuracy and does not benefit from an already-trained large model. Motivated by the model pruning literature, we propose a lightweight monocular depth model obtained from a large trained model. This is achieved by removing the least important features with a novel joint end-to-end filter pruning. We propose to learn a binary mask for each filter to decide whether to discard it. These masks are trained jointly to exploit relations between filters at different layers as well as redundancy within the same layer. We show that we can achieve around a 5x compression rate with a small drop in accuracy on the KITTI driving dataset. We also show that masking can improve accuracy over the baseline with fewer parameters, even without enforcing a compression loss.
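A hedged sketch of the learnable-binary-mask idea, using a straight-through estimator so masks and weights can be trained jointly; the MaskedConv module and its score initialization are assumptions for illustration, not the paper's exact masking scheme.

```python
import torch
import torch.nn as nn

class MaskedConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        # One learnable score per output filter; positive means "keep".
        self.scores = nn.Parameter(0.1 * torch.ones(out_ch))

    def forward(self, x):
        hard = (self.scores > 0).float()          # binary keep/drop mask
        soft = torch.sigmoid(self.scores)
        # Straight-through estimator: forward uses the hard mask, the
        # backward pass receives the gradient of the soft relaxation.
        mask = hard + (soft - soft.detach())
        return self.conv(x) * mask.view(1, -1, 1, 1)

layer = MaskedConv(3, 16)
out = layer(torch.randn(2, 3, 32, 32))
print(out.shape, int((layer.scores > 0).sum()), "filters kept")
```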
Online image hashing has recently received increasing research attention; it receives large-scale data in a streaming fashion to update the hash functions on the fly. Its main challenge lies in the difficulty of balancing learning timeliness and model accuracy. To this end, most works exploit a supervised setting, i.e., using class labels to boost hashing performance, which suffers from two drawbacks: first, a large number of training batches are required to learn up-to-date hash functions, which considerably increases the learning complexity; second, strong constraints such as orthogonality or similarity preservation are imposed, which are typically relaxed and lead to a large accuracy drop. To handle the above challenges, this paper proposes a novel supervised online hashing scheme termed Hadamard Matrix Guided Online Hashing (HMOH). Our key innovation lies in the construction and use of the Hadamard matrix, an orthogonal binary matrix built via Sylvester's method. To release the need for strong constraints, we regard each column of the Hadamard matrix as the target code of one class label, which by nature satisfies several desired properties of hashing codes. To accelerate online training, LSH is first adopted to align the length of the target codes with that of the binary codes to be learned. We then treat the learning of hash functions as a set of binary classification problems that fit the assigned target codes. Finally, we propose to ensemble the models learned across all rounds to maximally preserve the information of past streaming data. Extensive experiments on three widely used datasets demonstrate the superiority and efficiency of the proposed method over various state-of-the-art methods.
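A minimal sketch of the two building blocks named above: Sylvester's construction of a Hadamard matrix, and the assignment of its columns as per-class target codes; the code length and class count are illustrative.

```python
import numpy as np

def sylvester_hadamard(k):
    """2^k x 2^k Hadamard matrix via H_{2n} = [[H_n, H_n], [H_n, -H_n]]."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

H = sylvester_hadamard(5)           # 32-bit codes
num_classes = 10
targets = H[:, :num_classes].T      # one 32-bit target code per class

# Columns of a Hadamard matrix are mutually orthogonal +-1 vectors, so any
# two class codes disagree in exactly half their bits (distance 16 here).
print(targets.shape, (targets[0] != targets[1]).sum())
```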
Learning from demonstration (LfD) and imitation learning offer new paradigms for transferring task behaviors to robots. One class of methods that enables such online learning requires the robot to observe the task being performed and decompose the sensed streaming data into sequences of state-action pairs, which are then fed into the method. Consequently, correctly and quickly recognizing state-action pairs in the sensed data is a key prerequisite for these methods. We present SA-Net, a deep neural network architecture that recognizes state-action pairs from RGB-D data streams. SA-Net performs well in two diverse robotic applications of LfD, one involving a mobile ground robot and the other a robotic manipulator, which demonstrates that the architecture generalizes well across different settings. A comprehensive evaluation, including deployment on physical robots, shows that SA-Net significantly improves the accuracy of previous methods that utilize traditional image processing and segmentation.