This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
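To fix ideas, here is a minimal sketch of the basic SG iteration that such a survey analyzes, in which each step uses the gradient of the loss at a single randomly drawn training example rather than over the full dataset; the function names and the toy least-squares problem below are illustrative, not taken from the paper.

```python
import numpy as np

def sg_method(grad_fi, x0, n_samples, stepsize=0.01, n_iters=1000, seed=0):
    """Basic stochastic gradient iteration x_{k+1} = x_k - a * grad f_i(x_k),
    with the sample index i drawn uniformly at random at every step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        i = rng.integers(n_samples)        # pick one training example
        x -= stepsize * grad_fi(x, i)      # step along its negative gradient
    return x

# Toy problem: minimize (1/n) * sum_i (a_i . x - b_i)^2 over x.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grad_fi = lambda x, i: 2.0 * (A[i] @ x - b[i]) * A[i]
x_hat = sg_method(grad_fi, np.zeros(5), n_samples=100)
```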
A class of trust-region methods is presented for solving unconstrained nonlinear and possibly nonconvex discretized optimization problems, like those arising in systems governed by partial differential equations. The algorithms in this class make use of the discretization level as a means of speeding up the computation of the step. This use is recursive, leading to true multilevel/multiscale optimization methods reminiscent of multigrid methods in linear algebra and the solution of partial differential equations. A simple algorithm of the class is then described and its numerical performance is shown to be promising. This observation then motivates a proof of global convergence to first-order stationary points on the fine grid that is valid for all algorithms in the class.

AMS subject classifications. 90C30, 65K05, 90C26, 90C06

1. Introduction. Large-scale finite-dimensional optimization problems often arise from the discretization of infinite-dimensional problems, a primary example being optimal-control problems defined in terms of either ordinary or partial differential equations. While the direct solution of such problems for a given discretization level is often possible using existing packages for large-scale numerical optimization, this technique typically makes very little use of the fact that the underlying infinite-dimensional problem may be described at several discretization levels; the approach thus rapidly becomes cumbersome. Motivated by this observation, we explore here a class of algorithms which makes explicit use of this fact in the hope of improving efficiency and, possibly, enhancing reliability. Using the different levels of discretization for an infinite-dimensional problem is not a new idea. A simple first approach is to use coarser grids in order to compute approximate solutions which can then be used as starting points for the optimization problem on a finer grid (see [5, 6, 7, 22], for instance). However, potentially more efficient techniques are inspired by the multigrid paradigm in the solution of partial differential equations and associated systems of linear algebraic equations (see, for example, [10, 11, 12, 23, 40, 42], for descriptions and references). The work presented here was in particular motivated by the paper by Gelman and Mandel [16], the "generalized truncated Newton algorithm" presented in Fisher [15], a talk by Moré [27], and the contributions by Nash and co-authors [25, 26, 29]. These latter three papers present the description of MG/OPT, a linesearch-based recursive algorithm, an outline of its convergence properties, and impressive numerical results. The generalized truncated Newton algorithm and MG/OPT are very similar and, like many linesearch methods, naturally suited to convex problems, although their generalization to the nonconvex case is possible. Further motivation is also provided by the computational success of the low/high-fidelity model management techniques of Alexandrov and Lewis.
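As a concrete illustration of the "simple first approach" mentioned in the introduction above, the following hedged sketch performs grid sequencing: it solves on the coarsest grid, prolongates the solution by interpolation, and uses it as the starting point on the next finer grid. The prolongation and the inner solver here are generic placeholders and do not implement the recursive trust-region machinery of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def prolongate(x_coarse):
    """Linear interpolation from a coarse 1-D grid to one twice as fine."""
    coarse = np.linspace(0.0, 1.0, len(x_coarse))
    fine = np.linspace(0.0, 1.0, 2 * len(x_coarse) - 1)
    return np.interp(fine, coarse, x_coarse)

def grid_sequencing(objective, n_coarsest=9, n_levels=3):
    """Warm-start each discretization level with the coarser-level solution."""
    x = np.zeros(n_coarsest)
    for level in range(n_levels):
        x = minimize(objective, x, method="L-BFGS-B").x
        if level < n_levels - 1:
            x = prolongate(x)   # starting point on the next, finer grid
    return x

# Model discretized objective: a quadratic whose minimizer is grid-dependent.
obj = lambda x: 0.5 * np.sum((x - np.linspace(0.0, 1.0, len(x))) ** 2)
x_fine = grid_sequencing(obj)
```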
We develop a non-intrusive reduced basis (RB) method for parametrized steady-state partial differential equations (PDEs). The method extracts a reduced basis from a collection of high-fidelity solutions via a proper orthogonal decomposition (POD) and employs artificial neural networks (ANNs), particularly multi-layer perceptrons (MLPs), to accurately approximate the coefficients of the reduced model. The search for the optimal number of neurons and the minimum number of training samples needed to avoid overfitting is carried out in the offline phase through an automatic routine, relying on a joint use of Latin hypercube sampling (LHS) and the Levenberg-Marquardt training algorithm. This guarantees a complete offline-online decoupling, leading to an efficient RB method, referred to as POD-NN, that is suitable also for general nonlinear problems with non-affine parametric dependence. Numerical studies are presented for the nonlinear Poisson equation and for driven-cavity viscous flows, modeled through the steady incompressible Navier-Stokes equations. Both physical and geometrical parametrizations are considered. Several results confirm the accuracy of the POD-NN method and show the substantial speed-up enabled at the online stage as compared to a traditional RB strategy.
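A minimal sketch of the POD-NN pipeline under simplifying assumptions: a POD basis is extracted from snapshots via the SVD, and a regression network learns the map from parameters to reduced coefficients. Here scikit-learn's MLPRegressor with a generic optimizer stands in for the Levenberg-Marquardt training used in the paper, and the snapshot data is synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic snapshots: columns are high-fidelity solutions u(mu) per parameter mu.
rng = np.random.default_rng(0)
mus = rng.uniform(0.5, 2.0, size=(200, 1))
grid = np.linspace(0.0, 1.0, 400)
snapshots = np.array([np.sin(np.pi * m * grid) for m in mus[:, 0]]).T

# Offline: POD basis from a truncated SVD, then exact projection coefficients.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
V = U[:, :10]                  # reduced basis of dimension 10
coeffs = V.T @ snapshots       # reduced coefficients for each snapshot

# Offline: train an MLP mapping parameter mu -> reduced coefficients.
net = MLPRegressor(hidden_layer_sizes=(40, 40), max_iter=5000, random_state=0)
net.fit(mus, coeffs.T)

# Online: evaluate the reduced solution at a new parameter; no PDE solve needed.
mu_new = np.array([[1.3]])
u_rb = V @ net.predict(mu_new).ravel()
```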
We present a method to solve initial and boundary value problems using artificial neural networks. A trial solution of the differential equation is written as a sum of two parts. The first part satisfies the boundary (or initial) conditions and contains no adjustable parameters. The second part is constructed so as not to affect the boundary conditions. This part involves a feedforward neural network containing adjustable parameters (the weights). Hence, by construction, the boundary conditions are satisfied and the network is trained to satisfy the differential equation. The applicability of this approach ranges from single ODEs to systems of coupled ODEs and also to PDEs. In this article, we illustrate the method by solving a variety of model problems and present comparisons with finite elements for several cases of partial differential equations.
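For a single first-order ODE $u'(x) = f(x, u)$ on $[0, 1]$ with initial condition $u(0) = A$, for example, the trial solution takes the form

$$ u_t(x) = A + x\, N(x, p), $$

where $N(x, p)$ is the feedforward network with weights $p$: the condition $u_t(0) = A$ holds by construction for any $p$, and training reduces to minimizing the residual $\sum_j \big(u_t'(x_j) - f(x_j, u_t(x_j))\big)^2$ over a set of collocation points $x_j$.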
In this paper, we use deep feedforward artificial neural networks to approximate solutions of partial differential equations in complex geometries. We show how to modify the backpropagation algorithm to compute the partial derivatives of the network output with respect to the space variables, which is needed to approximate the differential operator. The method is based on an ansatz for the solution that requires only a feedforward neural network and an unconstrained gradient-based optimization method, such as gradient descent or a quasi-Newton method. We show an example where classical mesh-based methods cannot be used and neural networks can be seen as an attractive alternative. Finally, we highlight the benefits of deep compared to shallow neural networks, along with some other convergence-enhancing techniques.
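To illustrate the kind of derivative the modified backpropagation provides, here is a small self-contained sketch for a one-hidden-layer tanh network, where the derivative of the network output with respect to its spatial inputs is obtained exactly by the chain rule; the network and sizes are arbitrary choices for the example.

```python
import numpy as np

def net_and_grad_x(x, W, b, v):
    """N(x) = v . tanh(W x + b) and its exact gradient dN/dx, computed by
    the chain rule -- the quantity needed to assemble a PDE residual."""
    z = np.tanh(W @ x + b)
    N = v @ z
    dN_dx = (v * (1.0 - z ** 2)) @ W   # shape: (dimension of x,)
    return N, dN_dx

rng = np.random.default_rng(0)
W, b, v = rng.normal(size=(20, 2)), rng.normal(size=20), rng.normal(size=20)
N, grad_x = net_and_grad_x(np.array([0.3, 0.7]), W, b, v)
```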
Using the viewpoint of modern computational algebraic geometry, we explore properties of the optimization landscapes of deep linear neural network models. After clarifying the various definitions of "flat" minima, we show that the geometrically flat minima, which are merely artifacts of the residual continuous symmetries of deep linear networks, can be removed directly by a generalized $L_2$ regularization. We then use algebraic geometry to establish upper bounds on the number of isolated stationary points of these networks. Using these upper bounds and numerical algebraic geometry methods, we find all stationary points for networks of modest depth and matrix size. We show that, in the presence of non-zero regularization, deep linear networks indeed possess local minima that are not global minima. Our computational results shed light on certain aspects of the loss surfaces of deep linear networks and provide new insights.
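Concretely, a depth-$d$ linear network with weights $W_1, \dots, W_d$ fits data $(X, Y)$ by minimizing a loss of the form

$$ L(W_1, \dots, W_d) = \big\| W_d W_{d-1} \cdots W_1 X - Y \big\|_F^2 + \lambda \sum_{i=1}^{d} \| W_i \|_F^2, $$

a standard formulation consistent with the abstract (the paper's "generalized" regularizer may differ in detail). For $\lambda = 0$ the objective is invariant under the rescaling $W_{i+1} \mapsto c\, W_{i+1}$, $W_i \mapsto W_i / c$, which produces flat valleys of minima; any $\lambda > 0$ breaks this continuous symmetry, which is why the regularization removes the geometrically flat minima.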
In this paper, we show how to augment classical methods for inverse problems with artificial neural networks. The neural network acts as a prior for the coefficient to be estimated from noisy data. Neural networks are global, smooth function approximators, and as such they do not require explicit error regularization to recover smooth solutions and coefficients. We give detailed examples using the Poisson equation in one, two, and three space dimensions and show that the neural network augmentation is robust with respect to noisy and incomplete data, mesh, and geometry.
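In symbols, and as a hedged reading of the abstract: to recover a coefficient $a$ in, say, $-\nabla \cdot (a \nabla u) = f$, the coefficient is parametrized by a network $a_\theta(x)$ and the weights are fitted to the noisy observations,

$$ \min_\theta \; \sum_j \big| u[a_\theta](x_j) - u^{\mathrm{obs}}_j \big|^2, $$

where $u[a_\theta]$ denotes the solution of the PDE for the current coefficient; the built-in smoothness of $a_\theta$ then plays the role of the explicit regularization term a classical formulation would add.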
In this paper, we discuss policy iteration methods for the approximate solution of finite-state discounted Markov decision problems, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem and formulate a smaller "aggregate" Markov decision problem whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach, the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural-network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
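The contrast drawn in the last sentence can be written compactly: with a feature map $\phi$, neural-network-based schemes approximate the cost of a policy linearly in the features, while aggregation composes the features with the optimal cost $R$ of the aggregate problem, a general nonlinear function,

$$ \tilde J_{\mathrm{linear}}(x) = r^\top \phi(x) \qquad \text{versus} \qquad \tilde J_{\mathrm{aggregate}}(x) = R\big(\phi(x)\big). $$

(The notation here is schematic, chosen to mirror the abstract rather than the paper's precise definitions.)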
This article surveys the classical techniques of nonlinear optimal control, such as the Pontryagin Maximum Principle and conjugate point theory, and how they can be implemented numerically, with a special focus on applications to aerospace problems. In practice, the knowledge resulting from the maximum principle is often insufficient for solving the problem, in particular because of the well-known difficulty of adequately initializing the shooting method. This survey explains how the classical tools of optimal control can be combined with other mathematical techniques to improve their performance significantly and to widen their domain of application. The focus is placed on three important issues. The first is geometric optimal control, a theory that emerged in the 1980s and combines optimal control with various concepts of differential geometry, the ultimate objective being to derive optimal synthesis results for general classes of control systems. Its applicability and relevance are demonstrated on the problem of atmospheric re-entry of a space shuttle. The second is the powerful continuation, or homotopy, method, which consists of continuously deforming a problem into a simpler one and then solving a series of parametrized problems to end up with the solution of the original problem. After recalling its mathematical foundations, we show how to combine this method successfully with the shooting method on several aerospace problems, such as the orbit transfer problem. The third is the use of concepts from dynamical systems theory, which reveal properties of celestial dynamics that are of great interest for future mission design, such as low-cost interplanetary space missions. The article ends with open problems and perspectives.
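For readers unfamiliar with the shooting method whose initialization difficulty is mentioned above, here is a minimal sketch on a toy two-point boundary value problem; the aerospace problems in the survey involve far larger systems, and this example is purely illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

# Two-point BVP: u'' = -u on [0, pi/2], with u(0) = 0 and u(pi/2) = 1.
def boundary_miss(s):
    """Integrate the IVP with guessed initial slope s; return the defect."""
    s0 = float(np.ravel(s)[0])
    sol = solve_ivp(lambda t, y: [y[1], -y[0]], (0.0, np.pi / 2),
                    [0.0, s0], rtol=1e-8)
    return sol.y[0, -1] - 1.0   # u(pi/2) should equal 1

s_star = fsolve(boundary_miss, x0=0.5)[0]   # converges to s = 1 (u = sin t)
```

A continuation method in the sense of the survey would then deform an already-solved problem step by step towards a harder one, reusing the converged initial slope (here s_star) as the starting guess at every step.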
Gradient descent is commonly used to solve optimization problems arising in machine learning, such as training neural networks. Although it seems to be effective for many different neural network training problems, it is unclear whether the effectiveness of gradient descent can be explained using existing performance guarantees for the algorithm. We argue that existing analyses of gradient descent rely on assumptions that are too strong to be applicable in the case of multi-layer neural networks. To address this, we propose an algorithm, duality structure gradient descent (DSGD), that is amenable to a non-asymptotic performance analysis under mild assumptions on the training set and network architecture. The algorithm can be viewed as a form of layer-wise coordinate descent, where at each iteration the algorithm chooses one layer of the network to update. The choice of which layer to update is made in a greedy fashion, based on a rigorous lower bound on the function decrease for each possible choice of layer. In the analysis, we bound the time required to reach approximate stationary points in both the deterministic and stochastic settings. Convergence is measured in terms of a Finsler geometry that is derived from the network architecture and designed to confirm a Lipschitz-like property of the gradient of the training objective function. Numerical experiments in both the full-batch and mini-batch settings suggest that the algorithm is a promising step towards methods for training neural networks that are both rigorous and efficient.
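A hedged sketch of the greedy layer-wise idea: score each layer by a guaranteed-decrease proxy and update only the winner. The squared-gradient-norm-over-smoothness score below is the classical bound for an L-smooth block, used here as a stand-in for the paper's Finsler-geometry lower bound.

```python
import numpy as np

def greedy_layer_step(params, grads, smoothness):
    """Update only the layer promising the largest guaranteed decrease.
    For an L-smooth block, a gradient step of size 1/L decreases the
    objective by at least ||g||^2 / (2L); we use that as the score."""
    scores = [np.sum(g ** 2) / (2.0 * L) for g, L in zip(grads, smoothness)]
    k = int(np.argmax(scores))               # greedily chosen layer
    params[k] = params[k] - grads[k] / smoothness[k]
    return params, k
```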
Neural networks are increasingly used in complex (data-driven) simulations, or to accelerate the computation of classical surrogates. In many applications, physical constraints such as mass or energy conservation must be satisfied in order to obtain reliable results. However, standard machine learning algorithms are generally not tailored to respect such constraints. We propose two different strategies for generating constraint-aware neural networks. We test their performance in the context of front-capturing schemes for strongly nonlinear wave motion in compressible fluid flow. Precisely, in this context so-called Riemann problems have to be solved as surrogates; their solutions describe the local dynamics of the captured wave fronts in numerical simulations. Three model problems are considered: a cubic flux model problem, an isothermal two-phase flow model, and the Euler equations. We demonstrate that, in addition to the structural benefit of satisfying the constraints, a reduction in constraint deviation correlates with low discretization errors on all model problems.
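One generic way to make a network constraint-aware, sketched here only to fix ideas (the abstract does not spell out the paper's two strategies), is to penalize the violation of a constraint $c(\hat y) = 0$, such as a conservation law, in the training loss:

$$ \mathcal{L}(\theta) = \sum_j \big\| \hat y_\theta(x_j) - y_j \big\|^2 \;+\; \lambda \sum_j \big\| c\big(\hat y_\theta(x_j)\big) \big\|^2 . $$

Alternatively, the constraint can be enforced exactly by construction, by parametrizing the network output so that $c = 0$ holds identically.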
A data-driven approach, called CaNN (Calibration Neural Network), is proposed to calibrate financial asset price models using artificial neural networks (ANNs). Determining the optimal values of the model parameters is formulated as training hidden neurons within a machine learning framework, based on available financial option prices. The framework consists of two parts: a forward pass, in which we train the weights of the ANN off-line, valuing options under many different asset model parameter settings; and a backward pass, in which we evaluate the trained ANN solver on-line, aiming to find the weights of the neurons in the input layer. The rapid on-line learning of implied volatility by ANNs, combined with the use of an adapted parallel global optimization method, tackles the computational bottleneck and provides a fast and reliable technique for calibrating model parameters while avoiding, as much as possible, getting stuck in local minima. Numerical experiments confirm that this machine learning framework can calibrate the parameters of high-dimensional stochastic volatility models efficiently and accurately.
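A schematic of the two passes under strong simplifications: offline, a network learns the pricing map from model parameters to option prices; online, a global optimizer searches the parameter space so that the surrogate's prices match market quotes. The toy pricing map, the network, and the use of scipy's differential evolution in place of the paper's adapted parallel optimizer are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from scipy.optimize import differential_evolution

# Offline ("forward pass"): learn params -> option prices on synthetic data.
rng = np.random.default_rng(0)
params = rng.uniform([0.1, 0.1], [1.0, 2.0], size=(2000, 2))  # toy (vol, kappa)
prices = np.column_stack([np.exp(-params[:, 1]) * params[:, 0] * k
                          for k in (0.9, 1.0, 1.1)])          # toy pricing map
surrogate = MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=3000,
                         random_state=0).fit(params, prices)

# Online ("backward pass"): calibrate parameters to observed market prices.
market = np.array([0.20, 0.22, 0.25])
loss = lambda p: np.sum((surrogate.predict(p.reshape(1, -1))[0] - market) ** 2)
result = differential_evolution(loss, bounds=[(0.1, 1.0), (0.1, 2.0)], seed=0)
calibrated = result.x
```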
The computational effort of evaluating numerical simulations based on, e.g., the finite element method is high. Metamodels can be used to create low-cost alternatives. However, the number of samples required to create an adequate metamodel should be kept low, which can be achieved by using adaptive sampling techniques. This master's thesis investigates adaptive sampling techniques for the creation of metamodels with the Kriging technique, which interpolates values by a Gaussian process governed by prior covariances. A Kriging framework extended to multi-fidelity problems is presented and used to compare adaptive sampling techniques proposed in the literature on benchmark problems, as well as on an application from contact mechanics. The thesis presents, for the first time, a comprehensive comparison of a large spectrum of adaptive techniques for the Kriging framework. Furthermore, the flexibility of adaptive techniques is carried over to multi-fidelity Kriging, as well as to a Kriging model with reduced hyperparameter dimension, called partial least squares Kriging. In addition, an innovative adaptive scheme for binary classification is presented and used to identify the chaotic motion of a Duffing-type oscillator.
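A minimal adaptive-sampling loop of the kind compared in the thesis, sketched with scikit-learn's Gaussian process (Kriging) model: fit on the current samples, add the candidate with the largest predictive standard deviation, and refit. The maximum-variance infill criterion and the stand-in model below are assumptions; the thesis compares many criteria.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

f = lambda x: np.sin(3 * x) + 0.5 * x          # expensive simulation (stand-in)
X = np.array([[0.0], [0.5], [1.0]])            # initial design
candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

gp = GaussianProcessRegressor()
for _ in range(10):                            # adaptive sampling iterations
    gp.fit(X, f(X).ravel())
    _, std = gp.predict(candidates, return_std=True)
    x_new = candidates[np.argmax(std)]         # maximum-variance infill point
    X = np.vstack([X, x_new])
```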
High-dimensional PDEs have been a longstanding computational challenge. We propose to solve high-dimensional PDEs by approximating the solution with a neural network that is trained to satisfy the differential operator, the initial condition, and the boundary conditions. Our algorithm is mesh-free, which is essential since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled time and space points. The algorithm is tested on a class of high-dimensional free boundary PDEs, which we are able to solve accurately in up to 200 dimensions. It is also tested on a high-dimensional Hamilton-Jacobi-Bellman PDE and on Burgers' equation. The deep learning algorithm approximates the general solution of Burgers' equation for a continuum of different boundary conditions and physical conditions (which can be viewed as a high-dimensional space). We call the algorithm the "Deep Galerkin Method (DGM)," since it is similar in spirit to Galerkin methods, with the solution approximated by a neural network instead of a linear combination of basis functions. In addition, we prove a theorem regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs.
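Schematically, and suppressing the sampling measures, the network $f(t, x; \theta)$ is trained to minimize a residual functional of the form

$$ J(\theta) = \big\| \partial_t f + \mathcal{L} f \big\|^2_{[0,T] \times \Omega} \;+\; \big\| f - g \big\|^2_{[0,T] \times \partial\Omega} \;+\; \big\| f(0, \cdot) - u_0 \big\|^2_{\Omega}, $$

evaluated on batches of randomly sampled interior, boundary, and initial points rather than on a mesh. This is a simplified rendering of the objective; the paper's exact formulation specifies a sampling distribution for each term.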
Among optimal hierarchical algorithms for the computational solution of elliptic problems, the Fast Multipole Method (FMM) stands out for its adaptability to emerging architectures, having high arithmetic intensity, tunable accuracy, and relaxable global synchronization requirements. We demonstrate that, beyond its traditional use as a solver in problems for which explicit free-space kernel representations are available, the FMM has applicability as a preconditioner in finite domain elliptic boundary value problems, by equipping it with boundary integral capability for satisfying conditions at finite boundaries and by wrapping it in a Krylov method for extensibility to more general operators. Here, we do not discuss the well developed applications of FMM to implement matrix-vector multiplications within Krylov solvers of boundary element methods. Instead, we propose using FMM for the volume-to-volume contribution of inhomogeneous Poisson-like problems, where the boundary integral is a small part of the overall computation. Our method may be used to precondition sparse matrices arising from finite difference/element discretizations, and can handle a broader range of scientific applications. Compared with multigrid methods, it is capable of comparable algebraic convergence rates down to the truncation error of the discretized PDE, and it offers potentially superior multicore and distributed memory scalability properties on commodity architecture supercomputers. Compared with other methods exploiting the low rank character of off-diagonal blocks of the dense resolvent operator, FMM-preconditioned Krylov iteration may reduce the amount of communication because it is matrix-free and exploits the tree structure of FMM. We describe our tests in reproducible detail with freely available codes and outline directions for further extensibility.
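The "wrapping in a Krylov method" admits a compact illustration: scipy's GMRES accepts a matrix-free preconditioner as a LinearOperator, which is exactly the slot an FMM-based approximate inverse would occupy. A diagonal stand-in plays the FMM's role here, since a real FMM backend is outside the scope of a sketch.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres

# Sparse system from a 1-D Poisson finite-difference discretization.
n = 200
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Matrix-free preconditioner: any callable approximating A^{-1} fits here;
# an FMM evaluation of the free-space Green's function would take this slot.
M = LinearOperator((n, n), matvec=lambda r: r / A.diagonal())

x, info = gmres(A, b, M=M)
```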
We present a primal-dual interior-point algorithm with a filter line-search method for nonlinear programming. Local and global convergence properties of this method were analyzed in previous work. Here we provide a comprehensive description of the algorithm, including the feasibility restoration phase for the filter method, second-order corrections, and inertia correction of the KKT matrix. Heuristics are also considered that allow faster performance. This method has been implemented in the IPOPT code, which we demonstrate in a detailed numerical study based on 954 problems from the CUTEr test set. An evaluation is made of several line-search options, and a comparison is provided with two state-of-the-art interior-point codes for nonlinear programming.
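The filter mechanism at the heart of the line search can be stated briefly. With $\theta(x)$ the constraint violation and $f(x)$ the objective, a trial point $x^+$ is acceptable only if it improves sufficiently on one of the two measures relative to the current iterate and to all filter entries, e.g.

$$ \theta(x^+) \le (1 - \gamma_\theta)\, \theta(x_k) \qquad \text{or} \qquad f(x^+) \le f(x_k) - \gamma_f\, \theta(x_k), $$

for small constants $\gamma_\theta, \gamma_f \in (0, 1)$. This is a simplified statement: the paper's actual conditions include a switching rule and an Armijo test when the constraint violation is small.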
As of today, deep learning and deep architectures are becoming the best machine learning methods for many practical applications, such as reducing the dimensionality of data, image classification, speech recognition, and object segmentation. Indeed, many leading technology companies, such as Google, Microsoft, and IBM, are researching and using deep architectures in their systems to replace other traditional models. Improving the performance of these models could therefore have a strong impact on the field of machine learning. However, deep learning is a rapidly evolving research field, and many core methods and paradigms have emerged in the past few years. This thesis first serves as a short summary of deep learning, attempting to cover the most important ideas in this research area. Based on that knowledge, we propose and carry out several experiments to investigate the possibility of improving deep learning using automatic programming (ADATE). Although our experiments did produce good results, there are many more possibilities that we could not try, owing to limited time and to limitations of the current version of ADATE. I hope this thesis can promote future work on this topic, especially with the next version of ADATE. The thesis also includes a short analysis of the capabilities of the ADATE system, which should be useful for other researchers who want to understand what it can do.
Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
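In the spirit of these recommendations, here is a sketch of one common practice associated with this line of work: sampling hyper-parameter configurations at random, with scale-like quantities such as the learning rate drawn on a log scale. The particular search space below is illustrative, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparameters():
    """Draw one random configuration; log-uniform for scale-like parameters."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),   # log-uniform in [1e-4, 1e-1]
        "minibatch_size": int(rng.choice([32, 64, 128, 256])),
        "n_hidden": int(rng.integers(100, 1000)),
    }

trials = [sample_hyperparameters() for _ in range(20)]
# Each configuration would be trained and scored on a validation set;
# the best-scoring one is kept.
```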