Deep Learning methods are currently the state-of-the-art machine learning approach in many practical applications, such as reducing the dimensionality of data, image classification, speech recognition, and object segmentation. Indeed, many leading technology companies, such as Google, Microsoft, and IBM, are researching deep architectures and using them in their systems to replace other traditional models. Improving the performance of these models can therefore have a strong impact on the machine learning field. Deep learning, however, is a fast-moving research area in which many core methods and paradigms have emerged over the last few years. This thesis first serves as a short summary of deep learning that tries to cover the most important ideas in this field of research. Building on that knowledge, we propose and run a number of experiments to investigate the possibility of improving deep learning with automatic programming (ADATE). Although our experiments did produce good results, there are many more possibilities that we could not try because of limited time and the limitations of the current version of ADATE. We hope this thesis will encourage future work on the topic, especially with the next version of ADATE. The thesis also briefly analyses the capabilities of the ADATE system, which should be useful for other researchers who want to understand what it can do.
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
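As a concrete point of reference for the stochastic gradient (SG) iteration the paper analyzes, here is a minimal sketch; the least-squares objective, the sampling scheme, and the fixed step size are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

def sgd(w0, grad_fn, data, lr=0.01, epochs=10, rng=None):
    """Plain stochastic gradient iteration: w <- w - lr * grad_i(w),
    where grad_i is the gradient on a single randomly drawn sample."""
    rng = np.random.default_rng(0) if rng is None else rng
    w = w0.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            w -= lr * grad_fn(w, data[i])
    return w

# Toy least-squares example: each sample is (x, y), loss_i = 0.5 * (w.x - y)^2.
X = np.random.randn(200, 3)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
samples = list(zip(X, y))
grad = lambda w, s: (w @ s[0] - s[1]) * s[0]
w_hat = sgd(np.zeros(3), grad, samples)
```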
Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
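To make the kind of hyper-parameters such a guide covers concrete, here is a hedged sketch of a training configuration for back-propagated, gradient-based learning; the names and values are common illustrative starting points, not settings quoted from the chapter.

```python
# Illustrative hyper-parameter settings for gradient-based training of a
# multi-layer network; values are common starting points, not prescriptions.
hyperparams = {
    "initial_learning_rate": 0.01,   # often the single most sensitive knob
    "learning_rate_schedule": "1/t decay",  # e.g. lr_t = lr_0 / (1 + t / tau)
    "minibatch_size": 32,            # trades gradient variance for throughput
    "momentum": 0.9,                 # classical or Nesterov momentum coefficient
    "num_epochs": 100,               # usually cut short by early stopping
    "num_hidden_units": 512,         # per layer; larger with regularization
    "weight_init": "uniform(-1/sqrt(fan_in), 1/sqrt(fan_in))",
    "l2_weight_decay": 1e-4,
    "early_stopping_patience": 10,   # epochs without validation improvement
}
```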
Over the past several years, neural networks have re-emerged as powerful machine learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models have also started to be applied to textual natural language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks and recursive networks, as well as the computation graph abstraction for automatic gradient computation.
Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey the advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of statistical techniques for automatically detecting and exploiting patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features. With large amounts of training data, ML can discover models that describe complex acoustic phenomena such as human speech and reverberation. ML in acoustics is developing rapidly, with compelling results and significant promise for the future. We first introduce ML, then highlight ML developments in five acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, seismic exploration, and environmental sounds in everyday scenes.
Deep learning (DL) is a high-dimensional data-reduction technique for constructing high-dimensional predictors in input-output models. DL is a form of machine learning that uses hierarchical layers of latent features. In this article, we review the state of the art in deep learning from a modeling and algorithmic perspective. We provide a list of successful application areas in artificial intelligence (AI), image processing, robotics, and automation. Deep learning is predictive in nature rather than inferential and can be viewed as a black-box methodology for high-dimensional function estimation.
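The "hierarchical layers of latent features" can be summarized by the usual layer-composition formula; the notation below is a generic sketch, not the paper's own.

```latex
\hat{y}(x) \;=\; f_L\bigl(f_{L-1}(\cdots f_1(x)\cdots)\bigr),
\qquad
f_\ell(z) \;=\; \sigma_\ell\!\left(W_\ell z + b_\ell\right),
```

where each layer applies an affine map followed by a nonlinearity, and the composite map acts as the high-dimensional predictor that the paper treats as a black box.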
Deep neural networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge, with techniques ranging from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches to its parallelization. We present trends in DNN architectures and the resulting implications for parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on these approaches, we extrapolate potential directions for parallelism in deep learning.
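As a minimal illustration of the synchronous data-parallel training pattern that such surveys cover, the sketch below splits a batch across simulated workers and averages their gradients; the single-process simulation and the all-reduce-by-averaging step are assumptions made to keep the example self-contained.

```python
import numpy as np

def data_parallel_step(w, grad_fn, batch, num_workers=4, lr=0.1):
    """Synchronous data parallelism: split the batch into shards, let each
    (simulated) worker compute a local gradient, then all-reduce by averaging."""
    shards = np.array_split(batch, num_workers)
    local_grads = [grad_fn(w, shard) for shard in shards]  # run in parallel in practice
    g = np.mean(local_grads, axis=0)                       # all-reduce (average)
    return w - lr * g

# Toy usage with a least-squares gradient computed on shards of (x, y) rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3)); y = X @ np.array([1.0, -1.0, 2.0])
batch = np.hstack([X, y[:, None]])
grad = lambda w, s: s[:, :3].T @ (s[:, :3] @ w - s[:, 3]) / len(s)
w = np.zeros(3)
for _ in range(200):
    w = data_parallel_step(w, grad, batch)
```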
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to a greater or lesser extent, the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms that implement such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and about the geometrical connections between representation learning, density estimation, and manifold learning.
The goal of this paper is to study why stochastic gradient descent (SGD) is effective for neural networks, and how neural network design affects SGD. In particular, we investigate how over-parameterization, an increase in the number of parameters beyond the number of training samples, affects the dynamics of SGD. We introduce a simple concept called gradient confusion. When confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence; but when gradient confusion is low, we show that SGD has better convergence properties than predicted by classical theory. Using theoretical and experimental results, we study how over-parameterization affects gradient confusion, and thus the convergence of SGD, on linear models and neural networks. We show that increasing the number of parameters of linear models, or increasing the width of neural networks, leads to lower gradient confusion and thus faster, easier model training. In contrast, over-parameterizing by increasing the depth of neural networks results in higher gradient confusion, making deeper models harder to train. Finally, we observe empirically that techniques such as batch normalization and skip connections reduce gradient confusion, which helps with the trainability of deep networks.
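For reference, gradient confusion is typically formalized through pairwise inner products of per-sample gradients; the bound below is a sketch of that definition in generic notation (our symbols, not necessarily the paper's): a set of per-sample gradients g_1(w), ..., g_n(w) has gradient confusion bound eta >= 0 at w if

```latex
\langle g_i(w),\, g_j(w) \rangle \;\ge\; -\eta \qquad \text{for all } i \neq j .
```

Small eta means the sample gradients rarely conflict, which is the regime in which SGD converges faster than the classical worst-case analysis predicts.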
In this invited paper, my overview material on the same topic as presented in the plenary overview session of APSIPA-2011 and the tutorial material presented in the same conference [1] are expanded and updated to include more recent developments in deep learning. The previous and the updated materials cover both theory and applications, and analyze its future directions. The goal of this tutorial survey is to introduce the emerging area of deep learning or hierarchical learning to the APSIPA community. Deep learning refers to a class of machine learning techniques, developed largely since 2006, where many stages of non-linear information processing in hierarchical architectures are exploited for pattern classification and for feature learning. In the more recent literature, it is also connected to representation learning, which involves a hierarchy of features or concepts where higher-level concepts are defined from lower-level ones and where the same lower-level concepts help to define higher-level ones. In this tutorial survey, a brief history of deep learning research is discussed first. Then, a classificatory scheme is developed to analyze and summarize major work reported in the recent deep learning literature. Using this scheme, I provide a taxonomy-oriented survey on the existing deep architectures and algorithms in the literature, and categorize them into three classes: generative, discriminative, and hybrid. Three representative deep architectures, namely deep autoencoders, deep stacking networks with their generalization to the temporal domain (recurrent networks), and deep neural networks (pretrained with deep belief networks), one in each of the three classes, are presented in more detail. Next, selected applications of deep learning are reviewed in broad areas of signal and information processing including audio/speech, image/vision, multimodality, language modeling, natural language processing, and information retrieval. Finally, future directions of deep learning are discussed and analyzed.
The purpose of this paper is to provide a pedagogical introduction to the main concepts behind training deep neural networks with gradient descent, a process known as backpropagation. Although we focus on a very influential architecture called the convolutional neural network (CNN), the approach is generic to the machine learning community and useful more broadly. Motivated by the observation that derivations of backpropagation are often obscured by clumsy, index-heavy narratives that can appear somewhat "mathemagical", our goal is to provide a conceptually clear, vectorized description that articulates the higher-level logic well. Following the principle that "writing is nature's way of letting you know how sloppy your thinking is", we try to make the calculations meticulous, self-contained, and as intuitive as possible. Nothing is taken for granted, ample illustrations serve as visual guides, and an extensive bibliography is provided for further exploration. (For clarity, long mathematical derivations and visualizations have been split into a short "summary view" and a longer "detailed view", encoded into the PDF as optional content groups; some figures contain animations designed to illustrate important concepts in a more engaging style. For these reasons, we recommend downloading the document locally and opening it with Adobe Acrobat Reader. Other viewers have not been tested and may not render the detailed views and animations correctly.)
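In the spirit of the tutorial, here is a minimal, vectorized forward/backward pass; to keep it short it uses a two-layer fully-connected network with a squared-error loss rather than a full CNN, which is a simplifying assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))          # batch of inputs
Y = rng.normal(size=(64, 3))           # targets
W1, b1 = rng.normal(size=(10, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 3)) * 0.1, np.zeros(3)

for step in range(100):
    # Forward pass (vectorized over the whole batch).
    Z1 = X @ W1 + b1
    A1 = np.maximum(Z1, 0.0)           # ReLU
    Yhat = A1 @ W2 + b2
    loss = 0.5 * np.mean(np.sum((Yhat - Y) ** 2, axis=1))

    # Backward pass: propagate dL/d(output) back through each layer.
    dYhat = (Yhat - Y) / len(X)
    dW2, db2 = A1.T @ dYhat, dYhat.sum(axis=0)
    dA1 = dYhat @ W2.T
    dZ1 = dA1 * (Z1 > 0)               # ReLU gradient
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

    # Gradient-descent update.
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.1 * g
```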
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although domain knowledge can be used to help design representations, learning can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, manifold learning, and deep learning. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to non-parametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex high-dimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning) and semi-supervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in which lower-level features or concepts are progressively combined into more abstract and higher-level representations. We argue that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
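The "one large layer of simple template matchers followed by a single layer of trainable coefficients" is just the standard kernel-machine predictor; in generic notation (our symbols),

```latex
f(x) \;=\; b + \sum_{i=1}^{n} \alpha_i \, K(x, x_i),
```

where each training example x_i acts as a template, K(x, x_i) plays the role of the template matcher, and only the coefficients and the bias are trained. With a local kernel such as the Gaussian kernel, the predictor behaves like a smoothed nearest-neighbour rule, which is the setting in which the paper's curse-of-dimensionality argument applies.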
Technological development aims at producing generations of increasingly efficient robots capable of performing complex tasks. This requires a considerable effort from the scientific community to find new algorithms that solve computer vision problems such as object recognition. The spread of RGB-D cameras has steered research towards new architectures able to exploit both RGB and depth information. The project developed in this thesis concerns the implementation of a new end-to-end architecture for RGB-D object recognition called RCFusion. Our method generates compact and highly discriminative multi-modal features by combining complementary RGB and depth information representing different levels of abstraction. We evaluate our method on standard object recognition datasets, the RGB-D Object Dataset and JHUIT-50. The experiments show that our method outperforms existing approaches and establishes new state-of-the-art results on both datasets.
Recently there has been a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for representation learning and classification. However, the mathematical reasons for this success remain elusive. This tutorial will review recent work that aims to provide a mathematical justification for several properties of deep networks, such as global optimality, geometric stability, and invariance of the learned representations.
The standard probabilistic perspective on machine learning gives rise to empirical risk-minimization tasks that are frequently solved by stochastic gradient descent (SGD) and its variants. We formulate these tasks as classical inverse or filtering problems and, furthermore, we propose an efficient, gradient-free algorithm for finding solutions to these problems using ensemble Kalman inversion (EKI). Applications of our approach include offline and online supervised learning with deep neural networks, as well as graph-based semi-supervised learning. The essence of the EKI procedure is an ensemble-based approximate gradient descent in which derivatives are replaced by differences within the ensemble. We suggest several modifications to the basic method, derived from empirically successful heuristics developed in the context of SGD. Numerical results demonstrate the wide applicability and robustness of the proposed algorithm.
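A minimal numpy sketch of one EKI step for a generic forward map G and data y; the particular variant shown (perturbed observations and empirical cross-covariances) is one standard formulation and is an assumption here, not necessarily the exact scheme used in the paper.

```python
import numpy as np

def eki_step(U, G, y, Gamma, rng):
    """One EKI update for an ensemble U of shape (J, d_u).
    Derivatives are replaced by differences within the ensemble."""
    J = U.shape[0]
    P = np.array([G(u) for u in U])            # forward-map evaluations, (J, d_y)
    U_mean, P_mean = U.mean(axis=0), P.mean(axis=0)
    C_up = (U - U_mean).T @ (P - P_mean) / J   # cross-covariance (d_u, d_y)
    C_pp = (P - P_mean).T @ (P - P_mean) / J   # output covariance (d_y, d_y)
    K = C_up @ np.linalg.inv(C_pp + Gamma)     # Kalman-like gain
    # Perturbed observations: each member sees a noisy copy of the data.
    Y = y + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=J)
    return U + (Y - P) @ K.T

# Toy usage: recover w in a linear model y = A w + noise, gradient-free.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5)); w_true = rng.normal(size=5)
Gamma = 0.01 * np.eye(20)
y = A @ w_true + rng.multivariate_normal(np.zeros(20), Gamma)
U = rng.normal(size=(50, 5))                   # initial ensemble of 50 members
for _ in range(30):
    U = eki_step(U, lambda u: A @ u, y, Gamma, rng)
w_hat = U.mean(axis=0)
```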
Model reduction of high-dimensional dynamical systems alleviates the computational burden faced in tasks ranging from design optimization to model predictive control. One popular model-reduction approach is based on projecting the governing equations onto a subspace spanned by basis functions obtained from the compression of a dataset of solution snapshots. However, this method is intrusive, since the projection requires access to the system operators. Furthermore, some systems may require special treatment of nonlinearities to ensure computational efficiency, or additional modeling to preserve stability. In this work we propose a deep-learning-based strategy for nonlinear model reduction that is inspired by projection-based model reduction, the idea being to identify some optimal low-dimensional representation and evolve it in time. Our approach constructs a modular model consisting of a deep convolutional autoencoder and a modified LSTM network. The deep convolutional autoencoder returns a low-dimensional representation in terms of coordinates on some expressive nonlinear manifold supporting the data. The dynamics on this manifold are then modeled by the modified LSTM network in a computationally efficient manner. An offline, unsupervised training strategy that exploits the modularity of the model is also developed. We demonstrate our model on three illustrative examples, each highlighting the model's performance in prediction tasks for fluid systems with large parameter variations and its stability in long-term prediction.
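A compact sketch of the kind of modular model described above, assuming PyTorch, a 1D snapshot layout, and a plain LSTM in place of the modified LSTM; the layer sizes and the next-step prediction target are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvAutoencoderROM(nn.Module):
    """Encoder compresses each snapshot to a low-dimensional latent state,
    an LSTM evolves the latent trajectory in time, the decoder reconstructs
    full snapshots (e.g., trained to predict the next snapshot)."""
    def __init__(self, n_grid=256, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (n_grid // 4), latent_dim),
        )
        self.dynamics = nn.LSTM(latent_dim, 64, batch_first=True)
        self.to_latent = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * (n_grid // 4)), nn.ReLU(),
            nn.Unflatten(1, (32, n_grid // 4)),
            nn.ConvTranspose1d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, snapshots):
        # snapshots: (batch, time, n_grid) -> latent trajectory -> decoded snapshots
        b, t, n = snapshots.shape
        z = self.encoder(snapshots.reshape(b * t, 1, n)).reshape(b, t, -1)
        h, _ = self.dynamics(z)                  # evolve latent coordinates in time
        z_next = self.to_latent(h)
        return self.decoder(z_next.reshape(b * t, -1)).reshape(b, t, n)

# Usage: map a batch of trajectories to predicted snapshots of the same shape.
model = ConvAutoencoderROM()
u = torch.randn(4, 20, 256)                      # 4 trajectories, 20 snapshots each
u_pred = model(u)
```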
We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal to each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space implicitly induced by a MASO links DNs directly to the theory of vector quantization (VQ) and K-means clustering, which opens up new geometric avenues to study how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop and validate a new signal and image distance metric that quantifies the difference between their VQ encodings. (This paper is a significantly expanded version of the spline theory of deep learning paper presented at ICML 2018.)
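For reference, a max-affine spline operator computes each output coordinate as a maximum over a small set of affine functions of the input, so that, conditioned on the input, the whole operator collapses to a single affine map; the notation below is a generic sketch (our symbols, not necessarily the paper's).

```latex
[\operatorname{MASO}(x)]_k \;=\; \max_{r=1,\dots,R} \bigl( \langle a_{k,r},\, x \rangle + b_{k,r} \bigr)
\quad\Longrightarrow\quad
\operatorname{MASO}(x) \;=\; A[x]\, x + b[x],
```

where A[x] and b[x] collect, for each output coordinate, the affine piece that attains the maximum for that particular input; a ReLU unit is the special case R = 2 with one of the two affine pieces fixed to zero.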