In this invited paper, my overview material on the same topic as presented in the plenary overview session of APSIPA-2011 and the tutorial material presented in the same conference [1] are expanded and updated to include more recent developments in deep learning. The previous and the updated materials cover both theory and applications, and analyze the field's future directions. The goal of this tutorial survey is to introduce the emerging area of deep learning or hierarchical learning to the APSIPA community. Deep learning refers to a class of machine learning techniques, developed largely since 2006, where many stages of non-linear information processing in hierarchical architectures are exploited for pattern classification and for feature learning. In the more recent literature, it is also connected to representation learning, which involves a hierarchy of features or concepts where higher-level concepts are defined from lower-level ones and where the same lower-level concepts help to define higher-level ones. In this tutorial survey, a brief history of deep learning research is discussed first. Then, a classificatory scheme is developed to analyze and summarize major work reported in the recent deep learning literature. Using this scheme, I provide a taxonomy-oriented survey on the existing deep architectures and algorithms in the literature, and categorize them into three classes: generative, discriminative, and hybrid. Three representative deep architectures, one from each of the three classes, are presented in more detail: deep autoencoders; deep stacking networks, with their generalization to the temporal domain (recurrent networks); and deep neural networks pretrained with deep belief networks. Next, selected applications of deep learning are reviewed in broad areas of signal and information processing including audio/speech, image/vision, multimodality, language modeling, natural language processing, and information retrieval. Finally, future directions of deep learning are discussed and analyzed.
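As a concrete illustration of one of the three architectures named above, the following is a minimal sketch of a deep autoencoder in PyTorch; the layer sizes, activations, and training objective are illustrative assumptions, not the configuration described in the survey.

```python
# Minimal deep autoencoder sketch (illustrative; layer sizes are assumptions).
import torch
import torch.nn as nn

class DeepAutoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=30):
        super().__init__()
        # Encoder: several nonlinear stages map the input to a compact code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 500), nn.Sigmoid(),
            nn.Linear(500, 250), nn.Sigmoid(),
            nn.Linear(250, code_dim),
        )
        # Decoder: mirror-image stages reconstruct the input from the code.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 250), nn.Sigmoid(),
            nn.Linear(250, 500), nn.Sigmoid(),
            nn.Linear(500, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DeepAutoencoder()
x = torch.rand(16, 784)                      # a dummy mini-batch
loss = nn.functional.mse_loss(model(x), x)   # reconstruction objective
loss.backward()
```

In practice such a network would typically be pretrained (e.g., layer by layer) before fine-tuning on the reconstruction objective; the sketch shows only the forward pass and loss.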
In the era of the Internet of Things (IoT), an enormous number of sensing devices collect and/or generate various sensory data over time for a wide range of fields and applications. Depending on the nature of the application, these devices produce big or fast/real-time data streams. Applying analytics over such data streams to discover new information, predict future insights, and make control decisions is a crucial process that makes IoT a worthy paradigm for businesses and a quality-of-life-improving technology. In this paper, we provide a thorough overview of using a class of advanced machine learning techniques, namely Deep Learning (DL), to facilitate the analytics and learning in the IoT domain. We start by articulating IoT data characteristics and identifying two major treatments for IoT data from a machine learning perspective, namely IoT big data analytics and IoT streaming data analytics. We also discuss why DL is a promising approach to achieve the desired analytics in these types of data and applications. The potential of using emerging DL techniques for IoT data analytics is then discussed, and its promises and challenges are introduced. We present a comprehensive background on different DL architectures and algorithms. We also analyze and summarize major reported research attempts that have leveraged DL in the IoT domain. Smart IoT devices that incorporate DL into their intelligence are also discussed, as are DL implementation approaches on fog and cloud centers in support of IoT applications. Finally, we shed light on some challenges and potential directions for future research. At the end of each section, we highlight the lessons learned based on our experiments and our review of the recent literature.
Currently, network traffic control systems are mainly composed of the Internet core and wired/wireless heterogeneous backbone networks. Recently, these packet-switched systems have been experiencing explosive network traffic growth due to the rapid development of communication technologies. The existing network policies are not sophisticated enough to cope with the continually varying network conditions arising from this tremendous traffic growth. Deep learning, with its recent breakthroughs in the machine learning/intelligence area, appears to be a viable approach for network operators to configure and manage their networks in a more intelligent and autonomous fashion. While deep learning has received significant research attention in a number of other domains such as computer vision, speech recognition, and robotics, its applications in network traffic control systems are relatively recent and have garnered rather little attention. In this paper, we address this point and indicate the necessity of surveying the scattered works on deep learning applications for various network traffic control aspects. In this vein, we provide an overview of the state-of-the-art deep learning architectures and algorithms relevant to network traffic control systems. We also discuss the deep learning enablers for network systems. In addition, we discuss, in detail, a new use case, i.e., deep learning based intelligent routing. We demonstrate the effectiveness of the deep learning based routing approach in contrast with the conventional routing strategy. Furthermore, we discuss a number of open research issues which researchers may find useful in the future. Index Terms: Machine learning, machine intelligence, artificial neural network, deep learning, deep belief system, network traffic control, routing.
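To make the routing use case more tangible, the toy sketch below assumes a plain feed-forward network trained, in a supervised fashion, to imitate next-hop decisions produced by a conventional routing strategy; the inputs, network sizes, and loss are assumptions for illustration and differ from the deep-belief-based design discussed in the paper.

```python
# Toy sketch of supervised, DL-based next-hop prediction (all details assumed):
# the input is a vector of recent per-node traffic counts, the output is a
# distribution over candidate next hops for the current node.
import torch
import torch.nn as nn

n_nodes, n_next_hops = 16, 4
net = nn.Sequential(
    nn.Linear(n_nodes, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, n_next_hops),              # logits over candidate next hops
)

traffic = torch.rand(32, n_nodes)                  # dummy traffic patterns
labels = torch.randint(0, n_next_hops, (32,))      # e.g., hops chosen by a conventional router
loss = nn.functional.cross_entropy(net(traffic), labels)
loss.backward()                                    # train to imitate, then refine
```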
Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application to rigorously test the effectiveness of a given technique, and to inspire new problems arising from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is available commercially for some applications, it remains largely an unsolved problem: for almost all applications, the performance of ASR is not on par with human performance. New insight from modern ML methodology shows great promise to advance the state of the art in ASR technology. This overview article provides readers with an overview of modern ML techniques as utilized in current ASR research and systems, and as relevant to future ASR research. The intent is to foster further cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either popular already or have potential for making significant contributions to ASR technology. The paradigms presented and elaborated in this overview include: generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. We finally present and analyze recent developments in deep learning and learning with sparse representations, focusing on their direct relevance to advancing ASR technology.
Deep learning is currently an extremely active research area in the machine learning and pattern recognition community. It has achieved huge success in a broad range of applications such as speech recognition, computer vision, and natural language processing. With the sheer size of data available today, big data brings big opportunities and transformative potential for various sectors; on the other hand, it also presents unprecedented challenges to harnessing data and information. As data keep getting bigger, deep learning is coming to play a key role in providing big data predictive analytics solutions. In this paper, we provide a brief overview of deep learning and highlight current research efforts, the challenges posed by big data, and future trends. Index Terms: Classifier design and evaluation, feature representation, machine learning, neural network models, parallel processing.
Interpreting medical images from high-dimensional and heterogeneous data for the diagnosis and treatment of complex diseases remains a key challenge in improving healthcare. Over the past few years, both supervised and unsupervised deep learning have achieved promising results in medical imaging and image analysis. Unlike supervised learning, which is biased toward the supervision and manual effort required to create class labels for the algorithms, unsupervised learning derives insights directly from the data itself, groups the data, and helps make data-driven decisions without any external bias. This review systematically presents various unsupervised models applied to medical image analysis, including autoencoders and their several variants, restricted Boltzmann machines, deep belief networks, deep Boltzmann machines, and generative adversarial networks. Future research opportunities and challenges of unsupervised techniques for medical image analysis are also discussed.
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
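The hybrid scheme described above can be sketched as follows; the dimensions (11 stacked frames of 40 coefficients, 2,000 HMM states) and the sigmoid hidden layers are assumed for illustration and do not correspond to any particular system in the article.

```python
# Sketch of a hybrid DNN-HMM acoustic model (dimensions are assumptions):
# a window of acoustic frames goes in, posterior probabilities over HMM
# states come out; these are later converted to scaled likelihoods.
import torch
import torch.nn as nn

n_frames, n_coeffs, n_states = 11, 40, 2000
dnn = nn.Sequential(
    nn.Linear(n_frames * n_coeffs, 1024), nn.Sigmoid(),
    nn.Linear(1024, 1024), nn.Sigmoid(),
    nn.Linear(1024, 1024), nn.Sigmoid(),
    nn.Linear(1024, n_states),
)

window = torch.rand(8, n_frames * n_coeffs)            # 8 dummy frame windows
state_posteriors = torch.softmax(dnn(window), dim=-1)  # P(state | acoustics)
```

At decoding time, these posteriors are typically divided by the state priors to obtain scaled likelihoods for the HMM.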
Deep-learning (DL) algorithms, which learn the representative and discriminative features in a hierarchical manner from the data, have recently become a hotspot in the machine-learning area and have been introduced into the geoscience and remote sensing (RS) community for RS big data analysis. Considering the low-level features (e.g., spectral and texture) as the bottom level, the output feature representation from the top level of the network can be directly fed into a subsequent classifier for pixel-based classification. As a matter of fact, by carefully addressing the practical demands in RS applications and designing the input-output levels of the whole network, we have found that DL is actually everywhere in RS data analysis: from the traditional topics of image preprocessing, pixel-based classification, and target recognition, to the recent challenging tasks of high-level semantic feature extraction and RS scene understanding. In this technical tutorial, a general framework of DL for RS data is provided, and the state-of-the-art DL methods in RS are regarded as special cases of input-output data combined with various deep networks and tuning tricks. Although extensive experimental results confirm the excellent performance of the DL-based algorithms in RS big data analysis, even more exciting prospects can be expected for DL in RS. Key bottlenecks and potential directions are also discussed.
Acoustic data provide scientific and engineering insight in fields ranging from biology and communications to ocean and Earth science. We survey the advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of statistical techniques for automatically detecting and exploiting patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features. With large volumes of training data, ML can discover models that describe complex acoustic phenomena such as human speech and reverberation. ML in acoustics is developing rapidly, with compelling results and significant future promise. We first introduce ML, then highlight ML developments in five acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, seismic exploration, and environmental sounds in everyday scenes.
Eliminating the negative effect of non-stationary environmental noise is a foundational research topic in automatic speech recognition and still remains an important challenge. Data-driven supervised approaches, including those based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches, and with sufficient training they can alleviate the shortcomings of the unsupervised methods in various real-world acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in developing environmentally robust speech recognition systems. We separately discuss single-channel and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.
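As one representative single-channel front-end technique of the kind surveyed, the sketch below assumes a recurrent network that estimates a time-frequency mask applied to the noisy magnitude spectrogram; the architecture and sizes are illustrative assumptions, not a specific system from the review.

```python
# Sketch of a single-channel masking front-end (an assumed, generic design):
# an LSTM estimates a time-frequency mask that is applied to the noisy
# magnitude spectrogram to produce an enhanced magnitude.
import torch
import torch.nn as nn

n_freq = 257                                 # e.g., 512-point STFT bins

class MaskEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, 256, batch_first=True)
        self.proj = nn.Linear(256, n_freq)

    def forward(self, noisy_mag):            # (batch, time, freq)
        h, _ = self.lstm(noisy_mag)
        mask = torch.sigmoid(self.proj(h))   # mask values in [0, 1]
        return mask * noisy_mag              # enhanced magnitude spectrogram

noisy = torch.rand(2, 100, n_freq)           # dummy noisy spectrograms
enhanced = MaskEstimator()(noisy)
```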
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although domain knowledge can be used to help design representations, learning can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Deep learning algorithms are a subset of machine learning algorithms which aim at discovering multiple levels of distributed representations. Recently, numerous deep learning algorithms have been proposed to solve traditional artificial intelligence problems. This chapter aims to review the state of the art in deep learning algorithms for computer vision by highlighting the contributions and challenges from about 200 recent research papers. It first gives an overview of various deep learning approaches and their recent developments, and then briefly describes their applications in diverse vision tasks, such as image classification, object detection, image retrieval, semantic segmentation, and human pose estimation. Finally, the chapter summarizes the future trends and challenges in designing and training deep neural networks.

2. A COMPREHENSIVE REVIEW OF DEEP LEARNING METHODS AND APPLICATIONS

2.1 Introduction

Deep learning is a subfield of machine learning which attempts to learn high-level abstractions in data by utilizing hierarchical architectures. It is an emerging approach and has been widely applied in traditional artificial intelligence domains, such as semantic parsing [11], natural language processing [12], computer vision [13, 14], and many more. There are three main reasons for the booming of deep learning today: the dramatically increased chip processing abilities (e.g., GPU units), the significantly lowered cost of computing hardware, and the considerable advances in machine learning algorithms [15]. Deep learning approaches have been extensively reviewed and discussed in recent years [15-19]. Among those, Schmidhuber et al. [17] emphasized the important inspirations and technical contributions in a historical timeline format, while Bengio [18] examined the challenges of deep learning research and proposed a few forward-looking research directions. Deep networks have been shown to be successful for computer vision tasks because they can extract appropriate features while jointly performing discrimination [15, 20]. In recent ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions [21], deep learning methods have been widely utilized by researchers and have achieved top accuracy scores. This chapter is intended to be useful to general neural computing, computer vision, and multimedia researchers who are interested in state-of-the-art deep learning studies for computer vision. It provides an overview of various deep learning algorithms and their applications, especially those that can be applied in the computer vision domain. The remainder of this chapter is organized as follows: In Section 2.2, we divide the deep learning algorithms into four categories: Convolutional Neural Networks, Restricted Boltzmann Machines, Autoencoder, and Sparse Coding. Some well-known models in these categories, as well as their developments, are listed. We also describe the contributions and limitations of these models in this section. In Section 2.
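A minimal example of the joint feature extraction and discrimination property noted above is sketched below; the architecture is an illustrative assumption chosen for this summary, not a model from the chapter.

```python
# Minimal CNN classifier sketch (an illustrative assumption): convolution and
# pooling stages extract features, and a final linear layer performs the
# discrimination, so features and classifier are learned jointly.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),               # 10-way classification head
)

images = torch.rand(4, 3, 32, 32)            # dummy 32x32 RGB images
logits = cnn(images)
```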
By now, deep learning and deep architectures are emerging as the best machine learning methods for many practical applications, such as reducing the dimensionality of data, image classification, speech recognition, and object segmentation. In fact, many leading technology companies, such as Google, Microsoft, and IBM, are researching and using deep architectures in their systems to replace other traditional models. Improving the performance of these models can therefore have a strong impact on the machine learning field. However, deep learning is a fast-growing research area, and many core methods and paradigms have been discovered over the past few years. This paper first serves as a brief summary of deep learning, attempting to include all of the most important ideas in this research field. Based on that knowledge, we propose and carry out several experiments to investigate the possibility of improving deep learning with automatic programming (ADATE). Although our experiments did produce good results, there are many more possibilities that we could not try due to limited time and the limitations of the current version of ADATE. I hope this paper can promote future work on this topic, especially with the next version of ADATE. The paper also briefly analyzes the capabilities of the ADATE system, which can be very useful for other researchers who want to become familiar with it.
This paper reviews recent results in audiovisual fusion and discusses the main challenges in the area, with a focus on desynchronization of the two modalities and the issue of training and testing when one of the modalities might be absent at test time.

In this paper, we review recent results on audiovisual (AV) fusion. We also discuss some of the challenges and report on approaches to address them. One important issue in AV fusion is how the modalities interact and influence each other. This review will address this question in the context of AV speech processing, and especially speech recognition, where one of the issues is that the modalities both interact but also sometimes appear to desynchronize from each other. An additional issue that sometimes arises is that one of the modalities may be missing at test time, although it is available at training time; for example, it may be possible to collect AV training data while only having access to audio at test time. We will review approaches to address this issue from the area of multiview learning, where the goal is to learn a model or representation for each of the modalities separately while taking advantage of the rich multimodal training data. In addition to multiview learning, we also discuss the recent application of deep learning (DL) toward AV fusion. We finally draw conclusions and offer our assessment of the future in the area of AV fusion.
Automatic human affect recognition is a key step toward more natural human-machine interaction. Recent trends include recognition in the wild using a fusion of audiovisual and physiological sensors, a challenging setting for conventional machine learning algorithms. Since 2010, novel deep learning algorithms have been increasingly applied in this field. In this paper, the literature on human affect recognition between 2010 and 2017 is analyzed, with a special focus on approaches using deep neural networks. By classifying 950 studies according to their usage of shallow or deep architectures, we are able to show a trend toward deep learning. Reviewing a subset of 233 studies that employ deep neural networks, we comprehensively quantify their applications in this field. We find that deep learning is used for learning (i) spatial feature representations, (ii) temporal feature representations, and (iii) joint feature representations of multimodal sensor data. Exemplary state-of-the-art architectures illustrate the progress. Our findings show the role that deep architectures will play in human affect recognition and can serve as a reference point for researchers working on related applications.
A deep convolutional neural network (CNN) is a special type of neural network that has demonstrated state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved by using multiple non-linear feature extraction stages that can automatically learn hierarchical representations from data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and very interesting deep CNN architectures have recently been reported. The recent race to achieve high performance on challenging benchmarks with deep CNN architectures shows that innovative architectural ideas, together with parameter optimization, can improve CNN performance on a variety of vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and restructuring of the processing units. However, the major improvements in representational capacity have been achieved by restructuring the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey therefore focuses on the intrinsic taxonomy present in recently reported CNN architectures and classifies recent innovations in CNN architectures into seven different categories, based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. It also covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.
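The "block as a structural unit" idea mentioned above can be illustrated with a residual block, sketched below; the example is an illustration chosen here, not a design prescribed by the survey.

```python
# Sketch of a residual block: a reusable structural unit that is stacked to
# form a deep CNN, rather than stacking individual layers directly.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # skip connection around the block

x = torch.rand(1, 64, 56, 56)
y = nn.Sequential(ResidualBlock(64), ResidualBlock(64))(x)  # stack blocks, not layers
```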
Given the recent surge in developments in deep learning, this article provides a review of the state of the art in deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and the potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveforms) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, and more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, namely audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
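As an example of the log-mel feature representation mentioned above, the sketch below computes log-mel spectrogram features with librosa; the window, hop, and mel-band settings are common choices assumed here rather than values prescribed by the review.

```python
# Sketch of log-mel feature extraction (parameter values are assumed, common choices).
import numpy as np
import librosa

sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)      # 1 s synthetic tone as dummy audio

mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=400, hop_length=160, n_mels=64   # 25 ms windows, 10 ms hop
)
log_mel = librosa.power_to_db(mel)                     # (n_mels, frames) input for a CNN/RNN
```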
This review covers computer-assisted analysis of images in the field of medical imaging. Recent advances in machine learning, especially with regard to deep learning, are helping to identify, classify, and quantify patterns in medical images. At the core of these advances is the ability to exploit hierarchical feature representations learned solely from data, instead of features designed by hand according to domain-specific knowledge. Deep learning is rapidly becoming the state of the art, leading to enhanced performance in various medical applications. We introduce the fundamentals of deep learning methods and review their successes in image registration, detection of anatomical and cellular structures, tissue segmentation, computer-aided disease diagnosis and prognosis, and so on. We conclude by discussing research issues and suggesting future directions for further improvement.
With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions, and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data, and expression-unrelated variations such as illumination, head pose, and identity. In this paper, we provide a comprehensive survey of deep FER, including datasets and algorithms, that offers insight into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions for applicable implementations at each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performance on widely used benchmarks is also summarized in this section. We then extend the survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field, as well as future directions for the design of robust deep FER systems.