In this invited paper, my overview material on the same topic as presented in the plenary overview session of APSIPA-2011 and the tutorial material presented at the same conference [1] are expanded and updated to include more recent developments in deep learning. The previous and the updated materials cover both theory and applications and analyze future directions of the field. The goal of this tutorial survey is to introduce the emerging area of deep learning, or hierarchical learning, to the APSIPA community. Deep learning refers to a class of machine learning techniques, developed largely since 2006, in which many stages of non-linear information processing in hierarchical architectures are exploited for pattern classification and for feature learning. In the more recent literature, it is also connected to representation learning, which involves a hierarchy of features or concepts where higher-level concepts are defined from lower-level ones and where the same lower-level concepts can help to define many higher-level ones. In this tutorial survey, a brief history of deep learning research is discussed first. Then, a classificatory scheme is developed to analyze and summarize major work reported in the recent deep learning literature. Using this scheme, I provide a taxonomy-oriented survey of the existing deep architectures and algorithms in the literature and categorize them into three classes: generative, discriminative, and hybrid. Three representative deep architectures, one in each of the three classes, are presented in more detail: deep autoencoders, deep stacking networks with their generalization to the temporal domain (recurrent networks), and deep neural networks pretrained with deep belief networks. Next, selected applications of deep learning are reviewed in broad areas of signal and information processing, including audio/speech, image/vision, multimodality, language modeling, natural language processing, and information retrieval. Finally, future directions of deep learning are discussed and analyzed.
New waves of consumer-centric applications, such as voice search and voice interaction with mobile devices and home entertainment systems, increasingly require automatic speech recognition (ASR) to be robust to the full range of real-world noise and other acoustic distorting conditions. Despite this practical importance, however, the inherent links between, and distinctions among, the myriad methods for noise-robust ASR have yet to be carefully studied in order to advance the field further. To this end, it is critical to establish a solid, consistent, and common mathematical foundation for noise-robust ASR, which is lacking at present. This article is intended to fill this gap and to provide a thorough overview of modern noise-robust techniques for ASR developed over the past 30 years. We emphasize methods that have proven successful and that are likely to sustain or expand their future applicability. We distill key insights from our comprehensive overview of this field and take a fresh look at a few old problems that are nevertheless still highly relevant today. Specifically, we analyze and categorize a wide range of noise-robust techniques using five different criteria: 1) feature-domain vs. model-domain processing, 2) the use of prior knowledge about the acoustic environment distortion, 3) the use of explicit environment-distortion models, 4) deterministic vs. uncertainty processing, and 5) the use of acoustic models trained jointly with the same feature enhancement or model adaptation process used in the testing stage. With this taxonomy-oriented review, we equip the reader with the insight to choose among techniques and with an awareness of the performance-complexity tradeoffs. The pros and cons of using different noise-robust ASR techniques in practical application scenarios are provided as a guide to interested practitioners. The current challenges and future research directions in this field are also carefully analyzed.
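As a hedged illustration of the first criterion, feature-domain processing, the sketch below implements cepstral mean normalization (CMN), a simple technique that uses no prior knowledge of the distortion and no explicit distortion model. It is an assumed example chosen for clarity, not a method taken from the article: a stationary convolutional channel adds an approximately constant offset to each cepstral coefficient, so subtracting the per-utterance mean cancels that offset. The function name `cepstral_mean_normalize` is hypothetical.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean from each cepstral dimension.

    frames: list of frames, each a list of cepstral coefficients of equal length.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over the whole utterance.
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    # A fixed channel appears as an additive cepstral offset, so removing the
    # mean cancels it (together with the long-term mean of the speech itself).
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


if __name__ == "__main__":
    clean = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.0]]
    channel = [0.7, -0.2]  # hypothetical constant channel offset
    distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]
    a = cepstral_mean_normalize(clean)
    b = cepstral_mean_normalize(distorted)
    # The normalized clean and distorted features agree up to rounding.
    print(all(abs(x - y) < 1e-9 for fa, fb in zip(a, b) for x, y in zip(fa, fb)))
```

Because CMN also removes the long-term mean of the speech, it is usually applied to the training data as well, so that the acoustic model is trained on features processed the same way as at test time.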
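Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of statistical techniques for automatically detecting and utilizing patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features. With large volumes of training data, ML can discover models describing complex acoustic phenomena such as human speech and reverberation. ML in acoustics is rapidly developing, with compelling results and significant future promise. We first introduce ML, then highlight ML developments in five acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, seismic exploration, and environmental sounds in everyday scenes.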
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate on advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with a discussion of future directions.
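The move from vectors to supervectors can be sketched with a GMM mean supervector: a universal background model (UBM) is adapted to one utterance by relevance MAP adaptation of its means, and the adapted means are concatenated into a single long vector. The sketch below assumes diagonal covariances, adapts the means only, and uses a relevance factor of 16 as a default; these choices and all function names are illustrative assumptions rather than details taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, weights, means, variances) -> Vector:
    """Posterior probability of each UBM component given one frame."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_mean_supervector(frames, weights, means, variances, relevance: float = 16.0) -> Vector:
    """Relevance-MAP adapt the UBM means to `frames`, then concatenate them.

    `relevance` must be positive so that unseen components fall back to the UBM.
    """
    num_comp, dim = len(means), len(means[0])
    counts = [0.0] * num_comp                        # zeroth-order statistics n_k
    firsts = [[0.0] * dim for _ in range(num_comp)]  # first-order statistics F_k
    for x in frames:
        for k, g in enumerate(component_posteriors(x, weights, means, variances)):
            counts[k] += g
            for d in range(dim):
                firsts[k][d] += g * x[d]
    supervector: Vector = []
    for k in range(num_comp):
        for d in range(dim):
            # (F_k + r * m_k) / (n_k + r): a data-weighted mix of the observed mean
            # and the UBM mean; components with little data stay near the UBM.
            supervector.append((firsts[k][d] + relevance * means[k][d]) / (counts[k] + relevance))
    return supervector
```

With K components and D-dimensional features the supervector has K·D entries. It gives a fixed-length representation of a variable-length utterance, which can then be passed to, for example, a support vector machine.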
Since the proposal of a fast learning algorithm for deep belief networks in 2006, deep learning techniques have drawn ever-increasing research interest because of their inherent capability of overcoming the drawback of traditional algorithms that depend on hand-designed features. Deep learning approaches have also been found to be suitable for big data analysis, with successful applications to computer vision, pattern recognition, speech recognition, natural language processing, and recommendation systems. In this paper, we discuss some widely used deep learning architectures and their practical applications. An up-to-date overview is provided on four deep learning architectures, namely, autoencoder, convolutional neural network, deep belief network, and restricted Boltzmann machine. Different types of deep neural networks are surveyed and recent progress is summarized. Applications of deep learning techniques in some selected areas (speech recognition, pattern recognition, and computer vision) are highlighted. Finally, a list of future research topics is given with clear justifications.
In this paper, we review recent results on audiovisual (AV) fusion. We also discuss some of the challenges and report on approaches to address them. One important issue in AV fusion is how the modalities interact and influence each other. This review addresses this question in the context of AV speech processing, and especially speech recognition, where one of the issues is that the modalities interact but also sometimes appear to desynchronize from each other. An additional issue that sometimes arises is that one of the modalities may be missing at test time although it is available at training time; for example, it may be possible to collect AV training data while only having access to audio at test time. We review approaches to address this issue from the area of multiview learning, where the goal is to learn a model or representation for each of the modalities separately while taking advantage of the rich multimodal training data. In addition to multiview learning, we also discuss the recent application of deep learning (DL) toward AV fusion. We finally draw conclusions and offer our assessment of the future in the area of AV fusion.
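A common baseline for combining the modalities, assumed here for illustration and not taken from the review, is late fusion with stream weights: per-class audio and video log-likelihoods are mixed with weights that sum to one, and when the video stream is missing at test time the decision falls back to audio alone. The sketch ignores the desynchronization issue discussed above, and the function names and the 0.7 default weight are hypothetical.

```python
from typing import Dict, Optional


def fuse_log_likelihoods(
    audio: Dict[str, float],
    video: Optional[Dict[str, float]],
    audio_weight: float = 0.7,
) -> Dict[str, float]:
    """Stream-weighted sum of per-class log-likelihoods.

    If the video stream is unavailable (None), only the audio scores are used.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if video is None:
        return dict(audio)
    video_weight = 1.0 - audio_weight
    return {label: audio_weight * audio[label] + video_weight * video[label] for label in audio}


def decide(scores: Dict[str, float]) -> str:
    """Return the class label with the highest fused score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    audio = {"ba": -10.0, "ga": -9.0}  # audio alone slightly prefers "ga"
    video = {"ba": -2.0, "ga": -8.0}   # the lips clearly indicate "ba"
    print(decide(fuse_log_likelihoods(audio, video, audio_weight=0.5)))  # ba
    print(decide(fuse_log_likelihoods(audio, None)))                     # ga
```

In practice the stream weight is usually tuned to the acoustic conditions, giving the video stream more influence as the audio degrades.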
Modeling the dynamic structure of speech is a novel paradigm in speech recognition research within the generative modeling framework, and it offers the potential to overcome limitations of the current hidden Markov modeling approach. Analogous to structured language models, where syntactic structure is exploited to represent long-distance relationships among words [5], the structured speech model described in this paper makes use of the dynamic structure in the hidden vocal tract resonance space to characterize long-span contextual influence among phonetic units. A general overview is first provided of hierarchically classified types of dynamic speech models in the literature. A detailed account is then given of a specific model type called the hidden trajectory model, and we describe the detailed steps of model construction and the parameter estimation algorithms. We show how the use of resonance target parameters and their temporal filtering enables joint modeling of long-span coarticulation and phonetic reduction effects. Experiments on phonetic recognition demonstrate superior recognizer performance over a modern hidden Markov model-based system. Error analysis shows that the greatest performance gain occurs within the sonorant speech class.
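The temporal filtering of resonance targets can be sketched under stated assumptions: a piecewise-constant sequence of per-phone targets is passed through a noncausal FIR filter whose impulse response decays as γ^|k| over a window of ±D frames and is normalized to unit gain, yielding a smooth vocal tract resonance trajectory. A single γ shared by all phones, clamping at the utterance edges, and the target values below are simplifications assumed for illustration. The sketch shows why a short segment undershoots its target, which is the mechanism behind phonetic reduction.

```python
from typing import List


def fir_coefficients(gamma: float, half_width: int) -> List[float]:
    """Symmetric impulse response proportional to gamma**|k|, k = -D..D, summing to 1."""
    raw = [gamma ** abs(k) for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def smooth_targets(targets: List[float], gamma: float, half_width: int) -> List[float]:
    """Noncausal FIR filtering of a frame-level target sequence into a trajectory."""
    coeffs = fir_coefficients(gamma, half_width)
    n = len(targets)
    trajectory = []
    for t in range(n):
        acc = 0.0
        for offset, c in zip(range(-half_width, half_width + 1), coeffs):
            # Clamp at the utterance edges by repeating the first/last target.
            idx = min(max(t + offset, 0), n - 1)
            acc += c * targets[idx]
        trajectory.append(acc)
    return trajectory


if __name__ == "__main__":
    # Hypothetical resonance targets in Hz: a 3-frame vs. a 20-frame middle phone.
    short_seg = [500.0] * 10 + [1500.0] * 3 + [500.0] * 10
    long_seg = [500.0] * 10 + [1500.0] * 20 + [500.0] * 10
    print(max(smooth_targets(short_seg, 0.6, 7)))  # well below 1500 (undershoot)
    print(max(smooth_targets(long_seg, 0.6, 7)))   # reaches the 1500 Hz target
```

Because the trajectory at each frame mixes targets from neighbouring phones, the same mechanism models coarticulation (context spilling into a segment) and reduction (a short segment never reaching its own target).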
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
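The hybrid approach described above can be sketched in three steps: stack each frame with its neighbours into one input vector, run a feed-forward network with a softmax output over HMM states, and divide the state posteriors by the state priors so that they can stand in for GMM likelihoods during decoding (a standard step in hybrid systems, assumed here rather than stated in the abstract). The weights are random placeholders rather than a trained model, the sigmoid hidden units and ±1-frame context are illustrative choices, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def stack_frames(frames: Sequence[Sequence[float]], context: int) -> List[List[float]]:
    """Concatenate each frame with `context` frames on either side (edges repeated)."""
    n = len(frames)
    stacked = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-context, context + 1):
            window.extend(frames[min(max(t + offset, 0), n - 1)])
        stacked.append(window)
    return stacked


def affine(x: Sequence[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: Sequence[float]) -> List[float]:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def state_posteriors(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Sigmoid hidden layers followed by a softmax over HMM states."""
    h = list(x)
    for layer in layers[:-1]:
        h = [1.0 / (1.0 + math.exp(-v)) for v in affine(h, layer)]
    return softmax(affine(h, layers[-1]))


def scaled_log_likelihoods(posteriors: Sequence[float], priors: Sequence[float]) -> List[float]:
    """log p(state | x) - log p(state), used in place of GMM log-likelihoods."""
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]


def random_layers(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Placeholder (untrained) weights for illustration only."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]
```

With 13-dimensional features and ±5 frames of context, for instance, the input layer would have 143 units; the output layer of a real system often covers thousands of tied HMM states.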
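Today's telecommunication networks have become sources of enormous amounts of widely heterogeneous data. This information can be retrieved from network traffic traces, network alarms, signal quality indicators, users' behavioral data, etc. Advanced mathematical tools are required to extract meaningful information from these data and to take decisions pertaining to the proper functioning of the networks based on the network-generated data. Among these mathematical tools, machine learning (ML) is regarded as one of the most promising methodological approaches for performing network-data analysis and enabling automated network self-configuration and fault management. The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth of network complexity that optical networks have faced in the last few years. This increase in complexity is due to the introduction of a large number of adjustable and interdependent system parameters (e.g., routing configurations, modulation formats, symbol rates, coding schemes, etc.) that are enabled by the use of coherent transmission/reception technologies, advanced digital signal processing, and compensation of nonlinear effects in optical fiber propagation. In this paper, we provide an overview of the application of ML to optical communications and networking. We classify and survey the relevant literature dealing with the topic, and we also provide an introductory tutorial on ML for researchers and practitioners interested in this field. Although a good number of research papers have recently appeared, the application of ML to optical networks is still in its infancy: to stimulate further work in this area, we conclude the paper by proposing new possible research directions.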
Objective: Most current electroencephalography (EEG)-based brain-computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types used in this field, as described in our 2007 review paper. Now, approximately 10 years after that review was published, many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs.
Approach: We surveyed the BCI and machine learning literature from 2007 to 2017 to identify the new classification approaches that have been investigated to design BCIs. We synthesize these studies in order to present such algorithms, to report how they were used for BCIs and what the outcomes were, and to identify their pros and cons.
Main results: We found that the recently designed classification algorithms for EEG-based BCIs can be divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning, and deep learning, plus a few other miscellaneous classifiers. Among these, adaptive classifiers were demonstrated to be generally superior to static ones, even with unsupervised adaptation. Transfer learning can also prove useful, although its benefits remain unpredictable. Riemannian geometry-based methods have reached state-of-the-art performance on multiple BCI problems and deserve to be explored more thoroughly, along with tensor-based methods. Shrinkage linear discriminant analysis and random forests also appear particularly useful in settings with small training samples. On the other hand, deep learning methods have not yet shown convincing improvement over state-of-the-art BCI methods.
Significance: This paper provides a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, presents the principles of these methods, and gives guidelines on when and how to use them. It also identifies a number of challenges to further advance EEG classification in BCI.
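Shrinkage linear discriminant analysis, mentioned above as useful for small training sets, can be sketched for two classes: the pooled covariance estimate is pulled toward a scaled identity, Σ_λ = (1 - λ)Σ + λνI with ν = trace(Σ)/d, which keeps it invertible even when there are fewer trials than features. The sketch fixes λ by hand; in practice it is often chosen analytically (for example with the Ledoit-Wolf estimator) or by cross-validation. The function names and toy data are hypothetical, and a real BCI pipeline would use a numerical library instead of this hand-written solver.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean(rows: Sequence[Sequence[float]]) -> List[float]:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(class0: Sequence[Sequence[float]], class1: Sequence[Sequence[float]]) -> Matrix:
    """Within-class covariance pooled over both classes (denominator n0 + n1 - 2)."""
    d = len(class0[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows in (class0, class1):
        mu = mean(rows)
        for r in rows:
            diff = [r[j] - mu[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    denom = len(class0) + len(class1) - 2
    return [[c / denom for c in row] for row in cov]


def shrink(cov: Matrix, lam: float) -> Matrix:
    """(1 - lam) * cov + lam * nu * I, with nu = trace(cov) / d."""
    d = len(cov)
    nu = sum(cov[i][i] for i in range(d)) / d
    return [[(1 - lam) * cov[i][j] + (lam * nu if i == j else 0.0) for j in range(d)] for i in range(d)]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def train_shrinkage_lda(class0, class1, lam: float) -> Tuple[List[float], float]:
    """Return (w, b) such that w.x + b > 0 predicts class 1 (equal priors assumed)."""
    mu0, mu1 = mean(class0), mean(class1)
    w = solve(shrink(pooled_covariance(class0, class1), lam), [p - q for p, q in zip(mu1, mu0)])
    midpoint = [(p + q) / 2 for p, q in zip(mu0, mu1)]
    return w, -sum(wi * mi for wi, mi in zip(w, midpoint))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

In a toy case with four features and only two trials per class, the pooled covariance is singular, so with λ = 0 the solver fails with a division by zero; any λ > 0 makes the classifier trainable.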
Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This tutorial describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large-scale CRFs. We do not assume previous knowledge of graphical modeling, so this tutorial is intended to be useful to practitioners in a wide variety of fields.
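Inference in a linear-chain CRF, the most common special case, can be illustrated with the forward algorithm, which computes the log partition function log Z(x) in time linear in the sequence length instead of enumerating every label sequence. The sketch assumes that per-position label scores (`unary`, i.e. weighted feature sums already computed) and label-transition scores (`transition`) are given; these names and the toy numbers in the usage are hypothetical. A brute-force enumeration is included only to check the result.

```python
import itertools
import math
from typing import Sequence

Scores = Sequence[Sequence[float]]


def logsumexp(values: Sequence[float]) -> float:
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sequence_score(unary: Scores, transition: Scores, labels: Sequence[int]) -> float:
    """Unnormalized log score of one label sequence y given x."""
    score = unary[0][labels[0]]
    for t in range(1, len(labels)):
        score += transition[labels[t - 1]][labels[t]] + unary[t][labels[t]]
    return score


def log_partition(unary: Scores, transition: Scores) -> float:
    """Forward algorithm: log of the sum over all label sequences of exp(score)."""
    num_labels = len(unary[0])
    alpha = list(unary[0])  # alpha[j]: log-sum of scores of prefixes ending in label j
    for t in range(1, len(unary)):
        alpha = [
            logsumexp([alpha[i] + transition[i][j] for i in range(num_labels)]) + unary[t][j]
            for j in range(num_labels)
        ]
    return logsumexp(alpha)


def log_partition_brute_force(unary: Scores, transition: Scores) -> float:
    """Exponential-time check that enumerates every label sequence."""
    num_labels = len(unary[0])
    return logsumexp([
        sequence_score(unary, transition, seq)
        for seq in itertools.product(range(num_labels), repeat=len(unary))
    ])


def log_probability(unary: Scores, transition: Scores, labels: Sequence[int]) -> float:
    """log p(y | x) = score(x, y) - log Z(x)."""
    return sequence_score(unary, transition, labels) - log_partition(unary, transition)
```

Replacing `logsumexp` with `max` in the same recursion gives Viterbi decoding of the most likely label sequence, and the forward quantities combined with a matching backward pass yield the marginals needed for the gradient during parameter estimation.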
This review gives a general overview of techniques used in statistical parametric speech synthesis. One instance of these techniques, called hidden Markov model (HMM)-based speech synthesis, has recently been demonstrated to be very effective in synthesizing acceptable speech. This review also contrasts these techniques with the more conventional technique of unit-selection synthesis that has dominated speech synthesis over the last decade. The advantages and drawbacks of statistical parametric synthesis are highlighted and we identify where we expect key developments to appear in the immediate future.
Deep learning research aims at discovering learning algorithms that discover multiple levels of distributed representations, with higher levels representing more abstract concepts. Although the study of deep learning has already led to impressive theoretical results, learning algorithms, and breakthrough experiments, several challenges lie ahead. This paper proposes to examine some of these challenges, centering on the questions of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data. It also proposes a few forward-looking research directions aimed at overcoming these challenges.
We are honored to welcome you to the 2nd International Workshop on Advanced Analytics and Learning on Temporal Data (AALTD), which is held in Riva del Garda, Italy, on September 19th, 2016, co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016). The aim of this workshop is to bring together researchers and experts in machine learning, data mining, pattern analysis, and statistics to share their challenges and advance research on temporal data analysis. Analysis and learning from temporal data cover a wide scope of tasks, including metric learning, representation learning, unsupervised feature extraction, clustering, and classification. This volume contains the conference program, the abstracts of the invited keynotes, and the set of regular papers accepted for presentation at the conference. Each submitted paper was reviewed by at least two independent reviewers, leading to the selection of eleven papers accepted for presentation and inclusion in the program and these proceedings. The contributions are listed in alphabetical order by surname. The keynote given by Marco Cuturi on "Regularized DTW Divergences for Time Series" focuses on the definition of alignment kernels for time series that can later be used at the core of standard machine learning algorithms. The one given by Tony Bagnall on "The Great Time Series Classification Bake Off" presents an important attempt to experimentally compare the performance of a wide range of time series classifiers, together with ensemble classifiers that aim at combining existing classifiers to improve classification quality. The accepted papers present innovative ideas on the analytics of temporal data, including promising new approaches, and cover both practical and theoretical issues.
We wish to thank the ECML PKDD council members for giving us the opportunity to hold the AALTD workshop within the framework of the ECML/PKDD Conference, and the members of the local organizing committee for their support. The organizers of the AALTD conference gratefully acknowledge the financial support of the Université de Rennes 2, MODES, and Universidade da Coruña. Last but not least, we wish to thank the contributing authors for their high-quality work and all members of the Reviewing Committee for their invaluable assistance in the selection process. All of them have contributed significantly to the success of AALTD 2016. We sincerely hope that the workshop participants have a great and fruitful time at the conference.
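The keynote's exact construction is not described in this preface. As an assumed illustration of what regularizing DTW can mean, the sketch below implements soft-DTW, a known smoothed variant of dynamic time warping that replaces the hard minimum in the DTW recursion with a soft minimum controlled by a parameter γ. As γ approaches 0 it recovers ordinary DTW, and for γ > 0 it is differentiable, which is what allows such quantities to sit inside standard learning algorithms. The squared-difference local cost on scalar series and all function names are hypothetical choices.

```python
import math
from typing import Sequence


def soft_min(values: Sequence[float], gamma: float) -> float:
    """-gamma * log(sum(exp(-v / gamma))); the hard minimum when gamma == 0."""
    low = min(values)
    if gamma == 0.0:
        return low
    # Shift by the minimum for numerical stability; infinite entries contribute 0.
    return low - gamma * math.log(sum(math.exp(-(v - low) / gamma) for v in values))


def soft_dtw(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Soft-DTW between two scalar series with squared-difference local cost."""
    n, m = len(x), len(y)
    r = [[math.inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            r[i][j] = cost + soft_min((r[i - 1][j - 1], r[i - 1][j], r[i][j - 1]), gamma)
    return r[n][m]
```

Soft-DTW with γ > 0 can be negative, even when a series is compared with itself, so turning it into a proper divergence requires an additional normalization step.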