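Y. Cho and L. K. Saul \cite{saul} proposed kernel-based deep learning using multilayer kernel machines (MKMs). In MKMs, they use only a single kernel (the arc-cosine kernel) in each layer for kernel-PCA-based feature extraction. We propose using multiple kernels in each layer, via a convex combination of many kernels, following an unsupervised learning strategy. An empirical study is carried out on the \textit{mnist-back-rand}, \textit{mnist-back-image}, and \textit{mnist-rot-back-image} datasets, which are generated by adding random noise to the image background of the MNIST dataset. Experimental results indicate that multiple kernel learning (MKL) in MKMs yields a better representation of the raw data and improves classifier performance.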
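As a rough illustration of a convex combination of kernels, the sketch below mixes two standard kernels using non-negative weights that sum to one. The function names (`gaussian_kernel`, `polynomial_kernel`, `combined_kernel`), the choice of base kernels, and the fixed weights are all assumptions made for this example; the abstract does not say which kernels are combined or how the weights are learned.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def polynomial_kernel(x, y, degree=2, c=1.0):
    """Inhomogeneous polynomial kernel (x . y + c) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def combined_kernel(x, y, kernels, weights):
    """Convex combination sum_i w_i * k_i(x, y).

    Weights must be non-negative and sum to one, which keeps the
    combination positive semi-definite when every k_i is.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * k(x, y) for w, k in zip(weights, kernels))

# Hypothetical usage with hand-picked (not learned) weights.
value = combined_kernel([1.0, 0.0], [1.0, 0.0],
                        [gaussian_kernel, polynomial_kernel],
                        [0.5, 0.5])
```

In a multiple kernel learning setting the weights would be optimized rather than fixed; the fixed values here only show the form of the combination.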
We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets.
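One kernel of this kind is the arc-cosine kernel. The sketch below implements its degree-0 and degree-1 forms, assuming the closed form k_n(x, y) = (1/π) ‖x‖^n ‖y‖^n J_n(θ), where θ is the angle between x and y, J_0(θ) = π − θ, and J_1(θ) = sin θ + (π − θ) cos θ. The function name `arc_cosine_kernel` and the handling of zero vectors are choices made for this example only.

```python
import math

def arc_cosine_kernel(x, y, degree=1):
    """Arc-cosine kernel of degree 0 or 1 for two equal-length vectors."""
    if degree not in (0, 1):
        raise ValueError("this sketch only covers degree 0 and 1")
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_y = math.sqrt(sum(v * v for v in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # The angle is undefined for a zero vector; reject it in this sketch.
        raise ValueError("inputs must be non-zero vectors")
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    theta = math.acos(cos_theta)
    if degree == 0:
        j = math.pi - theta
    else:
        j = math.sin(theta) + (math.pi - theta) * math.cos(theta)
    return (norm_x ** degree) * (norm_y ** degree) * j / math.pi
```

For degree 1 the kernel of a vector with itself reduces to its squared norm, and for orthogonal unit vectors it equals 1/π, which gives two quick sanity checks.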
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
We explore an original strategy for building deep networks, based on stacking layers of denoising autoencoders which are trained locally to denoise corrupted versions of their inputs. The resulting algorithm is a straightforward variation on the stacking of ordinary autoencoders. It is however shown on a benchmark of classification problems to yield significantly lower classification error, thus bridging the performance gap with deep belief networks (DBN), and in several cases surpassing it. Higher level representations learnt in this purely unsupervised fashion also help boost the performance of subsequent SVM classifiers. Qualitative experiments show that, contrary to ordinary autoencoders, denoising autoencoders are able to learn Gabor-like edge detectors from natural image patches and larger stroke detectors from digit images. This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations.
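The denoising criterion can be sketched in a few lines: the input is corrupted, the model sees only the corrupted version, and the reconstruction is scored against the clean input. The example below uses masking noise (randomly zeroing entries) as one possible corruption; the abstract does not name a specific corruption process, and `masking_noise`, `reconstruction_error`, and the identity stand-in for the network are assumptions made for illustration.

```python
import random

def masking_noise(x, fraction, rng):
    """Return a copy of x in which each entry is zeroed with probability `fraction`."""
    return [0.0 if rng.random() < fraction else v for v in x]

def reconstruction_error(clean, reconstructed):
    """Squared error measured against the CLEAN input, not the corrupted one."""
    return sum((c - r) ** 2 for c, r in zip(clean, reconstructed))

rng = random.Random(0)
clean = [0.2, 0.9, 0.5, 0.7]
corrupted = masking_noise(clean, 0.5, rng)

# A real denoising autoencoder would map `corrupted` through an encoder and
# decoder. An identity "model" stands in here, so the loss simply measures
# how much information the corruption removed.
reconstructed = corrupted
loss = reconstruction_error(clean, reconstructed)
```

Training would adjust the encoder and decoder weights to reduce this loss, which forces the hidden representation to capture structure that survives the corruption.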
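Random feature methods have been widely used for kernel approximation in large-scale machine learning. A number of recent studies have explored data-dependent sampling of features, modifying the stochastic oracle from which random features are sampled. While the techniques proposed in this area improve the approximation, their suitability is often verified on a single learning task. In this paper, we propose a task-specific scoring rule for selecting random features, which can be employed for different applications with some adjustments. We restrict our attention to Canonical Correlation Analysis (CCA), and we provide a novel, principled guide for finding the score function that maximizes the canonical correlations. We prove that this method, called ORCCA, can outperform (in expectation) the corresponding kernel CCA with a default kernel. Numerical experiments verify that ORCCA is significantly superior to other approximation techniques in the CCA task.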
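For background, the plain (data-independent) random feature construction that such methods start from can be sketched as follows. This is the standard random Fourier feature approximation of a Gaussian kernel, not the ORCCA scoring rule itself; the function names and parameter values are assumptions for this example.

```python
import math
import random

def make_random_fourier_features(dim, num_features, sigma, rng):
    """Build a feature map z with z(x) . z(y) approximating a Gaussian kernel."""
    # Frequencies are drawn from N(0, 1/sigma^2), the spectral density of the
    # Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); phases are uniform.
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return features

def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel, used here only as a reference value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

rng = random.Random(0)
z = make_random_fourier_features(dim=3, num_features=5000, sigma=1.0, rng=rng)
x, y = [0.1, 0.2, 0.3], [0.3, 0.1, 0.0]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = gaussian_kernel(x, y, sigma=1.0)
```

The approximation error shrinks roughly as one over the square root of the number of features. Data-dependent schemes aim for a given accuracy with fewer features by changing which features are kept.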
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hard-wired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: 1. How do the non-linearities that follow the filter banks influence the recognition accuracy? 2. Does learning the filter banks in an unsupervised or supervised manner improve the performance over random filters or hard-wired filters? 3. Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a two-stage system with random filters can yield almost 63% recognition rate on Caltech-101, provided that the proper non-linearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on the NORB dataset (5.6%), and that unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (> 65%) and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%).
The problem of domain generalization is to take knowledge acquired from a number of related domains where training data is available, and to then successfully apply it to previously unseen domains. We propose a new feature learning algorithm, Multi-Task Autoencoder (MTAE), that provides good generalization performance for cross-domain object recognition. Our algorithm extends the standard denoising autoencoder framework by substituting artificially induced corruption with naturally occurring inter-domain variability in the appearance of objects. Instead of reconstructing images from noisy versions, MTAE learns to transform the original image into analogs in multiple related domains. It thereby learns features that are robust to variations across domains. The learnt features are then used as inputs to a classifier. We evaluated the performance of the algorithm on benchmark image recognition datasets, where the task is to learn features from multiple datasets and to then predict the image label from unseen datasets. We found that (denoising) MTAE outperforms alternative autoencoder-based models as well as the current state-of-the-art algorithms for domain generalization.
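The application of machine learning to image and video data often yields a high-dimensional feature space. Effective feature selection techniques identify a discriminant feature subspace that lowers computational and modeling costs with little performance degradation. A novel supervised feature selection methodology is proposed for machine learning decisions in this work. The resulting tests are called the discriminant feature test (DFT) and the relevant feature test (RFT) for classification and regression problems, respectively. The DFT and RFT procedures are described in detail. Furthermore, we compare the effectiveness of DFT and RFT with several classic feature selection methods. To this end, we use deep features obtained by LeNet-5 for the MNIST and Fashion-MNIST datasets as illustrative examples. Other datasets with handcrafted and gene expression features are also included for performance evaluation. Experimental results show that DFT and RFT can select a lower-dimensional feature subspace distinctly and robustly while maintaining high decision performance.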
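Autoencoders have been widely used as a nonlinear tool for reducing the dimensionality of data. While autoencoders do not use label information, the centroid-encoder (CE) \cite{ghosh2022supervised} uses class labels in its learning process. In this study, we propose sparse optimization using the centroid-encoder architecture to determine a minimal set of features that discriminate between two or more classes. The resulting algorithm, Sparse Centroid-Encoder (SCE), extracts discriminative features using the sparsity-inducing $\ell_1$-norm while mapping points to their class centroids. A key attribute of SCE is that it can extract informative features from multimodal datasets, i.e., datasets whose classes appear to have multiple clusters. The algorithm is applied to a wide variety of real-world datasets, including single-cell data, high-dimensional biological data, image data, speech data, and accelerometer sensor data. We compare our method to various state-of-the-art feature selection techniques, including supervised concrete autoencoders (SCAE), feature selection networks (FsNet), deep feature selection (DFS), stochastic gates (STG), and LassoNet. We empirically show that SCE features often produce better classification accuracy than the other methods on held-out test sets.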
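Feature selection is an important process in machine learning. It builds an interpretable and robust model by selecting the features that contribute most to the prediction target. However, most mature feature selection algorithms, both supervised and semi-supervised, fail to fully exploit the complex latent structure among features. We believe that these structures are very important for the feature selection process, especially when labels are scarce and the data is noisy. To this end, we introduce a deep-learning-based self-supervised mechanism into feature selection, namely batch-Attention-based Self-supervision Feature Selection (A-SFS). First, a multi-task self-supervised autoencoder is designed to uncover the hidden structure among features with the support of two pretext tasks. Guided by the integrated information from the multi-task self-supervised learning model, a batch-attention mechanism is designed to generate feature weights according to batch-based feature selection patterns, which alleviates the impact of a handful of noisy samples. This method is compared to 14 strong baselines, including LightGBM and XGBoost. Experimental results show that A-SFS achieves the highest accuracy on most datasets. Furthermore, this design significantly reduces the reliance on labels: only 1/10 of the labeled data is needed to achieve the same performance as those advanced baselines. Results also show that A-SFS is the most robust to noisy and missing data.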
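Handwritten Digit Recognition (HDR) is one of the most challenging tasks in the domain of Optical Character Recognition (OCR). Irrespective of language, HDR has some inherent challenges, which mostly arise from variations in writing style across individuals, variations in writing medium and environment, the inability to maintain the same strokes when writing a digit repeatedly, and so on. In addition, the structural complexity of the digits of a particular language may lead to ambiguous cases in HDR. Over the years, researchers have developed numerous offline and online HDR pipelines, in which different image processing techniques are combined with traditional Machine Learning (ML)-based and/or Deep Learning (DL)-based architectures. Although the literature contains extensive review studies on HDR for languages such as English, Arabic, Indian, Farsi, and Chinese, only a few surveys on Bengali HDR (BHDR) can be found, and these lack a comprehensive analysis of the challenges, the underlying recognition process, and possible future directions. In this paper, the characteristics and inherent ambiguities of Bengali handwritten digits are analyzed, along with a comprehensive overview of two decades of state-of-the-art datasets and approaches for offline BHDR. Furthermore, several real-life, application-specific studies that involve BHDR are discussed in detail. This paper will also serve as a compendium for researchers interested in the science behind offline BHDR, encouraging the exploration of new avenues of relevant research that may lead to better offline recognition of Bengali handwritten digits in different application areas.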
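Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning architectures perform better than shallow or traditional models. Deep ensemble learning models combine the advantages of both deep learning models and ensemble learning, so that the final model has better generalization performance. This paper reviews the state-of-the-art deep ensemble models and hence serves as an extensive summary for researchers. The ensemble models are broadly categorized into bagging, boosting, stacking, negative-correlation-based deep ensemble models, explicit/implicit ensembles, homogeneous/heterogeneous ensembles, and decision-fusion-strategy-based deep ensemble models. Applications of deep ensemble models in different domains are also briefly discussed. Finally, we conclude this paper with some potential future research directions.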
The accuracy of k-nearest neighbor (kNN) classification depends significantly on the metric used to compute distances between different examples. In this paper, we show how to learn a Mahalanobis distance metric for kNN classification from labeled examples. The Mahalanobis metric can equivalently be viewed as a global linear transformation of the input space that precedes kNN classification using Euclidean distances. In our approach, the metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. As in support vector machines (SVMs), the margin criterion leads to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our approach requires no modification or extension for problems in multiway (as opposed to binary) classification. In our framework, the Mahalanobis distance metric is obtained as the solution to a semidefinite program. On several data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification. Sometimes these results can be further improved by clustering the training examples and learning an individual metric within each cluster. We show how to learn and combine these local metrics in a globally integrated manner.
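The equivalence between a Mahalanobis metric and a Euclidean distance after a linear transformation can be checked numerically. In the sketch below, M = LᵀL is the Mahalanobis matrix induced by a transformation L. The helper names and the example matrix are assumptions for illustration, and nothing here learns the metric (no semidefinite program is solved).

```python
def mat_vec(L, v):
    """Multiply matrix L (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in L]

def gram(L):
    """Return M = L^T L for a matrix L given as a list of rows."""
    n_cols = len(L[0])
    return [[sum(row[i] * row[j] for row in L) for j in range(n_cols)]
            for i in range(n_cols)]

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * M[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))

def euclidean_sq_after_transform(x, y, L):
    """Squared Euclidean distance between L x and L y."""
    lx, ly = mat_vec(L, x), mat_vec(L, y)
    return sum((a - b) ** 2 for a, b in zip(lx, ly))

# Hypothetical transformation; a learned metric would supply L (or M) instead.
L = [[1.0, 2.0],
     [0.0, 1.0]]
x, y = [1.0, 1.0], [0.0, 0.0]
same = abs(mahalanobis_sq(x, y, gram(L))
           - euclidean_sq_after_transform(x, y, L)) < 1e-12
```

Because the two distances agree, a kNN classifier can apply L once to every example and then use ordinary Euclidean distances.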
Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants, with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature, including its action as a regularizer (Erhan et al., 2009b) and as an aid to optimization. Our results build on the work of Erhan et al. (2009b), showing that unsupervised pre-training appears to play predominantly a regularization role in subsequent supervised training. However, our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.
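Radial basis function neural networks (RBFs) are prime candidates for pattern classification and regression and have been used extensively in classical machine learning applications. However, RBFs have not been integrated into contemporary deep learning research and computer vision based on conventional convolutional neural networks (CNNs), because they do not adapt well to modern architectures. In this paper, we adapt RBF networks as a classifier on top of CNNs by modifying the training process and introducing a new activation function, so that modern vision architectures can be trained end-to-end for image classification. The specific architecture of RBFs makes it possible to learn a similarity distance metric for comparing and finding similar and dissimilar images. Furthermore, we demonstrate that using an RBF classifier on top of any CNN architecture provides new human-interpretable insights into the decision-making process of the model. Finally, we successfully apply RBFs to a range of CNN architectures and evaluate the results on benchmark computer vision datasets.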
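Methods for supervised principal component analysis (SPCA) aim to incorporate label information into principal component analysis (PCA), so that the extracted features are more useful for a prediction task of interest. Prior work on SPCA has focused primarily on optimizing prediction error and has neglected the value of maximizing the variance explained by the extracted features. We propose a new method for SPCA that addresses both of these objectives jointly, and we demonstrate empirically that our approach dominates existing approaches, i.e., it outperforms them with respect to both prediction error and variance explained. Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.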
Deep neural networks provide unprecedented performance gains in many real world problems in signal and image processing. Despite these gains, future development and practical deployment of deep networks is hindered by their blackbox nature, i.e., lack of interpretability, and by the need for very large training sets. An emerging technique called algorithm unrolling or unfolding offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are used widely in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention and is rapidly growing both in theoretic investigations and practical applications. The growing popularity of unrolled deep networks is due in part to their potential in developing efficient, high-performance and yet interpretable network architectures from reasonable size training sets. In this article, we review algorithm unrolling for signal and image processing. We extensively cover popular techniques for algorithm unrolling in various domains of signal and image processing including imaging, vision and recognition, and speech processing. By reviewing previous works, we reveal the connections between iterative algorithms and neural networks and present recent theoretical results. Finally, we provide a discussion on current limitations of unrolling and suggest possible future research directions.
Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large amount of target-domain data can be reduced when constructing target learners. Due to its wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. Due to the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research, as well as to summarize and interpret the mechanisms and strategies of transfer learning in a comprehensive way, which may help readers better understand the current research status and ideas. Unlike previous surveys, this survey reviews more than forty representative transfer learning approaches, especially homogeneous transfer learning approaches, from the perspectives of data and model. The applications of transfer learning are also briefly introduced. To show the performance of different transfer learning models, over twenty representative models are used in experiments on three different datasets, i.e., Amazon Reviews, Reuters-21578, and Office-31. The experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.
We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. Parameters of both transformations are jointly learned to maximize the (regularized) total correlation. It can be viewed as a nonlinear extension of the linear method canonical correlation analysis (CCA). It is an alternative to the nonparametric method kernel canonical correlation analysis (KCCA) for learning correlated nonlinear transformations. Unlike KCCA, DCCA does not require an inner product, and has the advantages of a parametric method: training time scales well with data size and the training data need not be referenced when computing the representations of unseen instances. In experiments on two real-world datasets, we find that DCCA learns representations with significantly higher correlation than those learned by CCA and KCCA. We also introduce a novel non-saturating sigmoid function based on the cube root that may be useful more generally in feedforward neural networks.