One approach to dealing with the statistical inefficiency of neural networks is to rely on auxiliary losses that help build useful representations. However, it is not always trivial to know whether an auxiliary task will be useful for the main task, or when it may instead start to hurt. We propose to use the cosine similarity between task gradients as an adaptive weight to detect when an auxiliary loss is helpful to the main loss. We show that our approach is guaranteed to converge to critical points of the main task, and demonstrate the practical utility of the proposed algorithm in several domains: multi-task supervised learning on subsets of ImageNet, reinforcement learning on gridworld, and reinforcement learning on Atari games.
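A minimal sketch of the gradient cosine-similarity gating described above, assuming a PyTorch-style training loop; the function name, the max(0, cos) gate, and the single global cosine over all parameters are one reading of the idea, not code from the paper:

```python
import torch

def combined_gradient(model, main_loss, aux_loss):
    """Per-parameter gradients of main_loss + w * aux_loss, where
    w = max(0, cosine(grad_main, grad_aux)) over all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_main = torch.autograd.grad(main_loss, params, retain_graph=True, allow_unused=True)
    g_aux = torch.autograd.grad(aux_loss, params, retain_graph=True, allow_unused=True)

    def flatten(grads):
        # Treat parameters that receive no gradient as zeros.
        return torch.cat([(g if g is not None else torch.zeros_like(p)).reshape(-1)
                          for g, p in zip(grads, params)])

    cos = torch.nn.functional.cosine_similarity(flatten(g_main), flatten(g_aux), dim=0)
    w = torch.clamp(cos, min=0.0)  # ignore the auxiliary gradient when it conflicts
    return [(gm if gm is not None else torch.zeros_like(p)) +
            w * (ga if ga is not None else torch.zeros_like(p))
            for gm, ga, p in zip(g_main, g_aux, params)]
```

In a training step, these combined gradients would be copied into each parameter's `.grad` before calling the optimizer as usual.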
This work presents a new strategy for multi-class classification that does not require class-specific labels, but instead leverages pairwise similarity between examples, a weaker form of annotation. The proposed method, meta classification learning, optimizes a binary classifier for pairwise-similarity prediction, and through this process learns a multi-class classifier as a submodule. We formulate this approach, present a probabilistic graphical model for it, and derive a surprisingly simple loss function that can be used to learn neural-network-based models. We then show that this same framework generalizes to the supervised, unsupervised cross-task, and semi-supervised settings. Our method is evaluated against the state of the art in all three learning paradigms and shows superior or comparable accuracy, demonstrating that learning multi-class classification without multi-class labels is a viable learning option.
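A hedged sketch of the kind of "surprisingly simple" pairwise loss this abstract describes: the multi-class head's softmax probabilities induce a predicted probability that two examples share a class, trained with binary cross-entropy against the pairwise similarity annotations (names and the exact form are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(logits, sim_labels):
    """logits: (B, C) class scores for a batch; sim_labels: (B, B) 0/1 matrix
    indicating whether examples i and j are annotated as similar (same class)."""
    probs = F.softmax(logits, dim=1)            # (B, C) class probabilities
    sim_pred = probs @ probs.t()                # (B, B) predicted P(same class), in [0, 1]
    sim_pred = sim_pred.clamp(1e-7, 1 - 1e-7)   # numerical safety for the log
    return F.binary_cross_entropy(sim_pred, sim_labels.float())
```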
We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multi-task learning and knowledge distillation. There are two important mechanisms involved in collaborative learning. First, the consensus of multiple views from different classifier heads on the same example provides supplementary information as well as regularization to each classifier, thereby improving generalization. Second, intermediate-level representation (ILR) sharing with backpropagation rescaling aggregates the gradient flows from all heads, which not only reduces training computational complexity, but also facilitates supervision to the shared layers. The empirical results on CIFAR and ImageNet datasets demonstrate that deep neural networks learned as a group in a collaborative way significantly reduce the generalization error and increase the robustness to label noise.
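One plausible way to implement the consensus regularization among heads is sketched below (an illustration only; the paper's exact formulation, including the backpropagation rescaling on the shared representation, is not reproduced here):

```python
import torch
import torch.nn.functional as F

def collaborative_loss(head_logits, targets, temperature=2.0, beta=0.5):
    """head_logits: list of (B, C) logits from H heads on a shared backbone;
    targets: (B,) ground-truth labels. Each head gets cross-entropy plus a KL
    term towards the (detached) average soft prediction of all heads."""
    soft = [F.softmax(l / temperature, dim=1) for l in head_logits]
    consensus = torch.stack(soft).mean(dim=0).detach()   # ensemble soft target
    total = 0.0
    for logits in head_logits:
        ce = F.cross_entropy(logits, targets)
        kl = F.kl_div(F.log_softmax(logits / temperature, dim=1), consensus,
                      reduction='batchmean') * temperature ** 2
        total = total + ce + beta * kl
    return total / len(head_logits)
```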
In the contemporary deep learning discourse, two things seem to be beyond dispute: 1. the categorical cross-entropy loss after softmax activation is the method of choice for classification; 2. training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets with only a small number of samples per class. For example, accuracy on the CUB-200-2011 dataset without pre-training is 30% higher than with the cross-entropy loss. Experiments on four other popular datasets confirm our findings. Moreover, we show that classification performance can be improved further by integrating prior knowledge in the form of class hierarchies, which is straightforward to do with the cosine loss.
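A small sketch of a cosine loss for classification with one-hot class embeddings, which is one common way to instantiate the loss this abstract refers to (the paper's exact variant, e.g. with class-hierarchy embeddings, may differ):

```python
import torch
import torch.nn.functional as F

def cosine_loss(outputs, targets, num_classes):
    """outputs: (B, num_classes) raw network outputs; targets: (B,) class indices.
    Minimizes 1 - cos(normalized output, one-hot target)."""
    one_hot = F.one_hot(targets, num_classes).float()
    outputs = F.normalize(outputs, p=2, dim=1)  # project outputs onto the unit sphere
    return (1.0 - (outputs * one_hot).sum(dim=1)).mean()
```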
Learning distributed sentence representations is one of the key challenges in natural language processing. Prior work has shown that sentence encoders based on recurrent neural networks (RNNs), trained on large amounts of annotated natural language inference data, are effective for transfer learning that facilitates other related tasks. In this paper, we compare multi-task and single-task learned sentence encoders through extensive experiments and analyses, showing that joint learning of multiple tasks leads to more generalizable sentence representations. Quantitative analysis using auxiliary tasks shows that, compared with single-task learning, multi-task learning helps embed better semantic information in the sentence representations. In addition, we compare multi-task sentence encoders with contextualized word representations and show that combining them can further improve transfer learning performance.
We investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling, in order to train a single visual representation. First, we provide an apples-to-apples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network. We also explore lasso regularization to encourage the network to factorize the information in its representation, and methods for "harmonizing" network inputs in order to learn a more unified representation. We evaluate all methods on ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our results show that deeper networks work better, and that combining tasks, even via a naïve multi-head architecture, always improves performance. Our best joint network nearly matches the PASCAL performance of a model pre-trained on ImageNet classification, and matches the ImageNet network on NYU depth prediction.
This paper introduces a novel method to perform transfer learning across domains and tasks, formulating it as a problem of learning to cluster. The key insight is that, in addition to features, we can transfer similarity information and this is sufficient to learn a similarity function and clustering network to perform both domain adaptation and cross-task transfer learning. We begin by reducing categorical information to pairwise constraints, which only considers whether two instances belong to the same class or not (pairwise semantic similarity). This similarity is category-agnostic and can be learned from data in the source domain using a similarity network. We then present two novel approaches for performing transfer learning using this similarity function. First, for unsupervised domain adaptation, we design a new loss function to regularize classification with a constrained clustering loss, hence learning a clustering network with the transferred similarity metric generating the training inputs. Second, for cross-task learning (i.e., unsupervised clustering with unseen categories), we propose a framework to reconstruct and estimate the number of semantic clusters, again using the clustering network. Since the similarity network is noisy, the key is to use a robust clustering algorithm, and we show that our formulation is more robust than the alternative constrained and unconstrained clustering approaches. Using this method, we first show state of the art results for the challenging cross-task problem, applied on Omniglot and ImageNet. Our results show that we can reconstruct semantic clusters with high accuracy. We then evaluate the performance of cross-domain transfer using images from the Office-31 and SVHN-MNIST tasks and present top accuracy on both datasets. Our approach doesn't explicitly deal with domain discrepancy. If we combine with a domain adaptation loss, it shows further improvement.
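The first step described above, reducing categorical labels to pairwise constraints, is simple to state in code; a tiny illustrative helper (names are ours):

```python
import torch

def labels_to_pairwise_constraints(labels):
    """labels: (N,) integer class labels -> (N, N) binary similarity matrix,
    where entry (i, j) is 1 if examples i and j share a class, else 0."""
    labels = torch.as_tensor(labels).view(-1, 1)
    return (labels == labels.t()).float()
```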
Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.
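A common simplified implementation of this kind of uncertainty-based weighting keeps one learnable log-variance per task and combines the losses as sum_i exp(-s_i) * L_i + s_i; the sketch below follows that form and is not necessarily the paper's exact formulation:

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """Learnable per-task log-variances used to weight a set of task losses."""
    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # s_i = log sigma_i^2

    def forward(self, task_losses):
        """task_losses: list/tuple of scalar losses, one per task."""
        total = 0.0
        for loss, s in zip(task_losses, self.log_vars):
            # Noisier tasks (large s) are down-weighted; s itself is penalized.
            total = total + torch.exp(-s) * loss + s
        return total
```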
Multi-task learning (MTL) with neural networks leverages commonalities in tasks to improve performance, but often suffers from task interference which reduces the benefits of transfer. To address this issue we introduce the routing network paradigm, a novel neural network and training algorithm. A routing network is a kind of self-organizing neural network consisting of two components: a router and a set of one or more function blocks. A function block may be any neural network for example a fully-connected or a convolutional layer. Given an input the router makes a routing decision, choosing a function block to apply and passing the output back to the router recursively, terminating when a fixed recursion depth is reached. In this way the routing network dynamically composes different function blocks for each input. We employ a collaborative multi-agent reinforcement learning (MARL) approach to jointly train the router and function blocks. We evaluate our model against cross-stitch networks and shared-layer baselines on multi-task settings of the MNIST, mini-imagenet, and CIFAR-100 datasets. Our experiments demonstrate a significant improvement in accuracy, with sharper convergence. In addition, routing networks have nearly constant per-task training cost while cross-stitch networks scale linearly with the number of tasks. On CIFAR-100 (20 tasks) we obtain cross-stitch performance levels with an 85% reduction in training time.
When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises when we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities is unavailable. We propose the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared with the commonly used feature extraction and fine-tuning adaptation techniques, and performs similarly to multi-task learning that uses the original task data we assume to be unavailable. The more surprising observation is that Learning without Forgetting may be able to replace fine-tuning when the old and new task datasets are similar, improving new task performance.
Word embeddings have been shown to benefit from ensembles of several word embedding sources, typically combined using simple mathematical operations over the set of vectors to produce a meta-embedding representation. More recently, unsupervised learning has been used to find a lower-dimensional representation of similar size to the word embeddings within the ensemble. However, these methods do not make use of the available manually labeled datasets that are typically used only for evaluation purposes. We propose to improve word embeddings by simultaneously learning to reconstruct an ensemble of pre-trained word embeddings with supervision from various labeled word similarity datasets. This involves reconstructing word meta-embeddings while learning word similarity with a Siamese network, with the two processes sharing a hidden layer. Experiments are carried out on 6 word similarity datasets and 3 analogy datasets. We find that, compared with the unsupervised learning approach, performance improves on all word similarity datasets, with an average increase of 11.33 in the Spearman correlation coefficient. Moreover, 4 of the 6 word similarity datasets show the best performance from our approach when using a cosine loss for reconstruction and Brier's loss for word similarity.
Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Recent efforts in unsupervised feature learning have focused mostly on small or highly curated datasets like ImageNet, whereas using uncurated raw datasets has been found to decrease feature quality when evaluated on transfer tasks. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets. To that end, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. We validate our approach on 96 million images from YFCC100M, achieving state-of-the-art detection results on standard benchmarks and confirming the potential of unsupervised learning when only uncurated data is available. We also show that pre-training a supervised VGG-16 with our method reaches 74.6% top-1 accuracy on the ImageNet classification validation set, an improvement of 0.7% over the same network trained from scratch.
Visual attributes, from simple objects (e.g., backpacks, hats) to characteristics of people (e.g., gender, height, clothing), have proven to be a powerful representational approach for many applications such as image description and human identification. In this paper, we introduce a novel method that combines the advantages of multi-task and curriculum learning in a visual attribute classification framework. Individual tasks are grouped after performing hierarchical clustering based on their correlation. The clusters of tasks are learned in a curriculum learning setup by transferring knowledge between clusters. The learning process within each cluster is performed in a multi-task classification setup. By leveraging the acquired knowledge, we speed up the process and improve performance. We demonstrate the effectiveness of our method via ablation studies and a detailed analysis of the covariates on a variety of publicly available datasets of people with the full body visible. Extensive experiments demonstrate that the proposed approach boosts performance by 4% to 10%.
We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis.
This work tackles the problem of semi-supervised learning of image classifiers. Our main insight is that the field of semi-supervised learning can benefit from the quickly advancing field of self-supervised visual representation learning. Unifying these two approaches, we propose the framework of self-supervised semi-supervised learning ($S^4L$) and use it to derive two novel semi-supervised image classification methods. We demonstrate the effectiveness of these methods in comparison to carefully tuned baselines and existing semi-supervised learning methods. We then show that $S^4L$ and existing semi-supervised methods can be jointly trained, yielding a new state-of-the-art result on semi-supervised ILSVRC-2012 with 10% of labels.
Decision making in autonomous driving is highly specific to the environment, so semantic segmentation plays a key role in recognizing the objects surrounding the car. Pixel-level classification, once considered a challenging task, has now matured to the point of being productized in cars. However, semantic annotation is time consuming and quite expensive. Synthetic datasets with domain adaptation techniques have been used to alleviate the lack of large annotated datasets. In this work, we explore an alternative approach: leveraging the annotations of other tasks to improve semantic segmentation. Recently, multi-task learning has become a popular paradigm in autonomous driving, showing that joint learning of multiple tasks improves the overall performance of each task. Motivated by this, we use auxiliary tasks such as depth estimation to improve the performance of the semantic segmentation task. We propose adaptive task loss weighting techniques to address the scale issues in the multi-task loss functions, which become more critical with auxiliary tasks. We conducted experiments on automotive datasets including SYNTHIA and KITTI, and obtained accuracy improvements of 3% and 5%, respectively.
Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge from a large training dataset into a small one. The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but that will, when given to the learning algorithm as training data, approximate the model trained on the original data. For example, we show that it is possible to compress 60,000 MNIST training images into just 10 synthetic distilled images (one per class) and achieve close to the original performance with only a few gradient descent steps, given a particular fixed network initialization. We evaluate our method in a wide range of initialization settings and with different learning objectives. Experiments on multiple datasets show the advantage of our approach compared with alternative methods in most settings.
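A hedged, single-inner-step sketch of the dataset-distillation idea: the synthetic images are learnable tensors, one SGD step on them is taken in a differentiable way, and the loss of the updated model on real data is backpropagated into the synthetic data (the functional `model_fn`, the names, and the one-step simplification are assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def distillation_step(model_fn, theta0, x_syn, y_syn, x_real, y_real, inner_lr, outer_opt):
    """model_fn(params, x) -> logits; theta0: list of initial parameter tensors with
    requires_grad=True; x_syn/y_syn: learnable synthetic images (requires_grad=True)
    and their fixed labels; outer_opt optimizes x_syn."""
    # Inner step: one differentiable SGD update of the model on the synthetic data.
    inner_loss = F.cross_entropy(model_fn(theta0, x_syn), y_syn)
    grads = torch.autograd.grad(inner_loss, theta0, create_graph=True)
    theta1 = [p - inner_lr * g for p, g in zip(theta0, grads)]

    # Outer step: loss of the updated model on real data, backpropagated into x_syn.
    outer_loss = F.cross_entropy(model_fn(theta1, x_real), y_real)
    outer_opt.zero_grad()
    outer_loss.backward()   # gradients flow into x_syn through the inner update
    outer_opt.step()        # (theta0 also accumulates gradients; this sketch ignores them)
    return outer_loss.item()
```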
Knowledge distillation is an effective approach to transferring knowledge from a teacher neural network to a student target network, to meet the low-memory and fast-execution requirements of practical use. Whilst it can create stronger target networks compared with the vanilla non-teacher-based learning strategy, this scheme requires additionally training a large teacher model at expensive computational cost. In this work, we present a Self-Referenced Deep Learning (SRDL) strategy. Unlike both vanilla optimization and existing knowledge distillation, SRDL feeds the knowledge discovered by the in-training target model back to itself to regularize the subsequent learning process, thereby eliminating the need to train a large teacher model. Compared with vanilla learning and conventional knowledge distillation, SRDL improves model generalization performance at negligible extra computational cost. Extensive evaluations show that a variety of deep networks benefit from SRDL, resulting in improved deployment performance on both coarse-grained object categorization tasks (CIFAR10, CIFAR100, Tiny ImageNet, and ImageNet) and a fine-grained person instance identification task (Market-1501).
Current Zero-Shot Learning (ZSL) approaches are restricted to recognition of a single dominant unseen object category in a test image. We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the 'recognition' and 'localization' of an unseen category. To address this limitation, we introduce a new 'Zero-Shot Detection' (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories without any training examples. We also propose a new experimental protocol for ZSD based on the highly challenging ILSVRC dataset, adhering to practical issues, e.g., the rarity of unseen objects. To the best of our knowledge, this is the first end-to-end deep network for ZSD that jointly models the interplay between visual and semantic domain information. To overcome the noise in the automatically derived semantic descriptions, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic space clustering. Furthermore, we present a baseline approach extended from recognition to detection setting. Our extensive experiments show significant performance boost over the baseline on the imperative yet difficult ZSD problem.
Building large image datasets with high-quality object masks for semantic segmentation is expensive and time consuming. In this paper, we reduce the data annotation cost by leveraging weak supervision in the form of object bounding boxes. To this end, we propose a principled framework for training deep convolutional segmentation models that combines a large set of weakly supervised images (with object bounding box labels only) with a small set of fully supervised images (with both semantic segmentation labels and box labels). Our framework trains the primary segmentation model with the help of an auxiliary model that generates initial segmentation labels for the weakly supervised instances, and a self-correction module that improves the generated labels during training using the increasingly accurate primary model. We introduce two variants of the self-correction module, using either linear or convolutional functions. Experiments on the PASCAL VOC 2012 and Cityscapes datasets show that models trained with a small fully supervised set perform similarly to, or better than, models trained with a large fully supervised set, while requiring 7x less annotation effort.