One way of dealing with the statistical inefficiency of neural networks is to rely on auxiliary losses that help build useful representations. However, it is not always trivial to know whether an auxiliary task will be useful for the main task, or when it might instead begin to hurt. We propose to use the cosine similarity between task gradients as an adaptive weight to detect when an auxiliary loss is helpful to the main loss. We show that our approach is guaranteed to converge to critical points of the main task, and demonstrate the practical usefulness of the proposed algorithm in several domains: multi-task supervised learning on subsets of ImageNet, reinforcement learning on gridworld, and reinforcement learning on Atari games.
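As a concrete illustration of the gradient-similarity weighting described above, here is a minimal PyTorch sketch (my own naming and simplifications, not the authors' code): the auxiliary loss is scaled by the clamped cosine similarity between its gradient and the main-task gradient with respect to the shared parameters.

```python
import torch
import torch.nn.functional as F

def weighted_total_loss(main_loss, aux_loss, shared_params):
    # Gradients of both losses w.r.t. the shared parameters; graphs are retained
    # so the returned total loss can still be backpropagated afterwards.
    g_main = torch.autograd.grad(main_loss, shared_params, retain_graph=True)
    g_aux = torch.autograd.grad(aux_loss, shared_params, retain_graph=True)
    g_main = torch.cat([g.reshape(-1) for g in g_main])
    g_aux = torch.cat([g.reshape(-1) for g in g_aux])
    cos = F.cosine_similarity(g_main, g_aux, dim=0)
    # Only keep the auxiliary signal when its gradient points in a helpful direction.
    weight = torch.clamp(cos, min=0.0).detach()
    return main_loss + weight * aux_loss
```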
This work presents a new strategy for multi-class classification that does not require class-specific labels, instead leveraging pairwise similarity between examples, a weaker form of annotation. The proposed method, meta classification learning, optimizes a binary classifier for pairwise similarity prediction and, through this process, learns a multi-class classifier as a submodule. We formulate this approach, present a probabilistic graphical model for it, and derive a surprisingly simple loss function that can be used to learn neural-network-based models. We then demonstrate that this same framework generalizes to the supervised, unsupervised cross-task, and semi-supervised settings. Our method is evaluated against existing techniques in all three learning paradigms and shows superior or comparable accuracy, providing evidence that learning multi-class classification without multi-class labels is a viable learning option.
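A minimal sketch of the kind of pairwise loss the abstract alludes to (an illustration under my own assumptions, not the authors' released code): the probability that two examples share a class is modeled as the inner product of their predicted class distributions and trained with binary cross-entropy against pairwise similarity labels.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(logits, sim_labels, eps=1e-7):
    # logits: (N, C) multi-class classifier outputs.
    # sim_labels: (N, N) matrix with 1 where a pair shares a class, else 0.
    probs = F.softmax(logits, dim=1)
    p_same = probs @ probs.t()               # predicted P(same class) for every pair
    p_same = p_same.clamp(eps, 1.0 - eps)
    return F.binary_cross_entropy(p_same, sim_labels.float())
```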
In the contemporary deep learning discourse, two things seem indisputable: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets with only a small number of samples per class. For example, accuracy on the CUB-200-2011 dataset without pre-training is 30% higher than with the cross-entropy loss. Experiments on four other popular datasets confirm our findings. Moreover, we show that classification performance can be further improved by integrating prior knowledge in the form of class hierarchies, which is straightforward with the cosine loss.
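For reference, a hedged sketch of a cosine loss of the kind compared against cross-entropy above (the function name and the use of a one-hot target embedding are my illustrative assumptions): one minus the cosine similarity between the network output and the one-hot encoding of the target class.

```python
import torch
import torch.nn.functional as F

def cosine_loss(outputs, targets, num_classes):
    # outputs: (N, num_classes) raw network outputs; targets: (N,) class indices.
    one_hot = F.one_hot(targets, num_classes).float()
    return (1.0 - F.cosine_similarity(outputs, one_hot, dim=1)).mean()
```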
We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multi-task learning and knowledge distillation. There are two important mechanisms involved in collaborative learning. First, the consensus of multiple views from different classifier heads on the same example provides supplementary information as well as regularization to each classifier, thereby improving generalization. Second, intermediate-level representation (ILR) sharing with backpropagation rescaling aggregates the gradient flows from all heads, which not only reduces training computational complexity, but also facilitates supervision to the shared layers. The empirical results on CIFAR and ImageNet datasets demonstrate that deep neural networks learned as a group in a collaborative way significantly reduce the generalization error and increase the robustness to label noise.
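A simplified sketch of the two mechanisms, under my own naming and simplifying assumptions rather than the paper's exact formulation: each head is trained on the hard labels plus a KL term toward the detached average of the other heads' softened predictions, and a small autograd function rescales gradients flowing into the shared trunk.

```python
import torch
import torch.nn.functional as F

class GradRescale(torch.autograd.Function):
    # Identity in the forward pass; multiplies the gradient by `scale` on the way back.
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.scale, None

# Usage idea: shared = GradRescale.apply(trunk(images), 1.0 / num_heads)
# so the trunk receives a unit-scale gradient when all heads backpropagate through it.

def collaborative_loss(head_logits, targets, T=2.0):
    # head_logits: list of (N, C) tensors, one per classifier head.
    H = len(head_logits)
    losses = []
    for i, logits in enumerate(head_logits):
        others = [head_logits[j] for j in range(H) if j != i]
        consensus = torch.stack([F.softmax(o / T, dim=1) for o in others]).mean(0).detach()
        ce = F.cross_entropy(logits, targets)
        kl = F.kl_div(F.log_softmax(logits / T, dim=1), consensus, reduction="batchmean") * T * T
        losses.append(ce + kl)
    return sum(losses) / H
```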
Learning distributed sentence representations is one of the key challenges in natural language processing. Previous work has shown that sentence encoders based on recurrent neural networks (RNNs), trained on large amounts of annotated natural language inference data, are effective in transfer learning for facilitating other related tasks. In this paper, we compare multi-task and single-task learned sentence encoders through extensive experiments and analysis, showing that joint learning over multiple tasks leads to better generalizable sentence representations. Quantitative analysis with auxiliary tasks indicates that, compared with single-task learning, multi-task learning helps embed richer semantic information in the sentence representations. Moreover, we compare multi-task sentence encoders with contextualized word representations and show that combining them can further improve transfer learning performance.
We investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling, in order to train a single visual representation. First, we provide an apples-to-apples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network. We also explore lasso regularization to encourage the network to factorize the information in its representation, and methods for "harmonizing" network inputs in order to learn a more unified representation. We evaluate all methods on ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our results show that deeper networks work better, and that combining tasks, even via a naïve multi-head architecture, always improves performance. Our best joint network nearly matches the PASCAL performance of a model pre-trained on ImageNet classification, and matches the ImageNet network on NYU depth prediction.
When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises when we add new capabilities to a convolutional neural network (CNN), but the training data for its existing capabilities is unavailable. We propose the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared with the commonly used feature extraction and fine-tuning adaptation techniques, and performs similarly to multi-task learning, which uses the original task data that we assume is unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning when the old and new task datasets are similar, yielding improved performance on the new task.
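As a rough sketch of the recipe summarized above (my paraphrase with assumed variable names, not the authors' released code; the paper uses a temperature-scaled modified cross-entropy, for which a KL term is a close stand-in): the old-task heads' responses on the new-task images are recorded once before training, and the network is then trained with the new-task cross-entropy plus a distillation term that keeps the old heads close to those recorded responses.

```python
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, new_targets, old_logits, recorded_old_logits, T=2.0, lam=1.0):
    # new_logits: outputs of the new-task head; old_logits: current outputs of the
    # old-task head; recorded_old_logits: the same head's outputs recorded with the
    # original network before any new-task training.
    ce_new = F.cross_entropy(new_logits, new_targets)
    distill = F.kl_div(
        F.log_softmax(old_logits / T, dim=1),
        F.softmax(recorded_old_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * T * T
    return ce_new + lam * distill
```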
Multi-task learning (MTL) with neural networks leverages commonalities in tasks to improve performance, but often suffers from task interference, which reduces the benefits of transfer. To address this issue we introduce the routing network paradigm, a novel neural network and training algorithm. A routing network is a kind of self-organizing neural network consisting of two components: a router and a set of one or more function blocks. A function block may be any neural network, for example a fully-connected or a convolutional layer. Given an input, the router makes a routing decision, choosing a function block to apply and passing the output back to the router recursively, terminating when a fixed recursion depth is reached. In this way the routing network dynamically composes different function blocks for each input. We employ a collaborative multi-agent reinforcement learning (MARL) approach to jointly train the router and function blocks. We evaluate our model against cross-stitch networks and shared-layer baselines on multi-task settings of the MNIST, mini-imagenet, and CIFAR-100 datasets. Our experiments demonstrate a significant improvement in accuracy, with sharper convergence. In addition, routing networks have nearly constant per-task training cost, while cross-stitch networks scale linearly with the number of tasks. On CIFAR-100 (20 tasks) we obtain cross-stitch performance levels with an 85% reduction in training time.
Knowledge distillation is an effective approach for transferring knowledge from a teacher neural network to a student target network, in order to satisfy the low-memory and fast-execution requirements of practical deployment. While it is able to create stronger target networks than a vanilla, non-teacher-based learning strategy, this scheme also requires additionally training a large teacher model at expensive computational cost. In this work, we present a Self-Referenced Deep Learning (SRDL) strategy. Unlike both vanilla optimization and existing knowledge distillation, SRDL feeds the knowledge discovered by the in-training target model back into the model itself to regularize the subsequent learning procedure, thereby eliminating the need to train a large teacher model. Compared with vanilla learning and conventional knowledge distillation, SRDL improves model generalization performance at negligible extra computational cost. Extensive evaluations show that a variety of deep networks benefit from SRDL, resulting in improved deployment performance on both coarse-grained object categorization tasks (CIFAR10, CIFAR100, Tiny ImageNet, and ImageNet) and a fine-grained person instance identification task (Market-1501).
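An illustrative sketch of the self-referenced idea (my own simplification with assumed names, not the paper's exact two-phase schedule): softened predictions from a frozen snapshot of the model itself serve as soft targets that regularize later training, so no separate teacher network is required.

```python
import copy
import torch
import torch.nn.functional as F

def make_snapshot(model):
    # Freeze a copy of the current model so it can act as its own "teacher" later on.
    snapshot = copy.deepcopy(model).eval()
    for p in snapshot.parameters():
        p.requires_grad_(False)
    return snapshot

def self_reference_loss(model, snapshot, images, targets, T=2.0, alpha=0.5):
    logits = model(images)
    with torch.no_grad():
        soft_targets = F.softmax(snapshot(images) / T, dim=1)
    ce = F.cross_entropy(logits, targets)
    kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft_targets, reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kd
```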
Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.
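A hedged sketch of uncertainty-based loss weighting in the spirit of the abstract (a commonly used simplified form with my own naming; the paper's exact terms differ between the regression and classification cases): each task gets a learnable log-variance s_i, and the combined loss is sum_i exp(-s_i) * L_i + s_i, so high-uncertainty tasks are automatically down-weighted.

```python
import torch

class UncertaintyWeighting(torch.nn.Module):
    def __init__(self, num_tasks):
        super().__init__()
        # One learnable log-variance per task, initialised to zero (unit weight).
        self.log_vars = torch.nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        # task_losses: iterable of scalar losses, one per task.
        total = 0.0
        for loss, s in zip(task_losses, self.log_vars):
            total = total + torch.exp(-s) * loss + s
        return total
```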
We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis.
Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification. Many large-scale datasets are collected from websites; however, they tend to contain inaccurate labels that are termed noisy labels. Training on such noisy labeled datasets causes performance degradation because DNNs easily overfit to noisy labels. To overcome this problem, we propose a joint optimization framework of learning DNN parameters and estimating true labels. Our framework can correct labels during training by alternating update of network parameters and labels. We conduct experiments on the noisy CIFAR-10 datasets and the Clothing1M dataset. The results indicate that our approach significantly outperforms other state-of-the-art methods.
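A rough sketch of the alternating idea (my own simplification, with an assumed per-example soft-label buffer rather than the paper's exact update rule): the network is trained toward the current label estimates, and after each pass the stored estimates are moved toward the model's predictions so that suspected noisy labels get corrected over time.

```python
import torch
import torch.nn.functional as F

def soft_label_ce(logits, y_soft_batch):
    # Network update: cross-entropy against the current (soft) label estimates.
    return -(y_soft_batch * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

@torch.no_grad()
def update_label_estimates(y_soft, indices, logits, momentum=0.9):
    # Label update: blend the stored estimates with the model's current predictions.
    # y_soft: (num_examples, num_classes) buffer; indices: example indices of this batch.
    y_soft[indices] = momentum * y_soft[indices] + (1 - momentum) * F.softmax(logits, dim=1)
```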
This paper introduces a novel method to perform transfer learning across domains and tasks, formulating it as a problem of learning to cluster. The key insight is that, in addition to features, we can transfer similarity information and this is sufficient to learn a similarity function and clustering network to perform both domain adaptation and cross-task transfer learning. We begin by reducing categorical information to pairwise constraints, which only considers whether two instances belong to the same class or not (pairwise semantic similarity). This similarity is category-agnostic and can be learned from data in the source domain using a similarity network. We then present two novel approaches for performing transfer learning using this similarity function. First, for unsupervised domain adaptation, we design a new loss function to regularize classification with a constrained clustering loss, hence learning a clustering network with the transferred similarity metric generating the training inputs. Second, for cross-task learning (i.e., unsupervised clustering with unseen categories), we propose a framework to reconstruct and estimate the number of semantic clusters, again using the clustering network. Since the similarity network is noisy, the key is to use a robust clustering algorithm, and we show that our formulation is more robust than the alternative constrained and unconstrained clustering approaches. Using this method, we first show state of the art results for the challenging cross-task problem, applied on Omniglot and ImageNet. Our results show that we can reconstruct semantic clusters with high accuracy. We then evaluate the performance of cross-domain transfer using images from the Office-31 and SVHN-MNIST tasks and present top accuracy on both datasets. Our approach doesn't explicitly deal with domain discrepancy. If we combine with a domain adaptation loss, it shows further improvement.
Word embeddings have been shown to benefit from ensembles of several word embedding sources, typically combined using simple mathematical operations over the set of vectors to produce a meta-embedding representation. More recently, unsupervised learning has been used to find a lower-dimensional representation of similar size to the word embeddings in the ensemble. However, these methods do not make use of the available manually labeled datasets, which are typically used only for evaluation purposes. We propose to improve word embeddings by simultaneously learning to reconstruct an ensemble of pre-trained word embeddings with supervision from various labeled word similarity datasets. This involves reconstructing word meta-embeddings while using a Siamese network to learn word similarity, with the two processes sharing a hidden layer. Experiments are carried out on 6 word similarity datasets and 3 analogy datasets. We find that performance improves on all word similarity datasets when compared with the unsupervised learning approaches, with an average increase of 11.33 in the Spearman correlation coefficient. Moreover, when using a cosine loss for reconstruction and Brier's loss for word similarity, 4 out of the 6 word similarity datasets show the best performance with our approach.
For many applications, collecting labeled data is laborious. Exploiting unlabeled data during training is therefore a long-standing goal of machine learning. Self-supervised learning addresses this problem by providing auxiliary tasks, different from but related to the supervised task, for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As a further contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: image quality assessment (IQA) and crowd counting. For both, we show how ranked image sets can be automatically generated from unlabeled data. Our results show that networks trained to regress to the ground-truth targets on labeled data, while simultaneously learning to rank unlabeled data, obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of the informativeness of unlabeled data. This can be used to drive an active learning algorithm, and we show that it can reduce labeling effort by up to 50%.
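As an illustration of how a ranking proxy can be attached to a regression network (a minimal sketch with assumed names; the paper's efficient Siamese backpropagation scheme is not shown here): for a pair whose ordering is known for free, e.g., an image versus a crop of it in crowd counting, a margin ranking loss pushes the two scalar predictions into the correct order.

```python
import torch

ranking_loss = torch.nn.MarginRankingLoss(margin=0.0)

def ranking_proxy_loss(score_larger, score_smaller):
    # score_larger / score_smaller: scalar predictions for a pair whose ground-truth
    # ordering is known without labels (the first should rank at least as high).
    target = torch.ones_like(score_larger)
    return ranking_loss(score_larger, score_smaller, target)
```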
Visual attributes, ranging from simple objects (e.g., backpacks, hats) to characteristics of people such as gender, height, and clothing, have proven to be a powerful representational approach for many applications such as image description and human identification. In this paper, we introduce a novel method that combines the advantages of multi-task and curriculum learning in a visual attribute classification framework. Individual tasks are grouped after performing hierarchical clustering based on their correlations. The task clusters are learned in a curriculum learning setup by transferring knowledge between clusters, and the learning process within each cluster is performed in a multi-task classification setup. By leveraging the acquired knowledge, we speed up the process and improve performance. We demonstrate the effectiveness of our method through ablation studies and a detailed analysis of the covariates, on a variety of publicly available datasets of humans standing with their full bodies visible. Extensive experiments demonstrate that the proposed approach boosts performance by 4% to 10%.
Current Zero-Shot Learning (ZSL) approaches are restricted to recognition of a single dominant unseen object category in a test image. We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the 'recognition' and 'localization' of an unseen category. To address this limitation, we introduce a new 'Zero-Shot Detection' (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories without any training examples. We also propose a new experimental protocol for ZSD based on the highly challenging ILSVRC dataset, adhering to practical issues, e.g., the rarity of unseen objects. To the best of our knowledge, this is the first end-to-end deep network for ZSD that jointly models the interplay between visual and semantic domain information. To overcome the noise in the automatically derived semantic descriptions, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic space clustering. Furthermore, we present a baseline approach extended from recognition to detection setting. Our extensive experiments show significant performance boost over the baseline on the imperative yet difficult ZSD problem.
We propose gradient adversarial training, an auxiliary deep learning framework applicable to different machine learning problems. In gradient adversarial training, we leverage the prior belief that, in many contexts, simultaneous gradient updates should be statistically indistinguishable from each other. We enforce this consistency using an auxiliary network that classifies the origin of the gradient tensor, and the main network serves as an adversary to the auxiliary network in addition to performing standard task-based training. We demonstrate gradient adversarial training for three different scenarios: (1) as a defense against adversarial examples, we classify gradient tensors and tune them to be agnostic to the class of their corresponding example; (2) for knowledge distillation, we perform binary classification of gradient tensors derived from the student or teacher network and tune the student gradient tensor to mimic the teacher's gradient tensor; and (3) for multi-task learning, we classify the gradient tensors derived from different task loss functions and tune them to be statistically indistinguishable. For each of the three scenarios, we show the potential of the gradient adversarial training procedure. Specifically, gradient adversarial training increases the robustness of a network to adversarial attacks, is able to better distill the knowledge from a teacher network to a student network compared with soft targets, and boosts multi-task learning by aligning the gradient tensors derived from the task-specific loss functions. Overall, our experiments demonstrate that gradient tensors contain latent information about the tasks being trained and can support a variety of machine learning problems when guided intelligently through an adversarial process using an auxiliary network.
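A rough, illustrative sketch of the multi-task variant (my own simplification under assumed names, not the authors' implementation): per-task gradients of a shared weight are computed with create_graph=True, a small auxiliary classifier tries to identify which task each gradient came from, and a gradient-reversal step makes the main network drive those gradients toward being statistically indistinguishable.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity forward; negates the gradient on the way back (gradient reversal).
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def gradient_adversarial_term(task_losses, shared_param, discriminator):
    # task_losses: list of scalar per-task losses; shared_param: a shared weight tensor;
    # discriminator: module mapping a flattened gradient tensor to per-task logits.
    adv_loss = 0.0
    for task_id, loss in enumerate(task_losses):
        (g,) = torch.autograd.grad(loss, shared_param, create_graph=True, retain_graph=True)
        logits = discriminator(GradReverse.apply(g.reshape(1, -1)))
        target = torch.tensor([task_id], device=logits.device)
        adv_loss = adv_loss + F.cross_entropy(logits, target)
    # Added to the sum of task losses; the reversal makes the main network the adversary.
    return adv_loss
```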
Although deep recurrent neural networks (RNNs) have demonstrated strong performance in text classification, training RNN models is often expensive and requires an extensive collection of annotated data that may not be available. To overcome the data limitation issue, existing approaches leverage pre-trained word embeddings or sentence representations to relieve the burden of training RNNs from scratch. In this paper, we show that jointly learning sentence representations from multiple text classification tasks, and combining them with pre-trained word-level and sentence-level encoders, yields robust sentence representations that are useful for transfer learning. Extensive experiments and analyses using a wide range of transfer and linguistic tasks support the effectiveness of our approach.
We propose a new regularization method for training neural networks, which yields better generalization and test error than standard stochastic gradient descent. Our method is based on the principle of cross-validation, where a validation set is used to limit model overfitting. We formulate this principle as a bilevel optimization problem. This formulation allows us to define the optimization of a cost on the validation set, subject to another optimization on the training set. Overfitting is controlled by introducing weights for each mini-batch in the training set and choosing their values so as to minimize the error on the validation set. In practice, these weights define mini-batch learning rates in the gradient descent update equation, which leads to better generalization. Because of its simplicity, this method can be integrated with other regularization methods and training schemes. We extensively evaluate our proposed algorithm on several neural network architectures and datasets, and find that it consistently improves the generalization of the model, especially when labels are noisy.