Research in curriculum learning has shown that optimizing the order of the training data can improve performance on a task. Recent works have focused on complex reinforcement learning techniques to find the optimal data-ordering strategy that maximizes learning for a given network. In this paper, we present a simple yet efficient technique based on continuous optimization, trained with an auto-encoding procedure. We call this new approach Training Sequence Optimization (TSO). With a standard encoder-decoder setup, we learn a continuous latent-space representation of the training strategy, and a predictor network operating on this continuous representation predicts the accuracy of the strategy on a fixed network architecture. The performance predictor and encoder enable us to perform gradient-based optimization by gradually moving towards latent-space representations of training-data orderings with potentially better accuracy. We show an empirical gain of 2 AP with our generated optimal curriculum strategy over a random strategy on the CIFAR-100 and CIFAR-10 datasets, with larger boosts than existing state-of-the-art CL algorithms.
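A minimal sketch (not the authors' code) of the gradient-based latent-space search that the TSO abstract describes: encode a seed data ordering, nudge the latent code towards higher predicted accuracy, and decode a new ordering. The module names and the Adam-based ascent are assumptions.

```python
import torch

def optimize_ordering(encoder, decoder, predictor, seed_ordering, steps=50, lr=0.1):
    # Assumed: encoder, decoder, predictor are pre-trained nn.Modules; predictor
    # maps a latent code to a scalar estimate of the resulting model's accuracy.
    z = encoder(seed_ordering).detach().requires_grad_(True)   # latent code of the strategy
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -predictor(z).mean()        # ascend the predicted accuracy
        loss.backward()
        opt.step()
    return decoder(z.detach())             # decode back to a concrete training ordering
```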
Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
Knowledge distillation (KD) is an effective tool to compress deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher-assistant (TA) setting for KD, which sequentially decreases the size of the teacher model to relatively bridge the size gap between these models. This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation to efficiently enhance the learning of a compact student under the capacity-gap problem. The technique builds on the hypothesis that a student network should be guided gradually by a stratified teaching curriculum, since it learns easier (harder) data samples better from a lower- (higher-) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image, based on a curriculum driven by how difficult the image is to classify. In this work, we empirically verify our hypothesis and rigorously experiment on the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets, showing improved accuracy on VGG-like models, ResNets, and WideResNets architectures.
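A hypothetical sketch of the per-image teacher selection the abstract describes: easier images are routed to lower-capacity teachers and harder ones to larger teachers. The difficulty score and the linear mapping are illustrative assumptions, not the paper's exact rule.

```python
def select_teacher(teachers_by_capacity, difficulty):
    """teachers_by_capacity: list ordered from smallest to largest teacher.
    difficulty: assumed per-image score in [0, 1] from a curriculum scoring model."""
    idx = min(int(difficulty * len(teachers_by_capacity)), len(teachers_by_capacity) - 1)
    return teachers_by_capacity[idx]
```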
It is well known that the amount and quality of training data play a significant role in creating good machine learning models. In this paper, we take this a step further and demonstrate that the way the training examples are arranged is also of crucial importance. Curriculum learning builds on the observation that organized and structured assimilation of knowledge enables faster training and better understanding. When humans learn to speak, they first try to utter basic phones and then gradually move towards more complex structures such as words and sentences. This methodology is known as curriculum learning, and we employ it in the context of automatic speech recognition. We hypothesize that end-to-end models can achieve better performance when provided with an organized training set consisting of examples that exhibit an increasing level of difficulty (i.e., a curriculum). To impose structure on the training set and to define the notion of an easy example, we explore multiple scoring functions that either use feedback from an external neural network or incorporate feedback from the model itself. Empirical results show that, with different curricula, we can balance the training time and the network's performance.
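A hedged sketch of one scoring option the abstract mentions: rank utterances by a per-example difficulty score (e.g., the loss of an external model or of the model itself) and train on them in ascending order. The callable name is a placeholder.

```python
def build_curriculum(dataset, score_fn):
    # score_fn: hypothetical callable mapping an example to a difficulty score.
    scored = [(score_fn(example), example) for example in dataset]
    scored.sort(key=lambda pair: pair[0])          # easy (low score) first
    return [example for _, example in scored]
```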
We propose an adaptation of the curriculum training framework, applicable to state-of-the-art meta-learning techniques for few-shot classification. Curriculum-based training generally attempts to mimic human learning by gradually increasing training complexity to enable incremental concept learning. Since the goal of a meta-learner is to learn how to learn from as few samples as possible, the exact number of those samples (i.e., the size of the support set) is a natural proxy for the difficulty of a given task. We define a simple yet novel curriculum schedule that starts with a larger support size and progressively reduces it throughout training, ultimately matching the desired shot size of the test setup. The proposed method improves learning efficiency as well as generalization. Our experiments with the MAML algorithm on few-shot image classification tasks show significant gains from the curriculum training framework. Ablation studies confirm that the proposed method is independent of the model architecture as well as of the meta-learner's hyper-parameters.
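A minimal sketch of the shot-size curriculum described above: start meta-training with a large support set and shrink it until it matches the test-time shot count. The linear pacing and the default shot counts are assumptions; the paper may use a different schedule.

```python
def support_size_at(step, total_steps, start_shots=10, target_shots=1):
    # Returns the number of support examples per class to use at this training step.
    frac = min(step / max(total_steps, 1), 1.0)
    return max(target_shots, round(start_shots - frac * (start_shots - target_shots)))
```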
Teaching plays an important role in human learning. Typically, a human teaching strategy involves assessing a student's knowledge progress in order to tailor the teaching materials in a way that enhances learning progress. This is achieved by tracing the student's knowledge of the important learning concepts in a task. Nevertheless, such a teaching strategy is under-exploited in machine learning, as current machine-teaching methods tend to directly assess the progress on individual training samples without paying attention to the underlying learning concepts in a learning task. In this paper, we propose a novel method, called Knowledge Augmented Data Teaching (KADT), which can optimize a data-teaching strategy for a student model by tracing its knowledge progress over multiple learning concepts in a learning task. Specifically, the KADT method incorporates a knowledge-tracing model to dynamically capture the knowledge progress of a student model in terms of latent learning concepts. We then develop an attention-pooling mechanism to distill the knowledge representations of the student model with respect to class labels, which enables a data-teaching strategy to be developed on critical training samples. We have evaluated the performance of the KADT method on four different machine learning tasks, including knowledge tracing, sentiment analysis, movie recommendation, and image classification. The results, compared against state-of-the-art methods, show that KADT consistently outperforms the others on all tasks.
Vision-and-Language Navigation (VLN) is a task in which an agent navigates an embodied indoor environment under human instructions. Previous works ignore the distribution of sample difficulty, which we argue may degrade their agents' performance. To tackle this issue, we propose a novel curriculum-based training paradigm for the VLN task that balances human prior knowledge and the agent's learning progress on the training samples. We develop principles of curriculum design and re-arrange the benchmark Room-to-Room (R2R) dataset to make it suitable for curriculum training. Experiments show that our method is model-agnostic and can significantly improve the performance, generalizability, and training efficiency of current state-of-the-art navigation agents without increasing model complexity.
Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on the samples whose labels are probably correct. Unlike the existing curriculum that is usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images of real-world noisy labels. The code is at https://github.com/google/mentornet.
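A simplified sketch of the MentorNet/StudentNet interaction described above: MentorNet maps per-sample training signals to weights, and StudentNet is updated with the weighted loss. Feeding only the per-sample loss into MentorNet, and the specific architectures, are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def weighted_student_step(student, mentor, optimizer, images, labels):
    logits = student(images)
    per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
    with torch.no_grad():
        # Assumed interface: mentor maps (batch, 1) loss features to (batch, 1) weights in [0, 1].
        weights = mentor(per_sample_loss.unsqueeze(1)).squeeze(1)
    loss = (weights * per_sample_loss).mean()      # curriculum-weighted StudentNet loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```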
Conventional machine learning (ML) relies heavily on manual design by machine learning experts to decide learning tasks, data, models, optimization algorithms, and evaluation metrics; it is labor-intensive and time-consuming, and it cannot learn autonomously the way humans do. In education science, self-directed learning, in which human learners select learning tasks and materials on their own without requiring hands-on guidance, has been shown to be more effective than passive, teacher-guided learning. Inspired by the concept of self-directed human learning, we introduce the principal concept of Self-directed Machine Learning (SDML) and propose a framework for SDML. Specifically, we design SDML as a self-directed learning process guided by self-awareness, including internal awareness and external awareness. Our proposed SDML process performs self task selection, self data selection, self model selection, self optimization-strategy selection, and self evaluation-metric selection under the guidance of self-awareness, without human guidance. Meanwhile, the learning performance of the SDML process serves as feedback to further improve self-awareness. We propose a mathematical formulation for SDML based on multi-level optimization. Furthermore, we present case studies together with potential applications of SDML, followed by a discussion of future research directions. We expect that SDML could enable machines to conduct human-like self-directed learning and provide a new perspective towards artificial general intelligence.
This paper surveys the recent attempts, both from the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art algorithms rely on handcrafted heuristics for making decisions that are otherwise too expensive to compute or mathematically not well defined. Thus, machine learning looks like a natural candidate to make such decisions in a more principled and optimized way. We advocate for pushing further the integration of machine learning and combinatorial optimization and detail a methodology to do so. A main point of the paper is seeing generic optimization problems as data points and inquiring what is the relevant distribution of problems to use for learning on a given task.
The combination of deep learning and reinforcement learning (RL) has led to a range of impressive feats, and many believe (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that such design choices can be automated, and it has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL, which naturally give rise to a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, showing promise in a variety of applications from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct sub-fields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL: we provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers going forward.
Jitendra Malik once said, "Supervision is the opium of the AI researcher". Most deep learning techniques heavily rely on extreme amounts of human labels to work effectively. In today's world, the rate of data creation greatly surpasses the rate of data annotation. Full reliance on human annotations is just a temporary means to solve current closed problems in AI. In reality, only a tiny fraction of data is annotated. Annotation Efficient Learning (AEL) is a study of algorithms to train models effectively with fewer annotations. To thrive in AEL environments, we need deep learning techniques that rely less on manual annotations (e.g., image, bounding-box, and per-pixel labels), but learn useful information from unlabeled data. In this thesis, we explore five different techniques for handling AEL.
Contextual information in search sessions is important for capturing users' search intents. Various approaches have been proposed to model user behavior sequences to improve document ranking within a session. Typically, training samples of (search context, document) pairs are sampled randomly in each training epoch. In reality, the difficulty of understanding a user's search intent and of judging a document's relevance varies greatly from one search context to another. Mixing training samples of different difficulty may confuse the model's optimization process. In this work, we propose a curriculum learning framework for context-aware document ranking, in which the ranking model learns the matching signals between the search context and the candidate documents in an easy-to-hard manner. In this way, we aim to guide the model gradually towards a global optimum. To leverage both positive and negative examples, two curricula are designed. Experiments on two real query-log datasets show that our proposed framework can significantly improve the performance of several existing methods, demonstrating the effectiveness of curriculum learning for context-aware document ranking.
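A hedged sketch of an easy-to-hard curriculum over (search context, document) pairs: sort the training pairs by a difficulty score and expose a growing prefix each epoch. The linear pacing function and starting fraction are assumptions, not the paper's exact schedule.

```python
def curriculum_subset(pairs_sorted_easy_first, epoch, total_epochs, start_frac=0.3):
    # pairs_sorted_easy_first: training pairs already ordered from easy to hard.
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / max(total_epochs - 1, 1))
    cutoff = max(1, int(frac * len(pairs_sorted_easy_first)))
    return pairs_sorted_easy_first[:cutoff]
```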
In semi-supervised representation learning frameworks, when the number of labelled data is very scarce, the quality and representativeness of these samples become increasingly important. Existing literature on semi-supervised learning randomly samples a limited number of data points for labelling. All these labelled samples are then used along with the unlabelled data throughout the training process. In this work, we ask two important questions in this context: (1) does it matter which samples are selected for labelling? (2) does it matter how the labelled samples are used throughout the training process along with the unlabelled data? To answer the first question, we explore a number of unsupervised methods for selecting specific subsets of data to label (without prior knowledge of their labels), with the goal of maximizing representativeness w.r.t. the unlabelled set. Then, for our second line of inquiry, we define a variety of different label injection strategies in the training process. Extensive experiments on four popular datasets, CIFAR-10, CIFAR-100, SVHN, and STL-10, show that unsupervised selection of samples that are more representative of the entire data improves performance by up to ~2% over the existing semi-supervised frameworks such as MixMatch, ReMixMatch, FixMatch and others with random sample labelling. We show that this boost could even increase to 7.5% for very few-labelled scenarios. However, our study shows that gradually injecting the labels throughout the training procedure does not impact the performance considerably versus when all the existing labels are used throughout the entire training.
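One plausible instance of the unsupervised selection studied above (the abstract does not name its exact methods): cluster the unlabelled features with k-means and label the sample closest to each centroid, so the labelled set spans the data distribution. A sketch under that assumption:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_for_labelling(features, budget, seed=0):
    # features: (n_samples, dim) array of unsupervised representations; budget: #labels to request.
    km = KMeans(n_clusters=budget, random_state=seed, n_init=10).fit(features)
    chosen = []
    for c in range(budget):
        dists = np.linalg.norm(features - km.cluster_centers_[c], axis=1)
        chosen.append(int(np.argmin(dists)))   # most central sample of each cluster
    return chosen
```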
Curriculum learning is a powerful training method that, in some settings, enables faster and better training. However, this method requires a notion of which examples are hard and which are easy, which is not always trivial to provide. A recent metric called the C-score acts as a proxy for this, for example by relating it to learning consistency. Unfortunately, this method is quite computationally demanding, which limits its applicability to alternative datasets. In this work, we train models through different methods to predict the C-score for CIFAR-100 and CIFAR-10. However, we find that these models generalize poorly, both within the same distribution and out of distribution. This suggests that the C-score is not defined by the individual characteristics of each sample but by other factors. We hypothesize that a sample's relation to its neighbours, in particular how many of them share the same label, can help explain the C-score. We plan to explore this in future work.
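A small sketch of the hypothesis stated above: for each sample, measure how many of its k nearest neighbours (in some feature space) share its label; high agreement would correspond to easy, consistently learned examples. The feature space and k are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbor_label_agreement(features, labels, k=10):
    labels = np.asarray(labels)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)                  # first neighbour is the point itself
    neighbor_labels = labels[idx[:, 1:]]
    return (neighbor_labels == labels[:, None]).mean(axis=1)   # fraction of same-label neighbours
```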
Adequate classification of proximal femur fractures from X-ray images is crucial for the treatment choice and the patients' clinical outcome. We rely on the commonly used AO system, which describes a hierarchical knowledge tree classifying the images into types and subtypes according to the fracture's location and complexity. In this paper, we propose a method for the automatic classification of proximal femur fractures into 3 and 7 AO classes based on a convolutional neural network (CNN). As it is known, CNNs need large and representative datasets with reliable labels, which are hard to collect for the application at hand. In this paper, we design a curriculum learning (CL) approach that improves over the basic CNN performance in such a scenario. Our novel formulation unifies three curriculum strategies: individually weighting training samples, reordering the training set, and sampling subsets of data. At the core of these strategies is a scoring function ranking the training samples. We define two novel scoring functions: one from domain-specific prior knowledge and an original self-paced uncertainty score. We perform experiments on a clinical dataset of proximal femur radiographs. The curriculum improves proximal femur fracture classification up to the performance of experienced trauma surgeons. The best curriculum method reorders the training set based on prior knowledge, resulting in a classification improvement of up to 15%. Using the publicly available MNIST dataset, we further discuss and demonstrate the benefits of our unified CL formulation in three controlled and challenging digit-recognition scenarios: with limited amounts of data, under class imbalance, and in the presence of label noise. The code of our work is available at: https://github.com/ameliajimenez/curriculum-learning-prior-uncertainty.
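A hedged sketch of how a single scoring function can drive the three curriculum strategies the abstract unifies: per-sample weighting, reordering of the training set, and score-biased subset sampling. The score-to-probability mapping and subset fraction are illustrative assumptions.

```python
import numpy as np

def curriculum_from_scores(scores, mode="reorder", subset_frac=0.5, rng=None):
    scores = np.asarray(scores, dtype=float)          # higher score = easier sample (assumed positive)
    if mode == "weight":                              # strategy 1: individual sample weights
        return scores / scores.sum()
    if mode == "reorder":                             # strategy 2: easy-first ordering
        return np.argsort(-scores)
    if mode == "sample":                              # strategy 3: score-biased data subset
        if rng is None:
            rng = np.random.default_rng(0)
        p = scores / scores.sum()
        return rng.choice(len(scores), size=int(subset_frac * len(scores)), replace=False, p=p)
    raise ValueError(mode)
```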
This paper presents a reinforcement learning approach with better generalization for heuristic dispatching rules on the Job-shop Scheduling Problem (JSP). Current models for the JSP do not focus on generalization although, as we show in this work, it is key to learning better heuristics for the problem. A well-known technique to improve generalization is curriculum learning (CL), i.e., learning instances of increasing complexity. However, as many works in the literature have shown, this technique may suffer from catastrophic forgetting when transferring the learned skills between different problem sizes. To address this issue, we introduce a novel Adversarial Curriculum Learning (ACL) strategy that dynamically adjusts the difficulty level during learning to revisit the worst-case instances. This work also presents a deep learning model to solve the JSP that is equivariant w.r.t. the job definition and agnostic to the problem size. Experiments on Taillard's and Demirkol's instances show that the proposed approach significantly improves upon the state-of-the-art models on the JSP. It reduces the average optimality gap from 19.35% to 10.46% on Taillard's instances and from 38.43% to 18.85% on Demirkol's instances. Our implementation is available online.
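A minimal sketch, under our reading of the abstract, of an adversarial curriculum loop: track how badly the agent performed on each seen instance and preferentially revisit the current worst cases, mixing in fresh instances so difficulty keeps increasing. The revisit probability and data structures are assumed, not taken from the paper.

```python
import random

def next_instance(instance_pool, gap_by_instance, new_instance_fn, p_revisit=0.5):
    # instance_pool: list of instance IDs seen so far (assumed hashable);
    # gap_by_instance: dict mapping instance ID -> last observed optimality gap.
    if instance_pool and random.random() < p_revisit:
        return max(instance_pool, key=lambda inst: gap_by_instance.get(inst, 0.0))
    inst = new_instance_fn()          # e.g., sample a larger / harder JSP instance
    instance_pool.append(inst)
    return inst
```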
Meta-learning has been proposed as a framework to address the challenging few-shot learning setting. The key idea is to leverage a large number of similar few-shot tasks in order to learn how to adapt a base-learner to a new task for which only a few labeled samples are available. As deep neural networks (DNNs) tend to overfit using a few samples only, meta-learning typically uses shallow neural networks (SNNs), thus limiting its effectiveness. In this paper we propose a novel few-shot learning method called meta-transfer learning (MTL) which learns to adapt a deep NN for few-shot learning tasks. Specifically, meta refers to training multiple tasks, and transfer is achieved by learning scaling and shifting functions of DNN weights for each task. In addition, we introduce the hard task (HT) meta-batch scheme as an effective learning curriculum for MTL. We conduct experiments using (5-class, 1-shot) and (5-class, 5-shot) recognition tasks on two challenging few-shot learning benchmarks: miniImageNet and Fewshot-CIFAR100. Extensive comparisons to related works validate that our meta-transfer learning approach trained with the proposed HT meta-batch scheme achieves top performance. An ablation study also shows that both components contribute to fast convergence and high accuracy. [Residue of the paper's hard-class selection algorithm: optimize θ by Eq. 3; optimize Φ_S{1,2} and θ by Eq. 4 and Eq. 5; while not done, sample class k in T^(te) and compute Acc_k on T^(te); return the class m with the lowest accuracy Acc_m.]
With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted for downstream tasks related to source code understanding. However, compared to the costly training from scratch, how to effectively adapt pre-trained models to a new task has not been fully explored. In this paper, we propose an approach to bridge pre-trained models and code-related tasks. We exploit semantic-preserving transformations to enrich the diversity of the downstream data and help pre-trained models learn semantic features that are invariant to these semantically equivalent transformations. Furthermore, we introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models. We apply our approach to a range of pre-trained models, and they significantly outperform state-of-the-art models on tasks for source code understanding, such as algorithm classification, code clone detection, and code search. Our experiments even show that, without heavy pre-training on code data, the natural-language pre-trained model RoBERTa fine-tuned with our lightweight approach can outperform or rival existing code pre-trained models fine-tuned on the above tasks, such as CodeBERT and GraphCodeBERT. This finding suggests that there is still considerable room for improvement in code pre-trained models.
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches, in terms of sample complexity, computational and memory cost. Towards this end, we first introduce a new and a more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which enjoys the same or even better performance as GEM, while being almost as computationally and memory efficient as EWC and other regularization-based methods. Finally, we show that all algorithms including A-GEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
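A compact sketch of the A-GEM update rule from the cited paper: if the current task gradient conflicts with the reference gradient computed on episodic memory, project it onto the half-space where the memory loss does not increase. Gradient flattening and the surrounding training loop are left out.

```python
import torch

def agem_project(g, g_ref):
    # g, g_ref: flattened 1-D gradient vectors (current batch vs. episodic memory).
    dot = torch.dot(g, g_ref)
    if dot < 0:                                   # conflict with past tasks
        g = g - (dot / torch.dot(g_ref, g_ref)) * g_ref
    return g
```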