Curriculum learning is a powerful training method that, in some settings, enables faster and better training. However, the method requires a notion of which examples are difficult and which are easy, which is not always trivial to provide. A recently proposed metric called the C-score acts as a proxy for example difficulty by relating it to learning consistency. Unfortunately, computing it is quite expensive, which limits its applicability to alternative datasets. In this work, we train models through different approaches to predict the C-scores of CIFAR-100 and CIFAR-10. However, we find that these models generalize poorly, both in-distribution and out-of-distribution. This suggests that the C-score is not determined by the individual characteristics of each sample alone, but by other factors. We hypothesize that a sample's relation to its neighbors, in particular how many of them share the same label, can help explain C-scores. We plan to explore this in future work.
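A minimal sketch of the neighbor-agreement hypothesis raised above: a hypothetical proxy that scores each sample by the fraction of its k nearest neighbors (in some feature space) that share its label. The function name, feature space, and Euclidean distance are illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbor_agreement_score(features, labels, k=10):
    """Fraction of each sample's k nearest neighbors sharing its label.

    Hypothetical proxy for the C-score discussed above; the feature space
    and distance metric are arbitrary choices for illustration.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)          # first neighbor is the point itself
    neighbor_labels = labels[idx[:, 1:]]
    return (neighbor_labels == labels[:, None]).mean(axis=1)

# Toy usage with random data standing in for, e.g., CIFAR features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
y = rng.integers(0, 10, size=500)
print(neighbor_agreement_score(X, y, k=10)[:5])
```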
Continual learning methods strive to mitigate catastrophic forgetting (CF), where knowledge from previously learned tasks is lost when learning a new one. Among those algorithms, some maintain a subset of samples from previous tasks during training; these samples are referred to as a memory. Such methods show outstanding performance while being conceptually simple and easy to implement. Yet, despite their popularity, little has been done to understand which elements should be included in the memory. Currently, this memory is often filled via random sampling, with no guiding principle that may aid in retaining prior knowledge. In this work, we propose a criterion based on the learning consistency of a sample, called Consistency-Aware Sampling (CAWS). This criterion prioritizes samples that are easier to learn by deep networks. We perform a study on three different memory-based methods, AGEM, GDumb, and Experience Replay, on the MNIST, CIFAR-10, and CIFAR-100 datasets. We show that using the most consistent elements yields performance gains when constrained by a compute budget; without such a constraint, random sampling is a strong baseline. However, using CAWS with Experience Replay improves performance over the random baseline. Finally, we show that CAWS achieves results similar to popular memory selection methods while requiring significantly less computational resources.
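A rough sketch of consistency-aware memory selection in the spirit of CAWS. Here a sample's consistency is approximated as the fraction of training checkpoints that classified it correctly; this scoring choice and the function names are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def consistency_scores(correct_per_checkpoint):
    """correct_per_checkpoint: bool array (n_checkpoints, n_samples).

    A sample's score is the fraction of checkpoints that got it right;
    higher means easier / more consistently learned."""
    return correct_per_checkpoint.mean(axis=0)

def fill_memory(sample_ids, scores, memory_size):
    """Keep the `memory_size` most consistent samples for replay."""
    order = np.argsort(-scores)          # descending consistency
    return sample_ids[order[:memory_size]]

# Toy usage: 5 checkpoints, 1000 candidate samples, memory of 100.
rng = np.random.default_rng(0)
correct = rng.random((5, 1000)) < 0.7
memory = fill_memory(np.arange(1000), consistency_scores(correct), memory_size=100)
print(len(memory), memory[:10])
```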
The adequate classification of proximal femur fractures from X-ray images is crucial for treatment choice and patients' clinical outcome. We rely on the commonly used AO system, which describes a hierarchical knowledge tree classifying images into types and subtypes according to the fracture's location and complexity. In this paper, we propose a method for the automatic classification of proximal femur fractures into 3 and 7 AO classes based on a Convolutional Neural Network (CNN). As is known, CNNs need large and representative datasets with reliable labels, which are hard to collect for the application at hand. In this paper, we design a curriculum learning (CL) approach that improves over the basic CNN's performance under this setting. Our novel formulation unifies three curriculum strategies: individually weighting training samples, reordering the training set, and sampling subsets of data. The core of these strategies is a scoring function ranking the training samples. We define two novel scoring functions: one derived from domain-specific prior knowledge and an original self-paced uncertainty score. We perform experiments on a clinical dataset of proximal femur radiographs. The curricula improve proximal femur fracture classification up to the performance of experienced trauma surgeons. The best curriculum method reorders the training set based on prior knowledge, achieving a classification improvement of 15%. Using the publicly available MNIST dataset, we further discuss and demonstrate the benefits of our unified CL formulation for three controlled and challenging digit recognition scenarios: with limited amounts of data, under class imbalance, and in the presence of label noise. The code of our work is available at: https://github.com/ameliajimenez/curriculum-learning-prior-uncertainty.
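A compact sketch of the three curriculum strategies named above (weighting, reordering, subset sampling), all driven by a single scoring function. The scores below are placeholder numbers; the paper's actual scoring functions (prior knowledge and self-paced uncertainty) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
scores = rng.random(n)            # placeholder scores in [0, 1]; higher = easier

# 1) Reorder: present easier (higher-scoring) samples first.
order = np.argsort(-scores)

# 2) Weight: scale each sample's loss contribution by its score.
weights = scores / scores.sum()

# 3) Subset sampling: draw a training subset biased towards easy samples.
subset = rng.choice(n, size=256, replace=False, p=weights)

print(order[:5], weights[:5], subset[:5])
```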
Research in curriculum learning has shown better performance on a task by optimizing the sequence of the training data. Recent works have focused on using complex reinforcement learning techniques to find the optimal data ordering strategy to maximize learning for a given network. In this paper, we present a simple yet efficient technique based on continuous optimization trained with an auto-encoding procedure. We call this new approach Training Sequence Optimization (TSO). With a standard encoder-decoder setup, we learn a continuous latent-space representation of the training strategy, and a predictor network operates on this representation to predict the accuracy of the strategy on a fixed network architecture. The performance predictor and encoder enable gradient-based optimization by gradually moving towards the latent-space representation of a training data ordering with potentially better accuracy. We show an empirical gain of 2 AP with our generated optimal curriculum strategy over a random strategy on the CIFAR-100 and CIFAR-10 datasets, and larger improvements than existing state-of-the-art CL algorithms.
We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% to 15% on CIFAR-10 and 11% to 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
Understanding how machine learning models generalize to new environments is a critical part of their safe deployment. Recent work has proposed a variety of complexity measures that directly predict or theoretically bound the generalization capacity of a model. However, these methods rely on a strong set of assumptions that are not always satisfied in practice. Motivated by the limited settings in which existing measures can be applied, we propose a novel complexity measure based on the local manifold smoothness of a classifier. We define local manifold smoothness as a classifier's output sensitivity to perturbations in the manifold neighborhood around a given test point. Intuitively, a classifier that is less sensitive to these perturbations should generalize better. To estimate smoothness, we sample points using data augmentation and measure the fraction of these points classified into the majority class. Our method only requires selecting a data augmentation method and makes no other assumptions about the model or the data distribution, meaning it can be applied even in out-of-domain (OOD) settings where existing methods cannot. In experiments on robustness benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our manifold smoothness measure and actual OOD generalization, on more than 3,000 models evaluated on over 100 train/test domain pairs.
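A minimal sketch of the smoothness estimate described above: sample augmented copies of a test point and measure the fraction assigned to the majority class. The Gaussian-noise "augmentation" and the dummy linear classifier are placeholders for whichever augmentation and model one actually uses.

```python
import numpy as np

def local_smoothness(predict, x, augment, n_samples=50):
    """Fraction of augmented copies of `x` that receive the majority label."""
    preds = np.array([predict(augment(x)) for _ in range(n_samples)])
    counts = np.bincount(preds)
    return counts.max() / n_samples

# Placeholder classifier and augmentation, for illustration only.
rng = np.random.default_rng(0)
w = rng.normal(size=(10, 32))                          # fake linear 10-class classifier
predict = lambda x: int(np.argmax(w @ x))
augment = lambda x: x + rng.normal(scale=0.1, size=x.shape)

x = rng.normal(size=32)
print(local_smoothness(predict, x, augment))
```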
Given data with label noise (i.e., incorrect data), deep neural networks gradually memorize the noisy labels, which degrades model performance. To alleviate this issue, curriculum learning has been proposed to improve model performance and generalization by ordering training samples in a meaningful (e.g., easy-to-hard) sequence. Previous work treats incorrect samples as generic hard ones, without distinguishing between hard samples (i.e., hard samples within the correct data) and incorrect samples. Indeed, a model should learn from hard samples to promote generalization rather than overfit to incorrect ones. In this paper, we address this problem by appending a novel loss function, IndimLoss, on top of the existing task loss. Its main effect is to automatically and stably estimate the importance of easy samples and difficult samples (including both hard and incorrect ones) at the early stages of training, improving model performance. Then, in the following stages, the loss is dedicated to discriminating between hard and incorrect samples to improve model generalization. This training strategy can be formulated dynamically in a self-supervised manner, effectively mimicking the main principle of curriculum learning. Experiments on image classification, image regression, text sequence regression, and event relation reasoning demonstrate the versatility and effectiveness of our method, particularly in the presence of diversified noise levels.
Confidence calibration, the problem of predicting probability estimates representative of the true correctness likelihood, is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling, a single-parameter variant of Platt scaling, is surprisingly effective at calibrating predictions.
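A small sketch of temperature scaling on held-out logits: a single temperature T is chosen to minimize negative log-likelihood on a validation set, then applied at test time. A simple grid search is used here only to keep the example dependency-free; the names and the grid are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    probs = softmax(logits / T)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the single temperature minimizing validation NLL."""
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))

# Toy usage with random logits standing in for a validation set.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10)) * 3.0
labels = rng.integers(0, 10, size=1000)
T = fit_temperature(logits, labels)
calibrated_probs = softmax(logits / T)
print("fitted temperature:", T)
```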
The ability to quickly and accurately identify covariate shift at test time is a critical and often overlooked component of safe machine learning systems deployed in high-risk domains. While methods exist for detecting when predictions should not be made on out-of-distribution test examples, identifying distributional level differences between training and test time can help determine when a model should be removed from the deployment setting and retrained. In this work, we define harmful covariate shift (HCS) as a change in distribution that may weaken the generalization of a predictive model. To detect HCS, we use the discordance between an ensemble of classifiers trained to agree on training data and disagree on test data. We derive a loss function for training this ensemble and show that the disagreement rate and entropy represent powerful discriminative statistics for HCS. Empirically, we demonstrate the ability of our method to detect harmful covariate shift with statistical certainty on a variety of high-dimensional datasets. Across numerous domains and modalities, we show state-of-the-art performance compared to existing methods, particularly when the number of observed test samples is small.
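A sketch of the two test statistics mentioned above, disagreement rate and predictive entropy, computed from an ensemble's predictions on unlabeled test data. How the ensemble is trained to agree on training data and disagree on test data is the paper's contribution and is not reproduced here; these helpers only illustrate the statistics.

```python
import numpy as np

def disagreement_rate(preds):
    """preds: int array (n_models, n_points). Average fraction of test points
    on which a pair of models disagrees, over all model pairs."""
    n_models = preds.shape[0]
    pairs, disagree = 0, 0.0
    for i in range(n_models):
        for j in range(i + 1, n_models):
            disagree += (preds[i] != preds[j]).mean()
            pairs += 1
    return disagree / pairs

def mean_predictive_entropy(probs):
    """probs: array (n_models, n_points, n_classes). Entropy of the
    ensemble-averaged predictive distribution, averaged over points."""
    p = probs.mean(axis=0)
    return (-(p * np.log(p + 1e-12)).sum(axis=1)).mean()

# Toy usage: 4 models, 200 unlabeled test points, 10 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(4, 200))
preds = probs.argmax(axis=2)
print(disagreement_rate(preds), mean_predictive_entropy(probs))
```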
Deep learning has had tremendous success at learning low-dimensional representations of high-dimensional data. This success would be impossible if there were no hidden low-dimensional structure in the data of interest; this existence is posited by the manifold hypothesis, which states that the data lies on an unknown manifold of low intrinsic dimension. In this paper, we argue that this hypothesis does not properly capture the low-dimensional structure typically present in data. Assuming that the data lies on a single manifold implies that the intrinsic dimension is identical across the entire data space, and does not allow for subregions of this space to have a different number of factors of variation. To address this deficiency, we propose the union of manifolds hypothesis, which accommodates the existence of non-constant intrinsic dimensions. We empirically verify this hypothesis on commonly used image datasets, finding that intrinsic dimension should indeed be allowed to vary. We also show that classes with higher intrinsic dimension are harder to classify, and how this insight can be used to improve classification accuracy. We then turn our attention to the impact of this hypothesis in the context of deep generative models (DGMs). Most current DGMs struggle to model datasets with several connected components and/or varying intrinsic dimensions. To tackle these shortcomings, we propose clustered DGMs, where we first cluster the data and then train a DGM on each cluster. We show that clustered DGMs can model multiple connected components with different intrinsic dimensions, and empirically outperform their non-clustered counterparts without increased computational requirements.
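A toy sketch of the clustered-DGM recipe: cluster the data first, then fit a separate generative model per cluster and sample by first picking a cluster. A Gaussian mixture stands in for a real DGM purely for illustration; all names and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, size=(500, 2)),
                    rng.normal(+3, 1, size=(500, 2))])

# 1) Cluster the data (two connected components in this toy example).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 2) Train one generative model per cluster (GMM as a stand-in for a DGM).
models = [GaussianMixture(n_components=3, random_state=0).fit(X[clusters == c])
          for c in range(2)]
weights = np.bincount(clusters) / len(clusters)

# 3) Sample: pick a cluster proportionally to its size, then sample its model.
def sample(n):
    choices = rng.choice(len(models), size=n, p=weights)
    return np.concatenate([models[c].sample(1)[0] for c in choices])

print(sample(5))
```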
Recently, Miller et al. showed that a model's in-distribution (ID) accuracy has a strong linear correlation with its out-of-distribution (OOD) accuracy on several OOD benchmarks, a phenomenon they dubbed "accuracy-on-the-line". While a useful tool for model selection (i.e., the model most likely to perform best OOD is the one with the highest ID accuracy), this fact does not help estimate the actual OOD performance of a model without access to a labeled OOD validation set. In this paper, we show a similar but surprising phenomenon also holds for the agreement between pairs of neural network classifiers: whenever accuracy is on the line, we observe that the OOD agreement between the predictions of any two neural networks (with potentially different architectures) also has a strong linear correlation with their ID agreement. Furthermore, we observe that the slope and bias of OOD vs. ID agreement closely match those of OOD vs. ID accuracy. This phenomenon, which we call "agreement-on-the-line", has important practical applications: without any labeled data, we can predict the OOD accuracy of classifiers, since OOD agreement can be estimated using only unlabeled data. Our prediction algorithm outperforms previous methods both in shifts where agreement holds on the line and, surprisingly, when accuracy is not on the line. This phenomenon also provides new insights into deep neural networks: unlike accuracy-on-the-line, agreement-on-the-line appears to hold only for neural network classifiers.
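A sketch of how agreement-on-the-line can be used to estimate OOD accuracy without OOD labels, under the observation above that the agreement line shares its slope and bias with the accuracy line: fit a line to (ID agreement, OOD agreement) pairs over many model pairs, then map each model's ID accuracy through the same line. A plain least-squares fit is used here for simplicity; the paper's exact procedure may differ.

```python
import numpy as np

def pairwise_agreements(preds):
    """preds: int array (n_models, n_points). Agreement for each model pair."""
    n = preds.shape[0]
    return np.array([(preds[i] == preds[j]).mean()
                     for i in range(n) for j in range(i + 1, n)])

def estimate_ood_accuracy(id_preds, ood_preds, id_accuracy):
    """Fit the OOD-vs-ID agreement line over model pairs (no labels needed),
    then apply the same line to each model's ID accuracy."""
    id_agree = pairwise_agreements(id_preds)
    ood_agree = pairwise_agreements(ood_preds)
    slope, bias = np.polyfit(id_agree, ood_agree, deg=1)
    return slope * id_accuracy + bias

# Toy usage with random predictions from 6 models on 1,000 points.
rng = np.random.default_rng(0)
id_preds = rng.integers(0, 10, size=(6, 1000))
ood_preds = rng.integers(0, 10, size=(6, 1000))
id_accuracy = np.array([0.91, 0.88, 0.85, 0.83, 0.80, 0.78])
print(estimate_ood_accuracy(id_preds, ood_preds, id_accuracy))
```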
Training on web-scale data can take months. Much of that computation and time is wasted on redundant and noisy points that are already learned or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which approximately selects the training points that most reduce the model's generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select "hard" (e.g., high-loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes "easy" points, but such points need not be trained on once they are learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learned. Compared to prior art, RHO-LOSS trains in far fewer steps, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On Clothing-1M, a large web-scraped image dataset, it trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.
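A minimal sketch of reducible-holdout-loss selection: score each candidate point by its loss under the current model minus its loss under an "irreducible loss" model trained on holdout data, and train only on the top-scoring points of each large candidate batch. The loss arrays below are placeholders, not the authors' implementation.

```python
import numpy as np

def reducible_loss_selection(batch_losses, irreducible_losses, n_select):
    """Select indices with the largest (training loss - irreducible loss).

    batch_losses:       per-example loss under the current model.
    irreducible_losses: per-example loss under a model trained on holdout data
                        (precomputed once; approximates noise / unlearnable loss).
    """
    reducible = batch_losses - irreducible_losses
    return np.argsort(-reducible)[:n_select]

# Toy usage: from a candidate batch of 320 points, keep 32 for the update.
rng = np.random.default_rng(0)
batch_losses = rng.gamma(2.0, 1.0, size=320)
irreducible_losses = rng.gamma(1.5, 1.0, size=320)
selected = reducible_loss_selection(batch_losses, irreducible_losses, n_select=32)
print(selected[:10])
```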
Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model, either by analyzing the behavior of the model during training or by measuring the performance gap of the model when the instance is removed from the dataset. Such approaches reveal characteristics and importance of individual instances, which may provide useful information in diagnosing and improving deep learning. However, most of the existing works on data valuation require actual training of a model, which often demands a high computational cost. In this paper, we provide a training-free data valuation score, called the complexity-gap score, which is a data-centric score to quantify the influence of individual instances in the generalization of two-layer overparameterized neural networks. The proposed score can quantify the irregularity of the instances and measure how much each data instance contributes to the total movement of the network parameters during training. We theoretically analyze and empirically demonstrate the effectiveness of the complexity-gap score in finding 'irregular or mislabeled' data instances, and also provide applications of the score in analyzing datasets and diagnosing training dynamics.
Multi-label ranking maps instances to a ranked set of predicted labels from multiple possible classes. The ranking approach for multi-label learning problems received attention for its success in multi-label classification, with one of the well-known approaches being pairwise label ranking. However, most existing methods assume that only partial information about the preference relation is known, which is inferred from the partition of labels into a positive and negative set, then treat labels with equal importance. In this paper, we focus on the unique challenge of ranking when the order of the true label set is provided. We propose a novel dedicated loss function to optimize models by incorporating penalties for incorrectly ranked pairs, and make use of the ranking information present in the input. Our method achieves the best reported performance measures on both synthetic and real world ranked datasets and shows improvements on overall ranking of labels. Our experimental results demonstrate that our approach is generalizable to a variety of multi-label classification and ranking tasks, while revealing a calibration towards a certain ranking ordering.
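A generic sketch of a pairwise penalty for incorrectly ranked label pairs, in the spirit of the loss described above (the paper's exact loss is not reproduced): whenever the ground-truth ranking says label i should precede label j, a margin penalty is added if the model scores them in the wrong order.

```python
import numpy as np

def pairwise_rank_loss(scores, ranked_labels, margin=1.0):
    """scores: model scores per label, shape (n_labels,).
    ranked_labels: label indices in ground-truth order, best first.
    Hinge penalty whenever a lower-ranked label outscores a higher-ranked one."""
    loss = 0.0
    for a in range(len(ranked_labels)):
        for b in range(a + 1, len(ranked_labels)):
            hi, lo = ranked_labels[a], ranked_labels[b]
            loss += max(0.0, margin - (scores[hi] - scores[lo]))
    return loss

# Toy usage: 5 labels, ground truth ranks label 3 first, then 0, then 4.
scores = np.array([0.2, -1.0, 0.1, 0.9, 0.4])
print(pairwise_rank_loss(scores, ranked_labels=[3, 0, 4]))
```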
Estimating the generalization error (GE) of deep neural networks (DNNs) is an important task that often relies on the availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles, reducing the reliance on trial and error along with other performance-assessment advantages. In search of a quantity relevant to GE, we study the mutual information (MI) between the input and final-layer representations, using the infinite-width DNN limit to bound the MI. An existing input-compression-based GE bound is used to link MI and GE; to the best of our knowledge, this represents the first empirical study of this bound. In attempting to empirically falsify the theoretical bound, we find that it is typically tight for best-performing models. Moreover, it detects randomization of training labels in many cases, reflects robustness to test-time perturbations, and works well with only a few training samples. These results are promising given that input compression is broadly applicable wherever MI can be estimated with confidence.
An oft-cited open problem of federated learning is the existence of data heterogeneity at the clients. One pathway to understanding the drastic accuracy drop in federated learning is by scrutinizing the behavior of the clients' deep models on data with different levels of "difficulty", which has been left unaddressed. In this paper, we investigate a different and rarely studied dimension of FL: ordered learning. Specifically, we aim to investigate how ordered learning principles can contribute to alleviating the heterogeneity effects in FL. We present theoretical analysis and conduct extensive empirical studies on the efficacy of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random curriculum. We find that curriculum learning largely alleviates non-IIDness. Interestingly, the more disparate the data distributions across clients the more they benefit from ordered learning. We provide analysis explaining this phenomenon, specifically indicating how curriculum training appears to make the objective landscape progressively less convex, suggesting fast converging iterations at the beginning of the training procedure. We derive quantitative results of convergence for both convex and nonconvex objectives by modeling the curriculum training on federated devices as local SGD with locally biased stochastic gradients. Also, inspired by ordered learning, we propose a novel client selection technique that benefits from the real-world disparity in the clients. Our proposed approach to client selection has a synergic effect when applied together with ordered learning in FL.
In machine learning, a question of great interest is understanding which examples are challenging for a model to classify. Identifying atypical examples ensures the safe deployment of models, isolates samples that require further human inspection, and provides interpretability into model behavior. In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing. We show that data points with high VoG scores are far more difficult for the model to learn and over-index on corrupted or memorized examples. Furthermore, restricting the evaluation to the test set instances with the lowest VoG improves the model's generalization performance. Finally, we demonstrate that VoG is a valuable and efficient ranking for out-of-distribution detection.
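A small sketch of a Variance-of-Gradients style score: given per-example input gradients saved at several training checkpoints, compute the variance of each gradient entry across checkpoints and average it per example. How the gradients are obtained follows the paper; the random arrays below are placeholders.

```python
import numpy as np

def vog_scores(checkpoint_grads):
    """checkpoint_grads: array (n_checkpoints, n_examples, *input_shape),
    e.g. per-example input gradients saved at several training checkpoints.

    Returns one score per example: the variance across checkpoints,
    averaged over the input dimensions."""
    var_across_ckpts = checkpoint_grads.var(axis=0)      # (n_examples, *input_shape)
    return var_across_ckpts.reshape(var_across_ckpts.shape[0], -1).mean(axis=1)

# Toy usage: 5 checkpoints, 100 examples, 32x32 single-channel inputs.
rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 100, 32, 32))
scores = vog_scores(grads)
hardest = np.argsort(-scores)[:10]       # high-VoG examples to surface for auditing
print(hardest)
```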
Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
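A minimal sketch of the easy-to-hard idea formalized above: given a difficulty score per example, a pacing function grows the pool of admissible examples over epochs, starting from the easiest fraction. The linear pacing schedule and names are illustrative assumptions, not a specific method from the paper.

```python
import numpy as np

def curriculum_pool(difficulty, epoch, total_epochs, start_frac=0.2):
    """Indices of examples allowed at this epoch: the easiest fraction of the
    data, with the fraction growing linearly from start_frac to 1.0."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1))
    n_allowed = max(1, int(frac * len(difficulty)))
    return np.argsort(difficulty)[:n_allowed]     # lowest difficulty first

# Toy usage: 1,000 examples with placeholder difficulty scores.
rng = np.random.default_rng(0)
difficulty = rng.random(1000)
for epoch in [0, 5, 9]:
    pool = curriculum_pool(difficulty, epoch, total_epochs=10)
    print(epoch, len(pool))
```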
Research on generalization bounds for deep networks seeks methods to predict test error using only the training dataset and the network parameters. While generalization bounds can give many insights about architecture design, training algorithms, and so on, what they currently cannot do is yield good predictions of the actual test error. The recently introduced Predicting Generalization in Deep Learning competition aims to encourage the discovery of methods that better predict test error. The current paper investigates a simple idea: can test error be predicted using "synthetic data" produced by a Generative Adversarial Network (GAN) trained on the same training dataset? Upon investigating several GAN models and architectures, we find that this turns out to be the case. In fact, using GANs pre-trained on standard datasets, the test error can be predicted without requiring any additional hyperparameter tuning. This result is surprising because GANs have well-known limitations (e.g., mode collapse) and are known to not learn the data distribution accurately. Yet the generated samples are good enough to substitute for test data. Several additional experiments are presented to explore why GANs do well at this task. In addition to a new approach for predicting generalization, the counter-intuitive phenomenon demonstrated in our work may also call for a better understanding of GANs' strengths and limitations.
Optimal decision making requires that classifiers produce uncertainty estimates consistent with their empirical accuracy. However, deep neural networks are often under- or over-confident in their predictions. Consequently, methods have been developed to improve the calibration of their predictive uncertainty both during training and post-hoc. In this work, we propose differentiable losses to improve calibration based on a soft (continuous) version of the binning operation underlying popular calibration-error estimators. When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets with less than 1% decrease in accuracy. For instance, we observe an 82% reduction in ECE (70% relative to the post-hoc rescaled ECE) in exchange for a 0.7% relative decrease in accuracy relative to the cross-entropy baseline on CIFAR-100. When incorporated post-training, the soft-binning-based calibration error objective improves temperature scaling, a popular recalibration method. Overall, experiments across losses and datasets demonstrate that using calibration-sensitive procedures yields better uncertainty estimates under dataset shift than the standard practice of using a cross-entropy loss followed by post-hoc recalibration.
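A rough numpy sketch of one plausible soft-binning calibration objective: confidences are assigned to bins with soft (softmax) memberships instead of hard ones, giving a smooth analogue of ECE. The specific soft assignment (softmax over negative squared distances to bin centers) is an illustrative choice, not necessarily the paper's exact formulation.

```python
import numpy as np

def soft_binned_ece(confidences, correct, n_bins=15, temperature=0.01):
    """Smooth analogue of ECE using soft bin memberships.

    confidences: predicted top-class probability per example.
    correct:     1/0 indicator of whether the prediction was right.
    """
    centers = (np.arange(n_bins) + 0.5) / n_bins
    # Soft assignment of each confidence to every bin.
    logits = -((confidences[:, None] - centers[None, :]) ** 2) / temperature
    logits -= logits.max(axis=1, keepdims=True)
    member = np.exp(logits)
    member /= member.sum(axis=1, keepdims=True)             # (n_examples, n_bins)

    bin_weight = member.sum(axis=0)                          # soft count per bin
    bin_conf = (member * confidences[:, None]).sum(axis=0) / (bin_weight + 1e-12)
    bin_acc = (member * correct[:, None]).sum(axis=0) / (bin_weight + 1e-12)
    return np.sum(bin_weight / len(confidences) * np.abs(bin_acc - bin_conf))

# Toy usage with roughly calibrated synthetic predictions.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
corr = (rng.random(1000) < conf).astype(float)
print(soft_binned_ece(conf, corr))
```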