While deep learning succeeds in a wide range of tasks, it depends heavily on massive collections of annotated data, which are expensive and time-consuming to obtain. To lower the cost of data annotation, active learning has been proposed to interactively query an oracle to annotate a small proportion of informative samples in an unlabeled dataset. Inspired by the fact that samples with higher loss are usually more informative to the model than samples with lower loss, in this paper we present a novel deep active learning approach that queries the oracle for data annotation when an unlabeled sample is believed to incur high loss. The core of our approach is a measurement, Temporal Output Discrepancy (TOD), that estimates the sample loss by evaluating the discrepancy of outputs given by models at different optimization steps. Our theoretical investigation shows that TOD lower-bounds the accumulated sample loss, so it can be used to select informative unlabeled samples. On the basis of TOD, we further develop an effective unlabeled data sampling strategy as well as an unsupervised learning criterion for active learning. Due to the simplicity of TOD, our methods are efficient, flexible, and task-agnostic. Extensive experimental results demonstrate that our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks. In addition, we show that TOD can be used to select the model with potentially the highest testing accuracy from a pool of candidate models.
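As a rough illustration of the idea described above, the following sketch scores each unlabeled sample by the discrepancy between the outputs a model gives at two different optimization steps, then picks the highest-scoring samples for annotation. The use of the Euclidean distance, the function names, and the toy output vectors are all assumptions made for this example; they are not taken from the abstract.

```python
import math


def temporal_output_discrepancy(out_earlier, out_later):
    """Euclidean distance between two output vectors for the same sample.

    Assumption: the discrepancy is measured with the L2 norm.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(out_earlier, out_later)))


def select_for_annotation(outputs_earlier, outputs_later, budget):
    """Return indices of the `budget` samples with the largest discrepancy."""
    scores = [
        temporal_output_discrepancy(a, b)
        for a, b in zip(outputs_earlier, outputs_later)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical outputs of one model at two optimization steps (3 samples).
earlier = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
later = [[0.9, 0.1], [0.1, 0.9], [0.3, 0.7]]

# Sample 0 did not change, so it is the least likely to be queried.
picked = select_for_annotation(earlier, later, budget=2)
```

In this toy setting the samples whose outputs moved the most between the two steps are the ones sent to the oracle.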
While mislabeled or ambiguously labeled samples in the training set can negatively affect the performance of deep models, diagnosing the dataset and identifying mislabeled samples helps to improve generalization. Training dynamics, i.e., the traces left by iterations of optimization algorithms, have recently been shown to be effective for localizing mislabeled samples with hand-crafted features. In this paper, beyond manually designed features, we introduce a novel learning-based solution that leverages a noise detector, instantiated as an LSTM network, which learns to predict whether a sample was mislabeled using the raw training dynamics as input. Specifically, the proposed method trains the noise detector in a supervised manner on a dataset with synthesized label noise and can adapt to various datasets (with either natural or synthesized label noise) without retraining. We conduct extensive experiments to evaluate the proposed method. We train the noise detector on the CIFAR dataset with synthesized label noise and test it on Tiny ImageNet, CUB-200, Caltech-256, WebVision and Clothing1M. Results show that the proposed method precisely detects mislabeled samples on various datasets without further adaptation and outperforms state-of-the-art methods. In addition, further experiments demonstrate that mislabel identification can guide label correction, namely data debugging, providing improvements from the data side that are orthogonal to algorithm-centric state-of-the-art techniques.
Unlike general visual classification, some classification tasks are more challenging because they require assigning professional categories to images. In this paper, we call them expert-level classification. Previous work on fine-grained vision classification (FGVC) has made many efforts on some of its specific sub-tasks. However, these methods are difficult to extend to the general case, which relies on a comprehensive analysis of part-global correlation and hierarchical feature interaction. In this paper, we propose Expert Network (ExpNet) to address the unique challenges of expert-level classification through a unified network. In ExpNet, we hierarchically decouple the part and context features and process them individually using a novel attentive mechanism called Gaze-Shift. In each stage, Gaze-Shift produces a focal-part feature for the subsequent abstraction and memorizes a context-related embedding. We then fuse the final focal embedding with all memorized context-related embeddings to make the prediction. Such an architecture realizes dual-track processing of partial and global information as well as hierarchical feature interactions. We conduct experiments on three representative expert-level classification tasks: FGVC, disease classification, and artwork attribute classification. In these experiments, ExpNet shows superior performance compared with state-of-the-art methods in a wide range of fields, indicating its effectiveness and generalization. The code will be made publicly available.
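Real-time machine learning detection algorithms are often found in autonomous vehicle technology and depend on quality datasets. These algorithms must work correctly in everyday conditions as well as under strong sunlight. Reports indicate that glare is one of the two most prominent causes of crashes. However, existing datasets, such as LISA and the German Traffic Sign Recognition Benchmark, do not reflect the presence of sun glare at all. This paper presents the GLARE traffic sign dataset: a collection of images of US-based traffic signs under heavy visual interference from sunlight. GLARE contains 2,157 images of traffic signs with sun glare, pulled from 33 videos of roads in the United States. It provides an essential enrichment to the widely used LISA Traffic Sign dataset. Our experimental study shows that several state-of-the-art baseline methods that are trained and tested on traffic sign datasets without sun glare degrade sharply when tested on GLARE (e.g., mean average precision (mAP) ranging from 9% to 21%, which is significantly lower than the performance on the LISA dataset). We also observe that current architectures achieve better detection accuracy when trained on images of traffic signs in sun glare (e.g., an average mAP gain of 42% for mainstream algorithms).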
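Time series models often deal with extreme events and anomalies, both of which are prevalent in real-world datasets. Such models often need to provide careful probabilistic forecasts, which are vital for risk management of extreme events such as hurricanes and pandemics. However, it is challenging to automatically detect extreme events and anomalies and learn to use them in large-scale datasets, which often requires manual effort. We therefore propose an anomaly-aware forecasting framework that leverages the effects of previously seen anomalies to improve its prediction accuracy during and after extreme events. Specifically, the framework automatically extracts anomalies and incorporates them through an attention mechanism to increase its accuracy for future extreme events. Moreover, the framework employs a dynamic uncertainty optimization algorithm that reduces the uncertainty of forecasts in an online manner. Compared with current prediction models, the proposed framework demonstrates consistently superior accuracy with lower uncertainty on three datasets with different kinds of anomalies.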
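Compared with a single robot, multiple mobile manipulators show advantages in tasks that require mobility and dexterity, especially when manipulating or transporting bulky objects. When the object and the manipulators are tightly connected, closed chains are formed and the motion of the whole system is restricted to a lower-dimensional manifold. However, current research on multi-robot motion planning does not fully consider the formation of the whole system, the redundancy of the mobile manipulators, and obstacles in the environment, which makes the task challenging. This paper therefore proposes a hierarchical framework to solve the above challenges efficiently, in which a centralized layer plans the motion offline and a decentralized layer independently explores the redundancy of each robot in real time. In addition, the closed-chain constraint, obstacle avoidance, and the lower bound of the formation constraints are guaranteed in the centralized layer, which other planners cannot achieve simultaneously. Moreover, a capability map, which represents the distribution of the formation constraint, is used to speed up both layers. Both simulation and experimental results show that the proposed framework significantly outperforms the benchmark planners. The system can bypass or cross obstacles in cluttered environments, and the framework can be applied to different numbers of heterogeneous mobile manipulators.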
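While deep learning has been widely used for video analytics, such as video classification and action detection, dense action detection with fast-moving subjects in sports videos remains challenging. In this work, we release yet another sports video dataset, $\textbf{P}^2\textbf{A}$, for $\underline{P}$ing $\underline{P}$ong-$\underline{A}$ction detection, which consists of 2,721 video clips collected from broadcast videos of professional table tennis matches in the World Table Tennis Championships and the Olympic Games. We work with a crew of table tennis professionals and referees to annotate every ping-pong action that appears in the dataset, and we formulate two sets of action detection problems: action localization and action recognition. We evaluate a number of models on both problems using $\textbf{P}^2\textbf{A}$ under various settings. These models achieve only 48% area under the AR-AN curve for localization and 82% for recognition, because ping-pong actions are dense with fast-moving subjects while the broadcast videos run at only 25 FPS. The results confirm that $\textbf{P}^2\textbf{A}$ is still a challenging task and can be used as a benchmark for action detection in videos.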
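While fine-tuning pre-trained networks has become a popular way to train image segmentation models, the backbone networks used for image segmentation are frequently pre-trained on image classification source datasets, e.g., ImageNet. Although image classification datasets can provide the backbone networks with rich visual features and discriminative ability, they cannot fully pre-train the target model (i.e., backbone + segmentation modules) in an end-to-end manner. Because classification datasets lack segmentation labels, the segmentation modules are randomly initialized in the fine-tuning process. In our work, we propose a method that leverages Pseudo Semantic Segmentation Labels (PSSL) to enable end-to-end pre-training of image segmentation models based on classification datasets. PSSL is inspired by the observation that the explanation results of classification models, obtained through explanation algorithms such as CAM, SmoothGrad and LIME, are close to the pixel clusters of visual objects. Specifically, PSSL is obtained for each image by interpreting the classification results and aggregating an ensemble of explanations queried from multiple classifiers, which lowers the bias caused by any single model. With PSSL for every image of ImageNet, the proposed method uses a weighted segmentation learning procedure to pre-train the segmentation network. Experimental results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models, i.e., PSPNet-ResNet50, DeepLabV3-ResNet50, and OCRNet-HRNetW18, on a number of segmentation tasks, such as CamVid, VOC-A, VOC-C, ADE20K, and CityScapes, with significant improvements. The source code is available at https://github.com/paddlepaddle/paddleseg.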
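The aggregation step described above can be pictured with a small sketch: explanation maps for the same image, queried from several classifiers, are combined per pixel and then turned into a pseudo segmentation label. The per-pixel averaging, the 0.5 threshold, the function names, and the 2x2 toy maps are assumptions chosen for illustration only; the abstract does not specify how the ensemble is aggregated.

```python
def aggregate_explanations(saliency_maps):
    """Average per-pixel explanation scores over several classifiers.

    Assumption: each map is a list of rows with scores in [0, 1],
    and all maps share the same shape.
    """
    n = len(saliency_maps)
    rows = len(saliency_maps[0])
    cols = len(saliency_maps[0][0])
    return [
        [sum(m[r][c] for m in saliency_maps) / n for c in range(cols)]
        for r in range(rows)
    ]


def to_pseudo_label(mean_map, threshold=0.5):
    """Mark a pixel as foreground (1) when the averaged score reaches the threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in mean_map]


# Hypothetical explanation maps from three classifiers for one 2x2 image.
maps = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 0.0]],
]

mean_map = aggregate_explanations(maps)
pseudo_label = to_pseudo_label(mean_map)
```

Averaging means that a pixel highlighted by only one of the three classifiers does not make it into the pseudo label, which is one simple way to reduce the influence of a single model.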
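Pool-based active learning (AL) has achieved great success by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from an oracle or annotators. However, existing AL sampling strategies might not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains some samples that do not belong to the classes of the target task. Achieving good AL performance under OOD data scenarios is challenging because of the natural conflict between AL sampling strategies and OOD sample detection. AL selects data that are hard for the current basic classifier to classify (e.g., samples whose predicted class probabilities have high entropy), while OOD samples tend to have more uniform predicted class probabilities (i.e., high entropy) than in-distribution (ID) data. In this paper, we propose a sampling scheme, Monte-Carlo Pareto Optimization for Active Learning (POAL), which selects optimal subsets of unlabeled samples with a fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem and therefore use Pareto optimization based on two conflicting objectives: (1) the normal AL data sampling scheme (e.g., maximum entropy), and (2) the confidence that a sample is not an OOD sample. Experimental results show its effectiveness on both classical machine learning (ML) and deep learning (DL) tasks.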
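To make the two conflicting objectives concrete, the sketch below computes the entropy of each sample's predicted class probabilities, pairs it with an in-distribution score, and keeps the samples that are not dominated in both objectives. This is a deliberately simplified per-sample Pareto front, not the Monte-Carlo subset search that POAL performs; the in-distribution scores, the function names, and the toy values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Indices of points that no other point dominates (both objectives maximized)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical predicted class probabilities for four unlabeled samples.
probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
# Hypothetical scores for "this sample is in-distribution" (higher is better).
id_scores = [0.2, 0.9, 0.7, 0.1]

objectives = [(entropy(p), s) for p, s in zip(probs, id_scores)]
front = pareto_front(objectives)
```

Sample 3 is as uncertain as sample 0 but looks more like an OOD sample, so it is dominated and dropped, while the remaining samples each represent a different trade-off between informativeness and in-distribution confidence.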
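While deep learning (DL) is data-hungry and usually relies on extensive labeled data to deliver good performance, active learning (AL) reduces labeling costs by selecting a small proportion of samples from unlabeled data for labeling and training. Deep active learning (DAL) has therefore emerged in recent years as a feasible solution for maximizing model performance under a limited labeling cost or budget. Although a large number of DAL methods have been developed and various literature reviews conducted, a performance evaluation of DAL methods under fair comparison settings is not yet available. Our work intends to fill this gap. In this work, we construct a DAL toolkit, DeepAL+, by re-implementing 19 highly cited DAL methods. We survey and categorize DAL-related works and construct comparative experiments across frequently used datasets and DAL algorithms. In addition, we explore some factors that influence the efficacy of DAL (e.g., batch size, number of epochs in the training process), which provides a better reference for researchers designing DAL experiments or carrying out DAL-related applications.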