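Although learning from synthetic training data has recently attracted increased attention, real-world robotic applications still suffer from performance deficiencies due to the so-called Sim-to-Real gap. In practice, this gap is hard to close with synthetic data alone. We therefore focus on the efficient acquisition of real data within a Sim-to-Real learning pipeline. Concretely, we employ deep Bayesian active learning to minimize manual annotation effort and devise an autonomous learning paradigm that selects the data considered useful for a human expert to annotate. To this end, a Bayesian Neural Network (BNN) object detector that provides reliable uncertainty estimates is used to infer the informativeness of the unlabeled data. Furthermore, to cope with misalignments of the label distribution in uncertainty-based sampling, we develop an effective randomized sampling strategy that performs favorably compared to other, more complex alternatives. In our experiments on object classification and detection, we show the benefits of our approach and provide evidence that labeling effort can be reduced significantly. Finally, we demonstrate the practical effectiveness of this idea in a grasping task on an assistive robot.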
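As a rough illustration of uncertainty-based selection with a randomized step, the sketch below scores each unlabeled sample by the entropy of its mean class distribution over several stochastic forward passes, then draws the annotation batch at random from a pool of the most uncertain samples. The abstract does not specify the randomized strategy, so `randomized_top_pool`, `pool_factor`, and the use of predictive entropy are assumptions made here for illustration only.

```python
import math
import random


def predictive_entropy(mc_probs):
    """Entropy of the mean class distribution over stochastic forward passes.

    mc_probs: list of per-pass class-probability lists for ONE sample.
    """
    n_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / len(mc_probs) for c in range(n_classes)]
    return -sum(q * math.log(q) for q in mean if q > 0.0)


def randomized_top_pool(scores, budget, pool_factor=3, seed=0):
    """Hypothetical randomized strategy: sample `budget` indices uniformly
    from the `budget * pool_factor` highest-scoring samples."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pool = ranked[: min(len(ranked), budget * pool_factor)]
    rng = random.Random(seed)
    return sorted(rng.sample(pool, min(budget, len(pool))))


if __name__ == "__main__":
    # Two samples, two stochastic passes each (toy numbers).
    confident = [[0.95, 0.05], [0.9, 0.1]]
    uncertain = [[0.6, 0.4], [0.4, 0.6]]
    scores = [predictive_entropy(confident), predictive_entropy(uncertain)]
    print(randomized_top_pool(scores, budget=1, pool_factor=1))
```

Drawing from a wider pool instead of taking the strict top scores is one simple way to keep the selected batch from collapsing onto a few classes; the pool size is a tuning choice.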
As an important data selection scheme, active learning is an essential component when iterating on an Artificial Intelligence (AI) model. It becomes even more critical given the dominance in applications of deep neural network based models, which have a large number of parameters and are data hungry. Despite its indispensable role in developing AI models, research on active learning is not as intensive as in other research directions. In this paper, we present a review of active learning through deep active learning approaches from the following perspectives: 1) technical advancements in active learning, 2) applications of active learning in computer vision, 3) industrial systems that leverage, or have the potential to leverage, active learning for data iteration, 4) current limitations and future research directions. We expect this paper to clarify the significance of active learning in a modern AI model manufacturing process and to bring additional research attention to active learning. By addressing data automation challenges and working alongside automated machine learning systems, active learning will facilitate the democratization of AI technologies by boosting model production at scale.
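Object detectors trained with weak annotations are affordable alternatives to fully supervised counterparts. However, a significant performance gap between them remains. We propose to narrow this gap by fine-tuning a pre-trained weakly supervised detector with a few fully annotated samples automatically selected from the training set using "box-in-box" (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly supervised detectors. Experiments on the VOC07 and COCO benchmarks show that BiB outperforms other active learning techniques and significantly improves the performance of the base weakly supervised detector with only a few fully annotated images per class. BiB reaches 97% of the performance of fully supervised Fast RCNN with only 10% of fully annotated images on VOC07. On COCO, using on average 10 fully annotated images per class, or equivalently 1% of the training set, BiB also reduces the performance gap (in AP) between the weakly supervised detector and the fully supervised Fast RCNN by over 70%, showing a good trade-off between performance and data efficiency. Our code is publicly available at https://github.com/huyvvo/bib.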
Active learning as a paradigm in deep learning is especially important in applications involving intricate perception tasks such as object detection, where labels are difficult and expensive to acquire. Developing active learning methods in such fields is computationally expensive and time consuming, which obstructs the progress of research and leads to a lack of comparability between methods. In this work, we propose and investigate a sandbox setup for rapid development and transparent evaluation of active learning in deep object detection. Our experiments with commonly used configurations of datasets and detection architectures found in the literature show that results obtained in our sandbox environment are representative of results on standard configurations. The total compute time to obtain results and assess the learning behavior can thereby be reduced by factors of up to 14 when compared with Pascal VOC and up to 32 when compared with BDD100k. This allows for testing and evaluating data acquisition and labeling strategies in under half a day and contributes to the transparency and development speed in the field of active learning for object detection.
Recent aerial object detection models rely on a large amount of labeled training data, which incurs unaffordable manual labeling costs in large aerial scenes with dense objects. Active learning effectively reduces the data labeling cost by selectively querying informative and representative unlabeled samples. However, existing active learning methods mainly assume a class-balanced setting and image-based querying for generic object detection tasks, which makes them less applicable to aerial object detection because of the long-tailed class distribution and dense small objects in aerial scenes. In this paper, we propose a novel active learning method for cost-effective aerial object detection. Specifically, both object-level and image-level informativeness are considered in the object selection to avoid redundant and myopic querying. In addition, an easy-to-use class-balancing criterion is incorporated to favor minority objects and alleviate the long-tailed class distribution problem in model training. To fully utilize the queried information, we further devise a training loss to mine the latent knowledge in the undiscovered image regions. Extensive experiments are conducted on the DOTA-v1.0 and DOTA-v2.0 benchmarks to validate the effectiveness of the proposed method. The results show that it can save more than 75% of the labeling cost needed to reach the same performance as the baselines and state-of-the-art active object detection methods. Code is available at https://github.com/ZJW700/MUS-CDB
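Robots working in unstructured environments must be able to sense and interpret their surroundings. One of the main obstacles for deep learning based models in robotics is the lack of domain-specific labeled data for different industrial applications. In this paper, we propose a sim2real transfer learning method based on domain randomization for object detection, with which labeled synthetic datasets of arbitrary size and object types can be generated automatically. Subsequently, a state-of-the-art convolutional neural network, YOLOv4, is trained to detect the different types of industrial objects. With the proposed domain randomization method, we could shrink the reality gap, achieving mAP50 scores of 86.32% and 97.38% in the zero-shot and one-shot transfer cases respectively, on a dataset containing 190 real images. On a GeForce RTX 2080 Ti GPU, the data generation process takes less than 0.5 s per image and training lasts around 12 h, which makes it convenient for industrial use. Our solution matches industrial needs, as it can reliably differentiate similar classes of objects using only 1 real image for training. To the best of our knowledge, this is the only work so far that satisfies these constraints.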
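In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we build a photo-realistic simulator to synthesize abundant virtual data and use it to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results and use them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of the pseudo labels, we iterate the above steps by taking the trained student model as the new teacher and re-labeling the real data with the refined teacher model. We evaluate our method on a public benchmark and on our newly released dataset, achieving improvements of 11.49% and 22.62%, respectively. Our method also improves the robotic bin-picking success rate by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.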
The performance of deep neural networks improves with more annotated data. The problem is that the budget for annotation is limited. One solution is active learning, where a model asks a human to annotate the data it perceives as uncertain. A variety of recent methods apply active learning to deep networks, but most of them are either designed specifically for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple yet task-agnostic and works efficiently with deep networks. We attach a small parametric module, named the "loss prediction module," to a target network and train it to predict the target losses of unlabeled inputs. This module can then suggest data for which the target model is likely to produce a wrong prediction. The method is task-agnostic because networks are learned from a single loss regardless of the target task. We rigorously validate our method on image classification, object detection, and human pose estimation with recent network architectures. The results demonstrate that our method consistently outperforms previous methods across these tasks.
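The selection step described above can be sketched in a few lines. The abstract does not say how the loss prediction module is trained; the pairwise margin ranking objective below is one plausible choice and is an assumption of this sketch, as are the function names `pairwise_ranking_loss` and `select_by_predicted_loss`.

```python
def pairwise_ranking_loss(pred_losses, true_losses, margin=1.0):
    """Assumed training signal: for consecutive pairs (0,1), (2,3), ...,
    penalize the module when the ORDER of its predicted losses disagrees
    with the order of the true losses by less than `margin`."""
    if len(pred_losses) < 2 or len(pred_losses) % 2 != 0:
        raise ValueError("need an even number (>= 2) of samples to form pairs")
    pairs = zip(pred_losses[0::2], pred_losses[1::2],
                true_losses[0::2], true_losses[1::2])
    total, count = 0.0, 0
    for pred_i, pred_j, true_i, true_j in pairs:
        sign = 1.0 if true_i > true_j else -1.0
        total += max(0.0, -sign * (pred_i - pred_j) + margin)
        count += 1
    return total / count


def select_by_predicted_loss(pred_losses, k):
    """Query the k unlabeled samples with the highest predicted loss."""
    ranked = sorted(range(len(pred_losses)),
                    key=lambda i: pred_losses[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(pairwise_ranking_loss([2.0, 0.0], [1.0, 0.5]))   # correct order, wide gap
    print(select_by_predicted_loss([0.1, 0.9, 0.5], k=2))
```

A ranking objective only needs the predicted losses to be ordered correctly, which is all the selection step uses; regressing the exact loss value would be an alternative design.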
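Active learning (AL) attempts to maximize a model's performance gain while labeling the fewest samples. Deep learning (DL) is greedy for data and requires a large supply of it to optimize a massive number of parameters so that the model learns how to extract high-quality features. In recent years, due to the rapid development of internet technology, we have entered an era of information abundance with massive amounts of data. As a result, DL has attracted strong interest from researchers and has developed rapidly. Compared with DL, researchers have shown relatively little interest in AL. This is mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples, so early AL rarely had the chance to show the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to the public availability of a large number of existing annotated datasets. However, acquiring a large number of high-quality annotated datasets consumes a great deal of manpower, which is not feasible in fields that require a high level of expertise, such as speech recognition, information extraction, and medical imaging. AL has therefore gradually received the attention it is due. A natural idea is to ask whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL. This is how deep active learning (DAL) emerged. Although related research is abundant, a comprehensive survey of DAL is still lacking. This article fills that gap: we provide a formal classification method for existing work, along with a comprehensive and systematic overview. In addition, we analyze and summarize the development of DAL from the perspective of applications. Finally, we discuss the confusion and problems in DAL and provide some possible development directions for DAL.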
The generalisation performance of a convolutional neural network (CNN) is largely determined by the quantity, quality, and diversity of the training images. All the training data needs to be annotated beforehand, yet in many real-world applications data is easy to acquire but expensive and time-consuming to label. The goal of active learning is to draw the most informative samples from the unlabeled pool, which can be used for training after annotation. With an entirely different objective, self-supervised learning (SSL) has been gaining rapid popularity by closing the performance gap with supervised methods on large computer vision benchmarks. Recent SSL methods have been shown to produce low-level representations that are invariant to artificially created distortions of the input sample, e.g. rotation, solarization, cropping, etc. SSL approaches also rely on simpler and more scalable frameworks for learning. In this paper, we unify these two families of approaches from the angle of active learning using the self-supervised learning manifold and propose Deep Active Learning using BarlowTwins (DALBT), an active learning method for all datasets that combines a classifier with the self-supervised loss framework of Barlow Twins, in a setting where the model can encode invariance to artificially created distortions, e.g. rotation, solarization, cropping, etc.
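Active learning has proven useful for minimizing labeling costs by selecting the most informative samples. However, existing active learning methods do not work well in realistic scenarios such as imbalanced or rare classes, out-of-distribution data in the unlabeled set, and redundancy. In this work, we propose SIMILAR (Submodular Information Measures based actIve LeARning), a unified active learning framework that uses recently proposed submodular information measures (SIM) as acquisition functions. We argue that SIMILAR not only works in standard active learning but also extends easily to the realistic settings considered above and acts as a one-stop solution for active learning that scales to large real-world datasets. Empirically, we show that SIMILAR significantly outperforms existing active learning algorithms in the rare-class setting, and by ~5% - 10% in the out-of-distribution setting, on several image classification tasks such as CIFAR-10, MNIST, and ImageNet. SIMILAR is available as part of the DISTIL toolkit: "https://github.com/decile-team/distil".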
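This contribution focuses on camera simulation as it comes into play in simulation-based virtual prototyping. We propose a camera model validation methodology based on the performance of a perception algorithm and the context in which that performance is measured. This approach differs from traditional validation of synthetic images, which is often done at the pixel or feature level and tends to require matching pairs of synthetic and real images. Because acquiring paired images is costly and constrained, the proposed approach is based on datasets that are not necessarily paired. Within a real and a simulated dataset, A and B respectively, we find subsets Ac and Bc of similar content and judge, statistically, the perception algorithm's response to these similar subsets. This validation approach yields a statistical measure of performance similarity as well as a measure of similarity between the content of A and B. The methodology is demonstrated using images generated with Chrono::Sensor and a scaled autonomous vehicle, with an object detector as the perception algorithm. The results demonstrate the ability to quantify (i) differences between simulated and real data; (ii) the propensity of training methods to mitigate the sim-to-real gap; and (iii) the contextual overlap between two datasets.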
Object detection requires substantial labeling effort to learn robust models. Active learning can reduce this effort by intelligently selecting relevant examples to be annotated. However, selecting these examples properly, without introducing a sampling bias that harms generalization performance, is not straightforward, and most active learning techniques cannot deliver on their promises on real-world benchmarks. In our evaluation paper, we focus on active learning techniques that add no computational overhead besides inference, something we refer to as zero-cost active learning. In particular, we show that a key ingredient is not only the score at the bounding box level but also the technique used to aggregate the scores for ranking images. We outline our experimental setup and also discuss practical considerations when using active learning for object detection.
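To see why the aggregation technique matters, the toy sketch below ranks the same two images under three common aggregators. The aggregator names (`sum`, `mean`, `max`), the function names, and the scores are illustrative assumptions and are not taken from the paper.

```python
def aggregate(box_scores, how="sum"):
    """Collapse per-box uncertainty scores into a single image-level score."""
    if not box_scores:
        return 0.0  # images without detections get the lowest score
    if how == "sum":
        return sum(box_scores)
    if how == "mean":
        return sum(box_scores) / len(box_scores)
    if how == "max":
        return max(box_scores)
    raise ValueError(f"unknown aggregation: {how}")


def rank_images(scores_per_image, how):
    """Return image names ordered from most to least informative."""
    return sorted(scores_per_image,
                  key=lambda name: aggregate(scores_per_image[name], how),
                  reverse=True)


if __name__ == "__main__":
    images = {
        "one_hard_box": [0.9],
        "three_medium_boxes": [0.4, 0.4, 0.4],
    }
    for how in ("sum", "mean", "max"):
        print(how, rank_images(images, how))
```

In this toy case, summing favors images with many boxes, while `mean` and `max` favor the image with a single highly uncertain box, so the choice of aggregator changes which images are sent for annotation.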
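Pool-based active learning (AL) has achieved great success in minimizing labeling cost by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from an oracle or annotators. However, existing AL sampling strategies may not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains some samples that do not belong to the classes of the target task. Achieving good AL performance under OOD data scenarios is challenging because of the natural conflict between AL sampling strategies and OOD sample detection. AL selects data that are hard for the current base classifier to classify (e.g., samples whose predicted class probabilities have high entropy), while OOD samples tend to have more uniform predicted class probabilities (i.e., high entropy) than in-distribution (ID) data. In this paper, we propose a sampling scheme, Monte-Carlo Pareto Optimization for Active Learning (POAL), which selects optimal subsets of unlabeled samples with a fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem and use Pareto optimization based on two conflicting objectives: (1) the normal AL data sampling scheme (e.g., maximum entropy) and (2) the confidence of not being an OOD sample. Experimental results show its effectiveness on both classical machine learning (ML) and deep learning (DL) tasks.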
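The idea of trading off two conflicting objectives can be illustrated with a plain non-dominated filter. This is a deliberate simplification: it works on individual samples, whereas the abstract describes a Monte-Carlo search over fixed-size subsets. The pairing of each sample with an (entropy, in-distribution confidence) tuple and the function name `pareto_front` are assumptions of this sketch.

```python
def pareto_front(points):
    """Return indices of points not dominated by any other point.

    Each point is (objective_1, objective_2), both to be maximized, e.g.
    (predictive entropy, confidence that the sample is in-distribution).
    A point is dominated if another point is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for i, (a1, a2) in enumerate(points):
        dominated = any(
            (b1 >= a1 and b2 >= a2) and (b1 > a1 or b2 > a2)
            for j, (b1, b2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


if __name__ == "__main__":
    candidates = [
        (0.9, 0.1),  # very uncertain, but probably OOD
        (0.5, 0.5),  # balanced
        (0.1, 0.9),  # clearly in-distribution, but uninformative
        (0.4, 0.4),  # worse than the balanced point on both objectives
    ]
    print(pareto_front(candidates))
```

The filter keeps every candidate that represents a distinct trade-off and discards only those that another candidate beats outright, which avoids having to fix a weighting between the two objectives in advance.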
While deep learning succeeds in a wide range of tasks, it depends heavily on massive collections of annotated data, which are expensive and time-consuming to obtain. To lower the cost of data annotation, active learning has been proposed to interactively query an oracle to annotate a small proportion of informative samples in an unlabeled dataset. Inspired by the fact that samples with higher loss are usually more informative to the model than samples with lower loss, in this paper we present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incur a high loss. The core of our approach is a measurement, Temporal Output Discrepancy (TOD), that estimates the sample loss by evaluating the discrepancy between the outputs given by models at different optimization steps. Our theoretical investigation shows that TOD lower-bounds the accumulated sample loss, so it can be used to select informative unlabeled samples. On the basis of TOD, we further develop an effective unlabeled data sampling strategy as well as an unsupervised learning criterion for active learning. Due to the simplicity of TOD, our methods are efficient, flexible, and task-agnostic. Extensive experimental results demonstrate that our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks. In addition, we show that TOD can be used to select, from a pool of candidate models, the model with potentially the highest testing accuracy.
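A minimal sketch of the discrepancy measure follows, assuming the discrepancy is taken as the Euclidean distance between the outputs of two model snapshots on the same input. The abstract does not name the distance, so that choice, along with the function names and the toy outputs, is an assumption here.

```python
import math


def temporal_output_discrepancy(out_early, out_late):
    """Euclidean distance between the outputs of two model snapshots
    (taken at different optimization steps) for the same input."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(out_early, out_late)))


def select_top_k(outputs_early, outputs_late, k):
    """Pick the k unlabeled samples whose outputs changed the most."""
    tod = [temporal_output_discrepancy(e, l)
           for e, l in zip(outputs_early, outputs_late)]
    return sorted(range(len(tod)), key=lambda i: tod[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy outputs of an earlier and a later snapshot for three unlabeled samples.
    early = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
    late = [[0.1, 0.9], [0.9, 0.1], [0.3, 0.7]]
    print(select_top_k(early, late, k=2))
```

No labels are needed to compute the score, which is what makes it usable on the unlabeled pool; it only requires keeping one earlier snapshot of the model around.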
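Visual systems for robots have different requirements depending on the application: they may require high accuracy or reliability, be constrained by limited resources, or need to adapt quickly to dynamically changing environments. In this work, we focus on the instance segmentation task and provide a comprehensive study of different techniques that allow an object segmentation model to be adapted in the presence of novel objects or different domains. We propose a pipeline for fast instance segmentation learning designed for robotic applications in which data arrive as a stream. It is based on a hybrid method that leverages a pre-trained CNN for feature extraction and fast-to-train kernel-based classifiers. We also propose a training protocol that shortens the training time by performing feature extraction during data acquisition. We benchmark the proposed pipeline on two robotics datasets and deploy it on a real robot, the iCub humanoid. To this end, we adapt our method to an incremental setting in which the robot learns novel objects online. The code to reproduce the experiments is publicly available on GitHub.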
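We present a novel method, SALAD, for the challenging vision task of adapting a pre-trained "source" domain network to a "target" domain, with a small annotation budget in the "target" domain and a shift in the label space. Further, the task assumes that the source data is not available for adaptation, due to privacy concerns or otherwise. We postulate that such systems need to jointly optimize the dual task of (i) selecting a fixed number of samples from the target domain for annotation and (ii) transferring knowledge from the pre-trained network to the target domain. To this end, SALAD consists of a novel Guided Attention Transfer Network (GATN) and an active learning function, HAL. The GATN enables feature distillation from the pre-trained network to the target network and is complemented by the transferability and uncertainty criteria used by HAL. SALAD has three key benefits: (i) it is task-agnostic and can be applied across various visual tasks such as classification, segmentation, and detection; (ii) it can handle shifts in the output label space from the pre-trained source network to the target domain; (iii) it does not require access to source data for adaptation. We conduct extensive experiments across 3 visual tasks, namely digit classification (MNIST, SVHN, VISDA), synthetic (GTA5) to real (CityScapes) image segmentation, and document layout detection (PubLayNet to DSSE). We show that our source-free approach, SALAD, yields an improvement of 0.5%-31.3% (across datasets and tasks) over prior adaptation methods that assume access to large amounts of annotated source data for adaptation.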
Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.
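Temporal Action Localization (TAL) aims to predict both the action category and the temporal boundaries (i.e., start and end time) of action instances in untrimmed videos. Most existing works adopt fully supervised solutions, which have proven effective. One practical bottleneck of these solutions is the large amount of labeled training data they require. To reduce the expensive human labeling cost, this paper focuses on a rarely investigated yet practical task, semi-supervised TAL, and proposes an effective active learning method named AL-STAL. We use four steps, named Train, Query, Annotate, and Append, to actively select highly informative video samples and train the localization model. AL-STAL is equipped with two scoring functions that consider the uncertainty of the localization model, which facilitates the ranking and selection of video samples. One takes the entropy of the predicted label distribution as the measure of uncertainty and is called Temporal Proposal Entropy (TPE). The other introduces a new metric based on the mutual information between adjacent action proposals to evaluate the informativeness of video samples and is called Temporal Context Inconsistency (TCI). To validate the effectiveness of the proposed method, we conduct extensive experiments on two benchmark datasets, THUMOS'14 and ActivityNet 1.3. Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully supervised learning.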
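Deep learning methods require massive amounts of annotated data to optimize their parameters. For example, datasets with accurate bounding box annotations are essential for modern object detection tasks. However, labeling with such pixel-level accuracy is laborious and time-consuming, and elaborate labeling procedures involving annotation review and acceptance testing are indispensable for reducing human-made noise. In this paper, we focus on the impact of noisy location annotations on the performance of object detection approaches and aim to reduce the adverse effect of the noise. First, noticeable performance degradation is observed experimentally for both one-stage and two-stage detectors when noise is introduced into the bounding box annotations. For instance, our synthesized noise decreases performance on the COCO test split from 38.9% AP to 33.6% AP for the FCOS detector and from 37.8% AP to 33.7% AP for Faster R-CNN. Second, a self-correction technique based on a Bayesian filter for prediction ensembling is proposed to better exploit the noisy location annotations, following a teacher-student learning paradigm. Experiments in both synthesized and real-world scenarios consistently demonstrate the effectiveness of our approach; for example, our method raises the degraded performance of the FCOS detector from 33.6% AP to 35.6% AP on COCO.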