Adapting object detectors learned with sufficient supervision to novel classes under low data regimes is appealing yet challenging. In few-shot object detection (FSOD), the two-step training paradigm is widely adopted to mitigate the severe sample imbalance, i.e., holistic pre-training on base classes, then partial fine-tuning in a balanced setting with all classes. Since unlabeled instances are suppressed as background in the base training phase, the learned RPN is prone to produce biased proposals for novel instances, resulting in dramatic performance degradation. Unfortunately, the extreme data scarcity aggravates the proposal distribution bias, hindering the RoI head from evolving toward novel classes. In this paper, we introduce a simple yet effective proposal distribution calibration (PDC) approach that enhances the localization and classification abilities of the RoI head by recycling the localization ability acquired in base training and enriching high-quality positive samples for semantic fine-tuning. Specifically, we sample proposals based on the base proposal statistics to calibrate the distribution bias and impose additional localization and classification losses on the sampled proposals to quickly extend the base detector to novel classes. Experiments on the commonly used Pascal VOC and MS COCO datasets show state-of-the-art performance and justify the efficacy of our PDC for FSOD. Code is available at github.com/Bohao-Lee/PDC.
Few-shot object detection (FSOD), which aims at learning a generic detector that can adapt to unseen tasks with scarce training samples, has witnessed consistent improvement recently. However, most existing methods ignore efficiency issues, e.g., high computational complexity and slow adaptation speed. Notably, efficiency has become an increasingly important evaluation metric for few-shot techniques due to an emerging trend toward embedded AI. To this end, we present an efficient pretrain-transfer framework (PTF) baseline with no computational increment, which achieves results comparable to previous state-of-the-art (SOTA) methods. Upon this baseline, we devise an initializer named knowledge inheritance (KI) to reliably initialize the novel weights for the box classifier, which effectively facilitates the knowledge transfer process and boosts the adaptation speed. Within the KI initializer, we propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights. Finally, our approach not only achieves SOTA results across three public benchmarks, i.e., PASCAL VOC, COCO and LVIS, but also exhibits high efficiency, with 1.8-100x faster adaptation than the other methods on the COCO/LVIS benchmarks during few-shot transfer. To the best of our knowledge, this is the first work to consider the efficiency problem in FSOD. We hope to motivate a trend toward powerful yet efficient few-shot technique development. The code is publicly available at https://github.com/Ze-Yang/Efficient-FSOD.
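The length inconsistency mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a novel class weight is predicted as the mean of its support features and is then rescaled so that its L2 norm equals the average norm of the pretrained base weights. The abstract does not state either rule, and the function names `predict_novel_weight` and `rescale_to_base_length` are hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def predict_novel_weight(support_features):
    """Hypothetical initializer (an assumption): average the support features."""
    n = len(support_features)
    dim = len(support_features[0])
    return [sum(f[i] for f in support_features) / n for i in range(dim)]


def rescale_to_base_length(novel_weight, base_weights):
    """Rescale novel_weight so its L2 norm equals the mean base-weight norm."""
    target = sum(l2_norm(w) for w in base_weights) / len(base_weights)
    current = l2_norm(novel_weight)
    if current == 0.0:
        # A zero vector has no direction to preserve; return it unchanged.
        return list(novel_weight)
    return [x * target / current for x in novel_weight]


# Toy data: base weights with norms 5 and 10, so the target length is 7.5.
base_weights = [[3.0, 4.0], [6.0, 8.0]]
support = [[0.2, 0.0], [0.4, 0.0]]
novel = rescale_to_base_length(predict_novel_weight(support), base_weights)
```

Rescaling keeps the direction of the predicted weight and changes only its length, so the novel logits land in roughly the same range as the base logits when fine-tuning starts.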
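Most existing methods for few-shot object detection follow the fine-tuning paradigm, which implicitly assumes that class-agnostic, generalizable knowledge can be learned from base classes with abundant samples and transferred to novel classes with limited samples through such a two-stage training strategy. However, this is not necessarily true, since an object detector can hardly distinguish class-agnostic knowledge from class-specific knowledge automatically without explicit modeling. In this work, we propose to explicitly learn three types of class-agnostic commonalities between base and novel classes: recognition-related semantic commonalities, localization-related semantic commonalities, and distribution commonalities. We design a unified distillation framework based on a memory bank, which can distill all three types of commonalities jointly and efficiently. Extensive experiments show that our method can be readily integrated into most existing fine-tuning based methods and consistently improves performance by a large margin.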
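Zero-shot object detection (ZSD), the task of extending conventional detection models to detect objects from unseen categories, has emerged as a new challenge in computer vision. Most existing approaches tackle ZSD with a strict mapping-transfer strategy, which may lead to suboptimal results: 1) the learning process of these models ignores the available unseen-class information and can therefore easily be biased toward the seen categories; 2) the original visual feature space is not well structured and lacks discriminative information. To address these issues, we develop a novel semantics-guided contrastive network for ZSD, named ContrastZSD, a detection framework that is the first to bring the contrastive learning mechanism into zero-shot detection. In particular, ContrastZSD incorporates two semantics-guided contrastive learning subnets that contrast region-category pairs and region-region pairs, respectively. The pairwise contrastive tasks exploit additional supervision signals derived from both ground-truth labels and a pre-defined class similarity distribution. Under the guidance of this explicit semantic supervision, the model can learn more about unseen categories, avoiding the bias toward seen concepts, while optimizing the structure of the visual features to be more discriminative for better visual-semantic alignment. Extensive experiments are conducted on two popular ZSD benchmarks, i.e., PASCAL VOC and MS COCO. Results show that our method outperforms the previous state of the art on both ZSD and generalized ZSD tasks.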
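While fine-tuning based methods for few-shot object detection have achieved remarkable progress, a crucial challenge that has not been well addressed is the potential class-specific overfitting on base classes and sample-specific overfitting on novel classes. In this work, we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain overfitting in both the pre-training stage on base classes and the fine-tuning stage on novel classes. Specifically, we first present a novel position-aware bag-of-visual-words model for learning a representative bag of visual words (BoVW) from a limited-size image set, which is used to encode general images based on the similarities between the learned visual words and an image. We then perform knowledge distillation based on the fact that an image should have consistent BoVW representations in two different feature spaces. To this end, we pre-learn a feature space independently of object detection and encode images using BoVW in this space. The resulting BoVW representation of an image can be regarded as distilled knowledge that guides the learning of the object detector: the features extracted by the object detector for the same image are expected to yield a BoVW representation consistent with the distilled knowledge. Extensive experiments validate the effectiveness of our method and demonstrate its superiority over other state-of-the-art methods.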
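In real-world settings, object detectors can continuously encounter object instances from new classes. When existing object detectors are applied to such scenarios, their performance on old classes deteriorates significantly. A few efforts have been reported to address this limitation, all of which apply variants of knowledge distillation to avoid catastrophic forgetting. We note that although distillation helps retain previous learning, it obstructs fast adaptability to new tasks, which is a critical requirement for incremental learning. In this pursuit, we propose a meta-learning approach that learns to reshape model gradients so that information is optimally shared across incremental tasks. This ensures seamless information transfer via meta-learned gradient preconditioning, which minimizes forgetting and maximizes knowledge transfer. In comparison to existing meta-learning methods, our approach is task-agnostic and allows the incremental addition of new classes to high-capacity models for object detection. We evaluate our approach on a variety of incremental learning settings defined on the PASCAL-VOC and MS COCO datasets, where it performs favorably against state-of-the-art methods.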
Conventional training of a deep CNN based object detector demands a large number of bounding box annotations, which may be unavailable for rare categories. In this work we develop a few-shot object detector that can learn to detect novel objects from only a few annotated examples. Our proposed model leverages fully labeled base classes and quickly adapts to novel classes, using a meta feature learner and a reweighting module within a one-stage detection architecture. The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples. The reweighting module transforms a few support examples from the novel classes to a global vector that indicates the importance or relevance of meta features for detecting the corresponding objects. These two modules, together with a detection prediction module, are trained end-to-end based on an episodic few-shot learning scheme and a carefully designed loss function. Through extensive experiments we demonstrate that our model outperforms well-established baselines by a large margin for few-shot object detection, on multiple datasets and settings. We also present analysis on various aspects of our proposed model, aiming to provide some inspiration for future few-shot detection works.
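The interaction between the two modules can be sketched as follows. The sketch assumes that the reweighting vector has one coefficient per meta-feature channel and that it is applied by channel-wise multiplication; the abstract only says that the vector indicates the importance or relevance of the meta features, so the combination rule and the name `reweight_features` are illustrative assumptions.

```python
def reweight_features(meta_features, class_vector):
    """Scale each channel of meta_features by the matching entry of class_vector.

    meta_features: nested list shaped [channels][height][width].
    class_vector:  list with one coefficient per channel (assumed layout).
    """
    if len(meta_features) != len(class_vector):
        raise ValueError("one reweighting coefficient is needed per channel")
    return [
        [[value * coeff for value in row] for row in channel]
        for channel, coeff in zip(meta_features, class_vector)
    ]


# Toy meta features with 2 channels of size 2x2, and two novel classes.
meta = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
class_vectors = {"cat": [0.5, 2.0], "dog": [1.0, 0.0]}

# One class-specific feature map per novel class, each of which would be
# passed to the detection prediction module.
per_class = {name: reweight_features(meta, vec) for name, vec in class_vectors.items()}
```

Under this reading, the feature extractor runs once per image and only the cheap per-class scaling is repeated for each novel class.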
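The goal of this paper is few-shot object detection (FSOD), the task of extending an object detector to a new category given only a few training instances. We introduce a simple pseudo-labelling method that sources high-quality pseudo-annotations from the training set for each new category, vastly increasing the number of training instances and reducing class imbalance; our method finds previously unlabelled instances. Naïvely training with model predictions yields sub-optimal performance; we present two novel methods to improve the precision of the pseudo-labelling process: first, we introduce a verification technique to remove candidate detections with incorrect class labels; second, we train a specialised model to correct poor-quality bounding boxes. After these two novel steps, we obtain a large set of high-quality pseudo-annotations that allows our final detector to be trained end-to-end. Additionally, we demonstrate that our method maintains base-class performance, and we show the utility of simple augmentations in FSOD. When benchmarked on PASCAL VOC and MS-COCO, our method achieves state-of-the-art or second-best performance compared to existing approaches across all numbers of shots.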
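Few-shot object detection (FSOD) localizes and classifies objects in an image given only a few data samples. Recent trends in FSOD research show the adoption of metric and meta-learning techniques, which are prone to catastrophic forgetting and class confusion. To overcome these pitfalls of metric-learning based FSOD techniques, we introduce Attention Guided Cosine Margin (AGCM), which facilitates the creation of tighter and well-separated class feature clusters in the classification head of the object detector. Our novel Attentive Proposal Fusion (APF) module minimizes catastrophic forgetting by reducing the intra-class variance among co-occurring classes. At the same time, the proposed Cosine Margin Cross-Entropy loss increases the angular margin between confusing classes to overcome the challenge of class confusion between already learned (base) and newly added (novel) classes. We conduct experiments on the challenging India Driving Dataset (IDD), which presents a real-world class-imbalanced setting, alongside the popular FSOD benchmark PASCAL-VOC. Our method outperforms state-of-the-art (SOTA) approaches by up to 6.4 mAP points on IDD-OS and up to 2.0 mAP points on IDD-10 in the 10-shot setting. On the PASCAL-VOC dataset, we outperform existing SOTA approaches by up to 4.9 mAP points.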
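A cosine-margin cross-entropy loss of the general kind named above can be sketched as follows. The abstract does not give the exact formulation, so this sketch substitutes the common additive cosine margin (subtracting a margin from the target-class cosine before scaling, as in CosFace-style losses); the `scale` and `margin` values are arbitrary assumptions, and the attention guidance is not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_margin_cross_entropy(feature, class_weights, target, scale=20.0, margin=0.2):
    """Cross-entropy over scaled cosine logits, with a margin on the target class."""
    logits = []
    for idx, weight in enumerate(class_weights):
        cos = cosine(feature, weight)
        if idx == target:
            cos -= margin  # the target must beat the other classes by the margin
        logits.append(scale * cos)
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


feature = [1.0, 0.2]
weights = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per class
plain = cosine_margin_cross_entropy(feature, weights, target=0, margin=0.0)
with_margin = cosine_margin_cross_entropy(feature, weights, target=0, margin=0.2)
```

For the same feature, the loss with a margin is larger than the plain loss, so training keeps pushing the feature toward its class weight even after it is already classified correctly.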
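Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of images. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. Self-supervised methods, on the other hand, aim to learn representations from unlabeled data that transfer to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches to few-shot and self-supervised object detection. We then give our main takeaways and discuss future research directions. Project page: https://gabrielhuang.github.io/fsod-survey/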
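Few-shot object detection (FSOD) aims to detect objects using only a few examples. How to adapt state-of-the-art object detectors to the few-shot domain remains challenging. Object proposals are a key ingredient of modern object detectors. However, the quality of the proposals generated for few-shot classes by existing methods is far worse than that for many-shot classes, e.g., boxes for few-shot classes are missed due to misclassification or inaccurate spatial locations. To address the noisy proposal problem, we propose a novel meta-learning based FSOD model that jointly optimizes few-shot proposal generation and fine-grained few-shot proposal classification. To improve proposal generation for few-shot classes, we propose to learn a lightweight metric-learning based prototype matching network instead of the conventional simple linear object/non-object classifier, e.g., the one used in RPN. Our non-linear classifier with a feature fusion network can improve discriminative prototype matching and the proposal recall for few-shot classes. To improve fine-grained few-shot proposal classification, we propose a novel attentive feature alignment method to address the spatial misalignment between the noisy proposals and the few-shot classes, thus improving the performance of few-shot object detection. Meanwhile, we learn a separate R-CNN detection head for many-shot base classes and show strong performance in maintaining base-class knowledge. Our model achieves state-of-the-art performance on multiple FSOD benchmarks over most shots and metrics.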
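Weakly supervised object detection (WSOD) aims to train an object detector that requires only image-level annotations. Recently, some works have managed to select accurate boxes generated by a well-trained WSOD network to supervise a semi-supervised detection framework for better performance. However, these approaches simply divide the training set into labeled and unlabeled sets according to image-level criteria, so that a considerable number of mislabeled or wrongly localized box predictions are chosen as pseudo ground truths, resulting in sub-optimal detection performance. To overcome this issue, we propose a novel WSOD framework with a new paradigm that switches from weak supervision to noisy supervision (W2N). Given pseudo ground truths generated by a well-trained WSOD network, we propose a two-module iterative training algorithm to refine the pseudo labels and progressively supervise a better object detector. In the localization adaptation module, we propose a regularization loss to reduce the proportion of discriminative parts in the original pseudo ground truths, obtaining better pseudo ground truths for further training. In the semi-supervised module, we propose a two-task instance-level split method to select high-quality labels for training a semi-supervised detector. Experimental results on different benchmarks verify the effectiveness of W2N, which outperforms all existing pure WSOD methods and transfer learning methods. Our code is publicly available at https://github.com/1170300714/w2n_wsod.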
Semi-supervised object detection (SSOD) aims to boost detection performance by leveraging extra unlabeled data. The teacher-student framework has been shown to be promising for SSOD: a teacher network generates pseudo-labels for unlabeled data to assist the training of a student network. Since the pseudo-labels are noisy, filtering them is crucial to exploit the potential of such a framework. Unlike existing suboptimal methods, we propose a two-step pseudo-label filtering for the classification and regression heads in a teacher-student framework. For the classification head, OCL (Object-wise Contrastive Learning) regularizes object representation learning by using unlabeled data, improving pseudo-label filtering by enhancing the discriminativeness of the classification score. It is designed to pull together objects of the same class and push apart objects of different classes. For the regression head, we further propose RUPL (Regression-Uncertainty-guided Pseudo-Labeling) to learn the aleatoric uncertainty of object localization for label filtering. By jointly filtering the pseudo-labels for the classification and regression heads, the student network receives better guidance from the teacher network for the object detection task. Experimental results on the Pascal VOC and MS-COCO datasets demonstrate the superiority of our proposed method, with competitive performance compared to existing methods.
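The idea of filtering pseudo-labels separately for the two heads can be sketched as below. This is a simplified illustration under stated assumptions, not the OCL or RUPL procedure: it assumes each teacher prediction already carries a classification score and a scalar localization uncertainty, and it uses two fixed, hypothetical thresholds.

```python
from dataclasses import dataclass


@dataclass
class TeacherPrediction:
    box: tuple          # (x1, y1, x2, y2) in pixels
    label: int          # predicted class index
    score: float        # classification confidence in [0, 1]
    uncertainty: float  # predicted localization uncertainty, lower is better


def filter_pseudo_labels(predictions, score_threshold=0.7, uncertainty_threshold=0.5):
    """Return (classification targets, regression targets) for the student.

    Assumed rule: a confident score admits a box as a classification target;
    it supervises box regression only if its uncertainty is also low.
    """
    cls_targets = [p for p in predictions if p.score >= score_threshold]
    reg_targets = [p for p in cls_targets if p.uncertainty <= uncertainty_threshold]
    return cls_targets, reg_targets


predictions = [
    TeacherPrediction((0, 0, 10, 10), label=1, score=0.9, uncertainty=0.2),
    TeacherPrediction((0, 0, 5, 5), label=2, score=0.9, uncertainty=0.8),
    TeacherPrediction((1, 1, 4, 4), label=3, score=0.3, uncertainty=0.1),
]
cls_targets, reg_targets = filter_pseudo_labels(predictions)
```

Keeping the two criteria separate means a box that is confidently classified but poorly localized can still supervise the classification head without contaminating the regression head.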
Open world object detection aims at detecting objects that are absent from the object classes of the training data as unknown objects, without explicit supervision. Furthermore, the exact classes of the unknown objects must be identified without catastrophic forgetting of the previously known classes when the corresponding annotations of unknown objects are given incrementally. In this paper, we propose a two-stage training approach named Open World DETR for open world object detection based on Deformable DETR. In the first stage, we pre-train a model on the current annotated data to detect objects from the current known classes, and concurrently train an additional binary classifier to classify predictions into foreground or background classes. This helps the model build unbiased feature representations that facilitate the detection of unknown classes in the subsequent process. In the second stage, we fine-tune the class-specific components of the model with a multi-view self-labeling strategy and a consistency constraint. Furthermore, we alleviate catastrophic forgetting when the annotations of the unknown classes become available incrementally by using knowledge distillation and exemplar replay. Experimental results on PASCAL VOC and MS-COCO show that our proposed method outperforms other state-of-the-art open world object detection methods by a large margin.
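Object detection has made substantial progress over the last decade. However, detecting novel classes with only a few samples remains challenging, since deep learning under a low data regime usually leads to a degraded feature space. Existing works employ a holistic fine-tuning paradigm to tackle this problem, in which the model is first pre-trained on all base classes with abundant samples and is then used to carve out the novel-class feature space. Nonetheless, this paradigm is still imperfect. During fine-tuning, a novel class may implicitly leverage the knowledge of multiple base classes to construct its feature space, which induces a scattered feature space and hence violates inter-class separability. To overcome these obstacles, we propose a two-step fine-tuning framework, Few-shot object detection via Association and DIscrimination (FADI), which builds a discriminative feature space for each novel class in two integral steps. 1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel-class feature space by explicitly imitating a specific base-class feature space. Specifically, we associate each novel class with a base class according to their semantic similarity. After that, the feature space of a novel class can readily imitate the well-trained feature space of the associated base class. 2) In the discrimination step, to ensure separability between the novel classes and the associated base classes, we disentangle the classification branches for base and novel classes. To further enlarge the inter-class separability among all classes, a specialized margin loss is imposed. Extensive experiments on the Pascal VOC and MS-COCO datasets demonstrate that FADI achieves new SOTA performance, significantly improving the baseline in any shot/split by +18.7. Notably, the advantage is most pronounced in extremely few-shot scenarios.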
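Humans are able to learn to recognize new objects even from a few examples. In contrast, training deep-learning-based object detectors requires huge amounts of annotated data. To avoid the need to acquire and annotate such large amounts of data, few-shot object detection aims to learn from a few object instances of new categories in the target domain. In this survey, we provide an overview of the state of the art in few-shot object detection. We categorize approaches according to their training scheme and architectural layout. For each type of approach, we describe the general realization as well as concepts for improving performance on novel categories. Where appropriate, we give short takeaways on these concepts to highlight the best ideas. Finally, we introduce commonly used datasets and their evaluation protocols and analyze reported benchmark results. As a result, we emphasize common challenges in evaluation and identify the most promising current trends in this emerging field of few-shot object detection.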
Open-set object detection (OSOD) aims to detect the known categories and identify unknown objects in a dynamic world, and has attracted significant attention. However, previous approaches only consider this problem in data-abundant conditions and neglect few-shot scenes. In this paper, we seek a solution for few-shot open-set object detection (FSOSOD), which aims to quickly train a detector from a few samples while detecting all known classes and identifying unknown classes. The main challenge of this task is that the small number of training samples induces the model to overfit the known classes, resulting in poor open-set performance. We propose a new FSOSOD algorithm to tackle this issue, named Few-shOt Open-set Detector (FOOD), which contains a novel class weight sparsification classifier (CWSC) and a novel unknown decoupling learner (UDL). To prevent overfitting, CWSC randomly sparsifies parts of the normalized weights used for the logit prediction of all classes, which decreases the co-adaptability between a class and its neighbors. Alongside, UDL decouples the training of the unknown class and enables the model to form a compact unknown decision boundary. Thus, unknown objects can be identified with a confidence probability without any pseudo-unknown samples for training. We compare our method with several state-of-the-art OSOD methods in few-shot scenes and observe that it improves the recall of unknown classes by 5%-9% across all shots in the VOC-COCO dataset setting.
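The weight sparsification step can be sketched as below. The sketch assumes a cosine-style classifier in which both the feature and each class weight are L2-normalized, and it sparsifies by zeroing each weight entry independently with probability `drop_prob`; the abstract does not specify the masking scheme, so the mask distribution, the parameter names, and the function `sparsified_logits` are hypothetical.

```python
import math
import random


def normalize(vec):
    """Return vec scaled to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sparsified_logits(feature, class_weights, drop_prob=0.5, rng=None):
    """Logits from normalized weights with a random subset of entries zeroed."""
    if rng is None:
        rng = random.Random()
    feat = normalize(feature)
    logits = []
    for weight in class_weights:
        w_hat = normalize(weight)
        # Independent mask per class: 0.0 drops an entry, 1.0 keeps it.
        mask = [0.0 if rng.random() < drop_prob else 1.0 for _ in w_hat]
        logits.append(sum(f * w * m for f, w, m in zip(feat, w_hat, mask)))
    return logits


feature = [3.0, 4.0]
class_weights = [[3.0, 4.0], [4.0, -3.0]]
dense = sparsified_logits(feature, class_weights, drop_prob=0.0)
sparse = sparsified_logits(feature, class_weights, drop_prob=0.5, rng=random.Random(0))
```

With `drop_prob=0.0` the function reduces to plain cosine logits. During training a fresh mask would be drawn at every step, in the spirit of dropout applied to the classifier weights, and masking would typically be switched off at inference.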
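Weakly supervised object detection (WSOD) is the task of detecting objects in an image using a model trained only on image-level annotations. Current state-of-the-art models benefit from self-supervised instance-level supervision, but since weak supervision does not include count or location information, the most common "argmax" labeling method often ignores many object instances. To alleviate this issue, we propose a novel multiple-instance labeling method called object discovery. We further introduce a new contrastive loss under weak supervision, where no instance-level information is available for sampling, called the weakly supervised contrastive loss (WSCL). WSCL aims to construct a credible similarity threshold for object discovery by leveraging consistent features for embedding vectors of the same class. As a result, we achieve new state-of-the-art results on MS-COCO 2014 and 2017 as well as PASCAL VOC 2012, and competitive results on PASCAL VOC 2007.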
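Despite great progress in object detection, most existing methods are limited to a small set of object categories because of the tremendous human effort required for instance-level bounding-box annotation. To alleviate the problem, recent open-vocabulary and zero-shot detection methods attempt to detect object categories not seen during training. However, these approaches still rely on manually provided bounding-box annotations for a set of base classes. We propose an open-vocabulary detection framework that can be trained without manually provided bounding-box annotations. Our method achieves this by leveraging the localization ability of pre-trained vision-language models to generate pseudo bounding-box labels that can be used directly for training object detectors. Experimental results on COCO, PASCAL VOC, Objects365 and LVIS demonstrate the effectiveness of our method. Specifically, our method outperforms the state-of-the-art (SOTA) methods trained with human-annotated bounding boxes by 3% AP on COCO novel categories, even though our training source is not equipped with manual bounding-box labels. When manual bounding-box labels are used, as in the baselines, our method surpasses the SOTA by a large margin of 8% AP.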
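Many open-world applications require the detection of novel objects, yet state-of-the-art object detection and instance segmentation networks do not excel at this task. The key issue lies in their assumption that regions without any annotation should be suppressed as negatives, which teaches the model to treat unannotated objects as background. To address this issue, we propose a simple yet surprisingly powerful data augmentation and training scheme that we call Learning to Detect Every Thing (LDET). To avoid suppressing hidden objects, i.e., background objects that are visible but unlabeled, we paste annotated objects onto a background image sampled from a small region of the original image. Since training solely on such synthetically augmented images suffers from domain shift, we decouple training into two parts: 1) training the region classification and regression heads on augmented images, and 2) training the mask heads on original images. In this way, the model does not learn to classify hidden objects as background while still generalizing well to real images. LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO as well as on cross-dataset evaluation on UVO and Cityscapes.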