Smart Paper Notes

Learning High-quality Proposals for Acne Detection

Jianwei Zhang , Lei Zhang , Junyou Wang , Xin Wei , Jiaqi Li , Xian Jiang , Dan Du

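Category: Computer Vision

2022-07-08

Acne detection is crucial for interpretable diagnosis and precise treatment of skin disease. The arbitrary boundaries and small size of acne lesions lead to a large number of poor-quality proposals in two-stage detection. In this paper, we propose a novel head structure for the Region Proposal Network that improves proposal quality in two ways. First, a Spatial Aware Double Head (SADH) structure is proposed to disentangle the representation learning for classification and localization from two different spatial perspectives. The proposed SADH ensures a steeper classification confidence gradient and suppresses proposals that have low intersection over union (IoU) with the matched ground truth. Second, we propose a Normalized Wasserstein Distance prediction branch to improve the correlation between the proposals' classification scores and their IoUs. In addition, to facilitate further research on acne detection, we construct a new dataset named AcneSCU, with high-resolution imagery, precise annotations, and fine-grained lesion categories. Extensive experiments are conducted on both AcneSCU and the public dataset ACNE04, and the results show that the proposed method improves proposal quality and consistently outperforms state-of-the-art approaches. Code and the collected dataset are available at https://github.com/pingguokiller/acnedetection.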

Mask Scoring R-CNN

Zhaojin Huang , Lichao Huang , Yongchao Gong , Chang Huang , Xinggang Wang

Category:

2019-03-01

Letting a deep network be aware of the quality of its own predictions is an interesting yet important problem. In the task of instance segmentation, the confidence of instance classification is used as the mask quality score in most instance segmentation frameworks. However, the mask quality, quantified as the IoU between the instance mask and its ground truth, is usually not well correlated with the classification score. In this paper, we study this problem and propose Mask Scoring R-CNN, which contains a network block to learn the quality of the predicted instance masks. The proposed network block takes the instance feature and the corresponding predicted mask together to regress the mask IoU. The mask scoring strategy calibrates the misalignment between mask quality and mask score, and improves instance segmentation performance by prioritizing more accurate mask predictions during COCO AP evaluation. In extensive evaluations on the COCO dataset, Mask Scoring R-CNN brings consistent and noticeable gains with different models, and outperforms the state-of-the-art Mask R-CNN. We hope our simple and effective approach will provide a new direction for improving instance segmentation. The source code of our method is available at https://github.com/zjhuang22/maskscoring_rcnn. The work was done when Zhaojin Huang was an intern at Horizon Robotics Inc.
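
The mask quality measure named in this abstract, the IoU between a predicted mask and its ground truth, can be illustrated with a minimal pure-Python sketch. The function names and the toy masks are hypothetical, and multiplying the classification score by the predicted mask IoU is an assumed combination rule used here for illustration; it is not taken from the abstract.

```python
def mask_iou(pred_mask, gt_mask):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    return intersection / union if union else 0.0


def rescore(cls_score, predicted_mask_iou):
    # Assumed combination rule: scale the classification confidence by the
    # predicted mask IoU so that poorly segmented instances rank lower.
    return cls_score * predicted_mask_iou


pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 1, 1],
      [0, 0, 0]]
quality = mask_iou(pred, gt)  # 2 shared pixels out of 6 covered pixels
print(quality, rescore(0.9, quality))
```

In this toy case a confident classification (0.9) attached to a mask that overlaps the ground truth by only one third ends up with a much lower final score, which is the kind of calibration the abstract describes.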

One-Stage Cascade Refinement Networks for Infrared Small Target Detection

Yimian Dai , Xiang Li , Fei Zhou , Yulei Qian , Yaohong Chen , Jian Yang

Category: Computer Vision

2022-12-16

Single-frame InfraRed Small Target (SIRST) detection has been a challenging task due to a lack of inherent characteristics, imprecise bounding box regression, a scarcity of real-world datasets, and sensitive localization evaluation. In this paper, we propose a comprehensive solution to these challenges. First, we find that the existing anchor-free label assignment method is prone to mislabeling small targets as background, leading to their omission by detectors. To overcome this issue, we propose an all-scale pseudo-box-based label assignment scheme that relaxes the constraints on scale and decouples the spatial assignment from the size of the ground-truth target. Second, motivated by the structured prior of feature pyramids, we introduce the one-stage cascade refinement network (OSCAR), which uses the high-level head as soft proposals for the low-level refinement head. This allows OSCAR to process the same target in a cascade coarse-to-fine manner. Finally, we present a new research benchmark for infrared small target detection, consisting of the SIRST-V2 dataset of real-world, high-resolution single-frame targets, the normalized contrast evaluation metric, and the DeepInfrared toolkit for detection. We conduct extensive ablation studies to evaluate the components of OSCAR and compare its performance to state-of-the-art model-driven and data-driven methods on the SIRST-V2 benchmark. Our results demonstrate that a top-down cascade refinement framework can improve the accuracy of infrared small target detection without sacrificing efficiency. The DeepInfrared toolkit, dataset, and trained models are available at https://github.com/YimianDai/open-deepinfrared to advance further research in this field.

Boosting R-CNN: Reweighting R-CNN Samples by RPN's Error for Underwater Object Detection

Pinhao Song , Hong Liu , Linhui Dai , Tao Wang , Zhan Chen

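Category: Computer Vision

2022-06-28

Complicated underwater environments bring new challenges to object detection, such as unbalanced light conditions, low contrast, occlusion, and mimicry of aquatic organisms. Under these circumstances, the objects captured by an underwater camera become vague, and generic detectors often fail on these vague objects. This work aims to solve the problem from two perspectives: uncertainty modeling and hard example mining. We propose a two-stage underwater detector named Boosting R-CNN, which comprises three key components. First, a new region proposal network named RetinaRPN is proposed, which provides high-quality proposals and considers objectness and IoU prediction as uncertainty to model the object prior probability. Second, a probabilistic inference pipeline is introduced to combine the first-stage prior uncertainty and the second-stage classification score to model the final detection score. Finally, we propose a new hard example mining method named boosting reweighting. Specifically, when the region proposal network miscalculates the object prior probability for a sample, boosting reweighting increases the classification loss of that sample in the R-CNN head during training, while reducing the loss of easy samples with accurately estimated priors. A robust detection head in the second stage can thus be obtained. During inference, the R-CNN is able to rectify the errors of the first stage and improve performance. Comprehensive experiments on two underwater datasets and two generic object detection datasets demonstrate the effectiveness and robustness of our method.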

UniInst: Unique Representation for End-to-End Instance Segmentation

Yimin Ou , Rui Yang , Lufan Ma , Yong Liu , Jiangpeng Yan , Shang Xu , Chengjie Wang , Xiu Li

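Category: Computer Vision | Artificial Intelligence

2022-05-25

Existing instance segmentation methods have achieved impressive performance but still suffer from a common dilemma: redundant representations (e.g., multiple boxes, grids, and anchor points) are inferred for one instance, which leads to multiple duplicated predictions. Mainstream methods therefore usually rely on a hand-designed non-maximum suppression (NMS) post-processing step to select the optimal prediction, which hinders end-to-end training. To address this issue, we propose a box-free and NMS-free end-to-end instance segmentation framework, termed UniInst, that yields only one unique representation for each instance. Specifically, we design an instance-aware one-to-one assignment scheme, namely Only Yield One Representation (OYOR), which dynamically assigns one unique representation to each instance according to the matching quality between predictions and ground truths. Then, a novel prediction re-ranking strategy is integrated into the framework to address the misalignment between the classification score and the mask quality, making the learned representation more discriminative. With these techniques, our UniInst, the first FCN-based box-free and NMS-free instance segmentation framework, achieves competitive performance against mainstream methods on COCO test-dev, e.g., 40.2 mask AP with ResNet-101-FPN. Moreover, the proposed instance-aware method is robust to occluded scenes, outperforming common baselines by a remarkable mask AP margin on the heavily occluded OCHuman benchmark. Our code will be made available upon publication.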

A Normalized Gaussian Wasserstein Distance for Tiny Object Detection

Jinwang Wang , Chang Xu , Wen Yang , Lei Yu

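Category: Computer Vision

2021-10-26

Detecting tiny objects is a very challenging problem, since a tiny object contains only a few pixels. We demonstrate that state-of-the-art detectors do not produce satisfactory results on tiny objects because of the lack of appearance information. Our key observation is that Intersection over Union (IoU) based metrics, such as IoU itself and its extensions, are very sensitive to the location deviation of tiny objects, and drastically deteriorate detection performance when used in anchor-based detectors. To alleviate this, we propose a new evaluation metric using the Wasserstein distance for tiny object detection. Specifically, we first model the bounding boxes as 2D Gaussian distributions and then propose a new metric, dubbed Normalized Wasserstein Distance (NWD), to compute the similarity between them from their corresponding Gaussian distributions. The proposed NWD metric can be easily embedded into the assignment, non-maximum suppression, and loss function of any anchor-based detector to replace the commonly used IoU metric. We evaluate our metric on a new dataset for tiny object detection (AI-TOD), in which the average object size is much smaller than in existing object detection datasets. Extensive experiments show that, when equipped with the NWD metric, our approach yields performance that is 6.7 AP points higher than a standard fine-tuning baseline and 6.0 AP points higher than state-of-the-art competitors. Code is available at https://github.com/jwwangchn/nwd.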
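
The idea of comparing boxes through their Gaussian models can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a box (cx, cy, w, h) is assumed to be modeled as a Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), the squared 2-Wasserstein distance between two such axis-aligned Gaussians then reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2), and the normalizing constant `c` with its default value is a hypothetical choice.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h).

    c is an assumed normalizing constant; in practice it would be chosen
    to match the typical object size of the dataset.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two diagonal Gaussians.
    w2_squared = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
                  + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    # Map the distance into (0, 1], where 1 means identical boxes.
    return math.exp(-math.sqrt(w2_squared) / c)


tiny = (10.0, 10.0, 4.0, 4.0)
shifted = (14.0, 10.0, 4.0, 4.0)  # same size, moved 4 px to the right
print(nwd(tiny, tiny))     # identical boxes give the maximum similarity
print(nwd(tiny, shifted))  # still clearly positive although the boxes no longer overlap
```

The two 4×4 boxes in the example do not overlap, so their IoU is zero, while the similarity above still decays smoothly with the offset. That smoothness is the property the abstract relies on for tiny objects.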

Cascade R-CNN: High Quality Object Detection and Instance Segmentation

Zhaowei Cai , Nuno Vasconcelos

Category:

2019-06-24

In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its quality. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of a detector as training set for the next. This resampling progressively improves hypotheses quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object detection datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN. To facilitate future research, two implementations are made available at https://github.com/zhaoweicai/cascade-rcnn (Caffe) and https://github.com/zhaoweicai/Detectron-Cascade-RCNN (Detectron).
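
The role of the IoU threshold in defining positives and negatives can be sketched in a few lines. This is an illustrative toy, not the Cascade R-CNN training code: the helper names, the boxes, and the threshold sequence 0.5, 0.6, 0.7 are assumptions chosen for the example.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes, iou_threshold):
    """Mark a proposal positive (1) if its best IoU with any ground truth
    reaches the threshold, otherwise negative (0)."""
    labels = []
    for proposal in proposals:
        best = max((box_iou(proposal, gt) for gt in gt_boxes), default=0.0)
        labels.append(1 if best >= iou_threshold else 0)
    return labels


gt_boxes = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 10), (2, 0, 12, 10), (5, 0, 15, 10)]
for threshold in (0.5, 0.6, 0.7):  # assumed per-stage thresholds
    print(threshold, label_proposals(proposals, gt_boxes, threshold))
```

With a fixed proposal set, raising the threshold shrinks the positive set, which is the "vanishing positive samples" problem from the abstract. The cascade counters it by feeding each stage the refined boxes of the previous stage rather than the original proposals.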

Robust Table Detection and Structure Recognition from Heterogeneous Document Images

Chixiang Ma , Weihong Lin , Lei Sun , Qiang Huo

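Category: Computer Vision

2022-03-17

We introduce a new table detection and structure recognition approach named RobusTabNet to detect the boundaries of tables and reconstruct the cellular structure of each table from heterogeneous document images. For table detection, we propose to use CornerNet as a new region proposal network to generate higher-quality table proposals for Faster R-CNN, which significantly improves the localization accuracy of Faster R-CNN for table detection. Consequently, our table detection approach achieves state-of-the-art performance on three public table detection benchmarks, namely cTDaR TrackA, PubLayNet, and IIIT-AR-13K, using only a lightweight ResNet-18 backbone network. Furthermore, we propose a new split-and-merge based table structure recognition approach, in which a novel spatial CNN based separation line prediction module splits each detected table into a grid of cells, and a Grid CNN based cell merging module is applied to recover the spanning cells. As the spatial CNN module can effectively propagate contextual information across the whole table image, our table structure recognizer can robustly recognize tables with large blank spaces and geometrically distorted (even curved) tables. Thanks to these two techniques, our table structure recognition approach achieves state-of-the-art performance on three public benchmarks, including SciTSR, PubTabNet, and cTDaR TrackB2-Modern. Moreover, we further demonstrate the advantages of our approach in recognizing tables with complex structures, large blank spaces, and geometrically distorted or even curved shapes.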

AIParsing: Anchor-free Instance-level Human Parsing

Sanyi Zhang , Xiaochun Cao , Guo-Jun Qi , Zhanjie Song , Jie Zhou

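Category: Computer Vision

2022-07-14

Most state-of-the-art instance-level human parsing models adopt two-stage anchor-based detectors and therefore cannot avoid heuristic anchor box design and the lack of pixel-level analysis. To address these two issues, we design an instance-level human parsing network that is anchor-free and solvable at the pixel level. It consists of two simple sub-networks: an anchor-free detection head for bounding box prediction and an edge-guided parsing head for human segmentation. The anchor-free detection head inherits pixel-wise merits and effectively avoids the sensitivity to hyper-parameters, as demonstrated in object detection applications. By introducing a part-aware boundary clue, the edge-guided parsing head is able to distinguish adjacent human parts from one another, both within a single human instance and across overlapping instances. Meanwhile, a refinement head that integrates the box-level score and the part-level parsing quality is used to improve the quality of the parsing results. Experiments on two multiple-human parsing datasets (CIHP and LV-MHP-v2.0) and one video instance-level human parsing dataset (VIP) show that our method achieves the best global-level and instance-level performance among state-of-the-art one-stage top-down alternatives.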

Feature Pyramid Networks for Object Detection

Tsung-Yi Lin , Piotr Dollár , Ross Girshick , Kaiming He , Bharath Hariharan , Serge Belongie

Category:

2016-12-09

Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 6 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
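
The top-down pathway with lateral connections can be sketched on toy single-channel maps. This is a simplified illustration under assumptions, not the FPN implementation: it assumes each level has exactly half the resolution of the previous one, uses nearest-neighbour upsampling followed by element-wise addition as the merge rule, and omits the learned convolutions that a real network would apply to the lateral and merged maps. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map stored as nested lists."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D maps."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def top_down_merge(laterals):
    """Merge maps ordered from finest to coarsest resolution.

    The coarsest map is passed through unchanged; every finer level adds
    the upsampled result of the level above it to its own lateral map.
    """
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        merged.append(add_maps(lateral, upsample2x(merged[-1])))
    merged.reverse()
    return merged


fine = [[1] * 4 for _ in range(4)]     # 4x4 map, weak semantics
middle = [[10] * 2 for _ in range(2)]  # 2x2 map
coarse = [[100]]                       # 1x1 map, strong semantics
pyramid = top_down_merge([fine, middle, coarse])
print(pyramid[0][0])  # first row of the finest merged map
```

After merging, every cell of the finest map carries a contribution from all coarser levels, which is how high-level semantics reach the high-resolution maps at small extra cost.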

Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images

Dong Liang , Qixiang Geng , Zongqi Wei , Dmitry A. Vorontsov , Ekaterina L. Kim , Mingqiang Wei , Huiyu Zhou

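Category: Computer Vision

2021-12-13

Object detection has made tremendous strides in computer vision. Detecting small objects with degraded appearance remains a prominent challenge, especially in aerial observation. To collect sufficient positive and negative samples for heuristic training, most object detectors preset region anchors in order to calculate the Intersection over Union (IoU) against the ground-truth data. In this case, small objects are frequently abandoned or mislabeled. In this paper, we present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator. Unlike other state-of-the-art techniques, the proposed network uses a sample discriminator to realize interactive sample screening between an anchor-based unit and an anchor-free unit in order to generate eligible samples. In addition, multi-task joint training with a conservative anchor-based inference scheme enhances the performance of the proposed model while reducing computational complexity. The proposed scheme supports both oriented and horizontal object detection tasks. Extensive experiments on two challenging aerial benchmarks (DOTA and HRSC2016) indicate that our method achieves state-of-the-art accuracy with moderate inference speed and computational overhead for training. On DOTA, our DEA-Net integrated with the RoI-Transformer baseline surpasses the advanced method by 0.40% mean average precision (mAP) for oriented object detection with a weaker backbone network (ResNet-101 vs. ResNet-152) and by 3.08% mAP for horizontal object detection with the same backbone. Besides, our DEA-Net integrated with the ReDet baseline achieves state-of-the-art performance of 80.37%. On HRSC2016, it surpasses the previous best model by 1.1% using only 3 horizontal anchors.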

FCOS: Fully Convolutional One-Stage Object Detection

Zhi Tian , Chunhua Shen , Hao Chen , Tong He

Category:

2019-04-02

We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor-box free as well as proposal free. By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes, such as calculating overlaps during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, to which the final detection performance is often very sensitive. With non-maximum suppression (NMS) as the only post-processing step, FCOS with ResNeXt-64x4d-101 achieves 44.7% AP with single-model and single-scale testing, surpassing previous one-stage detectors with the advantage of being much simpler. For the first time, we demonstrate a much simpler and more flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: tinyurl.com/FCOSv1
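
Per-pixel prediction without anchors can be made concrete with a small sketch of the regression target at a single location. The abstract does not spell out the target encoding, so the following is an assumption-labeled illustration: each location inside a ground-truth box is assumed to regress its distances to the left, top, right, and bottom sides, and the centerness formula is included as a commonly cited companion quantity. The function names are hypothetical.

```python
import math


def ltrb_target(x, y, box):
    """Distances from location (x, y) to the four sides of box (x1, y1, x2, y2).

    Returns None when the location is not strictly inside the box, in
    which case it would be treated as a negative sample for that box.
    """
    x1, y1, x2, y2 = box
    left, top, right, bottom = x - x1, y - y1, x2 - x, y2 - y
    if min(left, top, right, bottom) <= 0:
        return None
    return (left, top, right, bottom)


def centerness(left, top, right, bottom):
    """Score in (0, 1] that is 1 at the box center and decays toward the border."""
    horizontal = min(left, right) / max(left, right)
    vertical = min(top, bottom) / max(top, bottom)
    return math.sqrt(horizontal * vertical)


box = (0, 0, 10, 10)
center_target = ltrb_target(5, 5, box)
edge_target = ltrb_target(2, 5, box)
print(center_target, centerness(*center_target))
print(edge_target, centerness(*edge_target))
print(ltrb_target(12, 5, box))  # outside the box
```

Because the targets are plain distances measured from each location, no anchor shapes, aspect ratios, or anchor-to-box overlap computations are needed.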

ObjectBox: From Centers to Boxes for Anchor-Free Object Detection

Mohsen Zand , Ali Etemad , Michael Greenspan

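Category: Computer Vision

2022-07-14

We present ObjectBox, a novel single-stage, anchor-free, and highly generalizable object detection approach. In contrast to existing anchor-based and anchor-free detectors, which are biased toward specific object scales in their label assignments, we use only object center locations as positive samples and treat all objects equally at different feature levels regardless of their sizes or shapes. Specifically, our label assignment strategy treats the object center locations as shape- and size-agnostic anchors in an anchor-free fashion, and allows learning to occur at all scales for every object. To support this, we define new regression targets as the distances from two corners of the center cell location to the four sides of the bounding box. Moreover, to handle scale-variant objects, we propose a tailored loss to deal with boxes of different sizes. As a result, our proposed object detector does not need any dataset-dependent hyper-parameters to be tuned across datasets. We evaluate our method on the MS-COCO 2017 and PASCAL VOC 2012 datasets and compare our results with state-of-the-art methods. We observe that ObjectBox performs favorably in comparison with prior work. Furthermore, we perform rigorous ablation experiments to evaluate the different components of our method. Our code is available at https://github.com/mohsenzand/objectbox.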

Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark

Chang Xu , Jinwang Wang , Wen Yang , Huai Yu , Lei Yu , Gui-Song Xia

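Category: Computer Vision

2022-06-28

Tiny object detection (TOD) in aerial images is challenging, since a tiny object contains only a few pixels. State-of-the-art object detectors do not provide satisfactory results on tiny objects because of the lack of supervision from discriminative features. Our key observation is that the Intersection over Union (IoU) metric and its extensions are very sensitive to the location deviation of tiny objects, which drastically deteriorates the quality of label assignment when used in anchor-based detectors. To tackle this problem, we propose a new evaluation metric dubbed Normalized Wasserstein Distance (NWD) and a new RanKing-based Assigning (RKA) strategy for tiny object detection. The proposed NWD-RKA strategy can be easily embedded into all kinds of anchor-based detectors to replace the standard IoU-threshold-based one, significantly improving label assignment and providing sufficient supervision information for network training. Tested on four datasets, NWD-RKA consistently improves tiny object detection performance. In addition, having observed prominent noisy labels in the Tiny Object Detection in Aerial Images (AI-TOD) dataset, we are motivated to relabel it and release AI-TOD-v2 and its corresponding benchmark. In AI-TOD-v2, the missing annotation and location error problems are considerably mitigated, enabling more reliable training and validation. With NWD-RKA embedded into DetectoRS, detection performance improves by 4.3 AP points over state-of-the-art competitors on AI-TOD-v2. Datasets, code, and more visualizations are available at https://chasel-tsui.github.io/ai/ai-tod-v2/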

Object Detection with Deep Learning: A Review

Zhong-Qiu Zhao , Peng Zheng , Shou-tao Xu , Xindong Wu

Category:

2018-07-15

Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.

DSLA: Dynamic smooth label assignment for efficient anchor-free object detection

Hu Su , Yonghao He , Jiabin Zhang , Wei Zou , Bin Fan

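Category: Computer Vision

2022-08-01

Anchor-free detectors basically formulate object detection as dense classification and regression. For popular anchor-free detectors, it is common to introduce an individual prediction branch to estimate the quality of localization. The following inconsistencies are observed when we delve into the practices of classification and quality estimation. First, for some adjacent samples that are assigned completely different labels, the trained model produces similar classification scores. This violates the training objective and leads to performance degradation. Second, detected bounding boxes with higher confidence are found, on the contrary, to have smaller overlaps with the corresponding ground truth. Accurately localized bounding boxes are then suppressed by less accurate ones in the non-maximum suppression (NMS) procedure. To address these inconsistencies, the Dynamic Smooth Label Assignment (DSLA) method is proposed. Based on the concept of centerness originally developed in FCOS, a smooth assignment strategy is proposed. The label is smoothed to a continuous value in [0, 1] to make a steady transition between positive and negative samples. The Intersection over Union (IoU) is predicted dynamically during training and is coupled with the smoothed label. The dynamic smooth label is assigned to supervise the classification branch. Under such supervision, the quality estimation branch is naturally merged into the classification branch, which simplifies the architecture of the anchor-free detector. Comprehensive experiments are conducted on the MS COCO benchmark. They demonstrate that DSLA can significantly boost detection accuracy by alleviating the above inconsistencies in anchor-free detectors. Our code is released at https://github.com/yonghaohe/dsla.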

Extending One-Stage Detection with Open-World Proposals

Sachin Konan , Kevin J Liang , Li Yin

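Category: Computer Vision

2022-01-07

In many applications, such as autonomous driving, hand manipulation, or robot navigation, object detection methods must be able to detect objects unseen in the training set. Open World Detection (OWD) seeks to tackle this problem by generalizing detection performance to both seen and unseen class categories. Recent works have succeeded in generating class-agnostic proposals, which we call Open-World Proposals (OWP), but this comes at the cost of a large drop on the classification task when both tasks are considered in the detection model. These works have investigated two-stage Region Proposal Networks (RPN) by taking advantage of objectness scoring cues; however, for its simplicity, run-time, and decoupling of localization and classification, we investigate OWP through the lens of a fully convolutional one-stage detection network such as FCOS. We show that our architectural and sampling optimizations on FCOS can increase OWP performance in terms of recall on novel classes, marking the first proposal-free one-stage detection network to achieve performance comparable to RPN-based two-stage networks. Furthermore, we show that the inherent, decoupled architecture of FCOS helps retain classification performance. While two-stage methods worsen by 6% in recall on novel classes, we show that FCOS drops by only 2% when jointly optimizing for OWP and classification.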

Acquisition of Localization Confidence for Accurate Object Detection

Borui Jiang , Ruixuan Luo , Jiayuan Mao , Tete Xiao , Yuning Jiang

Category:

2018-07-30

Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. As a result, properly localized bounding boxes may degenerate during iterative regression or even be suppressed during NMS. In this paper, we propose IoU-Net, which learns to predict the IoU between each detected bounding box and the matched ground truth. The network thereby acquires a confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, in which the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.
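
How a localization confidence can change NMS is sketched below. This is a minimal illustration under assumptions rather than the IoU-Net code: boxes are ranked by a predicted localization score instead of the classification score, and the kept box inherits the highest classification score among the boxes it suppresses. The function names, the overlap threshold, and the toy scores are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms_by_localization(boxes, cls_scores, loc_scores, iou_threshold=0.5):
    """Return (kept index, classification score) pairs.

    Boxes are visited in order of decreasing localization score. Each kept
    box suppresses the remaining boxes that overlap it by more than the
    threshold and takes the highest classification score of that group.
    """
    order = sorted(range(len(boxes)), key=lambda i: loc_scores[i], reverse=True)
    kept = []
    while order:
        current = order.pop(0)
        best_cls = cls_scores[current]
        remaining = []
        for other in order:
            if box_iou(boxes[current], boxes[other]) > iou_threshold:
                best_cls = max(best_cls, cls_scores[other])
            else:
                remaining.append(other)
        kept.append((current, best_cls))
        order = remaining
    return kept


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
cls_scores = [0.9, 0.6, 0.8]  # box 0 is the most confident class prediction
loc_scores = [0.5, 0.9, 0.7]  # box 1 is predicted to be the best localized
print(nms_by_localization(boxes, cls_scores, loc_scores))
```

Score-ranked NMS would keep box 0 and discard the better localized box 1. Ranking by localization confidence keeps box 1 instead, which matches the abstract's point about preserving accurately localized boxes.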

PDNet: Toward Better One-Stage Object Detection With Prediction Decoupling

Li Yang , Yan Xu , Shaoru Wang , Chunfeng Yuan , Ziqi Zhang , Bing Li , Weiming Hu

Category: Computer Vision

2021-04-28

Recent one-stage object detectors follow a per-pixel prediction approach that predicts both the object category scores and boundary positions from every single grid location. However, the most suitable positions for inferring different targets, i.e., the object category and boundaries, are generally different. Predicting all these targets from the same grid location thus may lead to sub-optimal results. In this paper, we analyze the suitable inference positions for object category and boundaries, and propose a prediction-target-decoupled detector named PDNet to establish a more flexible detection paradigm. Our PDNet with the prediction decoupling mechanism encodes different targets separately in different locations. A learnable prediction collection module is devised with two sets of dynamic points, i.e., dynamic boundary points and semantic points, to collect and aggregate the predictions from the favorable regions for localization and classification. We adopt a two-step strategy to learn these dynamic point positions, where the prior positions are estimated for different targets first, and the network further predicts residual offsets to the positions with better perceptions of the object properties. Extensive experiments on the MS COCO benchmark demonstrate the effectiveness and efficiency of our method. With a single ResNeXt-64x4d-101-DCN as the backbone, our detector achieves 50.1 AP with single-scale testing, which outperforms the state-of-the-art methods by an appreciable margin under the same experimental settings. Moreover, our detector is highly efficient as a one-stage framework. Our code is public at https://github.com/yangli18/PDNet.

Point-to-Box Network for Accurate Object Detection via Single Point Supervision

Pengfei Chen , Xuehui Yu , Xumeng Han , Najmul Hassan , Kai Wang , Jiachen Li , Jian Zhao , Humphrey Shi , Zhenjun Han , Qixiang Ye

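Category: Computer Vision

2022-07-14

Object detection using single-point supervision has received increasing attention over the years. In this paper, we attribute the large performance gap between point-supervised and bounding-box-supervised detection to the failure to generate high-quality proposal bags, which are crucial for multiple instance learning (MIL). To address this problem, we introduce a lightweight alternative to the off-the-shelf proposal (OTSP) method and thereby create the Point-to-Box Network (P2BNet), which can construct an inter-object balanced proposal bag by generating proposals in an anchor-like way. By fully exploiting the accurate position information, P2BNet further constructs an instance-level bag, avoiding the mixture of multiple objects. Finally, a coarse-to-fine policy in a cascade fashion is used to improve the IoU between proposals and the ground truth (GT). Benefiting from these strategies, P2BNet is able to produce high-quality instance-level bags for object detection. P2BNet improves the mean average precision (AP) by more than 50% relative to the previous best point-supervised object detection (PSOD) method on the MS COCO dataset. It also demonstrates great potential to bridge the performance gap between point-supervised and bounding-box-supervised detectors. The code will be released at github.com/ucas-vg/p2bnet.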
