Smart Paper Notes

A DCNN-based Arbitrarily-Oriented Object Detector for Quality Control and Inspection Application

Kai Yao , Alberto Ortiz , Francisco Bonnin-Pascual

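Category: Computer Vision

2021-01-19

Following the success of machine vision systems in online automated quality control and inspection, this work presents an object recognition solution for two specific applications: detecting quality control items in surgical toolboxes prepared for sterilization in a hospital, and detecting defects in vessel hulls to prevent potential structural failures. The solution has two stages. First, a feature pyramid architecture based on the Single Shot MultiBox Detector (SSD) is used to improve detection performance, and a statistical analysis of the ground truth is used to select the parameters of a range of default boxes. Second, a lightweight neural network produces oriented detection results using a regression method. The first stage is able to detect the small targets considered in both scenarios. The second stage, despite its simplicity, detects elongated targets effectively while maintaining high running efficiency.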
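
The abstract above mentions selecting default-box parameters from ground-truth statistics but does not say how. The sketch below is one plausible reading, not the authors' procedure: it takes the observed range of box sizes as the scale range and uses quantiles of the width/height distribution as aspect ratios. The function name `default_box_params` and the choice of quantiles are assumptions made for illustration.

```python
import statistics

def default_box_params(gt_boxes, n_ratios=3):
    """Derive default-box settings from ground-truth (width, height) pairs.

    Hypothetical helper: returns ((min_scale, max_scale), aspect_ratios).
    """
    # Box "scale" taken as the geometric mean of width and height.
    scales = sorted((w * h) ** 0.5 for w, h in gt_boxes)
    ratios = sorted(w / h for w, h in gt_boxes)
    scale_range = (scales[0], scales[-1])
    # Quantile cut points of the w/h distribution serve as aspect ratios.
    aspect_ratios = statistics.quantiles(ratios, n=n_ratios + 1)
    return scale_range, aspect_ratios

if __name__ == "__main__":
    boxes = [(10, 10), (20, 10), (40, 10), (10, 40)]
    print(default_box_params(boxes))
```

Quantiles keep the chosen ratios inside the range actually present in the data, which matters when the targets are elongated and the usual 1:1, 1:2, 2:1 presets fit poorly.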

Object Detection with Deep Learning: A Review

Zhong-Qiu Zhao , Peng Zheng , Shou-tao Xu , Xindong Wu

Category:

2018-07-15

Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.

One-Stage Cascade Refinement Networks for Infrared Small Target Detection

Yimian Dai , Xiang Li , Fei Zhou , Yulei Qian , Yaohong Chen , Jian Yang

Category: Computer Vision

2022-12-16

Single-frame InfraRed Small Target (SIRST) detection has been a challenging task due to a lack of inherent characteristics, imprecise bounding box regression, a scarcity of real-world datasets, and sensitive localization evaluation. In this paper, we propose a comprehensive solution to these challenges. First, we find that the existing anchor-free label assignment method is prone to mislabeling small targets as background, leading to their omission by detectors. To overcome this issue, we propose an all-scale pseudo-box-based label assignment scheme that relaxes the constraints on scale and decouples the spatial assignment from the size of the ground-truth target. Second, motivated by the structured prior of feature pyramids, we introduce the one-stage cascade refinement network (OSCAR), which uses the high-level head as soft proposals for the low-level refinement head. This allows OSCAR to process the same target in a cascade coarse-to-fine manner. Finally, we present a new research benchmark for infrared small target detection, consisting of the SIRST-V2 dataset of real-world, high-resolution single-frame targets, the normalized contrast evaluation metric, and the DeepInfrared toolkit for detection. We conduct extensive ablation studies to evaluate the components of OSCAR and compare its performance to state-of-the-art model-driven and data-driven methods on the SIRST-V2 benchmark. Our results demonstrate that a top-down cascade refinement framework can improve the accuracy of infrared small target detection without sacrificing efficiency. The DeepInfrared toolkit, dataset, and trained models are available at https://github.com/YimianDai/open-deepinfrared to advance further research in this field.

Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images

Dong Liang , Qixiang Geng , Zongqi Wei , Dmitry A. Vorontsov , Ekaterina L. Kim , Mingqiang Wei , Huiyu Zhou

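Category: Computer Vision

2021-12-13

Object detection has made tremendous strides in computer vision. Detecting small objects with degraded appearance remains a prominent challenge, especially in aerial observation. To collect enough positive and negative samples for heuristic training, most object detectors preset region anchors so that Intersection-over-Union (IoU) can be computed against the ground-truth data. Under this scheme, small objects are frequently abandoned or mislabeled. In this paper, we present an effective Dynamic Enhancement Anchor (DEA) network that builds a novel training sample generator. Unlike other state-of-the-art techniques, the proposed network uses a sample discriminator to perform interactive sample screening between an anchor-based unit and an anchor-free unit, producing eligible samples. In addition, multi-task joint training with a conservative anchor-based inference scheme improves the performance of the proposed model while reducing computational complexity. The proposed scheme supports both oriented and horizontal object detection tasks. Extensive experiments on two challenging aerial benchmarks (DOTA and HRSC2016) show that our method achieves state-of-the-art accuracy with moderate inference speed and training overhead. On DOTA, our DEA-Net integrated with the RoI-Transformer baseline surpasses the advanced method by 0.40% mean Average Precision (mAP) for oriented object detection with a weaker backbone (ResNet-101 vs. ResNet-152), and by 3.08% mAP for horizontal object detection with the same backbone. Moreover, our DEA-Net integrated with the ReDet baseline achieves a state-of-the-art 80.37%. On HRSC2016, it surpasses the previous best model by 1.1% using only 3 horizontal anchors.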
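
To make concrete why IoU-based anchor labeling tends to drop small objects, here is a minimal sketch of the standard axis-aligned IoU and a threshold-based labeling rule. The thresholds `pos_thr` and `neg_thr` and the function `label_anchors` are illustrative assumptions, not values or code from the paper.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Label each anchor 1 (positive), 0 (background) or -1 (ignored)."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels

if __name__ == "__main__":
    # An 8x8 object centered inside a 32x32 anchor: IoU is only 64/1024.
    print(iou((0, 0, 32, 32), (12, 12, 20, 20)))
    print(label_anchors([(0, 0, 32, 32)], [(12, 12, 20, 20)]))
```

In this example the anchor fully contains the object, yet its IoU is far below any usual positive threshold, so the object contributes no positive sample.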

ObjectBox: From Centers to Boxes for Anchor-Free Object Detection

Mohsen Zand , Ali Etemad , Michael Greenspan

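Category: Computer Vision

2022-07-14

We present ObjectBox, a novel single-stage, anchor-free, and highly generalizable object detection approach. In contrast to existing anchor-based and anchor-free detectors, which are biased toward specific object scales in their label assignment, we use only object center locations as positive samples and treat all objects equally across feature levels, regardless of their size or shape. Specifically, our label assignment strategy treats object center locations as shape- and size-agnostic anchors in an anchor-free fashion, and allows learning to occur at all scales for every object. To support this, we define new regression targets as the distances from two corners of the center cell location to the four sides of the bounding box. Moreover, to handle scale-variant objects, we propose a tailored loss for boxes of different sizes. As a result, the proposed detector does not require any dataset-dependent hyperparameters to be tuned across datasets. We evaluate our method on the MS-COCO 2017 and PASCAL VOC 2012 datasets and compare our results with state-of-the-art methods. We observe that ObjectBox performs favorably compared with prior work. Furthermore, we perform rigorous ablation experiments to evaluate the different components of our method. Our code is available at: https://github.com/mohsenzand/objectbox.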
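
The regression targets described above can be sketched as follows. The abstract only says the distances run from two corners of the center cell to the four box sides; which corner pairs with which side is an assumption here (chosen so that all four distances stay non-negative), and `center_cell_targets` is a hypothetical name.

```python
def center_cell_targets(box, stride):
    """Distances from the center cell's corners to the four box sides.

    box is (x1, y1, x2, y2) in pixels; stride is the feature-map stride.
    Assumed pairing: left/top are measured from the cell's bottom-right
    corner, right/bottom from its top-left corner.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    col, row = int(cx // stride), int(cy // stride)   # cell containing the center
    tl_x, tl_y = col * stride, row * stride           # top-left corner of the cell
    br_x, br_y = tl_x + stride, tl_y + stride         # bottom-right corner
    left, top = br_x - x1, br_y - y1
    right, bottom = x2 - tl_x, y2 - tl_y
    return left, top, right, bottom

if __name__ == "__main__":
    print(center_cell_targets((10, 20, 50, 60), stride=8))
```

Because the same four distances are defined at every stride, one object can supply a target at each feature level, which matches the abstract's claim that learning occurs at all scales.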

FCOS: Fully Convolutional One-Stage Object Detection

Zhi Tian , Chunhua Shen , Hao Chen , Tong He

Category:

2019-04-02

We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often very sensitive to the final detection performance. With non-maximum suppression (NMS) as the only post-processing, FCOS with ResNeXt-64x4d-101 achieves 44.7% in AP with single-model and single-scale testing, surpassing previous one-stage detectors with the advantage of being much simpler. For the first time, we demonstrate a much simpler and more flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: tinyurl.com/FCOSv1
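
Since NMS is the only post-processing step the abstract names, a minimal sketch of greedy NMS may help. This is the textbook algorithm in plain Python, not code from the FCOS repository; the threshold `iou_thr=0.5` is an illustrative assumption.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep

if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(nms(boxes, [0.9, 0.8, 0.7]))
```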

Computer Vision on X-ray Data in Industrial Production and Security Applications: A survey

Mehdi Rafiei , Jenni Raitoharju , Alexandros Iosifidis

Category: Computer Vision

2022-11-10

X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.

Detect Faces Efficiently: A Survey and Evaluations

Yuantao Feng , Shiqi Yu , Hanyang Peng , Yan-Ran Li , Jianguo Zhang

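Category: Computer Vision | Artificial Intelligence

2021-12-03

Face detection searches all possible regions of an image for faces and, if any are present, locates them. Many applications, including face recognition, facial expression recognition, face tracking, and head-pose estimation, assume that both the location and the size of faces in the image are known. In recent decades, researchers have created many typical and efficient face detectors, from the Viola-Jones face detector to current CNN-based ones. However, with the tremendous increase in images and videos that vary in face scale, appearance, expression, occlusion, and pose, traditional face detectors are challenged to detect the variety of faces found "in the wild". The emergence of deep learning techniques brought remarkable breakthroughs to face detection, at the price of a considerable increase in computation. This paper introduces representative deep learning-based methods and presents a deep and thorough analysis in terms of accuracy and efficiency. We further compare and discuss popular and challenging datasets and their evaluation metrics. A comprehensive comparison of several successful deep learning-based face detectors is conducted to reveal their efficiency using two metrics: FLOPs and latency. This paper can guide the choice of appropriate face detectors for different applications, as well as the development of more efficient and accurate detectors.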
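
The two efficiency metrics named above can be illustrated with a short sketch. The FLOPs count uses the common convention of two operations per multiply-add for a convolution layer (some papers report multiply-adds instead, so figures can differ by a factor of two), and the latency helper simply averages wall-clock time. Both function names are hypothetical and not taken from the survey.

```python
import time

def conv2d_flops(c_in, c_out, kernel, h_out, w_out):
    """FLOPs of one k x k convolution, counting a multiply-add as 2 operations."""
    macs = c_in * kernel * kernel * c_out * h_out * w_out
    return 2 * macs

def mean_latency_ms(fn, runs=20, warmup=3):
    """Average wall-clock time of fn() in milliseconds after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs

if __name__ == "__main__":
    # A 3x3 convolution from 3 to 16 channels on a 112x112 output map.
    print(conv2d_flops(3, 16, 3, 112, 112))
    print(mean_latency_ms(lambda: sum(range(10000))))
```

FLOPs are hardware-independent but ignore memory traffic, while latency depends on the device, which is one reason to report both.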

Orientation Aware Weapons Detection In Visual Data : A Benchmark Dataset

Nazeef Ul Haq , Muhammad Moazam Fraz , Tufail Sajjad Shah Hashmi , Muhammad Shahzad

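Category: Computer Vision

2021-12-04

Automatic detection of weapons is important for improving the security and well-being of individuals, yet it remains a difficult task because of the wide variety of weapon sizes, shapes, and appearances. Viewpoint variation and occlusion make the task harder still. Moreover, current object detection algorithms process rectangular regions, whereas a long, slender rifle may cover only a small portion of such a region while the rest contains irrelevant detail. To overcome these problems, we propose a CNN architecture for orientation-aware weapon detection, which provides oriented bounding boxes with improved weapon detection performance. The proposed model provides orientation not only by treating the angle as a classification problem, dividing it into eight classes, but also by treating the angle as a regression problem. To train our weapon detection model, a new dataset comprising a total of 6400 weapon images was gathered from the web and then manually annotated with oriented bounding boxes. Our dataset provides not only oriented bounding boxes as ground truth but also horizontal bounding boxes. We also provide our dataset in the formats of multiple modern object detectors for further research in this area. The proposed model is evaluated on this dataset, and a comparative analysis with off-the-shelf object detectors shows the superior performance of the proposed model, measured with standard evaluation strategies. The dataset and the model implementation are publicly available at this link: https://bit.ly/2tyzicf.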
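
The two angle encodings mentioned above (eight classes versus a regressed value) can be sketched as below. The angle range [0, 180) degrees, the equal-width bins, and the normalization to [0, 1) are assumptions for illustration; the abstract does not specify them.

```python
NUM_CLASSES = 8
ANGLE_RANGE = 180.0                      # assumed: orientation is modulo 180 degrees
BIN_WIDTH = ANGLE_RANGE / NUM_CLASSES    # 22.5 degrees per class

def angle_to_class(angle_deg):
    """Classification target: index of the bin containing the angle."""
    wrapped = angle_deg % ANGLE_RANGE
    return min(int(wrapped // BIN_WIDTH), NUM_CLASSES - 1)

def class_to_angle(class_idx):
    """Decode a class back to the center of its bin, in degrees."""
    return class_idx * BIN_WIDTH + BIN_WIDTH / 2

def angle_to_regression(angle_deg):
    """Regression target: the angle normalized to [0, 1)."""
    return (angle_deg % ANGLE_RANGE) / ANGLE_RANGE

if __name__ == "__main__":
    print(angle_to_class(100.0), class_to_angle(4), angle_to_regression(90.0))
```

With eight bins the classification route can be off by up to 11.25 degrees even when the class is right, which is a natural motivation for also offering a regression formulation.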

SSD: Single Shot MultiBox Detector

Wei Liu , Dragomir Anguelov , Dumitru Erhan , Christian Szegedy , Scott Reed , Cheng-Yang Fu , Alexander C. Berg

Category:

2015-12-08

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on a Nvidia Titan X and for 512 × 512 input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at: https://github.com/weiliu89/caffe/tree/ssd.
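
The idea of tiling default boxes "over different aspect ratios and scales per feature map location" can be sketched as follows. Coordinates are normalized to [0, 1]; the width/height rule (scale times or divided by the square root of the aspect ratio) is the usual construction, while the function name and the example scale and ratios are illustrative assumptions rather than the released configuration.

```python
import math

def default_boxes(fmap_size, scales, ratios):
    """Return (cx, cy, w, h) default boxes for a square feature map.

    One box per (cell, scale, aspect ratio); all values are normalized
    to the image size.
    """
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Center of the current feature-map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for scale in scales:
                for ratio in ratios:
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    boxes.append((cx, cy, w, h))
    return boxes

if __name__ == "__main__":
    boxes = default_boxes(fmap_size=2, scales=[0.5], ratios=[1.0, 2.0])
    print(len(boxes), boxes[0])
```

Every ratio preserves the box area at a given scale, so coarse feature maps with large scales cover large objects and fine maps with small scales cover small ones.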

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren , Kaiming He , Ross Girshick , Jian Sun

Category:

2015-06-04

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features. Using the recently popular terminology of neural networks with "attention" mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

Automated Defect Recognition of Castings defects using Neural Networks

Alberto García-Pérez , María José Gómez-Silva , Arturo de la Escalera

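Category: Computer Vision

2022-09-06

Industrial X-ray analysis is common in the aerospace, automotive, and nuclear industries, where the structural integrity of certain parts must be guaranteed. However, interpreting radiographic images is sometimes difficult and can lead two experts to disagree on the classification of a defect. The Automated Defect Recognition (ADR) system presented here will reduce analysis time and help reduce the subjective interpretation of defects, while increasing the reliability of the human inspector. Our Convolutional Neural Network (CNN) model achieves 94.2% accuracy (mAP@IoU = 50%) when applied to an automotive aluminium castings dataset (GDXray), which is considered similar to expected human performance and exceeds the current state of the art for this dataset. In an industrial environment, its inference time is less than [...] per DICOM image, so it can be installed in production facilities without affecting delivery time. In addition, an ablation study of the main hyperparameters was conducted to optimize model accuracy from the initial baseline of 75% mAP up to 94.2% mAP.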

Small Object Detection using Deep Learning

Aleena Ajaz , Ayesha Salar , Tauseef Jamal , Asif Ullah Khan

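Category: Computer Vision | Machine Learning

2022-01-10

Nowadays, UAVs such as drones are widely used for purposes such as image capture and target detection from aerial imagery. Easy public access to these small aerial vehicles can pose serious security threats. For instance, critical sites may be monitored by spies who blend in with the public and use drones. This study proposes an improved and efficient deep learning-based autonomous system that can detect and track very small drones with high precision. The proposed system is built on a custom deep learning model, Tiny YOLOv3, one of the variants of the very fast object detection model You Only Look Once (YOLO), which is used for detection. The object detection algorithm detects the drones efficiently. The proposed architecture shows significantly better performance than the previous YOLO version. Improvements are observed in resource usage and time complexity. Performance is measured using recall and precision, which are 93% and 91% respectively.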
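
For readers less familiar with the two reported metrics, the sketch below computes precision and recall from detection counts. The counts in the usage example are made up for illustration and are not the paper's data.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    detected = true_pos + false_pos     # everything the detector reported
    actual = true_pos + false_neg       # everything that was really there
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

if __name__ == "__main__":
    # Hypothetical counts: 8 drones found, 2 false alarms, 2 drones missed.
    print(precision_recall(true_pos=8, false_pos=2, false_neg=2))
```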

Towards Large-Scale Small Object Detection: Survey and Benchmarks

Gong Cheng , Xiang Yuan , Xiwen Yao , Kebing Yan , Qinghua Zeng , Junwei Han

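Category: Computer Vision

2022-07-28

With the rise of deep convolutional neural networks, object detection has made prominent advances in the past few years. However, this prosperity cannot hide the unsatisfactory state of Small Object Detection (SOD), one of the notoriously challenging tasks in computer vision, owing to the poor visual appearance and noisy representation caused by the intrinsic structure of small targets. In addition, large-scale datasets for benchmarking small object detection methods remain a bottleneck. In this paper, we first conduct a thorough review of small object detection. Then, to catalyze the development of SOD, we construct two large-scale Small Object Detection dAtasets (SODA), SODA-D and SODA-A, which focus on driving and aerial scenarios respectively. SODA-D includes 24704 high-quality traffic images and 277596 instances in 9 categories. For SODA-A, we collect 2510 high-resolution aerial images and annotate 800203 instances over 9 classes. To our knowledge, the proposed datasets are the first attempt at large-scale benchmarks with a vast collection of annotated instances tailored for multi-category SOD. Finally, we evaluate the performance of mainstream methods on SODA. We expect the released benchmarks to facilitate the development of SOD and spawn more breakthroughs in this field. Datasets and code will be available soon at: https://shaunyuan22.github.io/soda.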

Center and Scale Prediction: Anchor-free Approach for Pedestrian and Face Detection

Wei Liu , Irtiza Hasan , Shengcai Liao

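Category: Computer Vision

2019-04-05

Object detection generally requires sliding-window classifiers in traditional approaches or anchor box-based predictions in modern deep learning approaches. However, either of these requires tedious configuration of boxes. In this paper, we provide a new perspective in which object detection is motivated as a high-level semantic feature detection task. Like edge, corner, blob, and other feature detectors, the proposed detector scans the whole image for feature points, a task for which convolution is naturally suited. Unlike these traditional low-level features, however, the proposed detector targets a higher level of abstraction: we look for central points where objects are present, and modern deep models are already capable of such high-level semantic abstraction. In addition, as in blob detection, we also predict the scales of the central points, which is likewise a straightforward convolution. Therefore, in this paper, pedestrian and face detection is simplified to a straightforward center and scale prediction task through convolutions. In this way, the proposed method enjoys a box-free setting. Though structurally simple, it achieves competitive accuracy on several challenging benchmarks, including pedestrian detection and face detection. Furthermore, a cross-dataset evaluation is performed, demonstrating the superior generalization ability of the proposed method. Code and models are available at https://github.com/liuwei16/csp and https://github.com/hasanirtiza/pedestron.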
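
A center-plus-scale prediction can be turned into boxes with very little code, as the sketch below shows. Everything specific in it is an assumption for illustration: the scale map is taken to store the log of the box height, the width is derived from a fixed aspect ratio (`aspect=0.41`, a value often used for pedestrians), and the feature stride is 4. The function `decode_centers` is hypothetical, not the released implementation.

```python
import math

def decode_centers(heatmap, log_height, stride=4, score_thr=0.5, aspect=0.41):
    """Convert a center heatmap and a scale map into (x1, y1, x2, y2, score)."""
    boxes = []
    for row, scores in enumerate(heatmap):
        for col, score in enumerate(scores):
            if score < score_thr:
                continue                      # not a confident center point
            height = math.exp(log_height[row][col])
            width = aspect * height           # assumed fixed aspect ratio
            cx = (col + 0.5) * stride         # map the cell back to image pixels
            cy = (row + 0.5) * stride
            boxes.append((cx - width / 2, cy - height / 2,
                          cx + width / 2, cy + height / 2, score))
    return boxes

if __name__ == "__main__":
    heat = [[0.1, 0.9], [0.2, 0.3]]
    scale = [[0.0, 0.0], [0.0, 0.0]]
    print(decode_centers(heat, scale))
```

No anchor shapes or IoU thresholds appear anywhere in the decoding, which is the "box-free setting" the abstract refers to.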

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

Benjamin Kiefer , Matej Kristan , Janez Perš , Lojze Žust , Fabio Poiesi , Fabio Augusto de Alcantara Andrade , Alexandre Bernardino , Matthew Dawkins , Jenni Raitoharju , Yitong Quan

Category: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2022-11-24

The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.

Situation Awareness for Automated Surgical Check-listing in AI-Assisted Operating Room

Tochukwu Onyeogulu , Amirul Islam , Salman Khan , Izzeddin Teeti , Fabio Cuzzolin

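Category: Computer Vision

2022-09-12

Nowadays, more surgical procedures are performed using minimally invasive surgery (MIS). This is due to its many benefits, such as minimal post-operative problems, less bleeding, smaller scars, and rapid recovery. However, the restricted field of view in MIS, the small operating space, and the indirect view of the operating scene can lead to surgical tools colliding and potentially harming human organs or tissue. Therefore, detecting and monitoring surgical instruments in real time from the endoscopic video feed can considerably reduce MIS problems and increase the accuracy and success rate of surgical procedures. In this paper, a set of improvements to the YOLOv5 object detector aimed at enhancing the detection of surgical instruments is investigated, analyzed, and evaluated. In doing so, we performed performance-based ablation studies, explored the impact of altering the backbone, neck, and anchor structural elements of the YOLOv5 model, and annotated a unique endoscope dataset. In addition, we compared the results of our ablation studies with four other SOTA object detectors (YOLOv7, YOLOR, Scaled-YOLOv4, and YOLOv3-SPP). Except for YOLOv3-SPP, which had the same model performance of 98.3% mAP and a similar inference speed, all of our benchmark models, including the original YOLOv5, were surpassed by our top refined model in experiments on the new endoscope dataset.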

A Comprehensive Study of Real-Time Object Detection Networks Across Multiple Domains: A Survey

Elahe Arani , Shruthi Gowda , Ratnajit Mukherjee , Omar Magdy , Senthilkumar Kathiresan , Bahram Zonooz

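Category: Computer Vision | Artificial Intelligence

2022-08-23

Deep neural network-based object detectors are continuously evolving and are used in a multitude of applications, each with its own set of requirements. While safety-critical applications need high accuracy and reliability, low-latency tasks need resource- and energy-efficient networks. Real-time detectors, which are a necessity in high-impact real-world applications, are continuously proposed, but they overemphasize improvements in accuracy and speed, while other capabilities such as versatility, robustness, and resource and energy efficiency are omitted. Neither a reference benchmark for existing networks nor a standard evaluation guideline for designing new networks exists, which results in ambiguous and inconsistent comparisons. We therefore conduct a comprehensive study of multiple real-time detectors (anchor-, keypoint-, and transformer-based) on a wide range of datasets and report results on an extensive set of metrics. We also study the impact of variables such as image size, anchor dimensions, confidence thresholds, and architecture layers on overall performance. We analyze the robustness of detection networks against distribution shifts, natural corruptions, and adversarial attacks. In addition, we provide a calibration analysis to gauge the reliability of the predictions. Finally, to highlight real-world impact, we conduct two unique case studies, on autonomous driving and healthcare applications. To further gauge the capability of networks in critical real-time applications, we report performance after deploying the detection networks on edge devices. Our extensive empirical study can serve as a guideline for the industrial community to make an informed choice among existing networks. We also hope to inspire the research community toward a new direction in the design and evaluation of networks, one that focuses on a bigger, holistic overview for far-reaching impact.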

AIParsing: Anchor-free Instance-level Human Parsing

Sanyi Zhang , Xiaochun Cao , Guo-Jun Qi , Zhanjie Song , Jie Zhou

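Category: Computer Vision

2022-07-14

Most state-of-the-art instance-level human parsing models adopt two-stage anchor-based detectors and therefore cannot avoid heuristic anchor box design and the lack of analysis at the pixel level. To address these two issues, we design an instance-level human parsing network that is anchor-free and solvable at the pixel level. It consists of two simple sub-networks: an anchor-free detection head for bounding box prediction and an edge-guided parsing head for human segmentation. The anchor-free detection head inherits the pixel-wise merits and effectively avoids the sensitivity to hyperparameters, as demonstrated in object detection applications. By introducing part-aware boundary clues, the edge-guided parsing head is able to distinguish adjacent human parts from one another, up to [...] within a single human instance, even for overlapping instances. Meanwhile, a refinement head that integrates box-level scores and part-level parsing quality is used to improve the quality of the parsing results. Experiments on two multiple-human parsing datasets (CIHP and LV-MHP-v2.0) and one video instance-level human parsing dataset (VIP) show that our method achieves better global-level and instance-level performance than state-of-the-art one-stage top-down alternatives.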

Feature Pyramid Networks for Object Detection

Tsung-Yi Lin , Piotr Dollár , Ross Girshick , Kaiming He , Bharath Hariharan , Serge Belongie

Category:

2016-12-09

Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 6 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
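
The top-down pathway with lateral connections can be sketched on toy data. For brevity this version works on single-channel 2D lists, uses nearest-neighbor 2x upsampling, and assumes the lateral maps have already been projected to a common channel count, so the convolution layers of a real network are omitted. The helper names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def top_down(laterals):
    """Merge lateral maps ordered fine -> coarse into pyramid levels.

    Each map must be half the size of the previous one. The coarsest
    level is passed through; every finer level is its lateral map plus
    the upsampled level above it.
    """
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        up = upsample2x(merged[0])
        fused = [[a + b for a, b in zip(lat_row, up_row)]
                 for lat_row, up_row in zip(lateral, up)]
        merged.insert(0, fused)
    return merged

if __name__ == "__main__":
    c3 = [[1] * 4 for _ in range(4)]
    c4 = [[10] * 2 for _ in range(2)]
    c5 = [[100]]
    for level in top_down([c3, c4, c5]):
        print(level)
```

In the toy run the finest level ends up carrying contributions from all three inputs, which is the sense in which semantic information from coarse maps reaches every scale.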
