Unlike ordinary computer vision tasks, image manipulation detection focuses more on the semantic content of images and on the subtle traces left by manipulation. In this paper, the noise image extracted by an improved constrained convolution, rather than the original image, is used as the model input so as to capture more subtle manipulation traces. Meanwhile, a dual-branch network consisting of a high-resolution branch and a context branch is used to capture artifact traces as fully as possible. In general, most manipulations leave artifacts along the manipulated edges, so a specially designed manipulation edge detection module is built on top of the dual-branch network to better identify these artifacts. The correlation between pixels in an image is closely related to their distance: the farther apart two pixels are, the weaker their correlation. We therefore add a distance factor to the self-attention module to better model the correlation between pixels. Experimental results on four public image manipulation datasets demonstrate the effectiveness of our model.
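The constrained-convolution idea referenced above can be sketched as follows. This is a minimal, hypothetical PyTorch version in the spirit of Bayar-style constrained layers (the paper's "improved" variant is not specified here): each kernel is re-projected so that its center tap is -1 and the remaining taps sum to 1, turning the layer into a learnable prediction-error (high-pass) filter that suppresses image content and exposes noise residuals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConstrainedConv2d(nn.Module):
    """Bayar-style constrained convolution (sketch): each filter acts as a
    learnable prediction-error filter, so its output is a noise residual
    rather than image content."""
    def __init__(self, in_ch=3, out_ch=3, k=5):
        super().__init__()
        self.center = k // 2
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)

    def forward(self, x):
        # Re-project the weights onto the constraint set before every pass:
        # surrounding taps sum to 1, center tap is fixed at -1.
        with torch.no_grad():
            self.weight[:, :, self.center, self.center] = 0
            self.weight /= self.weight.sum(dim=(2, 3), keepdim=True)
            self.weight[:, :, self.center, self.center] = -1
        return F.conv2d(x, self.weight, padding=self.center)

noise = ConstrainedConv2d()(torch.randn(1, 3, 256, 256))  # noise view of an image
```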
With the rapid advances of image editing techniques in recent years, image manipulation detection has attracted considerable attention owing to the increasing security risks posed by tampered images. To address these challenges, a novel multi-scale multi-grained deep network (MSMG-Net) is proposed to automatically identify manipulated regions. In our MSMG-Net, a parallel multi-scale feature extraction structure is used to extract multi-scale features. Multi-grained feature learning is then utilized to perceive object-level semantic relations among the multi-scale features by introducing shunted self-attention. To fuse multi-scale multi-grained features, global and local feature fusion blocks are designed for manipulated region segmentation in a bottom-up manner, and a multi-level feature aggregation block is designed for edge artifact detection in a top-down manner. Thus, MSMG-Net can effectively perceive object-level semantics and encode edge artifacts. Experimental results on five benchmark datasets justify the superior performance of the proposed method, which outperforms state-of-the-art manipulation detection and localization methods. Extensive ablation experiments and feature visualizations demonstrate that multi-scale multi-grained learning yields effective visual representations of manipulated regions. In addition, MSMG-Net shows better robustness when images are further perturbed by various post-processing methods.
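For readers unfamiliar with shunted self-attention, the following minimal sketch conveys the idea under stated assumptions (average pooling for token aggregation and two head groups; the published module uses strided convolutions and extra projections, and MSMG-Net's exact configuration is not given here): head groups attend over keys/values aggregated at different rates, so a single layer mixes fine and coarse granularities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShuntedSelfAttention(nn.Module):
    """Sketch: heads are split into groups, and each group attends over
    keys/values pooled at its own rate, mixing token granularities."""
    def __init__(self, dim=64, heads=4, rates=(1, 2)):
        super().__init__()
        assert heads % len(rates) == 0 and dim % heads == 0
        self.heads, self.rates = heads, rates
        self.head_dim = dim // heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, h, w):                      # x: (B, N, C), N = h*w
        b, n, c = x.shape
        q = self.q(x).view(b, n, self.heads, self.head_dim).transpose(1, 2)
        hpg = self.heads // len(self.rates)          # heads per rate group
        outs = []
        for i, r in enumerate(self.rates):
            feat = x.transpose(1, 2).view(b, c, h, w)
            if r > 1:                                # coarser tokens (h, w divisible by r)
                feat = F.avg_pool2d(feat, r)
            feat = feat.flatten(2).transpose(1, 2)   # (B, N/r^2, C)
            kv = self.kv(feat).view(b, -1, 2, self.heads, self.head_dim)
            k = kv[:, :, 0].transpose(1, 2)[:, i*hpg:(i+1)*hpg]
            v = kv[:, :, 1].transpose(1, 2)[:, i*hpg:(i+1)*hpg]
            qi = q[:, i*hpg:(i+1)*hpg]
            attn = torch.softmax(qi @ k.transpose(-2, -1) / self.head_dim**0.5, -1)
            outs.append(attn @ v)                    # (B, hpg, N, head_dim)
        out = torch.cat(outs, 1).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)

y = ShuntedSelfAttention()(torch.randn(2, 16*16, 64), 16, 16)  # (2, 256, 64)
```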
The key research question in image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, yet specific enough to prevent false alarms on authentic images. Current research emphasizes the sensitivity, while the specificity is mostly ignored. In this paper, we address both aspects by multi-view feature learning and multi-scale supervision. By exploiting the noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images, which are nontrivial to take into account for prior art that relies on a semantic segmentation loss. Our ideas are realized by a new network we term MVSS-Net and its enhanced version MVSS-Net++. Comprehensive experiments on six public benchmark datasets justify the viability of the MVSS-Net series for both pixel-level and image-level manipulation detection.
Image manipulation localization aims at distinguishing forged regions from the whole test image. Although many outstanding prior works have been proposed for this task, two issues still need further study: 1) how to fuse diverse types of features carrying forgery clues; 2) how to progressively integrate multistage features for better localization performance. In this paper, we propose a tripartite progressive integration network (TriPINet) for end-to-end image manipulation localization. First, we extract both visually perceptible information, e.g., RGB input images, and visually imperceptible features, e.g., frequency and noise traces, for forensic feature learning. Second, we develop a guided cross-modality dual-attention (gCMDA) module to fuse the different types of forgery clues. Third, we design a set of progressive integration squeeze-and-excitation (PI-SE) modules that improve localization performance by appropriately incorporating multiscale features in the decoder. Extensive experiments are conducted to compare our method with state-of-the-art image forensics approaches. The proposed TriPINet obtains competitive results on several benchmark datasets.
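The PI-SE modules build on the standard squeeze-and-excitation mechanism. As a point of reference, here is a minimal sketch of a vanilla SE block (not the paper's exact module): channels are reweighted by gates computed from a globally pooled descriptor.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation: global-average-pool to a channel
    descriptor, then gate channels with a two-layer bottleneck MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                             # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))                        # squeeze
        g = self.fc(s).unsqueeze(-1).unsqueeze(-1)    # per-channel gates
        return x * g                                  # reweight channels

out = SEBlock(64)(torch.randn(2, 64, 32, 32))
```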
Forensic analysis depends on identifying hidden traces in manipulated images. Traditional neural networks fail at this task because they cannot handle feature attenuation and rely on dominant spatial features. In this work, we propose a novel Gated Context Attention Network (GCA-Net) that utilizes a non-local attention block for global context learning. In addition, we utilize the gated attention mechanism in conjunction with a dense decoder network to direct the flow of relevant features during the decoding phase, allowing precise localization. The proposed attention framework lets the network focus on relevant regions by filtering out coarse features. Furthermore, by utilizing multi-scale feature fusion and efficient learning strategies, GCA-Net can better handle the scale variation of manipulated regions. We show that our method outperforms state-of-the-art networks by an average of 4.2%-5.4% AUC on multiple benchmark datasets. Finally, we conduct extensive ablation experiments to demonstrate the method's robustness for image forensics.
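GCA-Net's global-context learning builds on the standard non-local attention block. A minimal embedded-Gaussian sketch is below (the gating and dense decoder of GCA-Net are omitted; sizes are illustrative): every spatial position aggregates features from all other positions.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Minimal embedded-Gaussian non-local block: each position attends
    to all H*W positions, injecting global context into local features."""
    def __init__(self, channels, inter=None):
        super().__init__()
        inter = inter or channels // 2
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x):                                # (B, C, H, W)
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)     # (B, HW, C')
        k = self.phi(x).flatten(2)                       # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)         # (B, HW, C')
        attn = torch.softmax(q @ k, dim=-1)              # (B, HW, HW)
        y = (attn @ v).transpose(1, 2).view(b, -1, h, w)
        return x + self.out(y)                           # residual connection

out = NonLocalBlock(64)(torch.randn(1, 64, 32, 32))
```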
Most existing deep learning-based methods for vessel segmentation neglect two important aspects of retinal vessels: one is the orientation information of vessels, and the other is the contextual information of the whole fundus region. In this paper, we propose a robust orientation and context entangled network (termed OCE-Net), which has the capability of extracting the complex orientation and context information of blood vessels. To achieve complex orientation awareness, a dynamic complex orientation-aware convolution (DCOA Conv) is proposed to extract complex vessels with multiple orientations and thereby improve vessel continuity. To simultaneously capture global context information and emphasize important local information, a global and local fusion module (GLFM) is developed to model the long-range dependencies of vessels while paying sufficient attention to thin local vessels. A novel orientation and context entangled non-local (OCE-NL) module is proposed to entangle orientation and context information. In addition, an unbalanced attention refining module (UARM) is proposed to handle the unbalanced pixel numbers of the background, thick vessels, and thin vessels. Extensive experiments were conducted on several commonly used datasets (DRIVE, STARE, and CHASEDB1) and some more challenging datasets (AV-WIDE, UoA-DR, RFMiD, and UK Biobank). Ablation studies show that the proposed method achieves promising performance in maintaining the continuity of thin vessels, and comparative experiments show that our OCE-Net can achieve state-of-the-art performance on retinal vessel segmentation.
Most existing object detection methods produce poor glass detection results, because transparent glass shares its appearance with whatever objects lie behind it in an image. Different from the traditional deep-learning-based wisdom that simply uses object boundaries as auxiliary supervision, we exploit label decoupling to decompose the original labeled ground-truth (GT) map into an interior-diffusion map and a boundary-diffusion map. The GT map, in collaboration with the two newly generated maps, breaks the imbalanced distribution of object boundaries, leading to improved glass detection quality. We make three key contributions to solving the transparent glass detection problem: (1) we propose a three-stream neural network (called GlassNet for short) to fully absorb beneficial features from the three maps; (2) we design a multi-scale interactive dilation module to explore a wider range of contextual information; (3) we develop an attention-based boundary-aware feature mosaic module to integrate multi-modal information. Extensive experiments on the benchmark dataset show clear improvements of our method over the SOTA in terms of both overall glass detection accuracy and boundary sharpness.
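One plausible construction of the two diffusion maps is a distance-transform split of the binary GT mask, sketched below. This is a hypothetical recipe in the spirit of label-decoupling approaches, not necessarily the paper's exact formula: the interior map peaks deep inside the object, while the boundary map diffuses outward from the object contour.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def decouple_label(gt, sigma=10.0):
    """Split a non-empty binary GT mask into an interior-diffusion map and
    a boundary-diffusion map (hypothetical sketch)."""
    gt = gt.astype(bool)
    # Unsigned distance of every pixel to the object boundary.
    dist = np.where(gt, distance_transform_edt(gt),
                        distance_transform_edt(~gt))
    interior = np.where(gt, dist / (dist[gt].max() + 1e-8), 0.0)  # strong inside
    boundary = np.exp(-dist / sigma)                              # strong at the contour
    return interior.astype(np.float32), boundary.astype(np.float32)

mask = np.zeros((64, 64)); mask[16:48, 16:48] = 1
interior_map, boundary_map = decouple_label(mask)
```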
Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) accounting for the differences in contribution between different levels of features; 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the influence of image acquisition and the elusive properties of polyps, we introduce three novel modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features, while the CIM is applied to capture polyp information disguised in low-level features. With the help of the SAM, the pixel features of the polyp area with high-level semantic position information are extended to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capability. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects) than existing methods and achieves new state-of-the-art performance. The proposed model is available at https://github.com/dengpingfan/polyp-pvt.
To defend against the manipulation of image content (e.g., splicing, copy-move, and removal), we develop a progressive spatio-channel correlation network (PSCC-Net) to detect and localize image manipulations. PSCC-Net processes the image in a two-path procedure: a top-down path that extracts local and global features, and a bottom-up path that detects whether the input image is manipulated and estimates its manipulation masks at multiple scales, where each mask is conditioned on the previous one. Different from conventional encoder-decoder and no-pooling structures, PSCC-Net leverages features at different scales with dense cross-connections to produce manipulation masks in a coarse-to-fine fashion. Moreover, a spatio-channel correlation module (SCCM) captures both spatial and channel-wise correlations in the bottom-up path, which endows features with holistic cues, enabling the network to cope with a wide range of manipulation attacks. Thanks to its lightweight backbone and progressive mechanism, PSCC-Net can process 1,080p images at 50+ FPS. Extensive experiments demonstrate the superiority of PSCC-Net over state-of-the-art methods in both detection and localization.
Deep learning has been widely used for medical image segmentation, and a large number of papers have documented its success in this field. In this paper, we present a comprehensive thematic survey of medical image segmentation using deep learning techniques. This paper makes two original contributions. First, unlike traditional surveys, which directly divide the literature on deep learning for medical image segmentation into groups and introduce each group in detail, we classify the currently popular literature according to a multi-level structure from coarse to fine. Second, this paper focuses on supervised and weakly supervised learning methods and excludes unsupervised methods, since they have been covered in many older surveys and are currently not popular. For supervised learning methods, we analyze the literature from three aspects: the selection of backbone networks, the design of network blocks, and the improvement of loss functions. For weakly supervised learning methods, we survey the literature according to data augmentation, transfer learning, and interactive segmentation. Compared with existing surveys, this survey classifies the literature at a different granularity, making it easier for readers to understand the relevant rationales and guiding them toward appropriate improvements for deep learning-based medical image segmentation.
Dunhuang murals are a blend of Chinese and ethnic styles, forming a self-contained school of Chinese-style Buddhist art. They have very high historical and cultural value and research significance. In particular, the lines of Dunhuang murals are highly general and expressive, reflecting the characters' distinctive personalities and complex inner emotions. Therefore, outline drawings of the murals are of great significance to the study of Dunhuang culture. The contour generation of Dunhuang murals belongs to image edge detection, an important branch of computer vision that aims to extract salient contour information from images. Convolution-based deep learning networks have achieved good results in image edge extraction by exploring the contextual and semantic features of images. However, as the receptive field enlarges, some local detail information is lost, which makes it impossible for them to generate reasonable outline drawings of murals. In this paper, we propose a novel edge detector based on self-attention combined with convolution to generate line drawings of Dunhuang murals. Compared with existing edge detection methods, firstly, a new residual self-attention and convolution mixed module (Ramix) is proposed to fuse local and global features in feature maps. Secondly, a novel densely connected backbone extraction network is designed to efficiently propagate rich edge feature information from shallow layers into deep layers. Experiments on different public datasets show that our method is able to generate sharper and richer edge maps than existing methods. In addition, testing on the Dunhuang mural dataset shows that our method can achieve very competitive performance.
Water extraction using deep learning requires precise pixel-level labels, yet labeling high-resolution remote sensing images at the pixel level is very difficult. We therefore study how to utilize point labels to extract water bodies and propose a novel method called the neighbor feature aggregation network (NFANet). Compared with pixel-level labels, point labels are much easier to obtain, but they lose a great deal of information. In this paper, we take advantage of the similarity between adjacent pixels of a local water body and propose a neighbor sampler to resample the remote sensing images; the sampled images are then sent to the network for feature aggregation. In addition, we use an improved recursive training algorithm to further improve the extraction accuracy, making the water boundaries more natural. Furthermore, our method utilizes neighboring features, rather than global or local ones, to learn more representative features. Experimental results show that the proposed NFANet method not only outperforms other studied weakly supervised methods, but also obtains results similar to the state of the art.
Contextual information is vital in visual understanding problems, such as semantic segmentation and object detection. We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture full-image dependencies. Besides, a category consistent loss is proposed to enforce the criss-cross attention module to produce more discriminative features. Overall, CCNet has the following merits: 1) GPU memory friendly. Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11× less GPU memory. 2) High computational efficiency. The recurrent criss-cross attention reduces FLOPs by about 85% compared with the non-local block. 3) State-of-the-art performance. We conduct extensive experiments on semantic segmentation benchmarks including Cityscapes and ADE20K, the human parsing benchmark LIP, the instance segmentation benchmark COCO, and the video segmentation benchmark CamVid. In particular, our CCNet achieves mIoU scores of 81.9%, 45.76% and 55.47% on the Cityscapes test set, the ADE20K validation set and the LIP validation set respectively, which are new state-of-the-art results. The source code is available at https://github.com/speedinghzl/CCNet.
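The following is a minimal sketch of a single criss-cross attention step, assuming a joint row-and-column softmax (the released implementation additionally masks the center pixel so it is not counted twice, and runs the module recurrently so that two passes propagate full-image context):

```python
import torch
import torch.nn as nn

class CrissCrossAttention(nn.Module):
    """Sketch: each pixel attends only to the H+W pixels on its own row
    and column, instead of all H*W pixels as in a full non-local block."""
    def __init__(self, channels, inter=None):
        super().__init__()
        inter = inter or channels // 8
        self.q = nn.Conv2d(channels, inter, 1)
        self.k = nn.Conv2d(channels, inter, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))    # learnable residual gate

    def forward(self, x):                                   # (B, C, H, W)
        b, c, h, w = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Row scores: each pixel vs. all columns of its row -> (B, H, W, W).
        e_row = torch.matmul(q.permute(0, 2, 3, 1), k.permute(0, 2, 1, 3))
        # Column scores: each pixel vs. all rows of its column -> (B, H, W, H).
        e_col = torch.matmul(q.permute(0, 3, 2, 1),
                             k.permute(0, 3, 1, 2)).permute(0, 2, 1, 3)
        # Joint softmax over the criss-cross path (center counted twice here;
        # the official code masks one copy with -inf).
        attn = torch.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = attn[..., :w], attn[..., w:]
        out_row = torch.matmul(a_row, v.permute(0, 2, 3, 1))       # (B, H, W, C)
        out_col = torch.matmul(a_col.permute(0, 2, 1, 3),
                               v.permute(0, 3, 2, 1)).permute(0, 2, 1, 3)
        out = (out_row + out_col).permute(0, 3, 1, 2)              # (B, C, H, W)
        return x + self.gamma * out

out = CrissCrossAttention(64)(torch.randn(1, 64, 24, 32))
```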
Ultra-high resolution image segmentation has attracted increasing interest in recent years due to its realistic applications. In this paper, we innovate on the widely used high-resolution image segmentation pipeline, in which an ultra-high resolution image is partitioned into regular patches for local segmentation and the local results are then merged into a high-resolution semantic mask. In particular, we introduce a novel locality-aware context fusion based segmentation model to process local patches, where the relevance between a local patch and its various contexts is jointly and complementarily utilized to handle semantic regions with large variations. Additionally, we present an alternating local enhancement module that restricts the negative impact of redundant information introduced by the contexts, and is thereby able to correct the locality-aware features and produce refined results. Furthermore, in comprehensive experiments, we demonstrate that our model outperforms other state-of-the-art methods on public benchmarks. Our released code is available at: https://github.com/liqiokkk/FCtL.
Semantic segmentation of UAV aerial remote sensing images provides a more efficient and convenient method for traditional surveying and mapping. To keep the model lightweight while improving accuracy, this research develops a new lightweight and efficient network for extracting ground features from UAV aerial remote sensing images, called LDMCNet. This research also develops a powerful lightweight backbone network for the proposed semantic segmentation model, called LDCNet, which we hope can become the backbone of a new generation of lightweight semantic segmentation algorithms. The proposed model uses dual multi-scale context modules, namely the Atrous Spatial Pyramid Pooling (ASPP) module and the Object Context Representation (OCR) module. In addition, this research constructs a private dataset for the semantic segmentation of aerial remote sensing images from drones, containing 2,431 training images, 945 validation images, and 475 test images. The proposed model performs well on this dataset, with only 1.4M parameters and 5.48G floating-point operations (FLOPs), achieving a mean intersection-over-union (mIoU) of 71.12%, 7.88% higher than the baseline model. To verify the effectiveness of the proposed model, training on the public datasets "LoveDA" and "CITY-OSM" also achieved excellent results, with mIoU of 65.27% and 74.39%, respectively.
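As background, the ASPP module mentioned above is a standard multi-scale context block. A minimal sketch (rates and channel counts are the common DeepLab defaults, not necessarily LDMCNet's): parallel dilated convolutions sample context at several rates, an image-level pooling branch adds global context, and a 1×1 convolution fuses the branches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Minimal Atrous Spatial Pyramid Pooling block."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Conv2d(in_ch, out_ch, 1)])
        for r in rates:   # same-resolution context at increasing dilation rates
            self.branches.append(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r))
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.pool(x), size=(h, w),
                          mode='bilinear', align_corners=False)
        return self.fuse(torch.cat(feats + [g], dim=1))

out = ASPP(512)(torch.randn(1, 512, 32, 32))   # (1, 256, 32, 32)
```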
X-ray images play an important role in quality assurance in the manufacturing industry, as they can reflect the internal condition of weld regions. However, the shapes and scales of different defect types vary greatly, which makes it challenging for a model to detect weld defects. In this paper, we propose a convolutional neural network-based weld defect detection method, namely Lighter and Faster YOLO (LF-YOLO). Specifically, a reinforced multiscale feature (RMF) module is designed to implement both parameter-based and parameter-free multi-scale information extraction operations. RMF enables the extracted feature maps to represent richer information, which is achieved by a superior hierarchical fusion structure. To improve the performance of the detection network, we propose an efficient feature extraction (EFE) module. EFE processes input data with extremely low computational cost and improves the practicality of the whole network in real industry. Experimental results show that our weld defect detection network achieves a satisfying balance between performance and consumption, reaching 92.9 mean average precision (mAP50) at 61.5 frames per second (FPS). To further demonstrate the capability of our method, we test it on the public dataset MS COCO, and the results show that our LF-YOLO has excellent versatile detection performance. The code is available at https://github.com/lmomoy/lf-yolo.
The acquisition and evaluation of pavement surface data play an essential role in pavement condition assessment. In this paper, an efficient end-to-end network for automatic pavement crack segmentation, called RHA-Net, is proposed to improve pavement crack segmentation accuracy. RHA-Net is built by integrating residual blocks (ResBlocks) and hybrid attention blocks into an encoder-decoder architecture. The ResBlocks are used to improve the ability of RHA-Net to extract high-level abstract features. The hybrid attention blocks are designed to fuse low-level and high-level features to help the model focus on the correct channels and crack regions, thereby improving the feature representation capability of RHA-Net. An image dataset containing 789 pavement crack images collected by a self-designed mobile robot is constructed and used for training and evaluating the proposed model. The proposed model achieves better performance than other state-of-the-art networks, and the benefits of adding residual blocks and the hybrid attention mechanism are validated in a comprehensive ablation study. Additionally, a lightweight version of the model, generated by introducing depthwise separable convolutions, achieves better performance and a faster processing speed with only 1/30 of the parameters of U-Net. The developed system can segment pavement cracks in real time on an embedded Jetson TX2 device (25 FPS). A video of the real-time experiments is released at https://youtu.be/3xiogk0fig4.
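The parameter saving in the lightweight variant comes from replacing standard convolutions with depthwise separable ones. A minimal sketch of the substitution (channel counts are illustrative): a per-channel spatial convolution followed by a 1×1 pointwise convolution cuts the parameter count of a k×k layer roughly by a factor of k².

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel spatial) conv followed by a pointwise
    (1x1, channel-mixing) conv -- the standard lightweight replacement
    for a full k x k convolution."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch)      # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)  # mix channels

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

out = DepthwiseSeparableConv(64, 128)(torch.randn(1, 64, 56, 56))
```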
Glass is very common in our daily life. Existing computer vision systems neglect it, which may cause severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass. In this paper, we address the important problem of detecting glass surfaces from a single RGB image. To this end, we construct the first large-scale glass detection dataset (GDD) and propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues within a large field of view via a novel large-field contextual feature integration (LCFI) module and integrates both high-level and low-level boundary features with a boundary feature enhancement (BFE) module. Extensive experiments demonstrate that our GDNet-B achieves satisfying glass detection results on images both within and beyond the GDD test set. We further validate the effectiveness and generalization capability of our proposed GDNet-B by applying it to other vision tasks, including mirror segmentation and salient object detection. Finally, we show potential applications of glass detection and discuss possible future research directions.
Arbitrary-shaped scene text detection is a challenging task due to the variety of text in font, size, color, and orientation. Most existing regression-based methods resort to regressing the masks or contour points of text regions to model text instances. However, regressing complete masks requires high training complexity, and contour points are insufficient to capture the details of highly curved text. To tackle these limitations, we propose a novel lightweight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode text masks as compact vectors. Furthermore, considering the imbalanced number of training samples across pyramid layers, we employ only a single-level head for top-down prediction. To model multi-scale text in the single-level head, we introduce a novel positive sampling strategy that treats the shrunk text region as positive samples, and design a feature awareness module (FAM) for spatial awareness and scale awareness by fusing rich contextual information and focusing on more significant features. Moreover, we propose a segmented non-maximum suppression (S-NMS) method that can filter out low-quality mask regressions. Extensive experiments are conducted on four challenging datasets, demonstrating that our TextDCT achieves competitive performance in both accuracy and efficiency. Specifically, TextDCT achieves an F-measure of 85.1 at 17.2 frames per second (FPS) on CTW1500 and an F-measure of 84.9 at 15.1 FPS on Total-Text.
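The DCT mask encoding can be illustrated with a simple round trip. The sketch below is a simplification under stated assumptions (it keeps the top-left k×k block of coefficients rather than a zigzag-ordered prefix, and assumes a fixed-size mask): because most of a smooth mask's energy lies in low frequencies, a short coefficient vector reconstructs the mask well.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(a):
    return dct(dct(a, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(a):
    return idct(idct(a, axis=0, norm='ortho'), axis=1, norm='ortho')

def encode_mask(mask, k=15):
    """Encode a size x size binary mask as its k*k low-frequency DCT block."""
    return dct2(mask.astype(np.float32))[:k, :k].flatten()

def decode_mask(vec, k=15, size=128):
    """Place the coefficients back, inverse-DCT, and threshold."""
    coeffs = np.zeros((size, size), dtype=np.float32)
    coeffs[:k, :k] = vec.reshape(k, k)
    return (idct2(coeffs) > 0.5).astype(np.uint8)

mask = np.zeros((128, 128)); mask[30:100, 20:110] = 1
vec = encode_mask(mask)            # 225 numbers instead of 16,384 pixels
recon = decode_mask(vec)           # close approximation of the original mask
```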
In recent years, object detection has achieved very large performance improvements, but detection results on small objects are still unsatisfactory. This work proposes a strategy based on feature fusion and dilated convolution, employing dilated convolution to broaden the receptive field of feature maps at various scales to address this issue. On the one hand, this improves the detection accuracy of larger objects; on the other hand, it provides more contextual information for small objects, which benefits their detection accuracy. The shallow semantic information of small objects is obtained by filtering out the noise in the feature map, and the feature information of more small objects is preserved by using a multi-scale fusion feature module and an attention mechanism. Fusing this shallow feature information with deep semantic information generates richer feature maps for small object detection. Experiments show that this method achieves higher accuracy than the traditional YOLOv3 network in detecting small and occluded objects. In addition, we achieve 32.8% mean average precision on small object detection on the MS COCO 2017 test set. For 640×640 input, this method achieves 88.76% mAP on the PASCAL VOC2012 dataset.
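The receptive-field effect of dilated convolution is easy to see in a toy stack (an illustrative sketch, not the paper's exact head; channel counts are assumptions): each 3×3 convolution with dilation r covers a (2r+1)×(2r+1) window, so stacked dilations grow the effective receptive field without downsampling or extra parameters.

```python
import torch
import torch.nn as nn

# Stacked 3x3 convs with growing dilation: receptive field 3 -> 7 -> 15,
# while spatial resolution and parameter count stay the same as for
# ordinary 3x3 convolutions.
branch = nn.Sequential(
    nn.Conv2d(256, 256, 3, padding=1, dilation=1),  # 3x3 field
    nn.Conv2d(256, 256, 3, padding=2, dilation=2),  # effective 7x7
    nn.Conv2d(256, 256, 3, padding=4, dilation=4),  # effective 15x15
)
x = torch.randn(1, 256, 40, 40)
print(branch(x).shape)  # torch.Size([1, 256, 40, 40]) -- resolution preserved
```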