We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. Based on the U-shape architecture, we first build a global guidance module (GGM) upon the bottom-up pathway, aiming at providing layers at different feature levels with the location information of potential salient objects. We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down pathway. By adding FAMs after the fusion operations in the top-down pathway, coarse-level features from the GGM can be seamlessly merged with features at various scales. These two pooling-based modules allow the high-level semantic features to be progressively refined, yielding detail-enriched saliency maps. Experimental results show that our proposed approach locates salient objects more accurately, with sharpened details, and hence substantially improves performance compared with previous state-of-the-art methods. Our approach is also fast, running at more than 30 FPS when processing a 300 × 400 image. Code can be found at http://mmcheng.net/poolnet/.
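As a rough illustration of the pooling-based aggregation idea described above, the following sketch pools a feature map at several rates, upsamples each branch back to the input size, and sums the branches with the input. It is a minimal sketch under stated assumptions, not the authors' FAM: the pooling rates `(2, 4)`, the nearest-neighbour upsampling, and the function names `avg_pool`, `upsample_nearest` and `aggregate` are all hypothetical, and the learned convolutions a real module would apply are omitted.

```python
def avg_pool(feat, k):
    """Average-pool a 2D list with a k x k window and stride k.

    Assumes the height and width of `feat` are divisible by k.
    """
    h, w = len(feat), len(feat[0])
    return [
        [
            sum(feat[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def upsample_nearest(feat, k):
    """Nearest-neighbour upsampling of a 2D list by an integer factor k."""
    h, w = len(feat), len(feat[0])
    return [[feat[i // k][j // k] for j in range(w * k)] for i in range(h * k)]


def aggregate(feat, rates=(2, 4)):
    """Sum the input with pooled-then-upsampled copies of itself.

    `rates` is an illustrative assumption, not a value taken from the abstract.
    """
    out = [row[:] for row in feat]  # start from a copy of the input branch
    for k in rates:
        branch = upsample_nearest(avg_pool(feat, k), k)
        for i, row in enumerate(branch):
            for j, value in enumerate(row):
                out[i][j] += value
    return out


if __name__ == "__main__":
    feature_map = [[0.0] * 4 for _ in range(4)]
    feature_map[0][0] = 16.0  # a single strong activation
    for row in aggregate(feature_map):
        print(row)
```

Each pooled branch spreads the single activation over a progressively larger neighbourhood, which is the intuition behind mixing coarse context into fine-resolution features.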
translated by 谷歌翻译
Fully convolutional neural networks (FCNs) have shown their advantages in the salient object detection task. However, most existing FCN-based methods still suffer from coarse object boundaries. In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. Accordingly, we present an edge guidance network (EGNet) for salient object detection, which models these two kinds of complementary information simultaneously in a single network in three steps. In the first step, we extract the salient object features by progressive fusion. In the second step, we integrate the local edge information and global location information to obtain the salient edge features. Finally, to sufficiently leverage these complementary features, we couple the same salient edge features with salient object features at various resolutions. Benefiting from the rich edge and location information in the salient edge features, the fused features help locate salient objects, and especially their boundaries, more accurately. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art methods on six widely used datasets without any pre-processing or post-processing. The source code is available at http://mmcheng.net/egnet/.
Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Recently developed semantic segmentation and salient object detection algorithms have mostly been based on Fully Convolutional Neural Networks (FCNs). There is still considerable room for improvement over generic FCN models that do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data in performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons.
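Salient object detection plays an important role in many downstream tasks. However, complex real-world scenes with varying scales and numbers of salient objects still pose a challenge. In this paper, we directly address the problem of detecting multiple salient objects in complex scenes. We propose a network architecture that incorporates non-local feature information in both the spatial and channel spaces, capturing long-range dependencies between separate objects. Traditional bottom-up and non-local features are combined with edge features in a feature fusion gate that progressively refines the salient object prediction in the decoder. We show that our approach accurately locates multiple salient regions even in complex scenarios. To demonstrate the efficacy of our approach on the multiple-salient-objects problem, we curate a new dataset containing only images with multiple salient objects. Our experiments show that the proposed method achieves state-of-the-art results on five widely used datasets without any pre-processing or post-processing. We obtain further performance improvements over competing techniques on our multi-object dataset. The dataset and source code are available at: https://github.com/ericdengbowen/dslrdnet.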
Deep Convolutional Neural Networks have been adopted for salient object detection and have achieved state-of-the-art performance. Most previous works, however, focus on region accuracy rather than boundary quality. In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. Specifically, the architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module, which are respectively in charge of saliency prediction and saliency map refinement. The hybrid loss guides the network to learn the transformation between the input image and the ground truth in a three-level hierarchy (pixel-, patch- and map-level) by fusing Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses. Equipped with the hybrid loss, the proposed predict-refine architecture is able to effectively segment the salient object regions and accurately predict the fine structures with clear boundaries. Experimental results on six public datasets show that our method outperforms the state-of-the-art methods in terms of both regional and boundary evaluation measures. Our method runs at over 25 fps on a single GPU. The code is available at: https://github.com/NathanUA/BASNet.
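The hybrid loss described above can be sketched as a sum of three terms. This is an illustrative sketch under assumptions, not the BASNet implementation: the three terms are weighted equally, the SSIM term is computed over the whole map as a single window rather than over local patches, the constants `c1` and `c2` are the commonly used SSIM defaults for values in [0, 1], and the maps are flattened into plain Python lists. All function names are hypothetical.

```python
import math

EPS = 1e-7  # small constant to avoid log(0) and division by zero


def bce_loss(pred, gt):
    """Pixel-level term: mean binary cross entropy over all pixels."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, EPS), 1.0 - EPS)  # clip probabilities away from 0 and 1
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total / len(pred)


def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """Structural term: 1 - SSIM, simplified to one window covering the map."""
    n = len(pred)
    mu_p = sum(pred) / n
    mu_g = sum(gt) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_g = sum((g - mu_g) ** 2 for g in gt) / n
    cov = sum((p - mu_p) * (g - mu_g) for p, g in zip(pred, gt)) / n
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2)
    )
    return 1.0 - ssim


def iou_loss(pred, gt):
    """Map-level term: 1 - soft intersection-over-union."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(p + g - p * g for p, g in zip(pred, gt))
    return 1.0 - inter / (union + EPS)


def hybrid_loss(pred, gt):
    """Equal-weight sum of the three terms (the weighting is an assumption)."""
    return bce_loss(pred, gt) + ssim_loss(pred, gt) + iou_loss(pred, gt)


if __name__ == "__main__":
    ground_truth = [1.0, 0.0, 1.0, 0.0]
    print(hybrid_loss([0.9, 0.1, 0.8, 0.2], ground_truth))
    print(hybrid_loss([0.5, 0.5, 0.5, 0.5], ground_truth))
```

A confident, correct prediction drives all three terms toward zero, while an uninformative prediction is penalized by each term for a different reason: per-pixel error, missing structure, and poor region overlap.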
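In current salient object detection networks, the most popular approach is the U-shaped structure. However, its massive number of parameters consumes more computing and storage resources, which makes deployment on memory-limited devices infeasible. Some shallower networks do not maintain the same accuracy as the U-shaped structure, and deeper networks with more parameters do not converge quickly to a global minimum loss. To overcome all of these shortcomings, we propose a new deep convolutional network architecture with three contributions: (1) using smaller convolutional neural networks (CNNs) to compress the model in our improved salient object features compression and reinforcement extraction module (ISFCREM), reducing the number of model parameters; (2) introducing a channel attention mechanism in ISFCREM to weigh different channels and improve the capability of feature representation; (3) applying a new optimizer that accumulates long-term gradient information during training to adaptively tune the learning rate. The results show that the proposed method can compress the model to nearly 1/3 of its original size without losing accuracy, while converging faster than other models on six widely used salient object detection datasets. Our code is available at https://gitee.com/binzhangbinzhangbin/code-a-novel-tentent-based-network-for-fast-salient-object-detection.git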
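To make the channel attention idea in contribution (2) concrete, here is a minimal sketch in the squeeze-and-excitation style: each channel is summarized by its global average, a small linear layer with a sigmoid produces one gate per channel, and each channel is rescaled by its gate. This is an assumption about the general mechanism, not the ISFCREM design; the function name `channel_attention` and the parameters `weights` and `biases` are hypothetical stand-ins for learned values.

```python
import math


def channel_attention(features, weights, biases):
    """Rescale each channel by a gate computed from global channel statistics.

    features: list of C channels, each a 2D list of floats.
    weights:  C x C list of floats (hypothetical learned parameters).
    biases:   list of C floats (hypothetical learned parameters).
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: one global average per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excite: a single linear layer followed by a sigmoid gives gates in (0, 1).
    gates = []
    for weight_row, bias in zip(weights, biases):
        logit = sum(w * z for w, z in zip(weight_row, squeezed)) + bias
        gates.append(1.0 / (1.0 + math.exp(-logit)))
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


if __name__ == "__main__":
    feats = [[[1.0, 1.0]], [[2.0, 2.0]]]  # two channels of size 1 x 2
    w = [[1.0, 0.0], [0.0, -1.0]]         # illustrative weights only
    b = [0.0, 0.0]
    scaled_feats, channel_gates = channel_attention(feats, w, b)
    print(channel_gates)
    print(scaled_feats)
```

With trained weights, channels that carry useful evidence receive gates near 1 and the rest are attenuated; the weights in the usage example are arbitrary and only show the data flow.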
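Although current salient object detection (SOD) works have achieved significant progress, they are limited when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both the micro and macro levels. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object. Meanwhile, at the macro level, the model needs to discover all salient objects in a given image. To facilitate integrity learning for SOD, we design a novel Integrity Cognition Network (ICON), which explores three important components for learning strong integrity features. 1) Unlike existing models, which focus more on feature discriminability, we introduce a diverse feature aggregation (DFA) component to aggregate features with various receptive fields (i.e., kernel shape and context) and increase feature diversity. Such diversity is the foundation for mining integral salient objects. 2) Based on the DFA features, we introduce an integrity channel enhancement (ICE) component with the goal of enhancing the feature channels that highlight integral salient objects while suppressing the other, distracting ones. 3) After extracting the enhanced features, a part-whole verification (PWV) method is employed to determine whether the part and whole object features agree strongly. Such part-whole agreement can further improve the micro-level integrity of each salient object. To demonstrate the effectiveness of ICON, comprehensive experiments are conducted on seven challenging benchmarks. ICON outperforms the baseline methods on a wide range of metrics. Notably, ICON achieves about a 10% relative improvement over the previous best model in terms of average false negative ratio (FNR) on six datasets. Code and results are available at: https://github.com/mczhuge/icon.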
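The false negative ratio mentioned above measures how much of the true salient region a prediction misses, which is why it relates to integrity. The sketch below uses the standard definition FNR = FN / (FN + TP) on a binarized prediction. The 0.5 threshold, the flattened-list input format, and the function name `false_negative_ratio` are assumptions for illustration; the abstract does not state the exact evaluation protocol.

```python
def false_negative_ratio(pred, gt, threshold=0.5):
    """Fraction of ground-truth salient pixels that the prediction misses.

    pred: list of predicted saliency values in [0, 1].
    gt:   list of ground-truth labels (1 for salient, 0 for background).
    The threshold is an assumption, not a value from the abstract.
    """
    true_pos = 0
    false_neg = 0
    for p, g in zip(pred, gt):
        if g == 1:
            if p >= threshold:
                true_pos += 1   # salient pixel correctly highlighted
            else:
                false_neg += 1  # salient pixel missed
    if true_pos + false_neg == 0:
        return 0.0  # no salient pixels in the ground truth
    return false_neg / (false_neg + true_pos)


if __name__ == "__main__":
    prediction = [0.9, 0.2, 0.7, 0.1]
    ground_truth = [1, 1, 1, 0]
    print(false_negative_ratio(prediction, ground_truth))
```

In the usage example one of the three salient pixels falls below the threshold, so a third of the object is missed.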
Existing state-of-the-art salient object detection networks rely on aggregating multi-level features of pretrained convolutional neural networks (CNNs). Compared to high-level features, low-level features contribute less to performance but cost more computation because of their larger spatial resolutions. In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection. On the one hand, the framework constructs a partial decoder that discards the larger-resolution features of shallower layers for acceleration. On the other hand, we observe that integrating the features of deeper layers yields a relatively precise saliency map. Therefore, we directly use the generated saliency map to refine the features of the backbone network. This strategy efficiently suppresses distractors in the features and significantly improves their representation ability. Experiments conducted on five benchmark datasets show that the proposed model not only achieves state-of-the-art performance but also runs much faster than existing models. In addition, the proposed framework is further applied to existing multi-level feature aggregation models and significantly improves their efficiency and accuracy.
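The refinement step described above, in which a generated saliency map is used to refine backbone features, can be illustrated with a simple attention-style multiplication. This is a minimal sketch under assumptions and not the CPD implementation: the saliency map is assumed to arrive as logits at the same resolution as the features, the refinement is a plain element-wise product with the sigmoid of those logits, and the function names `sigmoid` and `refine_features` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function; adequate for the moderate logits used here."""
    return 1.0 / (1.0 + math.exp(-x))


def refine_features(features, saliency_logits):
    """Weight every channel by the saliency map to suppress distractors.

    features:        list of C channels, each a 2D list of floats.
    saliency_logits: 2D list with the same height and width as each channel.
    """
    attention = [[sigmoid(v) for v in row] for row in saliency_logits]
    return [
        [
            [f * a for f, a in zip(feat_row, attn_row)]
            for feat_row, attn_row in zip(channel, attention)
        ]
        for channel in features
    ]


if __name__ == "__main__":
    feats = [[[2.0, 2.0]], [[4.0, 4.0]]]  # two channels of size 1 x 2
    logits = [[0.0, 100.0]]               # uncertain pixel, confident pixel
    print(refine_features(feats, logits))
```

Pixels the coarse map considers salient keep their feature values almost unchanged, while uncertain or background pixels are scaled down before any further decoding.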
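Existing salient object detection (SOD) methods mainly rely on CNN-based U-shaped structures with skip connections to combine the global contexts and local spatial details that are crucial for locating salient objects and refining object details, respectively. Despite great successes, the ability of CNNs to learn global contexts is limited. Recently, the vision transformer has achieved revolutionary progress in computer vision owing to its powerful modeling of global dependencies. However, directly applying the transformer to SOD is suboptimal because the transformer lacks the ability to learn local spatial representations. To this end, this paper explores the combination of transformers and CNNs to learn both global and local representations for SOD. We propose a transformer-based Asymmetric Bilateral U-Net (ABiU-Net). The asymmetric bilateral encoder has a transformer path and a lightweight CNN path, where the two paths communicate at each encoder stage to learn complementary global contexts and local spatial details, respectively. The asymmetric bilateral decoder also consists of two paths that process features from the transformer and CNN encoder paths, with communication at each decoder stage for decoding coarse salient object locations and fine-grained object details, respectively. Such communication between the two encoder/decoder paths enables ABiU-Net to learn complementary global and local representations, taking advantage of the natural properties of transformers and CNNs, respectively. Hence, ABiU-Net provides a new perspective for transformer-based SOD. Extensive experiments demonstrate that ABiU-Net performs favorably against previous state-of-the-art SOD methods. The code will be released.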
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment, which has attracted increasing attention over the past decades. Although several COD methods have been developed, they still suffer from unsatisfactory performance due to the intrinsic similarities between the foreground objects and background surroundings. In this paper, we propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection. Specifically, we propose a Boundary Guidance Module (BGM) to explicitly model the boundary characteristic, which can provide boundary-enhanced features to boost the COD performance. To capture the scale variations of the camouflaged objects, we propose a Multi-scale Feature Aggregation Module (MFAM) to characterize the multi-scale information from each layer and obtain the aggregated feature representations. Furthermore, we propose a Cross-level Fusion and Propagation Module (CFPM). In the CFPM, the feature fusion part can effectively integrate the features from adjacent layers to exploit the cross-level correlations, and the feature propagation part can transmit valuable context information from the encoder to the decoder network via a gate unit. Finally, we formulate a unified and end-to-end trainable framework where cross-level features can be effectively fused and propagated for capturing rich context information. Extensive experiments on three benchmark camouflaged datasets demonstrate that our FAP-Net outperforms other state-of-the-art COD models. Moreover, our model can be extended to the polyp segmentation task, and the comparison results further validate the effectiveness of the proposed model in segmenting polyps. The source code and results will be released at https://github.com/taozh2017/FAPNet.
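Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences; for example, a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass. In this paper, we pose the important problem of detecting glass surfaces from a single RGB image. To address this problem, we construct the first large-scale glass detection dataset (GDD) and propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues in a large field of view via a novel large-field contextual feature integration (LCFI) module and integrates both high-level and low-level boundary features with a boundary feature enhancement (BFE) module. Extensive experiments demonstrate that GDNet-B achieves satisfying glass detection results on images both within and beyond the GDD test set. We further validate the effectiveness and generalization capability of GDNet-B by applying it to other vision tasks, including mirror segmentation and salient object detection. Finally, we show potential applications of glass detection and discuss possible future research directions.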
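In the computer vision community, great progress has been achieved in salient object detection from natural scene images (NSI-SOD); by contrast, salient object detection in optical remote sensing images (RSI-SOD) remains a challenging emerging topic. The unique characteristics of optical RSIs, such as scales, illuminations and imaging orientations, bring significant differences between NSI-SOD and RSI-SOD. In this paper, we propose a novel Multi-Content Complementation Network (MCCNet) to explore the complementarity of multiple content for RSI-SOD. Specifically, MCCNet is based on the general encoder-decoder architecture and contains a novel key component named the Multi-Content Complementation Module (MCCM), which bridges the encoder and the decoder. In MCCM, we consider multiple types of features that are critical to RSI-SOD, including foreground features, edge features, background features, and global image-level features, and exploit the content complementarity between them through the attention mechanism to highlight salient regions of various scales in RSI features. In addition, we comprehensively introduce pixel-level, map-level and metric-aware losses in the training phase. Extensive experiments on two popular datasets demonstrate that the proposed MCCNet outperforms 23 state-of-the-art methods, including both NSI-SOD and RSI-SOD methods. The code and results of our method are available at https://github.com/mathlee/mccnet.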
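Camouflaged object detection (COD) aims to detect objects whose patterns (e.g., texture, intensity, colour) are similar to those of their surroundings, and has recently attracted growing research interest. As camouflaged objects often present very ambiguous boundaries, determining object locations as well as their weak boundaries is challenging and is also the key to this task. Inspired by the biological visual perception process by which a human observer discovers camouflaged objects, this paper proposes a novel edge-based reversible re-calibration network called ERRNet. Our model is characterized by two innovative designs, namely Selective Edge Aggregation (SEA) and the Reversible Re-calibration Unit (RRU), which aim to model visual perception behaviour and achieve an effective edge prior and cross-comparison between potential camouflaged regions and the background. More importantly, the RRU incorporates more comprehensive information than existing COD models. Experimental results show that ERRNet outperforms existing cutting-edge baselines on three COD datasets and five medical image segmentation datasets. In particular, compared with the existing top-1 model SINet, ERRNet improves performance by approximately 6% (mean E-measure) at a notably high speed (79.3 FPS), showing that ERRNet could be a general and robust solution for the COD task.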
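Camouflaged object detection (COD) aims to identify objects that conceal themselves in natural scenes. Accurate COD suffers from a number of challenges associated with low boundary contrast and large variation in object appearance, e.g., object size and shape. To address these challenges, we propose a novel Context-aware Cross-level Fusion Network (C2F-Net), which fuses context-aware cross-level features to accurately identify camouflaged objects. Specifically, we compute informative attention coefficients from multi-level features with our Attention-induced Cross-level Fusion Module (ACFM), which further integrates the features under the guidance of the attention coefficients. We then propose a Dual-branch Global Context Module (DGCM) to refine the fused features into informative feature representations by exploiting rich global context information. Multiple ACFMs and DGCMs are integrated in a cascaded manner to generate a coarse prediction from high-level features. The coarse prediction acts as an attention map to refine the low-level features before they are passed to our Camouflage Inference Module (CIM) to generate the final prediction. We perform extensive experiments on three widely used benchmark datasets and compare C2F-Net with state-of-the-art (SOTA) models. The results show that C2F-Net is an effective COD model and remarkably outperforms SOTA models. Further, an evaluation on polyp segmentation datasets demonstrates the promising potential of C2F-Net in COD downstream applications. Our code is publicly available at: https://github.com/ben57882/c2fnet-tscvt.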
Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.
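Glass is very common in the real world. Because of the uncertainty about the glass region and the varied, complex scenes behind the glass, the existence of glass poses severe challenges to many computer vision tasks, making glass segmentation an important computer vision task. Glass does not have its own visual appearance but only transmits/reflects the appearance of its surroundings, making it fundamentally different from other common objects. To address such a challenging task, existing methods typically explore and combine useful cues from different levels of features in the deep network. There is a characteristic gap between features of different levels: deep-layer features embed more high-level semantics and are better at locating the target objects, while shallow-layer features have larger spatial sizes and keep richer and more detailed low-level information. Fusing these features naively would therefore lead to a sub-optimal solution. In this paper, we approach effective feature fusion for accurate glass segmentation in two steps. First, we attempt to bridge the characteristic gap between different levels of features by developing a Discriminability Enhancement (DE) module, which turns level-specific features into more discriminative representations and alleviates feature incompatibility during fusion. Second, we design a Focus-and-Exploration Based Fusion (FEBF) module to richly excavate useful information in the fusion process by highlighting the commonalities and exploring the differences between level-different features.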
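Due to the extreme complexity of scale and shape as well as the uncertainty of the predicted location, salient object detection in optical remote sensing images (RSI-SOD) is a very difficult task. Existing SOD methods achieve satisfactory detection performance on natural scene images, but they do not adapt well to RSI-SOD because of the above-mentioned characteristics of remote sensing images. In this paper, we propose a novel Attention Guided Network (AGNet) for SOD in optical RSIs, consisting of a position enhancement stage and a detail refinement stage. Specifically, the position enhancement stage consists of a semantic attention module and a contextual attention module to accurately describe the approximate location of salient objects. The detail refinement stage uses the proposed self-refinement module to progressively refine the predicted results under the guidance of attention and reverse attention. In addition, a hybrid loss is applied to supervise the training of the network, which improves the performance of the model from the three perspectives of pixel, region and statistics. Extensive experiments on two popular benchmarks demonstrate that AGNet achieves competitive performance compared with other state-of-the-art methods. The code will be available at https://github.com/nuaayh/agnet.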
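Existing RGB-D SOD methods mainly rely on a symmetric two-stream CNN-based network to extract RGB and depth channel features separately. However, the symmetric conventional network structure has two problems: first, the ability of CNNs to learn global contexts is limited; second, the symmetric two-stream structure ignores the inherent differences between modalities. In this paper, we propose a Transformer-based asymmetric network (TANet) to tackle these issues. We employ the powerful feature extraction capability of a Transformer (PVTv2) to extract global semantic information from RGB data and design a lightweight CNN backbone (LWDepthNet) to extract spatial structure information from depth data without pre-training. The asymmetric hybrid encoder (AHE) effectively reduces the number of parameters in the model and increases speed without sacrificing performance. Then, we design a cross-modal feature fusion module (CMFFM), which enhances and fuses the RGB and depth features with each other. Finally, we add edge prediction as an auxiliary task and propose an edge enhancement module (EEM) to generate sharper contours. Extensive experiments demonstrate that our method achieves superior performance over 14 state-of-the-art RGB-D methods on six public datasets. Our code will be released at https://github.com/lc012463/tanet.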
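Unsupervised salient object detection (USOD) is of paramount significance for both industrial applications and downstream tasks. Existing deep-learning (DL) based USOD methods use low-quality saliency predictions extracted by several traditional SOD methods as saliency cues, which mainly capture some conspicuous regions in images. Furthermore, they refine these saliency cues with the assistance of semantic information obtained from models trained by supervised learning on other related vision tasks. In this work, we propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues and uses these cues to train a robust saliency detector. More importantly, no human annotations are involved in our framework during the whole training process. In the first stage, we transform a pretrained network (MoCo v2) to aggregate multi-level features into a single activation map, where an Adaptive Decision Boundary (ADB) is proposed to assist the training of the transformed network. To facilitate the generation of high-quality pseudo labels, we propose a loss function that enlarges the feature distances between pixels and their means. In the second stage, an Online Label Rectifying (OLR) strategy updates the pseudo labels during training to reduce the negative impact of distractors. In addition, we construct a lightweight saliency detector using two Residual Attention Modules (RAMs), which refine the high-level features using complementary information in low-level features, such as edges and colors. Extensive experiments on several SOD benchmarks show that our framework achieves significant performance gains over existing USOD methods. Moreover, training our framework on 3000 images takes about 1 hour, which is 30 times faster than previous state-of-the-art methods.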
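Fully supervised salient object detection (SOD) methods have made great progress, but such methods often rely on a large number of pixel-level annotations, which are time-consuming and labour-intensive to produce. In this paper, we focus on a new weakly supervised SOD task under hybrid labels, where the supervision labels include a large number of coarse labels generated by a traditional unsupervised method and a small number of real labels. To address the issues of label noise and quantity imbalance in this task, we design a new pipeline framework with three sophisticated training strategies. In terms of the model framework, we decouple the task into a label refinement sub-task and a salient object detection sub-task, which cooperate with each other and are trained alternately. Specifically, the R-Net is designed as a two-stream encoder-decoder model equipped with a Blender with Guidance and Aggregation Mechanisms (BGA), aiming to rectify the coarse labels into more reliable pseudo labels, while the S-Net is a replaceable SOD network supervised by the pseudo labels generated by the current R-Net. Note that only the trained S-Net is needed for testing. Moreover, to guarantee the effectiveness and efficiency of network training, we design three training strategies: an alternate iteration mechanism, a group-wise incremental mechanism, and a credibility verification mechanism. Experiments on five SOD benchmarks show that our method achieves competitive performance against weakly supervised and unsupervised methods both qualitatively and quantitatively.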