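Existing studies on salient object detection (SOD) focus on extracting distinct objects with edge information and on aggregating multi-level features to improve SOD performance. To achieve satisfactory performance, these methods employ refined edge information and low multi-level discrepancy. However, performance gain and computational efficiency cannot both be attained, which motivated us to study the inefficiencies in existing encoder-decoder structures in order to avoid this trade-off. We propose TRACER, which detects salient objects with explicit edges by incorporating attention-guided tracing modules. We employ a masked edge attention module at the end of the first encoder, using a fast Fourier transform, to propagate the refined edge information to the downstream feature extraction. In the multi-level aggregation phase, the union attention module identifies the complementary channel and important spatial information. To improve decoder performance and computational efficiency, we minimize decoder block usage with the object attention module. This module extracts undetected objects and edge information from refined channel and spatial representations. Subsequently, we propose an adaptive pixel intensity loss function that gives more weight to relatively important pixels, unlike conventional loss functions, which treat all pixels equally. A comparison with 13 existing methods shows that TRACER achieves state-of-the-art performance on five benchmark datasets. In particular, TRACER-Efficient3 (TE3) outperforms LDF, an existing method, while requiring 1.8× fewer learning parameters and less time; TE3 is 5× faster.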
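Salient object detection plays an important role in many downstream tasks. However, complex real-world scenes with varying scales and numbers of salient objects still pose a challenge. In this paper, we directly address the problem of detecting multiple salient objects in complex scenes. We propose a network architecture that incorporates non-local feature information in both the spatial and channel spaces, capturing the long-range dependencies between separate objects. Traditional bottom-up and non-local features are combined with edge features within a feature fusion gate that progressively refines the salient object prediction in the decoder. We show that our approach accurately locates multiple salient regions even in complex scenarios. To demonstrate the efficacy of our approach on the multiple-salient-object problem, we curate a new dataset containing only images with multiple salient objects. Our experiments show that the proposed method achieves state-of-the-art results on five widely used datasets without any pre-processing or post-processing. We obtain a further performance improvement over competing techniques on our multi-object dataset. The dataset and source code are available at: https://github.com/ericdengbowen/dslrdnet.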
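Although current salient object detection (SOD) works have achieved significant progress, they are limited when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both a micro and a macro level. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object. Meanwhile, at the macro level, the model needs to discover all salient objects in a given image. To facilitate integrity learning for SOD, we design a novel Integrity Cognition Network (ICON), which explores three important components for learning strong integrity features. 1) Unlike existing models, which focus more on feature discriminability, we introduce a diverse feature aggregation (DFA) component to aggregate features with various receptive fields (i.e., kernel shape and context) and increase feature diversity. Such diversity is the foundation for mining the integral salient objects. 2) Based on the DFA features, we introduce an integrity channel enhancement (ICE) component with the goal of enhancing feature channels that highlight the integral salient objects while suppressing the other distracting ones. 3) After extracting the enhanced features, the part-whole verification (PWV) method is employed to determine whether the part and whole object features have strong agreement. Such part-whole agreements can further improve the micro-level integrity of each salient object. To demonstrate the effectiveness of our ICON, comprehensive experiments are conducted on seven challenging benchmarks. Our ICON outperforms the baseline methods on a wide range of metrics. Notably, our ICON achieves about 10% relative improvement over the previous best model in terms of average false negative ratio (FNR) on six datasets. Code and results are available at: https://github.com/mczhuge/icon.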
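The false negative ratio (FNR) reported above can be illustrated with a minimal sketch. It assumes the usual definition FNR = FN / (TP + FN) on a binarized prediction; the 0.5 threshold and the function name `false_negative_ratio` are illustrative assumptions, not taken from the ICON code base.

```python
def false_negative_ratio(pred, gt, threshold=0.5):
    """Return FN / (TP + FN) for a saliency map against a binary ground truth.

    pred: 2D list of floats in [0, 1]; gt: 2D list of 0/1 labels.
    The threshold is an assumed binarization point.
    """
    tp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if g == 1:
                if p >= threshold:
                    tp += 1  # salient pixel that was detected
                else:
                    fn += 1  # salient pixel that was missed
    return fn / (tp + fn) if (tp + fn) else 0.0


pred = [[0.9, 0.2], [0.7, 0.1]]
gt = [[1, 1], [1, 0]]
fnr = false_negative_ratio(pred, gt)  # one of three salient pixels is missed
```

A lower FNR means fewer salient pixels are missed, which is one way to quantify the integrity notion described in the abstract.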
Fully convolutional neural networks (FCNs) have shown their advantages in the salient object detection task. However, most existing FCN-based methods still suffer from coarse object boundaries. In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. Accordingly, we present an edge guidance network (EGNet) for salient object detection with three steps to simultaneously model these two kinds of complementary information in a single network. In the first step, we extract the salient object features through progressive fusion. In the second step, we integrate the local edge information and global location information to obtain the salient edge features. Finally, to sufficiently leverage these complementary features, we couple the same salient edge features with salient object features at various resolutions. Benefiting from the rich edge information and location information in salient edge features, the fused features can help locate salient objects, especially their boundaries, more accurately. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art methods on six widely used datasets without any pre-processing or post-processing. The source code is available at http://mmcheng.net/egnet/.
Deep Convolutional Neural Networks have been adopted for salient object detection and have achieved state-of-the-art performance. Most previous works, however, focus on region accuracy rather than boundary quality. In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. Specifically, the architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module, which are respectively in charge of saliency prediction and saliency map refinement. The hybrid loss guides the network to learn the transformation between the input image and the ground truth in a three-level hierarchy (pixel-, patch- and map-level) by fusing Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses. Equipped with the hybrid loss, the proposed predict-refine architecture is able to effectively segment the salient object regions and accurately predict the fine structures with clear boundaries. Experimental results on six public datasets show that our method outperforms the state-of-the-art methods in terms of both regional and boundary evaluation measures. Our method runs at over 25 fps on a single GPU. The code is available at: https://github.com/NathanUA/BASNet.
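The idea of fusing pixel-level and map-level losses can be sketched as follows. This is a simplified illustration, not the BASNet implementation: it sums only the BCE and soft IoU terms, omits the patch-level SSIM term for brevity, and uses equal weights; the function names are hypothetical.

```python
import math


def bce_loss(pred, gt, eps=1e-7):
    """Pixel-level term: mean binary cross entropy over all pixels."""
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total -= g * math.log(p) + (1 - g) * math.log(1 - p)
            count += 1
    return total / count


def iou_loss(pred, gt, eps=1e-7):
    """Map-level term: 1 - soft IoU computed on probabilities."""
    pairs = [(p, g) for pred_row, gt_row in zip(pred, gt)
             for p, g in zip(pred_row, gt_row)]
    intersection = sum(p * g for p, g in pairs)
    union = sum(p + g - p * g for p, g in pairs)
    return 1 - intersection / (union + eps)


def hybrid_loss(pred, gt):
    """Assumed equal-weight sum of the two terms (SSIM term omitted)."""
    return bce_loss(pred, gt) + iou_loss(pred, gt)


gt = [[1, 0], [0, 1]]
good = [[0.9, 0.1], [0.1, 0.9]]
bad = [[0.1, 0.9], [0.9, 0.1]]
print(hybrid_loss(good, gt) < hybrid_loss(bad, gt))
```

The BCE term penalizes each pixel independently, while the IoU term scores the overlap of the whole map, so the two give complementary training signals.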
Existing state-of-the-art salient object detection networks rely on aggregating multi-level features of pretrained convolutional neural networks (CNNs). Compared to high-level features, low-level features contribute less to performance but cost more computation because of their larger spatial resolutions. In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection. On the one hand, the framework constructs a partial decoder that discards the larger-resolution features of shallower layers for acceleration. On the other hand, we observe that integrating the features of deeper layers yields a relatively precise saliency map. Therefore, we directly use the generated saliency map to refine the features of the backbone network. This strategy efficiently suppresses distractors in the features and significantly improves their representation ability. Experiments conducted on five benchmark datasets show that the proposed model not only achieves state-of-the-art performance but also runs much faster than existing models. In addition, the proposed framework is applied to existing multi-level feature aggregation models and significantly improves their efficiency and accuracy.
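Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences; for example, a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects and scenes can appear behind the glass. In this paper, we pose the important problem of detecting glass surfaces from a single RGB image. To address this problem, we construct the first large-scale glass detection dataset (GDD) and propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues in a large field of view via a novel large-field contextual feature integration (LCFI) module and integrates both high-level and low-level boundary features with a boundary feature enhancement (BFE) module. Extensive experiments demonstrate that our GDNet-B achieves satisfying glass detection results on images within and beyond the GDD testing set. We further validate the effectiveness and generalization capability of our proposed GDNet-B by applying it to other vision tasks, including mirror segmentation and salient object detection. Finally, we show potential applications of glass detection and discuss possible future research directions.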
We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. Based on the U-shape architecture, we first build a global guidance module (GGM) upon the bottom-up pathway, aiming to provide layers at different feature levels with the location information of potential salient objects. We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down pathway. By adding FAMs after the fusion operations in the top-down pathway, coarse-level features from the GGM can be seamlessly merged with features at various scales. These two pooling-based modules allow the high-level semantic features to be progressively refined, yielding detail-enriched saliency maps. Experimental results show that our proposed approach can more accurately locate salient objects with sharpened details and hence substantially improves performance compared to the previous state of the art. Our approach is also fast and can run at more than 30 FPS when processing a 300 × 400 image. Code can be found at http://mmcheng.net/poolnet/.
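Salient object detection in optical remote sensing images (RSI-SOD) is a very difficult task due to the extreme complexity of scale and shape and the uncertainty of the predicted location. Existing SOD methods achieve satisfactory detection performance on natural scene images, but they do not adapt well to RSI-SOD because of the above-mentioned characteristics of remote sensing images. In this paper, we propose a novel Attention Guided Network (AGNet) for SOD in optical RSIs, comprising a position enhancement stage and a detail refinement stage. Specifically, the position enhancement stage consists of a semantic attention module and a contextual attention module that accurately describe the approximate location of salient objects. The detail refinement stage uses the proposed self-refinement module to progressively refine the predicted results under the guidance of attention and reverse attention. In addition, a hybrid loss is applied to supervise the training of the network, which improves model performance from the three perspectives of pixel, region, and statistics. Extensive experiments on two popular benchmarks demonstrate that AGNet achieves competitive performance compared to other state-of-the-art methods. The code will be available at https://github.com/nuaayh/agnet.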
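Most existing object detection methods produce poor glass detection results, because transparent glass shares the same appearance as arbitrary objects behind it in an image. Unlike traditional deep learning-based approaches that simply use the object boundary as auxiliary supervision, we exploit label decoupling to decompose the original labeled ground-truth (GT) map into an interior-diffusion map and a boundary-diffusion map. The GT map, in collaboration with the two newly generated maps, breaks the imbalanced distribution of the object boundary, leading to improved glass detection quality. We make three key contributions to solving the transparent glass detection problem: (1) we propose a three-stream neural network (GlassNet for short) to fully absorb the beneficial features in the three maps; (2) we design a multi-scale interactive dilation module to explore a wider range of contextual information; (3) we develop an attention-based boundary-aware feature mosaic module to integrate multi-modal information. Extensive experiments on the benchmark dataset demonstrate clear improvements of our method over state-of-the-art methods in terms of both overall glass detection accuracy and boundary clearness.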
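One common way to decouple a binary label into an interior map and a boundary map is through a distance transform. The sketch below follows that generic recipe and is only an assumption about how such maps can be built; the exact diffusion maps used by GlassNet may differ. The function name `decouple_label` and the 4-connected city-block distance are illustrative choices.

```python
from collections import deque


def decouple_label(gt):
    """Split a binary mask into (interior, boundary) maps.

    interior: distance of each foreground pixel to the nearest background
    pixel, normalized to [0, 1]; boundary: gt minus interior, which is
    largest for foreground pixels next to the object border.
    """
    height, width = len(gt), len(gt[0])
    dist = [[0 if gt[y][x] == 0 else None for x in range(width)]
            for y in range(height)]
    # Multi-source breadth-first search starting from every background pixel.
    queue = deque((y, x) for y in range(height) for x in range(width)
                  if gt[y][x] == 0)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    # A mask without any background leaves distances unset; treat them as 1.
    dist = [[1 if d is None else d for d in row] for row in dist]
    max_dist = max(max(row) for row in dist)
    interior = [[d / max_dist if max_dist else 0.0 for d in row]
                for row in dist]
    boundary = [[gt[y][x] - interior[y][x] for x in range(width)]
                for y in range(height)]
    return interior, boundary


gt = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
interior, boundary = decouple_label(gt)
print(interior[2][2], boundary[1][1])  # 1.0 0.5
```

In this toy mask the center pixel carries the full interior weight, while the ring of pixels touching the background carries the boundary weight, which is the kind of separation that lets a network supervise the two regions differently.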
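Current state-of-the-art saliency detection models rely heavily on large datasets with accurate pixel-wise annotations, but manually labeling pixels is time-consuming and labor-intensive. Some weakly supervised methods have been developed to alleviate this problem, using image labels, bounding box labels, and scribble labels, while point labels remain unexplored in this field. In this paper, we propose a novel weakly supervised salient object detection method using point supervision. To infer the saliency map, we first design an adaptive masked flood filling algorithm to generate pseudo labels. Then we develop a transformer-based point-supervised saliency detection model to produce the first round of saliency maps. However, due to the sparseness of the labels, the weakly supervised model tends to degenerate into a general foreground detection model. To address this issue, we propose a Non-Salient Suppression (NSS) method to optimize the erroneous saliency maps generated in the first round and leverage them for the second round of training. Moreover, we build a new point-supervised dataset (P-DUTS) by relabeling the DUTS dataset. In P-DUTS, there is only one labeled point for each salient object. Comprehensive experiments on the five largest benchmark datasets demonstrate that our method outperforms previous state-of-the-art methods trained with stronger supervision and even surpasses several fully supervised state-of-the-art models. The code is available at: https://github.com/shuyonggao/psod.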
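The pseudo-label step can be pictured as a flood fill that starts at the labeled point and stops at edge pixels. The sketch below shows only that basic idea, assuming a binary edge map and 4-connectivity; the adaptive masking described in the paper is not modeled, and `masked_flood_fill` is a hypothetical name.

```python
from collections import deque


def masked_flood_fill(edges, seed):
    """Grow a region from `seed` (row, col) without crossing edge pixels.

    edges: 2D list where 1 marks an edge pixel that blocks the fill.
    Returns a 2D list of 0/1 values marking the filled pseudo-label region.
    """
    height, width = len(edges), len(edges[0])
    filled = [[0] * width for _ in range(height)]
    seed_y, seed_x = seed
    if edges[seed_y][seed_x]:
        return filled  # a seed placed on an edge cannot grow
    filled[seed_y][seed_x] = 1
    queue = deque([(seed_y, seed_x)])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < height and 0 <= nx < width
                    and not edges[ny][nx] and not filled[ny][nx]):
                filled[ny][nx] = 1
                queue.append((ny, nx))
    return filled


edges = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
inside = masked_flood_fill(edges, (2, 2))   # enclosed by the edge ring
outside = masked_flood_fill(edges, (0, 0))  # everything outside the ring
```

With a closed edge contour the fill stays inside the object; a gap in the contour would let it leak into the background, which is one reason a plain flood fill needs the extra masking the abstract refers to.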
We present HetNet (Multi-level Heterogeneous Network), a highly efficient mirror detection network. Current mirror detection methods focus more on performance than efficiency, limiting real-time applications (such as drones). Their lack of efficiency is caused by the common design of adopting homogeneous modules at different levels, which ignores the difference between different levels of features. In contrast, HetNet detects potential mirror regions initially through low-level understandings (e.g., intensity contrasts) and then combines them with high-level understandings (for instance, contextual discontinuity) to finalize the predictions. To perform accurate yet efficient mirror detection, HetNet follows an effective architecture that obtains specific information at different stages to detect mirrors. We further propose a multi-orientation intensity-based contrasted module (MIC) and a reflection semantic logical module (RSL), equipped on HetNet, to predict potential mirror regions by low-level understandings and analyze semantic logic in scenarios by high-level understandings, respectively. Compared to the state-of-the-art method, HetNet runs 664% faster and draws an average performance gain of 8.9% on MAE, 3.1% on IoU, and 2.0% on F-measure on two mirror detection benchmarks.
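The three metrics named above can be computed as in the following minimal sketch. The 0.5 binarization threshold and the weighting β² = 0.3 for the F-measure are conventions commonly used in saliency evaluation and are assumptions here, since the abstract does not state them.

```python
def mae(pred, gt):
    """Mean absolute error between a probability map and a binary mask."""
    diffs = [abs(p - g) for pred_row, gt_row in zip(pred, gt)
             for p, g in zip(pred_row, gt_row)]
    return sum(diffs) / len(diffs)


def confusion(pred, gt, threshold=0.5):
    """Count true positives, false positives and false negatives."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            predicted = p >= threshold
            if predicted and g:
                tp += 1
            elif predicted:
                fp += 1
            elif g:
                fn += 1
    return tp, fp, fn


def iou(pred, gt, threshold=0.5):
    """Intersection over union of the binarized prediction and the mask."""
    tp, fp, fn = confusion(pred, gt, threshold)
    union = tp + fp + fn
    return tp / union if union else 1.0


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """Weighted harmonic mean of precision and recall (assumed beta^2 = 0.3)."""
    tp, fp, fn = confusion(pred, gt, threshold)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


pred = [[0.9, 0.2], [0.7, 0.6]]
gt = [[1, 1], [1, 0]]
print(mae(pred, gt), iou(pred, gt), f_measure(pred, gt))
```

Note that MAE is an error (lower is better), whereas IoU and F-measure are scores (higher is better), so a "gain" on MAE corresponds to a reduction.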
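In current salient object detection networks, the most popular approach is to use a U-shaped structure. However, the massive number of parameters leads to greater consumption of computing and storage resources, which makes deployment on memory-limited devices infeasible. Some shallower networks cannot maintain the same accuracy as U-shaped structures, and deep network structures with more parameters do not converge quickly to a global minimum of the loss. To overcome all of these disadvantages, we propose a new deep convolutional network architecture with three contributions: (1) using smaller convolutional neural networks (CNNs) to compress the model in our improved salient object features compression and reinforcement extraction module (ISFCREM), reducing the number of model parameters; (2) introducing a channel attention mechanism in ISFCREM to weigh different channels and improve the ability of feature representation; (3) applying a new optimizer that accumulates long-term gradient information during training to adaptively tune the learning rate. The results demonstrate that, compared with other models on six widely used salient object detection datasets, the proposed method can compress the model to nearly 1/3 of its original size without losing accuracy while converging faster. Our code is published at https://gitee.com/binzhangbinzhangbin/code-a-novel-tentent-based-network-for-fast-salient-object-detection.git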
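Channel attention of the kind mentioned in contribution (2) is often built in the squeeze-and-excitation style: each channel is summarized by its global average, turned into a gate in (0, 1), and then rescaled. The sketch below is a deliberately simplified version of that generic idea, with one scalar weight and bias per channel instead of learned fully connected layers; it is not the ISFCREM design, whose details the abstract does not give.

```python
import math


def channel_attention(features, weights, biases):
    """Rescale each channel by a sigmoid gate of its global average.

    features: list of C channels, each a 2D list of floats.
    weights, biases: one scalar per channel (hypothetical learned parameters).
    """
    output = []
    for channel, weight, bias in zip(features, weights, biases):
        pixel_count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / pixel_count  # squeeze
        gate = 1 / (1 + math.exp(-(weight * mean + bias)))     # excitation
        output.append([[value * gate for value in row] for row in channel])
    return output


features = [
    [[2.0, 2.0], [2.0, 2.0]],  # channel 0
    [[4.0, 4.0], [4.0, 4.0]],  # channel 1
]
# With zero weight and bias every gate equals 0.5, so each channel is halved.
halved = channel_attention(features, [0.0, 0.0], [0.0, 0.0])
# A large positive bias keeps channel 0; a large negative bias suppresses channel 1.
gated = channel_attention(features, [0.0, 0.0], [10.0, -10.0])
```

The gate adds only a few parameters per channel, which is why this family of attention mechanisms is a common choice in compact models.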
Salient object detection (SOD) focuses on distinguishing the most conspicuous objects in a scene. However, most related works are based on RGB images, which lose a large amount of useful information. Accordingly, with the maturity of thermal technology, RGB-T (RGB-Thermal) multi-modality tasks are attracting more and more attention. Thermal infrared images carry important information that can be used to improve the accuracy of SOD prediction. To accomplish this, methods to integrate multi-modal information and suppress noise are critical. In this paper, we propose a novel network called Interactive Context-Aware Network (ICANet). It contains three modules that can effectively perform cross-modal and cross-scale fusion. We design a Hybrid Feature Fusion (HFF) module to integrate the features of the two modalities, which utilizes two types of feature extraction. The Multi-Scale Attention Reinforcement (MSAR) and Upper Fusion (UF) blocks are responsible for the cross-scale fusion that converges different levels of features and generates the prediction maps. We also propose a novel Context-Aware Multi-Supervised Network (CAMSNet) to calculate the content loss between the prediction and the ground truth (GT). Experiments show that our network performs favorably against state-of-the-art RGB-T SOD methods.
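Unsupervised salient object detection (USOD) is of paramount significance for both industrial applications and downstream tasks. Existing deep learning (DL) based USOD methods use low-quality saliency predictions extracted by several traditional SOD methods as saliency cues, which mainly capture some conspicuous regions in images. Furthermore, they refine these saliency cues with the assistance of semantic information, which is obtained from models trained by supervised learning on other related vision tasks. In this work, we propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues and uses these cues to train a robust saliency detector. More importantly, no human annotations are involved in our framework during the whole training process. In the first stage, we transform a pretrained network (MoCo v2) to aggregate multi-level features into a single activation map, where an Adaptive Decision Boundary (ADB) is proposed to assist the training of the transformed network. To facilitate the generation of high-quality pseudo labels, we propose a loss function that enlarges the feature distances between pixels and their means. In the second stage, an Online Label Rectifying (OLR) strategy updates the pseudo labels during the training process to reduce the negative impact of distractors. In addition, we construct a lightweight saliency detector using two Residual Attention Modules (RAMs), which refine the high-level features using the complementary information in low-level features, such as edges and colors. Extensive experiments on several SOD benchmarks show that our framework achieves significant performance compared with existing USOD methods. Moreover, training our framework on 3000 images takes about 1 hour, which is 30× faster than previous state-of-the-art methods.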
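Glass is very common in the real world. Because of the uncertainty about the glass region and the varying complex scenes behind the glass, the presence of glass poses severe challenges to many computer vision tasks, making glass segmentation an important computer vision task. Glass does not have its own visual appearance but only transmits or reflects the appearance of its surroundings, making it fundamentally different from other common objects. To address such a challenging task, existing methods typically explore and combine useful cues from different levels of features in the deep network. There is a characteristic gap between features of different levels: deep-layer features embed more high-level semantics and are better at locating the target objects, while shallow-layer features have larger spatial sizes and keep richer and more detailed low-level information. Fusing these features naively would therefore lead to a sub-optimal solution. In this paper, we approach effective feature fusion for accurate glass segmentation in two steps. First, we attempt to bridge the characteristic gap between different levels of features by developing a Discriminability Enhancement (DE) module, which turns level-specific features into more discriminative representations and alleviates the incompatibility of features for fusion. Second, we design a Focus-and-Exploration Based Fusion (FEBF) module to richly excavate useful information in the fusion process by highlighting the commonalities and exploring the differences between features of different levels.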
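Previous methods based on 3DCNN, ConvLSTM, or optical flow have achieved great success in video salient object detection (VSOD). However, they still suffer from high computational costs or poor quality of the generated saliency maps. To solve these problems, we design a space-time memory (STM)-based network, which extracts useful temporal information for the current frame from adjacent frames as the temporal branch of VSOD. Furthermore, previous methods only considered single-frame prediction without temporal association. As a result, the model may not focus sufficiently on temporal information. We therefore introduce inter-frame object motion prediction into VSOD for the first time. Our model follows the standard encoder-decoder architecture. In the encoding stage, we generate high-level temporal features by using high-level features from the current frame and its adjacent frames. This approach is more efficient than optical flow-based methods. In the decoding stage, we propose an effective fusion strategy for the spatial and temporal branches. The semantic information of the high-level features is used to fuse the object details in the low-level features, and the spatiotemporal features are then obtained step by step to reconstruct the saliency maps. Moreover, inspired by the boundary supervision commonly used in image salient object detection (ISOD), we design a motion-aware loss for predicting object boundary motion and simultaneously perform multitask learning for VSOD and object motion prediction, which further helps the model extract spatiotemporal features accurately and maintain object integrity. Extensive experiments on several datasets demonstrate the effectiveness of our method, which achieves state-of-the-art metrics on some datasets. The proposed model does not require optical flow or other preprocessing, and can reach a speed of almost 100 FPS during inference.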
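Camouflaged object detection (COD), which segments objects that are elegantly blended into their surroundings, is a valuable yet challenging task. Existing deep learning methods often have difficulty accurately identifying the camouflaged object with a complete and fine object structure. To this end, in this paper, we propose a novel boundary-guided network (BGNet) for camouflaged object detection. Our method explores valuable and extra object-related edge semantics to guide the representation learning of COD, which forces the model to generate features that highlight object structure, thereby promoting camouflaged object detection with accurate boundary localization. Extensive experiments on three challenging benchmark datasets demonstrate that our BGNet significantly outperforms 18 existing state-of-the-art methods under four widely used evaluation metrics. Our code is publicly available at: https://github.com/thograce/bgnet.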
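In the computer vision community, great progress has been achieved in salient object detection from natural scene images (NSI-SOD); by contrast, salient object detection in optical remote sensing images (RSI-SOD) remains a challenging emerging topic. The unique characteristics of optical RSIs, such as scale, illumination, and imaging orientation, bring significant differences between NSI-SOD and RSI-SOD. In this paper, we propose a novel Multi-Content Complementation Network (MCCNet) to explore the complementarity of multiple types of content for RSI-SOD. Specifically, MCCNet is based on the general encoder-decoder architecture and contains a novel key component named the Multi-Content Complementation Module (MCCM), which bridges the encoder and the decoder. In MCCM, we consider multiple types of features that are critical to RSI-SOD, including foreground features, edge features, background features, and global image-level features, and exploit the content complementarity between them to highlight salient regions at various scales in RSI features through the attention mechanism. In addition, we comprehensively introduce pixel-level, map-level, and metric-aware losses in the training phase. Extensive experiments on two popular datasets demonstrate that the proposed MCCNet outperforms 23 state-of-the-art methods, including both NSI-SOD and RSI-SOD methods. The code and results of our method are available at https://github.com/mathlee/mccnet.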
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment, which has attracted increasing attention over the past decades. Although several COD methods have been developed, they still suffer from unsatisfactory performance due to the intrinsic similarities between the foreground objects and background surroundings. In this paper, we propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection. Specifically, we propose a Boundary Guidance Module (BGM) to explicitly model the boundary characteristic, which can provide boundary-enhanced features to boost the COD performance. To capture the scale variations of the camouflaged objects, we propose a Multi-scale Feature Aggregation Module (MFAM) to characterize the multi-scale information from each layer and obtain the aggregated feature representations. Furthermore, we propose a Cross-level Fusion and Propagation Module (CFPM). In the CFPM, the feature fusion part can effectively integrate the features from adjacent layers to exploit the cross-level correlations, and the feature propagation part can transmit valuable context information from the encoder to the decoder network via a gate unit. Finally, we formulate a unified and end-to-end trainable framework where cross-level features can be effectively fused and propagated for capturing rich context information. Extensive experiments on three benchmark camouflaged datasets demonstrate that our FAP-Net outperforms other state-of-the-art COD models. Moreover, our model can be extended to the polyp segmentation task, and the comparison results further validate the effectiveness of the proposed model in segmenting polyps. The source code and results will be released at https://github.com/taozh2017/FAPNet.